Skip to content

Fix unhandled exception in tiered compilation thread #116716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

janvorli
Copy link
Member

The detection the unhandled exception is not working in the case of the tiered compilation thread, because there is no indication that the exception is going to be caught in the CEEInfo::runWithErrorTrap. So the EH assumes the exception is going to be unhandled and exits the process.
The unhandled exception message was not seen in the repro case of the issue likely because the tiered compilation thread console output didn't get flushed. When running under a debugger and stepping over the InternalUnhandledExceptionFilter_Worker, it was actually printed out.

The fix is to add DebuggerU2MCatchHandlerFrame to the CEEInfo::runWithErrorTrap. That allows the EH to see that the exception will be caught there and let it flow through the CallDescrWorker to reach that trap.

I have also taken this opportunity and renamed the confusing invalidRevPInvoke local variable.

Closes #116676

The detection the unhandled exception is not working in the case of
the tiered compilation thread, because there is no indication that
the exception is going to be caught in the `CEEInfo::runWithErrorTrap`.
So the EH assumes the exception is going to be unhandled and
exits the process.
The unhandled exception message was not seen in the repro case
of the issue likely because the tiered compilation thread console
output didn't get flushed. When running under a debugger and stepping
over the `InternalUnhandledExceptionFilter_Worker`, it was actually
printed out.

The fix is to add `DebuggerU2MCatchHandlerFrame` to the
`CEEInfo::runWithErrorTrap`. That allows the EH to see that the
exception will be caught there and let it flow through the
CallDescrWorker to reach that trap.

Closes dotnet#116676
@janvorli janvorli added this to the 10.0.0 milestone Jun 16, 2025
@janvorli janvorli self-assigned this Jun 16, 2025
@Copilot Copilot AI review requested due to automatic review settings June 16, 2025 21:50
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR ensures that exceptions in the tiered compilation thread are correctly detected as handled by adding a debugger catch frame and improves clarity by renaming a confusing local variable.

  • Register a DebuggerU2MCatchHandlerFrame in CEEInfo::runWithErrorTrap so EH knows exceptions will be caught.
  • Rename invalidRevPInvoke to isPropagatingToNativeCode in SfiNext and update related logic and log messages.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/coreclr/vm/jitinterface.cpp Added DebuggerU2MCatchHandlerFrame catchFrame(true) at the start of runWithErrorTrap.
src/coreclr/vm/exceptionhandling.cpp Renamed invalidRevPInvoke to isPropagatingToNativeCode and updated assignments, checks, and logs.
Comments suppressed due to low confidence (3)

src/coreclr/vm/jitinterface.cpp:10588

  • [nitpick] Add a brief comment explaining why this catch frame is needed (e.g., "// Ensure EH sees that this thread has a catch handler for tiered compilation").
DebuggerU2MCatchHandlerFrame catchFrame(true /* catchesAllExceptions */);

src/coreclr/vm/exceptionhandling.cpp:3913

  • [nitpick] The name isPropagatingToNativeCode overlaps with isPropagatingToExternalNativeCode and may be ambiguous; consider a more specific name like hasReversePInvokeFrame to clarify its intent.
CLR_BOOL isPropagatingToNativeCode = FALSE;

src/coreclr/vm/jitinterface.cpp:10588

  • No tests were added to verify that exceptions in the tiered compilation thread are now correctly caught; consider adding a repro unit test covering this scenario.
DebuggerU2MCatchHandlerFrame catchFrame(true /* catchesAllExceptions */);

@jkotas
Copy link
Member

jkotas commented Jun 16, 2025

Should we add a regression test for this?

Comment on lines +3969 to +3970
EH_LOG((LL_INFO100, "SfiNext: reached native frame at IP=%p, SP=%p, isPropagatingToNativeCode=%d\n",
GetIP(pThis->m_crawl.GetRegisterSet()->pCurrentContext), GetSP(pThis->m_crawl.GetRegisterSet()->pCurrentContext), isPropagatingToNativeCode));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EH_LOG((LL_INFO100, "SfiNext: reached native frame at IP=%p, SP=%p, isPropagatingToNativeCode=%d\n",
GetIP(pThis->m_crawl.GetRegisterSet()->pCurrentContext), GetSP(pThis->m_crawl.GetRegisterSet()->pCurrentContext), isPropagatingToNativeCode));
EH_LOG((LL_INFO100, "SfiNext: reached native frame at IP=%p, SP=%p, isPropagatingToNativeCode=%s\n",
GetIP(pThis->m_crawl.GetRegisterSet()->pCurrentContext), GetSP(pThis->m_crawl.GetRegisterSet()->pCurrentContext), (isPropagatingToNativeCode ? "true" : "false")));

It needs to be compiled as optimized and the Console.WriteLine is needed to repro
the issue.
{
for (int i = 0; i < 1_000_000; i++)
{
Console.WriteLine(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will produce megabytes of output that will get copied around, etc. Can we do something less visible, like Thread.Sleep(10) and change the for-loop to go till like 200? We should only need to hit the loop about CallCountThreshold times (30) to trigger the background tiered compilation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It usually triggers on hundreds of thousands of iterations for me, so even a Sleep(1) was too long. Let me try if a longer sleep and less iterations would give the tiered compilation enough time to crash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified it to use the Sleep(10) and less loops. It reproduces the problem without the fix reliably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use the env var for aggressive tiering here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is needed, the test now repros the problem reliably (with the fix from this PR removed)

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Fix contract

Co-authored-by: Jan Kotas <[email protected]>
@janvorli janvorli deleted the fix-unhandled-exception-in-tiered-compilation-thread branch June 19, 2025 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnsafeAccessorType targeting non-existing type crashing on Linux
4 participants