[amd] Set gpu loader breakpoint by name #20

Merged: 1 commit, Jul 21, 2025

@dmpots dmpots commented Jul 21, 2025

Currently we set the GPU loader breakpoint by address in the `amd_dbgapi_insert_breakpoint_callback` callback. This means we need to halt the CPU in order to set the breakpoint correctly. When we halt the CPU, it shows up as a public stop to the user and interrupts the debugging flow.

This PR changes it so that we set the loader breakpoint by name when we first create the GPU connection. We set the breakpoint on the `rocr::_loader_debug_state` function, which was discovered by looking up the address passed into the callback.

This is a bit of a hack, since the function name could change over time or may be unavailable if the runtime was statically linked. But we use it for now to make the debugger experience better until we can find a more permanent solution.
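
As a rough illustration of the trade-off described above, the toy model below shows a breakpoint requested by symbol name that stays pending until a module exporting that symbol is loaded, plus a check of whether the address later reported by the callback is already covered. This is only a sketch: `Module`, `NamedBreakpoint`, and `CoveredByName` are hypothetical names invented for this example and are not LLDB or ROCm APIs, and the addresses and module names used with it are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Toy model only: none of these types are LLDB or ROCm APIs.
using Address = std::uint64_t;

// A loaded module with the symbols it exports (hypothetical layout).
struct Module {
  std::string name;
  std::unordered_map<std::string, Address> symbols;
};

// A breakpoint requested by symbol name. It stays pending until a loaded
// module exports that symbol, so it can be created before the runtime loads.
class NamedBreakpoint {
public:
  explicit NamedBreakpoint(std::string symbol) : symbol_(std::move(symbol)) {}

  // Called for each module-load event. Returns true only if this particular
  // load resolved the breakpoint.
  bool OnModuleLoaded(const Module &module) {
    if (resolved_)
      return false;
    auto it = module.symbols.find(symbol_);
    if (it == module.symbols.end())
      return false; // Symbol renamed, stripped, or simply not in this module.
    address_ = it->second;
    resolved_ = true;
    return true;
  }

  bool IsResolved() const { return resolved_; }

  Address GetAddress() const {
    assert(resolved_ && "address is only meaningful once resolved");
    return address_;
  }

private:
  std::string symbol_;
  bool resolved_ = false;
  Address address_ = 0;
};

// True if the address later reported by the insert-breakpoint callback is
// already covered by the named breakpoint. If this returns false (for example
// because the symbol name changed), a by-address fallback would be needed.
bool CoveredByName(const NamedBreakpoint &bp, Address callback_address) {
  return bp.IsResolved() && bp.GetAddress() == callback_address;
}
```

In this model, a breakpoint created for `rocr::_loader_debug_state` at connection time resolves as soon as a module exporting that symbol is loaded. If the symbol never appears, `CoveredByName` returns false, which is the fragile case mentioned above.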

@dmpots dmpots force-pushed the set-breakpoin-by-name branch from e2e262d to 3b190a4 Compare July 21, 2025 18:35
dmpots commented Jul 21, 2025

Long Term Solution

As mentioned in the commit message this change should be a temporary solution. I tried a few other fixes, but each had some problems.

Attempt (1)

My first attempt tried returning the GPUActions by halting the GPU process and then having the CPU run them. The problem was that I could not guarantee that the actions were complete before returning from the `amd_dbgapi_insert_breakpoint_callback`, so I was worried that the CPU could race ahead and we could miss some code object loading. I was unable to block the thread running the callback because we only have a single thread running the MainLoop.

Attempt (2)

My second attempt injected a fake stop reason into the gdb server's stop-reply packet response. This would let us halt the native process immediately from the callback without showing a public stop, but it still did not handle the case where we need to run actions when the process was already stopped.

Desired High-Level Behavior

After my first two attempts I think I can identify the high-level operation I would like to rely on here.

What I really want is the ability to immediately ensure that the CPU is stopped and then provide some actions that are guaranteed to execute before it is resumed. If the CPU was not already stopped, then it should not generate a public stop for the user, but we still want to ensure that it runs our actions before resuming.
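
The behavior described above can be sketched as a small state machine. This is a minimal model under stated assumptions, not the plugin's implementation: `CpuStopGate`, `RunBeforeResume`, `PublicStop`, and `Resume` are hypothetical names, and the "CPU" here is just a flag rather than a real native process.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

enum class CpuState { Running, Stopped };

// Hypothetical model of the desired operation: make sure the CPU is stopped,
// and guarantee that queued actions run before it resumes.
class CpuStopGate {
public:
  using Action = std::function<void()>;

  // Queue an action that must run before the next resume. If the CPU is
  // running, stop it privately (no user-visible stop), run the action, and
  // resume right away. If it is already stopped, the action simply waits
  // for the next Resume().
  void RunBeforeResume(Action action) {
    pending_.push_back(std::move(action));
    if (state_ == CpuState::Running) {
      state_ = CpuState::Stopped; // Private stop: public_stops_ is untouched.
      Resume();
    }
  }

  // A stop the user can see, e.g. hitting a user breakpoint.
  void PublicStop() {
    state_ = CpuState::Stopped;
    ++public_stops_;
  }

  // Drain every pending action, including ones queued by other actions,
  // and only then let the CPU run again.
  void Resume() {
    assert(state_ == CpuState::Stopped && "resume requires a stopped CPU");
    while (!pending_.empty()) {
      std::vector<Action> batch;
      batch.swap(pending_);
      for (auto &action : batch)
        action();
    }
    state_ = CpuState::Running;
  }

  CpuState State() const { return state_; }
  int PublicStops() const { return public_stops_; }

private:
  CpuState state_ = CpuState::Running;
  std::vector<Action> pending_;
  int public_stops_ = 0;
};
```

The point of the design is that both cases share one guarantee. Whether the stop was private or public, `Resume()` is the only way back to `Running`, and it always drains the queue first.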

@dmpots dmpots marked this pull request as ready for review July 21, 2025 19:56
@dmpots dmpots merged commit e1f6787 into clayborg:llvm-server-plugins Jul 21, 2025
7 checks passed

2 participants