Skip to content

Conversation

mwootton
Copy link
Contributor

@mwootton mwootton commented Feb 27, 2025

#1049

Convert to using rocprofiler-sdk instead of roctracer for collecting hip api calls and AMD gpu activity.

Reuses most existing roctracer infrastructure with a name for name replacement. Simultaneous support for both roctracer and rocprofiler-sdk was deemed impractical. This would require a whole new set of #ifdefs, a major refactor of the roctracer code, and additional build support. Even then, only one could be active at a time (and you wouldn't want both active).

In homage to the abandoned refactor, RocLogger.cpp/h were created to contain the rocprofbase classes and the api filter.

Roctracer has no established end date.
Rocprofiler-sdk is in rocm_3.1 forward.

This will create a dependency where (newest kineto on old rocm) and (old kineto on newest rocm) could fail to build with AMD gpu support. That window is already over 1 year wide.

@mwootton
Copy link
Contributor Author

Could someone remind me of the preferred method to run clang-tidy (or equivalent) on kineto code?
Thanks

@sraikund16
Copy link
Contributor

Could someone remind me of the preferred method to run clang-tidy (or equivalent) on kineto code? Thanks

Our internal CI runs it automatically, I can push whatever changes needed to the PR

@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mwootton
Copy link
Contributor Author

mwootton commented Feb 27, 2025

Our internal CI runs it automatically, I can push whatever changes needed to the PR

Full service, lovely.

The roctracer code has evolved quite a bit since my initial commit. I tried to swap this in with as few changes as possible. The specific implementation in RocprofLogger.cpp is very new and very different. There are a few comments in there, but overall rocprofiler-sdk is not super intuitive. Let me know what would help you or what you would like to see improved.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@sraikund16
Copy link
Contributor

Quick update: we are still working on this to get our internal builds working. planning on having this in within the next couple weeks

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@sraikund16
Copy link
Contributor

sraikund16 commented Mar 12, 2025

@mwootton Tried running a profile with this and got a segfault on this line: https://github.com/mwootton/kineto/blob/rocprof/libkineto/src/RocprofLogger.cpp#L731

It looks like s never gets set (prints out as nullptr)

I think it is because we need to call this function first but we never do:
https://github.com/mwootton/kineto/blob/rocprof/libkineto/src/RocprofLogger.cpp#L303

Do you mind fixing this up?

@mwootton
Copy link
Contributor Author

I think it is because we need to call this function first but we never do: https://github.com/mwootton/kineto/blob/rocprof/libkineto/src/RocprofLogger.cpp#L303

Indeed that function is supposed to be called when hip loads. So 'automatic' to us, based on that symbol name.

I'll investigate a bit. It works for me as is. However, it was unclear to me if I should also link 'librocprofiler-register.so'.

Either way, this is a failure path that will happen on earlier versions of rocm that don't have rocprofiler-sdk support. So I'll guard against it.

Do you happen to know what rocm version you were using when you saw the failure?
Thanks.

@sraikund16
Copy link
Contributor

sraikund16 commented Mar 21, 2025

@mwootton The ROCM version is 6.2.0, but I forgot to mention this is failing specifically in our internal test(s). In these tests, we are only linking libamdhip64.so and librocprofiler-sdk.so and the test builds but then comes across the segfault. Do we need to add librocprofiler-register.so to the build of our tests? Will it automatically run that function if we link this?

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mwootton has updated the pull request. You must reimport the pull request before landing.

@mwootton
Copy link
Contributor Author

I have overhauled the implementation for this PR. It is no longer intercepting callbacks for kernel dispatch and copies. This necessitated a state machine and some ref counted storage. While workable, there were a lot of details in the edge cases (e.g. single copy api calls split into multiple device copies, copy api sometime launching launching kernels, etc). The concern was once this was fully implemented and working an sdk or runtime change might require updates.

Falling back to something similar to the cupti and roctracer strategies. Collect kernel and async copy info and timings from buffers (page at a time, after the fact); intercept callbacks for api calls to collect arguments and approximateTime timestamps. This effectively puts the api/dispatch/completion state machine inside rocprofiler-sdk where it will be maintained by that code base.

Merged from main. Adapted the support for approximate time timestamps and maxbuffer via config.

@facebook-github-bot
Copy link
Contributor

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this in D70332180.

@sraikund16
Copy link
Contributor

@mwootton it looks like there are still some merge conflicts, do you mind resolving them?

@pruthvistony
Copy link
Contributor

@mwootton ,
As discussed please redo the PR on a separate branch, since we are not able to resolve the conflicts.

@mwootton
Copy link
Contributor Author

I think I need to replicate these changes in a new branch. This files that are showing conflicts are no longer in my current branch so I don't see how to 'resolve' them (i.e. remove them again). Also some files are incorrectly marked as 'renamed', which is complicating things as well. Let me try to get this sorted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants