-
Notifications
You must be signed in to change notification settings - Fork 198
Add rocprofiler-sdk support #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Could someone remind me of the preferred method to run clang-tidy (or equivalent) on kineto code? |
Our internal CI runs it automatically, I can push whatever changes needed to the PR |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Full service, lovely. The roctracer code has evolved quite a bit since my initial commit. I tried to swap this in with as few changes as possible. The specific implementation in RocprofLogger.cpp is very new and very different. There are a few comments in there, but overall rocprofiler-sdk is not super intuitive. Let me know what would help you or what you would like to see improved. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Quick update: we are still working on this to get our internal builds working. planning on having this in within the next couple weeks |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mwootton Tried running a profile with this and got a segfault on this line: https://github.com/mwootton/kineto/blob/rocprof/libkineto/src/RocprofLogger.cpp#L731 It looks like I think it is because we need to call this function first but we never do: Do you mind fixing this up? |
Indeed that function is supposed to be called when hip loads. So 'automatic' to us, based on that symbol name. I'll investigate a bit. It works for me as is. However, it was unclear to me if I should also link 'librocprofiler-register.so'. Either way, this is a failure path that will happen on earlier versions of rocm that don't have rocprofiler-sdk support. So I'll guard against it. Do you happen to know what rocm version you were using when you saw the failure? |
@mwootton The ROCM version is 6.2.0, but I forgot to mention this is failing specifically in our internal test(s). In these tests, we are only linking libamdhip64.so and librocprofiler-sdk.so and the test builds but then comes across the segfault. Do we need to add librocprofiler-register.so to the build of our tests? Will it automatically run that function if we link this? |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
@mwootton has updated the pull request. You must reimport the pull request before landing. |
I have overhauled the implementation for this PR. It is no longer intercepting callbacks for kernel dispatch and copies. This necessitated a state machine and some ref counted storage. While workable, there were a lot of details in the edge cases (e.g. single copy api calls split into multiple device copies, copy api sometime launching launching kernels, etc). The concern was once this was fully implemented and working an sdk or runtime change might require updates. Falling back to something similar to the cupti and roctracer strategies. Collect kernel and async copy info and timings from buffers (page at a time, after the fact); intercept callbacks for api calls to collect arguments and approximateTime timestamps. This effectively puts the api/dispatch/completion state machine inside rocprofiler-sdk where it will be maintained by that code base. Merged from main. Adapted the support for approximate time timestamps and maxbuffer via config. |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this in D70332180. |
@mwootton it looks like there are still some merge conflicts, do you mind resolving them? |
@mwootton , |
I think I need to replicate these changes in a new branch. This files that are showing conflicts are no longer in my current branch so I don't see how to 'resolve' them (i.e. remove them again). Also some files are incorrectly marked as 'renamed', which is complicating things as well. Let me try to get this sorted. |
#1049
Convert to using rocprofiler-sdk instead of roctracer for collecting hip api calls and AMD gpu activity.
Reuses most existing roctracer infrastructure with a name for name replacement. Simultaneous support for both roctracer and rocprofiler-sdk was deemed impractical. This would require a whole new set of #ifdefs, a major refactor of the roctracer code, and additional build support. Even then, only one could be active at a time (and you wouldn't want both active).
In homage to the abandoned refactor, RocLogger.cpp/h were created to contain the rocprofbase classes and the api filter.
Roctracer has no established end date.
Rocprofiler-sdk is in rocm_3.1 forward.
This will create a dependency where (newest kineto on old rocm) and (old kineto on newest rocm) could fail to build with AMD gpu support. That window is already over 1 year wide.