[release/2.6] NAVI32 specific fixes #2450

Merged
jithunnair-amd merged 5 commits into release/2.6 from iupaikov_gfx1101_fixes_release2.6
Aug 6, 2025

Conversation

iupaikov-amd

@iupaikov-amd iupaikov-amd commented Aug 4, 2025

Fixes https://github.com/ROCm/frameworks-internal/issues/12096

Cherry-picked to release/2.7 branch via #2465

Cherry-picked to release/2.7 branch via #2466

Cherry-picked to release/2.8 branch via #2467

Cherry-picked to rocm7.1_internal_testing branch via #2473

@rocm-repo-management-api

rocm-repo-management-api bot commented Aug 4, 2025

Jenkins build for 21df250d84ef981cee62f1b519184c7c48ab700e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api

rocm-repo-management-api bot commented Aug 5, 2025

Jenkins build for fe52793e8230c26e19cdc80b6bd78abab3138753 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@iupaikov-amd iupaikov-amd changed the title from "[DRAFT] [release/2.6] NAVI32 specific fixes" to "[release/2.6] NAVI32 specific fixes" Aug 6, 2025
@iupaikov-amd
Author

!cherry-pick --onto release/2.7

@iupaikov-amd iupaikov-amd marked this pull request as ready for review August 6, 2025 18:31
@iupaikov-amd
Author

Hello @jithunnair-amd

I found that we run out of memory on navi32 in some tests that use bigger tensors during benchmarking. No amount of tricks or hacks could fix this, since the tests were inherently designed for bigger GPUs. We decided to skip the tests based on available memory rather than on arch. This should avoid similar issues with other smaller cards later down the line; we will probably need to upstream this when we have navi CI. So I sadly decided to add another ROCm decorator, both to avoid affecting Nvidia cards and to make the logic more readable.
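
For illustration only, a memory-based skip of this kind might look roughly like the sketch below. This is not the decorator added in this PR: the name `skip_if_rocm_memory_below` and the injectable `is_rocm` and `total_memory_bytes` probes are hypothetical, and it assumes total device memory (rather than currently free memory) is the right quantity to compare against.

```python
import functools
import unittest


def _default_is_rocm():
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    try:
        import torch
    except ImportError:
        return False
    return torch.version.hip is not None and torch.cuda.is_available()


def _default_total_memory_bytes():
    # Total memory of device 0, in bytes (assumption: a single-GPU test run).
    import torch
    return torch.cuda.get_device_properties(0).total_memory


def skip_if_rocm_memory_below(min_gib,
                              is_rocm=_default_is_rocm,
                              total_memory_bytes=_default_total_memory_bytes):
    """Hypothetical decorator: skip a test on ROCm when the GPU is too small.

    Non-ROCm platforms (for example Nvidia cards) are never skipped here.
    """
    required = int(min_gib * 1024 ** 3)

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if is_rocm():
                available = total_memory_bytes()
                if available < required:
                    raise unittest.SkipTest(
                        f"needs at least {min_gib} GiB of GPU memory, "
                        f"found {available / 1024 ** 3:.1f} GiB"
                    )
            return test_fn(*args, **kwargs)

        return wrapper

    return decorator
```

A test would then be marked with something like `@skip_if_rocm_memory_below(20)`. The threshold is checked when the test runs, not at import time, so the same test file can be collected on machines without a GPU.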

cc @jataylo

@jithunnair-amd
Collaborator

> we will probably need to upstream this when we have navi CI.

@iupaikov-amd @jataylo Can we please file an upstream PR for this anyway? We are still in the process of getting Navi CI upstream, but that shouldn't prevent us from at least filing the PR. And you can use that PR to consolidate all Navi-related inductor fixes.

@jithunnair-amd jithunnair-amd merged commit f787306 into release/2.6 Aug 6, 2025
@jithunnair-amd jithunnair-amd deleted the iupaikov_gfx1101_fixes_release2.6 branch August 6, 2025 18:39
@jithunnair-amd
Collaborator

!cherry-pick --onto release/2.7 release/2.8

@okakarpa
Collaborator

okakarpa commented Aug 6, 2025

Created branch autogenerated/release/2.7_cherry-pick_pr-2450 and #2465. It contains a merge conflict. Please resolve it

@okakarpa
Collaborator

okakarpa commented Aug 6, 2025

Created branch autogenerated/release/2.7_cherry-pick_pr-2450 and #2466. It contains a merge conflict. Please resolve it

Created branch autogenerated/release/2.8_cherry-pick_pr-2450 and #2467. It contains a merge conflict. Please resolve it

@iupaikov-amd
Author

!cherry-pick --onto rocm7.1_internal_testing

@okakarpa
Collaborator

okakarpa commented Aug 7, 2025

Created branch autogenerated/rocm7.1_internal_testing_cherry-pick_pr-2450 and #2473. It contains a merge conflict. Please resolve it

jithunnair-amd pushed a commit that referenced this pull request Aug 8, 2025
3 participants