Skip to content

[rocm7.0_internal_testing] remove extra transposes in NHWC convolutions on MIOpen #2405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dnikolaev-amd
Copy link

@dnikolaev-amd dnikolaev-amd commented Jul 24, 2025

remove aten::contiguous for NHWC convolutions

Tests:

  • nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
  • nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
image

After:
image

Fixes SWDEV-526887

Cherry-picked to release/2.6 branch via #2408

Cherry-picked to release/2.7 branch via #2409

Cherry-picked to release/2.8 branch via #2410

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Jul 24, 2025

Jenkins build for 3469d2dfecad08910d789216cc91d40834e9f824 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@dnikolaev-amd dnikolaev-amd requested a review from jataylo July 24, 2025 16:33
@dnikolaev-amd dnikolaev-amd changed the title [rocm7.0_internal_testing] remove extra transposes in NHWC convolutions [rocm7.0_internal_testing] remove extra transposes in NHWC convolutions on MIOpen Jul 24, 2025
@dnikolaev-amd dnikolaev-amd force-pushed the remove_extra_contiguous_for_nhwc_conv branch from 3469d2d to 8d405ab Compare July 24, 2025 17:41
@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Jul 24, 2025

Jenkins build for 8d405abe0fcc00f0275d636e7eb5fb2216d284f3 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Copy link
Collaborator

@jithunnair-amd jithunnair-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making these fixes. Looks like we only ever did the right thing for NHWC in miopen_convolution_backward. Can you please list out the unit tests or simple test scripts that can exhibit the difference when profiled?

@dnikolaev-amd
Copy link
Author

dnikolaev-amd commented Jul 24, 2025

Simplified convolution test for collecting profile based on nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32:

# file name test_extra_transposes.py
import os
import torch
import torch.nn as nn

#enable NHWC Conv for MIOpen
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"

def helper(n, c, h, w, out_channels, dtype, kernel_size, groups):
        input = torch.randint(-3, 3, (n, c, h, w), dtype=dtype, device="cuda").to(
            memory_format=torch.channels_last).requires_grad_()
        conv = nn.Conv2d(c, out_channels, kernel_size, groups=groups).to(
            device="cuda", dtype=dtype, memory_format=torch.channels_last
        )
        for p in conv.parameters():
            p.data = torch.randint_like(p, -3, 3)
        out = conv(input)
        grad = torch.randint_like(out, -3, 3)
        out.backward(grad)

# start torch.profiler to capture kernels
prof = torch.profiler.profile()
prof.start()

helper(2, 8, 4, 4, out_channels=8, dtype=torch.float32, kernel_size=3, groups=8)

prof.stop()
#save profiling results
prof.export_chrome_trace(f"conv_profile_decode.json")
#save profiling stats to a text file
with open(f"conv_stats_decode.txt", "w") as f:
    print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=-1), file=f)

The difference can be observed with commands:
Before PR:

python test_extra_transposes.py

grep contiguous conv_stats_decode.txt
  aten::contiguous         0.00%       6.501us         0.10%     179.171us      89.585us       0.000us         0.00%       0.000us       0.000us

After PR (empty output):

python test_extra_transposes.py

grep contiguous conv_stats_decode.txt

conv_stats_decode.txt file contains profile stats
conv_profile_decode.json file contains profile and can be visualized using https://ui.perfetto.dev/

@jithunnair-amd jithunnair-amd merged commit 3d40425 into rocm7.0_internal_testing Jul 25, 2025
0 of 2 checks passed
@jithunnair-amd jithunnair-amd deleted the remove_extra_contiguous_for_nhwc_conv branch July 25, 2025 00:21
@jithunnair-amd
Copy link
Collaborator

! cherry-pick --onto release/2.6 release/2.7 release/2.8

okakarpa pushed a commit that referenced this pull request Jul 25, 2025
…ns on MIOpen (#2405)

remove aten::contiguous for NHWC convolutions

Tests:
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57"
/>

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818"
/>


Fixes SWDEV-526887
okakarpa pushed a commit that referenced this pull request Jul 25, 2025
…ns on MIOpen (#2405)

remove aten::contiguous for NHWC convolutions

Tests:
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57"
/>

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818"
/>


Fixes SWDEV-526887
okakarpa pushed a commit that referenced this pull request Jul 25, 2025
…ns on MIOpen (#2405)

remove aten::contiguous for NHWC convolutions

Tests:
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57"
/>

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818"
/>


Fixes SWDEV-526887
@okakarpa
Copy link
Collaborator

pruthvistony pushed a commit that referenced this pull request Aug 2, 2025
… transposes in NHWC convolutions on MIOpen (#2408)

Cherry-pick of #2405

Co-authored-by: Dmitry Nikolaev <[email protected]>
pruthvistony pushed a commit that referenced this pull request Aug 2, 2025
… transposes in NHWC convolutions on MIOpen (#2409)

Cherry-pick of #2405

Co-authored-by: Dmitry Nikolaev <[email protected]>
dnikolaev-amd added a commit that referenced this pull request Aug 12, 2025
…ns on MIOpen (#2405)

remove aten::contiguous for NHWC convolutions

Tests:
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57"
/>

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818"
/>


Fixes SWDEV-526887
jithunnair-amd pushed a commit that referenced this pull request Aug 12, 2025
…tions on MIOpen (#2410)

Cherry-pick of #2405

Co-authored-by: Dmitry Nikolaev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants