[rocm7.0_internal_testing] remove extra transposes in NHWC convolutions on MIOpen #2405
Jenkins build for 3469d2dfecad08910d789216cc91d40834e9f824 commit finished as FAILURE
3469d2d to 8d405ab
Jenkins build for 8d405abe0fcc00f0275d636e7eb5fb2216d284f3 commit finished as FAILURE
Thank you for making these fixes. Looks like we only ever did the right thing for NHWC in miopen_convolution_backward. Can you please list out the unit tests or simple test scripts that can exhibit the difference when profiled?
Simplified convolution test for collecting a profile:

    # file name: test_extra_transposes.py
    import os
    import torch
    import torch.nn as nn

    # enable NHWC Conv for MIOpen
    os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"

    def helper(n, c, h, w, out_channels, dtype, kernel_size, groups):
        # channels_last (NHWC) input and convolution module
        input = torch.randint(-3, 3, (n, c, h, w), dtype=dtype, device="cuda").to(
            memory_format=torch.channels_last).requires_grad_()
        conv = nn.Conv2d(c, out_channels, kernel_size, groups=groups).to(
            device="cuda", dtype=dtype, memory_format=torch.channels_last
        )
        for p in conv.parameters():
            p.data = torch.randint_like(p, -3, 3)
        # forward and backward pass
        out = conv(input)
        grad = torch.randint_like(out, -3, 3)
        out.backward(grad)

    # start torch.profiler to capture kernels
    prof = torch.profiler.profile()
    prof.start()
    helper(2, 8, 4, 4, out_channels=8, dtype=torch.float32, kernel_size=3, groups=8)
    prof.stop()

    # save profiling results
    prof.export_chrome_trace("conv_profile_decode.json")

    # save profiling stats to a text file
    with open("conv_stats_decode.txt", "w") as f:
        print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=-1), file=f)

The difference can be observed with these commands:

    python test_extra_transposes.py
    grep contiguous conv_stats_decode.txt
    aten::contiguous 0.00% 6.501us 0.10% 179.171us 89.585us 0.000us 0.00% 0.000us 0.000us

After PR (empty output):

    python test_extra_transposes.py
    grep contiguous conv_stats_decode.txt
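The grep check can also be scripted, for example to assert on it in CI. The sketch below is a hypothetical helper (`find_op_rows` is not part of PyTorch or of this PR). It scans the text of `conv_stats_decode.txt` and returns the rows for a given operator, assuming the operator name is the first whitespace-separated column, as in the row shown above.

```python
def find_op_rows(table_text, op_name="aten::contiguous"):
    """Return the table rows (as lists of fields) whose first column is op_name.

    Assumption: each profiler row starts with the operator name followed by
    whitespace-separated columns, as in the grep output quoted in this thread.
    """
    rows = []
    for line in table_text.splitlines():
        fields = line.split()
        if fields and fields[0] == op_name:
            rows.append(fields)
    return rows


# Row copied from the "before" output in this thread.
before_text = (
    "aten::contiguous 0.00% 6.501us 0.10% 179.171us 89.585us "
    "0.000us 0.00% 0.000us 0.000us"
)
before_rows = find_op_rows(before_text)

# An empty stats file stands in for the "after" case (empty grep output).
after_rows = find_op_rows("")
```

With the real file, `find_op_rows(open("conv_stats_decode.txt").read())` would be expected to return an empty list once the PR is applied, matching the empty grep output.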
Merged 3d40425 into rocm7.0_internal_testing
! cherry-pick --onto release/2.6 release/2.7 release/2.8
…ns on MIOpen (#2405)
remove aten::contiguous for NHWC convolutions
Tests:
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16
Before: <img width="1255" height="228" alt="image" src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />
After: <img width="874" height="153" alt="image" src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />
Fixes SWDEV-526887
Created branch autogenerated/release/2.6_cherry-pick_pr-2405 and #2408
Created branch autogenerated/release/2.7_cherry-pick_pr-2405 and #2409
Created branch autogenerated/release/2.8_cherry-pick_pr-2405 and #2410
… transposes in NHWC convolutions on MIOpen (#2408)
Cherry-pick of #2405
Co-authored-by: Dmitry Nikolaev
… transposes in NHWC convolutions on MIOpen (#2409)
Cherry-pick of #2405
Co-authored-by: Dmitry Nikolaev
…tions on MIOpen (#2410)
Cherry-pick of #2405
Co-authored-by: Dmitry Nikolaev
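remove aten::contiguous for NHWC convolutions
Tests:
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16
Before: <img width="1255" height="228" alt="image" src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />
After: <img width="874" height="153" alt="image" src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />
Fixes SWDEV-526887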
Cherry-picked to release/2.6 branch via #2408
Cherry-picked to release/2.7 branch via #2409
Cherry-picked to release/2.8 branch via #2410
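For readers unfamiliar with why an `aten::contiguous` call on an NHWC tensor amounts to an extra transpose, here is a minimal sketch of the stride arithmetic. It is an illustration only, not code from this PR: `strides_for` is a hypothetical helper, and it assumes the usual PyTorch convention that a 4D tensor keeps its logical (N, C, H, W) shape while `channels_last` changes only the strides.

```python
def strides_for(shape, memory_format="contiguous"):
    """Element strides of a dense 4D tensor with logical shape (N, C, H, W).

    Hypothetical helper for illustration; mirrors the layout convention of
    torch.contiguous_format (NCHW) and torch.channels_last (NHWC).
    """
    n, c, h, w = shape
    if memory_format == "contiguous":
        # NCHW: W is the fastest-varying dimension in memory.
        return (c * h * w, h * w, w, 1)
    if memory_format == "channels_last":
        # NHWC: C is the fastest-varying dimension in memory.
        return (h * w * c, 1, w * c, c)
    raise ValueError(f"unknown memory format: {memory_format}")


# Shape used by the test script in this thread: helper(2, 8, 4, 4, ...).
shape = (2, 8, 4, 4)
nchw_strides = strides_for(shape, "contiguous")
nhwc_strides = strides_for(shape, "channels_last")
```

Because the two stride tuples differ, converting a channels_last tensor to the default contiguous format has to physically reorder the data, which is the kind of copy that shows up as `aten::contiguous` in the profile.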