-
Notifications
You must be signed in to change notification settings - Fork 72
Skip&Fix some testcases for Navi4x #2401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/2.6
Are you sure you want to change the base?
Conversation
Jenkins build for 7a4c23426839d6a6e563e2a289d840b45ab1f054 commit finished as FAILURE |
Jenkins build for 233a668bd590fe54f865050c15bd24ef32c8b15a commit finished as FAILURE |
Jenkins build for b934af2855b0de92ef517c18433a2e90d293e619 commit finished as NOT_BUILT |
Jenkins build for b934af2855b0de92ef517c18433a2e90d293e619 commit finished as FAILURE |
Jenkins build for b30a77e88246eae96edf436215a6825f8e6e2dd9 commit finished as FAILURE |
[release/2.5][SWDEV-489778] NAVI4x UT parity for distributed config (#2327) I did a sweep of all the distributed failures on NAVI4x. On a high level, we were running into following issues: - MEM_EFF_ATTENTION is not supported on NAVI4x for 2.5 causing tensors not alike issues - Some UTs pass in future releases, skipped those - Some had slight tolerance fixes as we use hipblas in this branch as compared to hipblaslt in future branches Fixes #ISSUE_NUMBER
[release/2.5][SWDEV-489778] NAVI4x UT parity for distributed config (#2327) I did a sweep of all the distributed failures on NAVI4x. On a high level, we were running into following issues: - MEM_EFF_ATTENTION is not supported on NAVI4x for 2.5 causing tensors not alike issues - Some UTs pass in future releases, skipped those - Some had slight tolerance fixes as we use hipblas in this branch as compared to hipblaslt in future branches Fixes #ISSUE_NUMBER
@pruthvistony please review |
Jenkins build for e059c6cae3043510c49bc05ce343ea28e90935cf commit finished as FAILURE |
Only for release/2.6
Jenkins build for 66fdc1da3e3264605c13f3341f220100b34ac937 commit finished as FAILURE |
'subgraph_1', (y0, x0)); invoke_subgraph_3 = None | ||
invoke_subgraph = torch.ops.higher_order.invoke_subgraph(subgraph_0, \ | ||
'subgraph_0', (l_y_, l_x_)) | ||
invoke_subgraph_3 = torch.ops.higher_order.invoke_subgraph(subgraph_1, 'subgraph_1', (x0, y0)); invoke_subgraph_3 = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were all these changes required?
Maintenance will be a problem going forward if we make these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
release/2.6 missed this commit 5c3996c#diff-356a32190ce9ad6efc3e6fd7cb30a179394ce2170c0472b79edc7b24a47b7e95R271 which sorted graph regions(without it assert triggers), so after these changes code will be synced with main/release/2.7 branches. Also rerun tests with EXPECTTEST_ACCEPT=1 helps to adjust expected values automatically according to main code changes.
@@ -15,6 +15,7 @@ | |||
from torch.testing import FileCheck | |||
from torch.testing._internal.common_cuda import xfailIfSM89 | |||
from torch.testing._internal.common_device_type import expectedFailureXPU | |||
from torch.testing._internal.common_utils import skipIfRocm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is required, for next testcase of this file line 431:
@skipIfRocm #This test requires triton version 3.3+
def test_split_scan(self):
using dtype = OpaqueType<sizeof(scalar_t)>; | ||
index_kernel_impl<dtype>(iter, index_size, index_stride); | ||
}); | ||
static void index_kernel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change needs to be done for Navi only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In main branch it is done without Navi only -> d02c396. Do you expect that it can brake MI on 2.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is FP8 enablement for index operator.
why is it required to cherry-pick this feature into rel/2.6? Feature back porting shouldnt be done unless it is very important customer request. So can we skip this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, no feature request from customer, I will drop it. done
using dtype = OpaqueType<sizeof(scalar_t)>; | ||
index_kernel_impl<dtype>(iter, index_size, index_stride); | ||
}); | ||
static void index_kernel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is FP8 enablement for index operator.
why is it required to cherry-pick this feature into rel/2.6? Feature back porting shouldnt be done unless it is very important customer request. So can we skip this change?
Jenkins build for 870cde981d39ed1850340042f3aff0112816dc96 commit finished as FAILURE Detected error during Pytorch building:
|
Jenkins build for 8e6b0ab5d4751dafdddc28e963d7bec14331aba5 commit finished as FAILURE |
@pruthvistony Can you please review? |
@pruthvistony please review. |
Fixes for regular and distributed unit tests, including cuda&native code.
Also including partial cherry-pick of [release/2.5][SWDEV-489778] NAVI4x UT parity for distributed config (#2327)
Fixes #SWDEV-523736