[0035] Revise matrix element accesses #598
This PR removes the row and element accessors in favor of a more general means of accessing per-thread vectors. This change allows initializing a matrix from per-thread vectors, as well as splitting a matrix back into its constituent vectors. The vectors represent rows in A matrices and columns in B matrices. Since the layout of Accumulator matrices varies by vendor, this method may only be used to construct A or B matrices. Resolves microsoft#575
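As a rough illustration of the layout rule described above, the following C++ sketch models a matrix on the CPU as one vector per thread. It is not the HLSL `linalg` API: `buildA`, `buildB`, `splitA`, `splitB`, `checkRoundTrip`, `kThreads`, and `kLen` are hypothetical names, and the sketch assumes that thread t's vector maps to row t of an A matrix and to column t of a B matrix. Accumulator matrices are deliberately left out, since their layout is vendor-specific.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU model, NOT the HLSL linalg API. Assumption: thread t owns
// one vector; for an A matrix it is row t, for a B matrix it is column t.
constexpr std::size_t kThreads = 4;  // assumed number of participating threads
constexpr std::size_t kLen = 3;      // assumed per-thread vector length (K)

using Vec = std::array<float, kLen>;
using PerThread = std::array<Vec, kThreads>;                 // one Vec per thread
using MatA = std::array<std::array<float, kLen>, kThreads>;  // kThreads x K
using MatB = std::array<std::array<float, kThreads>, kLen>;  // K x kThreads

// Build A: thread t's vector becomes row t.
MatA buildA(const PerThread& v) {
  MatA a{};
  for (std::size_t t = 0; t < kThreads; ++t)
    a[t] = v[t];
  return a;
}

// Build B: thread t's vector becomes column t.
MatB buildB(const PerThread& v) {
  MatB b{};
  for (std::size_t t = 0; t < kThreads; ++t)
    for (std::size_t k = 0; k < kLen; ++k)
      b[k][t] = v[t][k];
  return b;
}

// Split A back into per-thread vectors (row t goes to thread t).
PerThread splitA(const MatA& a) {
  PerThread v{};
  for (std::size_t t = 0; t < kThreads; ++t)
    v[t] = a[t];
  return v;
}

// Split B back into per-thread vectors (column t goes to thread t).
PerThread splitB(const MatB& b) {
  PerThread v{};
  for (std::size_t t = 0; t < kThreads; ++t)
    for (std::size_t k = 0; k < kLen; ++k)
      v[t][k] = b[k][t];
  return v;
}

// Splitting a matrix must return exactly the vectors it was built from.
void checkRoundTrip(const PerThread& v) {
  assert(splitA(buildA(v)) == v);
  assert(splitB(buildB(v)) == v);
}
```

In this model, code never addresses an individual (row, col) element; it only hands whole per-thread vectors in and out of a matrix.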
I like the general direction, and it's much better than being able to directly query or set elements by (row, col).
In addition to my earlier comments, I do think it could still be valuable to expose a more general mechanism for per-element operations on matrices: a sort of map operation, perhaps one that provides row/col information to the lambda/functor/whatever.
If map is an issue due to general HLSL concerns, an alternative could be extending element-wise operations to matrices, similar to what's done for vectors, plus a way to generate "iota" matrices where the value of each element is its row or column number.
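To make the two options in this suggestion concrete, here is a small CPU-side C++ sketch. None of these names (`Mat`, `mapIndexed`, `iotaRows`, `iotaCols`, `lessEqual`) come from the HLSL `linalg` proposal; they are hypothetical stand-ins. `mapIndexed` models a map that passes (row, col) to a callable, while `iotaRows`/`iotaCols` plus a fixed element-wise operation show how similar results could be produced without passing any callable at all.

```cpp
#include <array>
#include <cstddef>

// Hypothetical CPU model of the review suggestion; none of these names are
// part of the HLSL linalg API.
template <std::size_t R, std::size_t C>
using Mat = std::array<std::array<float, C>, R>;

// "map" variant: the callable receives each element plus its (row, col).
template <std::size_t R, std::size_t C, typename F>
Mat<R, C> mapIndexed(const Mat<R, C>& m, F f) {
  Mat<R, C> out{};
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c)
      out[r][c] = f(m[r][c], r, c);
  return out;
}

// "iota" matrix where each element holds its row number.
template <std::size_t R, std::size_t C>
Mat<R, C> iotaRows() {
  Mat<R, C> out{};
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c)
      out[r][c] = static_cast<float>(r);
  return out;
}

// "iota" matrix where each element holds its column number.
template <std::size_t R, std::size_t C>
Mat<R, C> iotaCols() {
  Mat<R, C> out{};
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c)
      out[r][c] = static_cast<float>(c);
  return out;
}

// A fixed element-wise operation (no callable needed): 1 where a <= b, else 0.
template <std::size_t R, std::size_t C>
Mat<R, C> lessEqual(const Mat<R, C>& a, const Mat<R, C>& b) {
  Mat<R, C> out{};
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c)
      out[r][c] = a[r][c] <= b[r][c] ? 1.0f : 0.0f;
  return out;
}
```

With iota matrices and a fixed set of element-wise operations, a lower-triangular mask is `lessEqual(iotaCols<N, N>(), iotaRows<N, N>())`, with no callable involved. `mapIndexed` covers the same case but requires passing a function, which is the part of the design the reply below prefers to avoid for now.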
I've been trying to avoid adding lambdas or function pointers to HLSL for this feature, since that is a much bigger problem space. I was hoping to support a limited set of functionality through the …
We do have lambdas working in HLSL in Clang (although they need to be inlined to be valid DXIL). We could consider bringing that in for HLSL 202x and supporting it in DXC, but if we can instead provide a reduced (but useful) set of pre-selected operations now and extend the feature with lambdas in the future, that is probably ideal.
I had consciously decided not to do this because it complicates the overload story, but maybe that's the wrong decision.
The main updates here are adding a linalg::AccumulatorLayout() query and documenting the uniformity requirements of the APIs.
Resolves #575, resolves #596