Add support for Classify, CompressStore, ExpandLoad, MaskLoad, MaskStore, and MoveMask #116708

tannergooding · 2025-06-16T18:39:20Z

This makes additional progress towards #87097. What remains are the Test, Gather, and Scatter APIs.

This provides support for:

Classify - allows identifying the floating-point kind (NaN, subnormal, infinity, finite, zero, negative, etc.)
CompressStore - the store counterpart to Compress, allowing sequential writing of selected elements
ExpandLoad - the load counterpart to Expand, allowing sequential reading of elements to selected positions
MaskLoad - allows reading only the selected element positions (suppresses faults)
MaskStore - allows writing only the selected element positions (suppresses faults)
MoveMask - allows extracting the most significant bits from an AVX512 kmask register
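
For readers less familiar with the compress/expand pair, here is a minimal scalar sketch of the behavior described above. It is not code from this PR and does not use the managed API; `compress_store_model` and `expand_load_model` are hypothetical names, and the mask is modeled as a plain 32-bit integer with one bit per lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a compressing store (hypothetical helper, not the JIT's code):
// lanes whose mask bit is set are written back-to-back starting at dst.
// Returns the number of elements written.
template <typename T, std::size_t N>
std::size_t compress_store_model(T* dst, std::uint32_t mask, const std::array<T, N>& src)
{
    static_assert(N <= 32, "the mask is modeled as 32 bits");
    std::size_t written = 0;
    for (std::size_t lane = 0; lane < N; ++lane)
    {
        if ((mask >> lane) & 1u)
        {
            dst[written++] = src[lane]; // only selected lanes touch memory
        }
    }
    assert(written <= N); // never more than one full vector of elements
    return written;
}

// Scalar model of an expanding load (hypothetical helper): elements are read
// back-to-back from src and placed into the lanes whose mask bit is set.
// Unselected lanes keep the value from merge.
template <typename T, std::size_t N>
std::array<T, N> expand_load_model(const T* src, std::uint32_t mask, const std::array<T, N>& merge)
{
    static_assert(N <= 32, "the mask is modeled as 32 bits");
    std::array<T, N> result = merge;
    std::size_t consumed = 0;
    for (std::size_t lane = 0; lane < N; ++lane)
    {
        if ((mask >> lane) & 1u)
        {
            result[lane] = src[consumed++]; // sequential read, scattered placement
        }
    }
    return result;
}
```

With an 8-lane source and mask `0xA5` (lanes 0, 2, 5, and 7 selected), the compress model writes exactly four contiguous elements and leaves the rest of the destination untouched. The expand model is the inverse: it consumes four contiguous elements and places them in those same lanes.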

This also covers the various AVX512 intrinsic variants for 128/256-bit paths where that intrinsic guarantees mask usage. For example, Sse41.BlendVariable uses xmm0 for the mask while Avx512F.VL.BlendVariable uses k1. This allows devs to force kmask usage for 128/256-bit code paths.
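
To make the two mask forms concrete, here is a small scalar sketch, assuming 32-bit lanes in a 128-bit vector. It is only an illustration and not the managed API: `Lanes`, `blend_with_vector_mask`, `blend_with_k_mask`, and `pack_lane_sign_bits` are hypothetical names. The vector-register form selects on the most significant bit of each mask lane, while the k-register form carries one bit per lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Models a 128-bit vector as four 32-bit lanes (illustrative only).
using Lanes = std::array<std::uint32_t, 4>;

// Vector-register mask: the most significant bit of each mask lane picks b over a.
Lanes blend_with_vector_mask(const Lanes& a, const Lanes& b, const Lanes& mask)
{
    Lanes result{};
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        result[i] = (mask[i] >> 31) ? b[i] : a[i];
    }
    return result;
}

// k-register mask: bit i picks b over a for lane i.
Lanes blend_with_k_mask(const Lanes& a, const Lanes& b, std::uint8_t k)
{
    Lanes result{};
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        result[i] = ((k >> i) & 1u) ? b[i] : a[i];
    }
    return result;
}

// Packs the most significant bit of each lane into one bit per lane,
// which is the relationship between the two mask forms in this model.
std::uint8_t pack_lane_sign_bits(const Lanes& mask)
{
    std::uint8_t bits = 0;
    for (std::size_t i = 0; i < mask.size(); ++i)
    {
        bits = static_cast<std::uint8_t>(bits | ((mask[i] >> 31) << i));
    }
    return bits;
}
```

In this model, blending with a vector mask gives the same result as blending with the packed bits of that mask, so the two forms differ in where the mask lives rather than in what gets selected.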

tannergooding · 2025-06-16T18:44:49Z

As is typical, the JIT side changes are small at around 260 lines.

The bulk of the change is the managed API surface since it requires defining the managed signatures + comments and duplicating it across 2 files for each API.

The remaining 1100 lines are the test updates and the new test template (most of that being the test template).

Copilot

Pull Request Overview

This PR adds support for several new AVX512 hardware intrinsic APIs including Classify, CompressStore, ExpandLoad, MaskLoad, MaskStore, and MoveMask to facilitate more fine‐grained vector operations and improved intrinsics support on 128/256-bit paths.

Adds new test methods in the HardwareIntrinsics test suite to validate the new API behavior.
Updates the System.Private.CoreLib intrinsic implementations and corresponding JIT lowering/codegen logic to incorporate the new intrinsics with proper EVEX embedded mask handling.
Modifies the HW intrinsic lists and lower/emit methods to ensure the new operations are recognized and processed correctly.

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/tests/JIT/HardwareIntrinsics/X86/Shared/SseVerify.cs | New MoveMask overloads added for generic, float, and double arrays. |
| src/tests/JIT/HardwareIntrinsics/X86/Shared/LoadTernOpTest.template | New test cases introduced to exercise the new intrinsic APIs. |
| src/tests/JIT/HardwareIntrinsics/X86/Shared/Avx512Verify.cs | New intrinsic APIs added for Classify, MaskLoad, and MaskStore operations. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Avx512Vbmi2.cs | Added overloads for CompressStore and ExpandLoad and updated corresponding documentation comments. |
| src/coreclr/jit/lowerxarch.cpp, hwintrinsicxarch.cpp, and related files | Extended JIT lowering and emitter logic to support the new AVX512 intrinsic cases and EVEX embedded masking. |
| src/coreclr/jit/hwintrinsic*.{cpp,h} | Adjusted intrinsic lookup and lists to include the new intrinsic IDs. |

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Avx512Vbmi2.cs

dotnet-policy-service · 2025-06-16T18:46:20Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

tannergooding · 2025-06-16T19:03:58Z

src/coreclr/jit/instrsxarch.h

@@ -711,8 +711,8 @@ INST3(vcmppd,           "cmppd",            IUM_WR, BAD_CODE,               BAD_
 INST3(vcmpps,           "cmpps",            IUM_WR, BAD_CODE,               BAD_CODE,     PCKFLT(0xC2),                  INS_TT_FULL,                         Input_32Bit    | KMask_Base4     | REX_W0                       | Encoding_EVEX  | INS_Flags_IsDstDstSrcAVXInstruction)                                                                                           // compare packed singles
 INST3(vcmpsd,           "cmpsd",            IUM_WR, BAD_CODE,               BAD_CODE,     SSEDBL(0xC2),                  INS_TT_TUPLE1_SCALAR,                Input_64Bit    | KMask_Base1     | REX_W1                       | Encoding_EVEX  | INS_Flags_IsDstDstSrcAVXInstruction)                                                                                           // compare scalar doubles
 INST3(vcmpss,           "cmpss",            IUM_WR, BAD_CODE,               BAD_CODE,     SSEFLT(0xC2),                  INS_TT_TUPLE1_SCALAR,                Input_32Bit    | KMask_Base1     | REX_W0                       | Encoding_EVEX  | INS_Flags_IsDstDstSrcAVXInstruction)                                                                                           // compare scalar singles
-INST3(vcompresspd,      "compresspd",       IUM_WR, SSE38(0x8A),            BAD_CODE,     BAD_CODE,                      INS_TT_TUPLE1_SCALAR,                Input_64Bit    | KMask_Base2     | REX_W1                       | Encoding_EVEX)                                                                                                                                  // Store sparse packed doubles into dense memory
-INST3(vcompressps,      "compressps",       IUM_WR, SSE38(0x8A),            BAD_CODE,     BAD_CODE,                      INS_TT_TUPLE1_SCALAR,                Input_32Bit    | KMask_Base4     | REX_W0                       | Encoding_EVEX)                                                                                                                                  // Store sparse packed singles into dense memory
+INST3(vcompresspd,      "compresspd",       IUM_WR, SSE38(0x8A),            BAD_CODE,     BAD_CODE,                      INS_TT_FULL_MEM,                     Input_64Bit                      | REX_W1                       | Encoding_EVEX)                                                                                                                                  // Store sparse packed doubles into dense memory


The formal definition of these in the hardware manuals is INS_TT_TUPLE1_SCALAR. However, for the purposes of how the JIT uses this information, it should be INS_TT_FULL_MEM.

We use this for both containment purposes and disassembly output. While compress/expand can touch as few as 0 bytes of memory, they can also touch as much as a full vector of memory. We can't statically know how much they'll touch, so we want to presume they could touch the whole amount.

Similarly, we don't want to specify the KMask_Base* amount since we can't automatically support embedded masking. Developers wanting the masking support need to use the explicit CompressStore/ExpandLoad APIs rather than ConditionalSelect + Compress/Expand.
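
As a rough illustration of why the footprint can't be known statically, the sketch below computes how many bytes a compressing store would write for a given runtime mask. It is a model under stated assumptions, not the JIT's logic: `compress_store_bytes` is a hypothetical helper, and the mask is taken as a plain integer with one bit per lane.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes written by a compressing store, assuming one
// element is written per set mask bit among the live lanes.
std::size_t compress_store_bytes(std::uint64_t mask, std::size_t lane_count, std::size_t element_size)
{
    // Ignore mask bits that do not correspond to a lane in the vector.
    std::uint64_t live = (lane_count >= 64) ? mask : (mask & ((1ull << lane_count) - 1));
    return std::bitset<64>(live).count() * element_size;
}
```

For a 512-bit vector of doubles (8 lanes of 8 bytes), this gives 0 bytes for an all-clear mask and 64 bytes for an all-set mask. Since the mask is a runtime value, the only safe static bound is the full vector size.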

src/coreclr/jit/hwintrinsicxarch.cpp

jakobbotsch

JIT changes LGTM

tannergooding · 2025-06-18T21:02:56Z

/ba-g unrelated dead letter failure for ios/tvos

tannergooding added 4 commits June 13, 2025 13:16

Add support for Avx512DQ.Classify

41df985

Ensure that the Avx512 hierarchy correctly exposes existing mask intrinsics

50ff4b3

Add support for AVX512 MoveMask

ef8ab83

Add support for AVX512 CompressStore, ExpandLoad, MaskLoad, and MaskStore

e497b7e

github-actions bot added the area-System.Runtime.Intrinsics label Jun 16, 2025

dotnet-policy-service bot assigned tannergooding Jun 16, 2025

tannergooding marked this pull request as ready for review June 16, 2025 18:44

tannergooding requested review from Copilot and EgorBo June 16, 2025 18:44

tannergooding added the area-CodeGen-coreclr and avx512 labels and removed the area-System.Runtime.Intrinsics label Jun 16, 2025

Copilot AI reviewed Jun 16, 2025

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Avx512Vbmi2.cs

Add missing new slot in Avx10v1.PNSE.cs

12165c6

tannergooding commented Jun 16, 2025

tannergooding added 2 commits June 16, 2025 12:33

Ensure Avx512DQ.PNSE.cs throws PNSE, not null

12deb10

Ensure the HW_Category_MemoryStore code paths are merged

b5400b7

Fixing the encodings of vmovdqu8/16

94ba5ce

jakobbotsch reviewed Jun 17, 2025

src/coreclr/jit/hwintrinsicxarch.cpp

jakobbotsch approved these changes Jun 17, 2025

build-analysis bot mentioned this pull request Jun 17, 2025

browser-wasm windows Debug AllSubsets_CoreCLR builds failing in emcc seemingly unrelated to any code issues #116647

Open

Merge remote-tracking branch 'dotnet/main' into fix-87097

56e1146

tannergooding force-pushed the fix-87097 branch from cee4ce5 to 30fe634 Compare June 17, 2025 18:59

build-analysis bot mentioned this pull request Jun 17, 2025

System.Net.Http.Functional.Tests timeouts #115683

Open

build-analysis bot mentioned this pull request Jun 17, 2025

browser-wasm Windows build error #116746

Open

tannergooding force-pushed the fix-87097 branch from 30fe634 to b5fec10 Compare June 17, 2025 22:09

Fixing some tests to ensure they are correctly validating the results

b5fec10

Merge branch 'main' into fix-87097

61c7122

build-analysis bot mentioned this pull request Jun 18, 2025

System.Net.Quic.Tests.MsQuicTests.WriteTests failed with "System.Net.Quic.QuicException : The connection timed out from inactivity." #105177

Open

tannergooding merged commit 76490ed into dotnet:main Jun 18, 2025
153 of 158 checks passed

tannergooding deleted the fix-87097 branch June 18, 2025 21:03
