`intrinsic-test`: Adding x86 behavioural testing. #1894

madhav-madhusoodanan · 2025-08-03T13:31:06Z

Context

This is a redo of PR #1814, since a lot of details have changed with PRs #1863, #1862, #1861, #1852.

crates/intrinsic-test/src/x86/intrinsic.rs

crates/intrinsic-test/src/x86/mod.rs

folkertdev · 2025-08-05T10:36:48Z

crates/intrinsic-test/src/x86/types.rs

+                match str::parse::<u32>(etype_processed.as_str()) {
+                    Ok(value) => data.bit_len = Some(value),
+                    Err(_) => {
+                        data.bit_len = match data.kind() {
+                            TypeKind::Char(_) => Some(8),
+                            TypeKind::BFloat => Some(16),
+                            TypeKind::Int(_) => Some(32),
+                            TypeKind::Float => Some(32),
+                            _ => None,
+                        };
+                    }
+                }


why are only some type kinds covered here? Maybe this could be a method on TypeKind?
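One possible shape for that suggestion, as a minimal sketch. The `TypeKind` enum below is a simplified stand-in (the variants' payload types and the `Void` variant are assumptions), and `default_bit_len` and `bit_len` are hypothetical names. The fallback widths are copied from the diff above.

```rust
/// Simplified stand-in for the crate's `TypeKind`; payload types are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TypeKind {
    Char(bool),
    BFloat,
    Int(bool),
    Float,
    Void,
}

impl TypeKind {
    /// Hypothetical helper: fallback bit width for kinds whose element type
    /// string does not spell out a width.
    fn default_bit_len(self) -> Option<u32> {
        match self {
            TypeKind::Char(_) => Some(8),
            TypeKind::BFloat => Some(16),
            TypeKind::Int(_) | TypeKind::Float => Some(32),
            // Kinds without a sensible default stay unset, as in the diff.
            TypeKind::Void => None,
        }
    }
}

/// Mirrors the parse-or-fallback logic from the diff: an explicit width wins,
/// otherwise the kind supplies its default.
fn bit_len(etype: &str, kind: TypeKind) -> Option<u32> {
    etype.parse::<u32>().ok().or_else(|| kind.default_bit_len())
}
```

With this shape, `bit_len("64", TypeKind::Int(true))` yields `Some(64)`, while an unparsable width such as `bit_len("", TypeKind::BFloat)` falls back to `Some(16)`. Listing every variant in the `match` (rather than a `_` arm) would also make the compiler flag any kind added later.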

crates/intrinsic-test/src/x86/compile.rs

folkertdev

you should rebase on top of the upstream master branch instead of merging it in. That keeps the git history clean.

folkertdev · 2025-08-05T22:48:30Z

ci/run.sh

+    x86_64-unknown-linux-gnu*)
+        CPPFLAGS="${TEST_CPPFLAGS}" RUSTFLAGS="${HOST_RUSTFLAGS}" RUST_LOG=warn \
+            cargo run "${INTRINSIC_TEST}" "${PROFILE}"  \
+            --bin intrinsic-test -- intrinsics_data/x86-intel.xml \
+            --runner "${TEST_RUNNER}" \
+            --cppcompiler "${TEST_CXX_COMPILER}" \
+            --target "${TARGET}"
+        ;;


we'll have to see how to do this exactly, but we want to split these out of the main CI job to speed it up

madhav-madhusoodanan · 2025-09-05T08:19:24Z

Seems like the CI run at this point failed due to this error. I'll retry shortly:

#6 39.45   Could not connect to archive.ubuntu.com:80 (185.125.190.82), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.83), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.91.83), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.81), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.91.81), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.36), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.39), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.91.82), connection timed out
#6 39.45   Unable to connect to archive.ubuntu.com:80:
#6 39.45 Fetched 126 kB in 39s (3208 B/s)
#6 39.45 Reading package lists...
#6 39.46 W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/questing/InRelease  Unable to connect to archive.ubuntu.com:80:
#6 39.46 W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/questing-updates/InRelease  Unable to connect to archive.ubuntu.com:80:
#6 39.46 W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/questing-backports/InRelease  Unable to connect to archive.ubuntu.com:80:
#6 39.46 W: Some index files failed to download. They have been ignored, or old ones used instead.

folkertdev · 2025-09-10T16:31:59Z

Can you rebase this (and remove the merge commits) sometime? That would make it a lot easier to see what has actually changed.

Also, why push only to have CI fail? Running this locally is a lot faster than having CI do it, because you can skip the earlier steps and run just the intrinsic tests.

folkertdev

Oh, wait, it probably is up to date? It's just that the XML file is enormous. That's crazy. Should we try to trim that down?

crates/intrinsic-test/src/x86/mod.rs

madhav-madhusoodanan · 2025-09-10T19:15:00Z

@folkertdev I've been syncing using the below command whenever I'm updating from master:

git rebase master

Am I doing it correctly?

madhav-madhusoodanan · 2025-09-10T19:17:13Z

About the XML file specification, may I ask what "trim down" means?

folkertdev · 2025-09-10T19:17:24Z

Depends a bit on your git setup, but it seems to work all right. I was just confused by the enormous number of lines changed, but that's all due to that XML file.

folkertdev · 2025-09-10T19:20:07Z

What I mean is, could we process that XML file into a smaller XML file that only stores what we need? That would reduce the size of the repo (not sure how big that file is, maybe it compresses well?) and speed up the intrinsic tests.

There are risks too, e.g. what we generate could go out of date with the official XML file. So maybe it's fine this way.

madhav-madhusoodanan · 2025-09-10T19:26:57Z

Ahh, makes sense.

I think it might be best to keep the source of truth as unchanged as possible, since there is no direct way to obtain the sources sometimes.

For example, the XML file originally existed in the stdarch-verify crate, and I had to check it by manually downloading the x86 reference (a folder containing HTML, CSS and JS files) from the link called "Download: Offline Intel® Intrinsics Guide" on the Intel intrinsics reference site.

The entire XML data can be found stored as a hardcoded string that is assigned to a variable in a JS file.

madhav-madhusoodanan · 2025-10-15T07:35:20Z

@folkertdev I've added the ability to sample intrinsics. Currently it's implemented for x86 and about 5% of intrinsics are randomly chosen for testing.

This is really intended as a temporary solution; we can remove it once we restructure the C++/Rust test files to run all intrinsics in one go.

folkertdev · 2025-10-15T08:21:32Z

Neat!

So, this does increase the longest CI job to 20 minutes, up from 12 minutes. I think we should try to make this its own CI job at least. I'm also not sure that we want to randomly sample the tests on each run. If we do hit a failure, that will be hard to reproduce. The status quo is no testing at all, so I think testing even a fixed subset is an improvement.

@madhav-madhusoodanan there is already a version of the x86-intel.xml file in crates/stdarch-verify, you can just move that file to intrinsics-data (and change the path in the crates/stdarch-verify/tests/x86-intel.rs)

did this happen? It still looks like the file is added again here.

madhav-madhusoodanan · 2025-10-15T08:39:02Z

Ohh no, let me do that too.

madhav-madhusoodanan · 2025-10-15T08:49:12Z

About failures being hard to reproduce: hmm, yes, you have a point.
Let's do a naive partitioning in that case.
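A minimal sketch of one reading of "naive partitioning": keep a fixed, reproducible slice of the intrinsics instead of a random sample. The function name `fixed_subset` and the percentage parameter are hypothetical; the PR itself may select the subset differently.

```rust
/// Hypothetical helper: pick a reproducible subset of intrinsic names.
/// Sorting first makes the choice independent of the order in the XML file,
/// so a failure seen in CI can be reproduced locally with the same inputs.
fn fixed_subset(mut names: Vec<String>, percent: usize) -> Vec<String> {
    names.sort();
    names.dedup();
    // Round up so a non-empty input with percent > 0 keeps at least one name.
    let keep = (names.len() * percent.min(100) + 99) / 100;
    names.truncate(keep);
    names
}
```

The trade-off is coverage: the same intrinsics are tested on every run, so anything outside the slice is never exercised until the subset is widened.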

crates/intrinsic-test/src/common/compile_c.rs

crates/intrinsic-test/src/common/intrinsic_helpers.rs

madhav-madhusoodanan · 2025-10-16T04:07:37Z

I'll need to rename the variables in arm/config.rs and x86/config.rs. I'll do that.

madhav-madhusoodanan · 2025-10-16T07:18:55Z

Okay, so I've split the intrinsic-test runs into separate CI jobs.
The x86 run is the longest among them and takes 10 mins.

The longest of all the CI jobs is now the Test CI job for x86, which takes 12 mins.

folkertdev

I think this looks good, but the CI changes are quite large, so could you split those out into their own PR (and then just add the x86 stuff here)?

folkertdev · 2025-10-17T10:02:01Z

ci/run.sh

+        PATH="$PATH":"$(pwd)"/c_programs
+        export PATH


why is this needed? it really should not be I think?

Okay, so it turns out that for aarch64, qemu's executable resolution algorithm looks into the current directory too.
However, sde64 looks only at PATH.

This was intended to fix that discrepancy, but now that I think about it, it might be cleaner to use ./intrinsic-test-programs in compare_outputs.rs.
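A minimal sketch of that alternative, assuming the runner is handed the program as a path argument. `explicit_program_path` is a hypothetical helper, not code from `compare_outputs.rs`. The idea is that a path starting with `./` does not depend on `PATH` lookup, so a runner that only searches `PATH` and one that also checks the current directory would resolve the same file.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a bare program name into an explicit relative path,
/// leaving absolute paths and paths already anchored at `.` or `..` untouched.
fn explicit_program_path(binary: &str) -> PathBuf {
    let p = Path::new(binary);
    if p.is_absolute() || p.starts_with(".") || p.starts_with("..") {
        p.to_path_buf()
    } else {
        // `Path::starts_with` compares whole components, so a name like
        // ".hidden" is still treated as a bare name and gets the prefix.
        Path::new(".").join(p)
    }
}
```

The result could then be passed to the runner in place of the bare name, which would make the `PATH` export in `ci/run.sh` unnecessary.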

madhav-madhusoodanan · 2025-10-17T17:24:52Z

crates/intrinsic-test/src/arm/config.rs

+    uint16_t temp = 0;
+    memcpy(&temp, &value, sizeof(float16_t));
+    std::stringstream ss;
+    ss << "0x" << std::setfill('0') << std::setw(4) << std::hex << temp;
+    os << ss.str();
+    return os;
+}


@folkertdev do we need stringstream here? Seems to me that we can make this definition shorter.

My c++ is not good enough to know for sure. If there is something shorter that works I'm all for it

crates/intrinsic-test/src/common/mod.rs

madhav-madhusoodanan · 2025-10-17T17:34:12Z

Okay, so I've reverted that last one commit (that split the CI job) and the PATH update.
I'll make that PR as soon as we merge this one.

folkertdev · 2025-10-17T17:48:21Z

Maybe this was not clear, but my intention was to merge the CI changes separately first, then rebase this PR on top.

madhav-madhusoodanan · 2025-10-18T05:36:20Z

Ohh ohh, my bad.
Sure let me do that.

Created a PR for that: #1941

madhav-madhusoodanan · 2025-10-18T07:13:15Z

Turns out std::hex is sticky on the stream it is applied to: once set, it stays in effect for later writes to that stream.
So using a separate stringstream here actually makes sense.
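For comparison, a sketch of the same output format in Rust, where width, fill and radix belong to a single format argument and do not persist between writes. `f16_bits_to_hex` is a hypothetical name, and whether the Rust side of the tool formats half-precision values exactly this way is an assumption.

```rust
/// Hypothetical helper: render the raw bits of a 16-bit float the way the C++
/// `operator<<` above does ("0x" followed by four zero-padded lowercase hex digits).
fn f16_bits_to_hex(bits: u16) -> String {
    // The `04x` spec applies to this argument only; a later `{}` prints decimal again.
    format!("0x{:04x}", bits)
}
```

Both sides need to agree on this text exactly, since the comparison between the C++ and Rust programs is done on their printed output.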

adamgemmell · 2025-10-20T11:41:56Z

crates/intrinsic-test/src/common/mod.rs

            // compile this cpp file into a .o file
-            info!("compiling main.cpp");
+            trace!("compiling main.cpp");
            let output = cpp_compiler


I'm having trouble running this locally without the intrinsic sampling on, I get an apparent segfault at this clang invocation. Were you seeing that too?

Can you show me the stdout/stderr? Perhaps I could help

adamgemmell · 2025-10-20T11:48:12Z

crates/intrinsic-test/src/common/argument.rs

        indentation: Indentation,
        loads: u32,
    ) -> std::io::Result<()> {
        for arg in self.iter().filter(|&arg| !arg.has_constraint()) {


I note that arguments of the same type seem to have identical _vals arrays. Not required for this but as a future optimisation we could just emit one array for each unique argument type

Yeah that's something I want to work on (when this PR is merged)
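A rough sketch of that future optimisation, under stated assumptions: the `Argument` struct is a simplified stand-in with made-up field names, the `<type>_vals` naming is hypothetical, and type strings are assumed to be usable as identifiers (no spaces or pointer markers).

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for an intrinsic argument; field names are assumptions.
struct Argument {
    name: String,
    ty: String,
}

/// Hypothetical naming scheme: one array per type rather than per argument.
fn array_name(ty: &str) -> String {
    format!("{ty}_vals")
}

/// The arrays that need to be emitted: one for each unique argument type.
fn unique_arrays(args: &[Argument]) -> Vec<String> {
    let types: BTreeSet<&str> = args.iter().map(|a| a.ty.as_str()).collect();
    types.into_iter().map(array_name).collect()
}

/// Each argument then loads from the array shared by its type.
fn load_expr(arg: &Argument, index: usize) -> String {
    format!("{} = {}[{}]", arg.name, array_name(&arg.ty), index)
}
```

For an intrinsic taking two `__m128` arguments and one `int`, this would emit two arrays instead of three, and the saving grows when arrays are shared across all intrinsics in a generated file.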

adamgemmell · 2025-10-20T11:52:25Z

crates/intrinsic-test/src/x86/constraint.rs

+
+pub fn map_constraints(imm_type: &String, imm_width: u32) -> Option<Constraint> {
+    if imm_width > 0 {
+        let max: i64 = 2i64.pow(imm_width);


Some x86 intrinsics have a lot of possible immediate values, for example _mm256_fpclass_pd_mask has an immediate of width 8 giving 256*20 loop iterations just for that one intrinsic.

We never really saw so many constraint values on aarch64 - perhaps testing every possible constraint value isn't a reasonable strategy here
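One way to bound that cost, sketched under assumptions: instead of every value in `0..2^imm_width`, test at most `limit` evenly spaced values that always include the smallest and largest immediate. `sample_immediates` and its `limit` parameter are hypothetical and not part of this PR.

```rust
/// Hypothetical helper: choose at most `limit` immediates from `0..2^width`.
/// Assumes `width` is small (immediates here are a handful of bits wide),
/// so `2i64.pow(width)` cannot overflow.
fn sample_immediates(width: u32, limit: usize) -> Vec<i64> {
    let max: i64 = 2i64.pow(width); // same bound as `map_constraints` in the diff
    if limit == 0 {
        return Vec::new();
    }
    if (max as usize) <= limit {
        // Small ranges are still tested exhaustively.
        return (0..max).collect();
    }
    if limit == 1 {
        return vec![0];
    }
    let last = max - 1;
    let steps = (limit - 1) as i64;
    // Evenly spaced points from 0 to `last`, both ends included.
    (0..=steps).map(|i| i * last / steps).collect()
}
```

For a width-8 immediate with a limit of 5 this picks `[0, 63, 127, 191, 255]`, turning 256 values into 5. The cost is that bit patterns between the sampled points go untested, which may matter for immediates that act as bit masks.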

adamgemmell · 2025-10-20T11:56:20Z

ci/run.sh

+            RUST_LOG=warn RUST_BACKTRACE=1 \
+            cargo run "${INTRINSIC_TEST}" "${PROFILE}"  \
+            --bin intrinsic-test -- intrinsics_data/x86-intel.xml \
+            --runner "${TEST_RUNNER}" \


I would suggest trying this tool with qemu-x86_64 -cpu max rather than Intel SDE; I expect QEMU would run a lot quicker, potentially at the cost of less coverage.

qemu does not support (most of?) avx-512, which is where most of the complexity is. So we'll have to split that out.

which I'd rather not do in this PR

Agreed, none of my comments should be blocking - this PR is useful without them

Maybe the situation is better today. These are the QEMU CPU flags containing avx from QEMU 10.1, which is included in Ubuntu 25.10:

avx avx-ifma avx-ne-convert avx-vnni avx-vnni-int16 avx-vnni-int8 avx10 avx10-128 avx10-256 avx10-512 avx2 avx512-4fmaps avx512-4vnniw avx512-bf16 avx512-fp16 avx512-vp2intersect avx512-vpopcntdq avx512bitalg avx512bw avx512cd avx512dq avx512er avx512f avx512ifma avx512pf avx512vbmi avx512vbmi2 avx512vl avx512vnni

And here's what lscpu reports with -cpu max:

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch

QEMU is pretty quick to build from source for a single backend, if we wanted something newer than what's bundled with Ubuntu.

Qemu-user supports avx512 with only the kvm backend, but as we run it in docker in CI, we don't have kvm. Idk about qemu-system though, it should be the same

madhav-madhusoodanan · 2025-10-20T19:18:42Z

Just so that the review process is easier, would it make sense to break down the ARM and common changes into their own PRs?

A couple of them were dependent on the changes that arose to accommodate x86 in an attempt to maintain a clean separation of concerns.

rustbot assigned folkertdev Aug 3, 2025

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 2 times, most recently from 41db5a8 to 5fc0f3b Compare August 5, 2025 10:16

folkertdev reviewed Aug 5, 2025

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 9e28106 to 111cd5d Compare August 5, 2025 16:22

folkertdev reviewed Aug 5, 2025

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from f3f87f2 to 2ec747c Compare August 9, 2025 12:20

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 2ec747c to a8313d0 Compare September 5, 2025 08:16

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 13 times, most recently from 126a3ce to e79129a Compare September 10, 2025 05:58

folkertdev reviewed Sep 10, 2025

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 3 times, most recently from c9ea3bd to 18aed56 Compare October 15, 2025 06:45

feat: updated exclusion list with more intrinsics, that can be fixed immediately

5162772

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 18aed56 to 5162772 Compare October 15, 2025 07:03

chore: remove x86-intel.xml from stdarch-verify crate

3bc6676

madhav-madhusoodanan added 2 commits October 15, 2025 14:33

chore: move from random testing to testing only the first N intrinsics

f5b9c29

chore: convert println! logging to trace! logging during compilation step

1646a8b

folkertdev reviewed Oct 15, 2025

feat: code cleanup 1. changing array bracket prefixes from &'static str to char 2. including variable names in template strings instead of passing them as arguments to macros

58d85a5

chore: make names in config.rs files uniform across architectures

e11a9ea

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 524bc52 to 6df5eca Compare October 16, 2025 07:01

folkertdev reviewed Oct 17, 2025

fix: remove the PATH update in ci/run.sh

1abb103

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 6df5eca to 1abb103 Compare October 17, 2025 16:57

madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 212e16d to 1abb103 Compare October 18, 2025 07:13

adamgemmell reviewed Oct 20, 2025
