Description
The following code works with GCC when compiled without optimizations.
Compiled with Clang 20.1.7 and the flags -std=c++23 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Wall,
the program produces the following output at run time:
Entirely on gpu
Assertion failure at kmp_csupport.cpp(539): this_thr->th.th_set_nproc >= 1.
Assertion failure at kmp_csupport.cpp(539): this_thr->th.th_set_nproc >= 1.
OMP: Error #13: Assertion failure at kmp_csupport.cpp(539).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.
OMP: Error #13: Assertion failure at kmp_csupport.cpp(539).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.
GCC 15.1 (without optimizations) produces the following output:
Ordinary matrix multiplication, on gpu
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
80 90 100 110
176 202 228 254
272 314 356 398
368 426 484 542
A Cholesky decomposition with the multiplication on gpu
4 12 -16
12 37 -43
-16 -43 98
2 0 0
6 1 0
-8 5 3
Now the cholesky decomposition is entirely done on gpu
2 0 0
6 1 0
-8 5 3
Now we do the same with the lu decomposition
1 -2 -2 -3
3 -9 0 -9
-1 2 4 7
-3 -6 26 2
Just the multiplication on gpu
1 0 0 0
3 1 0 0
-1 -0 1 0
-3 4 -2 1
1 -2 -2 -3
0 -3 6 0
0 0 2 4
0 0 0 1
Entirely on gpu
1 0 0 0
3 1 0 0
-1 -0 1 0
-3 4 -2 1
1 -2 -2 -3
0 -3 6 0
0 0 2 4
0 0 0 1
Now we do the same with the qr decomposition
12 -51 4
6 167 -68
-4 24 -41
Just the multiplication on gpu
0.857143 -0.394286 -0.331429
0.428571 0.902857 0.0342857
-0.285714 0.171429 -0.942857
14 21 -14
-2.22045e-16 175 -70
-3.10862e-15 -4.79616e-14 35
Entirely on gpu
0.857143 -0.394286 -0.331429
0.428571 0.902857 0.0342857
-0.285714 0.171429 -0.942857
14 21 -14
-1.11022e-16 175 -70
-1.77636e-15 -1.59872e-14 35
With -O1, I also get an ICE (internal compiler error) in GCC.
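For anyone triaging this, the first result in the GCC output (the 4x4 product whose first row is 80 90 100 110) can be checked on the host without any offloading. The following is a minimal sketch and not the code from this report: the names Matrix4, make_a, make_b, multiply and check_expected are hypothetical, and the element type double is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4x4 matrix type; the reporter's real matrix class is not shown here.
using Matrix4 = std::array<std::array<double, 4>, 4>;

// First input from the GCC output: 1..16 in row-major order.
Matrix4 make_a() {
    Matrix4 m{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            m[i][j] = static_cast<double>(4 * i + j + 1);
    return m;
}

// Second input from the GCC output: 0..15 in row-major order.
Matrix4 make_b() {
    Matrix4 m{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            m[i][j] = static_cast<double>(4 * i + j);
    return m;
}

// Plain host-side triple loop, no OpenMP pragmas involved.
Matrix4 multiply(const Matrix4& a, const Matrix4& b) {
    Matrix4 c{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Compares the host product against the matrix printed in the GCC output.
bool check_expected() {
    const Matrix4 expected = {{{80, 90, 100, 110},
                               {176, 202, 228, 254},
                               {272, 314, 356, 398},
                               {368, 426, 484, 542}}};
    return multiply(make_a(), make_b()) == expected;
}
```

Calling check_expected() from a small main and asserting on the result gives a host-only baseline, so a wrong or missing result from a given compiler can be attributed to the offloaded code path rather than to the arithmetic.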