
OMP: Error #13: Assertion failure at kmp_csupport.cpp(539). #146262

Open
@bschulz81

Description


The attached code compiles and runs correctly with GCC when optimizations are disabled.

With Clang 20.1.7 and the flags -std=c++23 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Wall, the program produces the following output:

Entirely on gpu
Assertion failure at kmp_csupport.cpp(539): this_thr->th.th_set_nproc >= 1.
Assertion failure at kmp_csupport.cpp(539): this_thr->th.th_set_nproc >= 1.
OMP: Error #13: Assertion failure at kmp_csupport.cpp(539).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.
OMP: Error #13: Assertion failure at kmp_csupport.cpp(539).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.
 

GCC 15.1 (without optimizations) produces the following output:

Ordinary matrix multiplication, on gpu
1 2 3 4 
5 6 7 8 
9 10 11 12 
13 14 15 16 

0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 

80 90 100 110 
176 202 228 254 
272 314 356 398 
368 426 484 542 

A Cholesky decomposition with the multiplication on gpu
4 12 -16 
12 37 -43 
-16 -43 98 

2 0 0 
6 1 0 
-8 5 3 

Now the cholesky decomposition is entirely done on gpu
2 0 0 
6 1 0 
-8 5 3 

Now we do the same with the lu decomposition
1 -2 -2 -3 
3 -9 0 -9 
-1 2 4 7 
-3 -6 26 2 

Just the multiplication on gpu
1 0 0 0 
3 1 0 0 
-1 -0 1 0 
-3 4 -2 1 

1 -2 -2 -3 
0 -3 6 0 
0 0 2 4 
0 0 0 1 

Entirely on gpu
1 0 0 0 
3 1 0 0 
-1 -0 1 0 
-3 4 -2 1 

1 -2 -2 -3 
0 -3 6 0 
0 0 2 4 
0 0 0 1 

Now we do the same with the qr decomposition
12 -51 4 
6 167 -68 
-4 24 -41 

Just the multiplication on gpu
0.857143 -0.394286 -0.331429 
0.428571 0.902857 0.0342857 
-0.285714 0.171429 -0.942857 

14 21 -14 
-2.22045e-16 175 -70 
-3.10862e-15 -4.79616e-14 35 

Entirely on gpu
0.857143 -0.394286 -0.331429 
0.428571 0.902857 0.0342857 
-0.285714 0.171429 -0.942857 

14 21 -14 
-1.11022e-16 175 -70 
-1.77636e-15 -1.59872e-14 35 

With -O1, GCC also fails, with an internal compiler error (ICE).

main_omp.cpp.txt
mdspan_omp.h.txt
CMakeLists.txt
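
Since the attached reproducer is large, a possible starting point for reducing it is the first step of the program, the ordinary matrix multiplication on the GPU. The block below is only a minimal sketch of that step: the function name `multiply_on_device` and the flat row-major `std::vector` layout are assumptions for illustration and are not taken from the attached sources (which use mdspan), and it has not been confirmed that this sketch triggers the assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the attached sources.
// Computes C = A * B for row-major n x n matrices inside an OpenMP target region.
// Without -fopenmp the pragma is ignored and the loops run serially on the host.
std::vector<double> multiply_on_device(const std::vector<double>& a,
                                       const std::vector<double>& b,
                                       std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);

    // Raw pointers are used so the map clauses can name plain array sections.
    const double* pa = a.data();
    const double* pb = b.data();
    double* pc = c.data();

    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: pa[0:n*n], pb[0:n*n]) map(from: pc[0:n*n])
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += pa[i * n + k] * pb[k * n + j];
            pc[i * n + j] = sum;
        }
    }
    return c;
}
```

Calling it with the two 4x4 input matrices from the GCC output above should reproduce the product shown there (first row 80 90 100 110). If this much runs cleanly under Clang, the decomposition steps can be added back one at a time until the assertion appears.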

Labels: clang:to-be-triaged, crash, needs-reduction, openmp
