cuda generator emits invalid COO pack/unpack code #438

Open
@Infinoid

The command-line parameter -write-source=<filename> writes several functions to the specified file. These include utility functions and macros; I think it is intended to write everything needed to run the compute kernel. It is the only way I know of to get the TACO_MIN and TACO_MAX macros, for example.

When a GPU schedule is specified, the generated pack and unpack functions are wrong: they refer to identifiers that are never declared, so the file does not compile.

$ bin/taco 'C(i,j) = A(i,k) * B(k,j)' -f=b:ds -f=b:sd -s="split(i,i0,i1,2),split(j,j0,j1,2),split(k,k0,k1,2),reorder(i0,j0,k0,i1,j1,k1),parallelize(i0,GPUBlock,IgnoreRaces),parallelize(j0,GPUThread,Atomics)" -write-source=source.cu
$ nvcc --gpu-architecture=compute_61 -c -o source.o source.cu
source.cu(295): error: identifier "A_COO1_pos" is undefined

source.cu(299): error: identifier "A_COO1_crd" is undefined

source.cu(307): error: identifier "A_COO2_crd" is undefined

source.cu(308): error: identifier "A_COO_vals" is undefined

source.cu(336): error: identifier "B_COO1_pos" is undefined

source.cu(340): error: identifier "B_COO1_crd" is undefined

source.cu(348): error: identifier "B_COO2_crd" is undefined

source.cu(349): error: identifier "B_COO_vals" is undefined

source.cu(422): error: identifier "C_COO1_pos_ptr" is undefined

source.cu(423): error: identifier "C_COO1_crd_ptr" is undefined

source.cu(424): error: identifier "C_COO2_crd_ptr" is undefined

source.cu(425): error: identifier "C_COO_vals_ptr" is undefined

12 errors detected in the compilation of "source.cu".

The generated pack functions from the C codegen are correct.

$ bin/taco 'C(i,j) = A(i,k) * B(k,j)' -f=b:ds -f=b:sd -s="split(i,i0,i1,2),split(j,j0,j1,2),split(k,k0,k1,2),reorder(i0,j0,k0,i1,j1,k1),parallelize(i0,CPUThread,Atomics)" -write-source=source.c 
$ gcc -c -o source.o source.c  

The difference is that, in the CUDA output, the pack_A, pack_B, and unpack functions do not take the necessary parameters.

$ grep pack source.c source.cu | grep "int "
source.c:int pack_A(taco_tensor_t *A, int* A_COO1_pos, int* A_COO1_crd, int* A_COO2_crd, double* A_COO_vals) {
source.c:int pack_B(taco_tensor_t *B, int* B_COO1_pos, int* B_COO1_crd, int* B_COO2_crd, double* B_COO_vals) {
source.c:int unpack(int** C_COO1_pos_ptr, int** C_COO1_crd_ptr, int** C_COO2_crd_ptr, double** C_COO_vals_ptr, taco_tensor_t *C) {
source.cu:int pack_A(taco_tensor_t *A) {
source.cu:int pack_B(taco_tensor_t *B) {
source.cu:int unpack(taco_tensor_t *C) {

If I copy the parameter lists over from source.c to source.cu, the CUDA version builds successfully.
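
For reference, here is a minimal sketch of the shape that compiles, using the pack_A and unpack signatures copied from the source.c grep output above. The taco_tensor_t definition and both function bodies are hypothetical placeholders I wrote for illustration; they are not what taco generates. The only point is that the COO arrays have to arrive as parameters for the body to be able to name them.

```c
#include <stddef.h>

/* Hypothetical stand-in for illustration only: the real taco_tensor_t
 * is defined in the generated source file. */
typedef struct taco_tensor_t { int order; } taco_tensor_t;

/* Shape emitted by the CUDA codegen, shown for comparison. The body refers to
 * A_COO1_pos, A_COO1_crd, A_COO2_crd and A_COO_vals, none of which is declared:
 *
 *   int pack_A(taco_tensor_t *A) { ... }
 */

/* Signature as emitted by the C codegen: the COO arrays are parameters.
 * Placeholder body (assumption): it only touches the parameters to show
 * that they are in scope. */
int pack_A(taco_tensor_t *A, int* A_COO1_pos, int* A_COO1_crd, int* A_COO2_crd, double* A_COO_vals) {
  (void)A;
  (void)A_COO1_pos;
  (void)A_COO1_crd;
  (void)A_COO2_crd;
  (void)A_COO_vals;
  return 0;
}

/* Same idea for unpack: the output pointers are parameters.
 * Placeholder body (assumption), as above. */
int unpack(int** C_COO1_pos_ptr, int** C_COO1_crd_ptr, int** C_COO2_crd_ptr, double** C_COO_vals_ptr, taco_tensor_t *C) {
  (void)C_COO1_pos_ptr;
  (void)C_COO1_crd_ptr;
  (void)C_COO2_crd_ptr;
  (void)C_COO_vals_ptr;
  (void)C;
  return 0;
}
```

pack_B follows the same pattern as pack_A, with the B_COO* names.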

The full generated code can be found here: https://gist.github.com/Infinoid/a3f64f5b2c6a291f381d3274dd567d53

The schedules used in these examples come from the scheduling.lowerSparseMatrixMul test case.

Labels: bug
