Description
The command line parameter -write-source=<filename> writes several functions to the specified file. This includes utility functions and macros; I think it is intended to write everything needed to run the compute kernel. For example, this is the only way I know of to get the TACO_MIN and TACO_MAX macros.
When a GPU schedule is specified, the generated pack and unpack functions are wrong, and the output file fails to compile.
$ bin/taco 'C(i,j) = A(i,k) * B(k,j)' -f=b:ds -f=b:sd -s="split(i,i0,i1,2),split(j,j0,j1,2),split(k,k0,k1,2),reorder(i0,j0,k0,i1,j1,k1),parallelize(i0,GPUBlock,IgnoreRaces),parallelize(j0,GPUThread,Atomics)" -write-source=source.cu
$ nvcc --gpu-architecture=compute_61 -c -o source.o source.cu
source.cu(295): error: identifier "A_COO1_pos" is undefined
source.cu(299): error: identifier "A_COO1_crd" is undefined
source.cu(307): error: identifier "A_COO2_crd" is undefined
source.cu(308): error: identifier "A_COO_vals" is undefined
source.cu(336): error: identifier "B_COO1_pos" is undefined
source.cu(340): error: identifier "B_COO1_crd" is undefined
source.cu(348): error: identifier "B_COO2_crd" is undefined
source.cu(349): error: identifier "B_COO_vals" is undefined
source.cu(422): error: identifier "C_COO1_pos_ptr" is undefined
source.cu(423): error: identifier "C_COO1_crd_ptr" is undefined
source.cu(424): error: identifier "C_COO2_crd_ptr" is undefined
source.cu(425): error: identifier "C_COO_vals_ptr" is undefined
12 errors detected in the compilation of "source.cu".
The generated pack functions from the C codegen are correct.
$ bin/taco 'C(i,j) = A(i,k) * B(k,j)' -f=b:ds -f=b:sd -s="split(i,i0,i1,2),split(j,j0,j1,2),split(k,k0,k1,2),reorder(i0,j0,k0,i1,j1,k1),parallelize(i0,CPUThread,Atomics)" -write-source=source.c
$ gcc -c -o source.o source.c
The difference is that, in the CUDA output, the pack_A, pack_B and unpack functions do not take the necessary parameters.
$ grep pack source.c source.cu | grep "int "
source.c:int pack_A(taco_tensor_t *A, int* A_COO1_pos, int* A_COO1_crd, int* A_COO2_crd, double* A_COO_vals) {
source.c:int pack_B(taco_tensor_t *B, int* B_COO1_pos, int* B_COO1_crd, int* B_COO2_crd, double* B_COO_vals) {
source.c:int unpack(int** C_COO1_pos_ptr, int** C_COO1_crd_ptr, int** C_COO2_crd_ptr, double** C_COO_vals_ptr, taco_tensor_t *C) {
source.cu:int pack_A(taco_tensor_t *A) {
source.cu:int pack_B(taco_tensor_t *B) {
source.cu:int unpack(taco_tensor_t *C) {
If I copy the parameter lists over from source.c to source.cu, the CUDA version builds successfully.
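The manual workaround above can be scripted. The following is only a rough sketch, not a fix for the code generator: the function name fix_pack_signatures is made up for this example, and it assumes each signature sits on a single line beginning with "int <name>(", as in the grep output above. It only rewrites the three signature lines and leaves the rest of source.cu untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of taco): copy the pack_A, pack_B and unpack
# signature lines from the C output into the CUDA output.
# Assumption: each signature is a single line that starts with "int <name>(".
fix_pack_signatures() {
    c_src=$1
    cu_src=$2
    for fn in pack_A pack_B unpack; do
        # First matching signature line as written by the C backend.
        sig=$(sed -n "/^int ${fn}(/{p;q;}" "$c_src")
        if [ -z "$sig" ]; then
            echo "no signature for ${fn} in ${c_src}" >&2
            return 1
        fi
        # Swap the matching line in the CUDA file for the C signature;
        # every other line is passed through unchanged.
        awk -v fn="$fn" -v sig="$sig" '
            index($0, "int " fn "(") == 1 { print sig; next }
            { print }
        ' "$cu_src" > "$cu_src.tmp" && mv "$cu_src.tmp" "$cu_src" || return 1
    done
}
```

With both files generated as shown earlier, running fix_pack_signatures source.c source.cu and then repeating the nvcc command should correspond to the manual edit described above.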
The full generated code can be found here: https://gist.github.com/Infinoid/a3f64f5b2c6a291f381d3274dd567d53
The schedules used in these examples come from the scheduling.lowerSparseMatrixMul test case.