A collection of Triton kernels of increasing complexity, for learning and exploring Triton and GPU programming

loreloc/triturus

🦎 Triturus 🦎

The following table describes the implemented kernels.

| Kernel ID | Description | Operation | Source |
| --- | --- | --- | --- |
| vadd | Vector addition | $a_i+b_i$ | add |
| vamax | Vector maximum | $\max_i a_i$ | max |
| vmax | Vector maximum with indices | $(\max_i a_i, \arg\max_i a_i)$ | max |
| matmax | Matrix maximum along one axis | $\max_i a_{ij}$ or $\max_j a_{ij}$ | max |
| mm | Matrix multiplication | $\sum_j a_{ij}b_{jk}$ | mm |
| lm2exp | Batch log-matmul, one matrix in log-space | $\log(\sum_j a_{rij} \exp b_{rjk})$ | lm2exp |
| lt2exp | Batch log-Tucker2, two matrices in log-space | $\log(\sum_{i,j} w_{rsij} \exp a_{rik} \exp b_{rjk})$ | lt2exp |
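To make the two log-space formulas concrete, here is a minimal pure-Python sketch of what `lm2exp` and `lt2exp` compute. It is not the repository's Triton code: the function names `lm2exp_reference` and `lt2exp_reference`, the nested-list layout, and the assumed shapes (`a` as R×I×J and `b` as R×J×K for `lm2exp`; `w` as R×S×I×J, `a` as R×I×K and `b` as R×J×K for `lt2exp`) are all illustrative assumptions read off the index notation in the table. The sketch subtracts the maximum before exponentiating, a common stabilization trick that may or may not match what the kernels do.

```python
import math


def lm2exp_reference(a, b):
    """Return c[r][i][k] = log(sum_j a[r][i][j] * exp(b[r][j][k])).

    Assumed layout (hypothetical): a is (R, I, J) with non-negative entries
    in linear space, b is (R, J, K) in log-space.
    """
    out = []
    for a_r, b_r in zip(a, b):
        n_j, n_k = len(b_r), len(b_r[0])
        rows = []
        for a_ri in a_r:
            row = []
            for k in range(n_k):
                # Shift by the column maximum so that exp() cannot overflow.
                m = max(b_r[j][k] for j in range(n_j))
                if m == -math.inf:
                    row.append(-math.inf)
                    continue
                s = sum(a_ri[j] * math.exp(b_r[j][k] - m) for j in range(n_j))
                row.append(m + math.log(s) if s > 0.0 else -math.inf)
            rows.append(row)
        out.append(rows)
    return out


def lt2exp_reference(w, a, b):
    """Return c[r][s][k] = log(sum_{i,j} w[r][s][i][j] * exp(a[r][i][k]) * exp(b[r][j][k])).

    Assumed layout (hypothetical): w is (R, S, I, J) with non-negative
    entries, a is (R, I, K) and b is (R, J, K), both in log-space.
    """
    out = []
    for w_r, a_r, b_r in zip(w, a, b):
        n_i, n_j, n_k = len(a_r), len(b_r), len(a_r[0])
        block = []
        for w_rs in w_r:
            row = []
            for k in range(n_k):
                # Separate shifts for the two log-space inputs.
                m_a = max(a_r[i][k] for i in range(n_i))
                m_b = max(b_r[j][k] for j in range(n_j))
                if m_a == -math.inf or m_b == -math.inf:
                    row.append(-math.inf)
                    continue
                s = sum(
                    w_rs[i][j]
                    * math.exp(a_r[i][k] - m_a)
                    * math.exp(b_r[j][k] - m_b)
                    for i in range(n_i)
                    for j in range(n_j)
                )
                row.append(m_a + m_b + math.log(s) if s > 0.0 else -math.inf)
            block.append(row)
        out.append(block)
    return out


# Small usage example: one batch, a 1x2 matrix times a 2x1 log-space matrix.
a = [[[1.0, 2.0]]]
b = [[[math.log(3.0)], [math.log(4.0)]]]
c = lm2exp_reference(a, b)  # c[0][0][0] equals log(1*3 + 2*4) up to rounding
```

The loops are written for readability rather than speed; a reference like this is mainly useful for checking a GPU kernel's output on small inputs.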

Benchmarks Gallery

| Kernel ID | Benchmark Description | Baselines | Results |
| --- | --- | --- | --- |
| vmax | Vector maximum with and without indices | torch | here |
| matmax | Matrix maximum along rows and columns | torch | here |
| mm | Matrix multiplication with square matrices | torch | here |
| lm2exp | Batch log-matmul, square and rectangular batch matrices | torch + jit | here |
| lt2exp | Batch log-Tucker2, square and rectangular batch matrices | torch + jit | here |

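Benchmark of vmax

[Figure: vmax benchmark]

Benchmark of matmax

[Figure: matmax benchmark]

Benchmark of mm

[Figure: mm benchmark]

Benchmark of lm2exp

[Figure: lm2exp benchmark]

Benchmark of lt2exp

[Figure: lt2exp benchmark]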
