Description
Summary
This RFC proposes adding a new padded
property to mode formats, which describes whether or not explicit zeros are stored. The lowering machinery can use this property to determine whether/how to generate code that filters out explicit zeros in the result.
Motivation
Currently, an expression such as a(i) = b(i)
, where a
is compressed and b
is dense, is always lowered to the following code:
// Algorithm 1
int32_t pa1 = 0;
for (int32_t i = 0; i < b1_dimension; i++) {
a_vals[pa1] = b_vals[i];
pa1++;
}
which stores all components of b
into a
, including all the zero components. This is often not desirable, since the whole point of having compressed formats in the first place is to avoid needing to store zeros. Instead, we might want to generate the following code:
// Algorithm 2
int32_t pa1 = 0;
for (int32_t i = 0; i < b1_dimension; i++) {
if (b_vals[i] != 0.0) {
a_vals[pa1] = b_vals[i];
pa1++;
}
}
which only stores non-zero components of b
into a
. This is clearly beneficial if most components of b
are zeros. However, having to filter out zeros incurs additional performance overhead, so if we know that b
contains only (or even mostly) nonzeros then Algorithm 1 would actually be preferable. Thus, we want TACO to be able to generate both Algorithms 1 and 2.
Proposed Changes
I propose adding a new padded
property to mode formats. A padded mode format may store explicit zero values (or subtensors), whereas an unpadded mode format only stores nonzero values (or subtensors). The lowering machinery can then use this property to determine whether (and how) to generate code that filters out explicit zeros in the result. For instance:
- If
a
from the earlier example is stored in a padded version of the compressed mode format, then the result is allowed to store explicit zeros.
Thus, the lowering machinery can emit Algorithm 1 for the vector assignment, since explicit zeros do not need to be filtered out. - If
a
is stored in an unpadded version of the compressed mode format andb
is stored in a padded version of the dense mode format, then the input may contain explicit zeros but the result must not store explicit zeros.
Thus, the lowering machinery must emit Algorithm 2 for the vector assignment in order to filter out explicit zeros. - If
b
is also stored in an unpadded version of the dense mode format though, then the lowering machinery can infer that the result will not contain zeros.
Thus, the lowering machinery can again emit Algorithm 1 for the vector assignment, since there will not be explicit zeros that need to be filtered out.
For matrices and higher-order tensors, arbitrary combinations of padded and unpadded mode formats may be used to store tensor dimensions. CSR without explicit zeros, for instance, can be expressed as (dense
, compressed
) where the dense
row dimension is padded (to indicate that empty rows may still be stored in the pos
array) and the compressed
column dimension is unpadded. By contrast, using padded compressed
to store the column dimension describes CSR with explicit zeros, while using unpadded dense
to store the row dimension describes a CSR matrix with no empty row.