This repository contains the code implementing the efficient low-bit matrix multiplication algorithms on CPU described in the paper "Neural network compression using binarization and few full-precision weights".
The library provides implementations and scripts to execute matrix multiplication scenarios:

- `1_vs_1`: both operands are stored using 1 bit.
- `1_vs_2`: one operand uses 1-bit values, the other uses 2-bit values.
- `2_vs_2`: both operands are stored using 2 bits.
The `include` directory contains the kernel implementations. See the paper for more details on the algorithms used.
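To give an intuition for the `1_vs_1` scenario, here is a minimal sketch of the textbook bit-packed approach to 1-bit dot products, assuming values are encoded as ±1 and packed into 64-bit words so that a dot product reduces to XOR plus popcount. This is only an illustration of the general technique: the actual kernels in `include` are blocked and vectorized and may use a different encoding, as described in the paper.

```cpp
// Illustrative only (not the library's kernel): bit-packed ±1 dot product
// via XOR + popcount. Requires C++20 for std::popcount.
#include <bit>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack a vector of ±1 values into 64-bit words:
// bit = 1 encodes +1, bit = 0 encodes -1 (padding bits stay 0).
std::vector<uint64_t> pack_pm1(const std::vector<int8_t>& v) {
    std::vector<uint64_t> packed((v.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] > 0) packed[i / 64] |= uint64_t{1} << (i % 64);
    return packed;
}

// Dot product of two packed ±1 vectors of original length k:
// equal bits contribute +1, differing bits contribute -1, hence
// dot = k - 2 * (number of differing bits). Padding bits are zero in
// both operands, so they never count as differing.
int32_t dot_1bit(const std::vector<uint64_t>& a,
                 const std::vector<uint64_t>& b, int k) {
    int diff = 0;
    for (std::size_t w = 0; w < a.size(); ++w)
        diff += std::popcount(a[w] ^ b[w]);
    return k - 2 * diff;
}
```

A full `m × k` by `k × n` multiplication would then tile over the rows of the first operand and the columns of the second, computing one such dot product per output entry.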
The `src` directory contains scripts to launch experiments, one script for each configuration.
Each script takes as input the dimensions of the matrices to multiply:

- `m`: number of rows in the first matrix
- `k`: number of columns in the first matrix (and rows of the second)
- `n`: number of columns in the second matrix
If the `-c` flag is passed, the script checks the correctness of the algorithm and exits. Otherwise, it prints the matrix sizes along with the minimum and average execution times to `stderr`.
```bash
./perf_1bit_vs_1bit -m 512 -k 128 -n 512 -t 1 -c
```
The `-t` flag specifies the method to use.
To also run competitor implementations (`onednn`, `fbgemm`, and `blis`), compile the project with the appropriate flags:
```bash
mkdir build && cd build
cmake -DUSE_ONEDNN=ON -DUSE_BLIS=ON -DUSE_FBGEMM=ON ..
make -j
```
We provide the specific commit IDs for reproducibility, but we recommend comparing against the latest version of each library when possible.
```bash
cd external/oneDNN
git checkout 7981216b8341b8603e54472f5a0dd7a12ef9cf67
cd -

cd external/FBGEMM
git checkout d0eb1847bd3705246ed1697b7d47eb7d9e00ba46
git submodule update --init --recursive
cd -
```
BLIS is cloned and installed from its upstream repository:

```bash
git clone https://github.com/flame/blis.git
cd ./blis
git checkout 56772892450cc92b3fbd6a9d0460153a43fc47ab
./configure auto
make -j
sudo make install
```
If you find this code useful, please consider citing:

```bibtex
@article{DBLP:journals/isci/NardiniRTV25,
  author  = {Franco Maria Nardini and
             Cosimo Rulli and
             Salvatore Trani and
             Rossano Venturini},
  title   = {Neural network compression using binarization and few full-precision
             weights},
  journal = {Inf. Sci.},
  volume  = {716},
  pages   = {122251},
  year    = {2025},
  url     = {https://doi.org/10.1016/j.ins.2025.122251},
}
```