BLARE is a regular expression matching framework that decomposes regular expressions into components and uses an adaptive runtime evaluation plan to speed up the evaluation.
BLARE is modular and can be built on top of any existing regex library. The repository currently includes example BLARE implementations on four commonly used regex libraries: RE2, PCRE2, Boost Regex, and ICU Regex.
The paper is available: BLARE
BLARE is implemented in C++. We provide a CMake file that builds the project together with its external dependencies, except for g++, CMake, and Boost, which must be installed separately. Make sure g++ and CMake are installed on your system. On Ubuntu, for example, you can run:
sudo apt update && sudo apt upgrade && sudo apt-get update
- g++ (version 8.4.0 or higher)
  sudo apt install build-essential
- cmake (version 3.14 or higher)
  sudo apt install cmake
- pkg-config
  sudo apt-get -y install pkg-config
- Abseil
  git clone https://github.com/abseil/abseil-cpp.git && cd abseil-cpp
  mkdir build && cd build
  cmake ..
  cmake --build . --target all
  sudo make install
- Boost Library (version 1.65.1.0 or higher)
  sudo apt-get install libboost-all-dev
- Git-LFS
  sudo apt-get install git-lfs
Check that the installed versions satisfy these requirements with:
g++ --version
cmake --version
dpkg -s libboost-all-dev | grep 'Version'
We evaluate the performance of BLARE on two production workloads and one open-source workload. The repository includes the December 21st version of the open-source workload, the US Accident Dataset, together with its regular expressions. The newest version of the dataset has a different format and may not work with the hard-coded CSV parsing logic in the original codebase; in that case, consider modifying the read_traffic function in the code you will be running. To use the original version of the dataset, install Git LFS and then run
git lfs pull
to retrieve the dataset.
The code in the root directory is under continuous development and may not produce results identical to those in the BLARE paper. To reproduce the paper's results as accurately as possible, compile and run the original experiment code in the BLARE_CODE folder. Instructions for compiling and running it are in the original_codebase folder.
To build BLARE and the experiments that can be run on custom workloads, use the following commands:
mkdir build && cd build
cmake ..
make
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${PWD}/_deps/ICU-build/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${PWD}/_deps/pcre2-build/lib
Run BLARE with
cd src
./blare regex_lib_name input_regex_file input_data_file [output_file]
To run specific experiments comparing BLARE with the underlying regex libraries, use
cd experiments
./[base_regex_library]_expr output_file [-n num_repeat] [-r input_regex_file] [-d input_data_file]
The paper contains results from two Microsoft-internal workloads and one public workload. To run the original code comparing BLARE, 3-Way-Split, and Multi-Way-Split with the underlying regex libraries on the public US Accident dataset, use
cd original_codebase/BLARE_CODE
./blare_[base_regex_library]
The original_codebase folder also contains a Jupyter notebook that draws the bar plot and box plot for re2_traffic.csv, which is generated by running blare_re2.