This repository creates a benchmark and a library dataset from a selection of the open-source PULP platform's SystemVerilog designs, with a dual goal:
- Leveraging the generated benchmark to assess LLM-based RTL design and verification capabilities
- Leveraging the generated library as a retrieval database for LLM-based hardware design agents, enabling them to consult and reuse modules in a way that mirrors how human designers reference existing third-party IPs.
The reference RTL designs and testbenches are single-source, self-contained
files generated from the original PULP codebase using the open-source tools
bender and morty. Specification prompts are generated by an LLM.
The idea of this repository is inspired by verilog-eval from NVIDIA.
- Bender >= 0.27.1
- Morty >= 0.9.0
- Python >= 3.11
Additionally, to use a cloud LLM provider's API, set "OPENAI_API_KEY", "ANTHROPIC_API_KEY", or the corresponding key for your provider as an environment variable, or create a key.cfg file in the following format:
OPENAI_API_KEY= 'xxxxxxx'
ANTHROPIC_API_KEY= 'xxxxxxx'
VERTEX_SERVICE_ACCOUNT_PATH= 'xxxxxxx'
VERTEX_REGION= 'xxxxxxx'
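To illustrate the key.cfg format above, here is a minimal Python sketch that parses such a file's contents and copies the values into the environment. It is not the parser used by this repository's scripts, whose behavior is not shown here; the function names `parse_key_cfg` and `export_keys` are hypothetical, and skipping blank lines and `#` comments is an assumption.

```python
import os


def parse_key_cfg(text: str) -> dict[str, str]:
    """Parse lines of the form NAME= 'value' into a dict."""
    keys = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes.
        keys[name.strip()] = value.strip().strip("'\"")
    return keys


def export_keys(keys, environ=os.environ):
    """Copy parsed keys into the environment, keeping values already set."""
    for name, value in keys.items():
        environ.setdefault(name, value)


sample = """OPENAI_API_KEY= 'xxxxxxx'
VERTEX_REGION= 'xxxxxxx'
"""
print(parse_key_cfg(sample))
```

In practice the text would come from reading key.cfg from disk. Variables already present in the environment take precedence in this sketch.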
This project uses Git submodules that have to be initialized. Either clone the repository recursively using:
git clone --recursive <url>
or fetch the submodules afterwards in the repository:
git submodule update --init --recursive
Install the dependencies:
# get latest stable rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# bender (TODO: we need an unreleased version; update when merged and released)
cargo install --git https://github.com/pulp-platform/bender.git --branch aottaviano/filter-unused
# morty
cargo install --git https://github.com/pulp-platform/morty.git
# python environment
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The (RTL DUT, TB) pairs and the asset directories they come from must be listed in a
JSON file. A sample file is provided at $ROOT/assets.json. Then call:
./scripts/bench-lib-gen.sh \
--json assets.json \
--out out \
--provider openai \
--model gpt-4o-2024-08-06 \
--key-cfg ./key.cfg \
--max-token 8192 \
--tokens 60000 \
--temperature 0.6 \
--top-p 0.95
The input (--tokens) and output (--max-token) token budgets should respect the model's RPM (requests per minute) and TPM (tokens per minute) limits.
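As a rough worked example of this constraint, the sketch below estimates how many worst-case requests fit into one minute. The token budgets are taken from the command above; the RPM and TPM values are hypothetical placeholders, and the sketch assumes that both input and output tokens count toward the TPM limit, which may not hold for every provider.

```python
def max_requests_per_minute(input_tokens: int, output_tokens: int,
                            rpm_limit: int, tpm_limit: int) -> int:
    """Upper bound on requests per minute if every request uses its full budget."""
    tokens_per_request = input_tokens + output_tokens
    if tokens_per_request <= 0:
        raise ValueError("token budgets must be positive")
    # Requests allowed by the token budget alone.
    by_tokens = tpm_limit // tokens_per_request
    # The tighter of the two limits applies.
    return min(rpm_limit, by_tokens)


# --tokens 60000 and --max-token 8192, with hypothetical limits
# of 500 RPM and 300000 TPM.
print(max_requests_per_minute(60000, 8192, rpm_limit=500, tpm_limit=300000))  # 4
```

A result of 0 means a single worst-case request already exceeds the TPM limit, so the budgets would need to be lowered.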
Name | Description |
---|---|
$ROOT/out/bench/ProbXXX_<dut_name>_ref.sv | Reference DUT RTL design for automatic spec generation. |
$ROOT/out/bench/ProbXXX_<dut_name>_test.sv | Reference testbench for the DUT. Used by the assessed LLM to verify its generated RTL DUT in-the-loop. TBs are self-checking. |
$ROOT/out/bench/ProbXXX_<dut_name>_test_golden.sv | Reference testbench instantiating the reference DUT. Serves as a golden reference for comparison with the LLM-generated RTL DUT. |
$ROOT/out/bench/ProbXXX_<dut_name>_prompt.txt | LLM-generated natural language input spec based on the reference design and, if one is specified, the testbench. Can also be written from scratch if a reference design is missing. |
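The naming scheme in the table above can be turned into a small check that a run produced the expected benchmark files. This is only a sketch: the helper names are hypothetical, and because the numbering behind `ProbXXX` is not specified here, the problem prefix is passed in as a plain string (the value `Prob001` below is a made-up example). A DUT listed without a testbench may not have the testbench files.

```python
from pathlib import Path

# File suffixes listed in the benchmark output table.
BENCH_SUFFIXES = ("ref.sv", "test.sv", "test_golden.sv", "prompt.txt")


def expected_bench_files(out_dir: str, prob_prefix: str, dut_name: str) -> list[Path]:
    """Build the benchmark file paths for one DUT."""
    bench = Path(out_dir) / "bench"
    return [bench / f"{prob_prefix}_{dut_name}_{suffix}" for suffix in BENCH_SUFFIXES]


def missing_bench_files(out_dir: str, prob_prefix: str, dut_name: str) -> list[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_bench_files(out_dir, prob_prefix, dut_name)
            if not p.exists()]


# "Prob001" is a hypothetical prefix used for illustration only.
for path in expected_bench_files("out", "Prob001", "fifo_v3"):
    print(path.name)
```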
Name | Description |
---|---|
$ROOT/out/lib/<dut_name>.json | LLM-generated structured input spec (JSON) based on the reference design. |
An example input JSON file, assets.json:
{
"assets/common_cells": [
["delta_counter", "delta_counter.sv", "", "all"],
["fifo_v3", "fifo_v3.sv", "fifo_tb.sv", "all"]
]
}
Each JSON key is an asset directory, and each entry in its list has four fields, from left to right:
- top-level module name
- .sv source file where the top-level (DUT) is declared
- .sv source file with a testbench for the DUT. If none is available, leave the string empty
- <bench|lib|all> to generate benchmark files, library files, or both
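The four-field layout described above can be checked before launching a run. The following Python sketch loads the JSON text and validates each entry; it is an illustration rather than the repository's own loader, and the function name `load_assets` and the returned dictionary keys are hypothetical.

```python
import json

VALID_TARGETS = {"bench", "lib", "all"}


def load_assets(text: str) -> list[dict]:
    """Flatten the assets JSON into one record per DUT entry."""
    entries = []
    for asset_dir, rows in json.loads(text).items():
        for row in rows:
            if len(row) != 4:
                raise ValueError(f"{asset_dir}: expected 4 fields, got {row!r}")
            top, dut_file, tb_file, target = row
            if target not in VALID_TARGETS:
                raise ValueError(f"{asset_dir}/{top}: unknown target {target!r}")
            entries.append({
                "asset_dir": asset_dir,
                "top": top,
                "dut_file": dut_file,
                # An empty string means no testbench is available.
                "tb_file": tb_file or None,
                "target": target,
            })
    return entries


sample = """{
  "assets/common_cells": [
    ["delta_counter", "delta_counter.sv", "", "all"],
    ["fifo_v3", "fifo_v3.sv", "fifo_tb.sv", "all"]
  ]
}"""
for entry in load_assets(sample):
    print(entry["top"], entry["tb_file"], entry["target"])
```

Rejecting malformed rows early avoids spending LLM tokens on a run that would fail partway through.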
A sample output generated with the designs fifo_v3 and delta_counter from
PULP platform's common_cells is provided in out/{bench,lib}.