Welcome to the code repository of Autocomp. Check out our introductory 📝 blog post!
Update (9/22/2025): Added code/documentation for setting up CUDA/KernelBench backend, plus code for RVV optimization. Check out 📝 blog post 2 for more details.
📚 Paper: Autocomp: LLM-Driven Code Optimization for Tensor Accelerators
✏️ Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, and Yakun Sophia Shao (UC Berkeley)
Currently supported backends:
- CUDA via KernelBench (see `kb_setup.md`)
- Gemmini (see `gemmini_setup.md`)
Partially supported backends:
- RISC-V Vector (RVV) on the Canaan Kendryte K230. See the `k230` branch for code. As the implementation is very hacky, we do not currently recommend using this backend.
`autocomp/search/search.py` is the entry point for running Autocomp optimization. Various parameters, such as the backend, models used, beam size, number of plans, number of code implementations, and dropout, can be configured here.
Notable parameters:
- `backend`: The hardware backend to use. Currently supported backends are `cuda` and `gemmini`.
- `models`: The list of models to use, for example `o3-mini` or `gpt-4o`. A variety of endpoints (OpenAI, Anthropic, Gemini, Together) are supported, but routing is somewhat hacky; see `autocomp/common/llm_utils.py`.
- `simulator`: The evaluation method to use.
  - For CUDA, `kernelbench`.
  - For Gemmini, `spike` (only optimizes instruction counts, not cycle counts) or `firesim`.
- `iterations`: The number of iterations to run.
- `search_strategy`: The search strategy to use. Currently only `beam` is supported.
- `prob_type`: The problem type to use.
  - For CUDA, `kb-level1`, `kb-level2`, `kb-level3`, or `kb-level4`.
  - For Gemmini, `gemm`, `conv`, or `admm-multifunction`.
- `prob_id`: The problem ID to use.
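As a hypothetical illustration of the knobs listed above (the values here are invented examples, not the repository's defaults; in practice these are edited directly in `search.py`), a configuration might look like:

```python
# Illustrative parameter settings only; names follow the list above,
# values are examples and not defaults from the repository.
config = {
    "backend": "gemmini",            # "cuda" or "gemmini"
    "models": ["o3-mini", "gpt-4o"],
    "simulator": "spike",            # "kernelbench", "spike", or "firesim"
    "iterations": 10,                # example value
    "search_strategy": "beam",       # only "beam" is supported
    "prob_type": "gemm",             # e.g. "gemm", "conv", "kb-level1"
    "prob_id": 1,
}
```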
`autocomp/` - Core Autocomp code.
- `search/` - Core search and optimization infrastructure.
  - `search.py` - Main search algorithm implementation. Implements the beam search described in the paper. Change search parameters within this file.
  - `llm_agent.py` - LLM agents for planning and code optimization. Implements the two prompt phases described in the paper. The optimization menu is defined within this file.
  - `llm_ensemble.py` - Wrapper around LLM agents that enables calls to be split between multiple agents.
  - `prob.py` - Wrapper for tests (parsed from the `tests/` directory) that edits the test file and appends LLM-generated code in order to test it.
  - `code_repo.py` - Abstraction for managing code candidates generated during optimization.
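As a rough illustration of the control flow (a simplified sketch, not the repository's actual implementation), a beam search over code candidates keeps the best few candidates each iteration and expands each with several proposals:

```python
def beam_search(initial_code, score, propose, beam_size=2, num_plans=2, iterations=3):
    """Minimal beam-search sketch: keep the `beam_size` best candidates,
    expanding each with `num_plans` proposals per iteration.
    `score` maps code -> lower-is-better cost; `propose` maps code -> new code."""
    beam = [initial_code]
    for _ in range(iterations):
        candidates = list(beam)          # previous candidates stay eligible
        for code in beam:
            candidates.extend(propose(code) for _ in range(num_plans))
        beam = sorted(candidates, key=score)[:beam_size]
    return beam[0]

# Toy usage: "optimize" a string by shortening it; cost = length.
best = beam_search("aaaaaa", score=len, propose=lambda c: c[:-1])
# → "aaa" after 3 iterations of trimming one character
```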
- `backend/` - Hardware evaluation utilities for different backends.
  - `hardware_backend.py` - Base class for hardware backends.
  - `gemmini_eval.py` - Hardware evaluation utilities for Gemmini. Paths to Chipyard/FireSim/Gemmini must be configured here.
  - `kb_eval.py` - Hardware evaluation utilities for KernelBench.
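The backend abstraction might be sketched as follows (class and method names here are illustrative assumptions, not the repository's actual API): each backend turns a candidate kernel into a lower-is-better cost such as cycles, instruction count, or runtime.

```python
from abc import ABC, abstractmethod

class HardwareBackend(ABC):
    """Illustrative base class: each backend evaluates a candidate kernel
    and returns a cost metric (cycles, instruction count, or runtime)."""

    @abstractmethod
    def evaluate(self, code: str) -> float:
        """Return a lower-is-better performance metric, or raise on failure."""

class CountingBackend(HardwareBackend):
    """Toy backend for demonstration: 'cost' is just the number of statements."""
    def evaluate(self, code: str) -> float:
        return float(code.count(";"))

backend = CountingBackend()
```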
- `common/` - Shared utilities and helper functions.
  - `llm_utils.py` - LLM interaction utilities. Works with OpenAI, Claude, Gemini, and Together. Implements parallel calls for OpenAI and Together.
  - `my_logging.py` - Custom logging functionality.
  - `utils.py` - General utility functions.
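Parallel LLM sampling is typically done with a thread pool, since the requests are I/O-bound; a sketch (with a stub standing in for a real API call) might look like:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Stub standing in for a real OpenAI/Together API call.
    return f"response to: {prompt}"

def parallel_calls(prompts, max_workers=8):
    """Issue many I/O-bound LLM requests concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm, prompts))

results = parallel_calls(["plan A", "plan B"])
```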
`prompts/` - Contains various prompts imported by `autocomp/search/llm_agent.py`.
- `isa_prompt_conv.py` - Accelerator ISA section of the prompt, used for GEMM and convolution.
- `isa_prompt_admm.py` - Accelerator ISA section of the prompt, used for TinyMPC.
- `opt_system/` - Prompts and examples used for optimization.
  - `gemmini_rules.py` - Rules section of the prompt (helps constrain output and encourage functional correctness).
  - `plan_prompt.py` - Planning-phase prompt (note that the implementation prompt is entirely contained within `autocomp/search/llm_agent.py` above).
  - `tiling_example.py` - Tiling optimization example.
  - `if_example.py` - Conditional optimization example (from convolution).
  - `if_example_matmul.py` - Conditional optimization example (from GEMM).
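Conceptually, these pieces are concatenated into a single prompt; a simplified sketch (function name and section contents are invented for illustration) of assembling the planning-phase prompt:

```python
def build_plan_prompt(isa_section: str, rules: str, examples: list[str], code: str) -> str:
    """Assemble a planning-phase prompt from its sections (hypothetical sketch:
    ISA description, rules, worked examples, then the code under optimization)."""
    parts = [isa_section, rules, *examples,
             "Code to optimize:\n" + code,
             "Propose an optimization plan."]
    return "\n\n".join(parts)

prompt = build_plan_prompt("ISA: ...", "Rules: ...", ["Example: tiling"], "matmul()")
```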
`sols/` - Contains baseline code for the benchmarks in the paper.
- `exo/` - Exo unoptimized and optimized baseline code for the GEMM benchmarks in the paper. `sol{id}_exo_baseline.c` is the unoptimized code and is used by `autocomp/search/search.py` as the starting code for optimization.
- `gemm/` - Additional GEMM benchmarks used for schedule reuse. No hand-optimized code available.
- `exo-conv/` - Exo unoptimized and optimized baseline code for the convolution benchmarks in the paper.
- `admm-multifunction/` - TinyMPC unoptimized and optimized baseline code. Only problem IDs 1 and 2 are used in the paper. Run with FP32 4x4 Gemmini.
`tests/` - Contains test cases corresponding to `sols/` above.
- `exo/`, `gemm/`, `exo-conv/`, `admm-multifunction/` - Test cases corresponding to the directories in `sols/` above.
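The splicing that `prob.py` performs on these test files could work roughly like this (a hedged sketch; the marker string and function name are invented for illustration, and the real parser works on the actual test sources):

```python
def splice_candidate(test_source: str, generated_code: str,
                     marker: str = "// GENERATED_CODE_HERE") -> str:
    """Insert LLM-generated code into a test harness at a placeholder marker
    so the candidate can be compiled and run against the existing checks."""
    if marker not in test_source:
        raise ValueError("marker not found in test file")
    return test_source.replace(marker, generated_code)

harness = "int main() {\n// GENERATED_CODE_HERE\nreturn 0;\n}"
spliced = splice_candidate(harness, "do_matmul();")
```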
```bibtex
@misc{hong2025autocomp,
      title={Autocomp: LLM-Driven Code Optimization for Tensor Accelerators},
      author={Charles Hong and Sahil Bhatia and Alvin Cheung and Yakun Sophia Shao},
      year={2025},
      eprint={2505.18574},
      archivePrefix={arXiv},
      primaryClass={cs.PL},
      url={https://arxiv.org/abs/2505.18574},
}
```

