A semantic search engine for Lean 4 projects.
Also see Herald for the idea used to translate formal statements into natural language.
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
-
Download PostgreSQL (you can find the installation guide here).
-
Create database:
createdb my_database_name
Memorize the database name, you will later need to set it in your
.env
file.
- Clone the jixia repo:
git clone [email protected]:frenzymath/jixia.git
;cd jixia
- Make sure
lean-toolchain
in jixia andlean-toolchain
in the project you will be indexing match.
If Lean versions don't match, you will get"... failed to read file ..., invalid header"
error when you try to index the project. - Build jixia:
lake build
(should take around 70s)
-
Copy the
.env.example
file to.env
:cp .env.example .env
-
Edit the
.env
file and set the required variables according to your setup.
Note
We strongly recommend using DeepSeek v3 model for a balance between quality and cost.
In this case, OPENAI_API_KEY
should be set to your DeepSeek api key, OPENAI_BASE_URL
should be set to https://api.deepseek.com
, and OPENAI_MODEL
should be set to deepseek-chat
.
-
Index your Lean project (uses jixia, puts results into PostgreSQL)
python -m database jixia <project root> <prefixes>
Options:
project root
: Path to the project to index. This is where thelakefile.toml
orlakefile.lean
is located.prefixes
: Comma-separated list of module prefixes. A module is indexed only if its module path starts with one of prefixes listed here. For example,Init,Lean,Mathlib
will include onlyInit.*
,Lean.*
, andMathlib.*
modules.
Note: to check what modules are available in your project, and to determine how prefixes work, you can use
python -m prefix --project_root <project_root> --prefixes <prefixes>
helper command. -
Create informal descriptions (uses DeepSeek api, puts results into PostgreSQL)
python -m database informal
Natural-language descriptions can be created using any OpenAI-compatible API, above we advise DeepSeek.
-
Create embeddings (uses locally-downloaded
e5-mistral-7b-instruct
model, puts results into Chromadb)python -m database vector-db
Note that indexing a large project like Mathlib requires a significant amount of both API calls (to create informal descriptions) and computational power (to compute the semantic embedding). Use with caution.
To search the database, run:
python search.py <query1> <query2> ...
Note that queries containing whitespaces must be quoted, e.g., python search.py "Hello world"
.