LeanSearch

A semantic search engine for Lean 4 projects.

Also see Herald for the idea used to translate formal statements into natural language.

Installation

Install Python deps

python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

Install Postgres

Download and install PostgreSQL (see the official PostgreSQL installation guide).
Create a database:
```
createdb my_database_name
```
Note the database name; you will need to set it in your .env file later.

Install jixia

Clone the jixia repo: git clone git@github.com:frenzymath/jixia.git; cd jixia
Make sure the lean-toolchain file in jixia matches the lean-toolchain file in the project you will be indexing.
If the Lean versions don't match, indexing the project fails with an error like "... failed to read file ..., invalid header".
Build jixia: lake build (should take around 70s)
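
The toolchain check described above can be automated before building. The following is a minimal sketch, not part of LeanSearch or jixia: the function names are hypothetical, and it assumes both directories contain a plain-text lean-toolchain file at their root.

```python
from pathlib import Path


def read_toolchain(project_root: str) -> str:
    """Return the contents of <project_root>/lean-toolchain without surrounding whitespace."""
    return (Path(project_root) / "lean-toolchain").read_text(encoding="utf-8").strip()


def toolchains_match(jixia_root: str, project_root: str) -> bool:
    """Return True if both directories pin exactly the same toolchain string."""
    return read_toolchain(jixia_root) == read_toolchain(project_root)
```

Calling toolchains_match("jixia", "<project root>") before running lake build gives an early warning instead of the "invalid header" error at indexing time.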

Set up the .env file

Copy the .env.example file to .env:
```
cp .env.example .env
```
Edit the .env file and set the required variables according to your setup.

Note

We strongly recommend using the DeepSeek v3 model for a balance between quality and cost.
In this case, set OPENAI_API_KEY to your DeepSeek API key, OPENAI_BASE_URL to https://api.deepseek.com, and OPENAI_MODEL to deepseek-chat.
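
As a quick sanity check of the three variables named above, the sketch below reports which of them are unset or empty. It is an illustration only: the helper name is hypothetical, it assumes the values from .env have already been loaded into the process environment, and .env.example may list further variables (such as the database name) that are not checked here.

```python
import os
from typing import Mapping

# The three LLM-related variables mentioned in the note above.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL")


def missing_llm_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty list means all three variables are present; it does not verify that the key or URL is actually valid.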

Usage

Indexing

Index your Lean project (uses jixia, puts results into PostgreSQL)
```
python -m database jixia <project root> <prefixes>
```
Options:
- project root: Path to the project to index. This is where the lakefile.toml or lakefile.lean is located.
- prefixes: Comma-separated list of module prefixes. A module is indexed only if its module path starts with one of the prefixes listed here. For example, Init,Lean,Mathlib will include only Init.*, Lean.*, and Mathlib.* modules.
Note: to check which modules are available in your project and to see how prefixes are applied, use the helper command python -m prefix --project_root <project_root> --prefixes <prefixes>.
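
To make the prefix rule concrete, here is a small sketch of one plausible reading of it. The function names are hypothetical, and the component-wise matching (so that the prefix Lean does not match a module named LeanSearch.Foo) is an assumption rather than confirmed behavior; the prefix helper command above is the authoritative way to check.

```python
def parse_prefixes(prefixes: str) -> list[str]:
    """Split a comma-separated prefix string, ignoring empty entries."""
    return [p.strip() for p in prefixes.split(",") if p.strip()]


def is_indexed(module: str, prefixes: list[str]) -> bool:
    """Return True if the module path equals a prefix or lies below it.

    Assumption: prefixes match whole dotted components, so "Lean" covers
    "Lean" and "Lean.Meta" but not "LeanSearch.Foo".
    """
    return any(module == p or module.startswith(p + ".") for p in prefixes)
```

With the prefixes Init,Lean,Mathlib, this sketch would include Mathlib.Algebra.Group.Basic and exclude Std.Data.List.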
Create informal descriptions (uses the DeepSeek API, puts results into PostgreSQL)
```
python -m database informal
```
Natural-language descriptions can be created using any OpenAI-compatible API; DeepSeek is the one recommended above.
Create embeddings (uses a locally downloaded e5-mistral-7b-instruct model, puts results into Chromadb)
```
python -m database vector-db
```

Note that indexing a large project like Mathlib requires a large number of API calls (to create informal descriptions) and substantial computational power (to compute the semantic embeddings). Use with caution.
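
The three indexing steps must run in order, since each one consumes the output of the previous step. The sketch below chains them with subprocess and stops at the first failure. It is an illustration, not a script shipped with the project: the function names are hypothetical, and it assumes it is run from the LeanSearch repository root with the virtual environment active.

```python
import subprocess
import sys


def indexing_commands(project_root: str, prefixes: str) -> list[list[str]]:
    """Build the three indexing commands from this section, in order."""
    py = sys.executable  # use the interpreter of the active environment
    return [
        [py, "-m", "database", "jixia", project_root, prefixes],
        [py, "-m", "database", "informal"],
        [py, "-m", "database", "vector-db"],
    ]


def run_indexing(project_root: str, prefixes: str) -> None:
    """Run each step in turn; check=True raises if a step exits non-zero."""
    for cmd in indexing_commands(project_root, prefixes):
        subprocess.run(cmd, check=True)
```

Because the informal and vector-db steps are the expensive ones, stopping on the first error avoids spending API calls or compute on an incomplete index.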

Searching

To search the database, run:

python search.py <query1> <query2> ...

Note that queries containing whitespace must be quoted, e.g., python search.py "Hello world".
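
The quoting rule follows from how a POSIX shell splits a command line into arguments. The sketch below uses Python's shlex module to mimic that splitting; the helper name is hypothetical, and it only illustrates shell tokenization, not how search.py processes its arguments internally.

```python
import shlex


def queries_from_command(command: str) -> list[str]:
    """Tokenize a command line like a POSIX shell and return the query arguments."""
    argv = shlex.split(command)
    return argv[2:]  # drop "python" and "search.py"
```

Under this tokenization, python search.py "Hello world" passes one query, while the unquoted python search.py Hello world passes two separate queries.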
