This demo shows how to use LLMs to improve the discoverability of Earth observation datasets:
- Enrich STAC collection descriptions with application-focused context (e.g., agriculture monitoring, disaster response).
- Embed the enriched descriptions and topics as vectors, stored locally in Parquet and queried with DuckDB.
- Retrieve & Refine results for user queries using vector similarity search, with an LLM agent acting as a semantic judge.
The result is a lightweight prototype of a RAG-style system for semantic dataset discovery; rough sketches of each step follow below.
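The enrichment step is essentially prompt engineering over existing STAC metadata. Below is a minimal sketch, assuming the `openai` Python SDK and a collection's title and description already in hand; the prompt wording and model name are placeholders, not the ones used in the notebook.

```python
# Sketch of the enrichment step (assumes OPENAI_API_KEY is set in the environment).
from openai import OpenAI

client = OpenAI()

def enrich_description(title: str, description: str) -> str:
    """Ask the model to rewrite a STAC collection description with application-focused context."""
    prompt = (
        "You help users discover Earth observation datasets.\n"
        f"Collection: {title}\n"
        f"Original description: {description}\n\n"
        "Rewrite the description and list concrete application areas it supports, "
        "e.g. agriculture monitoring or disaster response."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```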
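The embedding step turns the enriched text into vectors and persists them locally. The sketch below assumes OpenAI's `text-embedding-3-small` model and a `pandas`/`pyarrow` stack; the file name and column names are illustrative, not the repo's actual schema.

```python
# Sketch of embedding enriched descriptions and storing them in Parquet.
import pandas as pd
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of texts; each embedding is a list of floats."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

# Illustrative rows; in the real pipeline these come from the enriched STAC collections.
collections = pd.DataFrame({
    "collection_id": ["sentinel-2-l2a", "landsat-c2-l2"],
    "enriched_description": ["...enriched text...", "...enriched text..."],
})
collections["embedding"] = embed(collections["enriched_description"].tolist())
collections.to_parquet("collections.parquet")  # DuckDB can query this file directly
```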
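Retrieval then runs a similarity query in DuckDB and refines the candidates with an LLM. In this sketch, `list_cosine_similarity` and the yes/no judging prompt are my assumptions; the notebook uses an LLM agent as the semantic judge, which a single chat call stands in for here.

```python
# Sketch of retrieve & refine over the Parquet file written above.
import duckdb
from openai import OpenAI

client = OpenAI()
query = "monitoring vegetation stress during droughts"
qvec = client.embeddings.create(model="text-embedding-3-small", input=query).data[0].embedding

# Retrieve: rank stored collections by cosine similarity to the query embedding.
candidates = duckdb.execute(
    """
    SELECT collection_id, enriched_description,
           list_cosine_similarity(embedding, ?) AS score
    FROM 'collections.parquet'
    ORDER BY score DESC
    LIMIT 5
    """,
    [qvec],
).df()

# Refine: a chat call acting as a crude semantic judge over each candidate.
def is_relevant(query: str, description: str) -> bool:
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Query: {query}\nDataset description: {description}\n"
                       "Answer strictly 'yes' or 'no': is this dataset relevant to the query?",
        }],
    )
    return verdict.choices[0].message.content.strip().lower().startswith("yes")

refined = candidates[candidates["enriched_description"].apply(lambda d: is_relevant(query, d))]
print(refined[["collection_id", "score"]])
```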
Disclaimer: this is the product of some weekend hacking to upskill with these technologies. I'm not claiming this is the best or even the right way to do this.
This project uses `uv` (see installation instructions). After cloning the repo and installing `uv`, run `uv sync` to create a virtual environment and install dependencies.
Note: I developed the Jupyter notebook in VSCode, so `jupyterlab` is not included in the project requirements. To run the notebook within a Jupyter server, run `uv sync --extra jupyterlab` followed by `jupyter lab` to start the server.
This demo uses OpenAI's API, so an API key is required. The notebook is also instrumented with `logfire` for observability; this is entirely optional, but useful for understanding what is happening while the agents run.
Run `cp .env.example .env` and add your OpenAI API key. Optionally, set the Logfire API key.
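For reference, the resulting `.env` might look like the example below. The variable names are assumptions based on the conventional names read by the `openai` and `logfire` libraries; check `.env.example` for the names this project actually uses.

```
# required
OPENAI_API_KEY=sk-...
# optional: enables Logfire tracing
LOGFIRE_TOKEN=...
```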