bmcandr/stac-rag-demo

STAC-RAG: Agent-assisted Dataset Discovery with Vector Embeddings

This demo shows how LLMs can be used to improve the discoverability of Earth observation datasets.

  • Enrich STAC collection descriptions with application-focused context (e.g., agriculture monitoring, disaster response).
  • Embed the enriched descriptions and topics as vectors, stored locally in Parquet and queried with DuckDB.
  • Retrieve & Refine results for user queries using vector similarity search, with an LLM agent acting as a semantic judge.
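As a rough illustration of the retrieve step, the sketch below ranks collections by cosine similarity between a query vector and each collection's embedding. It rests on loud assumptions: the collection IDs and the 3-dimensional vectors are hypothetical placeholders for real OpenAI embeddings of the enriched descriptions, and plain Python stands in for the Parquet/DuckDB storage the demo actually uses. In the demo, the top-k candidates would then go to the LLM agent for the "refine" judgement, which is not shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (collection_id, score) pairs most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for vectors of enriched STAC descriptions.
index = {
    "sentinel-2-l2a": [0.9, 0.1, 0.0],
    "landsat-c2-l2": [0.8, 0.2, 0.1],
    "nasadem": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of a user query such as "crop monitoring".
query = [1.0, 0.0, 0.0]

results = top_k(query, index, k=2)
for cid, score in results:
    print(f"{cid}: {score:.3f}")
```

A brute-force scan like this is fine for a few hundred collections. DuckDB also provides vector similarity functions (for example array_cosine_similarity) that could push the same ranking into a SQL query over the Parquet file.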

The result is a lightweight prototype of a RAG-style system for semantic dataset discovery.

Disclaimer: this is the product of some weekend hacking to upskill with these technologies. I'm not claiming this is the best or even the right way to do this.

Setup

Installation

This project uses uv (see installation instructions).

After cloning the repo and installing uv, run uv sync to create a virtual environment and install dependencies.

Note: I developed the Jupyter Notebook in VS Code, so JupyterLab is not included in the project requirements. To run the notebook in a Jupyter server, run uv sync --extra jupyterlab, then run jupyter lab to start the server.

API Keys

This demo uses OpenAI's API, so an API key is required. The notebook is also instrumented with Logfire for observability. This is optional, but it is useful for understanding what is happening while the agents run.

Run cp .env.example .env and add your OpenAI API key. Optionally, set the Logfire API key.
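For reference, here is a minimal standard-library sketch of reading the .env file and checking that the required key is present. The variable names are assumptions, not taken from .env.example: OPENAI_API_KEY is the variable the OpenAI Python SDK reads, and LOGFIRE_TOKEN is the one Logfire reads, but check the example file for the names the notebook actually expects. In practice, a package such as python-dotenv is the usual way to do this parsing.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments,
    and export them to os.environ without overwriting existing values."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop optional surrounding quotes
        values[key] = value
        os.environ.setdefault(key, value)
    return values


def check_keys(env: dict[str, str]) -> None:
    """Require the (assumed) OpenAI key; only warn if the optional Logfire token is empty."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is missing from .env")
    if not env.get("LOGFIRE_TOKEN"):
        print("LOGFIRE_TOKEN not set; running without Logfire instrumentation")


# Usage with a throwaway file; the placeholder key is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('# copied from .env.example\nOPENAI_API_KEY="sk-placeholder"\nLOGFIRE_TOKEN=\n')
    env = load_env_file(str(env_path))
    check_keys(env)
```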
