DugBot is an AI-powered research assistant that helps researchers discover and understand biomedical studies through natural language queries. Built on the LangGraph and LangChain frameworks, it searches the NHLBI BioData Catalyst (BDC) research studies using both vector-based similarity search and knowledge graph traversal.
DugBot employs Retrieval-Augmented Generation (RAG) to provide accurate, grounded responses about biomedical studies, variables, and their relationships. The system processes 200+ studies from the BioData Catalyst database and lets researchers find relevant information through conversational queries.
- Intent Agent: Analyzes user queries and extracts intent/preferences
- Supervisor Agent: Routes queries to appropriate lookup mechanisms
- KG Lookup Agent: Retrieves information through knowledge graph traversal
  - The KG Lookup agent uses biomedical knowledge graphs to provide concept-driven query understanding. It identifies biomedical terms in a user’s query, maps them to UMLS IDs, and searches for study variables associated with those concepts. Studies connected to those variables are retrieved, and their abstracts are used to generate additional context. Finally, the system combines the context from both QV Lookup and KG Lookup and feeds it into an LLM to generate a richer, more precise response that blends semantic similarity with concept-driven retrieval.
- QV Lookup Agent: Performs vector-based similarity search
  - The QV Lookup agent processes BDC study descriptions from dbGaP to enable semantic question answering. Each study description was analyzed with an LLM to extract research questions that the study could answer. These questions were then converted into vector embeddings and stored in a vector database. When a user submits a query, the system generates an embedding for that query and retrieves the closest matches from the database. This provides context that links user queries to studies with related research questions.
- Vector-Based Retrieval: Semantic similarity search over pre-generated questions
- Knowledge Graph Retrieval: Concept-based graph traversal for entity relationships
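
The two retrieval paths described above can be illustrated with a minimal, self-contained sketch. It is not DugBot's implementation: the in-memory dictionaries stand in for Qdrant and the Redis graph, the three-dimensional vectors stand in for real embeddings, and every identifier (`study-A`, `CONCEPT-BP`, `qv_lookup`, `kg_lookup`, `combine_context`) is hypothetical.

```python
import math

# Hypothetical toy data standing in for the vector store (pre-generated questions).
QUESTION_INDEX = [
    {"study": "study-A", "question": "Does sleep apnea raise blood pressure?", "vector": [0.9, 0.1, 0.0]},
    {"study": "study-B", "question": "How does smoking affect lung function?", "vector": [0.0, 0.2, 0.9]},
]

# Hypothetical toy data standing in for the knowledge graph.
TERM_TO_CONCEPT = {"blood pressure": "CONCEPT-BP"}      # term -> placeholder concept ID
CONCEPT_TO_VARIABLES = {"CONCEPT-BP": ["systolic_bp"]}  # concept -> study variables
VARIABLE_TO_STUDIES = {"systolic_bp": ["study-A", "study-C"]}
STUDY_ABSTRACTS = {
    "study-A": "Abstract of study A (placeholder).",
    "study-C": "Abstract of study C (placeholder).",
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def qv_lookup(query_vector, top_k=1):
    """Return the top_k stored questions closest to the query embedding."""
    ranked = sorted(QUESTION_INDEX, key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return ranked[:top_k]


def kg_lookup(query):
    """Walk term -> concept -> variables -> studies and return the study IDs found."""
    studies = []
    for term, concept in TERM_TO_CONCEPT.items():
        if term in query.lower():
            for variable in CONCEPT_TO_VARIABLES.get(concept, []):
                for study in VARIABLE_TO_STUDIES.get(variable, []):
                    if study not in studies:
                        studies.append(study)
    return studies


def combine_context(query, query_vector):
    """Merge both contexts into one text block that could be passed to an LLM prompt."""
    qv_lines = [f"[QV] {r['study']}: {r['question']}" for r in qv_lookup(query_vector)]
    kg_lines = [f"[KG] {s}: {STUDY_ABSTRACTS[s]}" for s in kg_lookup(query)]
    return "\n".join(qv_lines + kg_lines)
```

In the real system the query vector would come from an embedding model and the concept mapping from UMLS; the sketch only shows how the two contexts are gathered independently and then concatenated.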

- Python 3.8+
- Docker (optional, for containerized deployment)
- Ollama (for local LLM inference)
- Qdrant (vector database)
- Redis (knowledge graph storage)
The system provides multiple server endpoints for different use cases:
- Agent Server (Multi-agent routing)
python src/agent_server.py
# Serves on port 8099
- Knowledge Graph Server (KG-only queries)
python src/kg_app_server.py
# Serves on port 8094
- Combined QVKG Server (Vector + KG)
python src/qvkg_app_server.py
# Serves on port 8005
- POST /agent/invoke: Full agent routing with intent analysis
- POST /agent/stream: Streaming responses
- GET /agent/score/{trace_id}/{score}: Feedback scoring
- POST /kg-app/invoke: Knowledge graph-based queries
- POST /kg-app/stream: Streaming KG responses
- POST /qvkg-app/invoke: Combined vector + KG queries
- POST /qvkg-app/stream: Streaming combined responses
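
As a rough illustration of calling the `invoke` endpoints, the sketch below builds a JSON POST request with the Python standard library. The ports and paths come from the lists above; the `localhost` host, the `{"input": {"question": ...}}` body shape, and the helper names `build_invoke_request` and `invoke` are assumptions, so check the server code for the actual request schema.

```python
import json
import urllib.request

# Ports are taken from the server list; the host name is an assumption.
BASE_URLS = {
    "agent": "http://localhost:8099",
    "kg-app": "http://localhost:8094",
    "qvkg-app": "http://localhost:8005",
}


def build_invoke_request(app, question):
    """Build (but do not send) a POST request for /<app>/invoke."""
    payload = {"input": {"question": question}}  # assumed body shape, not confirmed
    return urllib.request.Request(
        url=f"{BASE_URLS[app]}/{app}/invoke",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(app, question, timeout=60):
    """Send the request to a running server and return the decoded JSON response."""
    request = build_invoke_request(app, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the agent server running, `invoke("agent", "Which studies measure blood pressure?")` would send the query through the full routing path, while `"kg-app"` or `"qvkg-app"` would target the other two servers.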
The system includes tools for generating training questions from study abstracts:
# Generate questions from study abstracts
python src/core.py
# Create embeddings for question-answer pairs
python db_builder/create_embeddings.py
# Load data into Qdrant
python db_builder/qdrant_loader.py
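
The three steps above depend on each other, so it can help to run them in order and stop at the first failure. The wrapper below is one possible sketch; it assumes the scripts take no arguments and are run from the repository root, and `run_pipeline` is a hypothetical helper, not part of the project.

```python
import subprocess
import sys

# Script paths as listed above, in the order they need to run.
PIPELINE_STEPS = [
    "src/core.py",                      # generate questions from study abstracts
    "db_builder/create_embeddings.py",  # embed the question-answer pairs
    "db_builder/qdrant_loader.py",      # load the embeddings into Qdrant
]


def run_pipeline(run=subprocess.run, python=sys.executable):
    """Run each step in order; check=True raises on the first non-zero exit code."""
    executed = []
    for script in PIPELINE_STEPS:
        run([python, script], check=True)
        executed.append(script)
    return executed
```

Calling `run_pipeline()` from the repository root runs all three scripts with the current interpreter; the `run` parameter exists only so the function can be exercised without launching real processes.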
Koios-develop/
├── src/
│ ├── agents/ # Multi-agent implementations
│ │ ├── intent_agent_graph.py
│ │ ├── route_agentic_graph.py
│ │ ├── combined_context_graph.py
│ │ └── supervisor.py
│ ├── chains/ # Processing chains
│ │ ├── kg_chain.py
│ │ ├── question_lookup_chain.py
│ │ ├── qvkg_chain.py
│ │ └── user_intent_chain.py
│ ├── models/ # Data models
│ │ ├── agent_state.py
│ │ └── user_question.py
│ ├── databases/ # Database connectors
│ │ ├── qdrant.py
│ │ └── redis_graph.py
│ ├── guardrails/ # Input validation
│ │ └── input_guard.py
│ └── util/ # Utility functions
├── db_builder/ # Database setup tools
├── prompts/ # Prompt templates
├── ragas_benchmark/ # Evaluation framework
└── requirements.txt
This project is licensed under the MIT License.