A Streamlit-based chatbot powered by Retrieval-Augmented Generation (RAG) and OpenAI. Upload your PDFs and chat with them! This app uses LangChain, FAISS, and OpenAI’s GPT models to index document content and answer questions with filename and page references.
- 🔍 Upload multiple PDFs and query across all of them
- 📄 Metadata-rich answers with filename and page references
- 🧠 Uses LangChain + FAISS for semantic search
- 🤖 Streamlit Chat UI for natural conversation
- 💾 OpenAI API support with streaming responses
.
├── .gitignore
├── LICENSE
├── README.md # ← You're reading it
├── app.py # Main Streamlit app
├── brain.py # PDF parsing and vector index logic
├── compare medium.gif # Optional UI illustration
├── requirements.txt # Python dependencies
└── thumbnail.webp # Preview image
git clone https://github.com/aimaster-dev/chatbot-using-rag-and-langchain.git
cd chatbot-using-rag-and-langchain
pip install -r requirements.txt
Create a .streamlit/secrets.toml file with:
OPENAI_API_KEY = "your-openai-key"
Or export it via environment variable:
export OPENAI_API_KEY="your-openai-key"
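The two options above can be combined in code. The following is a minimal sketch, not the code in app.py: `resolve_api_key` is a hypothetical helper that checks a secrets mapping first and falls back to the environment. In the app one might pass `st.secrets` as the first argument, though Streamlit may raise an error if no secrets file exists, so that case would need its own handling.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key from `secrets` if present, else from the environment.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    env = os.environ if environ is None else environ
    key = secrets.get(name) or env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in secrets.toml or the environment")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the OpenAI client.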
streamlit run app.py
- Upload PDFs via the UI
- Each PDF is parsed using PyPDF2 and chunked via LangChain’s RecursiveCharacterTextSplitter
- Chunks are embedded using OpenAI Embeddings
- Stored in a FAISS vector store for semantic similarity search
- Each query is matched to the most similar PDF chunks, which are passed to ChatGPT as context
- Answers include file name and page number metadata for citation
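The steps above can be sketched with the standard library alone. This is an illustration of the flow, not the code in brain.py: a fixed-size character window stands in for RecursiveCharacterTextSplitter, a bag-of-words vector stands in for OpenAI Embeddings, and a sorted list stands in for the FAISS index. All names here (`Chunk`, `chunk_pages`, `embed`, `top_k`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    filename: str
    page: int  # 1-based page number, kept for citations


def chunk_pages(filename: str, pages: List[str], size: int = 200, overlap: int = 40) -> List[Chunk]:
    """Split each page into overlapping character windows, keeping metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), step):
            piece = text[start:start + size].strip()
            if piece:
                chunks.append(Chunk(piece, filename, page_number))
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Return the k chunks most similar to the query (stand-in for FAISS search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: List[Chunk]) -> str:
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{c.filename}, page {c.page}]\n{c.text}" for c in hits)
    return (
        "Answer using only the context below and cite file and page.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

Carrying the filename and page on every chunk is what lets the final answer cite its source, whichever splitter or vector store does the real work.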
- Streamlit – UI framework
- LangChain – PDF chunking and retrieval
- FAISS – Vector search backend
- OpenAI GPT – LLM-based answer generation
- PyPDF2 – PDF parsing
"What are the main points from the introduction?"
Answer: The introduction highlights... (example.pdf, page 1)
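A citation in the style shown above could be produced by a small formatter. This is a sketch under the assumption that the retrieved sources arrive as (filename, page) pairs; `format_answer` is a hypothetical name, not a function from this repository.

```python
from typing import Iterable, List, Tuple


def format_answer(answer: str, sources: Iterable[Tuple[str, int]]) -> str:
    """Append de-duplicated (filename, page) citations in first-seen order."""
    seen: List[Tuple[str, int]] = []
    for source in sources:
        if source not in seen:
            seen.append(source)
    if not seen:
        return answer
    cites = "; ".join(f"{name}, page {page}" for name, page in seen)
    return f"{answer} ({cites})"
```

De-duplicating keeps the citation short when several retrieved chunks come from the same page.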
This project is licensed under the MIT License.
Made with ❤️ by aimaster-dev. Contributions welcome!