Flask-Langchain Chatbot #53
Open: wants to merge 2 commits into base `main`.
11 changes: 11 additions & 0 deletions flask-langchain-app/.dockerignore
@@ -0,0 +1,11 @@
venv
__pycache__
*.pyc
*.pyo
*.pyd
.Python
static/uploads/
.env
.git
.gitignore
.DS_Store
30 changes: 30 additions & 0 deletions flask-langchain-app/Dockerfile
@@ -0,0 +1,30 @@
# Use official Python image
FROM python:3.12-slim

# Set work directory
WORKDIR /app

# Install system dependencies for python-magic and pymupdf
RUN apt-get update && apt-get install -y \
    build-essential \
    libmagic1 \
    mupdf-tools \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt ./

# Install Python dependencies
RUN pip install --upgrade pip && pip install -r requirements.txt

# Copy app code
COPY . .

# Create uploads directory
RUN mkdir -p static/uploads

# Expose port
EXPOSE 5000

# Run the app with Gunicorn
CMD ["gunicorn", "-b", "0.0.0.0:5000", "app:app"]
143 changes: 143 additions & 0 deletions flask-langchain-app/README.md
@@ -0,0 +1,143 @@
# Flask Document Chatbot

[![Build Status](https://img.shields.io/badge/build-passing-brightgreen)](https://github.com/your-repo)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

A web application that lets users upload documents (PDF, DOCX, TXT) and ask questions about their content. Uploaded text is indexed in a FAISS vector store and queried through LangChain; the app is served by Flask.

---

## 🚀 Quick Start

### Run with Docker (Recommended)
```bash
git clone <repository-url>
cd flask-langchain-app
docker-compose up --build
```
Visit: [http://localhost:5001](http://localhost:5001)

### Run Locally (Python)
```bash
git clone <repository-url>
cd flask-langchain-app
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
```
Visit: [http://localhost:5000](http://localhost:5000)

---

## ✨ Features
- Modern, responsive UI with smooth animations
- Drag-and-drop file upload
- Support for PDF, DOCX, and TXT files
- Real-time chat interface
- Document management system
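- Vector search over document chunks using FAISS (the demo build uses placeholder embeddings and a placeholder LLM; swap in real ones for meaningful answers)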
- Beautiful loading animations and transitions
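
Before documents are indexed, their text is split into overlapping chunks. The following is a minimal sketch of that idea in plain Python, assuming fixed-width chunks with the same `chunk_size=500` and `chunk_overlap=50` values that `app.py` passes to LangChain's `CharacterTextSplitter`. The function name `split_with_overlap` is hypothetical, and the real splitter behaves differently: it first splits on a separator (`"\n\n"` by default) and then merges the pieces.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-width chunks where neighbours share chunk_overlap characters.

    Simplified illustration only; not the algorithm used by CharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = split_with_overlap("0123456789" * 100)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.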

---

## 🖼️ Demo
<!--
Add a screenshot or GIF of the app below. Example:
![Demo Screenshot](static/demo-screenshot.png)
-->

---

## 📦 Project Structure
```
flask-langchain-app/
├── app.py                 # Main Flask application
├── requirements.txt       # Python dependencies
├── Dockerfile             # Docker build file
├── docker-compose.yml     # Docker Compose config
├── .dockerignore          # Docker ignore file
├── run.sh                 # Local setup and run script
├── static/
│   ├── css/
│   │   └── style.css      # Custom styles
│   ├── js/
│   │   └── main.js        # Frontend JavaScript
│   └── uploads/           # Uploaded documents
└── templates/
    └── index.html         # Main template
```

---

## ⚙️ Configuration & Customization
- **UI Customization:** Edit `static/css/style.css` and `templates/index.html` for branding and layout changes.
- **File Size Limit:** Adjust `MAX_CONTENT_LENGTH` in `app.py`.
- **Allowed File Types:** Update `ALLOWED_EXTENSIONS` in `app.py`.
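
As a rough illustration of the two settings above, the sketch below shows what raising the limit and accepting one more extension could look like. The 32 MB value and the `md` extension are hypothetical examples, and the names are written as plain module constants here so the snippet runs on its own; in `app.py` they are stored as `app.config['MAX_CONTENT_LENGTH']` and `app.config['ALLOWED_EXTENSIONS']`.

```python
# Hypothetical values: raise the upload limit to 32 MB and also accept Markdown files.
MAX_CONTENT_LENGTH = 32 * 1024 * 1024  # bytes
ALLOWED_EXTENSIONS = {"pdf", "docx", "txt", "md"}


def allowed_file(filename: str) -> bool:
    """Same check as app.py: compare the lower-cased extension against the allowed set."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


print(allowed_file("notes.MD"), allowed_file("archive.zip"))
```

Note that `upload_file` in `app.py` also checks the detected MIME type, so a new extension only works if its MIME type is handled by one of the existing branches or by a new one.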

---

## 📝 Usage
1. **Upload** a document by dragging and dropping it or clicking "Browse Files".
2. **Wait** for the document to be processed (progress bar will show).
3. **Ask** a question in the chat input field.
4. **View** the chatbot's answer based on your document content.

**Example questions:**
- Q: "What is the main topic of this document?"
- Q: "Summarize the second section."
- Q: "List all dates mentioned."
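
The questions above can also be sent to the app without the browser UI. This is a minimal sketch using only the standard library, assuming the app is running locally on port 5000 and that at least one document has been uploaded. The `/query` route, the `query` request field, and the `results` response field come from `app.py`; the helper names `build_query_request` and `ask` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local run; use port 5001 for the Docker setup


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the JSON POST request that the /query route expects."""
    payload = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL) -> list:
    """Send the question and return the 'results' list from the JSON response."""
    with urllib.request.urlopen(build_query_request(question, base_url)) as resp:
        return json.load(resp)["results"]
```

Calling `ask("What is the main topic of this document?")` requires a running server. If no documents have been uploaded, the route answers with HTTP 400, which `urllib` raises as `urllib.error.HTTPError`.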

---

## 🛠️ Dependencies
- `flask`
- `langchain`
- `langchain-community`
- `faiss-cpu`
- `numpy==1.26.4` (required for FAISS compatibility)
- `python-docx`, `pymupdf`, `python-magic`, `python-dotenv`, `flask-wtf`, `gunicorn` (see `requirements.txt` for pinned versions)

---

## 🐳 Docker Notes
- The app runs on port **5001** by default (see `docker-compose.yml`).
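- Uploaded files persist in `static/uploads`, which is mounted as a volume. The extracted text and the FAISS index are held in memory and are lost when the container restarts.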
- To stop the app: `Ctrl+C` then `docker-compose down`.

---

## ❓ FAQ & Troubleshooting

**Q: I get `ModuleNotFoundError: No module named 'numpy.distutils'` or FAISS import errors.**
- A: Ensure your `requirements.txt` includes:
```
numpy==1.26.4
faiss-cpu
```
Then rebuild Docker: `docker-compose build && docker-compose up`
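
To confirm which versions actually ended up in the environment, a small check like the one below may help. It is a sketch that only reads installed package metadata, so it runs even when a package is missing; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "faiss-cpu")) -> dict:
    """Return {package: version string}, or None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```

Inside Docker, the same snippet can be run in the container to check the image rather than the host environment.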

**Q: Port 5000 is already in use!**
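- A: Docker publishes the app on host port **5001** (mapped to container port 5000), so visit [http://localhost:5001](http://localhost:5001). For a local run, start the app on a different port, for example `flask run --port 5002`.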

**Q: How do I change the upload size or allowed file types?**
- A: Edit `MAX_CONTENT_LENGTH` and `ALLOWED_EXTENSIONS` in `app.py`.

---

## 🤝 Contributing
1. Fork the repository
2. Create a new branch for your feature
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

---

## 📄 License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## 📬 Contact & Support
For questions, issues, or feature requests, please open an issue on GitHub or contact the maintainer at [[email protected]].
Binary file not shown.
102 changes: 102 additions & 0 deletions flask-langchain-app/app.py
@@ -0,0 +1,102 @@
import os
from flask import Flask, render_template, request, jsonify
from werkzeug.utils import secure_filename
import fitz # PyMuPDF
from docx import Document
import magic
from datetime import datetime
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms.fake import FakeListLLM

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'static/uploads'
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024 # 16MB max file size
app.config['ALLOWED_EXTENSIONS'] = {'pdf', 'docx', 'txt'}

# In-memory document store
documents = []

# Helper functions

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']

def extract_text_from_pdf(file_path):
    text = ""
    with fitz.open(file_path) as doc:
        for page in doc:
            text += page.get_text()
    return text

def extract_text_from_docx(file_path):
    doc = Document(file_path)
    text = ""
    for paragraph in doc.paragraphs:
        text += paragraph.text + "\n"
    return text

def extract_text_from_txt(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S_')
        filename = timestamp + filename
        file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(file_path)
        file_type = magic.from_file(file_path, mime=True)
        if file_type == 'application/pdf':
            text = extract_text_from_pdf(file_path)
        elif file_type == 'application/vnd.openxmlformats-officedocument.wordprocessingml.document':
            text = extract_text_from_docx(file_path)
        elif file_type == 'text/plain':
            text = extract_text_from_txt(file_path)
        else:
            return jsonify({'error': 'Unsupported file type'}), 400
        documents.append({'filename': filename, 'text': text})
        return jsonify({'message': 'File uploaded and processed successfully', 'filename': filename})
    return jsonify({'error': 'Invalid file type'}), 400

@app.route('/query', methods=['POST'])
def query():
    # silent=True returns None for a missing or malformed JSON body instead of raising
    data = request.get_json(silent=True) or {}
    query_text = data.get('query')
    if not query_text:
        return jsonify({'error': 'No query provided'}), 400
    if not documents:
        return jsonify({'error': 'No documents uploaded yet.'}), 400
    # Combine all docs for demo; in production, use per-doc QA
    all_text = '\n'.join([doc['text'] for doc in documents])
    splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    texts = splitter.split_text(all_text)
    # Use fake embeddings and LLM for demo; replace with real ones for production
    embeddings = FakeEmbeddings(size=32)
    # The index is rebuilt from scratch on every query (acceptable for a demo only)
    vectordb = FAISS.from_texts(texts, embeddings)
    retriever = vectordb.as_retriever()
    llm = FakeListLLM(responses=[f"Pretend answer for: {query_text}"])
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    answer = qa.run(query_text)
    return jsonify({'results': [answer]})

@app.route('/documents', methods=['GET'])
def list_documents():
    return jsonify({'documents': [{'source': doc['filename']} for doc in documents]})

if __name__ == '__main__':
    os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
    app.run(debug=True)
11 changes: 11 additions & 0 deletions flask-langchain-app/docker-compose.yml
@@ -0,0 +1,11 @@
version: '3.8'
services:
  flask-app:
    build: .
    ports:
      - "5001:5000"
    volumes:
      - ./static/uploads:/app/static/uploads
    environment:
      - FLASK_ENV=production
    restart: unless-stopped
13 changes: 13 additions & 0 deletions flask-langchain-app/requirements.txt
@@ -0,0 +1,13 @@
flask==3.0.2
langchain==0.1.12
langchain-community
numpy==1.26.4
faiss-cpu
python-dotenv==1.0.1
python-docx==1.1.0
pymupdf==1.23.26
Werkzeug==3.0.1
gunicorn==21.2.0
python-magic==0.4.27
flask-wtf==1.2.1
python-magic-bin==0.4.14; sys_platform == 'win32'
24 changes: 24 additions & 0 deletions flask-langchain-app/run.sh
@@ -0,0 +1,24 @@
#!/bin/bash

# Navigate to the project directory
cd "$(dirname "$0")"

# Create virtual environment if it doesn't exist
if [ ! -d "venv" ]; then
  python3 -m venv venv
fi

# Activate virtual environment
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Create uploads directory if it doesn't exist
mkdir -p static/uploads

# Run the Flask app in debug mode
export FLASK_APP=app.py
export FLASK_DEBUG=1  # FLASK_ENV was removed in Flask 2.3 and has no effect with flask==3.0.2
flask run