Open
22 commits
0c24ab6
Add hybrid RAG pipeline with Elasticsearch and S3 sources
vannestn Sep 10, 2025
7d74975
Add verification logic to hybrid RAG pipeline
vannestn Sep 10, 2025
94e70a5
feat(pipeline): clean up script, fix SKIPPED, add NER (prompter opena…
vannestn Sep 10, 2025
611709f
chore(notebook): add MD handles to base script; add markdown_blocks.y…
vannestn Sep 10, 2025
412f0d6
Major improvements to hybrid RAG pipeline
vannestn Sep 11, 2025
8162e6c
Enhanced parallel workflow architecture documentation
vannestn Sep 11, 2025
2e06ef4
Add result verification and improve documentation
vannestn Sep 11, 2025
dfd4073
working pipeline except for validation after runs)
vannestn Sep 11, 2025
2d056a6
ran notebook to generate outputs
vannestn Sep 11, 2025
7eb0fdc
working notebook, but needs data preprocessing added
vannestn Sep 12, 2025
170a477
added notebook print statements from running
vannestn Sep 12, 2025
3b220a6
added source data zip files and creation
vannestn Sep 12, 2025
5224cee
added consolidated sales-records-consolidated zip
vannestn Sep 12, 2025
1ed8829
updated zip file naming conventions
vannestn Sep 12, 2025
a41bbbf
fixed github link issue
vannestn Sep 12, 2025
e79190b
added example notebook outputs
vannestn Sep 12, 2025
b40c34c
created modularized system for jupyternotebook compilation
vannestn Sep 12, 2025
c8f0dc5
made the notebook less verbose
vannestn Sep 12, 2025
72a564c
added rag example and images
vannestn Sep 12, 2025
4bc3e5d
updated language in the notebook
vannestn Sep 15, 2025
a4a6f2e
fixed rag demo dependency issue
vannestn Sep 15, 2025
f6e8c67
fixed env variable template to be more consistent and define rag vari…
vannestn Sep 15, 2025
50 changes: 45 additions & 5 deletions .env.example
@@ -1,6 +1,46 @@
-# Elastic Cloud Configuration
-# Copy this file to .env and fill in your actual credentials
+# Hybrid RAG Pipeline Environment Configuration
+# Copy this file to .env and fill in your actual values
 
-# Authentication Option 2: API Key (recommended for production)
-# Use either username/password OR api_key, not both
-ELASTIC_API_KEY=elastic-search-api-key-for-elasticsearch-index
+# ===================================================================
+# AWS CONFIGURATION
+# ===================================================================
+# AWS credentials for S3 access (both source and destination)
+AWS_ACCESS_KEY_ID=your-aws-access-key-id
+AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
+AWS_REGION=us-east-1
+
+# ===================================================================
+# UNSTRUCTURED API CONFIGURATION
+# ===================================================================
+# Get your API key from: https://unstructured.io
+UNSTRUCTURED_API_KEY=your-unstructured-api-key
+UNSTRUCTURED_API_URL=https://platform.unstructuredapp.io/api/v1
+
+# ===================================================================
+# ELASTICSEARCH CONFIGURATION
+# ===================================================================
+# Elasticsearch Cloud host URL (without https://)
+# Example: my-cluster-abc123.es.us-east-1.aws.found.io:9243
+ELASTICSEARCH_HOST=your-elasticsearch-host-url
+
+# Elasticsearch API key (base64 encoded)
+# Generate this in Kibana: Stack Management > API Keys
+ELASTICSEARCH_API_KEY=your-elasticsearch-api-key
+
+# ===================================================================
+# PIPELINE DATA SOURCES
+# ===================================================================
+# S3 bucket containing Bose product PDFs (manuals, troubleshooting, MSDS)
+S3_SOURCE_BUCKET=example-data-bose-headphones
+
+# Elasticsearch index containing synthetic sales data
+ELASTICSEARCH_INDEX=sales-records
+
+# ===================================================================
+# OPTIONAL: ADVANCED CONFIGURATION
+# ===================================================================
+# AWS Session Token (only needed for temporary credentials)
+# AWS_SESSION_TOKEN=your-session-token
+
+# Custom S3 endpoint (only needed for S3-compatible services)
+# S3_ENDPOINT_URL=https://s3.amazonaws.com
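This change only adds the template; the code that reads `.env` is not part of this diff. As a hedged illustration of how the new variables might be checked at startup, here is a minimal stdlib-only sketch. The variable names come from `.env.example`; the helpers `parse_env_lines`, `missing_vars`, and `elasticsearch_url`, the choice of which variables count as required, and the rule that a value starting with `your-` is an unfilled placeholder are all hypothetical assumptions, not the pipeline's actual loader.

```python
from typing import Dict, Iterable, List, Mapping

# Names taken from .env.example; treating all of these as required is an assumption.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
    "ELASTICSEARCH_HOST",
    "ELASTICSEARCH_API_KEY",
    "S3_SOURCE_BUCKET",
    "ELASTICSEARCH_INDEX",
)


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical minimal KEY=VALUE parser: skips blank lines and '#' comments.

    Splits on the first '=' only, so base64 values ending in '=' stay intact.
    Quotes and inline comments are NOT handled.
    """
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(env: Mapping[str, str],
                 required: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """Return required names that are absent, empty, or still a 'your-...' placeholder."""
    return [name for name in required
            if not env.get(name) or env[name].startswith("your-")]


def elasticsearch_url(host: str) -> str:
    """ELASTICSEARCH_HOST is documented as 'without https://', so prepend the scheme."""
    host = host.strip().rstrip("/")
    if host.startswith(("http://", "https://")):
        return host
    return f"https://{host}"


if __name__ == "__main__":
    # Sample text mirroring the template's Elasticsearch section.
    sample = """
# ELASTICSEARCH CONFIGURATION
ELASTICSEARCH_HOST=my-cluster-abc123.es.us-east-1.aws.found.io:9243
ELASTICSEARCH_API_KEY=your-elasticsearch-api-key
ELASTICSEARCH_INDEX=sales-records
"""
    env = parse_env_lines(sample.splitlines())
    print(elasticsearch_url(env["ELASTICSEARCH_HOST"]))
    print("Unset or placeholder values:", missing_vars(env))
```

In a real setup, python-dotenv's `load_dotenv()` would replace the hand-rolled parser and cover quoting and other syntax this sketch ignores; the part worth keeping is failing fast on placeholder values before any S3 or Elasticsearch call is made.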
33 changes: 32 additions & 1 deletion .gitignore
@@ -1,2 +1,33 @@
 .env
-venv/
+venv/
+.env.backup
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.DS_Store
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+output-downloads-*/
+
+hybrid_rag_pipeline.py.bak
+hybrid_rag_pipeline_enriched.ipynb.bak
+
+feedback_r1.md
+feedback_r1_cleaned.md
+feedback_r2.md
+feedback_r2_cleaned.md