The iDelivery Courier Support Platform is a server-side application that hosts a conversational AI agent for real-time engagement with food delivery couriers. The system uses a Retrieval-Augmented Generation (RAG) architecture: an OpenAI-powered LLM generates natural dialogue, grounded in domain-specific knowledge from courier profiles stored in a NoSQL database and FAQ content stored in a vector database. The application also evaluates conversations in real time and offers a dashboard for monitoring.
The input data used in this project was generated with AI. The food delivery company, iDelivery, is fictional, as are the courier profiles, courier-relevant FAQs, and courier contracts. You can find the generated dataset in the `dataset` folder and the prompts in the `dataset/prompts` folder.
The courier maintains a conversation by sending questions to a server and receiving relevant answers.
The question is first used to search the vector FAQ DB for questions and answers with high similarity, and the courier's profile is loaded. An AI prompt is then assembled from the question, non-sensitive data from the courier profile, and the similar FAQ entries as company-related context. The prompt is sent to the LLM, which acts as a virtual agent and returns a relevant answer. The courier's question and the LLM's answer are then combined into a second prompt and sent to the LLM again to evaluate the accuracy of the answer. This step is important for identifying which conversations are irrelevant and which kinds of questions lack sufficient context in the FAQ DB. Finally, the system stores metadata about the conversation for monitoring and sends the answer to the courier. The courier also has the option to send feedback on the conversation, referenced by conversation ID. This feedback is persisted and is very useful for improving the quality of the FAQ entries and the system overall.
The conversation always stays focused on delivery topics thanks to the context the system provides; even with general off-topic questions, the conversation remains anchored in the delivery domain. The virtual agent remembers the conversation context between questions, which makes conversations more human. Knowing some non-sensitive personal information about the courier also makes the conversation more engaging.
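The request flow described above can be sketched roughly as follows. This is a minimal sketch with stubbed dependencies; all names here (`search_faq`, `load_profile`, `ask_llm`, etc.) are illustrative stand-ins, not the project's actual API:

```python
import uuid

def search_faq(question, limit=5):
    # stand-in for a Qdrant similarity search over the FAQ collection
    return [{"question": "Can I keep the bike?", "answer": "No, it must be returned."}]

def load_profile(courier_id):
    # stand-in for a TinyDB lookup; sensitive fields already removed
    return {"name": "Alex", "vehicle": "bike"}

def ask_llm(prompt):
    # stand-in for an OpenAI chat completion call
    return "The bike must be returned when you stop working with iDelivery."

def handle_question(question, courier_id):
    faq_matches = search_faq(question)
    profile = load_profile(courier_id)
    prompt = f"Profile: {profile}\nFAQ context: {faq_matches}\nQuestion: {question}"
    answer = ask_llm(prompt)
    # second LLM call: evaluate the generated answer; the verdict is
    # stored with the conversation metadata for monitoring
    relevance = ask_llm(f"Rate the answer to: {question}\nAnswer: {answer}")
    conversation_id = uuid.uuid4().hex
    return {"question": question, "answer": answer,
            "conversation_id": conversation_id}
```

The courier's follow-up feedback is then attached to the returned `conversation_id`.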
- Python 3.10
- Qdrant vector DB to store FAQ data
- TinyDB NoSQL DB to store courier profile data
- OpenAI as the LLM for the RAG integration
- OpenAI as the LLM evaluator
- Flask as the API app server
- Grafana + PostgreSQL for real-time monitoring
- use Python v3.10.x
- run `pip install pipenv`, then `pipenv shell` and `pipenv install` to install all dependencies
- start Qdrant: `docker run --rm -p 6333:6333 -p 6334:6334 -v "$(pwd)/tmp_datastore/tmp_qdrant_storage:/qdrant/storage:z" qdrant/qdrant`
- to access the Qdrant UI, open http://localhost:6333/dashboard#/collections in your browser
- register a Jupyter kernel: `python -m ipykernel install --user --name=my_openai_env --display-name="OpenAI Project"`
- run `jupyter notebook`
- in the Jupyter notebook, select the Python kernel "OpenAI Project"
- copy `notebooks/keys_secret.py.tmp` to `notebooks/keys_secret.py` and add your OpenAI API key to `keys_secret.py`
- run the notebooks in order:
  - `main.ipynb`
  - `evaluation-generating_ground_truth.ipynb`
  - `evaluation-retrieval.ipynb`
  - `evaluation-RAG.ipynb`
- present in `evaluation-retrieval.ipynb`
- initial evaluation results using default query parameters: `{'hit_rate': 0.84, 'mrr': 0.71}`
- after evaluating multiple query parameter combinations, results improved to `{'hit_rate': 0.94, 'mrr': 0.862}` using params `{'score_threshold': 0.7, 'limit': 5}`
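For reference, the two metrics can be computed from per-query relevance flags. A minimal sketch, independent of the notebook code:

```python
def hit_rate(relevance):
    # fraction of queries where the ground-truth document
    # appears anywhere in the retrieved results
    return sum(any(r) for r in relevance) / len(relevance)

def mrr(relevance):
    # mean reciprocal rank: 1/rank of the first correct hit, 0 if none
    total = 0.0
    for r in relevance:
        for rank, hit in enumerate(r, start=1):
            if hit:
                total += 1.0 / rank
                break
    return total / len(relevance)

# three example queries: hit at rank 1, hit at rank 2, no hit
relevance = [[True, False], [False, True], [False, False]]
print(hit_rate(relevance))  # 2 of 3 queries had a hit -> 0.666...
print(mrr(relevance))       # (1 + 0.5 + 0) / 3 = 0.5
```

Raising `score_threshold` trades recall for precision, which is why the two metrics were tuned together.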
Evaluation was done by sending the question, the LLM answer, and the correct answer to the LLM, using `gpt-3.5-turbo` as the judge. I evaluated generating the LLM answers separately with `gpt-3.5-turbo`, `gpt-4o-mini`, and `gpt-4o`. The LLM answer was generated from the provided prompt template, with FAQ answers as context plus the courier profile.
- evaluation code is in `evaluation-RAG.ipynb`
- evaluation results based on 100 records:
  - with `gpt-3.5-turbo`: PARTLY_RELEVANT 52, RELEVANT 37, NON_RELEVANT 11
  - with `gpt-4o-mini`: RELEVANT 55, PARTLY_RELEVANT 35, NON_RELEVANT 10
  - with `gpt-4o`: RELEVANT 51, PARTLY_RELEVANT 44, NON_RELEVANT 5
- run `docker-compose up --build` to start the application with all dependencies and to set up the DBs and monitoring
- run the `curl` commands below to interact with the server
- open monitoring in browser: http://localhost:3000/d/automatedsetupdashboard/courier-support-agent (credentials are admin:admin)
- open the Qdrant UI in browser: http://localhost:6333/dashboard#/collections
- to start the app with dependencies again without resetting the DB and monitoring configs, edit the `setup_dbs=true` and `setup_grafana=true` flags in the Dockerfile. Setting these to false after the first initialisation will allow you to preserve data between restarts.
- to reset the DBs, first stop the containers with `podman-compose down`, then run `rm -rf tmp_datastore/tmp_qdrant_storage && rm -rf tmp_datastore/postgres_data && rm -rf tmp_datastore/grafana_storage`
Request:
$ curl --request POST 'http://127.0.0.1:9696/question' \
--header 'Content-Type: application/json' \
-d '{"question": "Can I keep the provided bike?", "courier_id": 0}'
Question response:
{
"question":"Can I keep the provided bike?",
"answer":"No, you cannot keep the provided bike. The bike is for your use during deliveries and must be returned when you no longer work with iDelivery.",
"conversation_id":"034394fdbc0e4f4593acb12defc5a2f0",
"model_used":"gpt-4o-mini"
}
Feedback request:
$ curl --request POST 'http://127.0.0.1:9696/feedback' \
--header 'Content-Type: application/json' \
-d '{"conversation_id": "034394fdbc0e4f4593acb12defc5a2f0", "feedback": 1}'
Feedback response:
{
"message": "Feedback received"
}
Run commands from the root folder:
- use Python v3.10.x
- run `pip install pipenv==2025.0.4`, then `pipenv shell` and `pipenv install` to install all dependencies
- optionally run `pipenv requirements > requirements.txt` to regenerate the requirements.txt used in the dockerised version of this app
- start the Docker dependencies: `podman-compose up --build qdrant postgres grafana`
- Qdrant UI: http://localhost:6333/dashboard#/collections
- copy `app/keys_secret.py.tmp` to `app/keys_secret.py`
- fill in `app/keys_secret.py` with your own OpenAI secret key
- run `export TOKENIZERS_PARALLELISM=false` to silence a noisy tokenizers warning
- run `python app/setup_dbs.py` to ingest the FAQ and courier profile data into the DBs
- run `python grafana/init_grafana.py` to set up the Grafana dashboard with the Postgres datasource
- start the API server: `gunicorn --bind 0.0.0.0:9696 --chdir=app server:app`
- run the above `curl` commands to interact with the entire system
The application saves conversation data in PostgreSQL. Grafana is used to monitor the application in real time. Monitoring tracks:
- LLM evaluation relevance
- courier conversation feedback
- OpenAI tokens
- OpenAI costs
- API response time (including LLM answer generation and LLM evaluation)
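The token-cost metric is simple arithmetic over the usage counts OpenAI returns with each response. A sketch, with assumed per-million-token prices (placeholders; check current OpenAI pricing, these are not the project's values):

```python
# Assumed USD prices per 1M tokens; placeholders, not the project's values.
PRICES = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def openai_cost(model, prompt_tokens, completion_tokens):
    # cost = input tokens * input price + output tokens * output price,
    # both scaled from per-1M-token prices
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# e.g. a response with 1200 prompt tokens and 300 completion tokens
print(openai_cost("gpt-4o-mini", 1200, 300))  # 0.00036
```

Storing the per-conversation token counts in PostgreSQL lets Grafana aggregate cost over time.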
It is recommended to follow the sections above that start the entire application with its dependencies. If you still need to do a manual Grafana setup, follow these steps:
- open the Grafana UI: http://localhost:3000
- set up a new Postgres datasource with host `postgres`, credentials `user`/`user`, and TLS disabled
- take the ID of the new datasource (see the ID in the URL) and replace every occurrence of `ef04twmg20feoa` in `grafana/dashboard.json` with the new ID
- create a new dashboard in the Grafana UI by importing the updated `grafana/dashboard.json`
- Grafana API docs: https://grafana.com/docs/grafana/latest/developers/http_api/data_source/
- Datatalks zoomcamp: https://github.com/DataTalksClub/llm-zoomcamp/tree/main
- The other Agentic RAG Demo: https://github.com/razorcd/llm-training/tree/main/agent
- generate random delivery courier profiles using AI
- persist courier profiles to a NoSQL DB (TinyDB)
- generate random courier FAQ questions using AI
- persist FAQ questions and answers to a vector DB (Qdrant)
- generate courier working contracts (employee and freelance) using AI, saved to a file
- generate complete prompt with non private courier profile information, courier question and best matching FAQ data
- use LLM to get an answer
- generate ground truth data for evaluation
- implement evaluation retrieval
- hyperparameter tuning for evaluation retrieval
- implement evaluation RAG
- evaluation of different LLMs for RAG
- put all code behind an API
- add LLM realtime evaluation
- add Grafana realtime monitoring
- dockerise application
- add better logging
Optional:
- use LLM to ask for Contract data when needed
- use LLM to ask for more profile information when needed by querying the NoSQL DB
- use LLM to ask for more FAQ data when needed by querying the vector DB
- use LLM to update Courier profile
- use LLM to add new questions and answers to the FAQ DB
- add chat history on demand to improve prompt accuracy