
Courier support agent with LLM integration

Project description

The iDelivery Courier Support Platform is a server-side application that hosts a Conversational AI Agent for real-time engagement with food delivery personnel. The system uses a Retrieval-Augmented Generation (RAG) architecture, leveraging an OpenAI-powered LLM for natural dialogue, grounded by domain-specific knowledge: courier profile data stored in a NoSQL database and FAQ content stored in a vector database. The application performs real-time conversation evaluation and offers a dashboard for monitoring.

The input data used in this project was generated with AI. The food delivery company is called iDelivery. Courier profiles, courier-relevant FAQs and courier contracts are all fictional. You can find the generated dataset in the dataset folder and the prompts in the dataset/prompts folder.

Application flow

The courier maintains a conversation by sending questions to a server and receiving relevant answers.

The question is first used to search the FAQ vector DB for questions and answers with high similarity. The courier's profile is also loaded. An AI prompt is then generated from the following: the question, non-sensitive data from the courier profile, and the similar FAQ entries as company-related context. The prompt is sent to the LLM, which acts as a virtual agent and returns a relevant answer. The courier question and the LLM answer are then added to another prompt and sent to the LLM again to evaluate the accuracy of the answer. This step is important for identifying which conversations are irrelevant and which kinds of questions don't have enough context in the FAQ DB. After all this, the system stores metadata about the conversation for monitoring and sends the answer to the courier. The courier also has the option to send feedback for the conversation based on a conversation ID. This feedback is persisted and is very useful for improving the quality of the FAQ entries and the system overall.
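
A minimal sketch of this flow, assuming Qdrant for FAQ retrieval and the OpenAI chat API; the collection name, payload fields, prompt wording and models are illustrative assumptions, not the exact code of this project:

# Hedged sketch of the question flow: retrieve similar FAQ entries, build a
# grounded prompt, generate an answer, then let the LLM judge the answer.
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                              # reads OPENAI_API_KEY from env
qdrant = QdrantClient(url="http://localhost:6333")

def answer_question(question: str, question_vector: list[float], courier_profile: dict) -> dict:
    # 1. Retrieve similar FAQ entries from the vector DB.
    hits = qdrant.search(collection_name="faq", query_vector=question_vector,
                         limit=5, score_threshold=0.7)
    faq_context = "\n".join(f"{h.payload['question']} - {h.payload['answer']}" for h in hits)

    # 2. Build the prompt from the question, non-sensitive profile data and FAQ context.
    prompt = (f"You are an iDelivery courier support agent.\n"
              f"Courier profile: {courier_profile}\n"
              f"Company FAQ context:\n{faq_context}\n"
              f"Question: {question}")
    answer = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    # 3. Send question + answer back to the LLM to rate the answer's relevance.
    relevance = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"Question: {question}\nAnswer: {answer}\n"
                   "Classify the answer as RELEVANT, PARTLY_RELEVANT or NON_RELEVANT."}],
    ).choices[0].message.content

    return {"answer": answer, "relevance": relevance}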

Architecture design


Conversation example

The conversation always stays focused on delivery topics thanks to the context the system provides. Even with general off-topic questions, the conversation remains grounded in the delivery context. The virtual agent remembers the past conversation context between questions, which makes conversations more human. The fact that the virtual agent also knows some non-sensitive information about the courier makes the conversation more engaging.


Technologies

  • Python 3.10
  • Qdrant vector DB to store FAQ data
  • TinyDB NoSQL DB to store courier profile data
  • OpenAI as LLM for RAG integration
  • OpenAI as LLM evaluator
  • Flask as API app server
  • Grafana + PostgreSQL for real-time monitoring

Running the Jupyter notebooks locally

  • use Python v3.10.x
  • pip install pipenv
  • pipenv shell
  • pipenv install to install all dependencies
  • Start Qdrant: docker run --rm -p 6333:6333 -p 6334:6334 -v "$(pwd)/tmp_datastore/tmp_qdrant_storage:/qdrant/storage:z" qdrant/qdrant
  • To access Qdrant UI open in browser: http://localhost:6333/dashboard#/collections
  • python -m ipykernel install --user --name=my_openai_env --display-name="OpenAI Project"
  • run jupyter notebook
  • in Jupyter notebook select Python kernel "OpenAI Project"
  • copy notebooks/keys_secret.py.tmp to notebooks/keys_secret.py and add your OpenAI API key to keys_secret.py
  • run notebooks in order: main.ipynb, evaluation-generating_ground_truth.ipynb, evaluation-retrieval.ipynb, evaluation-RAG.ipynb

Evaluation Vector DB retrieval

  • present in evaluation-retrieval.ipynb
  • initial evaluation results using default query parameters: {'hit_rate': 0.84, 'mrr': 0.71}
  • after evaluating multiple query parameter combinations, results improved to: {'hit_rate': 0.94, 'mrr': 0.862} using params: {'score_threshold': 0.7, 'limit': 5} (a sketch of both metrics follows this list)
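
For reference, a minimal sketch of how hit rate and MRR are typically computed over the ground-truth queries; the helper names are assumptions, not the notebook's exact code:

# relevance: for each ground-truth question, a boolean per returned record
# marking whether it is the expected FAQ entry.
def hit_rate(relevance: list[list[bool]]) -> float:
    # share of queries where the expected record appears anywhere in the results
    return sum(any(row) for row in relevance) / len(relevance)

def mrr(relevance: list[list[bool]]) -> float:
    # mean reciprocal rank: 1 / rank of the first correct result, 0 if it is missing
    total = 0.0
    for row in relevance:
        for rank, hit in enumerate(row, start=1):
            if hit:
                total += 1 / rank
                break
    return total / len(relevance)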

Evaluation RAG

Evaluation was done by sending the question, the LLM answer and the correct answer to the LLM, using OpenAI gpt-3.5-turbo as the judge.

I evaluated generating the LLM answers separately with gpt-3.5-turbo, gpt-4o-mini and gpt-4o. The LLM answer was generated from the provided prompt template, with the FAQ answers as context plus the courier profile. A sketch of the judging step is shown after the results below.

  • evaluation code is present in evaluation-RAG.ipynb
  • evaluation results based on 100 records:
  • with gpt-3.5-turbo:
PARTLY_RELEVANT    52
RELEVANT           37
NON_RELEVANT       11
  • with gpt-4o-mini:
RELEVANT           55
PARTLY_RELEVANT    35
NON_RELEVANT       10
  • with gpt-4o:
RELEVANT           51
PARTLY_RELEVANT    44
NON_RELEVANT        5
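
A hedged sketch of the judging step described above; the prompt wording and function name are assumptions, not the exact notebook code:

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an expert evaluator.
Question: {question}
Generated answer: {llm_answer}
Correct answer: {correct_answer}
Classify the generated answer as RELEVANT, PARTLY_RELEVANT or NON_RELEVANT.
Reply with the label only."""

def judge(question: str, llm_answer: str, correct_answer: str) -> str:
    # gpt-3.5-turbo is used as the judge, as in the evaluation above
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, llm_answer=llm_answer, correct_answer=correct_answer)}],
    )
    return response.choices[0].message.content.strip()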

Running entire app using Docker

  • docker-compose up --build to start application with all dependencies and setup the DBs and monitoring
  • run the curl commands from below to interact with the server
  • open monitoring in browser: http://localhost:3000/d/automatedsetupdashboard/courier-support-agent (credentials are admin:admin)
  • open Qdrant UI in browser: http://localhost:6333/dashboard#/collections
  • to start the app with dependencies again without resetting the DB and monitoring configs, edit the Dockerfile flags "setup_dbs=true" and "setup_grafana=true". Setting these to false after the first initialisation will allow you to preserve data between restarts.
  • to reset the DBs, first stop the containers with podman-compose down, then run rm -rf tmp_datastore/tmp_qdrant_storage && rm -rf tmp_datastore/postgres_data && rm -rf tmp_datastore/grafana_storage

Request:

$ curl --request POST 'http://127.0.0.1:9696/question' \
--header 'Content-Type: application/json' \
-d '{"question": "Can I keep the provided bike?", "courier_id": 0}'

Question response:

{
        "question":"Can I keep the provided bike?",
        "answer":"No, you cannot keep the provided bike. The bike is for your use during deliveries and must be returned when you no longer work with iDelivery.",
        "conversation_id":"034394fdbc0e4f4593acb12defc5a2f0",
        "model_used":"gpt-4o-mini"
}

Feedback request:

$ curl --request POST 'http://127.0.0.1:9696/feedback' \
--header 'Content-Type: application/json' \
-d '{"conversation_id": "034394fdbc0e4f4593acb12defc5a2f0", "feedback": 1}'     

Feedback response:

{
  "message": "Feedback received"
}
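
For orientation, a minimal Flask sketch of the two endpoints exercised by the curl commands above; the handler bodies are placeholders, not the project's actual implementation:

import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/question", methods=["POST"])
def question():
    body = request.get_json()
    # ...retrieve FAQ context, call the LLM, evaluate and store the conversation...
    answer = "..."  # produced by the RAG pipeline
    return jsonify({
        "question": body["question"],
        "answer": answer,
        "conversation_id": uuid.uuid4().hex,
        "model_used": "gpt-4o-mini",
    })

@app.route("/feedback", methods=["POST"])
def feedback():
    body = request.get_json()
    # ...persist body["conversation_id"] and body["feedback"]...
    return jsonify({"message": "Feedback received"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9696)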

Running the Python APP locally

Run commands from root folder:

  • use Python v3.10.x
  • pip install pipenv==2025.0.4
  • pipenv shell
  • pipenv install to install all dependencies
  • optionally run pipenv requirements > requirements.txt to regenerate the requirements.txt which is used in the dockerised version of this app
  • Start docker dependencies: podman-compose up --build qdrant postgres grafana
  • Qdrant UI: http://localhost:6333/dashboard#/collections
  • copy and rename app/keys_secret.py.tmp to app/keys_secret.py
  • fill in app/keys_secret.py with your own OpenAI secret key
  • run export TOKENIZERS_PARALLELISM=false to disable a noisy warning
  • run python app/setup_dbs.py to ingest FAQ and courier profile data into the DBs (a hedged sketch of this step follows this list)
  • run python grafana/init_grafana.py to set up the Grafana dashboard with the Postgres data source
  • start the API server:
    • gunicorn --bind 0.0.0.0:9696 --chdir=app server:app
  • run above curl commands to interact with entire system
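
A hedged sketch of what the DB setup step does, assuming a sentence-transformers embedding model; the model name, collection name and dataset file names are assumptions, not necessarily what app/setup_dbs.py uses:

import json
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer
from tinydb import TinyDB

model = SentenceTransformer("all-MiniLM-L6-v2")       # 384-dimensional embeddings (assumed model)
qdrant = QdrantClient(url="http://localhost:6333")
qdrant.recreate_collection(collection_name="faq",
                           vectors_config=VectorParams(size=384, distance=Distance.COSINE))

# ingest FAQ entries: embed the question text and keep question/answer as payload
faqs = json.load(open("dataset/faq.json"))            # hypothetical dataset file
points = [PointStruct(id=i, vector=model.encode(f["question"]).tolist(), payload=f)
          for i, f in enumerate(faqs)]
qdrant.upsert(collection_name="faq", points=points)

# ingest courier profiles into TinyDB
profiles = json.load(open("dataset/courier_profiles.json"))   # hypothetical dataset file
TinyDB("tmp_datastore/courier_profiles.json").insert_multiple(profiles)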

Monitoring

The application saves conversation data in PostgreSQL. Grafana is used to monitor the application in real time. Monitoring tracks the following (a minimal persistence sketch follows the list):

  • LLM evaluation relevance
  • courier conversation feedback
  • OpenAI tokens
  • OpenAI costs
  • API response time (including LLM answer generation and LLM evaluation)
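
A minimal sketch of persisting conversation metadata to Postgres for the dashboard; the table, columns and database name are assumptions, not the project's actual schema:

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="monitoring",   # assumed database name
                        user="user", password="user")

def save_conversation(conversation_id: str, relevance: str, tokens: int,
                      cost_usd: float, response_time_s: float) -> None:
    # one row per answered question; the Grafana panels query this table
    with conn, conn.cursor() as cur:
        cur.execute(
            """INSERT INTO conversations
               (conversation_id, relevance, tokens, cost_usd, response_time_s)
               VALUES (%s, %s, %s, %s, %s)""",
            (conversation_id, relevance, tokens, cost_usd, response_time_s),
        )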


Manually set up Grafana for the running application:

It is recommended to follow the sections above that start the entire application with all dependencies. If you still need to do a manual Grafana setup, follow these steps (an API-based sketch follows the list):

  • open Grafana UI: http://localhost:3000
  • set up a new data source for Postgres with host postgres, credentials user / user, and TLS disabled
  • take the ID of the new data source (see the ID in the URL) and replace all occurrences of ef04twmg20feoa in grafana/dashboard.json with the new ID
  • create a new dashboard in the Grafana UI by importing the updated grafana/dashboard.json
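
A hedged sketch of the same setup done programmatically via Grafana's HTTP API (roughly what grafana/init_grafana.py automates); the database name is an assumption:

import json
import requests

GRAFANA = "http://localhost:3000"
AUTH = ("admin", "admin")

# create the Postgres data source
ds = requests.post(f"{GRAFANA}/api/datasources", auth=AUTH, json={
    "name": "PostgreSQL",
    "type": "postgres",
    "access": "proxy",
    "url": "postgres:5432",
    "user": "user",
    "database": "monitoring",                 # assumed database name
    "secureJsonData": {"password": "user"},
    "jsonData": {"sslmode": "disable"},
}).json()

# point the dashboard at the new data source UID and import it
raw = json.dumps(json.load(open("grafana/dashboard.json")))
raw = raw.replace("ef04twmg20feoa", ds["datasource"]["uid"])
requests.post(f"{GRAFANA}/api/dashboards/db", auth=AUTH,
              json={"dashboard": json.loads(raw), "overwrite": True})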


TODO:

  • generate random delivery courier profiles using AI
  • persist courier profiles to a NoSQL DB (TinyDB)
  • generate random FAQ courier questions using AI
  • persist FAQ questions and answers to a Vector DB (Qdrant)
  • generate courier working contracts (employee and freelance) using AI, saved to a file
  • generate complete prompt with non private courier profile information, courier question and best matching FAQ data
  • use LLM to get an answer
  • generate ground truth data for evaluation
  • implement evaluation retrieval
  • hyperparameter tuning for evaluation retrieval
  • implement evaluation RAG
  • evaluation of different LLMs for RAG
  • put all code behind an API
  • add LLM realtime evaluation
  • add Grafana realtime monitoring
  • dockerise application
  • add better logging

Optional:

  • use LLM to ask for Contract data when needed
  • use LLM to ask for more profile information when needed by querying the NoSQL DB
  • use LLM to ask for more FAQ data when needed by querying the Vector DB
  • use LLM to update Courier profile
  • use LLM to add new questions and answers to the FAQ DB
  • add chat history on demand to improve prompt accuracy
