
Courier support agent with LLM integration

Project description

The iDelivery Courier Support Platform is a server-side application that hosts a Conversational AI Agent for real-time engagement with food delivery personnel. The system uses a Retrieval-Augmented Generation (RAG) architecture, leveraging an OpenAI-powered LLM for natural dialogue, grounded by domain-specific knowledge: courier profile data stored in a NoSQL database and FAQ content stored in a vector database. The application performs real-time conversation evaluation and offers a dashboard for monitoring.

The input data used in this project was generated with AI. The food delivery company is called iDelivery. Courier profiles, courier-relevant FAQs and courier contracts are all fictional. You can find the generated dataset in the dataset folder and the prompts in the dataset/prompts folder.

Application flow

The courier maintains a conversation by sending questions to a server and receiving relevant answers.

The question is first used to search the FAQ vector DB for questions and answers with high similarity. The courier's profile is also loaded. An AI prompt is then generated from the following: the question, non-sensitive data from the courier profile, and the similar FAQ entries as company-related context. The prompt is sent to the LLM, which acts as a virtual agent and returns a relevant answer. The courier question and the LLM answer are then added to another prompt and sent to the LLM again to evaluate the accuracy of the answer. This step is important for identifying which conversations are irrelevant and which kinds of questions don't have enough context in the FAQ DB. After all this, the system stores metadata about the conversation for monitoring and sends the answer to the courier. The courier also has the option to send feedback for the conversation based on a conversation ID. This feedback is persisted and is very useful for improving the quality of the FAQ entries and the system overall.
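
A minimal sketch of this flow, assuming Qdrant for FAQ retrieval and the OpenAI chat API; the collection name, payload fields, prompt wording and models are illustrative assumptions, not the exact code of this project:

# Hedged sketch of the question flow: retrieve similar FAQ entries, build a
# grounded prompt, generate an answer, then let the LLM judge the answer.
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                              # reads OPENAI_API_KEY from env
qdrant = QdrantClient(url="http://localhost:6333")

def answer_question(question: str, question_vector: list[float], courier_profile: dict) -> dict:
    # 1. Retrieve similar FAQ entries from the vector DB.
    hits = qdrant.search(collection_name="faq", query_vector=question_vector,
                         limit=5, score_threshold=0.7)
    faq_context = "\n".join(f"{h.payload['question']} - {h.payload['answer']}" for h in hits)

    # 2. Build the prompt from the question, non-sensitive profile data and FAQ context.
    prompt = (f"You are an iDelivery courier support agent.\n"
              f"Courier profile: {courier_profile}\n"
              f"Company FAQ context:\n{faq_context}\n"
              f"Question: {question}")
    answer = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    # 3. Send question + answer back to the LLM to rate the answer's relevance.
    relevance = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"Question: {question}\nAnswer: {answer}\n"
                   "Classify the answer as RELEVANT, PARTLY_RELEVANT or NON_RELEVANT."}],
    ).choices[0].message.content

    return {"answer": answer, "relevance": relevance}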

Architecture design


Conversation example

The conversation always stays focused on delivery topics thanks to the context the system provides. Even with general off-topic questions, the conversation remains grounded in the delivery context. The virtual agent remembers the past conversation context between questions, which makes conversations more human. The fact that the virtual agent also knows some non-sensitive information about the courier makes the conversation more engaging.


Technologies

  • Python 3.10
  • Qdrant vector DB to store FAQ data
  • TinyDB NoSQL DB to store courier profile data
  • OpenAI as LLM for RAG integration
  • OpenAI as LLM evaluator
  • Flask as API app server
  • Grafana + PostgreSQL for real-time monitoring

Running the Jupyter notebooks locally

  • use Python v3.10.x
  • pip install pipenv
  • pipenv shell
  • pipenv install to install all dependencies
  • Start Qdrant: docker run --rm -p 6333:6333 -p 6334:6334 -v "$(pwd)/tmp_datastore/tmp_qdrant_storage:/qdrant/storage:z" qdrant/qdrant
  • To access Qdrant UI open in browser: http://localhost:6333/dashboard#/collections
  • python -m ipykernel install --user --name=my_openai_env --display-name="OpenAI Project"
  • run jupyter notebook
  • in Jupyter notebook select Python kernel "OpenAI Project"
  • copy notebooks/keys_secret.py.tmp to notebooks/keys_secret.py and add your OpenAI API key to keys_secret.py
  • run notebooks in order: main.ipynb, evaluation-generating_ground_truth.ipynb, evaluation-retrieval.ipynb, evaluation-RAG.ipynb

Evaluation Vector DB retrieval

  • present in evaluation-retrieval.ipynb
  • initial evaluation results using default query parameters: {'hit_rate': 0.84, 'mrr': 0.71}
  • after evaluating multiple query parameter combinations, results improved to: {'hit_rate': 0.94, 'mrr': 0.862} using params: {'score_threshold': 0.7, 'limit': 5} (a sketch of both metrics follows this list)
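
For reference, a minimal sketch of how hit rate and MRR are typically computed over the ground-truth queries; the helper names are assumptions, not the notebook's exact code:

# relevance: for each ground-truth question, a boolean per returned record
# marking whether it is the expected FAQ entry.
def hit_rate(relevance: list[list[bool]]) -> float:
    # share of queries where the expected record appears anywhere in the results
    return sum(any(row) for row in relevance) / len(relevance)

def mrr(relevance: list[list[bool]]) -> float:
    # mean reciprocal rank: 1 / rank of the first correct result, 0 if it is missing
    total = 0.0
    for row in relevance:
        for rank, hit in enumerate(row, start=1):
            if hit:
                total += 1 / rank
                break
    return total / len(relevance)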

Evaluation RAG

Evaluation was done by sending the question, the LLM answer and the correct answer to the LLM, using OpenAI gpt-3.5-turbo as the judge.

I evaluated generating the LLM answers separately with gpt-3.5-turbo, gpt-4o-mini and gpt-4o. The LLM answer was generated from the provided prompt template, with the FAQ answers as context plus the courier profile. A sketch of the judging step is shown after the results below.

  • evaluation code is present in evaluation-RAG.ipynb
  • evaluation results based on 100 records:
  • with gpt-3.5-turbo:
PARTLY_RELEVANT    52
RELEVANT           37
NON_RELEVANT       11
  • with gpt-4o-mini:
RELEVANT           55
PARTLY_RELEVANT    35
NON_RELEVANT       10
  • with gpt-4o:
RELEVANT           51
PARTLY_RELEVANT    44
NON_RELEVANT        5
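
A hedged sketch of the judging step described above; the prompt wording and function name are assumptions, not the exact notebook code:

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an expert evaluator.
Question: {question}
Generated answer: {llm_answer}
Correct answer: {correct_answer}
Classify the generated answer as RELEVANT, PARTLY_RELEVANT or NON_RELEVANT.
Reply with the label only."""

def judge(question: str, llm_answer: str, correct_answer: str) -> str:
    # gpt-3.5-turbo is used as the judge, as in the evaluation above
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, llm_answer=llm_answer, correct_answer=correct_answer)}],
    )
    return response.choices[0].message.content.strip()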

Running entire app using Docker

  • docker-compose up --build to start application with all dependencies and setup the DBs and monitoring
  • run the curl commands from below to interact with the server
  • open monitoring in browser: http://localhost:3000/d/automatedsetupdashboard/courier-support-agent (credentials are admin:admin)
  • open Qdrant UI in browser: http://localhost:6333/dashboard#/collections
  • to start the app with dependencies again without resetting the DB and monitoring configs, edit the Dockerfile flags "setup_dbs=true" and "setup_grafana=true". Setting these to false after the first initialisation will allow you to preserve data between restarts.
  • to reset the DBs, first stop the containers with podman-compose down, then run rm -rf tmp_datastore/tmp_qdrant_storage && rm -rf tmp_datastore/postgres_data && rm -rf tmp_datastore/grafana_storage

Request:

$ curl --request POST 'http://127.0.0.1:9696/question' \
--header 'Content-Type: application/json' \
-d '{"question": "Can I keep the provided bike?", "courier_id": 0}'

Question response:

{
        "question":"Can I keep the provided bike?",
        "answer":"No, you cannot keep the provided bike. The bike is for your use during deliveries and must be returned when you no longer work with iDelivery.",
        "conversation_id":"034394fdbc0e4f4593acb12defc5a2f0",
        "model_used":"gpt-4o-mini"
}

Feedback request:

$ curl --request POST 'http://127.0.0.1:9696/feedback' \
--header 'Content-Type: application/json' \
-d '{"conversation_id": "034394fdbc0e4f4593acb12defc5a2f0", "feedback": 1}'     

Feedback response:

{
  "message": "Feedback received"
}
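
For orientation, a minimal Flask sketch of the two endpoints exercised by the curl commands above; the handler bodies are placeholders, not the project's actual implementation:

import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/question", methods=["POST"])
def question():
    body = request.get_json()
    # ...retrieve FAQ context, call the LLM, evaluate and store the conversation...
    answer = "..."  # produced by the RAG pipeline
    return jsonify({
        "question": body["question"],
        "answer": answer,
        "conversation_id": uuid.uuid4().hex,
        "model_used": "gpt-4o-mini",
    })

@app.route("/feedback", methods=["POST"])
def feedback():
    body = request.get_json()
    # ...persist body["conversation_id"] and body["feedback"]...
    return jsonify({"message": "Feedback received"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9696)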

Running the Python APP locally

Run commands from root folder:

  • use Python v3.10.x
  • pip install pipenv==2025.0.4
  • pipenv shell
  • pipenv install to install all dependencies
  • optionally run pipenv requirements > requirements.txt to regenerate the requirements.txt which is used in the dockerised version of this app
  • Start docker dependencies: podman-compose up --build qdrant postgres grafana
  • Qdrant UI: http://localhost:6333/dashboard#/collections
  • copy and rename app/keys_secret.py.tmp to app/keys_secret.py
  • fill in app/keys_secret.py with your own OpenAI secret key
  • run export TOKENIZERS_PARALLELISM=false to disable a noisy warning
  • run python app/setup_dbs.py to ingest FAQ and courier profile data into the DBs (a hedged sketch of this step follows this list)
  • run python grafana/init_grafana.py to set up the Grafana dashboard with the Postgres data source
  • start the API server:
    • gunicorn --bind 0.0.0.0:9696 --chdir=app server:app
  • run above curl commands to interact with entire system
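
A hedged sketch of what the DB setup step does, assuming a sentence-transformers embedding model; the model name, collection name and dataset file names are assumptions, not necessarily what app/setup_dbs.py uses:

import json
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer
from tinydb import TinyDB

model = SentenceTransformer("all-MiniLM-L6-v2")       # 384-dimensional embeddings (assumed model)
qdrant = QdrantClient(url="http://localhost:6333")
qdrant.recreate_collection(collection_name="faq",
                           vectors_config=VectorParams(size=384, distance=Distance.COSINE))

# ingest FAQ entries: embed the question text and keep question/answer as payload
faqs = json.load(open("dataset/faq.json"))            # hypothetical dataset file
points = [PointStruct(id=i, vector=model.encode(f["question"]).tolist(), payload=f)
          for i, f in enumerate(faqs)]
qdrant.upsert(collection_name="faq", points=points)

# ingest courier profiles into TinyDB
profiles = json.load(open("dataset/courier_profiles.json"))   # hypothetical dataset file
TinyDB("tmp_datastore/courier_profiles.json").insert_multiple(profiles)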

Monitoring

The application saves conversation data in PostgreSQL. Grafana is used to monitor the application in real time. Monitoring tracks the following (a minimal persistence sketch follows the list):

  • LLM evaluation relevance
  • courier conversation feedback
  • OpenAI tokens
  • OpenAI costs
  • API response time (including LLM answer generation and LLM evaluation)
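
A minimal sketch of persisting conversation metadata to Postgres for the dashboard; the table, columns and database name are assumptions, not the project's actual schema:

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="monitoring",   # assumed database name
                        user="user", password="user")

def save_conversation(conversation_id: str, relevance: str, tokens: int,
                      cost_usd: float, response_time_s: float) -> None:
    # one row per answered question; the Grafana panels query this table
    with conn, conn.cursor() as cur:
        cur.execute(
            """INSERT INTO conversations
               (conversation_id, relevance, tokens, cost_usd, response_time_s)
               VALUES (%s, %s, %s, %s, %s)""",
            (conversation_id, relevance, tokens, cost_usd, response_time_s),
        )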


Manually set up Grafana for the running application:

It is recommended to follow the sections above that start the entire application with all dependencies. If you still need to do a manual Grafana setup, follow these steps (an API-based sketch follows the list):

  • open Grafana UI: http://localhost:3000
  • set up a new data source for Postgres with host postgres, credentials user / user, and TLS disabled
  • take the ID of the new data source (see the ID in the URL) and replace all occurrences of ef04twmg20feoa in grafana/dashboard.json with the new ID
  • create a new dashboard in the Grafana UI by importing the updated grafana/dashboard.json
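
A hedged sketch of the same setup done programmatically via Grafana's HTTP API (roughly what grafana/init_grafana.py automates); the database name is an assumption:

import json
import requests

GRAFANA = "http://localhost:3000"
AUTH = ("admin", "admin")

# create the Postgres data source
ds = requests.post(f"{GRAFANA}/api/datasources", auth=AUTH, json={
    "name": "PostgreSQL",
    "type": "postgres",
    "access": "proxy",
    "url": "postgres:5432",
    "user": "user",
    "database": "monitoring",                 # assumed database name
    "secureJsonData": {"password": "user"},
    "jsonData": {"sslmode": "disable"},
}).json()

# point the dashboard at the new data source UID and import it
raw = json.dumps(json.load(open("grafana/dashboard.json")))
raw = raw.replace("ef04twmg20feoa", ds["datasource"]["uid"])
requests.post(f"{GRAFANA}/api/dashboards/db", auth=AUTH,
              json={"dashboard": json.loads(raw), "overwrite": True})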


TODO:

  • generate random delivery courier profiles using AI
  • persist courier profiles to a NoSQL DB (TinyDB)
  • generate random FAQ courier questions using AI
  • persist FAQ questions and answers to a Vector DB (Qdrant)
  • generate courier working contracts (employee and freelance) using AI, saved to a file
  • generate complete prompt with non private courier profile information, courier question and best matching FAQ data
  • use LLM to get an answer
  • generate ground truth data for evaluation
  • implement evaluation retrieval
  • hyperparameter tuning for evaluation retrieval
  • implement evaluation RAG
  • evaluation of different LLMs for RAG
  • put all code behind an API
  • add LLM realtime evaluation
  • add Grafana realtime monitoring
  • dockerise application
  • add better logging

Optional:

  • use LLM to ask for Contract data when needed
  • use LLM to ask for more profile information when needed by querying the NoSQL DB
  • use LLM to ask for more FAQ data when needed by querying the Vector DB
  • use LLM to update Courier profile
  • use LLM to add new questions and answers to the FAQ DB
  • add chat history on demand to improve prompt accuracy
