A voice AI agent built with the LiveKit Agents framework (1.x) that supports real-time speech-to-text, language model processing, and text-to-speech.
- Real-time voice conversation with STT-LLM-TTS pipeline
- Function tools for extended capabilities (time, weather, calculations, reminders)
- Noise cancellation powered by LiveKit Cloud
- Turn detection for natural conversation flow
- Integration with multiple AI providers (Deepgram, OpenAI, Cartesia)
- Console and web modes for testing and deployment
- Python 3.9 or later
- LiveKit Cloud account (or self-hosted LiveKit server)
- API keys for your chosen AI providers (see comparison below)
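Since the framework requires Python 3.9+, it can be worth failing fast with a clear message before anything else runs. A minimal sketch (the helper name `check_python_version` is illustrative, not part of the project):

```python
import sys

def check_python_version(current=None, minimum=(3, 9)):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    current = current or sys.version_info[:2]
    return tuple(current[:2]) >= minimum

if __name__ == "__main__":
    if not check_python_version():
        raise SystemExit(f"Python 3.9+ is required, found {sys.version.split()[0]}")
```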
| LLM Providers | Strengths | Best For |
|---|---|---|
| OpenAI | High quality, widely supported | General purpose, complex reasoning |
| Groq | Ultra-fast inference (often far faster than typical cloud LLM APIs) | Real-time applications, cost optimization |
| Anthropic Claude | Safety-focused, excellent reasoning | Analysis, creative writing, ethical AI |
| Google Gemini | Multimodal, multilingual | Creative tasks, translations, vision |
| AWS Bedrock | Enterprise-grade, multiple models | Large organizations, compliance |
| Ollama | Local, private, free | Privacy-focused, offline applications |

| STT Providers | Strengths | Best For |
|---|---|---|
| Deepgram | Fast, accurate, cost-effective | Recommended for most use cases |
| OpenAI Whisper | High accuracy, multilingual | High-quality transcription |
| Google Cloud | Good accuracy, Google ecosystem | Google-integrated applications |

| TTS Providers | Strengths | Best For |
|---|---|---|
| Cartesia | High quality, fast, affordable | Recommended for most use cases |
| ElevenLabs | Premium voice quality, cloning | High-end applications, custom voices |
| OpenAI TTS | Good quality, simple | OpenAI-integrated applications |
```bash
pip install -r requirements.txt
```

Create a `.env` file.
Edit the `.env` file with your actual API keys:
```env
# LiveKit Cloud Configuration (Required)
LIVEKIT_API_KEY=your_livekit_api_key_here
LIVEKIT_API_SECRET=your_livekit_api_secret_here
LIVEKIT_URL=wss://your-project.livekit.cloud
# AI Provider API Keys for Groq Agent
GROQ_API_KEY=your_groq_api_key_here
CARTESIA_API_KEY=your_cartesia_api_key_here
```

Download the required model files:

```bash
python agent_groq.py download-files
```

Test your agent in different modes:

```bash
python agent_groq.py console
```

Speak directly with the agent using your computer's microphone and speakers.

```bash
python agent_groq.py dev
```

Connect to LiveKit and use the Agents Playground for web interaction.

```bash
python agent_groq.py start
```

Run the agent with production optimizations.
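Before launching in any mode, it can help to verify that every variable from the `.env` file is actually set. A minimal sketch (the helper `missing_vars` is illustrative, not part of the framework):

```python
import os

# Names match the .env file described above.
REQUIRED_VARS = [
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "GROQ_API_KEY",
    "CARTESIA_API_KEY",
]

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```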
A lightning-fast voice assistant powered entirely by Groq:
- STT: Groq Whisper Large V3 (ultra-fast speech recognition)
- LLM: Groq Llama 3 70B (powerful conversation model)
- TTS: Cartesia Sonic (high-quality voice synthesis)
- Function Tools: Time, weather, and general assistance
- ⚡ Ultra-fast responses - Groq's optimized inference
- 🎙️ Real-time voice conversation
- 🛠️ Function calling - Get time, weather, calculations
- 🔊 High-quality voice - Cartesia TTS
- 🎯 Accurate transcription - Groq Whisper
Console Mode (Local Testing):

```bash
python agent_groq.py console
```

Development Mode (Connect to LiveKit):

```bash
python agent_groq.py dev
```

Production Mode:

```bash
python agent_groq.py start
```

```
User Audio → Groq Whisper STT → Groq Llama 3 LLM → Cartesia TTS → User Audio
                                       │
                                Function Tools
                                - Time/Date
                                - Weather
                                - General Assistance
```
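The calculations capability implies evaluating user-spoken arithmetic, and `eval()` would be unsafe for that. A hedged sketch of the kind of helper a calculator tool could delegate to (the name `safe_calc` is hypothetical, not from the project):

```python
import ast
import operator

# Supported operators for basic spoken arithmetic: + - * / and unary minus.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calc(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))
```

Walking the AST rather than calling `eval()` means only whitelisted arithmetic nodes are ever executed, so arbitrary code in the transcribed text cannot run.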
The LiveKit Agents framework supports 15+ AI providers. You can mix and match:
```python
# OpenAI (Original)
llm=openai.LLM(model="gpt-4o-mini")

# Groq (Lightning fast)
llm=groq.LLM(model="llama3-8b-8192")

# Anthropic Claude
llm=anthropic.LLM(model="claude-3-haiku-20240307")

# Google Gemini
llm=google.LLM(model="gemini-1.5-flash")

# AWS Bedrock (Multiple models)
llm=aws.LLM(model="anthropic.claude-3-sonnet-20240229-v1:0")

# Ollama (Local models)
llm=openai.LLM(model="llama3", base_url="http://localhost:11434/v1/")
```

```python
# Deepgram (Recommended)
stt=deepgram.STT(model="nova-3")

# OpenAI Whisper
stt=openai.STT(model="whisper-1")

# Google Cloud Speech
stt=google.STT()

# Azure Speech
stt=azure.STT()
```

```python
# Cartesia (High quality)
tts=cartesia.TTS(model="sonic-2")

# ElevenLabs (Premium voices)
tts=elevenlabs.TTS(voice="Josh")

# OpenAI TTS
tts=openai.TTS(voice="nova")

# Google Cloud TTS
tts=google.TTS()
```

```python
session = AgentSession(
    stt=deepgram.STT(model="nova-3"),      # Fast, accurate
    llm=groq.LLM(model="llama3-8b-8192"),  # Lightning-fast inference
    tts=elevenlabs.TTS(voice="Josh"),      # Premium voice quality
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)
```

Create new function tools:
```python
@function_tool
async def my_custom_tool(context: RunContext, parameter: str) -> str:
    """Description of what this tool does."""
    # Your logic here
    return "Result"

# Add to agent
class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="Your instructions",
            tools=[my_custom_tool],  # Add your tool here
        )
```

Update the agent's instructions:

```python
class CustomAssistant(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly assistant specialized in..."
        )
```

- Missing API Keys: Ensure all required environment variables are set
- Model Download Fails: Run `python agent.py download-files` again
- Audio Issues: Check microphone permissions and audio device settings
- Connection Issues: Verify LiveKit credentials and network connectivity
Add logging to debug issues:
```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

Create workflows with multiple specialized agents. See the LiveKit documentation for examples.
Connect your agent to phone systems using LiveKit's SIP integration.
Build web or mobile apps using LiveKit's client SDKs to create custom user interfaces.
This project is open source and available under the Apache 2.0 License.