A real-time voice assistant using FastRTC for low-latency audio communication between browsers and a Python backend, powered by DeepSeek LLM and ElevenLabs voice synthesis.
FastRTC Voice Assistant creates a seamless, responsive voice assistant experience by leveraging WebRTC technology for real-time audio streaming. The application connects browser clients to a Python backend server, where audio is processed through DeepSeek's large language model for understanding and generating responses, which are then converted to natural-sounding speech using ElevenLabs' voice synthesis technology.
- Real-time audio streaming using WebRTC for ultra-low latency
- Advanced speech recognition and natural language understanding with DeepSeek LLM
- High-quality, natural-sounding voice responses through ElevenLabs TTS
- Contextual conversation capabilities with memory of previous interactions
- Simple browser-based client interface
- Customizable voice personas and response styles
- Frontend: JavaScript, HTML/CSS
- Backend: Python with FastAPI
- WebRTC: For real-time audio communication
- AI Models:
- DeepSeek LLM for natural language understanding and response generation
- ElevenLabs for high-quality text-to-speech conversion
- Python 3.7+
- Modern web browser with WebRTC support (Chrome, Firefox, Edge, etc.)
- DeepSeek API key
- ElevenLabs API key
-
Clone the repository:
git clone https://github.com/twelve2five/fastrtc-voice-assistant.git cd fastrtc-voice-assistant -
Install the required dependencies:
pip install -r requirements.txt
-
Set up environment variables for API keys:
export DEEPSEEK_API_KEY="your_deepseek_api_key" export ELEVENLABS_API_KEY="your_elevenlabs_api_key"
Alternatively, create a
.envfile in the project root directory:DEEPSEEK_API_KEY=your_deepseek_api_key ELEVENLABS_API_KEY=your_elevenlabs_api_key
The application uses DeepSeek's language model for natural language understanding and response generation. You can customize the model parameters in config.py:
- Model selection (DeepSeek-V2, etc.)
- Temperature and top-p settings for response generation
- System prompt and conversation context management
- Custom knowledge base integration
Voice synthesis is handled by ElevenLabs' API. Configure voice settings in config.py:
- Voice ID selection
- Stability and similarity boost settings
- Speech rate and pitch adjustments
- Custom voice cloning options (if using premium features)
-
Start the application:
python app.py
-
Open your web browser and navigate to the displayed URL (typically http://localhost:8080)
-
Allow microphone access when prompted by your browser
-
Start speaking to interact with the voice assistant
-
The workflow is as follows:
- Your voice is streamed in real-time to the server via WebRTC
- Audio is transcribed and processed by DeepSeek's LLM
- A response is generated based on your query and conversation history
- ElevenLabs converts the text response to natural-sounding speech
- The audio response is streamed back to your browser
Create different assistant personalities by editing the system prompts in prompts.py. You can define different conversation styles, knowledge domains, and personality traits.
The assistant maintains conversation history to provide contextually relevant responses. You can adjust the context window size and management strategy in the configuration.
app.py- Main application entry pointwebrtc_handler.py- Backend WebRTC connection handlingwebrtc_client.js- Frontend WebRTC implementationllm_service.py- DeepSeek LLM integration for text processingtts_service.py- ElevenLabs integration for speech synthesisconfig.py- Application configurationdebug.py- Debugging utilities
For development and debugging purposes, you can run: This provides additional logging and diagnostic information useful during development, including:
- WebRTC connection statistics
- DeepSeek API request/response logs
- ElevenLabs API request/response logs
- Audio processing metrics
- Adjust audio sampling rates and buffer sizes in
config.pyto balance quality and latency - Configure DeepSeek model parameters for faster response times
- Tune ElevenLabs voice settings for optimal audio quality
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add some amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
- DeepSeek for their powerful language model
- ElevenLabs for their natural-sounding text-to-speech technology
- FastRTC for enabling real-time communication capabilities