Agent Voice Response - OpenAI Speech-to-Speech Integration

This repository integrates Agent Voice Response (AVR) with OpenAI's Real-time Speech-to-Speech API. The application sends user audio to an OpenAI realtime model and returns context-aware responses as streamed audio.

Prerequisites

To set up and run this project, you will need:

Node.js and npm installed
An OpenAI API key with access to the real-time API
WebSocket support in your environment

Setup

1. Clone the Repository

git clone https://github.com/agentvoiceresponse/avr-sts-openai.git
cd avr-sts-openai

2. Install Dependencies

npm install

3. Configure Environment Variables

Create a .env file in the root of the project to store your API keys and configuration. You will need to add the following variables:

OPENAI_API_KEY=your_openai_api_key
PORT=6030
OPENAI_MODEL=gpt-4o-realtime-preview  # Optional, defaults to gpt-4o-realtime-preview
OPENAI_INSTRUCTIONS="You are a helpful assistant that can answer questions and help with tasks."  # Optional
OPENAI_TEMPERATURE=0.8  # Optional, controls randomness (0.0-1.0), defaults to 0.8
OPENAI_MAX_TOKENS=100  # Optional, controls response length, defaults to "inf"

Replace your_openai_api_key with your actual OpenAI API key.
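
As an illustration of how these variables and their documented defaults could be read at startup, here is a minimal sketch. The helper names `loadConfig`, `floatOr`, and `intOr` are hypothetical and are not taken from index.js; only the variable names and default values come from this README.

```javascript
// Hypothetical helpers: parse a numeric setting, falling back when it is missing or invalid.
function floatOr(value, fallback) {
  const parsed = parseFloat(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function intOr(value, fallback) {
  const parsed = parseInt(value, 10);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Hypothetical loader: builds a config object from an env-like object,
// applying the defaults documented in this README.
function loadConfig(env = process.env) {
  if (!env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY is required");
  }
  return {
    apiKey: env.OPENAI_API_KEY,
    port: intOr(env.PORT, 6030),
    model: env.OPENAI_MODEL || "gpt-4o-realtime-preview",
    // No default is documented for the instructions, so this may be undefined.
    instructions: env.OPENAI_INSTRUCTIONS,
    temperature: floatOr(env.OPENAI_TEMPERATURE, 0.8),
    // "inf" (or any non-numeric value) means no explicit token limit.
    maxTokens: intOr(env.OPENAI_MAX_TOKENS, "inf"),
  };
}
```

Calling `loadConfig({ OPENAI_API_KEY: "k" })` returns the documented defaults, and a missing key fails fast instead of surfacing later as an authentication error.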

4. Running the Application

Start the application by running the following command:

node index.js

The server will start on the port defined in the environment variable (default: 6030).

How It Works

The server receives audio input from users, forwards it to OpenAI's Real-time Speech-to-Speech API over a WebSocket connection, and streams the model's response back to the client as audio in real time.
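
The message flow can be sketched as three small helpers. This is an illustration, not the code in index.js: the function names are hypothetical, and the event names and fields (`session.update`, `input_audio_buffer.append`, `response.audio.delta`, `max_response_output_tokens`, `pcm16`) are assumed from the beta version of OpenAI's Realtime API, so check them against the current API reference before relying on them.

```javascript
// Hypothetical: build the event that configures the realtime session.
// Field names are assumed from the Realtime API beta.
function buildSessionUpdate(config) {
  return {
    type: "session.update",
    session: {
      instructions: config.instructions,
      temperature: config.temperature,
      max_response_output_tokens: config.maxTokens,
      input_audio_format: "pcm16",
      output_audio_format: "pcm16",
    },
  };
}

// Hypothetical: wrap a chunk of 16-bit PCM audio (a Node.js Buffer) for sending.
// The API is assumed to expect base64-encoded audio.
function buildAudioAppend(pcmBuffer) {
  return {
    type: "input_audio_buffer.append",
    audio: pcmBuffer.toString("base64"),
  };
}

// Hypothetical: pull decoded audio out of a raw server message,
// or return null for any other event type.
function extractAudioDelta(rawMessage) {
  const event = JSON.parse(rawMessage);
  if (event.type !== "response.audio.delta") {
    return null;
  }
  return Buffer.from(event.delta, "base64");
}
```

In a full server, each object returned by the builders would be serialized with `JSON.stringify` and sent over the WebSocket, and `extractAudioDelta` would run on every incoming message.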

Key Components

Express.js Server: Handles incoming audio streams from clients
WebSocket Communication: Manages real-time communication with OpenAI's API
Audio Processing: Handles audio format conversion between 8kHz and 24kHz
Real-time Streaming: Processes and streams audio data in real-time

Audio Processing

The application includes two main audio processing functions:

Upsampling (8kHz to 24kHz):
- Converts client audio from 8kHz to 24kHz using linear interpolation
- Required because OpenAI's API expects 24kHz input

Downsampling (24kHz to 8kHz):
- Converts OpenAI's 24kHz output back to 8kHz
- Ensures compatibility with client audio systems
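
The two conversions can be sketched as follows. The function names are hypothetical and the code is not copied from index.js. Upsampling uses linear interpolation as described above. The README does not say how downsampling is done, so this sketch assumes plain decimation (keeping every third sample) with no low-pass filter.

```javascript
// Upsample 16-bit PCM from 8kHz to 24kHz (factor 3) using linear interpolation.
function upsample8kTo24k(input) {
  const output = new Int16Array(input.length * 3);
  for (let i = 0; i < input.length; i++) {
    const current = input[i];
    // The last sample has no successor, so it is simply held.
    const next = i + 1 < input.length ? input[i + 1] : current;
    output[i * 3] = current;
    output[i * 3 + 1] = Math.round(current + (next - current) / 3);
    output[i * 3 + 2] = Math.round(current + (2 * (next - current)) / 3);
  }
  return output;
}

// Downsample 16-bit PCM from 24kHz to 8kHz.
// Assumption: simple decimation, keeping every third sample.
function downsample24kTo8k(input) {
  const output = new Int16Array(Math.floor(input.length / 3));
  for (let i = 0; i < output.length; i++) {
    output[i] = input[i * 3];
  }
  return output;
}
```

For example, `upsample8kTo24k(Int16Array.from([0, 300, 600]))` yields `[0, 100, 200, 300, 400, 500, 600, 600, 600]`, and passing that result to `downsample24kTo8k` recovers the original three samples. Decimation without filtering can alias high-frequency content, so a production resampler may filter first.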

API Endpoints

POST `/speech-to-speech-stream`

This endpoint accepts an audio stream and returns a streamed audio response generated by OpenAI.

Request:

Content-Type: audio/x-raw
Format: 16-bit PCM at 8kHz
Method: POST

Response:

Content-Type: text/event-stream
Format: 16-bit PCM at 8kHz
Streamed audio data in real-time
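
A client call to this endpoint might look like the sketch below. It assumes Node.js 18 or later (for the global `fetch`) and a server reachable at a base URL such as http://localhost:6030. `buildRequest` and `sendAudio` are hypothetical names. Because the README does not describe how audio is framed inside the response stream, the sketch hands raw response chunks to a callback without interpreting them.

```javascript
// Hypothetical: assemble the URL and fetch options for the endpoint.
function buildRequest(baseUrl, pcmChunk) {
  return {
    url: new URL("/speech-to-speech-stream", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "audio/x-raw" },
      body: pcmChunk, // 16-bit PCM at 8kHz, e.g. a Buffer or Uint8Array
    },
  };
}

// Hypothetical: send audio and pass each raw response chunk to onChunk.
async function sendAudio(baseUrl, pcmChunk, onChunk) {
  const { url, options } = buildRequest(baseUrl, pcmChunk);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  for await (const chunk of response.body) {
    onChunk(chunk);
  }
}
```

This version sends one buffered chunk for simplicity. A telephony client would more likely stream audio continuously as it is captured.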

Customizing the Application

Environment Variables

You can customize the application behavior using the following environment variables:

OPENAI_API_KEY: Your OpenAI API key (required)
PORT: The port on which the server will listen (default: 6030)
OPENAI_MODEL: The OpenAI model to use (default: gpt-4o-realtime-preview)
OPENAI_INSTRUCTIONS: Custom instructions for the AI (optional)
OPENAI_TEMPERATURE: Controls randomness in responses (0.0-1.0, default: 0.8)
OPENAI_MAX_TOKENS: Controls the maximum length of the response (default: "inf")

Error Handling

The application includes comprehensive error handling for:

WebSocket connection issues
Audio processing errors
OpenAI API errors
Stream processing errors

All errors are logged to the console and appropriate error messages are returned to the client.

Support & Community

GitHub: https://github.com/agentvoiceresponse - Report issues, contribute code.
Discord: https://discord.gg/DFTU69Hg74 - Join the community discussion.
Docker Hub: https://hub.docker.com/u/agentvoiceresponse - Find Docker images.
NPM: https://www.npmjs.com/~agentvoiceresponse - Browse our packages.
Wiki: https://wiki.agentvoiceresponse.com/en/home - Project documentation and guides.

Support AVR

AVR is free and open-source. If you find it valuable, consider supporting its development.

License

MIT License - see the LICENSE file for details.
