This repository provides a real-time speech-to-text transcription service using Google Cloud Speech-to-Text API integrated with the Agent Voice Response system. The code sets up an Express.js server that accepts audio streams from Agent Voice Response Core, transcribes the audio using the Google Cloud API, and streams the transcription back to the Agent Voice Response Core in real-time.
Before setting up the project, ensure you have the following:
- Node.js and npm installed.
- A Google Cloud account with the Speech-to-Text API enabled.
- A Service Account Key from Google Cloud with the necessary permissions to access the Speech-to-Text API.
```bash
git clone https://github.com/agentvoiceresponse/avr-asr-google-cloud-speech.git
cd avr-asr-google-cloud-speech
npm install
```
Create a `keyfile.json` by downloading your service account key from Google Cloud. Then, set the environment variable to use this key in your Node.js application:
```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
```
Alternatively, you can set this variable in your `.env` file (you can use the `dotenv` package for loading environment variables).
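For example, a minimal sketch of loading the variable with `dotenv` at the top of your entry point (the file name `index.js` is an assumption, not confirmed by this repository):

```javascript
// index.js (assumed entry point) — load variables from .env before anything else
require('dotenv').config();

// The Google client libraries read GOOGLE_APPLICATION_CREDENTIALS from the environment
console.log('Using credentials at:', process.env.GOOGLE_APPLICATION_CREDENTIALS);
```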
Ensure that you have the following environment variables set in your `.env` file:
```
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/keyfile.json
PORT=6001
SPEECH_RECOGNITION_LANGUAGE=en-US
SPEECH_RECOGNITION_MODEL=telephony
```
You can adjust the port number as needed.
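For reference, here is a hedged sketch of how the language and model variables might feed a Google Cloud streaming-recognition request. The `LINEAR16` encoding and 8000 Hz sample rate are assumptions based on typical telephony audio, not values confirmed by this repository:

```javascript
const speech = require('@google-cloud/speech');

const client = new speech.SpeechClient();

// Assumed mapping of the .env values into a streaming recognition request
const streamingRequest = {
  config: {
    encoding: 'LINEAR16',           // assumption: raw 16-bit PCM audio
    sampleRateHertz: 8000,          // assumption: telephony-style audio
    languageCode: process.env.SPEECH_RECOGNITION_LANGUAGE || 'en-US',
    model: process.env.SPEECH_RECOGNITION_MODEL || 'telephony',
  },
  interimResults: true,             // stream partial transcripts as they arrive
};
```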
This application sets up an Express.js server that accepts audio streams from clients and uses the Google Cloud Speech-to-Text API to transcribe the audio in real-time. The transcribed text is then streamed back to the Agent Voice Response Core. Below is an overview of the core components:
The server listens for audio streams on a specific route (`/audio-stream`) and passes the incoming audio to the Google Cloud API for real-time transcription.
A custom class that extends Node.js's `Writable` stream is used to write the incoming audio data to the Google Cloud API.
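A minimal sketch of such a class, assuming it is named `AudioWritableStream` (as referenced later in this README) and that it simply forwards each chunk to the recognize stream:

```javascript
const { Writable } = require('stream');

// Forwards incoming audio chunks to a Google Cloud recognize stream
class AudioWritableStream extends Writable {
  constructor(recognizeStream, options) {
    super(options);
    this.recognizeStream = recognizeStream;
  }

  _write(chunk, encoding, callback) {
    // Push the raw audio chunk into the Speech API stream
    this.recognizeStream.write(chunk);
    callback();
  }

  _final(callback) {
    // No more audio: close the Speech API stream so it flushes final results
    this.recognizeStream.end();
    callback();
  }
}
```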
The API processes the audio data received from the client and converts it into text using speech recognition models. The results are then streamed back to the client in real-time.
This route accepts audio streams from the client and transmits the audio for transcription. The transcription is sent back to the client as soon as it’s available.
Here’s a high-level breakdown of the key parts of the code:
- Server Setup: Configures the Express.js server and the Google Cloud Speech-to-Text API.
- Audio Stream Handling: A function, `handleAudioStream`, processes the incoming audio from clients (see the sketch after this list). It:
  - Initializes a Speech API recognize stream.
  - Sets up event listeners to handle `error`, `data`, and `end` events.
  - Creates an `AudioWritableStream` instance that pipes the incoming audio to the Speech API.
  - Sends the transcriptions back to the client through the HTTP response stream.
- Express.js Route: The route `/audio-stream` calls the `handleAudioStream` function when a client connects.
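As a rough illustration only (not the repository's exact code), a `handleAudioStream` function along these lines could tie the pieces together, reusing the `streamingRequest` and `AudioWritableStream` sketches above and an assumed Express `app` instance:

```javascript
const handleAudioStream = (req, res) => {
  // Open a streaming recognition session with Google Cloud
  const recognizeStream = client
    .streamingRecognize(streamingRequest)
    .on('error', (err) => {
      console.error('Speech API error:', err);
      res.end();
    })
    .on('data', (data) => {
      // Forward each transcript fragment back to the caller as it arrives
      const transcript = data.results?.[0]?.alternatives?.[0]?.transcript;
      if (transcript) {
        res.write(transcript);
      }
    })
    .on('end', () => res.end());

  // Pipe the raw audio from the HTTP request into the Speech API
  req.pipe(new AudioWritableStream(recognizeStream));
};

// Register the route described above
app.post('/audio-stream', handleAudioStream);
```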
To start the application:
```bash
npm run start
```
or
```bash
npm run start:dev
```
The server will start and listen on the port specified in the `.env` file, or default to `PORT=6001`.
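A hedged sketch of that behaviour, assuming a plain Express `app.listen` call:

```javascript
const port = process.env.PORT || 6001;

app.listen(port, () => {
  console.log(`ASR service listening on port ${port}`);
});
```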
You can send audio streams to the `/audio-stream` endpoint using a client that streams audio data (e.g., a browser, mobile app, or another Node.js service). Ensure that the audio stream is compatible with the Google Cloud Speech-to-Text API format.
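For a quick local test, a hypothetical Node.js client could stream a raw audio file to the endpoint and print transcripts as they arrive; the file name `sample-call.raw` and its audio format are placeholders, not files shipped with this repository:

```javascript
const http = require('http');
const fs = require('fs');

// Hypothetical test client: POSTs raw audio and prints streamed transcripts
const req = http.request(
  { host: 'localhost', port: 6001, path: '/audio-stream', method: 'POST' },
  (res) => {
    res.setEncoding('utf8');
    res.on('data', (chunk) => process.stdout.write(chunk));
    res.on('end', () => console.log('\nTranscription finished'));
  }
);

// sample-call.raw stands in for audio in a format the Speech API accepts
fs.createReadStream('sample-call.raw').pipe(req);
```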