This project implements a basic voice authentication system in Python that operates entirely offline after an initial setup phase. Users register a voiceprint under a username (stored in an SQLite database) and later authenticate with their voice. Key features include multi-sample enrollment for robustness, an energy check to prevent authentication on silence, similarity score display with visualization, and a placeholder for liveness detection. Noise reduction is available but currently disabled, because tests showed better speaker discrimination without it in this setup.
- Database Setup: A simple SQLite database (`voice_auth_resemblyzer.db`) is used, managed by `db_utils.py`. It contains a single table, `users`, with three columns: `username` (Primary Key), `voiceprint` (BLOB, stores a 256-dimension embedding), and `registration_date` (TEXT). The database and table are created automatically on the first run if they don't exist.
- Registration (`register.py`):
  - The user provides a username.
  - The script checks if the username already exists in the database. If so, it prompts for overwrite confirmation.
  - The user is prompted to speak a fixed, phonetically rich passphrase (e.g., "The quick brown fox...") multiple times (default: 3).
  - Each recording (default: 10 seconds) is saved as a WAV file backup in `recordings/`.
  - `resemblyzer` (`preprocess_wav` and `embed_utterance`) extracts a speaker embedding (voiceprint) from each valid recording.
  - These multiple embeddings are averaged to create a more stable reference voiceprint for the user.
  - The averaged voiceprint (NumPy array converted to BLOB) and timestamp are saved into the `users` table in `voice_auth_resemblyzer.db`.
- Authentication (`authenticate.py`):
  - The user provides their username.
  - The script queries the database to retrieve the stored (averaged) voiceprint BLOB.
  - The BLOB is converted back into a NumPy array.
  - The user is given a generic prompt to speak clearly for the set duration (e.g., 10 seconds). While any speech can be used, using the registration passphrase is recommended for consistency.
- Multi-user Support: Managed via unique usernames (primary key) in the SQLite database.
- Database Storage: Uses SQLite (`voice_auth_resemblyzer.db`) for storing voiceprints.
- Silence Detection: Includes an RMS energy check on the recorded audio file to fail authentication if the input signal is too quiet.
- Liveness Detection (Placeholder): Includes a placeholder function where liveness detection logic would reside. WARNING: Does not provide actual anti-spoofing.
- Confidence Score: Displays the calculated cosine similarity score.
- Similarity Visualization: Displays a simple text-based bar comparing the score to the threshold.
- Increased Duration: Uses longer recording time (10s) to potentially capture more stable embeddings.
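The similarity visualization could be as simple as the sketch below. The function name `similarity_bar`, the bar width, and the `#`/`-`/`|` characters are illustrative assumptions, not the project's actual output format.

```python
def similarity_bar(score: float, threshold: float, width: int = 40) -> str:
    """Render a cosine similarity score as a text bar with a threshold marker.

    Hypothetical helper: the layout is an assumption made for illustration.
    """
    # Cosine similarity can be negative; clamp so the bar is never malformed.
    clamped = max(0.0, min(1.0, score))
    filled = round(clamped * width)
    # Column where the threshold marker is drawn, kept inside the bar.
    marker = min(width - 1, max(0, round(threshold * width)))
    cells = ["#" if i < filled else "-" for i in range(width)]
    cells[marker] = "|"
    verdict = "PASS" if score >= threshold else "FAIL"
    return f"[{''.join(cells)}] {score:.2f} (threshold {threshold:.2f}) {verdict}"


print(similarity_bar(0.82, 0.70))
print(similarity_bar(0.55, 0.70))
```

The filled portion shows the score and the `|` marks the threshold, so a passing score fills the bar past the marker.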
- Python 3.7+
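- Libraries listed in `requirements.txt`:
  - `numpy`
  - `scipy`
  - `sounddevice`
  - `wave` (part of the Python standard library; no separate install is needed)
  - `resemblyzer`
  - `librosa`
  - `torch` (CPU version is sufficient)
  - `torchaudio`
- PortAudio: A cross-platform audio I/O library required by `sounddevice`.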
- Clone or download the repository.
- Install PortAudio: (See previous instructions based on OS).
- Install Python dependencies: Navigate to the project directory and run:
  `pip install -r requirements.txt`
- Download Resemblyzer Model (First Run Only): The first time you run `python register.py` or `python authenticate.py`, `resemblyzer` will download its model (requires internet). Subsequent runs are offline.
- The system uses an SQLite database file named `voice_auth_resemblyzer.db`.
- It contains one table: `users (username TEXT PRIMARY KEY, voiceprint BLOB, registration_date TEXT)`.
- The `voiceprint` column stores the averaged NumPy embedding array as a BLOB.
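A minimal sketch of how an embedding could round-trip through this table is shown below. The helper names `save_voiceprint` and `load_voiceprint` are hypothetical, and serializing with `tobytes()` / `np.frombuffer` as float64 is an assumption; the actual `db_utils.py` may do this differently. An in-memory database is used so the example leaves no file behind.

```python
import sqlite3

import numpy as np

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    username TEXT PRIMARY KEY,
    voiceprint BLOB,
    registration_date TEXT
)
"""


def save_voiceprint(conn, username, embedding, registered_at):
    """Store an embedding as raw float64 bytes (assumed serialization)."""
    blob = np.asarray(embedding, dtype=np.float64).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO users (username, voiceprint, registration_date) "
        "VALUES (?, ?, ?)",
        (username, blob, registered_at),
    )
    conn.commit()


def load_voiceprint(conn, username):
    """Return the stored embedding as a NumPy array, or None if unknown."""
    row = conn.execute(
        "SELECT voiceprint FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    # The dtype must match the one used when writing the BLOB.
    return np.frombuffer(row[0], dtype=np.float64)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
example = np.linspace(0.0, 1.0, 256)  # stand-in for a real 256-dimension embedding
save_voiceprint(conn, "alice", example, "2024-01-01T12:00:00")
restored = load_voiceprint(conn, "alice")
print(restored.shape, np.array_equal(restored, example))
```

Because a BLOB carries no type information, the reader and writer have to agree on dtype and length; a mismatch produces an array of the wrong size rather than an error.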
- Setup: Ensure requirements are installed and the Resemblyzer model has been downloaded (run once with internet). Delete any older `.db` files if switching configurations.
- Run Registration: `python register.py`
  - Follow prompts to enter a username.
  - Record the fixed passphrase (e.g., "The quick brown fox...") clearly 3 times.
- Run Authentication: `python authenticate.py`
  - Enter the registered username.
  - Speak clearly for 10 seconds (any phrase).
  - The system performs checks (liveness placeholder, energy check), compares embeddings, and displays the result.
- Database & Recordings: `voice_auth_resemblyzer.db` stores user data. `recordings/` contains original WAV backups.
- Initial Setup (Requires Internet): Installing Python libraries (`pip install`) and the automatic one-time download of the speaker embedding model by `resemblyzer`.
- Runtime (Fully Offline): Once set up, all operations run locally using `sounddevice`, `resemblyzer` (with cached model), `sqlite3`, `numpy`, and `scipy`. Noise reduction (if re-enabled) also runs offline.
- Audio Recording: Captured via `sounddevice`, saved as WAV.
- Energy Check (Auth): RMS energy is calculated from the saved WAV file; below-threshold signals are rejected.
- Resemblyzer `preprocess_wav`: Reads audio from a file path and performs internal processing (likely VAD and resampling).
- Resemblyzer `embed_utterance`: Generates an embedding from the preprocessed audio data.
- Averaging (Registration): Embeddings from multiple recordings are averaged.
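The energy check in the steps above might look like the following sketch. It assumes 16-bit PCM WAV input and a `MIN_RMS_ENERGY` expressed on a normalized 0 to 1 amplitude scale; both the value `0.01` and that scale are assumptions, and `wav_rms` and `write_tone` are hypothetical helpers. Only the standard-library `wave` module and NumPy are used.

```python
import os
import tempfile
import wave

import numpy as np

MIN_RMS_ENERGY = 0.01  # assumed value and scale; tune against real recordings


def wav_rms(path):
    """Return the RMS amplitude of a 16-bit PCM WAV file, normalized to [0, 1]."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float64) / 32768.0
    if samples.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(samples ** 2)))


def write_tone(path, amplitude, seconds=1.0, rate=16000, freq=220.0):
    """Write a mono sine tone; amplitude 0.0 produces digital silence."""
    t = np.arange(int(seconds * rate)) / rate
    pcm = (amplitude * 32767 * np.sin(2 * np.pi * freq * t)).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())


with tempfile.TemporaryDirectory() as tmp:
    quiet_path = os.path.join(tmp, "quiet.wav")
    loud_path = os.path.join(tmp, "loud.wav")
    write_tone(quiet_path, 0.0)
    write_tone(loud_path, 0.5)
    quiet_rms = wav_rms(quiet_path)
    loud_rms = wav_rms(loud_path)

print("quiet passes:", quiet_rms >= MIN_RMS_ENERGY)
print("loud passes:", loud_rms >= MIN_RMS_ENERGY)
```

A plain RMS threshold separates silence from signal but cannot tell speech from loud background noise, which is why the threshold needs tuning per environment.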
- Threshold Tuning (`SIMILARITY_THRESHOLD`): CRITICAL. Start around 0.70 and adjust it based on rigorous genuine-user vs. imposter testing. The goal is to find the value that best separates genuine scores from imposter scores.
- Multi-Sample Enrollment: Averaging multiple registration samples aims to create a more stable and representative voiceprint, potentially improving robustness.
- Longer Duration: 10 seconds provides more data for embedding generation, potentially improving stability.
- Energy Threshold (`MIN_RMS_ENERGY`): Prevents authentication on silence. Tune it if it incorrectly rejects quiet speech or accepts background noise.
- Passphrase (Registration): Using a phonetically rich phrase during multi-sample registration might help capture more voice characteristics. Authentication can be attempted with any phrase.
- Speaker Distinguishability: This remains the hardest challenge. Even with enhancements, Resemblyzer may struggle with very similar voices. The observed gap between genuine and imposter scores dictates the system's practical security.
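One way to approach threshold tuning is to collect similarity scores from genuine and imposter attempts and sweep candidate thresholds, as in the sketch below. The score lists are made up purely for illustration, and `error_rates` and `pick_threshold` are hypothetical helpers; minimizing the sum of false rejects and false accepts is just one possible criterion.

```python
def error_rates(genuine, imposter, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold."""
    # A genuine attempt is rejected when its score falls below the threshold.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    # An imposter attempt is accepted when its score reaches the threshold.
    far = sum(s >= threshold for s in imposter) / len(imposter)
    return frr, far


def pick_threshold(genuine, imposter, candidates):
    """Choose the first candidate with the smallest combined error (FRR + FAR)."""
    return min(candidates, key=lambda t: sum(error_rates(genuine, imposter, t)))


# Made-up scores for illustration only; real values must come from your own tests.
genuine_scores = [0.81, 0.78, 0.85, 0.74, 0.80]
imposter_scores = [0.55, 0.62, 0.68, 0.59, 0.71]

candidates = [round(0.50 + 0.01 * i, 2) for i in range(46)]  # 0.50 to 0.95
best = pick_threshold(genuine_scores, imposter_scores, candidates)
print(best, error_rates(genuine_scores, imposter_scores, best))
```

If the two score ranges overlap, no threshold gives zero error, and the size of that overlap is a direct measure of how far the system can be trusted.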
- Voice Registration (Multi-Sample): Records user voice multiple times for a fixed passphrase, averages embeddings, and saves the robust voiceprint to an SQLite database.
- Voice Authentication: Compares new voice input (generic prompt) against a stored voiceprint.
- Offline Capability: Works entirely offline after initial library installation and Resemblyzer model download.
- Basic CLI: Simple command-line interface.
- Multi-user Support: Managed via unique usernames in the database.
- Database Storage: Uses SQLite (`voice_auth_resemblyzer.db`) for storing voiceprints (256-dim float64).
- Silence Detection: Includes an RMS energy check (reading from file) to fail authentication on quiet input.
- Liveness Detection (Placeholder): Includes a non-functional placeholder.
- Confidence Score: Displays the calculated cosine similarity score.
- Similarity Visualization: Displays a simple text-based bar showing the score relative to the threshold.
- Python 3.7+
- Libraries listed in `requirements.txt` (or install manually):
  - `numpy`
  - `scipy`
  - `sounddevice`
  - `wave` (part of the Python standard library; no separate install is needed)
  - `resemblyzer`
  - `librosa`
  - `torch` (CPU version is sufficient)
  - `torchaudio`
  - `noisereduce` (Optional: Only needed if noise reduction is re-enabled in the code)
- PortAudio: A cross-platform audio I/O library required by `sounddevice`.
- Clone or download the repository.
- Install PortAudio: (See previous instructions for Linux/macOS/Windows).
- Install Python dependencies: Navigate to the project directory and run:
  `pip install numpy scipy sounddevice wave resemblyzer librosa torch torchaudio` (add `noisereduce` if needed)
  (Or use the updated `requirements.txt` if provided.)
- Download Resemblyzer Model (First Run Only): The first time you run `register.py` or `authenticate.py`, `resemblyzer` downloads its model (requires internet). Subsequent runs are offline.
- Uses SQLite file: `voice_auth_resemblyzer.db`.
- Table: `users (username TEXT PRIMARY KEY, voiceprint BLOB, registration_date TEXT)`.
- `voiceprint` stores the 256-dimension float64 NumPy embedding array as a BLOB.
- Managed via `db_utils.py`.
- Setup: Install dependencies and ensure PortAudio is working. Run once with internet for the model download. Delete any older `.db` files if switching between versions.
- Run Registration: `python register.py`
  - Enter a username. Record the same fixed passphrase clearly for each of the 3 requested samples. An averaged voiceprint is saved.
- Run Authentication: `python authenticate.py`
  - Enter the registered username. Speak clearly for the duration when prompted (using the registration passphrase is recommended for consistency).
  - The system performs checks (liveness placeholder, energy check), compares embeddings, and displays results.
- Database & Recordings: `voice_auth_resemblyzer.db` stores user data. `recordings/` contains WAV backups.
Relies on the locally cached Resemblyzer model, standard Python libraries (`sqlite3`, `wave`, etc.), and audio processing libraries (`numpy`, `scipy`, `sounddevice`) that run locally. Internet access is needed only for initial setup.
- Audio Recording: Captured via `sounddevice`, saved as WAV.
- Energy Check (Authentication): RMS is calculated from the WAV file; authentication fails if it is below the threshold.
- Resemblyzer `preprocess_wav`: Handles reading audio, performs Voice Activity Detection (VAD), and ensures the correct internal sample rate.
- Resemblyzer `embed_utterance`: Generates the speaker embedding from the preprocessed audio.
- Averaging (Registration): Embeddings from multiple samples are averaged.
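The averaging step, and the cosine similarity comparison used at authentication, can be sketched as follows. Random vectors stand in for the embeddings that `embed_utterance` would return, so the example runs without a microphone or the model; `average_embeddings` and `cosine_similarity` are hypothetical helper names, and the project's own code may differ.

```python
import numpy as np


def average_embeddings(embeddings):
    """Average per-recording embeddings into one reference voiceprint."""
    stacked = np.stack([np.asarray(e, dtype=np.float64) for e in embeddings])
    return stacked.mean(axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


# Synthetic stand-ins for embeddings (assumption: 256 dimensions, as in the schema).
rng = np.random.default_rng(0)
speaker = rng.normal(size=256)
samples = [speaker + 0.1 * rng.normal(size=256) for _ in range(3)]

reference = average_embeddings(samples)
attempt = speaker + 0.1 * rng.normal(size=256)  # same "speaker", new noise
stranger = rng.normal(size=256)                 # unrelated vector

print(cosine_similarity(reference, attempt))
print(cosine_similarity(reference, stranger))
```

Cosine similarity ignores vector length, so the averaged reference does not need to be re-normalized before comparison.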