SpeechEmotionRecognition

Determine the importance of speech and text features for the task of emotion recognition.

Brief Summary

Generate transcripts from the audio files.
Extract text features from the generated transcripts.
Extract audio features such as MSF (modulation spectral features) and eGEMAPS from the audio files.
Use the extracted features to predict the emotion.
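
The README does not say how the text and audio features are combined or which classifier is used. As a rough illustration only, the sketch below assumes early fusion (concatenating the two feature vectors) and substitutes a simple nearest-centroid classifier for whatever model the project actually trains. `fuse`, `train_centroids`, and `predict` are hypothetical names, and the feature values are toy numbers, not MELD features.

```python
import math
from typing import Dict, List, Sequence


def fuse(text_feats: Sequence[float], audio_feats: Sequence[float]) -> List[float]:
    """Early fusion (assumed): concatenate the text and audio feature vectors."""
    return list(text_feats) + list(audio_feats)


def train_centroids(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Average the fused feature vectors belonging to each emotion label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy two-dimensional "text" and one-dimensional "audio" features (made up).
X = [fuse([0.9, 0.1], [0.8]), fuse([0.8, 0.2], [0.9]),
     fuse([0.1, 0.9], [0.2]), fuse([0.2, 0.8], [0.1])]
y = ["anger", "anger", "sadness", "sadness"]
centroids = train_centroids(X, y)
print(predict(centroids, fuse([0.85, 0.15], [0.7])))  # anger
```

Early fusion keeps the comparison between text-only, audio-only, and combined features simple: dropping one half of the concatenated vector gives the single-modality baseline.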

Note: Only two emotions, anger and sadness, were considered in the experiments. They were chosen to represent high and low arousal, respectively.
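
A minimal sketch of restricting the data to these two classes, assuming the transcript CSVs have an `Emotion` column with lowercase labels as in the MELD release, and tagging anger as high arousal and sadness as low arousal. `select_anger_sadness` and the inline sample rows are hypothetical; real code would read the files in text_csv or noise_csv instead.

```python
import csv
import io
from typing import Dict, List

# Assumed mapping: anger stands in for high arousal, sadness for low arousal.
AROUSAL = {"anger": "high", "sadness": "low"}


def select_anger_sadness(csv_text: str) -> List[Dict[str, str]]:
    """Keep rows whose Emotion is anger or sadness and tag each with its arousal level."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        emotion = row["Emotion"].strip().lower()
        if emotion in AROUSAL:
            row["Arousal"] = AROUSAL[emotion]
            rows.append(row)
    return rows


# Invented sample rows with a MELD-like header (not real MELD data).
sample = """Utterance,Speaker,Emotion
"I can't believe you did that!",A,anger
Oh. I'm so sorry.,B,sadness
Hi!,C,joy
"""
selected = select_anger_sadness(sample)
print([(r["Speaker"], r["Arousal"]) for r in selected])  # [('A', 'high'), ('B', 'low')]
```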

Libraries required

tensorflow==2.4.1
numpy==1.19.2
torch==1.9.0
transformers==4.8.2
pandas==1.2.4
librosa==0.8.1
json==2.0.9 (part of the Python standard library; no separate install needed)
scikit-learn==0.24.2 (imported as sklearn)
tqdm==4.61.1
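
To check an environment against these pins, something like the following could be used (Python 3.8+, `importlib.metadata`). It is an illustrative helper, not part of the repository. It keys the pins by pip distribution name (so `scikit-learn` rather than `sklearn`), skips json because it ships with Python, and compares version strings exactly, so builds with local suffixes such as `1.9.0+cu111` will be reported as mismatches.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pins from the list above, keyed by pip distribution name.
PINS: Dict[str, str] = {
    "tensorflow": "2.4.1",
    "numpy": "1.19.2",
    "torch": "1.9.0",
    "transformers": "4.8.2",
    "pandas": "1.2.4",
    "librosa": "0.8.1",
    "scikit-learn": "0.24.2",
    "tqdm": "4.61.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each pinned distribution whose installed version differs to that version (None if missing)."""
    found: Dict[str, Optional[str]] = {}
    for dist, wanted in pins.items():
        got = lookup(dist)
        if got != wanted:
            found[dist] = got
    return found


if __name__ == "__main__":
    for dist, got in mismatches(PINS).items():
        print(f"{dist}: expected {PINS[dist]}, found {got}")
```

The `lookup` parameter exists so the comparison can be exercised without touching the real environment.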

Directory Structure

For more details, see the README.md file in each directory (coming soon).

STFE : Contains object definitions required by all computations and experiments in this project.
text_csv : Contains the CSV files for the clean audio files. These files were taken directly from the MELD dataset.
noise_csv : Contains all transcripts, as .csv files, for various noise configurations of the audio files.
text_test : Contains files that test emotion recognition on text features only. Helps in choosing appropriate text features.
audio_test : Contains files that test emotion recognition on audio features only. Helps in choosing appropriate audio features.
audio_text : Contains files that determine emotion recognition results for combined text and audio features under various noise combinations.
augmentation : Contains files similar to audio_text, but solely for testing augmented data.
result : Contains all results and training loss graphs.
utils : Contains helper files for extracting, processing, and saving data to other locations.
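
The README does not describe how the noisy audio configurations were produced. One common approach, shown here purely as an assumption, is to add a noise recording scaled to a target signal-to-noise ratio (SNR) in decibels. The scale factor s follows from SNR_dB = 10 log10(P_clean / (s^2 P_noise)), solved for s. `mix_at_snr` and `power` are hypothetical helpers working on plain lists of samples; real code would more likely load audio with librosa and use NumPy arrays.

```python
import math
import random
from typing import List, Sequence


def power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Scale `noise` so that 10*log10(P_clean / P_scaled_noise) == snr_db, then add it to `clean`."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]  # trim noise to the clean signal's length
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    scale = math.sqrt(power(clean) / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


# Synthetic example: a 440 Hz tone at 16 kHz plus Gaussian noise at 10 dB SNR.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
residual = [a - b for a, b in zip(noisy, clean)]
print(10 * math.log10(power(clean) / power(residual)))  # very close to 10.0
```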

The dataset currently used is MELD: https://affective-meld.github.io

Citing

S. Zahiri and J. D. Choi. Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks. In The AAAI Workshop on Affective Content Analysis, AFFCON'18, 2018.

S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, R. Mihalcea. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation. ACL 2019.

done-n-dusted/SpeechEmotionRecognition
