A Multi-Modal Approach Towards Music Emotion Recognition

Using Audio, Lyrics and Electrodermal Activity

Image by Gordon Johnson from Pixabay

The Dimensional Model of Emotion

Russell’s 2D Emotional Space
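Russell's circumplex model places every emotion in a 2D plane spanned by valence (pleasantness) and arousal (intensity). As a minimal sketch, the function below maps a (valence, arousal) pair to one of the four quadrants; the quadrant labels are illustrative, not taken from the PMEmo annotations.

```python
# Map a (valence, arousal) pair to one of Russell's four quadrants.
# Values are assumed centred at 0 (e.g. scaled to [-1, 1]); the label
# names are my own illustration, not the dataset's annotation scheme.

def russell_quadrant(valence: float, arousal: float) -> str:
    if valence >= 0 and arousal >= 0:
        return "happy/excited"   # Q1: +valence, +arousal
    if valence < 0 and arousal >= 0:
        return "angry/tense"     # Q2: -valence, +arousal
    if valence < 0:
        return "sad/depressed"   # Q3: -valence, -arousal
    return "calm/relaxed"        # Q4: +valence, -arousal
```
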

♬ Data Preprocessing and Feature Extraction

Variance in the Dataset for 1) Static and 2) Dynamic Annotations

Audio Data
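Dynamic audio features are computed over short overlapping frames of the waveform. As a minimal, NumPy-only sketch of the idea, the function below computes frame-wise RMS energy on a synthetic sine "song"; the frame/hop sizes and the test signal are assumptions, and real pipelines extract much richer sets (MFCCs, spectral statistics, etc.) with toolkits such as openSMILE or librosa.

```python
import numpy as np

# Frame-wise RMS energy: one of the simplest dynamic audio features.
# Frame and hop lengths below are illustrative choices.

def frame_rms(signal: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

# Synthetic 1-second mono signal at 22_050 Hz (assumption, not PMEmo audio)
sr = 22_050
t = np.linspace(0, 1, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t)   # 440 Hz tone, amplitude 0.5
rms = frame_rms(audio)                      # one RMS value per frame
```
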

Electrodermal Activity (EDA) Data

Representation of EDA data for one particular song for 10 different people
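Raw EDA traces differ strongly between listeners (as the per-person plot suggests), so a common first step is to normalise each subject's trace before aggregating across subjects. This is a hedged sketch of that idea on synthetic data; the 10×200 matrix, units, and min-max scheme are assumptions, not the article's exact preprocessing.

```python
import numpy as np

# Synthetic EDA matrix (assumption): 10 listeners x 200 samples
# for the same song, in microsiemens.
rng = np.random.default_rng(0)
eda = rng.uniform(2.0, 12.0, size=(10, 200))

# Min-max normalise each subject's trace to [0, 1] independently,
# removing per-subject baseline and range differences.
mins = eda.min(axis=1, keepdims=True)
maxs = eda.max(axis=1, keepdims=True)
eda_norm = (eda - mins) / (maxs - mins)

# Average across subjects -> one aggregate trace for the song
eda_song = eda_norm.mean(axis=0)
```
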

Lyrical Data

Format of the Lyrics provided in the PMEmo Dataset
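Once lyric text is extracted from the dataset's format, it can be vectorised for a regressor. As a stdlib-only sketch, the function below builds a simple term-frequency vector; real pipelines typically add stop-word removal, stemming, and TF-IDF weighting, and the sample lyric here is invented, not from PMEmo.

```python
from collections import Counter
import re

# Turn raw lyric text into a normalised term-frequency vector.

def lyric_term_freq(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())   # crude tokeniser
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Invented sample lyric line (assumption, for illustration only)
tf = lyric_term_freq("Hold on, hold on to the night")
```
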
Final Data Distribution across the three modalities

♬ Uni-modal Training and Results

Training and Cross-Validation
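K-fold cross-validation partitions the samples into k folds and rotates each fold through the role of validation set. This pure-Python sketch shows the index bookkeeping behind it; library versions (e.g. `sklearn.model_selection.KFold`) add shuffling and seeding on top of the same loop.

```python
# Minimal k-fold split over sample indices.

def kfold_indices(n_samples: int, k: int = 5):
    # Distribute samples as evenly as possible across k folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # Each fold takes one turn as the validation set
    for i in range(k):
        test = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

splits = list(kfold_indices(10, k=5))
```
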

Optimization
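Hyper-parameter optimisation by grid search evaluates every combination in a parameter grid and keeps the best-scoring one. The loop below sketches that idea with a toy objective standing in for a cross-validated RMSE; the grid values and the objective are assumptions, and `sklearn.model_selection.GridSearchCV` wraps this same loop around real model fits.

```python
from itertools import product

# Exhaustive grid search: try every combination, keep the lowest score.

def grid_search(score_fn, grid: dict):
    names = list(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best is None or score < best[0]:   # lower (e.g. RMSE) is better
            best = (score, params)
    return best

# Toy objective (assumption): pretend RMSE is minimised at alpha=0.1, depth=3
best_score, best_params = grid_search(
    lambda alpha, depth: (alpha - 0.1) ** 2 + (depth - 3) ** 2,
    {"alpha": [0.01, 0.1, 1.0], "depth": [2, 3, 5]},
)
```
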

Scoring Metric
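The scoring metric used throughout is root-mean-squared error: the square root of the mean squared difference between annotated and predicted values, where lower is better. A minimal stdlib implementation:

```python
import math

# RMSE = sqrt(mean((y_true - y_pred)^2)); lower is better.

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

# Example with made-up valence values on a [0, 1] scale (assumption)
score = rmse([0.4, 0.6, 0.5], [0.5, 0.5, 0.5])
```
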

Results

RMSE Scores for Regressor Models tested on Static Feature Sets for 1) Arousal and 2) Valence
RMSE Scores for Regressor Models tested on Dynamic Feature Sets for 1) Arousal and 2) Valence

♪ Observations and Analysis

Best Models as per RMSE Scoring Metric

♬ Multi-modal Training and Results

Modelling of 1) Stacking Regressor and 2) Voting Regressor
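Scikit-learn provides both ensemble strategies named in the figure: `StackingRegressor` fits a meta-model on the base models' cross-validated predictions, while `VotingRegressor` simply averages the base models' predictions. The sketch below combines two placeholder base regressors on synthetic data; the base models, target, and data are assumptions, not the article's tuned per-modality models.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import StackingRegressor, VotingRegressor

# Synthetic stand-in for fused multi-modal features (assumption):
# 60 songs x 8 features, with a noisy linear "arousal" target.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))
y = 0.7 * X[:, 0] + rng.normal(scale=0.1, size=60)

# Placeholder base regressors (one could stand per modality)
base = [("ridge", Ridge()), ("tree", DecisionTreeRegressor(max_depth=3))]

# Stacking: a Ridge meta-model learns to weight the base predictions
stack = StackingRegressor(estimators=base, final_estimator=Ridge()).fit(X, y)
# Voting: an unweighted average of the base predictions
vote = VotingRegressor(estimators=base).fit(X, y)

stack_pred = stack.predict(X)
vote_pred = vote.predict(X)
```
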

♪ Results

♪ Analysis and Observations

RMSE Test Scores for Ensemble Estimators
Actual vs Predicted Values for Arousal and Valence using Static Features
Actual vs Predicted Values for Arousal and Valence using Dynamic Features
Residual Plots for 1) Static Ensemble Model and 2) Dynamic Ensemble Model Predictions over Test Data

♬ Conclusion

♬ Future Work and Scope

♬ Acknowledgements

♬ References
