1 - Intelligent Data Systems Laboratory

A Regression Approach
to Music Emotion Recognition
Yi-Hsuan Yang, Yu-Ching Lin, Ya-Fan Su, and Homer H. Chen, Fellow, IEEE
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16,
NO. 2, FEBRUARY 2008
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea
Sung Eun Park
2009-11-20
Contents
• Introduction
  - Simple concept of the model
• Body
  - Regression approach
  - Model explanation
  - Evaluation
• Conclusion
  - Discussion
  - Contribution
  - Q&A
Copyright © 2008 by CEBT
Brief Concept of the Model
[Figure: songs (♬) plotted as points on Thayer's arousal-valence emotion plane.]
An application using this concept
• Musicovery is based on the same concept as this model.
• Clicking a point on the emotion plane finds the music relevant to that point.
Regression Approach
• Many good regressors (regression algorithms) are readily available.
• Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is the feature vector of the ith input sample and y_i ∈ R is the real value to be predicted for the ith sample, the regression system trains a regressor R(∙) such that the mean squared error ε is minimized:
$$\varepsilon = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - R(x_i)\bigr)^2$$
  Here R is the regressor to find, R(x_i) is the predicted value, y_i is the real value, and x_i is a feature vector.
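As a minimal sketch of the objective above, the following Python computes the mean squared error of two candidate regressors and keeps the one with the smaller error. The toy samples and both candidate functions are hypothetical and serve only to illustrate the definition; they are not taken from the paper.

```python
def mean_squared_error(regressor, samples):
    """Return (1/N) * sum((y_i - R(x_i))**2) over (x_i, y_i) pairs."""
    return sum((y - regressor(x)) ** 2 for x, y in samples) / len(samples)


# Hypothetical toy data: (feature vector, real value to predict).
samples = [((0.0, 1.0), 0.5), ((1.0, 0.0), -0.5), ((1.0, 1.0), 0.0)]


def candidate_a(x):
    # Hypothetical regressor: a fixed linear combination of the two features.
    return 0.5 * x[1] - 0.5 * x[0]


def candidate_b(x):
    # Hypothetical baseline that ignores the features.
    return 0.0


errors = {
    "a": mean_squared_error(candidate_a, samples),
    "b": mean_squared_error(candidate_b, samples),
}
# Training amounts to searching for the regressor with the smallest error.
best = min(errors, key=errors.get)
```

A real regression algorithm searches a whole family of functions rather than two fixed candidates, but the quantity it minimizes is the same.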
The model
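[Figure: system overview. Music samples go through feature extraction to yield musical features, and a subjective test yields the ground truth; regression on both trains the regressors Reg.A and Reg.V, whose predictions for a new song are used for emotion visualization.]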
The model in detail
[Figure: detailed system diagram.
Training data: preprocessing → subjective test and feature extraction → regressor training → Reg.A and Reg.V.
Test data: preprocessing → feature extraction → Reg.A and Reg.V → emotion visualization.]
An Issue of the Continuous Perspective
• The two dimensions, arousal and valence, depend on each other.
  - What is positive music?
  - Then what is energetic music?
• Principal Component Analysis (PCA) is a common way of reducing the correlation between variables.
[Figure: original data along the calm-energetic axis, with the principal components computed by PCA.]
Reducing Correlation Between Variables
• AV plane: some dependency exists.
• PC plane: no dependency exists.
• Test in the PC plane (axes p and q) and compare with the AV plane.
• Details follow later in the presentation.
• Train the regressors Rp and Rq.
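The decorrelation step can be sketched for the two-dimensional case, where PCA reduces to rotating the centered data onto the eigenvectors of its 2×2 covariance matrix. The annotation values below are hypothetical, and the closed-form rotation angle is just one convenient way to compute PCA in two dimensions; the slides do not say how the authors computed it.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def pca_rotate_2d(points):
    """Center 2-D points and rotate them onto their principal axes (p, q)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    a = covariance(xs, xs)
    b = covariance(xs, ys)
    c = covariance(ys, ys)
    # Angle of the first principal axis of the covariance matrix [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        ((x - mx) * cos_t + (y - my) * sin_t,
         -(x - mx) * sin_t + (y - my) * cos_t)
        for x, y in points
    ]


# Hypothetical (valence, arousal) annotations with visible correlation.
av_points = [(-0.8, -0.6), (-0.4, -0.5), (0.0, 0.1), (0.4, 0.3), (0.8, 0.9)]
pq_points = pca_rotate_2d(av_points)

cov_av = covariance([v for v, _ in av_points], [a for _, a in av_points])
ps = [p for p, _ in pq_points]
qs = [q for _, q in pq_points]
cov_pq = covariance(ps, qs)  # close to zero after the rotation
```

After the rotation the covariance between p and q vanishes up to rounding, which is the "no dependency" property claimed for the PC plane.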
Dataset
• 195 popular songs selected from a number of Western, Chinese, and Japanese albums.
• Selection criteria:
  1) These songs should be distributed uniformly in each quadrant of the emotion plane.
  2) Each music sample should express a certain dominant emotion.
Subjective Test
• 253 volunteers from the campus took part.
• Each volunteer was asked to listen to ten music samples randomly drawn from the music database and to label the AV values from -1.0 to 1.0 in 11 ordinal levels.
• Volunteers labeled the evoked emotion rather than the perceived one.
• The standard deviation of the evaluations of the same song is 0.3 (which is acceptable).
• The same person tends to give the same label to the same music.
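The labeling scale and the per-song statistics can be sketched as follows. The 11 levels follow from the range -1.0 to 1.0 stated above; averaging the labels to obtain one ground-truth point per song is an assumption made for illustration (the slides do not state the aggregation rule), and the ratings are hypothetical.

```python
from statistics import mean, pstdev

# The 11 ordinal levels between -1.0 and 1.0: -1.0, -0.8, ..., 0.8, 1.0.
LEVELS = [round(-1.0 + 0.2 * k, 1) for k in range(11)]


def snap_to_level(value):
    """Map a raw value to the nearest of the 11 ordinal levels."""
    return min(LEVELS, key=lambda level: abs(level - value))


def summarize_song(ratings):
    """ratings: list of (arousal, valence) labels given to one song.

    Assumption: the ground truth is taken as the mean of all labels.
    """
    arousal = [a for a, _ in ratings]
    valence = [v for _, v in ratings]
    return {
        "arousal": mean(arousal),
        "valence": mean(valence),
        "arousal_std": pstdev(arousal),
        "valence_std": pstdev(valence),
    }


# Hypothetical labels from three volunteers for one song.
ratings = [(0.8, 0.4), (0.6, 0.2), (1.0, 0.6)]
summary = summarize_song(ratings)
```

The standard deviations returned here correspond to the per-song spread that the slide reports as 0.3 for the real data.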
Feature Extraction
• PsySound aims to model parameters of auditory sensation based on some psychoacoustic models.
• Earlier research found that 15 of the features are more closely related to emotion perception.
Feature Extraction
• Select, from all extracted features, those that are related to emotion.
• RReliefF is used as the feature selection algorithm (FSA).
• RRF_{m,n} denotes the feature space built from the top-m and top-n selected features.
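The idea of keeping only the top-ranked features can be sketched as below. Note that this sketch does not implement RReliefF: it ranks features by absolute Pearson correlation with the target, a much simpler stand-in chosen so the example stays short and self-contained. The feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def top_features(feature_columns, target, count):
    """Return the names of the `count` features most correlated with target.

    Stand-in ranking (absolute correlation), not RReliefF.
    """
    scores = {name: abs(pearson(column, target))
              for name, column in feature_columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:count]


# Hypothetical feature columns, one value per song.
feature_columns = {
    "loudness": [0.1, 0.4, 0.6, 0.9],
    "sharpness": [0.5, 0.5, 0.4, 0.6],
    "noise": [0.3, 0.9, 0.1, 0.5],
}
arousal = [-0.8, -0.2, 0.3, 0.9]  # hypothetical ground truth

selected = top_features(feature_columns, arousal, 2)
```

Running the same ranking once against arousal and once against valence, with different cut-offs, gives two feature subsets in the spirit of the top-m and top-n notation above.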
Regression Algorithms
Three regression algorithms:
1. Multiple linear regression (MLR)
• Assumes a linear relationship
• Simple method
2. Support vector regression (SVR)
• Nonlinearly maps input features into a higher-dimensional feature space
• In many cases superior to existing machine learning methods
3. AdaBoost.RT (BoostR)
• Nonlinear regression algorithm
• A number of regression trees are trained iteratively and weighted according to their prediction accuracy
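Of the three algorithms, only MLR is simple enough to sketch without a library. The following solves the normal equations with Gaussian elimination; SVR and AdaBoost.RT are not shown. The data are hypothetical and generated from a known linear rule so the fitted weights can be checked.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * w = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [value] for row, value in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    weights = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(aug[row][k] * weights[k] for k in range(row + 1, n))
        weights[row] = (aug[row][n] - rest) / aug[row][row]
    return weights


def fit_mlr(features, targets):
    """Least-squares weights [bias, w1, ..., wd] from the normal equations."""
    rows = [[1.0] + list(x) for x in features]
    dim = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(dim)]
           for i in range(dim)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(dim)]
    return solve_linear_system(xtx, xty)


def predict(weights, x):
    """Apply the fitted linear model to one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Hypothetical data generated from y = 0.1 + 0.5 * x1 - 0.3 * x2.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [0.1 + 0.5 * x1 - 0.3 * x2 for x1, x2 in features]
weights = fit_mlr(features, targets)
```

In the system described here, one such model would be fitted for arousal and another for valence; the nonlinear alternatives replace `fit_mlr` and `predict` while keeping the same interface.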
Evaluation
• Method
  - R² statistics: show how close the predictions are to the real values.
• AV and PC plane comparison
  - The effect of the dependency between the two dimensions
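For reference, the R² statistic is commonly defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition; the values are hypothetical and the slides do not spell out the exact formula the authors used.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical ground-truth arousal values and two sets of predictions.
actual = [-0.8, -0.2, 0.4, 0.6]
perfect = list(actual)
baseline = [sum(actual) / len(actual)] * len(actual)  # always predicts the mean

r2_perfect = r_squared(actual, perfect)
r2_baseline = r_squared(actual, baseline)
```

A perfect predictor scores 1 and a predictor that always returns the mean scores 0, so a value such as 58.3% means the regressor explains a little over half of the variance in the labels.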
Evaluation
[Figure: regressor comparison across the selected feature spaces, in the AV plane and in the PC plane (a plane with no correlation). Annotations mark the best combination and the comparisons with no significant difference.]
Evaluation – The Prediction Accuracy
[Figure: ground truth (+) and prediction results on the emotion plane.]
The best performance of the regression approach reaches 58.3% for arousal and 28.1% for valence in terms of the R² statistic, using the PC + RRF + SVR combination.
Performance Evaluation
• Using the same ground truth data and feature data
=100.3
=117.7
Discussion
• Subjectivity issue
  - Individual differences: emotion perception is influenced by many factors, such as cultural background, generation, sex, and personality.
• GWMER (group-wise MER scheme)
[Figure: users are divided into groups G1, G2, G3, G4, …; regressors R1, R2, R3, R4, … are trained, one per group, and the regressor is chosen according to the user's group.]
• Personalization can be an alternative.
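A group-wise scheme of this kind can be sketched as a dictionary of regressors keyed by user group. Everything below is hypothetical: the group names, the single-feature line fit used as the per-group regressor, and the annotation values are illustrative assumptions, not the authors' implementation.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def train_groupwise(annotations):
    """annotations: dict mapping group name -> list of (feature, label) pairs."""
    regressors = {}
    for group, pairs in annotations.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        regressors[group] = fit_line(xs, ys)
    return regressors


def predict_for_user(regressors, user_group, feature):
    """Choose the regressor trained for the user's group and apply it."""
    slope, intercept = regressors[user_group]
    return slope * feature + intercept


# Hypothetical annotations: the two groups label the same songs differently.
annotations = {
    "G1": [(0.0, -0.5), (1.0, 0.5)],
    "G2": [(0.0, 0.0), (1.0, 0.2)],
}
regressors = train_groupwise(annotations)
g1_prediction = predict_for_user(regressors, "G1", 0.5)
g2_prediction = predict_for_user(regressors, "G2", 0.5)
```

Personalization is the limiting case of the same structure, with one regressor per user instead of one per group.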
Contribution
• One of the first attempts to develop an MER system from a continuous perspective (each song maps to a point in the emotion plane).
• A sound theoretical foundation is proposed.
  - Regression theory
• Extensive performance study
  - Several algorithms are tested.
• Dealing with the subjectivity issues of Music Emotion Recognition (MER)
  - Emotion differs from person to person.
  - The two dimensions of the emotion plane are not independent.
Q&A
Thank you…