A Regression Approach to Music Emotion Recognition
Yi-Hsuan Yang, Yu-Ching Lin, Ya-Fan Su, and Homer H. Chen, Fellow, IEEE
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2008

Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea
Sung Eun Park
2009-11-20

Contents
• Introduction: simple concept of the model
• Body: regression approach, model explanation, evaluation
• Conclusion: discussion, contribution
• Q&A

Copyright 2008 by CEBT

Brief Concept of the Model
(Figure: music pieces (♬) plotted as points on Thayer's arousal-valence emotion plane.)

An Application Using This Concept
Musicovery is based on the same concept as this model: clicking a point on the emotion plane finds the relevant music.

Regression Approach
Many good regressors (regression algorithms) are readily available. Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is the feature vector of the ith input sample and y_i ∈ ℝ is the real value to be predicted for the ith sample, the regression system trains a regressor R(·) such that the mean squared error ε is minimized:

$$\varepsilon = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - R(x_i)\bigr)^2$$

where R(x_i) is the predicted value for the feature vector x_i and y_i is the real value.

The Model
• Music pieces (♬) → feature extraction → musical features
• Music pieces (♬) → subjective test → ground truth
• Musical features + ground truth → regression → regressors Reg.A (arousal) and Reg.V (valence) → emotion visualization of new music (♬)

The Model in Detail
• Training: training data → preprocessing → subjective test and feature extraction → regressor training → Reg.A, Reg.V
• Test: test data → preprocessing → feature extraction → Reg.A, Reg.V → emotion visualization

An Issue of the Continuous Perspective
The two dimensions, arousal and valence, depend on each other. What is positive music? And then what is energetic music? Ratings on one dimension tend to be correlated with ratings on the other. Principal Component Analysis (PCA) is a common way of reducing the correlation between variables.
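To illustrate how PCA removes the arousal-valence dependency, here is a minimal NumPy sketch. It assumes, as the slides suggest, that PCA is applied to the AV ground-truth labels; the data are synthetic and hypothetical (not the paper's ratings), and all variable names are illustrative. The centred labels are rotated onto the eigenvectors of their covariance matrix, giving PC-plane coordinates (p, q) whose sample correlation is zero.

```python
import numpy as np

# Hypothetical AV ratings in [-1, 1]; valence is made to depend on arousal.
rng = np.random.default_rng(0)
arousal = rng.uniform(-1, 1, size=500)
valence = np.clip(0.6 * arousal + 0.3 * rng.normal(size=500), -1, 1)
av = np.column_stack([arousal, valence])

# PCA: eigen-decompose the covariance matrix of the centred AV labels.
mean = av.mean(axis=0)
centred = av - mean
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
components = eigvecs[:, order]           # columns are principal components

# Project onto the principal components: coordinates (p, q) in the PC plane.
pq = centred @ components

# Predictions made in the PC plane can be mapped back to the AV plane.
av_back = pq @ components.T + mean

corr_av = np.corrcoef(av, rowvar=False)[0, 1]
corr_pq = np.corrcoef(pq, rowvar=False)[0, 1]
print(f"AV correlation: {corr_av:.2f}, PC correlation: {corr_pq:.2e}")
```

Because the component matrix is orthogonal, outputs of regressors trained on p and q can be returned to the AV plane with a single matrix product, as `av_back` shows.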
(Figure: original data in the emotion plane, with the arousal axis running from calm to energetic, and the principal components computed by PCA.)

Reducing Correlation Between Variables
• AV plane: some dependency exists.
• PC plane: no dependency exists.
Regressors R_p and R_q are trained and tested in the PC plane (axes p and q), and the results are compared with those in the AV plane. Details follow later in the presentation.

Dataset
195 popular songs were selected from a number of Western, Chinese, and Japanese albums according to two criteria:
1) The songs should be distributed uniformly across the quadrants of the emotion plane.
2) Each music sample should express a certain dominant emotion.

Subjective Test
Each of 253 volunteers from the campus was asked to listen to ten music samples randomly drawn from the music database and to label the AV values from -1.0 to 1.0 in 11 ordinal levels.
• Subjects labeled the emotion evoked in them rather than the perceived one.
• The standard deviation of the ratings of the same song is 0.3, which is acceptable.
• The same person tends to give the same labels to the same music.

Feature Extraction
• PsySound aims to model parameters of auditory sensation based on psychoacoustic models.
• Earlier research found that 15 of these features are more closely related to emotion perception.
• The features related to emotion are selected from all extracted features. RReliefF is used as the feature selection algorithm (FSA).
• RRF_{m,n} denotes the space formed by the top-m and top-n selected features.
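RReliefF scores each feature by how well differences in that feature between nearest neighbours track differences in the target value; ranking the scores and keeping the top entries yields a selected feature space. Below is a minimal NumPy sketch under stated assumptions: the function `rrelieff` and its parameters are hypothetical (not the authors' code), features and target are min-max normalized, neighbours are found by Euclidean distance, and each of the k neighbours gets a uniform weight 1/k instead of the rank-based weighting of the original RReliefF algorithm.

```python
import numpy as np


def rrelieff(X, y, k=10, n_iter=None, seed=0):
    """Simplified RReliefF weights for a numeric target (hypothetical helper).

    Simplification: uniform neighbour weights (1/k) replace the rank-based
    neighbour weighting of the original algorithm.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    m = n if n_iter is None else n_iter
    rng = np.random.default_rng(seed)

    # Normalise every feature and the target to [0, 1] so diff() = |a - b|.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    yn = (y - y.min()) / ((y.max() - y.min()) or 1.0)

    n_dc = 0.0             # weighted difference in the target
    n_da = np.zeros(d)     # weighted difference per feature
    n_dc_da = np.zeros(d)  # weighted joint (target and feature) difference

    for _ in range(m):
        i = rng.integers(n)                    # random reference instance
        dist = np.linalg.norm(Xn - Xn[i], axis=1)
        dist[i] = np.inf                       # exclude the instance itself
        nbrs = np.argsort(dist)[:k]            # k nearest neighbours
        w = 1.0 / k                            # uniform neighbour weight
        d_target = np.abs(yn[nbrs] - yn[i])    # shape (k,)
        d_feat = np.abs(Xn[nbrs] - Xn[i])      # shape (k, d)
        n_dc += w * d_target.sum()
        n_da += w * d_feat.sum(axis=0)
        n_dc_da += w * (d_target[:, None] * d_feat).sum(axis=0)

    # W[A] ~ P(diff A | diff target) - P(diff A | similar target).
    return n_dc_da / n_dc - (n_da - n_dc_da) / (m - n_dc)


# Usage on synthetic data: only feature 0 determines the target.
data_rng = np.random.default_rng(1)
X_demo = data_rng.uniform(size=(200, 3))
y_demo = X_demo[:, 0] + 0.05 * data_rng.normal(size=200)
weights = rrelieff(X_demo, y_demo, k=10)
ranking = np.argsort(weights)[::-1]  # feature indices, best first
top_m = ranking[:2]                  # e.g. keep the top-2 features
```

The uniform 1/k weighting keeps the sketch short; the full algorithm additionally down-weights more distant neighbours, which mainly matters when k is large.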
Regression Algorithms
Three regression algorithms are compared:
1. Multiple linear regression (MLR)
• Assumes a linear relationship
• Simple method
2. Support vector regression (SVR)
• Nonlinearly maps input features into a higher-dimensional feature space
• In many cases superior to existing machine learning methods
3. AdaBoost.RT (BoostR)
• Nonlinear regression algorithm
• A number of regression trees are trained iteratively and weighted according to their prediction accuracy

Evaluation Method
• R² statistic: shows how close the predicted values are to the real values.
• AV and PC plane comparison: shows the effect of the dependency between the two dimensions.
(Table: results annotated with the best combination and with a case showing no significant difference.)

Evaluation
(Table: comparison across regressors, selected feature spaces, and the PC plane, a plane with no correlation.)

Evaluation: The Prediction Accuracy
(Figure: ground truth vs. prediction result in the emotion plane.)
The best performance of the regression approach reaches an R² of 58.3% for arousal and 28.1% for valence, obtained with the PC plane, RRF feature selection, and SVR.

Performance Evaluation
Using the same ground-truth data and feature data: = 100.3, = 117.7

Discussion
• Subjectivity issue: individual differences arise from the influence of many factors, such as cultural background, generation, sex, and personality.
• GWMER (group-wise MER scheme): users are divided into groups G1, G2, G3, G4, …; a regressor R1, R2, R3, R4, … is trained for each group, and the regressor matching the user's group is chosen.
• Personalization is an alternative approach.

Contribution
• One of the first attempts to develop an MER system from a continuous perspective (each song maps to a point in the emotion plane).
• A sound theoretical foundation is proposed: regression theory.
• Extensive performance study: several algorithms are tested.
• Dealing with the subjectivity issues of music emotion recognition (MER):
  emotion differs from person to person.
• Addressing the dependency between the two dimensions of the emotion plane (PC plane).

Q&A
Thank you.