Low-Level Fusion of Audio and Video Feature for Multi-modal Emotion Recognition

Chair for Image Understanding and Knowledge-based Systems
Institute for Informatics
Technische Universität München

Sylvia Pietzsch
2008, January 23rd

Overview
- Video low-level descriptors
- Model-based image interpretation
- Structural features
- Temporal features
- Audio low-level descriptors
- Combining video and audio descriptors
- Experimental results
- Conclusion and outlook

Model-based Image Interpretation
- The model: contains a parameter vector that represents the model's configuration.
- The objective function: calculates a value that indicates how accurately a parameterized model matches an image.
- The fitting algorithm: searches for the model parameters that describe the image best, i.e., it minimizes the objective function.

Local Objective Functions

Ideal Objective Functions
- P1, correctness property: the global minimum corresponds to the best fit.
- P2, uni-modality property: the objective function has no local extrema other than the global minimum.
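The following toy one-dimensional sketch shows how the model, the objective function, and the fitting algorithm divide the work, and why P2 matters. It is an illustration under assumptions, not the method from the talk. The "model" is a single position parameter `x`. `ideal_objective` measures the distance to an assumed known correct position `c`. `bumpy_objective` and the greedy search `fit` are hypothetical helpers. A local search reaches the global minimum of the uni-modal objective but gets trapped in a local minimum once P2 is violated.

```python
import math
from typing import Callable

# An objective maps (candidate parameter x, correct position c) to a cost.
Objective = Callable[[float, float], float]


def ideal_objective(x: float, c: float) -> float:
    """|c - x|: global minimum at c (P1) and no other local minima (P2)."""
    return abs(c - x)


def bumpy_objective(x: float, c: float) -> float:
    """Hypothetical objective: for integer c its global minimum is still at c
    (P1 holds), but the cosine term adds local minima near integers (P2 fails)."""
    return abs(c - x) + 2.0 * (1.0 - math.cos(2.0 * math.pi * x))


def fit(objective: Objective, x0: float, c: float,
        step: float = 0.01, max_iter: int = 10_000) -> float:
    """Hypothetical fitting algorithm: greedy local search with a fixed step."""
    x = x0
    for _ in range(max_iter):
        here = objective(x, c)
        left, right = objective(x - step, c), objective(x + step, c)
        if left < here and left <= right:
            x -= step
        elif right < here:
            x += step
        else:
            break  # no neighbor is better: a (possibly only local) minimum
    return x


if __name__ == "__main__":
    print(fit(ideal_objective, x0=0.0, c=3.0))  # close to 3.0, the global minimum
    print(fit(bumpy_objective, x0=0.0, c=3.0))  # stays near 0.0, a local minimum
```

The greedy search stands in for whatever fitting algorithm is actually used. The point it makes does not depend on that choice: any purely local search is only guaranteed to find the best fit when the objective function satisfies both P1 and P2.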
[Figure: example objective functions illustrating ¬P1, P1, ¬P2, and P2]
- Ideal objective functions do not exist for real-world images.
- They exist only for annotated images: $f_n(I, x) = |c_n - x|$, where $c_n$ is the annotated (correct) position of contour point $n$ in image $I$ and $x$ is a candidate position.

Learning the Objective Function
- The ideal objective function generates the training data.
- A machine learning technique generates the calculation rules.

Skin Color Extraction
- Location of contour lines and skin-colored parts
- Adaptive to the image context conditions
[Figure: original images with the results of the fixed classifier and the adapted classifier]

Correctly detected pixels:

  Image   Fixed classifier   Adapted classifier
  1       90.4%              97.5%
  2       74.8%              87.5%
  3       40.2%              97.0%

Structural Features
- The deformation parameters describe a distinctive state of the face.

Temporal Features
- Facial expressions emerge from muscle activity.
- Optical flow vectors are calculated at equally distributed feature points connected to the shape model.

Audio Low-level Descriptors
- Aim: independence of phonetic content and speaker
- Coverage of prosodic, articulatory, and voice quality aspects
- 20 ms frames, 50% overlap, Hamming window function
- Zero crossing rate (ZCR)
- Pitch
- 7 formants
- Energy
- Spectral development
- Harmonics-to-noise ratio (HNR)
- Durations of voiced sounds (determined by HNR)
- Durations of silences (determined by bi-state energy)
- SMA (simple moving average) filtering of the LLDs
- Addition of 1st- and 2nd-order LLD regression coefficients

Combining Audio and Video LLDs
- Time series are constructed for the LLDs (separately for audio and video).
- Functionals are applied to the combined low-level descriptors:
  - Linear moments (mean, standard deviation)
  - Quartiles
  - Durations
- Resulting feature vector: 276 audio features and 1048 video features, classified by an SVM

Experimental Results (1)
- Database: Airplane Behavior Corpus
  - Guided storyline
  - 8 subjects (25 to 48 years old)
  - 11.5 hours of video in total
- 10-fold stratified cross-validation
- Feature pre-selection by SVM-SFFS (sequential forward floating search)

                  Audio   Video   Audiovisual
  Features [#]       92     156           200
  Accuracy [%]     73.7    61.1          81.8

Experimental Results (2)
- Main confusions: neutral vs. nervous, cheerful vs. intoxicated
- Aggressive behavior is recognized best.

Conclusion and Outlook
- The combined feature set is superior to the individual audio and video feature sets (81.8% vs. 73.7% and 61.1% accuracy).
- Future work:
  - Investigation of further data sets
  - Comparison to late fusion approaches
  - Performance of asynchronous feature fusion
  - Application of hierarchical functionals

Thank you!
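The audio front end from the "Audio Low-level Descriptors" slide can be sketched for a few of the listed descriptors as follows. The sketch covers framing into 20 ms Hamming-windowed frames with 50% overlap, zero crossing rate, log energy, SMA smoothing, and 1st- and 2nd-order regression coefficients. Pitch, formants, spectral development, HNR, and the duration features are omitted. Several details are assumptions, not taken from the slides: the SMA width of 3, the regression window `w = 2`, the border clamping, and the dB energy scale.

```python
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window w(i) = 0.54 - 0.46 cos(2 pi i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal: List[float], sample_rate: int,
                 frame_ms: float = 20.0, overlap: float = 0.5) -> List[List[float]]:
    """Cut the signal into Hamming-windowed frames (20 ms, 50% overlap by default)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    window = hamming(frame_len)
    return [[s * w for s, w in zip(signal[start:start + frame_len], window)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    changes = sum((a >= 0.0) != (b >= 0.0) for a, b in zip(frame, frame[1:]))
    return changes / (len(frame) - 1)


def log_energy(frame: List[float], eps: float = 1e-10) -> float:
    """Frame energy in dB; eps avoids log(0) on silent frames."""
    return 10.0 * math.log10(sum(s * s for s in frame) + eps)


def sma(values: List[float], width: int = 3) -> List[float]:
    """Centered simple moving average; the window shrinks at the borders."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def deltas(values: List[float], w: int = 2) -> List[float]:
    """Regression coefficients d_t = sum_k k (x_{t+k} - x_{t-k}) / (2 sum_k k^2)."""
    n = len(values)
    denom = 2.0 * sum(k * k for k in range(1, w + 1))

    def at(t: int) -> float:  # clamp indices at the borders of the series
        return values[min(max(t, 0), n - 1)]

    return [sum(k * (at(t + k) - at(t - k)) for k in range(1, w + 1)) / denom
            for t in range(n)]


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 220 * i / sr) for i in range(sr)]  # 1 s tone
    frames = frame_signal(tone, sr)                 # 320-sample frames, hop 160
    energy = sma([log_energy(f) for f in frames])   # smoothed LLD contour
    d_energy = deltas(energy)                       # 1st-order coefficients
    dd_energy = deltas(d_energy)                    # 2nd-order coefficients
    print(len(frames), len(energy), len(dd_energy))
```

Each descriptor ends up as one value per frame. These per-frame contours are the LLD time series that the functionals are later applied to.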
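The "Combining Audio and Video LLDs" slide can be read as feature-level fusion, sketched below under that assumption. Each audio and video LLD time series of one sample is summarized by functionals, and the results are concatenated into a single vector for the SVM. The slides do not spell out the exact functional set behind the 276 audio and 1048 video features. The "duration" functional here (share of frames above the series mean) is hypothetical, as are the helper names `functionals` and `fuse`.

```python
import statistics
from typing import Dict, List


def functionals(series: List[float]) -> List[float]:
    """Map one LLD time series (at least 2 values) to a fixed number of values."""
    mean = float(statistics.mean(series))
    std = statistics.pstdev(series)                       # linear moments
    q1, q2, q3 = statistics.quantiles(series, n=4, method="inclusive")  # quartiles
    # Hypothetical duration functional: share of frames above the mean.
    above = sum(1 for v in series if v > mean) / len(series)
    return [mean, std, q1, q2, q3, above]


def fuse(audio_llds: Dict[str, List[float]],
         video_llds: Dict[str, List[float]]) -> List[float]:
    """Apply the functionals to every LLD series; concatenate audio, then video."""
    vector: List[float] = []
    for llds in (audio_llds, video_llds):
        for name in sorted(llds):  # fixed channel order across samples
            vector.extend(functionals(llds[name]))
    return vector
```

Because every series, regardless of its length or frame rate, is mapped to the same number of values, audio and video need no frame-level alignment in this reading. That is one practical reason functionals suit fusing streams with different sampling rates.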
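The feature pre-selection step (SFFS, sequential forward floating search) can be sketched in its textbook form: greedy inclusion followed by conditional exclusion while that improves on the best subset seen at the smaller size. The talk uses an SVM's accuracy as the criterion (SVM-SFFS). To keep the sketch self-contained, that criterion is abstracted here as a hypothetical `score` callback over sets of feature indices, where higher is better.

```python
from typing import Callable, Dict, FrozenSet


def sffs(n_features: int, k: int,
         score: Callable[[FrozenSet[int]], float]) -> FrozenSet[int]:
    """Sequential forward floating search for a subset of k feature indices."""
    if not 0 < k <= n_features:
        raise ValueError("k must be in 1..n_features")
    best_score: Dict[int, float] = {}        # best criterion per subset size
    best_set: Dict[int, FrozenSet[int]] = {}
    selected: FrozenSet[int] = frozenset()

    def record(subset: FrozenSet[int]) -> bool:
        """Store subset if it beats the best known subset of the same size."""
        s = score(subset)
        if s > best_score.get(len(subset), float("-inf")):
            best_score[len(subset)] = s
            best_set[len(subset)] = subset
            return True
        return False

    while len(selected) < k:
        # Inclusion: add the feature that maximizes the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        added = max(remaining, key=lambda f: score(selected | {f}))
        selected = selected | {added}
        record(selected)
        # Conditional exclusion: drop the least significant feature as long as
        # the reduced subset beats every subset of that size seen so far.
        while len(selected) > 2:
            worst = max(sorted(selected), key=lambda f: score(selected - {f}))
            reduced = selected - {worst}
            if not record(reduced):
                break
            selected = reduced
    return best_set[k]


if __name__ == "__main__":
    # Toy criterion: feature 0 is best alone, but features 1 and 2 are synergistic.
    table = {
        frozenset({0}): 0.6, frozenset({1}): 0.5, frozenset({2}): 0.5,
        frozenset({3}): 0.1, frozenset({0, 1}): 0.7, frozenset({0, 2}): 0.7,
        frozenset({0, 3}): 0.65, frozenset({1, 2}): 0.95, frozenset({1, 3}): 0.55,
        frozenset({2, 3}): 0.55, frozenset({0, 1, 2}): 0.96,
        frozenset({0, 1, 3}): 0.75, frozenset({0, 2, 3}): 0.75,
        frozenset({1, 2, 3}): 0.97,
    }
    print(sorted(sffs(4, 3, lambda s: table.get(s, 0.0))))
```

With this toy table, plain forward selection keeps the individually strongest feature 0 and ends at {0, 1, 2}. The conditional exclusion step discards feature 0 once {1, 2} turns out better, and the search ends at {1, 2, 3}. The "floating" backtracking is what distinguishes SFFS from plain forward selection.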