
Perceptually Guided
Expressive Facial Animation
Zhigang Deng and Xiaohan Ma
Computer Graphics and Interactive Media Lab
Department of Computer Science
University of Houston
http://graphics.cs.uh.edu
Talk Outline
• Motivation
• Related Work
• Our Approach
– Construction of facial perceptual metric
– Perceptually Guided Facial Animation Algorithms
• Results and User Studies
• Conclusions and Discussion
Motivations
• How to efficiently measure and synthesize realistic
expressive facial animation?
– The ultimate measuring stick is human perception.
– Humans are intrinsically sensitive to the subtlety of animated faces
• The current popular means is to conduct subjective user
studies (an offline, post-production evaluation tool)
– Not automated (tedious human involvement)
– Inefficient (time-consuming experiment setup and user
study)
– Costly (participant cost)
Related Work
• Facial animation techniques
– Geometric deformation [Singh and Fiume 98, Noh and Neumann 01, Sumner and
Popovic 04], physically-based [Lee et al. 95, Sifakis et al. 05], performance-
driven facial animation [Williams 90], facial expression synthesis and
editing [Zhang et al. 03, Joshi et al. 03]
• Data-driven approaches for facial animation [Brand 99, Chuang et al.
02, Vlasic et al. 05, Wampler et al. 07, Bregler et al. 97, Kshirsagar and Thalmann 03, Cao et
al. 04, Deng and Neumann 06]
– Focus on the mathematical accuracy/efficiency of their algorithms
– Little attention has been paid to the perceptual aspects of these algorithms
• Automatic analysis of facial expressions in computer vision
community [Pantic and Rothkrantz 00, Tian et al. 01, Valstar and Pantic 06]
– Focus on analysis side of facial expressions
Related Work
• User studies for character animation [Hodgins et al. 98, Sullivan and
Dingliana 01, Sullivan et al. 03, Watson et al. 01, Reitsma and Pollard 03, Wang and
Bodenheimer 04, McDonnell et al. 06, McDonnell et al. 07]
– Measure the association between human perception and factors of
character animation
• Subjective evaluations have also been conducted to gain
human perceptual insight on facial animation [Cunningham et al.
03, Cunningham et al. 04, Wallraven et al. 05, Wallraven et al. 08, Geiger et al. 03,
Cosker et al. 05]
– Most of these efforts are centered on the qualitative side
• Our work
– Aim to quantitatively model (as a perceptual metric) the association
between facial motion and perceptual outcomes
– Further exploit this perceptual metric to build perceptually guided
expressive facial animation algorithms
Our Work
• A novel computational facial perceptual metric (FacePEM)
– Measure and predict the expressiveness (type/scale) of synthetic expressive facial animations
– Learn a statistical perceptual prediction model (FacePEM) that can measure and predict the perceptual outcomes of arbitrary facial motion
• FacePEM-guided facial animation algorithms
– Perceptual metric-guided speech
animation synthesis
– Expressive facial motion editing
enhanced with expressiveness cues
Construction of FacePEM
Data Acquisition
• Optical motion capture
system for facial motion
capture
• 103 facial markers (95
face markers, 4 head
markers, and 4 neck
markers)
• Four human subjects
spoke sentences with four
emotions (happy, angry,
sad, and neutral)
• Remove head motion and
align facial motions of
different subjects
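A minimal sketch of the head motion removal step, assuming a Kabsch/Procrustes-style rigid alignment of the 4 head markers to a reference pose (the marker layout and function names are illustrative, not taken from the paper):

```python
# Hedged sketch: remove rigid head motion from one captured frame by aligning
# its head markers to a reference pose (Kabsch algorithm). Illustrative only.
import numpy as np

def remove_head_motion(frame, head_idx, ref_head):
    """frame: (103, 3) marker positions; head_idx: indices of the 4 head
    markers; ref_head: (4, 3) head marker positions in the reference pose."""
    src = frame[head_idx]
    # Center both marker sets.
    src_c, ref_c = src - src.mean(0), ref_head - ref_head.mean(0)
    # Optimal rotation via SVD (Kabsch), with a reflection guard.
    U, _, Vt = np.linalg.svd(src_c.T @ ref_c)
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    # Apply the rigid transform that maps the head markers onto the reference.
    return (frame - src.mean(0)) @ R + ref_head.mean(0)
```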
Construction of FacePEM
User Evaluation Study
• Participants were required to identify the perceived emotion and the
corresponding emotional expressiveness scale (1 to 10) as a
non-forced-choice task after viewing each clip.
• 68 facial motion clips, 30 participants (university students)
• Perceptual Outcome Vector (POV) for each facial motion
clip
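A minimal sketch of how a POV might be assembled from the 30 participants' responses; the exact encoding (per-emotion vote fraction plus mean scale) is an assumption for illustration:

```python
# Hedged sketch: build a Perceptual Outcome Vector (POV) for one clip from
# the participant responses. The encoding shown here is an assumption.
import numpy as np

EMOTIONS = ["happy", "angry", "sad"]

def build_pov(responses):
    """responses: list of (perceived_emotion, scale) pairs, one per participant."""
    pov = []
    for emo in EMOTIONS:
        votes = [scale for (e, scale) in responses if e == emo]
        fraction = len(votes) / len(responses)          # how often emo was perceived
        mean_scale = np.mean(votes) if votes else 0.0   # average 1-10 expressiveness
        pov.extend([fraction, mean_scale])
    return np.array(pov)
```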
Facial Motion Analysis
• Face segmentation
– PCA is a global transformation; no
explicit correspondence between
PCA eigen-vectors and localized
facial movements.
– Physically-motivated segmentation
[Joshi et al. 03]: six regions (forehead,
eye, left cheek, right cheek, mouth,
nose)
• Region-based motion reduction
– Apply PCA for the movements of
each region
– Region-based PCA eigen-vectors
correspond to meaningful, localized
facial movements [Li and Deng 07].
(Figures: the first/second largest eigen-vectors of the mouth region and of the eye region)
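A minimal sketch of the region-based motion reduction (PCA applied per facial region; the marker segmentation itself is not shown and the region index lists are hypothetical):

```python
# Hedged sketch: region-based motion reduction. PCA is applied separately to
# the markers of each of the six regions, so the leading eigenvectors describe
# localized movements (e.g. mouth opening) rather than global deformation.
import numpy as np

def region_pca(motion, marker_idx, n_components=5):
    """motion: (T, num_markers, 3) marker trajectories;
    marker_idx: indices of the markers in one region (segmentation not shown)."""
    X = motion[:, marker_idx, :].reshape(len(motion), -1)  # (T, 3*|region|)
    mean = X.mean(axis=0)
    # PCA via SVD of the mean-centered region data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]          # region eigenvectors (rows)
    coeffs = (X - mean) @ basis.T      # per-frame PCA coefficients, (T, n_components)
    return coeffs, basis, mean

# Usage (mouth_idx is a hypothetical marker-index list for the mouth region):
# mouth_coeffs, mouth_basis, mouth_mean = region_pca(motion, mouth_idx)
```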
Facial Motion Modeling
• Modeling region-based expressive facial motion patterns
– Use the M-order Linear Dynamical Systems (LDS) [Pavlovic et al. 00,
Chai and Hodgins 07] to model region-based facial motion patterns for
any specific emotion
– 18 LDSs are fitted (6 regions × 3 emotions): x_n = ∑_{i=1..M} A_i x_{n-i} + v
• Objective Matchness Vector (OMV)
– Closeness function: describes how closely a given facial motion
sequence, at a specific facial region, matches a specific emotion
– An OMV encloses 18 components (one per region-emotion pair)
P_{emo,reg}(S_i) = e^{-E}
E = -ln F(S) = -ln F(x_{1:T}) ≈ C * ∑_t || x_t - ∑_{j=1..M} A_j x_{t-j} - v ||^2
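A minimal sketch of fitting an M-order LDS to one (region, emotion) pair and evaluating the closeness score P_{emo,reg}(S) = e^{-E}; the simple least-squares fitting shown here is an assumption, standing in for the cited LDS learning procedures:

```python
# Hedged sketch: fit an M-order linear dynamical system to the region PCA
# coefficients of one (region, emotion) pair, then score how closely a new
# sequence matches it via P = exp(-E). Simplified for illustration.
import numpy as np

def fit_lds(X, M=2):
    """X: (T, d) region PCA coefficients. Returns A (M, d, d) and bias v (d,)."""
    T, d = X.shape
    # Regress x_t on its M previous frames (plus a constant term).
    H = np.hstack([X[M - i - 1:T - i - 1] for i in range(M)] + [np.ones((T - M, 1))])
    W, *_ = np.linalg.lstsq(H, X[M:], rcond=None)        # (M*d + 1, d)
    A = W[:M * d].T.reshape(d, M, d).transpose(1, 0, 2)  # per-lag matrices A_1..A_M
    v = W[-1]
    return A, v

def closeness(X, A, v, C=1e-3):
    """P_{emo,reg}(S) = exp(-E), E ≈ C * sum_t ||x_t - sum_j A_j x_{t-j} - v||^2."""
    M = len(A)
    pred = sum(X[M - j - 1:len(X) - j - 1] @ A[j].T for j in range(M)) + v
    E = C * np.sum(np.linalg.norm(X[M:] - pred, axis=1) ** 2)
    return np.exp(-E)
```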
Construction of FacePEM
Learning Perceptual Prediction
• Statistical perceptual prediction model
– Predict the POV of an input facial motion sequence based on its OMV
– Three approaches: Least-square based linear fitting, Radial Basis Functions
(RBFs) network approach, and Support Vector Machines (SVMs)
– The SVM-based perceptual prediction model achieves the minimum error.
• Cross-Validation for test and validation
– 54 facial motion clips as the training dataset
– 14 facial motion clips as the test/validation dataset
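A minimal sketch of the OMV-to-POV regression step (scikit-learn SVR is an assumption; the slides only state that the SVM-based model achieved the lowest error among the three approaches):

```python
# Hedged sketch: learn the OMV -> POV mapping with support vector regression.
# scikit-learn and the chosen hyperparameters are assumptions for illustration.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

def train_perceptual_predictor(omv_train, pov_train):
    """omv_train: (54, 18) objective matchness vectors;
    pov_train: (54, k) perceptual outcome vectors from the user study."""
    model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, epsilon=0.05))
    model.fit(omv_train, pov_train)
    return model

def predict_pov(model, omv):
    """Predict the perceptual outcome vector for a new facial motion clip."""
    return model.predict(np.atleast_2d(omv))[0]
```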
Review of FacePEM Framework
Perceptually Guided
Facial Animation Algorithms
• Perceptually guided expressive speech animation synthesis
– The core part of many data-driven speech animation synthesis approaches [Bregler et
al. 97, Kshirsagar and Thalmann 03, Cao et al. 04, Deng and Neumann 06]
– Cost = PhoMtchCost + ConstrCost + SmoCost
– A more intelligent "cost function" that incorporates the perceptual metric
• Expressive facial motion editing enhanced with expressiveness
cues
– Current facial motion editing techniques [Chuang et al. 02, Cao et al. 03, Joshi et
al. 03, Vlasic et al. 05, Li and Deng 07] do not provide feedback or
expressiveness cues
– FacePEM will measure and display the updated emotion type and
expressiveness scale to users in a timely manner.
Perceptually Guided Expressive
Speech Animation Synthesis
• Choose an expressive
speech animation synthesis
algorithm [Deng and Neumann 06]
for test/validation
• Predict the emotion type and expressiveness scale of the facial motion sequence being synthesized, and incorporate the emotion scale into the synthesis algorithm
– Label-based cost: EC(s, Emo) = C * (1 - Same(Emo, EmoLabel(s)))
– Perceptually guided cost: EC(s, Emo) = C * (1 - CalcEmo(s)[Emo])
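A minimal sketch of the perceptually guided emotion cost term; the `facepem_predict` interface is a hypothetical stand-in for the trained FacePEM predictor:

```python
# Hedged sketch: perceptually guided emotion cost for a candidate motion
# segment s. Instead of a hard 0/1 label match, the segment is scored by the
# FacePEM-predicted likelihood that it reads as the target emotion.
def emotion_cost(segment, target_emotion, facepem_predict, C=1.0):
    # Label-based cost used by prior synthesis algorithms:
    #   EC(s, Emo) = C * (1 - Same(Emo, EmoLabel(s)))
    # Perceptually guided cost:
    #   EC(s, Emo) = C * (1 - CalcEmo(s)[Emo])
    calc_emo = facepem_predict(segment)   # e.g. {"happy": 0.7, "angry": 0.1, "sad": 0.2}
    return C * (1.0 - calc_emo[target_emotion])
```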
Facial Motion Editing
Enhanced with Expressiveness Cues
• Choose an expressive facial
motion editing system for
validation [Li and Deng 07]
• Predict emotion type and
expressiveness scale of the
edited facial motion sequence
• Display expression type/scale
to users as feedback
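A minimal sketch of how FacePEM feedback could be attached to an interactive editing loop; the editor callback and predictor interfaces are hypothetical:

```python
# Hedged sketch: after each user edit, analyze the current motion with the
# FacePEM predictor and display the predicted emotion type and scale as cues.
def on_motion_edited(edited_motion, facepem_predict, display):
    pov = facepem_predict(edited_motion)   # e.g. {"happy": 6.2, "angry": 0.8, "sad": 0.3}
    emotion = max(pov, key=pov.get)        # most strongly predicted emotion
    display(f"Perceived emotion: {emotion}, expressiveness scale: {pov[emotion]:.1f}")
```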
Results & Evaluation - I
• 20 expressive facial animation clips (10 from this
work, 10 from previous work)
• 20 participants rated the visual fidelity (expression)
of these clips
Results & Evaluation - II
• 10 expressive edited facial motion clips (5 from
this work, 5 from previous work)
• 20 participants rated the visual fidelity (expression)
of these clips
Discussion
• It is hard to know how much data would be
enough to train well-behaved statistical learning
approaches
• We did not consider the effects of eye gaze motion
• Only a limited set of emotion types (three basic
emotions) is studied.
• Idiosyncratic motion signals might exist in the
currently used facial motion dataset.
Conclusions
• We present a novel computational perceptual
metric for measuring and predicting the
expressiveness (type/scale) of facial animations
– Bridge human perceptual insights with objective facial
motion patterns
• Demonstrated perceptually guided expressive
facial animation algorithms on two cases
– Expressive speech animation synthesis
– Interactive expressive facial motion editing
• Future Work
– Study more data and expression types
– Remove the idiosyncratic components from the data
Questions?