Int. J. Human-Computer Studies 66 (2008) 233–242
www.elsevier.com/locate/ijhcs

The effect of dynamics on identifying basic emotions from synthetic and natural faces

Jari Kätsyri, Mikko Sams
Laboratory of Computational Engineering, Helsinki University of Technology, P.O. Box 9203, FIN-02015 HUT, Finland

Received 29 May 2007; received in revised form 31 August 2007; accepted 1 October 2007
Communicated by G. Lindgaard
Available online 7 October 2007

Abstract

The identification of basic emotions (anger, disgust, fear, happiness, sadness and surprise) has been studied widely from pictures of facial expressions. Until recently, the role of dynamic information in identifying facial emotions has received little attention. There is evidence that dynamics improves the identification of basic emotions from synthetic (computer-animated) facial expressions [Wehrle, T., Kaiser, S., Schmidt, S., Scherer, K.R., 2000. Studying dynamic models of facial expression of emotion using synthetic animated faces. Journal of Personality and Social Psychology 78 (1), 105–119]; however, a similar result has not been confirmed with natural human faces. We compared the identification of basic emotions from both natural and synthetic dynamic vs. static facial expressions in 54 subjects. We found no significant differences in the identification of static and dynamic expressions from natural faces. In contrast, some synthetic dynamic expressions were identified much more accurately than static ones. This effect was evident only with synthetic facial expressions whose static displays were non-distinctive. Our results show that dynamics does not improve the identification of already distinctive static facial displays. On the other hand, dynamics has an important role in identifying subtle emotional expressions, particularly from computer-animated synthetic characters.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Facial animation; Basic emotions; Movement perception

Abbreviations: CK, Cohn–Kanade facial expression collection; EF, Ekman–Friesen facial expression collection ("Pictures of Facial Affect"); FACS, Facial Action Coding System; TAS-20, twenty-item Toronto Alexithymia Scale; TH, facial expression material created with a computer-animated "talking head" developed at Helsinki University of Technology (TKK); TKK, facial expressions recorded at Helsinki University of Technology (TKK).

Corresponding author. Tel.: +358 9 451 5326; fax: +358 9 451 4833. E-mail address: [email protected].fi (J. Kätsyri).
Preliminary results of this study have been published in Kätsyri et al. (2003).
doi:10.1016/j.ijhcs.2007.10.001

1. Introduction

Faces provide crucial information in social communication. Static facial features are important for recognizing identity, gender and age from faces. Transient changes on the face, driven by the complex facial musculature, convey both verbal (speech) and non-verbal information. Facial expressions regulate turn-taking between speakers, emphasize speech, convey culture-specific signals and, importantly, reflect the feelings of the speakers (Pelachaud et al., 1991). A long research tradition suggests that at least six emotions (anger, disgust, fear, happiness, sadness and surprise) are 'basic' because they are identified distinctively from their characteristic facial expressions in all human cultures (Ekman et al., 1982).
Although this claim has been debated, sometimes heatedly (e.g., Ortony and Turner, 1990; Ekman, 1994; Izard, 1994; Russell, 1994, 1995), several studies have supported the consistent identification of basic facial expressions in different cultures. Studies with a forced-choice task requiring participants to match presented facial expressions with a given list of emotion labels have confirmed that basic expressions are most often matched with their intended basic emotions (e.g., Ekman, 1973, 1984; Ekman et al., 1982). Similar results have been found when participants have been asked to rate the intensity of each basic emotion in the stimuli (e.g., Ekman et al., 1987) or to produce emotional labels for them freely (e.g., Rosenberg and Ekman, 1995; Haidt and Keltner, 1999). Such studies have also shown that although basic expressions are most often identified as their intended emotions, certain confusions are common. Most notably, fearful faces tend to be confused with surprise, and angry and disgusted faces with each other, whereas happy and surprised faces are seldom confused with any other emotion. Happy and surprised faces are typically identified the best, and angry, disgusted and fearful faces the worst.

Studies of basic emotions have typically utilized pictures of posed facial expressions, such as the Ekman–Friesen collection "Pictures of Facial Affect" (Ekman and Friesen, 1978). The use of posed instead of authentic emotional facial expressions has raised questions about the ecological validity of such studies (e.g., Russell, 1994). Trivially, authentic everyday expressions are more natural than posed expressions. However, the latter do exhibit some advantages over the former. When using posed stimuli, actors can be trained to pose certain theoretically derived facial configurations exactly, producing homogeneous and distinctive emotional displays. Authentic emotional expressions, in contrast, are more heterogeneous and their emotional content is typically ambiguous (cf. Ekman, 1973). For example, in a recently published collection of authentic basic expressions (O'Toole et al., 2005), many instances of happy facial displays were evoked but only few instances of anger and fear (O'Toole, personal communication).

Perhaps a more important issue than the use of posed instead of authentic emotional displays is the predominant use of pictures instead of moving emotional stimuli. Because most studies have been conducted with static facial pictures, the role of dynamic information in perceiving facial emotions has received little attention. Previous studies have shown that dynamics (head movement and facial expression transitions) may have an important role in recognizing identity and age from faces. Identity recognition is to some extent possible from dynamic point-light displays (moving points) extracted from the original faces of actors (Bruce and Valentine, 1988). Both identity and sex can be recognized when original facial movements are replicated on a computer-animated head showing none of the original static features (Hill and Johnston, 2001). These studies indicate that movement alone conveys some information about a person's identity and sex.
Direct comparisons between static and dynamic displays have shown that observing movement enhances the identification of identity from faces when their presentation has been degraded by inversion, pixelation, blurring, luminance thresholding or color negation (Knight and Johnston, 1997; Lander et al., 1999, 2001).

There is also evidence for the importance of dynamics in identifying emotions from facial expressions. Studies with dynamic point-light displays have indicated that facial emotions can be identified from pure movement information (Bassili, 1978; Bruce and Valentine, 1988). Some neurological patients impaired in identifying emotions from still images of facial expressions nevertheless recognize them from video sequences (Adolphs et al., 2003) and point-light displays (Humphreys et al., 1993). Using schematic animations as stimuli, Wehrle et al. (2000) found better identification of dynamic than static displays of emotions. However, it is not clear whether this result was specific to synthetic facial stimuli, because dynamic vs. static natural facial expressions were not included as a comparison. Importantly, the synthetic static stimuli were identified worse than their static natural counterparts, suggesting that the result should not be generalized to natural faces.

Studies using natural facial expressions have provided inconsistent results. Harwood et al. (1999) reported better identification of dynamic than static facial displays of emotions; however, this effect was observed only for anger and sadness. Kamachi et al. (2001) used image morphing to generate videos of a face changing from neutral to emotional, and found no difference between such dynamic displays and the original static ones. Ehrlich et al. (2000) reported better identification of basic emotions from dynamic than from static facial expressions. However, because their results were pooled over good-quality facial expressions and their degraded versions, it is possible that the better identification of dynamic expressions was specific to the degraded stimuli. Recently, Ambadar et al. (2005) demonstrated that dynamics improves the identification of subtle, low-intensity facial expressions of emotion. However, full-intensity facial expressions were not used as control stimuli. In conclusion, dynamics appears to improve the identification of facial emotions from synthetic faces whose static presentations are not identified optimally. It has not been established whether this effect also applies to full-intensity and non-degraded natural facial expressions.

The aim of the present study was to compare the identification of basic expressions from static and dynamic natural and synthetic faces in the same experiment. The natural stimuli consisted of posed, clearly distinguishable facial expressions of basic emotions. The synthetic stimuli were created with a three-dimensional head animation model (Frydrych et al., 2003). The model did not capture all realistic facial details (cf. Section 2.2.1). Our hypothesis was that dynamics has no effect on the identification of already well-recognizable natural facial emotions, but that it does improve the identification of those synthetic facial animations that are identified poorly from static displays.

2. Methods

2.1. Participants

Participants were 54 university students (36 males, 18 females; 20–29 years old) from Helsinki University of Technology (TKK) who participated in the experiment as part of their studies.
All participants were native speakers of Finnish and had either normal or corrected vision. The level of the subjects' alexithymic personality trait (Taylor et al., 1991) was evaluated with the twenty-item Toronto Alexithymia Scale (TAS-20) self-report questionnaire (Bagby et al., 1994). Alexithymia involves difficulties in expressing and experiencing emotions and an externally oriented style of thinking that de-emphasizes one's own subjective emotional experiences. Alexithymia is also known to be associated with a reduced ability to identify facial emotions (Parker et al., 1993; Mann et al., 1994; Lane et al., 1996). The TAS-20 scores of our subjects did not differ statistically significantly from the Finnish population reference values (Salminen et al., 1999).

2.2. Stimuli

Synthetic stimuli were created with a three-dimensional animated "talking head" (TH) (Frydrych et al., 2003) developed in our laboratory. Dynamic natural stimuli were collected from the existing Cohn–Kanade (CK) collection (Kanade et al., 2000) of posed basic expressions. Additional dynamic stimuli were recorded specifically for this study (TKK stimuli). Because the CK and TKK stimuli had not been evaluated by naïve observers before, stimuli from the widely used Ekman–Friesen (EF) collection (Ekman and Friesen, 1978) served as control stimuli.

In total, the stimuli comprised six full basic expression sets (neutral, anger, disgust, fear, happiness, sadness and surprise) of static and dynamic natural (CK and TKK) and synthetic (TH) faces, and two control sets (EF) of static faces. (Neutral facial expressions were included only in the static sets because dynamic versions of such stimuli were not available; the results for static neutral faces are reported in Kätsyri et al. (2003).) Dynamic sets consisted of short (mean ± S.D.: 0.8 ± 0.2 s) video sequences (25 frames per second) showing transitions from neutral to emotional faces. Static sets consisted of original picture material or, if the originals were video sequences, the last images of the sequences. Stimulus size was either 22 cm × 17 cm (CK and TKK sets) or 14 cm × 20 cm (EF and TH sets).

2.2.1. Synthetic stimuli

Two synthetic sets of static and dynamic facial expressions were created with the TH animation software (Fig. 1). Stereotypical facial expressions for the six basic emotions (Table 1) were defined in Facial Action Coding System (FACS) taxonomy (Ekman et al., 2002a) on the basis of existing literature (Ekman and Friesen, 1975; Ekman et al., 2002b) and were implemented manually on the animation model. The intensity of the displays, i.e. the extent of change between the emotional and neutral facial expressions, was restricted by the technical implementation, which was based on rotational deformers allowing only relatively limited facial dynamics (cf. Frydrych et al., 2003).

Fig. 1. Pictures of synthetic facial expressions at their apexes shown both without superimposed facial texture (upper images) and with it (lower images). The intended emotional displays are (from left to right): neutral, anger, disgust, fear, happiness, sadness and surprise. Dynamic facial animations contained transitions from a neutral face to the emotional apexes.

Table 1
FACS stereotypes for basic emotions

Expression    FACS prototype
Anger         AU4+5+7+15+24
Disgust       AU9+10+17
Fear          AU1+2+4+5+7+20+25
Happiness     AU6+12+25
Sadness       AU1+4+7+15+17
Surprise      AU1+2+5+25+26
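For illustration, the following Python sketch encodes the Table 1 prototypes and generates a neutral-to-apex transition of the kind used as dynamic stimuli. The TalkingHead class, its set_action_unit method and the intensity cap are hypothetical stand-ins, not the actual TKK animation software interface.

```python
# Sketch only: FACS prototypes (Table 1) driving a hypothetical animation model.

FACS_PROTOTYPES = {
    "anger":     [4, 5, 7, 15, 24],
    "disgust":   [9, 10, 17],
    "fear":      [1, 2, 4, 5, 7, 20, 25],
    "happiness": [6, 12, 25],
    "sadness":   [1, 4, 7, 15, 17],
    "surprise":  [1, 2, 5, 25, 26],
}

MAX_INTENSITY = 0.6  # assumed cap reflecting the model's limited facial dynamics


class TalkingHead:
    """Minimal stand-in for a deformer-based facial animation model."""

    def __init__(self):
        self.au_weights = {}

    def set_action_unit(self, au, weight):
        self.au_weights[au] = weight

    def render_frame(self):
        # Placeholder for an actual rendered image of the face.
        return dict(self.au_weights)


def animate_expression(emotion, n_frames=20):
    """Generate a neutral-to-apex transition for one basic expression."""
    head = TalkingHead()
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)                  # 0 = neutral, 1 = emotional apex
        for au in FACS_PROTOTYPES[emotion]:
            head.set_action_unit(au, t * MAX_INTENSITY)
        frames.append(head.render_frame())
    return frames                                # ~0.8 s at 25 frames per second


apex_anger = animate_expression("anger")[-1]     # static stimulus = last frame
```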
Dynamic facial expressions were interpolated as a continuum between the neutral and emotional faces. A certified FACS coder ensured that the dynamic facial changes contained all of the intended FACS components. Two stimulus sets were created: one using a plain facial texture and one in which frontal and side photographs of a real person were superimposed on the general mesh of TH using an algorithm developed by Kalliomäki and Lampinen (2002). Superimposing the more realistic facial texture produced purely aesthetic changes without any changes in the underlying facial expression model.

2.2.2. Natural stimuli

Two static/dynamic sets were selected from the CK collection (items 11-001, 11-004, 11-005, 14-002, 65-002 and 65-004; 14-004, 65-003, 66-001, 66-003, 71-002 and 71-004). The original FACS coding of the CK material was used to select stimuli that resembled the TH basic expression stereotypes (Table 1) as closely as possible. Since it was not possible to find a suitable full set of all basic emotions from any single actor on this basis, facial expressions were selected from various actors whose FACS codes were similar to the intended configurations. Consequently, the two sets contained stimuli from a total of five different actors.

Two static/dynamic sets were recorded at TKK from two certified FACS coders (initials JK and VK) (Ekman et al., 2002a) trained in controlling the facial muscles associated with FACS Action Units. The TH basic expression stereotypes (Table 1) were used as the basis for posing the facial expression configurations.

2.2.3. Control stimuli

Two static sets were selected from the EF collection, each containing pictures from one actor (items 1-04, 1-05, 1-14, 1-23, 1-30, 2-11 and 2-18 from actor MO; 2-05, 2-12, 2-16, 3-01, 3-11, 3-16 and 5-06 from actor WF). These items were selected on the basis of their good recognition accuracy (88–100%) in the original EF evaluation study (Ekman and Friesen, 1978).

2.3. Procedure

The subjects were distributed randomly into two groups (n = 27 each), which were shown either the static or the dynamic stimulus sets. A between-subject design was used to avoid learning effects. As an exception, the EF sets served as control stimuli for both groups. The static group also evaluated neutral faces in addition to the emotional expressions. The static group thus saw eight static sets of seven facial expressions (56 stimuli in total), whereas the dynamic group saw six dynamic sets and two static sets of six facial expressions (48 stimuli). The latter group evaluated six instead of seven different facial expressions because no dynamic versions of the neutral stimuli were available. The TAS-20 scores did not differ significantly between the two groups.

Stimuli were presented with Presentation software (Version 0.53, www.neurobs.com) in randomized order. The subjects evaluated how well each of the six emotions (anger, disgust, fear, happiness, sadness and surprise) described the presented facial expression. Evaluations were made on a scale from 1 to 7 (1 "totally disagree", 4 "uncertain", 7 "totally agree") and given on a keyboard. In addition, subjects were asked to rate the naturalness of the stimuli using a similar scale, yielding seven evaluations for each stimulus. Requests to evaluate were presented below the stimulus, one at a time in random order. Each stimulus was shown until all seven (six basic emotion and one naturalness) evaluations had been made. There was no time limit for responses. Before the actual experiment, the subjects completed a training session consisting of two randomly selected stimuli.
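As a rough illustration of this procedure, the sketch below reproduces the trial structure in Python; the show and ask_rating callbacks are hypothetical placeholders for the stimulus display and keyboard input handled by the Presentation software in the actual experiment.

```python
import random

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def run_trial(stimulus, show, ask_rating):
    """One trial: show a stimulus and collect seven 1-7 ratings in random order."""
    show(stimulus)                          # looped video sequence or static picture
    prompts = EMOTIONS + ["naturalness"]
    random.shuffle(prompts)                 # rating requests appear in random order
    ratings = {}
    for prompt in prompts:
        # 1 = "totally disagree", 4 = "uncertain", 7 = "totally agree"
        ratings[prompt] = ask_rating(prompt)
    return ratings


def run_experiment(stimuli, show, ask_rating):
    order = list(stimuli)
    random.shuffle(order)                   # stimulus order randomized per subject
    return [run_trial(s, show, ask_rating) for s in order]
```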
2.4. Analysis

2.4.1. Dependent variables

2.4.1.1. Identification accuracy. To analyze identification accuracy with a single dependent variable, we converted the six original emotion ratings into a single identification score (an illustrative computation is sketched at the end of this section). A similar approach has been utilized in earlier rating studies (e.g., Kamachi et al., 2001; Parker et al., 2005). The scoring method was devised to measure how distinctly the presented basic expressions were evaluated as depicting their intended basic emotions. When the intended basic emotion was rated higher than all other emotions, its identification was considered optimal and a maximum score of 1 was given. Each confusion, i.e. an unintended basic emotion receiving at least as high a rating as the intended emotion, reduced the score by a constant of 0.4. For example, one or two confusions gave scores of 0.6 (1 − 1 × 0.4) and 0.2 (1 − 2 × 0.4), respectively. Ratings given at random would have produced an average of 2.5 confusions per evaluation and, correspondingly, a mean identification score of 0 (1 − 2.5 × 0.4). When the intended emotion was confused with all other emotions, i.e. it was rated no higher than any of the five remaining emotions, a minimum score of −1 was given (1 − 5 × 0.4).

2.4.1.2. Naturalness ratings. The original naturalness rating scale was re-expressed with standard transformation methods (e.g., Behrens, 1997) to meet the normality assumptions required by parametric statistical tests. Because of the infrequent use of rating 4 (the "uncertain" answer), the original ratings 3 and 4 were pooled together. The resulting six-step scale was power transformed (with exponent 1.5) to reduce the large negative skewness evident in the evaluations of several facial expressions. The re-expressed naturalness scale ranged from 1 to 14.7 (i.e., from 1^1.5 to 6^1.5).

2.4.1.3. Response latencies. For each stimulus, the mean response latency was calculated over its individual ratings.

2.4.2. Statistical analyses

The significance level was set to α = 0.01 for all analyses. All post-hoc comparisons were conducted with Newman–Keuls tests.
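The following Python sketch, provided purely for illustration (it is not code from the original study), implements the identification-score rule of Section 2.4.1.1 and the naturalness re-expression of Section 2.4.1.2; variable and function names are ours.

```python
def identification_score(ratings, intended):
    """Identification score: 1 minus 0.4 per confusion.

    `ratings` maps the six basic emotions to 1-7 ratings; a confusion is any
    unintended emotion rated at least as high as the intended one.
    """
    confusions = sum(
        1 for emotion, rating in ratings.items()
        if emotion != intended and rating >= ratings[intended]
    )
    return 1.0 - 0.4 * confusions            # ranges from +1 (optimal) to -1


def reexpress_naturalness(rating):
    """Re-express a 1-7 naturalness rating (pool 3 and 4, then exponent 1.5)."""
    pooled = rating if rating <= 3 else rating - 1   # six-step scale, 1..6
    return pooled ** 1.5                             # ranges from 1 to 6**1.5 = ~14.7


# Worked example: intended emotion rated 6, but surprise rated equally high,
# giving one confusion and a score of 0.6.
example = {"anger": 2, "disgust": 1, "fear": 6,
           "happiness": 1, "sadness": 2, "surprise": 6}
print(identification_score(example, intended="fear"))   # -> 0.6
```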
3. Results

3.1. Effect of dynamics

A mixed-design ANOVA with factors Display (static, dynamic), Type (natural (CK and TKK), synthetic (TH)) and Expression (six basic expressions) was used with the naturalness ratings and identification scores to evaluate the significance of the first factor and its interactions with the other factors. Control stimuli (EF) were excluded from this analysis as they were always static.

With naturalness ratings, only the interaction between Display and Expression reached significance (F(5,260) = 3.12, p = 0.01). Analysis of simple effects indicated that dynamic happy expressions were evaluated as less natural than static ones (mean ± S.E.M.: 6.6 ± 0.4 vs. 8.4 ± 0.3; F(1,52) = 9.44, p = 0.004). Further analysis with the additional factor Actor (all CK, TKK and TH stimulus sets) indicated no significant differences between individual actors in the naturalness ratings of dynamic and static happy facial expressions (F(5,260) < 1).

With identification scores, the main effect of Display (F(1,52) = 12.97, p < 0.001) and the Display × Type interaction (F(1,52) = 31.52, p < 0.001) reached significance. Mean identification scores for static and dynamic synthetic and natural faces are depicted in Fig. 2. As clearly indicated by the figure, dynamics increased the identification of synthetic (F(1,52) = 24.90, p < 0.001) but not natural stimuli (F(1,52) < 1).

Fig. 2. Mean identification (± S.E.M.) of static and dynamic emotional facial expressions from synthetic (TH stimuli) and natural faces (CK, TKK). Asterisk (*) indicates a significant difference between static and dynamic stimuli.

Because response times were not controlled, the observed dynamics effect could have reflected a more careful evaluation strategy allowed by longer evaluation times. To evaluate this possibility, response latencies were analyzed with a mixed-design ANOVA with factors Display (static, dynamic) and Type (natural (CK, TKK), synthetic (TH)). A significant main effect of Display confirmed that response latencies were longer for dynamic than for static stimuli (mean ± S.E.M.: 4.4 ± 0.3 s vs. 3.5 ± 0.2 s; F(1,52) = 8.39, p < 0.01). However, response latencies were significantly longer for dynamic vs. static stimuli both with synthetic (4.6 ± 0.3 s vs. 3.7 ± 0.2 s; F(1,52) = 8.90, p < 0.005) and natural faces (4.2 ± 0.3 s vs. 3.3 ± 0.2 s; F(1,52) = 9.05, p < 0.005), and a non-significant Display × Type interaction suggested that these differences were roughly equal. If the better identification of dynamic over static synthetic stimuli were due to longer response latencies, dynamic basic expressions should also have been identified better from natural faces. Because this was not the case, it is unlikely that the dynamics effect for synthetic stimuli was related to response latencies. Plausibly, dynamic stimuli were evaluated longer than static ones because participants preferred to watch a full replay of the dynamic video sequence before giving their answer.

The ANOVA for identification scores also yielded significant Display × Expression (F(5,260) = 9.65, p < 0.001) and Display × Type × Expression (F(5,260) = 10.16, p < 0.001) interactions. Furthermore, the interaction between Display and Expression was significant with synthetic (F(5,260) = 11.58, p < 0.001) but not with natural faces (F(5,260) = 1.21, n.s.), indicating that differences in the dynamics effect between basic expressions occurred only with synthetic faces. Identification scores for static and dynamic displays of the individual synthetic expressions are shown in Fig. 3. Tests of simple effects showed significantly better identification of dynamic than static expressions of anger (F(1,52) = 18.60, p < 0.001) and disgust (F(1,52) = 32.35, p < 0.001).

Fig. 3. Mean identification (± S.E.M.) of emotions from individual static and dynamic synthetic expressions. Asterisk (*) indicates a significant difference between static and dynamic displays.

3.2. Evaluation of synthetic faces

Synthetic facial expressions were identified worse than natural ones both from static (F(1,52) = 153.65, p < 0.001) and dynamic displays (F(1,52) = 19.89, p < 0.001) (cf. Fig. 2). The significance of the identification score differences between synthetic and natural basic expressions was studied further with contrast tests, conducted separately for static and dynamic displays (Table 2).
Table 2
Mean identification scores (and S.E.M.) for static and dynamic displays of synthetic vs. natural basic expressions

             Anger        Disgust      Fear         Happiness    Sadness      Surprise
Static
  Synthetic  0.10 (0.10)  0.19 (0.13)  0.47 (0.09)  0.60 (0.09)  0.66 (0.07)  0.84 (0.05)
  Natural    0.68 (0.05)  0.85 (0.03)  0.60 (0.04)  0.98 (0.01)  0.89 (0.04)  0.96 (0.02)
  F(1,52)    27.23        92.08        1.72         23.71        9.88         7.56
  p          0.000        0.000        0.20         0.000        0.003        0.008
Dynamic
  Synthetic  0.69 (0.09)  0.69 (0.08)  0.32 (0.11)  0.84 (0.05)  0.56 (0.08)  0.90 (0.02)
  Natural    0.62 (0.06)  0.91 (0.02)  0.64 (0.06)  0.90 (0.04)  0.90 (0.03)  0.91 (0.02)
  F(1,52)    0.35         4.11         10.26        0.66         21.51        0.13
  p          0.56         0.05†        0.002        0.42         0.000        0.72

† Significant with α = 0.05; all other reported significances refer to α = 0.01.

The results showed that all static synthetic facial expressions except fear were identified significantly worse than their natural counterparts. With dynamic displays, fear and sadness were identified significantly worse, and disgust nearly significantly worse, from synthetic than from natural faces.

The identification score of each static and dynamic synthetic expression was also compared with chance performance (zero score). With the exception of static anger (t(26) < 1) and disgust (t(26) < 0), all synthetic expressions were identified more accurately than chance (t(26) > 2.8, p < 0.005, one-tailed).

The significance of possible identification differences caused by the superimposed facial texture was studied with a mixed-design ANOVA with factors Texture (non-textured, textured), Display (static, dynamic) and Expression (six basic expressions). The main effect of Texture and its interactions failed to reach significance (F < 2.7, n.s.). Replicating the earlier results, significant main effects of Display (F(1,52) = 24.90, p < 0.001) and Expression (F(5,260) = 15.31, p < 0.001) and a significant interaction between them (F(5,260) = 11.58, p < 0.001) were observed. Post-hoc analysis showed that, in general, angry (mean ± S.E.M.: 0.39 ± 0.08), disgusted (0.25 ± 0.10) and fearful facial animations (0.39 ± 0.07) were identified worse than the remaining animations, and sad animations (0.61 ± 0.05) worse than surprised animations (0.87 ± 0.02). The remaining differences were not significant.

3.3. Evaluation of natural faces

To study whether systematic differences between the CK and TKK stimuli and the EF control stimuli could have affected the results, a mixed-design ANOVA with factors Display (static, dynamic), Source (CK, TKK, EF) and Expression (six basic expressions) was conducted on the identification scores. The main effect of Expression (F(5,265) = 38.69, p < 0.001) and the Source × Expression interaction (F(10,530) = 5.12, p < 0.001) were significant. Post-hoc analysis showed that, in general, fearful faces (mean ± S.E.M.: 0.60 ± 0.03) were identified worse than all other expressions, angry faces (0.71 ± 0.03) worse than the remaining expressions, and sad faces (0.83 ± 0.02) worse than happy (0.95 ± 0.01) and surprised expressions (0.94 ± 0.01). The remaining differences were non-significant. Because the identification score differences between basic expressions did not differ significantly between the CK and TKK sources (F(5,260) = 2.01, n.s.), their results were pooled together and compared with those of the EF stimuli. Analysis of simple effects showed that angry faces were identified significantly worse (0.65 ± 0.04 vs. 0.83 ± 0.03; F(1,52) = 9.62, p < 0.004), and sad faces significantly better (0.89 ± 0.03 vs. 0.72 ± 0.05; F(1,52) = 11.70, p < 0.002), from CK and TKK than from EF faces.
To evaluate the former result further, a new repeated-measures ANOVA with the factor Actor (all six human actors) was conducted specifically on the identification scores of angry faces. The main effect was significant (F(5,265) = 10.41, p < 0.001), confirming that the identification of anger varied between actors. Post-hoc analysis was conducted to find those actors whose identification scores fell significantly below either of the EF actors. The only such result was the worse identification of anger from TKK actor VK (0.35 ± 0.10) in comparison with EF actors WF (0.79 ± 0.06) and MO (0.87 ± 0.05). When the result for this expression was removed, the difference in the identification of anger between the CK and TKK stimuli (0.75 ± 0.03) and the EF stimuli was no longer significant (F(1,52) = 1.83, n.s.). Removing this facial expression from the other analyses did not change their overall results.

4. Discussion

We studied the effect of dynamics on identifying six basic emotions from natural (human actors') and synthetic (computer-animated) facial expressions. Our results showed no significant differences in the identification of static and dynamic expressions from natural faces. In contrast, dynamics increased the identification of synthetic facial expressions, particularly those of anger and disgust. Although our static synthetic stimuli were generally identified worse than their natural counterparts, the static angry and disgusted expressions were the only expressions whose identification failed to exceed chance level.

Apparently, the identification of basic emotions is already at ceiling for typical facial expressions posed by human actors (e.g., Ekman and Friesen, 1978), making dynamic information redundant. However, dynamics is important with non-distinctive static emotional facial displays, as was the case with our synthetic facial expressions of anger and disgust. These findings are congruent with previous studies showing that dynamics enhances the identification of facial identity only from degraded facial images (Knight and Johnston, 1997; Lander et al., 1999, 2001). Combining these results, it can be concluded that (i) dynamics does not enhance the processing of facial information under optimal viewing conditions, (ii) the role of dynamics becomes more important under compromised conditions, i.e. when the static features have been degraded, and (iii) these conclusions apply both to the identification of identity and of emotional expressions from faces.

Previous studies have shown that dynamics improves the identification of emotions from synthetic facial expressions (Wehrle et al., 2000) and from subtle (Ambadar et al., 2005) or degraded (Ehrlich et al., 2000) natural facial expressions. Ehrlich et al. (2000) studied both degraded and non-degraded natural stimuli; however, they did not report the dynamics results separately for these two kinds of stimuli. Studies using non-degraded and full-intensity natural facial expressions as stimuli (Harwood et al., 1999; Kamachi et al., 2001) have provided inconsistent results. The present study builds upon these studies by comparing synthetic facial expressions and non-degraded natural facial expressions in the same experiment, and by showing that the effect of dynamics is restricted to the former.

The similar identification of static and dynamic natural facial expressions in our study is consistent with the earlier study by Kamachi et al. (2001)
but inconsistent with the study by Harwood et al. (1999), in which dynamics was found to improve at least the identification of anger and sadness. Because the static facial expressions used by Harwood et al. were identified as well as those selected from a previously evaluated collection (Mazurski and Bond, 1993), it is unlikely that the dynamics effect reported in their study was due to poor identification of the static stimuli. The dynamic video sequences used in the present study were shorter (mean below 1 s) than those used by Harwood et al. (10 s). However, because the video sequences were played in a loop until subjects gave their answers and were evaluated carefully (the mean evaluation time was over 4 s for each basic emotion evaluation of a stimulus), it is highly unlikely that stimulus presentation time affected the results. On the other hand, the results could have been influenced by different kinds of movement information. Whereas in the present study the video sequences showed a transition from a neutral face to an emotional apex with little or no rigid whole-head movement, the exact nature of movement in the study by Harwood et al. is uncertain. For example, extensive whole-head movements, obviously visible in dynamic but not static displays, could have enhanced the identification of dynamic emotional facial expressions disproportionately.

Similar identification results should be obtained for natural and synthetic stimuli if the latter are realistic enough. Comparison with standard facial expression stimuli (Ekman and Friesen, 1978) confirmed that the natural facial expressions used in the present study were excellent exemplars of the six basic emotions. In contrast, all static synthetic facial expressions except fear were identified significantly worse than the natural ones. Identification of static anger and disgust, whose identification scores failed to exceed chance level, was especially poor. The poor identification of anger and disgust reflects the well-known fact that these two expressions are easily confused with each other even with natural faces (Ekman, 1973, 1984; Ekman et al., 1982, 1987; Rosenberg and Ekman, 1995; Haidt and Keltner, 1999). Even though dynamics facilitated the identification of synthetic facial expressions, at least the fearful and sad synthetic expressions were still identified worse than the natural ones.

Two plausible explanations could be given for the worse identification of the synthetic facial expressions. Firstly, as is the case with most animation models (e.g., Massaro, 1998; Wehrle et al., 2000; but see Waters, 1987; Terzopoulos and Waters, 1993), the animated face used here did not include realistic modeling of facial features, such as skin wrinkling and bulging, that are important for natural facial expressions (Ekman et al., 2002a). The lack of such static features could explain the poor identification of some of the synthetic stimuli. This explanation is, however, made unlikely by evidence showing that the identification of facial expressions generated with non-detailed animated faces (Bartneck, 2001) or even with schematic line-drawings (Katsikitis, 1997) may reach or exceed that of real faces. More likely, the poor identification of the synthetic facial expressions was due to their relatively low intensities (i.e., small changes between the emotional and neutral faces) caused by technical limitations.
Studies using both naturalistic (Hess et al., 1997) and schematic (Wehrle et al., 2000; Bartneck and Reichenbach, 2005) faces have shown that the identification accuracy of facial expressions declines as their intensity is reduced. The fact that our natural and synthetic stimuli contained roughly similar FACS action unit changes confirms that their facial configurations resembled each other. However, the worse identification of the static synthetic facial expressions suggests that their intensity levels were below those of the natural stimuli. Apparently, the synthetic angry and disgusted facial expressions in particular contained static cues that were too subtle to allow correct identification. Interestingly, the effect of dynamics was pronounced for these two expressions. The finding that dynamics improves the identification of low-intensity facial expressions is consistent with the recent study by Ambadar et al. (2005), which used extremely low-intensity facial expressions as stimuli.

In both our study and the study by Wehrle et al. (2000), synthetic facial expressions were identified worse than natural ones from static presentations. Both studies also found better identification of emotions from dynamic than from static synthetic faces. Although the facial animation models used in our and Wehrle et al.'s studies have some obvious visual differences (e.g., the former is based on animating the full three-dimensional facial surface (Frydrych et al., 2003) whereas the latter is apparently based on animating only some facial contours (Wehrle et al., 2000)), they also share some important similarities. In both synthetic models, facial expressions were created manually on the basis of theoretically derived FACS configurations. The synthetic faces of Wehrle et al. appear to have been of rather low intensity in comparison with their natural counterparts (e.g., see Figs. 2 and 3 in Wehrle et al., 2000), as was the case in our study. Therefore, it is plausible that the better identification of dynamic than static displays in Wehrle et al.'s study was mostly due to the low intensity of the static synthetic faces.

Although our main interest was in the identification accuracy of emotions, we also asked our participants to evaluate the naturalness of the presented stimuli. A statistically significant dynamics effect was found: dynamic happy facial expressions were considered less natural than their static versions. This effect was similar across the individual happiness expressions, including the synthetic ones. We are not aware of a similar result from previous work. In the present study, dynamic stimuli contained fast transitions from neutral to posed emotional configurations without any additional movements. Although such movement sequences are optimal in representing their intended emotions as distinctively as possible, they may also differ from spontaneous emotional expressions. It is possible that evaluators are hypersensitive to such differences with happy facial expressions because happy expressions are seen more often in everyday life than the other studied expressions. Arguably, the expression of happiness is suppressed less by cultural display rules than negative emotions (cf. Ekman, 1973).

The present study concentrated on the effect of dynamics on identifying emotional facial expressions.
Obviously, motion perception is a general phenomenon that is not restricted to faces or emotional facial expressions. For example, it is known that not only facial identity and facial emotions (Bassili, 1978; Bruce and Valentine, 1988) but also walking persons and even inanimate objects can be identified from pure movement information contained in moving point-lights (see Roark et al., 2003 for a review). Plausibly, the enhanced identification of dynamic emotional facial expressions observed in this study is at least partly due to the general capability of the primate visual system to process rapid changes in the environment in addition to its constant features. However, other factors specific to faces or emotional facial expressions could also be involved.

Different hypotheses for the facilitating effect of motion in recognizing facial emotions have been studied recently by Ambadar et al. (2005). They were able to exclude explanations related to the larger amount of information contained in video sequences vs. single pictures, and to the facilitation of configural processing, i.e. enhanced processing of the relations between individual facial features (cf. Maurer et al., 2002). They also considered the existence of certain intermediate emotion-characteristic movements in dynamic video sequences. This explanation was discarded because presenting only the first and last frames of the full video sequences produced as large a dynamics effect as the original sequences. The authors concluded that the facilitating effect of dynamics was due to enhanced perception of the change between the emotional and neutral faces. This is a viable hypothesis; however, as their study used facial expressions that were too brief to contain intermediate emotion-specific movements (4–7 frames selected from the beginning of full-intensity emotional facial expressions), their conclusion is not fully supported.

The present study confirmed that dynamics does not improve the identification of basic emotions from already distinctive static facial displays, such as typical emotional expressions posed by human actors. In contrast, the results from a synthetic face suggested that dynamics does improve the identification of facial expressions involving subtle changes on the face. The results add to earlier knowledge on the perception of static facial expressions of basic emotions. On the other hand, the results should also be of interest in relation to human–computer interaction. In real communication between humans, subtle expressions probably have a more important role than high-intensity emotional facial expressions. For example, autistic individuals with severe social interaction deficits appear to identify basic expressions without difficulty but have severe difficulties in understanding the emotional meaning of more subtle changes on the face (Baron-Cohen et al., 1997, 2001). Our results highlight the importance of subtle facial changes also in interaction between humans and computer-animated characters. A study by Bartneck and Reichenbach (2005) has shown that humans are able to recognize relatively subtle emotional facial expressions from the faces of animated characters. The evaluation of dynamic instead of static faces is closer to real interaction between agents (be they human or computer-generated), whose faces are constantly changing.
The drastic difference between the identification of dynamic and static displays of some synthetic facial expressions shows that in actual interaction, some subtle facial expressions can be understood even if they would not be recognizable from static pictures.

Acknowledgments

We thank the Robotics Institute of Carnegie Mellon University for access to the Cohn–Kanade facial expression database. We thank Michael Frydrych, Martin Dobšík and Vasily Klucharev for insightful discussions and for their concrete contributions to the present study. This study was supported partly by the Academy of Finland (Grant 213938 to M.S.; Finnish Centre of Excellence Programs 2000–2005 and 2006–2011, Grant nos. 202871 and 213470).

References

Adolphs, R., Tranel, D., Damasio, A.R., 2003. Dissociable neural systems for recognizing emotions. Brain and Cognition 52, 61–69.
Ambadar, Z., Schooler, J.W., Cohn, J.F., 2005. Deciphering the enigmatic face: the importance of facial dynamics in interpreting subtle facial expressions. Psychological Science 16 (5), 403–410.
Bagby, R.M., Parker, J.D.A., Taylor, G.J., 1994. The twenty-item Toronto Alexithymia Scale. Journal of Psychosomatic Research 38, 23–40.
Baron-Cohen, S., Jolliffe, T., Mortimore, C., Robertson, M., 1997. Another advanced test of theory of mind: evidence from very high functioning adults with autism or Asperger Syndrome. Journal of Child Psychology and Psychiatry 38, 813–822.
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., Plumb, I., 2001. The "Reading the mind in the eyes" test revised version: a study with normal adults, and adults with Asperger Syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry 42, 241–252.
Bartneck, C., 2001. How convincing is Mr. Data's smile: affective expressions of machines. User Modeling and User-Adapted Interaction 11 (4), 279–295.
Bartneck, C., Reichenbach, J., 2005. Subtle emotional expressions of synthetic characters. International Journal of Human–Computer Studies 62, 179–192.
Bassili, J.N., 1978. Facial motion in the perception of faces and of emotional expression. Journal of Experimental Psychology: Human Perception and Performance 4 (3), 373–379.
Behrens, J.T., 1997. Principles and procedures of exploratory data analysis. Psychological Methods 2 (2), 131–160.
Bruce, V., Valentine, T., 1988. When a nod's as good as a wink: the role of dynamic information in face recognition. In: Gruneberg, M.M., Morris, P.E., Sykes, R.N. (Eds.), Practical Aspects of Memory: Current Research and Issues, vol. 1. Wiley, Chichester.
Ehrlich, S.M., Schiano, D.J., Sheridan, K., 2000. Communicating facial affect: it's not the realism, it's the motion. Paper presented at the ACM CHI 2000 Conference on Human Factors in Computing Systems.
Ekman, P., 1973. Cross-cultural studies of facial expression. In: Ekman, P. (Ed.), Darwin and Facial Expression. Academic Press Inc., UK, pp. 169–222.
Ekman, P., 1984. Expression and the nature of emotion. In: Scherer, K., Ekman, P. (Eds.), Approaches to Emotion. Lawrence Erlbaum, Hillsdale, NJ.
Ekman, P., 1994. Strong evidence for universals in facial expressions: a reply to Russell's mistaken critique. Psychological Bulletin 115 (2), 268–287.
Ekman, P., Friesen, W., 1975. Unmasking the Face. A Guide to Recognizing Emotions from Facial Expressions. Consulting Psychologists Press, Palo Alto, CA.
Ekman, P., Friesen, W., 1978. Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto, CA.
Ekman, P., Friesen, W.V., Ellsworth, P., 1982. What emotion categories or dimensions can observers judge from facial behavior? In: Ekman, P. (Ed.), Emotions in the Human Face. Cambridge University Press, UK, pp. 39–55.
Ekman, P., Friesen, W.V., O'Sullivan, M., Chan, A., Diacoyanni-Tarlatzis, I., Heider, K., et al., 1987. Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology 53 (4), 712–717.
Ekman, P., Friesen, W., Hager, J., 2002a. Facial Action Coding System, second ed. Research Nexus eBook, Salt Lake City.
Ekman, P., Friesen, W., Hager, J., 2002b. Facial Action Coding System: Investigator's Guide, second ed. Research Nexus eBook, Salt Lake City.
Frydrych, M., Kätsyri, J., Dobšík, M., Sams, M., 2003. Toolkit for animation of Finnish talking head. Paper presented at the AVSP, St. Jorioz, France.
Haidt, J., Keltner, D., 1999. Culture and facial expression: open-ended methods find more expressions and a gradient of recognition. Cognition and Emotion 13 (3), 225–266.
Harwood, N.K., Hall, L.J., Shinkfield, A.J., 1999. Recognition of facial emotional expressions from moving and static displays by individuals with mental retardation. American Journal on Mental Retardation 104 (3), 270–278.
Hess, U., Blairy, S., Kleck, R.E., 1997. The intensity of emotional facial expressions and decoding accuracy. Journal of Nonverbal Behavior 21 (4), 241–257.
Hill, H., Johnston, A., 2001. Categorizing sex and identity from the biological motion of faces. Current Biology 11 (11), 880–885.
Humphreys, G.W., Donnelly, N., Riddoch, M.J., 1993. Expression is computed separately from facial identity, and it is computed separately for moving and static faces: neuropsychological evidence. Neuropsychologia 31 (2), 173–181.
Izard, C.E., 1994. Innate and universal facial expressions: evidence from developmental and cross-cultural research. Psychological Bulletin 115 (2), 288–299.
Kalliomäki, I., Lampinen, J., 2002. Feature-based inference of human head shapes. Paper presented at the Finnish Conference on Artificial Intelligence (STeP), Oulu, Finland.
Kamachi, M., Bruce, V., Mukaida, S., Gyoba, J., Yoshikawa, S., Akamatsu, S., 2001. Dynamic properties influence the perception of facial expressions. Perception 30, 875–887.
Kanade, T., Cohn, J.F., Tian, Y., 2000. Comprehensive database for facial expression analysis. Paper presented at the 4th IEEE International Conference on Automatic Face and Gesture Recognition.
Katsikitis, M., 1997. The classification of facial expressions of emotion: a multidimensional-scaling approach. Perception 26 (5), 613–626.
Kätsyri, J., Klucharev, V., Frydrych, M., Sams, M., 2003. Identification of synthetic and natural emotional facial expressions. Paper presented at the ISCA Tutorial and Research Workshop on Audio Visual Speech Processing (AVSP), St. Jorioz, France.
Knight, B., Johnston, A., 1997. The role of movement in face recognition. Visual Cognition 4 (3), 265–273.
Lander, K., Christie, F., Bruce, V., 1999. The role of movement in the recognition of famous faces. Memory and Cognition 27 (6), 974–985.
Lander, K., Bruce, V., Hill, H., 2001. Evaluating the effectiveness of pixelation and blurring on masking the identity of familiar faces. Applied Cognitive Psychology 15, 101–116.
Lane, R.D., Sechrest, L., Reidel, R., Weldon, V., Kaszniak, A., Schwartz, G.E., 1996. Impaired verbal and nonverbal emotion recognition in alexithymia. Psychosomatic Medicine 58, 203–210.
Mann, L., Wise, T., Trinidad, A., Kohanski, R., 1994. Alexithymia, affect recognition, and the five-factor model of personality in normal subjects. Psychological Reports 74 (2), 563–567.
Massaro, D.W., 1998. Perceiving Talking Faces. The MIT Press, Cambridge, MA.
Maurer, D., Le Grand, R., Mondloch, C.J., 2002. The many faces of configural processing. Trends in Cognitive Sciences 6 (6), 255–260.
Mazurski, E.J., Bond, N.W., 1993. A new series of slides depicting facial expressions of affect: a comparison with the pictures of facial affect series. Australian Journal of Psychology 45, 41–47.
Ortony, A., Turner, T.J., 1990. What's basic about basic emotions? Psychological Review 97 (3), 315–331.
O'Toole, A.J., Harms, J., Snow, S.L., Hurst, D.R., Pappas, M.R., Ayyad, J.H., et al., 2005. A video database of moving faces and people. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (5), 812–816.
Parker, J.D.A., Taylor, G.J., Bagby, R.M., 1993. Alexithymia and the recognition of facial expression of emotion. Psychotherapy and Psychosomatics 59, 197–202.
Parker, P.D., Prkachin, K.M., Prkachin, G.C., 2005. Processing of facial expressions of negative emotion in alexithymia: the influence of temporal constraint. Journal of Personality 73 (4), 1087–1107.
Pelachaud, C., Badler, N.I., Steedman, M., 1991. Linguistic issues in facial animation. In: Magnenat-Thalmann, N., Thalmann, D. (Eds.), Computer Animation '91. Springer, pp. 15–30.
Roark, D.A., Barrett, S.E., Spence, M.J., Hervé, A., O'Toole, A.J., 2003. Psychological and neural perspectives on the role of motion in face recognition. Behavioral and Cognitive Neuroscience Reviews 2 (1), 15–46.
Rosenberg, E.L., Ekman, P., 1995. Conceptual and methodological issues in the judgment of facial expressions of emotion. Motivation and Emotion 19, 111–138.
Russell, J.A., 1994. Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin 115 (1), 102–141.
Russell, J.A., 1995. Facial expressions of emotion: what lies beyond minimal universality? Psychological Bulletin 118 (3), 379–391.
Salminen, J.K., Saarijärvi, S., Äärelä, E., Toikka, T., Kauhanen, J., 1999. Prevalence of alexithymia and its association with sociodemographic variables in the general population of Finland. Journal of Psychosomatic Research 46 (1), 75–82.
Taylor, G.J., Bagby, R.M., Parker, J.D.A., 1991. The alexithymia construct: a potential paradigm for psychosomatic medicine. Psychosomatics 32, 153–164.
Terzopoulos, D., Waters, K., 1993. Analysis and synthesis of facial image sequences using physical and anatomical models. IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (6), 569–579.
Waters, K., 1987. A muscle model for animating three-dimensional facial expressions. Computer Graphics 21 (4), 17–24.
Wehrle, T., Kaiser, S., Schmidt, S., Scherer, K.R., 2000. Studying dynamic models of facial expression of emotion using synthetic animated faces. Journal of Personality and Social Psychology 78 (1), 105–119.