
To appear in Journal of Vision.
Emotion Perception in Emotionless Face Images Suggests a Norm-based
Representation
Donald Neth and Aleix M. Martinez
Dept. Electrical and Computer Engineering
Dept. Biomedical Engineering
The Ohio State University
Abstract
Perception of facial expressions of emotion is generally assumed to correspond to
underlying muscle movement. However, some individuals are often judged to have sadder or angrier faces than others, even when their faces are neutral and motionless. Here, we report on one such
effect caused by simple static configural changes. In particular, we show four variations in
the relative vertical position of the nose, mouth, eyes, and eyebrows that affect the
perception of emotion in neutral faces. The first two configurations make the vertical
distance between the eyes and mouth shorter than average, resulting in the perception of
an angrier face. The other two configurations make this distance larger than average,
resulting in the perception of sadness. These perceptions increase with the amount of
configural change, suggesting a representation based on variations from a norm
(prototypical) face.
Understanding emotion is a central problem of cognitive science and is fundamental to the study
of human behavior (Darwin, 1872; Russell, 2003). To properly study this, it is essential that we
identify how emotions are represented in the brain – their psychological space. Here, we are
concerned with the underlying dimensions of the face space used to represent and recognize
sadness and anger from pictures. Ekman and colleagues (Ekman & Friesen, 1978; Ekman & Oster, 1979), following in the footsteps of Darwin (1872), identified six universal emotional expressions (anger, sadness, fear, surprise, happiness, and disgust) assumed to play an adaptive role in evolution (Schmidt & Cohn, 2001). According to this hypothesis, all people
should be equipped with the capacity to express and perceive these emotions and their associated
facial expressions. Ekman and Friesen’s (1978) fundamental work demonstrated how each of
these emotional expressions is created by moving particular facial muscles to certain positions,
and several authors have shown their universality (Ekman & Oster, 1979; Izard, 1994). Under
this view, motion primitives constitute the dimensions of the underlying computational face
space.
However, everyday experience suggests that other elements may be employed to classify faces
into one group or another (Zebrowitz, 1997). For example, Montepare and Dobish (2003) show
that there is an agreement among subjects judging the perception of emotions and social cues
To appear in Journal of Vision.
from faces with no particular emotional expression. Zebrowitz et al. (2007) found that
impressions of emotion expression on faces were partially mediated by babyfacedness. And,
Lyons et al. (2000) found that an upward tilt of the head produces a two-dimensional retinal
image with facial features associated with negative affect. Similarly, a downward tilt produces a
two-dimensional image of positive affect.
One possibility is that neutral faces resembling the configuration of a given expression elicit the perception of that emotion (Zebrowitz, 1997; Montepare & Dobish, 2003; Zebrowitz et al., 2007). This effect is known as the emotion overgeneralization hypothesis. Under this view, the
major question remains as to which facial features are used to represent emotions and thus which
ones are associated with these overgeneralizations. This is important, because these features are
the ones used to code the psychological (face) space.
Using Ekman and Friesen’s (1978) collection of face images, where subjects were instructed to
move their face muscles to display each of the six universal emotions, we observed that the
distance between the inner corner of the two eyebrows and the mouth increases when one
expresses sadness (from an average of 233.12 pixels in neutral expressions to 248.8 for sadness).
The same distance decreases to 220.12 pixels when one expresses anger. Hence, one hypothesis
is that individuals with an “elongated” face (i.e., a larger vertical distance between internal features) will be perceived as sadder than average, while people with a reduced vertical distance
between facial features will be perceived as angrier.
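As a concrete illustration, this measurement can be reproduced from hand-marked facial landmarks. The sketch below (with hypothetical landmark names and coordinates, not the actual annotations used on the Ekman and Friesen images) computes the vertical distance between the inner eyebrow corners and the mouth:

```python
def vertical_brow_mouth_distance(landmarks):
    """Vertical (y-axis) distance, in pixels, between the midpoint of the
    inner eyebrow corners and the center of the mouth."""
    brow_mid_y = (landmarks["inner_brow_left"][1]
                  + landmarks["inner_brow_right"][1]) / 2.0
    return abs(landmarks["mouth_center"][1] - brow_mid_y)

# Hypothetical landmarks for one neutral face, as (x, y) image coordinates.
neutral = {"inner_brow_left": (310, 240),
           "inner_brow_right": (410, 242),
           "mouth_center": (360, 474)}
print(vertical_brow_mouth_distance(neutral))  # 233.0, near the neutral average
```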
In the present paper, we show results in favor of this hypothesis. In Experiment 1, we show that
the vertical distance between eyes, nose and mouth of an individual face is correlated with the
perception of anger and sadness in face images displaying a neutral expression. When the
vertical distance between eyes and mouth is made larger than that of the average face (i.e., the
average distance within a population), the perception of sadness increases. A decrease of the same distance relative to the average face increases the perception of anger. This result
suggests that configural (second-order) cues are used by the cognitive system to represent and
recognize anger and sadness. Our results further demonstrate that the perception of anger and
sadness increases linearly with the eye-mouth distance, suggesting the existence of a norm-based
face space where each face is represented as a variation of a norm (sometimes called mean or
prototype) face (Valentine, 1991). We also show that this effect is asymmetric, because the linear
increase only occurs when the second image has a larger intra-feature distance than the first one. When the larger eye-to-mouth distance is in the first image, this effect is diminished. This is
because when the second image is closer to the mean face, the density of the face space
increases, making it more difficult to distinguish between categories.
If the face space used to represent facial expressions of emotion is indeed norm-based, then those
images with a shorter than average eye-mouth distance should not be perceived as sadder, while
those with a larger than average distance should not be seen as angrier. That is to say, the
perception of sadness and anger should not traverse to the other side of the “mean” face, since
these correspond to two distinct psychological categories. This is the same effect seen in face
recognition, where moving along a single dimension of the face space results in the observation
of two distinct identities after crossing over to opposite sides of the mean face. This effect is
verified in Experiment 2. In this second experiment, subjects were shown images with shorter and larger eye-to-mouth distances within the same session. Their task was now to determine whether the second image appeared more angry or more sad than the first. Subjects consistently classified the images with a larger vertical distance between internal facial features as sadder and images with a shorter distance as angrier.
The results reported in this paper have implications for the interpretation and design of psychophysical experiments, for studies of social behavior and politics, and for defining computational models and developing human-computer interaction systems. Furthermore, our results are important for the analysis of art, where artists may have used such configurations to emphasize expressions, as seems to be the case in Grant Wood’s “American Gothic” portrait. The perceptual biases described here are also relevant to the interpretation of some social interactions (Hassin & Trope, 2000), and seem to affect the perception of attractiveness (Paunonen et al., 1999; Thornhill & Gangestad, 1999) and identity (Martinez, 2003; Ackerman et al.,
2006). These results also call for additional research to help determine to what extent configural,
shape and motion cues are combined with one another to influence the perception of facial
expressions of emotion.
EXPERIMENT 1
METHODS
Procedure. To assess the role of static configural differences in perception of emotion, face
images were modified and presented on a computer screen. The relationships among internal
features of the face were manipulated without affecting the overall shape of the face or head,
resulting in a configural change. The placement of the eyes, brows, nose, and mouth was varied along the vertical axis. Four types of configural change were applied to each of 12 individual
face images taken from the AR Database (Martinez & Benavente, 1998). For each type of
configural change, individual images were created at the following percentages of displacement: 0% (i.e., the original image), 25% (a 5-pixel displacement), 50% (10 pixels), 75% (15 pixels), and 100% (20 pixels). One of the unchanged originals is shown in Fig. 1a. Examples of each type of configural change are shown at 100% displacement in Fig. 1b-e. The images are 768 × 576 pixels.
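A minimal sketch of this manipulation, assuming the Pillow imaging library, is given below; the region box, file names, and the absence of any blending are simplifications (the published stimuli required smoothly filling in the surrounding skin):

```python
# Crude version of the configural manipulation: shift a rectangular feature
# region (here, the mouth) vertically by a fixed number of pixels.
from PIL import Image

LEVELS = {0: 0, 25: 5, 50: 10, 75: 15, 100: 20}  # % change -> pixel shift

def shift_region(img, box, dy):
    """Paste the region box = (left, top, right, bottom) back, shifted by dy."""
    region = img.crop(box)
    out = img.copy()
    left, top, right, bottom = box
    out.paste(region, (left, top + dy))  # dy > 0 moves the region downward
    return out

img = Image.open("neutral_face.png")  # a 768 x 576 source image
mouth_box = (300, 420, 460, 520)      # hypothetical mouth region
for pct, dy in LEVELS.items():
    shift_region(img, mouth_box, dy).save(f"mouth_down_{pct:03d}.png")
```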
Image sets from the four configurations were divided into two groups. The angry group
included face images in which the mouth was displaced upward and face images in which the
eyes and brows were displaced downward. The sad group included face images in which the
mouth was displaced downward and face images in which the eyes and brows were displaced
upward. Pairs of these face images were displayed sequentially, Fig. 2. Prior to the display of the
first image of each pair, an initial visual cue (crosshair) was shown for 600 ms in a randomly
selected location to alert the subject as to where the first image would be displayed. The first
image was then displayed for 600 ms. This was followed by a mask with a duration of 500 ms. A
second visual cue was then displayed in another randomly selected location for 600 ms followed
by the display of the second image for another 600 ms. This prevented the perception of implied
motion.
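A sketch of this trial timeline, written with the PsychoPy toolbox, is shown below; the window parameters, image paths, cue positions, and response keys are placeholders rather than the authors’ actual presentation script:

```python
from psychopy import visual, core, event
import random

win = visual.Window(size=(1280, 1024), color="gray", units="pix")

def show(stim, seconds):
    """Draw a stimulus, flip the screen, and hold it for `seconds`."""
    stim.draw()
    win.flip()
    core.wait(seconds)

def run_trial(first_path, second_path):
    for i, path in enumerate((first_path, second_path)):
        pos = (random.randint(-200, 200), random.randint(-150, 150))
        show(visual.TextStim(win, text="+", pos=pos), 0.6)     # cue, 600 ms
        show(visual.ImageStim(win, image=path, pos=pos), 0.6)  # face, 600 ms
        if i == 0:
            win.flip()                                         # blank mask
            core.wait(0.5)                                     # 500 ms
    show(visual.TextStim(win, text="?"), 0.0)                  # response prompt
    return event.waitKeys(keyList=["1", "2", "3"])             # less/same/more
```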
Subjects were asked to indicate whether the second image seemed more, the same, or less angry
(or sad) relative to the first image displayed. Subjects were required to respond with a key-press
for more, same, or less angry (during the session where the angry image group was used) and sad
(for the sad group set). Subject responses and reaction times were recorded. Responses with
reaction times greater than two times the standard deviation of all reaction times were considered
outliers and were eliminated from subsequent analysis.
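The outlier rule reduces to a few lines; the sketch below uses NumPy with toy reaction times, and reads the two-standard-deviation criterion in the conventional sense of a cutoff at the mean plus two standard deviations:

```python
import numpy as np

rts = np.array([612.0, 540.0, 702.0, 2950.0, 588.0, 655.0])  # ms, toy data
responses = np.array(["more", "same", "more", "less", "same", "more"])

cutoff = rts.mean() + 2.0 * rts.std()  # conventional reading of the 2-SD rule
keep = rts <= cutoff
print(responses[keep])                 # the 2950 ms trial is dropped
```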
Subjects. Twenty subjects with normal or corrected to normal vision were seated at a personal
computer with a nineteen inch LCD monitor. Subjects were drawn from the population of
faculty, staff, and students at The Ohio State University. The typical viewing distance was 50
cm, providing a percept of approximately 15 degrees vertically and 8 degrees horizontally. Each
session lasted between 30 and 40 minutes. Subjects were given the opportunity to take a brief
rest at the 25%, 50%, and 75% completion points.
RESULTS
Responses for the changes in each configural position within the sad group and within the angry
group are shown in Figs. 3a and 3b, respectively, with significant differences (p<0.01) denoted
by an asterisk. The combined responses for the sad group and for the angry group are shown in
Fig. 3c. The abscissa reflects the difference in positive or negative change level between the first
and second image in each presentation. For example, if the first image represents a configural
change of 75% and the second image is at a level of 25%, the subject’s response would fall into
the -50% grouping (negative change). Likewise, if the first image is at 50% and the second
image is at 75%, the response would fall into the 25% grouping (positive change). The ordinate
reflects the percentage of less, same, and more responses made at each grouping.
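The grouping along the abscissa is simply the signed difference between the two displacement levels, as the following minimal helper (names hypothetical) makes explicit:

```python
def change_level(first_pct, second_pct):
    """Signed difference between the second and first displacement levels."""
    return second_pct - first_pct

print(change_level(75, 25))  # -50: a negative change
print(change_level(50, 75))  # 25: a positive change
```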
The pattern observed in the more responses in Fig. 3 is sigmoidal, as expected, since these responses approach zero below 0% change and saturate above 100%. This response type is common in many perceptual studies. Nonetheless, we note that the percentage of more responses increases linearly with the amount of positive change (i.e., from 0% to 100%). This is because the sigmoidal response approximates a linear function with a shallow slope over this interval. This means that, in the sad group, if the distance between the baseline of the eyes and
the mouth increases by x, the percentage of more responses increases by fs(x), where fs(.) is a linear function (r² = 0.981). For the angry group, when the distance between the eyes and the mouth decreases by x, the percentage of more responses increases by fa(x), with fa(.) also linear (r² = 0.986). Similarly, the percentage of less responses increases linearly with the amount of negative change. In this case, for the sad group, the r² value is 0.995, while for the angry group r² = 0.998. The percentage of same responses (i.e., identical perception of sadness or anger in the first and second image) also decreases linearly. For the sad group, the percentage of same responses for negative change has an r² value of 0.988, while the percentage of same responses for positive change has an r² value of 0.966. For the angry group, these r² values are 0.991 and 0.971, respectively.
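Fits of this kind can be computed with ordinary least-squares regression; the sketch below uses SciPy with illustrative response percentages, not the data reported above:

```python
import numpy as np
from scipy import stats

change = np.array([0, 25, 50, 75, 100])              # % positive change
pct_more = np.array([18.0, 32.0, 49.0, 63.0, 80.0])  # toy "more" percentages

fit = stats.linregress(change, pct_more)
print(f"slope={fit.slope:.2f}, r^2={fit.rvalue**2:.3f}")
```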
These results suggest an underlying dimension in a multidimensional face space where sadness
and anger are represented as variations from a norm (prototypical or mean) face. This norm-based face space is well documented for the task of identification (Valentine & Bruce, 1986;
Leopold, Bondar & Giese, 2006; Rhodes & Jeffery, 2006), but has not been previously shown
for the representation and recognition of expressions. This is a surprising finding, because the
perception of facial expressions of emotion is purported to be categorical (Beale & Keil, 1995;
Calder et al., 1996; Young et al., 1997). Nonetheless, our results suggest that although the
perception of emotion may be categorical, the underlying representation is (at least in part)
continuous, in the sense that the perception of an emotion is made clearer as we move away from
the norm face (center of the psychological space) and a perception of neutral expression is
obtained at the “center” of this face space (Russell, 1980).
A χ² goodness-of-fit test was applied to determine whether the responses at different levels of
displacement are indeed statistically different from those at 0% change. To properly perform this
test, the cumulative responses for the sad group and for the angry group were examined, Fig. 3c.
For each of the sad and angry groups, the responses for the 0% change in displacement were
used as the expected values. Therefore, the null hypothesis (H0) states that there is no difference
between the response profile for any given level of change in facial feature displacement when
compared to the response profile for no change in displacement. All comparisons for both the
sad group and angry group yielded a significant χ² value at p = 0.01 with two degrees of freedom (critical value χ² = 9.21; df = 2). For both the angry group and sad group, the residuals indicate that
the less responses are the major contributors to the significant differences for the negative
changes. Similarly, the more responses are the major contributors to the significant differences
for the positive changes. These results are again consistent with a norm-based representation.
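The structure of this test can be sketched as follows, with illustrative less/same/more counts; the expected frequencies are derived from the proportions observed in the 0% condition, and df = 3 − 1 = 2:

```python
import numpy as np
from scipy import stats

observed = np.array([12, 31, 77])   # toy less/same/more counts at one level
baseline = np.array([40, 42, 38])   # toy counts at 0% change
expected = baseline / baseline.sum() * observed.sum()

chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # reject H0 if chi2 > 9.21 at p = 0.01
```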
It is well known that in many perceptual and psychophysical studies, the responses will be
stronger at the extremes (Attneave, 1950), which is generally where the density of the underlying
cognitive multidimensional space is lower (Krumhansl, 1978). This effect has been observed in
the recognition of identity in face images (Valentine & Ferrara, 1991; Benson & Perrett, 1994).
In these studies, faces closer to the mean face (i.e., the center of the face space) are more difficult
to identify, while faces far from the mean are recognized more easily. Therefore, we can predict
that if the representation of expression is indeed norm-based, then a similar pattern should be
observed in our results. To test this hypothesis, we analyze the different responses to each of the
configural changes from 25% to 75%. Note that each configural change includes several possible pairs. For example, the 25% change includes eight possible scenarios, because the variation between the image with 0% displacement and that with 25% is the same as that between 25% and 50%, 50% and 75%, and 75% and 100%, together with the four mirror pairs. We show the responses to a difference
of 25% change for the sad and angry groups in Fig. 4a. In this plot, the abscissa values (n%-m%) illustrate the percentage of facial feature displacement in the first image (n%) and that in the
second image (m%).
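The count of eight scenarios follows directly from enumerating the ordered pairs of displacement levels that differ by 25%:

```python
levels = [0, 25, 50, 75, 100]
pairs = [(a, b) for a in levels for b in levels if abs(a - b) == 25]
print(len(pairs))  # 8: four forward pairs plus their four mirrors
print(pairs)       # [(0, 25), (25, 0), (25, 50), (50, 25), ...]
```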
Our results are again consistent with a norm-based model. For positive changes (i.e., when n<m),
we see a linear increment in the percentage of responses. That is, the more we move away from
the center of the face space, the more apparent the percept becomes. When the same difference
of change is on the face stimuli farthest from the mean face, the perception of sadness and anger
is maximized. We also note, however, that the plot is asymmetric, because the same does not
apply to the negative changes (i.e., n>m). We specifically see this by looking at the difference in
same responses. Note that while on the positive side of the plot the percentage of same responses
decreases linearly, on the negative side these are practically identical, i.e., the responses have
flattened. A quantitative comparison of the (negative) same responses yields differences of less
than 5% in all conditions. This means the direction of change is also important, since the
perception of anger/sadness is most visible when the change is positive (n<m). The same is true
for the plots at 50% and 75% change given in Fig. 4b-c. This suggests the face space is warped, a phenomenon also known in psychology (Tversky, 1977) and in some face recognition tasks (Rotshtein et al., 2006), but not previously reported in the perception of facial expressions.
Asymmetries like the one observed in these results arise when the more salient of the two objects
shown to the subject is placed in a different location in the psychological space and, hence, plays
a distinct role. We note that when we go from a large transformation (e.g., the extremes at 100%) to a smaller one (e.g., 25%), the density of points in the face space increases. This is because the
second image is closer to the mean face, and the density of points representing faces increases as
we approach the mean face. As we get closer to the dense areas, it becomes harder to distinguish
between percepts, and the perception of sadness and anger diminishes. The opposite is true when
we go from a denser to a less dense region.
We also performed an analysis to assess any learning that may have taken place over the course
of an experimental session and may be responsible for the increased perception of sadness and
anger. To assess the influence of learning on the subjects’ responses, the percentages of correct
responses were examined across the sequence of trials. Responses that reflected the actual
change in deformation were considered correct, i.e., less responses were considered correct when
the changes were negative, likewise, more responses were considered correct for positive
changes. The data revealed a moderate improvement of around 5% from the first trial to the final
one. To see whether this improvement could have affected the results reported above, we divided
the data into two halves to see if this improvement was prominent in one of the conditions. We
observed that all conditions had an almost identical increase, and thus the pattern of the data remained unaltered; i.e., a plot of the first half of the data and a plot of the second half show the same (linear) pattern, with the first-half plot lying about 5% below that of the second half. The reported data sit between these two plots and show the same observable pattern.
Hence, although there was a small learning effect, this did not change the pattern of responses
described in this section.
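The split-half check can be sketched in a few lines; the correctness indicators below are toy values (1 = the response direction matched the change direction), in trial order:

```python
import numpy as np

correct = np.array([1, 0, 1, 1, 0, 1, 1, 1, 1, 1])  # toy correctness per trial
half = len(correct) // 2
first, second = correct[:half], correct[half:]
print(f"first half: {first.mean():.0%}, second half: {second.mean():.0%}")
```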
EXPERIMENT 2
Although the results reported above are in agreement with our hypothesis that a shorter than
average eye-to-mouth distance increases the perception of anger and a larger than average
distance increases the perception of sadness, other explanations may be possible. In particular, it
has been argued that as a face deviates from the norm face, negative affect toward it increases
(Zebrowitz, 1997). It is possible that when subjects noticed a deviation from the mean face in the stimuli presented above, they tended to respond toward the negative category rather than reporting an actual impression of sadness or anger. If this alternative hypothesis were true, subjects would not
be able to distinguish between the sad and angry stimuli defined above when presented in the
same experiment. To counter this alternative explanation of our results, we prepared a second
experiment where subjects were forced to make decisions of sadness versus anger.
METHODS
The same stimuli defined in Experiment 1 were used in this second experiment. However, all the images were now shown within a single session. Face images were presented sequentially in 500 pairs
with the identity, type and magnitude of displacement randomly selected for the initial image,
Fig. 2. The identity and type of configural change applied to the second image remained the
same as in the first image. No image pair was used more than once.
As opposed to our first experiment presented above, the magnitude of configural change in the
second image was randomly selected from all cases representing no change or an increase in
displacement relative to the first image, i.e., a positive change as defined in Experiment 1. For
example, if the initial image displayed the nose-mouth down condition at 50% displacement, the
second image could only be chosen from the nose-mouth down condition at 50%, 75%, or 100%
displacement. Images were shown in sequences using the same procedure described in
Experiment 1. Seventeen subjects, with normal or corrected to normal vision, were asked to
indicate whether the second image seemed more angry, the same, or more sad relative to the first
image displayed. None of these subjects had participated in the previous study.
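A sketch of this sampling scheme is given below; the identity indices and condition names are placeholders, and the no-repeated-pair constraint enforced in the actual experiment is omitted for brevity:

```python
import random

LEVELS = [0, 25, 50, 75, 100]
CONDITIONS = ["nose_mouth_down", "eyes_brows_up",   # sad group
              "nose_mouth_up", "eyes_brows_down"]   # angry group

def sample_pair(n_identities=12):
    """First image: random identity, condition, and level. Second image:
    same identity and condition, equal-or-greater displacement level."""
    identity = random.randrange(n_identities)
    condition = random.choice(CONDITIONS)
    first = random.choice(LEVELS)
    second = random.choice([lv for lv in LEVELS if lv >= first])
    return (identity, condition, first), (identity, condition, second)

trials = [sample_pair() for _ in range(500)]  # 500 pairs per session
```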
RESULTS
Subject responses are summarized in Fig. 5. The abscissa reflects the magnitude and direction of
change from the first image to the second image in a pair. The 0% condition represents all cases
in which the first image and second image were identical, regardless of the magnitude of
displacement. Positive percentages to the right of the 0% represent cases in which the
displacement of nose-mouth down and eyes-brows up was increased. Positive percentages to the
left of the 0% represent cases in which displacement of nose-mouth up and eyes-brows down was
increased. Therefore, the first group (i.e., nose-mouth down and eyes-brows up) corresponds to a
set of stimuli with larger than average eye-to-mouth distance and an increment of it in the second
image relative to the first. The second group (i.e., nose-mouth up and eyes-brows down) includes
the stimuli with smaller than average eye-to-mouth distance and a smaller distance in the second
image relative to the first.
If our hypothesis were true, then subjects should select the more sad option when they see an
image of the first group and the more angry option when they observe the images in the second
group. If subjects were instead responding with arbitrary negative labels whenever the second image deviated more from the mean face than the first, then they would respond more sad and more angry equally often in the two change conditions (i.e., right and left of the 0% condition). In Fig. 5, we
see that subjects’ selections clearly agree with our hypothesis. This result further demonstrates
that the perception of sadness and anger increases as the difference between the first and second
image grows. Most importantly, this result substantiates that there is no perception of anger on
the side of the face space representing sadness, and no perception of sadness on the side of the
space describing anger. These results provide further evidence for a norm-based coding, with distinct emotion percepts on each side of the norm face.
A χ² goodness-of-fit test was used to compare the more angry, same, and more sad response profiles at different levels of configural change to the response profile for 0% change. The χ² test was applied to frequency data in this case. For the cumulative responses, the χ² test shows a significant difference for all levels of change relative to no change (0%). The percentage of times subjects selected the more sad response increased linearly (r² = 0.978) with the difference in the distance between eyes and mouth, as in Experiment 1. Similarly, more angry responses increased linearly (r² = 0.994) as the eye-mouth distance decreased.
An analysis to assess the existence of any learning effect was also conducted. As in Experiment 1, the responses were examined across the sequence of trials by plotting the percentage of correct responses against trial number. No difference between trials was
observed. A comparison of the first and second halves of the responses showed differences of
less than 1%. Hence, no significant learning that could affect the results reported above was
observed in the second experiment.
DISCUSSION
Identifying the dimensions of the computational space underlying the representation of faces and
emotions has proven to be a challenging problem, yet it is key to the understanding of the
processes associated with these tasks and, ultimately, cognition.
It seems evident from the results reported here that configural effects (i.e., second-order
relations) can influence the perception of emotion in an otherwise neutral face. The changes in
configuration studied in this paper were made without adding any of the distortions associated
with activation of the facial musculature. This implies that face appearance (which is a direct
consequence of the physical shape of the skull and jaw and the muscles overlying these) strongly
influences people’s perception. Furthermore, this perceptual bias is predictable, since the number
of times the perception of anger and sadness is observed over many trials increases linearly with
the increase/decrease of the distance between the baseline of the eyes and the mouth. Hence,
these results suggest that motion primitives are not the sole features used by the cognitive system
when representing facial expressions of emotion.
A similar effect may be expected in other facial expressions of emotion. However, not all facial expressions carry a vertical configural change and, hence, not all expressions may in fact be affected by configural cues. Moreover, it is possible that shape and motion cues carry a larger weight when dealing with general expressions of emotion. In a recent paper, Nusseck et al. (2008) studied the different contributions of motion cues to several facial expressions. The results
reported in the present paper suggest that additional studies that include configural and shape features in addition to motion cues should also be conducted.
The results reported in this paper also justify the general observation that some faces seem
sadder or angrier than average even when no facial movements are present. This is also apparent in some of the facial configurations people produce deliberately. For example, young children tend to drop their jaws as much as possible when they want to show they are sad. This creates the impression
that the baseline of the mouth is more separated from the baseline of the eyes than usual,
resulting in the perception of sadness. A similar effect seems to be the one responsible for an
increased perception of sadness in the male subject of Wood’s “American Gothic,” Fig. 6a. It is
clear from this picture that, although there is no strong facial expression in the male character,
the distance between eyes and mouth is uncannily large. These physical characteristics create the
illusory percept of sadness. To demonstrate this effect, we can now reduce the distance between
eyes and mouth in the male character, Fig. 6b. Most viewers experience a reduction in the level
of sadness evoked from the male’s face and an increase in the perception of anger. Also, we
showed a selection of the original neutrals and their angry and sad sets (Fig. 1) in isolation. For the modified images, we used the sets with 75% deformation for nose and mouth down (sad) and nose and mouth up (angry). Viewers consistently classified the angry set as angry, the
sad set as sad, and the neutral set as neutral.
These results may reflect an overgeneralization of the perception of actual sad and angry facial expressions. Evidence for this was given earlier when we illustrated the difference in brow-mouth distances in angry, sad, and neutral expressions. Furthermore, the percentage of times that subjects select the more category increases as the distance from the norm face grows, which
suggests a norm-based coding. In this sense, as one increases the distance from the norm, the
percept becomes clearer, i.e., more noticeable.
These results also suggest that an increase in the eye-to-mouth distance will increase the perception of sadness in sad expressions, while a decrease of this distance will make angry
expressions look angrier. This effect is illustrated in Fig. 7. In Fig. 7a, we show the original sad
face expression and the same image with a larger eye-mouth distance. In Fig. 7b, we show the
original image of an angry expression and the modified image where the eye-mouth distance has
been decreased. When shown to human subjects, the modified images are generally perceived as
sadder and angrier than their original counterparts. In some instances, configural changes may
lead to the perception of a distinct expression. Therefore, even if the configural effect described
in this paper were found to have a small influence in other expressions, it needs to be considered
in studies on anger and sadness.
An immediate consequence of these results concerns studies of facial expressions of emotion. It is clear that such studies may be biased toward one result or another depending on the configural information of the particular faces used. According to the results reported in this paper, one
would need to use images of faces close to the mean (prototype) face, because these are the ones least affected by the configural bias. However, due to the increased density near the center of
the face-space, certain discrimination tasks would be made more difficult. Moreover, the
cognitive face space for expression shares a number of aspects with the face space for identity.
The similar nature of both face spaces suggests that there is some direct or indirect interaction
between expression and identity (Martinez, 2003). Also, deficits such as those observed in schizophrenia may be partly rooted in an impairment of facial expression recognition. It has recently been proposed (Chambon et al., 2006) that configural information may be related to such deficits. A
better understanding of the underlying face space will facilitate studies in these directions.
Acknowledgements
We thank the reviewers for their constructive comments. This research was supported in part by
the National Institutes of Health grant R01 DC 005241 and the National Science Foundation
grant IIS 0713055. DN was supported in part by a fellowship from Ohio State’s Center for
Cognitive Sciences.
References
Ackerman, J.M., Shapiro, J.R., Neuberg, S.L., Kenrick, D.T., Becker, D.V., Griskevicius, V., Maner, J.K., & Schaller, M. (2006). They all look the same to me (unless they're angry). Psychological Science, 17, 836-840.
Attneave, F. (1950). Dimensions of similarity. The American Journal of Psychology, 63, 516-556.
Beale, J.M. & Keil, F.C. (1995). Categorical effects in the perception of faces. Cognition, 57,
217-239.
Benson, P.J.& Perrett, D.I. (1994). Visual processing of facial distinctiveness. Perception, 23,
75-93.
Calder, A.J., Young, A.W., Perrett, D.I., Etcoff, N.L. & Rowland, D. (1996). Categorical perception of morphed facial expressions. Visual Cognition, 3, 81-117.
Chambon, V., Baudouin, J.Y. & Franck, N. (2006). The role of configural information in facial
emotion recognition in schizophrenia. Neuropsychologia, 44, 2437-2444.
Darwin, C. (1872). The expression of the emotions in man and animals. London: J. Murray.
Ekman, P. & Friesen, W.V. (1978). The Facial Action Coding System: A technique for the measurement of facial movement. San Diego: Consulting Psychologists Press.
Ekman, P., & Oster, H. (1979). Facial expressions of emotion. Annual Review of Psychology, 30, 527-554.
Hassin, R., & Trope, Y. (2000). Facing faces: Studies on the cognitive aspects of physiognomy.
Journal of Personality and Social Psychology, 78, 837-852.
Izard, C. (1994). Innate and universal facial expressions: evidence from developmental and
cross-cultural research. Psychological Bulletin, 115, 288-299.
Krumhansl, C.L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.
Leopold, D.A., Bondar, I.V. & Giese, M.A. (2006). Norm-based face encoding by single
neurons in the monkey inferotemporal cortex. Nature, 442, 572-575.
Lyons, M.J., Campbell, R., Plante, A., Coleman, M., Kamachi, M. & Akamatsu, S. (2000). The Noh mask effect: Vertical viewpoint dependence of facial expression perception. Proceedings of the Royal Society of London B, 267, 2239-2245.
Martinez, A.M. & Benavente, R. (1998). The AR-face database. Computer Vision Center,
Technical Report 24.
Martinez, A.M. (2003). Matching expression variant faces. Vision Research, 43, 1047-1060.
Montepare, J.M. & Dobish, H. (2003). The contribution of emotion perceptions and their overgeneralizations to trait impressions. Journal of Nonverbal Behavior, 27, 237-254.
Nusseck, M., Cunningham, D. W., Wallraven, C., & Bülthoff, H. H. (2008). The contribution of
different facial regions to the recognition of conversational expressions. Journal of Vision,
8(8):1, 1-23, http://journalofvision.org/8/8/1/
Paunonen, S.V., Ewan, K., Earthy, J., Lefave, S., & Goldberg, H. (1999). Facial features as personality cues. Journal of Personality, 67, 555-583.
Rhodes, G. & Jeffery, L. (2006). Adaptive norm-based coding of facial identity. Vision
Research, 46, 2977-2987.
Rotshtein, P., Henson, R.N.A., Treves, A., Driver, J. & Dolan, R.J. (2006). Morphing Marilyn
into Maggie dissociates physical and identity face representations in the brain. Nature
Neuroscience, 8, 107-113.
Russell, J.A. (1980). A circumplex model of affect. Journal of Personality and Social
Psychology, 39, 1161-1178.
Russell, J.A. (2003). Core affect and the psychological construction of emotion. Psychological
Review, 110, 145-172.
Schmidt, K.L., & Cohn, J.F. (2001). Human facial expressions as adaptations: Evolutionary
questions in facial expression research. American Journal of Physical Anthropology, S33, 3-24.
Thornhill, R., & Gangestad, S.W. (1999). Facial attractiveness. Trends in Cognitive Sciences, 3, 452-460.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in
face recognition. The Quarterly Journal of Experimental Psychology, 43A, 161-204.
Valentine, T. & Bruce, V. (1986). The effects of distinctiveness in recognising and classifying
faces. Perception, 15, 525-535.
Valentine, T. & Ferrara, A. (1991). Typicality in categorization, recognition and identification:
evidence from face recognition. British Journal of Psychology, 82, 87-102.
Young, A.W., Rowland, D., Calder, A.J., Etcoff, N.L., Seth, A. & Perrett, D.I. (1997). Facial
expression megamix: Tests of dimensional and category accounts of emotion recognition.
Cognition, 63, 271-313.
Zebrowitz, L.A. (1997). Reading faces: Window to the soul? Boulder, CO: Westview Press.
Zebrowitz, L.A., Kikuchi, M. & Fellous, J. (2007). Are effects of emotion expression on trait impressions mediated by babyfaceness? Evidence from connectionist modeling. Personality and Social Psychology Bulletin, 33, 648-662.
[Figure 1 appears here: panels a-e.]
Figure 1: Facial Configurations. a) The original face with a neutral expression. b,c) The angry
group comprises face images in which the distance between the eyes and mouth has been
decreased. In b, this is accomplished by displacing the eyes and brows downward. In c, the
mouth is displaced upward. d,e) The sad group comprises faces in which the distance between
the eyes and mouth is increased. This is done by displacing the eyes and brows upward (d) or by
displacing the mouth downward (e). The examples shown here correspond to a 20-pixel
displacement of the facial features (100%).
Figure 2: Stimulus time-line. An initial visual cue is shown in a randomly selected location to
alert the subject as to where the first image will be displayed. After 600 ms, the first face image
is displayed for a period of 600 ms. A mask (blank screen) is then presented for 500 ms. A
second visual cue is displayed in another randomly selected location for 600 ms. The second face
image is then displayed for 600 ms. Finally, a blank screen with a large question mark (“?”) at
the center is displayed until the subject responds with a key-press.
[Figure 3 appears here. Panels: Nose Mouth Down and Eyes Brows Up (a); Nose Mouth Up and Eyes Brows Down (b); combined Sad and Angry groups (c). Each panel plots % Response for the less, same, and more responses against % Change (-100% to 100%); asterisks mark significant differences.]
Figure 3: Subject Responses. a) Responses for Sad Group: Relative percentages of less-same-more responses at each level of change between the pairs of faces for both downward displacement of the mouth and upward displacement of the eyes and brows. b) Responses for Angry Group: Relative percentages of less-same-more responses at each level of change between the pairs of faces for both upward displacement of the mouth and downward displacement of the eyes and brows. c) Combined responses for Sad Group and Angry Group: Cumulative responses for the sad group (nose/mouth down and eyes/brows up) and the angry group (nose/mouth up and eyes/brows down). Significant differences (p<0.01) among less-same-more responses at each level are marked with an asterisk. Bars denote standard error.
[Figure 4 appears here. Panels: Sad 25% and Angry 25% (a); Sad 50% and Angry 50% (b); Sad 75% and Angry 75% (c). Each panel plots % Response for the less, same, and more responses against the Level of Change; asterisks mark significant differences.]
Figure 4: Combined responses for each level of displacement in the Sad and Angry Groups: a) 25% displacement; b) 50% displacement; c) 75% displacement. These differences are specified on the x-axis of each plot as n%-m%, where n% corresponds to the percentage of displacement in the first image and m% to that of the second. For positive changes (n<m) we see the familiar linear pattern, while for negative changes (n>m) this pattern flattens. This asymmetry is further illustrated by the statistically significant differences (p<.001) marked with asterisks.
[Figure 5 appears here. The plot shows % Response for the more angry, same, and more sad responses, with the Angry Group conditions (100% down to 25%) to the left of the 0% condition and the Sad Group conditions (25% up to 100%) to its right; asterisks mark significant differences.]
Figure 5: Cumulative responses for Experiment 2. Significant differences at p<0.01 are denoted
by an asterisk. On the left-hand side of the 0% condition, we show the responses obtained when
subjects see the angry condition stimuli. On the right-hand side of the 0% condition, we have
subjects’ responses to the sad stimuli.
[Figure 6 appears here: panels a and b.]
Figure 6: American Gothic revisited. The original image of the male subject’s face is shown in
a. The modified image, with the mouth moved vertically upward, is shown in b. © The Art
Institute of Chicago.
[Figure 7 appears here: panels a and b.]
Figure 7: Modified emotional expressions. a) Original sad expression versus same features with
increased eye-mouth distance. b) Original angry expression versus same features with decreased
eye-mouth distance. (Images from FACS, © Ekman & Friesen, 1978).