
Proceedings, FONETIK 2004, Dept. of Linguistics, Stockholm University
Spanish and Swedish interpretations of Spanish and
Swedish emotions – the influence of facial expressions
Åsa Abelin
Department of Linguistics, University of Göteborg, Sweden
Abstract
This study presents an experiment in cross-linguistic multimodal interpretation of emotional expressions. Earlier studies on multimodal communication have shown an interaction between the visual and auditory modalities. Other, cross-cultural, studies indicate that facial expression of emotions is more universal than prosody is. Cross-linguistic interpretation of
emotions could then be more successful multimodally than only vocally. The specific questions asked in the present study are to what degree Spanish and Swedish listeners can interpret Spanish emotional prosody, and whether
simultaneously presented faces, expressing the
same emotions, improve the interpretation.
Audio recordings of Spanish emotional expressions were presented to Spanish and Swedish listeners in two experimental settings. In the first setting the listeners attended only to prosody; in the second they also saw a face expressing different emotions. The results indicate
that intra-linguistic as well as cross-linguistic interpretation of emotional prosody is improved by visual stimuli. Cross-linguistic interpretation of prosody is accomplished more poorly than intra-linguistic interpretation, but seems to be greatly augmented by multimodal stimuli.
Introduction
The aim of this study is to investigate how speakers of Spanish and Swedish interpret vocally and multimodally expressed emotions of Spanish. More specifically, the following questions are asked:
- Can Swedish as well as Spanish speakers accurately interpret Spanish emotional prosody?
- Is there an influence between the auditory and visual channels in cross-linguistic interpretation of emotional expressions?
Several studies have addressed the expression and interpretation of emotional prosody; for a review, see e.g. Scherer (2003), where different research paradigms, methods and results are discussed.
The present investigation concerns the interaction between non-verbal and vocal expression of emotions. One question is whether emotional expressions, non-verbal or prosodic, are universal. Many researchers, beginning with Darwin (1872/1965), have shown that some facial expressions are probably universal. Many cross-linguistic studies of emotional prosody have shown that emotional expressions are interpreted quite well, especially for certain emotions, for example anger, while other emotions, for example joy, are interpreted less well (cf. Abelin and Allwood, 2000). One possible explanation for this is that the expressions of some emotions vary more than those of other emotions; another possibility is that some emotions are expressed more in the gestural dimension. There could be evolutionary reasons for this (Darwin, 1872/1965). These studies also show that speakers are generally better at interpreting the prosody of speakers of their native language.
In the field of multimodal communication there have been some studies of emotions (see e.g. the overview in Scherer, 2003:236), showing that judges are almost as accurate in inferring different emotions from vocal as from facial expression. Massaro (2000), working under the assumption that multiple sources of information are used in perceiving a person's emotion, as in speech perception, made experiments with an animated talking head expressing four emotions in auditory, visual, bimodal consistent and bimodal inconsistent conditions. Overall performance was more accurate with two sources of consistent information than with either source of information alone. In another study, de Gelder and Vroomen (2000) asked participants to identify an emotion, given a photograph and/or a spoken sentence. They found that identification judgments were influenced by both sources of information, even when participants were instructed to base their judgment on just one of the sources.
Matsumoto et al. (2002) review many studies of the cultural influence on the perception of emotion, and conclude that there is universality as well as culture-specificity in the perception of emotion. Facial perception in particular has been studied, and almost all such studies show good interpretation of the six basic emotions. Universality in combination with display rules, i.e. culture-specific rules for how much certain feelings are shown when other people are present, is generally accepted in psychology.
Normal human communication is multimodal, and a vast number of studies of multimodal communication have shown interaction between the senses in perception (e.g. Massaro, 2000). This research has shown that speech perception is greatly influenced by visual perception. There is reason to believe that emotional expressions are also interpreted in a holistic way.
Assuming that there is interaction between the senses, and that facial expression of emotion is more universal than prosody, cross-linguistic interpretation of emotions should be more successful multimodally than only vocally.
The hypotheses are:
- Swedish listeners can interpret Spanish emotional prosody to the same extent as Spanish listeners can interpret Swedish emotional prosody (which is to a lesser extent than native speakers)¹.
- The cross-linguistic interpretation of emotional expressions is improved by multimodal stimuli.
Method
In the present article a method of elicitation is used in which speakers enact emotional speech from the stimuli of drawings of facial expressions, originally used in therapy. The emotional expressions were elicited and recorded. Thereafter the speech material was presented to listeners for interpretation of the emotions expressed. The interpretations can be described as belonging to the attributional stage in Scherer's (2003) Brunswikian lens model of vocal communication of emotion. The listeners first listened to the voices alone. They then listened to the voices while looking at a facial expression, but were told to judge the voice. The results were compared to other studies.
¹ Cf. Abelin & Allwood, 2000.
Elicitation of speech material
Recordings were made of a male speaker of
Spanish expressing eight different emotions.
The method of elicitation was the following: the
speaker was presented with the stimuli of
schematic drawings of faces expressing eight
different emotions.
The speaker was instructed to try to experience emotionally what the face was expressing and then express this emotion with the voice, while uttering the name "Amanda". The expression was recorded with the software Praat. After each recording of an emotion the speaker named the emotion he had just expressed, and this label was used as the correct answer in the following listening experiments. In this way the facial stimuli can be used for speakers of several languages, and cannot be said to be culture-specific.
By evoking vocal expression directly from facial stimuli, it was assumed that greater control could be obtained over the prosodic emotional expressions of speakers of different languages; this would be less likely if the speakers were to express emotions from the stimuli of emotion words. The emotions expressed by the Spanish speaker were, according to himself, the following: 1. sad and tired, 2. angry, 3. sad, 4. sceptical, 5. delighted, 6. afraid, 7. depressed, 8. very happy (cf. Figure 1).
Elicitation of listeners' responses
The listener group consisted of 15 Swedish native speakers. They were first presented with the speech material over a computer's loudspeakers, and named the different emotions they heard, one by one. The speech material was presented once. Later on, the listeners were presented with the speech material at the same time as they saw the faces on a computer screen, and named the different emotions as they heard/saw each expression². The faces were presented with the emotional expression that had been produced for that particular face. The listeners were told that the emotions expressed were partly different from those in the first experiment.
Comparison with earlier experiments
The results of the experiment were compared
with results from a prosodic study (Abelin &
² Utilizing the PsyScope software.
Allwood, 2000) of a Swedish speaker who was interpreted by Spanish and Swedish listeners. In that experiment multimodal stimuli were not used. Table 1 shows the number of subjects in the different experimental conditions to be compared.
Table 1: Number of speakers and listeners in the different experiments.

                      Swedish voice   Spanish voice   Spanish voice + face
Swedish listeners          32              15                  15
Spanish listeners          25              10                  10

[Figure 2: bar chart "Inter- and intracultural interpretations of prosody", percent correct (0–100) for the emotions angry, sad, happy and afraid, in four conditions: Spanish interpretation of Swedish voice, Swedish interpretation of Spanish voice, Spanish interpretation of Spanish voice, Swedish interpretation of Swedish voice.]

Figure 2: The diagram compares cross-linguistic and intra-linguistic interpretations of four emotions expressed by Spanish and Swedish speakers.
Results
The Swedish interpretations of the Spanish speaker's prosodic expressions of the emotions are shown below in Figure 1. There are clear differences in how well the Swedish listeners interpreted the different emotions of the Spanish speaker. The two expressions of sadness were interpreted much more accurately than the other emotions, but only with 53% accuracy. The other emotions were interpreted quite poorly.
[Figure 1: bar chart "Swedish interpretations of Spanish voice", percent correct (0–60) for the emotions sad and tired, angry, sad, sceptical, happy, afraid, depressed and very happy.]

Figure 1: Swedish interpretations of Spanish prosody produced from facial stimuli. To the right are the classifications of the faces made by the speaker.
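The percent-correct figures reported here are obtained by scoring each listener's answer against the label the speaker himself gave to each stimulus. A minimal sketch in Python of this kind of scoring, using hypothetical response data (not the study's actual responses):

```python
# Sketch of per-stimulus percent-correct scoring, as used for Figure 1.
# The labels and responses below are hypothetical illustrations, not
# the actual data of the study.

# The speaker's own label for each stimulus is the "correct" answer.
correct = {1: "sad and tired", 2: "angry", 3: "sad"}

# Listener responses: stimulus id -> the label each listener gave.
responses = {
    1: ["sad and tired", "sad", "sad and tired"],
    2: ["happy", "angry", "sceptical"],
    3: ["sad", "sad", "angry"],
}

def percent_correct(correct, responses):
    """Return the percentage of matching labels per stimulus."""
    scores = {}
    for stim, labels in responses.items():
        hits = sum(1 for label in labels if label == correct[stim])
        scores[stim] = 100.0 * hits / len(labels)
    return scores

print(percent_correct(correct, responses))
```

Comparing such scores across conditions (voice only vs. voice + face) gives the kind of differences plotted in Figure 3.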
Figure 2 compares cross-linguistic interpretations of four basic emotions. It shows that:
- cross-linguistically, sadness is interpreted best, for both languages;
- anger is well interpreted by Spanish speakers listening to Swedish, but poorly interpreted by Swedish speakers listening to Spanish;
- happiness and fear were interpreted quite poorly cross-linguistically by both groups;
- the Spanish group was generally better at interpreting the Swedish speaker than vice versa.
The most successful results are the Swedes
interpreting the Swedes and the least successful
are the Swedes interpreting the Spanish speakers.
The Spanish listeners were in fact better at interpreting Swedish anger and sadness than Spanish anger and sadness. One possible explanation for this is that the Swedish speech stimuli were longer.
The results for the interpretations of prosody alone, in Figure 1, are now compared with the results of Abelin & Allwood (2000), where Swedish and Spanish listeners interpreted Swedish emotional prosody. The comparison is made for four of the emotions: angry, sad, happy and afraid, in Figure 2.

[Figure 3: bar chart "Spanish and Swedish interpretations of Spanish voice and voice + face", percent correct for the eight emotions (1–8), in four conditions: Spanish interpretation of voice, Spanish interpretation of voice+face, Swedish interpretation of voice, Swedish interpretation of voice+face.]

Figure 3: Percent correct interpretations of Spanish voice and voice+face for the eight different emotions, by Spanish and Swedish listeners.
The following six conclusions can then be
added.
- Prosody alone is more difficult to interpret than multimodal stimuli: prosody with a simultaneous facial expression produced much better interpretation for all but one emotion (very happy).
- Some emotions were easier to interpret than others: some were more difficult to interpret in all modalities, while others were easier in all modalities (cf. sad, happy, afraid, depressed).
- The emotion recognized relatively well in vocal expression is sadness.
- Interpretation of emotional expressions was always more successful when the stimuli were multimodal, also for the Spanish listeners.
- The Swedish listeners, who were poor at interpreting the Spanish voice, seem to have been greatly aided by the face; the difference between the vocal and multimodal interpretations was larger for the Swedish than for the Spanish listeners.
- For multimodal stimuli, neither language group was generally better at interpreting the stimuli.
Discussion
The results show that perception of emotional expressions is more successful when the stimuli are multimodal, in this cross-linguistic setting, even though the facial stimuli were very schematic. The cross-linguistic prosodic comparison with the earlier experiment of Abelin & Allwood (2000) shows that Spanish listeners were better at interpreting Swedish than vice versa. The present study also shows that certain emotions (happy and afraid) are more difficult to interpret from prosody, for both language groups.
There are a number of problems involved in the study of cross-cultural interpretation of linguistic phenomena such as the expression of emotions: translation problems due to different categorizations of the emotional spectrum, different display rules for different emotions, listeners' differing knowledge of different display rules, word-finding problems, etc. This study has tried to handle the translation problem by evoking prosodic expressions directly from facial stimuli. Expectations based on different display rules are avoided when listeners do not know the native language of the speaker.
Conclusions
Emotions are interpreted more accurately, intra- and cross-linguistically, when both the visual and the auditory signal are present than when only the auditory signal is present.
Some emotions are more easily interpreted, both
intra- and cross-linguistically, prosodically and
multimodally, e.g. sadness.
In the comparison of the two studies, Swedish listeners could not interpret Spanish emotional prosody to the same extent as Spanish listeners could interpret Swedish emotional prosody. There were also differences between the emotions.
There is a possibility that the facial expression of emotion is more universal than prosodic expression. The prosodic production of emotional expressions per se could be universal, at least for certain emotions, but emotional prosody is never heard in isolation; it is always heard in combination with the speech prosody of each particular language. Another possibility, which will also be studied further, is that certain emotions are more dependent on prosodic information and other emotions more dependent on facial expression.
References
Abelin, Å..; Allwood, J., (2000) Cross linguistic
interpretation of emotional prosody. ISCA
workshop on Speech and Emotion. Newcastle, Northern Ireland, 110–113.
Cornelius, R. R., (2000) Theoretical approaches
to emotion. Proceedings of the ISCA Workshop on Speech and Emotion. Newcastle,
Northern Ireland, 3–8
Darwin, C., (1872/1965) The expression of the
emotions in man and animals. Chicago:
University of Chicago Press.
De Gelder, B. & Vroomen, J., (2000) The perception of emotions by ear and eye. Cognition and Emotion, 14, 289–311.
Massaro, D. W., (2000) Multimodal emotion
perception: Analogous to speech processes.
Proceedings of the ISCA Workshop on
Speech and Emotion, Newcastle, Northern
Ireland, 114–121.
Matsumoto, D., Franklin, B., Choi, J.-W.,
Rogers, D. Tatani, H., (2002) Cultural influences on the Expression and Perception of
Emotion” in W.B. Gudykunst and B.
Moody, Eds. Handbook of International and
Intercultural Communication, Sage Publications.
Scherer, K. (2003) Vocal communication of
emotion: a review of research paradigms.
Speech Communication 40 (1). 227-256.