PAPERS
Emoacoustics: A Study of the Psychoacoustical
and Psychological Dimensions of Emotional
Sound Design
ERKIN ASUTAY,1 DANIEL VÄSTFJÄLL,1,2 ANA TAJADURA-JIMÉNEZ,3 ANDERS GENELL,4
PENNY BERGMAN,1 AND MENDEL KLEINER,1 AES Member

1 Division of Applied Acoustics, Chalmers University of Technology, Gothenburg, Sweden
2 Department of Behavioral Science and Learning, Linköping University, Linköping, Sweden
3 Department of Psychology, Royal Holloway University of London, Egham, Surrey, UK
4 Environment and Traffic Analysis, VTI, Gothenburg, Sweden
Even though traditional psychoacoustics has provided indispensable knowledge about auditory perception, its narrow focus on signal characteristics has neglected listener and contextual characteristics. To demonstrate how the meaning a listener attaches to a sound influences the resulting sensations, we used Fourier-time-transform (FTT) processing to reduce the identifiability of 18 environmental sounds. In a listening experiment, 20 subjects listened to and rated their sensations in response to, first, all the processed stimuli and then all original stimuli, without being aware of the relationship between the two groups. Another 20 subjects rated only the processed stimuli, each primed by its original counterpart. This manipulation was used to see how the resulting sensations differ when the subject can tell what the sound source is. In both tests subjects rated their emotional experience for each stimulus on the orthogonal dimensions of valence and arousal, as well as perceived annoyance and perceived loudness. They were also asked to identify the sound source. Processing caused correct identification to drop substantially, while priming recovered most of the identification. While original stimuli induced a wide range of emotional experience, reactions to processed stimuli were emotionally neutral. The priming manipulation reversed the effects of processing to some extent. Moreover, even though the 5th percentile Zwicker loudness (N5) of most of the stimuli was reduced after processing, neither perceived loudness nor auditory-induced emotion changed accordingly. This indicates the importance of considering factors other than the physical sound characteristics in sound design.
0 INTRODUCTION
Psychoacoustical models provide methods to link the
physical properties of sounds and the resulting sensations
[1]. They are widely used in product sound quality and auditory display design applications, where the designer strives
to convey auditory information to users. Hence, in auditory
display design, the central concept is the listener. In order to
achieve successful auditory displays, designers must possess an understanding of how people hear. Traditional psychoacoustical research has offered invaluable knowledge on
how human audition works. However, it does not resolve
other issues related to the specific psychological processes
that may impact the auditory perception of the listener,
who is in an arbitrary affective state, possesses idiosyncratic knowledge, experience, and expectations. Thus, we
argue that physical characteristics are not able to fully capture psychological processes in human auditory perception
and that focusing solely on form (i.e., signal) characteristics
of sound would be to oversimplify auditory perception [2].
An alternative approach to psychoacoustics may consider emotional responses to sound and their influence on
sound perception. Emotions play a central role in many
perceptual mechanisms [3–6], and in daily life we are constantly exposed to sounds, which often induce emotional reactions in their perceiver [7, 8]. Sensations evoked by, and emotional reactions to, sound may also imply (or affect)
quality judgments. Following the discussion by Blauert and
Jekosch [9], emotions and feelings have positive and negative components; and if a positive emotional experience
is attributed to a specific sound, it may be judged to be of high quality. Therefore, understanding how sounds evoke emotions could improve our knowledge of human audition and contribute to the creation of more successful sound design applications (see emotional design, [10]). Hence, we have
named this alternative approach to psychoacoustics emotional sound design or emoacoustics.
A substantial body of research on auditory-induced emotion has been trying to link physical characteristics of sound
to certain emotional responses, including annoyance (for a
review see [11]). Much of this research has linked physical
characteristics such as loudness [12], spectral bandwidth
[13, 14], temporal variability [15, 16], or frequency content [14, 16] of simple noise-tone complexes to auditory-induced annoyance. However, unlike traditional psychoacoustics, the ecological approach to psychoacoustics [17]
suggests that instead of hearing different physical properties
of sound, such as intensities, frequencies, and durations, we
hear “auditory” events, which occur in a specific context
and are heard by a specific listener. Therefore, we argue that
apart from physical properties, the listener and the context
are also important factors in evoking emotional responses
to auditory information [18]. In other words, one particular sound could evoke a certain emotional response for
one listener-context combination, while it could induce a
completely different set of emotional experiences and behaviors for another. This is partly due to the "meaning" the listener attributes to sounds. The meaning ascribed to a sound stimulus is related to one's own personal experience: evoked sensations and emotional responses are oftentimes learned or conditioned responses, which are linked to the associations made when the stimulus is encountered [18].
The principal aim of the present study is to show that auditory stimuli are capable of inducing emotional reactions in
the listener and further, that these responses do not depend
solely on the physical properties of the sound. We argue
that one needs to take into account the meaning ascribed
to sound by the listener in order to understand how sounds
induce emotions. In everyday listening people tend to first
identify the sound source [19] and use this information to
categorize the sound [20]. Therefore, in the present study
we focus on the source identification of everyday sounds in
order to investigate the importance of meaning in auditory-induced emotion. We attempted to reduce the identifiability
of a sound source while keeping the physical form of the
sound as similar to the original as possible by the use of
an FTT (Fourier-time-transform) processing algorithm. A
similar algorithm was proposed by Fastl [21] in order to investigate the influence of the meaning of the sound on noise
annoyance arguing that existing psychoacoustic models do
not account for non-sensory influences. The proposed algorithm was based on executing an FTT, performing spectral broadening, and executing an inverse FTT back into the time domain.
Later studies [22–24] investigated the effect of this procedure on loudness and annoyance judgments and on a
number of semantic differential scales. It was found that the proposed algorithm caused a significant reduction in the identifiability of everyday sounds. Further, these
studies found that the procedure did not cause a substantial
change in perceived loudness and that annoyance ratings for
the processed stimuli were, on average, higher than those
for the original stimuli. Even though there are similarities
between the aforementioned studies and the present study,
our aim is to experimentally manipulate emotional experience induced by environmental sounds. Since emotional
responses are often related to the associations made when
the stimulus is encountered, we attempted to break these associations by reducing identifiability of the sound source.
We expected a change in auditory-induced emotions as a
result of reduction in identifiability. However, this change
was not expected as an overall increase in annoyance, but
it was hypothesized to depend on the emotional content
of the original stimuli. As a result, the stronger the emotional responses led by the associations made
when the stimulus is encountered, the larger the change
when they are no longer accessible. Therefore, changes in
auditory-induced emotion due to processing are expected to
be larger for emotionally loaded compared to emotionally
neutral stimuli.
1 EXPERIMENT
1.1 Methods
1.1.1 Participants
Forty subjects (15 females) took part in the test in two
different settings (20 participants in each setting). Their
ages varied between 22 and 44 years (mean 25.1 years). They were compensated for their participation after the test. The experiment was carried out in accordance with the ethical standards of the 1964 Declaration of Helsinki.
1.1.2 Stimuli
Eighteen auditory stimuli were selected from the International Affective Digitized Sounds (IADS) database [25],
which consists of recordings of brief (approximately 6 s)
everyday events. The criteria for selection were to make
sure that chosen stimuli would cover a wide range of emotional responses and to avoid stimuli that contained music,
erotica, and those that were judged to be of poor quality.
First, all stimuli in the IADS database were divided into
three groups according to their normative mean valence
ratings (on a 9-point scale from 1 to 9): pleasant (greater
than 6), neutral (between 4 and 6), and unpleasant (smaller
than 4). Then, six sounds were selected from each group
(Table 1). All stimuli were sampled at 44.1 kHz and were six seconds long.
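The selection rule above amounts to a simple thresholding of the normative mean valence ratings. A minimal Python sketch (the function name and structure are ours, for illustration only; example values are taken from Table 1):

```python
# Sketch of the stimulus-selection rule: IADS sounds are grouped by
# their normative mean valence rating on the 9-point scale.
def valence_category(valence):
    """Return the pleasantness group for a normative mean valence rating."""
    if valence > 6:
        return "pleasant"
    if valence < 4:
        return "unpleasant"
    return "neutral"

# Example ratings from Table 1
print(valence_category(1.74))  # screaming woman -> unpleasant
print(valence_category(5.19))  # helicopter -> neutral
print(valence_category(7.69))  # applause -> pleasant
```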
1.1.3 Manipulation
Selected stimuli were manipulated using an FTT processing algorithm, which is based on executing an FTT, averaging spectrally, and carrying out an inverse FTT [26]. The original sound is divided into time windows that are then transformed into the frequency domain. The amplitudes are averaged within each frequency band, and the manipulated signal is transformed back into the time domain without changing its phase. This algorithm makes it possible to distort spectral and temporal information, thus reducing the possibility to identify
Table 1. Auditory stimuli used in the experiment and their normative mean valence and arousal ratings in the IADS database.

Category      Sound                      Valence   Arousal
Unpleasant    Screaming woman              1.74      8.07
              Alarm                        2.38      7.96
              Bees                         2.58      7.01
              Growling dog                 2.73      7.77
              Pigs                         3.56      6.38
              Jackhammer                   3.66      5.60
Neutral       Cuckoo clock                 4.23      6.19
              Cat                          4.31      5.47
              Ticking clock                4.38      4.56
              Helicopter                   5.19      5.50
              Frogs by a lake              5.49      4.51
              Toilet flush                 5.57      3.37
Pleasant      Cows                         6.09      4.27
              Carousel                     6.21      5.43
              Rollercoaster                6.90      7.36
              Beverage bottle opening      7.13      4.43
              Laughter                     7.32      6.37
              Applause                     7.69      5.57
the sound source, while keeping physical characteristics
such as general frequency content and temporal evolution
as close to the original as possible. By a careful selection of
length of the time window and width of the frequency band
one can control changes in temporal and spectral domains
and keep physical changes at a minimum level.
In the present study, spectral averaging was done in 1/3-octave bands, and 75%-overlapping Hanning windows of different lengths (4096, 8192, or 16384 samples) were used.
The authors decided the processing level (i.e., length of the
time window) informally for each sound. The goal was
to determine the lowest level of manipulation needed for
reducing identifiability of sound sources. After the initial
selection, a group of other individuals, unaware of the purpose of the study, were asked to identify the sound source
of the processed versions of the stimuli. The final level
of processing for each sound was determined after these
evaluations.
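The processing described above can be sketched as a short-time Fourier analysis with per-band magnitude averaging and phase left untouched. The following Python sketch is our illustrative reconstruction under the stated parameters (1/3-octave bands, 75%-overlapping Hanning windows); it is not the authors' implementation, and all names are ours:

```python
import numpy as np

def ftt_smear(signal, fs=44100, win_len=4096, overlap=0.75, bands_per_octave=3):
    """Spectral-averaging sketch of the paper's FTT processing: analyze
    the sound in overlapping Hanning windows, average the magnitude
    spectrum within 1/3-octave bands while keeping each bin's phase,
    and overlap-add back to the time domain."""
    hop = int(win_len * (1 - overlap))
    window = np.hanning(win_len)
    freqs = np.fft.rfftfreq(win_len, 1.0 / fs)
    # 1/3-octave band edges from ~20 Hz to past the Nyquist frequency
    edges = 20.0 * 2.0 ** (np.arange(34) / bands_per_octave)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for start in range(0, len(signal) - win_len + 1, hop):
        spec = np.fft.rfft(signal[start:start + win_len] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        for lo, hi in zip(edges[:-1], edges[1:]):
            band = (freqs >= lo) & (freqs < hi)
            if band.any():
                mag[band] = mag[band].mean()   # average magnitudes per band
        frame = np.fft.irfft(mag * np.exp(1j * phase), n=win_len)
        out[start:start + win_len] += frame * window
        norm[start:start + win_len] += window ** 2
    return out / np.maximum(norm, 1e-12)
```

Longer windows (8192 or 16384 samples) smear temporal detail more strongly, which corresponds to the higher per-sound processing levels chosen informally by the authors.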
1.1.4 Measures
In the test, participants were asked to rate how they felt
during each stimulus using the 9-point Self Assessment
Manikin (SAM) scales of valence and arousal [27]. Using
these scales subjects rated, respectively, the pleasantness
and the arousal level of their current feelings.
Participants were also asked to rate how annoying each
stimulus was, using a 9-point unipolar scale from 1 (not at
all annoying) to 9 (very much annoying). Moreover, on a similar scale participants indicated the perceived loudness level
for each sound (1–not loud at all, 9–very loud). Finally,
they were asked to identify the sound source with their own
words.
J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February
1.1.5 Procedure
The experiment was carried out in a classroom. Sounds
were reproduced through headphones (Sennheiser HD 414), and participants rated the sounds on scales using paper
and pencil.
In the first setting, participants rated all the stimuli that
were processed and then rated all the original stimuli presented in a different order (for the sake of simplicity from
now on these two stimulus sets will be referred to as processed and original sounds, respectively). Participants were
not aware of the existence of two sets of stimuli. They were
told that they would listen to and rate 36 auditory stimuli
that were recorded in everyday situations.
In the second setting, participants rated only processed
sounds, which were primed by their original counterparts.
Priming was done by presenting each original sound before
its processed version. Participants were asked to rate the
sounds in the same way as in the first setting (from now
on this stimulus set will be referred to as primed sounds).
None of the participants took part in both settings. We
expected that after processing, stimuli would be unidentifiable due to the changes inflicted on spectral and temporal
patterns. However, distorting identification does not necessarily eliminate the meaning in the sound. Even with
incorrect identification new associations can be made and
new meanings might emerge. Moreover, despite the fact
that we aimed for the lowest level of processing, original
and processed sounds are not physically identical. Hence,
with the priming manipulation we aimed to restore the original associations that processing had made inaccessible, and thereby to change the emotional experience induced by processed sounds. Therefore, priming the processed stimuli was expected to reverse the effects of processing on auditory-induced emotion to some extent.
Additionally, a third set of participants (22 subjects, mean
age: 25.5 ±1.2) were asked to rate how they felt during each
processed sound, using the aforementioned SAM scales
(valence and arousal). Before hearing each sound participants were presented with a text describing the sound
they would hear. This last group (verbally primed) was
introduced as an additional priming measure, since another
priming technique was needed to control for the fact that
participants in the auditory priming group might think of
original sounds during their judgments, even though they
were explicitly instructed to rate processed stimuli.
1.2 Results
1.2.1 Identifiability
Results of the identification task showed that the processing caused identifiability of the source to decrease dramatically (t = 19.9, df = 17, p < .0001), from a mean of
91% (±3.4%, SE of the mean indicated) correct identification for original sounds to a mean of 11% (±3.4%) for
processed stimuli. Moreover, priming the processed stimuli
caused identifiability to increase significantly compared to
the processed condition (t = 18.8, df = 17, p < .0001), to a
mean of 78% (±2.7%) correct identification. Even though
23
ASUTAY ET AL.
PAPERS
the effects of processing and priming on identifiability were
not uniform across all sounds because of differences in temporal and spectral properties, changes for each individual
sound were statistically significant (p < .01).
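The comparisons above are paired t-tests across the 18 stimuli (hence df = 17). A minimal sketch with illustrative identification proportions (synthetic data, not the study's raw results):

```python
import numpy as np

# Illustrative per-sound proportions of correct identification
# (not the study's raw data): original vs. processed condition.
rng = np.random.default_rng(0)
original = np.clip(rng.normal(0.91, 0.04, 18), 0.0, 1.0)
processed = np.clip(rng.normal(0.11, 0.04, 18), 0.0, 1.0)

def paired_t(a, b):
    """Paired t statistic over matched observations (df = len(a) - 1)."""
    d = np.asarray(a) - np.asarray(b)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

t = paired_t(original, processed)
print(f"t({len(original) - 1}) = {t:.1f}")
```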
1.2.2 Affective Ratings
The aim of the study was to demonstrate that auditory-induced emotions depend not only on the physical characteristics of the sound, but also on the meaning attributed to the sound; to fulfill this aim we experimentally manipulated auditory-induced emotion. The reduction in identifiability was expected to cause larger changes in auditory-induced emotion for stimuli with high emotional content (pleasant or unpleasant) than for neutral stimuli.
For each affective dimension (i.e., valence and arousal),
a three-factor repeated-measures ANOVA, employing processing (original vs. processed), pleasantness of the original stimuli (pleasant, neutral, or unpleasant), and sound (6
stimuli in each condition) as within-subject factors, was run.
Results showed that, on average, reactions along the valence dimension were significantly higher for original than
for processed stimuli (F(1,12) = 32.422, p < .001). Also,
a significant interaction was found between processing and
pleasantness (F(2,24) = 78.046, p < .0001), showing that
the effect of processing on valence differed significantly
according to the pleasantness of the original stimuli. When
the effect of processing was investigated for separate groups
according to pleasantness of original stimuli, it was found
that valence ratings of unpleasant sounds increased as a result of processing (F(1,18) = 20.546, p < .001), whereas
for pleasant and neutral sounds valence ratings decreased
due to processing (F(1,15) = 7.410, p < .03; and F(1,14) =
196.909, p < .0001 for neutral and pleasant stimuli, respectively). For the arousal dimension, the interaction of
processing by pleasantness was also statistically significant
(F(2,20) = 30.007, p < .0001). Further analysis showed that
while arousal ratings of unpleasant sounds significantly decreased (F(1,18) = 35.532, p < .0001) due to processing,
they increased for neutral and pleasant stimuli (F(1,14) =
6.567, p < .03; and F(1,13) = 6.844, p < .03 for neutral
and pleasant stimuli, respectively).
An additional repeated-measures ANOVA, which employed pleasantness and sound as within-subject factors and a
between-subjects factor of priming (processed vs. primed),
was run. A significant interaction of priming and pleasantness factors was found for both valence (F(2,54) = 7.809,
p < .01) and arousal (F(2,48) = 3.81, p < .05) dimensions,
indicating that the emotional experience differed depending on the pleasantness of the original stimuli. Univariate
analyses showed that valence ratings for primed stimuli that were originally pleasant were higher compared with their processed counterparts (F(1,30) = 9.91, p < .01), whereas they were lower for originally unpleasant stimuli (F(1,35) = 7.05, p < .05). Moreover, arousal ratings for originally unpleasant primed stimuli were higher compared with processed stimuli (F(1,35) = 7.99, p < .01). Mean valence
and arousal judgments for original, processed, and primed
stimuli can be seen in Fig. 1.
Fig. 1. Mean (±SE) valence and arousal judgments for original,
processed, and primed stimuli.
Fig. 2. Effect of processing and priming on Perceived Annoyance.
Stimuli are sorted according to the valence ratings for the original
stimuli. Inset: Processing by pleasantness interaction.
Finally, possible differences between the two priming methods were investigated using a repeated-measures ANOVA with pleasantness and sound as within-subject factors and priming method (verbal vs.
auditory) as between-subjects factor. No significant main
effect of priming method could be found (F(1,35) = .06, p =
.802; and F(1,34) = 1.08, p = .307 for valence and arousal
dimensions, respectively). Further, the priming method factor did not significantly interact with any other factor. This
finding suggests that there was no substantial difference
between the priming methods.
1.2.3 Annoyance
A three-factor ANOVA was run employing processing
(original vs. processed), pleasantness of the original stimuli (unpleasant, neutral or pleasant) and sound (6 sounds in
each condition) as within-subject factors to investigate the
impact of processing on auditory-induced annoyance (for
mean levels see Fig. 2). Apart from the significant main
effect of pleasantness (F(2,36) = 35.736, p < .0001), the
interaction processing by pleasantness (F(2,36) = 46.855,
p < .0001) was found to be significant. This finding suggests that even though processing did not have a significant main effect, its effect on auditory-induced annoyance differed significantly depending on the
pleasantness of the original stimuli. When the effects were
investigated among the separate groups according to the
Fig. 3. Effect of processing and priming on Perceived Loudness.
Stimuli are sorted according to the valence ratings for the original
stimuli. Inset: Processing by pleasantness interaction.
pleasantness of original stimuli, significant effects were
found for unpleasant and pleasant stimuli in opposite directions (F(1,18) = 31.139, p < .0001; and F(1,19) = 37.387,
p < .0001 for unpleasant and pleasant stimuli, respectively).
In other words, due to FTT processing, auditory-induced
annoyance decreased for unpleasant stimuli, while it increased for pleasant stimuli. However, for neutral stimuli
there was no significant effect of processing.
In order to investigate the differences between processed
and primed stimuli, a separate ANOVA that compared the
ratings of participants in the two different settings was carried out. It employed priming (primed vs. processed) as
a between-subjects factor while keeping pleasantness and
sound as within-subject factors. Although no significant
main effect of priming was found, the priming by pleasantness interaction was statistically significant (F(2,76) =
7.725, p < .003), indicating that priming had a different effect
on auditory-induced annoyance depending on the pleasantness of the original stimuli. Further univariate tests showed
that, after priming, auditory-induced annoyance increased for initially unpleasant stimuli, while it decreased for the rest
(see inset in Fig. 2).
1.2.4 Loudness
Since previous research showed that loudness is one of
the most important factors that influence auditory-induced
annoyance, change in perceived loudness as a result of processing was also investigated in the present study (Fig. 3). In
a three-factor ANOVA, which employed processing (original vs. processed), pleasantness (unpleasant, neutral or
pleasant), and sound, there was a significant main effect
of processing indicating that perceived loudness for original sounds was higher on average than it was for processed
sounds (F(1,19) = 6.470, p < .05). Furthermore, there were
significant interactions of the factors sound by processing
(F(5,95) = 4.058, p < .05) and pleasantness by processing (F(2,38) = 28.293, p < .0001), showing that the effect of processing was not uniform on all stimuli. When
the differences in loudness were investigated within separate groups according to the pleasantness of the original
stimuli, a significant effect of processing was found for unpleasant sounds (F(1,19) = 39.107, p < .0001), indicating
that the perceived loudness for original stimuli was higher
compared with processed stimuli. However, there were no significant effects of processing for either pleasant or neutral stimuli, which implies that processing caused perceived loudness to decrease for unpleasant stimuli only.
Post-hoc analysis, in which changes in the 5th percentile Zwicker loudness (N5) were studied, showed that N5
decreased for almost every stimulus due to processing.
Therefore, in order to further investigate the dependence of perceived loudness on the psychoacoustic metric (N5), an
additional post-hoc regression analysis was done. The mean
levels of perceived loudness of original and processed stimuli were taken as dependent variables and mean valence,
arousal, annoyance ratings, and N5 were taken as independent variables. Primed stimuli were not included in this
analysis, since participants' loudness judgments might have been affected by the fact that they heard original stimuli before rating processed stimuli. The analyses showed
that among the above-mentioned independent variables the
best linear model to explain the variance in perceived loudness (Adj. R2 = .77, p < .0001) included the variables of
auditory-induced arousal (standardized β = 0.69) and N5
(standardized β = 0.39).
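The regression above can be sketched as ordinary least squares on z-scored variables, which yields standardized betas directly. The data below are synthetic, generated merely to illustrate the analysis with arousal outweighing N5 (all names and numbers are ours, not the study's):

```python
import numpy as np

# Synthetic per-stimulus means: predict perceived loudness from
# auditory-induced arousal and the 5th percentile Zwicker loudness (N5).
rng = np.random.default_rng(1)
n = 36  # 18 original + 18 processed stimuli
arousal = rng.normal(5.0, 1.5, n)          # mean arousal rating per stimulus
n5 = rng.normal(20.0, 5.0, n)              # N5 in sone (illustrative values)
loudness = (0.7 * (arousal - 5.0) / 1.5
            + 0.4 * (n5 - 20.0) / 5.0
            + rng.normal(0.0, 0.3, n))     # synthetic perceived loudness

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

# Standardized betas: regress z-scored loudness on z-scored predictors
X = np.column_stack([zscore(arousal), zscore(n5)])
beta, *_ = np.linalg.lstsq(X, zscore(loudness), rcond=None)
print("standardized betas (arousal, N5):", beta.round(2))
```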
Since previous research has claimed a strong relationship between auditory-induced annoyance and loudness
for meaningless sounds [28], we studied the dimensional
correlation between the means of auditory-induced annoyance and perceived loudness for each stimulus. Auditory-induced annoyance and perceived loudness were positively
correlated for all the three stimulus-groups (original, processed, and primed stimuli), even though the correlation
was weaker for processed stimuli. Pearson Correlation
coefficients were .77, .54, and .74 (N = 18, p < .05
for all three) for original, processed, and primed stimuli,
respectively.
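The reported coefficients are plain Pearson product-moment correlations over per-stimulus means; a self-contained sketch with illustrative (not the study's) data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two rating vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Illustrative per-stimulus means: annoyance tends to rise with
# perceived loudness, as found for the original stimuli.
annoyance = [2.1, 3.0, 3.4, 4.2, 5.1, 5.8, 6.5, 7.2]
loud = [2.5, 2.9, 3.8, 4.0, 5.5, 5.4, 6.9, 7.5]
print(f"r = {pearson_r(annoyance, loud):.2f}")
```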
2 DISCUSSION AND CONCLUSIONS
The present study set out to show that physical characteristics of acoustic stimuli are not able to fully capture
auditory-induced emotion. Auditory emotions are often idiosyncratic and depend on the meaning an individual in
a certain circumstance ascribes to sounds [18]. To investigate the meaning attached to sounds, we reduced
the identifiability of a number of environmental sounds in
order to distort the meaning attribution process. The results from the identification task showed that the proposed
FTT processing worked as intended: correct identification
was reduced significantly for every stimulus. In order to
experimentally vary the context, and force specific association and meaning, we introduced a priming manipulation.
Correct identification was significantly increased for every
processed stimulus in the priming condition.
Although the change in correct identification due to processing was not the same for every stimulus, auditory-induced emotion varied systematically depending on
the emotional content of the original stimuli. The influence of the implemented processing algorithm on auditory-induced valence and arousal was in the opposite direction for pleasant and unpleasant stimuli. Also, auditory-induced annoyance was affected only when the original stimulus was emotionally charged, i.e., auditory-induced annoyance increased for pleasant stimuli, whereas it decreased for unpleasant stimuli. These changes in emotional experience due to processing support the hypothesis that sounds with high emotional content become more
neutral when the associations that lead to these strong emotional responses are inaccessible. However, we maintain
that it is likely that even with the processed sounds, subjects
still perceived auditory events rather than physical properties, which may well be plausible here since subjects were
instructed that both original and processed stimuli were recorded in everyday situations depicting everyday events.
Responses to the identification task support this argument,
since subjects identified the source of processed sounds or
had associations even if they were incorrect. Nevertheless,
these associations were emotionally neutral.
Furthermore, it was also hypothesized that once these
associations are formed again by the use of priming, the
effects of processing on auditory-induced emotion would be
reversed, which is supported by the results. In other words,
priming manipulation caused processed stimuli to trigger
emotionally loaded associations consistent with the initial
exposure. As a result, the emotional experience changed
significantly for those sounds that were initially emotional,
i.e., pleasant or unpleasant.
Earlier studies [22–24], which employed a similar
paradigm, suggested that due to processing (1) identifiability of the sounds was greatly reduced, and (2) annoyance
judgments were on average higher compared to original
stimuli, whereas (3) loudness judgments were not affected
by processing. In the present study, we found a reduction
in correct identification due to processing. However, our
results on auditory-induced annoyance were not totally in
line with the previous studies. The reason for this difference might be that in previous research, the stimulus set
was comprised mainly of non-stationary everyday noises
and product sounds most of which had emotionally neutral meaning, while in the present study sounds with a
wide range of emotional content are used, including human and animal vocalizations. Our stimulus set contained
highly unpleasant sounds (Table 1), whose processed versions induced less annoyance compared to their original
counterparts. Moreover, our findings on loudness judgments were not in line with the previous research. However, direct comparison of results on loudness judgments
between studies might not be relevant, since the implemented
processing algorithm was not identical to the one that was
used in previous studies. In particular, our processing algorithm caused a decrease in Zwicker loudness (N5), while
processing in previous studies kept loudness-time function
intact.
Emotional responses and evoked sensations are oftentimes learned responses, which are linked to the associations
made when the stimulus is encountered [18]. In the present
research, we attempted to interrupt these associations by
reducing identifiability of sound sources. Looking at participants’ valence, arousal, and annoyance ratings, we claim
that emotional experience induced by stimuli became neutral when the original associations were made inaccessible.
Thus, one can say that processed stimuli were emotionally
neutral. By the use of the priming manipulation we tried to force
specific, emotionally loaded associations on (processed)
stimuli that were emotionally neutral. As a result, priming
the processed stimuli helped activating emotionally loaded
associations, and thus, even though the sounds were physically identical, the primed stimuli induced a wider range of emotional experience (both pleasant and unpleasant) compared
with processed stimuli. This finding supports our argument
that physical or psychoacoustical characteristics of auditory stimuli cannot fully account for evoked sensations and
emotional reactions.
Like much previous research (e.g., [11, 12]), which linked the annoying quality of a sound to its loudness, the
present study also found significant correlations between
annoyance and loudness judgments. Furthermore, the implemented algorithm caused the 5th percentile loudness
(N5) to decrease for almost every stimulus. However, neither emotional experience nor loudness judgments followed
this decrease. Looking more carefully at the perceived loudness results, we found that it decreased only for originally
unpleasant stimuli; and that auditory-induced arousal and
N5 could account for 77% of the variance in perceived loudness in a regression analysis, in which the contribution of auditory-induced arousal was substantially larger than that of the calculated loudness. Hence, these findings suggest that
changes in psychoacoustic metrics might not fully capture
the changes in resulting sensations in an everyday listening
situation. Loudness judgments might have been affected by
the emotional experience. Unpleasant stimuli, which are
more salient compared to neutral or positive stimuli [29],
might be perceived as louder due to the negative affect they induce. However, this speculation has to be investigated
further before drawing any firm conclusions.
In conclusion, the present study demonstrates that sound
induces emotional reactions in its listener and that focusing only on the physical properties of auditory stimuli is
not sufficient to understand auditory-induced emotion. One
also needs to consider the associations made by the listener.
We therefore argue that auditory-induced emotion is an
important component of auditory perception to consider in
product sound quality and in sound design applications in
general, wherever some form of auditory information is conveyed to the listener. Hence, to design effective auditory displays, sound designers need to be
aware of the differing capacities of the physical, psychoacoustical, and psychological dimensions of auditory displays to cause an emotional reaction in the listener.
ACKNOWLEDGMENTS
A preprint version of this paper was presented at
the 3rd International Workshop on Perceptual Quality of
Systems (PQS), Bautzen, Germany, September 2010.
J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February
3 REFERENCES
[1] E. Zwicker and H. Fastl, Psychoacoustics: Facts and
Models (Springer-Verlag, Berlin, 1990).
[2] U. Jekosch, “Meaning in the Context of Sound
Quality Assessment,” Acta Acustica, 85, pp. 681–684
(1999).
[3] A. Damasio, Descartes’ Error (Putnam, NY, 1994).
[4] R. Zeelenberg, E.-J. Wagenmakers, and M. Rotteveel,
“The Impact of Emotion on Perception: Bias or Enhanced
Processing?” Psychological Science, 17(4), pp. 287–291
(2006).
[5] E. A. Phelps, S. Ling and M. Carrasco, “Emotion Facilitates Perception and Potentiates the Perceptual Benefits
of Attention,” Psychological Science, 17(4), pp. 292–299
(2006).
[6] T. Gan, N. Wang, Z. Zhang, H. Li, and Y. Lou,
“Emotional Influences on Time Perception: Evidence from
Event-Related Potentials,” NeuroReport, 20, pp. 839–843
(2009).
[7] M. M. Bradley and P. J. Lang, “Affective Reactions to
Acoustic Stimuli,” Psychophysiology, 37(2), pp. 204–215
(2000).
[8] A. Tajadura-Jiménez and D. Västfjäll, “Auditory-Induced Emotion: A Neglected Channel for Communication in Human-Computer Interaction,” in C. Peter and R.
Beale [Eds.], Affect and Emotion in Human-Computer Interaction: From Theory to Applications, pp. 63–74 (Springer-Verlag, Berlin/Heidelberg, 2008).
[9] J. Blauert and U. Jekosch, “A Layer Model of Sound
Quality,” J. Audio Eng. Soc., this issue.
[10] D. Norman, Emotional Design: Why We Love (or
Hate) Everyday Things (Basic Books, NY, 2004).
[11] D. Västfjäll, “Affect as a Component of Perceived
Sound and Vibration Quality in Aircraft,” Ph.D. thesis
(Chalmers TH, 2003).
[12] B. Berglund, U. Berglund, and T. Lindvall, “Scaling
Loudness, Noisiness and Annoyance of Aircraft Noise,”
Journal of the Acoustical Society of America, 57, pp. 930–934
(1975).
[13] U. Ländström, E. Åkerlund, A. Kjellberg, and M.
Tesarz, “Exposure Levels, Tonal Components, and Noise
Annoyance in Working Environments,” Environment International, 21(3), pp. 265–275 (1995).
[14] E. A. Björk, “Laboratory Annoyance and Skin Conductance Responses to Some Natural Sounds,” Journal of
Sound and Vibration, 109(2), pp. 339–345 (1986).
[15] K. Hiramatsu, K. Takagi, and T. Yamamoto, “Experimental Investigation on the Effect of Some Temporal
Factors of Nonsteady Noise on Annoyance,” Journal of the
Acoustical Society of America, 74(6), pp. 1782–1793 (1983).
[16] E. A. Björk, “Startle, Annoyance and Psychophysiological Responses to Repeated Sound Bursts,” Acta Acustica, 85, pp. 575–578 (1999).
[17] J. G. Neuhoff, Ecological Psychoacoustics (Elsevier Academic Press, San Diego, CA, 2004).
[18] P. Juslin and D. Västfjäll, “Emotional Responses
to Music: The Need to Consider Underlying Mechanisms,” Behavioral and Brain Sciences, 31(5), pp. 559–621
(2008).
[19] W. W. Gaver, “What Do We Hear in the World?
An Ecological Approach to Auditory Event Perception,”
Ecological Psychology, 5(1), pp. 1–29 (1993).
[20] P. Bergman, A. Sköld, D. Västfjäll, and N. Fransson,
“Perceptual and Emotional Categorization of Sound,” Journal of the Acoustical Society of America, 126(6), pp. 3156–3167
(2009).
[21] H. Fastl, “Neutralizing the Meaning of Sound for
Sound Quality Evaluations,” Proceedings of the 17th ICA
2001, Rome, Italy, edited by Int. Commission of Acoust.,
vol. IV, CD-ROM (2001).
[22] W. Ellermeier, A. Zeitler, and H. Fastl, “Impact
of Source Identifiability on Perceived Loudness,” Proceedings of the 18th ICA 2004, Kyoto, Japan, pp. 1491–1494
(2004).
[23] W. Ellermeier, A. Zeitler, and H. Fastl, “Predicting Annoyance Judgments from Psychoacoustic Metrics:
Identifiable versus Neutralized Sounds,” Proceedings of
the 33rd Internoise 2004, Prague, Czech Republic, Book of
Abstracts, No. 267 (2004).
[24] A. Zeitler, W. Ellermeier, and H. Fastl, “Significance of Meaning in Sound Quality Evaluation,” Proceedings of SFA/DAGA 2004, Strasbourg, France
(2004).
[25] M. M. Bradley and P. J. Lang, “International Affective Digitized Sounds (IADS): Stimuli, Instruction Manual
and Affective Ratings” (Tech. Rep. No. B-2), The Center
for Research in Psychophysiology, University of Florida,
Gainesville, FL (1999).
[26] A. Genell, “Perception of Sound and Vibration in
Heavy Trucks,” Ph.D. thesis (Chalmers TH, 2008).
[27] P. J. Lang, “Behavioral Treatment and Bio-Behavioral Assessment: Computer Applications,” in J. B.
Sidowski, J. H. Johnson, and T. A. Williams [Eds.], Technology in Mental Health Care Delivery Systems, pp. 119–137
(Ablex Publishing, Norwood, NJ, 1980).
[28] D. Västfjäll, “Affective Reactions to Meaningless
Sounds,” Cognition and Emotion, in press.
[29] A. Tajadura-Jiménez, A. Väljamäe, E. Asutay,
and D. Västfjäll, “Embodied Auditory Perception: The
Emotional Impact of Approaching and Receding Sound
Sources,” Emotion, 10(2), pp. 216–229 (2010).
THE AUTHORS
Erkin Asutay is a Ph.D. student at the Division of Applied Acoustics, Chalmers University of Technology. His
research interests are how sound induces affective reactions in the listener and how emotions affect auditory
perception. He also carries out research on the objective and
subjective characterization of the mechanical key action of pipe organs, where he studies the importance of
the haptic feedback the organist receives while playing
for the perception of the instrument.
Daniel Västfjäll has dual Ph.D.s in psychology and
acoustics from Göteborg University and Chalmers University of Technology. He is currently full professor of
cognitive psychology at Linköping University, researcher
at the Division of Applied Acoustics, Chalmers University
of Technology, and research scientist at Decision Research,
Eugene, OR. His interests center on the role of emotion
in auditory perception, emotional reactions to music, and
emotion-cognition interactions. He has published work on
emoacoustics in various leading scientific journals and has
led several research projects on emotion and sound.
Ana Tajadura-Jiménez studied telecommunication engineering at Universidad Politécnica de Madrid, Spain. She
received a Master’s degree in digital communication systems and technology in 2002 and a Ph.D. degree in psychoacoustics in 2008, both from Chalmers University of
Technology, Gothenburg, Sweden. Since 2009, she has been a
post-doctoral researcher at Royal Holloway, University of
London, UK. She has been a visiting researcher at Universitat de Barcelona, Spain (2006) and at NTT Communication
Science Laboratories, Japan (2007, 2011). Dr. Tajadura-Jiménez has coauthored a number of papers on auditory
and multisensory perception, emotional processing, and
self-perception in everyday contexts and new media technologies.
Anders Genell, after obtaining his MSc in sound and vibration at Chalmers University of Technology, continued as
a doctoral student within the CRAG-MSA research group,
focusing on emotional reactions to and quality aspects of
interior vehicle sound and vibrations. In 2008, he presented
his thesis entitled “Perception of Sound and Vibration in
Heavy Trucks.” Since then he has been employed at the
Swedish National Road and Transport Research Institute
(VTI), where he, among other things, has developed a real-time sound generation model for advanced moving-base
driving simulators.
Penny Bergman studied at Gothenburg University. She
received a Master’s degree in management in 2005 and a
B.Sc. in psychology in 2006. She is currently a doctoral
student at the Division of Applied Acoustics at Chalmers
University of Technology. Her research interests are mainly
the role of affect in interpretation and perception of sounds,
multimodal integration, and the role of sound in restorative
environments.
Dr. Mendel Kleiner received his M.Sc.E.E. degree from
Chalmers University of Technology, Göteborg, Sweden, in
1969 and obtained a Ph.D. in building acoustics in 1978.
He held a position as lecturer at Chalmers, teaching and
doing research on active room acoustics and sound field
simulation. In 1989, Dr. Kleiner was awarded the honorary
title of ‘D. He became a fellow of the Acoustical Society of
America in 2000 and is a member of its technical committee on architectural acoustics. From 2003 to 2005, Prof.
Kleiner was professor in the School of Architecture and director of the program in architectural acoustics at Rensselaer Polytechnic Institute, Troy, NY, USA.
He is the author or coauthor of more than 130 scientific
papers, holds two patents, and has written two books, Worship Space Acoustics and Acoustics and Audio Technology,
both published by J. Ross, Ft. Lauderdale, FL. Current research interests are auditory presence, virtual acoustics and
human behavior, sensory interaction, room acoustics and
auralization (particularly spaces for music and worship),
acoustic measurement, audio and electroacoustics, sound
reproduction systems (microphones, loudspeakers and arrays), sound quality (vehicles, machinery, technical and
communication systems), as well as musical acoustics.