Emotions and Speech: Some Acoustical Correlates
CARL E. WILLIAMS
Naval Aerospace Medical Research Laboratory, Pensacola, Florida 32512
KENNETH N. STEVENS
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
(Received 14 March 1972)
This paper describes some further attempts to identify and measure those parameters in the speech signal that reflect the emotional state of a speaker. High-quality recordings were obtained of professional "method" actors reading the dialogue of a short scenario specifically written to contain various emotional situations. Excerpted portions of the recordings were subjected to both quantitative and qualitative analyses. A comparison was also made of recordings from a real-life situation, in which the emotions of a speaker were clearly defined, with recordings from an actor who simulated the same situation. Anger, fear, and sorrow situations tended to produce characteristic differences in contour of fundamental frequency, average speech spectrum, temporal characteristics, precision of articulation, and waveform regularity of successive glottal pulses. Attributes for a given emotional situation were not always consistent from one speaker to another.
SUBJECT CLASSIFICATION: 9.5, 9.3.
INTRODUCTION
Long before the availability of modern instrumentation for the analysis of speech, researchers attempted to identify and measure those parameters in the speech signal that reflect the emotional state of a speaker. This paper describes one of a series of studies¹⁻³ undertaken in a further attempt to delineate some of the acoustic correlates of the emotions of a speaker.
There are two principal reasons why it is of interest to examine the parameters in the speech signal that are related to the emotional state of a speaker. (1) There are situations in which the physiological and emotional state of an individual needs to be monitored, and it would be convenient if an indication of this state could be obtained through analysis of the acoustic characteristics of his utterances. (2) Studies of speech attributes related to emotional state may help to contribute toward a general theory of speech performance. Such a theory should have two components: one that specifies the acoustic correlates of the linguistic units used for communication between speakers of a given language, and the other that describes the extralinguistic aspects of speech communication.
I. APPROACH
In planning the study, two approaches were considered: (1) a detailed analysis of "field" recordings where there would be no question as to the emotion present in the individual speaking, and (2) an analysis of high-quality recordings of professional actors simulating various emotions. The second approach was selected for the major portion of our work since it seemed to afford the best opportunity for obtaining good recordings that could be subjected to both quantitative and qualitative analyses. Because actors are presumably able to portray clear and unambiguous emotions, their utterances provide a means for exploring the basic manifestations of emotional speech. Field recordings often reflect the simultaneous presence of several emotions and the lack of control of the speech material. While an approach using actors is not novel,⁴⁻⁷ investigators employing it have never performed, to our knowledge, a spectrographic analysis of the recorded speech material.
Since we believed that the emotions of interest might best be described in terms of specific situations involving emotional interaction among several people, the decision was made to make use of a short play. Getting the actors involved in clearly defined situations would, hopefully, result in their experiencing and expressing the various emotions to be studied. The primary function of the play was to elicit the desired emotions from the actors and to serve as the carrier for selected phrases and sentences to be embedded in different emotional situations. These phrases and sentences, so-called
1238 Volume 52 Number 4 (Part 2) 1972
"control clusters," could then later be subjected to detailed acoustical analyses. Because changes in the speech signal due to the presence of some emotions might be subtle, it was considered necessary to be able to compare utterances of identical material as they occurred in different emotional situations.
A detailed scene-by-scene outline of a short play involving three male characters was constructed. The outline was then given to a playwright who wrote the dialogue for the three characters who would be speaking in various situations. For each character, the playwright was instructed to include identical speech material--the control clusters--as part of the dialogue for that character speaking in different situations. In addition to analysis of the speech material in the control clusters, longer portions of the dialogue, usually several sentences surrounding a control cluster, were selected for study.
The services of a professional director and three professional actors were employed in recording the scenario. All four individuals were former members of the Actor's Studio in New York and had past experience with the so-called "method" style of acting. The recordings were made in a professional recording studio, with the actors rotating in each of the three roles.
In addition to the analysis of recordings of the short play, this study included a comparison of recordings from a real-life situation in which the emotions of the speaker were clearly defined, and recordings from an actor who simulated this situation. The purpose of this subsidiary investigation was to attempt to validate the use of actors to simulate emotions, as well as to obtain additional data on the acoustic correlates of various emotions.
II. SELECTION OF PARAMETERS FOR ANALYSIS
On the basis of the results of a preliminary study³ and an examination of the literature on the physiological and acoustic correlates of emotion, only a few kinds of acoustic parameters were selected for detailed analysis in the present study. Before the data are presented, a brief review is given of the acoustic correlates upon which attention was focused, indicating reasons for selecting particular kinds of data.
Studies of the effects of emotion on the acoustic characteristics of speech have shown that average values and ranges of fundamental frequency (F0) differ from one emotion to another.⁴ In the preliminary study,³ the acoustic properties that appeared to be among the most sensitive indicators of emotion were attributes that specified the contour of F0 throughout an utterance. There are several reasons why changes in F0 with time are potentially capable of providing information concerning the emotional state of a speaker. First, considerable latitude is possible in the variations of F0, since only certain aspects of the F0 contour carry information with regard to the linguistic content of a message. The principal linguistic functions of F0 changes are to indicate stress, and to mark boundaries of different types of sentence-length or phrase-length units. Subject to these constraints, a speaker is relatively free to use changes in F0 to convey nonlinguistic information, such as his emotions, or to convey special emphasis of some kind. Furthermore, the fundamental frequency can undergo variations that may not be intended or be under overt control of the speaker, and hence may provide an indication of the speaker's emotional state.
Further justification for the preoccupation with F0 contours as indicators of emotions comes from a study of the literature on the physiological correlates of stress. It has been stated, for example, that "... respiration is frequently a sensitive indicator in certain emotional situations, especially startle, conscious attempts at deception, and conflict. The respiratory pattern is frequently disturbed in anxiety states."⁸ An increase in respiration rate would presumably result in an increased subglottal pressure during speech. This heightened subglottal pressure would give rise to a higher F0 during voiced sounds in speech. The increased respiration rate could also lead to shorter durations of speech between breaths, with a consequent effect on the basic temporal pattern of speech.
Other relevant physiological effects of certain emotions are dryness of the mouth often observed under conditions of emotional excitement, anticipation, fear, and anger, and tremor and disorganization of motor response, observed under conditions of emotional conflict. These effects can have an influence on various components of the speech system, including the larynx, which is directly involved in the control of F0. Muscle activity in the larynx and the condition of the vocal cords are likely to have a more direct influence on the sound output and, in particular, on the fundamental frequency, than changes in muscle activity in other parts of the speech generating system, such as the tongue, lips, and jaw. The reason is that the vibrating vocal cords have a direct effect on the volume velocity through the glottis, whereas the other muscles and vocal-tract components simply shape the resonant cavities for sound that is generated at the vocal cords. Thus, any analysis of the speech signal that reflects vocal-cord activity is more likely to be influenced by physiological changes brought about by the emotional state of the speaker.
Such physiological changes as increased subglottal pressure, excessive dryness or salivation, and decreased smoothness of motor control can have an influence on the waveform of the pulses from the vocal cords, as well as on their frequency. For example, increased subglottal pressure generally gives rise to a narrowing of individual glottal pulses, and hence to a change in the spectrum of the pulses. Under some circumstances, such as excessive salivation, there may be irregularities
The Journal of the Acoustical Society of America 1239
FIG. 1. Fundamental frequency (F0) contours of selected control clusters as spoken by Voice B in different situations in the scenario. Numbers refer to control clusters. [Panels: Neutral, Anger, Sorrow, Fear. Abscissa: time (seconds). Ordinate: fundamental frequency.]
in the waveform of the glottal output from one pulse to the next. In the present study, a rough indication of glottal waveform was obtained from the average acoustic spectrum of an utterance, measured by means of an octave-band analyzer. Qualitative evidence of changes in glottal waveform and of irregularities in successive glottal pulses was obtained by observation of wide-band (300-Hz filter) spectrograms.
Although primary attention in this study was focused on acoustic parameters related to laryngeal activity, some observations were also made on parameters that reflect control of the supralaryngeal structures. These included measurements of the rate of talking, analysis of acoustic attributes that provide an indication of the precision with which articulatory targets for certain vowels and consonants are achieved, and spectrographic observations of vowel formant frequencies and of intensity of release for stop consonants.
III. RECORDINGS OF SCENARIO
Although a vast amount of data was made available by the various recordings of the scenario, only selected samples will be presented in order to demonstrate some of the ways in which the data were examined. The data to be reported include both quantitative results, which were obtained by making various measurements from wide- and narrow-band spectrograms and graphic level recordings, and qualitative observations, which consisted simply of setting down impressions derived from visual examination of a number of spectrographic patterns.
Figure 1 shows F0 contours of selected control clusters as uttered by one of the three voices, Voice B, in different situations in the scenario. These contours were obtained by tracing harmonics in narrow-band spectrograms of the utterances. For neutral utterances, the changes in F0 were relatively slow, and the shape of the contour throughout each utterance was smooth and continuous.
The contour shapes for utterances produced in anger situations showed an F0 that was generally higher throughout the utterances, suggesting that they were generated with greater emphasis. Furthermore, one or two syllables in each phrase were characterized by large peaks in F0, again indicating strong emphasis on these syllables. Although the excursions in F0 were quite great, there appeared to be a relatively smooth overall contour with one or two major peaks, but with no large discontinuities.
The contours for utterances made in situations involving the emotion sorrow were relatively flat with few fluctuations, and the F0 was usually lower than that for neutral situations. For Voice B (Fig. 1), there was a slowly falling contour during the first half of the utterance, and a more level contour toward the end. Only rarely was emphasis placed on syllables in utterances by any of the three voices in a sorrow situation.
The contours for utterances made in fear situations often departed from the prototype shape for neutral
situations. Occasionally there were rapid up-and-down fluctuations within a voiced interval, as in cluster 4 for Voice B. Sometimes sharp discontinuities were noted from one syllable to the next.
From narrow-band spectrograms prepared of the longer speech samples and control clusters, measurements of F0 were obtained every 0.15 sec, and distribution curves were drawn to determine the median F0 and a measure of the range of F0 (10th-90th percentile). Figure 2 shows, for each of the three voices (A, B, C), the median F0 and range of F0 (on a logarithmic scale) for these long speech samples. The lowest F0 was obtained for the emotion sorrow, and the highest for anger.
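The median and 10th-90th percentile measure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure, which relied on distribution curves drawn from spectrogram measurements. The function name `f0_summary` and the sample values are hypothetical, and the decile interpolation rule is an assumption, since the paper does not say how the percentiles were read off.

```python
import math
import statistics

def f0_summary(f0_hz):
    """Summarize F0 samples (Hz) taken during voiced speech.

    Returns (median, 10th percentile, 90th percentile, range in octaves).
    """
    if len(f0_hz) < 2:
        raise ValueError("need at least two F0 samples")
    # Nine decile cut points; the first and last are the 10th and 90th percentiles.
    deciles = statistics.quantiles(f0_hz, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    # An octave measure corresponds to reading the range on a logarithmic scale.
    octaves = math.log2(p90 / p10)
    return statistics.median(f0_hz), p10, p90, octaves

# Hypothetical F0 samples (Hz), one per measurement instant; not data from the paper.
samples = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
median, p10, p90, octaves = f0_summary(samples)
print(f"median {median} Hz, range {p10}-{p90} Hz ({octaves:.2f} octaves)")
```

Reporting the range in octaves as well as in Hz makes it possible to compare speakers whose habitual pitch differs.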
FIG. 2. Median F0 and range of F0 for each of the three voices speaking in neutral, sorrow, fear, and anger situations. [Panels: Voice A, Voice B, Voice C. Key: S, sorrow; N, neutral; F, fear; A, anger.]
Measurements for the neutral and fear situations were very similar for Voices B and C, but the fear situations showed a wider and higher range of F0, and the distribution for fear was somewhat skewed, with occasional excursions to very high values of F0. The widest range of F0 (on a linear frequency scale) tended to occur for
FIG. 3. Wide-band spectrograms of the cluster "For God's sake" spoken by Voice B in five situations. [Panels: Neutral (3), Neutral (9), Anger (19), Fear (23), Sorrow (58). Abscissa: time (seconds).]
FIG. 4. Narrow-band spectrograms of the cluster "For God's sake" spoken by Voice B in five situations. [Panels: Neutral (3), Neutral (9), Anger (19), Fear (23), Sorrow (58). Abscissa: time (seconds).]
anger. Ranges for sorrow and neutral were very similar, but median values of F0 for the neutral situations were higher than those for sorrow.
Figure 3 shows wide-band spectrograms of one control cluster spoken by Voice B in five situations. For this cluster the total duration was least for the neutral situation, and greatest for sorrow. The lengths of the utterances for fear and for anger were about the same. The increases in duration for the utterances made in anger, fear, and sorrow situations came in part from increases in vowel durations, but primarily from lengthened intervals of closure or vocal-tract constriction for the consonants. Comparison of the initial consonant in the word "God's" in the various spectrograms shows a longer stop gap and a more intense burst at the consonantal release for the various emotional situations than for the neutral situation.
The vocal-cord vibrations for the utterance exemplifying sorrow appeared to have considerable fluctuations in shape from one glottal pulse to the next. This voicing irregularity was manifested by a variation in darkness of individual voicing pulses, particularly in the high-frequency region above 2000 Hz. This effect is particularly evident in the word "God's" at the bottom of Fig. 3 (0.8-1.3 sec on the time scale), in the entire frequency region above 2000 Hz. The spectrograms for the anger situation also demonstrate some anomalies that were presumably due to irregularities in the glottal output. For example, the pattern of glottal vibration is not uniform in the words "God's" and "sake," as evidenced by the irregular spectral pattern at high frequencies (above 3 kHz).
The narrow-band spectrograms shown in Fig. 4 for the same utterances, again indicate that the highest
FIG. 5. Wide-band spectrograms of the cluster "I don't understand it" spoken by Voice A in four situations. [Recoverable panel labels: Neutral (2), Anger (17), Sorrow (56). Abscissa: time (seconds).]
average frequency and the most rapid frequency changes occurred in the anger situation; there were sharp rises in F0 in the second and third syllables. Rapid rises in F0 also occurred in the neutral situation (cluster location number 3, upper left of Fig. 4), but the rising contour was smoother, and the peak F0 was lower. The contour for the fear situation also showed some rapid fluctuations or tremor in F0; in the middle syllable of utterance number 23, the F0 starts high at the beginning of the syllable, and then falls, with an initial bump or fluctuation. This kind of contour seemed to occur often, but not always, in fear situations.
Measurements of the formant frequencies for particular vowels uttered in the various emotional situations showed some small differences. Of particular interest is the fact that, for anger situations, vowels in the syllables uttered with emphasis had higher first-formant frequencies than the corresponding vowels in neutral situations. Apparently a wider mouth opening was used in the emphatic anger situations, and this yielded a higher first-formant frequency. An example is the first formant of the word "God's" in Fig. 3, which was higher for the anger situation than for the neutral situation.
The effects of the various emotions on the utterances of Voice A were qualitatively similar to those on Voice B, as exemplified by the wide- and narrow-band spectrograms in Figs. 5 and 6, respectively. The neutral utterance (Fig. 5) shows a uniform formant structure and glottal vibration pattern, whereas irregularities in formant amplitudes and amplitudes of successive glottal pulses at high frequencies are evident for sorrow, anger, and fear. In the stressed vowel of the word "understand," the amplitudes of the second and third formants relative to that of the first formant appear to be greater for the anger and fear situations than for the neutral utterance, presumably reflecting a change in the spectrum of the glottal pulses. The consonant closures seem to be better defined in the anger and fear spectrograms than in the neutral one, since the intensity changes at the consonantal closures and releases are more abrupt; the durations of these particular utterances, however, are not significantly different from that of the neutral utterance. The narrow-band spectrogram for the fear situation (Fig. 6) again indicates some tremors and rapid fluctuations in F0.
FIG. 6. Narrow-band spectrograms of the cluster "I don't understand it" spoken by Voice A in four situations. [Panels: Neutral (2), Anger (17), Fear (21), Sorrow (56). Abscissa: time (seconds).]
Mean rates of articulation in syllables per second were determined for each of the three actors' utterances of certain longer speech samples selected from points in the scenario where the emotional situation was clearly defined. Table I shows the mean rate of articulation, in syllables per second, for each of the three voices speaking in neutral, anger, fear, and sorrow situations. The ranking of the emotions according to rate of articulation, from fastest to slowest, was the same (with one exception) for each of the three voices: neutral, anger, fear, and sorrow. The rate of articulation obtained in sorrow situations was less than half that found for the other situations. This finding is in agreement with the results of a study by Fairbanks and Hoaglin,⁵ which involved the use of amateur actors to simulate different emotional states; they found marked decreases in rate for grief as compared with anger, sorrow, and indifference.
TABLE I. Mean rate of articulation (syllables per second) for each of the three voices speaking in different situations.
            Neutral   Anger   Fear   Sorrow
Voice A      4.03     4.26    3.92    1.84
Voice B      4.89     4.32    3.90    2.03
Voice C      4.02     3.88    3.57    1.86
Mean         4.31     4.15    3.80    1.91
Measurements of the average spectrum of the speech signal were made for a number of the control clusters in the scenario. The spectrum was obtained by means of an octave-band analyzer, and the outputs of individual filters were averaged over an interval of several seconds.
FIG. 7. Average spectra of selected control clusters as spoken by each of the three voices in different situations in the scenario. These spectra were obtained by taking the long-term average output from octave filters. (Spectra of Voice C in neutral and sorrow situations were identical.) [Panels: Voice A, Voice B, Voice C. Curves: neutral, sorrow, anger, fear. Abscissa: octave band center frequency (Hz), 125 to 4000.]
Figure 7 shows some of the results. The absolute level for each spectrum was normalized by setting the level for the 250-Hz octave band arbitrarily at 0 dB. Octave-band spectra of these kinds demonstrate two kinds of gross acoustic effects. The low-frequency bands (125 and 250 Hz) are in a frequency region occupied by F0 during voiced sounds. If F0 is low (in the range of 90-175 Hz), then it remains in the 125-Hz octave band, and hence there is appreciable average energy in this band. If F0 remains high most of the time (above 175 Hz), then there is less energy in the 125-Hz band. Hence, the relative levels in the 125- and 250-Hz bands provide a very rough indication of the average fundamental frequency.
The spectra in Fig. 7 show relatively less energy in the 125-Hz band for utterances made in anger situations (for which the F0 is high) than for utterances made in neutral situations. For the one voice where data corresponding to the emotion fear were obtained (Voice C), there is also relatively less low-frequency energy for that emotion. Utterances made in sorrow situations do not differ in a consistent way from utterances for neutral situations, at least as far as the low-frequency energy is concerned. Consistent differences among emotions are also observed at the high-frequency end of the spectrum. The level at high frequencies (above 1000 Hz) relative to low frequencies is always greatest for the emotion anger and least for sorrow. The implication is that anger is manifested in a higher subglottal pressure and a narrower glottal pulse, while the reverse is true of sorrow.
The changes in average spectrum for the emotion fear, together with the incidental observation from spectrograms of a decrease in low-frequency energy (in a range up to, say, 500 Hz) relative to energy at higher
FIG. 8. Radio announcer speaking before (top) and after (middle and bottom) the crash of the HINDENBURG. Narrow-band. (From Williams and Stevens, Ref. 1.) [Utterances: "Well here it comes, ladies and gentlemen" (top); "a terrific crash, ladies and gentlemen" (middle); "I can't talk, ladies and gentlemen" (bottom). Abscissa: time (seconds).]
frequencies, are consistent with some measurements of average spectra of a Soviet cosmonaut under various critical situations during a space flight.⁹ The situations where the emotion fear was expected to be greatest in space flight also led to an upward shift in the centroid of the spectrum within the frequency range 300-1200 Hz.
IV. COMPARATIVE ANALYSIS OF ACTUAL AND SIMULATED DESCRIPTIONS OF HINDENBURG DISASTER
In order to provide some justification for the use of actors in studies of the acoustic correlates of emotion, some recordings were obtained of utterances spoken in real-life situations in which the circumstances gave a clear indication of the emotions being experienced by the speakers. The object was to compare the acoustic characteristics of the voices in these situations with the characteristics in the voices of the actors in similar situations.
One recording that was acquired was that of the radio announcer who was describing the approach of the HINDENBURG at Lakehurst, New Jersey, when the Zeppelin suddenly burst into flames. The announcer continued his description (with one or two short breaks) throughout the disaster.
FIG. 9. Voice C simulating announcement of HINDENBURG crash before (top) and after (middle and bottom) the disaster. Narrow-band. [Utterances: "here it comes, ladies and gentlemen" (top); "terrific crash, ladies and gentlemen" (middle); "I can't talk, ladies and gentlemen" (bottom). Abscissa: time (seconds).]
Figure 8 shows three narrow-band spectrograms that were made from excerpts of the announcer's voice before and after the crash occurred. The announcer's normal voice had a great deal of inflection, as indicated by the smooth up-and-down movements of F0 in the upper spectrogram. The shape of the contour changed abruptly immediately after the crash, as the spectrograms in the middle and lower portion of the figure demonstrate. The average F0 in those portions is considerably higher, and there is apparently much less fluctuation in frequency. Quantitative analysis of longer samples of the announcer's speech has indicated, however, that there was, in fact, a greater fundamental frequency range after the disaster than before. There are some irregular bumps in the contour, which might be interpreted as a kind of tremor; examples of these irregularities can be seen in the bottom spectrogram of Fig. 8, near 0.6 sec and again near 0.8 sec. The irregularities may reflect a loss of precise control of musculature and an irregular respiratory pattern.
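The statement that the F0 range widened after the crash can be checked against the values listed in Table II of this paper. The sketch below is illustrative only: the dictionary layout and the function names `range_width` and `crash_effect` are hypothetical, the values are assumed to be in Hz, and expressing the range in octaves is an added convenience (the table gives only the range limits).

```python
import math

# (median F0, (low, high) limits of the F0 range), copied from Table II; Hz assumed.
TABLE_II = {
    "announcer": {"before": (166, (124, 196)), "after": (196, (152, 260))},
    "voice_c": {"before": (138, (117, 168)), "after": (222, (117, 280))},
}

def range_width(limits):
    """Return the width of an F0 range as (Hz, octaves)."""
    low, high = limits
    return high - low, math.log2(high / low)

def crash_effect(speaker):
    """Compare the before-crash and after-crash values for one speaker."""
    median_before, limits_before = TABLE_II[speaker]["before"]
    median_after, limits_after = TABLE_II[speaker]["after"]
    hz_before, oct_before = range_width(limits_before)
    hz_after, oct_after = range_width(limits_after)
    return {
        "median_shift_octaves": math.log2(median_after / median_before),
        "range_hz": (hz_before, hz_after),
        "range_octaves": (oct_before, oct_after),
    }

for name in TABLE_II:
    print(name, crash_effect(name))
```

On these figures the range widens on both the linear and the logarithmic scale for both speakers, and the upward shift in median F0 is larger for Voice C than for the announcer.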
The actor designated as Voice C was provided with a transcription of the radio announcer's words (but not the actual recording) and was asked to study and then to read the script as if he were the radio announcer describing the event. He reported that he had never heard the recorded radio account of the disaster. A recording was made of Voice C's "simulated" description of the events before and after the crash. Spectrograms taken from this simulation are shown in Fig. 9. The excerpts are the same as those examined for the radio announcer, shown in Fig. 8.
Voice C on the before-the-crash recording had less inflection than the announcer, but the smooth fluctuations in his F0 followed the expected form. Following the crash, there was an increase in fundamental frequency, and there were erratic changes in F0 throughout a phrase. Voice C showed greater up-and-down fluctuations than did the radio announcer, but the irregular bumps and atypical contours are reminiscent of the spectrograms for the emotion fear in the scenario (Figs. 1, 4, 6). The similarities, as well as the differences, between the radio announcer's voice during the crash and Voice C's simulation of the same announcement can be seen in Fig. 10. In both narrow-band spectrograms, there are instants in time at which rapid jumps or "tremors" in F0 occur.
FIG. 10. Radio announcer (top) speaking during the crash of the HINDENBURG and Voice C (bottom) simulating the same portion of the announcement. Narrow-band. [Utterances: "It's crashing, it's crashing terrible" (top); "It's crashing terrible" (bottom). Abscissa: time (seconds).]
Quantitative data on the median F0 and the range of F0 for the radio announcer and for Voice C simulating the description are given in Table II. The data were derived from samples of speech of several seconds' duration before the crash and after the crash. The increased median F0 and the greater range of F0 for the emotional situation are quite apparent both for the announcer and for Voice C. The increase in both the median F0 and in the range of F0 was greater for Voice C than for the announcer.
TABLE II. Median fundamental frequency (F0) and range of fundamental frequency for selected utterances obtained from the original radio description of the HINDENBURG disaster and from Voice C's simulated announcement of the same event. (From Williams and Stevens, Ref. 1.)
                                     F0 (Hz)   F0 Range (Hz)
Original Radio Description
  Before crash                        166       124-196
  After crash                         196       152-260
Voice C's Simulated Description
  Before crash                        138       117-168
  After crash                         222       117-280
These comparative data and some additional limited data obtained from real life emotional situations² are not inconsistent with the data obtained from the actors in this study.
V. SUMMARY OF GROSS ACOUSTIC ATTRIBUTES ASSOCIATED WITH VARIOUS EMOTIONS
A. Anger
The most consistent and striking acoustic manifestation of the emotion anger was a high F0 that persisted throughout a breath group. This increase was, on the average, at least half an octave above the F0 for a neutral situation. The range of F0 observed for utterances spoken in anger situations was also considerably greater than the range for the neutral situations. Some syllables were produced with increased intensity or emphasis, and the vowels in these syllables had the highest fundamental frequency. These syllables also tended to have weak first formants, and were often generated with some voicing irregularity (i.e., irregular fluctuations from one glottal pulse to the next). The basic opening and closing articulatory gestures characteristic of the vowel-consonant alternation in speech appeared to be more extreme when a speaker was angry;
the vowels tended to be produced with a more open vocal tract (and hence to have higher first-formant frequencies), and the consonants were generated with a more clearly defined closure. The durations of utterances spoken in anger were usually longer (and the syllabic rate lower), but this effect was not great and was not always consistent for all voices. Although the general manifestations of anger were similar for the three voices, there were some individual differences. The increase in fundamental frequency was greater for some voices than for others, and there were differences in the way duration and other characteristics changed.
B. Fear
The average F0 for fear was lower than that observed for anger, and for some voices it was close to that for utterances spoken in neutral situations. There were, however, occasional peaks in the F0 that were much higher than those encountered in a neutral situation. These peaks were interspersed with regions where the fundamental frequency was in a normal range. The pitch contours in the vicinity of the peaks sometimes had unusual shapes (irregular bumps or discontinuities), and voicing irregularity was sometimes present. The duration of an utterance tended to be longer than in the case of anger or neutral situations. As was observed for anger, the vowels and consonants produced in a fear situation were often more precisely articulated than they were in a neutral situation. Although these various characteristics were found for some utterances of some voices, observations of spectrograms revealed no clear and consistent correlate for the emotion fear.
C. Sorrow
The average fundamental frequency observed for the actors speaking in sorrow situations was considerably lower than that for neutral situations and the range of F0 was usually quite narrow. This change in F0 was accompanied by a marked decrease in rate of articulation (by a factor of two or more in syllabic rate, on the average) and an increase in the duration of an utterance. The increased duration resulted from longer vowels and consonants and from pauses that were often inserted in a sentence. Perhaps the most striking effect on the wide-band spectrogram was voicing irregularity. On occasion the voicing irregularity reduced simply to noise; i.e., the voiced sounds became whispered, in effect.
D. Neutral
In neutral situations the spectrograms of the actors generally showed a well-defined structure during the vowels, with little noise or irregularities either between the formants or in the high-frequency regions where formants are often not visible. Consonants were frequently uttered in an imprecise manner, particularly when they appeared in unstressed syllables. Sentences were usually generated with shorter durations than for the emotional situations.
VI. CONCLUSIONS
The aspect of the speech signal that appears to provide the clearest indication of the emotional state of a talker is the contour of F0 vs time. This contour has a prototype shape for a breath group that is generated in a normal manner, without marked emotions of any kind. The normal contour is characterized by smooth, slow, and continuous changes in F0 as a function of time, the changes occurring in syllables on which emphasis or linguistic stress is to be placed. Emotions appear to have several effects on this basic contour shape.
While at present it is certainly not possible to specify any quantitative automatic procedures that reliably indicate the emotional state of a talker, measurements of the median F0 and range of F0 for a sample of speech of several seconds' duration may at least serve to classify a talker's emotional state as one of sorrow (reduced F0 and decreased range), or of anger or fear (increased F0 and range), assuming that the normal F0 and range of F0 for the talker are known. Further identification of the emotions must be done by an experienced observer who must look for certain attributes of the F0 contour, shifts in spectrum, changes in duration, and voicing irregularities.
ACKNOWLEDGMENTS
The authors acknowledge with gratitude the support of the Bureau of Medicine and Surgery, Department of the Navy, and the valuable contributions made by the following individuals during the early stages of this work: Michael H. L. Hecker, Dr. Stephen G. Langley, Barbara Woods, and Sarah Heintz.
* From the Naval Aerospace Med. Res. Lab., Pensacola, Fla. Opinions or conclusions contained in this report are those of the authors and do not necessarily reflect the views or endorsement of the Navy Department.
† This article is based on portions of a monograph cited in Ref. 2 below. Refer to the monograph for a more detailed description of this study, a review of the literature, a description and results of some listener tests, and a bibliography.
¹ C. E. Williams and K. N. Stevens, "On Determining the Emotional State of Pilots During Flight: An Exploratory Study," Aerospace Med. 40, 1369-1372 (1969).
² C. E. Williams and K. N. Stevens, "The Emotional Content of Speech: Some Exploratory Acoustical Studies," NAMRL Monogr. 18, Naval Aerospace Med. Res. Lab., Pensacola, Fla. (1972).
³ C. E. Williams, K. N. Stevens, and M. H. L. Hecker, "Acoustical Manifestations of Emotional Speech," J. Acoust. Soc. Amer. 47, 66(A) (1970).
⁴ G. Fairbanks and W. Pronovost, "An Experimental Study of the Pitch Characteristics of the Voice During the Expression of Emotions," Speech Monogr. 6, 87-194 (1939).
⁵ G. Fairbanks and L. W. Hoaglin, "An Experimental Study of the Durational Characteristics of the Voice During the Expression of Emotion," Speech Monogr. 8, 85-90 (1941).
⁶ J. R. Davitz, The Communication of Emotional Meaning (McGraw-Hill, New York, 1964), Chap. 8, pp. 101-112.
⁷ V. A. Popov, P. V. Simonov, A. G. Tishchenko, M. V. Frolov, and L. S. Khachatur'yants, "Analysis of Intonational Characteristics of Speech as a Criterion of the Emotional State of Man Under Conditions of Space Flight," Zh. Vysshey Nervnoy Deyatel'nosti [J. Higher Nervous Activity] 16, 974-983 (1966).
⁸ D. B. Lindsley, "Emotion," in Handbook of Experimental Psychology, S. S. Stevens, Ed. (Wiley, New York, 1951), p. 477.
⁹ V. A. Popov, P. V. Simonov, M. V. Frolov, and L. S. Khachatur'yants, "Frequency Spectrum of Speech as an Indicator of the Degree and Nature of Emotional Stress," Zh. Vysshey Nervnoy Deyatel'nosti 1, 104-109 (1971).