PROSODIC VARIATION AND AUDIENCE RESPONSE
Marian Shapley
1. Introduction
While variation in speakers' voices has been well studied for short stretches of speech, such as clauses and sentences, less work has been reported on the prosody of longer periods of speech of different types. This paper describes some aspects of the prosodic style of speakers during an entire narrative in a conversational discourse and relates the style to the amount of response received from the listeners. Since neither "discourse mode" nor "prosody" is a well-defined concept, their meaning for this paper will be briefly outlined.
1.1 Prosody.
The term "prosody" as used here is equivalent to suprasegmentals, which are "those aspects of speech which involve more than single consonants or vowels" (Ladefoged 1975). The primary prosodic features include pitch, loudness and length; voice quality is another feature. Pitch and speaking rate are used in this study.
1.2 Discourse Mode.
Discourse mode may be described along several dimensions, including the social context in which the speech takes place (such as classroom situations, public addresses and telephone conversations), the relations between the participants, the amount of preparedness or familiarity with the material (as in prepared lectures and reading material aloud vs. telling stories or spontaneous conversation), and also the task set for the speaker in experiments, such as reading different materials or simulating various emotions. Johns-Lewis (1986b) suggests as starting bases for intonation taxonomies the spontaneous/non-spontaneous distinction, the public/private dimension and the relative status and expertise of speaker vis-à-vis audience.
IPrA Papers in Pragmatics 1, No. 2 (1987), 66-79
As there is as yet no comprehensive classification of discourse modes, the name given to a discourse mode by a researcher is usually descriptive of the setting of the speech, such as "political interview" or "reading aloud", and the dimensions involved can usually be inferred from this name.
1.3 Studies of discourse modes and prosodic styles
Most studies of prosodic style compare styles either in terms of timing data, such as rate of speech and occurrence of pauses, or of pitch data, such as average pitch, range, and shape of the pitch distribution.
Studies of discourse mode involving speaking rate have shown that rate of speech is slower in telling a story than in reading it aloud (Levin et al. 1982), slower in casual interviews than in political speeches and political interviews (Duez 1982), and also slower in spontaneous speech than in prepared or semi-prepared material (Barik 1979). While the methodologies of these studies are not always consistent (for example, Duez's and Barik's definitions of pauses differ, and they used different speakers in different modes), the results still agree in that the less formal, less planned modes are associated with slower rates.
In a study of the pitch characteristics of discourse, Johns-Lewis (1986b) compared three discourse modes: reading a monologue as though acting, reading a narrative, and spontaneous conversation. She found that the acting mode had the highest pitch and conversation the lowest. Graddol (1986), comparing differences due to the content of reading materials, found that subjects had a higher average pitch when reading a dialogue than when reading a technical manual. Fernald and Simon (1984), studying "motherese" in German women, found a higher pitch for mothers talking to their babies than for mothers talking to adults. These studies all started with given discourse modes such as "reading aloud" or "conversation" and compared their prosodic variables.
While prosodic features can be related to such discourse modes, prosody may vary greatly within a mode. A single speaker in a single speech situation such as casual conversation may make use of various styles. The present study, using narratives in conversations, contrasts styles of the same speaker within the same speech event, and speech modes are defined in terms of the amount of listener response. This method has the advantage of controlling variability among speakers and situations. That is, sociological factors such as the relationship of speaker to listener and the expertise of the speaker in the mode are constant. What varies is the prosodic style, and these styles can be defined in terms of what is occurring in the speech situation.
2. Data and Methodology.
The data consist of two narrations taken from recorded multi-person conversations. There is one male speaker, Mark, and one female, Fern. The narrations have semantic and syntactic continuity and contain the turn structures typical of stories,¹ with the listeners occasionally participating in the talk.
2.1 Measurement of audience response.
The independent variable of this comparison is called audience response. The texts were categorized as High response when they were accompanied by many responses, and Low response when accompanied by few responses during the narration. This does not mean that the amount of response is a categorical variable, but only that categories form a convenient way of contrasting extremes. Responses were noted both between articulated portions of speech and during speaking. Thus speaking by someone other than the narrator, whether a successful turn such as a question, a feedback response like "uh huh", or an attempted turn, was a response. Laughter, either during or between speech groups, was also counted as a response. While the word "audience" usually refers to listeners and not to the speaker, in this case the speaker's own response to what he is saying is included. That is, while telling something a speaker will often laugh once, or even several times, in mid-utterance without any sign of pausing, rather including a laugh token as another word. The speaker may also laugh following his own utterance, creating a laugh response between speech intervals. These situations were called self-responses. So the audience in these data includes the speaker.²
The following scheme describes the possible responses.

Responses when narrator is not talking:
    By audience:  with words or feedback tokens
                  with laughter
    By speaker:   with laughter
    By both:      with laughter
Responses during talking:
    By audience:  speaking while talking is going on (an attempted turn)
                  with laughter
    By speaker:   with laughter
    By both:      with laughter
A period of 25 seconds was taken as the unit for measuring the amount of response. A moving total of the number of responses in this window was computed; when the total remained at 5 or above (0.2 responses per second, or one response every 5 seconds), the speech was classed as High mode (with much audience response), and when it was below this figure, it was classed as Low mode (with little audience response). By this criterion each narration was split into two sections or sub-modes, hereafter referred to as modes. The responses per second for the speaker-modes are listed in Table 1.
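The moving-window rule can be illustrated with a short sketch. It is not the author's procedure: it assumes that response onset times are available in seconds, that the 25-second window trails the current point, covering (t - 25, t], and that the window is advanced in fixed steps. The function name `classify_modes` and the `step_s` parameter are hypothetical; the paper states only the window length and the threshold of 5 responses.

```python
from bisect import bisect_right

WINDOW_S = 25.0   # window length used in the paper (seconds)
THRESHOLD = 5     # 5 responses per 25 s = 0.2 responses per second


def classify_modes(response_times, duration_s, step_s=1.0):
    """Label points in a narration 'High' or 'Low' from a moving response count.

    response_times: onset times in seconds of all responses, self-responses included.
    Assumption (not stated in the paper): the window trails the current time,
    covering (t - 25, t], and is advanced in steps of step_s seconds.
    """
    times = sorted(response_times)
    n_steps = int((duration_s - WINDOW_S) // step_s) + 1
    labels = []
    for i in range(max(n_steps, 0)):
        t = WINDOW_S + i * step_s
        # number of responses with t - WINDOW_S < time <= t
        count = bisect_right(times, t) - bisect_right(times, t - WINDOW_S)
        labels.append((t, "High" if count >= THRESHOLD else "Low"))
    return labels


# Hypothetical response times: none early on, then one every 4 seconds.
for t, mode in classify_modes([30, 34, 38, 42, 46, 50], duration_s=60, step_s=5.0):
    print(f"{t:5.1f} s  {mode}")
```

A point-by-point label is a simplification of "remained at 5 or above"; to obtain the two sections per narration described here, consecutive points with the same label would be merged into a single stretch.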
           Responses per second
Speaker    Low Mode    High Mode
Mark       .04         .33
Fern       .08         .33

Table 1. Responses per second by mode assigned
More information about the kinds of responses is shown in Tables 2 and 3. In Table 2, actual turns at talk are differentiated from laughter as a response. There are many more laughter responses than verbal responses, and laughter differentiates the modes better than talk responses do.
                    Responses per second
Mode / Speaker      Talk     Laughter     Total responses
Low Mode
  Mark              0.01     0.03          4
  Fern              0.05     0.02          6
High Mode
  Mark              0.07     0.27         25
  Fern              0.04     0.28         46
Total responses     16       65           81

Table 2. Turns and laughter as responses
Listener responses are compared with self-responses in Table 3. Listener responses predominate here, with more in High mode than in Low mode.
                    Responses per second
Mode / Speaker      By listener     Self-response
Low Mode
  Mark              0.03            0.01
  Fern              0.08            -
High Mode
  Mark              0.28            0.03
  Fern              0.24            0.08
Total responses     67              14

Table 3. Listener responses vs. self-responses
2.2 Pitch data.
The data were recorded in non-laboratory situations, in some cases with considerable background noise, but reasonable pitch data could nevertheless be determined. Pitch was recorded in the form of fundamental frequencies derived from spectrograms. While pitch, a perceptual phenomenon, and fundamental frequency, an acoustic phenomenon, are not measured in equivalent steps, the relation is close enough that fundamental frequency can be used as an estimate of pitch. Data were recorded for every tenth of a second, either as fundamental frequency values or as pauses.
The pitch variables for each speaker-mode include the mean, median, range, variation, skew and peakedness (kurtosis) of the pitch distribution.³ Skew, a measure of the extent to which the values fall on one side of the mean or the other, and peakedness, a measure of how peaked or flat the distribution is in comparison with a normal distribution, are measures which have been used to distinguish prosodic style.
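The distribution measures named above can be sketched as follows. This is an illustration under stated assumptions, not the computation used in the paper: it converts F0 to semitones relative to a reference frequency `ref_hz` that the paper does not give (so it is a required, hypothetical parameter), and it uses population moment estimators, with peakedness expressed as excess kurtosis (0 for a normal distribution). The paper does not say which estimators were used, so small-sample corrections may differ.

```python
import math
import statistics


def hz_to_semitones(f0_hz, ref_hz):
    """Convert a fundamental frequency to semitones above ref_hz.

    ref_hz is an assumption: the paper states only that a semitone scale was used.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def distribution_summary(values):
    """Mean, median, SD, moment skewness and excess kurtosis ('peakedness')."""
    n = len(values)
    mean = statistics.fmean(values)
    # population central moments
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,          # 0 for a symmetric distribution
        "peakedness": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


# Hypothetical F0 values (Hz) converted with an arbitrary 100 Hz reference.
semitones = [hz_to_semitones(f, 100.0) for f in [95, 100, 105, 110, 130]]
print(distribution_summary(semitones))
```

A positive skew indicates a longer tail above the mean; a negative peakedness indicates a flatter distribution than the normal curve.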
2.3 Timing data.
The number of pauses was totalled for each mode, as was the number of syllables. A pause was counted when a period of at least 0.3 seconds without speech occurred. More pauses occurred in the Low mode than in the High mode, and they were longer in the Low mode than in the High mode.
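Counting pauses from the tenth-of-a-second frames can be sketched as below. The sketch assumes that each frame is either an F0 value or a marker for no speech (`None` here), and that a pause is a run of at least three consecutive non-speech frames (0.3 s). Treating every frame without an F0 value as silence is an assumption, since voiceless stretches of speech also lack F0; the function name `find_pauses` is hypothetical.

```python
FRAME_S = 0.1       # pitch data were recorded every tenth of a second
MIN_PAUSE_S = 0.3   # minimum stretch without speech counted as a pause


def find_pauses(frames):
    """Count pauses in a frame sequence and return (count, mean length in seconds).

    frames: one entry per 0.1 s frame, an F0 value for speech or None for no speech.
    """
    min_frames = round(MIN_PAUSE_S / FRAME_S)  # 3 frames
    lengths = []
    run = 0
    for value in list(frames) + [0.0]:  # non-None sentinel closes a trailing run
        if value is None:
            run += 1
        else:
            if run >= min_frames:
                lengths.append(run * FRAME_S)
            run = 0
    mean_length = sum(lengths) / len(lengths) if lengths else 0.0
    return len(lengths), mean_length


# Hypothetical frames: a 0.2 s gap (too short), a 0.4 s pause, and a final 0.3 s pause.
example = [110, None, None, 112, None, None, None, None, 118, None, None, None]
print(find_pauses(example))
```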
Speaker & Mode    Number of pauses    Average length
Mark/Low          41                  5.92
Mark/High          9                  4.78
Fern/Low          35                  7.51
Fern/High         20                  5.55

Table 4. Number of pauses and average length in seconds
3. Results
Figure 1 shows graphically the distinction between the High and Low modes. The pitch in Z-scores⁴ is plotted by time for both narratives.
[Figure 1 panels. Speaker: Mark, Low Mode (time = 98.0 seconds) and High Mode (time = 75.2 seconds). Speaker: Fern, Low Mode (time = 76.0 seconds) and High Mode (time = 140.0 seconds).]
Figure 1. Pitch values by time for speaker-modes. Dots represent fundamental frequency values in log Z-scores on the vertical scale, plotted against time in 1/10-second intervals on the horizontal scale. Arrows indicate times of audience response.
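The Z-scores plotted in Figure 1 follow the definition in note 4: subtract the mean and divide by the standard deviation. The sketch below assumes that "log Z-scores" means Z-scores computed on semitone (that is, log-frequency) values, and uses the population standard deviation; the paper specifies neither choice, nor whether a speaker's two modes were pooled before standardising. The function name `z_scores` is hypothetical.

```python
import statistics


def z_scores(values):
    """Standardise values: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; the paper does not say which was used
    if sd == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(x - mean) / sd for x in values]


# Hypothetical semitone values for one speaker.
print(z_scores([32.0, 34.0, 34.0, 34.0, 35.0, 35.0, 37.0, 39.0]))
```

Pooling both of a speaker's modes before standardising would place the High and Low modes on one scale, so that the higher pitch of the High mode appears as consistently positive scores.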
3.1 Pitch Comparisons.
The pitches of the High response and Low response modes differ in that the High modes have higher mean and median pitches. These differences are significant for intra-speaker comparisons at the .001 level.
The upper and lower bounds of pitch were taken here to be the 95th and 5th percentiles. This excludes outliers; the measure has been used by other researchers because it is considered more representative of the functional range of speakers than the difference between the absolute highest and lowest pitches. These upper and lower bounds also discriminated between modes for both speakers.
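The percentile bounds can be computed as in the sketch below. It uses the standard-library `statistics.quantiles` with 20 cut points at 5% steps and linear interpolation (`method="inclusive"`); which percentile method was used in the paper is not stated, so this is an assumption. The function name `functional_range` is hypothetical.

```python
import statistics


def functional_range(values):
    """Return (5th percentile, 95th percentile, their difference)."""
    # 19 cut points at 5 % steps; the first is the 5th, the last the 95th percentile
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return low, high, high - low


# Hypothetical pitch values on an evenly spaced scale.
print(functional_range([float(x) for x in range(101)]))
```

The Range column of Table 6 appears to be this difference: for every speaker-mode it equals the 95th minus the 5th percentile in Table 5 (for example, 38.4 - 27.4 = 11.0 for Mark/Low).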
Speaker & Mode   Mean   Median   5th percentile   95th percentile   N
Mark/Low         32.6   32.0     27.4             38.4              665
Mark/High        37.8   37.2     29.5             47.9              412
Fern/Low         42.9   42.4     38.4             49.7              521
Fern/High        46.5   46.5     41.5             53.2              745

Table 5. Pitch measures of central tendency
Speaker & Mode   Standard deviation   Range   Skew   Peakedness
Mark/Low         2.9                  11.0    .56    .63
Mark/High        5.5                  18.4    .55    .43
Fern/Low         3.6                  11.3    .78    .08
Fern/High        3.8                  11.7    .01    .38

Table 6. Pitch measures of dispersion and skewness
The measures describing the variation and shape of the pitch distribution did not in general discriminate modes consistently for both speakers. While the standard deviations and the ranges did differ in a consistent direction for each speaker, for Fern the difference was small. Skewness and peakedness measures were inconsistent between the speakers. The shape of the pitch distribution may be an individual phenomenon, a reflection of how much of their range individual speakers use.
3.2 Speech Rate.
Two rates of speech were computed: the speaking rate, which measures syllables per second of elapsed time, and the articulation rate, which measures syllables per second excluding pauses. Both speakers had slower rates in the High mode, by both measures.
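The two rate definitions differ only in the denominator, as the following sketch shows. The syllable count, elapsed time and pause time in the usage line are hypothetical values, and the function name `speech_rates` is an illustrative choice.

```python
def speech_rates(n_syllables, elapsed_s, pause_s):
    """Return (speaking rate, articulation rate) in syllables per second.

    speaking rate     = syllables / total elapsed time
    articulation rate = syllables / (elapsed time minus total pause time)
    """
    if pause_s >= elapsed_s:
        raise ValueError("pause time must be shorter than elapsed time")
    speaking = n_syllables / elapsed_s
    articulation = n_syllables / (elapsed_s - pause_s)
    return speaking, articulation


# Hypothetical stretch: 400 syllables in 100 s, of which 20 s are pauses.
print(speech_rates(400, 100.0, 20.0))
```

Because the articulation rate removes pause time from the denominator, it can never be lower than the speaking rate, which is why every articulation figure in Table 7 exceeds the corresponding speaking-rate figure.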
Speaker & Mode   Articulation rate   Speaking rate
Mark/Low         5.7                 4.2
Mark/High        4.2                 3.9
Fern/Low         6.5                 4.4
Fern/High        4.7                 3.9

Table 7. Rate of speaking in syllables per second
4. Discussion.
The categories of High audience response and Low audience response do not fit neatly into the taxonomy suggested by Johns-Lewis. However, if one considers the High mode to reflect more spontaneity, then the rates of speaking found here are comparable to those found by other researchers; that is, the more spontaneous the speech, the slower it is. While the larger number of pauses in the Low categories (seen in Table 4) would seem to have the effect of slowing the speech rate, in actuality they were balanced by the larger number of laughter responses in the High categories, which entered into the rate computation.
On the other hand, the pitch data, which show a higher pitch for the more spontaneous mode in this sense, disagree with those of Johns-Lewis, who found the lowest pitch for conversation. Graddol's finding of a higher pitch for reading dialogue than for reading a technical manual is comparable to the present data. While pitch differences do reflect the content of speech, as Graddol claims, and the discourse mode, as Johns-Lewis claims, it is likely that the noted difference between styles is due to some common factors in the situations.
When the difference between two modes of speech is expressed in terms of a single pitch figure, such as a mean, it can be interpreted as a difference in register, that is, a raised or lowered pitch level throughout the excerpt. This altered pitch level over a period of several clauses has also been correlated with the expression of emotions (e.g. Fonagy 1978, Scherer 1977, Williams and Stevens 1972). The statistics for the data in the discourse modes of continuous speech are of the same kind, and may as well be measuring "joy" or "sorrow" as "reading dialogue".
That is, differences in style may reflect differences in personal involvement. It is therefore not surprising that the results for pitch and discourse style are sometimes inconsistent, since involvement is not always a function of the type of discourse, although it often may be. There are those who can read a phone book, or the alphabet, with great emotional style, but it is more commonly the case that such prosodic delivery is limited to speech in which the speaker is involved, or to acting. The real task, then, is to discover the linguistic or situational cues to involvement and how such cues affect pitch level. The dialogue situation seems to be one such cue; audience response (or the lack of it) is part of real dialogue and should be counted among the cues to change of pitch.
In the case of narrations, audience response may itself be related to the structure of the talk. For example, in telling a story there are parts which are more likely to be expressed with emotion than others. Labov (1972) describes six possible elements of a story: abstract, orientation, complicating action (events told in temporal order), evaluation, results or resolution, and a coda, which brings the talk back to the current situation. The evaluation element gives tellers' reactions and opinions about the story, and it is largely in this type of talk in the anecdotes that the style differs, and also where the audience response occurs. The style difference therefore may be as much a function of the story structure as of the response.
There are problems in dealing with distributional descriptions of pitch, that is, in describing a discourse mode by a single figure. Such figures are obviously sensitive to the choices made in describing the data, such as the interval and the rate of responses decided upon, and information about the shape of the distribution may be obscured. As can be seen in Figure 1, the selection of a different interval of measurement could result in a different segmentation of the talk into modes, and possibly in different prosodic measurements. A finer categorization which could be related to the content of the talk might prove fruitful.
5. Conclusions.
Differences in prosodic style can covary with the interactional factor of audience response. Differences such as those found between discourse modes, and between types of textual material, also occur within a single conversational discourse mode, in a single story-telling.
The categorization of conversational speech into a high audience response mode and a low audience response mode is independently motivated by the prosodic variables of pitch level and speaking rate. The style difference is interpreted as a reflection of a more basic variable, such as the amount of speaker involvement. The actual correlates of this kind of involvement are still to be determined.
This paper is based on data from only two speakers, and of course more data are needed. Since it is based on naturally occurring speech, it shows a relationship which can occur, but does not show that the relationship must occur.
NOTES
¹ For a description of turns and turn-taking see Sacks, Schegloff and Jefferson (1974). For turns in story structure see Jefferson (1978).
² In the face-to-face situation in which the data were recorded there are bound to be nonverbal responses, such as facial and body gestures, which may also play an important role. However, these data were not available.
³ The pitch values are given on a semitone (octave) scale.
⁴ Z-scores are computed by converting the mean of a distribution to zero and assigning a score to each value in terms of the number of standard deviations the value is from this mean of zero.
REFERENCES
Barik, H. C. (1979). Cross-linguistic study of temporal characteristics of different types of speech materials. Language and Speech 20: 116-126.
Duez, D. (1982). Silent and non-silent pauses in three speech styles. Language and Speech 25: 11-28.
Fernald, Anne and Thomas Simon (1984). Affective and perceptive saliences in mother's speech. Developmental Psychology 20: 104-114.
Fonagy, I. (1978). A new method of investigating the perception of prosodic features. Language and Speech 21: 34-49.
Graddol, David (1986). Discourse specific pitch behavior. In Johns-Lewis (1986a).
Jefferson, Gail (1978). Sequential aspects of storytelling. In Jim Schenkein (ed.), Studies in the Organization of Conversational Interaction. New York: Academic Press.
Johns-Lewis, Catherine M. (1986a). Intonation in Discourse. London: Croom Helm.
Johns-Lewis, Catherine M. (1986b). Prosodic differentiation of discourse modes. In Johns-Lewis (1986a), pp. 199-219.
Labov, W. (1972). Language in the Inner City: Studies in the Black English Vernacular. Philadelphia: University of Pennsylvania Press.
Ladefoged, Peter (1975). A Course in Phonetics. New York: Harcourt,
Brace Jovanovich.
Levin, H., C. A. Schaffer and C. Snow (1982). The prosodic and paralinguistic features of reading and telling stories. Language and Speech 25: 43-54.
Sacks, Harvey, E. A. Schegloff and G. Jefferson (1974). A simplest systematics for the organization of turn-taking for conversation. Language 50: 696-735.
Scherer, K. R. (1977). The effect of stress on the fundamental frequency of the voice. J. Acoustical Soc. Amer. 62: Supplement 1, 25-26 (abstract). [Cited in Scherer, Klaus R. (1979). Personality markers in speech. In Klaus R. Scherer and Howard Giles (eds.), Social Markers in Speech. Cambridge: Cambridge University Press.]
Williams, Carl E. and Kenneth Stevens (1972). Emotions and speech: some acoustical correlates. J. Acoustical Soc. Amer. 52: 1238-1250.