Filled Pauses and Gestures: It's Not Coincidence

Journal of Psycholinguistic Research, Vol. 20, No. 1, 1991
Nicholas Christenfeld,1,2 Stanley Schachter,1 and Frances Bilous1
Accepted September 24, 1990
Though filled pauses and gestures frequently accompany speech, their function is not
well understood. We suggest that it may be helpful in furthering our knowledge of these
phenomena to examine their relationship to each other. To this end, we carried out two
studies examining whether they tend to occur together, or to occur at separate times.
Both faculty colloquium speakers and undergraduate subjects used filled pauses less
frequently when they were gesturing than when they were not gesturing. This effect held
for 30 out of 31 subjects. We suggest that detailed theories may be premature, but
speculate that gestures may be an indication that the speech production apparatus has
completed its search for the next word, phrase or idea and is ready to continue.
When people talk, no matter what the content or purpose of their speech,
they tend to do two things. They wave their arms about and they say
"um." In studies of the speech mannerisms of lecturers in a variety of
academic disciplines (Schachter, Christenfeld, Ravina, & Bilous, 1990),
we noticed what appeared to be a consistent dissociation between these
two phenomena. It was our observation that lecturers rarely seemed to
use filled pauses (the term for such interruptions in the flow of speech
as "uh," "ah," "er," and "um") while gesturing. It is the purpose of
the present studies to examine this relationship and, after presenting the
evidence, to speculate briefly on the implications of our findings for
theories of gesture and speech disfluency.
We thank Bernard Ravina, Julie Odegard, and Kathrin Wanner for their help with the
data collection and Barbara Landau and Robert Krauss for comments on an earlier draft
of this manuscript. This research was supported by a grant to Stanley Schachter, made
for other purposes, from the Russell Sage Foundation.
1 Columbia University.
2 Address all correspondence to Nicholas Christenfeld, Department of Psychology, Columbia University, 406 Schermerhorn Hall, New York, New York, 10027.
0090-6905/91/0100-0001$06.50/0 © 1991 Plenum Publishing Corporation
"Ums" and gestures share a somewhat nebulous relationship with
verbal output. It is fairly clear that they are both products of the general
speech production system, since they both are obviously related to verbal
output (McNeill, 1985; Rochester, 1973). However, it is not clear for
either whether they serve a role in helping the listener understand the
message or in helping the speaker produce it, or whether they are simply
functionless byproducts of the speech apparatus. Except in the case of a
few very specific tasks (Birdwhistell, 1970), gestures do not seem to
help the receiver of a message understand it better (Krauss, Apple, Morency, Wenzel, & Winton, 1981). Furthermore, people often gesture
when they are speaking on the telephone or over an intercom, when the
gestures cannot possibly be of use to the listener (Cohen, 1977). In turn,
there is no real evidence that gestures help the speaker formulate the
message. The basic questions about why people gesture have not been
answered.
As to filled pauses, a variety of research has suggested that "ums"
are indications of time out while the speech production apparatus searches
for the next word, phrase, or idea (Goldman-Eisler, 1968; Rochester,
1973). This suggests that "ums" have a purpose for the speaker as a
means of stalling for time to think. However, that end could just as well
be served with silent pauses, and in the research literature, there are no
hypotheses of which we know to account for the use of filled rather than
silent pauses. As far as the listener goes, there are no indications that
"ums" serve any particular function. There is some evidence, in both
field studies and experiments, that listeners are almost entirely insensitive
to the use of "ums" and that their impressions of both the speaker and
the message are unaffected by the frequency with which filled pauses
are used (Schachter, Christenfeld, & Rodstein, 1990). Finally, it has
been suggested that "ums" serve a floor-keeping function (Maclay &
Osgood, 1959), that is, they indicate to a listener that the speaker has
more to say. Perhaps this is so in conversation, but in formal lectures,
where there is no possibility of interruption, filled pauses are used with
astonishing frequency. 3
3 As can be derived from Table I, colloquium speakers average 3.17 "ums" per minute
during a 50-min lecture.
Because the function of "ums" and gestures is not clear, and they
are both such common companions of speech, it may provide some
insight into the nature of these phenomena to examine their relationship
to each other. Many people have looked at the relationship between
gestures and verbal output; however, it is hard to make a clear prediction
about the relationship between filled pauses and gestures from this work
because different researchers have focused on different types of movements and disfluencies. Schegloff (1984) and a number of others (Butterworth and Beattie, 1978; Krauss and Morrel-Samuels, 1988) have
found that gestures seem slightly and consistently to precede their lexical
affiliates. These researchers suggest that this may be because gestures
are easier to produce since they are selected from a smaller repertoire.
Butterworth and Goldman-Eisler's (1979) work on the timing of
gestures and pauses was concerned specifically with the onset of what
they term speech-focused movements (SFMs) and silent pauses. They
found that SFMs are as likely to begin during a silent pause as during
the act of speaking. Whether this is the case for filled pauses one cannot
say, for while some people have found similarities between filled and
silent pauses (Beattie & Butterworth, 1979), others have not. For example, in Mahl's extensive work on the effects of anxiety (reviewed in
Mahl, 1987), he has found that most disfluencies increase with anxiety,
but that there is consistently no effect on the rate of filled pauses. In
fact, filled pauses proved so resistant to these manipulations that he threw
them out of his index of speech disturbances. Other researchers have
tried a number of manipulations that affect silent pauses but not filled
pauses (Greene & Lindsey, 1989) or filled pauses but not silent pauses
(Vrolijk, 1974). Butterworth and Goldman-Eisler were concerned only
with silent pauses and only with the onset of SFMs. It is difficult, therefore, to extend their results to make a prediction about the co-occurrence
of filled pauses and gestures.
Ragsdale and Silvia (1982) also examined the temporal relation of
body movements and speech disturbances, but they excluded filled pauses
from their measure, and included much more general movements, such
as posture shifts and movement of the feet. They did find a fairly strong
association of movements and speech errors, with the movement coming
just before or simultaneously with the disfluency. Dittman and Llewellyn
(1969) reported a similar finding, but they were concerned with the
overlap of gestures and what they termed starts, the beginning of a
phonemic clause, a silent pause, or a filled pause. Hadar, Steiner, and
Rose (1984) found that movements instead tend to follow disfluencies,
but their data were based only on movements of the head, and they were
only interested in silent pauses and general repetitions.
Because the existing literature does not explore the temporal relationship
between filled pauses and hand gestures, and because we saw
indications of such a relationship, we conducted two studies to address
the issue directly.
STUDY 1: OBSERVATION OF FORMAL TALKS
The first study involved systematic observation of 18 successive
speakers at Columbia University's Psychology Colloquium. This is a
biweekly affair in which outside speakers present their most recent research and thinking to the faculty and graduate students of the psychology
department as well as to any interested outsiders. Typically, the audience
consists of some 40-60 people and the talk takes roughly one hour.
Two observers sitting toward the back of the room systematically
noted the speakers' gesturing behavior and tallied their use of filled
pauses. One of the observers--the gesture coder--recorded the amount
of time each speaker spent gesturing. Pointing, scratching, and fiddling
with clothes (deictics and self-manipulations) were not counted as gestures. Self-manipulations were not counted, since they seem fairly clearly
not to be related to speaking (this distinction is discussed in Freedman,
1972), and pointing was not included since it seemed to be simply a
function of the amount of data and type of data presentation the speaker
chose. If, for example, the speaker used slides, he or she was likely to
point to the portion of the figure or table under discussion. All other
hand-arm movements were counted as gestures. This first observer, using
a stopwatch held in one hand, simply kept a cumulative record of the
time that each speaker spent gesturing, starting the watch when the speaker's hands started moving and stopping it when they returned to a rest
position. This observer also used a continuous hand signal to indicate
whether or not the subject was gesturing. This was a simple thumb up
or thumb down signal with the non-stopwatch hand.
The second observer--the "um" coder--kept track of filled pauses.
He listened to the talk and, relying on the hand signal from the first
observer, recorded whether or not each "um" occurred during a gesture.
If he heard an "um," and the gesture observer's thumb was up, he tallied
it as an "um" while gesturing, and if the thumb was down, as an "um"
while not gesturing. The second observer also kept track of the length
of the talk, which was simply the elapsed time from the start to the end
of the talk, excluding questions from the floor, film clips, and other
external impediments to speech. The lecturer was unaware that these
observations were being made.
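The two observers' records reduce to four numbers per speaker: total speaking time, cumulative gesturing time, and the two "um" tallies. The following Python sketch shows one way those records could be turned into per-minute rates. The function name and the example tallies are hypothetical and are not taken from the study's records.

```python
def um_rates(minutes_talking, minutes_gesturing, ums_gesturing, ums_not_gesturing):
    """Return ("ums"/min while gesturing, "ums"/min while not gesturing)."""
    minutes_not_gesturing = minutes_talking - minutes_gesturing
    if minutes_gesturing <= 0 or minutes_not_gesturing <= 0:
        raise ValueError("both gesturing and non-gesturing time must be positive")
    return (ums_gesturing / minutes_gesturing,
            ums_not_gesturing / minutes_not_gesturing)

# Hypothetical speaker: 50 min of talk, 10 min of it spent gesturing,
# 12 "ums" tallied with the thumb up and 120 with the thumb down.
rate_gesturing, rate_not_gesturing = um_rates(50.0, 10.0, 12, 120)
print(f"{rate_gesturing:.2f} vs {rate_not_gesturing:.2f} ums per minute")
```

Dividing each tally by its own time base is what makes the two rates comparable even though speakers spend far less time gesturing than not.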
For seven of the colloquia, we had a second observer record gestures. Before coding any of the colloquia, these gesture coders had practiced coding for many hours. They had coded 11 previous colloquia, as
well as practiced their coding skills on videotapes of people speaking.
With these videotapes, the coders practiced determining when gestures
started and stopped, and also practiced indicating this with the thumb
signal. They were trained to consider a gesture as starting when the hands
left a neutral, resting position--hanging at the speaker's sides, folded in
his or her lap etc.--and to consider it over when the hands returned to
a resting position. This is not a simple matter, since it requires making
rapid decisions about whether a gesture is starting, or if the speaker is
simply adjusting clothing, scratching, or pointing at some specific object.
However, with practice these determinations can be made reliably.
To assess the reliability, we used the intraclass correlation, which
is based on the analysis of variance, to arrive at an estimate of the part
of the measurement that is attributable to true differences between subjects and the part that is due to error. Unlike the Pearson correlation
coefficient, this measure is directly interpretable as the percent of variance attributable to the true differences between subjects. [See Lord &
Novick (1968) and Fleiss (1986) for a more extensive discussion of this
procedure.] For the seven colloquia in this study for which we had a
second gesture coder also record the percent of time that the speaker
spent gesturing, the reliability was R = .99.
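As an illustration of this kind of reliability estimate, the following Python sketch computes a one-way random-effects intraclass correlation from the between-subject and within-subject mean squares of an analysis of variance. The text does not say which form of the intraclass correlation was used, so the one-way form is an assumption, and the two coders' percentages below are invented for the example.

```python
def intraclass_correlation(ratings):
    """One-way random-effects ICC; `ratings` holds one list of coder scores per subject."""
    n = len(ratings)              # number of subjects
    k = len(ratings[0])           # number of coders per subject
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    subject_means = [sum(r) / k for r in ratings]
    # Mean square between subjects and mean square within subjects (error).
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for r, m in zip(ratings, subject_means)
                    for x in r) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical percent-of-time-gesturing scores from two coders for seven speakers.
percent_gesturing = [(10, 11), (33, 32), (15, 15), (8, 9), (25, 24), (20, 21), (30, 30)]
icc = intraclass_correlation(percent_gesturing)
print(f"R = {icc:.3f}")
```

When the coders disagree by little relative to the spread between speakers, the within-subject mean square is small and the coefficient approaches 1.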
The "um" coder similarly had extensive experience at his job. He
had coded 20 previous colloquia, and hours of other speech, as well as
practicing the system with one of the gesture coders on the videotapes.
He was trained to regard any sound such as "um," "er," "uh," "ah,"
and the like as a filled pause, but to exclude any sound that formed part
of a word, however garbled or incomplete. (This task soon became second nature, and our coder had to make a special effort to stop coding
these filled pauses when off duty.) The only real ambiguity occurred
between the indefinite article a and a filled pause, but almost always this
could be resolved by paying some attention to the context. If a speaker
said "um" several times in succession, these were each counted as individual occurrences of a filled pause. Although none of the colloquia
in the present study was coded by more than one "um" counter, 10
previous ones were. For these, the reliability of the rate of "ums" per
minute was calculated as R = .99. This reliability, although almost
disturbingly high, is in line with the reliability for similar codings reported by Feldstein, Brenner and Jaffe (1963), Mahl (1987), and Panek
and Martin (1959). This kind of reliability is, in fact, not hard to achieve
if you are willing to sacrifice all understanding of what speakers are
actually saying.
The speakers spoke for an average of 54 min, and gestured for 20%
of the time that they were speaking. They averaged 3.17 "ums" per
minute during the talk.
These data, then, provide measures of the amount of time each
speaker spent gesturing. In addition, these data indicate how many "ums"
the speaker used while gesturing, and how many while not gesturing.
One can then compute the average number of "ums" used per minute
while gesturing, and while not gesturing. If the two phenomena are
unrelated, these two rates should be equal. If "ums" and gestures tend
to occur together, the rate of "ums" while gesturing should be higher
than the rate while not gesturing, and if they tend to occur separately,
the rate should be lower while gesturing. The last two columns of Table
I present the relevant data. The average subject used only 1.33 "ums"
per minute while gesturing, and 3.68 "ums" per minute when not gesturing. Every one of the 18 speakers had a lower "um" rate while
gesturing. The two rates are significantly different, with a paired t(17)
= 5.32 with p < .0001.

Table I. "Um" Rates While Gesturing and Not Gesturing for Colloquium Speakers

           Minutes    Minutes      Total    "Ums"/min    "Ums"/min
Subject    talking    gesturing    "ums"    gesturing    not gesturing
1          45.8        5.2         131      1.45          3.04
2          53.2       17.7         118      0.57          3.04
3          53.2        7.9         141      0.76          3.05
4          63.0        4.6         204      1.09          3.41
5          44.0        1.8         182      1.14          4.27
6          45.0        2.6         181      1.14          4.20
7          50.6       13.4         209      0.90          5.29
8          53.8        9.8          55      0.36          1.17
9          61.1       11.0         160      2.45          2.65
10         55.0       11.6         591      3.89         12.57
11         56.4        7.3         132      0.54          2.61
12         59.6       10.3          75      0.19          1.48
13         75.0       16.7         172      0.72          2.74
14         48.2       12.9         135      2.39          2.95
15         59.2       18.1         181      1.99          3.53
16         50.0       15.7         230      2.67          5.49
17         42.4       13.7          92      1.24          2.61
18         54.0       13.8          96      0.51          2.21
Average    53.9       10.8         171      1.33          3.68
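The paired comparison can be checked against the per-speaker rates in Table I. The Python sketch below assumes the column values as transcribed here and uses the usual paired t formula, the mean of the per-speaker differences divided by its standard error. The function and variable names are ours, chosen for illustration.

```python
import math
from statistics import mean, stdev

# "Ums" per minute for the 18 colloquium speakers, transcribed from Table I.
ums_gesturing = [1.45, 0.57, 0.76, 1.09, 1.14, 1.14, 0.90, 0.36, 2.45,
                 3.89, 0.54, 0.19, 0.72, 2.39, 1.99, 2.67, 1.24, 0.51]
ums_not_gesturing = [3.04, 3.04, 3.05, 3.41, 4.27, 4.20, 5.29, 1.17, 2.65,
                     12.57, 2.61, 1.48, 2.74, 2.95, 3.53, 5.49, 2.61, 2.21]

def paired_t(first, second):
    """Return (t, degrees of freedom) for paired samples, testing second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1

t_stat, df = paired_t(ums_gesturing, ums_not_gesturing)
print(f"t({df}) = {t_stat:.2f}")
```

With these values the statistic comes out at about 5.32 on 17 degrees of freedom, in line with the reported figure.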
Although these findings are remarkably strong and consistent, there
is always the possibility that some sampling or methodological artifact
may be responsible for these data. First, this is a narrowly selected
population of subjects, for they are almost all practiced speakers who
make their living lecturing at universities. Second, there is a possibility
of inadvertent bias in the observations, for the observers could both hear
and see the speakers since there was no discreet way that we could
manage to have the observer of filled pauses sit with his back to the
speaker. In order to examine the phenomenon with other sorts of subjects
in a context where we could rule out some of the possible sources of
bias, we analyzed videotapes that had been made in an earlier experiment 4
(Krauss & Morrel-Samuels, 1988). In addition, two pairs of observers
were used in order to check on the reliability of the observation techniques.
STUDY 2: OBSERVATIONS OF UNDERGRADUATE SPEAKERS
In this study, undergraduate subjects were simply asked to describe
various pictures and sounds to a confederate. Thirteen videotapes made
of these subjects were coded for gestures and filled pauses. The observation techniques were the same as in the colloquium study except that
the observer of filled pauses sat with his back to the monitor so that,
using earphones, he could hear the speech but not see the videotaped
subjects, while the observer of gestures faced the screen and systematically timed and signaled gestures but, lacking earphones, could not hear
what was said.
The data for the 13 subjects are given in Table II, which, for each
subject, presents the average for the two pairs of coders. The subjects
spoke for an average of 11.4 min, and spent 33% of that time gesturing.
In 12 of the 13 cases the rate of "ums" was lower while gesturing. On
average, subjects used 3.00 "ums" per minute while they were gesturing
and 4.59 "ums" per minute while they were not gesturing. This is
significantly different, with t(12) = 2.81, p < .02.
4 We are grateful to Dr. Krauss for giving us access to his tapes. His research was
supported in part by National Science Foundation grant BNS-8616131.

Table II. "Um" Rates While Gesturing and Not Gesturing for Undergraduate Subjects

           Minutes    Minutes      Total    "Ums"/min    "Ums"/min
Subject    talking    gesturing    "ums"    gesturing    not gesturing
1           8.3        1.5          36.0    2.27          4.82
2          10.6        2.8          40.0    3.00          4.03
3          12.9        6.5          27.5    1.23          3.06
4          11.3        4.3          27.0    1.52          2.94
5          15.4        7.9          48.5    2.96          3.35
6          14.3        3.7         178.0    8.60         13.72
7          10.5        1.7          51.0    7.81          4.31
8           6.3        1.6          18.5    0.31          3.80
9          13.4        7.0          66.0    4.45          5.42
10         18.5        3.4         111.0    3.53          6.58
11         15.6        7.2          32.0    1.88          2.20
12          5.9        0.1           8.0    0.00          1.39
13          5.1        2.1          15.5    1.45          4.11
Average    11.4        3.8          50.7    3.00          4.59
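The same check can be run on the undergraduate data, where it also identifies the one subject who ran against the trend. This Python sketch assumes the rates as transcribed from Table II; the variable names are illustrative only.

```python
import math
from statistics import mean, stdev

# "Ums" per minute for the 13 undergraduate subjects, transcribed from Table II.
gesturing = [2.27, 3.00, 1.23, 1.52, 2.96, 8.60, 7.81,
             0.31, 4.45, 3.53, 1.88, 0.00, 1.45]
not_gesturing = [4.82, 4.03, 3.06, 2.94, 3.35, 13.72, 4.31,
                 3.80, 5.42, 6.58, 2.20, 1.39, 4.11]

# Per-subject difference: rate while not gesturing minus rate while gesturing.
diffs = [n - g for g, n in zip(gesturing, not_gesturing)]

# Subjects (1-indexed) whose "um" rate was NOT lower while gesturing.
exceptions = [i + 1 for i, d in enumerate(diffs) if d <= 0]

# Paired t statistic on the differences.
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
print(f"exceptions: {exceptions}, t({len(diffs) - 1}) = {t_stat:.2f}")
```

Only Subject 7 shows a higher rate while gesturing, and the statistic comes out close to the reported t(12) = 2.81.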
For these data, more extensive reliability estimates could be computed. Again, for the percent of time spent gesturing, the reliability was
R = .99. For the rate of "ums" per minute it was also R = .99. For
the rate of "ums" per minute while the speaker was gesturing, comparing
the two pairs of coders produced a reliability of R = .95. For the rate
while the speaker was not gesturing, the reliability was R = .99. It
should be borne in mind that these last reliabilities depend on the gesture
coders agreeing on when the speaker was gesturing, the "um" coders
agreeing on when the speaker said "um," and also on picking up the
signal correctly from the gesture coders. In any case, it is almost excessively clear that this can be measured reliably. Once again, we hasten
to point out that our coders had spent well over 100 hours honing their
coding skills.
DISCUSSION
Taken together, these studies leave little doubt that people tend to
say "um" less frequently while they are gesturing. Of the 31 subjects,
30 showed this trend. Furthermore, the effect existed for two different
speaking tasks, for experienced and inexperienced speakers, and for a
large range of ages. The findings indicate that "ums" and gestures are
systematically, and not randomly, distributed in the flow of speech.
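One way to gauge how unlikely a 30-of-31 split would be under chance is an exact sign test, sketched below in Python. This test is not reported in the paper and is offered only as an illustration. It assumes that, if "ums" and gestures were unrelated, each subject would be equally likely to show a lower or a higher rate while gesturing. The function name is hypothetical.

```python
from math import comb

def sign_test_p(successes, n):
    """Two-sided exact binomial (sign) test p-value against a chance rate of 0.5."""
    k = max(successes, n - successes)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# 30 of the 31 subjects had a lower "um" rate while gesturing.
p_value = sign_test_p(30, 31)
print(f"p = {p_value:.2e}")
```

Under that assumption the two-sided probability is on the order of 3 in 100 million, so the direction of the effect is very unlikely to be a chance pattern across subjects.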
In spite of the strength of the finding, one cannot draw any firm
conclusions about the two phenomena and the nature of their relationship.
Since the findings are based on correlational studies, one must make
assumptions about one of the factors in order to conclude anything about
the other.
If we take as a fact that "ums" signal time out while there is an
ongoing search for the next word or phrase, then the present finding has
implications for the placement of gestures. Since people tend not to
gesture while they are "umming," gestures should be an indication that
no search is in progress. Perhaps gestures are only initiated when a search
has been successful. The gestures may be linked to specific words, in
which case they clearly cannot start until the word has been found, or it
may be that gestures are simply held in check until the verbal channel
is ready to continue. In either case, we should anticipate that pauses
would tend to precede gestures immediately.
This idea of gestures is very different from the common-sense idea
that people grope for words by waving their hands. If this were the case,
then gestures, at least part of the time, should be a sign that a search is
underway. One might then expect that, sharing the same cause, gestures
and "ums" would tend to co-occur. The fact that they do not suggests,
but by no means proves, that gestures are not often used to grope for
words.
However, it seems to us that our understanding of gestures is still
so primitive that we are loath to linger over speculation about the theoretical implications of these findings. The facts are firmly established.
Perhaps they will be useful in furthering our eventual understanding of
filled pauses and gestures.
REFERENCES
Beattie, G. W., & Butterworth, B. L. (1979). Contextual probability and word frequency
as determinants of pauses and errors in spontaneous speech. Language and Speech,
22, 201-211.
Birdwhistell, R. L. (1970). Kinesics and context. Philadelphia: University of Pennsylvania Press.
Butterworth, B., & Beattie, G. W. (1978). Gesture and silence as indicators of planning
in speech. In N. R. Campbell & P. T. Smith (Eds.), Recent advances in the psychology of language: Formal and experimental approaches (pp. 347-360). New
York: Plenum Press.
Butterworth, B., & Goldman-Eisler, F. (1979). Recent studies in cognitive rhythm. In
A. W. Siegman & S. Feldstein (Eds.), Of speech and time: Temporal speech patterns in interpersonal contexts (pp. 211-224). Hillsdale, NJ: Erlbaum.
Cohen, A. A. (1977). The communicative function of hand illustrators. Journal of Communication, 27, 54-63.
Dittman, A. T., & Llewellyn, L. G. (1969). Body movement and speech rhythm in
social conversation. Journal of Personality and Social Psychology, 11, 98-106.
Feldstein, S., Brenner, M. S., & Jaffe, J. (1963). The effect of subject sex, verbal interaction
and topical focus on speech disruption. Language and Speech, 6, 229-239.
Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: John
Wiley.
Freedman, N. (1972). The analysis of movement behavior during the clinical interview.
In A. Siegman & B. Pope (Eds.), Studies in dyadic communication (pp. 153-175).
Elmsford, NY: Pergamon Press.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech. London: Academic Press.
Greene, J. O., & Lindsey, A. E. (1989). Encoding processes in the production of
multiple-goal messages. Human Communication Research, 16, 120-140.
Hadar, U., Steiner, T. J., & Rose, F. C. (1984). The relationship between head movements and speech dysfluencies. Language and Speech, 27, 333-342.
Krauss, R. M., Apple, W., Morency, N., Wenzel, C., & Winton, W. (1981). Verbal,
vocal, and visible factors in judgments of another's affect. Journal of Personality
and Social Psychology, 40, 312-320.
Krauss, R. M., & Morrel-Samuels, P. (1988, February). Some things we do and don't
know about hand gestures. Paper presented at meeting of the American Association
for the Advancement of Science, Boston.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading,
MA: Addison-Wesley.
Maclay, H., & Osgood, C. E. (1959). Hesitation phenomena in spontaneous English
speech. Word, 15, 19-44.
Mahl, G. F. (1987). Explorations in nonverbal and vocal behavior. Hillsdale, NJ: Erlbaum.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92,
350-371.
Panek, D. M., & Martin, B. (1959). The relationship between GSR and speech disturbances
in psychotherapy. Journal of Abnormal and Social Psychology, 58, 402-405.
Ragsdale, J. D., & Silvia, C. F. (1982). Distribution of kinesic hesitation phenomena
in spontaneous speech. Language and Speech, 25, 185-190.
Rochester, S. R. (1973). The significance of pauses in spontaneous speech. Journal of
Psycholinguistic Research, 2, 51-81.
Schachter, S., Christenfeld, N. J. S., Ravina, B., & Bilous, F. (1990). Speech disfluency
and the structure of knowledge. Journal of Personality and Social Psychology, (in
press).
Schachter, S., Christenfeld, N. J. S., & Rodstein, B. (1990). On the perception of
pauses in spontaneous speech. Unpublished manuscript.
Schegloff, E. A. (1984). On some gestures' relation to talk. In J. M. Atkinson & J.
Heritage (Eds.), Structures of social action: Studies in conversation analysis (pp.
266-296). Cambridge, England: Cambridge University Press.
Vrolijk, A. (1974). Habituation as a mode of treatment of speaking anxiety. Gedrag
Tijdschrift voor Psychologie, 2, 332-338.