Visual Contribution to Speech Intelligibility in Noise

THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
VOLUME 26, NUMBER 2, MARCH, 1954

W. H. SUMBY AND IRWIN POLLACK
Human Factors Operations Research Laboratories, Washington 25, D.C.
(Received November 5, 1953)
Oral speech intelligibility tests were conducted with, and without, supplementary visual observation of the speaker's facial and lip movements. The difference between these two conditions was examined as a function of the speech-to-noise ratio and of the size of the vocabulary under test. The visual contribution to oral speech intelligibility (relative to its possible contribution) is, to a first approximation, independent of the speech-to-noise ratio under test. However, since there is a much greater opportunity for the visual contribution at low speech-to-noise ratios, its absolute contribution can be exploited most profitably under these conditions.
INTRODUCTION

In many practical work situations, criteria for speech interference levels, based upon laboratory articulation test procedures, may be misleading. This condition is the result of several factors, two of which are the subject matter of the present study: the information associated with the class of possible messages and the contribution of visual factors to speech intelligibility.

First, if only a small number of possible messages may be communicated, we can tolerate higher noise interference levels than if the class of possible messages is large.¹ And, second, if visual factors supplementary to oral speech are utilized, we can tolerate higher noise interference levels than if visual factors are not utilized.²

This study considers the interaction of these two factors. Specifically, we shall examine the contribution of visual factors to oral speech intelligibility as a function of the speech-to-noise ratio and the size of the possible vocabulary.

APPARATUS AND PROCEDURE

1. Experimental Variables

The experimental variables manipulated were: the absence or presence of supplementary visual observation of a speaker's lips and facial movements, the speech-to-noise ratio under test, and the size of the vocabulary under examination.

2. Speech Materials

The speech materials employed were 256 bisyllabic words of the spondaic stress pattern, e.g., cupcake, baseball. These words were chosen because they were less subject to inter-speaker variation than other classes of words examined. Vocabularies of 8, 16, 32, 64, and 128 words were randomly selected from the entire group of 256 spondees. Test lists of 25 and 50 items were then constructed from each restricted vocabulary-source. A different ordered series was assembled for each group of subjects and for each testing-session.

In a series of supplementary tests, words of three different lengths were considered: monosyllables, spondees, and trisyllabic phrases. This series was designed to test the generality of the findings to other speech materials. The trisyllabic phrases were constructed by combining a spondee and a monosyllable into a meaningful pair with equal speech stress on each syllable, e.g., "hardware store."

3. Speech Signal

Trained speakers read the lists of spondaic words into a suspended microphone (RCA 88-A). A high-quality auditory system (±1 db between 25 and 20 000 cps) was employed between the microphone and earphones (Permoflux PDR-8 mounted in doughnut cushions). The over-all speech level was measured in terms of the average peak deflection of a Daven VU meter. The signal level was monitored at a constant level by a test supervisor.

4. Noise

Noise, derived from a gas-tube source, was mixed electrically with the speech signal. It was uniform in level per cycle in the frequency band of 20-10 000 cps. The level at the listener's ears was db S.P.L., based upon an overall reading of a Daven VU meter. A S/N ratio of 0 db was defined in terms of an equal overall reading of each of the two signals upon the VU meter. The speech-to-noise ratio was varied by holding the noise level constant and varying the speech level.

5. Test Procedure

Before each test list was presented, the speaker recited the test vocabulary in order to define the words under test. A reference list, alphabetically arranged, of the test vocabulary was furnished to the subject. The speed of reading was determined by the subjects' response rate. If a word was not clearly received, the subjects were instructed to select a word from the restricted vocabulary on the basis of any marginally available cues. The order of presentation of the various test conditions was varied at random. No carrier sentence was used. In its place, a warning light was turned on approximately one second before each word was read. Immediately after each test list, each subject corrected his own test responses.

6. Subjects

Six subjects were seated about a table in a group. Their average distance from the speaker was five feet. Each subject wore a tight-fitting headset. Each subject hand-held the cushion nearest the speaker in order to insure negligible direct air transmission over the noise background.

* Reproduction for any purposes of the U.S. Government is permitted. The writers wish to thank Mr. John Schjelderup for his assistance with the experimental equipment and A/3C Paul Baringer for his assistance with tabulation of the experimental data. This report is a condensation of an HFORL report written by the first author with the guidance of the second author.
† Now at the University of Virginia, Charlottesville, Virginia.
¹ Miller, Heise, and Lichten, J. Exptl. Psychol. 41, 329 (1951), especially Fig. 2, p. 333.
² J. J. O'Neill, unpublished doctoral dissertation, The Ohio State University, 1951.
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 195.169.108.172 On: Tue, 07 Oct 2014
10:26:09
Half of the subjects watched the speaker's facial movements as he spoke (auditory and visual presentation); the other half faced away from the speaker (auditory presentation alone). Each subject alternated between the two listening conditions. A total of 129 subjects (enlisted military and civilian laboratory personnel and undergraduate university students) participated. No special practice in lip-reading was given and all had normal auditory and visual acuity. In the supplementary test series, nine university undergraduate students were employed.
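The design described under Apparatus and Procedure can be summarized in a short Python sketch. It is a hypothetical reconstruction, not the authors' procedure: the placeholder word labels, the use of Python's `random` module, drawing test items with replacement, and the grid of S/N ratios (the -30 to 0 db range plotted in the figures, in 6-db steps) are all assumptions. The vocabulary sizes, the 25- and 50-item list lengths, and the rule that the noise level was held constant while the speech level was varied are taken from the text.

```python
import random

# Restricted vocabulary sizes drawn from the 256-spondee pool (from the text).
VOCAB_SIZES = (8, 16, 32, 64, 128)
# Assumed S/N test points: the -30 to 0 db range plotted in the figures, 6-db steps.
SN_RATIOS_DB = (-30, -24, -18, -12, -6, 0)


def make_vocabularies(pool, sizes=VOCAB_SIZES, rng=random):
    """Randomly select each restricted vocabulary (no repeated words) from the pool."""
    return {n: rng.sample(pool, n) for n in sizes}


def make_test_list(vocabulary, length, rng=random):
    """Build a randomly ordered test list of `length` items (25 or 50 in the study).

    Words are drawn with replacement, so a word may recur within a list. This is
    an assumption: an 8-word vocabulary cannot fill a 25-item list otherwise.
    """
    return [rng.choice(vocabulary) for _ in range(length)]


def speech_level_db(noise_level_db, sn_ratio_db):
    """Noise is held fixed; speech is set so that speech minus noise equals S/N.

    Levels are VU-meter readings in db, so an S/N of 0 db means equal readings.
    """
    return noise_level_db + sn_ratio_db


if __name__ == "__main__":
    rng = random.Random(1954)  # fixed seed so the sketch is reproducible
    pool = [f"spondee_{i:03d}" for i in range(256)]  # hypothetical labels, not the real words
    vocabularies = make_vocabularies(pool, rng=rng)
    test_list = make_test_list(vocabularies[8], 25, rng=rng)
    print(len(test_list), test_list[:5])
    # Speech levels relative to the fixed noise level, taken here as the 0-db reference.
    print({sn: speech_level_db(0.0, sn) for sn in SN_RATIOS_DB})
```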
RESULTS

Speech intelligibility scores, under conditions of auditory presentation alone, are presented in Fig. 1 as a function of the speech-to-noise ratio. The parameter is the size of the vocabulary under test. In general, speech intelligibility decreases as the speech-to-noise ratio is decreased and as the size of the vocabulary is increased. However, little further change in speech intelligibility is observed as the size of the vocabulary under test is increased beyond 64 words.³

FIG. 1. Speech intelligibility under conditions of auditory presentation alone as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under examination. Each point in Figs. 1 and 2 represents the average results for 450 determinations pooled over subjects.

A parallel examination of the results associated with combined auditory presentation and visual observation of the speaker is presented in Fig. 2. The major relationships of Fig. 1 are again obtained. The outstanding difference, however, is the higher resistance to noise for bisensory presentation. This finding is illustrated by the gentler slopes of the empirical functions of Fig. 2.

FIG. 2. Speech intelligibility under conditions of simultaneous auditory presentation and visual observation of a speaker's facial movements as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under examination.

For each experimental condition, the difference between the average intelligibility associated with auditory presentation alone and that associated with bisensory presentation is presented in Fig. 3. The major relationship presented in Fig. 3 is that this difference-score increases as the speech-to-noise ratio is decreased. Specifically, when the speech signal is inaudible and only visual factors operate (S/N ratio of -30 db), the differences between the intelligibility scores associated with the two experimental conditions range from 40 percent for the 256-word vocabulary to 80 percent for the 8-word vocabulary. In contrast, under noise-free conditions, there is little difference in the intelligibility scores associated with the two test conditions.

Each difference-score may be regarded, alternatively, as the contribution of visual observation of the speaker's facial and lip movements to oral speech intelligibility in noise. In these terms, the main conclusion to be drawn from Fig. 3 is that this visual contribution becomes more important as the speech-to-noise ratio is decreased.

FIG. 3. The difference between the speech intelligibility scores under conditions of bisensory presentation (Fig. 2) and auditory presentation alone (Fig. 1) as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under test.

³ This latter finding is contrary to that obtained by Miller, Heise, and Lichten. They found continued decrements in performance as the size of the vocabulary was extended from 32 to 1000 words. The discrepancy is due, we suspect, to the fact that our subjects' check lists were arranged in an arbitrary (alphabetic) fashion, whereas Miller's were arranged in logical groupings of common vowel sounds. Thus, for larger vocabularies, our lists were probably of little value to the subject. See: Miller, Heise, and Lichten, Reference 1.

An alternative description of this visual contribution to oral speech intelligibility is presented in Fig. 4. The parameter and abscissa of Fig. 3 have been interchanged in Fig. 4 in order to examine the role of vocabulary-size. In general, under conditions of low speech-to-noise ratios, the visual contribution (in terms of the difference-score) is decreased with an increase in vocabulary size. Under conditions of high speech-to-noise ratios, however, the visual contribution is increased with an increase in vocabulary-size. This latter finding is due, in part, to the upper ceiling placed upon the intelligibility scores, which leaves no room for improvement for highly restricted vocabularies under high speech-to-noise ratios.

FIG. 4. The difference between the speech intelligibility scores under conditions of bisensory presentation (Fig. 2) and auditory presentation alone (Fig. 1) as a function of the size of the vocabulary. The parameter on the curves is the speech-to-noise ratio under test.

An alternative examination of the results, in terms of measures of transmitted information, is presented in Fig. 5. The information transmitted, relative to the information presented, is plotted as a function of the speech-to-noise ratio. The results associated with bisensory presentation are presented as the upper curves and with auditory presentation alone as the lower curves. The parameter is the size of the vocabulary under test.

FIG. 5. The information transmitted, relative to the information presented, as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under test. Upper curves are for auditory and visual presentation. Lower curves are for auditory presentation alone. Each point in Figs. 5 and 6 represents the results for 50 observations, pooled over subjects, for each of the words.

FIG. 6. The information transmitted, relative to the information presented, as a function of the speech-to-noise ratio under test. The parameter is the type of words employed. Data for vocabularies of eight words. The upper curves are associated with bisensory presentation; the lower curves are associated with auditory presentation alone.

In general, the major finding of the intelligibility analysis is verified: the visual contribution to speech intelligibility (in terms of the difference in transmitted information associated with bisensory presentation and with auditory presentation alone) increases as the speech-to-noise ratio is decreased.
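The relative transmitted-information measure of Figs. 5 and 6 can be sketched as follows. The paper does not spell out its computation; this sketch assumes the standard Shannon mutual-information estimate from a stimulus-response confusion matrix, T = H(x) + H(y) - H(x,y), divided by the information presented, H(x), which equals log2 N bits when N words are presented equally often. The function names and the matrix layout (rows are words spoken, columns are words reported) are illustrative assumptions.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def relative_transmitted_information(confusions):
    """Return T(x;y) / H(x) for a square stimulus-response confusion matrix.

    confusions[i][j] counts how often stimulus word i was reported as word j.
    H(x) is the information presented; with N equally frequent words it is log2 N.
    """
    row_totals = [sum(row) for row in confusions]        # stimulus frequencies
    col_totals = [sum(col) for col in zip(*confusions)]  # response frequencies
    cells = [c for row in confusions for c in row]       # joint frequencies
    h_x = entropy_bits(row_totals)
    h_y = entropy_bits(col_totals)
    h_xy = entropy_bits(cells)
    transmitted = h_x + h_y - h_xy                       # mutual information, bits
    return transmitted / h_x


if __name__ == "__main__":
    n = 8
    perfect = [[10 if i == j else 0 for j in range(n)] for i in range(n)]
    guessing = [[5] * n for _ in range(n)]
    print(relative_transmitted_information(perfect))   # every word identified
    print(relative_transmitted_information(guessing))  # responses unrelated to stimuli
```

Under this normalization an 8-word vocabulary presents log2 8 = 3 bits per item and a 256-word vocabulary presents 8 bits, so all vocabulary sizes share the same 0-to-1 scale. Mutual information estimated from a limited number of observations is biased upward, so small values computed from few trials should be read with care.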
The results of the supplementary tests are presented in Fig. 6 in terms of the relative measure of transmitted information. There is little difference in favor of polysyllabic words under conditions of bisensory presentation. Under conditions of auditory presentation alone, higher intelligibility scores are associated with the polysyllabic words, with little difference between the scores for the bi- and trisyllabic words.
DISCUSSION

The information analysis, presented in Fig. 5, may be extended by an analysis of the individual components of transmitted information to achieve a somewhat more general result. In the following analysis, we are essentially duplicating a similar analysis presented by McGill⁴ in a somewhat different situation.
It may be noted that the absolute visual contribution (as defined by the difference-scores between auditory presentation alone and bisensory presentation) must, necessarily, be small at high speech-to-noise ratios. The reason is that, under these conditions, there is little room for improvement with bisensory presentation because intelligibility is high under conditions of auditory presentation alone. Conversely, there is a much greater opportunity for a visual contribution at low speech-to-noise ratios because, under these conditions, intelligibility is low under the condition of auditory presentation alone.
The more meaningful question, perhaps, is "What is the visual informational contribution relative to the possible available contribution in the absence of visual cues?" The actual contribution, shown schematically as A in Fig. 7, is the difference between the scores associated with auditory-visual presentation and auditory presentation alone. The possible available contribution, shown schematically as B in Fig. 7, is the difference between the total response information⁵ and the transmission under auditory presentation alone. The ratio of these two scores, A/B, answers the question.
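The ratio A/B defined above can be written as a short function. This is a minimal sketch: it assumes the inputs are relative transmitted-information values on the 0-to-1 scale of Fig. 5, with the total response information set to 1.0 as stated in footnote 5. The function name and the example values are hypothetical, chosen for illustration, and are not measurements from this study.

```python
def relative_visual_contribution(t_auditory, t_bisensory, h_response=1.0):
    """Return A/B, the visual gain relative to the gain still available.

    Inputs are relative transmitted information (fractions of the information
    presented, as in Fig. 5). h_response defaults to 1.0 because the total
    response information is treated as 100 percent in these tests.
    """
    a = t_bisensory - t_auditory  # actual contribution (A in Fig. 7)
    b = h_response - t_auditory   # possible available contribution (B in Fig. 7)
    if b <= 0:
        raise ValueError("auditory transmission already at ceiling; A/B undefined")
    return a / b


if __name__ == "__main__":
    # Hypothetical values for illustration only (not data from this study).
    low_sn = relative_visual_contribution(t_auditory=0.20, t_bisensory=0.84)
    high_sn = relative_visual_contribution(t_auditory=0.90, t_bisensory=0.98)
    print(round(low_sn, 2), round(high_sn, 2))
```

Both hypothetical calls give a ratio of 0.8 (to floating-point precision), even though the absolute gain A is 0.64 in the first case and only 0.08 in the second. This is the distinction drawn above between the absolute visual contribution, which the ceiling keeps small at high S/N ratios, and the relative contribution A/B.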
This ratio is approximately constant over a wide range of speech-to-noise ratios. Specifically, for the 8-word vocabulary, the ratio increases from about 0.81 at a S/N ratio of -30 db to about 0.95 at a S/N ratio of -6 db. For the 32-word vocabulary, the ratio increases from 0.77 to 0.81 over the same range. Thus, to a first approximation, the relative visual informational contribution supplied by observing a speaker's facial and lip movements is independent of the speech-to-noise ratio under test.

FIG. 7. Schematic illustration of the visual contribution to oral speech intelligibility relative to its maximum possible contribution.

⁴ W. J. McGill, paper delivered before the American Psychological Association, September, 1953.
⁵ The total response information in our tests is almost indistinguishable from the stimulus information. This is a value of 100 percent or 1.0 in Fig. 5.
Practically speaking, however, since there is a much greater opportunity for the visual contribution at low speech-to-noise ratios, its absolute contribution can be exploited most profitably under these conditions. And, since these conditions are the rule, rather than the exception, in many military and industrial situations, the results suggest that oral speech intelligibility may be appreciably improved in many practical situations by arrangement for supplementary visual observation of the speaker.