THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, VOLUME 26, NUMBER 2, MARCH, 1954
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms.

Visual Contribution to Speech Intelligibility in Noise*

W. H. SUMBY† AND IRWIN POLLACK
Human Factors Operations Research Laboratories, Washington 25, D.C.
(Received November 5, 1953)

Oral speech intelligibility tests were conducted with, and without, supplementary visual observation of the speaker's facial and lip movements. The difference between these two conditions was examined as a function of the speech-to-noise ratio and of the size of the vocabulary under test. The visual contribution to oral speech intelligibility (relative to its possible contribution) is, to a first approximation, independent of the speech-to-noise ratio under test. However, since there is a much greater opportunity for the visual contribution at low speech-to-noise ratios, its absolute contribution can be exploited most profitably under these conditions.

INTRODUCTION

In many practical work situations, the standard criteria for speech interference levels, based upon laboratory articulation test procedures, may be misleading. This condition is the result of several factors, two of which are the subject matter of the present study: the information associated with the class of possible messages and the contribution of visual factors to speech intelligibility.

First, if only a small number of possible messages may be communicated, we can tolerate higher noise interference levels than if the class of possible messages is large.¹ And, second, if visual factors supplementary to oral speech are utilized, we can tolerate higher noise interference levels than if visual factors are not utilized.²

This study considers the interaction of these two factors. Specifically, we shall examine the contribution of visual factors to oral speech intelligibility as a function of the speech-to-noise ratio and the size of the possible vocabulary.

* Reproduction for any purposes of the U.S. Government is permitted. The writers wish to thank Mr. John Schjelderup for his assistance with the experimental equipment and A/3C Paul Baringer for his assistance with tabulation of the experimental data. This report is a condensation of an HFORL report written by the first author with the guidance of the second author.
† Now at the University of Virginia, Charlottesville, Virginia.
¹ Miller, Heise, and Lichten, J. Exptl. Psychol. 41, 329 (1951), especially Fig. 2, p. 333.
² J. J. O'Neill, unpublished doctoral dissertation, The Ohio State University, 1951.

APPARATUS AND PROCEDURE

1. Experimental Variables

The experimental variables manipulated were: the absence or presence of supplementary visual observation of a speaker's lips and facial movements, the speech-to-noise ratio under test, and the size of the vocabulary under examination.

2. Speech Materials

The speech materials employed were 256 bisyllabic words of the spondaic stress pattern, e.g., cupcake, baseball. These words were chosen because they were less subject to inter-speaker variation than other classes of words examined. Vocabularies of 8, 16, 32, 64, and 128 words were randomly selected from the entire group of 256 spondees. Test lists of 25 and 50 items were then constructed from each restricted vocabulary-source. A different ordered series was assembled for each group of subjects and for each testing-session.

In a series of supplementary tests, words of three different lengths were considered: monosyllables, spondees, and trisyllabic phrases. This series was designed to test the generality of the findings to other speech materials. The trisyllabic phrases were constructed by combining a spondee and a monosyllable into a meaningful pair with equal speech stress on each syllable, e.g., "hardware store."

3. Speech Signal

Trained speakers read the lists of spondaic words into a suspended microphone (RCA 88-A). A high quality auditory system (±1 db between 25 and 20 000 cps) was employed between the microphone and earphones (Permoflux PDR-8 mounted in doughnut cushions). The over-all speech level was measured in terms of the average peak deflection of a Daven VU meter. The signal level was monitored at a constant level by a test supervisor.

4. Noise

Noise, derived from a gas-tube source, was mixed electrically with the speech signal. It was uniform in level per cycle in the frequency band of 20-10 000 cps. The level at the listener's ears was [illegible] db S.P.L., based upon an overall reading of a Daven VU meter. A S/N ratio of 0 db was defined in terms of an equal overall reading of each of the two signals upon the VU meter. The speech-to-noise ratio was varied by holding the noise level constant and varying the speech level.

5. Test Procedure

Before each test list was presented, the speaker recited the test vocabulary in order to define the words under test. A reference list, alphabetically arranged, of the test vocabulary was furnished to the subject. The speed of reading was determined by the subjects' response rate. If a word was not clearly received, the subjects were instructed to select a word from the restricted vocabulary on the basis of any marginally available cues. The order of presentation of the various test conditions was varied at random. No carrier sentence was used. In its place, a warning light was turned on approximately one second before each word was read. Immediately after each test list, each subject corrected his own test responses.

6. Subjects

Six subjects were seated about a table in a group. Their average distance from the speaker was five feet. Each subject wore a tight fitting headset. Each subject handheld the cushion nearest the speaker in order to insure negligible direct air transmission over the noise background.
Half of the subjects watched the speaker's facial movements as he spoke (auditory and visual presentation); the other half faced away from the speaker (auditory presentation alone). Each subject alternated between the two listening conditions. A total of 129 subjects (enlisted military and civilian laboratory personnel and undergraduate university students) participated. No special practice in lip-reading was given and all had normal auditory and visual acuity. In the supplementary test series, nine university undergraduate students were employed.

RESULTS

Speech intelligibility scores, under conditions of auditory presentation alone, are presented in Fig. 1 as a function of the speech-to-noise ratio. The parameter is the size of the vocabulary under test. In general, speech intelligibility decreases as the speech-to-noise ratio is decreased and as the size of the vocabulary is increased. However, little further change in speech intelligibility is observed as the size of the vocabulary under test is increased beyond 64 words.³

FIG. 1. Speech intelligibility under conditions of auditory presentation alone as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under examination. Each point in Figs. 1 and 2 represents the average results for 450 determinations pooled over subjects.

³ This latter finding is contrary to that obtained by Miller, Heise and Lichten. They found continued decrements in performance as the size of the vocabulary was extended from 32 to 1000 words. The discrepancy is due, we suspect, to the fact that our subjects' check lists were arranged in an arbitrary (alphabetic) fashion, whereas Miller's were arranged in logical groupings of common vowel sounds. Thus, for larger vocabularies, our lists were probably of little value to the subject. See: Miller, Heise, and Lichten, Reference 1.

A parallel examination of the results associated with combined auditory presentation and visual observation of the speaker is presented in Fig. 2. The major relationships of Fig. 1 are again obtained. The outstanding difference, however, is the higher resistance to noise for bisensory presentation. This finding is illustrated by the gentler slopes of the empirical functions of Fig. 2.

FIG. 2. Speech intelligibility under conditions of simultaneous auditory presentation and visual observation of a speaker's facial movements as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under examination.

For each experimental condition, the difference between the average intelligibility associated with auditory presentation alone and that associated with bisensory presentation is presented in Fig. 3. The major relationship presented in Fig. 3 is that this difference-score increases as the speech-to-noise ratio is decreased. Specifically, when the speech signal is inaudible and where only visual factors operate (S/N ratio of -30 db), the differences between the intelligibility scores associated with the two experimental conditions range from 40 percent for the 256-word vocabulary to 80 percent for the 8-word vocabulary. In contrast, under noise-free conditions, there is little difference in the intelligibility scores associated with the two test conditions.

FIG. 3. The difference between the speech intelligibility scores under conditions of bisensory presentation (Fig. 2) and auditory presentation alone (Fig. 1) as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under test.

Each difference-score may be regarded, alternatively, as the contribution of visual observation of the speaker's facial and lip movements to oral speech intelligibility in noise. In these terms, the main conclusion to be drawn from Fig. 3 is that this visual contribution becomes more important as the speech-to-noise ratio is decreased.

An alternative description of this visual contribution to oral speech intelligibility is presented in Fig. 4. The parameter and abscissa of Fig. 3 have been interchanged in Fig. 4 in order to examine the role of vocabulary-size. In general, under conditions of low speech-to-noise ratios, the visual contribution (in terms of the difference-score) is decreased with an increase in vocabulary-size. Under conditions of high speech-to-noise ratios, however, the visual contribution is increased with an increase in vocabulary-size. This latter finding is due, in part, to the upper ceiling placed upon the intelligibility scores which leaves no room for improvement for highly restricted vocabularies under high speech-to-noise ratios.

FIG. 4. The difference between the speech intelligibility scores under conditions of bisensory presentation (Fig. 2) and auditory presentation alone (Fig. 1) as a function of the size of the vocabulary. The parameter on the curves is the speech-to-noise ratio under test.

An alternative examination of the results, in terms of measures of transmitted information, is presented in Fig. 5. The information transmitted, relative to the information presented, is plotted as a function of the speech-to-noise ratio. The results associated with bisensory presentation are presented as the upper curves and with auditory presentation alone as the lower curves. The parameter is the size of the vocabulary under test. In general, the major finding of the intelligibility analysis is verified: the visual contribution to speech intelligibility (in terms of the difference in transmitted information associated with bisensory presentation and with auditory presentation alone) increases as the speech-to-noise ratio is decreased.

FIG. 5. The information transmitted, relative to the information presented, as a function of the speech-to-noise ratio under test. The parameter on the curves is the size of the vocabulary (spondee words) under test. Upper curves are for auditory and visual presentation. Lower curves are for auditory presentation alone. Each point in Figs. 5 and 6 represents the results for 50 observations, pooled over subjects, for each of the words.

The results of the supplementary tests are presented in Fig. 6 in terms of the relative measure of transmitted information. There is little difference in favor of polysyllabic words under conditions of bisensory presentation. Under conditions of auditory presentation alone, higher intelligibility scores are associated with the polysyllabic words with little difference between the scores with the bi- and trisyllabic words.

FIG. 6. The information transmitted, relative to the information presented, as a function of the speech-to-noise ratio under test. The parameter is the type of words employed. Data for vocabularies of eight words. The upper curves are associated with bisensory presentation; the lower curves are associated with auditory presentation alone.

DISCUSSION

The information analysis, presented in Fig. 5, may be extended by an analysis of the individual components of transmitted information to achieve a somewhat more general result. In the following analysis, we are essentially duplicating a similar analysis presented by McGill⁴ in a somewhat different situation.

It may be noted that the absolute visual contribution (as defined by the difference-scores between auditory presentation alone and bisensory presentation) must, necessarily, be small at high speech-to-noise ratios. The reason is that, under these conditions, there is little room for improvement with bisensory presentation because intelligibility is high under conditions of auditory presentation alone. Conversely, there is a much greater opportunity for a visual contribution at low speech-to-noise ratios because, under these conditions, intelligibility is low under the condition of auditory presentation alone.

The more meaningful question, perhaps, is "What is the visual informational contribution relative to the possible available contribution in the absence of visual cues?" The actual contribution, as shown schematically in Fig. 7 as A, is the difference between the scores associated with auditory-visual presentation and auditory presentation alone. The possible available contribution, as shown schematically in Fig. 7 as B, is the difference between the total response information⁵ and the transmission under auditory presentation alone. The ratio of these two scores, A/B, answers the question.

FIG. 7. Schematic illustration of the visual contribution to oral speech intelligibility relative to its maximum possible contribution.

⁴ W. J. McGill, Paper delivered before the American Psychological Association, September, 1953.
⁵ The total response information in our tests is almost indistinguishable from the stimulus information. This is a value of 100 percent or 1.0 in Fig. 5.

This ratio is approximately constant over a wide range of speech-to-noise ratios. Specifically, for the 8-word vocabulary, the ratio increases from about 0.81 at a S/N ratio of -30 db to about 0.95 at a S/N ratio of -6 db. For the 32-word vocabulary, the ratio increases from 0.77 to 0.81 over the same range. Thus, to a first approximation, the relative visual informational contribution supplied by observing a speaker's facial and lip movements is independent of the speech-to-noise ratio under test.

Practically speaking, however, since there is a much greater opportunity for the visual contribution at low speech-to-noise ratios, its absolute contribution can be exploited most profitably under these conditions. And, since these conditions are the rule, rather than the exception, in many military and industrial situations, the results suggest that oral speech intelligibility may be appreciably improved in many practical situations by arrangement for supplementary visual observation of the speaker.
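The two quantities used in the Discussion, the information transmitted relative to the information presented and the ratio A/B, can be sketched numerically. The following Python sketch is an illustration only and is not the authors' computation. It assumes the usual estimate of transmitted information from a stimulus-response confusion table, T = H(stimulus) + H(response) - H(joint); the paper does not state its formula. The function names and all numerical inputs are hypothetical. Total response information is taken as 1.0 on the relative scale, following the footnote on response information.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def relative_transmitted_information(confusions):
    """Transmitted information divided by stimulus information.

    confusions[i][j] is the number of times stimulus word i drew
    response word j. The estimate T = H(stim) + H(resp) - H(joint)
    is an assumption; the paper does not spell out its formula.
    """
    h_stim = entropy([sum(row) for row in confusions])
    h_resp = entropy([sum(col) for col in zip(*confusions)])
    h_joint = entropy([c for row in confusions for c in row])
    if h_stim == 0.0:
        raise ValueError("need at least two stimulus words with observations")
    return (h_stim + h_resp - h_joint) / h_stim


def relative_visual_contribution(t_auditory, t_bisensory, response_info=1.0):
    """Return (A, B, A/B) as defined in the Discussion.

    A = t_bisensory - t_auditory    (actual visual contribution)
    B = response_info - t_auditory  (possible available contribution)
    """
    a = t_bisensory - t_auditory
    b = response_info - t_auditory
    if b <= 0.0:
        raise ValueError("no room for a visual contribution (B <= 0)")
    return a, b, a / b


# Hypothetical relative transmission values, NOT data from the paper.
low_sn = relative_visual_contribution(t_auditory=0.10, t_bisensory=0.82)
high_sn = relative_visual_contribution(t_auditory=0.70, t_bisensory=0.94)

for label, (a, b, ratio) in (("low S/N", low_sn), ("high S/N", high_sn)):
    print(f"{label}: A = {a:.2f}, B = {b:.2f}, A/B = {ratio:.2f}")
```

With these invented inputs the absolute contribution A is three times larger at the low speech-to-noise ratio, while A/B is the same in both cases. This is the pattern the Discussion describes: the relative contribution is roughly constant, but the absolute contribution is greatest where auditory transmission is poor.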