J Am Acad Audiol 16:726–739 (2005)

Speech Recognition in Multitalker Babble Using Digits, Words, and Sentences

Rachel A. McArdle*
Richard H. Wilson†
Christopher A. Burks†

Abstract

The purpose of this mixed-model design was to examine recognition performance differences when measuring speech recognition in multitalker babble in listeners with normal hearing (n = 36) and listeners with hearing loss (n = 72), using stimuli of varying linguistic complexity (digits, words, and sentence materials). All listeners were administered two trials of two lists of each material at descending signal-to-babble ratios. For each of the materials, recognition performances by the listeners with normal hearing were significantly better than the performances by the listeners with hearing loss. The mean separation between groups at the 50% point was ~8 dB in signal-to-babble ratio for each of the three materials. The 50% points for digits were obtained at a significantly lower signal-to-babble ratio than the 50% points for sentences and words, which were equivalent. There were no interlist differences between the two lists for the digits and words, but there was a significant disparity between QuickSIN™ lists for the listeners with hearing loss. A two-item questionnaire was used to obtain a subjective measurement of speech recognition, which showed moderate correlations with objective measures of speech recognition in noise using digits (r = .641), sentences (r = .572), and words (r = .673).

Key Words: Auditory perception, hearing loss, speech perception, word recognition in multitalker babble

Abbreviations: ANSI = American National Standards Institute; S/B = signal-to-babble ratio; SNR = signal-to-noise ratio

*Bay Pines VA Healthcare System, Bay Pines, Florida; †James H. Quillen VA Medical Center, Mountain Home, Tennessee

Rachel A. McArdle, Ph.D., Bay Pines VA Healthcare System, Audiology (126), P.O. Box 5005, Bay Pines, FL 33744; Phone: 727-398-9395; Fax: 727-319-1209; E-mail: [email protected]

This work was supported by the Rehabilitation Research and Development Service, Department of Veterans Affairs through a Research Career Development award to the senior author, a Senior Research Career Scientist award to the second author, and a Research Enhancement Award Program (REAP) to Mountain Home.
The most universal complaint reported by older individuals with hearing loss is difficulty understanding speech in the presence of background noise (Carhart and Tillman, 1970; Dubno et al, 1984; van Rooij and Plomp, 1990, 1992; Bronkhorst, 2000; Horwitz et al, 2002; Killion, 2002; Wilson and Strouse, 2002), yet measurement of speech-recognition abilities in quiet continues to be standard practice within most audiology clinics. A recent survey of audiological practices showed that of the 91% of audiologists who responded that they routinely administer word-recognition tests, 92% administer suprathreshold monosyllabic word lists in quiet (Martin et al, 1998). A limitation of word-recognition testing in a quiet background at one presentation level is that the results are not representative of how an individual will perform in real-world (e.g., noisy) situations under amplification (Dirks et al, 1982; Plomp, 1986; CHABA, 1988). Speech-recognition testing in quiet also does not address the chief complaint of the majority of patients with hearing loss, which is difficulty understanding speech in noise.

Numerous studies show that pure-tone audiograms and speech-recognition scores in quiet do not predict the ability of an individual with hearing loss to understand speech in noise (Cherry, 1953; Groen, 1969; Carhart and Tillman, 1970; Plomp, 1978; Plomp and Mimpen, 1979; Dirks et al, 1982; Beattie, 1989; Killion and Niquette, 2000; Wilson, 2003). The inability to predict word-recognition performance in noise can be explained using the Plomp (1978) two-factor framework of hearing loss, which suggests that both attenuation and distortion are involved in decreased hearing function. Pure-tone thresholds and, to some degree, the ability to recognize speech in a quiet background are measures of audibility (or the lack of it) that Plomp (1978) refers to as attenuation. The ability to recognize speech in background noise is a different aspect of auditory behavior that involves a distortion factor. Regardless of whether the distortion is peripheral, such as widened auditory filter bandwidths that lead to poorer frequency resolution, or the result of age-related changes in the central nervous system, a main indication of distortion is a decreased ability of the listener to understand degraded speech stimuli such as speech in background noise. In terms of rehabilitation, hearing aids are used to make speech audible, which remediates the attenuation factor for most individuals. Individuals who also have a distortion component to their hearing loss often demonstrate less benefit from amplification and require extensive counseling on realistic expectations for hearing-aid use. Because of the distortion component of hearing loss (Plomp, 1978), the ability to recognize speech in noise cannot be predicted, and it must be measured directly (Killion, 2002).
Incorporating background noise into standardized speech tests improves the sensitivity and validity of word-recognition measures (Findlay, 1976; Beattie, 1989; Willott, 1991; Sperry et al, 1997; Wiley and Page, 1997). Past studies have shown a clear separation in performance between individuals with normal hearing and individuals with hearing loss when speech recognition is measured in multitalker babble (Dubno et al, 1984; Beattie, 1989; Wilson and Strouse, 2002; Wilson, 2003). On average, individuals with hearing loss have required the signal to be 10–12 dB higher than the multitalker babble to obtain a performance level of 50% correct, whereas individuals with normal hearing reach 50% correct at signal-to-babble ratios (S/B) of 2–6 dB. Thus, it appears that individuals with impaired hearing have not only a pure-tone sensitivity loss but also a signal-to-noise ratio (SNR) loss that is unpredictable from objective measures in quiet. In addition, subjective reports of communication ability in a noisy environment have failed to show any relationship between how a patient perceives his or her ability to recognize speech in a noisy environment and objective measures of speech recognition in multitalker babble (Rowland et al, 1985; Wilson et al, in press).

Information gained from measuring speech recognition in noise is useful not only for differentiating between normal and impaired hearing but also for rehabilitation purposes. Measuring SNR loss is useful for selecting amplification strategies (e.g., directional microphones or FM systems), corroborating a patient's claim of activity limitations, and counseling concerning expectations for hearing-aid performance. Audiologists, however, have not embraced speech-in-noise testing as a routine part of their clinical procedures. Postulated reasons include (1) past practice stressing speech audiometry in quiet instead of a more ecologically valid approach and (2) audiologists being unaware of the rehabilitative usefulness of speech-in-noise data (Wilson and Strouse, 2002). Additionally, many audiologists are unfamiliar with interpreting SNR loss, since it is expressed in dB S/B rather than the more clinically historic metric of percent correct.

The importance of objectively assessing speech in noise in the adult population led to the recent development of standardized speech tests using sentence materials, including (1) the Connected Speech Test (CST; Cox et al, 1987), (2) the Hearing In Noise Test (HINT; Nilsson et al, 1994), (3) the Speech In Noise Test (SIN™; Etymotic Research, 1993), and (4) the QuickSIN (Etymotic Research, 2001; Killion et al, 2004). An advantage of sentence-length stimuli is the ability to score multiple target words in a short amount of time. For example, one test list in the QuickSIN includes 30 target words arranged into six sentences that can be administered in less than a minute and provide a measurement of SNR loss. On the other hand, a disadvantage of sentence-length stimuli, especially in an older population, is that repeating sentence materials, especially in background noise, involves cognitive skills beyond a simple one-word speech-recognition task. The additional cognitive demands may have a differential effect, particularly on older listeners (Salthouse, 1985; Craik, 1994). Sentence-length stimuli require an individual to recall multiple words from working memory.
Recall of multiple words can be influenced by recency and primacy effects (Murdock, 1962), such that the first and last words of a string of words are easier to recall. Also, the syntactic and semantic structure of sentence-length stimuli influences performance, such that it is easier to recall multiple words that follow grammar rules and are meaningful (Wingfield, 1996). When sentence-length stimuli are presented in competing noise, compensatory strategies for remembering strings of words, such as rehearsal (mental repetition of information to be recalled) and elaborative encoding (linking of new information to knowledge already stored in long-term memory), may be inaccessible (Craik and Lockhart, 1972; Rabbitt, 1990). Recently, monosyllabic word and digit materials in multitalker babble were shown to be sensitive to the different recognition abilities of listeners with normal hearing and listeners with hearing loss (Wilson et al, 2003; Wilson and Weakley, 2004; Wilson, Burks, et al, forthcoming). Although the use of monosyllables, of which monosyllabic digits are a special case, as stimuli in speech-testing paradigms has been criticized for lacking the natural dynamics of real speech, such as word stress, coarticulation, and dynamic range (Villchur, 1982), words remain the most popular stimulus type among audiologists and minimize the effects of working memory and linguistic context on performance.

This study was designed to examine the differences among speech-in-noise tasks using stimuli with varied linguistic context. Digits, words, and sentences in multitalker babble were used to examine the S/B needed by listeners with normal hearing and listeners with hearing loss to reach a criterion of 50% correct on a closed set (digits), an open set with syntactic context (QuickSIN), and an open set without context (monosyllables). It was hypothesized that performance would follow this continuum such that, as the contextual constraint of the materials decreased from digits to sentences to words, the S/B required to reach 50% correct would increase. Furthermore, our interest was to examine again the relationship between a subjective rating of performance in noise and objective measurements of SNR loss.

METHODS

The experimental design for this study formed a 2 × 3 × 2 mixed-model factorial. Listener group (normal or impaired hearing) was varied between subjects, and stimulus type (digits, sentences, words) and list (3, 4) were varied within subjects. Two lists of each stimulus type were used to examine interlist reliability. In addition, test-retest reliability was addressed.

Participants

Thirty-six young listeners with normal hearing (18–28 years, mean age = 23.3 years) and 72 older listeners with sensorineural hearing loss (31–84 years, mean age = 65.5 years) were studied. The younger listeners were recruited from local universities and had normal hearing (≤20 dB HL; American National Standards Institute [ANSI], 1996) at the octave intervals from 250 to 8000 Hz. Inclusion criteria for the older listeners, whose audiometric test results were consistent with sensorineural hearing loss, were as follows: (1) >30 years of age, (2) a threshold at 500 Hz of ≤30 dB HL, (3) a threshold at 1000 Hz of ≤40 dB HL, and (4) a pure-tone average at 500, 1000, and 2000 Hz of ≤45 dB HL. The mean pure-tone air-conduction thresholds (and standard deviations) for the test ear are listed in Table 1.

Table 1. The Mean (and Standard Deviation) Age (years) and Air-Conduction Pure-Tone Thresholds for the Test Ear (dB HL; ANSI, 1996) for the 36 Listeners with Normal Hearing and the 72 Listeners with Hearing Loss

                                       Frequency in Hertz
                         Age    250    500   1000   2000   3000   4000   8000
Normal Hearing   Mean   23.3    5.3    4.9    4.6    2.8    4.4    2.6    5.8
                 SD      2.7    5.6    5.0    5.8    6.4    5.8    6.4    8.6
Hearing Loss     Mean   65.5   19.4   19.7   21.9   40.4   56.4   62.3   62.4
                 SD     10.5    8.3    6.7    8.4   16.4   15.0   16.0   19.2

MATERIALS

The following four speech-recognition materials were studied: (1) sentences in multitalker babble (QuickSIN; Etymotic Research, 2001; Killion et al, 2004), (2) monosyllabic words in multitalker babble (Wilson, 2003), (3) digit triplets in multitalker babble (Wilson and Weakley, 2004), and (4) Northwestern University Auditory Test No. 6 (NU No. 6; Tillman and Carhart, 1966) in quiet. The presentation paradigms and the signal-to-noise ratios of each material were not altered from their original protocols.

Each QuickSIN list consisted of six IEEE sentences, each with five target words. The level of the sentences was fixed, and the level of the multitalker babble (Auditec of St. Louis), which is continuous throughout the list of sentences, was varied in 5 dB increments from 25 to 0 dB S/B. The sentences were 2.5 to 3.0 sec long, with a 5 to 6 sec interval between sentences during which responses were made and recorded. Each list of sentences was ~55 sec. Lists 3 and 4 of the QuickSIN were selected arbitrarily for the current experiment. The materials were taken from the commercial version of the QuickSIN (Etymotic Research, 2001).

The purpose of developing the words-in-noise (WIN) paradigm was to provide a protocol that could be used to evaluate the abilities of listeners to understand speech in a background noise.
Multitalker babble was selected as the background noise because it is probably the most commonly occurring noise that interferes with communication (Plomp, 1978). The multitalker babble, which was also used with the digit materials, was recorded by D. Causey (pers. comm., 1979) and consisted of three female and three male talkers speaking simultaneously about different topics (Sperry et al, 1997). Because five-second segments of babble were compiled randomly, the babble was not intelligible. The word/babble segments were edited at the negative-going zero crossings, which avoided clicks at the segment boundaries when the segments were concatenated.

The WIN protocol evolved with the following characteristics (Wilson, 2003; Wilson et al, 2003; Wilson et al, 2005):
(1) 70 monosyllabic words from NU No. 6 spoken by the female speaker on the VA compact disc, which enabled the evaluation of recognition performance in quiet and in babble with the same materials spoken by the same speaker; (2) ten unique words presented at each of seven signal-to-babble ratios from 24 dB S/B to 0 dB S/B in 4 dB decrements; (3) each word time locked to a unique segment of babble, which reduced variability; (4) the level of the babble, which was presented continuously, fixed, with the level of the words varied; (5) an interval between words of 2.7 sec; (6) the 50% point quantifiable with the Spearman-Kärber equation (Finney, 1952); (7) a stopping rule that terminated the test sequence when all ten words at one level were incorrect; and (8) a 6 to 12+ dB separation in signal-to-babble ratio between performance by listeners with normal hearing and performance by listeners with hearing loss.

Subsequently, to reduce the test time so the instrument would be acceptable for clinical use, the 70-word protocol was divided into two 35-word equivalent lists based on an error analysis of the recognition performances of 573 listeners with sensorineural hearing loss (Wilson, Weakley, et al, forthcoming). Four randomizations of each list were generated, with List 1 designated as the even-numbered lists and List 2 designated as the odd-numbered lists. As with the QuickSIN, Lists 3 and 4 of the WIN were selected for the study.

The digits-in-noise protocol was developed as a potential screening protocol following the model suggested by Smits et al (2004). Digits 1 through 10, excluding the bisyllabic 7, spoken by a male, were grouped randomly into triplets so that each of the nine digits was presented in quiet and in multitalker babble at each of seven signal-to-babble ratios that ranged from 4 to -20 dB in 4 dB decrements. The quiet condition served to acquaint the listener with the listening task and was not intended for analysis. The level of the multitalker babble was fixed, and the level of the digits varied. The digits ranged in duration from 365 msec (5) to 560 msec (9). Each digit was paired with and time locked to a unique segment of multitalker babble, with 300 msec of babble preceding and following the digit; thus, the duration of each digit/babble segment was the duration of the digit plus 600 msec. The digit triplets were formed by concatenating the various digit/babble segments, with 300 msec segments of babble added before the first digit and after the third digit. These leading and trailing segments were shaped with 25 msec rise and fall times, respectively. Again, the editing technique at the segment boundaries produced seamless transitions that were not audible or perceptually apparent. The ~3.5 sec digit triplets were separated by a 4 sec quiet interstimulus interval, during which responses were made and recorded. Each of the two 72-digit lists was ~3 min. Lists 3 and 4 of the digit triplets were used (Wilson, Burks, et al, forthcoming).

For the quiet condition, List 4 of NU No. 6 was divided into two 25-word lists. The words were spoken by the same female speaker used for the multitalker babble paradigm (Department of Veterans Affairs, 1998). Each list of words was ~2 min. The NU No. 6 lists, the WIN lists, and the digits-in-noise lists were recorded on an audio compact disc (Hewlett Packard, Model DVD200i).
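For a fixed descending protocol like the WIN, the Spearman-Kärber computation of the 50% point (characteristic 6 above) reduces to simple arithmetic: the starting level plus half the step size, minus the step size times the total proportion of items correct. The sketch below is ours, not the authors' analysis code, and the function and variable names are hypothetical; it assumes, per the standard derivation, 100% correct above the highest level presented and 0% correct below the lowest.

```python
def spearman_karber_50(correct_per_level, start_db=24.0, step_db=4.0,
                       items_per_level=10):
    """Estimate the 50% point (dB S/B) from one descending run.

    correct_per_level: items correct at each level, ordered from the most
    favorable S/B (start_db) downward. Assumes perfect performance above
    the first level and 0% below the last, per the standard
    Spearman-Karber derivation for a descending procedure.
    """
    total_correct = sum(correct_per_level)
    return start_db + step_db / 2.0 - step_db * total_correct / items_per_level

# A hypothetical WIN run: 10 words at each of 24, 20, 16, 12, 8, 4, 0 dB S/B.
print(spearman_karber_50([10, 10, 9, 7, 4, 1, 0]))  # -> 9.6 dB S/B
```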
Procedures

Following the pure-tone audiogram, each listener was asked to respond to two questions on a scale of 1 to 10 (Wilson et al, 2005). First, when listening to a conversation in quiet without your hearing aids, how difficult is it for you to understand what the speaker is saying? Second, when listening to a conversation in a noisy background without your hearing aids, how difficult is it for you to understand what the speaker is saying? On the response scale, "1" represented no difficulty understanding a speaker, and "10" represented extreme difficulty understanding a speaker.

Then, two trials were conducted with each of the three speech materials presented monaurally in multitalker babble during a 30-minute session. The right ear was tested for the odd-numbered listeners, and the left ear was tested for the even-numbered listeners. With the QuickSIN, the level of the sentences was fixed at 90 dB SPL, and the level of the babble varied from 25 to 0 dB S/B (65 dB SPL to 90 dB SPL) in 5 dB steps. For each list of digits, three digit triplets (i.e., nine digits) were presented in quiet at 80 dB SPL to acquaint the listener with the task. Then the level of the babble was fixed at 80 dB SPL, and the level of the digits varied from 4 to -20 dB S/B (84 dB SPL to 60 dB SPL) in 4 dB steps. With the words in babble, the level of the babble was fixed at 80 dB SPL, and the level of the words varied from 24 to 0 dB S/B (104 dB SPL to 80 dB SPL) in 4 dB steps. Thus, the QuickSIN maintains the level of the speech and varies the level of the babble, whereas the WIN and digits-in-noise paradigms maintain the level of the babble and vary the level of the speech. These paradigms were maintained in this experiment because the comparisons were among the three protocols as each was intended for use in the clinic. A near common condition between the QuickSIN and the words in babble occurred when the QuickSIN was presented at 10 dB S/B (speech at 90 dB SPL and babble at 80 dB SPL) and when the words in babble were presented at 8 and 12 dB S/B (speech at 88 and 92 dB SPL and babble at 80 dB SPL).

The word and sentence materials in multitalker babble were reproduced on a compact disc player (Sony, Model CDP-497) and routed through an audiometer (Grason-Stadler, Model 10) to a TDH-50P earphone encased in a Telephonics P/N 510C017-1 cushion. The non-test ear was covered with a dummy earphone. All testing was conducted in a double-wall sound booth, with the verbal responses of the listeners recorded into a spreadsheet. A counterbalanced design was used in which (1) Trial 1 on each material was completed before Trial 2 was conducted, i.e., all three materials were presented once before the second presentation of any of the materials; (2) the two lists of each material were presented an equal number of times in Trial 1 and Trial 2; and (3) each of the three materials was presented an equal number of times in the six possible orders.
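The three level schedules described above differ only in which channel is fixed. The sketch below tabulates them for reference; it is illustrative only, with names of our choosing, and the dB values are taken from the text.

```python
def schedule(snr_start, snr_stop, step, speech_fixed=None, babble_fixed=None):
    """List (S/B, speech dB SPL, babble dB SPL) tuples for a descending run."""
    out = []
    for snr in range(snr_start, snr_stop - 1, -step):
        if speech_fixed is not None:          # QuickSIN: speech fixed, babble varies
            out.append((snr, speech_fixed, speech_fixed - snr))
        else:                                 # WIN/digits: babble fixed, speech varies
            out.append((snr, babble_fixed + snr, babble_fixed))
    return out

quicksin = schedule(25, 0, 5, speech_fixed=90)    # babble 65 -> 90 dB SPL
digits   = schedule(4, -20, 4, babble_fixed=80)   # digits 84 -> 60 dB SPL
win      = schedule(24, 0, 4, babble_fixed=80)    # words 104 -> 80 dB SPL
```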
Following the six babble conditions, two 25-word lists compiled from List 4 of NU No. 6 were presented in quiet. Half of the listeners were presented the first 25 words at 60 dB HL and the second 25 words at 80 dB HL; the order was reversed for the remaining listeners. These two levels corresponded to the levels of the WIN words near the extremes of the function and ensured that audibility was not an issue at the less favorable signal-to-babble ratios.

RESULTS

The psychometric functions for the two lists of each of the three speech-recognition materials are shown in Figure 1. The mean data for the listeners with normal hearing (squares) and for the listeners with hearing loss (circles) are shown for List 3 (open symbols) and List 4 (filled symbols) of the respective materials in the upper and left panels. The data in the lower-right panel depict the mean functions for both listener groups on List 4 of each of the three materials, along with the key to the panel. The lines through the datum points are the best-fit third-degree polynomials used to describe the data.

Figure 1. The psychometric functions of the mean data for List 3 (open symbols) and List 4 (filled symbols) of the QuickSIN, digits, and word materials are shown for the listeners with normal hearing (squares) and for the listeners with hearing loss (circles). Quadrant IV depicts the mean functions for Lists 4 of the three materials. The key for the fourth panel is shown in the lower right corner, with the data from the listeners with normal hearing depicted as open symbols and the data from the listeners with hearing loss depicted as filled symbols.

In addition to the mean data at each presentation level, the 50% points on the individual functions were quantified with the Spearman-Kärber equation (Finney, 1952). The mean 50% points (in dB S/B) and standard deviations (dB) are listed in Table 2, along with the slopes at the 50% points (%/dB) of the mean functions shown in Figure 1.

Table 2. The Mean 50% Correct Points (dB S/B) and Standard Deviations (dB) Established with the Spearman-Kärber Equation (Finney, 1952) for the Three Materials and the Two Subject Groups

                  Normal Hearing             Hearing Loss
            Mean    SD    Slope       Mean    SD    Slope      Difference
QuickSIN
  List 3     4.3    2.1   15.8        13.3    5.0    4.7          9.0
  List 4     3.9    2.0   14.0        10.1    4.8    6.6          6.2
Digits
  List 3   -11.9    2.6    6.5        -4.0    3.8    6.3          7.9
  List 4   -11.8    3.0    6.5        -3.9    3.4    6.6          7.9
Words
  List 3     5.0    2.4    5.8        12.4    3.5    6.9          7.4
  List 4     4.4    2.0    6.7        12.3    3.8    7.2          7.9

Note: The slopes of the mean functions (%/dB) depicted in Figure 1 also are listed.

The individual 50% points established with the Spearman-Kärber equation were subjected to a repeated-measures analysis of variance (ANOVA) with one between-subjects variable, listener group (normal hearing and hearing loss), and two within-subjects variables, stimulus type (QuickSIN, digits, and words) and list (Lists 3 and 4). As might be expected from the data in Figure 1 and Table 2, the main effects of listener group (F[1,106] = 99.31, p < .001) and stimulus type (F[2,212] = 1875.52, p < .001) were statistically significant. Not as expected was the significant difference between lists (F[1,106] = 13.49, p < .001), with the 50% point for List 3 0.8 dB higher (i.e., poorer) than the 50% point for List 4. In addition to the significant main effects of stimulus type and list, the interaction of the two effects was statistically significant (F[2,212] = 11.35, p < .001), as was the three-way interaction among listener group, stimulus type, and list (F[2,212] = 8.43, p < .001). Post hoc t-tests with Bonferroni corrections for multiple comparisons revealed that, for the interaction between stimulus type and list, the only significant difference between Lists 3 and 4 was for the QuickSIN materials. Also, for the three-way interaction, post hoc t-tests with Bonferroni corrections revealed that the significant list difference for the QuickSIN was seen only for the listeners with hearing loss. No other post hoc comparisons for this three-way interaction were statistically significant.
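In outline, post hoc comparisons of this kind are paired t-tests evaluated against a Bonferroni-adjusted alpha. The sketch below is not the authors' code; the data layout (a dictionary mapping each condition to its per-listener 50% points) is an assumption made for illustration.

```python
from itertools import combinations
from scipy.stats import ttest_rel

def bonferroni_pairwise(fifty_points, alpha=0.05):
    """Paired t-tests over all condition pairs with a Bonferroni-corrected alpha.

    fifty_points: dict mapping a condition label to a list of per-listener
    50% points (dB S/B), with listeners in the same order in every list.
    """
    pairs = list(combinations(fifty_points, 2))
    cutoff = alpha / len(pairs)  # Bonferroni correction for multiple comparisons
    for a, b in pairs:
        t, p = ttest_rel(fifty_points[a], fifty_points[b])
        verdict = "significant" if p < cutoff else "n.s."
        print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f} ({verdict})")
```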
Examination of the mean data in Figure 1 and Table 2 provides a clear picture of the significant main effects from the ANOVA. First, for listener group, each of the three materials provided a significant difference between the performances by listeners with normal hearing and listeners with hearing loss. For equal recognition performance, the listeners with hearing loss required an ~8 dB more favorable signal-to-babble ratio than did the listeners with normal hearing. This ~8 dB difference was consistent among stimulus types, which is supported by the lack of an interaction in the ANOVA between listener group and stimulus type. For the digit and word materials, the ~8 dB difference was consistent throughout the range of signal-to-babble ratios. The differences between the two QuickSIN lists were less systematic and are considered in detail in the discussion section. Second, to examine further the main effect of stimulus type, post hoc t-tests with Bonferroni corrections for multiple comparisons were performed. The results revealed that, across listener groups, the 50% points for the words and the QuickSIN were not significantly different (see Table 2). The 50% points for the digits, however, were at signal-to-babble ratios 16–17 dB below the 50% points for the words and the QuickSIN, which was a significant difference (p < .001).

The data illustrated in Figure 2 provide a visualization of all the individual 50% points involved in the three-way interaction observed in the ANOVA. The three-panel bivariate plot shows the 50% point on List 3 (abscissa) and the 50% point on List 4 (ordinate) for the listeners with normal hearing (squares) and the listeners with hearing loss (circles). The large filled symbols are the means for each group. The diagonal line in each panel represents equal performance on the two lists, with the numbers in parentheses indicating the number of datum points above, on, and below the diagonal. There are two sets of numbers in parentheses in each panel: the upper numbers refer to the data from the listeners with hearing loss (circles), and the lower numbers refer to the data from the listeners with normal hearing (squares). Datum points below the diagonal indicate poorer performance on List 3 than on List 4. The solid lines represent the best-fit linear regressions for the listeners with hearing loss, whereas the dotted lines represent the best-fit linear regressions for the listeners with normal hearing. With each of the three speech materials, the datum points from the listeners with normal hearing are distributed equally around the diagonal line, indicating equal performance on Lists 3 and 4.

Figure 2. The 50% correct points calculated with the Spearman-Kärber equation for List 3 (abscissa) and List 4 (ordinate) for the QuickSIN, digits, and words. The circles depict the data for the listeners with hearing loss, and the squares depict the data for the listeners with normal hearing. The large filled symbols illustrate the mean data. The numbers in parentheses indicate the number of datum points above, on, or below the diagonal line that depicts equal performance. The thick dashed and solid lines represent the linear regressions used to fit the data.
The data for the listeners with hearing loss are not as clear-cut. Consider the data for the QuickSIN (upper panel), in which 55 of the 72 listeners with hearing loss had poorer performance on List 3 than on List 4, which was reflected in the significant mean 3.2 dB difference between the 50% points of the two lists (Table 2). The digit data (middle panel) and word data (lower panel) indicate that about half the listeners in both groups performed better on List 3 than on List 4, with the other half demonstrating just the opposite. As the ANOVA three-way interaction indicated, Lists 3 and 4 of the digit and word materials produced equivalent results for the listeners with hearing loss, whereas Lists 3 and 4 of the QuickSIN did not.

To address test-retest reliability, the data for the individual performances on Trial 1 versus Trial 2 for each stimulus type are shown in Figure 3. The format of Figure 3 is identical to the format of Figure 2, except Trial 1 (abscissa) and Trial 2 (ordinate) are the variables.

Figure 3. The 50% correct points calculated with the Spearman-Kärber equation for Trial 1 (abscissa) and Trial 2 (ordinate) for the QuickSIN, digits, and words. The circles depict the data for the listeners with hearing loss, and the squares depict the data for the listeners with normal hearing. The large filled symbols illustrate the mean data. The numbers in parentheses indicate the number of datum points above, on, or below the diagonal line that depicts equal performance. The thick dashed and solid lines represent the linear regressions used to fit the data.

For the QuickSIN and digit materials, more datum points are below the diagonal line (equal performance) than above the line, with no practical difference between the trials on the words. This relation indicates that recognition performance on Trial 1 was poorer than on Trial 2, suggesting a modest practice effect with the QuickSIN and digit materials. The mean signal-to-babble ratios at the 50% point, collapsed across the two listener groups and the three stimulus types, were 3.2 dB for Trial 1 and 2.5 dB for Trial 2. The main effect of trial was statistically significant (F[1,106] = 14.27, p < .001) when the data were subjected to a repeated-measures ANOVA with one between-groups variable (listener group) and two within-groups variables (stimulus type and trial). Although the main effects of listener group and stimulus type remained statistically significant in this analysis, the interactions between trial and listener group and between trial and stimulus type were not, suggesting a practice effect on performance for all three stimulus materials. It is important to note, however, that the 0.7 dB difference between the mean performances on Trial 1 (3.2 dB S/B) and Trial 2 (2.5 dB S/B) is of a magnitude that for clinical purposes is not particularly noteworthy.

Speech-recognition testing is often criticized for an inability to predict real-world performance. In the current study, all participants were asked to rate their ability to understand speech in quiet and in noise on a scale of 1 to 10, with 1 representing no difficulty and 10 representing extreme difficulty.
As expected, the listeners with hearing loss reported higher median ratings in both conditions (3 = quiet, 8 = noise) than the listeners with normal hearing (1 = quiet, 3 = noise). There was also more variability in the responses of the listeners with hearing loss, as indicated by the full range of responses for quiet and noise (1–10), compared to the responses by the listeners with normal hearing for quiet (1–5) and noise (1–7). Because the subjective scaling data were ordinal, a nonparametric correlational technique, Spearman rho, was used to examine the relationship between the subjective reports and the objective measurements of speech recognition in quiet and in noise. The self-ratings of speech understanding in quiet were correlated with performance on the NU No. 6 words at 60 dB HL (-.469) and 80 dB HL (-.440). The negative correlation coefficients and associated p-values reported in Table 3 suggest that the relationship between the subjective measures of speech-recognition performance in quiet and the objective measures of speech recognition in quiet at low and high presentation levels was moderate. As expected, low subjective ratings were associated with high percent-correct performance on the objective speech-recognition measures in quiet.

Table 3. Spearman rho Correlation Coefficients and Associated p-Values Collapsed across Hearing-Status Groups

                             NU No. 6     NU No. 6
                            (60 dB HL)   (80 dB HL)    Digits     QuickSIN    Words
Subjective Quiet Ratings      -.469        -.440
                             p < .001     p < .001
Subjective Noise Ratings                                .641        .572       .673
                                                       p < .001    p < .001   p < .001

A separate analysis was completed for the subjective ratings of speech recognition in noise and the objective measures of speech recognition in noise for each of the three stimulus materials. Performance on List 4 of each stimulus type was used because of the unsystematic performance by the listeners with hearing loss on List 3 of the QuickSIN materials. The correlation coefficients and associated p-values reported in Table 3 suggest that the relationship between the subjective measures of speech-recognition performance in noise and objective performance for each of the three materials was moderate. Higher subjective ratings, indicating greater difficulty communicating in noise, were obtained for individuals with higher 50% points.
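With per-listener vectors in hand, this rank correlation is a single call in SciPy, as the hedged sketch below shows; the arrays are invented for illustration and are not the study's data.

```python
from scipy.stats import spearmanr

# Hypothetical vectors: one subjective noise rating (1-10) and one WIN 50%
# point (dB S/B) per listener; invented values for illustration only.
noise_ratings = [2, 3, 8, 7, 5, 9, 4, 6]
win_50_points = [4.0, 5.2, 13.1, 12.0, 8.5, 14.2, 6.8, 10.3]

# A positive rho means more reported difficulty goes with a higher (poorer)
# 50% point, as in the study's noise-rating analysis.
rho, p = spearmanr(noise_ratings, win_50_points)
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")
```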
The mean correct recognitions for the listeners with hearing loss on the NU No. 6 presented in quiet at 60 and 80 dB HL were 84.3% and 86.3%, respectively, a nonsignificant difference. The recognition performances at these two presentation levels, which corresponded to the 0 and 20 dB S/B conditions with the WIN, ensured that audibility was not an issue at either of the signal-to-babble levels.

Finally, it was of interest to examine the relationship between recognition performances on List 4 of the NU No. 6 in quiet at 80 dB HL and on the words in babble and the QuickSIN. These comparisons are made in Figure 4, in which correct recognition in quiet is on the ordinate and the 50% correct point (in dB S/B) is on the abscissa.

Figure 4. Bivariate plots of the word-recognition performance in quiet (ordinate) versus the signal-to-noise ratio at which the 50% point was obtained in multitalker babble (abscissa) with the QuickSIN (top) and with the words in babble (bottom). For graphic clarity, a jitter algorithm was used that randomly multiplied the x and y values by 1.025 to 0.975 in 0.05 steps. The shaded region indicates the area of normal performance for the two variables. The large filled symbols illustrate the mean data. The numbers in parentheses are the number of listeners in each quadrant of the plot.

The shaded region in the upper left of each panel represents performance in the 90th percentile by listeners with normal hearing for both the words in noise and the QuickSIN, as well as clinically accepted "good" recognition performance in quiet. The filled symbols represent the mean data for the listeners with normal hearing (squares) and the listeners with hearing loss (circles). The numbers in parentheses are the number of listeners in each quadrant of the plot. For example, in the upper panel, the "46" indicates the number of listeners who had good word recognition in quiet (i.e., ≥80%) but were outside the normal range on the QuickSIN. The data in both panels indicate that most of the listeners with hearing loss performed within the normal range on the NU No. 6 in quiet but outside the normal range on both the QuickSIN and the words in babble. These findings illustrate the difficulty encountered when trying to predict word-recognition performance in noise from word recognition in quiet, except when word recognition in quiet is poor.

GENERAL DISCUSSION AND CONCLUSIONS

The purpose of this study was to examine the objective performance, along with the subjective ratings, of listeners with hearing loss as compared to listeners with normal hearing on speech-in-noise tasks using stimuli with varied linguistic context. The use of digits, words, and sentences in multitalker babble allowed a systematic examination of performance from a closed set (digits), to an open set with syntactic context (QuickSIN), to an open set without context (monosyllables). For the objective measures, mean group performances were expressed as the signal-to-babble ratio (dB) at the 50% point of the psychometric function, whereas the subjective ratings were obtained using a ten-point rating scale.

As expected, the young listeners with normal hearing performed ~8 dB better on the three speech materials than did the listeners with hearing loss, which represents an ~8 dB hearing loss in terms of signal-to-noise ratio. This result is in agreement with previous reports that examined the separation in performance between listeners with normal hearing and listeners with hearing loss on speech-in-babble tasks (Dubno et al, 1984; Beattie, 1989; Wilson and Strouse, 2002; Wilson, 2003; Killion, 2004). In addition, the ~8 dB SNR loss quantifies the complaint of older listeners with hearing loss who report difficulty communicating in noisy environments.

The results for the words and digits are consistent with the findings from previous studies in terms of performance levels on speech-in-noise tasks by listeners with hearing loss. In the current study, the words-in-babble data for the listeners with hearing loss had a mean 50% point of 12.4 dB S/B (Table 2) collapsed across lists. This is in agreement with Wilson and Weakley (2004, table 6), who reported a mean 50% point of 12.2 dB S/B for a group of 48 listeners with hearing loss (mean age = 63.5 years) using twice as many words at each signal-to-babble ratio as the current study. With the digit stimuli, Wilson and Weakley (2004) observed a 50% point of -6.0 dB S/B, which is in reasonable agreement with the -4.0 dB S/B 50% point in the current study. Walden and Walden (2004) examined recognition performance on the QuickSIN by listeners with hearing loss and observed a mean 50% point of 8.3 dB S/B (see Note 1), which is 3.4 dB below the mean 50% point measured in the current study (11.7 dB S/B).
Lists 3 and 4 of the QuickSIN were used in the current study, whereas Walden and Walden used Lists 5 and 6. The current study showed a significant effect of list for the listeners with hearing loss, reflecting a lack of interlist equivalency for the QuickSIN, which may have contributed to the discrepancy between studies. Another interesting result regarding the QuickSIN was the three-way interaction among listener group, stimulus type, and list. Inspection of the psychometric functions in the top right panel of Figure 1 shows that, for the listeners with hearing loss, the data for List 3 of the QuickSIN (dashed line) were irregular compared to the List 4 data. As the signal-to-babble ratio improves, the expected outcome is improved recognition performance. The data from List 3, however, did not reflect this relation: recognition performance at 10 and 20 dB S/B was poorer than at the adjacent lower signal-to-babble ratios (5 and 15 dB S/B, respectively). The 10 and 20 dB S/B datum points on List 3 also yielded substantially poorer performances than the corresponding levels on List 4. The irregularities observed with List 3 account for the 3.2 dB difference between the 50% points for List 3 (13.3 dB S/B) and List 4 (10.1 dB S/B) for the listeners with hearing loss. A similar, but smaller, discrepancy was observed in the QuickSIN List 3 data for the listeners with normal hearing, for whom the 10 dB S/B point on List 3 is irregular both with respect to the two adjacent datum points and with respect to the corresponding datum point on List 4.

The discrepancy between and within lists at various signal-to-babble ratios, especially for the listeners with hearing loss, was an unexpected finding. The equivalency data reported in the QuickSIN manual (Etymotic Research, 2001) were obtained by low-pass filtering the QuickSIN lists for young listeners with normal hearing to simulate different degrees of high-frequency hearing loss. The use of masking or filtering can control the audibility factor of hearing loss and thereby simulate hearing loss (Plomp, 1978; Souza et al, 2003), but it may not mimic the distortion factor of hearing loss (Dreschler and Plomp, 1985; Thibodeau, 1991). The QuickSIN has gained some popularity with clinicians because it is quick and easy to administer; however, empirical data have yet to establish the recognition performance differences between the QuickSIN and other measures of speech recognition in noise that use stimuli with less syntactic and semantic context, such as monosyllables, to establish various levels of recognition performance.

In the current study, not only was recognition performance for each stimulus type examined, but performance across the psychometric functions revealed two interesting findings regarding the type of stimulus material. First, the psychometric functions for the digit stimuli, which are a special case of monosyllabic words, were morphologically similar to the functions for the word stimuli. The only difference was a "DC shift" of the word functions toward the more favorable signal-to-babble ratios. This 16–17 dB difference between performances on words and digits has been described previously and attributed to set-size differences (i.e., open- vs. closed-set paradigms) and calibration differences between the digit and word materials (Miller et al, 1951; Wilson, Burks, et al, forthcoming).
Although presented as triplets, the digits constitute a relatively closed set, because there are only nine possible items in each of the three positions. Recognition performance on a closed-set task, such as the digits task used in this experiment, is expected to reach criterion at a lower signal-to-babble ratio than performance on an open-set task, because the number of items in the competition stage of lexical access is smaller for a closed set, thereby facilitating better recognition performance at all signal-to-babble ratios (Wilson and Antablin, 1982).

Second, Table 2 lists, for both listener groups, the slopes of the psychometric functions generated with the three stimulus materials. The slopes at the 50% point for the words and digits are essentially the same for both groups of listeners (6 to 7%/dB). For the listeners with normal hearing, the slopes of the QuickSIN functions (14.0 to 15.8%/dB) are about twice as steep as the slopes for the digits (6.5%/dB) and words (5.8 to 6.7%/dB), and they are about the same for both lists of materials. The steeper functions for the QuickSIN materials indicate that performance on the sentences by the listeners with normal hearing was more homogeneous than their performances on the word and digit materials. Perhaps the syntactic and semantic context of the sentence structure provided cues to the listeners with normal hearing that were not completely available to the listeners with hearing loss. For the listeners with hearing loss, the slopes of the functions for the two QuickSIN lists differ: the slope for List 4 is the same as the slopes for the digits and words, whereas the slope for List 3 is more gradual, again reflecting the irregularities observed in the List 3 data. Also, the slopes of the QuickSIN functions for the listeners with hearing loss are substantially more gradual than the slopes of the functions for the listeners with normal hearing. Since it is known that older listeners are more vulnerable to the effects of noise (Dubno et al, 1984) and show increased difficulty processing information while resisting the interference of noise (Willott, 1991), it is possible that the listeners with hearing loss were unable to compensate with top-down processing (syntactic and semantic context), as were the listeners with normal hearing.
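Slope values like those in Table 2 can be derived from a fitted function: fit a third-degree polynomial to percent correct versus S/B, find where it crosses 50%, and evaluate the derivative there. The sketch below uses invented data, not the study's values, and is only one plausible way to reproduce such numbers.

```python
import numpy as np

# Invented mean data: percent correct at each S/B (dB), for illustration only.
snr = np.array([0, 4, 8, 12, 16, 20, 24], dtype=float)
pct = np.array([2, 10, 30, 55, 78, 92, 98], dtype=float)

poly = np.poly1d(np.polyfit(snr, pct, 3))   # best-fit third-degree polynomial

# The real root of poly(x) = 50 inside the tested range is the 50% point.
roots = (poly - 50.0).roots
x50 = [r.real for r in roots
       if abs(r.imag) < 1e-9 and snr.min() <= r.real <= snr.max()][0]

slope_at_50 = np.polyder(poly)(x50)         # slope in %/dB at the 50% point
print(f"50% point ~ {x50:.1f} dB S/B, slope ~ {slope_at_50:.1f} %/dB")
```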
In summary, as audiology moves toward evidence-based practice guidelines, routine clinical testing methods, such as single-level monosyllabic word recognition in quiet, need to be reevaluated. Although it is important to know how well (or poorly) patients understand speech in quiet, it is equally important to know how well patients understand speech in noise, especially for rehabilitative purposes. The digit, word, and sentence materials presented in multitalker babble each provided bimodal distributions of recognition performances by the listeners with normal hearing and the listeners with hearing loss. In the current study, the difference between groups was ~8 dB in terms of signal-to-babble ratio. Test-retest differences for the digits in babble were 1.6 dB (normal hearing) and 1.2 dB (hearing loss) and <1 dB for the QuickSIN and the words in babble. The two lists of the QuickSIN materials produced psychometric functions that were morphologically different: the data for List 4 were systematic, whereas the data for List 3 were irregular. An examination of the homogeneity of the 18 QuickSIN lists with listeners with hearing loss is currently underway in our laboratories.

The important findings of this study were (1) that words and sentences presented in background multitalker babble produced recognition performances by the listeners with hearing loss that were equivalent (~12 dB S/B), and (2) that digits, words, and sentences each provided the same ~8 dB differentiation between performances by listeners with normal hearing and performances by listeners with hearing loss. This differentiation, for the most part, has not been provided by performance on word-recognition tasks in quiet.

NOTE

1. The 50% point reported by Walden and Walden (2004) was 6.3 dB, since Killion et al (2004) suggest reporting QuickSIN scores in terms of SNR loss, which entails subtracting 2 dB (the average recognition performance for normal-hearing listeners) from the 50% point obtained using the Spearman-Kärber equation.

REFERENCES

American National Standards Institute [ANSI]. (1996) Specification for Audiometers (ANSI S3.6-1996). New York: American National Standards Institute.

Beattie RC. (1989) Word recognition functions for the CID W-22 test in multitalker noise for normally hearing and hearing-impaired subjects. J Speech Hear Disord 54:20–32.

Bronkhorst AW. (2000) The cocktail party phenomenon: a review of research on speech intelligibility in multiple-talker conditions. Acustica 86:117–128.

Carhart R, Tillman TW. (1970) Interaction of competing speech signals with hearing loss. Arch Otolaryngol 91:273–279.

Cherry EC. (1953) Some experiments on the recognition of speech with one and with two ears. J Acoust Soc Am 25:975–979.

Committee on Hearing, Bioacoustics and Biomechanics (CHABA) Working Group on Speech Understanding and Aging. (1988) Speech understanding and aging. J Acoust Soc Am 83:859–895.

Cox R, Alexander G, Gilmore C. (1987) Development of the Connected Speech Test (CST). Ear Hear 8(Suppl.):119S–126S.

Craik FIM. (1994) Memory changes in normal aging. Am Psychol Soc 3:155–158.

Craik FIM, Lockhart RS. (1972) Levels of processing: a framework for memory research. J Verbal Learn Verbal Behav 11:671–684.

Department of Veterans Affairs. (1998) Tonal and Speech Materials for Auditory Perceptual Assessment, Disc 2.0. Mountain Home, TN: VA Medical Center.

Dirks DD, Morgan DE, Dubno JR. (1982) A procedure for quantifying the effects of noise on speech recognition. J Speech Hear Disord 47:114–123.

Dreschler WA, Plomp R. (1985) Relations between psychophysical data and speech perception for hearing-impaired subjects. II. J Acoust Soc Am 78:1261–1270.

Dubno JR, Dirks DD, Morgan DE. (1984) Effects of age and mild hearing loss on speech recognition in noise. J Acoust Soc Am 76:87–96.

Etymotic Research. (1993) The SIN Test. CD-ROM. Elk Grove Village, IL: Etymotic Research.

Etymotic Research. (2001) QuickSIN Speech-in-Noise Test. CD-ROM. Elk Grove Village, IL: Etymotic Research.

Findlay RC. (1976) Auditory dysfunction accompanying noise-induced hearing loss. J Speech Hear Disord 41:374–380.

Finney DJ. (1952) Statistical Method in Biological Assay. London: C. Griffin.

Groen JJ. (1969) Social hearing handicap: its measurement by speech audiometry in noise. Int Audiol 8:182–183.

Horwitz AR, Dubno JR, Ahlstrom JB. (2002) Recognition of low-pass-filtered consonants in noise with normal and impaired high-frequency hearing. J Acoust Soc Am 111:409–416.
Killion MC. (2002) New thinking on hearing in noise: a generalized articulation index. Semin Hear 23:57–75.

Killion MC. (2004) Myths about hearing in noise and directional microphones. Hear Rev 11(2):14, 16, 18–19, 72–73.

Killion MC, Niquette PA. (2000) What can the pure-tone audiogram tell us about a patient's SNR loss? Hear J 53:46–53.

Killion MC, Niquette PA, Gudmundsen GI, Revit LJ, Banerjee S. (2004) Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. J Acoust Soc Am 116(4):2395–2405.

Martin FN, Champlin CA, Chambers JA. (1998) Seventh survey of audiometric practices in the United States. J Am Acad Audiol 9:95–104.

Miller GA, Heise GA, Lichten W. (1951) The intelligibility of speech as a function of the context of the test materials. J Exp Psychol 41:329–335.

Murdock BB Jr. (1962) The serial position effect of free recall. J Exp Psychol 64:482–488.

Nilsson M, Soli SD, Sullivan JA. (1994) Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am 95:1085–1099.

Plomp R. (1978) Auditory handicap of hearing impairment and the limited benefit of hearing aids. J Acoust Soc Am 63:533–549.

Plomp R. (1986) A signal-to-noise ratio model for the speech-reception threshold of the hearing impaired. J Speech Hear Res 29:146–154.

Plomp R, Mimpen AM. (1979) Speech-reception threshold for sentences as a function of age and noise level. J Acoust Soc Am 66:1333–1342.

Rabbitt P. (1990) Mild hearing loss can cause apparent memory failures which increase with age and reduce with IQ. Acta Otolaryngol Suppl 476:167–176.

Rowland JP, Dirks DD, Dubno JR, Bell TS. (1985) Comparison of speech recognition-in-noise and subjective communication assessment. Ear Hear 6(6):291–296.

Salthouse TA. (1985) A Theory of Cognitive Aging. Amsterdam: North-Holland.

Smits C, Kapteyn TS, Houtgast T. (2004) Development and validation of an automatic SRT screening test by telephone. Int J Audiol 43:15–28.

Souza PE, Tremblay KL, Boike KT. (2003) Effects of decreased audibility produced by high-pass maskers in younger and older adults. J Am Acad Audiol 14:427–433.

Sperry JL, Wiley TL, Chial MR. (1997) Word recognition performance in various background competitors. J Am Acad Audiol 8:71–80.

Thibodeau LM. (1991) Exploration of factors beyond audibility that may influence speech recognition. Ear Hear 12:109S–115S.

Tillman T, Carhart R. (1966) An Expanded Test for Speech Discrimination Utilizing CNC Monosyllabic Words: Northwestern University Auditory Test No. 6. USAF School of Aerospace Medicine Technical Report. Brooks Air Force Base, TX: USAF School of Aerospace Medicine.

van Rooij JC, Plomp R. (1990) Auditive and cognitive factors in speech perception by elderly listeners. Acta Otolaryngol Suppl 476:177–181.

van Rooij JC, Plomp R. (1992) Auditive and cognitive factors in speech perception by elderly listeners. III. Additional data and final discussion. J Acoust Soc Am 91:1028–1033.

Villchur E. (1982) The evaluation of amplitude-compression processing for hearing aids. In: Studebaker G, Bess F, eds. The Vanderbilt Hearing Aid Report. Upper Darby, PA: Monographs in Contemporary Audiology.

Walden TC, Walden BE. (2004) Predicting success with hearing aids in everyday living. J Am Acad Audiol 15:342–352.

Wiley TL, Page AL. (1997) Summary: current and future perspectives on speech perception tests. In: Mendel LL, Danhauer JL, eds. Audiologic Evaluation and Management and Speech Perception Assessment. San Diego: Singular Publishing Group, 201–210.

Willott JF. (1991) Aging and the Auditory System. San Diego, CA: Singular Publishing Group.

Wilson RH. (2003) Development of a speech in multitalker babble paradigm to assess word-recognition performance. J Am Acad Audiol 14:453–470.

Wilson RH, Abrams HB, Pillion AL. (2003) A word-recognition task in multitalker babble using a descending presentation mode from 24 dB S/B to 0 dB S/B. J Rehabil Res Dev 40:321–328. http://www.vard.org/jour/03/40/4/pdf/Wilson-B.pdf

Wilson RH, Antablin JK. (1982) The Picture Identification Task: a reply to Dillon. J Speech Hear Disord 47:111–112.

Wilson RH, Burks CA, Weakley DG. (2005a) A comparison of word-recognition abilities assessed with digit pairs and digit triplets in multitalker babble. J Rehabil Res Dev 42:499–510.

Wilson RH, Burks CA, Weakley DG. (Forthcoming) Word recognition of digit triplets and monosyllabic words in multitalker babble by listeners with sensorineural hearing loss. J Am Acad Audiol.

Wilson RH, Strouse A. (2002) Northwestern University Auditory Test No. 6 in multitalker babble: a preliminary report. J Rehabil Res Dev 39:105–113. www.vard.org/jour/02/39/1/wilson.htm

Wilson RH, Weakley DG. (2004) The use of digit triplets to evaluate word-recognition abilities in multitalker babble. Semin Hear 25:93–111.

Wilson RH, Weakley DG, Burks CA. (Forthcoming) The use of 35 words to evaluate hearing loss in terms of signal-to-babble ratio: a clinic protocol. J Rehabil Res Dev.

Wingfield A. (1996) Cognitive factors in auditory performance: context, speed of processing, and constraints of memory. J Am Acad Audiol 7:175–182.