J Am Acad Audiol 16:54–62 (2005)

A Comparison of Two Approaches to Assessment of Speech Recognition Ability in Single Cases

Rochelle Cherry*
Adrienne Rubinstein*

Abstract

Single-case design with the randomization test (RT) has been proposed as an alternative to the binomial distribution (BD) tables of Thornton and Raffin (1978) to assess changes in speech recognition performance in individual subjects. The present study investigated whether data analyzed using both approaches would result in similar outcomes. Sixty-two adults with normal hearing were evaluated using phoneme scoring and a restricted alternating treatments design under two signal-to-noise conditions. Results revealed a significant correlation between the RT and a BD analysis using at least 50-word lists, although the BD analysis was a more sensitive measure. The absence of correlation between the RT and a BD analysis using 25-word lists challenges the common clinical practice of using reduced list size, and supports the use of phoneme scoring and other attempts to find clinically acceptable yet evidence-based solutions to overcome the conflict between time and accuracy.

Key Words: Binomial model, phoneme scoring, randomization test, single-case design, speech recognition tests

Abbreviations: AT = alternating treatment; BD = binomial distribution; RT = randomization test
*Department of Speech Communication Arts and Sciences, Brooklyn College

Adrienne Rubinstein, Department of Speech Communication Arts and Sciences, 2900 Bedford Avenue, Brooklyn, New York 11210; Phone: 718-951-5186; Fax: 718-951-4363; E-mail: [email protected]

Earlier portions of this paper were presented at the Annual Convention of the American Academy of Audiology, San Diego, CA, April 19–20, 2001. This study was supported by PSC-CUNY Research Grant # 61653-00-30.

Word recognition testing is used in audiology to evaluate receptive communication abilities of individuals, to verify hearing aid and cochlear implant fittings, and to differentially diagnose auditory pathologies (Wiley et al, 1995; Martin et al, 2000). According to Martin and colleagues (Martin et al, 1998), 91% of audiologists routinely use word recognition tests, with the most common materials being the CID Auditory Test W-22 (Hirsh et al, 1952) and the Northwestern University Auditory Test No. 6, or NU-6 word lists (Tillman and Carhart, 1966). One problem commonly associated with word recognition testing is the time-consuming nature of its administration.
Numerous attempts have been made to address this issue (Stach et al, 1995; Olsen et al, 1997; Gelfand, 1998, 2003; Hurley and Sells, 2003). One popular technique used clinically is to administer only 25-word lists instead of the 50 words for which the tests were originally designed (Martin et al, 1998). Reducing the number of items, however, increases test variability, thus making the test less reliable (Thornton and Raffin, 1978; Dillon, 1982). One proposed solution that reduces variability without increasing administration time is to incorporate phoneme instead of whole-word scoring (Boothroyd, 1968; Olsen et al, 1997; Gelfand, 1998, 2003). The AB Short Isophonemic word lists, for example, were specifically developed by Boothroyd (1968) to incorporate phoneme scoring using 10 monosyllabic words per list in order to reduce the number of words needed. Issues of variability and reliability are a concern when evaluating group data, but can be even more problematic when attempting to evaluate the results of individual clients. The typical method for establishing significant differences in speech recognition scores in a clinical setting involves the use of the binomial distribution tables (BD) of Thornton and Raffin (1978). Limitations in the use of the binomial distribution to predict test-retest variability for speech recognition assessment have been discussed by Dillon (1982, 1983). He concluded that the predictions of the binomial distribution should fall close to empirical values for the average subject but may not lead to valid results for a particular individual. An alternative to the BD approach is the use of single-subject, or single-case, design. In this design, intrasubject variability is highlighted through repeated measurement of the dependent variable under both conditions of the independent variable. The sequence in which conditions are presented can vary and produce different types of single-case designs.
The alternating treatment (AT) design, for example, may be distinguished from the more familiar ABA design (Tawney and Gast, 1984). The ABA design typically compares a no-treatment (A) baseline and a treatment (B) condition, followed by another no-treatment (A) phase. Several data points are collected for each condition before moving to the next condition. In the AT design, single data points may be collected as one alternates between the two conditions. Thus, results may be obtained with fewer data points than in the ABA design because one need not employ a formal baseline phase or multiple measures within each phase. Proposed methods for establishing significant differences in single-subject designs include visual inspection and statistical analysis. Whereas visual inspection has the advantage of being less complex and time-consuming, critics point to the element of subjectivity in the judgments, which has been shown to affect agreement among raters (Ottenbacher, 1993). Chmiel and Jerger (1995) employed the randomization test (RT), a nonparametric statistical measure, to evaluate a variety of treatments using speech measures in single-subject designs. They recommended the use of the RT over the BD. Since the RT uses information on the performance stability for the specific individual being tested, it can overcome two potential pitfalls of the BD: (1) the BD relies on averaged data, and (2) the data on which the tables were generated could be based on a population different from the subject. Another shortcoming of the BD analysis (although not an issue in traditional speech recognition testing) is that it requires a dependent variable that is specified as a percentage measure, which prevents its application to other types of data.
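The binomial logic behind the Thornton and Raffin (1978) tables can be illustrated with a rough sketch. The snippet below is a normal-approximation simplification, not the published tables (which use exact binomial critical ranges), and the scores and list sizes are hypothetical:

```python
import math

def binomial_critical_difference(p1, p2, n, z=1.96):
    """Crude sketch: decide whether two speech recognition scores
    (proportions correct out of n items each) differ beyond binomial
    test-retest variability.  This is a normal-approximation
    simplification of, NOT a reproduction of, the Thornton and
    Raffin (1978) tables."""
    p_pooled = (p1 + p2) / 2.0
    # variance of the difference of two independent binomial proportions
    se_diff = math.sqrt(2.0 * p_pooled * (1.0 - p_pooled) / n)
    return abs(p1 - p2) > z * se_diff

# A 14-point difference (74% vs. 60%) on 25-item lists ...
print(binomial_critical_difference(0.74, 0.60, 25))   # False: within chance
# ... but the same difference on 100 items exceeds the critical range
print(binomial_critical_difference(0.74, 0.60, 100))  # True: significant
```

The point of the sketch is the role of n: the same 14-point difference that falls within chance variability on a 25-item list exceeds it when 100 items are scored, which is why reduced list size widens the critical range.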
One disadvantage of the RT with single-case design, however, is that the analysis of the data is more complex and time-consuming (Rubinstein, 1996). This is especially problematic in the clinical environment and is what makes the BD so appealing. The RT uses the variability of scores for each subject to determine the estimate of error variance which, in turn, serves to assess the treatment effect for that individual (Chmiel and Jerger, 1995). The mean difference in scores between the two conditions is compared to all other possible mean differences based on all possible assignments of the individual scores to the two conditions. When all mean differences are placed in rank order, those great enough to occur less often than five times in one hundred can be identified. If the mean difference obtained falls within that range, conditions are judged to be significantly different. Chmiel and Jerger (1995) provide a detailed description of this test. In addition to the problem of variability, word recognition testing has also been criticized for its lack of sensitivity (Walden et al, 1983; Mendel and Danhauer, 1997). The use of a person's own measure of performance stability could conceivably improve the sensitivity of the test. Rubinstein et al (1998) analyzed data using phoneme scoring and a BD versus a single-subject visual inspection approach to determine if the findings comparing two aided conditions would result in similar outcomes. The BD analysis used a 25-word list for each condition, according to common clinical practice (Martin et al, 1998); the single-subject analysis compared four 25-word lists (100 words) for each condition in order to arrive at individual performance variability estimates. Rubinstein et al (1998) found that different subjects were identified as being affected by the experimental variable using the two types of analyses.
However, when the data were reanalyzed and the item number was held constant at 100 words for both approaches instead of using a 25-word list for the BD analysis, both analyses arrived at the same outcome. This suggests that the performance stability information from the single subject analysis did not provide any significant advantage over the information provided by the BD as long as a sufficient number of items was used. It also suggests that use of 25-word lists reduces the validity of the speech recognition test. These conclusions must be treated with caution, however, because so few subjects were shown to be affected by the treatment variable. To address these questions, it is necessary to incorporate a more powerful independent variable, one that would result in significant differences for more than a few subjects. One variable known to affect word recognition performance is signal-to-noise ratio (Gengel et al, 1981; Beattie, 1989; Sperry et al, 1997). The purpose of the present study was to determine if speech recognition test data, analyzed using both the BD tables of Thornton and Raffin (1978), and the randomization test with a single-case design, would result in similar outcomes in the presence of two different signal-to-noise ratios. If the two analyses result in similar outcomes, it would then be valuable to determine whether one method is more sensitive in detecting differences. 
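The randomization test procedure described earlier (ranking the observed mean difference among the mean differences from all possible reassignments of the same scores to the two conditions) can be sketched as follows. The scores are hypothetical; a real restricted alternating treatments design would permute only over the admissible presentation orders (Onghena and Edgington, 1994) rather than all possible splits:

```python
from itertools import combinations

def randomization_test(scores_a, scores_b):
    """Sketch of the randomization test: the observed absolute mean
    difference between conditions is compared against the mean
    differences from every possible reassignment of the pooled scores
    to two groups of the same sizes (70 assignments for 4 vs. 4).
    Returns a two-tailed p-value."""
    pooled = scores_a + scores_b
    n_a = len(scores_a)
    observed = abs(sum(scores_a) / n_a - sum(scores_b) / len(scores_b))
    count = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        total += 1
        if diff >= observed:
            count += 1
    return count / total

# Hypothetical phoneme scores (% correct), four lists per SNR condition
p = randomization_test([82, 85, 80, 84], [70, 72, 69, 74])
print(round(p, 3))  # 0.029: the smallest attainable p for 4 vs. 4 is 2/70
```

With only four trials per condition, the smallest two-tailed p-value the test can ever produce is 2/70 (about .029), which illustrates why the number of alternations limits the resolution of the single-case analysis.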
METHOD

Subjects

Sixty-two adults (mean age 25.1 years, SD 6.35) with hearing within normal limits served as subjects and met the following selection criteria: (a) speaking English as the first language, (b) passing a pure-tone screening at 20 dB HL at 250–8000 Hz bilaterally (using a GSI 16 audiometer), and (c) having normal tympanograms bilaterally (peak pressure more positive than -50 daPa and static immittance between .3 and 1.6 ml) and present ipsilateral acoustic reflexes at 1000 Hz at 100 dB HL (using a GSI 33 Middle-Ear Analyzer).

Procedure

All speech and noise materials were commercially produced on compact disc by Auditec of St. Louis, played on a Sony CDP211 CD player, and delivered via a GSI 16 audiometer. Materials included (a) the NU-6 word lists (due to their common clinical usage and availability of multiple lists) and (b) the AB word lists (due to their shorter length and because they were specifically developed to be used with phoneme scoring). All subjects received both NU-6 and AB lists, and the test protocol was carried out in a single session.

NU-6 Lists

For the NU-6 word lists, each subject listened under earphones in the preferred ear (as determined by the subject) and responded verbally to eight 25-word lists presented at 50 dB HL (normal conversational level) under two conditions (0 versus -2 dB signal-to-noise ratios, four lists per condition). Cafeteria noise was used because it is dominated by low-frequency information but also contains some high-frequency components and simulates a real-life listening environment (Humes et al, 1997). These noise levels were chosen (a) to avoid ceiling and floor effects and (b) to assure a sufficiently powerful independent variable so that differences in speech recognition ability could be expected (Rubinstein et al, 1998).
Data from Sperry et al (1997) had suggested that using signal-to-noise ratios around 0 dB and monosyllabic words with this subject pool would prevent ceiling and floor effects. Prior to the onset of the study, five subjects were run at various signal-to-noise ratios around 0 dB, and it was determined that 0 dB and -2 dB signal-to-noise ratios provided appropriate levels for the independent variable. In order to accommodate a single-case analysis of the data, the order of the two noise conditions over the eight trials was randomized for each subject according to a restricted alternating treatments design (Onghena and Edgington, 1994; Rubinstein, 1996). This may be differentiated from a completely randomized design, where any order of presentation of the two conditions may be randomly chosen. In such a case, it might have been possible to randomly choose an order in which all administrations of one treatment (e.g., four presentations of a 0 dB signal-to-noise ratio) precede all administrations of the second (e.g., a -2 dB signal-to-noise ratio), thus introducing the risk of an order effect. In a restricted alternating treatments design, certain orders of administration are eliminated as possibilities, thus preventing certain undesirable outcomes. In the present study, no subject received more than three successive presentations for the NU-6 lists under the same noise condition. The order of the lists was counterbalanced. Phoneme recognition scores were calculated in percent correct for each of the eight presentations (four lists times two conditions) with 75 phonemes (25 words times three phonemes) per list. Phoneme scoring was adopted to increase the sensitivity of the measure by increasing sample size without having to increase test time (Olsen et al, 1997). The BD (using a 95% critical difference level) was used to establish which subjects were differentially affected by the two noise conditions, using only the first list of each condition.
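The restriction described above (no more than three successive presentations of the same noise condition across the eight NU-6 trials) can be sketched by enumerating the admissible presentation orders; the condition labels A and B are placeholders for the two signal-to-noise ratios:

```python
from itertools import combinations

def admissible_orders(n_per_condition=4, max_run=3):
    """Enumerate presentation orders for a restricted alternating
    treatments design: two conditions (A = 0 dB SNR, B = -2 dB SNR),
    n_per_condition trials of each, rejecting any order with more
    than max_run successive trials of the same condition."""
    n = 2 * n_per_condition
    orders = []
    for a_positions in combinations(range(n), n_per_condition):
        seq = ['B'] * n
        for i in a_positions:
            seq[i] = 'A'
        # find the longest run of identical consecutive labels
        run = longest = 1
        for prev, cur in zip(seq, seq[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        if longest <= max_run:
            orders.append(''.join(seq))
    return orders

orders = admissible_orders()
print(len(orders))  # 62 admissible orders
```

Of the 70 ways to assign four lists to each condition, the restriction removes the eight sequences containing a run of four, leaving 62 admissible orders from which each subject's order would be drawn at random.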
The binomial tables were adjusted because it is incorrect to assume 75 independent pieces of information when using meaningful words; thus, j values (Boothroyd and Nittrouer, 1988) were calculated and applied to the analysis of the data. The effect of phonemic constraint was accounted for by using the equations derived by Boothroyd and Nittrouer (1988), who compared recognition probability scores of the whole word to the probability scores of the individual phonemes, as follows:

pw = pp^j

where pw is the probability of recognition of the whole word and pp is the probability of recognition of an individual phoneme when j is known. The j value is calculated by

j = log(pw) / log(pp)

Using the above calculation, the j value will vary depending on such factors as test material and subject pool but will always be a number less than three. In the present study, j values were calculated from the data obtained from each subject and multiplied by 25 to obtain the correct number of independent sources of information for subsequent analysis by the BD method. In addition, analysis of the single-case data was performed using the RT (Onghena and Edgington, 1994) to determine which subjects were differentially affected by the treatment variable.

AB Lists

The Isophonemic (AB) Word Lists (Boothroyd, 1968) consist of 15 lists of 10 consonant-nucleus-consonant monosyllabic words each, with every list containing the same 10 vowels and 20 consonants. A more thorough description of this test can be found in Boothroyd (1968).
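The j-factor adjustment of Boothroyd and Nittrouer (1988) described above can be expressed directly in code. The proportions below are hypothetical, chosen only to show that j falls below three and shrinks the nominal 75 phonemes per 25-word list to a smaller number of independent scoring units:

```python
import math

def j_factor(p_word, p_phoneme):
    """j factor of Boothroyd and Nittrouer (1988): pw = pp ** j,
    so j = log(pw) / log(pp).  j is less than 3 for meaningful CVC
    words because the three phonemes are not independent."""
    return math.log(p_word) / math.log(p_phoneme)

def effective_items(n_words, p_word, p_phoneme):
    """Adjusted number of independent scoring units for the binomial
    analysis, as in the study: j multiplied by the word count."""
    return j_factor(p_word, p_phoneme) * n_words

# Hypothetical scores: 60% whole words correct, 80% phonemes correct
j = j_factor(0.60, 0.80)
print(round(j, 2))                                 # about 2.29
print(round(effective_items(25, 0.60, 0.80), 1))   # about 57.2, not 75
```

In other words, a 25-word list scored by phonemes does not yield 75 independent observations; with these hypothetical proportions it yields roughly 57, and that smaller n is what the adjusted binomial tables must reflect.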
In the case of the AB word lists, the procedure was the same except for the following: (1) ten 10-word lists were used, five for each condition (allowing for a BD analysis using 10 words, as well as 50 words for comparison with the NU-6), (2) no subject received more than four successive lists for the same condition (to avoid all trials for one condition from occurring before all trials of the other), (3) phoneme recognition scores were calculated for each of the ten presentations (five lists times two conditions) with 30 phonemes (10 words times three phonemes) per list, and (4) the binomial tables were adjusted accordingly.

RESULTS AND DISCUSSION

The purpose of the study was to determine if the BD and RT analyses would result in similar outcomes, that is, whether a subject who demonstrated a significant difference between noise conditions on the RT would also show a significant difference when the BD analysis was used. It may be recalled that the BD analysis was based on 25 words for the NU-6 material (as performed clinically) and 10 words for the AB lists. Table 1 lists the number of subjects demonstrating similar outcomes when comparing the results of the BD and RT analyses. Using NU-6 materials, the BD and RT analyses for 38 of the 62 subjects resulted in similar outcomes: for 13 subjects both the BD and RT analyses resulted in statistical differences between noise conditions, and for 25, neither analysis showed a statistical difference. In the case of the AB lists, there were similar outcomes for the two analyses for 44 subjects, the numbers being 6 and 38, respectively. To determine if the BD and RT analyses were statistically correlated, a Spearman rank-order correlation coefficient (Zar, 1996) was calculated on the results of each list, and it revealed that the results from the BD and RT were not significantly correlated (NU-6: r = .2, p > .05; AB: r = .23, p > .05).
In other words, a subject who ranked high on the BD did not necessarily show a high ranking on the RT. These results are consistent with those of Rubinstein et al (1998), who also found that subjects were not judged to be similarly affected by the treatment variable when comparing the two analyses. As in the earlier study (Rubinstein et al, 1998), the question arises whether these different outcomes were due to the different types of analyses performed or to the differences in number of items used (using NU-6: BD, 25 words; RT, 100 words; using AB: BD, 10 words; RT, 50 words). Thus, further analyses were carried out, with the BD analysis including the entire data set and based on adjusted BD tables (NU-6: 100 x 3; AB: 50 x 3; j values calculated based on Boothroyd and Nittrouer, 1988). Table 1 also lists the number of subjects achieving statistical significance when item number was held constant. Results both from the NU-6 and the AB lists revealed a statistically significant positive correlation between the BD and RT analyses (NU-6: r = .48, p < .001; AB: r = .64, p < .001). In other words, when item number was constant, the BD and RT analyses tended to identify the same subjects as affected by the noise conditions.

Table 1. Number (and Percentage) of Subjects (n = 62) Demonstrating Similar Outcomes from BD and RT Analyses When Determining If the Two Noise Conditions (Treatment Effect) Were Significantly Different

Test Material  BD List Size  Treatment effect   No treatment effect  Total
                             with BD and RT     with BD or RT
NU-6           25            13 (21%)           25 (40.3%)           38 (61.3%)
               50            21 (33.9%)         26 (41.9%)           47 (75.8%)**
               100           23 (37.1%)         19 (30.6%)           42 (67.7%)**
AB             10            6 (9.7%)           38 (61.3%)           44 (71%)
               50            18 (29%)           32 (51.6%)           50 (80.6%)**

* p < .01; ** p < .001
These results are in agreement with those of Rubinstein et al (1998) and support the conclusion that lists of 25 words or less reduce the validity of the speech recognition test results, even with the addition of phoneme scoring. Since the NU-6 word lists originally were designed to be administered using 50 words, another correlation test was carried out comparing the RT (100 words) and the first two 25-word lists (50 words) for the BD analysis with adjusted BD tables (see Table 1). Results revealed a statistically significant correlation in this case as well (r = .55, p < .001), also supporting the use of 50-word lists with phoneme scoring.

Figure 1 presents the speech recognition scores for a sample subject in each noise condition and each trial, for the NU-6 lists. Analysis using the BD on the first two data points (25 words each) revealed no statistically significant difference between conditions. When the analysis was based on the first 50 words per condition, or all the data (100 words each), results were statistically significant. The same pattern of findings was noted for the results of this subject with the AB lists, shown in Figure 2 (10 words, not statistically significant; 50 words, statistically significant). The results of the RT analysis for this subject were also statistically significant for both types of material. These figures provide pictorial representation of the variability of the data with 10- and 25-word lists, and evidence of the problem in using shortened word lists.

Figure 1. Speech recognition scores of Subject 12 for both signal-to-noise ratio conditions using the NU-6 lists

Figure 2. Speech recognition scores of Subject 12 for both signal-to-noise ratio conditions using the AB lists

Although the two analyses resulted in statistically significant agreement when 50- and 100-word lists were used for the BD analysis, they did not always arrive at the same conclusion for every subject. Table 2 shows the number of subjects for whom the treatment effects of noise were significant for one analysis, but not the other (BD only vs. RT only). For example, in Table 1, in the case of the 50-word AB lists, 50 out of 62 subjects demonstrated agreement between the BD and RT. In Table 2, we see that of the remaining 12 subjects where the analyses did not agree, it was only the BD analysis that was sensitive enough to demonstrate significant differences between noise conditions. Thus, although the results from the BD and RT analyses were significantly correlated (as demonstrated by the Spearman test), in cases where their results did not agree, one analysis (in this case the BD) was more sensitive and able to identify differences between conditions. To address this issue, the data were analyzed using the McNemar test of correspondence (Zar, 1996). Results are shown in Table 2. In all cases where a significant correlation had been found, the BD analysis was statistically more sensitive. In other words, for subjects where the BD and RT analyses resulted in different findings, the BD test was the analysis more likely to detect the difference between the two noise conditions.

Table 2. Number (and Percentage) of Subjects (n = 62) Demonstrating Different Outcomes from the BD and RT Analyses When Determining If the Two Noise Conditions (Treatment Effect) Were Significantly Different

Test Material  BD List Size  Treatment effect  Treatment effect  Total
                             with BD only      with RT only
NU-6           25            13 (21%)          11 (17.7%)        24 (38.7%)
               50            12 (19.4%)        3 (4.8%)          15 (24.2%)**
               100           19 (30.6%)        1 (1.6%)          20 (32.2%)**
AB             10            6 (9.7%)          12 (19.4%)        18 (29%)
               50            12 (19.4%)        0 (0%)            12 (19.4%)**

* p < .01; ** p < .001
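The sensitivity comparison just described can be sketched with an exact-binomial form of the McNemar test applied to the discordant cells (subjects significant with one analysis but not the other). This is an illustration under that assumption, not necessarily the exact computation prescribed by Zar (1996):

```python
import math

def mcnemar_exact(b, c):
    """Exact-binomial McNemar test on the discordant cells:
    b = subjects significant with BD only, c = significant with RT
    only.  Under the null hypothesis of equal sensitivity, each
    discordant subject falls in either cell with probability 1/2;
    returns a two-tailed p-value."""
    n = b + c
    k = min(b, c)
    # probability of a split at least this lopsided, one tail
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# AB 50-word lists: 12 subjects significant with BD only, 0 with RT only
print(mcnemar_exact(12, 0))  # 0.00048828125, consistent with p < .001
```

A 12-to-0 split among discordant subjects is extremely unlikely if the two analyses were equally sensitive, which is what drives the p < .001 entries for the 50- and 100-word comparisons.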
CONCLUSION

The purpose of the present study was to determine if speech recognition test data, analyzed using both the BD tables of Thornton and Raffin (1978) and the RT with single-case design, would result in similar outcomes. Results revealed that there was a significant correlation between the two types of analyses when phoneme scoring was used and the BD analysis was based on at least 50-word lists. These findings were consistent across both types of materials (NU-6 and AB word lists). Furthermore, the RT did not add to the sensitivity of the measure, and actually was found to be significantly less sensitive. Therefore, using the individual's own information on performance stability from the single-case design combined with the RT did not provide any advantage over the use of the BD in the clinical setting. This does not rule out the potential value of the RT in other circumstances. As noted earlier, it can demonstrate treatment effects based on any measures on an interval scale, including measures not suitable to the BD tables (e.g., reaction time measures, amplitudes or latencies of evoked potentials) (Chmiel and Jerger, 1995). The fact that the 10- and 25-word lists arrived at different findings as compared to the longer lists adds further evidence to the literature that questions the legitimacy of their use. The concern over the use of half lists continues to be acknowledged in the literature (Runge and Hosford-Dunn, 1985; Stach et al, 1995; Mendel and Danhauer, 1997; Olsen et al, 1997; Gelfand, 1998; Studebaker et al, 1999; Hurley and Sells, 2003), yet half lists continue to be used clinically (Wiley et al, 1995; Studebaker et al, 1999; Hurley and Sells, 2003). Further research is needed to find a solution acceptable to clinicians that can enable accuracy and reliability while limiting expenditure of time.
In summary, the present study revealed that results using a BD and RT analysis were statistically correlated when using phoneme scoring and at least 50-word lists. Although the two analyses were correlated, the BD analysis was found to be a more powerful measure. The absence of correlation with 10- or 25-word lists, even when phoneme scoring is used, provides further support challenging the common clinical practice of using reduced list size. Further, it supports the use of phoneme scoring and other attempts to find clinically acceptable, yet evidence-based (i.e., research-driven) (Wiley et al, 1995), solutions to overcome the conflict between time and accuracy.

Acknowledgments. The authors would like to acknowledge the assistance of K. Gerow and M. Rosing in the statistical treatment of the data, and M. Bendavid for her comments.

REFERENCES

Beattie RC. (1989) Word recognition functions for the CID W-22 Test in multitalker noise for normally hearing and hearing-impaired subjects. J Speech Hear Disord 54:20–32.

Boothroyd A. (1968) Developments in speech audiometry. Sound 2:3–10.

Boothroyd A, Nittrouer S. (1988) Mathematical treatment of context effects in phoneme and word recognition. J Acoust Soc Am 84:101–114.

Chmiel R, Jerger J. (1995) Quantifying improvement with amplification. Ear Hear 16:166–175.

Dillon H. (1982) A quantitative examination of the sources of speech discrimination test score variability. Ear Hear 3:51–58.

Dillon H. (1983) The effect of test difficulty on the sensitivity of speech recognition tests. J Acoust Soc Am 73:336–344.

Gelfand SA. (1998) Optimizing the reliability of speech recognition scores. J Speech Lang Hear Res 41:1088–1102.

Gelfand SA. (2003) Tri-word presentations with phonemic scoring for practical high reliability speech recognition assessment. J Speech Lang Hear Res 46:405–412.

Gengel RW, Miller L, Rosenthal E. (1981) Between and within listener variability in response to CID W-22 presented in noise.
Ear Hear 2:78–81.

Hirsh IJ, Davis H, Silverman SR, Reynolds EG, Eldert E, Benson RW. (1952) Development of materials for speech audiometry. J Speech Hear Disord 17:321–327.

Humes L, Christensen L, Bess F, Hedley-Williams A. (1997) A comparison of the benefit provided by well-fit linear hearing aids and instruments with automatic reductions of low-frequency gain. J Speech Lang Hear Res 40:666–685.

Hurley RM, Sells JP. (2003) An abbreviated word recognition protocol based on item difficulty. Ear Hear 24:111–118.

Martin F, Champlin C, Chambers J. (1998) Seventh survey of audiometric practices in the United States. J Am Acad Audiol 9:95–104.

Martin F, Champlin C, Perez D. (2000) The question of phonetic balance in word recognition testing. J Am Acad Audiol 11:489–493.

Mendel LL, Danhauer JL. (1997) Test administration and interpretation. In: Mendel LL, Danhauer JL, eds. Audiologic Evaluation and Management and Speech Perception Assessment. San Diego, CA: Singular Publishing Group, 15–43.

Olsen W, Van Tasell D, Speaks C. (1997) Phoneme and word recognition for words in isolation and in sentences. Ear Hear 18:175–188.

Onghena P, Edgington E. (1994) Randomization tests for restricted alternating treatments designs. Behav Res Ther 32:783–786.

Ottenbacher K. (1993) Interrater agreement of visual analysis in single-subject decisions: quantitative review and analysis. Am J Ment Retard 98:135–142.

Rubinstein A. (1996) Letter to the editor: Complete and restricted alternating treatment designs. Ear Hear 17:78.

Rubinstein A, Cherry R, Schur K. (1998) Comparing two methods of evaluating aided speech recognition performance for single cases. J Acad Rehabilitative Audiol 31:31–44.

Runge CA, Hosford-Dunn H. (1985) Word recognition performance with modified CID W-22 word lists. J Speech Hear Res 28:355–362.

Sperry J, Wiley T, Chial M.
(1997) Word recognition performance in various background competitions. J Am Acad Audiol 8:71–80.

Stach B, Davis-Thaxton ML, Jerger J. (1995) Improving the efficiency of speech audiometry: computer-based approach. J Am Acad Audiol 6:330–333.

Studebaker G, Gray G, Branch W. (1999) Prediction and statistical evaluation of speech recognition test scores. J Am Acad Audiol 10:355–370.

Tawney J, Gast D. (1984) Single Subject Research in Special Education. Columbus, OH: Charles E. Merrill Publishing.

Thornton A, Raffin M. (1978) Speech discrimination scores modeled as a binomial variable. J Speech Hear Res 21:507–518.

Tillman TW, Carhart R. (1966) An Expanded Test for Speech Discrimination Utilizing CNC Monosyllabic Words. Northwestern University Auditory Test No. 6. USAF School of Aerospace Medicine Technical Report. Brooks Air Force Base, TX: USAF School of Aerospace Medicine.

Walden B, Schwartz D, Williams D, Holum-Hardegen L, Crowley J. (1983) Test of the assumptions underlying comparative hearing aid evaluations. J Speech Hear Disord 48:264–273.

Wiley T, Stoppenbach D, Feldhake L, Moss K, Thordarottir E. (1995) Audiologic practices: what is popular versus what is supported by the evidence. Am J Audiol 4:26–34.

Zar JH. (1996) Biostatistical Analysis. Upper Saddle River, NJ: Prentice Hall.