Do severity markers differentiate visual-analog-scale ratings of dysphonia?

Kathy Nagle (1), Leah B. Helou (2), Nancy P. Solomon (3), & Tanya Eadie (1)
(1) University of Washington, Seattle; (2) University of Pittsburgh; (3) Walter Reed National Military Medical Center

PURPOSE
To compare auditory-perceptual ratings of dysphonic and normal voice samples using a standard visual analog scale (VAS) labeled only at the extremes, a VAS labeled for severity nonlinearly (CAPE-V, beta version), and a VAS labeled for severity symmetrically (CAPE-V, official version). The null hypothesis is that ratings across the three styles of VAS will be comparable and equally reliable.

RESEARCH QUESTIONS
1. What is the effect of the presence and location of severity labels on perceived overall severity (OS) of dysphonia and perceived vocal effort (VE) when listeners use VAS?
2. What is the effect of the presence and location of severity labels on the reliability of ratings of dysphonia for OS and VE when listeners use VAS?

BACKGROUND
• Auditory-perceptual evaluation of voice is the clinical "gold standard" of voice assessment, but its reliability is problematic (Kreiman et al., 1993). The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V; ASHA, 2002) was developed to standardize procedures and improve reliability.
• Rating scales can affect the reliability and validity of results. Visual analog scales (VAS), or undifferentiated lines, are often preferred because they provide continuous data. The CAPE-V uses VAS, but each line is supplemented with severity labels placed beneath it to make the tool more familiar to clinicians.
• The positioning of severity labels under the VAS on the CAPE-V has varied over time. ASHA first released the CAPE-V in 2002 with the labels placed nonlinearly (Figure 1, left). This "beta" version has been used in subsequent research and reprinted in several clinical textbooks. Kempster et al.
published the "official" version of the CAPE-V form in 2009, with the labels placed symmetrically (Figure 1, right). The presence and position of severity labels associated with VAS could potentially affect ratings; this study addresses that possibility.

Stimuli: The 2nd sentence of the Rainbow Passage (Fairbanks, 1960), produced by 10 males and 10 females with a variety of voice disorders (mean age = 52 yr) and by 4 age- and sex-matched controls. Six samples were repeated to determine intra-rater reliability (n = 24 + 6).

Mean ratings:
• statistically significant difference (p < .01) between Group 1 (nonlinear) and Groups 2 & 3 for both OS and VE
• difference between Group 2 (endpoints only) and Group 3 (symmetric) not statistically significant

Effect of the presence and location of severity markers on:
1. Ratings of perceived dysphonia: The use of nonlinearly placed severity markers toward the lower end of a VAS results in lower ratings for both dimensions tested (Figure 3). The mean difference shifted by about 6 mm, almost half the distance that the "moderate" severity marker was shifted between the two types of marked scales. This suggests that the nonlinearly placed markers introduced a rating bias that was tempered somewhat by the VAS.
2. Reliability of judgments: Rates of agreement were comparable to other studies of vocal effort and overall severity, and consistent with previous research using inexperienced listeners. The type of VAS used did not affect agreement in this study.

Figure 2. Standard VAS (endpoints labeled "normal" and "extremely severe").

Listeners: 60 inexperienced listeners (mean age = 24 yr) randomly assigned to one of three groups, each using a different 100-mm VAS:
Group 3: CAPE-V official (symmetric markers)
Group 1: CAPE-V beta (nonlinear markers)
Group 2: Standard VAS (endpoint markers only)

PROCEDURES
• Stimuli were presented in random order by a customized software program on a desktop computer. Listeners wore headphones with the volume set at a comfortable level, and rated stimuli by placing a cursor at the selected location on the given VAS. The order of dimensions was counterbalanced across listeners.
• Overall severity (OS) was defined as "a measure of how 'good' or 'poor' the voice sample was judged to be for its voice quality" (Eadie & Doyle, 2002). Vocal effort (VE) was defined as "the perceived effort in producing voice" (Verdolini et al., 1994), and is considered to reflect vocal strain. These two parameters were selected to represent relatively high and low reliability for voice quality ratings.

Figure 1. Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V): beta version (left), official version (right).

Figure 3. Differences in mean ratings between groups, in % of the scale, for Overall Severity and Vocal Effort by scale type (Nonlinear, Endpoints Only, Symmetric). Error bars = 1 SE. **Difference significant at the p < .01 level.

Reliability:
• intra-rater agreement range across groups: OS: .60-.69; VE: .62-.64
• inter-rater agreement range across groups: OS: .55-.60; VE: .45-.52
• differences in intra- and inter-rater agreement not statistically significant for either dimension
• differences in variance not statistically significant for either dimension
• more variability for VE than for OS (Figure 4)

Data Analysis
Mean ratings: Mean ratings for each speaker were calculated for each group. A within-groups repeated-measures ANOVA with Bonferroni correction was performed for each dimension.
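The agreement criterion and the variance measure used in the analyses can be illustrated with a short sketch. The rating values and function names below are hypothetical, assuming cursor positions recorded in mm from the left end of the 100-mm line; the 10-mm tolerance follows the criterion of Chan & Yiu (2002) described under Data Analysis.

```python
# Hypothetical illustration of the 10-mm agreement criterion on a 100-mm VAS.
# Ratings are cursor positions in mm; the values here are invented for the
# sketch, not data from the study.

first_pass  = [12.0, 55.5, 78.0, 33.0, 90.0, 41.0]  # 6 repeated stimuli
second_pass = [18.0, 60.0, 70.5, 35.0, 88.0, 52.0]

def agreement_rate(a, b, tolerance_mm=10.0):
    """Proportion of rating pairs falling within the tolerance of each other."""
    hits = sum(abs(x - y) <= tolerance_mm for x, y in zip(a, b))
    return hits / len(a)

def mean_variance(ratings_by_listener):
    """Mean of the per-stimulus sample variances across listeners
    (one reading of 'variance (mean sum of squares)')."""
    n_stimuli = len(ratings_by_listener[0])
    variances = []
    for i in range(n_stimuli):
        col = [r[i] for r in ratings_by_listener]
        mean = sum(col) / len(col)
        variances.append(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
    return sum(variances) / n_stimuli

print(agreement_rate(first_pass, second_pass))  # 5 of 6 pairs within 10 mm
```

The same tolerance applies to both intra-rater pairs (first vs. second presentation) and inter-rater pairs (one listener vs. another on the same stimulus).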
Intra-rater reliability: Six stimuli (30%) were repeated to calculate intra-rater agreement of judgments, using one-way ANOVA with post-hoc Tukey's HSD. Agreement was defined as ratings within 10 mm (10% of the line) of each other (Chan & Yiu, 2002).
Inter-rater reliability: Inter-rater reliability within groups was calculated using one-way ANOVA with post-hoc Tukey's HSD; judgments within 10% of the line were counted as agreeing.
Variability: Variability within groups was calculated in terms of variance (mean sum of squares).

Figure 4. Differences in mean variance by scale type (Nonlinear, Endpoints Only, Symmetric), for each dimension (OS, VE). Error bars = 1 SE.

Caveats & Implications
• These findings may be applicable to perceptual ratings other than voice quality, as well as to research studies that use visual analog scales.
• All three scales appear acceptable for comparing listener ratings within a study, as listeners were similarly reliable on all of them.
• Comparison across studies that use different scale types is not recommended, due to differences in mean ratings.
• Inexperienced listeners were used in this study, but clinicians and other expert listeners are more likely to use the CAPE-V.
• Researchers and clinicians should be specific about which version of the CAPE-V they use; the widely published 2002 (nonlinear) version can be referred to as the "beta" version, and the more recent 2009 (symmetric) version as the "official" version.
• The official version of the CAPE-V provides data akin to the results expected from a traditional VAS, and is therefore recommended for future studies.
• Future research will determine whether these findings hold for dimensions other than OS and VE.

Acknowledgments
Funding sources: Lesley B. & Steven G. Olswang Endowed Graduate Student Conference Fund; University of Washington (UW) Graduate School Fund for Excellence and Innovation. Thanks to the speaker and listener participants and to the UW Vocal Function Lab.
The views expressed in this presentation are those of the authors and do not necessarily reflect the official policy or position of the Department of Defense or the US Government.

References
• American Speech-Language-Hearing Association (2002). Consensus auditory-perceptual evaluation of voice (CAPE-V). Rockville, MD.
• Chan, K. M., & Yiu, E. M. (2002). The effect of anchors and training on the reliability of perceptual voice evaluation. Journal of Speech, Language, and Hearing Research, 45(1), 111-126.
• Eadie, T. L., & Doyle, P. C. (2002). Direct magnitude estimation and interval scaling of pleasantness and severity in dysphonic and normal speakers. Journal of the Acoustical Society of America, 112(6), 3014-3021.
• Kempster, G. B., Gerratt, B. R., Verdolini Abbott, K., Barkmeier-Kraemer, J., & Hillman, R. E. (2009). Consensus auditory-perceptual evaluation of voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, 18, 124-132.
• Kreiman, J., Gerratt, B. R., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. Journal of Speech and Hearing Research, 36(1), 21-40.
• Verdolini, K., Titze, I. R., & Fennell, A. (1994). Dependence of phonatory effort on hydration level. Journal of Speech and Hearing Research, 37(5), 1001-1007.