Do severity markers differentiate visual-analog-scale ratings of dysphonia?
Kathy Nagle1, Leah B. Helou2, Nancy P. Solomon3 & Tanya Eadie1
1University of Washington, Seattle; 2University of Pittsburgh; 3Walter Reed National Military Medical Center
Introduction
PURPOSE
To compare auditory-perceptual ratings of dysphonic and normal voice samples using a standard visual analog scale (VAS) labeled only at the extremes, a VAS with severity labels placed nonlinearly (CAPE-V, beta version), and a VAS with severity labels placed symmetrically (CAPE-V, official version). The null hypothesis is that ratings across the three styles of VAS will be comparable and equally reliable.
RESEARCH QUESTIONS
1. What is the effect of presence and location of severity
labels on perceived overall severity (OS) of dysphonia and
perceived vocal effort (VE) when listeners use VAS?
2. What is the effect of the presence and location of
severity labels on the reliability of ratings of dysphonia for
OS and VE when listeners use VAS?
BACKGROUND
Auditory-perceptual evaluation of voice is the clinical "gold standard" of voice assessment, but its reliability is problematic (Kreiman et al., 1993). The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V; ASHA, 2002) was developed to standardize procedures and improve reliability.
Rating scales can affect the reliability and validity of
results. Visual analog scales (VAS), or undifferentiated
lines, are often preferred because they provide continuous
data. The CAPE-V uses VAS, but it is supplemented with
severity labels placed beneath the lines for each parameter
to make the tool more familiar to clinicians.
Positioning of severity labels under the VAS on the
CAPE-V has varied over time. ASHA first released the
CAPE-V in 2002; the labels were placed nonlinearly
(Figure 1, left). This “beta” version has been used in
subsequent research and reprinted in several clinical
textbooks. Kempster et al. published the “official” version
of the CAPE-V form in 2009; the labels were placed
symmetrically (Figure 1, right). The presence and position
of severity labels associated with VAS could potentially
affect ratings. This study addresses this possibility.
Figure 1. Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V): beta version (left), official version (right).

Method

Stimuli: The 2nd sentence of the Rainbow Passage (Fairbanks, 1960), produced by 10 males and 10 females with a variety of voice disorders (mean age = 52 yr) and by 4 age- and sex-matched controls. Six samples were repeated to determine intra-rater reliability (n = 24 + 6).
Listeners: 60 inexperienced listeners (mean age = 24 yr) were randomly assigned to one of three groups, each using a different 100-mm VAS:
Group 1: CAPE-V beta (nonlinear markers)
Group 2: standard VAS (endpoint markers only)
Group 3: CAPE-V official (symmetric markers)

Figure 2. Standard VAS, labeled only at the endpoints ("normal", "extremely severe").
PROCEDURES
• Overall severity (OS) was defined as "a measure of how 'good' or 'poor' the voice sample was judged to be for its voice quality" (Eadie & Doyle, 2002). Vocal effort (VE) was defined as "the perceived effort in producing voice" (Verdolini et al., 1994) and is considered to reflect vocal strain. These two parameters were selected to represent relatively high and low reliability for voice-quality ratings.
• Stimuli were presented in random order by a customized software program on a desktop computer. Listeners wore headphones with the volume set at a comfortable level and rated each stimulus by placing a cursor at the selected location on the given VAS. The order of dimensions was counterbalanced across listeners, as sketched below.
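A minimal sketch of this presentation logic, assuming a flat list of stimulus files; the file names, function name, and specific counterbalancing rule are hypothetical, not the authors' actual software:

```python
# Sketch: randomized stimulus order with dimension order counterbalanced
# across listeners (hypothetical names; not the authors' actual program).
import random

STIMULI = [f"sample_{i:02d}.wav" for i in range(1, 31)]  # 24 samples + 6 repeats

def session_plan(listener_id: int, seed: int = 42):
    """Return (dimension order, stimulus order) for one listener."""
    # Counterbalance: even-numbered listeners rate OS first, odd-numbered VE first.
    dims = ["OS", "VE"] if listener_id % 2 == 0 else ["VE", "OS"]
    order = STIMULI[:]
    random.Random(seed + listener_id).shuffle(order)  # reproducible per listener
    return dims, order

dims, order = session_plan(listener_id=7)
print(dims, order[:3])
```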
Data Analysis

Mean ratings: Mean ratings for each speaker were calculated for each group. A within-groups repeated-measures ANOVA with Bonferroni correction was performed for each dimension.
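As a loose illustration of the group comparison (not the authors' exact model), a one-way ANOVA across the three listener groups with Bonferroni-corrected pairwise follow-ups could be run as below; the per-listener mean ratings are simulated placeholders:

```python
# Sketch: compare mean ratings (in % of the 100-mm line) across scale groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-listener mean ratings, 20 listeners per group.
nonlinear = rng.normal(28, 8, 20)   # Group 1: CAPE-V beta
endpoints = rng.normal(34, 8, 20)   # Group 2: standard VAS
symmetric = rng.normal(34, 8, 20)   # Group 3: CAPE-V official

# Omnibus one-way ANOVA across the three groups.
F, p = stats.f_oneway(nonlinear, endpoints, symmetric)
print(f"ANOVA: F = {F:.2f}, p = {p:.4f}")

# Pairwise t-tests with Bonferroni correction (3 comparisons).
for name, a, b in [("1 vs 2", nonlinear, endpoints),
                   ("1 vs 3", nonlinear, symmetric),
                   ("2 vs 3", endpoints, symmetric)]:
    t, p = stats.ttest_ind(a, b)
    print(f"Group {name}: corrected p = {min(p * 3, 1.0):.4f}")
```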
Intra-rater reliability: Six stimuli (30%) were repeated to calculate intra-rater agreement of judgments, using one-way ANOVA with post-hoc Tukey's HSD. Agreement was defined as ratings within 10 mm (10% of the line) (Chan & Yiu, 2002); a sketch of this criterion appears after this section.
Inter-rater reliability: Inter-rater reliability within groups was calculated using one-way ANOVA with post-hoc Tukey's HSD; judgments within 10% of the line were counted as agreeing.
Variability: Variability within groups was calculated in terms of variance (mean sum of squares).
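A minimal sketch of the 10%-of-line agreement criterion, assuming ratings are stored in millimeters along the 100-mm line; the function names and example data are hypothetical:

```python
# Sketch: agreement under the 10-mm (10% of line) criterion (Chan & Yiu, 2002).
from itertools import combinations
import numpy as np

def intra_rater_agreement(first_pass, repeat_pass, tol=10.0):
    """Proportion of repeated stimuli rated within `tol` mm of the first rating."""
    diffs = np.abs(np.asarray(first_pass) - np.asarray(repeat_pass))
    return float(np.mean(diffs <= tol))

def inter_rater_agreement(ratings, tol=10.0):
    """Mean proportion of stimuli on which a listener pair falls within `tol` mm.

    `ratings` is an (n_listeners, n_stimuli) array of VAS placements in mm.
    """
    ratings = np.asarray(ratings)
    pair_rates = [np.mean(np.abs(ratings[i] - ratings[j]) <= tol)
                  for i, j in combinations(range(ratings.shape[0]), 2)]
    return float(np.mean(pair_rates))

# Hypothetical example: 3 listeners x 4 stimuli.
R = np.array([[12.0, 55.0, 80.0, 33.0],
              [18.0, 60.0, 70.0, 35.0],
              [15.0, 52.0, 78.0, 49.0]])
print(inter_rater_agreement(R))              # mean pairwise agreement (~0.83 here)
first  = np.array([12.0, 55.0, 80.0])        # listener's first pass
second = np.array([18.0, 60.0, 71.0])        # same stimuli, repeated
print(intra_rater_agreement(first, second))  # 1.0 (all within 10 mm)
```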
Results

Mean ratings:
• statistically significant difference (p < .01) between Group 1 (nonlinear) and Groups 2 & 3 for both OS and VE
• difference between Groups 2 (endpoints only) and 3 (symmetric) not statistically significant

Figure 3. Differences in mean ratings between groups, in % of the scale (mean ratings in % of line). Error bars = 1 SE. **Difference significant at p < .01 level.

Reliability:
• intra-rater agreement range across groups: OS: .60-.69; VE: .62-.64
• inter-rater agreement range across groups: OS: .55-.60; VE: .45-.52
• differences in intra- and inter-rater agreement not statistically significant for either dimension
• differences in variance not statistically significant for either dimension
• more variability for VE than OS (Figure 4)

Figure 4. Mean variance by scale type, for each dimension (OS, VE). Error bars = 1 SE.
Conclusions

Effect of the presence & location of severity markers on:
1. Ratings of perceived dysphonia: The use of nonlinearly placed severity markers toward the lower end of a VAS results in lower ratings for both dimensions tested (Figure 3). The mean difference shifted about 6 mm, almost half the distance that the "moderate" severity marker was shifted between the two types of marked scales. This suggests that the nonlinearly placed markers introduced a rating bias that was tempered somewhat by the VAS.
2. Reliability of judgments: Rates of agreement were comparable to other studies of vocal effort and overall severity, and consistent with previous research using inexperienced listeners. The type of VAS used did not affect agreement in this study.

Caveats & Implications
• These findings may be applicable to perceptual ratings other than voice quality, as well as to research studies that use visual analog scales.
• All three scales appear acceptable for comparing listener ratings within a study, as listeners were similarly reliable on all of them.
• Comparison across studies using different scale types is not recommended because of the differences in mean ratings.
• Inexperienced listeners were used in this study, but clinicians and other expert listeners are more likely to use the CAPE-V.
• Researchers and clinicians should be specific about which version of the CAPE-V they use; the widely published 2002 (nonlinear) version can be referred to as the "beta" version, and the more recent 2009 (symmetric) version as the "official" version.
• The official version of the CAPE-V provides data akin to results expected from a traditional VAS, and is therefore recommended for future studies.
• Future research will determine whether these findings hold for dimensions other than OS and VE.
Acknowledgments
Funding sources: Lesley B. & Steven G. Olswang Endowed Graduate Student Conference Fund; University of Washington (UW) Graduate School Fund for Excellence and Innovation. Thanks to the speaker and listener participants and the UW Vocal Function Lab.
The views expressed in this presentation are those of the authors and do not necessarily reflect the official policy or position of the Department of Defense or the US Government.
References
• American Speech-Language-Hearing Association (2002). Consensus auditory-perceptual evaluation of voice (CAPE-V). Rockville, MD: Author.
• Chan, K. M., & Yiu, E. M. (2002). The effect of anchors and training on the reliability of perceptual voice evaluation. Journal of Speech, Language, and Hearing Research, 45(1), 111-126.
• Eadie, T. L., & Doyle, P. C. (2002). Direct magnitude estimation and interval scaling of pleasantness and severity in dysphonic and normal speakers. Journal of the Acoustical Society of America, 112(6), 3014-3021.
• Kempster, G. B., Gerratt, B. R., Verdolini Abbott, K., Barkmeier-Kraemer, J., & Hillman, R. E. (2009). Consensus auditory-perceptual evaluation of voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, 18, 124-132.
• Kreiman, J., Gerratt, B. R., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. Journal of Speech and Hearing Research, 36(1), 21-40.
• Verdolini, K., Titze, I. R., & Fennell, A. (1994). Dependence of phonatory effort on hydration level. Journal of Speech and Hearing Research, 37(5), 1001-1007.