The Voice Lab: Is it just numbers?

Linda M. Carroll, PhD CCC-SLP
Private Practice, New York, NY
Senior Voice Scientist
The Children’s Hospital of Philadelphia
ASHA 2011
ABSTRACT:
Laryngeal function studies allow inspection of vocal tract control and coordination,
enhancing the clinical interpretation and recommendations for management. This course
provides a review of speech science principles, as well as accurate collection and
interpretation of acoustic and aerodynamic measures. Attendees will become
reacquainted with formulas for measures of fundamental frequency, gain an
understanding of the typical measures for various pathologies, explore protocols, and
understand the role of clinical data management, analysis and interpretation.
Summary:
The voice laboratory is an important tool for the laryngologist and therapist.
Acoustic and aerodynamic measures yield valuable information on laryngeal behavior,
function and compensation, and help guide the treatment plan. The results assist the team
in predicting medical/surgical consequences, determining voice/speech therapy needs,
and outlining the most efficacious treatment plan to restore optimum function.
Baken and Orlikoff (1) outline 6 rules for clinical measurement of voice function:
(1) measurements must have a known (or at least a very likely) and specific relationship
to recognized aspects of speech system physiology; (2) a measurement must have clear
relevance; (3) a measurement method should have a “history” in the literature; (4)
measurements must be thoroughly understood; (5) never trust a computer completely;
and (6) measurement should be limited to situations in which it is likely to be useful.
They add a final comment that “measurements can be no better than the knowledge and
skills of the clinician who chooses and obtains them.” Clinical measures include acoustic
measures, aerodynamic measures, and clinician and patient perceptual rating scales.
PROMs (Patient Reported Outcome Measures) are an important aspect of
determining the severity of perceived dysphonia by the adult patient, or by the caregiver
(for children). A patient who reports minimal disability due to their voice problem
warrants different management, goals, and objective measures than a patient with severe
perceived disability. There are myriad PROMs for adults, but very few for children.
In general, two indices are preferred to fully establish the impact of dysphonia. The two most
common PROMs are the Voice Handicap Index and the Pediatric Voice Handicap Index.
The Voice Handicap Index (2) probes the degree of vocal disability perceived by the patient
through 30 questions, 10 in each of the functional, physical and emotional domains. Children may
be assessed through the Pediatric Voice Handicap Index (PVHI). Mild dysphonia perceived
by the patient, coupled with moderate to severe objective measures and indices
of abnormality in a benign voice disorder, will alter and scale back the treatment plan.
Conversely, severe dysphonia perceived by the patient in the presence of near-normal
voice function requires the voice care team to manage the patient’s perception of
impairment delicately.
Acoustic measures include speaking fundamental frequency (SF0), physiological
frequency range (PFR), and perturbation measures as a bare minimum. Within the area of SF0,
the observed SF0 in a specified task is measured and compared both with normative data
and with the predicted SF0 for the patient’s overall physiologic capacity.
Although there is no absolute frequency for an individual to use, there is a general range
of efficient vocal production. Colton and Casper (3) and Fairbanks (4) report healthy voice
use at roughly 25% of the way up the individual’s physiological frequency range, counted in
semitones from the lowest F0. It is important that predicted SF0 be computed from the PFR and then compared
with the observed SF0 to make a reasonable judgment on the speaking “pitch.”
Because the vocal folds vibrate at rapid rates during conversational speech, most
clinicians take advantage of computer algorithms to extract SF0. When computing
F0 for the physiological low and physiological high, it is often necessary to use zoom-in
features on the signal. Many analysis programs make errors in F0 extraction due to
capture attributes or signal features. A trained ear is invaluable in the laboratory, but even
a trained ear needs to remember the frequency ranges of the various registers in order to
interpret the data. To compute the number of semitones over the PFR, frequencies
cannot simply be subtracted, because semitones are a logarithmic measure of the ratio
between two frequencies.
Interestingly, the PFR is the same for adults as children.
Jitter and shimmer are the two common perturbation measures in acoustic
analysis. Jitter is a measure of frequency instability, while shimmer is a measure of
amplitude instability. A normal voice has a small amount of instability during sustained
vowel production. Normal instabilities are influenced by tissue and muscle properties.
Large variations in perturbation values signal increased instability at the source
(laryngeal) level. The ratio of the noise component to the harmonic component (noise-to-harmonics
ratio, NHR) yields information on the ability of the individual to coordinate source and filter
acoustics. Perturbation measures vary between children and adults, and may vary among
professional voice users.
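The sketch below illustrates what jitter% and shimmer% express. It computes "local" perturbation from a list of cycle periods and cycle peak amplitudes. Local perturbation is the mean absolute difference between consecutive cycles divided by the mean. The sketch assumes the cycles have already been marked by a pitch-extraction program. It uses that common textbook definition, which is not necessarily the exact algorithm of any commercial analysis package. The function names and the cycle data are hypothetical.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, as a % of the mean.

    Applied to cycle periods this is local jitter%; applied to cycle peak
    amplitudes it is local shimmer%.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def jitter_percent(periods_s):
    """Frequency (cycle-to-cycle period) instability."""
    return local_perturbation_percent(periods_s)


def shimmer_percent(amplitudes):
    """Amplitude (cycle-to-cycle peak) instability."""
    return local_perturbation_percent(amplitudes)


# Hypothetical cycle data from a sustained /a/ near 220 Hz.
periods = [0.00455, 0.00452, 0.00457, 0.00454, 0.00456]
amps = [0.80, 0.78, 0.81, 0.79, 0.80]
print(f"jitter%  = {jitter_percent(periods):.3f}")
print(f"shimmer% = {shimmer_percent(amps):.3f}")
```

Because both measures share one formula, the same function serves for frequency and amplitude. Only the input series differs.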
Although maximum phonation time (MPT) measurement was commonplace in the past, the more elaborate
acoustic and aerodynamic measures are the current standard. Maximum phonation
remains a valuable probe in the therapy environment, and can yield important
information.
Aerodynamic measures yield important information on the patient’s ability to
coordinate respiratory drive from the power subsystem. Aerodynamic measures are
particularly important prior to laryngeal framework surgery, assigning a numeric value to
the observed glottal aperture abnormality. Aerodynamic measures are then repeated postoperatively, once the voice has stabilized, to document response to treatment. The primary
aerodynamic measures are transglottal flow, subglottal pressure and intensity. From
these three measures, laryngeal resistance, efficiency and power can be
calculated simply.
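The simple calculations mentioned above can be sketched as follows, assuming the usual definitions:

- laryngeal resistance is subglottal pressure divided by transglottal flow;
- aerodynamic power is pressure times flow, converted to SI units;
- efficiency is radiated acoustic power divided by aerodynamic power.

Acoustic power is taken as a given input. Deriving it from an intensity reading requires calibration and radiation assumptions not covered in this handout. The function names are hypothetical.

```python
PA_PER_CMH2O = 98.0665   # 1 cm H2O expressed in pascals
M3S_PER_LS = 0.001       # 1 L/sec expressed in cubic metres per second


def laryngeal_resistance(psub_cmh2o, flow_l_per_s):
    """Resistance in cm H2O/(L/sec): subglottal pressure / transglottal flow."""
    if flow_l_per_s <= 0:
        raise ValueError("flow must be positive")
    return psub_cmh2o / flow_l_per_s


def aerodynamic_power_w(psub_cmh2o, flow_l_per_s):
    """Aerodynamic power in watts: pressure (Pa) x flow (m^3/s)."""
    return (psub_cmh2o * PA_PER_CMH2O) * (flow_l_per_s * M3S_PER_LS)


def glottal_efficiency(acoustic_power_w, aero_power_w):
    """Dimensionless ratio of radiated acoustic power to aerodynamic power."""
    return acoustic_power_w / aero_power_w


# Pressure and flow from the sample report later in this handout.
resistance = laryngeal_resistance(10.05, 0.200)   # about 50.25 cm H2O/(L/sec)
power = aerodynamic_power_w(10.05, 0.200)         # about 0.197 W
print(round(resistance, 2), round(power, 3))
```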
Phonation threshold pressure is a valuable aerodynamic measure, yielding
information on the minimum subglottal pressure needed to initiate phonation, as well as on
the viscoelastic properties of the mucosa.
Intensity may be measured through a sound level meter, or through root mean
square (RMS) calculation, using a controlled mouth-to-microphone distance pre- and post-treatment.
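For the RMS route, level in decibels is 20·log10 of the RMS amplitude relative to a reference. The sketch below assumes samples normalized to ±1.0 and reports dB relative to full scale (dBFS). Converting to dB SPL would require a calibration offset measured with a sound level meter at the same mouth-to-microphone distance. That offset is represented here by a hypothetical `calibration_offset_db` parameter.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples."""
    if not samples:
        raise ValueError("no samples")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_db(samples, full_scale=1.0, calibration_offset_db=0.0):
    """RMS level in dB re full scale, plus an optional calibration offset.

    With calibration_offset_db=0.0 the result is dBFS; adding an offset
    obtained against a calibrated sound level meter (hypothetical here)
    gives an estimate of dB SPL.
    """
    return 20.0 * math.log10(rms(samples) / full_scale) + calibration_offset_db


# A full-scale 100 Hz sine sampled at 8 kHz has an RMS of about 0.707,
# which is roughly -3.01 dBFS.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
print(round(level_db(tone), 2))
```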
There have been specific recommendations by the National Center for Voice and
Speech (NCVS) (5) on management of acoustic data collection and analysis. The NCVS
Statement discusses the role of Type I, II, and III acoustic signals in explaining
abnormally high perturbation measures in the disordered voice. High perturbation
values may not be valid between subjects, but are relevant within subjects as a function of
treatment (pre-op vs. post-op).
Perturbation values may also be affected by the recording format. With the advent of electronic
media capture devices, attention continues to be warranted for microphone response, as
well as storage platform. Ultimately, the intended use of the data determines which
recording media are acceptable.
The protocol should be sufficient to meet the needs of the voice center, within its
available laboratory resources. Measurements should be taken in a quiet, sound-treated
room. Many centers have a minimum protocol which is then expanded based on the
demands of the specific disorder. Whenever possible, data should be kept in a database to simplify
pre-/post-treatment analysis for specific disorders and management plans.
References
1. Baken RJ, Orlikoff RF. Introduction. In: Clinical Measurement of Speech and Voice. 2nd ed. San Diego, CA: Singular Thomson Learning; 2000. p. 3.
2. Jacobson BH, Johnson A, Grywalski C, et al. The Voice Handicap Index (VHI): development and validation. Am J Speech Lang Pathol. 1997;6:66–70.
3. Colton R, Casper J. Vocal rehabilitation. In: Understanding Voice Problems: A Physiological Perspective for Diagnosis and Treatment. 2nd ed. Baltimore, MD: Williams and Wilkins; 1996. p. 311.
4. Fairbanks G. Pitch. In: Voice and Articulation Drillbook. New York: Harper and Bros.; 1940. pp. 168–170.
5. Titze IR. Workshop on Acoustic Voice Analysis: Summary Statement. Denver, CO: National Center for Voice and Speech; 1994.
Goals of course
Review of speech science principles for the voice laboratory:
Perceptual, acoustic, aerodynamic
Tips on accurate collection
•Protocol: standard and customized for pathology
Tips on data analysis
•formulae
Discussion of interpretation of data
Sample report
Clinical data collection system
•compilation, management, storage, and summary analysis
Acoustics and aerodynamic vs. perceptual
Baken and Orlikoff:
•measurements must have a known (or at least a very likely) and
specific relationship to recognized aspects of speech system physiology
•a measurement must have clear relevance
•a measurement method should have a “history” in the literature
•measurements must be thoroughly understood
•never trust a computer completely
•measurement should be limited to situations in which it is likely to be
useful
•“measurements can be no better than the knowledge and skills of the
clinician who chooses and obtains them.”
Acoustic/Aerodynamic
•Type 1, Type 2, Type 3 signals
Perceptual
•GRBAS
•Patient perceptual instruments
•Caregiver perceptual instruments

•measurements
Data Capture: Signal types (NCVS - Titze)
http://www.ncvs.org/museum-archive/downloadables.html
Type 1 = nearly periodic signals; noise energy is below the F0 energy level
Type 2 = subharmonics and modulating frequencies approach the F0 energy level; no obvious single F0
Type 3 = signals with no apparent periodic structure; regular irregularity, perceived chaos present
Analysis of Voice by Signal Type
Perceptual Instruments
Perception = psychological representation of a physical stimulus [Sapienza & Ruddy 2009]
Perception is often formed based on a variety of factors: age, sex,
language, culture, intrinsic and extrinsic bias, etc.
“More often than not, the physiologic process of the voice disorder
does not match the perceptual description of the voice quality, and
more often than not listeners do not agree with each other very well”
[Shrivastav & Sapienza 2003; Sapienza & Ruddy 2009]
•Minimize errors by using an ordinal scale or visual analog scale
Perceptual Instruments
Voice Handicap Index
•VHI, VHI-10
Singing Voice Handicap Index (sVHI)
Pediatric Voice Handicap Index (pVHI)
Voice Symptom Scale (VoiSS)
Voice Related Quality of Life Index (VRQOL)
Pediatric Voice Related Quality of Life (PVRQOL)
Buffalo Rating Voice Profile
GRBAS
CAPE-V

Why do we need numbers?
Aerodynamic and acoustic measures allow objective comparison of
subject to expected values
Measures provide numeric value to function
A knowledgeable clinician can interpret objective measures and relate
patient performance to the laryngeal exam, subjective evaluation, and
patient history/symptoms/severity, offering invaluable pre/post
treatment insight
Numbers allow statistical comparison

Question: How would you answer this email?
“I have a question.. sometimes the acoustic analysis shows high (and
even very high) Jitter, but the NHR doesn’t show red on the monitor..
the reading of NHR is 15 for example and still show green. When I
open the data it shows that the reading of 15 is above the normal
range but this doesn’t show on the diagram circle. Perceptually the
voice is hoarse or very hoarse, do you have an explanation for this or
should I contact [the manufacturer]??”
Measurement and Diagnostic Needs
Aerodynamic measures yield information on glottal competence and
compensation
•paralysis, neurological voice/speech, lesions which compromise
posterior and membranous vf
Acoustic measures yield information on glottal behavior and stability
•membranous lesions (nodules, cyst, polyp), paralysis, neurological
voice/speech
Aerodynamic and acoustic measures should be congruent with
perceptual judgments and laryngeal observations, and should guide
patient management

The Basics
Source characteristics: SF0, intensity
Source capacity/use: PFR, VRP, MPT
Source stability: perturbation, tremor
Power-source coordination and compensation: flow, Psub, resistance
Source-filter coordination and compensation: NHR, spectrogram
Jitter: frequency instability
Elevated for vocal fold edge abnormalities and disorders that compromise cricothyroid (CT) function
Shimmer: amplitude instability
Elevated for disorders that interfere with medial-lateral wave propagation
Objective Report continued …
Objective Report continued …
Source function acoustic measures
Range Capacity
•Expected SF0 (normative)
•Observed SF0 (conversation and reading task)
•Predicted SF0 (based on statistical calculation)
•Dynamic range (Voice Range Profile-30dB SPL)
•Physiological Frequency Range of Phonation (PFR or PFRP)
Valving Capacity
•S/Z ratio
•Maximum Phonation Time (MPT)
Speaking Fundamental Frequency (SF0)
Rote task (name, date)
Conversational speech sample
Reading passage
Typical values are ~120 Hz for adult males, 220 Hz for adult females.
SF0 is generally 25% of the PFR (counted in semitones) above the lowest note in the vocal range
Computation of highest PFR
Hz = 1/period (period in seconds)
e.g., 1/0.00365 = 273.97 Hz
Computation of lowest PFR
Hz = 1/period
e.g., 1/0.01356 = 73.75 Hz
Computation of SF0 from spectrogram
Narrowband spectrogram
Lowest harmonic is SF0
Subharmonics may also be seen if present
Computer calculation of SF0
Filters are set to predict likely F0 (typical maximum is 1000 Hz), and the
computer will look for the most likely frequency in the input signal. If the actual
signal is higher than the filter setting, the computer will still calculate measures based on
the most likely frequency of the input signal.
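How a program "looks for the most likely frequency" varies by vendor. One common textbook approach is autocorrelation within a preset search range, and the sketch below uses that generic technique. It is not the algorithm of any particular commercial program. The `max_f0` limit plays the role of the filter setting described above (1000 Hz, as in the text); the 60 Hz `min_f0` is an assumption. A true F0 outside the range cannot be returned. The estimator reports whichever in-range period fits best, which is one way extraction errors arise.

```python
import math


def estimate_f0(samples, fs, min_f0=60.0, max_f0=1000.0):
    """Return an F0 estimate in Hz, or None if no periodicity is found.

    Generic autocorrelation method: the period is taken as the lag of the
    strongest positive autocorrelation peak between fs/max_f0 and fs/min_f0.
    """
    n = len(samples)
    min_lag = max(1, int(fs / max_f0))   # shortest period searched
    max_lag = int(fs / min_f0)           # longest period searched
    if max_lag + 1 >= n:
        raise ValueError("frame too short for the lowest F0 searched")
    mean = sum(samples) / n
    x = [s - mean for s in samples]      # remove any DC offset

    # Biased autocorrelation (divided by n): slightly favours shorter lags,
    # which helps avoid picking a multiple of the true period.
    def r(lag):
        return sum(x[i] * x[i + lag] for i in range(n - lag)) / n

    acf = [r(lag) for lag in range(max_lag + 2)]
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > best_val:
            best_lag, best_val = lag, acf[lag]
    return None if best_lag is None else fs / best_lag


fs = 8000
frame = [math.sin(2 * math.pi * 200 * k / fs) for k in range(800)]  # 0.1 s at 200 Hz
print(estimate_f0(frame, fs))  # prints 200.0
```

The estimate is quantized to whole-sample lags. At 8 kHz, a voice near 220 Hz can be off by a few hertz, which is one reason zooming in on the waveform remains useful.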
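Formula for semitone (ST) range [use calculator in scientific view]
$\mathrm{ST} = 39.86 \times \log_{10}(f_1 / f_2)$, where $f_1$ = highest frequency and $f_2$ = lowest frequency
•e.g., $\mathrm{ST} = 39.86 \times \log_{10}(273.97 / 73.75)$
•$\mathrm{ST} = 39.86 \times 0.5699 = 22.72$
Subject’s PFRP is 22.7 semitones [normal is 36 ST (Hollien, Dew and Phillips, 1971)]
$\mathrm{SF0}_{\mathrm{pred}} = 1.059463^{n} \times F0_{\mathrm{low}}$ [where $n$ = 25% of total ST]
•$n = 0.25 \times 22.7 = 5.675$
•$\mathrm{SF0}_{\mathrm{pred}} = 1.059463^{5.675} \times 73.75 = 102.36$ Hz
•Predicted SF0 is 102.36 Hz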
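The calculator steps above can be reproduced in a few lines. The sketch rests on three assumptions:

- periods are in seconds, as in the worked example;
- 39.86 = 12/log10(2) converts a frequency ratio into semitones, so 12·log2 gives the same result;
- 1.059463 = 2^(1/12) is the frequency ratio of one semitone.

The function names are hypothetical.

```python
import math


def period_to_hz(period_s):
    """F0 in Hz from one cycle period in seconds."""
    return 1.0 / period_s


def semitone_range(f_high, f_low):
    """Semitones between two frequencies (same as 39.86 * log10(ratio))."""
    return 12.0 * math.log2(f_high / f_low)


def predicted_sf0(f_low, st_range, fraction=0.25):
    """Predicted SF0: f_low raised by `fraction` of the PFR in semitones."""
    n = fraction * st_range
    return f_low * 2.0 ** (n / 12.0)   # 2 ** (1/12) = 1.059463...


f_high = period_to_hz(0.00365)   # about 273.97 Hz
f_low = period_to_hz(0.01356)    # about 73.75 Hz
st = semitone_range(f_high, f_low)
print(f"PFR: {st:.2f} ST, predicted SF0: {predicted_sf0(f_low, st):.2f} Hz")
```

Carrying full precision instead of the rounded 22.7 ST shifts the predicted SF0 by only a few hundredths of a hertz.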
[Figure: formants shown on FFT and spectrogram displays]
Yanagihara Hoarseness Rating
Narrowband spectrogram evaluation of sustained vowel
•Type 1: Regular harmonic components are mixed with the noise component chiefly in formant regions (F1, F2, F3)
•Type 2: Noise dominates harmonic components for F2 for /i/
•Type 3: F2 for /i/ is replaced by noise, and noise above 3 kHz increases
•Type 4: F2 for /a, i/ replaced by noise; F1 for all vowels has loss of periodic components
[Figure: Yanagihara ratings, females and males]
[Figure: Voice Range Profile]
Protocols and Cautions
Standardization
•Protocol: high/low frequency, soft/loud intensity, conversation/sustained phonation
Hypothesis specific
•Physical set-up
o Microphones: 45° angle at 3-6 cm from mouth
Acoustic signal types (NCVS - Titze): 1, 2, 3
Data capture accurate and reliable
Realistic report format
Interpretation (not just a summary) of data to describe biomechanics
vs. patient perception. Data can assist with therapy goals/rationale

Acoustic Protocol
SF0 and intensity of conversational/rote speech, read text
•Microphone at 3-6 cm, sound level meter at 30 cm
Perturbation of sustained /a/ for ~2 secs
•Modal voice
Physiological range: glide down/up from midrange on /a/
•Maximum, excluding vocal fry, including falsetto
•Monitor clipping of critical data
Intensity range (VRP): softest/loudest on /a/ at 30 cm for C, E, G, A (each octave)
Perturbation data sampling
Record 3 sustained /a/
Trim first 500 ms (avoid onset and offset of phonation, which have
inherent instabilities; avoid change of vowel/phoneme; avoid
consonants)
Analyze next 1 sec
Average values for 3 trials
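The sampling rule above skips the first 500 ms, analyzes the next 1 s, and averages three trials. It can be sketched as below, assuming each trial is a list of samples at a known sampling rate. The `measure` argument is a hypothetical placeholder for whatever per-window value the analysis program reports (jitter%, shimmer%, NHR, and so on). The function names are hypothetical as well.

```python
from statistics import mean


def analysis_window(samples, fs, skip_s=0.5, length_s=1.0):
    """Drop the first `skip_s` seconds and return the next `length_s` seconds."""
    start = int(round(skip_s * fs))
    stop = start + int(round(length_s * fs))
    if stop > len(samples):
        raise ValueError("token shorter than skip_s + length_s")
    return samples[start:stop]


def averaged_measure(trials, fs, measure):
    """Apply `measure` to each trial's analysis window and average the results."""
    if not trials:
        raise ValueError("no trials")
    return mean(measure(analysis_window(t, fs)) for t in trials)
```

The skip and window lengths are left as parameters because the sample report later in this handout analyzes the mid 1.5 sec rather than 1 sec.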
Data Capture
•Input too soft (under-sampling)
•Input too high (peak clipping and artificial instability)
•Input okay
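A quick screen for these capture problems could check two things. Samples at or near full scale indicate clipping. A peak level that uses only a small fraction of the available range indicates input that is too soft. The thresholds below are illustrative assumptions, not published criteria: 99.9% of full scale for clipping and a peak under 10% of full scale for too-soft input. The function name is hypothetical.

```python
def capture_check(samples, full_scale=1.0, clip_level=0.999, low_peak=0.10):
    """Return 'clipped', 'too soft', or 'okay' for normalized samples."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= clip_level * full_scale)
    if clipped > 0:
        return "clipped"     # flat-topped peaks add artificial instability
    if peak < low_peak * full_scale:
        return "too soft"    # little of the converter's range is being used
    return "okay"
```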
Acoustic Pitfalls
Input signal too low (under-sampling)
Input signal too loud (peak clipping)
Atypical F0 or loudness during tasks
Inaccurate (or inadequate) PFR sample
Incorrect cuing by clinician
Patient unable to follow directions
File formats
.nsp or .wav for KayPentax programs
MP3: transfer to .wav using
•GoldWave www.goldwave.com
•Audacity http://www.download.com/Audacity/3000-2170_410058117.html
•BonkEnc (also supports FLAC): http://www.bonkenc.org
Power - Source Measures
Pulmonary Function
FVC
•forced vital capacity
Inspiratory loop for PVFM
FEV 1.0
•forced expiratory volume in first 1 second of exhalation
FEF 25%-75%
•prime indicator for obstructive lung disease
Mean Flow Rate (MFR)
Aerodynamic Protocol
Mask and pressure tube to measure laryngeal aerodynamics during
sustained 7-syllable /pa/
•Monitor subject phonation and effort level
•Comfortable loudness on syllable train vs. PTP (softest w/o whisper)
•Adjust input levels to expected values
Avoid peak clipping and undersampling
Other means to capture approximate flow rates:
Maximum phonation
•Aerodynamic program, acoustic recording or stopwatch
•Best of 3 trials
[Figure: Aerodynamic signals: flow and PTP]
[Figure: Aerodynamic signals: MPT]
Aerodynamic Pitfalls
Failure to calibrate equipment
Failure to adjust sampling range for flow and pressure
Mask leakage
•Offset of transglottal flow at baseline
Voicing of /p/ (“p” changes to “b”)
Saliva in pressure tube
Abnormal effort by subject
Incorrect cuing by clinician
Patient unable to follow directions
Expected normal values: acoustic measures
SF0: 120 Hz for males, 220 Hz for females [predicted SF0 from PFR at 25% of PFRP]
PFR: 36 semitones (excluding vocal fry, including falsetto/head voice)
Jitter%: <0.589% males, <0.633% females; pediatrics: <1.24%
Shimmer%: <2.523% males, <1.997% females; pediatrics: <3.35%
NHR: <0.122 males, <0.112 females; pediatrics: <0.11
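Checking a patient's values against these cutoffs can be scripted, which avoids relying on an on-screen color code like the one described in the email above. The sketch hard-codes the upper limits listed here. The dictionary layout and the function name are hypothetical. In practice, the norms supplied with the analysis program in use may differ.

```python
# Upper limits of normal from the list above.
ADULT_LIMITS = {
    "male": {"jitter_pct": 0.589, "shimmer_pct": 2.523, "nhr": 0.122},
    "female": {"jitter_pct": 0.633, "shimmer_pct": 1.997, "nhr": 0.112},
}
PEDIATRIC_LIMITS = {"jitter_pct": 1.24, "shimmer_pct": 3.35, "nhr": 0.11}


def flag_measures(measures, sex=None, pediatric=False):
    """Return {measure: 'elevated' or 'wnl'} using the listed upper limits.

    For adults, `sex` must be 'male' or 'female'.
    """
    limits = PEDIATRIC_LIMITS if pediatric else ADULT_LIMITS[sex]
    return {
        name: ("elevated" if value >= limits[name] else "wnl")
        for name, value in measures.items()
        if name in limits
    }


# Acoustic values from the sample report (adult female).
print(flag_measures({"jitter_pct": 1.375, "shimmer_pct": 3.807, "nhr": 0.113}, sex="female"))
```

A strict cutoff flags the sample report's NHR of 0.113 as elevated, whereas the report treats it as within normal limits. Borderline values still call for clinical judgment.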
Expected standard values: aerodynamic measures
Intensity: 70.4 dB (3.1) males, 68.2 dB (2.51) females
Transglottal flow: 0.100-0.200 L/sec males/females; pediatrics: 72-180 ml/sec [depending on age]
Subglottal pressure: 6.43 cmH2O (1.07) males, 7.52 cmH2O (2.17) females; pediatrics: 7.4-9.29 cmH2O for medium loudness, depending on age
Resistance: 56.8 Ns/m5 males, 81.8 Ns/m5 females [41 cmH2O/(L/sec)]
Maximum phonation: 34.6 secs males, 25.7 secs females; pediatrics: 6-22 sec, depending on age
Dilemma in acoustic analysis programs
Choose a program that is “user-friendly” and easily compared with other
centers, but has a start-up cost
Choose a program that is not so “user-friendly” but still very
dependable, and can be compared with other centers, but has little start-up cost
Caveat: how tech-savvy are you? Do you just need numbers, or need to defend
outcomes in peer-review? How research-minded are you?
Expected Adult Biomechanics: Paralysis
Aerodynamics: elevated flow, reduced resistance, reduced maximum phonation
Acoustics: increased jitter, shimmer, NHR; reduced PFR, reduced loudness
Expected Adult Biomechanics: Polyp
Aerodynamics: reduced resistance
Acoustics: increased shimmer, increased jitter, reduced PFR
Expected Adult Biomechanics: Nodules
Aerodynamics: slight increase in airflow, may have accompanying higher pressures
Acoustics: increased jitter, may have increased shimmer, lowered F0, lower PFR
Expected Adult Biomechanics: Cyst
Aerodynamics: flows may be unstable
Acoustics: elevated shimmer, may have elevated jitter
Expected Adult Biomechanics: Hyperfunction
Aerodynamics: reduced flow, increased pressure
Acoustics: NHR may be elevated, intensity range reduced, PFR reduced
Sample Report
Linda M Carroll PhD CCC-SLP
Speech-Language Pathologist
Voice/Speech Disorders, Acoustic/Aerodynamic Assessment, Vocal rehabilitation and retraining
NPI# 1073727574
424 West 49th Street, Suite 1
New York, NY 10019
TEL: 212-459-3929 FAX: 212-459-2585
Email: [email protected]
Laryngeal function studies: 92520-59
Date of Service: 12/10/09
Patient: L.C.
File# L091009
Tel: 901.497.9326
Physician: Drs. PW and JP
Pediatrician: DR
Occupation: sophomore music theatre major
Date of Birth 4/25/90
Sex: female
Diagnosis: nodules, allergic rhinitis, tonsillitis
Professional Voice User: yes
Laryngeal function studies were obtained to determine severity of dysphonia secondary to enlarged tonsils and small soft nodules.
Patient reports continued episodes of tonsillitis and a feeling of increased vocal effort when singing through enlarged tonsils.
Measures were obtained in a quiet room that met or exceeded ANSI 1977 requirements. KayPentax Multispeech and KayPentax
Phonatory Aerodynamic System were used in conjunction with a Radio Shack analog sound level meter. Acoustic data were obtained
at 3 cm mouth-microphone distance. Perturbation measures represent an average of three tokens, with analysis of the mid 1.5 sec.
Aerodynamic data for flow and subglottal pressure were calculated from the midportion of the token.
Summary of findings:
Parameter | Observed | Normal values | Comment
Speaking F0 | 206.638 Hz | 200-225 Hz | wnl
Physiologic F0 range | 38.38 ST | 36 ST | wnl
Jitter% | 1.375% | 0.633% | Elevated, and may be related to nodular edema or tonsils
Shimmer% | 3.807% | 1.997% | Elevated, and may be related to nodular edema or tonsils
Noise:harmonic | 0.113 | 0.112 | wnl
Voicing | 100% | 100% | wnl
Voice turbulence index | 0.045 | 0.046 | wnl
Maximum phonation time | 12.63 secs | 25.7 secs | Reduced
Transglottal flow | 0.200 L/sec | 0.150 L/sec | Slightly elevated flow, consistent with nodular edema
Subglottal pressure | 10.05 cm H2O | 7.52 cm H2O | Significantly increased effort, consistent with enlarged mass (tonsils)
Phonation threshold pressure | 4.16 cm H2O | 3-5 cm H2O | Increased effort for singer (but wnl for nonsinger), consistent with enlarged mass
GRBAS | G1R1B1A1S0 | G0R0B0A0S0 | Mild dysphonia
Interpretation of Findings:
Patient presents pre-operatively with increased effort to achieve voicing due to small soft nodules and enlarged bilateral tonsils.
Acoustic and aerodynamic measures are elevated for frequency instability, amplitude instability and subglottal pressure. It is unlikely
that these abnormalities are a result of laryngeal findings alone. It is more likely that observed measures are a result of the coupling of
mild instability of frequency and amplitude at the glottal level with a marked increase in these source signal irregularities as the signal
travels through the supraglottic tract and past the irregular, enlarged tonsillar tissue. This is supported by only slightly elevated flow
but markedly elevated subglottal pressure. Due to the patient’s training, phonation threshold pressure would be expected to be below 3
cm H2O (singers), but her results suggest continued effort to achieve phonation. Overall dysphonia appears to be related primarily to the
supraglottic mass, and secondarily to the laryngeal mass. Surgery is warranted.
Linda M Carroll PhD CCC-SLP
Clinical data collection system
Database Defined
A collection of data arranged for ease and speed of search and retrieval
Compilation
Management
Storage
Analysis
Medical Databases
Noted for an immense knowledge base
Used to acquire patient care data and to communicate patient care
management information
Provide rapid communication
Reduce manpower needs
Link information with other sites
Many clinicians and hospitals still rely on pen and paper to store data
Microsoft ACCESS
Data Entry
Functional Form Designs
Data Management
Central repository
Simplify security and archiving efforts
Data Analysis
Simple analytical capabilities
Ease of exportation to other analytical systems
•SPSS, Excel
Report writing
Customized Design involving text and/or graphics


ACCESS
Creating a file/setting an index order
Creating data entry forms and coding
Updating and editing information
Viewing and querying data
Designing report forms and report writing
Providing simple analytical capabilities (ease of exportation into
other analytical systems like Excel and SPSS).
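This handout's system is built in Microsoft ACCESS. The sketch below illustrates the same idea in a tool-neutral way: one table of visits, then a pre/post query. It uses Python's built-in SQLite module instead of ACCESS. The table name, column names, and rows are hypothetical. A real clinical database would also need the security and HIPAA safeguards listed under the costs of a CIS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # use a file path for a persistent database
conn.execute(
    """CREATE TABLE visits (
           patient_id TEXT,
           visit_date TEXT,      -- ISO format, e.g. 2010-01-05
           phase TEXT,           -- 'pre' or 'post'
           diagnosis TEXT,
           jitter_pct REAL,
           shimmer_pct REAL,
           psub_cmh2o REAL)"""
)

# Made-up example rows; P002 has no post-treatment visit yet.
rows = [
    ("P001", "2010-01-05", "pre", "nodules", 1.20, 3.50, 9.80),
    ("P001", "2010-04-06", "post", "nodules", 0.60, 2.00, 7.50),
    ("P002", "2010-02-10", "pre", "nodules", 0.90, 2.60, 8.10),
]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Average pre-to-post change for one diagnosis. The self-join keeps only
# patients with both visits and assumes one pre and one post visit each.
query = """
SELECT COUNT(*),
       AVG(post.jitter_pct - pre.jitter_pct),
       AVG(post.psub_cmh2o - pre.psub_cmh2o)
FROM visits AS pre
JOIN visits AS post ON pre.patient_id = post.patient_id
WHERE pre.phase = 'pre' AND post.phase = 'post' AND pre.diagnosis = ?
"""
n, d_jitter, d_psub = conn.execute(query, ("nodules",)).fetchone()
print(n, round(d_jitter, 3), round(d_psub, 2))
```

Exporting query results to Excel or SPSS is a matter of writing these rows to a CSV file.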
Objective Voice Report
General Demographics
Patient ID
Name, DOB, age, occupation, DX, Date
Referring Physician
Referred for
•pre/post-surgery, pre/post therapy, pre/post botox
Subjective Measures
•GRBAS, Voice Handicap Index (VHI), CAPE-V


Benefits of CIS
Benefits of a Clinical Information System (“CIS”)
Structured compilation of clinical data for use in comprehensive
research studies
Benefits applicable to almost all clinical settings
Similar daily workloads and constraints on employees
Similar need for research data
Formalized databases may foster cooperative research
Ease of duplication and sharing
Merging of individual databases developed by multiple researchers
cooperating on mutual projects

Costs of CIS
Capital Outlay
Software and Hardware costs
Maintenance
Time Commitment
Learning/Training
Database/Form/Report Designs
Database Management
Archiving/Backup
Updating
Security
•HIPAA Compliance
Final thoughts on lab measures
The patient history should support the dx
Laryngeal appearance should support the dx
Objective measures should support the dx
Perceptual judgments should support the dx
Treatment should be based on patient voice/speech needs
Documentation should be representative of professional training and
medical requirements
Contact Information
Linda M. Carroll, PhD CCC-SLP
[email protected]
212.459.3929 office
212.459.2585 fax
424 West 49th Street, Suite 1
New York, New York 10019
USA