The Voice Lab: Is it just numbers?

Linda M. Carroll, PhD CCC-SLP
Private Practice, New York, NY
Senior Voice Scientist, The Children’s Hospital of Philadelphia
ASHA 2011

ABSTRACT: Laryngeal function studies allow inspection of vocal tract control and coordination, enhancing the clinical interpretation and recommendations for management. This course provides a review of speech science principles, as well as accurate collection and interpretation of acoustic and aerodynamic measures. Attendees will become reacquainted with formulas for measures of fundamental frequency, gain an understanding of the typical measures for various pathologies, explore protocols, and understand the role of clinical data management, analysis and interpretation.

Summary: The voice laboratory is an important tool for the laryngologist and therapist. Acoustic and aerodynamic measures yield valuable information on laryngeal behavior, function and compensation, and help guide the treatment plan. The results assist the team in predicting medical/surgical consequences, determining voice/speech therapy needs, and outlining the most efficacious treatment plan to restore optimum function.

Baken and Orlikoff (1) outline 6 rules for clinical measurement of voice function: (1) measurements must have a known (or at least a very likely) and specific relationship to recognized aspects of speech system physiology; (2) a measurement must have clear relevance; (3) a measurement method should have a “history” in the literature; (4) measurements must be thoroughly understood; (5) never trust a computer completely; and (6) measurement should be limited to situations in which it is likely to be useful. They add a final comment that “measurements can be no better than the knowledge and skills of the clinician who chooses and obtains them.”

Clinical measures include acoustic measures, aerodynamic measures, and clinician and patient perceptual rating scales.
PROMs (Patient-Reported Outcome Measures) are an important aspect of determining the severity of dysphonia as perceived by the adult patient, or by the caregiver (for children). A patient who reports minimal disability due to their voice problem warrants different management, goals, and objective measures than a patient with severe perceived disability. There are a myriad of PROMs for adults, but very few for children. In general, two indices are preferred to fully establish the impact of dysphonia. The two most common PROMs are the Voice Handicap Index and the Pediatric Voice Handicap Index. The Voice Handicap Index (2) probes the degree of vocal disability perceived by the patient through 30 questions, 10 each relating to functional, physical and emotional areas. Children may be assessed through the Pediatric Voice Handicap Index (PVHI).

Mild perceived dysphonia by the patient, coupled with moderate to severe objective measures and indices of abnormality in a benign voice disorder, will alter and minimize the treatment plan. Conversely, severe perceived dysphonia by the patient in the presence of near normal voice function causes the voice care team to delicately manage the patient’s perception of impairment.

Acoustic measures include speaking fundamental frequency (SF0), physiological range, and perturbation measures as a bare minimum. Within the area of SF0, measures are taken of the observed SF0 in a specified task and compared with the normative data and with the predicted SF0 for the patient’s overall physiologic capacity. Although there is no absolute frequency for an individual to use, there is a general range of efficient vocal production. Colton and Casper (3) and Fairbanks (4) report healthy voice use at roughly 25% above the lowest SF0 of the individual’s physiological frequency range.
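The 25% rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: the function names are hypothetical, and the input frequencies (73.75 Hz and 273.97 Hz) are the physiological low and high from the worked example that appears later in this handout. The expression 12 x log2(ratio) is equivalent to the handout's 39.86 x log10(ratio).

```python
import math

def semitone_range(f_high_hz: float, f_low_hz: float) -> float:
    """Semitones between two frequencies: 12 * log2(f_high / f_low)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

def predicted_sf0(f_low_hz: float, st_range: float, fraction: float = 0.25) -> float:
    """Frequency lying `fraction` of the semitone range above the lowest note."""
    n = fraction * st_range            # number of semitones above the lowest note
    return f_low_hz * 2.0 ** (n / 12.0)  # each semitone multiplies frequency by 2^(1/12)

# Physiological low/high taken from the handout's worked example.
pfr_st = semitone_range(273.97, 73.75)
sf0_pred = predicted_sf0(73.75, pfr_st)
print(f"PFR: {pfr_st:.2f} ST, predicted SF0: {sf0_pred:.1f} Hz")
```

The result differs from the handout's 102.36 Hz only in the second decimal place, because the handout rounds the range to 22.7 ST before taking 25% of it.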
It is important that predicted SF0 be computed from the physiological frequency range (PFR) and then compared with the observed SF0 to make a reasonable judgment on the speaking “pitch.” Because the vocal folds vibrate at rapid rates during conversational speech, most clinicians will take advantage of computer algorithms to extract SF0. When computing F0 for the physiological low and physiological high, it is often necessary to use zoom-in features on the signal. Many analysis programs demonstrate errors in F0 extraction due to capture attributes or signal features. A trained ear is invaluable in the laboratory, but even a trained ear needs to remember the frequency ranges of the various registers in order to interpret the data. To compute the number of semitones over the PFR, frequencies cannot simply be subtracted, because of the logarithmic relationship between frequencies; the ratio of the two frequencies is used instead. Interestingly, the PFR is the same for adults as children.

Jitter and shimmer are the two common perturbation measures in acoustic analysis. Jitter is a measure of frequency instability, while shimmer is a measure of amplitude instability. A normal voice has a small amount of instability during sustained vowel production. Normal instabilities are influenced by tissue and muscle properties. Large variations in perturbation values signal increased instability at the source (laryngeal) level. The ratio of the noise component to the harmonic component (NHR) yields information on the ability of the individual to coordinate source and filter acoustics. Perturbation measures vary between children and adults, and may vary among professional voice users.

Although maximum phonation was commonplace in the past, the more elaborate acoustic and aerodynamic measures are the current standard. Maximum phonation remains a valuable probe in the therapy environment, and can yield important information.

Aerodynamic measures yield important information on the patient’s ability to coordinate respiratory drive from the power subsystem.
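The jitter and shimmer definitions given above can be made concrete with a short Python sketch. It assumes one common definition (the mean absolute cycle-to-cycle difference, expressed as a percentage of the mean value); commercial analysis programs may use other variants, so values from this sketch are not directly comparable with a program's norms. The function name and the cycle data are hypothetical.

```python
from statistics import mean

def relative_perturbation_percent(values: list[float]) -> float:
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)

# Hypothetical cycle-by-cycle data, for illustration only.
periods_ms = [5.00, 5.02, 4.98, 5.01, 4.99]        # duration of each glottal cycle
peak_amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]   # peak amplitude of each cycle

jitter_pct = relative_perturbation_percent(periods_ms)        # frequency instability
shimmer_pct = relative_perturbation_percent(peak_amplitudes)  # amplitude instability
print(f"jitter {jitter_pct:.2f}%, shimmer {shimmer_pct:.2f}%")
```

The same function serves both measures because jitter and shimmer differ only in what is tracked from cycle to cycle: period for jitter, amplitude for shimmer.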
Aerodynamic measures are particularly important prior to laryngeal framework surgery, allowing a numeric value to be assigned to the observed glottal aperture abnormality. Aerodynamic measures are then repeated postoperatively, once the voice has stabilized, to document response to treatment. The primary aerodynamic measures are transglottal flow, subglottal pressure and intensity. From these three measures, laryngeal resistance, efficiency and power can be determined by simple calculation. Phonation threshold pressure is a valuable aerodynamic measure, yielding information on laryngeal resistance to initiate phonation, as well as viscoelastic properties of the mucosa. Intensity may be measured through a sound level meter, or through root mean square (RMS) calculation, using a controlled mouth-to-microphone distance pre- and post-treatment.

There have been specific recommendations by the National Center for Voice and Speech (NCVS) (5) on management of acoustic data collection and analysis. The NCVS Statement discusses the role of Type I, II, and III acoustic signals to help explain abnormally high perturbation measures with the disordered voice. High perturbation values may not be valid between subjects, but are relevant within subjects as a function of treatment (pre-op vs. post-op). Perturbation may arise from the recording format. With the advent of electronic media capture devices, attention continues to be warranted for microphone response, as well as storage platform. In the end, the data needs will influence the acceptable recording media.

The protocol should be sufficient to meet the needs of the voice center, including available laboratory resources. Measurements should be taken in a quiet, sound treated room. Many centers have a minimum protocol which is then expanded based on the specific disorder demands. Whenever possible, data should be kept in a database to aid in easy pre-post analysis for specific disorders and management plans.

References
1. Baken RJ, Orlikoff RF. Introduction. In: Clinical Measurement of Speech and Voice (second ed). San Diego, CA: Singular Thomson Learning, 2000. p. 3.
2. Jacobson BH, Johnson A, Grywalski C, et al. The voice handicap index (VHI): development and validation. Am J Speech Lang Pathol 1997;6:66–70.
3. Colton R, Casper J. Vocal rehabilitation. In: Understanding voice problems: a physiological perspective for diagnosis and treatment (second ed). Baltimore, MD: Williams and Wilkins, 1996. p. 311.
4. Fairbanks G. Pitch. In: Voice and Articulation Drillbook. New York: Harper and Bros., 1940. p. 168-170.
5. Titze IR. Workshop on Acoustic Voice Analysis: Summary Statement. Denver, CO: National Center for Voice and Speech, 1994.

Goals of course
Review of speech science principles for the voice laboratory: perceptual, acoustic, aerodynamic
Tips on accurate collection
• Protocol: standard and customized for pathology
Tips on data analysis
• formulae
Discussion of interpretation of data
Sample report
Clinical data collection system
• compilation, management, storage, and summary analysis

Acoustics and aerodynamic vs. perceptual
Baken and Orlikoff:
• measurements must have a known (or at least a very likely) and specific relationship to recognized aspects of speech system physiology
• a measurement must have clear relevance
• a measurement method should have a “history” in the literature
• measurements must be thoroughly understood
• never trust a computer completely
• measurement should be limited to situations in which it is likely to be useful.
• “measurements can be no better than the knowledge and skills of the clinician who chooses and obtains them.”
Acoustic/Aerodynamic
• Type 1, Type 2, Type 3 signals
Perceptual
• GRBAS
• Patient perceptual instruments
• Caregiver perceptual instruments
• measurements

Data Capture: Signal types (NCVS - Titze)
http://www.ncvs.org/museum-archive/downloadables.html
Type 1 = nearly periodic signals; noise energies are less than F0 energy level
Type 2 = subharmonics and modulating frequencies approach F0 energy level; no obvious single F0
Type 3 = signals with no apparent periodic structure; regular irregularity, perceived chaos present

Analysis of Voice by Signal Type

Perceptual Instruments
Perception = psychological representation of a physical stimulus [Sapienza & Ruddy 2009]
Perception is often formed based on a variety of factors: age, sex, language, culture, intrinsic and extrinsic bias, etc.
“More often than not, the physiologic process of the voice disorder does not match the perceptual description of the voice quality, and more often than not listeners do not agree with each other very well” [Shrivastav & Sapienza 2003; Sapienza & Ruddy 2009]
• Minimize errors by using an ordinal scale or visual analog scale

Perceptual Instruments
Voice Handicap Index
• VHI, VHI-10
Singing Voice Handicap Index (sVHI)
Pediatric Voice Handicap Index (pVHI)
Voice Symptom Scale (VoiSS)
Voice Related Quality of Life Index (VRQOL)
Pediatric Voice Related Quality of Life (PVRQOL)
Buffalo Rating Voice Profile
GRBAS
CAPE-V

Why do we need numbers?
Aerodynamic and acoustic measures allow objective comparison of subject to expected values
Measures provide numeric value to function
Knowledgeable clinician can interpret objective measures and relate patient performance with laryngeal exam, subjective evaluation, and patient history/symptoms/severity, offering invaluable pre/post treatment insight
Numbers allow statistical comparison

Question: How would you answer this email?
“I have a question.. sometimes the acoustic analysis shows high (and even very high) Jitter, but the NHR doesn’t show red on the monitor.. the reading of NHR is 15 for example and still show green. When I open the data it shows that the reading of 15 is above the normal range but this doesn’t show on the diagram circle. Perceptually the voice is hoarse or very hoarse, do you have an explanation for this or should I contact [the manufacturer]??”

Measurement and Diagnostic Needs
Aerodynamic measures yield information on glottal competence and compensation
• paralysis, neurological voice/speech, lesions which compromise posterior and membranous vf
Acoustic measures yield information on glottal behavior and stability
• membranous lesions (nodules, cyst, polyp), paralysis, neurological voice/speech
Aerodynamic and acoustic measures should be congruent with perceptual judgments and laryngeal observations, and should guide patient management

The Basics
Source characteristics: SF0, intensity
Source capacity/use: PFR, VRP, MPT
Source stability: perturbation, tremor
Power-source coordination and compensation: flow, Psub, resistance
Source-Filter coordination and compensation: NHR, Spectrogram

Jitter - frequency instability
Elevated for vf edge abnormalities and disorders that compromise CT function
Shimmer - amplitude instability
Elevated for disorders that interfere with medial-lateral wave propagation

Objective Report continued …
Source function acoustic measures
Range Capacity
• Expected SF0 (normative)
• Observed SF0 (conversation and reading task)
• Predicted SF0 (based on statistical calculation)
• Dynamic range (Voice Range Profile-30dB SPL)
• Physiological Frequency Range of Phonation (PFR or PFRP)
Valving Capacity
• S/Z ratio
• Maximum Phonation Time (MPT)

Speaking Fundamental Frequency (SF0)
Rote task (name, date)
Conversational speech sample
Reading passage
Typical values are ~120 Hz for adult males, ~220 Hz for adult females.
SF0 is generally 25% above lowest note in vocal range (PFR)

Computation of Highest PFR
Hz = 1/period, i.e.: 1/0.00365 = 273.97 Hz

Computation of lowest PFR
Hz = 1/period, i.e.: 1/0.01356 = 73.75 Hz

Computation of SF0 from spectrogram
Narrow band spectrogram
Lowest harmonic is SF0
Subharmonics may also be seen if present

Filters
Computer calculations of SF0 are set to predict the likely F0 (typical maximum is 1000 Hz), and the computer will look for the most likely frequency for the input signal. If the actual signal is greater than the filter, the computer will calculate measures based on the most likely input signal for frequency.

Formula for semitone (ST) range [use calculator in scientific view]
ST = 39.86 x log (frequency 1/frequency 2)
• i.e.: ST = 39.86 x log (273.97/73.75)
• ST range is 22.72 = 39.86 x 0.5699
Subject’s PFRP is 22.7 semitones [normal is 36 ST (Hollien, Dew and Phillips, 1971)]

SF0pred = 1.059463^n x F0low [where n = 25% of total ST]
• SF0pred = 1.059463^5.675 x 73.75 = 102.36 Hz
• Predicted SF0 is 102.36 Hz

Formants
FFT spectrogram

Yanagihara Hoarseness Rating
Narrowband spectrogram evaluation of sustained vowel
• Type 1: Regular harmonic components are mixed with the noise component chiefly in formant regions (F1, F2, F3)
• Type 2: Noise dominates harmonic components for F2 for /i/
• Type 3: F2 for /i/ is replaced by noise, and noise above 3 kHz increases
• Type 4: F2 for /a, i/ replaced by noise, F1 for all vowels has loss of periodic components

Yanagihara [figure panels: Females, Males]

Voice Range Profile

Protocols and Cautions
Standardization
• Protocol: high/low frequency, soft/loud intensity, conv/sustain
Hypothesis specific
• Physical set-up
o Microphones: 45° angle at 3-6 cm from mouth
Acoustic signal types (NCVS = Titze): 1, 2, 3
Data capture accurate and reliable
Realistic report format
Interpretation (not just a summary) of data to describe biomechanics vs. patient perception.
Data can assist with therapy goals/rationale

Acoustic Protocol
SF0 and Intensity of conversational/rote speech, read text
• Microphone at 3-6 cm, sound level meter at 30 cm
Perturbation of sustained /a/ for ~2 secs
• Modal voice
Physiological range: glide down/up from midrange on /a/
• Maximum, excluding vocal fry, including falsetto
• Monitor clipping of critical data
Intensity range (VRP): softest/loudest on /a/ at 30 cm for C, E, G, A (each octave)

Perturbation data sampling
Record 3 sustained /a/
Trim first 500 ms (avoid onset and offset of phonation which has inherent instabilities; avoid change of vowel/phoneme; avoid consonants)
Analyze next 1 sec
Average values for 3 trials

Data Capture
Input too soft (under-sampling)
Input too high (peak clipping and artificial instability)
Okay

Acoustic Pitfalls
Input signal too low (under sampling)
Input signal too loud (peak clipping)
Atypical F0 or loudness during tasks
Inaccurate (or inadequate) PFR sample
Incorrect cuing by clinician
Patient unable to follow directions

File formats
.nsp or .wav for KayPentax programs
transfer to MP3/.wav using
• GoldWave www.goldwave.com
• Audacity http://www.download.com/Audacity/3000-2170_410058117.html
• BonkEnc (also supports FLAC) http://www.bonkenc.org

Power - Source Measures
Pulmonary Function
FVC
• forced vital capacity
Inspiratory loop for PVFM
FEV 1.0
• forced expiratory volume in first 1 second of exhalation
FEF 25%-75%
• prime indicator for obstructive lung disease
Mean Flow Rate (MFR)

Aerodynamic Protocol
Mask and pressure tube to measure laryngeal aerodynamics during sustained 7-syllable /pa/
• Monitor subject phonation and effort level
• Comfortable loudness on syllable train vs. PTP (softest w/o whisper)
• Adjust input levels to expected values
Avoid peak clipping and undersampling
Other means to capture approximate flow rates
Maximum phonation
• Aerodynamic program, acoustic recording or stopwatch
• Best of 3 trials

Aerodynamic signals: flow and PTP

Aerodynamic signals: MPT

Aerodynamic Pitfalls
Failure to calibrate equipment
Failure to adjust sampling range for flow and pressure
Mask leakage
• Offset of transglottal flow at baseline
Voicing of /p/ (“p” changes to “b”)
Saliva in pressure tube
Abnormal effort by subject
Incorrect cuing by clinician
Patient unable to follow directions

Expected normal values: acoustic measures
SF0: 120 Hz for males, 220 Hz for females [predicted SF0 from PFR at 25% of PFRP]
PFR: 36 semitones (excluding vocal fry, including falsetto/head voice)
Jitter%: <0.589% males, <0.633% females; Pediatrics: <1.24%
Shimmer%: <2.523% males, <1.997% females; Pediatrics: <3.35%
NHR: <0.122 males, <0.112 females; Pediatrics: <0.11

Expected standard values: aerodynamic measures
Intensity: 70.4 dB (3.1) males, 68.2 dB (2.51) females
Transglottal flow: 0.100-0.200 L/sec males/females; Pediatrics: 72-180 mL/sec [depending on age]
Subglottal pressure: 6.43 cmH2O (1.07) males, 7.52 cmH2O (2.17) females; Pediatrics: 7.4-9.29 cmH2O for medium loudness, depending on age
Resistance: 56.8 Ns/m5 males, 81.8 Ns/m5 females [41 cmH2O/ml]
Maximum Phonation: 34.6 secs males, 25.7 secs females; Pediatrics: 6-22 sec, depending on age

Dilemma in acoustic analysis programs
Choose program that is “user-friendly” and easily compared to other centers, but has start-up cost
Choose program that is not so “user-friendly” but still very dependable, and can be compared to other centers, but has little startup cost
Caveat: how tech-savvy are you? Do you just need numbers, or need to defend outcomes in peer-review? How research-minded are you?
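As a rough illustration of how the adult acoustic limits listed above might be applied, the following Python sketch flags observed perturbation values against them. The limits are copied from the expected normal values in this handout; the function name, dictionary layout and the patient values are hypothetical.

```python
# Adult upper limits copied from the "Expected normal values: acoustic measures" list.
ADULT_UPPER_LIMITS = {
    "male":   {"jitter_pct": 0.589, "shimmer_pct": 2.523, "nhr": 0.122},
    "female": {"jitter_pct": 0.633, "shimmer_pct": 1.997, "nhr": 0.112},
}

def flag_measures(sex: str, observed: dict[str, float]) -> dict[str, str]:
    """Label each observed measure 'elevated' or 'wnl' (within normal limits)."""
    limits = ADULT_UPPER_LIMITS[sex]
    return {
        name: "elevated" if value >= limits[name] else "wnl"
        for name, value in observed.items()
    }

# Hypothetical adult male values, for illustration only.
flags = flag_measures("male", {"jitter_pct": 1.20, "shimmer_pct": 1.50, "nhr": 0.130})
print(flags)
```

A strict cutoff like this is only a screening aid. A borderline value may still be judged within normal limits once the laryngeal exam, perceptual judgment and patient history are taken into account, which is in keeping with the rule to never trust a computer completely.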
Expected Adult Biomechanics: Paralysis
Aerodynamics: elevated flow, reduced resistance, reduced maximum phonation
Acoustics: increased jitter, shimmer, NHR; reduced PFR, reduced loudness

Expected Adult Biomechanics: Polyp
Aerodynamics: reduced resistance
Acoustics: increased shimmer, increased jitter, reduced PFR

Expected Adult Biomechanics: Nodules
Aerodynamics: slight increase in airflow, may have accompanying higher pressures
Acoustics: increased jitter, may have increased shimmer, lowered F0, lower PFR

Expected Adult Biomechanics: Cyst
Aerodynamics: flows may be unstable
Acoustics: elevated shimmer, may have elevated jitter

Expected Adult Biomechanics: Hyperfunction
Aerodynamics: reduced flow, increased pressure
Acoustics: NHR may be elevated, intensity range reduced, PFR reduced

Sample Report

Linda M Carroll PhD CCC-SLP
Speech-Language Pathologist
Voice/Speech Disorders, Acoustic/Aerodynamic Assessment, Vocal rehabilitation and retraining
NPI# 1073727574
424 West 49th Street, Suite 1
New York, NY 10019
TEL: 212-459-3929 FAX: 212-459-2585
Email: [email protected]

Laryngeal function studies: 92520-59
Date of Service: 12/10/09
Patient: L.C.
File# L091009
Tel: 901.497.9326
Physician: Drs. PW and JP
Pediatrician: DR
Occupation: sophomore music theatre major
Date of Birth: 4/25/90
Sex: female
Diagnosis: nodules, allergic rhinitis, tonsillitis
Professional Voice User: yes

Laryngeal function studies were obtained to determine severity of dysphonia secondary to enlarged tonsils and small soft nodules. Patient reports continued episodes of tonsillitis and a feeling of increased vocal effort to overcome singing through enlarged tonsils. Measures were obtained in a quiet room that met or exceeded ANSI 1977 requirements. KayPentax Multispeech and KayPentax Phonatory Aerodynamic system were used in conjunction with a Radio Shack analog sound level meter. Acoustic data was obtained at 3 cm mouth-microphone distance. Perturbation measures represent an average of three tokens, with analysis of the mid 1.5 sec. Aerodynamic data for flow and subglottal pressure were calculated from the midportion of the token.

Summary of findings:
Parameter | Observed | Normal values | Comment
Speaking F0 | 206.638 Hz | 200-225 Hz | wnl
Physiologic F0 Range | 38.38 ST | 36 ST | wnl
Jitter% | 1.375% | 0.633% | Elevated, and may be related to nodular edema or tonsils
Shimmer% | 3.807% | 1.997% | Elevated, and may be related to nodular edema or tonsils
Noise:harmonic | 0.113 | 0.112 | Wnl
Voicing | 100% | 100% | Wnl
Voice turbulence index | 0.045 | 0.046 | Wnl
Maximum phonation time | 12.63 secs | 25.7 secs | Reduced
Transglottal flow | 0.200 L/sec | 0.150 L/sec | Slightly elevated flow, consistent with nodular edema
Subglottal pressure | 10.05 cm H2O | 7.52 cm H2O | Significantly increased effort, consistent with enlarged mass (tonsils)
Phonation threshold pressure | 4.16 cm H2O | 3-5 cm H2O | Increased effort for singer (but wnl for nonsinger), consistent with enlarged mass
GRBAS | G1R1B1A1S0 | G0R0B0A0S0 | Mild dysphonia

Interpretation of Findings: Patient presents pre-operatively with increased effort to achieve voicing due to small soft nodules and enlarged bilateral tonsils. Acoustic and aerodynamic measures are elevated for frequency instability, amplitude instability and subglottal pressure. It is unlikely that these abnormalities are a result of only laryngeal findings. It is more likely that observed measures are a result of the coupling of mild instability of frequency and amplitude at the glottal level, and then a marked increase in the source signal characteristics as it travels through the supraglottic tract and past the irregular, enlarged tonsillar tissue. This is supported by only slightly elevated flow, but markedly elevated subglottal pressure. Due to the patient’s training, phonation threshold pressure would be expected to be below 3 cm H2O (singers), but her results suggest continued effort to achieve phonation.
Overall dysphonia appears to be primarily related to supraglottic mass, and secondarily to laryngeal mass. Surgery is warranted.

Linda M Carroll PhD CCC-SLP

Clinical data collection system

Database Defined
A collection of data arranged for ease and speed of search and retrieval
Compilation
Management
Storage
Analysis

Medical Databases
Notorious for immense knowledge base to acquire patient care data and communicate patient care management information
Provides rapid communication
Reduces manpower needs
Linking information with other sites
Many clinicians and hospitals still rely on pen and paper to store data

Used Microsoft ACCESS
Data Entry: functional form designs
Data Management: central repository; simplify security and archiving efforts
Data Analysis: simple analytical capabilities; ease of exportation to other analytical systems
• SPSS, Excel
Report writing: customized design involving text and/or graphics

ACCESS
Creating a file/setting an index order
Creating data entry forms and coding
Updating and editing information
Viewing and querying data
Designing report forms and report writing
Providing simple analytical capabilities (ease of exportation into other analytical systems like Excel and SPSS).
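The database functions listed above (a central repository, querying, simple pre/post analysis) can be sketched with Python's built-in sqlite3 module. Note that this swaps in SQLite purely for illustration; the handout's system used Microsoft Access. The table layout, column names, patient ID and all values below are hypothetical.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE visit (
        patient_id  TEXT NOT NULL,
        visit_date  TEXT NOT NULL,
        phase       TEXT NOT NULL CHECK (phase IN ('pre', 'post')),
        diagnosis   TEXT,
        jitter_pct  REAL,
        shimmer_pct REAL,
        nhr         REAL
    )
""")

# Hypothetical pre- and post-treatment records for one patient.
rows = [
    ("P001", "2010-01-05", "pre",  "nodules", 1.40, 3.80, 0.113),
    ("P001", "2010-04-12", "post", "nodules", 0.60, 1.90, 0.110),
]
conn.executemany("INSERT INTO visit VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Pair each patient's pre and post visits and report the change in jitter.
query = """
    SELECT pre.patient_id,
           pre.jitter_pct - post.jitter_pct AS jitter_change
    FROM visit AS pre
    JOIN visit AS post ON post.patient_id = pre.patient_id
    WHERE pre.phase = 'pre' AND post.phase = 'post'
"""
changes = conn.execute(query).fetchall()
print(changes)
```

Keeping one row per visit, rather than one row per patient, is what makes the pre/post comparison a simple self-join; the same result set could be exported to Excel or SPSS for further analysis.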
Objective Voice Report
General Demographics
Patient ID
Name, DOB, age, occupation, DX, Date
Referring Physician
Referred for
• pre/post-surgery, pre/post therapy, pre/post botox
Subjective Measures
• GRBAS, Voice Handicap Index (VHI), CAPE-V

Benefits of a Clinical Information System (“CIS”)
Structured compilation of clinical data for use in comprehensive research studies
Benefits applicable to most all clinical settings
Similar daily workloads and constraints on employees
Similar need for research data
Formalized databases may foster cooperative research
Ease of duplication and sharing
Merging of individual databases developed by multiple researchers cooperating on mutual projects

Costs of CIS
Capital Outlay
Software and Hardware costs
Maintenance
Time Commitment
Learning/Training
Database/Form/Report Designs
Database Management
Archiving/Backup
Updating
Security
• HIPAA Compliance

Final thoughts on lab measures
The patient history should support the dx
Laryngeal appearance should support the dx
Objective measures should support the dx
Perceptual judgments should support the dx
Treatment should be based on patient voice/speech needs
Documentation should be representative of professional training and medical requirements

Contact Information
Linda M. Carroll, PhD CCC-SLP
[email protected]
212.459.3929 office
212.459.2585 fax
424 West 49th Street, Suite 1
New York, New York 10019 USA