Non-linguistic sentence-length precursors affect speech perception: Implications for speaker and rate normalization

Lori L. Holt & Travis Wade
Department of Psychology & Center for the Neural Basis of Cognition
Carnegie Mellon University

INTRODUCTION

Spectral and temporal properties of speech sounds are perceived in a thoroughly context-dependent manner at both local and more protracted levels of the speech signal. This context dependence presumably mitigates the effects of acoustic variability introduced by coarticulation and cross-talker differences. Accounts of these effects often invoke speech-specific normalization processes.

At least the more local of these effects might originate from general perceptual processes:

SPECTRAL: For example, pure tone precursors modeling simple spectral characteristics of /al/ or /ar/ produce similar influences on categorization of a subsequent CV syllable (Lotto & Kluender, 1998; Lotto, Sullivan, & Holt, 2003).

[Figure: Percent "ga" responses by CV F3 onset frequency (1800-2600 Hz). Speech panel: preceding /al/ vs. /ar/. Non-speech panel: high tone modeling /al/ vs. low tone modeling /ar/.]

TEMPORAL: For example, pure tones modeling vowel duration produce similar influences on categorization of the abruptness of a preceding frequency modulation (Diehl and Walsh, 1989).

[Figure: Long vs. Short context conditions.]

♦ Further, test whether the auditory system is sensitive to the statistical structure of spectral and temporal distributions of energy over time, and whether such sensitivity may sway listeners' perception of following speech.

Since these melodies were distinct, ended at the same frequency level, and comprised wide, overlapping frequency and duration ranges, any influence they showed upon categorization of following speech was expected to demonstrate listeners' sensitivity to their long-term spectral or temporal distribution and not merely to the simple acoustic characteristics of any particular segment.

SPECTRAL EXPERIMENTS

Stimuli
♦ STANDARD TONE: 70-ms 2300-Hz sine-wave tone. Speech categorization of /ga/ to /da/ stimuli in isolation did not significantly differ from categorization of the syllables preceded by the standard tone (t(9)=1.35, p=.21).
♦ ACOUSTIC HISTORIES: 21 70-ms sine-wave tones (30-ms silent intervals) with unique frequencies. Tone order was randomized on a trial-by-trial basis.
♦ LOW MEAN: 1300-2300 Hz in 50-Hz steps
♦ MID MEAN: 1800-2800 Hz in 50-Hz steps
♦ HIGH MEAN: 2300-3300 Hz in 50-Hz steps
♦ MID MEAN/HIGH VAR: 1300-3300 Hz in 100-Hz steps

Procedure
Stimuli were mixed across conditions. Listeners categorized the speech targets as /ga/ or /da/.

Experiment 1: Do Acoustic Histories Influence Speech Categorization?

Results
Although the surface acoustic characteristics of the Acoustic Histories varied on a trial-by-trial basis and the final Standard Tone of each Acoustic History was constant across conditions …

[Figure: Percent "ga" responses by Consonant-Vowel Stimulus (1-9) for the High Mean, Low Mean, Mid Mean/Low Var, and Mid Mean/High Var conditions.]

TEMPORAL EXPERIMENTS

Stimuli
♦ SPEECH TARGET STIMULUS: Eleven-step synthesized /ba/ to /wa/ series varying only in F1, F2 transition duration.
♦ ACOUSTIC HISTORIES: 1.2-s pure tone precursor sequences with 30 fast or 10 slow tones.
♦ FAST MEAN: 40 ms tone-to-tone; one continuum member shorter than the most /ba/-like initial formant pattern
♦ SLOW MEAN: 120 ms tone-to-tone; one continuum member longer than the most /wa/-like initial formant pattern

Procedure
Stimuli were mixed across conditions. Listeners categorized the speech targets as /ba/ or /wa/ (Experiments 1a-f) or rated their goodness as /wa/ (Experiment 2).
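The spectral Acoustic History conditions can be sketched in a few lines of Python. This is only an illustration built from the stimulus values stated on the poster (21 tones of 70 ms, 30-ms silent intervals, the four frequency ranges, and a 2300-Hz Standard Tone at the end); the function names are hypothetical, and placing the Standard Tone after the same 30-ms gap is an assumption, not a detail the poster gives.

```python
import random

TONE_MS = 70             # duration of each tone (stated on the poster)
GAP_MS = 30              # silent interval between tones (stated on the poster)
STANDARD_TONE_HZ = 2300  # Standard Tone frequency (stated on the poster)

# (lowest, highest, step) in Hz for each Acoustic History distribution.
CONDITIONS = {
    "LOW MEAN": (1300, 2300, 50),
    "MID MEAN": (1800, 2800, 50),
    "HIGH MEAN": (2300, 3300, 50),
    "MID MEAN/HIGH VAR": (1300, 3300, 100),
}


def history_frequencies(condition):
    """Return the 21 unique tone frequencies (Hz) for one condition."""
    low, high, step = CONDITIONS[condition]
    return list(range(low, high + 1, step))


def make_trial(condition, rng):
    """Return (onset_ms, frequency_hz) pairs for one trial's precursor.

    Tone order is reshuffled on every call. The Standard Tone is appended
    last; its 30-ms separation from the final history tone is an assumption.
    """
    freqs = history_frequencies(condition)
    rng.shuffle(freqs)
    tones = freqs + [STANDARD_TONE_HZ]
    return [(i * (TONE_MS + GAP_MS), hz) for i, hz in enumerate(tones)]


if __name__ == "__main__":
    for name in CONDITIONS:
        freqs = history_frequencies(name)
        # Each condition has 21 tones; only the mean or the spread differs.
        print(name, len(freqs), sum(freqs) / len(freqs), max(freqs) - min(freqs))
```

Running the script shows that the conditions share the tone count and differ only in mean frequency (1800, 2300, or 2800 Hz) or in frequency spread, which is the contrast the experiment relies on.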
[Figure panels: Percent "ga" responses by Consonant-Vowel Stimulus (1-9), High vs. Low Acoustic History; silent-interval panel: 250 ms.]
Effect of Acoustic History, p<.001. No Effect of Silent Interval, F<1. Acoustic History x Silent Interval Interaction, p<.001.

[Figure panels: Percent "ga" responses by Consonant-Vowel Stimulus (1-9), High vs. Low Acoustic History; silent-interval panels: 700, 900, 1100, and 1300 ms.]
Effect of Acoustic History, p<.001. No Effect of Silent Interval, F<1. No Acoustic History x Silent Interval Interaction, p=.14.

The Standard Tone was repeated 1-13 times across two experiments. This final context was constant across conditions, so any effect of context would need to persist across intervening acoustic stimuli.

[Figure panels: Percent "ga" responses by Consonant-Vowel Stimulus (1-9), High vs. Low Acoustic History; Standard Tone repetition panels: 1, 3, 5, 9, and 11 Tones.]

Condition  Tone range                               Inter-tone silence  Amplitude normalization  Tone lengths (fast, slow)
a          F1-F2 of following speech (234-1232 Hz)  10 ms               RMS for sequences        30, 110 ms
b          F2 (769-1232 Hz)                         10 ms               RMS for sequences        30, 110 ms
c          F2                                       10 ms               max for tones            30, 110 ms
d          F2                                       none                max for tones            30, 110 ms
e          F2                                       10 ms               max for tones            26-34; 98-122 ms
f          F2                                       10 ms               max for tones            20-40; 80-140 ms

No effects or interactions involving tone manipulations.
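As an arithmetic check on the temporal precursors, the short Python sketch below lays out tone onsets and offsets for the constant-length conditions with 10 ms of inter-tone silence (conditions a through c). It assumes a 1.2-s sequence, the figure given in the poster's stimulus description; the function name is hypothetical, and conditions d through f (no silence or variable tone lengths) are not modeled.

```python
SEQUENCE_MS = 1200  # 1.2-s precursor sequence (stated on the poster)


def tone_schedule(tone_ms, silence_ms, sequence_ms=SEQUENCE_MS):
    """Return (onset_ms, offset_ms) for each tone that fits in the sequence."""
    period = tone_ms + silence_ms    # tone-to-tone interval
    count = sequence_ms // period    # whole tones that fit in the sequence
    return [(i * period, i * period + tone_ms) for i in range(count)]


fast = tone_schedule(30, 10)    # 40 ms tone-to-tone
slow = tone_schedule(110, 10)   # 120 ms tone-to-tone
print(len(fast), len(slow))     # prints: 30 10
```

The counts agree with the 30 fast and 10 slow tones described for the Acoustic Histories, so the two rates fill the same total duration.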
Effect of tone rate on speech categorization for all six conditions, p<.025.

REFERENCES
Delgutte, B. (1996). Auditory neural processing of speech. In W. J. Hardcastle & J. Laver (Eds.), The Handbook of Phonetic Sciences (pp. 505-538). Oxford: Blackwell.
Diehl, R., & Walsh, M. (1989). An auditory basis for the stimulus-length effect in the perception of stops and glides. Journal of the Acoustical Society of America, 85, 2154-64.
Holt, L. L., & Lotto, A. (2002). Behavioral examinations of the level of auditory processing of speech context effects. Hearing Research, 167, 156-169.
Ladefoged, P., & Broadbent, D. (1957). Information conveyed by vowels. Journal of the Acoustical Society of America, 29, 98-104.
Lotto, A. J., & Kluender, K. R. (1998). General contrast effects of speech perception: Effect of preceding liquid on stop consonant identification. Perception & Psychophysics, 60, 602-619.
Mann, V. A. (1980). Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics, 28, 407-412.
Miller, J., & Liberman, A. (1979). Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception & Psychophysics, 25, 457-65.
Miller, J., O'Rourke, T., & Volaitis, L. (1997). Internal structure of phonetic categories: effects of speaking rate. Phonetica, 54, 121-37.
Summerfield, Q. (1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance, 7, 1074-95.

♦ In the present studies, sentence-length pure tone sequences serve as both spectral and temporal perceptual contexts for following speech sounds.

Experiments 3a-b: What are the Boundaries of Temporal Adjacency?

Context Effect Gets Larger with Longer Silent Intervals.

[Figure panels: Percent "ga" responses by Consonant-Vowel Stimulus (1-9), High vs. Low Acoustic History; silent-interval panels: 150, 200, 300, and 500 ms.]
[Figure panels: Percent "ga" responses, High vs. Low Acoustic History; silent-interval panels: 50 and 100 ms.]

The silent interval separating the Standard Tone and the Speech Target was manipulated in two experiments. There is a significant effect of Acoustic History Mean… even when 1300 ms of silence separated tone sequences and speech targets. This is significantly longer than has been observed previously.

The temporal adjacency of the context-providing sounds in experiments like those illustrated allows for the possibility that these interactions arise from well-understood local interactions in neural processing such as neural adaptation (Delgutte, 1996). Thus, it is possible that higher-order linguistic processes or speech-perception-specific processes underlie longer-term (sentence-level) speech context effects, whereas low-level auditory perceptual interactions influence how local contexts affect speech categorization.

EXPERIMENT AIMS
♦ Investigate whether the influence of non-linguistic sounds upon speech categorization is limited to local interaction in auditory processing.
♦ Observation of a context effect under these circumstances requires that linguistic and non-linguistic acoustic information interact at higher levels of auditory processing than has previously been demonstrated.

… they exerted a significant influence on speech categorization, F(3,9)=29.3, p<.001. This influence was spectrally contrastive. Acoustic histories drawn from the distribution with a HIGH mean frequency led listeners to more often categorize following speech targets as /ga/, the alternative with a LOWER-frequency F3 onset.

Does the Rate of Acoustic Histories Influence Categorization?

Experiments 2a-b: What is the Time Course of the Influence of Acoustic Histories?
Effect of Acoustic History, p<.0001. Effect of Repetition, p=.01.
No Acoustic History x Repetition Interaction, F<1. Context Effect was Equivalent Across Conditions.

[Figure panel: Percent "ga" responses by Consonant-Vowel Stimulus (1-9), High vs. Low Acoustic History; 7 Tones.]

Effect of Acoustic History, p<.0001. Effect of Repetition, p=.01. No Acoustic History x Repetition Interaction, p=.12. Context Effect was Equivalent Across Conditions.

[Figure panel: Percent "ga" responses by Consonant-Vowel Stimulus (1-9), High vs. Low Acoustic History; 13 Tones.]

The influence of Acoustic Histories persists even when 1300 ms of intervening constant acoustic stimulation separates Acoustic Histories and Speech Targets.

Immediate phonetic context influences categorization of sounds differentiated by formant frequencies (Mann, 1980). Longer-term, speaker-specific spectral patterns also influence speech categorization (Ladefoged and Broadbent, 1957). Likewise, categorization of sounds based on temporal information is affected by both local and longer-term speaking rate (Miller and Liberman, 1979; Summerfield, 1981).

Stimuli
♦ SPEECH TARGET STIMULUS: Nine-step /ga/ to /da/ series created from natural productions.

The rate of sentence-length non-speech precursors influences speech categorization. This influence was temporally contrastive such that FASTER acoustic histories led listeners to more often categorize following speech targets as /wa/, the alternative with a SLOWER formant-frequency transition.

Experiment 2: Does the Context Affect Internal Category Structure?
Target: /ba/ - /wa/ continuum extended from Exp. 1 to include exaggerated /w*/ transitions
Context: pure tone precursors similar to Exp. 1c

Quadratic functions fit to response curves for stimuli with 50% overall /wa/ rating.
Precursor rate effect observed for location of maximum estimated /wa/ rating (F(1,25)=4.40; p=0.046), and for the lower (F(1,25)=4.32; p=0.048) and upper (F(1,25)=4.31; p=0.048) limits of the 95% best exemplar range.

[Figure: Slow Tones vs. Fast Tones by Transition Duration (ms).]
[Figure: Average /wa/ Goodness Rating; Slow Tones vs. Fast Tones at the lower 95% range, best exemplar, and upper 95% range (axis ticks 100-150).]

The rate of the Acoustic Histories significantly influenced internal category structure.
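The quadratic-fit analysis of the goodness ratings can be illustrated with a minimal Python sketch. It fits a parabola to ratings by least squares and reads off the best exemplar (the location of the fitted maximum) and a best exemplar range. The poster does not define the 95% range, so the definition used here (the span where the fitted rating is at least 95% of the fitted maximum) is an assumption, as are the function names and the synthetic ratings in the usage lines.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(us, ys):
    """Least-squares fit of y = a*u**2 + b*u + c via the normal equations.

    Needs at least three distinct u values; otherwise the system is singular.
    """
    s0 = float(len(us))
    s1 = sum(us)
    s2 = sum(u ** 2 for u in us)
    s3 = sum(u ** 3 for u in us)
    s4 = sum(u ** 4 for u in us)
    t0 = sum(ys)
    t1 = sum(u * y for u, y in zip(us, ys))
    t2 = sum(u * u * y for u, y in zip(us, ys))
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, s0]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    coeffs = []
    for col in range(3):  # Cramer's rule: swap one column for the right-hand side
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)  # (a, b, c)


def best_exemplar_range(durations, ratings, fraction=0.95):
    """Return (lower, best, upper) in the units of `durations`.

    ASSUMPTION: the range is where the fitted curve stays at or above
    `fraction` times its maximum; the poster does not state its definition.
    """
    mean_x = sum(durations) / len(durations)
    centered = [x - mean_x for x in durations]  # centering keeps sums small
    a, b, c = fit_quadratic(centered, ratings)
    if a >= 0:
        raise ValueError("fitted curve has no maximum")
    peak = c - b * b / (4 * a)                  # fitted maximum rating
    if peak <= 0:
        raise ValueError("fitted maximum is not positive")
    best = mean_x - b / (2 * a)                 # location of the maximum
    half_width = math.sqrt((1 - fraction) * peak / -a)
    return best - half_width, best, best + half_width


# Synthetic ratings on an exact parabola, for illustration only (not poster data).
durations = list(range(100, 151, 5))
ratings = [6 - 0.01 * (x - 125) ** 2 for x in durations]
print(best_exemplar_range(durations, ratings))
```

Applying such a fit separately to ratings collected after fast and after slow precursors would give one set of three values per condition, which is the kind of summary the precursor rate effect above is reported on.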