Non-linguistic sentence-length precursors affect

Non-linguistic sentence-length precursors affect speech perception:
Implications for speaker and rate normalization
Lori L. Holt & Travis Wade
Department of Psychology & Center for the Neural Basis of Cognition
Carnegie Mellon University
SPECTRAL EXPERIMENTS
INTRODUCTION
Spectral and temporal properties of speech sounds are perceived in a thoroughly
context-dependent manner at both local and more protracted levels of the speech
signal.
♦
♦
STANDARD TONE: 70-ms 2300-Hz sine-wave tone. Speech
categorization of /ga/ to /da/ stimuli in isolation did not
significantly differ from categorization of the syllables
preceded by the standard tone (t(9)=1.35, p=.21).
♦ ACOUSTIC HISTORIES: 21 70-ms sine-wave tones (30-ms
silent intervals) with unique frequencies. Tone order was
randomized on a trial-by-trial basis.
For example, pure tone precursors
modeling simple spectral characteristics of
/al/ or /ar/ produce similar influences on
categorization of a subsequent CV syllable.
/ar/
100
90
80
70
60
50
40
30
20
10
0
/al/
/ar/
1800
2000
2200
2400
2600
CV F3 Onset Frequency (Hz)
Lotto & Kluender, 1998;
Lotto, Sullivan, & Holt, 2003
Speech
100
90
80
70
60
50
40
30
20
10
0
High Tone
Modeling /al/
Low Tone
Modeling /ar/
1800
2000
2200
2400
2600
CV F3 Onset Frequency (Hz)
Non-Speech
TEMPORAL:
For example, pure tones modeling vowel
duration produce similar influences on
categorization of abruptness of preceding
frequency modulation.
Long
Short
Diehl and Walsh, 1989
Speech
♦ FAST MEAN: 40ms tone-to-tone; 1
continuum member shorter than most /ba/-like
initial formant pattern
♦ SLOW MEAN: 120ms tone-to-tone; 1
continuum member longer than most /wa/-like
initial formant pattern
Long
Short
♦ MID MEAN: 1800-2800 Hz in 50-Hz steps
Procedure
Tone order was randomized on a trial-by-trial basis
♦ MID MEAN/HIGH VAR: 1300-3300 in 100-Hz steps
Stimuli were mixed across conditions.
Listeners categorized the speech targets as /ga/ or /da/.
Since these melodies were distinct, ended at the same
frequency level, and comprised wide, overlapping frequency
and duration ranges, any influence they showed upon
Experiment 1:
categorization of following speech was expected to
Procedure
Do Acoustic Histories Influence
demonstrate listeners’ sensitivity to their long-term spectral
Stimuli were mixed across conditions. Listeners categorized the
or
temporal distribution and not merely to the simple
Speech Categorization?
speech targets as /ba/ or /wa/ (Experiment 1) or rated their
acoustic characteristics of any particular segment.
Results
goodness as /wa/ (Experiment 2).
100
Although
the
surface
acoustic
characteristics
of
the
Acoustic
90
High Mean
Histories varied on a trial-by-trial basis and the final Standard
80
Low Mean
Experiments
1a-f:
Mid Mean/Low Var
70
Tone of each Acoustic History was constant across conditions …
Percent "ga" Responses
TONES
Percent "ga" Responses
SPEECH
Percent "ga" Responses
At least the more local of these effects might originate from general perceptual
processes:
/al/
♦ ACOUSTIC HISTORIES: 1.2-s pure tone precursor
sequences with 30 fast or 10 slow tones.
♦ LOW MEAN: 1300-2300 Hz in 50-Hz steps
These processes presumably mitigate the effects of acoustic variability introduced by
coarticulation and cross-talker differences.
SPECTRAL:
♦ SPEECH TARGET STIMULUS: Eleven-step
synthesized /ba/ to /wa/ series varying only in F1, F2
transision duration.
♦ HIGH MEAN: 2300-3300 Hz in 50-Hz steps
Accounts of these effects often invoke speech-specific normalization processes.
♦
TEMPORAL EXPERIMENTS
Stimuli
Mid Mean/High V ar
60
40
30
20
10
0
Non-Speech
1
2
3
4
5
6
7
8
9
Consonant -Vowel St imulus
100
90
80
70
60
50
40
30
20
10
0
♦
Further, test whether the auditory system is sensitive to the statistical structure of
spectral and temporal distributions of energy over time, and whether such sensitivity may
sway listeners’ perception of following speech.
1 2 3 4 5 6 7 8 9
Consonant-Vowel Stimulus
Consonant-Vowel Stimulus
High
Low
High
Low
250 ms
High
Low
Effect of Acoustic History, p<.001
No Effect of Silent Interval, F<1.
Acoustic History x Silent Interval Interaction, p<.001
High
Low
tone range
inter-tone
silence
amplitude
normalization
tone
lengths
(fast,slow)
a
F1-F2 of
following
speech(234
-1232Hz)
10 ms
RMS for
sequences
30, 110 ms
b
F2 (7691232Hz)
10 ms
RMS for
sequences
30, 110 ms
c
F2
10 ms
max for tones
30, 110 ms
d
F2
none
max for tones
30, 110 ms
e
F2
10 ms
max for tones
26-34; 98122 ms
f
F2
10 ms
max for tones
20-40; 80140 ms
No effects or interactions involving tone manipulations
Effect of Acoustic History, p<.001
No Effect of Silent Interval, F<1.
700 ms
900 ms
No Acoustic History x Silent Interval Interaction,
p=.14
1100 ms
1300 ms
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
Consonant-Vowel Stimulus
Consonant-Vowel Stimulus
Consonant-Vowel Stimulus
Equivalent Across Conditions
The Standard Tone was repeated 1-13 times across two experiments. This final context was constant across
conditions, so any effect of context would need to persist across intervening acoustic stimuli.
100
90
High
80
Low
70
60
50
40
30
3 Tones
20
10
0
100
1 2 3 4 5 6 7 8 9
90
High
Consonant-Vowel Stimulus
80
Low
70
60
50
40
30
20
9 Tones
10
0
1 2 3 4 5 6 7 8 9
100
90
High
80
Low
70
60
50
40
30
20
5 Tones
10
0
100
1 2 3 4 5 6 7 8 9
90
High
Consonant-Vowel Stimulus
80
Low
70
60
50
40
30
20
11 Tones
10
0
1 2 3 4 5 6 7 8 9
Consonant-Vowel Stimulus
Consonant-Vowel Stimulus
Consonant-Vowel Stimulus
Percent "ga" Responses
100
90
High
80
Low
70
60
50
40
30
20
1 Tone
10
0
100
1 2 3 4 5 6 7 8 9
90
High
Consonant-Vowel Stimulus
80
Low
70
60
50
40
30
1 Tone
20
10
0
1 2 3 4 5 6 7 8 9
Percent "ga" Responses
100
90
80
70
60
50
40
30
20
10
0
100
90
80
70
60
50
40
30
20
10
0
Exp.
Effect of tone rate on speech categorization
for all six conditions, p<.025
Context Effect Gets Larger
with Longer Silent Intervals
300 ms
High
Low
Percent "ga" Responses
Delgutte, B. (1996). Auditory neural processing of speech. In W. J. Hardcastle & J. Laver (Eds.), The Handbook of Phonetic Sciences (pp. 505538). Oxford: Blackwell.
Diehl, R. & Walsh, M. (1989). An auditory basis for the stimulus-length effect in the perception of stops and glides. Journal of the Acoustic
Society of America, 85, 2154-64
Holt, L. L., & Lotto, A. (2002). Behavioral examinations of the level of auditory processing of speech context effects. Hearing Research, 167,
156-169.
Ladefoged, P. and Broadbent, D. (1957). Information conveyed by vowels. Journal of the Acoustic Society of America, 29, 98-104.
Lotto, A. J., & Kluender, K. R. (1998). General contrast effects of speech perception: Effect of preceding liquid on stop consonant identification.
Perception and Psychophysics, 60, 602-619.
Mann, V. A. (1980). Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics, 28, 407-412.
Miller, J. & Liberman, A. (1979). Some effects of later-occurring information on the perception of stop consonand and semivowel. Perception and
Psychophysics, 25, 457-65.
Miller, J., O’Rourke, T. & Volaitis, L. (1997). Internal structure of phonetic categories: effects of speaking rate. Phonetica, 54, 121-37.
Summerfield, Q. (1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human
Perception and Performance, 7, 1074-95.
200 ms
High
Low
Percent "ga" Responses
REFERENCES
150 ms
0
High0
0
Low
0
0
0
0
0
0
500 ms
0
0
1 2 3 4 5 6 7 8 9
0
High 0
0
Low
0
0
0
0
0
0
0
0
Experiments 3a-b:
What are the Boundaries of Temporal Adjacency?
Percent "ga" Responses
♦ In the present studies, sentence-length pure tone sequences serve as both spectral and
temporal perceptual contexts for following speech sounds.
100 ms
High
Low
There is a significant effect of Acoustic History Mean…
even when 1300 ms of silence separated tone sequences and speech targets.
This is significantly longer than has been observed previously.
Percent "ga" Responses
♦ Observation of a context effect under these circumstances requires that linguistic
and non-linguistic acoustic information interact at higher levels of auditory processing
than has previously been demonstrated.
High
Low
Percent "ga" Responses
♦
Investigate whether the influence of non-linguistic sounds upon speech categorization is
limited to local interaction in auditory processing.
100
90
80
70
60
50
40
30
20
10
0
Percent "ga" Responses
EXPERIMENT AIMS
This influence was spectrally contrastive.
Acoustic histories drawn from the distribution with a HIGH
mean frequency led listeners to more often categorize
following speech targets as /ga/, the alternative with a
LOWER-frequency F3 onset.
The silent interval separating the Standard Tone and the Speech Target was manipulated in two experiments.
Percent "ga" Responses
Thus, it is possible that higher-order linguistic processes or speech-perception-specific processes
underlie longer-term (sentence level) speech context effects whereas low-level auditory
perceptual interactions influence how local contexts affect speech categorization.
Does the Rate of Acoustic Histories Influence Categorization?
Experiments 2a-b:
What is the Time Course of the Influence of Acoustic Histories?
Percent "ga" Responses
The temporal adjacency of the context-providing sounds in the experiments like those illustrated
allows for the possibility that these interactions arise from well-understood local interactions in
neural processing such as neural adaptation (Delgutte, 1996).
they exerted a significant influence on speech categorization,
F(3,9)=29.3, p<.001.
50
High
Low
Effect of Acoustic History, p<.0001
Effect of Repetition, p=.01.
No Acoustic History x Repetition Interaction, F<1
7 Tones
1 2 3 4 5 6 7 8 9
High
Consonant-Vowel Stimulus
Low
Context Effect was
Equivalent Across Conditions
categorize following speech targets as /wa/, the alternative with a SLOWER formant-frequency transition.
Experiment 2:
Does the Context Affect Internal Category Structure?
Target: /ba/ - /wa/ continuum extended from Exp. 1 to
include exaggerated /w*/ transitions
Context: pure tone precursors similar to Exp. 1c
Slow Tones
Fast Tones
Transition Duration (ms)
Effect of Acoustic History, p<.0001
Effect of Repetition, p=.01.
No Acoustic History x Repetition Interaction, p=.12
13 Tones
The rate of sentence-length non-speech precursors influences speech categorization.
This influence was temporally contrastive such that FASTER acoustic histories led listeners to more often
Context Effect was
Equivalent Across Conditions
1 2 3 4 5 6 7 8 9
Consonant-Vowel Stimulus
The influence of Acoustic Histories persists even when 1300 ms of intervening constant acoustic
stimulation separates Acoustic Histories and Speech Targets.
Quadratic functions fit to response curves for stimuli
with 50% overall /wa/ rating
Precursor rate effect observed for location of maximum
estimated /wa/ rating (F(1,25)=4.40; p=0.046), lower
(F(1,25)=4.32; p=0.048) and upper limits of 95% best
exemplar range: (F(1,25)=4.31; p=0.048).
Transition Duration (ms)
♦
Stimuli
Immediate phonetic context influences categorization of sounds differentiated by formant
frequencies (Mann, 1980).
Longer-term, speaker-specific spectral patterns also influence speech categorization
(Ladefoged and Broadbent, 1957).
Likewise, categorization of sounds based on temporal information is affected by both local
and longer-term speaking rate (Miller and Liberman, 1979; Summerfield, 1981).
SPEECH TARGET STIMULUS: Nine-step /ga/ to /da/
series created from natural productions.
Average /wa/ Goodness Rating
♦
♦
150
145
140
Slow Tones
Fast Tones
135
130
125
120
115
110
105
100
lower 95% range
best exemplar
upper 95% range
The rate of the Acoustic Histories significantly influenced internal category structure.