Goals and Objectives

The Relation Between
Speech Intelligibility
and
The Complex Modulation Spectrum
Steven Greenberg
International Computer Science Institute
1947 Center Street, Berkeley, CA 94704, USA
http://www.icsi.berkeley.edu/~steveng
Takayuki Arai
Department of Electrical and Electronics Engineering
Sophia University, 7-1 Kioi-cho, Chiyoda-Ku, Tokyo, Japan
http://www.splab.ee.sophia.ac.jp/arai
Acknowledgements and Thanks
Technical Assistance
Joy Hollenback, Shino Sakaguchi and Rosaria Silipo
Research Funding
U.S. National Science Foundation
Germane Publications
PERCEPTUAL BASES OF SPEECH INTELLIGIBILITY
Arai, T. and Greenberg, S. (1998) Speech intelligibility in the presence of cross-channel
spectral asynchrony, IEEE International Conference on Acoustics, Speech and Signal
Processing, Seattle, pp. 933-936.
Greenberg, S. and Arai, T. (1998) Speech intelligibility is highly tolerant of cross-channel spectral asynchrony. Proceedings of the Joint Meeting of the Acoustical
Society of America and the International Congress on Acoustics, Seattle, pp. 2677-2678.
Greenberg, S. and Arai, T. (2001) The relation between speech intelligibility and the
complex modulation spectrum. Proceedings of the 7th European Conference on Speech
Communication and Technology (Eurospeech-2001).
Greenberg, S., Arai, T. and Silipo, R. (1998) Speech intelligibility derived from
exceedingly sparse spectral information, Proceedings of the International Conference
on Spoken Language Processing, Sydney, pp. 74-77.
Greenberg, S. (1996) Understanding speech understanding - towards a unified theory
of speech perception. Proceedings of the ESCA Tutorial and Advanced Research
Workshop on the Auditory Basis of Speech Perception, Keele, England, p. 1-8.
Silipo, R., Greenberg, S. and Arai, T. (1999) Temporal constraints on speech
intelligibility as deduced from exceedingly sparse spectral representations,
Proceedings of the 6th European Conference on Speech Communication and
Technology (Eurospeech-99).
http://www.icsi.berkeley.edu/~steveng
What is the Complex Modulation Spectrum?
The Complex Modulation Spectrum Combines both the Magnitude and Phase of the
Modulation Pattern Distributed Across the (tonotopically organized) Spectrum
This Representation Predicts the Intelligibility of (Locally) Time-Reversed Speech (which
Dissociates the Phase and Magnitude Components of the Modulation Spectrum)
Thereby Demonstrating the Importance of Modulation Phase (across the frequency
spectrum) for Understanding Spoken Language
We’ll discuss this slide in greater detail shortly
Modulation Phase Across the Spectrum
The modulation phase pattern distributed across the (tonotopically
organized) frequency spectrum is most easily visualized as follows:
The signal spectrum is partitioned into 15 separate 1/3-octave channels
Only 4 of the channels are retained; the remaining 11 are “tossed”
The upper edge of a channel is one octave below the lower edge of the adjacent (upper) channel
The modulation pattern in the waveform emanating from each channel is shown
Note that the timing of the peaks and valleys (i.e., phase) of the modulation pattern varies across
the spectrum
An earlier study (Greenberg et al., 1998), using spectrally sparse speech signals, suggested that the
modulation phase pattern across the frequency spectrum could be important for intelligibility
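The channel layout described above can be sketched as follows. This is only an illustration: the slides do not give the actual edge frequencies, so the lowest edge (`LOWEST_EDGE_HZ`) and the choice of which four channels to keep are assumptions. The sketch shows that keeping every fourth 1/3-octave channel leaves exactly one octave between retained channels.

```python
# Hypothetical layout: 15 contiguous 1/3-octave channels starting at an
# assumed lowest edge. The slides do not state the real edge frequencies.
LOWEST_EDGE_HZ = 200.0    # assumption, for illustration only
N_CHANNELS = 15
STEP = 2 ** (1 / 3)       # ratio of upper to lower edge of a 1/3-octave band


def third_octave_channels(lowest_edge_hz=LOWEST_EDGE_HZ, n=N_CHANNELS):
    """Return (lower, upper) edge pairs for n contiguous 1/3-octave bands."""
    return [(lowest_edge_hz * STEP ** k, lowest_edge_hz * STEP ** (k + 1))
            for k in range(n)]


def retain_sparse(channels, first=0, stride=4):
    """Keep every 4th channel; the three skipped bands span one octave."""
    return channels[first::stride]


channels = third_octave_channels()
kept = retain_sparse(channels)          # 4 channels kept, 11 "tossed"
for lower, upper in kept:
    print(f"{lower:8.1f} - {upper:8.1f} Hz")
```

Starting at `first=1` or `first=2` would also give four retained channels with one-octave gaps; which offset matches the original stimuli is not stated in the slides.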
What is (Locally) Time-Reversed Speech?
Each segment of the speech
signal is “flipped” along its
horizontal (time) axis, i.e., played backwards
The length of the segment
thus flipped is the primary
experimental parameter
This signal manipulation has
the effect of dissociating the
phase and magnitude
components of the
modulation spectrum
What impact does this
manipulation have on
intelligibility?
Stimulus paradigm based on K. Saberi and D.
Perrott (1999) “Cognitive restoration of
reversed speech,” Nature 398: 760.
Experimental paradigm and acoustic analysis
bear virtually no relation to those described in
the Saberi and Perrott study
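The manipulation described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus-generation code: the function name, the handling of a final partial segment, and the toy input are assumptions.

```python
def locally_time_reverse(samples, sample_rate_hz, segment_ms):
    """Reverse the signal in time within consecutive fixed-length segments.

    The order of the segments is preserved; only the samples inside each
    segment are played backwards. A shorter final segment is reversed too
    (an assumption; the slides do not say how the tail is handled).
    """
    seg_len = max(1, round(sample_rate_hz * segment_ms / 1000))
    out = []
    for start in range(0, len(samples), seg_len):
        out.extend(reversed(samples[start:start + seg_len]))
    return out


# Toy example: 7 "samples" at 1 kHz, reversed in 3 ms (3-sample) segments.
print(locally_time_reverse([0, 1, 2, 3, 4, 5, 6], 1000, 3))
# prints [2, 1, 0, 5, 4, 3, 6]
```

Applying the function twice with the same segment length restores the original signal, which is a convenient sanity check when preparing stimuli.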
Intelligibility of (Locally) Time-Reversed Speech
What impact does local
time reversal have on
intelligibility?
There is a progressive
decline in intelligibility with
increasing length of the
reversed segment
When the segment exceeds
40 ms the intelligibility is
very poor
What acoustic properties
are correlated with this
decline in intelligibility?
Stimuli were sentences from the TIMIT corpus
Sample sentence:
“She washed his dark suit in greasy wash water all year”
80 different sentences, each spoken by a different speaker
Intelligibility Does NOT Depend Solely on the
Magnitude Component of Modulation Spectrum
Intelligibility as a function of
reverse-segment length
Modulation Spectrum
(magnitude component only)
Saberi and Perrott had conjectured that the results of their experiment could be
explained on the basis of the magnitude component of the modulation spectrum
Brain – 1
(Cognitive) Scientists – 0
Increasing Modulation Phase Dispersion as a
Function of Increasing Reversed-Segment Length
Let’s examine the relation between modulation phase and intelligibility ….
Intelligibility as a function
of reverse-segment length
Phase dispersion (relative to the original
signal) across 40 sentences as a function
of reversed-segment length (ms)
(example = 750-1500 Hz sub-band; 4.5 Hz)
[Figure panels: Original, and reversed-segment lengths of 20, 40, 60, 80 and 100 ms]
Increasing Modulation Phase Dispersion Across Frequency
as a Function of Increasing Reversed-Segment Length
Let’s examine the relation between modulation phase and intelligibility from
a slightly different perspective ….
Phase dispersion across the spectrum for a single sentence at 4.5 Hz
[Figure axis: Frequency]
For reversed-segment lengths greater than 40 ms there is significant phase
dispersion (relative to the original) that becomes severe for segments > 80 ms
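One way to measure a modulation phase shift relative to the original signal at a single modulation frequency is sketched below. It assumes the sub-band envelope has already been extracted and is given as a list of samples; the function names and the toy envelopes are hypothetical, and the authors' actual analysis procedure is not specified in these slides.

```python
import cmath
import math


def modulation_coefficient(envelope, sample_rate_hz, mod_freq_hz=4.5):
    """Single-frequency DFT of an envelope.

    Returns one complex number: its absolute value is the modulation
    magnitude at mod_freq_hz and its angle is the modulation phase.
    """
    step = -2j * math.pi * mod_freq_hz / sample_rate_hz
    return sum(x * cmath.exp(step * n) for n, x in enumerate(envelope))


def phase_shift(original_env, processed_env, sample_rate_hz, mod_freq_hz=4.5):
    """Phase of the processed envelope relative to the original, in radians,
    wrapped to the range [-pi, pi]."""
    c_orig = modulation_coefficient(original_env, sample_rate_hz, mod_freq_hz)
    c_proc = modulation_coefficient(processed_env, sample_rate_hz, mod_freq_hz)
    return cmath.phase(c_proc * c_orig.conjugate())


# Toy envelopes (assumed, not real speech): a 4.5 Hz modulation sampled at
# 100 Hz for 2 s, and the same modulation delayed by 50 ms.
FS = 100
N = 200
DELAY_S = 0.05
original = [1 + math.cos(2 * math.pi * 4.5 * n / FS) for n in range(N)]
delayed = [1 + math.cos(2 * math.pi * 4.5 * (n / FS - DELAY_S)) for n in range(N)]

shift = phase_shift(original, delayed, FS)
print(f"{shift:.3f} rad")
```

Collecting such shifts over many sentences or sub-bands would give a distribution whose spread could serve as a dispersion measure. That last step is left out here because the slides do not define how dispersion was summarized.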
Computing the Complex Modulation Spectrum
It is important to compute the phase dispersion across the spectrum with precision
and to ascertain its impact on the global modulation spectral representation
(shown on the following slide)
Complex Modulation Spectrum = Magnitude x Phase
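The shorthand "Magnitude x Phase" can be read as the polar form of a complex number. The notation below is illustrative and not taken from the slides: f_c stands for the center frequency of a spectral channel and f_m for the modulation frequency.

```latex
% Polar form of a complex modulation coefficient (symbols are assumptions)
M(f_c, f_m) = \lvert M(f_c, f_m) \rvert \, e^{\,i\,\phi(f_c, f_m)}
```

Under this reading, the magnitude-only modulation spectrum keeps |M| and discards the phase term, while the complex modulation spectrum retains both.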
Intelligibility is Based on BOTH the Magnitude and
Phase Components of the Modulation Spectrum
The Relation between Intelligibility and the Complex Modulation Spectrum isn’t Bad!
Intelligibility as a function of
reverse-segment length
Complex modulation spectrum computed for all 80 sentences
Complex Modulation Spectrum
(both magnitude and phase)
Complex Modulation Spectrum - Summary
• Locally time-reversed speech provides a convenient means to
dissociate the magnitude and phase components of the modulation
spectrum
• The intelligibility of time-reversed speech decreases as the segment
length increases up to ca. 100 ms
• Speech intelligibility is NOT correlated with the magnitude component of
the low-frequency modulation spectrum
• Speech intelligibility IS CORRELATED with the COMPLEX modulation
spectrum (magnitude x phase)
• Thus, the phase of the modulation pattern distributed across the
frequency spectrum appears to play an important role in
understanding spoken language
That’s All, Folks
Many Thanks for Your Time and Attention