
Proceedings of ISCA Workshop on Plasticity in Speech Perception (PSP2005); London, UK; 15-17 June 2005
Perceptual Learning of Noise Vocoded Words
Alexis G. Hervais-Adelman, Matt H. Davis, Robert P. Carlyon
Cognition and Brain Sciences Unit
[Figure: Spectrogram of noise-vocoded speech; frequency (Hz, 0-8000) against time (s, 0-3.0).]
[Figure: Experiment 1 feedback conditions (10 subjects each). Vocoded-Clear-Vocoded (VCV): induces pop-out. Vocoded-Vocoded-Clear (VVC): does not induce pop-out.]
In NV sentences, words that are initially difficult to understand, or
even incomprehensible, appear to "pop out" of the distortion when
their identity is known and can be much more accurately reported.
Davis and colleagues (2005) found that performance improved more
rapidly when the feedback provided induced the experience of pop-out.
We attempt to replicate this finding with single words, using
feedback conditions designed either to induce or to avoid pop-out.
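The two feedback conditions differ only in where the clear word falls in the triplet. The toy sketch below (helper names are hypothetical, and it covers only the order of the three presentations, not the report stage) makes the contrast explicit: only in VCV does the listener hear a vocoded token while already knowing the word's identity, which is the situation that produces pop-out.

```python
def feedback_triplet(condition):
    """Order of the three presentations of one training word (hypothetical helper)."""
    if condition == "VCV":
        # Clear word heard before the final vocoded token: pop-out expected.
        return ["vocoded", "clear", "vocoded"]
    if condition == "VVC":
        # Clear word heard only after both vocoded tokens: no pop-out expected.
        return ["vocoded", "vocoded", "clear"]
    raise ValueError(f"unknown condition: {condition!r}")


def vocoded_after_clear(condition):
    """True if a vocoded token follows the clear word, i.e. the listener
    hears distorted speech while already knowing its identity."""
    seq = feedback_triplet(condition)
    return "vocoded" in seq[seq.index("clear") + 1:]


for cond in ("VCV", "VVC"):
    print(cond, feedback_triplet(cond), vocoded_after_clear(cond))
```

This prints True for VCV and False for VVC.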
Individuals fitted with a CI show dramatic improvements in
comprehension over the first few months of using their implant.
In this work we investigate whether recognition of NV words shows a
similar perceptual learning effect, and further investigate the nature
of the cognitive processes involved in this learning.
We address 3 main questions:
Experiment 1: Is learning affected by feedback on the content of
distorted speech?
Experiment 2: Is learning affected by the lexical status of single
words?
Experiment 3: Are improvements in NV word recognition
associated with enhanced discrimination performance?
Stimuli: The amplitude envelope of the clear words was extracted
from 6 bands (50-8000 Hz, spaced approximately logarithmically
according to Greenwood's (1990) equation). The extracted
envelopes were half-wave rectified and used to modulate band-limited noise in their respective extraction bands.
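The poster does not list the resulting cut-off frequencies. The sketch below shows one way to place six contiguous bands between 50 and 8000 Hz at equal distances along the basilar membrane using Greenwood's function F = A(10^(ax) - k), where x is the proportion of cochlear length from the apex. It assumes the commonly quoted human constants A = 165.4 Hz, a = 2.1 and k = 0.88; these constants and the helper names are our assumptions, not values taken from the poster.

```python
import math

# Assumed human constants for Greenwood's (1990) frequency-position function.
A = 165.4  # Hz
a = 2.1    # per unit of relative cochlear length
k = 0.88


def greenwood_freq(x):
    """Characteristic frequency (Hz) at relative cochlear position x (0 = apex, 1 = base)."""
    return A * (10 ** (a * x) - k)


def greenwood_position(f):
    """Inverse of greenwood_freq: relative cochlear position for frequency f (Hz)."""
    return math.log10(f / A + k) / a


def band_edges(n_bands, f_lo, f_hi):
    """Edges of n_bands contiguous bands equally spaced in cochlear position."""
    x_lo = greenwood_position(f_lo)
    x_hi = greenwood_position(f_hi)
    step = (x_hi - x_lo) / n_bands
    return [greenwood_freq(x_lo + i * step) for i in range(n_bands + 1)]


edges = band_edges(6, 50.0, 8000.0)
print([round(e) for e in edges])
```

Near the apex the -k term makes the map closer to linear, so the lowest bands span more octaves than the highest ones; this is why the division is only approximately logarithmic.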
Subjects: All participants were native speakers of British English, aged
18-25 and had no history of hearing impairment or dyslexia.
References:
Davis, M. H., Johnsrude, I., Hervais-Adelman, A. G., Taylor, K., & McGettigan, C. (2005). Lexical information
drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded
sentences. Journal of Experimental Psychology: General, 134(2), 222-241.
Greenwood, D. D. (1990). A cochlear frequency-position function for several species--29 years later.
Journal of the Acoustical Society of America, 87(6), 2592-2605.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily
temporal cues. Science, 270, 303-304.
[Figure: Experiment 1 results. % words reported correctly in the first and second test blocks for Vocoded-Clear-Vocoded (pop-out) and Vocoded-Vocoded-Clear (no pop-out) training.]
Two groups of 12 subjects took part in a cross-over study in which
they were trained with 120 VCV word and 120 VCV non-word
stimuli; one group was trained with words followed by non-words
(W.N.) and the other with non-words followed by words (N.W.).
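The block structure recoverable from the design figure can be written out as a small sketch. Representing each block as a (phase, material, item count) tuple is our own illustrative choice and the function name is hypothetical. Because both groups receive both kinds of training, each group serves as its own control for material type while the order of the two training blocks is counterbalanced.

```python
def crossover_schedule(group):
    """Sequence of (phase, material, n_items) blocks for one Experiment 2 group."""
    training_order = {
        "W.N.": ["words", "non-words"],
        "N.W.": ["non-words", "words"],
    }[group]
    test_block = ("test", "vocoded words", 40)
    schedule = [test_block]  # baseline test before any training
    for material in training_order:
        schedule.append(("train", "VCV " + material, 120))
        schedule.append(test_block)  # re-test after each training block
    return schedule


for group in ("W.N.", "N.W."):
    for block in crossover_schedule(group):
        print(group, *block)
```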
[Figure: Experiment 2 design. W.N. group: test (40 vocoded words), train (120 VCV words), test (40 vocoded words), train (120 VCV non-words), test (40 vocoded words). N.W. group: the same sequence with the non-word training block first.]
Experiment 2: Presence of Lexical Information
Davis et al. (2005) found that a period of exposure to noise-vocoded
non-word sentences did not provide any advantage to listeners hearing
NV speech, while training with syntactic prose was as effective as
training with real English sentences. This was taken as an indication
that lexical information is required for effective adaptation to NV
speech. However, non-word sentences may exceed short-term memory
(STM) capacity, so their failure to support learning could reflect
memory limitations rather than the absence of lexical information.
We therefore conducted an experiment in which subjects were trained
with single words and non-words, and assessed the effectiveness of
each type of stimulus.
[Figure: Experiment 2 results. % words reported correctly at baseline, after one training block and after both training blocks, for Word - Non Word and Non Word - Word training.]
Subjects in both conditions show a significant improvement in
comprehension of NV speech (p<0.001). There is a significant
difference in performance between VCV and VVC (p<0.003),
indicating that training with VCV is more effective than training
with VVC.
Four groups of 8 subjects (aged 18-25) were trained with 0, 40, 80 or
120 real English words in VCV triplets. Phoneme-discrimination
performance was tested immediately before and after training, using
a 2AFC task with visually presented minimal phonological pairs of
real English words. Free report performance was also assessed at
the beginning and end of the experiment.
Human speech perception is robust in the face of the degraded
speech to which we are exposed in everyday life.
This robustness is vital to cochlear implant (CI) users, who can
understand speech despite the dramatic reduction in spectral
detail and temporal fine structure in the signal provided by their implants.
The information provided by a CI can be simulated for normally
hearing listeners by using a range of vocoding techniques. In noise-vocoded
(NV) speech, the information within each of a number of frequency-specific
regions (bands) is replaced by band-limited noise modulated by the
amplitude envelope of the speech within that frequency region
(Shannon et al., 1995).
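The vocoding scheme described above can be sketched in pure Python. This is a minimal illustration, not the authors' processing chain: the second-order band-pass filters (from the RBJ audio-EQ cookbook), the 30 Hz envelope smoothing, the uniform white-noise carrier, the log-spaced example bands and all function names are our assumptions.

```python
import math
import random


def bandpass_coeffs(f_lo, f_hi, fs):
    """Second-order band-pass (RBJ audio-EQ cookbook, 0 dB peak gain)
    centred on the geometric mean of the band edges."""
    f0 = math.sqrt(f_lo * f_hi)
    bw_octaves = math.log2(f_hi / f_lo)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(math.log(2) / 2 * bw_octaves * w0 / math.sin(w0))
    a0 = 1 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a


def biquad(x, b, a):
    """Filter the sequence x with one biquad section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff_hz=30.0):
    """Amplitude envelope: half-wave rectification, then a one-pole low-pass."""
    c = math.exp(-2 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for xn in x:
        state = (1 - c) * max(xn, 0.0) + c * state
        env.append(state)
    return env


def noise_vocode(signal, fs, edges, seed=0):
    """Replace each band of `signal` with envelope-modulated band-limited noise."""
    rng = random.Random(seed)
    out = [0.0] * len(signal)
    for f_lo, f_hi in zip(edges[:-1], edges[1:]):
        b, a = bandpass_coeffs(f_lo, f_hi, fs)
        env = envelope(biquad(signal, b, a), fs)  # speech envelope in this band
        carrier = biquad([rng.uniform(-1.0, 1.0) for _ in signal], b, a)  # noise in the same band
        for i, (e, n) in enumerate(zip(env, carrier)):
            out[i] += e * n
    return out


# Usage: vocode 100 ms of a 1 kHz tone with six log-spaced bands (50-8000 Hz).
fs = 22050
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]
edges = [50 * (8000 / 50) ** (i / 6) for i in range(7)]
vocoded = noise_vocode(tone, fs, edges)
```

Because the same band-pass filter shapes both the speech analysis and the noise carrier, each band's noise occupies the frequency region whose envelope it carries. The output is not level-normalised, which a real stimulus pipeline would add.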
[Figure: Spectrograms of normal speech and noise-vocoded speech.]
Experiment 1: The Role of Feedback
Introduction
There is a significant interaction between training condition and
improvement in performance (p<0.03). The improvement in
performance after training with words is significantly greater than
after training with non-words (p<0.05). This indicates that the earlier
failure of non-word sentences to support learning (Davis et al., 2005)
was not due to STM limitations, and confirms the importance of
lexical information for rapid learning of NV speech.
Experiment 3: Discrimination
It has been shown that subjects' ability to report noise-vocoded
speech improves with training. However, it is not clear whether this
is due to improvements in their ability to discriminate between
noise-vocoded speech sounds or to improved guessing.
In this experiment, subjects' discrimination performance was tested
explicitly using a 2AFC task. Free report performance was also
assessed to provide a comparison of the two tasks.
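[Figure: Experiment 3 design. Training: 0, 40, 80 or 120 VCV words; discrimination: 40 2AFC trials with minimal pairs of words; free report: 20 vocoded words.]
[Figure: Experiment 3 results. Retest-test scores (%) for free report and 2AFC as a function of the number of training words (0, 40, 80, 120).]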
2AFC performance improves significantly from the first to the second
block of trials; however, the amount of improvement appears to be
independent of the amount of training received. Improvements in
free report performance are significantly greater with more training
(p<0.001), with a significant linear correlation between the number of
training words and performance (p<0.005). This suggests that the two
tasks do not rely on the same underlying processes.
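The poster does not state exactly how the linear trend was tested. A minimal sketch, assuming improvement is scored per subject as retest minus test percentage correct (the figure's retest-test scores) and that the trend is a Pearson correlation between each subject's number of training words and their improvement; the helper names are hypothetical. Converting the t statistic to a p-value needs a t-distribution routine (for example `scipy.stats.t`), which is omitted to keep the block dependency-free.

```python
import math


def improvement_scores(test, retest):
    """Per-subject retest minus test score, in percentage points."""
    return [post - pre for pre, post in zip(test, retest)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Toy numbers only (not data from the poster):
print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
```

In Experiment 3, `x` would hold each subject's number of training words (0, 40, 80 or 120) and `y` that subject's improvement score, with n = 32.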
Discussion
We have shown that explicit knowledge of the identity of distorted
words improves the rate at which listeners learn to understand NV
speech. Knowledge of the acoustic form of the . This advantage
depends upon the involvement of a level of information beyond the
acoustic representation of the stimuli, as demonstrated by the
significantly greater effect of training subjects with words rather than
non-words.
The generalisability of the learning to untrained words suggests
that the learning occurs at a pre-lexical level, but the importance of
lexical information to learning indicates that it is driven by higher
levels of the auditory system.
Improvements in word recognition, but not in discrimination, correlate
with the amount of training given; it is therefore unclear whether listeners
learn to understand NV phonemes after training with NV words.