Background

• Normal-hearing (NH) listeners can take advantage of amplitude fluctuations in a noise masker to improve their speech understanding, compared with their performance in steady noise at the same nominal signal-to-noise ratio (SNR).
• In contrast, most cochlear implant (CI) users show little or no benefit from masker fluctuations.
• We attempted to promote masking release in CI users artificially by creating a condition with little temporal overlap between speech and noise (“+MR”; Kwon et al., 2012¹). Some proficient CI users demonstrated masking release: they were able to glimpse and understand the speech in the dips of the masker.
[Fig. 1 panels: A = speech; B, C = noise; A+B = “+MR” condition; A+C = “Steady” condition]
Some CI users fail to show masking release. Why?
• Listeners must segment speech into meaningful linguistic units to understand the message.
• When other segmentation cues are unavailable or unreliable, listeners can use metrical stress as a segmentation guide.
• The Metrical Segmentation Strategy (MSS) uses syllable strength as an indicator of word boundaries.²
• The MSS is useful because most common English content words begin with a strong syllable.³
• Listeners sometimes make Lexical Boundary Errors (LBEs).
• LBEs fall into 4 types: incorrectly Inserting a word boundary before a Strong (IS) or Weak (IW) syllable, or incorrectly Deleting a word boundary before a Strong (DS) or Weak (DW) syllable.
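The four LBE types are a 2 × 2 crossing of error kind (insertion or deletion) and the strength of the following syllable. A minimal Python sketch of that coding scheme, assuming each error has already been located and the following syllable's strength judged by the raters; the names `Boundary`, `Strength`, `lbe_code`, and `MSS_CONSISTENT` are hypothetical and not part of the study's software:

```python
from enum import Enum


class Boundary(Enum):
    """Kind of word-boundary error (hypothetical helper type)."""
    INSERT = "I"  # listener added a boundary the stimulus did not have
    DELETE = "D"  # listener dropped a boundary the stimulus had


class Strength(Enum):
    """Metrical strength of the syllable right after the error site."""
    STRONG = "S"
    WEAK = "W"


# The two error types the MSS predicts: strong syllables heard as word
# onsets (IS) and weak syllables heard as word continuations (DW).
MSS_CONSISTENT = {"IS", "DW"}


def lbe_code(boundary: Boundary, strength: Strength) -> str:
    """Return the two-letter LBE label, e.g. INSERT + STRONG -> 'IS'."""
    return boundary.value + strength.value


if __name__ == "__main__":
    # Two Table 1 examples, coded by hand from the stimulus/response pairs.
    print(lbe_code(Boundary.INSERT, Strength.STRONG))  # Platoons -> The tunes
    print(lbe_code(Boundary.DELETE, Strength.STRONG))  # He tapped -> Attack
```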
Department of Hearing, Speech, and Language Sciences, Gallaudet University, Washington D.C., USA
Table 1: Lexical Boundary Error (LBE) Examples
Stimulus Presented         | Subject Response            | LBE Type
Platoons deserve respect   | The tunes deserve respect   | IS
Clever women wrote this    | Clever windows open         | DW
He tapped the black device | Attack the black device     | DS
The farmer waters crops    | The farmer bought his crops | IW
• Consistent with the MSS, NH listeners make many more IS and DW errors than DS and IW errors, particularly when the speech signal is noisy or degraded.⁴,⁵
  - NH listeners tend to treat strong syllables as marking the start of a word and weak syllables as continuing a word.
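• A listener’s use of the MSS can be quantified by an index, IMSS, ranging from 0 to 1. Because the test sentences offer roughly equal opportunities for all 4 LBE types, errors made without regard to metrical cues yield values near 0.5; values near 1 indicate very strong use of the MSS.
IMSS = (#IS + #DW) / #Total LBEs, where #Total LBEs = #IS + #DW + #DS + #IW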
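A direct transcription of the IMSS formula, offered as a sketch: the function name `imss` and the dictionary-of-counts input are assumptions for illustration. Errors spread evenly over the four types give 0.5; only IS and DW errors give 1.

```python
from typing import Mapping


def imss(counts: Mapping[str, int]) -> float:
    """MSS index: (#IS + #DW) / #Total LBEs.

    `counts` maps the LBE labels 'IS', 'DW', 'DS', 'IW' to error counts;
    missing labels count as zero, and the total is the sum of all four.
    """
    total = sum(counts.get(k, 0) for k in ("IS", "DW", "DS", "IW"))
    if total == 0:
        raise ValueError("IMSS is undefined when there are no LBEs")
    return (counts.get("IS", 0) + counts.get("DW", 0)) / total


# S7, Steady condition (Table 2): 17 IS, 8 DW, 3 DS, 4 IW -> 25 / 32
print(f"{imss({'IS': 17, 'DW': 8, 'DS': 3, 'IW': 4}):.2f}")  # 0.78
```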
If masking release in fluctuating noise is related to the use of metrical segmentation cues…
• CI users who show masking release should also show more robust use of the MSS (a higher IMSS value) in the +MR condition than in Steady noise.
• CI users who show no masking release should show little difference in their use of the MSS between noise conditions.
Fig. 2: Difference in IMSS as a Function of Masking Release

What enabled those proficient CI users to benefit from the masker dips in the artificial condition?
HYPOTHESIS: For CI users, the fluctuating noise surrounding the speech might weaken or obscure speech segmentation cues, reducing the usefulness of the speech exposed in the dips.

EFFECT OF MASKER FLUCTUATIONS ON LEXICAL SEGMENTATION
IN COCHLEAR IMPLANT LISTENERS
Trevor T. Perry & Bomjun J. Kwon
The study was supported by NIDCD (R03 DC009061)

Method
• 10 Nucleus CI users; data from 8 users were included in the analysis.
• Sentence keyword identification was measured at a fixed SNR in two noise conditions (see Fig. 1):
  • Steady noise
  • +MR noise
    - +MR noise is an artificial noise condition designed to promote opportunities for “dip listening” by reducing the temporal overlap between speech and noise: noise energy is low when speech energy is high, and vice versa.
    - Both noise conditions have the same spectral shape as the sentences.
Fig. 1: Speech in Noise Conditions
• Because CI users’ performance in quiet varies across subjects, SNRs were selected individually for each subject to minimize ceiling and floor effects.
  - Tested SNRs ranged from −2 to 7 dB.
• 60 sentences per condition, 3 keywords per sentence
• Sentences were constructed so that opportunities for all 4 LBE types were roughly equal.
• Stimuli were presented via custom software based on NIC2 and NMT (courtesy of Cochlear Ltd.) to emulate actual CI processing.
• Sentence level: 50 dB SPL, default sensitivity
• Subjects’ verbal responses were transcribed, and their LBEs were coded by 2 raters.
• When rater discrepancies could not be resolved, the disputed LBEs were discarded from the analysis.
• To mitigate potential under-sampling of LBEs within subjects, a subject’s data were included in further analysis only if the subject made at least 15 LBEs (of any type) in each noise condition.
  - This criterion excluded data from 2 subjects, leaving data from 8 subjects for further analysis.
  - Fewer LBEs do not necessarily mean better speech understanding; they could merely reflect a subject’s reluctance to guess when uncertain.
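The inclusion rule above reduces to a per-condition threshold check. A minimal sketch, assuming total LBE counts per condition are already tallied; `include_subject` and the data layout are hypothetical, S7 and S8 use the totals from Table 2, and "SX" is an invented entry (not study data) that exists only to show an exclusion:

```python
from typing import Dict

MIN_LBES_PER_CONDITION = 15  # threshold stated in the Method


def include_subject(total_lbes: Dict[str, int]) -> bool:
    """True if the subject made at least 15 LBEs in every noise condition."""
    return all(n >= MIN_LBES_PER_CONDITION for n in total_lbes.values())


# Total LBEs per condition. S7 and S8 come from Table 2;
# "SX" is hypothetical and is NOT study data.
subjects = {
    "S7": {"Steady": 32, "+MR": 21},
    "S8": {"Steady": 32, "+MR": 33},
    "SX": {"Steady": 18, "+MR": 9},  # hypothetical: fails in +MR
}

included = [s for s, counts in subjects.items() if include_subject(counts)]
print(included)  # ['S7', 'S8']
```

Requiring the threshold in each condition, rather than in the pooled total, keeps the IMSS in both conditions from resting on only a handful of errors.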
Results
• Masking release was defined as the improvement in keyword identification (proportion correct) from the Steady to the +MR condition.
• 4 subjects showed substantial masking release.
• 1 subject performed more poorly in +MR than in Steady noise.
• All 8 subjects made more IS and DW errors than DS and IW errors in both conditions (IMSS > 0.5).
  - This suggests that they could perceive some syllable-strength cues and attempted to use these cues for segmentation.
Table 2: Representative Results from 2 Subjects
Subject | Condition | SNR (dB) | Keyword Proportion Correct | #IS | #DW | #DS | #IW | #Total LBE
S7      | Steady    | 3        | .44                        | 17  | 8   | 3   | 4   | 32
S7      | +MR       | 3        | .72                        | 14  | 5   | 1   | 1   | 21
S8      | Steady    | 3        | .50                        | 22  | 9   | 0   | 1   | 32
S8      | +MR       | 3        | .53                        | 20  | 11  | 1   | 1   | 33
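The two per-subject quantities plotted in Fig. 2 can be recomputed from Table 2. A sketch, assuming masking release is the simple difference in proportion correct (+MR minus Steady) as defined in Results; the dictionary layout and the names `TABLE_2`, `imss`, and `summarize` are hypothetical:

```python
# Rows of Table 2: keyword proportion correct ("pc") and LBE counts.
TABLE_2 = {
    "S7": {
        "Steady": {"pc": 0.44, "IS": 17, "DW": 8, "DS": 3, "IW": 4},
        "+MR": {"pc": 0.72, "IS": 14, "DW": 5, "DS": 1, "IW": 1},
    },
    "S8": {
        "Steady": {"pc": 0.50, "IS": 22, "DW": 9, "DS": 0, "IW": 1},
        "+MR": {"pc": 0.53, "IS": 20, "DW": 11, "DS": 1, "IW": 1},
    },
}


def imss(row: dict) -> float:
    """(#IS + #DW) / #Total LBEs for one subject in one condition."""
    total = row["IS"] + row["DW"] + row["DS"] + row["IW"]
    return (row["IS"] + row["DW"]) / total


def summarize(subject: dict) -> tuple:
    """Return (masking release, IMSS(+MR) - IMSS(Steady))."""
    steady, mr = subject["Steady"], subject["+MR"]
    masking_release = mr["pc"] - steady["pc"]
    return masking_release, imss(mr) - imss(steady)


for name, data in TABLE_2.items():
    release, d_imss = summarize(data)
    print(f"{name}: masking release {release:+.2f}, IMSS difference {d_imss:+.2f}")
```

For S7 this gives a masking release of about +0.28 with an IMSS difference of about +0.12; for S8, about +0.03 and −0.03, matching the pattern described for the two groups.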
[Fig. 2 data. X axis: Masking Release; Y axis: IMSS(+MR) – IMSS(Steady). Subjects are grouped by masking release.
 Group with masking release: mean masking release 0.25; mean IMSS +MR 0.90; mean IMSS difference 0.08.
 Group without masking release: mean masking release 0.01; mean IMSS +MR 0.73; mean IMSS difference −0.09.]
W32, CIAP 2013
• In Fig. 2, the Y axis is the difference in IMSS between conditions; more positive values indicate stronger use of the MSS in the +MR condition than in the Steady condition.
• Mean IMSS in Steady noise was the same for both groups: 0.82.
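Because each group mean is taken over the same subjects in both conditions, the mean of the per-subject differences equals the difference of the condition means, so the Steady means follow directly from the Fig. 2 values. A worked check using only the numbers reported above:

```latex
\begin{aligned}
\Delta I &= I_{\mathrm{MSS}}(+\mathrm{MR}) - I_{\mathrm{MSS}}(\mathrm{Steady}),
\qquad \overline{I}_{\mathrm{Steady}} = \overline{I}_{+\mathrm{MR}} - \overline{\Delta I}\\
\text{group with masking release:}&\quad 0.90 - 0.08 = 0.82\\
\text{group without masking release:}&\quad 0.73 - (-0.09) = 0.82
\end{aligned}
```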
Discussion
• CI users who showed masking release made more robust use of segmentation cues in the fluctuating (+MR) masker than in steady noise.
• CI users who showed no benefit from masker dips appear to have had a harder time segmenting speech in the fluctuating masker than in steady noise.
• Because both groups had the same mean IMSS in steady noise (0.82), the two groups differ specifically in their ability to segment the speech exposed by masker fluctuations.
• The effect of noise is more than energetic masking per se: noise fluctuations can disrupt speech segmentation and hinder speech understanding, even when the instantaneous SNR is quite favorable.
References
1 Kwon, B. J., Perry, T. T., Wilhelm, C. L., & Healy, E. W. (2012). Sentence recognition in noise promoting or suppressing masking release by normal-hearing and cochlear-implant listeners. J. Acoust. Soc. Am., 131, 3111.
2 Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. J. Mem. Lang., 31(2), 218–236.
3 Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Comput. Speech Lang., 2(3), 133–142.
4 Liss, J. M., Spitzer, S. M., Caviness, J. N., Adler, C., & Edwards, B. W. (1998). Syllabic strength and lexical boundary decisions in the perception of hypokinetic dysarthric speech. J. Acoust. Soc. Am., 104, 2457–2466.
5 Mattys, S. L., White, L., & Melhorn, J. F. (2005). Integration of multiple segmentation cues: A hierarchical framework. J. Exp. Psychol. Gen., 134, 477–500.