Pitch perception in normal,
impaired and electric hearing:
All in the timing?
Andrew J. Oxenham, PhD
Departments of Psychology
and Otolaryngology
University of Minnesota – Twin Cities
Auditory
Perception &
Cognition Lab
Mathematical transforms in hearing. I.
•  Fourier (1768-1830)
–  Any waveform can be represented as a sum of sinusoids.
–  Perceptually relevant (Ohm; Helmholtz).
–  Cochlea acts in some ways as a Fourier analyzer, decomposing
complex sound into sinusoidal components.
–  Serves as basis for multi-channel representation in CIs.
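The first bullet can be checked numerically. The sketch below is an illustration added here, not material from the talk: the sampling rate and the three components are arbitrary assumptions. It builds a waveform from sinusoids and recovers them with a discrete Fourier transform.

```python
import numpy as np

FS = 8000                      # sampling rate in Hz (assumed)
N = 800                        # 100 ms, so every component falls on a 10-Hz bin
t = np.arange(N) / FS

# Hypothetical components: frequency (Hz) -> amplitude
components = {200: 1.0, 400: 0.5, 600: 0.25}
wave = sum(a * np.sin(2 * np.pi * f * t) for f, a in components.items())

# Magnitude spectrum, scaled so that a unit-amplitude sinusoid gives 1.0
spectrum = 2 * np.abs(np.fft.rfft(wave)) / N
freqs = np.fft.rfftfreq(N, d=1 / FS)

# Keep only the bins that stand clear of numerical noise
recovered = {int(round(f)): round(float(a), 3)
             for f, a in zip(freqs, spectrum) if a > 1e-6}
print(recovered)
```

The recovered dictionary should match the components that went in. The cochlea is only "in some ways" such an analyzer, so this shows the mathematical idea, not a cochlear model.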
Mathematical transforms in hearing. II.
•  Hilbert (1862-1943)
–  Any band-limited waveform can be represented as the
product of a temporal envelope and temporal fine structure.
–  Auditory nerve performs the nonlinearity necessary to
extract temporal envelope.
–  Temporal fidelity of auditory nerve sufficient to retain
temporal fine structure (at low frequencies).
Rose et al. (1971)
Dreyer & Delgutte (2006)
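A minimal sketch of the envelope / fine-structure decomposition, added for illustration. The test signal (a 1-kHz carrier with 50-Hz amplitude modulation), the sampling rate and the helper name analytic_signal are assumptions, and the analytic signal is computed with a plain FFT, not with any particular library routine.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: drop negative frequencies, double positive ones."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

FS = 16000                               # sampling rate in Hz (assumed)
t = np.arange(1600) / FS                 # 100 ms
true_env = 1.0 + 0.5 * np.cos(2 * np.pi * 50 * t)
x = true_env * np.cos(2 * np.pi * 1000 * t)

z = analytic_signal(x)
envelope = np.abs(z)                     # temporal envelope
fine_structure = np.cos(np.angle(z))     # temporal fine structure (unit amplitude)
reconstructed = envelope * fine_structure
```

For this band-limited signal the envelope follows the 50-Hz modulation and the product of the two parts gives back the waveform.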
Auditory chimeras
•  Principle: take the
fine structure of one
sound and combine
it with the envelope
of another. What do
you hear?
•  Does envelope or
fine structure
dominate
perception?
Smith, Delgutte & Oxenham (Nature, 2002)
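The principle can be sketched in a few lines. This is an illustration under stated assumptions, not the published synthesis procedure: the brick-wall FFT bands, the band edges, the function names and the synthetic test sounds are all hypothetical choices made here.

```python
import numpy as np

def analytic_band(x, fs, lo, hi):
    """Analytic signal of x restricted to the band [lo, hi) Hz (brick-wall FFT filter)."""
    spec = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1 / fs)
    keep = (freqs >= lo) & (freqs < hi)          # positive frequencies only
    return np.fft.ifft(2 * spec * keep)

def chimera(env_source, tfs_source, fs, edges):
    """Per band: envelope of env_source times fine structure of tfs_source, then sum."""
    out = np.zeros(len(env_source))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = np.abs(analytic_band(env_source, fs, lo, hi))
        tfs = np.cos(np.angle(analytic_band(tfs_source, fs, lo, hi)))
        out += env * tfs
    return out

FS = 8000                                        # sampling rate in Hz (assumed)
t = np.arange(FS) / FS                           # 1 s
sound_a = (1 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)
sound_b = np.cos(2 * np.pi * 1000 * t)

# One band only: the result carries the 4-Hz envelope of A on the 1-kHz carrier of B
single_band = chimera(sound_a, sound_b, FS, edges=[80, 4000])
```

Passing 17 band edges, for example np.geomspace(80, 4000, 17) (an assumed spacing), would give a 16-band chimera. Bands in which the fine-structure source has no energy yield noise-like fine structure in this sketch.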
Speech chimeras
•  S1: “The clown has a funny face”
•  S2: “The car is going too fast”
16-channel chimera:
•  Fine structure – S1
•  Envelope – S2
Which sentence do you hear?
Speech taken from HINT sentence corpus (Nilsson et al., 1994)
Melody chimeras
•  Frere Jacques
•  Twinkle twinkle
16-channel chimera:
•  Fine-structure – Twinkle twinkle
•  Envelope – Frere Jacques
Which melody do you hear?
(Tunes altered to eliminate any rhythmic cues.)
Temporal Envelope vs. Fine Structure
Dominance of each code depends on the material:
1.  Envelope is dominant for speech.
2.  Fine structure is dominant for pitch.
Results support most current cochlear implant coding
schemes, in which temporal fine structure is mostly not
represented.
Importance of TFS
•  TFS conveys pitch (and localization) information.
•  Pitch is important for:
–  Music
•  Melody and harmony
–  Speech
•  Voicing cues
•  Prosody
•  Speaker identification
•  Lexical information in tone languages
–  Sound segregation
•  Cocktail party problems
•  Reintroducing TFS may help CIs in all areas.
But is TFS for pitch really coded temporally?
Yes for binaural processing.
Unsure for monaural aspects,
such as pitch.
Temporal code?
(Hilbert)
From Rose et al. (1971)
Place code?
(Fourier)
“Place-time” code?
(Hybrid)
From Ruggero et al. (1997)
Why do we care?
•  In many respects place and time models can be
considered functionally and mathematically equivalent.
–  Different transforms of the same thing.
•  Representations in the auditory nerve are transformed
many times on their way to cortex. How relevant is the
cochlear/auditory-nerve representation anyway?
Implications
•  Clinical:
Cochlear implant users:
–  Poor pitch/music perception
–  Poor speaker, prosody identification
–  Poor source segregation abilities
Hearing-impaired listeners:
–  Difficulty in noisy, complex backgrounds
–  Broader auditory filters, leading to fewer resolved harmonics
–  May be less able to make use of temporal fine structure (e.g.,
Lorenzi et al., 2006)
Pitch coding in the cochlea may play a role in all these
areas
•  Technology:
–  Major limiting factor in automatic speech recognition (ASR) is
susceptibility to background interference.
Some background information:
Harmonic complex tones
•  Most musical sounds and voiced speech sounds are
harmonic complex tones.
•  Tone – periodic
•  Complex – comprising more than one sinusoid (or pure
tone).
•  Harmonic – all frequencies are integer multiples of the
fundamental frequency (or repetition rate)
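The three defining properties can be made concrete with a short sketch. The sampling rate, the F0 of 200 Hz, the equal component amplitudes and the function name are assumptions chosen for illustration.

```python
import numpy as np

def harmonic_complex(f0, harmonics, fs, dur):
    """Sum of equal-amplitude sine components at integer multiples of f0."""
    t = np.arange(int(round(fs * dur))) / fs
    return sum(np.sin(2 * np.pi * h * f0 * t) for h in harmonics)

FS = 48000                                   # sampling rate in Hz (assumed)
F0 = 200                                     # fundamental frequency in Hz (assumed)
tone = harmonic_complex(F0, harmonics=[1, 2, 3, 4], fs=FS, dur=0.05)

period = FS // F0                            # samples in one cycle of F0
# "Tone": the waveform repeats every 1/F0 seconds
is_periodic = np.allclose(tone[:-period], tone[period:])
```

Because every component is an integer multiple of F0, the summed waveform repeats once per F0 cycle, which is why F0 is also called the repetition rate.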
[Figure: example of a harmonic complex tone; axis label: Time (ms).]
Harmonic complex tones
[Figure: the fundamental frequency (F0), second harmonic (2F0), third harmonic (3F0) and fourth harmonic (4F0), each shown as a waveform (amplitude vs. time, 0 to 20 ms) and as a spectrum (0 to 1000 Hz), and their sum (1+2+3+4). Labels: Frequency (Hz); Pitch of complex tone.]
Pitch of the missing fundamental
•  Harmonic tones produce a pitch at the fundamental
frequency (F0), even if there is no energy at the F0
itself (pitch of the missing fundamental) (Seebeck,
1841; Schouten, 1940).
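A minimal sketch of why a missing-fundamental tone still carries F0. It is an illustration only: the sampling rate, the choice of harmonics 3 to 5 of 200 Hz and the use of autocorrelation as a periodicity measure are assumptions made here, and nothing in it says how the auditory system extracts the pitch.

```python
import numpy as np

FS = 48000                     # sampling rate in Hz (assumed)
F0 = 200                       # fundamental frequency in Hz
N = 4800                       # 100 ms, giving 10-Hz spectral resolution
t = np.arange(N) / FS

# Harmonics 3 to 5 only (600, 800, 1000 Hz): no component at F0 itself
tone = sum(np.sin(2 * np.pi * h * F0 * t) for h in (3, 4, 5))

spec = np.fft.rfft(tone)
freqs = np.fft.rfftfreq(N, d=1 / FS)
amplitude_at_f0 = 2 * np.abs(spec[np.argmin(np.abs(freqs - F0))]) / N

# Circular autocorrelation from the power spectrum (Wiener-Khinchin relation)
acf = np.fft.irfft(np.abs(spec) ** 2, n=N)
lags_ms = 1000 * np.arange(N) / FS
search = (lags_ms >= 1.0) & (lags_ms <= 8.0)       # skip the zero-lag peak
best_lag_ms = lags_ms[search][np.argmax(acf[search])]
```

The spectrum has no energy at 200 Hz, yet the largest autocorrelation peak falls at the 5-ms period of F0.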
[Figure: waveforms (amplitude vs. time) and spectra (frequency, Hz) in two panels, both labeled Pitch = 200 Hz. Frequency labels: 200, 600, 1000, 1400 and 400, 800, 1200, 1600 Hz.]
[Figure: resolved vs. unresolved harmonics, with place and time representations. From: Oxenham (2012, J Neurosci)]
Which is it - Time or Place?
•  For pure tones (and most other sounds), temporal and
place information co-vary, making dissociation difficult.
•  Transposed stimuli (van de Par & Kohlrausch, 1997) are
an attempt to overcome this.
AIMS:
•  Transpose low-frequency temporal fine-structure
information into the envelope of a high-frequency carrier.
•  Dissociate place and time representations in the auditory
nerve.
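The two aims can be illustrated with a short sketch. It assumes a 300-Hz tone transposed to a 4-kHz carrier (the values used in the audio demo later in the talk), an arbitrary sampling rate, and a bare half-wave rectified modulator. Published implementations generally also low-pass filter the rectified modulator to limit its spectral spread; that step is left out here.

```python
import numpy as np

FS = 48000                     # sampling rate in Hz (assumed)
N = 4800                       # 100 ms
t = np.arange(N) / FS
FM, FC = 300, 4000             # low-frequency tone and high-frequency carrier (Hz)

# Half-wave rectified low-frequency sinusoid used as the modulator
modulator = np.maximum(np.sin(2 * np.pi * FM * t), 0.0)
transposed = modulator * np.sin(2 * np.pi * FC * t)

power = np.abs(np.fft.rfft(transposed)) ** 2
freqs = np.fft.rfftfreq(N, d=1 / FS)
peak_hz = freqs[np.argmax(power)]                          # strongest component
low_fraction = power[freqs < 2000].sum() / power.sum()     # energy below 2 kHz
```

Almost all of the energy sits around the 4-kHz carrier (place), while the envelope still follows the 300-Hz modulator (time).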
Transposed stimuli
[Figure: stimuli and their peripheral auditory representation (amplitude vs. time, 0 to 15 ms) for a sinusoid and for a transposed tone, the latter formed as modulator x carrier. Spectrum sketch (frequency axis) labeled fm-fc, fm, fm+fc. (van de Par and Kohlrausch, 1997)]
Transposed tones: Simple pitch
•  Unlike ITDs, temporal information for frequency cannot
be used optimally by the auditory system, if presented to
the “wrong” place.
•  Pitch perception is weaker for transposed tones.
•  Place information may be important.
300-Hz pure tone
300-Hz tone, transposed to 4 kHz
What about complex pitch?
Complex tone pitch perception
(Pitch of the missing fundamental)
[Figure: (1) components at 300, 400 and 500 Hz; Pitch = 100 Hz. (2) components at 4000, 6300 and 10080 Hz; Pitch = ? Waveforms are plotted against time, spectra against frequency (Hz).]
Temporal model predictions
Highest peak
occurs at lag of 10
ms (F0 = 100 Hz)
for both sinusoids
and transposed
tones.
(Model by Meddis and
O’Mard, 1997)
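The prediction can be reproduced with a stripped-down stand-in for a temporal model. This is not the Meddis and O'Mard model: there is no cochlear filterbank or hair-cell stage, each component is simply treated as its own channel, and the rectification, the 1.5-kHz brick-wall low-pass, the sampling rate and all function names are assumptions of this sketch.

```python
import numpy as np

FS = 48000                       # sampling rate in Hz (assumed)
N = 4800                         # 100 ms, a whole number of cycles of every component
t = np.arange(N) / FS

def half_wave(x):
    return np.maximum(x, 0.0)

def transposed_tone(fm, fc):
    """Half-wave rectified sinusoid at fm multiplying a carrier at fc."""
    return half_wave(np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

def channel_acf(x, cutoff_hz=1500.0):
    """Rectify, low-pass (brick wall) and autocorrelate one channel."""
    spec = np.fft.rfft(half_wave(x))
    spec[np.fft.rfftfreq(N, d=1 / FS) > cutoff_hz] = 0.0
    return np.fft.irfft(np.abs(spec) ** 2, n=N)      # circular autocorrelation

def summary_peak_ms(channels, lo_ms=1.25, hi_ms=15.0):
    """Lag (ms) of the highest peak in the autocorrelation summed over channels."""
    summary = sum(channel_acf(ch) for ch in channels)
    lags_ms = 1000 * np.arange(N) / FS
    search = (lags_ms >= lo_ms) & (lags_ms <= hi_ms)
    return lags_ms[search][np.argmax(summary[search])]

sinusoids = [np.sin(2 * np.pi * f * t) for f in (300, 400, 500)]
transposed = [transposed_tone(fm, fc)
              for fm, fc in zip((300, 400, 500), (4000, 6300, 10080))]

peak_sinusoids = summary_peak_ms(sinusoids)
peak_transposed = summary_peak_ms(transposed)
```

In both cases the summed autocorrelation peaks at the lag where the 300-, 400- and 500-Hz periodicities coincide, so a purely temporal account predicts the same 100-Hz pitch for the two stimulus types.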
Pitch matches by normal-hearing subjects
[Figure: histograms of pitch matches for sinusoids and transposed tones in individual subjects (panels labeled S6 to S9). Axes: number of matches against frequency difference (%) and semitones (-10 to 10).]
Oxenham et al. (PNAS, 2004)
Transposed tones: Conclusions
•  With transposed tones, the pitch of pure tones is poor and
complex pitch is nonexistent.
•  Suggests that fine structure must be presented to the
correct place in the cochlea – timing is not sufficient.
Implications:
•  Basic: Provides a stimulus class to search for the neural
correlates of pitch.
•  Practical: If fine-structure is to be reintroduced into
cochlear implants, place of stimulation may be important,
i.e., as apical as possible for low frequencies.
Open questions:
•  Can training improve performance (plasticity)?
•  How close does the place/time correspondence need to
be?
Is a temporal code necessary for pitch?
Human psychophysics
•  Frequency discrimination becomes poorer at high frequencies (> 2 kHz). Moore (JASA 1973)
Auditory-nerve physiology
•  Phase-locking in mammals studied so far also degrades above 1-2 kHz. Palmer & Russell (Hear Res 1986); Rose et al. (1971)
[Figure: data for cat and guinea pig, with the upper limit of “musical pitch” marked.]
Timing information and melodic pitch
•  In humans pure-tone melodic pitch perception is
poor for frequencies above 4-5 kHz.
•  Current beliefs:
–  Musical pitch relies on temporal coding
–  Could also explain the upper limits of musical
instruments (~4000 Hz)?
Example with scale starting on 440 Hz
Major scale
Minor scale
Example with scale starting on 6600 Hz
Major
Which is which?
Minor
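For reference, the note frequencies in the two demonstrations can be worked out as below. Equal temperament and the natural minor scale are assumptions of this sketch; the talk does not say which tuning or which minor scale the audio examples used.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]   # semitones above the tonic
MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10, 12]   # natural minor (assumed)

def scale_hz(tonic_hz, steps):
    """Equal-tempered scale frequencies, rounded to 0.1 Hz."""
    return [round(tonic_hz * 2 ** (s / 12), 1) for s in steps]

low_major = scale_hz(440, MAJOR_STEPS)
high_major = scale_hz(6600, MAJOR_STEPS)
high_minor = scale_hz(6600, MINOR_STEPS)

# Scale degrees (zero-based) at which major and minor differ
differing = [i for i, (a, b) in enumerate(zip(high_major, high_minor)) if a != b]
```

Every note of the scale starting on 6600 Hz lies above the 4-5 kHz region, and the major and minor versions differ only at the third, sixth and seventh degrees, by one semitone each.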
Breakdown of pure-tone pitch
•  Temporal phase locking in the auditory nerve breaks
down at high frequencies.
•  In humans pure-tone melodic pitch perception is poor
when all frequencies are above 4-5 kHz.
•  Perhaps musical pitch relies on temporally coded TFS?
What about harmonic complex tones, if the resolved
harmonics are above 6 kHz, but the F0 is well below that?
What about complex tones?
[Figure: amplitude spectra (frequency in kHz) in background noise, each asking “Musical pitch?”. Tone at 1 kHz: Yes. Tone at 7 kHz: No. Components at 7, 8, 9, 10, 11 and 12 kHz: ?]
Ritsma’s (1962) “existence region” suggests no F0 perception > 5 kHz,
in line with predictions of phase locking.
Melody recognition
•  Listeners are played two 4-note diatonic melodies, M1
and M2.
•  In M2, either note 2 or 3 is changed (up or down) by one
scale step (or not).
•  Listeners decide: Were M1 and M2 the same or different.
[Figure: schematic of melodies M1 and M2, frequency vs. time.]
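The trial structure can be sketched as follows. Everything beyond the three bullets is an assumption made for illustration: the random choice of notes from one octave, the 50% proportion of "different" trials, the major scale, the example tonics and the function names.

```python
import random

MAJOR = [0, 2, 4, 5, 7, 9, 11]            # semitone offsets of the diatonic (major) scale

def degree_to_hz(tonic_hz, degree):
    """Frequency of a scale degree (0 = tonic), assuming equal temperament."""
    octave, step = divmod(degree, len(MAJOR))
    return tonic_hz * 2 ** ((12 * octave + MAJOR[step]) / 12)

def make_trial(rng):
    """Return (m1, m2, is_different); melodies are lists of four scale degrees."""
    m1 = [rng.randrange(8) for _ in range(4)]   # notes drawn from one octave (assumed)
    m2 = list(m1)
    is_different = rng.random() < 0.5           # assumed proportion of 'different' trials
    if is_different:
        position = rng.choice([1, 2])           # note 2 or 3 (zero-based index 1 or 2)
        m2[position] += rng.choice([-1, 1])     # one scale step down or up
    return m1, m2, is_different

rng = random.Random(0)
m1, m2, is_different = make_trial(rng)
m1_hz = [degree_to_hz(500, d) for d in m1]      # e.g. M1 built on a 500-Hz tonic
m2_hz = [degree_to_hz(2000, d) for d in m2]     # e.g. M2 two octaves higher
```

Working in scale degrees keeps the "one scale step" change independent of the frequency region in which each melody is played.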
Conditions and predictions
1.  Pure tones – Low
–  M1: 500 Hz - 1 kHz
–  M2: 2 - 4 kHz
Prediction: Good
2.  Pure tones – High
–  M1: 1.5 - 3 kHz
–  M2: 6 - 12 kHz
Prediction: Bad
3.  Pure and complex tones
–  M1: 1 - 2 kHz pure tone
–  M2: F0 = 1 - 2 kHz, harmonics all above 7.5 kHz (30 dB/oct slope).
Prediction: ???
Tones presented at 55 dB SPL per component.
Background noise masks possible distortion (45 dB TEN).
Very good hearing required up to 16 kHz
Results
Mean and C.I. for 6 musically trained normal-hearing subjects
(Oxenham, Micheyl, Keebler, Loper, Santurette, PNAS, 2011)
Envelope repetition pitch?
•  Assumption that individual harmonics are being
processed may be mistaken.
•  Perhaps listeners are sensitive to the envelope repetition
pitch via unresolved harmonics.
–  1-2 kHz, not 6-20 kHz.
•  Test using two conditions:
–  Randomly shifted harmonics
–  Dichotic presentation
Shifted harmonics
•  Each note has components spaced apart by the F0, but
with a random (equal) shift in frequency.
[Figure: components at 6, 7, 8, 9 and 10 kHz.]
•  If envelope repetition is being used:
–  Envelope remains identical, so performance should
remain good.
•  If individual harmonics are coded:
–  Tones are no longer harmonic, and have an
ambiguous pitch, so performance should be poor.
[Figure: the same components after an equal shift, at 6.2, 7.2, 8.2, 9.2 and 10.2 kHz.]
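The logic of the shifted-harmonics test can be checked numerically. The sketch uses the component frequencies from the slide (6 to 10 kHz, shifted by 200 Hz); the sampling rate, the equal amplitudes and cosine phases, and the FFT-based Hilbert envelope are assumptions made here.

```python
import numpy as np

FS = 48000                      # sampling rate in Hz (assumed)
N = 480                         # 10 ms, so every component falls on a 100-Hz bin
t = np.arange(N) / FS

def complex_tone(freqs_hz):
    return sum(np.cos(2 * np.pi * f * t) for f in freqs_hz)

def hilbert_envelope(x):
    """Envelope as the magnitude of the analytic signal (FFT method, even length)."""
    half = len(x) // 2
    gain = np.zeros(len(x))
    gain[0] = gain[half] = 1.0
    gain[1:half] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

harmonic = [6000, 7000, 8000, 9000, 10000]
shifted = [f + 200 for f in harmonic]            # 6.2, 7.2, ... kHz

same_envelope = np.allclose(hilbert_envelope(complex_tone(harmonic)),
                            hilbert_envelope(complex_tone(shifted)))
harmonic_of_1k = all(f % 1000 == 0 for f in harmonic)
shifted_of_1k = all(f % 1000 == 0 for f in shifted)
```

The envelope is unchanged by the shift, while the shifted components are no longer integer multiples of the 1-kHz spacing, which is what lets the two accounts make opposite predictions.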
Shifted harmonics
(Oxenham, Micheyl, Keebler, Loper, Santurette, PNAS, 2011)
Dichotic stimulation
•  Present alternating harmonics to opposite ears.
[Figure: amplitude vs. frequency (kHz) for each ear, with components at 8, 10, 12, 14 and 16 kHz in one ear and at 7, 9, 11, 13 and 15 kHz in the other.]
Dichotic stimulation
•  If envelope cues are being used:
–  Spacing is doubled, so performance should be worse
•  If individual frequencies are being coded:
–  Pitch information can be integrated between the two
ears, so performance should remain good.
[Figure: percent correct (0 to 100) for diotic and dichotic presentation.]
(Oxenham, Micheyl, Keebler, Loper, Santurette, PNAS, 2011)
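The "spacing is doubled" argument can be illustrated with the component frequencies shown for the dichotic stimulus (harmonics of 1 kHz from 7 to 16 kHz). The sampling rate, the equal amplitudes and phases, and the FFT-based Hilbert envelope are assumptions of this sketch.

```python
import numpy as np

FS = 48000                      # sampling rate in Hz (assumed)
N = 480                         # 10 ms
t = np.arange(N) / FS

def complex_tone(freqs_hz):
    return sum(np.cos(2 * np.pi * f * t) for f in freqs_hz)

def hilbert_envelope(x):
    """Envelope as the magnitude of the analytic signal (FFT method, even length)."""
    half = len(x) // 2
    gain = np.zeros(len(x))
    gain[0] = gain[half] = 1.0
    gain[1:half] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def repeats_every(env, ms):
    """True if the (circular) envelope is unchanged by a shift of `ms` milliseconds."""
    return np.allclose(env, np.roll(env, int(round(ms * FS / 1000))))

one_ear = [7000, 9000, 11000, 13000, 15000]
other_ear = [8000, 10000, 12000, 14000, 16000]

env_one = hilbert_envelope(complex_tone(one_ear))
env_other = hilbert_envelope(complex_tone(other_ear))
env_diotic = hilbert_envelope(complex_tone(one_ear + other_ear))

dichotic_rate_doubled = repeats_every(env_one, 0.5) and repeats_every(env_other, 0.5)
diotic_at_f0 = repeats_every(env_diotic, 1.0) and not repeats_every(env_diotic, 0.5)
```

Within each ear the envelope repeats every 0.5 ms (2 kHz), whereas with all components in the same ear it repeats only every 1 ms (the 1-kHz F0), so a listener relying on envelope rate should do worse with dichotic presentation.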
Melodic pitch perception above 6 kHz
•  Complex pitch perception is possible, even:
–  When all harmonics are above 6 kHz
–  In the absence of temporal envelope cues.
•  Shown with:
–  Melodies, Pitch matching, F0 discrimination
•  Either:
–  Temporal information in the auditory nerve is not
necessary for complex pitch perception (e.g. place
code)
•  Or:
–  Limit of usable phase locking in humans is higher
than 6 kHz (Seems unlikely)
Summary
•  Temporal fine structure is important for pitch in music,
speech and auditory scene analysis.
•  However, neural phase-locking to TFS seems neither
sufficient (2004) nor necessary (2011) for complex pitch
perception.
•  Results suggest that coding schemes to improve
temporal processing in CIs may not be sufficient to
improve pitch in CI users.
•  Instead, improving TFS perception for CIs may require
improved spectral resolution, possibly recreating the
spatio-temporal pattern of cochlear excitation.
Thanks to…
•  Christophe Micheyl
•  Josh Bernstein
•  Hector Penagos
•  Zach Smith
•  Bertrand Delgutte
•  Mike Keebler
NIH Grant R01 DC005216