Pitch perception in normal, impaired and electric hearing: All in the timing?
Andrew J. Oxenham, PhD
Departments of Psychology and Otolaryngology
University of Minnesota – Twin Cities
Auditory Perception & Cognition Lab

Mathematical transforms in hearing. I.
• Fourier (1768-1830)
– Any waveform can be represented as a sum of sinusoids.
– Perceptually relevant (Ohm; Helmholtz).
– Cochlea acts in some ways as a Fourier analyzer, decomposing complex sound into sinusoidal components.
– Serves as basis for multi-channel representation in CIs.

Mathematical transforms in hearing. II.
• Hilbert (1862-1943)
– Any band-limited waveform can be represented as the product of a temporal envelope and temporal fine structure.
– Auditory nerve performs the nonlinearity necessary to extract temporal envelope.
– Temporal fidelity of auditory nerve sufficient to retain temporal fine structure (at low frequencies).
Rose et al. (1971); Dreyer & Delgutte (2006)

Auditory chimeras
• Principle: take the fine structure of one sound and combine it with the envelope of another. What do you hear?
• Does envelope or fine structure dominate perception?
Smith, Delgutte & Oxenham (Nature, 2002)

Speech chimeras
• S1: “The clown has a funny face”
• S2: “The car is going too fast”
16-channel chimera:
• Fine structure – S1
• Envelope – S2
Which sentence do you hear?
Speech taken from HINT sentence corpus (Nilsson et al., 1994)

Melody chimeras
• Frere Jacques
• Twinkle twinkle
16-channel chimera:
• Fine structure – Twinkle twinkle
• Envelope – Frere Jacques
Which melody do you hear?
(Tunes altered to eliminate any rhythmic cues.)

Temporal Envelope vs. Fine Structure
Dominance of each code depends on the material:
1. Envelope is dominant for speech.
2. Fine structure is dominant for pitch.
Results support most current cochlear implant coding schemes, in which temporal fine structure is mostly not represented.

Importance of TFS
• TFS conveys pitch (and localization) information.
• Pitch is important for:
– Music
  • Melody and harmony
– Speech
  • Voicing cues
  • Prosody
  • Speaker identification
  • Lexical information in tone languages
– Sound segregation
  • Cocktail party problems
• Reintroducing TFS may help CIs in all areas.

But is TFS for pitch really coded temporally?
Yes for binaural processing.
Unsure for monaural aspects, such as pitch.

Temporal code? (Hilbert)
[Figure: from Rose et al. (1971); axis: Time]
Place code? (Fourier)
“Place-time” code? (Hybrid)
[Figure: from Ruggero et al. (1997)]

Why do we care?
• In many respects place and time models can be considered functionally and mathematically equivalent.
– Different transforms of the same thing.
• Representations in the auditory nerve are transformed many times on their way to cortex. How relevant is the cochlear/auditory-nerve representation anyway?

Implications
• Clinical:
Cochlear implant users:
– Poor pitch/music perception
– Poor speaker, prosody identification
– Poor source segregation abilities
Hearing-impaired listeners:
– Difficulty in noisy, complex backgrounds
– Broader auditory filters, leading to fewer resolved harmonics
– May be less able to make use of temporal fine structure (e.g., Lorenzi et al., 2006)
Pitch coding in the cochlea may play a role in all these areas.
• Technology:
– Major limiting factor in automatic speech recognition (ASR) is susceptibility to background interference.

Some background information: Harmonic complex tones
• Most musical sounds and voiced speech sounds are harmonic complex tones.
• Tone – periodic
• Complex – comprising more than one sinusoid (or pure tone).
• Harmonic – all frequencies are integer multiples of the fundamental frequency (or repetition rate).

Harmonic complex tones
[Figure: waveforms (amplitude vs. time, ms) and spectra (frequency, Hz) of the fundamental frequency (F0), second harmonic (2F0), third harmonic (3F0) and fourth harmonic (4F0), and of their sum (1+2+3+4).]

Pitch of complex tone: Pitch of the missing fundamental
• Harmonic tones produce a pitch at the fundamental frequency (F0), even if there is no energy at the F0 itself (pitch of the missing fundamental) (Seebeck, 1841; Schouten, 1940).
[Figure: waveforms (amplitude vs. time) and spectra (frequency, Hz; components between 200 and 1600 Hz) of two complex tones, each labeled "Pitch = 200 Hz".]
[Figure: panels labeled Resolved harmonics, Unresolved harmonics, Place, Time. From: Oxenham (2012, J Neurosci)]

Which is it: Time or Place?
• For pure tones (and most other sounds), temporal and place information co-vary, making dissociation difficult.
• Transposed stimuli (van de Par & Kohlrausch, 1997) are an attempt to overcome this.
AIMS:
• Transpose low-frequency temporal fine-structure information into the envelope of a high-frequency carrier.
• Dissociate place and time representations in the auditory nerve.
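
A minimal sketch of the transposition idea in the AIMS above, assuming NumPy is available. Following the usual description of transposed stimuli, a low-frequency sinusoid is half-wave rectified, smoothed, and multiplied with a high-frequency carrier. The function name transposed_tone, the default values (a 300-Hz tone on a 4-kHz carrier, 48-kHz sampling rate, 50-ms duration) and the brick-wall low-pass at 0.2 times the carrier frequency are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def transposed_tone(f_mod=300.0, f_carrier=4000.0, dur=0.05, fs=48000):
    """Return (t, x): a low-frequency sinusoid transposed onto a high-frequency carrier."""
    t = np.arange(int(dur * fs)) / fs
    # Modulator: half-wave rectified low-frequency sinusoid.
    modulator = np.maximum(np.sin(2 * np.pi * f_mod * t), 0.0)
    # Assumed smoothing step: zero all modulator energy above 0.2 * f_carrier
    # with a brick-wall FFT filter (filter type and cutoff are illustrative).
    spectrum = np.fft.rfft(modulator)
    freqs = np.fft.rfftfreq(len(modulator), d=1.0 / fs)
    spectrum[freqs > 0.2 * f_carrier] = 0.0
    modulator = np.fft.irfft(spectrum, n=len(modulator))
    # Transposed tone: modulator multiplied by the high-frequency carrier.
    carrier = np.sin(2 * np.pi * f_carrier * t)
    return t, modulator * carrier

t, x = transposed_tone()
```

With these defaults the energy of x sits in a band around the 4-kHz carrier and there is no component at 300 Hz, while the envelope still repeats at 300 Hz. That is the sense in which the stimulus separates place (high-frequency region) from timing (low-frequency periodicity).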
Transposed stimuli
[Figure: columns "Stimuli" and "Peripheral auditory representation"; panels show a Sinusoid, and a Modulator x Carrier = Transposed tone, each as amplitude vs. time (ms); the spectrum panel has a frequency axis labeled fm-fc, fm+fc, fm. (van de Par and Kohlrausch, 1997)]

Transposed tones: Simple pitch
• Unlike ITDs, temporal information for frequency cannot be used optimally by the auditory system if presented to the “wrong” place.
• Pitch perception is weaker for transposed tones.
• Place information may be important.
Examples: 300-Hz pure tone; 300-Hz tone, transposed to 4 kHz.

What about complex pitch?
Complex tone pitch perception (pitch of the missing fundamental)
[Figure: (1) components at 300, 400 and 500 Hz: Pitch = 100 Hz. (2) components at 4000, 6300 and 10080 Hz: Pitch = ? Axes: Time; Frequency (Hz).]

Temporal model predictions
Highest peak occurs at lag of 10 ms (F0 = 100 Hz) for both sinusoids and transposed tones.
(Model by Meddis and O’Mard, 1997)

Pitch matches by normal-hearing subjects
[Figure: number of matches vs. semitones for sinusoids and transposed tones, subjects S7, S8, S9; frequency difference (%) for subjects S6, S7, S8. Oxenham et al. (PNAS, 2004)]

Transposed tones: Conclusions
• Pitch of transposed pure tones is poor, and complex pitch with transposed tones is nonexistent.
• Suggests that fine structure must be presented to the correct place in the cochlea; timing is not sufficient.
Implications:
• Basic: Provides a stimulus class to search for the neural correlates of pitch.
• Practical: If fine structure is to be reintroduced into cochlear implants, place of stimulation may be important, i.e., as apical as possible for low frequencies.
Open questions:
• Can training improve performance (plasticity)?
• How close does the place/time correspondence need to be?

Is a temporal code necessary for pitch?
Human psychophysics
• Frequency discrimination becomes poorer at high frequencies (> 2 kHz).
• Phase-locking in mammals studied so far also degrades above 1-2 kHz.
[Figure: human psychophysics, Moore (JASA 1973); auditory-nerve physiology in cat and guinea pig, Palmer & Russell (Hear Res 1986), Rose et al. (1971); annotation: upper limit of “musical pitch”.]

Timing information and melodic pitch
• In humans, pure-tone melodic pitch perception is poor for frequencies above 4-5 kHz.
• Current beliefs:
– Musical pitch relies on temporal coding
– Could also explain the upper limits of musical instruments (~4000 Hz)?
Example with scale starting on 440 Hz: major scale, minor scale.
Example with scale starting on 6600 Hz: major, minor. Which is which?

Breakdown of pure-tone pitch
• Temporal phase locking in the auditory nerve breaks down at high frequencies.
• In humans, pure-tone melodic pitch perception is poor when all frequencies are above 4-5 kHz.
• Perhaps musical pitch relies on temporally coded TFS?
What about harmonic complex tones, if the resolved harmonics are above 6 kHz, but the F0 is well below that?

What about complex tones?
[Figure: amplitude spectra (frequency, kHz) of example tones in background noise, annotated "Musical pitch?" with the answers Yes and No.]
Ritsma’s (1962) “existence region” suggests no F0 perception > 5 kHz, in line with predictions of phase locking.

Melody recognition
• Listeners are played two 4-note diatonic melodies, M1 and M2.
• In M2, either note 2 or 3 is changed (up or down) by one scale step (or not).
• Listeners decide: Were M1 and M2 the same or different?
[Figure: M1 and M2 shown as frequency vs. time.]

Conditions and predictions
1. Pure tones, low. M1: 500 Hz to 1 kHz; M2: 2 to 4 kHz. Prediction: Good
2. Pure tones, high. M1: 1.5 to 3 kHz; M2: 6 to 12 kHz. Prediction: Bad
3. Pure and complex tones. M1: 1 to 2 kHz pure tone; M2: F0 = 1 to 2 kHz, harmonics all above 7.5 kHz (30 dB/oct slope).
Tones presented at 55 dB SPL per component.
Background noise masks possible distortion (45 dB TEN).
Very good hearing required up to 16 kHz.
Prediction for condition 3: ???

Results
[Figure: mean and C.I. for 6 musically trained normal-hearing subjects. (Oxenham, Micheyl, Keebler, Loper, Santurette, PNAS, 2011)]

Envelope repetition pitch?
• Assumption that individual harmonics are being processed may be mistaken.
• Perhaps listeners are sensitive to the envelope repetition pitch via unresolved harmonics.
– 1-2 kHz, not 6-20 kHz.
• Test using two conditions:
– Randomly shifted harmonics
– Dichotic presentation

Shifted harmonics
• Each note has components spaced apart by the F0, but with a random (equal) shift in frequency.
[Figure: example components at 6, 7, 8, 9, 10 kHz, shifted to 6.2, 7.2, 8.2, 9.2, 10.2 kHz.]
• If envelope repetition is being used:
– Envelope remains identical, so performance should remain good.
• If individual harmonics are coded:
– Tones are no longer harmonic, and have an ambiguous pitch, so performance should be poor.
[Figure: results for shifted harmonics. (Oxenham, Micheyl, Keebler, Loper, Santurette, PNAS, 2011)]

Dichotic stimulation
• Present alternating harmonics to opposite ears.
[Figure: amplitude spectra (frequency, kHz); one ear receives components at 8, 10, 12, 14, 16 kHz, the other at 7, 9, 11, 13, 15 kHz.]
• If envelope cues are being used:
– Spacing is doubled, so performance should be worse.
• If individual frequencies are being coded:
– Pitch information can be integrated between the two ears, so performance should remain good.
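
The logic of the two control conditions can be illustrated with component frequencies alone. The sketch below assumes NumPy; the helper names component_freqs and split_dichotic and the 1-kHz F0 are hypothetical choices for illustration. It shows that a common frequency shift leaves the component spacing (and so the envelope repetition rate) unchanged while making the components inharmonic, and that sending alternate components to opposite ears doubles the spacing within each ear.

```python
import numpy as np

def component_freqs(f0, first, last, shift=0.0):
    """Components number first..last (in units of f0), all offset by the same shift in Hz."""
    return np.arange(first, last + 1) * f0 + shift

def split_dichotic(freqs):
    """Send alternating components to opposite ears."""
    return freqs[0::2], freqs[1::2]

f0 = 1000.0  # hypothetical F0, chosen only for illustration
harmonic = component_freqs(f0, 6, 10)               # 6, 7, 8, 9, 10 kHz
shifted = component_freqs(f0, 6, 10, shift=200.0)   # 6.2, 7.2, 8.2, 9.2, 10.2 kHz

# The spacing between neighbors is f0 in both cases...
spacing_harmonic = np.diff(harmonic)
spacing_shifted = np.diff(shifted)
# ...but only the unshifted components are integer multiples of f0.
harmonic_is_harmonic = bool(np.isclose(harmonic % f0, 0.0).all())
shifted_is_harmonic = bool(np.isclose(shifted % f0, 0.0).all())

# Dichotic presentation: within each ear the spacing becomes 2 * f0.
left, right = split_dichotic(component_freqs(f0, 7, 16))
```

An envelope-based account therefore predicts good performance with shifted components and worse performance with dichotic presentation, whereas coding of individual components predicts the opposite pattern.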
[Figure: percent correct (0 to 100) for diotic and dichotic presentation. (Oxenham, Micheyl, Keebler, Loper, Santurette, PNAS, 2011)]

Melodic pitch perception above 6 kHz
• Complex pitch perception is possible, even:
– When all harmonics are above 6 kHz
– In the absence of temporal envelope cues.
• Shown with:
– Melodies, pitch matching, F0 discrimination
• Either:
– Temporal information in the auditory nerve is not necessary for complex pitch perception (e.g., place code)
• Or:
– Limit of usable phase locking in humans is higher than 6 kHz (seems unlikely)

Summary
• Temporal fine structure is important for pitch in music, speech and auditory scene analysis.
• However, neural phase-locking to TFS seems neither sufficient (2004) nor necessary (2011) for complex pitch perception.
• Results suggest that coding schemes to improve temporal processing in CIs may not be sufficient to improve pitch in CI users.
• Instead, improving TFS perception for CIs may require improved spectral resolution, possibly recreating the spatio-temporal pattern of cochlear excitation.

Thanks to…
• Christophe Micheyl
• Josh Bernstein
• Hector Penagos
• Zach Smith
• Bertrand Delgutte
• Mike Keebler
Auditory Perception & Cognition Lab
NIH Grant R01 DC005216