Interaction of Excitatory and Inhibitory Frequency

Cerebral Cortex September 2005;15:1371--1383
doi:10.1093/cercor/bhi019
Advance Access publication December 22, 2004
Interaction of Excitatory and Inhibitory
Frequency-receptive Fields in Determining
Fundamental Frequency Sensitivity of
Primary Auditory Cortex Neurons in
Awake Cats
Ling Qin, Masashi Sakai, Sohei Chimoto and Yu Sato
Harmonic complex tones produce pitch-height perception corresponding to the fundamental frequency (F0). This study investigates
how the spectral cue of F0 is processed in neurons of the primary
auditory cortex (A1) with sustained-response properties. We found
F0-sensitive and -insensitive cells: the former discriminated between harmonics and noise, while the latter did not. F0-sensitive
cells preferred F0s corresponding to the best frequency (BF) and
0.5 3 BF. The F0-sensitivity to F0 5 0.5 3 BF was preserved for
missing F0, but abolished by eliminating both F0 and the second
harmonic. The inhibitory subfield of the frequency-receptive field
was restricted to the spectral region between the preferred harmonics in F0-sensitive cells, while it was frequency unspecific in
F0-insensitive cells. We conclude that (i) A1 is well organized for
discrimination between harmonics and noise; (ii) pitch-height is
represented along with the tonotopic axis; (iii) all aspects of the
sustained neural responses to harmonic and noise stimuli are consequences of spectral filtering; and (iv) although the observed cell
behavior explains some psychophysical pitch perception behaviors,
such as pitch-chroma (helical pitch perception with frequency
elevation), pitch-level tolerance and adaptive behavior, F0-encoding
in A1 remains at the incomplete perceptual level (dominance of the
third to fifth harmonics for pitch strength is unexplainable by the
cell behavior).
tude modulated noise (Shouten, 1962; Plomp, 1967; Houtsma
et al., 1980; Kaernbach and Demany, 1998; Renken et al., 2004).
Pitch-height perception with resolved spectral stimuli is more
salient than with temporal periodicity stimuli by amplitudemodulated wide-band noise (Burns and Viemeister, 1976; Fastl
and Stoll, 1979; Houtsma, 1984), suggesting a larger contribution of the spectral cue to pitch-height formation in comparison
with that of the temporal cue.
Pitch-height perception is an area of broad interest, spanning
not just psychophysics, but also music and language (Patel,
2003). The neural mechanisms of pitch-height perception are
also of interest to neurophysiologists because they represent
a complex perceptual construct resulting from how the brain
processes sensory information across disparate perceptual cues
(temporal and spectral cues). Lesion studies in cats (Whitfield,
1980) and humans (Zatorre, 1988) show that the auditory
cortex is crucial for virtual pitch perception of the complex
tone. Therefore, the question remains how virtual pitch is represented in the auditory cortex.
Using magnetoencephalography (MEG), several studies in
humans have examined the topographical representation of
virtual pitch of harmonic complex tone in relation to the wellknown tonotopic map of pure tone in the primary auditory
cortex (A1). Two controversial findings were reported: Pantev
et al. (1989, 1996) showed that the spectral pitch of pure tone
and the virtual pitch of the missing F0 had the same evoked
magnetic field location, suggesting that the tonotopic map of A1
reflects the perceived pitch of complex sounds rather than their
spectral content. In contrast, Langner et al. (1997) demonstrated the orthogonal arrangement of tonotopic and periodotopic (virtual pitch) gradients.
Fishman et al. (1998, 2000) have noted the technical
limitations of non-invasive MEG studies: they suggested that
the supposition that magnetic response is generated in A1 with
no contribution from adjacent auditory areas is problematic,
and consequently, the assumption of a single dipole generator
within the superior temporal gyrus may not be justified. Fishman
et al. (1998, 2000) have performed experiments in awake
monkeys with the use of auditory-evoked potential, multipleunit activity and current source--density techniques. Laminar
response profiles in A1 reflected the spectral content rather
than the virtual pitch of compound stimuli (Fishman et al.,
1998), and hierarchically organized cortical areas other than A1
were suggested to generate virtual pitch (Fishman et al., 2000).
With the use of amplitude-modulated tones composed of
three harmonic components, the virtual-pitch mechanism in A1
was investigated in the single unit and in optical recording
studies in gerbils by Schulze and Langner (1997) and Schulze
et al. (2002). They investigated the topographical distribution
Keywords: neural mechanism, pitch chroma, pitch height, spectral cue
Introduction
A pure tone produces the perception of a single pitch-height
that corresponds simply to the frequency of the tone (spectral
pitch). A harmonic complex tone, composed of integer multiples of the fundamental frequency (F0), also produces perception of a single pitch-height corresponding to the F0 rather than
multiple pitch-heights corresponding to the individual component frequencies. When the F0 is missing we still perceive
a pitch-height corresponding to the missing F0, indicating that
subjective pitch-height corresponds to a physical frequency
that is actually absent (virtual pitch) (Terhardt, 1974). The
biological importance of virtual pitch perception is suggested
by its presence in multiple animals, including birds (Cynx and
Shapiro, 1986), cats (Heffner and Whitfield, 1976) and monkeys
(Tomlinson and Schwarz, 1988). Psychophysical experiments
have demonstrated that two acoustic cues are involved in pitchheight perception: spectral and temporal cues. The former is
a spectral relationship between resolved lower-frequency
harmonics (Ritsma, 1967; Terhardt, 1979; Cohen et al., 1995;
Renken et al., 2004), while the latter is temporal periodicity of
the sound-wave ‘envelope’ (Bernstein and Oxenham, 2003;
Renken et al., 2004), which explains pitch-height perception
from the unresolved higher-frequency harmonics and ampliÓ Oxford University Press 2005; all rights reserved
Department of Physiology, Interdisciplinary Graduate School
of Medicine and Engineering, University of Yamanashi,
Tamaho, Yamanashi 409-3898, Japan
of the best modulation frequency of amplitude-modulated
tones, and found that periodic pitch derived from sound
envelope periodicity is independently organized from the
spectral organization of the tonotopic map in A1. Although
Schulze and Langner (1997) provided a compelling explanation
for how complex tones with similar spectral locations can have
different pitches depending on their F0, the differential
organization of periodicity and spectral sensitivity does not
explain the key feature of pitch perception — that complex
tones and pure tones with the same F0 have the same pitch, as
noted by Fishman et al. (2000).
Using a harmonic complex tone composed of eight harmonics,
Schwarz and Tomlinson (1990) also performed unitary experiments in alert monkeys, searching without success for A1 neurons that respond to the missing F0. This indicates that the neural
responses in A1 were characterized by the spectral location of
the harmonics relative to the neuron’s frequency receptive field
(FRF) but not to the missing F0. Their finding (Schwarz and
Tomlinson, 1990) in single-unit studies is in good accordance
with the findings in multi-unit studies by Fishman et al. (1998,
2000). However, the neural organization in A1 for decoding
pitch-height from the harmonic complex tone remains unresolved.
The purpose of this study was to address how the spectral cue
for pitch-height perception is processed in A1, with special
attention to functional relevance such as the detection of harmonics, the discrimination between the spectral pitch-height of
pure tone and the virtual pitch-height of harmonics, and
the identification of pitch-height. For this purpose, we adopted
harmonic complex tone stimuli with a wide range of spectral
cues but fewer dominant temporal cues. We consider that an
A1 cell is sensitive to the F0 of harmonics when the A1 cell
has narrow tuning to a specific F0 of harmonic complex tones,
excluding the sensitivity to co-varying parameters associated
with F0 shift, such as low-edge frequency (LEF), bandwidth, and
overall intensity. This study reports a group of F0-sensitive cells
in A1 that respond to the missing F0 of the harmonics. The
results are discussed in terms of a physiological substrate of the
perception of pitch-height of harmonic stimuli.
The recording session began the following day. The dura was pierced
with a sharpened probe, and a glass microelectrode (tip diameter,
1.8--2.5 lm: resistance, 2--3 MX; filled with 2 M NaCl) was advanced
into the A1 with a remote-controlled micromanipulator (Narishige,
MO-951). Tone bursts with variable single frequency (SF) and sound
pressure level (SPL) were presented as search stimuli. The extracellular
single-spike activity was discriminated using a window discriminator.
The spike-occurrence outputs from the window discriminator were
captured directly through a digital-to-digital interface (Cambridge
Electronic Design Limited 1401), using a Pentium-based data-input
computer with a time resolution of 2 ls as the digital input for later data
analysis. Data were stored on hard disk.
The cat’s face, particularly the eyes, was continuously observed on
a monitor connected to a charge-coupled-device camera in front of the
cat. In our preliminary experiments, we studied the relationship between
the status of the eyes and electroencephalography (EEG). Slow waves in
EEG were observed when the eyes were closed, and when the eyes were
open but drifting, which were judged as a sleep state. Saccadic eye
movements and eye fixation were judged as signs of an awake state. Rapid
eye movements of paradoxical sleep, in which slow EEG waves were
absent, were easily identified by their characteristic appearance of halfopened eyelids, and were judged as signs of sleep. When drowsiness was
suspected, the cat was alerted by gently tapping the body with a remotecontrolled tapping tool or by briefly opening the door.
The cats sometimes moved during recording sessions, producing
artifacts in the recording. By carefully checking the monitors and the
spike train, recordings with artifacts were marked in the recording
computer in real time while recordings were in progress. Data with
artifacts could therefore be rejected.
Daily recording sessions lasted 3--5 h for 2--6 months in each animal. At
the end of each daily recording session, the recording chamber was
rinsed with sterile saline and antibiotic fluid, and sealed with Exafine
(GC Corporation) and an aluminum cap. The animal was returned to its
cage. The animals remained healthy throughout the experimental
period. At the termination of the experiment, some of the recording
sites were marked with electrolytic lesions (25 lA, 10 s). The animal was
deeply anesthetized with sodium pentobarbital and perfused with 10%
formalin before the brain was removed. A digital camera was used to
photograph the brain surface. The cerebral cortex was cut in transverse
sections and stained with neutral red. The section was captured as
a digital picture using a digital scanner. Based on the lesion locations and
electrode tracks, the recording sites were reconstructed on the picture
of the section. The position of the section was projected onto the
picture of the brain surface.
Materials and Methods
Sound Generation and Delivery
Sound generation and delivery were as in a previous report (Qin et al.,
2004a). In brief, sound signals were generated using user-written
programs under a MATLAB (Mathworks) environment on a Pentiumbased computer. Amplitude spectra were constructed, and transformed
into time-domain signals by inverse Discrete Fourier Transform. Pseudorandom phase spectra were used to make a flat temporal envelope. The
signals were fed into a 12-bit digital-to-analogue converter (National
Instruments PCI-MIO-16E-4) at a sampling interval of 100 kHz to an
eight-pole Chebyshev filter (NF Electric Instruments, P-86) with a high
cut-off frequency of 20 kHz. The output was attenuated and sent to
a low-output-impedance power amplifier (Denon, PMA2000III), and
tones were then presented from a speaker (AKG, K1000) placed 2 cm
away from the auricle contralateral to the recording site. We equalized
and calibrated the sound delivery system frequently between 128 and
16,000 Hz at 8 Hz step, and the output varied by ±1.5 dB. Harmonic
distortion was less than –60 dB. Stimuli were 500 ms in duration with
a rise/fall time of 5 ms, and were presented at an interstimulus interval of
1640 ms. The intensity of each frequency component was set to 20--80
dB SPL in 10 dB steps.
Animal Preparation, Recording and Histology
Experiments were performed in a manner consistent with the Guidelines for Animal Experiments, University of Yamanashi, and the Guiding
Principles for the Care and Use of Animals approved by the Council of
the Physiological Society of Japan. Animal preparation, recording and
histology procedures were as in previous reports (Chimoto et al., 2002;
Qin et al., 2004a). Five cats were chronically prepared for single-cell
recordings from the auditory cortex. Under pentobarbital sodium
anesthesia (initial dose 40 mg/kg) and aseptic conditions, cats had an
aluminum cylinder (inner diameter 12 mm) implanted in the bilateral
temporal bone for microelectrode access, at an angle of 10--20° from the
sagittal plane. A metal block was embedded in the dental acrylic cap to
immobilize the head. After more than 1 week of postoperative recovery,
the cat’s body was gently wrapped in a cloth bag. The head was
restrained with holding bars for a short period. In successive daily
sessions, the period was lengthened, and the cats were familiarized with
sitting in an electrically shielded, sound-attenuated chamber. The
animals were given food and drink during the sessions and, after each
session, they were returned to their home cages. The conditioning
procedure lasted at least 2 weeks. When recording experiments began,
they sat with no sign of discomfort or restlessness. One day before
beginning the recording session, the bone (diameter 1--2 mm) at the
bottom of the cylinder was removed, leaving the dura intact under
ketamine anesthesia (initial dose 15 mg/kg).
1372 Cortical Processing of Pitch Information Qin et al.
d
Pure and Two-tone Stimuli and FRF Analysis
The procedures for evaluating FRF are as reported previously (Qin et al.,
2004a). In brief, once a single neuron was found, we presented puretone stimuli with variable SF (125 steps) at a given SPL (Fig. 1A). Spike
activities were analyzed using user-written programs under a MATLAB
Cell No.: #84001; 50dB SPL
SF
80
SF (kHz)
2.0
60
1.5
0.5 BF
BF
40
1.0
20
0.5
0
0.5
0.9
1.3
1.7
2.1
0.5
1.0
1.5
2.0
SF (kHz)
F0
80
Driven rate (sp/s)
2.0
1.5
1.0
0.5
1.2
2.4
3.6
4.8
60
40
20
0
16
0.5
1.0
1.5
2.0
0.5
1.0
1.5
2.0
S2F
BF
80
S2F (kHz)
2.0
60
1.5
40
1.0
20
0.5
0
0.5
0.9
1.3
1.7
2.1
Frequency (kHz)
0
0.5
Time (s)
Figure 1. Temporal and spectral-response properties of a representative F0-sensitive cell in response to 3 different stimulus paradigms. (A--C) Stimulus paradigms. For pure-tone
(A), harmonic complex tone (B) and two-tone (C) stimuli, SF, F0 and S2F, respectively, were systematically shifted by 125 steps. The spectral profiles of 5 out of the 125 stimulus
frequencies are shown. (D--F) Raster diagrams were constructed by plotting spike trains for stimulus parameters shifted systematically: SF for pure-tone stimuli (D), F0 for harmonic
complex tone stimuli (E) and S2F for two-tone stimuli (F). Thick horizontal bar shows the stimulus period. (G--I) Response functions were constructed by plotting the driven rate
against SF (G), F0 (H) and S2F (I), respectively. Vertical dotted lines in G, H and I show the frequencies corresponding to BF and 0.5 3 BF (one octave below BF). Horizontal
interrupted lines in G and H show 2 SD of the background discharge level. The thin solid horizontal line in H shows the half height of the mean (10 repetitions) driven rate to BF tone
alone. Thick solid and dotted horizontal lines in I show the mean and 2 SD of the BF driven rate, respectively. Note that (i) the cell shows sustained firing during stimuli (black bars
in D, E and F) of 0.5 s; (ii) there are two F0-tuning peaks (H) higher than the half BF driven rate (F0 5 0.5 and 1.0 3 BF); and (iii) the S2F-response function (I) shows two inhibitory
subfields restricted to two spectral regions (0.5--1.0 and 1.0--1.5 3 BF) between harmonics whose F0 is 0.5 3 BF.
(Mathworks) environment. Driven rates (the firing rate during a stimulus
period of 0.5 s minus the background firing rate during a pre-stimulus
period of 0.5 s) were calculated for each SF to construct the isointensity
SF-response function (Fig. 1G), which was filtered by applying the
weighted averaging with four neighbors. The weight of the four
neighbors and the center frequency was in the ratio of 1:2:3:2:1,
respectively. By comparing the maximum heights of SF-response
functions obtained at different stimulus intensities (20--80 dB SPL in
10 dB steps), we defined the ‘best SPL’ as the sound intensity producing
the maximum driven rate and the best frequency (BF) as the SF
producing the maximum driven rate. We defined the excitatory subfield
of FRF at the best SPL as the portion where the SF-response function was
positive. We considered the excitatory subfield as significant if the SFresponse height was higher than 2 SD (interrupted line in Fig. 1G) of the
background firing rates. The excitatory magnitude of FRF was defined as
the height of the SF-response function above the baseline.
For two-tone stimuli, two frequency components were presented
simultaneously at the best SPL, the first-tone frequency was fixed at the
cell’s BF, and the second-tone frequency (S2F) varied systematically at
125 steps (Fig. 1C). The driven rate was calculated for each S2F to
construct the S2F-response function (Fig. 1I). We defined the inhibitory
subfield of FRF as the portion where the S2F-response function was
lower than the mean (10 repetitions) driven rate to BF only (solid line in
Fig. 1I ). We considered the inhibitory subfield significant if the S2Fresponse function was lower than the mean –2 SD (dotted horizontal
line in Fig. 1I ) to BF only. The inhibitory magnitude of FRF was defined
as the deviation of the S2F-reponse function from the mean driven rate
to BF tone alone, when the S2F-response function was lower than the
mean driven rate to BF.
To evaluate the summation strength of excitatory and inhibitory
subfields of FRF, respectively, corresponding to the harmonics of a
given F0, we calculated the summation of excitatory and inhibitory magnitudes of FRF. The excitatory summation (ES) was estimated by
sampling the excitatory magnitudes of FRF in a fixed frequency step,
then summing the sampled magnitudes. The inhibitory summation (IS)
was estimated by applying the same procedure to the inhibitory
magnitudes. For each cell, ES and IS were calculated in nine different
sampling steps (0.25 3 BF, 0.5 3 BF, 0.75 3 BF,. . ., 2.25 3 BF). In each
sampling step, IS was subtracted from ES to give the net summation (NS),
which was used to reflect the difference between ES and IS. A positive
NS indicated that ES was dominant, while a negative NS indicated
a dominant IS. For each cell, we plotted ES, IS and NS against the
sampling frequency to predict the change of excitation and inhibition
when FRF was sampled in different sampling steps.
To evaluate the excitation--inhibition balance of FRF quantitatively,
we defined the balance index (BI)
Cerebral Cortex September 2005, V 15 N 9 1373
BI = ðES – ISÞ=ðES+ISÞ
A
F0
where the IS and ES used were obtained in the smallest sampling step
(0.25 3 BF) to measure the total strength of excitation and inhibition
in FRF and compare it across cells. A positive value of BI indicated
the dominance of excitation (BI = 1, excitation only); zero indicated the
balance of excitation and inhibition; while a negative value indicated the
dominance of inhibition (BI = –1, inhibition only).
Harmonic Complex Tone and Control Stimuli
Figure 2A shows the spectral profile of a harmonic complex tone,
composed of integer multiples of F0 in the spectral range of 128--16000
Hz with equal spectral amplitude at the best SPL. The amplitude
spectrum of harmonic complex tone stimuli was transformed into
a time-domain signal of 500 ms in duration by inverse Discrete Fourier
Transform using a pseudo-random phase spectrum, resulting in less
dominant temporal periodicity of the sound-wave envelope corresponding to the period of F0 (Fig. 2B,C). Thus, in this study, harmonic complex
tone stimuli have a wide range (128--16000 Hz) of spectral cues but less
dominant temporal cues.
In the harmonic complex tone paradigm, the stimuli of 125 different
F0 were presented in random order. The spectral profiles of five of the
125 harmonic complex tones are presented in Figure 1B. F0 was shifted
systematically in the spectral ranges of 128--1120 Hz (step, 8 Hz), 128-2112 Hz (16 Hz) or 128-4096 Hz (32 Hz). One of the three F0 ranges was
adopted for a given cell to include the frequencies at least one octave
above and below the cell BF. For example, in cells with a BF 1.0 kHz,
a range of F0 of 128--2112 Hz was adopted. Thus, in this study, the
number of harmonics varied systematically. For example, when the F0
was 128 Hz, the number of the harmonics was 125 (component
frequencies of 128, 256, 384, 512,. . ., 16 000 Hz). When the F0 was
4096 Hz, the number of harmonics was three (component frequencies
of 4096, 8192 and 12 588 Hz).
A raster diagram of spike trains in response to 125 different F0s was
constructed (Fig. 1E). On the ordinate of the raster plot (Fig. 1E), each
harmonic tone contained F0 specified by a number (e.g. 1.0 kHz for
a harmonic tone of F0 = 1.0 kHz) and, by definition, its multiple integers
not indicated on the ordinate. One line of the raster plot (Fig. 1E)
showed the response timing of cell spikes in response to one harmonic
complex tone at a given F0. The driven rates were plotted against F0,
constructing the F0-response function (Fig. 1H).
To generate high-pass noise stimuli, we constructed amplitude spectra
composed of a large number of frequencies (range, 128--16 000 Hz;
interval between components, 16 or 32 Hz, much narrower than the
critical bandwidth for the lowest component frequency of 128 Hz) with
equal spectral amplitude at the best SPL. LEF varied systematically
(Fig. 5F), constructing the LEF-response function (Fig. 5H).
The missing F0 stimulus was constructed by systematically eliminating the lower harmonics, while preserving the higher harmonics (Fig.
7A). The response height was analyzed in relation to the lowest
harmonic number of the stimulus (Fig. 7B,C).
Finally, we constructed two-component harmonic stimuli composed
of F0 and the second harmonic (Fig. 7D). Similar to the full-component
harmonics, we shifted the F0 of the two-component harmonics
systematically (Fig. 7D), constructing F0-response functions (Fig. 7F ).
Results
The results reported here are based on 102 single cells recorded
from five cats. The recording site was determined with reference to the electric lesion and the electrode track in the brain
section, and was projected onto a picture of the brain surface.
Histological reconstruction of the recording sites showed that
the cells were sampled from the caudal part of the middle
ectosylvian gyrus, the banks of the dorsal tip of the posterior
ectosylvian sulcus and a small portion of the adjacent posterior
ectosylvian gyrus, i.e. the caudal part of A1 (Reale and Imig,
1980), in which the cells had a BF of <3.5 kHz. We used two
criteria for distinguishing cells recorded in the A1 and the
posterior auditory field (PAF): cell location and BF. The PAF is
1374 Cortical Processing of Pitch Information Qin et al.
d
0
1.0
2.0
16
Frequency (kHz)
B
0
10
20
30
40
50
Time (ms)
C
20
30
Time (ms)
Figure 2. Spectral and temporal profiles of a harmonic complex tone stimulus. (A)
Amplitude spectrum of a harmonic stimulus with 256 Hz F0 containing 62 component
frequencies in the spectral range of 128--16000 Hz. For clarification of the harmonic
interval, only the 10 lowest and highest frequencies are shown. (B, C) The time signal
of harmonic stimulus: onset 50 ms of total 500 ms in duration with a rise time of 5 ms
(B) and time-expanded version (C). Arrows show a time interval of 3.9 ms
corresponding to one period of 256 Hz. Note that the amplitude modulation in the
stimulus temporal envelope is less dominant.
located ventrally to the A1, and BF increases with the depth of
the cell location (Reale and Imig, 1980).
This study analyzed cells with sustained response properties
so that a significant (P < 0.05) increase of the response spike
rate above the background level was maintained throughout the
stimulus period (0.5 s) of pure tone stimuli. These cells are tonic
and phasic--tonic cells in the previous classification (Chimoto
et al., 2002). A1 in awake cats also has phasic cells characterized
by onset/offset responses (Chimoto et al., 2002). We analyzed
only the sustained-response cells because our main interest
was in spectral- rather than temporal-cue coding. Sustained
response cells have a simple temporal firing pattern throughout
the stimulus period (Qin et al., 2003; Qin and Sato, 2004),
simplifying the spectral-cue analysis procedures. The sustainedfiring property, first observed in pure-tone stimuli, was preserved in the other stimulus paradigms as long as vigorous
responses were present (Fig. 1D--F).
During the recording of A1 cells, pure-tone stimuli were
presented with variable SF and intensity, constructing SFresponse functions (Fig. 1G). We estimated BF (dotted vertical
line labeled BF in Fig. 1G) and the best SPL of the cell (see
Materials and Methods for definition). We then presented
a harmonic complex tone at the best SPL to construct the F0response functions (Fig. 1H). All 102 recorded cells were tested
with the pure-tone and harmonic complex tone paradigms. At
least two parameters may characterize the F0-response functions: the maximum response height and the response envelope.
The former is the cell responsiveness parameter to the stimuli,
while the latter is the parameter of tuning properties to the F0.
Based on both parameters at the threshold of the half-driven rate
for BF (thin solid line in Fig. 1H), we identified two types of cells:
F0-sensitive (48 cells) and F0-insensitive cells (54 cells).
F0-sensitive Cells
F0-sensitive cells were defined by their responsiveness above
the threshold and non-monotonic tuning properties to the F0
(the response height at the lowest F0 tested was less than 75%
of the maximum height). Figure 1 shows the responses of
a representative F0-sensitive cell. The F0-response function of
the cell (Fig. 1H) was characterized by two F0 tuning peaks
more than the half-driven rate for BF (thin solid line in Fig. 1H),
separated by non-responsive regions <2 SD of the background
discharge level (interrupted line in Fig. 1H). To investigate the
relationship between the cell BF (Fig. 1G) and the peak
frequencies of the F0-response function (Fig. 1H), SF and F0
were normalized for BF as 1, constructing normalized SF- and
F0-response functions (Fig. 3B,G). Interestingly, two peaks
appeared at regions one octave apart: one F0 peak was on the
cell BF (F01st), while the other was at a frequency one octave
below (F02nd) in this particular cell. F0-response functions of
other example cells are also shown in Figure 3F,H. Generally,
the F01st peak was present in all F0-sensitive cells, while the
amplitude of the F02nd-peak varied from cell to cell, resulting in
a single F0 tuning peak (single-F0-tuning cell, 18 cells, Fig. 3F)
and double F0 tuning peaks (double-F0-tuning cells, 22 cells,
Fig. 3G). Some cells had a third F0-tuning peak (F03rd) in
addition to the F01st and F02nd peaks (multi-F0-tuning cells,
eight cells, Fig. 3H) in which F03rd tended to be around 0.25 or
0.33 3 BF. Figure 4 summarizes the above F0-peak characteristics by showing the distribution of peak frequencies in F0response functions of single- (Fig. 4A), double- (Fig. 4B) and
multi-F0-tuning cells (Fig. 4C).
F0-insensitive Cells
F0-insensitive cells have two subtypes: an example cell of the
first subtype, an energy-integrator cell (see Discussion for an
explanation of the name), is shown in Figure 3I. The cell was
broadly tuned to F0, and the response height tended to increase
with a decrease in F0, showing the maximum height of the F0response function to be at low F0. Thus, energy-integrator cells
were defined quantitatively to meet the criterion that the
response height at the lowest F0 tested (128 Hz) was >75% of
the maximum height. We identified 27 cells that meet this
criterion. It is not likely that the broadly tuned F0 responses of
the energy-integrator cells contribute to the identification of
pitch-height.
An example cell of the second subtype, the non-responsive
cell, is shown in Figure 3J. It showed no responsiveness to F0
(the response height was less than the threshold of the halfdriven rates for BF). We found 27 cells that did not respond to
harmonic complex tone stimuli.
Control Experiments Ruling Out Co-varying Parameters
In this study, harmonic complex tone is composed of F0 and
a number of integer multiples of F0 in the spectral range, at most
128--16 000 Hz. This harmonic complex tone can also be
regarded, from the point of view of energy distribution, as
a high-pass tone whose LEF corresponds to F0. Thus, the upward shift of F0 in this study results in increased LEF, decreased
spectral bandwidth, and a decreased number of harmonics
equivalent to a decrease in overall intensity. Thus, there are
three co-varying parameters associated with F0 shift of the
harmonic complex tones in this study: LEF, bandwidth and
overall intensity. It is important to rule out the possibility that
the F0-response function reflects the sensitivity to those covarying parameters. To rule out the LEF and bandwidth sensitivity, control experiments were designed by constructing
high-pass noise stimuli with similar LEF and bandwidth as the
harmonic stimuli. LEF varied systematically at the same step as
F0 in harmonic stimuli (compare Fig. 5A,F), constructing the
LEF-response function (Fig. 5H).
An example F0-sensitive cell responded to harmonic stimuli
when the F0 was the cell BF and half BF (Fig. 5B,C). Nevertheless, noise stimuli with similar LEF and bandwidth did not evoke
responses of the F0-sensitive cell (Fig. 5G,H), ruling out the
possibility that the F0-response functions in F0-sensitive cells
show tuning properties to the co-varying parameters of LEF
and bandwidth. Similar results were found in all 29 tested F0sensitive cells.
The overall intensity is the third co-varying parameter with F0
shift. In this study, the F0-response functions of F0-sensitive
cells oscillated between the maximum and background discharge level when F0 shifted from F01st to F02nd. This F0 shift
indicates double the number of harmonics, resulting in an
increase of 3 dB in overall intensity. It is not likely that the
activity of A1 cells changed drastically from the maximum to
zero with a shift of overall intensity of just 3 dB.
Although the response property of F0-insensitive cells to
high-pass noise stimuli was not our main interest, it is worthy of
mention. The energy-integrator cells (tested in 16 cells)
responded to noise stimuli (Fig. 5I) as well as harmonic complex
tone stimuli (Fig. 5D). The response envelopes were similar to
each other: both the F0- and LEF-response functions showed
broadly tuned responses and the response height tended to
increase with a decrease in F0/LEF, showing a function peak at
low F0/LEF. Thus, it is impossible for energy-integrator cells to
discriminate between harmonics and noise with similar bandwidth and energy distribution, that is, energy-integrator cells
are insensitive to harmonic structure.
Non-responsive cells (tested in 18 cells) responded to neither
the harmonic complex tone (Fig. 5E) nor high-pass noise
(Fig. 5J), that is, the non-responsive cells were insensitive to
both the harmonic structure and high-pass noise.
Cerebral Cortex September 2005, V 15 N 9 1375
Figure 3. F0-sensitive versus F0-insensitive cells. (A--O) Response functions of three F0-sensitive (left three columns) and two F0-insensitive cells (right two columns) for SF (A--E),
F0 (F--J) and S2F (K--O). The driven rate was normalized for the BF-response height as 1. SF, F0 and S2F were normalized for the cell BF as 1. For horizontal and vertical lines, see
Figure 1 legend. Spectral profiles of harmonic complex tones are shown at the bottom of F--J when the F0 5 1, 0.5 and 0.25 3 BF. (P--T) ES-F0 (solid line) and IS-F0 (dotted line)
functions of three F0-sensitive (P--R) and two F0-insensitive cells (S--T). Note that (i) F0-sensitive cells have narrow F0-tuning peaks (F--H) and alternating ES-F0 versus IS-F0
patterns between preferred and non-preferred F0s (P--R), while (ii) F0-insensitive cells show a wide F0-tuning (I) and ES dominant pattern at any F0s (S) or no F0-tuning (J) and IS
dominant pattern at any F0s (T).
Excitatory and Inhibitory Summation Patterns
Underlying F0 Sensitivity
This study has so far demonstrated the existence of a neuronal
population in A1 that shows vigorous responses to harmonic
tone only when the F0 of the harmonic tone is equal to the cell
BF and half of BF, and these harmonic tone-sensitive cells show
a suppression of such responses by the addition of nonharmonic frequency components, as exemplified by broadband
noises. This failure raises the possibility that broadband noise
itself has an inhibitory effect on the cells, such as non-harmonic
frequency components of the noise falling on the inhibitory
subfield of cell FRF.
To investigate the possible spectral excitation-inhibition
mechanism involved in harmonic-component sensitivity and
non-harmonic-component suppression, the excitatory and
inhibitory subfields of the cell FRF were investigated using
two-tone stimuli in addition to pure-tone stimuli. We presented
1376 Cortical Processing of Pitch Information Qin et al.
d
two-tone stimuli to 68 of the 102 cells, constructing S2Fresponse functions. We noted that the S2F-response function of
F0-sensitive cells showed inhibitory subfields restricted to two
spectral regions (Fig. 1I, 0.5--1.0 and 1.0--1.5 3 BF) between the
harmonics (0.5, 1.0 and 1.5 3 BF) of preferred F02nd (Fig. 1H).
This finding suggests that (i) preferred harmonics with F0 = 0. 5
and 1.0 3 BF have energy on the excitatory subfields but not on
the inhibitory subfields, activated by the spike generation, while
(ii) non-preferred harmonics, such as harmonics with F0 = 0.25,
0.75 and 1.25 3 BF must fall in the inhibitory subfields (0.5--1.0
and 1.0--1.5 3 BF), suppressed by spike generation. It is likely
that this characteristic excitation-inhibition pattern generates
cell-specific F0 sensitivity.
To investigate the above qualitative observation quantitatively, we analyzed the excitatory and inhibitory summation (ES
and IS) of FRF magnitude, respectively (see Materials and
Methods for details). ES and IS were calculated in different
Single-F0-tunig cells
15
A
18 cells
18 peaks
10
5
0
0
0.5
1
1.5
2
2.5
Double-F0-tuning cells
20
B
22 cells
Number of peaks
44 peaks
10
0
0
0.5
1
1.5
2
2.5
Multi-F0-tuning cells
10
C
8 cells
24 peaks
5
0
0
0.5
1
1.5
2
2.5
Normalized F0
Figure 4. Distribution of peak frequencies in F0-response functions in F0-sensitive
cells. Note that single- (A) double- (B) and multi-F0-tuning cells (C) tend to have peak
F0s around cell BF (normalized F0 5 1) and/or one octave below (0.5).
sampling frequencies corresponding to different F0s of harmonic tones. We then plotted ES and IS against F0, constructing
the ES-F0 function (Fig. 3P--T, solid lines) and IS-F0 function (Fig.
3P--T, dotted lines), respectively.
F0-sensitive cells had a characteristic pattern of ES versus IS
alternating between preferred and non-preferred F0s. For
example, in single-F0-tuning cells (Fig. 3F), ES was dominant at
only the preferred F0 (F0 = 1 3 BF) surrounded by IS dominant
regions at non-preferred F0s (Fig. 3P). In double-F0-tuning cells
Figure 5. Spectral similarity between harmonic and control noise stimuli and
response properties of F0-sensitive and F0-insensitive cells. (A--E) Spectral profiles
of 5 out of 125 trials of harmonic complex tone (A), raster display of F0-sensitive cell
during 125 harmonic complex tone stimuli (B), and F0-response functions of F0sensitive (C) and F0-insensitive cells (D and E). (F--J) Spectral profiles of 5 out of 125
trials of high-pass noise (F), raster display of F0-sensitive cell during 125 noise stimuli
(G), and LEF-response functions of F0-sensitive (H) and F0-insensitive cells (I and J).
The driven rate was normalized for the BF-response height as 1. F0 and LEF were
normalized for the cell BF as 1. For horizontal and vertical lines, see Figure 1 legends.
Note that (i) F0-sensitive cells show narrow tuning to F0 of the harmonics (C) but not
to LEF of the noise (H) in spite of similar cut-off frequencies and bandwidths (compare
A with F), and (ii) F0-insensitive cells show similar responses to both stimuli (compare
D with I; E with J).
Cerebral Cortex September 2005, V 15 N 9 1377
(Fig. 3G), ES was dominant at two preferred F0s (F0 = 0.5 and 1 3
BF), alternating with IS dominant regions at non-preferred F0s
(F0 = 0.25, 0.75 and 1.25 3 BF) (Fig. 3Q). Finally, in multi-F0tuning cells, similar alternating ES versus IS patterns between
preferred and non-preferred F0s were identified (Fig. 3R).
The population data of ES-F0 and IS-F0 functions for 33 F0sensitive cells are shown in Fig. 6A,D, respectively. To clarify the
difference between ES-F0 and IS-F0 functions in each cell, the
IS-F0 function was subtracted from the ES-F0 function, constructing the NS-F0 function (Fig. 6G). A positive NS indicated
that ES was dominant, while a negative NS indicated a dominant
IS. The NS-F0 function was thus constructed (Fig. 6G). NS was
positive at F0 = BF in all F0-sensitive cells and at F0 = 0.5 3 BF in
most F0-sensitive cells, and it was negative in other regions (F0 =
0.25, 0.75, 1.25, 1.5, 1.75, 2.0 and 2.25 3 BF).
In contrast, the energy-integrator cells had dominant ES over
IS at any F0, as shown in an individual example (Fig. 3S) and
population data (Fig. 6B,E,H), while the non-responsive cells
had dominant IS at any F0s as shown in the individual example
(Fig. 3T) and population data (Fig. 6C,F,I). Collectively, the
findings suggest that characteristic excitatory and inhibitory
summation patterns alternating between preferred and nonpreferred harmonics underlie F0 sensitivity.
Figure 6 also suggests that there is some continuity between
the cell classes. The excitatory curves are similar, with relatively
large responses at low F0 (when many harmonics lie within the
excitatory tuning curve of the neuron) and when F0 = BF. This is
particularly clear when comparing Figure 6A with Figure 6C.
Therefore, the difference between these classes is mainly due to
the strength of the inhibition, which is weak in energy-integrator
cells (Fig. 6E), medium in F0-sensitive cells (uncovering the
excitatory responses for F0 = BF and F0 = BF/2, Fig. 6D) and
strong in non-responsive cells (Fig. 6F). Furthermore, it seems
that the main difference between energy-integrator cells and
other cells is the width of their excitatory tuning curve, causing
low-frequency peaks in Figure 6B to be more prominent. In
contrast, F0-sensitive cells seem to have an inhibitory bandwidth
of less than about one octave, so the second harmonic at F0 = 0.5
3 BF can excite them. Thus, it seems that there is a continuum of
response properties between three cell classes.
Characteristics of Excitatory and Inhibitory Subfields
of FRF in Different-type Cells
In the previous section, we showed the difference between
excitatory and inhibitory summation patterns among different
cell types. Related to this, we are interested to see the
relationship of F0 sensitivity to these single- and two-tone
properties of cells, namely their excitatory and inhibitory halfheight bandwidths and a measure of the strength of two-tone
inhibition. The characteristics of FRF may contribute to the
cell’s excitatory and inhibitory summation patterns. For example,
the FRF of an F0-sensitive cell has a narrow excitatory
peak (Fig. 3A--C) and several narrow inhibitory troughs (Fig.
3K--M). Therefore, a slight difference in F0 could result in
a drastic change in summation (Fig. 3P--R). The FRF of an
energy-integrator cell has a broad excitatory peak (Fig. 3D)
Figure 6. Excitatory and inhibitory summation patterns. ES-F0 (A--C), IS-F0 (D--F) and NS-F0 (G--I) functions were constructed for 33 F0-sensitive cells (A, D and G), 14 energyintegrator cells (B, E and H) and 21 non-responsive cells (C, F and I). The function height was normalized for the BF-response height as 1. F0 was normalized for the cell BF as 1.
Note that all F0-sensitive cells have positive NS at F0 5 BF and most F0-sensitive cells have positive NS at F0 5 0.5 3 BF, while F0-insensitive cells have positive (H) or negative (I)
NS at all F0s.
1378 Cortical Processing of Pitch Information Qin et al.
d
and no apparent inhibitory trough (Fig. 3N). Accordingly, decreased F0 (increasing the density of summation components)
could cause a monotonic increase of excitatory summation,
while maintaining inhibitory summation at a low level (Fig. 3S).
On the other hand, a non-responsive cell has a deep and broad
inhibitory trough on FRF (Fig. 3O), meaning that the inhibitory
summation is always higher than excitatory summation (Fig. 3T).
To examine such a qualitative observation quantitatively, we
measured the bandwidths (BWs) of the excitatory peak and
inhibitory trough of FRF. The BW of excitatory peak was
measured at the half-height of each cell’s SF-response function.
The majority of our units (89/102) had a single peak in SFresponse function. Only seven F0-sensitive cells, four energyintegrator cells and three non-responsive cells had double or
three separate peaks higher than half the maximum amplitude in
the SF-response function. Thus, we evaluated the BW of the main
excitatory peak. The BW of the inhibitory trough was measured
at the S2F-response function of each tested cell. An inhibitory
trough was defined as a contiguous frequency space on the S2Fresponse function where the function was below the half-height
BF response level. As most of our cells, particularly F0-sensitive
cells, have multi-separated troughs on the S2F-response function,
we recorded the BWs of the three deepest inhibitory troughs for
each cell. The mean ± SD of measured BWs for each cell group are
shown in Table 1. As shown in representative cells, the BW of the
excitatory peak was broad in energy-integrator cells, medium
in non-responsive cells and narrow in F0-sensitive cells. The
difference in mean BWs between each group pair was statistically significant (P < 0.05). For the BW of inhibitory troughs, the
first deepest trough was significantly broad in non-responsive
cells, medium in F0-sensitive cells and narrow in energyintegrator cells. The second deepest trough of non-responsive
cells was also significantly broader than that of F0-sensitive and
energy-integrator cells. Note that of the 14 energy-integrator
cells tested with two-tone stimuli, only three cells had the
second trough on their S2F-response function and no cell had the
third trough. The BW of the third inhibitory trough was still
significantly broader in non-responsive cells than in F0-sensitive
cells. These quantitative results confirmed our qualitative
observation in the representative cells.
We further used an index, BI (see Materials and Methods for
the definition), to evaluated the balance of excitatory and
inhibitory strengths of FRF. When the excitatory and inhibitory
strengths of FRF are balanced, BI is zero. The mean ± SD of BI in
each cell group is shown in Table 1. BI in energy-integrator cells
was positive, indicating the dominance of excitation. BIs in F0sensitive and non-responsive cells tend to be negative, while the
latter deviated further from zero. The difference of mean BI
between each group pair was statistically significant (P < 0.05).
It appears that the inhibitory strength of FRF is strong in non-
responsive cells, medium in F0-sensitive cells and weak in
energy-integrator cells.
Analysis of the Suppression Mechanism by
Non-harmonic Frequency Components
Since the previous section demonstrated the bandwidth of the
excitatory (0.45 octaves) and inhibitory (0.44 octaves) subfields
of FRF in F0-sensitive cells, we can now analyze whether adding
non-harmonic components, such as 0.75, 1.25 and 1.75 3 F0 to
a given effective harmonic complex tone, will decrease the
response responsiveness of F0-sensitive neurons. This type of
interaction was demonstrated indirectly by our data in which
F0-sensitive cells could be driven by stimuli F0 = 0.5 3 BF but not
by stimuli F0 = 0.25 3 BF. Figure 3G (bottom) shows that adding
one non-harmonic component of 0.75 3 BF in the lower
inhibitory subfield and another non-harmonic component of
1.25 3 BF in the upper inhibitory subfield abolished cell responses to F0 = 0.5 3 BF. Similar effects were observed in other
cells (Fig. 3, the third column for F0 = 0.5 3 BF and the first
column for F0 = BF).
The findings suggest that the inhibition mechanism for the
addition of the non-harmonic component frequency is very
strong: adding only one non-harmonic component with the
same intensity as the harmonic component in each of two
inhibitory subfields completely abolishes the F0-sensitivity to
F0 = 0.5 3 BF and F0 = BF.
Spectral Components Essential for F0 Sensitivity
To elucidate the spectral components essential for F0 sensitivity, we performed the missing F0 paradigm, in which the virtual
F0 was fixed at the peaks of the F0 sensitivity curve while lower
harmonics, including physical F0, were removed; that is, lower
harmonics, including physical F0, were systematically eliminated, preserving the higher harmonics (Fig. 7A). Driven rates
were analyzed in relation to the lowest harmonic number of the
missing F0 stimuli (Fig. 7B,C). In an example cell (Fig. 7B), the
F01st responsiveness was abolished when the lowest harmonic
number was 2 (F0 was deleted; Fig. 7B, solid line) and F02nd
responsiveness was abolished when the lowest harmonic
number was 3 (both F0 and the second harmonic were deleted;
Fig. 7B, dotted line). Normalized mean activities in 36 F0sensitive cells are shown in Figure 7C. The response amplitudes
for the full-component harmonic stimuli decreased to less than
half, when the lowest harmonic number was 2 at F0 = F01st (Fig.
7C, solid line) and 3 at F0 = F02nd (Fig. 7C, dotted line). The
finding suggests that the F0 component is essential for evoking
F01st responses, and F0 and the second harmonic are essential
for evoking F02nd responses.
The finding also rules out the possibility that the difference
tone (f2--f1) equal to F0 is essential for F0 sensitivity, because
Table 1
Excitatory and inhibitory bandwidths and balance index of FRF in different types of cells
Cell type
F0-Sensitive
Energy-integrator
Non-responsive
BW of excitatory
peak (Octaves)
BW of inhibitory troughs (Octaves)
1st trough
2nd trough
3rd trough
BI
0.45 ± 0.24 (n 5 49)
1.29 ± 0.65* (26)
0.96 ± 1.02*y (27)
0.44 ± 0.28 (n 5 33)
0.06 ± 0.09* (14)
1.80 ± 1.31*y (21)
0.26 ± 0.17 (n 5 31)
0.08 ± 0.07 (3)
0.64 ± 0.53*y (15)
0.14 ± 0.08 (n 5 14)
— (0)
0.38 ± 0.17* (8)
0.17 ± 0.18 (n 5 33)
0.50 ± 0.27* (14)
0.49 ± 0.23*y (21)
Values are mean ± S. D. The value in the bracket shows the number of cells for each item. *: significantly different from F0-sensitive cells (ANOVA followed by Tukey test for pairwise comparisons, or
Student t-test; P\0.05). y: significantly different from energy-integrator cells (P\0.05).
Cerebral Cortex September 2005, V 15 N 9 1379
the responses to F0 changed with elimination of the lower
harmonics in spite of the constant difference tone. It is also not
likely that the combination tone (2f1--f2) equal to F0 is essential
for F0 sensitivity, because the combination tone for F0 and the
second harmonic is theoretically 0 Hz.
The finding that the response disappears after the removal of
one or two harmonics is not necessarily surprising — the
resulting stimulus has no energy within the excitatory region
of the neuron. This point reinforces the interpretation that
spectral integration is the major determinant of the responses to
these harmonic stimuli.
If F0 and the second harmonic components were essential
for F0 sensitivity as indicated by the missing F0 paradigm, the
harmonic stimuli with only the two lowest components (F0 and
the second harmonic) would generate an F0-response function
similar to that by full-component harmonic stimuli. The twocomponent harmonic paradigm (Fig. 7D) was performed in 5 of
the 36 cells tested by the missing F0 paradigm. As shown in
Figure 7E,F, the F0-response functions were invariant regardless
Figure 7. Spectral components essential for F0 sensitivity. (A) Spectral profiles of the
missing F0 paradigm. The bottom trace shows the profile of stimuli when the lowest
harmonic number is 1. The upper traces show the profiles of stimuli when the lowest
harmonic numbers are 2, 3, 4 and 5. (B, C) Driven rates plotted against the lowest
harmonic number in an example double-F0-tuning cell (B) and in population cells (C),
when F0s are F01st (solid line) and F02nd (dotted line). The response height at the
lowest harmonic number of 1 was used to normalize the mean driven rates in C.
Vertical bars in C show SD. (D) Spectral profiles of 5 out of 125 trials of twocomponent harmonic stimuli. (E, F) F0-response functions of an example cell for fullcomponent harmonics (E) and two-component harmonics (F). Note that (i) the
response was abolished by eliminating F0 when F0 was F01st and by eliminating F0 and
the second harmonics when F0 was F02nd (B, C), and (ii) the F0-response functions
were invariant whether the components were full or at only the lowest two (E, F).
1380 Cortical Processing of Pitch Information Qin et al.
d
of the number of the component frequencies (full in Fig. 7E or
two in Fig. 7F). The finding suggests that the two lowest
frequencies (F0 and the second harmonic) are sufficient for
evoking F0-sensitive responses in A1 cells. The finding also rules
out the possibility that the F0-response function shows sensitivity to the co-varying parameter of overall SPL, because the
amplitude of the F0-response function changed with F0 in spite
of the constant overall SPL.
Effects of Sound Level on F0 Sensitivity
We examined the dependence of F0-tuning characteristics on
sound level (range, 20--70 dB SPL) in 16 F0-sensitive cells. In all
the cells tested, the response characteristics were relatively
invariant across sound levels. Figure 8 shows example responses
of a representative cell. While changing the sound level resulted
in changes of driven rates, peak frequencies in SF-response
functions remained largely unchanged (Fig. 8A), and F0-tuning
envelopes with peaks corresponding to BF and one octave
below, which are specific for F0-sensitive cells at the best SPL,
also remained largely unchanged (Fig. 8B) across a wide range
of sound levels tested. The finding shows that although
changing the sound level may change the peak driven rate, it
Figure 8. Effects of sound level on F0 sensitivity. SF- (A) and F0-response functions
(B) at six different intensities of 20, 30, 40, 50 and 60 dB SPL were plotted together to
illustrate the similarity between the response functions of different sound levels. For
horizontal and vertical lines, see Figure 1 legend.
did not result in significant change of F0-sensitivity characteristics of F0-sensitive cells.
Discussion
In this study, harmonic complex tone stimuli with less prominent temporal periodicity (Fig. 2) were presented during the
recording of low-BF single cells in the caudal part of A1,
investigating neural processing of the spectral cue of pitchheight. We found F0-sensitive and F0-insensitive cells based on
the F0-response functions (Fig. 3F--J). We further investigated
the underlying neural mechanism with the use of a two-tone
paradigm (Fig. 3K--O). Each cell type had a distinct spectralinhibition pattern (Fig. 3P--T), suggesting that F0 sensitivity
correlates with the spectral inhibition pattern.
F0-sensitive Cells
F0-sensitive cells were sensitive to the harmonics with given F0s
but not to noise with similar bandwidth and energy distribution
(Fig. 5C,H). They usually preferred two F0s one octave apart:
one, corresponding to the cell BF (F01st) and the other, one
octave below BF (F02nd) (Fig. 4). It is suggested that (i) the
caudal part of A1 is well organized for the detection of
harmonics by specific F0-sensitive cells; (ii) F0-sensitive cells
do not discriminate between the spectral pitch-height of pure
tone and virtual pitch-height of harmonics; and (iii) pitch-height
is represented along with the tonotopic axis in A1.
Pitch-height sensitivity may originate from the characteristic
FRF pattern alternating between excitation and inhibition
(predominant excitation on preferred harmonic frequencies
and predominant inhibition on non-preferred frequencies)
(Fig. 6A,D,G). This sensitivity seems to be derived, rather than
primary, in these cells, that is, the excitatory and inhibitory
subfields of FRF of F0-sensitive cells are specifically organized
for detection of the harmonics: the inhibitory subfields are
narrowly (0.44, 0.26 and 0.14 octaves for the deepest three
troughs) located adjacent to the excitatory subfield (Fig. 3K--M),
which allows the cell to excite only when the component
frequency falls on the excitatory subfield but not on inhibitory
subfields. Only harmonic complex tones, whose F0 are equal to
the cell BF or half of BF, fit the excitation criterion of the cell.
The addition of only one non-harmonic frequency component
in each of the two inhibitory subfields completely abolishes the
F0 sensitivity to F0 = 0.5 3 BF and F0 = BF. It is not surprising that
broadband noise, which has a number of additional nonharmonic-component frequencies on the inhibitory subfields,
fits the inhibition criterion of the cell. Thus, given the excitatory
and inhibitory fields of these neurons, it is conceivably possible
to design a special stimulus composed of a number of artificial
tonal components, which are not necessarily harmonically
related, and that would be more efficient than any harmonic
complexes. Theoretically, the inhibitory BW of 0.44 octaves can
pass the second and the third harmonics falling, respectively, at
the low and the high edges of the inhibitory subfield.
F0-insensitive Cells
Energy-integrator cells responded to both harmonic and highpass noise stimuli as far as the sound energy on FRF (Fig. 5D,I).
The driven rates tended to increase with the decrease in F0/
LEF, which was equivalent to the increase in the number of
spectral components on FRF. These data suggest that energyintegrator cells are not sensitive to pitch-height but are sen-
sitive to static sounds as far as the sound energy on FRF. This is
why the cells are named energy-integrator cells. The response
characteristics of energy-integrator cells may originate from
relatively wide bandwidth of the excitatory subfields of FRF
(1.29 octaves) and less dominant spectral inhibition (Fig.
6B,E,H). The absence of inhibitory subfields of FRF (Fig. 3N)
allows the energy-integrator cells to integrate the sound energy
on the excitatory subfield of FRF.
In contrast to comprehensive energy-integrator cells, nonresponsive cells showed responses to neither the harmonic
complex tone nor high-pass noise (Fig. 5E,J). This may originate
from dominant spectral inhibition on component frequencies of
the harmonics and noise (Fig. 6C,F,I); that is, a wideband (1.8
octaves) inhibitory subfield on the higher frequency side of the
excitatory subfield (Fig. 3O) does not allow non-responsive cells
to respond to the harmonics and high-pass noise. Theoretically,
inhibitory BW of 1.80 octaves can pass two frequency components, in which the lower component is on the low edge of the
inhibitory subfields and the higher component is >3.5 times
higher than the lower component.
Relation of this Study to Previous Studies
In this study, some cells (7 of 49 F0-sensitive cells, 14.3%; 4 of 26
energy-integrator cells, 15.4%; and 2 of 27 non-responsive cells,
7.4%) had multiple peaks more than half of the BF response
height in the SF-response function. Such multi-peaked cells
were previously reported by Sutter and Schreiner (1991) and
Kadia and Wang (2003). By observing the response facilitation
in the two-tone paradigm for the harmonic second tone in the
multipeaked neurons, Kadia and Wang (2003) suggested that
the multi-peaked neurons have functional significance for
extracting harmonic components embedded in complex
sounds. This study supports the suggestion of Kadia and Wang
because the majority (7 of the 13 multipeaked neurons) of the
multipeaked recorded neurons in this study had sensitivity to
the harmonic complex tone.
As mentioned in the introduction, Schwartz and Tomlinson
(1990) investigated the responses of A1 cells to the harmonic
complex tone in monkeys. They found ‘F0 neurons’ whose
frequency tuning for pure tones is similar to that for F0s of
harmonic complex tones. However, their ‘F0 neurons’ were
regarded as ‘not’ encoding F0 because there was no response to
missing F0 stimuli but responses to both noise and harmonic
complex tones, that is, they found no cells that were sensitive to
the pitch-height of harmonics. The negative result may be
inconclusive, rather than a strong indication of a lack of F0sensitive cells in monkey A1, because despite studying ‘pitch’,
Schwarz and Tomlinson used ‘white noise’ (a sound lacking
pitch) as their search stimulus, resulting in a bias against finding
F0-sensitive cells. This study has shown that F0-sensitive cells
can distinguish between harmonics and noise, that is, they
respond to harmonics but not to noise (Fig. 5C,H), suggesting that the noise-search stimulus prevented Schwarz and
Tomlinson from recording F0-sensitive A1 cells that did not
respond to white noise. This study revealed that (i) A1 has cells
sensitive to F0 of the harmonics but not to noise; and (ii) those
F0-sensitive cells in A1 respond to harmonics missing F0 when
F0 was one octave below the cell BF (dotted lines in Fig. 7B,C).
In this study, we estimated the excitatory and inhibitory
subfields of FRF from single- and two-tone stimuli, respectively,
and succeeded in explaining the neural responses to the
harmonic complex tone on the basis of the excitation-inhibition
Cerebral Cortex September 2005, V 15 N 9 1381
balance. Our procedures can be justified by the findings of
Nelken et al. (1994a,b), who reported in cat A1 that single-tone
effects and two-tone interactions are sufficient to explain the
A1 neuron responses to multi-tone stimuli, such as four-tone
and nine-tone complexes. In this study, A1 cells might act as
a spectral filter with pass-band and reject-band, which correspond to the peak and trough in FRF, respectively. Thus, all
aspects of the responses to the harmonic series of tones are
immediate consequences of the spectral filter, including the
narrowing of the F0-response peak when the second harmonic
coincides with the cell BF and the differential responses
between noise stimuli and harmonic stimuli for F0-sensitive
cells, the broadly tuned and sloping F0-responses for energyintegrator cells with no inhibitory sideband and the weakness of
F0 response for non-responsive cells with frequency-unspecific
inhibitory sidebands. The fundamental point of view that neural
activities of A1 can be explained from the spectral integration of
stimulus energy on FRF is in good accordance with the previous
reports of Schwarz and Tomlinson (1990) and Fishman et al.
(1998, 2000) that studied A1 neural responses to harmonics.
More generally, our finding is in accord with previous studies
in the sense that spectral filtering mechanisms modify information
processing of the central auditory system such as tonal frequency
tuning (Katsuki et al., 1958; Greenwood and Maruyama, 1965),
duration tuning (Casseday et al., 1994), amplitude-modulation
frequency tuning (Liang et al., 2002), frequency-modulation
direction-selectivity (Zhang et al., 2003), spectral-edge sensitivity
(Qin et al., 2004a) and spectral-shape preference (Qin et al.,
2004b). This study has provided evidence demonstrating that the
spectral filtering mechanism is also involved in the discrimination
between harmonics and noise.
This study shows that pitch-height is represented along with
the tonotopic axis. Our findings suggest that a harmonic
complex tone with a given F0 such as 440 Hz would activate
double-F0-tuning cells in two regions along the tonotopic axis
in A1: one on the tonotopic axis of 880 Hz and the other on an
octave below (440 Hz). Furthermore, a harmonic complex tone
missing the F0 of 440 Hz would activate double-F0-tuning cells
only on the tonotopic axis of 880 Hz, and harmonics missing
both F0 and the second harmonics would not activate the
double-F0-tuning cells. Thus, our findings of pitch-height representation along with the tonotopic axis can be applied only
for the harmonic complex tones, including relatively lower
harmonics. Our findings neither support nor reject the previous
findings (Schulze and Langner, 1997; Schulze et al., 2002),
suggesting an independent arrangement of pitch and BF. They
focused on the ‘temporal-cue’ mechanism and used amplitude
modulated tones with 100% depth of modulation and with only
three spectral components far away from the cell FRF, whereas
we focused on the ‘spectral-cue’ mechanism and used harmonic
complex tones with less-dominant envelope periodicity and
with spectral components near the cell FRF.
Correlation with Psychophysics
As discussed in the previous section, F0-sensitive cells in A1 were
driven as the consequence of the spectral filter function.
Nevertheless, very interestingly, the observed behavior of
F0-sensitive cells reflects some psychophysical properties of
harmonic complexes. First, the octave-response characteristic
of F0-sensitive cells explains ‘pitch-chroma’. Pitch has two
perceptual dimensions: the ‘vertical dimension’ of pitch-height
from low to high and the ‘circular dimension’ of pitch-chroma,
1382 Cortical Processing of Pitch Information Qin et al.
d
which is helical pitch perception returning to a similar pitch
perception at a pitch-height elevation one octave above (Warren
et al., 2003). In Western music, the same ‘note name’ is given at
pitch-heights one octave apart. For example, note names of both
440 and 880 Hz are ‘A’ in English and American music. This study
has shown that most F0-sensitive cells have two F0-tuning peaks:
one corresponding to the cell BF and the other to the frequency
one octave below (Fig. 4). This ‘octave’-response property of the
F0-sensitive cells in A1 may imply that the pitch-chroma is
created in the central nervous system of A1, which is well
organized to discriminate between harmonic sounds and noise
sounds. If it is supposed that there are two harmonic complex
tones with different F0s one octave apart, such as 880 and 440 Hz,
and an F0-sensitive A1 cell with BF at 880 Hz, then the two
harmonic stimuli would generate similar vigorous spike activities
of the cell, underlying the perception of the pitch-chroma. For
the first time, we have presented physiological evidence supporting the interpretation that the pitch-chroma has relevance
to the neural response characteristics in A1 organized for
detection of the harmonic structure of the complex tone.
Second, the level invariance of the preferred pitch-height of
F0-sensitive cells (Fig. 8) may explain the level invariance of
the pitch-height of the harmonic complex tone (Terhardt,
1979). Third, F0-sensitive cells reported in this study responded
to pitch stimuli simply throughout the stimulus period without
prominent adaptation (Fig. 1E). This time--response property
explains the psychophysical phenomenon of pitch perception
that hardly adapts. Thus, we close a knowledge gap between
psychophysics and physiology for complex features of pitch
perception that are universally found in harmonic sound.
However, the response features of F0-sensitive cells in A1 are
not always consistent with the psychophysical findings of pitch.
The third to fifth harmonics play a dominant role in pitch-height
perception (Ritsma, 1967), while the two lowest components
(F0 and the second harmonic) are essential for generating the
responses of F0-sensitive cells of A1 in this study (Fig. 7). Thus,
the encoding of pitch information in A1 remains in the spectral
filtering level but not in the perfect perceptual level. We suggest
that neural processing for pitch perception should be completed in higher-order auditory cortical fields. This should be
clarified in future studies.
Notes
We thank N. Yaguchi for technical assistance. The work was supported
by a grant from the Ministry of Education, Science, Culture, Sports and
Technology, Japan.
Address correspondence to Yu Sato, Department of Physiology,
Interdisciplinary Graduate School of Medicine and Engineering,
University of Yamanashi, Tamaho, Yamanashi 409-3898, Japan. Email:
[email protected].
References
Bernstein JG, Oxenham AJ (2003) Pitch discrimination of diotic and
dichotic tone complexes: harmonic resolvability or harmonic
number? J Acoust Soc Am 113:3323--3334.
Burns EM, Viemeister NF (1976) Nonspectral pitch. J Acoust Soc Am
60:863--869.
Casseday JH, Ehrlich D, Covey E (1994) Neural tuning for sound
duration: role of inhibitory mechanisms in the inferior colliculus.
Science 264:847--850.
Chimoto S, Kitama T, Qin L, Sakayori S, Sato Y (2002) Tonal response
patterns of primary auditory cortex neurons in alert cats. Brain Res
934:34--42.
Cohen MA, Grossberg S, Wyse LL (1995) A spectral network model of
pitch perception. J Acoust Soc Am 98:862--879.
Cynx J, Shapiro M (1986) Perception of missing fundamental by a
species of songbird (Sturnus vulgaris). J Comp Psychol 100:
356--360.
Fastl H, Stoll G (1979) Scaling of pitch strength. Hear Res 1:293--301.
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (1998) Pitch vs.
spectral encoding of harmonic complex tones in primary auditory
cortex of the awake monkey. Brain Res 786:18--30.
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (2000) Complex tone
processing in primary auditory cortex of the awake monkey. II. Pitch
versus critical band representation. Acoust Soc Am 108:247--262.
Greenwood DD, Maruyama N (1965) Excitatory and inhibitory response
areas of auditory neurons in the cochlear nucleus. J Neurophysiol
28:863--892.
Heffner H, Whitfield IC (1976) Perception of the missing fundamental
by cats. J Acoust Soc Am 59:915--919.
Houtsma AJM (1984) Pitch salience of various complex sounds. Music
Percept 1:296--307.
Houtsma AJ, Wicke RW, Ordubadi A. (1980) Pitch of amplitudemodulated low-pass noise and predictions by temporal and spectral
theories. J Acoust Soc Am 67:1312--1322.
Kadia SC, Wang X (2003) Spectral integration in A1 of awake primates:
neurons with single- and multipeaked tuning characteristics.
J Neurophysiol 89:1603--1622.
Kaernbach C, Demany L (1998) Psychophysical evidence against the
autocorrelation theory of auditory temporal processing. J Acoust Soc
Am 104:2298--2306.
Katsuki Y, Sumi T, Uchiyama H, Watanaba T (1958) Electric responses of
auditory neurons in cat to sound stimulation. J Neurophysiol
21:569--588.
Langner G, Sams M, Heil P, Schulze H. (1997) Frequency and periodicity
are represented in orthogonal maps in the human auditory cortex:
evidence from magnetoencephalography. J Comp Physiol A
181:665--676.
Liang L, Lu T, Wang X (2002) Neural representations of sinusoidal
amplitude and frequency modulations in the primary auditory cortex
of awake primates. J Neurophysiol 87:2237--2261.
Nelken I, Prut Y, Vaddia E, Abeles M (1994a) Population responses to
multifrequency sounds in the cat auditory cortex: one- and twoparameter families of sounds. Hear Res 72:206--222.
Nelken I, Prut Y, Vaddia E, Abeles M (1994b) Population responses to
multifrequency sounds in the cat auditory cortex: four-tone complexes. Hear Res 72:223--236.
Pantev C, Hoke M, Lutkenhoner B, Lehnertz K (1989) Tonotopic
organization of the auditory cortex: pitch versus frequency representation. Science 246:486--488.
Pantev C, Elbert T, Ross B, Eulitz C, Terhardt E (1996) Binaural fusion
and the representation of virtual pitch in the human auditory cortex.
Hear Res 100:164--170.
Patel AD (2003) Language, music, syntax and the brain. Nat Neurosci
6:674--681.
Plomp R (1967) Pitch of complex tones. J Acoust Soc Am 41:1526--1533.
Qin L, SatoY (2004) Suppression of auditory cortical activities in awake
cats by puretone stimuli. Neurosci Lett 365:190--194.
Qin L, Kitama T, Chimoto S, Sakayori S, Sato Y (2003) Time course of
tonal frequency-response-area of primary auditory cortex neurons in
alert cats. Neurosci Res 46:145--152.
Qin L, Sakai M, Chimoto S, SatoY (2004a) Spectral-edge sensitivity of
primary auditory cortex neurons in alert cats. Brain Res 1014:1--13.
Qin L, Chimoto S, Sakai M, SatoY (2004b) Spectral-shape preference of
primary auditorycortex neurons in awake cats. Brain Res 1024:
167--175.
Reale RA, Imig TJ (1980) Tonotopic organization in auditory cortex of
the cat. J Comp Neurol 192:265--291.
Renken R, Wiersinga-Post JE, Tomaskovic S, Duifhuis H (2004) Dominance of missing fundamental versus spectrally cued pitch: individual
differences for complex tones with unresolved harmonics. J Acoust
Soc Am 115:2257--2263.
Ritsma RJ (1967) Frequencies dominant in the perception of the pitch
of complex sounds. J Acoust Soc Am 42:191--198.
Schulze H, Hess A, Ohl FW, Scheich H (2002) Superposition of
horseshoe-like periodicity and linear tonotopic maps in auditory
cortex of the Mongolian gerbil. Eur J Neurosci 15:1077--1084.
Schulze H, Langner G (1997) Periodicity coding in the primary auditory
cortex of the Mongolian gerbil (Meriones unguiculatus): two
different coding strategies for pitch and rhythm? J Comp Physiol A
181:651--663.
Schwarz DW, Tomlinson RW (1990) Spectral response patterns of
auditory cortex neurons to harmonic complex tones in alert monkey
(Macaca mulatta). J Neurophysiol 64:282--298.
Shouten JF (1962) Pitch of the residue. J Acoust Soc Am 34:1418--1424.
Sutter ML, Schreiner CE (1991) Physiology and topography of neurons
with multipeaked tuning curves in cat primary auditory cortex.
J Neurophysiol 65:1207--1226.
Terhardt E (1974) Pitch, consonance, and harmony. J Acoust Soc Am
55:1061--1069.
Terhardt E (1979) Calculating virtual pitch. Hear Res 1:155--182.
Tomlinson RW, Schwarz DW (1988) Perception of the missing fundamental in nonhuman primates. J Acoust Soc Am. 84:560--565.
Warren JD, Uppenkamp S, Patterson RD, Griffiths TD (2003) Separating
pitch chroma and pitch height in the human brain. Proc Natl Acad
Sci USA 100:10038--10042.
Whitfield IC (1980) Auditory cortex and the pitch of complex tones.
Acoust Soc Am 67:644--647.
Zatorre RJ (1988) Pitch perception of complex tones and human
temporal-lobe function. J Acoust Soc Am 84:566--572.
Zhang LI, Tan AY, Schreiner CE, Merzenich MM (2003) Topography and
synaptic shaping of direction selectivity in primary auditory cortex.
Nature 424:201--205.
Cerebral Cortex September 2005, V 15 N 9 1383