Anim Cogn (2010) 13:515–523
DOI 10.1007/s10071-009-0302-4
ORIGINAL PAPER
Perceptual chunking in the self-produced songs of Bengalese finches (Lonchura striata var. domestica)
Rie Suge · Kazuo Okanoya
Received: 20 August 2009 / Revised: 1 December 2009 / Accepted: 3 December 2009 / Published online: 29 December 2009
© Springer-Verlag 2009
Abstract Like humans, songbirds, including Bengalese finches, have hierarchical structures in their vocalizations. When humans perceive a sentence, processing occurs in phrase units rather than in single words. In this study, we investigated whether songbirds also perceive their songs by chunks (clusters of song notes) rather than by single song notes. We trained male Bengalese finches to react to a short noise in a Go/NoGo task. We then superimposed the noise onto recordings of their own songs and examined whether the reaction time was affected by the location of the short noise, that is, whether the noise was placed between chunks or in the middle of a chunk. The subjects' reaction times to the noise in the middle of a chunk were significantly longer than those to the noise placed between chunks. This result was not observed, however, when the songs were played in reverse. We thus concluded that Bengalese finches perceive their songs by chunks rather than by single notes.
Keywords Segmentation · Bird song · Phrase structure ·
Operant conditioning · Vocal learning
R. Suge · K. Okanoya
PRESTO, Japan Science and Technology Corporation,
4-1-8, Honcho, Kawaguchi 332-0012, Japan
R. Suge · K. Okanoya
Faculty of Letters, Chiba University, 1-33 Yayoi-cho,
Inage-ku, Chiba 263-0022, Japan
R. Suge
Department of Physiology, Saitama Medical University,
38 Morohongo, Moroyama, Saitama 350-0495, Japan
K. Okanoya (corresponding author)
RIKEN Brain Science Institute, 2-1 Hirosawa,
Saitama 351-0195, Japan
e-mail: [email protected]
Introduction
Bird songs are thought to be excellent models for studying
the mechanisms of human language. Bird songs and human
languages share several features in their development such
as having a critical period in early life, imitation of adults,
and innate predisposition for conspecifics (for a review, see
Doupe and Kuhl 1999). The structure of bird songs, with
hierarchical organization and syntactical control, also
shows similarities to human language (Okanoya 2004a).
The necessary abilities underlying these common features,
like vocal learning, have been investigated in many comparative studies.
Human language is organized in a hierarchical structure
in which phonemes form words, words form phrases, and
phrases form sentences (Jackendoff 2002). When we listen
to speech, we segment the continuous stream of sound into
smaller units such as phrases. In other words, we chunk single words into larger units, phrases. Fodor and Bever
(1965) provided empirical evidence for this ability by presenting the sound of a click to subjects while they listened
to a spoken sentence; the subjects were then asked to state
where the click was located in the sentence. Many of the
subjects placed the click at boundaries of constituents, such
as phrases, even though the click was actually distributed
evenly throughout the sentence. Thus, humans seem to process spoken words using phrase structures as units. In the
perception of bird songs, do songbirds also process their
songs by segmenting them into structural subunits?
Bird songs can be classified on a spectrum of song complexity (Okanoya 2004a). For example, zebra finches sing songs in which the order of the song notes is relatively fixed. In this type of song, a combination of a few notes (but sometimes only one note) forms a song syllable, and a number of syllables form a song phrase. A stereotyped series of syllables within a song is known as a "motif" (Sossinka and Bohner 1980).
Fig. 1 Example of song analysis. (a) Example of the sonogram from a subject's song. Axes indicate frequency and time, and each note type is symbolized by a letter. The song was coded as a sequence of letters. (b) Transition diagram extracted from the 20 song samples of the subject shown in (a). Circles with letters indicate notes, and the arrows show the transition to the following notes. Width of the arrow indicates the frequency of a transition to the next note relative to the number of total transitions. Each dotted circle shows a chunk defined in this experiment. Note "e" was omitted from the diagram because it had a low appearance probability. (c) Example of a "typical song". Based on the diagram, part of a song was extracted and its structure determined. (d) Examples of song stimuli (BOS NoGo, BOS Go/IN, BOS Go/OUT, REV Go/IN). Solid lines placed under each sonogram indicate the chunk structures. In the case of REV, the dotted lines indicate the structures defined as chunks in BOS. Triangles represent the placement of noises in the stimuli.
Other widely studied oscines, including the white-crowned sparrow, song sparrow, and swamp sparrow, also sing songs with stereotyped series of syllables. The order of song notes in the songs of Bengalese finches varies, and there are also other species with varying song orders, such as nightingales, starlings, and willow warblers (Okanoya 2004b). The songs of Bengalese finches have been described as following a finite-state grammar, defined by the transition probabilities between several states (Honda and Okanoya 1999; Okanoya 2004b). In Bengalese finches, 2–5 song notes form a unit, and each unit is produced at a particular state transition. The transition pattern is not fixed, because one note can be followed by several possible notes and may include a repeat of the same note. Some sequential notes form a "chunk" in which the notes have a stereotyped order with occasional variations in note repetition, and any note can be used in other chunks as well. Furthermore, because the order of chunks is also not fixed, one chunk can be followed by several different chunks. Song structure can be expressed as a Markov model of note-to-note transitions (see Fig. 1b for an example). The degree of sequential complexity in Bengalese finch songs allows statistical analysis (Hosino and Okanoya 2000; Okanoya 2004b), and this makes Bengalese finches ideal subjects for the study of perceptual chunking of song notes.
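To make the finite-state description above concrete, the following is a minimal sketch of how note-to-note transition probabilities can be estimated from songs coded as letter sequences (as in Fig. 1a). It is illustrative only: the function name and the toy song strings are ours, not part of the Mnemic software used in this study.

```python
from collections import Counter, defaultdict

def transition_probabilities(songs):
    """Estimate first-order (Markovian) note-to-note transition probabilities
    from songs coded as letter strings; "end" marks song termination."""
    counts = defaultdict(Counter)
    for song in songs:
        notes = list(song) + ["end"]
        for current, following in zip(notes, notes[1:]):
            counts[current][following] += 1
    return {note: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
            for note, followers in counts.items()}

# Toy letter-coded songs (not real data), in the style of Fig. 1a:
songs = ["abccdfg", "abcdfgh", "abcdha", "abccdfg"]
for note, followers in sorted(transition_probabilities(songs).items()):
    print(note, {k: round(v, 2) for k, v in sorted(followers.items())})
```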
In song learning, young zebra finches have been observed to copy a series of song notes as a chunk, rather than as single elements, from their tutors (ten Cate and Slater 1991; Williams and Staples 1992). Hultsch and Todt (1989) showed that nightingales also copied temporally consistent groups of songs from their tutors' songs as a "package". In song production, two approaches have been used to find song units. Cynx (1990) flashed lights at singing zebra finches and found that interruptions often occurred after each syllable, but not within syllables. This result was
confirmed by analyzing their respiratory patterns (Franz and Goller 2002). In nightingales, flashing lights were found to interrupt singing, with most stops occurring either during silent portions of songs or 1–3 elements after the flash of light (Riebel and Todt 1997). The researchers suggested that the strength of association between elements can vary. The other approach is based on interfering with the neural circuits that underlie song production. Peripheral lesions in the song system have been found to result in the deletion of one part of a song that was a chunk of song notes in zebra finches (Williams and McKibben 1992). Electrophysiological stimulation (Vu et al. 1994) and direct recording (Yu and Margoliash 1996) of the song nuclei HVC (used as a proper name) and the robust nucleus of the arcopallium (RA) have indicated a hierarchical organization of motor pathways in zebra finches in which HVC is responsible for syllable sequences while RA represents individual syllables.
Thus, although it is known that songbirds use chunks as
units for song learning, empirical results regarding the production of songs have been inconclusive and appear to vary
depending on the species. Do songbirds really use chunks when hearing their songs? In the present study, we examined the "chunking", or auditory segmentation, ability of songbirds via an experiment similar to that conducted by Fodor and Bever, using the term "chunk" rather than "phrase", the latter being more appropriate for human language (Hultsch et al. 1999).
To determine whether songbirds can chunk their songs
during the process of perception, we used an operant-conditioning technique when replicating Fodor and Bever’s
experiment. Subject birds were trained to react to a short
noise as soon as possible. Reaction time to the noise was
then measured when a song, rather than spoken words, was
played as the background to the noise. We hypothesized
that the reaction time to a noise placed inside a chunk
would be longer than that placed outside of the chunk on
the assumption that the songbirds would process the noise
placed inside the chunk after the auditory processing of the
chunk itself was completed. Furthermore, we hypothesized
that when the noise was shifted to the end of the chunk,
then the reaction would be delayed in accordance with the
time between the noise and the end of the chunk.
Methods
Subjects
Five male Bengalese finches (Lonchura striata var. domestica) kept in aviaries at Chiba University (constant light–dark cycle of 13:11 h) were used as subjects. The subjects were selected from five different families to avoid using similar types of songs, as this would make it difficult for the
birds to recognize their own songs and the songs of other conspecifics. Each song had unique notes, and the transition patterns were varied and recognizable by the researchers. The birds were housed individually but could see each other and hear each other's songs and calls. For operant conditioning, each subject's weight was controlled by feeding time, which was gradually reduced from 24 h to 2 h per day, except for 24-h feeding 1 day per week (Ikebuchi and Okanoya 2000). Each subject's weight and condition were checked at 9:00 AM every morning.
Song recording and analysis
Each bird's song was recorded and analyzed to find chunk structures. In our experiment, only "undirected songs", which were produced by males unable to sense any females close by, were used as stimuli. It has been reported that undirected songs of the zebra finch are structurally more variable than directed songs (Sossinka and Bohner 1980; Walters et al. 1991).
Stimuli were obtained by placing a subject in a small wooden cage (115 cm × 185 cm × 150 cm) situated in a soundproof room. Songs were recorded using a condenser microphone (ECM-MS957, Sony, Tokyo, Japan) and a digital audio tape recorder (DTC-ZA5ES, Sony, Tokyo, Japan). When the interval between two consecutive notes was longer than 3 s, the silence was regarded as signaling the end of the first song. We collected at least 20 songs from each subject from which to extract stimuli.
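As a sketch of the song-boundary rule described above (a silent interval longer than 3 s ends a song), the following assumes note onset/offset times have already been extracted from the recording; the function name and toy values are hypothetical.

```python
def group_notes_into_songs(note_intervals, max_gap_s=3.0):
    """Group note (onset, offset) times, in seconds, into songs:
    a silent gap longer than max_gap_s ends the current song."""
    songs, current = [], []
    for onset, offset in sorted(note_intervals):
        if current and onset - current[-1][1] > max_gap_s:
            songs.append(current)
            current = []
        current.append((onset, offset))
    if current:
        songs.append(current)
    return songs

# Toy example: three notes, a 4-s silence, then two more notes -> two songs.
notes = [(0.0, 0.1), (0.2, 0.3), (0.5, 0.6), (4.6, 4.7), (4.9, 5.0)]
print(len(group_notes_into_songs(notes)))  # 2
```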
The recorded songs were analyzed using Avisoft-SASLab Pro (Avisoft, Berlin, Germany) to produce sonograms (Fig. 1a). To analyze the syntactical structure of a song, we followed the same procedure as that described by Honda and Okanoya (1999). In brief, the song notes were categorized into distinct groups by visual inspection, and each note type was represented by a letter. Thus, each song was represented by a sequence of letters. These sequences (comprising the 20 songs from each subject) were analyzed using Mnemic (CogniTom Academic Design Inc., Chiba, Japan) to calculate the transition probability of one note changing to another. When the number of collected songs exceeded 20, the 20 longest songs were selected from the collected samples. Figure 1b shows one of the song syntaxes analyzed using the Mnemic software. Notes and transitions with low probabilities of occurrence were omitted from the diagram for simplification (e.g., we omitted note "e" because it was recorded only four times out of a total of 864 notes in 20 sample songs). The width of the line of each arrow shows the probability of a transition. For example, the probabilities of the various transitions from note "d" are as follows: "a" (9%), "e" (2%), "f" (75%), "h" (9%), and "end" (5%). The transition lines from "d" to "e" and from "d" to "end" were omitted from the diagram because of their relatively
low (≤5%) transition probabilities. From these data, compared with notes "b" and "c", which have single output arrows (with the exception of self-transitions), it can be assumed that "d" is both a branching point in this song structure and the last note of the first chunk. Thus, a typical transition pattern for each song can be obtained via this analysis. We selected a song that had the typical transition pattern extracted as above. This song was termed the "typical song" and subsequently used as a source of stimuli (see Fig. 1c). Because the songs of Bengalese finches are non-fixed, not all song samples have a defined chunk structure. To use natural songs, therefore, we chose a song sample with chunks (from the 20 songs used for analyzing song structure) rather than a song stimulus generated artificially from several song samples.
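One way to operationalize the chunk definition used above is sketched below: a note is treated as a candidate chunk-final (branching) note when, after dropping self-transitions and transitions at or below 5%, it still has more than one likely successor. The threshold and function names are ours; the actual chunk extraction was done with the Mnemic analysis described in the text.

```python
def chunk_boundary_notes(probs, min_prob=0.05):
    """Given a note -> {next note: probability} table (cf. Fig. 1b), return the
    notes that act as branching points and hence likely chunk boundaries."""
    boundaries = []
    for note, followers in probs.items():
        likely = [nxt for nxt, p in followers.items() if nxt != note and p > min_prob]
        if len(likely) > 1:
            boundaries.append(note)
    return boundaries

# Toy table mimicking the example in the text: "b" and "c" each have a single
# likely successor, whereas "d" branches to "a", "f" and "h" and ends the chunk.
probs = {"b": {"b": 0.20, "c": 0.80},
         "c": {"c": 0.15, "d": 0.85},
         "d": {"a": 0.09, "e": 0.02, "f": 0.75, "h": 0.09, "end": 0.05}}
print(chunk_boundary_notes(probs))  # ['d']
```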
Apparatus
The subjects were trained and tested in a small wire cage
placed in a sound attenuation box. Attached to the cage was
a panel with two sensitive micro-switches: the left switch
was attached to a green light-emitting diode (LED) and the
right to a red LED. The green switch was used as the observation key and the red switch as the report key. The birds could flip the switches by pecking the LEDs. A standard
pigeon grain hopper was placed under the panel. A wooden
perch was positioned in front of the hopper opening, with
the subjects able to reach the food from the perch.
Training and testing
We employed operant conditioning with standard “Go/NoGo”
training. At the start of a session, the observation key
was illuminated until the bird pecked it. Fifty milliseconds after the peck, either the Go stimulus or the NoGo stimulus was presented together with illumination of the red LED (report key).
The training stimuli were a 3 or 5 kHz, 1.6-s pure tone played with the noise (Go stimulus) and the same pure tone without the noise (NoGo stimulus). The noise had a white (flat) spectrum and lasted 15 ms with a rise/fall time of 5 ms. We used five types of Go stimuli: pure tones with the noise delivered at 300, 600, 900, 1,000 or 1,500 ms after the start of the pure tone. Each sound stimulus had a rise/fall time of 5 ms. During training, to prevent the subjects from learning to react to a given type of sound rather than to the noise itself, and to prevent them from learning any relationship between the noise and the structure of the background sound, we used sine-wave stimuli that did not contain a chunk-like structure. Only reactions (report-key pecks) within the first 1 s after the noise was presented were reinforced. One session comprised 100 trials employing 10 types of Go stimuli (2 types, 3 and 5 kHz, of pure
tone, and five types of noise position) presented five times each, and two types of NoGo stimuli (3 and 5 kHz pure tones) presented 25 times each. When the overall "correct" score (the ratio of correct reactions to the total number of trials) exceeded 80% over two successive experimental days, the subject was considered to have learned the task and the test session was started. At the end of training, all subjects showed stable reaction times to the noise. This was confirmed by the fact that the reaction time to the pure-tone stimuli was neither shortened nor prolonged across the two test sessions (order of testing sessions: F(1, 50) = 0.431, P = 0.515).
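For illustration, a training Go stimulus as described above (a 1.6-s pure tone at 3 or 5 kHz with a 15-ms white-noise burst superimposed at a fixed offset, both with 5-ms rise/fall times) could be synthesized as follows. The 20-kHz sampling rate matches the rate reported for the test stimuli; the linear ramps and relative levels are assumptions, not reported parameters.

```python
import numpy as np

FS = 20_000  # Hz, matching the sampling rate reported for the stimuli

def ramp(n, rise_fall_s=0.005, fs=FS):
    """Linear onset/offset envelope with the 5-ms rise/fall time given in the text."""
    env = np.ones(n)
    k = int(rise_fall_s * fs)
    env[:k] = np.linspace(0.0, 1.0, k)
    env[n - k:] = np.linspace(1.0, 0.0, k)
    return env

def training_go_stimulus(tone_hz=5_000, tone_s=1.6, noise_onset_s=0.9,
                         noise_s=0.015, fs=FS):
    """Pure tone with a short white-noise burst added at noise_onset_s."""
    t = np.arange(int(tone_s * fs)) / fs
    tone = 0.5 * np.sin(2 * np.pi * tone_hz * t) * ramp(len(t), fs=fs)
    n_noise = int(noise_s * fs)
    noise = 0.5 * np.random.randn(n_noise) * ramp(n_noise, fs=fs)
    start = int(noise_onset_s * fs)
    stim = tone.copy()
    stim[start:start + n_noise] += noise
    return stim

# One of the ten Go variants used in training (3-kHz tone, noise at 600 ms):
go = training_go_stimulus(tone_hz=3_000, noise_onset_s=0.6)
```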
Two types of song stimuli were used in the test sessions: (1) the bird's own song (BOS) and (2) the reversed BOS (REV). The REV stimulus was an exact time-reversed recording of BOS, i.e., not only the order of the notes but also each note itself was reversed in time. A part (1.6 s) of each bird's typical song (see "Song recording and analysis"), which included at least one chunk structure, was selected as a stimulus. In the test sessions, song stimuli with noise were used. The noise was placed either in the middle of a chunk (IN) or between chunks (OUT), and only one noise was placed in each stimulus. In both cases, the noise was placed in the silence between notes and did not overlap with any song notes. In cases of a song with two or three chunks, one chunk was selected as the target for the IN-noise position. Two types of IN noise placed in the middle of the same target chunk were used in the test sessions for each subject: one was placed in the first half of the target chunk and the other in the second half. OUT noises were placed after either the first or the second chunk, but never at the start or end of a stimulus. The REV-with-noise stimuli were the reverse-played BOS-with-noise stimuli, i.e., the timing of the noise was also reversed (see Fig. 1; Tables 1, 2). The stimuli were in digital format (12-bit, 20-kHz sampling rate) and presented via a digital-to-analog converter (DT2801, Data Translation, MA, USA) and a loudspeaker (10 cm in diameter, 8 Ω) placed inside a sound attenuation box (the frequency response was within 4 dB between 100 Hz and 10 kHz, where most of the energy of the song stimuli was concentrated). The converter low-pass filtered the stimuli at 10 kHz to prevent aliasing; after amplification, the peak sound intensity was 65 dB re 20 µPa.
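The assembly of the test stimuli described above can be sketched as follows: the 15-ms noise burst is added within a silent inter-note interval of the 1.6-s BOS excerpt, and for REV the whole waveform is time-reversed so that the noise position is mirrored as well. The waveforms and the example noise position below are placeholders; in the experiment the positions were chosen from each bird's own chunk structure.

```python
import numpy as np

FS = 20_000  # Hz

def make_test_stimulus(song, noise, noise_onset_s, reverse=False, fs=FS):
    """Superimpose a short noise burst on a song excerpt; if reverse=True,
    play the excerpt backwards (REV) with the noise timing mirrored too."""
    stim = song[::-1].copy() if reverse else song.copy()
    onset_s = noise_onset_s
    if reverse:
        onset_s = len(song) / fs - noise_onset_s - len(noise) / fs
    start = int(onset_s * fs)
    stim[start:start + len(noise)] += noise
    return stim

# Placeholders: 'bos' stands for a recorded 1.6-s excerpt of the bird's own song;
# 0.42 s is a hypothetical IN position lying in a silent gap between notes.
bos = np.zeros(int(1.6 * FS))
noise = 0.5 * np.random.randn(int(0.015 * FS))
go_in_bos = make_test_stimulus(bos, noise, noise_onset_s=0.42)
go_in_rev = make_test_stimulus(bos, noise, noise_onset_s=0.42, reverse=True)
```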
Table 1 Sets of stimuli used for tests: Sixteen types of stimuli were used in the test sessions

No.  Test  Base song  Go/NoGo  Chunk            Ord.  Rep.
1    BOS   BOS        Go       IN               1     10
2    BOS   BOS        Go       IN               2     10
3    BOS   BOS        Go       OUT              1     10
4    BOS   BOS        Go       OUT              2     10
5    BOS   Tone       Go       (IN = Stim 1)    1     5
6    BOS   Tone       Go       (OUT = Stim 3)   1     5
7    BOS   BOS        NoGo     -                -     40
8    BOS   Tone       NoGo     -                -     10
9    REV   REV        Go       IN               2     10
10   REV   REV        Go       IN               1     10
11   REV   REV        Go       OUT              2     10
12   REV   REV        Go       OUT              1     10
13   REV   Tone       Go       (IN = Stim 9)    2     5
14   REV   Tone       Go       (OUT = Stim 11)  2     5
15   REV   REV        NoGo     -                -     40
16   REV   Tone       NoGo     -                -     10

This table describes each stimulus's test type (BOS, REV), base song (BOS, REV, tone), stimulus type (Go, NoGo), position of the noise in the chunk (IN, OUT, or the pseudo-position with respect to the tone), presentation order of the noise (abbreviated as Ord.: 1 = First, 2 = Second) and number of presentations in a test (abbreviated as Rep.). In the diagram column of the original table (not reproduced here), capital letters indicate the notes of a song, a straight line represents a pure tone, underlines indicate chunk structures, inverted commas (') show the position of the noise, and reversed letters indicate that the stimulus was played backwards.
Table 2 Sets of stimuli used for tests: Set of the Go stimuli in the BOS test session

Position              IN                          OUT
Presentation order    First         Second        First         Second
Song                  Stim 1 (10)   Stim 2 (10)   Stim 3 (10)   Stim 4 (10)
Tone                  Stim 5 (5)    -             Stim 6 (5)    -

This table shows the 50 Go stimuli comprising the BOS test session as an example. Numbers in brackets indicate the number of replications. The stimulus names (e.g., Stim 1) can be found in Table 1.
The order of the two test sessions was randomized, and
subjects were retrained with pure-tone stimuli to maintain
an overall correct score of 80% for two successive days
before the next testing session.
Six types of Go stimuli were presented in the subsequent test session: song with a noise placed (1) in the first half of the chunk (10 presentations, shown as No. 1 in Tables 1, 2), (2) in the second half of the chunk (10 presentations, shown as No. 2), (3) on the first chunk boundary after the start of the stimulus (10 presentations, shown as No. 3), (4) on the second chunk boundary (10 presentations, shown as No. 4), (5) a pure tone with the noise placed at the same temporal position as in stimulus (1) (5 presentations, shown as No. 5), and (6) a pure tone with the noise placed at the same temporal position as in stimulus (3) (5 presentations, shown as No. 6). NoGo stimuli in the test sessions were as follows: pure tone without noise (10 presentations) and song without noise (40 presentations) (see Tables 1, 2; Fig. 1).
The reaction time limit was 2 s after the end of the stimulus presentation. To encourage birds to peck more quickly,
in the training session, only reactions within 1 s of the noise
being presented were reinforced. The presentation order of
the stimuli was randomized across birds and across sessions. Subjects’ reactions to all Go stimuli were reinforced.
The score and reaction time (duration from the start of the
noise to pecking the report key) of each trial were recorded.
Statistical analysis
Two types of analysis were performed on the length of the
reaction time. First, the reaction times to the song stimuli
and the tone stimuli that had the same noise position (for
example, in BOS tests, Stimuli 1, 3, 5, 6 in Table 1) were
analyzed using an analysis of variance (ANOVA) with the
following factors: Test (BOS, REV), position of noise (IN,
OUT), and stimulus type (song, tone). Trials in which subjects failed to react to the stimuli were omitted from the
analysis.
Second, only the song stimuli (for example, in BOS
tests, Stimuli 1, 2, 3, 4 in Table 1, see also Table 2) were
subjected to an ANOVA using the following factors: Test
(BOS, REV), position of noise (IN, OUT), and presentation
order of noise (First, Second; for example, in BOS tests,
stimuli Nos. 1 and 3 are “First”, while Nos. 2 and 4 are
“Second”, respectively, as shown in Tables 1, 2). Failed trials were omitted from the analysis. The correct reaction
ratio was subjected to the same analysis as that performed
on the reaction time. When there was a significant effect or interaction, a least-significant difference test was performed to compare means. All tests were two-tailed.
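The first of these analyses could be run, for example, as a three-factor ANOVA on the trial-level reaction times. The sketch below uses pandas and statsmodels with simulated data and hypothetical column names; the original software used for the analyses is not stated in the text, so this is only an illustration of the model structure (Test × noise position × stimulus type).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated long-format table of successful Go trials (values are not real data).
# test: BOS or REV session; position: IN or OUT noise; stim_type: song or tone;
# rt: reaction time (ms) from noise onset to the report-key peck.
rng = np.random.default_rng(0)
trials = pd.DataFrame({
    "test":      np.repeat(["BOS", "REV"], 20),
    "position":  np.tile(["IN", "OUT"], 20),
    "stim_type": np.tile(["song", "song", "tone", "tone"], 10),
    "rt":        rng.normal(500, 50, 40),
})

# Three-factor ANOVA corresponding to the first analysis described above.
model = smf.ols("rt ~ C(test) * C(position) * C(stim_type)", data=trials).fit()
print(anova_lm(model, typ=2))
```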
Results
No significant interaction or main effect involving the correct reaction ratio was observed for any type of stimulus; thus, we describe only the reaction time results here.
Comparison of stimuli with noise at the same position:
Tone and song
First, we examined reaction time to the stimuli with noise at the same temporal position (e.g., Stimuli Nos. 1, 3, 5 and 6 in the BOS test; see Table 2). There was a significant three-way interaction between test, position of noise and stimulus type (F(1, 146) = 9.72, P = 0.002). No significant main effects of test, position of noise, or stimulus type, and no other (two-way) interactions, were observed. Mean reaction times are shown in Fig. 2. When BOS song stimuli were used, the reaction time to the IN position was significantly longer than the reaction time to the OUT position (t(146) = 2.37, P = 0.019). In the REV song stimuli, however, no such difference was observed. In both tests, no significant difference was observed between IN and OUT using tone stimuli. When the reaction time was directly compared between the song and tone stimuli in which the noise was positioned in the same place in each stimulus set, a significant
increase in reaction time to the song stimuli was observed for the IN-position stimuli in the BOS test (t(146) = 3.36, P = 0.001), but not for the OUT-position stimuli. This increase was not observed in the REV test.

Fig. 2 Comparison of stimuli with noise at the same position: tone and song. In this analysis, reaction times to the song stimuli and the tone stimuli that had the same noise position (e.g., in BOS tests, Stimuli 1, 3, 5, 6 in Tables 1, 2) were used. The figure shows the mean reaction time in the two types of tests: the bird's own song (BOS) and the same song reversed (REV). Striped bars show the mean reaction times to IN noise with tone stimuli, filled bars show IN noise with song stimuli, dotted bars show OUT noise with tone stimuli, and blank bars show OUT noise with the song. Error bars indicate the standard error of the mean. *P < 0.01
Thus, during playback of their own songs, there was a significant difference in the subjects' reaction times depending on the position of the noise. Such a result was not found in the reversed-song condition.
Effect of temporal position of noise on reaction time: Analysis of song stimuli
As stated earlier in the description of the statistical analyses, we examined reaction time to the song stimuli (e.g., Stimuli Nos. 1, 2, 3 and 4 in the BOS test; see Tables 1, 2) as well as the effect of the temporal position of the noise on reaction time.
There was a significant effect of presentation order of the noise (F(1, 199) = 5.66, P = 0.018); the reaction time to the first noise was longer than that to the second noise. No significant interaction with presentation order of noise was observed. There was a significant interaction between test and position of noise (F(1, 199) = 6.19, P = 0.014) on the length of the reaction time. Mean reaction times are shown in Fig. 3. The results for the noise with BOS showed that the
reaction time to the noise within the chunks was significantly longer than that to the noise between chunks (t(199) = 2.21, P = 0.028). There was no significant difference in the reaction to the IN and OUT noises for chunks in the REV condition. Thus, we did not observe results that supported our hypothesis that reaction time would be prolonged as the duration between the noise and the end of the chunk became longer. In an analysis of data restricted to the reaction time to IN noise, we found a significant interaction between test and order of presentation (F(1, 109) = 6.13, P = 0.015); furthermore, the reaction time to the first noise was significantly longer than that to the second in BOS but not in REV (t(109) = 2.80, P = 0.006).

Fig. 3 Comparisons of reaction times to IN and OUT noise in song stimuli. In this analysis, only song stimuli (e.g., in BOS tests, Stimuli 1, 2, 3, 4 in Tables 1, 2 [compared with Fig. 2, stimuli 2 and 4 were included and stimuli 5 and 6 excluded]) were used. Filled black bars show the mean reaction times to IN noises, and blank bars show the mean reaction times to OUT noises. Error bars indicate the standard error of the mean. *P < 0.05

Discussion

In this study, we showed that a listening bird's reaction time to a noise changed according to whether the noise was played during or between chunks in the playback of male Bengalese finch songs. This result suggests that the birds use chunk structures for perceptual analysis of their own song (BOS).

When Bengalese finches heard BOS, they showed a more delayed reaction to noises placed in the middle of a chunk (IN) compared with noises placed in the same temporal position in a pure tone. This delay was significantly different from that observed for noises placed between chunks (OUT), which showed no difference in reaction time compared with noises placed in the same temporal position in a pure tone. These results support our hypothesis and indicate that Bengalese finches show a categorical reaction similar to that of the human participants in Fodor and Bever's experiment. Bengalese finches exhibit auditory chunking and, furthermore, they may not sense noises placed within chunks until after those chunks have been played completely. In other words, at least with respect to BOS perception, auditory information might be processed in chunks, with any noises not being processed until after the auditory processing of the chunk.

There was no position-dependent change in the reaction to noises played with a bird's own reversed song, and no difference was observed between the reaction times for noises with a bird's own reversed song and those for pure tones. Playing the song in reverse may have destroyed the chunk structures and thus produced a result different from that observed with the normal song. Although there remains the possibility that auditory segmentation may be present in reversed songs, we conclude that Bengalese finches do not recognize reversed songs as their own, because they did not react to them as they did to their own songs played back normally.

Did we really observe a reaction to "chunking"?
In this study, we used an "artificially calculated chunk" to investigate songbirds' capacity for segmentation in order to (1) increase the reproducibility of the results and (2) avoid reliance on experimenters' "impressions". Because Bengalese finch songs are variable and a chunk is not a stereotyped structure like a motif, it is difficult to extract chunks from experimenters' visual evaluations of song sonograms. Because we used real songs whose structure was defined by transition rates, this artificial chunking should have been sufficient for the purposes of our experiments. We must also consider the possibility that factors other than chunking could have affected the subjects' reactions. Because the acoustic features of each bird's song vary, one could argue that a specific pattern or amplitude, for example, may have cued the subjects' reactions. This possibility, however, can be rejected on the basis of the following evidence inferred from the results: (1) if amplitude were a cue to which the birds reacted, then they should have reacted in the same manner in both the BOS and the REV tests, and yet the results show a clear difference; (2) in the BOS test, each subject heard a different song, with the acoustic features and volume varying between songs; and (3) the training was designed only to shape the birds' reactions to the noise. If the tests had created any contingency between specific acoustic features and the reaction behavior, then the order of the testing should have had an effect on the results. However, no such effect was
observed, and the testing order itself was counter-balanced
in the experimental procedure.
In this experiment, we adapted Fodor and Bever's experiment to Bengalese finches. However, is it possible to equate our results with the original results based on humans? In our experiment, the reaction was trained and not associated with the song notes preceding the noise. It is very difficult to assess bird behavior beyond saying that the reactions of the birds in our experiment were based on an auditory segmentation ability. When humans hear a sentence, they perform semantic processing of the sentence. Obviously, this aspect of perception is not testable with birds and their songs, but at the very least it is possible, as we have done in this study, to test segmentation and chunking based on the acoustic structures of birdsongs.
In previous studies of song production units, light flashes have been used to interrupt songs at silent intervals between syllables (Cynx 1990). In an experiment with non-songbirds (collared doves), coos could be interrupted by a flash before completion (ten Cate and Ballintijn 1996). The researchers also showed that the probability of a stop could be changed by changing the timing of a flash; flashes at the beginning of the elements did not induce a stop. These results add support to our finding of a reaction time difference depending on the noise position in the current experiment. In a separate study of nightingales, isolated song units could be interrupted by a flash in some cases, while the temporal position of a stimulus did not affect the probability of a stop (Riebel and Todt 1997). The differences in interruption can be explained by song type, because a fixed-type song may have a stronger linkage within the syllable than a non-fixed song or coos. If the probability of stopping does in fact indicate biased perceptual sensitivity during ongoing auditory feedback perception, then the results of the current experiments can be concluded to apply to zebra finches but not to nightingales. Of course, these two sets of experiments cannot be compared directly, but the variations in reactions observed cannot be explained by species and song type alone. In addition, we did not examine the effect of the noise position overlapping the song notes in the current study. Further study is necessary to clarify biased perceptual sensitivity during song production.
Perception of BOS
The most important feature of BOS perception for a male Bengalese finch is auditory feedback for the maintenance and adjustment of BOS. For some species of adult male songbirds, hearing BOS is crucial to the maintenance of normal song patterns (Nordeen and Nordeen 1992). Bengalese finches rely on auditory feedback for the maintenance of their songs (Okanoya and Yamaguchi 1997; Woolley and Rubel 1997, 1999, 2002; Yamada and
Okanoya 2003). Zebra finches also require ongoing auditory feedback, but deaf zebra finches can maintain their learned songs for longer periods than deaf Bengalese finches (Brainard and Doupe 2000; Nordeen and Nordeen 1992, 1993; Scott et al. 2000). Studies on deaf birds have revealed that the song parameters adjusted through ongoing auditory feedback, namely syllable ordering and syllable structure, may be based on different mechanisms (Woolley 2004; Woolley and Rubel 1997). Woolley and Rubel (1997) and Okanoya and Yamaguchi (1997) reported that although Bengalese finches whose cochleae had been removed lost their normal song note sequences within a week of the operation, their note structure did not start to change until long after the alteration of note ordering. Furthermore, this note sequence error was observed to occur even within chunks. With regard to this time delay, these authors suggested two distinct systems of control, one for temporal pattern and one for notes, both of which require auditory feedback to maintain a learned song. The results of these studies may help explain the chunk-based auditory processing that we observed in the BOS tests.
Perception of REV
With respect to REV, we concluded that the birds did not recognize the song as their own and showed no evidence of "chunking", because playing the song in reverse seemed to destroy the chunk structures. If we had used stimuli with reversed chunk orders (that is, without reversing the actual notes), would the birds have reacted as they did in the BOS tests? Because we used relatively short parts of the full songs, we cannot reject the possibility that the chunk-order-reversed versions of the songs could have been sung by the subjects themselves or by their tutors under natural circumstances. Even so, it would be interesting to investigate whether the subjects would show reactions similar to those observed in the BOS tests of the current study. Such results may help to reveal the structural units of song, details of the neural perceptual system underlying the chunking observed in the current experiment, and its relation to the ongoing auditory feedback system used for the maintenance of songs in Bengalese finches. While neurons in the caudal mesopallium (CM) and field L prefer BOS to REV, this preference does not exceed the preference for the order-reversed version of the songs in zebra finches (Amin et al. 2004; Janata and Margoliash 1999; Lewicki and Arthur 1996). If such selectivity affected reactions in the current experiment, then some form of reaction based on chunk structure can be expected.
Based on the results of our study, we conclude that Bengalese finches exhibit auditory chunking that is used for song perception. We used an experiment similar to that conducted on humans by Fodor and Bever (1965) and
observed similar results. This perceptual system was observed when finches were exposed to their own songs. No evidence of chunking was observed when the finches' songs were played in reverse.
Acknowledgments We sincerely thank Professor Tatiana Czeschlik, Dr. Brian McCabe, and four anonymous referees who patiently commented on earlier versions of the manuscript and provided useful suggestions and warm encouragement. This study was supported by PRESTO, Japan Science and Technology Corporation. We declare that this study complies with the current laws of Japan and with the recommendations for the ethical treatment of animals provided by the Japanese Society for Animal Psychology.
References
Amin N, Grace JA, Theunissen FE (2004) Neural response to bird's own song and tutor song in the zebra finch field L and caudal mesopallium. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 190:469–489. doi:10.1007/s00359-004-0511-x
Brainard MS, Doupe AJ (2000) Interruption of a basal ganglia-forebrain circuit prevents plasticity of learned vocalizations. Nature 404:762–766. doi:10.1038/35008083
Cynx J (1990) Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata). J Comp Psychol 104:3–10
Doupe AJ, Kuhl PK (1999) Birdsong and human speech: common themes and mechanisms. Annu Rev Neurosci 22:567–631. doi:10.1146/annurev.neuro.22.1.567
Fodor J, Bever T (1965) The psychological reality of linguistic segments. J Verbal Learn Verbal Behav 4:414–420
Franz M, Goller F (2002) Respiratory units of motor production and song imitation in the zebra finch. J Neurobiol 51:129–141. doi:10.1002/neu.10043
Honda E, Okanoya K (1999) Acoustical and syntactical comparisons between songs of the White-backed Munia (Lonchura striata) and its domesticated strain, the Bengalese finch (Lonchura striata var. domestica). Zoolog Sci 16:319–326. doi:10.2108/zsj.16.319
Hosino T, Okanoya K (2000) Lesion of a higher-order song nucleus disrupts phrase level complexity in Bengalese finches. Neuroreport 11:2091–2095
Hultsch H, Todt D (1989) Memorization and reproduction of songs in nightingales (Luscinia megarhynchos): evidence for package formation. J Comp Physiol A 165:197–203
Hultsch H, Mundry R, Todt D (1999) Learning, representations and retrieval of rule related knowledge in the song system of birds. In: Friederici AD, Menzel R (eds) Learning: rule extraction and representation. Walter de Gruyter, Berlin and New York, pp 89–115
Ikebuchi M, Okanoya K (2000) Limited auditory memory for conspecific songs in a non-territorial songbird. Neuroreport 11:3915–3919
Jackendoff R (2002) Foundations of language. Oxford University Press, New York
Janata P, Margoliash D (1999) Gradual emergence of song selectivity in sensorimotor structures of the male zebra finch song system. J Neurosci 19:5108–5118
Lewicki MS, Arthur BJ (1996) Hierarchical organization of auditory temporal context sensitivity. J Neurosci 16:6987–6998
Nordeen KW, Nordeen EJ (1992) Auditory feedback is necessary for the maintenance of stereotyped song in adult zebra finches. Behav Neural Biol 57:58–66
Nordeen KW, Nordeen EJ (1993) Long-term maintenance of song in adult zebra finches is not affected by lesions of a forebrain region involved in song learning. Behav Neural Biol 59:79–82
Okanoya K (2004a) The Bengalese finch: a window on the behavioral neurobiology of birdsong syntax. Ann N Y Acad Sci 1016:724–735. doi:10.1196/annals.1298.026
Okanoya K (2004b) Song syntax in Bengalese finches: proximate and ultimate analyses. Adv Study Behav 34:297–346. doi:10.1016/S0065-3454(04)34008-8
Okanoya K, Yamaguchi A (1997) Adult Bengalese finches (Lonchura striata var. domestica) require real-time auditory feedback to produce normal song syntax. J Neurobiol 33:343–356
Riebel K, Todt D (1997) Light flash stimulation alters the nightingale's singing style: implications for song control mechanisms. Behaviour 134:789–808
Scott LL, Nordeen EJ, Nordeen KW (2000) The relationship between rates of HVc neuron addition and vocal plasticity in adult songbirds. J Neurobiol 43:79–88
Sossinka R, Bohner J (1980) Song types in the zebra finch (Poephila guttata castanotis). Z Tierpsychol 53:123–132
ten Cate C, Ballintijn MR (1996) Dove coos and flashed lights: interruptibility of "song" in a non-songbird. J Comp Psychol 110:267–275
ten Cate C, Slater PJB (1991) Song learning in zebra finches: how are elements from two tutors integrated? Anim Behav 42:150–152
Vu ET, Mazurek ME, Kuo Y-C (1994) Identification of a forebrain motor programming network for the learned song of zebra finches. J Neurosci 14:6924–6934
Walters M, Collado D, Harding C (1991) Oestrogenic modulation of singing in male zebra finches: differential effects on directed and undirected songs. Anim Behav 42:695–705
Williams H, McKibben JR (1992) Changes in stereotyped central motor patterns controlling vocalization are induced by peripheral nerve injury. Behav Neural Biol 57:67–78
Williams H, Staples K (1992) Syllable chunking in zebra finch (Taeniopygia guttata) song. J Comp Psychol 106:278–286
Woolley SM (2004) Auditory experience and adult song plasticity. Ann N Y Acad Sci 1016:208–221. doi:10.1196/annals.1298.017
Woolley SM, Rubel EW (1997) Bengalese finches Lonchura striata domestica depend upon auditory feedback for the maintenance of adult song. J Neurosci 17:6380–6390
Woolley SM, Rubel EW (1999) High-frequency auditory feedback is not required for adult song maintenance in Bengalese finches. J Neurosci 19:358–371
Woolley SM, Rubel EW (2002) Vocal memory and learning in adult Bengalese finches with regenerated hair cells. J Neurosci 22:7774–7787
Yamada H, Okanoya K (2003) Song syntax changes in Bengalese finches singing in a helium atmosphere. Neuroreport 14:1725–1729. doi:10.1097/01.wnr.0000087731.58565.29
Yu AC, Margoliash D (1996) Temporal hierarchical control of singing in birds. Science 273:1871–1875