Anim Cogn (2010) 13:515–523
DOI 10.1007/s10071-009-0302-4

ORIGINAL PAPER

Perceptual chunking in the self-produced songs of Bengalese finches (Lonchura striata var. domestica)

Rie Suge · Kazuo Okanoya

Received: 20 August 2009 / Revised: 1 December 2009 / Accepted: 3 December 2009 / Published online: 29 December 2009
Springer-Verlag 2009

Abstract Like humans, songbirds, including Bengalese finches, have hierarchical structures in their vocalizations. When humans perceive a sentence, processing occurs in phrase units, not words. In this study, we investigated whether songbirds also perceive their songs by chunks (clusters of song notes) rather than single song notes. We trained male Bengalese finches to react to a short noise in a Go/NoGo task. We then superimposed the noise onto recordings of their own songs and examined whether the reaction time was affected by the location of the short noise, that is, whether the noise was placed between chunks or in the middle of a chunk. The subjects’ reaction times to the noise in the middle of a chunk were significantly longer than those to the noise placed between chunks. This result was not observed, however, when the songs were played in reverse. We thus concluded that Bengalese finches perceive their songs by chunks rather than single notes.

Keywords Segmentation · Bird song · Phrase structure · Operant conditioning · Vocal learning

R. Suge · K. Okanoya
PRESTO, Japan Science and Technology Corporation, 4-1-8, Honcho, Kawaguchi 332-0012, Japan

R. Suge · K. Okanoya
Faculty of Letters, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-0022, Japan

R. Suge
Department of Physiology, Saitama Medical University, 38 Morohongo, Moroyama, Saitama 350-0495, Japan

K. Okanoya (✉)
RIKEN Brain Science Institute, 2-1 Hirosawa, Saitama 351-0195, Japan
e-mail: [email protected]

Introduction

Bird songs are thought to be excellent models for studying the mechanisms of human language.
Bird songs and human languages share several features in their development such as having a critical period in early life, imitation of adults, and innate predisposition for conspecifics (for a review, see Doupe and Kuhl 1999). The structure of bird songs, with hierarchical organization and syntactical control, also shows similarities to human language (Okanoya 2004a). The necessary abilities underlying these common features, like vocal learning, have been investigated in many comparative studies.

Human language is organized in a hierarchical structure in which phonemes form words, words form phrases, and phrases form sentences (Jackendoff 2002). When we listen to speech, we segment the continuous stream of sound into smaller units such as phrases. In other words, we chunk single words into larger units, phrases. Fodor and Bever (1965) provided empirical evidence for this ability by presenting the sound of a click to subjects while they listened to a spoken sentence; the subjects were then asked to state where the click was located in the sentence. Many of the subjects placed the click at boundaries of constituents, such as phrases, even though the click was actually distributed evenly throughout the sentence. Thus, humans seem to process spoken words using phrase structures as units.

In the perception of bird songs, do songbirds also process their songs by segmenting them into structural subunits? Bird songs can be classified on a spectrum of song complexity (Okanoya 2004a). For example, zebra finches sing songs in which the order of the song notes is relatively fixed. In this type of song, a combination of a few notes (but sometimes only one note) forms a song syllable, and a number of syllables form a song phrase. A stereotyped series of syllables within a song is known as a “motif” (Sossinka and Bohner 1980). Other widely studied oscines, including the white-crowned sparrow, song sparrow, and swamp sparrow, also sing songs with stereotyped series of syllables.

Fig. 1 Example of song analysis. a Example of the sonogram from a subject’s song. Axes indicate frequency and time, and each note type is symbolized by a letter. The song was coded as a sequence of letters. b Transition diagram extracted from the 20 song samples of the subject shown in a. Circles with letters indicate notes, and the arrows show the transition to the following notes. Width of the arrow indicates the frequency of a transition to the next note relative to the number of total transitions. Each dotted circle shows a chunk defined in this experiment. Note “e” was omitted from the diagram because it had a low appearance probability. c Example of a “Typical song”. Based on the diagram, part of a song was extracted and its structure determined. d Examples of song stimuli (panels labeled BOS (NOGO), BOS (GO/IN), BOS (GO/OUT), and REV (GO/IN)). Solid lines placed under each sonogram indicate the chunk structures. In the case of REV, the dotted lines indicate the structures defined as chunks in BOS. Triangles represent the placement of noises in the stimuli.

The order of song notes in the songs of Bengalese finches varies, and there are also other species with varying song orders, such as nightingales, starlings, and willow warblers (Okanoya 2004b). The songs of Bengalese finches have been described as having a finite-state grammar, defined by the transition probabilities between several states (Honda and Okanoya 1999; Okanoya 2004b). In Bengalese finches, 2–5 song notes form a unit, and each unit is produced at a particular state transition. The transition pattern is not fixed because one note can be followed by several possible notes and may include a repeat of the same note. Some sequential notes form a “chunk” in which notes have a stereotyped order with occasional variations in note repetition, and any note can be used in other chunks as well.
Furthermore, because the order of chunks is also not fixed, one chunk can be followed by several different chunks. Song structure can be expressed as a Markov model of note-to-note transitions (see Fig. 1b for an example). The degree of song sequential complexity in Bengalese finches allows statistical analysis (Hosino and Okanoya 2000; Okanoya 2004b), and this makes Bengalese finches ideal subjects for the study of perceptual chunking of song notes.

In song learning, young zebra finches have been observed to copy a series of song notes as a chunk rather than single elements from their tutors (ten Cate and Slater 1991; Williams and Staples 1992). Hultsch and Todt (1989) showed that nightingales also copied temporally consistent groups of songs from their tutors’ songs as a “package”.

In song production, two approaches have been used to find song units. Cynx (1990) flashed lights at singing zebra finches and found that interruptions often occurred after each syllable, but not within syllables. This result was confirmed by analyzing their respiratory patterns (Franz and Goller 2002). In nightingales, flashing lights were found to interrupt singing, with most stops occurring either during silent portions of songs or 1–3 elements after the flash of light (Riebel and Todt 1997). The researchers suggested that the strength of association between elements can vary. The other approach is based on interfering with the neural circuits that underlie song production. Peripheral lesions in the song system have been found to result in the deletion of one part of a song that was a chunk of song notes in zebra finches (Williams and McKibben 1992). Electrophysiological stimulation (Vu et al.
1994) and direct recording (Yu and Margoliash 1996) of the song nuclei HVC (used as a proper name) and the robust nucleus of the arcopallium (RA) have indicated a hierarchical organization of motor pathways in zebra finches in which the HVC is responsible for syllable sequences while the RA represents individual syllables.

Thus, although it is known that songbirds use chunks as units for song learning, empirical results regarding the production of songs have been inconclusive and appear to vary depending on the species. Are songbirds really using chunks to hear their songs? In the present study, using the term “chunk” instead of “phrase”, which better represents human language (Hultsch et al. 1999), the “chunking”, or auditory segmentation, ability of songbirds was examined via an experiment similar to that conducted by Fodor and Bever. To determine whether songbirds can chunk their songs during the process of perception, we used an operant-conditioning technique when replicating Fodor and Bever’s experiment. Subject birds were trained to react to a short noise as soon as possible. Reaction time to the noise was then measured when a song, rather than spoken words, was played as the background to the noise. We hypothesized that the reaction time to a noise placed inside a chunk would be longer than that to a noise placed outside of the chunk, on the assumption that the songbirds would process the noise placed inside the chunk only after the auditory processing of the chunk itself was completed. Furthermore, we hypothesized that when the noise was shifted toward the end of the chunk, the reaction would be delayed in accordance with the time between the noise and the end of the chunk.

Methods

Subjects

Five male Bengalese finches (Lonchura striata var. domestica) kept in aviaries at Chiba University (a constant 13:11 h light–dark cycle) were used as subjects.
The subjects were selected from five different families to avoid using similar types of songs, as this would make it difficult for the birds to recognize their own songs and the songs of other conspecifics. Each song had unique notes, and the transition patterns were varied and recognizable by the researchers. The birds were housed individually but could see each other and hear each other’s songs and calls. For operant conditioning, each subject’s weight was controlled by feeding time, which was gradually reduced from 24 h to 2 h per day, except for 24-h feeding 1 day per week (Ikebuchi and Okanoya 2000). Each subject’s weight and condition were checked at 9:00 AM every morning.

Song recording and analysis

Each bird’s song was recorded and analyzed to find chunk structures. In our experiment, only “undirected songs”, which were produced by males unable to sense any females close by, were used as stimuli. It has been reported that undirected songs observed in the zebra finch are structurally more variable than directed songs (Sossinka and Bohner 1980; Walters et al. 1991). Stimuli were obtained by placing a subject in a small wooden cage (115 cm × 185 cm × 150 cm) situated in a soundproof room. Songs were recorded using a condenser microphone (ECM-MS957, Sony, Tokyo, Japan) and a digital audio tape recorder (DTC-ZA5ES, Sony, Tokyo, Japan). When the interval between two consecutive notes was longer than 3 s, the silence was regarded as signaling the end of the first song. We collected at least 20 songs from each subject from which to extract stimuli.

The recorded songs were analyzed using Avisoft-SASLab Pro (Avisoft, Berlin, Germany) to produce sonograms (Fig. 1a). To analyze the syntactical structure of a song, we followed the same procedure as that described by Honda and Okanoya (1999). In brief, the song notes were categorized by visual inspection for distinct groups, and each note type was represented by a letter.
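As an illustration of how letter-coded songs can be turned into note-to-note transition probabilities, the following is a minimal sketch. It is not the Mnemic software used in the study, and it does not reproduce the authors' procedure in detail: the function names, the `END` marker, and the toy songs are all hypothetical and chosen only for illustration. The sketch counts first-order transitions, converts them to probabilities per note, optionally drops low-probability transitions (as is done when simplifying a diagram), and lists notes with more than one outgoing transition other than a self-transition.

```python
from collections import Counter, defaultdict

END = "end"  # hypothetical marker for the end of a song


def transition_probabilities(songs):
    """Estimate first-order note-to-note transition probabilities.

    `songs` is an iterable of strings, one letter per note.
    """
    counts = defaultdict(Counter)
    for song in songs:
        notes = list(song) + [END]
        for current, following in zip(notes, notes[1:]):
            counts[current][following] += 1
    return {
        note: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for note, followers in counts.items()
    }


def prune(probabilities, threshold=0.05):
    """Keep only transitions whose probability exceeds `threshold`."""
    return {
        note: {nxt: p for nxt, p in followers.items() if p > threshold}
        for note, followers in probabilities.items()
    }


def branching_notes(probabilities):
    """Notes with more than one outgoing transition, ignoring self-transitions."""
    return sorted(
        note
        for note, followers in probabilities.items()
        if len([nxt for nxt in followers if nxt != note]) > 1
    )


# Invented toy songs, NOT data from the study.
toy_songs = ["abbccdfg", "abccdfg", "abcdhi", "abbcdfg"]
probs = transition_probabilities(toy_songs)
print(probs["d"])              # transitions out of note "d"
print(branching_notes(probs))  # candidate branching points
```

In this toy example, notes "b" and "c" only repeat themselves or move on to a single next note, whereas "d" can be followed by two different notes, so only "d" is reported as a branching point.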
Thus, each song was represented by a sequence of letters. These sequences (comprising the 20 songs from each subject) were analyzed using Mnemic (CogniTom Academic Design Inc., Chiba, Japan) to calculate the transition probability of one note changing to another. When the number of collected songs exceeded 20, the 20 longest songs were selected from the collected samples. Figure 1b shows one of the song syntaxes analyzed using the Mnemic software. Notes and transitions with low probabilities of occurrence were omitted from the diagram for simplification (e.g., we omitted note “e” because it was recorded only four times out of a total of 864 notes in 20 sample songs). The width of the line of each arrow shows the probability of a transition. For example, the probabilities of the various transitions for note “d” are as follows: “a” (9%), “e” (2%), “f” (75%), “h” (9%), and “end” (5%). The transition lines “d” to “e” and “d” to “end” were omitted from the diagram because of their relatively low (≤5%) transition probabilities. From these data, compared with notes “b” and “c” with single output arrows (with the exception of “self-transition”), it can be assumed that “d” is both a branching point in this song structure and the last note of the first chunk. Thus, a typical transition pattern for each song can be obtained via this analysis.

We selected a song that had the typical transition pattern extracted as above. This song was termed the “typical song” and subsequently used as a source of stimuli (see Fig. 1c). Because the songs of Bengalese finches are non-fixed, not all song samples have a defined chunk structure. To use natural songs, therefore, we chose a song sample with chunks (from the 20 songs used for analyzing song structure) rather than a song stimulus generated artificially from several song samples.

Apparatus

The subjects were trained and tested in a small wire cage placed in a sound attenuation box.
Attached to the cage was a panel with two sensitive micro-switches: the left switch was attached to a green light-emitting diode (LED) and the right to a red LED. The green switch was used as the observation key and the red switch the report key. The birds could flip the switches by pecking the LED. A standard pigeon grain hopper was placed under the panel. A wooden perch was positioned in front of the hopper opening, with the subjects able to reach the food from the perch.

Training and testing

We employed operant conditioning with standard “Go/NoGo” training. At the start of a session, the observation key was illuminated until the bird pecked it. Fifty milliseconds after the pecking, either the Go stimulus or the NoGo stimulus was presented with the illuminated red LED (report key). The training stimuli were a 3 or 5 kHz 1.6-s pure tone played with the noise (Go stimulus) and the pure tone without the noise (NoGo stimulus). The noise had a white (flat) spectral characteristic and lasted 15 ms with a rise/fall time of 5 ms. We used five types of Go stimuli: pure tones with the noise delivered at 300, 600, 900, 1,000 or 1,500 ms after the start of the pure tone. Each sound stimulus had a rise/fall time of 5 ms. During training, to prevent the subjects from learning to react to a given type of sound rather than the noise itself and to prevent them from learning any relationship between the noise and the structure of the background sound, we used sine wave stimuli that did not contain a chunk-like structure. Only the reactions (report key pecking) within the first 1 s of noise being presented were reinforced. One session comprised 100 trials employing 10 types of Go stimuli (2 types, 3 and 5 kHz, of pure tone, and five types of noise position) presented five times each and two types of NoGo stimuli (3 and 5 kHz pure tones) presented 25 times each.
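The composition of one training session can be checked with a short sketch. The trial counts, tone frequencies, and noise onsets below are taken from the description above; the function name, the tuple layout, and the random shuffling of trial order within a session are assumptions made only for illustration and are not stated for the training sessions.

```python
import random
from collections import Counter

TONES_KHZ = (3, 5)                         # pure-tone frequencies
NOISE_ONSETS_MS = (300, 600, 900, 1000, 1500)  # noise positions in Go stimuli
GO_REPEATS = 5                             # presentations per Go stimulus type
NOGO_REPEATS = 25                          # presentations per NoGo stimulus type


def build_training_session(seed=None):
    """Return one session as a list of (trial_type, tone_khz, noise_onset_ms).

    Hypothetical helper: shuffling the trial order is an assumption.
    """
    trials = []
    for tone in TONES_KHZ:
        for onset in NOISE_ONSETS_MS:
            trials += [("Go", tone, onset)] * GO_REPEATS
        trials += [("NoGo", tone, None)] * NOGO_REPEATS
    random.Random(seed).shuffle(trials)
    return trials


session = build_training_session(seed=0)
print(len(session))                        # total number of trials
print(Counter(t[0] for t in session))      # Go vs. NoGo counts
```

Under these counts a session contains 10 × 5 = 50 Go trials and 2 × 25 = 50 NoGo trials, so Go and NoGo stimuli are equally frequent.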
When the overall score of “correct” (the ratio of correct reactions to the total number of trials) exceeded 80% over two successive experimental days, the subject was considered to have learned the task and the test session was started. At the end of training, all subjects showed stable reaction times to the noise. This could be confirmed by the fact that the reaction time to the pure-tone stimuli was not shortened or prolonged for two test sessions (order of testing sessions: F(1, 50) = 0.431, P = 0.515).

Two types of song stimuli were used in the test sessions: (1) the bird’s own song (BOS) and (2) the reversed BOS (REV). The REV stimulus was an exact reversed recording of BOS, i.e., not only the order of the structure but each note was also reversed in time. A part (1.6 s) of each bird’s typical song (see “Song recording and analysis”), which included at least one chunk structure, was selected as a stimulus. In the test sessions, song stimuli with noise were used. The noise was placed either in the middle of the chunk (IN) or between chunks (OUT), and only one noise was placed in each stimulus. In both cases, noise was placed in the silence between notes and did not overlap with any song notes. In cases of a song with two or three chunks, one chunk was selected as the target for the IN-noise position. Two types of IN-noise placed in the middle of the same target chunk were used in the test sessions for each subject; one was placed in the first half of the target chunk and the other in the second half. OUT-noises were placed after either the first or the second chunk, but never placed at the start or end of a stimulus. The REV-with-noise stimuli were the reverse-played BOS-with-noise stimuli, i.e., the timing of the noise was also reversed (see Fig. 1; Tables 1, 2).

The stimuli were in digital format (12-bit, 20-kHz sampling rate) and presented via a digital-to-analog converter (DT2801, Data Translation, MA, USA) and a loudspeaker (10 cm in diameter, 8 Ω)
placed inside a sound attenuation box (frequency response was within 4 dB between 100 Hz and 10 kHz, where most of the song stimuli energy was concentrated). The converter low-pass filtered the stimuli at 10 kHz to prevent aliasing and amplification; the peak sound intensity was 65 dB re 20 µPa.

The test session was performed two times, once using the BOS stimuli and once using the REV stimuli. Each test (BOS, REV) was performed in an independent session (day) that also contained pure-tone stimuli with noise placed at the same temporal positions as in the song stimuli. For the pure-tone stimuli, because the tone was continuous and contained no chunking structure, we labeled the position of the noise based simply on the temporal position corresponding to the song stimulus, and not according to whether the noise was actually inside or outside of a chunk.

Table 1 Sets of stimuli used for tests: Sixteen types of stimuli were used in the test sessions

No.  Test  Base song  Go/NoGo  Chunk              Ord.  Rep.
1    BOS   BOS        Go       IN                 1     10
2    BOS   BOS        Go       IN                 2     10
3    BOS   BOS        Go       OUT                1     10
4    BOS   BOS        Go       OUT                2     10
5    BOS   Tone       Go       (IN = Stim. 1)     1     5
6    BOS   Tone       Go       (OUT = Stim. 3)    1     5
7    BOS   BOS        NoGo     -                  -     40
8    BOS   Tone       NoGo     -                  -     10
9    REV   REV        Go       IN                 2     10
10   REV   REV        Go       IN                 1     10
11   REV   REV        Go       OUT                2     10
12   REV   REV        Go       OUT                1     10
13   REV   Tone       Go       (IN = Stim. 9)     2     5
14   REV   Tone       Go       (OUT = Stim. 11)   2     5
15   REV   REV        NoGo     -                  -     40
16   REV   Tone       NoGo     -                  -     10

[Diagram column of Table 1 not reproduced]

This table describes each stimulus’s test type (BOS, REV), base song (BOS, REV, tone), stimulus type (Go, NoGo), position of the noise in the chunk (IN, OUT, pseudo-position with regard to tone), presentation order of the noise (abbreviated as Ord. 1 (First), Ord. 2 (Second)) and number of presentations in a test (abbreviated as Rep.). In the diagrams, capital letters indicate the notes of a song, the straight line represents a pure tone, underlines indicate chunk structures, and inverted commas (‘) show the position of the noise. Reversed letters indicate that the stimulus was played backwards.

Table 2 Sets of stimuli used for tests: Set of the Go stimuli in the BOS test session

Position            IN            IN            OUT           OUT
Presentation order  First         Second        First         Second
Song                Stim 1 (10)   Stim 2 (10)   Stim 3 (10)   Stim 4 (10)
Tone                Stim 5 (5)    –             Stim 6 (5)    –

This table shows the 50 Go stimuli comprising the BOS test session as an example. Rows give the stimulus type (song, tone). Numbers in brackets indicate the number of replications. The stimulus names (e.g., Stim. 1) can be found in Table 1.

The order of the two test sessions was randomized, and subjects were retrained with pure-tone stimuli to maintain an overall correct score of 80% for two successive days before the next testing session. Six types of Go stimuli were presented in the subsequent test session: song with a noise placed (1) on the first half of the chunk (10 presentations, shown as No. 1 in Tables 1, 2), (2) on the second half of the chunk (10 presentations, shown as No. 2), (3) on the first boundary of the chunks after the start of the stimulus (10 presentations, shown as No. 3), (4) on the second boundary of the chunks (10 presentations, shown as No. 4), (5) a pure tone with the noise placed in the same temporal position as in stimulus No. 1 (5 presentations, shown as No. 5) and (6) a pure tone with the noise placed in the same temporal position as in stimulus No. 3 (5 presentations, shown as No. 6). NoGo stimuli in the test sessions were as follows: pure tone without noise (10 presentations) and song without noise (40 presentations) (see Tables 1, 2; Fig. 1). The reaction time limit was 2 s after the end of the stimulus presentation.
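The response window for a Go trial in the test sessions can be sketched as follows. The 1.6-s stimulus duration, the 2-s limit after the end of the stimulus, the measurement of reaction time from the start of the noise, and the omission of failed trials from the analysis come from the text; the function names and the treatment of pecks that precede the noise as missed trials are assumptions introduced for illustration.

```python
STIMULUS_DURATION_S = 1.6   # length of each song or tone stimulus
RESPONSE_LIMIT_S = 2.0      # allowed time after the end of the stimulus


def reaction_time(noise_onset_s, peck_time_s):
    """Return the reaction time in seconds, or None for a missed Go trial.

    Both times are measured from stimulus onset. A peck before the noise is
    counted as a miss here, which is an assumption of this sketch.
    """
    if peck_time_s is None:
        return None
    deadline = STIMULUS_DURATION_S + RESPONSE_LIMIT_S
    if peck_time_s < noise_onset_s or peck_time_s > deadline:
        return None
    return peck_time_s - noise_onset_s


def mean_reaction_time(trials):
    """Average reaction time over (noise_onset_s, peck_time_s) pairs,
    omitting missed trials."""
    times = [reaction_time(onset, peck) for onset, peck in trials]
    valid = [t for t in times if t is not None]
    return sum(valid) / len(valid) if valid else None


# Invented example trials, NOT data from the study.
example = [(0.6, 1.1), (0.6, None), (0.9, 3.9), (0.9, 1.6)]
print(mean_reaction_time(example))
```

Here the second trial (no peck) and the third trial (peck after the 3.6-s deadline) are omitted before averaging.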
To encourage birds to peck more quickly, in the training session, only reactions within 1 s of the noise being presented were reinforced. The presentation order of the stimuli was randomized across birds and across sessions. Subjects’ reactions to all Go stimuli were reinforced. The score and reaction time (duration from the start of the noise to pecking the report key) of each trial were recorded.

Statistical analysis

Two types of analysis were performed on the length of the reaction time. First, the reaction times to the song stimuli and the tone stimuli that had the same noise position (for example, in BOS tests, Stimuli 1, 3, 5, 6 in Table 1) were analyzed using an analysis of variance (ANOVA) with the following factors: Test (BOS, REV), position of noise (IN, OUT), and stimulus type (song, tone). Trials in which subjects failed to react to the stimuli were omitted from the analysis. Second, only the song stimuli (for example, in BOS tests, Stimuli 1, 2, 3, 4 in Table 1, see also Table 2) were subjected to an ANOVA using the following factors: Test (BOS, REV), position of noise (IN, OUT), and presentation order of noise (First, Second; for example, in BOS tests, stimuli Nos. 1 and 3 are “First”, while Nos. 2 and 4 are “Second”, respectively, as shown in Tables 1, 2). Failed trials were omitted from the analysis. The correct reaction ratio was subjected to the same analysis as that performed on the reaction time. When there was a significant effect or interaction, a least-significant difference test was performed to compare means. All tests were two-tailed.

Results

No significant interaction with or effect on the correct reaction ratio was observed for any type of stimulus and thus here we describe the reaction time results only.

Comparison of stimuli with noise at the same position: Tone and song

First, we examined reaction time to the stimuli with noise of the same temporal position (e.g., Stimulus No.
1, 3, 5, 6 in BOS test; see Table 2). There was a significant three-way interaction between test, position of noise and stimulus type (F(1, 146) = 9.72, P = 0.002). No significant main effects of test, position of noise, or stimulus type, nor any other (two-way) interactions, were observed. Mean reaction times are shown in Fig. 2. When BOS song stimuli were used, the reaction time to the IN position was significantly longer than the reaction time to the OUT position (t(146) = 2.37, P = 0.019). In the REV song stimuli, however, no such difference was observed. In both tests, no significant difference was observed between IN and OUT using tone stimuli.

When the reaction time was directly compared between the song and tone stimuli, in which the noise was positioned in the same place in each stimulus set, a significant increase in reaction time to the song stimuli was observed in the BOS test IN position stimuli (t(146) = 3.36, P = 0.001), but not the OUT position stimuli. This increase was not observed in the REV test. Thus, during playback of their own songs, there was a significant difference in the subjects’ reaction times depending on the position of the noise. Such a result was not found in the reversed song condition.

Fig. 2 Comparison of stimuli with noise at the same position: tone and song. In this analysis, reaction times to the song stimuli and the tone stimuli that had the same noise position (e.g., in BOS tests, Stimuli 1, 3, 5, 6 in Tables 1, 2) were used. The figure shows the mean reaction time in the two types of tests: the bird’s own song (BOS) and the same song reversed (REV). Striped bars show the mean reaction times to IN noise with tone stimuli, filled bars show IN noise with song stimuli, dotted bars show OUT noise with tone stimuli, and blank bars show OUT noise with the song. Error bars indicate the standard error of the mean. * P < 0.01
Effect of temporal position of noise on reaction time: Analysis of song stimuli

As stated earlier in the description of the statistical analyses, we examined reaction time to the song stimuli (e.g., Stimulus No. 1, 2, 3, 4 in BOS test; see Tables 1, 2) as well as the effect of temporal position of the noise on reaction time. There was a significant effect found for presentation order of noise (F(1, 199) = 5.66, P = 0.018); the reaction time to the first noise was longer than that to the second noise. No significant interaction with presentation order of noise was observed. There was a significant interaction between test and position of noise (F(1, 199) = 6.19, P = 0.014) on the length of reaction time. Mean reaction times are shown in Fig. 3. The results for the noise with BOS showed that the reaction time to the noise within the chunks was significantly longer than that to the noise between chunks (t(199) = 2.21, P = 0.028). There was no significant difference in the reaction to the IN and OUT noises for chunks in the REV condition. Thus, we did not observe results that supported our hypothesis that reaction time would be prolonged as the duration between the noise and end of the chunk became longer. In the analysis of data restricted to the reaction time to IN noise, we found a significant interaction between test and order of presentation (F(1, 109) = 6.13, P = 0.015), and, furthermore, the reaction time to the first noise was significantly longer than that to the second in BOS but not in REV (t(109) = 2.80, P = 0.006).

Fig. 3 Comparisons of reaction times to IN and OUT noise in song stimuli. In this analysis, only song stimuli (e.g., in BOS tests, Stimuli 1, 2, 3, 4 in Tables 1, 2 [compared with Fig. 2, we included stimuli 2 and 4, and excluded stimuli 5 and 6]) were used. Filled black bars show the mean reaction times to IN noises, and blank bars show the mean reaction times to OUT noises. Error bars indicate the standard error of the mean. * P < 0.05

Discussion

In this study, we showed that a listening bird’s reaction time to a noise changed according to whether noise was played during or between chunks in the playback of male Bengalese finch songs. This result suggests that the birds use chunk structures for perceptual analysis of their own song (BOS). When Bengalese finches heard BOS, they showed a more delayed reaction to noises placed in the middle of a chunk (IN) compared with noises placed in the same temporal position in a pure tone. This delay was significantly different from that observed for noises placed between chunks (OUT), which showed no difference in reaction time compared with noises placed in the same temporal position in a pure tone. These results support our hypothesis and indicate that Bengalese finches show a similar type of categorical reaction as the human participants in Fodor and Bever’s experiment did. Bengalese finches exhibit auditory chunking and, furthermore, they may not sense noises placed within chunks until after those chunks have been played completely. In other words, at least with respect to BOS perception, auditory information might be processed in chunks, with any noises not being processed until after the auditory processing of the chunk.

There was no position-dependent change observed in the reaction to noises played with a bird’s own reversed song. No difference was observed between the reaction times for noises with a bird’s own reversed song and those for pure tones. The playing of the song in reverse may have destroyed the chunk structures and produced a result different from that observed with the normal song. Although there remains the possibility that auditory segmentation may be present in reversed songs, we conclude that Bengalese finches do not recognize reversed songs as their own, because they did not react as they did to their own song when played back normally.

Did we really observe a reaction to “chunking”?

In this study, we used an “artificially calculated chunk” to investigate songbirds’ capacity for segmentation to: (1) increase the reproducibility of results, and (2) avoid reliance on experimenters’ “impressions”. Because the Bengalese finch’s songs have variations and because a chunk is not a stereotyped structure like a motif, it is difficult to extract chunks from experimenters’ visual evaluations of song sonograms. Using real songs that have a structure defined by transition rates, artificial chunking should have been sufficient for the purpose of our experiments.

We also must consider the possibility that factors other than chunking could have affected the subjects’ reactions. Because the acoustic features of each bird song vary, one could consider that a specific pattern or amplitude, for example, may have cued the subjects’ reactions. This possibility, however, can be rejected considering the following evidence inferred from the results: (1) if amplitude was a cue to which the birds reacted, then they should have reacted in the same manner in both the BOS and the REV tests, and yet the results show a clear difference; (2) in the BOS test, each subject heard a different song, with the acoustic features and volume varying between songs; and (3) the training was designed only to shape the birds’ reactions to the noise. If the test made any contingency between specific acoustic features and the reaction behavior, then the order of the testing should have had an effect on the results. However, no such effect was observed, and the testing order itself was counter-balanced in the experimental procedure.

In this experiment, we adapted Fodor and Bever’s experiment to Bengalese finches. However, is it possible to equate our results with the original results based on humans? In our experiment, the reaction was trained and not associated with the song notes preceding the noise.
It is very difficult to assess bird behavior except to say the reactions of the birds in our experiment were based on auditory segmentation ability. When humans hear a sentence, they perform semantic processing of the sentence. Obviously, this aspect of perception is not testable with birds and their songs, but at the very least, it is possible, as we have done in this study, to test segmentation and chunking based on the acoustic structures of birdsongs.

In previous studies of song production units, light flashes have been used to interrupt songs at silent intervals between syllables (Cynx 1990). In an experiment with non-songbirds (collared doves), coos could be interrupted by a flash before completion (ten Cate and Ballintijn 1996). The researchers also showed that the probability of a stop could be changed by changing the timing of a flash; flashes in the beginning of the elements did not induce a stop. These results add support to our findings of a reaction time difference depending on the noise position in the current experiment. In a separate study of nightingales, isolated song units could be interrupted by a flash in some cases, while the temporal position of a stimulus did not affect the probability of a stop (Riebel and Todt 1997). The differences in interruption can be explained by song type, because a fixed-type song may have a stronger linkage within the syllable than non-fixed song or coos. If the probability of stopping does in fact indicate biased perceptual sensitivity during ongoing auditory feedback perception, then the results of the current experiments can be concluded to apply to zebra finches but not nightingales. Of course, these two sets of experiments cannot be directly compared, but the variations in reactions observed cannot be explained simply by species and song type alone. In addition, we did not try to examine the effect of the noise position overlapping the song notes in the current study.
Further study is necessary to clarify biased perceptual sensitivity during song production.

Perception of BOS

The most important feature of BOS perception for a male Bengalese finch is auditory feedback for the maintenance and adjustment of BOS. For some species of adult male songbirds, hearing BOS is crucial to the maintenance of normal song patterns (Nordeen and Nordeen 1992). Bengalese finches rely on auditory feedback for the maintenance of their songs (Okanoya and Yamaguchi 1997; Woolley and Rubel 1997, 1999, 2002; Yamada and Okanoya 2003). Zebra finches also require ongoing auditory feedback, but deaf zebra finches can maintain their learned songs for longer periods than deaf Bengalese finches (Brainard and Doupe 2000; Nordeen and Nordeen 1992, 1993; Scott et al. 2000).

Studies on deaf birds have revealed an adjustment to the parameters of ongoing auditory feedback, namely syllable ordering and syllable structure. These aspects may be based on different mechanisms (Woolley 2004; Woolley and Rubel 1997). Woolley and Rubel (1997) and Okanoya and Yamaguchi (1997) reported that although Bengalese finches whose cochleae had been removed lost their normal song note sequences within a week of the operation, their note structure did not start to change until long after the alteration of note ordering. Furthermore, this note sequence error was observed to occur even within chunks. With regard to time delay, these authors suggested that there are two distinct systems of control, one for temporal pattern and one for notes, and that both systems require auditory feedback to maintain a learned song. The results of these studies may explain the auditory processing in the BOS tests that we observed using a chunk as a unit.

Perception of REV

With respect to REV, we concluded that birds did not recognize the song as their own and showed no evidence of “chunking” because the playing of the song in reverse seemed to have the effect of destroying the chunk structures.
If we had used stimuli with reversed chunk orders (that is, not reversing the actual notes), would the birds have reacted as they did in the BOS tests? Because we used relatively short parts of the full songs, we cannot reject the possibility that the chunk-order-reversed version of the songs would have been sung by the birds themselves or by their tutors under natural circumstances. Even so, it would be interesting to investigate whether the subjects show a reaction similar to those observed in the BOS tests of the current study. Such results may help to reveal the structural units of song, the details of the neural system underlying the perceptual chunking observed in the current experiment, and their relation to the ongoing auditory feedback system used for the maintenance of songs in Bengalese finches. While neurons in the caudal mesopallium (CM) and field L prefer BOS to REV, this preference does not exceed the preference for the order-reversed version of the songs in zebra finches (Amin et al. 2004; Janata and Margoliash 1999; Lewicki and Arthur 1996). If such selectivity affected reactions in the current experiment, then some form of reaction based on chunk structure can be expected.

Based on the results of our study, we conclude that Bengalese finches exhibit auditory chunking that is used for song perception. We used an experiment similar to that conducted on humans by Fodor and Bever (1965) and observed similar results. This perceptual system was observed when finches were exposed to their own songs. No evidence of chunking was observed when the finches’ songs were played in reverse.

Acknowledgments

We sincerely thank Professor Tatiana Czeschlik, Dr. Brain McCabe, and four anonymous referees, who patiently commented on earlier versions of the manuscript and provided useful suggestions and warm encouragement. This study was supported by PRESTO, Japan Science and Technology Corporation.
We declare that this study complies with the current laws of Japan and with the recommendation for ethical treatments of animals provided by the Japanese Society for Animal Psychology.

References

Amin N, Grace JA, Theunissen FE (2004) Neural response to bird’s own song and tutor song in the zebra finch field L and caudal mesopallium. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 190:469–489. doi:10.1007/s00359-004-0511-x
Brainard MS, Doupe AJ (2000) Interruption of a basal ganglia-forebrain circuit prevents plasticity of learned vocalizations. Nature 404:762–766. doi:10.1038/35008083
Cynx J (1990) Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata). J Comp Psychol 104:3–10
Doupe AJ, Kuhl PK (1999) Birdsong and human speech: common themes and mechanisms. Annu Rev Neurosci 22:567–631. doi:10.1146/annurev.neuro.22.1.567
Fodor J, Bever T (1965) The psychological reality of linguistic segments. J Verbal Learn Verbal Behav 4:414–420
Franz M, Goller F (2002) Respiratory units of motor production and song imitation in zebra finch. J Neurobiol 51:129–141. doi:10.1002/neu.10043
Honda E, Okanoya K (1999) Acoustical and syntactical comparisons between songs of the White-backed Munia (Lonchura striata), and its domesticated strain, the Bengalese finch (Lonchura striata var. domestica). Zoolog Sci 16:319–326. doi:10.2108/zsj.16.319
Hosino T, Okanoya K (2000) Lesion of a higher-order song nucleus disrupts phrase level complexity in Bengalese finches. Neuroreport 11:2091–2095
Hultsch H, Todt D (1989) Memorization and reproduction of songs in nightingales (Luscinia megarhynchos): evidence for package formation. J Comp Physiol A 165:197–203
Hultsch H, Mundry R, Todt D (1999) Learning, representations and retrieval of rule related knowledge in the song system of birds. In: Friederici AD, Menzel R (eds) Learning: rule extraction and representation. Walter de Gruyter, Berlin and New York, pp 89–115
Ikebuchi M, Okanoya K (2000) Limited auditory memory for conspecific songs in a non-territorial songbird. Neuroreport 11:3915–3919
Jackendoff R (2002) Foundations of language. Oxford University Press, New York
Janata P, Margoliash D (1999) Gradual emergence of song selectivity in sensorimotor structures of the male zebra finch song system. J Neurosci 19:5108–5118
Lewicki MS, Arthur BJ (1996) Hierarchical organization of auditory temporal context sensitivity. J Neurosci 16:6987–6998
Nordeen KW, Nordeen EJ (1992) Auditory feedback is necessary for the maintenance of stereotyped song in adult zebra finches. Behav Neural Biol 57:58–66
Nordeen KW, Nordeen EJ (1993) Long-term maintenance of song in adult zebra finches is not affected by lesions of a forebrain region involved in song learning. Behav Neural Biol 59:79–82
Okanoya K (2004a) The Bengalese finch: a window on the behavioral neurobiology of birdsong syntax. Ann N Y Acad Sci 1016:724–735. doi:10.1196/annals.1298.026
Okanoya K (2004b) Song Syntax in Bengalese Finches: proximate and ultimate analyses. Adv Study Behav 34:297–346. doi:10.1016/S0065-3454(04)34008-8
Okanoya K, Yamaguchi A (1997) Adult Bengalese finches (Lonchura striata var. domestica) require real-time auditory feedback to produce normal song syntax. J Neurobiol 33:343–356
Riebel K, Todt D (1997) Light flash stimulation alters the nightingale’s singing style: implications for song control mechanisms. Behaviour 134:789–808
Scott LL, Nordeen EJ, Nordeen KW (2000) The relationship between rates of HVc neuron addition and vocal plasticity in adult songbirds. J Neurobiol 43:79–88
Sossinka R, Bohner J (1980) Song types in the Zebra Finch (Poephila guttata castanotis). Z Tierpsychol 53:123–132
ten Cate C, Ballintijn MR (1996) Dove coos and flashed lights: interruptibility of “Song” in a non-songbird. J Comp Psychol 110:267–275
ten Cate C, Slater PJB (1991) Song learning in zebra finches: how are elements from two tutors integrated? Anim Behav 42:150–152
Vu ET, Mazurek ME, Kuo Y-C (1994) Identification of a forebrain motor programming network for the learned song of zebra finches. J Neurosci 14:6924–6934
Walters M, Collado D, Harding C (1991) Oestrogenic modulation of singing in male zebra finches: differential effects on directed and undirected songs. Anim Behav 42:695–705
Williams H, McKibben JR (1992) Changes in stereotyped central motor patterns controlling vocalization are induced by peripheral nerve injury. Behav Neural Biol 57:67–78
Williams H, Staples K (1992) Syllable chunking in zebra finch (Taeniopygia guttata) song. J Comp Psychol 106:278–286
Woolley SM (2004) Auditory experience and adult song plasticity. Ann N Y Acad Sci 1016:208–221. doi:10.1196/annals.1298.017
Woolley SM, Rubel EW (1997) Bengalese finches Lonchura Striata domestica depend upon auditory feedback for the maintenance of adult song. J Neurosci 17:6380–6390
Woolley SM, Rubel EW (1999) High-frequency auditory feedback is not required for adult song maintenance in Bengalese finches. J Neurosci 19:358–371
Woolley SM, Rubel EW (2002) Vocal memory and learning in adult Bengalese Finches with regenerated hair cells. J Neurosci 22:7774–7787
Yamada H, Okanoya K (2003) Song syntax changes in Bengalese finches singing in a helium atmosphere. Neuroreport 14:1725–1729. doi:10.1097/01.wnr.0000087731.58565.29
Yu AC, Margoliash D (1996) Temporal hierarchical control of singing in birds. Science 273:1871–1875