development of temporal structure in zebra finch song

Articles in PresS. J Neurophysiol (November 21, 2012). doi:10.1152/jn.00578.2012
DEVELOPMENT OF TEMPORAL STRUCTURE IN ZEBRA FINCH SONG
Christopher M. Glaze and Todd W. Troyer
Program in Neuroscience and Cognitive Science, Dept. of Psychology
University of Maryland, College Park, MD, USA
Corresponding Author: Christopher M. Glaze
Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
E-mail: [email protected]
Copyright © 2012 by the American Physiological Society.
Abstract
Zebra finch song has provided an excellent case study in the neural basis of sequence learning,
with a high degree of temporal precision and tight links with precisely timed bursting in forebrain
neurons. To examine the development of song timing, we measured four aspects of song
temporal structure at four age ranges between 65 and 375 days post hatch: the mean durations
of song syllables and the silent gaps between them, timing variability linked to song tempo,
timing variability expressed independently across syllables and gaps, and transition probabilities
between consecutive syllable pairs. We found substantial increases in song tempo between 65
and 85 days post hatch, due almost entirely to a shortening of gaps. We also found a decrease
in tempo variability, also specific to gaps. Both the magnitude of the increase in tempo and the
decrease in tempo variability were correlated on a gap-by-gap basis with increases in the reliability
of corresponding syllable transitions. Syllables had no systematic increase in tempo or decrease
in tempo variability. In contrast to tempo parameters, both syllables and gaps showed an early
sharp reduction in independent variability followed by continued reductions over the first year.
The data suggest that links between syllable-based representations are strengthened during the
later parts of the traditional period of song learning, and that song rhythm continues to become
more regular throughout the first year of life. Similar learning patterns have been identified in
human sequence learning, suggesting a potentially rich area of comparative research.
Keywords: temporal structure, learning, birdsong, generative model
Introduction
Sequence learning is one of the touchstone questions in neuroscience (e.g., Lashley, 1951;
Hikosaka et al., 2002; Keele et al., 2003; Rhodes et al., 2004), and song learning in zebra
finches has served as an excellent model system for understanding sequence learning and
production. Adult songs are highly stereotyped, and have a well-defined temporal structure
spanning multiple time scales. In the zebra finch, song learning can be roughly divided into two
overlapping processes (Immelmann, 1969; Marler, 1970; Konishi, 1985; Brainard & Doupe,
2000): “sensory acquisition,” in which a young bird ∼20-65 days post-hatch (dph) is exposed to
the song of one or more tutors and forms an auditory template; and “sensorimotor learning,” in
which the juvenile ∼35-90 dph learns to produce song based on that template. Once learned,
songs consist of several repeats of 500-1000 msec long “motifs”; each motif consists of a sequence
of 3-7 “syllables,” 50-250 msec-long stereotyped vocalizations separated by silent gaps. In adult
zebra finch song, the spectral features of individual song syllables are highly stereotyped, as are
the timing and sequencing of syllable production.
To date, most studies of song development have focused on how the spectral features of
song are learned (e.g., Tchernichovski et al., 2001; Deregnaucourt et al., 2005; Shank &
Margoliash, 2009; Charlesworth et al., 2011). Further electrophysiological studies have
demonstrated that basic spiking characteristics in premotor nucleus RA develop concomitantly
with spectral features (Shank & Margoliash, 2009; Ölveczky et al., 2011), and both song
features and the patterns of neural activity stabilize by around 90 dph (Ölveczky et al., 2011).
However, few studies have focused on how the overall temporal structure of song is learned.
There is converging evidence for syllable-based representations in the adult motor code (Cynx,
1990; Franz & Goller, 2002; Schmidt, 2003; Solis & Perkel, 2005; Glaze & Troyer, 2006; Cooper
& Goller, 2006; Andalman et al., 2011), raising the question of how they are formed, and what
role they play in song learning. There are indications that syllable-sequence variability during the
later stages of song development is sensitive to lesions of LMAN (the lateral magnocellular
nucleus of the anterior nidopallium), the output nucleus of a basal ganglia circuit involved in
song learning (Scharff & Nottebohm, 1991; Bottjer et al., 1984; Ölveczky et al., 2005). There is
also evidence for changes to learned song timing during both early development and young
adulthood that are similarly sensitive to perturbations to the basal ganglia circuit (Brainard &
Doupe, 2001; Goldberg & Fee, 2011). The gradual switch from random timing during the early
stages of song learning to more stereotyped timing reflects a switch in motor control from LMAN
to the song motor pathway (Aronov et al., 2011). However, it remains unclear whether there are
in fact distinct neural processes associated with learning syllable timing and transitions that are
different from learning the spectral features of individual syllables (Troyer & Doupe, 2000).
We previously investigated rendition-to-rendition variability in adult song timing, and found
several patterns that suggested how song representations are organized (Glaze & Troyer, 2006,
2007, 2012): First, song intervals share a common source of length variance that we term
“tempo variability,” with syllables proportionally less sensitive to this variability than gaps. We
also quantified an “independent” component of timing variability that was uncorrelated across
syllables and gaps and may be linked with synaptic or neural noise in the pattern generator for
song. We have recently developed a statistical “timing variability model” that allows us to
separate out these variability components in a principled way (Glaze & Troyer, 2012).
To probe the development of the motor code for song timing, we applied the timing variability
model to analyze changes in the temporal structure of zebra finch song from 65 dph through 1
year of age. We found that between 65 and 85 dph, silent gaps decreased in both average
duration and their sensitivity to tempo changes while syllables did not. Over that same age
range, there were increases in the probability of syllable transitions that were consistent with the
adult motif, and those increases were correlated with the decreases in timing parameters gap-
by-gap. In contrast to tempo, independent variability declined for both syllables and gaps, and
was the only component of timing variability that consistently declined in magnitude across the
entire age range examined, decreasing by a factor of 2-3 through the first year of life. Overall,
the data suggest that the final phases of song learning involve the chunking of syllable-based
representations into a reliable sequence, with the sequence becoming progressively more
rhythmic throughout the first year of life. Similar learning patterns have been hypothesized for
mammalian sequence learning, suggesting a potentially rich area of comparative research in
birdsong (Graybiel, 1998; Sakai et al., 2004; Yin et al., 2009; Costa, 2011).
Materials and Methods
Analysis was based on the songs from 7 male birds, recorded between 65 and 375 dph. All care
and housing was approved by the institutional animal care and use committee at the University
of Maryland, College Park. All analysis was performed in Matlab (Mathworks, Natick, MA), with
all template matching and dynamic time warping algorithms written as C-MEX routines.
Song collection
Each bird was raised with his parents and clutch mates until 25-30 dph. Individual juveniles were
then housed with a single adult tutor (either the father or another adult male) in separate cages
within a common sound isolation chamber (Industrial Acoustics, Bronx, NY). At 150 dph, they
were released back into the aviary with other birds. Subsequent adult (365+ dph) recordings
were gathered by returning birds to a chamber, paired with one other adult male. We gathered
song during 4 age periods: 65-70, 85-90, 125-135, and 365-375 dph.
Chambers were equipped with two directional microphones (Pro 45; Audio-Technica, Stow,
Ohio). Data were recorded at 24,414.1 Hz and consisted of a series of “clips,” periods of sound
separated by at least 10 msec of silence as determined by a sliding window amplitude algorithm.
Sound clips separated by <200 msec were included in the same “recording” and clip onset times
were indicated by filling the gaps between clips with zeros. Recordings were attributed to a
target bird if the total power was greatest for the microphone directed toward the side of the
recording chamber.
We will call a collection of songs from a given bird and a given age range a “bird-by-age sample.” Since
songs continue to develop in subtle ways into young adulthood (e.g., Brainard & Doupe, 2001),
all song analysis treated each time period independently, e.g., song matching was based on
templates that were constructed for that period only.
Song selection and template matching
The main thrust of our analysis used our previously published ‘timing variability model’
(Glaze and Troyer, 2012) to separate distinct components of temporal variability and track these
over development. The required input to the model is a series of repeated renditions of a single
stereotyped sequence of syllables. As a result, our data selection methodology is geared
toward identifying and selecting song syllables belonging to each bird’s main song motif.
Furthermore, previous studies have shown systematic slowing of motifs within a bout of
adult song (Chi and Margoliash, 2001; Pytte et al., 2007). Given the difficulty of accounting for
such within-song timing variability, particularly at the earliest ages, we restricted our timing
analysis to the first complete motif in each song.
Recordings were initially analyzed using the log-amplitude of the fast-Fourier transform
(FFT) windowed with a 256-point (10.49 msec) Hamming window advanced in 128-point steps.
Frequency bins outside the 0.5-8.6 kHz range were excluded because song structure is less
reliable at the highest and lowest frequencies. To determine if the sound in these clips matched
syllables in the bird’s song, templates were formed by aligning and averaging 4-5 manually
chosen clips corresponding to each syllable in the repertoire for that bird-by-age sample;
exemplars were time-aligned by finding the peaks in a standard cross-correlation of syllable
spectrograms. All recorded clips were then matched against each syllable template using a
sliding algorithm (Glaze & Troyer, 2007). For each template and each time point (t) in the clip, a
match score, c(t), was computed as the reciprocal of the mean squared difference between
template and song log-amplitudes at matched time-frequency points:
$$c(t) = \frac{n \times m}{\sum_{i=1}^{n} \sum_{j=1}^{m} \left( s(i+t, j) - s'(i, j) \right)^{2}}$$
where s is the clip spectrogram, s' is the template spectrogram, n is the number of time bins
in the template, m is the number of frequency bins, i indexes time and j indexes frequency.
The peak match score was determined as the maximum of c(t) over time alignments t.
Matches were only considered potentially valid if the peak match score exceeded an initial fixed
threshold of 0.5 (manually chosen based on visual inspection), and if the onset and offset for the
clip and template differed by 20 msec or less. If a clip had multiple syllable matches, the
template with the highest match score was chosen. This yielded a median of 14,720, range
1686−120,873 matches per syllable in each bird-by-age sample.
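As an illustration of the match score defined above, the following is a minimal Python sketch and not the C-MEX implementation used in the study. It assumes spectrograms are stored as lists of time bins, each holding one log-amplitude per frequency bin; the function names `match_score` and `peak_match` are hypothetical, and indices are zero-based rather than one-based as in the equation. The onset/offset criterion and the 0.5 threshold are not included.

```python
def match_score(clip, template, t):
    """Reciprocal of the mean squared difference at time offset t.

    clip and template are lists of time bins; each time bin is a list of
    log-amplitudes, one per frequency bin (an assumed data layout).
    """
    n = len(template)       # number of time bins in the template
    m = len(template[0])    # number of frequency bins
    sq_diff = sum(
        (clip[i + t][j] - template[i][j]) ** 2
        for i in range(n)
        for j in range(m)
    )
    if sq_diff == 0:
        return float("inf")  # identical spectrograms (assumed convention)
    return n * m / sq_diff


def peak_match(clip, template):
    """Return (peak score, offset) over all offsets where the template fits."""
    n = len(template)
    offsets = range(len(clip) - n + 1)
    return max((match_score(clip, template, t), t) for t in offsets)
```

With a two-bin template embedded one time bin into a four-bin clip, `peak_match` would return the offset 1 together with the score at that alignment.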
Template matching thresholds
After all clips were initially matched to a template, template-matching thresholds were
recalculated for each syllable, based on a simplified minimum error-rate classification scheme
(Duda et al., 2000): Match scores were grouped into 10 bins from the 0.5 minimum score
through a value of 1.5, with values greater than 1.5 placed in the last bin. A random sample of
20 clips per score-bin was then gathered. A few syllables that had particularly high average
match scores had fewer than 20 clips in the low match bins. In these cases, the clips were
grouped with the next highest bin.
Within each bin, each of the 20 clips was manually judged as being either correctly classified
or a false positive. Error rates from this process were extrapolated to all clips in each bin and an
optimal threshold was set to minimize the estimated combined number of false positives and
false negatives (in this case, correct matches below threshold) for the entire sample (thresholds
set by this method necessarily occur at bin boundaries). Across syllables, this yielded a median
1.8% false positive rate, range 0−34.2%, median 0.9% false negative, 0−34.3%. Approximately
86 and 87% of syllables respectively had false positive and rejection rates <10%. The highest
error rates occurred for brief, noisy syllables that resemble introductory notes; these templates
frequently yielded modestly high match scores even for vocalizations not directly involved in
song, such as “tet” calls (Zann, 1996). Importantly, we found no systematic changes in error
rates by age.
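The threshold selection described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure's actual code: the function name `optimal_threshold` and its argument layout are hypothetical, candidate thresholds are taken to be the lower edges of the score bins, and a clip whose score equals the threshold is assumed to be accepted.

```python
def optimal_threshold(bin_edges, bin_counts, frac_correct):
    """Pick the bin boundary minimizing estimated false positives + negatives.

    bin_edges[b]    lower score edge of bin b (candidate thresholds)
    bin_counts[b]   total number of clips whose score fell in bin b
    frac_correct[b] fraction of the manually judged clips in bin b that
                    were true matches, extrapolated to the whole bin
    """
    est_correct = [c * f for c, f in zip(bin_counts, frac_correct)]
    est_false = [c * (1.0 - f) for c, f in zip(bin_counts, frac_correct)]

    best_errors, best_edge = None, None
    for b, edge in enumerate(bin_edges):
        false_neg = sum(est_correct[:b])  # true matches rejected below threshold
        false_pos = sum(est_false[b:])    # false matches accepted at or above it
        errors = false_neg + false_pos
        if best_errors is None or errors < best_errors:
            best_errors, best_edge = errors, edge
    return best_edge, best_errors
```

Because only bin edges are considered, the returned threshold necessarily falls on a bin boundary, as noted in the text.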
Motif selection and transition probabilities
We considered a transition between two syllables to be valid if (1) both syllables had a match
score above optimal threshold, and (2) the time between the offset of one syllable and the onset
of the next was no more than 100 msec. We defined the “forward transition probability” between
syllables X and Y as the total number of valid X-Y transitions divided by the total number of valid
transitions from X to another syllable. Following previous analyses of sequence variability (e.g.,
Foster & Bottjer, 2001; Kao & Brainard, 2006), we excluded song stops (i.e. transitions to
silence) from all probabilities. This analysis yields a transition probability for each pair of
syllables in each bird-by-age sample.
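The two validity criteria and the forward transition probability can be illustrated with a short Python sketch. The tuple layout `(label, onset_ms, offset_ms, score)`, the function names, and the treatment of a score exactly equal to the threshold as passing are all assumptions made for this example. Song stops never appear as pairs here, so they are excluded from the probabilities by construction.

```python
from collections import Counter


def valid_transitions(syllables, thresholds, max_gap_ms=100.0):
    """Return (X, Y) label pairs for consecutive syllables meeting both criteria.

    syllables: list of (label, onset_ms, offset_ms, score) in temporal order.
    thresholds: dict mapping each label to its optimal match-score threshold.
    """
    pairs = []
    for first, second in zip(syllables, syllables[1:]):
        label_a, _, offset_a, score_a = first
        label_b, onset_b, _, score_b = second
        both_matched = (score_a >= thresholds[label_a]
                        and score_b >= thresholds[label_b])
        close_enough = (onset_b - offset_a) <= max_gap_ms
        if both_matched and close_enough:
            pairs.append((label_a, label_b))
    return pairs


def forward_transition_probs(pairs):
    """P(X -> Y) = count of valid X-Y transitions / count of valid transitions from X."""
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    from_counts = Counter(x for x, _ in pairs)
    return {(x, y): c / from_counts[x] for (x, y), c in pair_counts.items()}
```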
Further timing analysis relied on the application of our timing variability model, and was
confined exclusively to syllables and gaps contained within each bird’s main song motif, defined
as the most common sequence of song syllables sung at one year of age. Importantly, we
restricted analysis to the first motif of each song recording. Across samples, 1972−14,057
recordings were made (median 7132); of these 12.6−80.4% (median 41.9%) contained at least
one motif, yielding 439−9927 recordings with song. All of the bird-by-age samples with <33%
matched recordings came from birds in the first 3 age groups (between the ages of 65 and 135
dph) that produced sequences of vocalizations consisting of frequently interrupted portions of
song and/or long sequences of calls.
Given the sensitivity of our timing analysis, we further screened the data by omitting any
sequence containing a song interval (syllable or silent gap) whose length was more than 5
standard deviations from the mean of the length distribution for that interval. The vast majority of
these outliers could be attributed to acoustic interference from other noises in the cage (e.g.,
wing flaps, vocalizations from the tutor). We omitted a median of 4.4% of sequences per bird-by-age sample for
this reason, range 0.5−25.6%. Without omitting outliers, we have found no significant change in
basic statistics such as mean length and change in standard deviation across age periods;
however, the accuracy of our statistical modeling may be sensitive to the presence of these
outliers.
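The outlier screen described above amounts to a per-interval z-score filter applied to whole sequences. The sketch below is illustrative only: the name `screen_outliers` is hypothetical, each sequence is assumed to be a list of interval lengths in msec in motif order, and the sample standard deviation (computed with the outliers included) is an assumed choice.

```python
from statistics import mean, stdev


def screen_outliers(sequences, n_sd=5.0):
    """Drop any sequence with an interval more than n_sd SDs from that interval's mean.

    sequences: list of equal-length lists; entry k of each list is the
    length (msec) of interval k in one rendition of the motif.
    """
    columns = list(zip(*sequences))          # one column per interval
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]    # sample SD (assumption)
    return [
        seq for seq in sequences
        if all(abs(x - mu) <= n_sd * sd for x, mu, sd in zip(seq, means, sds))
    ]
```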
Across bird-by-age samples, the final sample size was a median of 3390 (range 352-9879).
Across the song motifs of all 7 birds, the sample included a total of 33 unique syllables and 26
gaps of silence.
Song timing calculations
Timing analysis used a more fine-grained algorithm and was restricted to syllables found within
identified motifs. Fine-grained spectrograms were recalculated for all syllables using FFTs with a
128-point window slid forward in 4-point steps, yielding 0.16 msec time bins. Although previous
research (Glaze & Troyer, 2006, 2007) used log-amplitudes, here we used raw amplitudes
which we have found to be more reliable. The resulting spectrograms were then smoothed in
time with a 64-point window formed by truncating a Gaussian with a 25.6-point (∼5 msec)
standard deviation. Time derivative spectrograms (TDSs), calculated as differences in amplitude
in time-adjacent bins, were used in the rest of the analysis.
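The smoothing and differencing steps can be sketched for a single frequency channel as follows. This is a rough illustration under assumptions the text does not specify: the window is normalized to unit sum, smoothing is computed only where the window fully overlaps the signal, and all function names are hypothetical.

```python
import math


def truncated_gaussian(width=64, sd=25.6):
    """Gaussian with the given SD (in points), truncated to `width` points."""
    center = (width - 1) / 2.0
    weights = [math.exp(-0.5 * ((i - center) / sd) ** 2) for i in range(width)]
    total = sum(weights)
    return [w / total for w in weights]  # unit-sum normalization (assumption)


def smooth_in_time(channel, window):
    """Weighted moving average over positions where the window fits entirely."""
    n = len(window)
    return [
        sum(channel[start + k] * window[k] for k in range(n))
        for start in range(len(channel) - n + 1)
    ]


def time_derivative(channel):
    """Differences in amplitude between time-adjacent bins."""
    return [later - earlier for earlier, later in zip(channel, channel[1:])]


def tds_channel(amplitudes, width=64, sd=25.6):
    """Smooth one frequency channel in time, then take the time derivative."""
    return time_derivative(smooth_in_time(amplitudes, truncated_gaussian(width, sd)))
```

Applying `tds_channel` to every frequency row of a spectrogram would give a time derivative spectrogram in the sense used here; a linear ramp in amplitude maps to a constant derivative, since a symmetric unit-sum window preserves slope.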
Distinct syllable templates were constructed for each bird-by-age sample. In the first step, we
applied the same sliding algorithm used previously to the fine-grained TDSs. For each syllable,
we selected a random sample of 200 example TDSs. Using the sample with the highest (coarse-grained)
template-matching score as an initial template, each of the 200 TDSs was aligned to
this sample using our sliding algorithm. An updated template was formed by averaging the
aligned TDSs, and the process of alignment followed by averaging was repeated.
In a second step, we repeated this strategy of template refinement by aligning TDSs to the
template using a custom dynamic time-warping (DTW) algorithm (Glaze & Troyer, 2007). The
final template was based on the average of these warped syllable TDSs. This final DTW step
proved to be crucial for the younger developmental periods (65-85 dph), in which timing within
syllables is quite variable.
Measuring the variability of interval lengths requires that interval boundaries be identified in a
consistent manner across syllable renditions. To accomplish this, we manually marked “events”
within the TDS templates, and then used DTW to determine the precise times at which these
events were observed in individual syllables. Syllable onsets/offsets were generally identified as
salient time-derivative peaks/troughs in the template TDS. These correspond to peaks in the rise
and fall of energy at the beginning and ends of syllables, and result in syllables that are slightly
shorter than would be expected from measurements based on exceeding a threshold of power.
However, the differences are generally small and we find that our method gives much more
reliable measurements.
Markers were first identified manually in templates extracted from the oldest age period (365-375
dph). To define template syllable onsets/offsets at earlier ages using the same events chosen
for 365-375 dph, we used DTW to map syllable templates from younger ages onto the 365 dph
templates.
Timing variability model
Our analysis focused on tracking distinct patterns of timing variation through development.
These patterns of timing cannot be measured directly, but are ‘latent’ and are extracted by fitting
the ‘timing variability model’ to the data, a form of factor analysis explicitly tailored to the analysis
of timing in action sequences (Glaze & Troyer, 2012). The model fitting procedure can be
viewed as decomposing the matrix of length covariances into the sum of several components
(Fig. 1). The model extracts four basic components of covariance. First is the “independent
variability” that is unique to each interval. This contributes to the individual variance of each
interval, found along the diagonal of the covariance matrix. Second, the model estimates the
jitter in the onset/offset border between adjacent intervals. Such jitter contributes to the variance
in the length of each interval, but also leads to a negative covariance between adjacent intervals.
These two components of timing variance are ‘local’ in the sense that the variations in each
interval and each onset/offset border are assumed to be governed by a distinct stochastic
variable.
The third and fourth timing components are each assumed to be governed by a single
stochastic variable, but that variable has a “global” influence spread across all intervals in the
sequence. The magnitude and direction of the effect of such a global factor on interval k is
captured by a weight Wk, and each factor will contribute WkWj to the covariance between interval
k and interval j. In our analysis, we used 2 global factors. For the first factor, the weightings to all
intervals in the motif have the same sign. So as the motif lengthens and shortens, all intervals
lengthen or shorten together, although not by the same amount. For this reason, we will refer to
the first global factor as the tempo factor and the weighting for each interval as the tempo
weight.
For the second global factor, most of the time (but not always) syllables and gaps have
weightings of opposite sign. We will refer to this factor as the syl/gap factor. Such a factor will
change the lengths of syllables and gaps across the song in opposite directions, and will
contribute a ‘checker board’ pattern of covariances. We chose a sign convention where the
syl/gap weights for most syllables were positive, and the weights for most gaps were negative.
Errors in the measurements of syllable onset and offset will contribute to the jitter
component. Therefore, we focused our analysis on the other three factors: independent, tempo
and syl/gap timing variability. Jitter is mostly estimated to improve the estimates of the other
components of timing variability. Because the estimate of jitter depends on knowing the length of
the two surrounding intervals, it is impossible for our algorithm to estimate the jitter in the onset
of the first syllable and the offset of the last syllable in the motif. As a result, this jitter must be
captured by the other factors in the model. We found no systematic differences in the
developmental trends for variability estimates for these syllables, and therefore we lumped them
with other syllables. Finally, while both jitter and the syl/gap factor contribute to negative
covariances between syllables and gaps, jitter acts locally and for any given song can act to
lengthen one syllable and shorten another.
Model details and data fitting
For each interval we calculated the mean length of that interval. To analyze timing variability,
we subtract off the mean length and write the deviation from the mean of the kth interval in
sequence n as xkn. We separate this variability into a sum of components:
$$x_{kn} = W_{k1} z_{1n} + W_{k2} z_{2n} + \eta_{kn} +
\begin{cases}
u_{1n} & k = 1 \\
-u_{k-1,n} + u_{kn} & 1 < k < K \\
-u_{K-1,n} & k = K
\end{cases}$$
Here, z1n is a zero-mean, unit-variance latent variable representing tempo while Wk1 is the
sensitivity of interval k to that variable, while z2n and Wk2 are the respective latent variables and
sensitivities for the syl/gap factor. The third term, ηkn represents the independent variable, and is
zero-mean but has a non-unitary variance, while uk-1,n and ukn represent jitter in the onset and
offsets of the interval and are similarly zero-mean with non-unitary variance. It is important to
note that the tempo and syl/gap variables (z1 and z2) come naturally from a single, 2×N latent
variable z that is shared by all song intervals, while the respective weights W1 and W2 are the
columns of a single K×2 weight matrix W. Furthermore, we can rewrite the timing jitter variables
with a single vector u multiplied by the K×(K−1) differencing matrix D, which has ones along the
diagonal (Dk,k=1 since boundary k causes deviations in the offset of interval k), negative ones
along the subdiagonal (Dk,k-1=-1 since boundary k−1 causes deviations in the onset of interval k),
and 0 elsewhere.
With these two substitutions we now have:
$$x_n = W z_n + \eta_n + D u_n$$
We fit parameters to the data using an expectation-maximization (EM) algorithm described
elsewhere (Glaze & Troyer, 2012). The algorithm fits parameters to the measured covariance
matrix S, so we now write the parameters in matrix form. Using Ω to denote a diagonal matrix of
jitter variance and Ψ to denote the diagonal matrix of independent variability, the data
covariance can now be written as:
$$S = W W^{T} + \Psi + D \Omega D^{T}$$
Once parameters were fit using EM, we transformed W so that the tempo weights (1st
column) captured all of the accumulating variability shared by song intervals. We then
determined interval variability due to the tempo and syl/gap factors along the 1st and 2nd
columns of W respectively. These values were in units of msec, and may be thought of as
standard deviations with respect to those factors. We determined variance due to the
independent factor along the diagonal Ψ, and to keep units consistent with the other variables,
computed the square root to yield an effective standard deviation with respect to that factor.
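To make the covariance decomposition concrete, the following Python sketch builds the differencing matrix D and assembles the model covariance from a given set of parameters. It only evaluates the forward model; it does not implement the EM fitting procedure. The function names and the list-based data layout are assumptions for illustration, with W given as K rows of [tempo weight, syl/gap weight], psi as the K independent variances, and omega as the K−1 jitter variances.

```python
def differencing_matrix(K):
    """K x (K-1) matrix: +1 on the diagonal, -1 on the subdiagonal, 0 elsewhere."""
    D = [[0.0] * (K - 1) for _ in range(K)]
    for k in range(K):
        if k < K - 1:
            D[k][k] = 1.0       # boundary k shifts the offset of interval k
        if k > 0:
            D[k][k - 1] = -1.0  # boundary k-1 shifts the onset of interval k
    return D


def model_covariance(W, psi, omega):
    """Evaluate S = W W^T + Psi + D Omega D^T for diagonal Psi and Omega."""
    K = len(W)
    D = differencing_matrix(K)
    S = [[0.0] * K for _ in range(K)]
    for a in range(K):
        for b in range(K):
            global_part = sum(wa * wb for wa, wb in zip(W[a], W[b]))
            jitter_part = sum(D[a][q] * omega[q] * D[b][q] for q in range(K - 1))
            independent_part = psi[a] if a == b else 0.0
            S[a][b] = global_part + jitter_part + independent_part
    return S
```

In this form the jitter term contributes negative covariance only between adjacent intervals, while the global factors contribute to every entry of S.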
For each bird-by-age sample we ran the EM algorithm from 100 randomly drawn initial
parameter estimates (Glaze & Troyer, 2012). To assess model fits we computed the
standardized root mean-squared residual (SRMR; Glaze & Troyer, 2012; Hu & Bentler, 1998).
This measure is the average difference between the pairwise correlations in the data and those
predicted by the model covariance matrix. Across samples at 65-70, 85-90, 125-135 and 365-375
dph, respective median SRMR (± median absolute deviation) was 0.027±0.015,
0.015±0.015, 0.015±0.009 and 0.015±0.010. While median SRMR appeared higher at 65 dph
than at other ages, this trend held for only 3 of 7 birds. Thus, the model fit a large majority of
birdsongs in our sample reasonably well, and there did not appear to be any reliable trend by
age for fits to get better or worse.
Reported statistics
To test for significant age-dependent changes in a given parameter we evaluated the
difference between 365-375 and 65-70 dph using the non-parametric Wilcoxon signed-rank test,
which we abbreviate throughout as “WSR”. Parameter differences between syllables
and gaps at each age range were tested using the Wilcoxon rank-sum test,
abbreviated as “WSM”, Bonferroni corrected for 4 repeated comparisons. For relationships
among parameters themselves, we used Spearman’s correlation. Throughout we report
parameter distributions using medians±standard errors, where we have computed the standard
error of the median from 200 bootstrapped samples per parameter.
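A bootstrap standard error of the median of the kind described above can be sketched as follows. The function name, the fixed random seed, and the use of the sample standard deviation of the bootstrapped medians are assumptions for illustration; only the count of 200 resamples comes from the text.

```python
import random
from statistics import median, stdev


def bootstrap_se_median(values, n_boot=200, seed=0):
    """Standard error of the median estimated from n_boot bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    medians = [
        median(rng.choices(values, k=n))  # resample n values with replacement
        for _ in range(n_boot)
    ]
    return stdev(medians)
```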
Results
We analyzed the development of temporal structure in zebra finch song in 7 males from four
time periods spanning the later stages of juvenile song learning through the first year of life: 65-
70 dph, 85-90 dph, 125-135 dph and 365-375 dph. (For convenience, we will denote each
period using the first age only.) Across birds we analyzed a total of 33 unique syllables and 26
gaps of silence. Final bird-by-age samples contained 414-9876 sequences that contained at
least one full motif. We used our previously published timing variability model (Fig 1; Glaze &
Troyer, 2012) to track three different sources of timing variability across development. Previous
studies have shown important differences in the timing of syllables and intersyllable gaps (Glaze
and Troyer, 2006; Cooper and Goller, 2006; Andalman et al., 2011), with syllables generally
being longer but less variable as measured by the coefficient of variation (CV), equal to the
standard deviation divided by the mean. Therefore, we analyzed syllables and gaps separately.
Our main findings concerning the development of individual parameters are contained in Table
1.
Changes to average tempo are found in gaps
We began by confirming previous reports that average song tempo increases systematically as
a function of age (Lombardino and Nottebohm, 2000; Brainard & Doupe, 2001; Pytte et al.,
2007). In fact, between 65 and 365 dph, motif length decreased by a median 4.11±3.36% (WSR,
p=0.0313), with the trend holding across 5 of 7 birds and the change ranging from -18.94% to 3.14%.
Over half of this decrease appeared to be explained by changes between 65 and 85 dph, with
motif length decreasing over that period in all 7 birds by a median 2.68±0.75% (range -16.52% to
1.75%).
Given the distinction between syllable- and gap-based timing in adult song, we investigated
the extent to which these developmental changes occurred across both types of time intervals
(Fig. 2). On average, the dominant increases in tempo occurred during gaps (Fig. 2D). Mean
gap length decreased in 22 out of the 26 gaps, with a median change of -10.94±2.81% (WSR,
p<3.96×10⁻⁴). Calculated per bird, median gap length decreased in 6 of 7 birds. As with motif
duration, more than half of this change occurred between 65 and 85 dph, during which gaps
decreased in duration by 6.10±2.90%, vs. 5.67±2.43% 85-365 dph.
In contrast to gaps, syllables as a whole failed to show a consistent pattern of length
changes, with changes in length of 2.13±2.26% from 65-365 dph (WSR, p=0.851).
Changes to tempo variability occur in gaps
We next asked whether the variability in tempo also decreased over late development. For each
bird-by-age sample of songs, we used the timing variability model to extract tempo weights that
measure tempo variability for each interval. Interestingly, we found a pattern similar to the
development of average tempo: silent gaps selectively decreased in tempo variability between
65 dph and 365 dph while syllables did not (Fig. 2E). Median gap tempo variability was
1.265±0.230 msec during the former period and 0.943±0.239 msec during the latter, with
average decreases over that period of 0.613±0.337 msec (WSR, p=0.0107), more than one
third of the original 65 dph values. As with average tempo, the bulk of this change occurred
between 65-85 dph, with tempo variability decreasing by 0.225±0.157 msec over that age range,
vs. 0.094±0.173 msec 85-365 dph.
In contrast to gaps, syllable tempo variability failed to show significant decreases over late
development, with respective medians of 0.334±0.313 msec and 0.954±0.163 msec at 65 and
365 dph, and no significant change across the age range (WSR, p=0.339).
It is natural to expect that longer intervals will show more variability in their length. Indeed,
across all interval types we found positive Spearman’s correlations between average length and
global weight of 0.450, 0.626, 0.608 and 0.703 at 65, 85, 125 and 365 dph respectively. These
values were significantly positive over all four periods (Bonferroni corrected p<0.005 in all
cases). To control for changes in average length when examining changes in timing variability,
we divided tempo weights by mean length to derive a tempo-based coefficient of variation (CV;
393
Glaze & Troyer, 2012); for example, a tempo CV of 1% would mean that the given time interval
394
varies with respect to tempo by 1% of its average duration. We found that tempo CV also
395
decreased among gaps between 65 dph and 365 dph (Fig. 2F) from 2.013±0.487% to
396
1.317±0.163%, with the difference reaching statistical significance (median decrease of
397
0.752±0.493%; WSR, p=0.0124).
398
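The normalization described above is a simple ratio. A minimal sketch, using a hypothetical function name and illustrative values that are not taken from the data:

```python
def coefficient_of_variation(weight_ms, mean_ms):
    """Express a variability weight (msec) as a percentage of mean duration."""
    return 100.0 * weight_ms / mean_ms


# Hypothetical gap: mean duration 50 msec, tempo weight 1 msec.
gap_tempo_cv = coefficient_of_variation(1.0, 50.0)        # 2.0 percent
# Hypothetical syllable: mean duration 100 msec, same tempo weight.
syllable_tempo_cv = coefficient_of_variation(1.0, 100.0)  # 1.0 percent
```

The same absolute weight yields a larger CV for the shorter interval, which is why the comparison between gaps and syllables is made on CV rather than on raw weights.
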
Previous data in adult song have indicated that gaps are proportionally more sensitive to tempo changes than syllables, i.e. that gaps stretch and compress with changes to song duration more than syllables after dividing out mean duration (Glaze & Troyer, 2006) and have a higher tempo CV (Glaze & Troyer, 2012). Given that both the average duration and the tempo CV of gaps decrease over development, we expected that gaps would have higher tempo CV at all ages examined. This was indeed the case (Fig. 5), although given the relatively limited size of our data set, only the data from 65 and 125 dph reached statistical significance (WSM, Bonferroni corrected p<0.05).

Independent variability decreases for both syllables and gaps

In contrast to tempo variability, independent variability decreased across both syllables and gaps between 65 and 365 dph (Fig. 3), decreasing from 3.790±0.448 msec to 1.516±0.199 msec among gaps and from 2.629±0.412 to 1.179±0.245 msec among syllables, with respective median decreases of 2.447±0.364 and 0.970±0.299 msec (WSR, p<0.0001 in both cases). For both syllables and gaps, most of this change occurred 65-85 dph. As with tempo, independent timing variability scales with average interval duration in adults (Glaze & Troyer, 2012). This relationship also held across all developmental periods, with Spearman’s correlation ranging from 0.291 to 0.450 (Bonferroni corrected p<0.05 in all cases except at 365 dph, where uncorrected p=0.026). Therefore, as we did for tempo, we derived an independent CV by dividing independent variability by mean duration, and examined whether that measure also changed over development. Independent CV decreased significantly across the first year of life for both gaps and syllables, from 6.251±0.643 to 2.353±0.409% among gaps and 3.286±0.398 to 1.196±0.324% among syllables (WSR, p<0.0001 in both cases). As with tempo, gaps tended to have higher independent CVs than syllables across the age ranges examined, although the difference failed to reach significance for 365 dph (WSM, Bonferroni corrected p<0.05).

Syl/gap factor does not change over late development

Previous analysis of adult songs revealed a second global timing factor shared across the song in which syllables generally have positive weights and gaps have negative weights (Glaze & Troyer, 2012). The result is a timing component which induces a negative covariation between syllable and gap durations, independently of the other factors reported thus far. We asked whether this syl/gap factor was present across development, and if so, whether it changed at all. Indeed, during all four developmental periods, syllable weights tended to be positive while gap weights tended to be negative (Fig. 4). The differences between syllables and gaps were significant for all four periods (WSM, Bonferroni corrected p<0.0005). Furthermore, we found no significant changes in factor weight among either gaps or syllables 65-365 dph, with median changes of -0.421±0.294 and 0.303±0.212 msec (WSR, p=0.201 and 0.237).

Links between changes in gap timing and syllable transition probabilities

We have shown that gaps, defined as the period of silence between two syllables in the motif, become shorter and less variable over development, with changes in both tempo and independent variability disproportionately occurring between 65 and 85 dph. To investigate whether these changes in timing might correspond to a tighter link between those syllables, we examined developmental changes in the transition probabilities for each consecutive syllable pair in the motif. Transition probabilities showed a developmental trajectory that was similar to timing parameters (Fig. 5). Specifically, we found significant increases 65-365 dph, from a median of 73.0±9.8% to 99.6±0.7% (WSR, p<0.0001). Of the 20 gaps with transition probabilities <95% at 65 dph, 18 increased, and 12 by over 10%. Qualitatively, increases appeared to reflect jumps to a probability near 1, irrespective of what those probabilities were at 65 dph (Fig. 5A). As with changes to timing parameters that showed significant development, most of this change occurred 65-85 dph, with transition probabilities increasing by 4.7±3.0%, vs. 0.3±0.0% 85-365 dph.

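The exact counting procedure behind these transition probabilities is not restated in this section. As a rough sketch, assuming each song is available as a string of syllable labels and that the probability of a transition is the fraction of occurrences of the first syllable that are followed by the second, the estimate could look like this. The function name and the example songs are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(songs):
    """Estimate P(next syllable | current syllable) from labeled songs."""
    counts = defaultdict(Counter)
    for song in songs:
        # Count each adjacent pair of syllable labels within a song.
        for current, following in zip(song, song[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Hypothetical labeled songs; the motif is "abcd" and one song skips "c".
songs = ["abcd", "abcd", "abd", "abcd"]
probs = transition_probabilities(songs)
p_bc = probs["b"]["c"]  # 3 of the 4 occurrences of "b" are followed by "c"
```

In this toy sample the motif transition from "b" to "c" has probability 0.75, comparable in spirit to the less reliable transitions observed at 65 dph.
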
We next asked whether, on a gap-by-gap basis, the increases in transition probabilities from 65 to 85 dph were linked with changes in the timing parameters that showed significant differences across development (Fig. 6), specifically average duration, tempo CV and independent CV. Indeed, we found that increases in transition probability were correlated with decreases in duration and tempo CV, with respective Spearman’s correlations of -0.557 and -0.342 (Bonferroni corrected p<0.01 in both cases). The correlation between the increase in transition probability and the change in independent CV was also negative, although very weak (Spearman’s correlation = -0.136, p=0.506).

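The gap-by-gap analysis above pairs, for each gap, the change in transition probability with the change in a timing parameter and then computes a rank correlation. A minimal sketch of that computation follows, with a simple Bonferroni adjustment for several tests. The helper names and all numbers are hypothetical; the p-value computation for the correlation itself is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical per-gap changes between two ages.
delta_transition_prob = [0.30, 0.10, 0.02, 0.20, 0.05]
delta_duration_pct = [-12.0, -6.0, -1.0, -4.0, -8.0]
rho = spearman(delta_transition_prob, delta_duration_pct)
```

A negative value of `rho` here means that gaps whose transitions became more reliable also tended to shorten more, which is the direction of the relationship reported above.
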
Discussion

We have investigated the development of zebra finch rhythmic stereotypy from the late plastic period through 1 year of age. We collected hundreds to thousands of songs from each of four age periods, and measured song timing and syllable transition probabilities within each age period to probe how the motor code changes over this age range.

We found increases in song tempo that occur almost entirely during the gaps of silence between syllables. Most of the tempo increase occurs 65-85 dph, and on a gap-by-gap basis increased speed is linked with increases in the reliability of corresponding syllable transitions. Increases in transition probability between 65 and 85 dph are also linked with decreases in tempo variability. For syllables, we found no systematic increase in tempo or decrease in tempo variability. But as with gaps, syllables showed a sharp reduction in independent variability between 65 and 85 dph followed by continued reductions over the first year of life.

Overall, the data suggest a phase of late development in which song production becomes faster and more automated. Such a process of motor consolidation has been previously suggested for zebra finch song based on other data (Brainard & Doupe, 2001). Interestingly, one recent study has indicated opposite changes in Bengalese finch song timing in the later years of life, i.e. that songs slow down, mostly due to increases in gap duration (Cooper et al., 2012).

Syllable lengths

We found that gaps, but not syllables, shortened over the first year of life. Brainard and Doupe (2001) reported a shortening of both syllables and gaps in young birds (>100 dph), with the fractional shortening of syllables roughly half as much as for gaps. However, only the syllable changes reached statistical significance, as gap changes were more variable. Pytte et al. (2007) found a significant shortening of syllables over the first year and a half of life, but no overall shortening of gaps. It is unclear what accounts for the discrepancies between the three studies. One point of difference is our use of a reliable peak in the amplitude derivatives to measure syllable durations, rather than crossings of an amplitude threshold. Syllables sometimes have significant amplitude extending past the onset of inspiration (Andalman et al., 2011); it is likely that such low amplitude “tails” of a syllable would be included in our measure of gap rather than syllable length. If the greater coordination between syllable production and respiration that occurs over development reduces these low amplitude portions of syllables, this might lead to the shortening of syllable length as measured in previous studies without having much effect on syllable length as measured here.

The chaining hypothesis

The most common view of sequence generation is that of a chain, with neurons active at one point in the sequence exciting the neurons active at the next point in the sequence, and so on. Previous modeling work has shown that the combination of spike timing dependent plasticity and synaptic pruning can lead to the development of chain-like patterns of activity in initially randomly connected networks (Jun & Jin, 2007). Under such a scenario, one might expect the strength of connections along the chain to be correlated with several behavioral parameters. First, stronger connections would decrease the latency to activity in the next link in the chain, leading to a positive correlation between synaptic strength and tempo. Second, stronger synaptic connections would be associated with more reliable propagation along the chain, and hence with less sensitivity to extraneous “noisy” inputs, yielding lower timing variability.

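Both predictions can be illustrated with a deliberately crude toy model, which is not the network model of Jun & Jin (2007). It assumes that the next link in the chain becomes active when a noisy integrator, driven at a rate set by the synaptic weight, first reaches a fixed threshold. The function names, threshold, noise level and weights are all hypothetical.

```python
import random
import statistics


def link_latency(weight, rng, threshold=10.0, noise_sd=1.0, dt=1.0):
    """Time (msec) for a noisy integrator driven by `weight` to hit threshold."""
    v, t = 0.0, 0.0
    while v < threshold:
        # Deterministic drive from the previous link plus extraneous noise.
        v += weight * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        t += dt
    return t


def latency_stats(weight, n_trials=2000, seed=0):
    """Mean and standard deviation of the link latency across trials."""
    rng = random.Random(seed)
    samples = [link_latency(weight, rng) for _ in range(n_trials)]
    return statistics.mean(samples), statistics.stdev(samples)


weak_mean, weak_sd = latency_stats(weight=1.0)
strong_mean, strong_sd = latency_stats(weight=2.0)
```

In this sketch the stronger connection reaches threshold sooner and with less trial-to-trial spread, so shorter latency and lower timing variability arise together from a single change in synaptic strength.
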
Behavioral data from adult birds are consistent with a version of this hypothesis in which the parts of the chain corresponding to gaps remain more weakly connected than the parts of the chain corresponding to syllables. First, birds startled by flashes of light interrupt their songs at transitions between syllables (Cynx, 1990), consistent with a greater sensitivity to perturbation for the more weakly connected gap portions of the chain. Second, timing variability is greater for gaps than for syllables (Cooper and Goller, 2006; Andalman et al., 2011), both in the tempo and independent components of variability (Glaze & Troyer, 2006, 2012), consistent with less sensitivity to outside perturbation for the syllable portions of the chain.

The chaining hypothesis has been extended to songs with variable sequencing, by supposing that each syllable corresponds to a tightly linked sub-chain, and neurons at the end of the chain corresponding to one syllable can synapse onto neurons at the head of the chain for more than one subsequent syllable; mutual inhibition from local interneurons then implements a winner-take-all competition between chains, ensuring that only one syllable-based chain can be active at any given time (Jin, 2009). Under this hypothesis, transition probability is correlated with connection strength, since a chain receiving stronger input would be more likely to emerge as the winner in a winner-take-all competition. During late plastic song (~65 dph), these mechanisms may have created a loosely organized network of syllable-based chains capable of producing a small variety of song sequences with a bias consistent with the ordering in the final motif. Over time, spike timing dependent plasticity would favor the more likely transitions, and these could completely “win out” by adulthood.

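The link between connection strength and transition probability can be made explicit with a simple abstraction, which is not the branching chain network of Jin (2009). It assumes that two candidate chains each receive an input equal to their connection strength plus independent Gaussian noise, and that the chain with the larger input wins. The function name and parameter values are hypothetical.

```python
import math


def win_probability(strength_a, strength_b, noise_sd=1.0):
    """Probability that chain A receives more noisy input than chain B."""
    # The difference of the two inputs is Gaussian with mean
    # (strength_a - strength_b) and standard deviation noise_sd * sqrt(2).
    z = (strength_a - strength_b) / (noise_sd * math.sqrt(2.0))
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


equal = win_probability(1.0, 1.0)     # no bias between the two chains
moderate = win_probability(1.5, 1.0)  # chain A slightly stronger
strong = win_probability(2.0, 1.0)    # chain A stronger still
```

As the favored connection strengthens relative to its competitor, its probability of winning rises toward 1, matching the qualitative picture of transitions that come to "win out" over development.
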
We have found that increases in transition probability are correlated with decreases in gap duration and tempo CV on a gap-by-gap basis. Correlations with independent CV are also negative, although the correlation is weak. Negative correlations between transition probability and both duration and timing variability are consistent with the chaining hypothesis, in that increases in transition probability and decreases in timing parameters would both result from increases in synaptic strength over development. This pattern is also consistent with data in Bengalese finches showing that the ability of auditory feedback perturbations to alter both the stereotypy of song sequence and the latency to the next syllable is smaller when perturbations are introduced at points in the song with more regular sequencing (Sakata and Brainard, 2006, 2009).

HVC as the chaining nucleus

Previous electrophysiological recordings have shown that premotor neurons in song nucleus HVC (used as proper name) produce short bursts of precisely timed action potentials, both during the syllables and during the gaps (Hahnloser et al., 2002). Further correlational evidence suggests that inputs from HVC control burst timing in RA (Fee et al., 2004). Cooling HVC leads to a slowing of song tempo, whereas cooling the downstream nucleus RA does not (Long and Fee, 2008). These results, along with intracellular recordings in HVC neurons during singing (Long et al., 2010), suggest that HVC supports chain-like bursting that controls timing of song behavior. This hypothesis is challenged by tight synchronization of left and right HVC, despite the lack of a corpus callosum in birds (Schmidt, 2003). The most likely source of a synchronization signal is bilateral connections from the midbrain and brainstem targets of the song motor pathway that eventually feed back to HVC (Schmidt et al., 2004; Ashmore et al., 2005, 2006; Andalman et al., 2011).

Chaining in Bengalese finches

If true, the chaining hypothesis could also have one of several implications for changes to song in older Bengalese finches (Cooper et al., 2012). The slowing of tempo could reflect a gradual weakening of synapses, although this would also imply a similar correlation between increases in gap duration and weakening of associated transition probabilities. That study did find decreases in syllable repetition rate, although a specific correlation with gap duration was not reported for those transitions. Furthermore, one other modeling study of Bengalese finch song has suggested that syllable repetitions are controlled by a process (adaptation) that is separate from what controls other syllable transitions (Jin & Kozhevnikov, 2011). On this basis one may not expect the transition-timing correlation among syllable repeats per se, but rather among other transitions; such an expectation implies other changes to transition structure in aging Bengalese finch song. Alternatively, the slowing of tempo could reflect changes to influences on tempo such as brain temperature (Long & Fee, 2008), rather than network parameters such as synaptic strength; under this scenario there would not necessarily be changes to transition structure outside the decrease in repetitions.

Challenges to a chaining explanation

While increased synaptic strength is an obvious hypothesis for explaining developmental changes, it cannot alone account for the changes in timing variability we have found. First, while average tempo increases among gaps only, decreases in independent timing variability occur across both syllables and gaps. This contradicts one basic prediction of the chaining model, i.e. that changes in chain speed are coupled with changes in sensitivity to external sources of timing variability. How can syllable tempo remain the same while independent timing variability decreases? Second, increases in synaptic strength would be expected to decrease the sensitivity of song intervals to perturbations from all possible noise sources, including neuromodulatory mechanisms that induce shared, global timing variability as well as background noise and inputs from areas of the song system that might contribute to independent variability. Why are the decreases in independent variability so much greater than the decreases in tempo variability?

One mechanism that could reconcile these differences would be a developmental increase in synaptic strength for gap portions of the chain coupled with an overall decrease in inputs from circuits that contribute to independent timing variability for both syllables and gaps. The former would account for changes in transition probability, mean length, and tempo variability for gaps, while the latter would contribute to reductions in independent variability for both syllables and gaps. Under such a hypothesis, the decreases in independent variability for gaps that come from external sources would act to mask any dependence of independent variability on transition probability.

One possibility for the source of input contributing to independent variability is LMAN. This nucleus is the output nucleus of the cortical-basal ganglia loop for song, and has been strongly implicated in song learning (Bottjer et al., 1984; Scharff & Nottebohm, 1991). LMAN produces reliable burst sequences with trial-to-trial timing variability (Hessler & Doupe, 1999; Kao et al., 2008), while LMAN inactivation in developing birds yields adult-like stereotypy in the activity of premotor nucleus RA (robust nucleus of the arcopallium) during singing (Ölveczky et al., 2011). Based on experiments using behavioral or pharmacological manipulations of LMAN activity, it has been proposed that LMAN is involved in generating the rendition-to-rendition variability necessary for trial and error learning (Kao & Brainard, 2006; Kao et al., 2005; Ölveczky et al., 2005; Stepanek & Doupe, 2010; Charlesworth et al., 2012). Behavioral studies in developing birds have suggested that song variability driven by LMAN activity might be directed to more poorly learned elements of the song (Ravbar et al., 2012). Finally, while LMAN influences song variability throughout a bird’s lifetime, the degree of this influence has been shown to progressively decline over development (Aronov et al., 2011), including the same age range we have investigated here (Brainard & Doupe, 2001).

The hypothesis that LMAN also contributes to timing variability is consistent with previous studies showing that inactivation of LMAN reduces variability in both syllable and gap durations (Thompson et al., 2011; Goldberg and Fee, 2011). At the level of song motifs, Thompson et al. (2011) found a reduction in timing variability after LMAN lesions, but this reduction failed to reach statistical significance. One possible explanation is that tempo variability, since it is shared across the song, dominates variations in motif duration, whereas the tempo and independent components make roughly equal contributions to timing variability for individual syllables or gaps (Glaze and Troyer, 2012). As a result, the hypothesized reduction in independent variability after LMAN inactivation would be expected to cause a relatively smaller reduction in the variability of motif duration than in the variability of individual syllables or gaps.

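The arithmetic behind this argument can be sketched under simplifying assumptions: a single tempo variable shared by all intervals, independent components that are uncorrelated across intervals, and no syl/gap factor. The tempo contributions then add before squaring, while the independent contributions add as variances. The function name and the choice of ten intervals with equal weights are hypothetical.

```python
def timing_variances(tempo_weights, indep_sds):
    """Variance of each interval and of the summed (motif) duration."""
    interval_vars = [w ** 2 + s ** 2 for w, s in zip(tempo_weights, indep_sds)]
    # Shared tempo deviations sum coherently; independent noise does not.
    motif_var = sum(tempo_weights) ** 2 + sum(s ** 2 for s in indep_sds)
    return interval_vars, motif_var


# Hypothetical motif of 10 intervals with equal tempo and independent
# components (1 msec each), as in the "roughly equal" case described above.
n_intervals = 10
interval_vars, motif_var = timing_variances([1.0] * n_intervals,
                                            [1.0] * n_intervals)
indep_share_interval = 1.0 / interval_vars[0]  # 0.5 of each interval's variance
indep_share_motif = n_intervals / motif_var    # 10 / 110 of motif variance
```

In this example, removing the independent component entirely would halve the variance of each interval but reduce the variance of motif duration by only about 9%, which is consistent with a motif-level effect that is harder to detect.
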
If the hypothesis that LMAN contributes to timing variability is paired with the hypothesis that song timing is controlled in HVC, then LMAN must have a way to influence activity in HVC. Given the absence of direct connections from LMAN to HVC, this influence must be carried by one of the possible routes from RA, the premotor nucleus that receives input from LMAN, back to HVC. These include direct synaptic connections from RA back to HVC (Roberts et al., 2008), projections from RA to the dorsomedial nucleus of the posterior thalamus (DMP) to medial MAN (MMAN) and back to HVC (Vates et al., 1997), or the circuits mentioned above that loop through the midbrain and brainstem (Schmidt et al., 2004).

An alternative mechanism that might reconcile differences in developmental timing changes for syllables and gaps would be a balancing of increased excitatory synaptic strength with commensurate increases in inhibitory input during syllables. In adults, inhibitory HVC interneurons have been demonstrated to be more active during syllables than during gaps (Kozhevnikov & Fee, 2007), and it is possible that tempo increases expected from a strengthening of excitatory synapses over development are offset by increased inhibitory feedback during syllables. However, this mechanism does not explain the differences in tempo CV for syllables vs. gaps. Furthermore, increased inhibitory feedback might also suppress local timing variability, leading to a greater decrease in independent CV for syllables than gaps, which we do not find.

Other possible influences on the development of song timing

The declining influence of LMAN at later stages in development may be a continuation of a switch from an early mode of vocalization known as subsong, in which young birds less than approximately 45 dph produce long rambling vocalizations without a clear segmentation into syllables. Subsong is known to be primarily driven by LMAN inputs to RA, with inputs from HVC becoming more important as birds start to produce early versions of song syllables (Aronov et al., 2008, 2011). During these earlier stages of learning (35-65 dph), lesions to the thalamic nucleus DLM (medial portion of the dorsolateral thalamus), which sends afferents to LMAN, increase the temporal stereotypy and overall rhythmicity of these early song elements (Goldberg & Fee, 2011).

During this early period, birds are also learning to coordinate their breathing patterns with syllable production, with long expiratory pulses increasingly punctuated by short inspirations that eventually closely coincide with intersyllable gaps (Veit et al., 2011). It is possible that the shortening of gaps during the later period of development analyzed here may result from maturation of the respiratory and syringeal musculature, allowing birds to replenish their air supply with shorter inspirations. However, this mechanism does not explain the link between the variability in gap duration and sequence stereotypy.

New neurons are added to the song nucleus HVC throughout a zebra finch’s life, but the rate of new neuron incorporation declines precipitously over the first two years after song crystallization (Wang et al., 2002). During this period, spectral features of song syllables become increasingly stereotyped, and it is possible that the decline in new neuron incorporation may be linked to changes in temporal stereotypy (Pytte et al., 2007). In particular, the declining incorporation of new neurons might contribute to the decline in independent timing variability for both syllables and gaps measured here.

Previous experiments have shown that song tempo is positively correlated with brain temperature (Long and Fee, 2008; Andalman et al., 2011) and that changes in temperature may explain the increases in song tempo seen when male zebra finches direct their song to females (Aronov and Fee, 2012). It is possible that developmental changes in the ability to regulate brain temperature may contribute to increased tempo over development. Indeed, Long and Fee (2008) reported that gap lengths are more sensitive to changes in temperature than syllable lengths, although the syllable/gap difference was much smaller than for the developmental changes in duration reported here.

Consolidation of sequencing and timing in other systems

Overall, the data suggest a phase of development in which syllables are eventually consolidated into a longer, precisely-timed chaining mechanism that can run from beginning to end in a highly reliable order. A similar process of linking simpler chains to form more functional activity patterns has been proposed for neocortex (Bienenstock, 1995). Such a linking may be viewed from another perspective as chunking actions (in this case, syllables) into units of behavior (motifs) that are less susceptible to neural noise and interference from the environment. Such a process has been proposed and investigated for mammalian action learning (Graybiel, 1998; Yin et al., 2009; Costa, 2011), with faster and more stereotyped timing as a common characteristic of these learned behaviors (Sakai et al., 2004; Costa, 2011). It would thus be interesting to investigate whether similar timing components and developmental patterns are at play across other systems.

Acknowledgments

TWT is now at the Department of Biology, University of Texas, San Antonio.

Grants

The work was partially supported by NIH grant 5R21MH 066047-02, a Sloan Research Fellowship to TWT and a fellowship for CMG from the Center for the Comparative Biology and Evolution of Hearing (C-CEBH).

Author Contributions

All data were collected by CMG; data were analyzed by both CMG and TWT.

References

Andalman AS, Foerster JN, Fee MS. Control of vocal and respiratory patterns in birdsong: dissection of forebrain and brainstem mechanisms using temperature. PLoS ONE 6: e25461, 2011.
Arnold AP. The effects of castration on song development in zebra finches (Poephila guttata). J. Exp. Zool. 191: 261–278, 1975.
Aronov D, Andalman AS, Fee MS. A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science 320: 630–634, 2008.
Aronov D, Fee MS. Natural Changes in Brain Temperature Underlie Variations in Song Tempo during a Mating Behavior. PLoS ONE 7: e47856, 2012.
Aronov D, Veit L, Goldberg JH, Fee MS. Two distinct modes of forebrain circuit dynamics underlie temporal patterning in the vocalizations of young songbirds. J. Neurosci. 31: 16353–16368, 2011.
Ashmore RC, Renk JA, Schmidt MF. Bottom-up activation of the vocal motor forebrain by the respiratory brainstem. J. Neurosci. 28: 2613–2623, 2008.
Ashmore RC, Wild JM, Schmidt MF. Brainstem and forebrain contributions to the generation of learned motor behaviors for song. J. Neurosci. 25: 8543–8554, 2005.
Bienenstock E. A model of neocortex. Network: Comp Neural Sys 6: 179–224, 1995.
Brainard MS, Doupe AJ. Auditory feedback in learning and maintenance of vocal behaviour. Nat. Rev. Neurosci. 1: 31–40, 2000.
Brainard MS, Doupe AJ. Postlearning consolidation of birdsong: stabilizing effects of age and anterior forebrain lesions. J. Neurosci. 21: 2501–2517, 2001.
Charlesworth JD, Tumer EC, Warren TL, Brainard MS. Learning the microstructure of successful behavior. Nat. Neurosci. 14: 373–380, 2011.
Charlesworth JD, Warren TL, Brainard MS. Covert skill learning in a cortical-basal ganglia circuit. Nature 486: 251–255, 2012.
Chi Z, Margoliash D. Temporal precision and temporal drift in brain and behavior of zebra finch song. Neuron 32: 899–910, 2001.
Cooper BG, Goller F. Physiological insights into the social-context-dependent changes in the rhythm of the song motor program. J. Neurophysiol. 95: 3798–3809, 2006.
Cooper BG, Méndez JM, Saar S, Whetstone AG, Meyers R, Goller F. Age-related changes in the Bengalese finch song motor program. Neurobiol. Aging 33: 564–568, 2012.
Costa RM. A selectionist account of de novo action learning. Curr. Opin. Neurobiol. 21: 579–586, 2011.
Cynx J. Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata). J Comp Psychol 104: 3–10, 1990.
Derégnaucourt S, Mitra PP, Fehér O, Pytte C, Tchernichovski O. How sleep affects the developmental learning of bird song. Nature 433: 710–716, 2005.
Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. Wiley-Interscience, New York, 2001.
Fee MS, Goldberg JH. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience 198: 152–170, 2011.
Fee MS, Kozhevnikov AA, Hahnloser RHR. Neural mechanisms of vocal sequence generation in the songbird. Ann. N. Y. Acad. Sci. 1016: 153–170, 2004.
Foster EF, Bottjer SW. Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults. J. Neurobiol. 46: 142–165, 2001.
Franz M, Goller F. Respiratory units of motor production and song imitation in the zebra finch. J. Neurobiol. 51: 129–141, 2002.
Glaze CM, Troyer TW. Temporal structure in zebra finch song: implications for motor coding. J. Neurosci. 26: 991–1005, 2006.
Glaze CM, Troyer TW. Behavioral measurements of a temporally precise motor code for birdsong. J. Neurosci. 27: 7631–7639, 2007.
Glaze CM, Troyer TW. A generative model for measuring latent timing structure in motor sequences. PLoS ONE 7: e37616, 2012.
Goldberg JH, Fee MS. Vocal babbling in songbirds requires the basal ganglia-recipient motor thalamus but not the basal ganglia. J. Neurophysiol. 105: 2729–2739, 2011.
Graybiel AM. The basal ganglia and chunking of action repertoires. Neurobiol Learn Mem 70: 119–136, 1998.
Hahnloser RHR, Kozhevnikov AA, Fee MS. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature 419: 65–70, 2002.
Hu L-T, Bentler PM. Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods 3: 424–453, 1998.
Immelmann K. Song development in zebra finch and other estrildid finches. In R. Hinde (Ed.), Bird Vocalisations (pp. 61–74). London: Cambridge University Press, 1969.
Jin DZ, Kozhevnikov AA. A compact statistical model of the song syntax in Bengalese finch. PLoS Comput. Biol.
Jin DZ. Generating variable birdsong syllable sequences with branching chain networks in avian premotor nucleus HVC. Phys Rev E 80: 051902, 2009.
Jun JK, Jin DZ. Development of neural circuitry for precise temporal sequences through spontaneous activity, axon remodeling, and synaptic plasticity. PLoS ONE.
Kao MH, Brainard MS. Lesions of an avian basal ganglia circuit prevent context-dependent changes to song variability. J. Neurophysiol. 96: 1441–1455, 2006.
Kao MH, Doupe AJ, Brainard MS. Contributions of an avian basal ganglia–forebrain circuit to real-time modulation of song. Nature 433: 638–643, 2005.
Kao MH, Wright BD, Doupe AJ. Neurons in a forebrain nucleus required for vocal plasticity rapidly switch between precise firing and variable bursting depending on social context. J. Neurosci. 28: 13232–13247, 2008.
Keele SW, Ivry R, Mayr U, Hazeltine E, Heuer H. The cognitive and neural architecture of sequence representation. Psychol Rev 110: 316–339, 2003.
Konishi M. Birdsong: From Behavior to Neuron. Annu. Rev. Neurosci. 8: 125–170, 1985.
Kozhevnikov AA, Fee MS. Singing-related activity of identified HVC neurons in the zebra finch. J. Neurophysiol. 97: 4271–4283, 2007.
768
Lashley KS. The problem of serial order in behavior. In L. Jeffress (Ed.), Cerebral Mechanisms
769
in Behavior (pp. 112–146). New York: Wiley. 1951.
770
771
Long MA, Fee MS. Using temperature to analyse temporal dynamics in the songbird motor
772
pathway. Nature 456: 189–194, 2008.
773
Long MA, Jin DZ, Fee MS. Support for a synaptic chain model of neuronal sequence
774
generation. Nature 468: 394–399, 2010.
775
Marler P. A comparative approach to vocal learning: Song development in white-crowned
776
sparrows. J Comp Physiol Psychol. 71: 1–25, 1970.
Ölveczky BP, Andalman AS, Fee MS. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biol. 3: e153, 2005.
Ölveczky BP, Otchy TM, Goldberg JH, Aronov D, Fee MS. Changes in the neural control of a complex motor sequence during learning. J. Neurophysiol. 106: 386–397, 2011.
Pytte CL, Gerson M, Miller J, Kirn JR. Increasing stereotypy in adult zebra finch song correlates with a declining rate of adult neurogenesis. Dev Neurobiol 67: 1699–1720, 2007.
Ravbar P, Lipkind D, Parra LC, Tchernichovski O. Vocal exploration is locally regulated during song learning. J. Neurosci. 32: 3422–3432, 2012.
Rhodes BJ, Bullock D, Verwey WB, Averbeck BB, Page MPA. Learning and production of movement sequences: behavioral, neurophysiological, and modeling perspectives. Hum Mov Sci 23: 699–746, 2004.
Roberts TF, Klein ME, Kubke MF, Wild JM, Mooney R. Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song. J. Neurosci. 28: 3479–3489, 2008.
Sakai K, Hikosaka O, Nakamura K. Emergence of rhythm during motor learning. Trends Cogn. Sci. (Regul. Ed.) 8: 547–553, 2004.
Sakata JT, Brainard MS. Real-time contributions of auditory feedback to avian vocal motor control. J. Neurosci. 26: 9619–9628, 2006.
Sakata JT, Brainard MS. Social context rapidly modulates the influence of auditory feedback on avian vocal motor control. J. Neurophysiol. 102: 2485–2497, 2009.
Scharff C, Nottebohm F. A comparative study of the behavioral deficits following lesions of various parts of the zebra finch song system: implications for vocal learning. J. Neurosci. 11: 2896–2913, 1991.
Schmidt MF, Ashmore RC, Vu ET. Bilateral control and interhemispheric coordination in the avian song motor system. Ann. N. Y. Acad. Sci. 1016: 171–186, 2004.
Schmidt MF. Pattern of interhemispheric synchronization in HVc during singing correlates with key transitions in the song pattern. J. Neurophysiol. 90: 3931–3949, 2003.
Shank SS, Margoliash D. Sleep and sensorimotor integration during early vocal learning in a songbird. Nature 458: 73–77, 2009.
Solis MM, Perkel DJ. Rhythmic activity in a forebrain vocal control nucleus in vitro. J. Neurosci. 25: 2811–2822, 2005.
Stepanek L, Doupe AJ. Activity in a cortical-basal ganglia circuit for song is required for social context-dependent vocal variability. J. Neurophysiol. 104: 2474–2486, 2010.
Tchernichovski O. Dynamics of the Vocal Imitation Process: How a Zebra Finch Learns Its Song. Science 291: 2564–2569, 2001.
Troyer TW, Doupe AJ. An associational model of birdsong sensorimotor learning II. Temporal hierarchies and the learning of song sequence. J. Neurophysiol. 84: 1224–1239, 2000.
Vates GE, Vicario DS, Nottebohm F. Reafferent thalamo-“cortical” loops in the song system of oscine songbirds. J. Comp. Neurol. 380: 275–290, 1997.
Veit L, Aronov D, Fee MS. Learning to breathe and sing: development of respiratory-vocal coordination in young songbirds. J. Neurophysiol. 106: 1747–1765, 2011.
Wang NG, Hurley P, Pytte C, Kirn JR. Vocal control neuron incorporation decreases with age in the adult zebra finch. J. Neurosci. 22: 10864–10870, 2002.
Yin HH, Mulcare SP, Hilário MRF, Clouse E, Holloway T, Davis MI, Hansson AC, Lovinger DM, Costa RM. Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12: 333–341, 2009.
Zann R. The Zebra Finch: A Synthesis of Field and Laboratory Studies. New York: Oxford University Press. 1996.
Figure Captions
Figure 1. Example of a timing covariance matrix and its decomposition in one bird (age 365 dph). A, Data covariance matrix. The darkness of each square shows the pairwise covariance of the durations of the respective song elements listed along the x and y axes (letters denote syllables; dashes denote silent gaps). B, Decomposition of the covariance matrix using the timing variability model described in the Methods. The leftmost matrix is the total covariance generated by the model, while each matrix on the right-hand side of the equation is the covariance generated by the respective timing component. Color limits are the same as in A.

Figure 2. Changes to average song duration and tempo variability over 65-365 dph. A: Spectrograms from bird 257 at 65 and 85 dph; each spectrogram depicts the song motif whose duration is closest to the average computed for that age period. Syllable labels are indicated between the motifs. B, C: Change in average duration and tempo CV for the same bird over 65-85 dph; syllable labels match panel A. D, E, F: Median±standard error for duration, tempo variability and tempo CV across development. Durations are normalized by subtracting the duration at 365 dph.

Figure 3. Changes to independent timing variability over 65-365 dph. A, B: Median±standard error for independent variability and CV. C: Independent CV for bird 253 at 65 and 85 dph.

Figure 4. Estimates of the syl/gap factor over the 4 developmental periods examined (see Methods for definition). Error-bar plots indicate median±standard error of raw syllable and gap weights.

Figure 5. Changes to syllable transition probabilities over 65-365 dph. A, B, C: Transition probabilities across consecutive age ranges. D: Median±standard error of transition probabilities across development. Overall, changes in probability are significant over 65-85 dph, but not after 85 dph.

Figure 6. Changes in transition probability over 65-85 dph vs. change in timing parameters of the silent gaps associated with each transition. A: average gap duration; B: tempo CV; C: independent CV.

Tables
Measure                    65 dph           85 dph           125 dph          365 dph          change 65-365    change 65-85     change 85-365
Duration (msec), Motifs*   590.092±32.934   579.975±31.437   569.658±41.755   565.853±47.697   -4.11±3.36%      -2.68±0.75%      -2.43±2.66%
Duration (msec), Syls      65.694±11.916    63.198±11.163    63.449±12.218    63.059±12.141    2.13±2.26%       0.31±0.79%       -0.07±1.58%
Duration (msec), Gaps*     62.603±5.01      61.261±6.46      60.131±6.062     58.471±4.807     -10.94±2.81%     -6.10±2.90%      -5.67±2.43%
Tempo Var. (msec), Syls    0.334±0.313      0.742±0.169      0.651±0.135      0.954±0.163      0.040±0.183      -0.062±0.208     0.149±0.088
Tempo Var. (msec), Gaps*   1.265±0.230      0.943±0.239      0.875±0.122      0.817±0.052      -0.613±0.337     -0.225±0.157     -0.094±0.173
Tempo CV (%), Syls         0.786±0.453      1.244±0.179      1.014±0.136      1.132±0.160      -0.022±0.287     -0.089±0.378     0.216±0.081
Tempo CV (%), Gaps*        2.013±0.487      1.495±0.254      1.612±0.253      1.317±0.163      -0.752±0.656     -0.373±0.354     -0.179±0.375
Ind. Var. (msec), Syls*    2.629±0.412      1.285±0.316      1.304±0.168      1.179±0.245      -0.970±0.299     -0.479±0.325     -0.149±0.162
Ind. Var. (msec), Gaps*    3.790±0.448      2.492±0.264      1.844±0.313      1.516±0.199      -2.447±0.364     -1.543±0.295     -1.013±0.342
Ind. CV (%), Syls*         3.286±0.398      1.719±0.312      1.578±0.278      1.196±0.324      -1.255±0.315     -0.876±0.286     -0.288±0.112
Ind. CV (%), Gaps*         6.251±0.643      3.71±0.477       3.585±0.507      2.353±0.409      -3.669±0.829     -2.038±0.504     -1.423±0.311
Syl/Gap factor, Syls       0.411±0.199      0.547±0.152      0.463±0.173      0.773±0.363      0.303±0.212      -0.033±0.177     0.087±0.133
Syl/Gap factor, Gaps       -0.652±0.311     -0.587±0.171     -0.812±0.197     -0.933±0.219     -0.421±0.294     -0.120±0.225     -0.132±0.210
Transition Prob.*          0.730±0.098      0.989±0.028      0.986±0.018      0.996±0.007      0.080±0.064      0.047±0.030      0.003±0.000

Developmental changes in timing and sequencing. Reported values are median±standard error of the median, calculated from 200 bootstrapped samples. Duration changes are reported as percent change; all others are in the same units as the raw measures. * denotes significant change in that variable over development (65 to 365 dph; WSR p<.05). Heavy borders between corresponding syllable and gap cells denote significant group differences (Bonferroni corrected WSM, p<.05).
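
As an illustration of the bootstrapped standard error of the median mentioned in the table caption, the following is a minimal sketch and not the authors' code. It assumes resampling with replacement and takes the standard deviation of the resampled medians as the standard error; the function name, the seed, and the example durations are hypothetical and are not taken from the data.

```python
import random
import statistics


def bootstrap_median_se(values, n_boot=200, seed=0):
    """Return (median, bootstrap standard error of the median).

    Assumption: the standard error is estimated as the sample standard
    deviation of the medians of n_boot resamples drawn with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    medians = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        medians.append(statistics.median(resample))
    return statistics.median(values), statistics.stdev(medians)


# Hypothetical gap durations in msec (illustrative numbers only).
example_gaps = [55.2, 61.8, 58.4, 70.1, 63.3, 49.9, 66.7]
med, se = bootstrap_median_se(example_gaps)
print(f"median = {med:.3f} msec, bootstrap SE = {se:.3f} msec")
```

With only 200 resamples the estimate of the standard error is itself somewhat noisy, so repeated runs with different seeds would be expected to give slightly different values.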