Articles in PresS. J Neurophysiol (November 21, 2012). doi:10.1152/jn.00578.2012 1 2 3 4 5 DEVELOPMENT OF TEMPORAL STRUCTURE IN 6 ZEBRA FINCH SONG 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Christopher M. Glaze and Todd W. Troyer 23 Program in Neuroscience and Cognitive Science, Dept. of Psychology 24 University of Maryland, College Park, MD, USA 25 26 Corresponding Author: Christopher M. Glaze 27 Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA 28 29 E-mail: [email protected] Copyright © 2012 by the American Physiological Society. 30 Abstract 31 Zebra finch song has provided an excellent case study in the neural basis of sequence learning, 32 with a high degree of temporal precision and tight links with precisely timed bursting in forebrain 33 neurons. To examine the development of song timing, we measured four aspects of song 34 temporal structure at four age ranges between 65 and 375 days post hatch: the mean durations 35 of song syllables and the silent gaps between them, timing variability linked to song tempo, 36 timing variability expressed independently across syllables and gaps, and transition probabilities 37 between consecutive syllable pairs. We found substantial increases in song tempo between 65 38 and 85 days post hatch, due almost entirely to a shortening of gaps. We also found a decrease 39 in tempo variability, also specific to gaps. Both the magnitude of the increase in tempo and the 40 decrease in tempo variability were correlated on gap-by-gap basis with increases in the reliability 41 of corresponding syllable transitions. Syllables had no systematic increase in tempo or decrease 42 in tempo variability. In contrast to tempo parameters, both syllables and gaps showed an early 43 sharp reduction in independent variability followed by continued reductions over the first year. 44 The data suggest that links between syllable-based representations are strengthened during the 45 later parts of the traditional period of song learning, and that song rhythm continues to become 46 more regular throughout the first year of life. Similar learning patterns have been identified in 47 human sequence learning, suggesting a potentially rich area of comparative research. 48 Keywords: temporal structure, learning, birdsong, generative model 49 2 50 Introduction 51 Sequence learning is one of the touchstone questions in neuroscience (e.g., Lashley, 1951; 52 Hikosaka et al., 2002; Keele et al., 2003; Rhodes et al., 2004), and song learning in zebra 53 finches has served as an excellent model system for understanding sequence learning and 54 production. Adult songs are highly stereotyped, and have a well-defined temporal structure 55 spanning multiple time scales. In the zebra finch, song learning can be roughly divided into two 56 overlapping processes (Immelmann, 1969; Marler, 1970; Konishi, 1985; Brainard & Doupe, 57 2000): “sensory acquisition," in which a young bird ∼20-65 days post-hatch (dph) is exposed to 58 the song of one or more tutors and forms an auditory template; and “sensorimotor learning," in 59 which the juvenile ∼35-90 dph learns to produce song based on that template. Once learned, 60 songs consist of several repeats of 500-1000 msec long “motifs;" motifs consists of a sequence 61 of 3-7 “syllables," 50-250 msec-long stereotyped vocalizations separated by silent gaps. In adult 62 zebra finch song, the spectral features of individual song syllables are highly stereotyped, as are 63 the timing and sequencing of syllable production. 64 To date, most studies of song development have focused on how the spectral features of 65 song are learned (e.g., Tchernichovski et al., 2001; Deregnaucourt et al., 2005; Shank & 66 Margoliash, 2009; Charlesworth et al., 2011). Further electrophysiological studies have 67 demonstrated that basic spiking characteristics in premotor nucleus RA develop concomitantly 68 with spectral features (Shank & Margoliash, 2009; Ölveczky et al., 2011), and both song 69 features and the patterns of neural activity stabilize by around 90 dph (Ölveczky et al., 2011). 70 However, few studies have focused on how the overall temporal structure of song is learned. 71 There is converging evidence for syllable-based representations in the adult motor code (Cynx, 72 1990; Franz & Goller, 2002; Schmidt, 2003; Solis & Perkel, 2005; Glaze & Troyer, 2006; Cooper 73 & Goller, 2006; Andalman et al., 2011), begging the question of how they are formed, and what 3 74 role they play in song learning. There are indications that syllable-sequence variability during the 75 later stages of song development is sensitive to lesions of LMAN (the lateral magnocellular 76 nucleus of the anterior nidopallium), the output nucleus of a basal ganglia circuit involved in 77 song learning (Scharff & Nottebohn, 1991; Bottjer et al., 1984; Ölveczky et al., 2005). There is 78 also evidence for changes to learned song timing during both early development and young 79 adulthood that are similarly sensitive to perturbations to the basal ganglia circuit (Brainard & 80 Doupe, 2001; Goldberg & Fee, 2011). The gradual switch from random timing during the early 81 stages of song learning to more stereotyped timing reflects a switch in motor control from LMAN 82 to the song motor pathway (Aronov et al., 2011). However, it remains unclear whether there are 83 in fact distinct neural processes associated with learning syllable timing and transitions that are 84 different from learning the spectral features of individual syllables (Troyer & Doupe, 2000). 85 We previously investigated rendition-to-rendition variability in adult song timing, and found 86 several patterns that suggested how song representations are organized (Glaze & Troyer, 2006, 87 2007, 2012): First, song intervals share a common source of length variance that we term 88 “tempo variability," with syllables proportionally less sensitive to this variability than gaps. We 89 also quantified an “independent" component of timing variability that was uncorrelated across 90 syllables and gaps and may be linked with synaptic or neural noise in the pattern generator for 91 song. We have recently developed a statistical “timing variability model" that allows us to 92 separate out these variability components in a principled way (Glaze & Troyer, 2012). 93 To probe the development of the motor code for song timing, we applied the timing variability 94 model to analyze changes in the temporal structure of zebra finch song from 65 dph through 1 95 year of age. We found that between 65 and 85 dph, silent gaps decreased in both average 96 duration and their sensitivity to tempo changes while syllables did not. Over that same age 97 range, there were increases in the probability of syllable transitions that were consistent with the 98 adult motif, and those increases were correlated with the decreases in timing parameters gap- 99 by-gap. In contrast to tempo, independent variability declined for both syllables and gaps, and 4 100 was the only component of timing variability that consistently declined in magnitude across the 101 entire age range examined, decreasing by a factor of 2-3 through the first year of life. Overall, 102 the data suggest that the final phases of song learning involve the chunking of syllable-based 103 representations into a reliable sequence, with the sequence becoming progressively more 104 rhythmic throughout the first year of life. Similar learning patterns have been hypothesized for 105 mammalian sequence learning, suggesting a potentially rich area of comparative research in 106 birdsong (Graybiel, 1998; Sakai et al., 2004; Yin et al., 2009; Costa, 2011). 107 108 Materials and Methods 109 Analysis was based on the songs from 7 male birds, recorded between 65 and 375 dph. All care 110 and housing was approved by the institutional animal care and use committee at the University 111 of Maryland, College Park. All analysis was performed in Matlab (Mathworks, Natick, MA), with 112 all template matching and dynamic time warping algorithms written as C-MEX routines. 113 114 Song collection 115 Each bird was raised with his parents and clutch mates until 25-30 dph. Individual juveniles were 116 then housed with a single adult tutor (either the father or another adult male) in separate cages 117 within a common sound isolation chamber (Industrial Acoustics, Bronx, NY). At 150 dph, they 118 were released back into the aviary with other birds. Subsequent adult (365+ dph) recordings 119 were gathered by returning birds to a chamber, paired with one other adult male. We gathered 120 song during 4 age periods: 65-70, 85-90, 125-135, and 365-375 dph. 121 Chambers were equipped with two directional microphones (Pro 45; Audio-Technica, Stow, 122 Ohio). Data were recorded at 24,414.1 Hz and consisted of a series of “clips," periods of sound 5 123 separated by at least 10 msec of silence as determined by a sliding window amplitude algorithm. 124 Sound clips separated by <200 msec were included in the same “recording" and clip onset times 125 were indicated by filling the gaps between clips with zeros. Recordings were attributed to a 126 target bird if the total power was greatest for the microphone directed toward the side of the 127 recording chamber. 128 We gathered song during 4 age periods: 65-70, 85-90, 125-135, and 365-375 dph. We will 129 call a collection of songs from a given bird and a given age range a “bird-by-age sample." Since 130 songs continue to develop in subtle ways into young adulthood (e.g., Brainard & Doupe, 2001), 131 all song analysis treated each time period independently, e.g., song matching was based on 132 templates that were constructed for that period only. 133 134 Song selection and template matching 135 The main thrust of our analysis used our previously published ‘timing variability model’ 136 (Glaze and Troyer, 2012) to separate distinct components of temporal variability and track these 137 over development. The required input to the model is a series of repeated renditions of a single 138 stereotyped sequence of syllables. 139 toward identifying and selecting song syllables belonging to each bird’s main song motif. 140 Furthermore, previous studies have shown systematic timing slowing of motifs within a bout of 141 adult song (Chi and Margoliash, 2001; Pytte et al., 2007). Given the difficulty of accounting for 142 such within-song timing variability, particularly at the earliest ages, we restricted our timing 143 analysis the first complete motif in each song. As a result, our data selection methodology is geared 144 Recordings were initially analyzed using the log-amplitude of the fast-Fourier transform 145 (FFT) windowed with a 256-point (10.49 msec) Hamming window advanced in 128-point steps. 146 Frequency bins outside the 0.5-8.6 kHz range were excluded because song structure is less 147 reliable at the highest and lowest frequencies. To determine if the sound in these clips matched 6 148 syllables in the bird’s song, templates were formed by aligning and averaging 4-5 manually 149 chosen clips corresponding to each syllable in the repertoire for that bird-by-age sample; 150 exemplars were time-aligned by finding the peaks in a standard cross-correlation of syllable 151 spectrograms. All recorded clips were then matched against each syllable template using a 152 sliding algorithm (Glaze & Troyer, 2007). For each template and each time point (t) in the clip, a 153 match score, c(t), was computed as the reciprocal of the mean squared difference between 154 template and song log-amplitudes at matched time-frequency points: n 155 m c(t ) = n × m / ( s (i + t , j ) - s' (i, j )) 2 i =1 j =1 156 157 where s is the clip spectrogram, s' is the template spectrogram, n is the number of time bins in the template, m is the number of frequency bins, i indexes time and j indexes frequency. 158 The peak match score was determined as the maximum of c(t) over time alignments t. 159 Matches were only considered potentially valid if the peak match score exceeded an initial fixed 160 threshold of 0.5 (manually chosen based on visual inspection), and if the onset and offset for the 161 clip and template differed by 20 msec or less. If a clip had multiple syllable matches, the 162 template with the highest match score was chosen. This yielded a median of 14,720, range 163 1686−120,873 matches per syllable in each bird-by-age sample. 164 Template matching thresholds 165 After all clips were initially matched to a template, template-matching thresholds were 166 recalculated for each syllable, based on a simplified minimum error-rate classification scheme 167 (Duda et al., 2000): Match scores were grouped into 10 bins from the 0.5 minimum score 168 through a value of 1.5, with values greater than 1.5 placed in the last bin. A random sample of 169 20 clips per score-bin were then gathered. A few syllables which had particularly high average 170 match scores had fewer than 20 clips in the low match bins. In these cases, the clips were 171 grouped with the next highest bin. 7 172 Within each bin, each of the 20 clips was manually judged as being either correctly classified 173 or a false positive. Error rates from this process were extrapolated to all clips in each bin and an 174 optimal threshold was set to minimize the estimated combined number of false positives and 175 false negatives (in this case, correct matches below threshold) for the entire sample (thresholds 176 set by this method necessarily occur at bin boundaries). Across syllables, this yielded a median 177 1.8% false positive rate, range 0−34.2%, median 0.9% false negative, 0−34.3%. Approximately 178 86 and 87% of syllables respectively had false positive and rejection rates <10%. The highest 179 error rates occurred for brief, noisy syllables that resemble introductory notes; these templates 180 frequently yielded modestly high match scores even for vocalization not directly involved in 181 song, such as “tet” calls (Zann, 1996). Importantly, we found no systematic changes in error 182 rates by age. 183 Motif selection and transition probabilities 184 We considered a transition between two syllables to be valid if (1) both syllables had a match 185 score above optimal threshold, and (2) the time between the offset of one syllable and the onset 186 of the next was no more than 100 msec. We defined the “forward transition probability" between 187 syllables X and Y as the total number of valid X-Y transitions divided by the total number of valid 188 transitions from X to another syllable. Following a previous analysis of sequence variability (e. g. 189 Foster & Bottjer, 2001; Kao & Brainard, 2006), we excluded song stops (i.e. transitions to 190 silence) from all probabilities. This analysis yields a transition probability for each pair of 191 syllables in each bird-by-age sample. 192 Further timing analysis relied on the application of our timing variability model, and was 193 confined exclusively to syllables and gaps contained within each bird’s main song motif, defined 194 as the most common sequence of song syllables sung at one-year of age. Importantly, we 195 restricted analysis to the first motif of each song recording. Across samples, 1972−14,057 196 recordings were made (median 7132); of these 12.6−80.4% (median 41.9%) contained at least 8 197 one motif, yielding 439−9927 recordings with song. All of the bird-by-age samples with <33% 198 matched recordings came from birds in the first 3 age groups (between the ages of 65 and 135 199 dph) that produced sequences of vocalizations consisting of frequently interrupted portions of 200 song and/or long sequences of calls. 201 Given the sensitivity of our timing analysis, we further screened the data by omitting any 202 sequence containing a song intervals (syllables and silent gaps) whose length was more than 5 203 standard deviations from the mean of the length distribution for that interval. The vast majority of 204 these outliers could be attributed to acoustic interference from other noises in the case (e.g., 205 wing flaps, vocalization from the tutor). We omitted a median 4.4% bird-by-age sequences for 206 this reason, range 0.5−25.6%. Without omitting outliers, we have found no significant change in 207 basic statistics such as mean length and change in standard deviation across age periods; 208 however, the accuracy of our statistical modeling may be sensitive to the presence of these 209 outliers. 210 Across birds-by-age samples, the final sample size was median a 3390, range 352-9879. 211 Across the song motifs of all 7 birds, the sample included a total of 33 unique syllables and 26 212 gaps of silence. 213 214 Song timing calculations 215 Timing analysis used a more fine-grained algorithm and was restricted to syllables found within 216 identified motifs. Fine-grained spectrograms were recalculated for all syllables using FFTs with a 217 128-point window slid forward in 4-point steps, yielding 0.16 msec time bins. Although previous 218 research (Glaze & Troyer, 2006, 2007) used log-amplitudes, here we used raw amplitudes 219 which we have found to be more reliable. The resulting spectrograms were then smoothed in 220 time with a 64-point window formed by truncating a Gaussian with a 25.6-point (∼5 msec) 9 221 standard deviation. Time derivative spectrograms (TDSs), calculated as differences in amplitude 222 in time-adjacent bins, were used in the rest of the analysis. 223 Distinct syllable templates were constructed for each bird-by-age sample. In the first step, we 224 applied the same sliding algorithm used previously to the fine-grained TDSs. For each syllable, 225 we selected a random sample of 200 example TDSs. Using the sample with the highest (coarse- 226 grained) template-matching score as an initial template, each of the 200 TDSs were aligned to 227 this sample using our sliding algorithm. An updated template was formed by averaging the 228 aligned TDSs, and the process of alignment followed by averaging process was repeated. 229 In a second step, we repeated this strategy of template refinement by aligning TDSs to the 230 template using a custom dynamic time-warping (DTW) algorithm (Glaze & Troyer, 2007). The 231 final template was based on the average of these warped syllable TDSs. This final DTW step 232 proved to be crucial for the younger developmental periods (65-85 dph), in which timing within 233 syllables is quite variable. 234 Measuring the variability of interval lengths requires that interval boundaries be identified in a 235 consistent manner across syllable renditions. To accomplish this, we manually marked “events" 236 within the TDS templates, and then used DTW to determine the precise times at which these 237 events were observed in individual syllables. Syllable onsets/offsets were generally identified as 238 salient time-derivative peaks/troughs in the template TDS. These correspond to peaks in the rise 239 and fall of energy at the beginning and ends of syllables, and result in syllables that are slightly 240 shorter than would be expected from measurements based on exceeding a threshold of power. 241 However, the differences are generally small and we find that our method gives much more 242 reliable measurements. 243 Markers were first identified manually in templates extracted from oldest age period (365-375 244 dph). To define template syllable onsets/offsets at earlier ages using the same events chosen 245 for 365-370 dph, we used DTW to map syllable templates from younger ages onto the 365 dph 246 templates. 10 247 Timing variability model 248 Our analysis focused on tracking distinct patterns of timing variation through development. 249 These patterns of timing cannot be measured directly, but are ‘latent’ and are extracted by fitting 250 the ‘timing variability model’ to the data, a form of factor analysis explicity tailored to the analysis 251 of timing in action sequences (Glaze & Troyer, 2012). The model fitting procedure can be 252 viewed as decomposing the matrix of length covariances into the sum of several components 253 (Fig. 1). The model extracts four basic components of covariance. First is the “independent 254 variability” that is unique to each interval. This contributes to the individual variance of each 255 interval, found along the diagonal of the covariance matrix. Second, the model estimates the 256 jitter in the onset/offset border between adjacent intervals. Such jitter contributes to the variance 257 in the length of each interval, but also leads to a negative covariance between adjacet intervals. 258 These two components of timing variance are ‘local’ in the sense that the variations in each 259 interval and each onset/offset border are assumed to be governed by a distinct stochastic 260 variable. 261 The third and fourth timing components are each assumed to be governed by a single 262 stochastic variable, but that variable has a “global” influence spread across all intervals in the 263 sequence. 264 captured by a weight Wk, and each factor will contribute WkWj to the covariance between interval 265 k and interval j. In our analysis, we used 2 global factors. For the first factor, the weightings to all 266 intervals in the motif have the same sign. So as the motif lengthens and shortens, all intervals 267 lengthen or shorten together, although not by the same amount. For this reason, we will refer to 268 the first global factor as the tempo factor and the weighting for each interval as the tempo 269 weight. The magnitude and direction of the effect of such a global factor interval k is 270 For the second global factor, most of the time (but not always) syllables and gaps have 271 weightings of opposite sign. We will refer to this factor as the syl/gap factor. Such a factor will 11 272 change the lengths of syllables and gaps across the song in opposite directions, and will 273 contribute a ‘checker board’ pattern of covariances. We chose a sign convention where the 274 syl/gap weights for most syllables were positive, and the weights for most gaps were negative. 275 Errors in the measurements of syllable and onset and offset will contribute to the jitter 276 component. Therefore, we focused our analysis on the other three factors: indpendent, tempo 277 and syl/gap timing variability. Jitter is mostly estimated to improve the estimates of the other 278 components of timing variability. Because the estimate of jitter depends on knowing the length of 279 the two surrounding intervals, it is impossible for our algorithm to estimate the jitter in the onset 280 of the first syllable and the offset of the last syllable in the motif. As a result, this jitter must be 281 captured by the other factors in the model. We found no systematic differences in the 282 developmental trends for variability estimates for these syllables, and therefore we lumped them 283 with other syllables. Finally, while both jitter and the syl/gap factor contribute to negative 284 covariances between syllables and gaps, jitter acts locally and for any given song can act to 285 lengthen one syllable and shorten another. 286 287 Model details and data fitting 288 For each interval we calculated the mean length of that interval. To analyze timing variability, 289 we subtract off the mean length and write the deviation from the mean of the kth interval in 290 sequence n as xkn. We separate this variability into a sum of components: 291 −u k −1,n + u kn x kn = Wk1z1n + Wk 2 z 2 n + ηkn + u1n −u K −1,n 1< k < K k =1 k=K 292 Here, z1n is a zero-mean, unit-variance latent variable representing tempo while Wk1 is the 293 sensitivity of interval k to that variable, while z2n and Wk2 are the respective latent variables and 294 sensitivities for the syl/gap factor. The third term, ηkn represents the independent variable, and is 12 295 zero-mean but has a non-unitary variance, while uk-1,n and ukn represent jitter in the onset and 296 offsets of the interval and are similarly zero-mean with non-unitary variance. It is important to 297 note that the tempo and syl/gap variables (z1 and z2) come naturally from a single, 2×N latent 298 variable z that is shared by all song intervals, while the respective weights W1 and W2 are the 299 columns of a single K×2 weight matrix W. Furthermore, we can rewrite the timing jitter variables 300 with a single vector u multiplied by the K×(K−1) differencing matrix D, which has ones along the 301 diagonal (Dk,k=1 since boundary k causes deviations in the offset of interval k), negative ones 302 along the subdiagonal (Dk,k-1=-1 since boundary k−1 causes deviations in the onset of interval k), 303 and 0 elsewhere. 304 With these two substitutions we now have: x n = Wz n + ηn + Du n 305 306 We fit parameters to the data using an expectation-maximization (EM) algorithm described 307 elsewhere (Glaze & Troyer, 2012). The algorithm fits parameters to the measured covariance 308 matrix S, so we now write the parameters in matrix form. Using Ω to denote a diagonal matrix of 309 jitter variance and Ψ to denote the diagonal matrix of independent variability, the data 310 covariance can now be written as: 311 S = WW T + Ψ + DΩDT 312 Once parameters were fit using EM, we transformed W so that the tempo weights (1st 313 column) captured all of the accumulating variability shared by song intervals. We then 314 determined interval variability due to the tempo and syl/gap factors along the 1st and 2nd 315 columns of W respectively. These values were in units of msec, and may be thought of as 316 standard deviations with respect to those factors. We determined variance due to the 317 independent factor along the diagonal Ψ, and to keep units consistent with the other variables, 318 computed the square root to yield an effective standard deviation with respect to that factor. 13 319 For each bird-by-age sample we ran the EM algorithm from 100 randomly drawn initial 320 parameter estimates (Glaze & Troyer, 2012). To assess model fits we computed the 321 standardized root mean-squared residual (SRMR; Glaze & Troyer, 2012; Hu & Bentler, 1998). 322 This measure is the average difference between the pairwise correlations in the data and those 323 predicted by the model covariance matrix. Across samples at 65-70, 85-90, 125-130 and 365- 324 370 dph, respective median SRMR (± median absolute deviation) was 0.027±0.015, 325 0.015±0.015, 0.015±0.009 and 0.015±0.010. While median SRMR appeared higher at 65 dph 326 than at other ages, this trend held for only 3 of 7 birds. Thus, the model fit a large majority of 327 birdsongs in our sample reasonably well, and there did not appear to be any reliable trend by 328 age for fits to get better or worse. 329 330 Reported statistics 331 To test for significant age-dependent changes in a given parameter we evaluated the 332 difference between 365-375 and 65-70 dph using the non-parametric Wilcoxon signed-rank test, 333 which we abbreviate throughout as “WSR". Tests for parameter differences between syllables 334 and gaps were performed at each age range were tested using the Wilcoxon rank-sum test, 335 abbreviated as “WSM", Bonferroni corrected for 4 repeated comparisons. For relationships 336 among parameters themselves, we used Spearman’s correlation. Throughout we report 337 parameter distributions using medians±standard errors, where we have computed the standard 338 error of the median from 200 bootstrapped samples per parameter. 339 340 Results 14 341 We analyzed the development of temporal structure in zebra finch song in 7 males from four 342 time periods spanning the later stages of juvenile song learning through the first year of life: 65- 343 70 dph, 85-90 dph, 125-135 dph and 365-375 dph. (For convenience, we will denote each 344 period using the first age only.) Across birds we analyzed a total of 33 unique syllables and 26 345 gaps of silence. Final bird-by-age samples contained 414-9876 sequences that contained at 346 least one full motif. We used our previously published timing variability model (Fig 1; Glaze & 347 Troyer, 2012) to track three different sources of timing variability across development. Previous 348 studies have shown important differences in the timing of syllables and intersyllable gaps (Glaze 349 and Troyer, 2006; Cooper and Goller, 2006; Andalman et al., 2011), with syllables generally 350 being longer but less variable as measured by the coefficient of variation (CV), equal to the 351 standard deviation divided by the mean Therefore we analyzed syllables and gaps separately. 352 Our main findings concerning the development of individual parameters are contained in Table 353 1. 354 355 Changes to average tempo are found in gaps 356 We began by confirming previous reports that average song tempo increases systematically as 357 a function of age (Lombardino and Nottebohm, 2000; Brainard & Doupe, 2001; Pytte et al., 358 2007). In fact, between 65 and 365 dph, motif length decreased by a median 4.11±3.36%,(WSR, 359 p=.0313) with the trend holding across 5 of 7 birds and the change ranging from -18.94-3.14%. 360 Over half of this decrease appeared to be explained by changes between 65 and 85 dph, with 361 motif length decreasing over that period in all 7 birds by a median 2.68±0.75% (range -16.52- 362 1.75%). 363 Given the distinction between syllable- and gap-based timing in adult song, we investigated 364 the extent to which these developmental changes occurred across both types of time intervals 15 365 (Fig. 2). On average, the dominant increases in tempo occurred during gaps (Fig. 2D). Mean 366 gap length decreased in 22 out of the 26 gaps, with a median decrease of -10.94±2.81% (WSR, 367 p<3.96x10-4). Calculated per bird, median gap length decreased in 6 of 7 birds. As with motif 368 duration, more than half of this change occurred between 65 and 85 dph, during which gaps 369 decreased in duration by 6.10±2.90%, vs 5.67±2.43% 85-365 dph. 370 371 In contrast to gaps, syllables as a whole failed to show a consistent pattern of length changes, with changes in length of 2.13±2.26% from 65-365 dph (WSR, p=0.851). 372 373 Changes to tempo variability occur in gaps 374 We next asked whether the variability in tempo also decreased over late development. For each 375 bird-by-age sample of songs, we used the timing variability model to extract tempo weights that 376 measure tempo variability for each interval. Interestingly, we found a pattern similar to the 377 development of average tempo: silent gaps selectively decreased in tempo variability between 378 65 dph and 365 dph while syllables did not (Fig. 2E). Median gap tempo variability was 379 1.265±0.230 msec during the former period and 0.943±0.239 msec during the latter, with 380 average decreases over that period of 0.613±0.337 msec (WSR, p=0.0107), more than one 381 third of the original 65 dph values. As with average tempo, the bulk of this change occurred 382 between 65-85 dph, with tempo variability decreasing by 0.225±0.157 msec over that age range, 383 vs. 0.094±0.173 msec 85-365 dph. 384 In contrast to gaps, syllable tempo variability failed to show significant decreases over late 385 development, with respective medians of 0.334±0.313 msec and 0.954±0.163 msec at 65 and 386 365 dph, and no significant change across the age range (WSR, p=0.339). 387 It is natural to expect that longer intervals will show more variability in their length. Indeed, 388 across all interval types we found positive Spearman’s correlations between average length and 16 389 global weight of 0.450, 0.626, 0.608 and 0.703 at 65, 85, 125 and 365 dph respectively. These 390 values were significantly positive over all four periods (Bonferroni corrected p<0.005 in all 391 cases). To control for changes in average length when examining changes in timing variability, 392 we divided tempo weights by mean length to derive a tempo-based coefficient of variation (CV; 393 Glaze & Troyer, 2012); for example, a tempo CV of 1% would mean that the given time interval 394 varies with respect to tempo by 1% of its average duration. We found that tempo CV also 395 decreased among gaps between 65 dph and 365 dph (Fig. 2F) from 2.013±0.487% to 396 1.317±0.163%, with the difference reaching statistical significance (median decrease of 397 0.752±0.493%; WSR, p=0.0124). 398 Previous data in adult song have indicated that gaps are proportionally more sensitive to 399 tempo changes than syllables, i.e. that gaps stretch and compress with changes to song 400 duration more than syllables after dividing out mean duration (Glaze & Troyer, 2006) and have a 401 higher tempo CV (Glaze & Troyer, 2012). Given that gaps decrease in tempo average and CV 402 over development, we expected that gaps would have higher tempo CV at all ages examined. 403 This was indeed the case (Fig. 5), although given the relatively limited size of our data only data 404 from 65 and 125 reached statistical significance (WSM, Bonferroni corrected p<0.05). 405 406 Independent variability decreases for both syllables and gaps 407 In contrast to tempo variability, independent variability decreased across both syllables and 408 gaps between 65 and 365 dph (Fig. 3), decreasing from 3.790±0.448 msec to 1.516±0.199 409 msec among gaps and from 2.629±0.412 to 1.179±0.245 msec among syllables, with respective 410 median decreases of 2.447±0.364 and 0.970±0.299 msec (WSR, p<0.0001 in both cases). For 411 both syllables and gaps, most of this change occurred 65-85 dph. As with tempo, independent 412 timing variability scales with average interval duration in adults (Glaze & Troyer, 2012). This 17 413 relationship also held across all developmental periods, with Spearman’s correlation ranging 414 from 0.291 to 0.450 (Bonferroni corrected p<0.05 in all cases except at 365 dph, where 415 uncorrected p=0.026). Therefore, we examined independent CV, as we did for tempo, by 416 dividing independent variability by mean duration and examined whether that measure also 417 changed over development. Independent CV decreased significantly between across the first 418 year of life for both gaps and syllables, from 6.251±0.643 to 2.353± 0.409% among gaps and 419 3.286±0.398 to 1.196±0.324% among syllables (WSR, p<0.0001 in both cases). As with tempo, 420 gaps tended to have higher independent CVs than syllables across the age ranges examined, 421 although the difference failed to reach significance for 365 dph (WSM, Bonferroni corrected 422 p<0.05). 423 424 Syl/gap factor does not change over late development 425 Previous analysis of adult songs revealed a second global timing factor shared across the song 426 in which syllables generally have positive weights and gaps have negative weights (Glaze & 427 Troyer, 2012). The result is a timing component which induces a negative covariation between 428 syllable and gap durations, independently of the other factors reported thus far. We asked 429 whether this syl/gap factor was present across development, and if so, whether it changed at all. 430 Indeed, during all four developmental periods, syllable weights tended to be positive while gap 431 weights tended to be negative (Fig. 4). The differences between syllables and gaps were 432 significant for all four periods (WSM, Bonferroni corrected p<0.0005). Furthermore, we found no 433 significance changes in factor weight among either gaps of syllables 65-365 dph, with median 434 changes of -0.421±0.294 and 0.303±0.212 msec (WSR, p=0.201 and 0.237). 435 436 Links between changes in gap timing and syllable transition probabilities 18 437 We have shown that gaps, defined as the period of silence between two syllables in the motif, 438 become shorter and less variable over development with changes in both tempo and 439 independent variability disproportionately occurring between 65 and 85 dph. To investigate 440 whether these changes in timing might correspond to a tighter link between those syllables, we 441 examined developmental changes in the transition probabilities for each consecutive syllable 442 pair in the motif. Transition probabilities showed a developmental trajectory that was similar to 443 timing parameters (Fig. 5). Specifically, we found significant increases 65-365 dph, from a 444 median of 73.0±9.8% to 99.6±0.7% (WSR, p<0.0001). Of the 20 gaps with transition 445 probabilities <95% at 65 dph, 18 increased, and 12 by over 10%. Qualitatively, increases 446 appeared to reflect jumps to a probability near 1, irrespective of what those probabilities were at 447 65 dph (Fig. 5A). As with changes to timing parameters that showed significant development, 448 most of this change occurred 65-85 dph, with transition probabilities increasing by 4.7±3.0%, vs. 449 0.3±0.0% 85-365 dph. 450 We next asked whether, on a gap-by-gap basis, the increases in transition probabilities from 451 65 to 85 dph were linked with changes in the timing parameters that showed significant 452 differences across development (Fig. 6), specifically average duration, tempo CV and 453 independent CV. Indeed, we found that increases in transition probability were correlated with 454 decreases in duration and tempo CV, with respective Spearman’s correlations of -0.557, and - 455 0.342 (Bonferroni corrected p<0.01 in both cases). The correlation between the increase in 456 transition probability and independent CV was also negative, although very weak (Spearman’s 457 correlation = -0.136, p=.506). 458 459 Discussion 19 460 We have investigated the development of zebra finch rhythmic stereotypy from the late plastic 461 period through 1 year of age. We collected hundreds to thousands of songs from each of four 462 age periods, and measured song timing and syllable transition probabilities within each age 463 period to probe how the motor code changes over this age range. 464 We found increases in song tempo that occur almost entirely during the gaps of silence 465 between syllables. Most of the tempo increase occurs 65-85 dph, and on a gap-by-gap basis 466 increased speed is linked with increases in the reliability of corresponding syllable transitions. 467 Increases in transition probability between 65-85 dph are also linked with decreases in tempo 468 variability. For syllables, we found no systematic increase in tempo or decrease in tempo 469 variability. But as with gaps, syllables showed a sharp reduction in independent variability 470 between 65 and 85 dph followed by continued reductions over the first year of life. 471 Overall, the data suggest a phase of late development in which song production becomes 472 faster and more automated. Such a process of motor consolidation has been previously 473 suggested for zebra finch song based on other data (Brainard & Doupe, 2001). Interestingly, one 474 recent study has indicated opposite changes in Bengalese finch song timing in the later years of 475 life, i.e. that songs slow down, mostly due to increases in gap duration (Cooper et al., 2012). 476 477 Syllable lengths 478 We found that gaps, but not syllables, shortened over the first year of life. Brainard and Doupe 479 (2001) reported that a shortening of both syllables and gaps in young birds (>100dph), with the 480 fractional shortening of syllables roughtly half as much as for gaps. However, only the syllable 481 changes reached statstical significance as gap changes were more variable. Pytte et al. (2007) 482 found a significant shortening of syllables over the first year and a half of life, but no overall 483 shortening of gaps. It is unclear what accounts for the descrepancies between the three studies. 484 One point of difference is our use of a reliable peak in the amplitude derivatives to measure 20 485 syllable durations, rather than threshold crossing of an amplitude threshold. Syllables sometimes 486 have significant amplitude extending past the onset of inspiration (Andalman et al., 2011); it is 487 likely such low amplitude “tails” of a syllable would be included in our measure of gap rather than 488 syllable length. If the greater coordination between syllable production and respiration that 489 occurs over development reduces these low amplitude portions of syllables, this might lead to 490 the shortening of syllable length as measured in previous studies without having much effect on 491 syllable length as measured here. 492 493 The chaining hypothesis 494 The most common view of sequence generation is that of a chain with neurons active at one 495 point in the sequence, exciting the neurons active at the next point in the sequence, and so on. 496 Previous modeling work has shown that the combination of spike timing dependent plasticity and 497 synaptic pruning can lead to the development of chain-like patterns of activity in initially 498 randomly connected networks (Jun & Jin, 2007). Under such a scenario, one might expect that 499 the strength of connections along the chain to be correlated with several behavioral parameters. 500 First, stronger connections would decrease the latency to activity in the next link in the chain, 501 leading to a positive correlation between synaptic strength and tempo. Second, stronger 502 synaptic connections would be associated with more reliable propagation along the chain, and 503 hence less subject to extraneous “noisy" inputs, yielding lower timing variability. 504 Behavioral data from adult birds are consistent with a version of this hypothesis in which the 505 parts of the chain corresponding to gaps remain more weakly connected than the parts of the 506 chain corresponding to syllables. First, birds startled by flashes of light interrupt their songs at 507 transitions between syllables (Cynx, 1990), consistent with a greater sensitivity to perturbation 508 for the more weakly connected gap portions of the chain. Timing variability is greater for gaps 509 than for syllables (Cooper and Goller, 2006; Andalman et al. 2011) both in the tempo and 21 510 independent components of variability (Glaze & Troyer, 2006, 2012), consistent with less 511 sensitivity to outside perturbation for the syllable portions of the chain. 512 The chaining hypothesis has been extended to songs with variable sequencing, by 513 supposing that each syllable corresponds to a tightly linked sub-chain, and neurons at the end of 514 chain corresponding to one syllable can synapse onto neurons at the head of the chain for more 515 than one subsequent syllable; mutual inhibition from local interneurons then implements a 516 winner-take-all competition between chains ensuring that only one syllable-based chain could be 517 active at any given time (Jin, 2009). Under this hypothesis, transition probability is correlated 518 with connection strength, since a chain receiving stronger input would be more likely to emerge 519 as the winner in a winner-take-all competition. During late plastic song (~65 dph), these 520 mechanisms may have created a loosely organized network of syllable-based chains capable of 521 producing a small variety of song sequences with a bias consistent with the ordering in final 522 motif. Over time, spike timing dependent plasticity would favor the more likely transitions, and 523 these could completely “win out" by adulthood. 524 We have found that increases in transition probability are correlated with decreases in gap 525 duration and tempo CV on a gap-by-gap basis. Correlations with independent CV are also 526 negative, although the correlation is weak. Negative correlations between transition probability 527 and duration and timing variability are consistent with the chaining hypothesis in that increases 528 in transition probability and decreases in timing parameters would result from increases in 529 synaptic strength over development. This pattern is also consistent with data in Bengalese 530 finches showing that the ability of auditory feedback perturbations to alter both the stereotypy of 531 song sequence and the latency to the next syllable is smaller when perturbations are introduced 532 at points in the song with more regular sequencing (Sakata and Brainard, 2006, 2009). 533 534 HVC as the chaining nucleus 22 535 Previous electrophysiolgical recordings have shown that premotor neurons in song 536 nucleus HVC (used as proper name) produce short bursts of precisely timed action potentials, 537 both during the syllables and during the gaps (Hahnloser et al., 2002). Further correlational 538 evidence suggests that inputs from HVC control burst timing in RA (Fee et al., 2004). Cooling 539 HVC leads to a slowing of song tempo, whereas cooling the downstream nucleus RA does not 540 (Long and Fee, 2008). These results, along with intracellular recordings in HVC neurons during 541 singing (Long et al. 2010), suggest that HVC supports chain-like bursting that controls timing of 542 song behavior. This hypothesis is challenged by tight synchronization of both left and right HVC, 543 despite the lack of a corpus callosum in birds (Schmidt, 2003). The most likely source of a 544 synchronization signal is from bilateral connections from the midbrain and brainstem targets of 545 the song motor pathway, that eventually feedback to HVC (Schmidt et al. 2004; Ashmore et al. 546 2005, 2006; Andalman et al, 2011). 547 Chaining in Bengalese finches 548 If true, the chaining hypothesis could also have one of several implications for changes to 549 song in older Bengalese finch birds (Cooper et al., 2012): The slowing of tempo could reflect a 550 gradual weakening of synapses, although this would also imply a similar correlation between 551 increases in gap duration and weakening of associated transition probabilities. That study did 552 find decreases in syllable repetition rate, although a specific correlation with gap duration was 553 not reported for those transitions. Furthermore, one other modeling study of Bengalese finch 554 song has suggested that syllable repetitions are controlled by a process (adaptation) that is 555 separate from what controls other syllable transitions (Jin & Kozhevnikov, 2011). On this basis 556 one may not expect the transition-timing correlation among syllable repeats per se, but rather 557 among other transitions; such an expectation implies other changes to transition structure in 558 aging Bengalese finch song. Alternatively, the slowing of tempo could reflect changes to 559 influences on tempo such as brain temperature (Long & Fee, 2008), rather than network 23 560 parameters such as synaptic strength; under this scenario there would not necessarily be 561 changes to transition structure outside the decrease in repetitions. 562 563 Challenges to a chaining explaination 564 While increased synaptic strength is an obvious hypothesis for explaining developmental 565 changes, it cannot alone account for the changes in timing variability we have found. First, while 566 average tempo increases among gaps only, decreases in independent timing variability occur 567 across both syllables and gaps. This contradicts one basic prediction of the chaining model, i.e. 568 that changes in chain speed are coupled with sensitivity to external sources of timing variability. 569 How can syllable tempo remain the same while independent timing variability decreases? 570 Second, increases in synaptic strength would be expected to decrease the sensitivity of song 571 intervals to perturbations from all possible noise sources, including neuromodulatory 572 mechanisms that induce shared, global timing variability as well as background noise and inputs 573 from areas of the song system that might contribute to independent variability. Why are the 574 decreases in indpendent variability so much greater than the decreases in tempo variability? 575 One mechanism that could reconcile these differences would be a developmental increase in 576 synaptic strength for gap portions of the chain coupled with an overall decrease in inputs from 577 circuits that contribute to independent timing variability for both syllables and gaps. The former 578 would account for changes in transition probability, mean length, and tempo variability for gaps, 579 while the latter would contribute to reductions in independent variability for both syllable and 580 gaps. Under such an hypothesis, the decreases in independent variability for gaps that come 581 from external sources would act to mask any dependence of independent variability on transition 582 probability. 583 One possibilty for the source of input contributing to independent variability is LMAN. This 584 nucleus is the output nucleus of the cortical-basal ganglia loop for song, and has been strongly 24 585 implicated in song learning (Bottjer et al., 1984; Scharff & Nottebohn, 1991). LMAN produces 586 reliable burst sequences with trial-to-trial timing variability (Hessler & Doupe, 1999; Kao et al., 587 2008), while LMAN inactivation in developing birds yields adult-like stereotypy in the activity 588 premotor nucleus RA (robust nucleus of the arcopallium) during singing (Ölveczky et al., 2011). 589 Based on experiments combining behavioral or pharmacological manipulations of LMAN activity, 590 it has been proposed that LMAN is involved in generating the rendition-to-rendition variability 591 necessary for trial and error learning (Kao & Brainard, 2006; Kao et al., 2005; Ölveczky et al., 592 2005; Stepanek & Doupe, 2010; Charlesworth et al., 2012). Behavioral studies in developing 593 birds have suggested that song variability driven by LMAN activity might be directed to more 594 poorly learned elements of the song (Ravbar et al., 2012). Finally, while LMAN influences song 595 variability throughout a bird’s lifetime, the degree of this influence has been shown to 596 progressively decline over development (Aronov et al., 2011), including the same age range we 597 have investigated here (Brainard & Doupe, 2001). 598 The hypothesis that LMAN also contributes to timing variability is consistent with previous 599 studies showing that inactivation of LMAN reduces variability in both syllable and gap durations 600 (Thompson et al., 2011; Goldberg and Fee, 2011). At the level of song motifs, Thompson et al. 601 (2011) found a reduction in timing variability after LMAN lesions, but this reduction failed to 602 reach statistical significance. One possible explanation is that tempo variability, since it is shared 603 across the song, dominates variations in motif duration, whereas the tempo and independent 604 components make roughly equal contributions to timing variability for individual syllables or gaps 605 (Glaze and Troyer, 2012). As a result, the hypothesized reduction in independent variability 606 after LMAN inactivation would be expected to cause relatively smaller reductions in motif 607 duration versus that seen at the level of individual syllables or gaps. 608 If the hypothesis that LMAN contributes to timing variability is paired with the hypothesis that 609 song timing is controlled in HVC, then LMAN must have a way to influence activity in HVC. 610 Given the absence of direct connections from LMAN to HVC, this influence must be carried by 25 611 the possible routes from from RA, the premotor nucleus that receives input from LMAN, back to 612 HVC. These include direct synaptic connections from RA back to HVC (Roberts et al. 2008), 613 projections from RA to the medial portion of the dorsolateral thalamus (DMP) to medial MAN 614 (MMAN) and back to HVC (Vates et al. 1997), or through the circuits mentioned above that loop 615 through the midbrain and brainstem (Schmidt et al. 2004). 616 An alternative mechanism that might reconcile differences in developmental timing changes 617 for syllables and gaps would be a balancing of increased excitatory synaptic strength with 618 commensurate increases in inhibitory input during syllables. In adults, inhibitory HVC 619 interneurons have been demonstrated to be more active during syllable-based activity than for 620 gaps (Kozhevnikov & Fee, 2007), and it is possible that tempo increases expected from a 621 strengthening of excitatory synapses over development are offset by increased inhibitory 622 feedback during syllables. However, this mechanism does not explain the differences in tempo 623 CV for syllables vs. gaps. Furthermore, increased inhibitory feedback might also suppress local 624 timing variability in timing, leading to a greater decrease in independent CV for syllables than 625 gaps, which we do not find. 626 Other possible influences on the development of song timing 627 The declining influence of LMAN at later stages in development may be a continuation of an 628 switch from an early mode of vocalization known as subsong, in which young birds less than 629 approximately 45 dph produce long rambling vocalizations without a clear segmentation into 630 syllables. Subsong is known to be primarily driven by LMAN inputs to RA with inputs from HVC 631 becoming more important as birds start to produce early versions of song syllables (Aronov et al. 632 2008, 2011). During these earlier stages of learning ( 35-65 dph), lesions to the thalamic 633 nucleus DLM (medial portion of the dorsolateral thalamus), which sends afferents to LMAN, 634 increase the temporal stereotypy and overall rhythmicity of these early song elements (Goldberg 635 & Fee, 2011). 26 636 During this early period, birds are also learning to coordinate their breathing patterns with 637 syllable production, wth long expiratory pulses increasingly punctuated by short inspirations that 638 eventually closely coincide with intersyllable gaps (Veit et al. 2011). 639 shortening of gaps during the later period of development analyzed here may result from 640 maturation of that respiratory and syringeal musculature, allowing birds to replenish their air 641 supply with shorter inspirations. However, this mechanism does not explain the link between the 642 variability in gap duration and sequence stereotypy. It is possible that the 643 New neurons are added to the song nucleus HVC throughout a zebra finch’s life, but the rate 644 of new neuron incorporation declines precipitously over the first two years after song 645 crystalization (Wang et al., 2002), During this period, spectral features of song syllables become 646 increasingly stereotyped, and it is possible that the decline in new neuron incorporation may be 647 linked to changes in temporal stereotypy (Pytte et al., 2007). In particular, the incorporation of 648 new neurons might contribute to the decline in indpendent timing variability for both syllables 649 and gaps mesured here. 650 Previous experiments have shown that song tempo is positively correlated with brain 651 temperature (Long and Fee, 2008; Andalman et al., 2011) and that changes in temperature may 652 explain the increases in song tempo seen when male zebra finches direct their song to females 653 (Aronov and Fee, 2012). It is possible that developmental changes in the ability to regulate brain 654 temperature may contribute to increased tempo over development. Indeed, Long and Fee 655 (2008) reported that gap lengths are more sensitive to changes in temperature than syllable 656 lengths, although the syllable/gap difference was much smaller than for the developmental 657 changes in duration reported here. 658 659 Consolidation of sequencing and timing in other systems 660 Overall, the data suggest a phase of development in which syllables are eventually consolidated 661 into a longer, precisely-timed chaining mechanism that can run from beginning to end in a highly 662 reliable order. A similar process of linking simpler chains to form more functional activity patterns 663 has been proposed for neocortex (Bienenstock, 1995). Such a linking may be viewed from 664 another perspective as chunking actions (in this case, syllables) into units of behavior (motifs) 27 665 that are less susceptible to neural noise and interference from the environment. Such a process 666 has been proposed and investigated for mammalian action learning (Graybiel, 1998; Yin et al., 667 2009; Costa, 2011), with faster and more stereotyped timing as a common characteristic of 668 these learned behaviors (Sakai et al., 2004; Costa, 2011). It would thus be interesting to 669 investigate whether similar timing components and developmental patterns are at play across 670 other systems. 671 672 Acknowledgments 673 TWT is now at the Department of Biology, University of Texas, San Antonio. 674 675 Grants 676 The work was partially supported by NIH grant 5R21MH 066047-02, a Sloan Research 677 Fellowship to TWT and a fellowship for CMG from the Center for the Comparative Biology and 678 Evolution of Hearing (C-CEBH). 679 680 Author Contributions 681 All data were collected by CMG, data were analyzed by both CMG and TWT. 682 683 28 684 References 685 686 Andalman AS, Foerster JN, Fee MS. Control of vocal and respiratory patterns in birdsong: 687 dissection of forebrain and brainstem mechanisms using temperature. PLoS ONE 6: e25461, 688 2011. 689 Arnold AP. The effects of castration on song development in zebra finches (Poephila guttata). 690 J. Exp. Zool. 191: 261–278, 1975. 691 Aronov D, Andalman AS, Fee MS. A specialized forebrain circuit for vocal babbling in the 692 juvenile songbird. Science 320: 630–634, 2008. 693 Aronov D, Fee MS. Natural Changes in Brain Temperature Underlie Variations in Song Tempo 694 during a Mating Behavior. PLoS ONE 7: e47856, 2012. 695 Aronov D, Veit L, Goldberg JH, Fee MS. Two distinct modes of forebrain circuit dynamics 696 underlie temporal patterning in the vocalizations of young songbirds. J. Neurosci. 31: 16353– 697 16368, 2011. 698 Ashmore RC, Renk JA, Schmidt MF. Bottom-up activation of the vocal motor forebrain by the 699 respiratory brainstem. J. Neurosci. 28: 2613–2623, 2008. 700 Ashmore RC, Wild JM, Schmidt MF. Brainstem and forebrain contributions to the generation of 701 learned motor behaviors for song. J. Neurosci. 25: 8543–8554, 2005. 702 Bienenstock E. A model of neocortex. Network: Comp Neural Sys, 6, 179–224, 1995.. 703 Brainard MS, Doupe AJ. Auditory feedback in learning and maintenance of vocal behaviour. 704 Nat. Rev. Neurosci. 1: 31–40, 2000. 29 705 Brainard MS, Doupe AJ. Postlearning consolidation of birdsong: stabilizing effects of age and 706 anterior forebrain lesions. J. Neurosci. 21: 2501–2517, 2001. 707 Charlesworth JD, Tumer EC, Warren TL, Brainard MS. Learning the microstructure of 708 successful behavior. Nat. Neurosci. 14: 373–380, 2011. 709 Charlesworth JD, Warren TL, Brainard MS. Covert skill learning in a cortical-basal ganglia 710 circuit. Nature 486: 251–255, 2012. 711 Chi Z, Margoliash D. Temporal precision and temporal drift in brain and behavior of zebra finch 712 song. Neuron 32: 899–910, 2001. 713 Cooper BG, Goller F. Physiological insights into the social-context-dependent changes in the 714 rhythm of the song motor program. J. Neurophysiol. 95: 3798–3809, 2006. 715 Cooper BG, Méndez JM, Saar S, Whetstone AG, Meyers R, Goller F. Age-related changes in 716 the Bengalese finch song motor program. Neurobiol. Aging 33: 564–568, 2012. 717 Costa RM. A selectionist account of de novo action learning. Curr. Opin. Neurobiol. 21: 579– 718 586, 2011. 719 Cynx J. Experimental determination of a unit of song production in the zebra finch (Taeniopygia 720 guttata). J Comp Psychol 104: 3–10, 1990. 721 Derégnaucourt S, Mitra PP, Fehér O, Pytte C, Tchernichovski O. How sleep affects the 722 developmental learning of bird song. Nature 433: 710–716, 2005. 723 Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. Wiley-Interscience, New York, 724 2001. 725 Fee MS, Goldberg JH. A hypothesis for basal ganglia-dependent reinforcement learning in the 30 726 songbird. Neuroscience 198: 152–170, 2011. 727 Fee MS, Kozhevnikov AA, Hahnloser RHR. Neural mechanisms of vocal sequence generation 728 in the songbird. Ann. N. Y. Acad. Sci. 1016: 153–170, 2004. 729 Foster EF, Bottjer SW. Lesions of a telencephalic nucleus in male zebra finches: Influences on 730 vocal behavior in juveniles and adults. J. Neurobiol. 46: 142–165, 2001. 731 Franz M, Goller F. Respiratory units of motor production and song imitation in the zebra finch. J. 732 Neurobiol. 51: 129–141, 2002. 733 Glaze CM, Troyer TW. Temporal structure in zebra finch song: implications for motor coding. J 734 Neurosci, 26(3), 991–1005, 2006. 735 Glaze CM, Troyer TW. Behavioral measurements of a temporally precise motor code for 736 birdsong. J Neurosci, 27(29), 7631–7639, 2007. 737 Glaze CM, Troyer TW. A generative model for measuring latent timing structure in motor 738 sequences. PLoS ONE 7: e37616, 2012. 739 Goldberg JH, Fee MS. Vocal babbling in songbirds requires the basal ganglia-recipient motor 740 thalamus but not the basal ganglia. J. Neurophysiol. 105: 2729–2739, 2011. 741 Graybiel AM. The basal ganglia and chunking of action repertoires. Neurobiol Learn Mem 70: 742 119–136, 1998. 743 Hahnloser RHR, Kozhevnikov AA, Fee MS. An ultra-sparse code underlies the generation of 744 neural sequences in a songbird. Nature 419: 65–70, 2002. 745 Hu L-T, Bentler PM. Fit indices in covariance structure modeling: Sensitivity to 746 underparameterized model misspecification. Psychological Methods 3: 424–453, 1998. 31 747 Immelmann K. Song development in zebra finch and other estrildid finches. In R. Hinde (Ed.), 748 Bird Vocalisations (pp. 61–74). London: Cambridge University Press, 1969. 749 750 Jin DZ, Kozhevnikov AA. A compact statistical model of the song syntax in Bengalese finch. 751 PLoS Comput. Biol. 752 Jin DZ. Generating variable birdsong syllable sequences with branching chain networks in avian 753 premotor nucleus HVC. Phys Rev E 80: 051902, 2009. 754 Jun JK, Jin DZ. Development of neural circuitry for precise temporal sequences through 755 spontaneous activity, axon remodeling, and synaptic plasticity. PLoS ONE. 756 Kao MH, Brainard MS. Lesions of an avian basal ganglia circuit prevent context-dependent 757 changes to song variability. J. Neurophysiol. 96: 1441–1455, 2006. 758 Kao MH, Doupe AJ, Brainard MS. Contributions of an avian basal ganglia–forebrain circuit to 759 real-time modulation of song. Nature 433: 638–643, 2005. 760 Kao MH, Wright BD, Doupe AJ. Neurons in a forebrain nucleus required for vocal plasticity 761 rapidly switch between precise firing and variable bursting depending on social context. J. 762 Neurosci. 28: 13232–13247, 2008. 763 Keele SW, Ivry R, Mayr U, Hazeltine E, Heuer H. The cognitive and neural architecture of 764 sequence representation. Psychol Rev 110: 316–339, 2003. 765 Konishi M. Birdsong: From Behavior to Neuron. Annu. Rev. Neurosci. 8: 125–170, 1985. 766 Kozhevnikov AA, Fee MS. Singing-related activity of identified HVC neurons in the zebra finch. 767 J. Neurophysiol. 97: 4271–4283, 2007. 32 768 Lashley KS. The problem of serial order in behavior. In L. Jeffress (Ed.), Cerebral Mechanisms 769 in Behavior (pp. 112–146). New York: Wiley. 1951. 770 771 Long MA, Fee MS. Using temperature to analyse temporal dynamics in the songbird motor 772 pathway. Nature 456: 189–194, 2008. 773 Long MA, Jin DZ, Fee MS. Support for a synaptic chain model of neuronal sequence 774 generation. Nature 468: 394–399, 2010. 775 Marler P. A comparative approach to vocal learning: Song development in white-crowned 776 sparrows. J Comp Physiol Psychol. 71: 1–25, 1970. 777 Ölveczky BP, Andalman AS, Fee MS. Vocal experimentation in the juvenile songbird requires 778 a basal ganglia circuit. PLoS Biol. 3: e153, 2005. 779 Ölveczky BP, Otchy TM, Goldberg JH, Aronov D, Fee MS. Changes in the neural control of a 780 complex motor sequence during learning. J. Neurophysiol. 106: 386–397, 2011. 781 Pytte CL, Gerson M, Miller J, Kirn JR. Increasing stereotypy in adult zebra finch song 782 correlates with a declining rate of adult neurogenesis. Dev Neurobiol 67: 1699–1720, 2007. 783 Ravbar P, Lipkind D, Parra LC, Tchernichovski O. Vocal exploration is locally regulated 784 during song learning. J. Neurosci. 32: 3422–3432, 2012. 785 Rhodes BJ, Bullock D, Verwey WB, Averbeck BB, Page MPA. Learning and production of 786 movement sequences: behavioral, neurophysiological, and modeling perspectives. Hum Mov 787 Sci 23: 699–746, 2004. 788 Roberts TF, Klein ME, Kubke MF, Wild JM, Mooney R. Telencephalic neurons 789 monosynaptically link brainstem and forebrain premotor networks necessary for song. J. 33 790 Neurosci. 28: 3479–3489, 2008. 791 Sakai K, Hikosaka O, Nakamura K. Emergence of rhythm during motor learning. Trends Cogn. 792 Sci. (Regul. Ed.) 8: 547–553, 2004. 793 Sakata JT, Brainard MS. Real-time contributions of auditory feedback to avian vocal motor 794 control. J. Neurosci. 26: 9619–9628, 2006. 795 Sakata JT, Brainard MS. Social context rapidly modulates the influence of auditory feedback on 796 avian vocal motor control. J. Neurophysiol. 102: 2485–2497, 2009. 797 Scharff C, Nottebohm F. A comparative study of the behavioral deficits following lesions of 798 various parts of the zebra finch song system: implications for vocal learning. J. Neurosci. 11: 799 2896–2913, 1991. 800 Schmidt MF, Ashmore RC, Vu ET. Bilateral control and interhemispheric coordination in the 801 avian song motor system. Ann. N. Y. Acad. Sci. 1016: 171–186, 2004. 802 Schmidt MF. Pattern of interhemispheric synchronization in HVc during singing correlates with 803 key transitions in the song pattern. J Neurophysiol, 90(6), 3931–49, 2003. 804 Shank SS, Margoliash D. Sleep and sensorimotor integration during early vocal learning in a 805 songbird. Nature 458: 73–77, 2009. 806 Solis MM, Perkel DJ. Rhythmic activity in a forebrain vocal control nucleus in vitro. J. Neurosci. 807 25: 2811–2822, 2005. 808 Stepanek L, Doupe AJ. Activity in a cortical-basal ganglia circuit for song is required for social 809 context-dependent vocal variability. J Neurophysiol, 104(5), 2474–86, 2010. 810 Tchernichovski O. Dynamics of the Vocal Imitation Process: How a Zebra Finch Learns Its 34 811 Song. Science 291: 2564–2569, 2001. 812 Troyer TW, Doupe AJ. An associational model of birdsong sensorimotor learning II. Temporal 813 hierarchies and the learning of song sequence. J Neurophysiol, 84, 1224–1239, 2000. 814 815 Vates GE, Vicario DS, Nottebohm F. Reafferent thalamo- “cortical” loops in the song system of 816 oscine songbirds. J. Comp. Neurol. 380: 275–290, 1997. 817 Veit L, Aronov D, Fee MS. Learning to breathe and sing: development of respiratory-vocal 818 coordination in young songbirds. J. Neurophysiol. 106: 1747–1765, 2011. 819 Wang NG, Hurley P, Pytte C, Kirn JR. Vocal control neuron incorporation decreases with age 820 in the adult zebra finch. J. Neurosci. 22: 10864–10870, 2002. 821 Yin HH, Mulcare SP, Hilário MRF, Clouse E, Holloway T, Davis MI, Hansson AC, Lovinger 822 DM, Costa RM. Dynamic reorganization of striatal circuits during the acquisition and 823 consolidation of a skill. Nat. Neurosci. 12: 333–341, 2009. 824 Zann R. The Zebra Finch: A Synthesis of Field and Laboratory Studies. New York: Oxford 825 University Press. 1996. 826 35 827 Figure Captions 828 Figure 1. Example of a timing covariance matrix and its decomposition in one bird (age 365 829 dph). A, Data covariance matrix. The darkness of each square shows the pairwise covariance of 830 the durations of the respective song elements listed along the x and y axes (letters denote 831 syllables; dashes denote silent gaps). B, Decomposition of the covariance matrix using the 832 timing variability model described in the Methods. The leftmost matrix is the total covariance 833 generated by the model, while each matrix on the righthand side of the equation is the 834 covariance generated by the respective timing components. Color limits are the same as in A. 835 836 Figure 2. Changes to average song duration and tempo variability over 65-365 dph. A: 837 Spectrograms from bird 257 from 65 and 85 dph; each spectrogram depicts the song motif 838 whose duration is closest to the average computed for that age period. Syllable labels are 839 indicated between the motifs. B, C: Change in average duration and tempo CV for the same bird 840 over 65-85 dph; syllable labels match panel A. D, E, F: Median±standard error for duration, 841 tempo variability and tempo CV across development. Durations are normalized by subtracting 842 the duration at 365 dph. 843 844 Figure 3. Changes to independent timing variability 65-365 dph. A, B: Median±standard error for 845 independent variability and CV. C: Independent CV for bird 253 for 65 and 85 dph. 846 847 Figure 4. Estimates of the syl/gap factor over the 4 developmental periods examined (see 848 Methods for definition). Errorbar plots indicate edian±standard error of raw syllable and gap 849 weights. 850 36 851 Figure 5. Changes to syllable transition probabilities 65-365 dph. A, B, C: Transition probabilities 852 across consecutive age ranges. D: Median±standard error of transition probabilities across 853 development. Overall, changes in probability are significant 65-85 dph, but not after 85 dph. 854 855 Figure 6. Changes in transition probability 65-85 dph vs. change in timing parameters of the 856 silent gaps associated with each transition. A: average gap duration; B: tempo CV; C: 857 independent CV. 858 859 Tables Duration (msec) Motifs* Syls Gaps* Tempo Var. Syls (msec) Gaps* Tempo CV Syls (%) Gaps* Ind. Var. Syls* (msec) Gaps* Ind. CV (%) Syls* Gaps* Syl/Gap factor Syls Gaps 65 dph 590.092 ±32.934 65.694 ±11.916 62.603 ±5.01 0.334 ±0.313 1.265 ±0.230 0.786 ±0.453 2.013 ±0.487 2.629 ±0.412 3.790 ±0.448 3.286 ±0.398 6.251 ±0.643 0.411 ±0.199 -0.652 ±0.311 85 dph 579.975 ±31.437 63.198 ±11.163 61.261 ±6.46 0.742 ±0.169 0.943 ±0.239 1.244 ±0.179 1.495 ±0.254 1.285 ±0.316 2.492 ±0.264 1.719 ±0.312 3.71 ±0.477 0.547 ±0.152 -0.587 ±0.171 125 dph 569.658 ±41.755 63.449 ±12.218 60.131 ±6.062 0.651 ±0.135 0.875 ±0.122 1.014 ±0.136 1.612 ±0.253 1.304 ±0.168 1.844 ±0.313 1.578 ±0.278 3.585 ±0.507 0.463 ±0.173 -0.812 ±0.197 37 365 dph 565.853 ±47.697 63.059 ±12.141 58.471 ±4.807 0.954 ±0.163 0.817 ±0.052 1.132 ±0.160 1.317 ±0.163 1.179 ±0.245 1.516 ±0.199 1.196 ±0.324 2.353 ±0.409 0.773 ±0.363 -0.933 ±0.219 change 65-365 -4.11 ±3.36% 2.13 ±2.26% -10.94 ±2.81% 0.040 ±0.183 -0.613 ±0.337 -0.022 ±0.287 -0.752 ±0.656 -0.970 ±0.299 -2.447 ±0.364 -1.255 ±0.315 -3.669 ±0.829 0.303 ±0.212 -0.421 ±0.294 change 65-85 -2.68 ±0.75% 0.31 ±0.79% -6.10 ±2.90% -0.062 ±0.208 -0.225 ±0.157 -0.089 ±0.378 -0.373 ±0.354 -0.479 ±0.325 -1.543 ±0.295 -0.876 ±0.286 -2.038 ±0.504 -0.033 ±0.177 -0.120 ±0.225 change 85-365 -2.43 ±2.66% -0.07 ±1.58% -5.67 ±2.43% 0.149 ±0.088 -0.094 ±0.173 0.216 ±0.081 -0.179 ±0.375 -0.149 ±0.162 -1.013 ±0.342 -0.288 ±0.112 -1.423 ±0.311 0.087 ±0.133 -0.132 ±0.210 Transition Prob.* 0.730 ±0.098 0.989 ±0.028 0.986 ±0.018 0.996 ±0.007 0.080 ±0.064 0.047 ±0.030 0.003 ±0.000 860 861 Developmental changes in timing and sequencing. Reported values are median±standard error 862 of the median, calculated from 200 bootstrapped samples. Duration changes are reported as 863 percent change; all others in the same units as the raw measures. * denotes significant change 864 in that variable over development (65 to 365 dph; WSR p<.05). Heavy borders between 865 corresponding syllable and gap cells denote significant group differences (Bonferoni corrected 866 WSM, p<.05). 38
© Copyright 2026 Paperzz