bioRxiv preprint first posted online Feb. 7, 2017; doi: http://dx.doi.org/10.1101/106617. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

Neural Encoding of Auditory Features during Music Perception and Imagery: Insight into the Brain of a Piano Player

Running head: Auditory Features and Music Imagery

Stephanie Martin 1,2,8, Christian Mikutta 2,3,4,8, Matthew K. Leonard 5, Dylan Hungate 5, Stefan Koelsch 6, Edward F. Chang 5, José del R. Millán 1, Robert T. Knight 2,7, Brian N. Pasley 2,9

1 Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Switzerland
2 Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
3 Translational Research Center and Division of Clinical Research Support, Psychiatric Services University of Bern (UPD), University Hospital of Psychiatry, Bern, Switzerland
4 Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Switzerland
5 Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA 94143, USA
6 Languages of Emotion, Freie Universität Berlin, Germany
7 Department of Psychology, University of California, Berkeley, CA 94720, USA
8 Co-first author
9 Lead Contact

Corresponding Author: Brian Pasley, University of California, Berkeley, Helen Wills Neuroscience Institute, 210 Barker Hall, Berkeley, CA 94720, USA, [email protected]

Abstract

It remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in two conditions. First, the participant played two piano pieces on an electronic piano with the sound volume of the digital keyboard turned on. Second, the participant replayed the same piano pieces, but without auditory feedback, and was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. For both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust similarities between perception and imagery in the frequency and temporal tuning properties of auditory areas.

Keywords: electrocorticography, music perception and imagery, spectrotemporal receptive fields, frequency tuning, encoding models.

Abbreviations
ECoG: electrocorticography
HG: high gamma
IFG: inferior frontal gyrus
MTG: middle temporal gyrus
Post-CG: post-central gyrus
Pre-CG: pre-central gyrus
SMG: supramarginal gyrus
STG: superior temporal gyrus
STRF: spectrotemporal receptive field
Introduction

Auditory imagery is defined here as the mental representation of sound perception in the absence of external auditory stimulation. The experience of auditory imagery is common, such as when a song runs continually through someone's mind. At a more advanced level, professional musicians are able to imagine a piece of music by looking at the sheet music [1]. Behavioral studies have shown that structural and temporal properties of auditory features (see [2] for a complete review), such as pitch [3], timbre [4,5], loudness [6] and rhythm [7], are preserved during auditory imagery. However, it is unclear how these auditory features are represented in the brain during imagery. Experimental investigation is difficult due to the lack of an observable stimulus or behavioral markers during auditory imagery. Using a novel experimental paradigm to synchronize auditory imagery events to neural activity, we investigated the neural representation of spectrotemporal auditory features during auditory imagery in an epileptic patient with proficient music abilities.

Previous studies have identified anatomical regions active during auditory imagery [8], and how they compare to actual auditory perception. For instance, lesion [9] and brain imaging studies [4,10-14] have confirmed the involvement of bilateral temporal lobe regions during auditory imagery (see [15] for a review). Brain areas consistently activated with fMRI during auditory imagery include the secondary auditory cortex [10,12,16], the frontal cortex [16,17], the sylvian parietal temporal area [18], ventrolateral and dorsolateral cortices [19] and the supplementary motor area [17,20-24]. Anatomical regions active during auditory imagery have been compared to actual auditory perception to understand the interactions between externally and internally driven cortical processes. Several studies showed that auditory imagery has substantial, but not complete, overlap in brain areas with music perception [8]: the secondary auditory cortex is consistently activated during music imagery and perception, while the primary auditory areas appear to be activated solely during auditory perception [4,10,25,26].

These studies have helped to unravel the anatomical brain areas involved in auditory perception and imagery; however, there is a lack of evidence for the representation of specific acoustic features in the human cortex during auditory imagery. It remains a challenge to investigate neural processing during an internal subjective experience like music imagery, due to the difficulty of time-locking brain activity to a measurable stimulus. To address this issue, we recorded electrocorticographic (ECoG) neural signals from a proficient piano player in a novel task design that permitted robust marking of the spectrotemporal content of the intended music imagery to neural activity, thus allowing us to investigate specific auditory features during auditory imagery.
In the first condition, the participant played an electronic piano with the sound output turned on. In this condition, the sound was played out loud through speakers at a comfortable volume that allowed auditory feedback (perception condition). In the second condition, the participant played the electronic piano with the speakers turned off, and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the digitized sound output of the MIDI-compatible sound module was recorded. This provided a measurable record of the content and timing of the participant's music imagery when the speakers of the keyboard were turned off and he did not hear the music. This task design allowed precise temporal alignment between the recorded neural activity and spectrogram representations of music perception and imagery, providing a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory imagery.

A well-established role of the early auditory system is to decompose complex sounds into their component frequencies [27-29], giving rise to tonotopic maps in the auditory cortex (see [30] for a review). Auditory perception has been extensively studied in animal models and humans using spectrotemporal receptive field (STRF) analysis [27,31-34], which identifies the time-frequency stimulus features encoded by a neuron or population of neurons. STRFs are consistently observed during auditory perception tasks, but the existence of STRFs during auditory imagery is unclear due to the experimental challenges of synchronizing neural activity with the imagined stimulus. To characterize and compare spectrotemporal tuning properties during auditory imagery and perception, we fitted two encoding models on data collected from the perception and imagery conditions. In this context, an encoding model describes the linear mapping between a given auditory stimulus representation and its corresponding brain response. Encoding models have previously revealed the neural tuning properties of various speech features, such as acoustic, phonetic and semantic representations [33,35-38].

In this study, the neural representation of music perception and imagery was quantified by spectrotemporal receptive fields (STRFs) that predict high gamma (HG; 70-150 Hz) neural activity. High gamma correlates with the spiking activity of the underlying neuronal ensemble [39-41] and reliably tracks speech and music features in auditory and motor cortex [33,42-45]. Results demonstrated the presence of robust spectrotemporal receptive fields during auditory imagery, with extensive overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception. These results provide a quantitative characterization of the shared neural representation underlying auditory perception and the subjective experience of auditory imagery.
Results

High gamma neural encoding during auditory perception and imagery

In this study, ECoG recordings were obtained using subdural electrode arrays implanted in a patient undergoing neurosurgical procedures for epilepsy. The participant was a proficient piano player (age of start: 7; years of music education: 10; hours of training per week: 5). Prior studies have shown that music training is associated with improved auditory imagery ability, such as pitch and temporal acuity [46-48]. We took advantage of this rare clinical case to investigate the neural correlates of music perception and imagery. In the first task, the participant played two music pieces on an electronic piano with the speakers of the digital keyboard turned on (perception condition; Fig 1A). In the second task, the participant played the same two piano pieces, but the volume of the speaker system was turned off to establish a silent room. The participant was asked to imagine hearing the corresponding music in his mind as he played the piano (imagery condition; Fig 1B). In both conditions, the sound produced by the MIDI-compatible sound module was digitally recorded in synchrony with the ECoG signal. The recorded sound allowed synchronizing the auditory spectrotemporal patterns of the imagined music with the neural activity even though the speaker system was turned off and no audible sounds were present in the room.

Fig 1. Experimental task design.
(A) The participant played an electronic piano with the sound of the digital keyboard turned on (perception condition). (B) In the second condition, the participant played the piano with the sound turned off and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the digitized sound output of the MIDI-compatible sound module was recorded in synchrony with the neural signals (even when the participant did not hear any sound in the imagery condition). The models take as input a spectrogram consisting of time-varying spectral power across a range of acoustic frequencies (200-7,000 Hz, bottom left) and output time-varying neural signals. To assess the encoding accuracy, the predicted neural signal (light lines) is compared to the original neural signal (dark lines).

The participant performed two famous music pieces: Chopin's Prelude in C minor, Op. 28, No. 20 and Bach's Prelude in C major, BWV 846. Example auditory spectrograms from Chopin's Prelude, determined through the participant's key presses on the electronic piano, are shown in Fig 2B for both perception and imagery conditions. To evaluate how consistently the participant performed across the perception and imagery tasks, we computed the realigned Euclidean distance [49] between the spectrograms of the same music piece played across conditions (within-stimulus distance). We compared the within-stimulus distance with the realigned Euclidean distance between the spectrograms of the different music pieces (between-stimulus distance). The realigned Euclidean distance was 251.6% larger for the between-stimulus comparison than for the within-stimulus comparison (p<10^-3; randomization test), indicating that the spectrograms of the same musical piece played across conditions were more similar than the spectrograms of different musical pieces. This indicates that the participant performed the task with relative consistency and specificity across the two conditions.
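The realigned Euclidean distance follows the dynamic time warping (DTW) approach of [49]. Below is a minimal sketch of that computation, assuming each spectrogram is a (time x frequency) NumPy array on a shared frequency axis; this helper is illustrative, not the authors' exact MATLAB implementation.

```python
import numpy as np

def dtw_realigned_distance(spec_a, spec_b):
    """Align spec_b to spec_a with dynamic time warping, then return a
    length-normalized Euclidean distance between the two spectrograms."""
    n, m = len(spec_a), len(spec_b)
    # Pairwise Euclidean distances between all pairs of time frames.
    cost = np.linalg.norm(spec_a[:, None, :] - spec_b[None, :, :], axis=-1)
    # Accumulated cost with the standard DTW recursion.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    # Normalize by (n + m) so pieces of different durations are comparable.
    return acc[n, m] / (n + m)
```

Applying this function to the two renditions of the same piece (one per condition) gives the within-stimulus distance; applying it to renditions of different pieces gives the between-stimulus distance.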
To compare spectrotemporal auditory representations during the music perception and music imagery tasks, we fit separate encoding models in each condition. We used these models to quantify specific anatomical and neural tuning differences between auditory perception and imagery.

Fig 2. Encoding accuracy.
(A) Electrode locations overlaid on a cortical surface reconstruction of the participant's cerebrum. (B) Overlay of the spectrogram contours for the perception (blue) and imagery (orange) conditions (10% of maximum energy from the spectrograms). (C) Actual and predicted high gamma band power (70-150 Hz) induced by the music perception and imagery segment in (B). The top electrode has very similar predictive power in the two conditions, whereas the bottom electrode differs markedly between perception and imagery. Recordings are from two different STG sites, highlighted in pink in (A). (D) Encoding accuracy plotted on the cortical surface reconstruction of the participant's cerebrum (map thresholded at p<0.05; FDR correction). (E) Encoding accuracy of significant electrodes of the perception model as a function of the imagery model. Electrode-specific encoding accuracy is correlated between the perception and imagery models (r=0.65; p<10^-4; randomization test). (F) Encoding accuracy as a function of anatomical location (pre-central gyrus (pre-CG), post-central gyrus (post-CG), supramarginal gyrus (SMG), middle temporal gyrus (MTG) and superior temporal gyrus (STG)).

For both the perception and imagery conditions, the observed and predicted high gamma neural responses are illustrated for two individual electrodes in the temporal lobe (Fig 2C), together with the corresponding music spectrogram (Fig 2B). The predicted neural response for the electrode shown in the upper panel of Fig 2C was significantly correlated with its measured neural response in both the perception (r=0.41; p<10^-7; one-sample z-test; FDR correction) and imagery (r=0.42; p<10^-4; one-sample z-test; FDR correction) conditions. The predicted neural response for the lower panel electrode was correlated with the actual neural response only in the perception condition (r=0.23; p<0.005; one-sample z-test; FDR correction) but not in the imagery condition (r=-0.02; p>0.5; one-sample z-test; FDR correction). The difference between conditions was significant for the electrode in the lower panel (p<0.05; two-sample t-test), but not for the electrode in the upper panel (p>0.5; two-sample t-test). This suggests that there is a strong continuous relationship between time-varying imagined sound features and STG neural activity, but that this relationship depends on cortical location.
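Throughout the Results, per-electrode significance is assessed with a one-sample z-test on the prediction correlation followed by FDR correction. A sketch of such a test is given below; the Fisher z-transform and the one-sided alternative are assumptions, since the paper does not spell out the exact test statistic.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

def significant_electrodes(r, n_samples, alpha=0.05):
    """r: per-electrode prediction accuracies (Pearson) on n_samples
    test points. Returns a boolean mask and FDR-adjusted p-values."""
    z = np.arctanh(r) * np.sqrt(n_samples - 3)   # Fisher z-transform
    p = norm.sf(z)                               # one-sided test: r > 0
    reject, p_fdr, _, _ = multipletests(p, alpha=alpha, method='fdr_bh')
    return reject, p_fdr
```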
To further investigate anatomical similarities and differences between the perception and imagery conditions, we plotted the anatomical layout of the prediction accuracy of individual electrodes. In both conditions, the sites with the highest prediction accuracy were located in the superior and middle temporal gyri, pre- and post-central gyri, and supramarginal gyrus (Fig 2D; heat map thresholded at p<0.05; one-sample z-test; FDR correction), consistent with previous ECoG results [33]. Among the 256 recorded electrodes, 210 were fitted in the encoding model, while the remaining 46 were removed due to excessive noise. Of the fitted electrodes, 35 were significant in the perception condition and 15 in the imagery condition (p<0.05; one-sample z-test; FDR correction), of which 9 were significant in both conditions. Anatomical locations of the electrodes with significant encoding accuracy are depicted in Fig 3 and Fig S1. To compare encoding accuracy across conditions, we performed additional analyses on the electrodes that had significant encoding accuracy in at least one condition (41 electrodes; unless otherwise stated). The prediction accuracy of individual electrodes was correlated between perception and imagery (Fig 2E; 41 electrodes; r=0.65; p<10^-4; randomization test). Because both the perception and imagery models are based on the same auditory stimulus representation, the correlated prediction accuracy provides strong evidence for a shared neural representation of sound based on spectrotemporal features.

To assess how the brain areas encoding auditory features varied across experimental conditions, we analyzed the significant electrodes in the gyri highlighted in Fig 2A (pre-CG, post-CG, SMG, MTG and STG) using the Wilcoxon signed-rank test (p>0.05; one-sample Kolmogorov-Smirnov test; Fig 2F). Encoding accuracy in the MTG and STG was higher for the perception (MTG: M=0.16, STG: M=0.13) than for the imagery (MTG: M=0.11, STG: M=0.08) condition (p<0.05; Wilcoxon signed-rank test; Bonferroni correction). Encoding accuracy in the pre-CG, post-CG and SMG did not differ between the perception (pre-CG: M=0.17; post-CG: M=0.15; SMG: M=0.12) and imagery (pre-CG: M=0.14; post-CG: M=0.13; SMG: M=0.12) conditions (p>0.5; Wilcoxon signed-rank test; Bonferroni correction). The advantage of the perception over the imagery model was thus specific to the temporal lobe, which may reflect underlying differences in spectrotemporal encoding mechanisms or, alternatively, a greater sensitivity to discrepancies between the actual content of imagery and the recorded sound stimulus used in the model.
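A minimal sketch of the per-region comparison just described, assuming (hypothetical) arrays of per-electrode encoding accuracies for the two conditions and a list of gyrus labels:

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_regions(acc_perc, acc_imag, labels,
                    regions=('pre-CG', 'post-CG', 'SMG', 'MTG', 'STG')):
    """Paired Wilcoxon signed-rank test per gyrus, with Bonferroni
    correction over the number of regions tested."""
    results = {}
    for reg in regions:
        idx = np.array([i for i, g in enumerate(labels) if g == reg])
        stat, p = wilcoxon(acc_perc[idx], acc_imag[idx])  # paired per electrode
        results[reg] = min(1.0, p * len(regions))         # Bonferroni
    return results
```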
Spectrotemporal tuning during auditory perception and imagery

Auditory imagery and perception are distinct subjective experiences, yet both are characterized by a sense of sound. How does the auditory system encode sound during music perception and imagery? Examples of standard STRFs are shown in Fig 3 for temporal electrodes (see Fig S2 for all STRFs). These STRFs highlight neural stimulus preferences, as shown by their excitatory (warm colors) and inhibitory (cold colors) subregions. Fig 4A shows the latency (s) for the perception and imagery conditions, defined as the temporal coordinate of the maximum deviation in the STRF. The peak latency was significantly correlated between conditions (r=0.43; p<0.005; randomization test), suggesting that perception and imagery responses are simultaneously active for most of their response durations. We then analyzed frequency tuning curves estimated from the STRFs (see Materials and Methods for details). Examples of tuning curves for both the perception and imagery encoding models are shown for the electrodes indicated by the black outlines on the anatomical brain (Fig 4B). Across conditions, the majority of individual electrodes exhibited a complex frequency tuning profile. For each electrode, the similarity between the tuning curves of the perception and imagery models was quantified using Pearson's correlation coefficient. The anatomical distribution of tuning curve similarity is plotted in Fig 4B, with the correlation at individual sites ranging from r=-0.3 to r=0.6. The effect of anatomical location (pre-CG, post-CG, SMG, MTG and STG) on tuning curve similarity was not significant (Chi-square=3.59; p>0.1; Kruskal-Wallis test). Similarities in tuning curve shape between auditory imagery and perception suggest a shared auditory representation, but there was no evidence that these similarities depended on gyral area.

Fig 3. Spectrotemporal receptive fields (STRFs).
Examples of standard STRFs for the perception (left panel) and imagery (right panel) models (warm colors indicate where the neuronal ensemble is excited, cold colors where it is inhibited). In the lower left corner, electrode locations are overlaid on a cortical surface reconstruction of the participant's cerebrum. Electrodes whose STRFs are shown are outlined in black. Grey electrodes were removed from the analysis due to excessive noise (see Materials and Methods).
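The frequency tuning curves analyzed above (and shown in Fig 4B) are derived from the STRFs as described in Materials and Methods: inhibitory weights are zeroed, the result is averaged over time lags, and the curve is z-scored. A minimal sketch, assuming each STRF is stored as a (lags x frequencies) array:

```python
import numpy as np

def tuning_curve(strf):
    """Frequency tuning curve from a (n_lags x n_freqs) STRF."""
    excitatory = np.clip(strf, 0, None)          # set inhibitory weights to zero
    curve = excitatory.mean(axis=0)              # average across time lags
    return (curve - curve.mean()) / curve.std()  # standardized z-scores
```

The per-electrode similarity reported above is then simply `np.corrcoef(tuning_curve(strf_perception), tuning_curve(strf_imagery))[0, 1]`.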
Fig 4. Auditory tuning.
(A) Latency peaks, estimated from the STRFs, were significantly correlated between the perception and imagery conditions (r=0.43; p<0.005; randomization test). (B) Examples of tuning curves for both perception and imagery encoding models, defined as the average gain of the STRF as a function of acoustic frequency, for the electrodes indicated by the black outlines in the left panel. Correlation coefficients between the perception and imagery tuning curves are plotted for significant electrodes on the cortical surface reconstruction of the participant's cerebrum. Grey electrodes were removed from the analysis due to excessive noise. The bottom panel is a histogram of electrode correlation coefficients between perception and imagery tuning. (C) Proportion of predictive electrode sites (N=41) with peak tuning at each frequency. Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z>3.1; p<0.001), separated by more than one third of an octave.

Different electrodes are sensitive to different acoustic frequencies important for music processing. We next quantitatively assessed how the frequency tuning of predictive electrodes (N=41) varied across the two conditions. First, to evaluate how the acoustic spectrum was covered at the population level, we quantified the proportion of significant electrodes with a tuning peak at each acoustic frequency (Fig 4C). Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z>3.1; p<0.001), separated by more than one third of an octave. The proportion of electrodes with tuning peaks was significantly larger for the perception (mean = 0.19) than for the imagery (mean = 0.14) condition (Fig 4C; p<0.05; Wilcoxon signed-rank test). Despite this, both conditions exhibited reliable frequency selectivity, as nearly the full range of the acoustic frequency spectrum was encoded: the fraction of acoustic frequencies covered by the tuning peaks of predictive electrodes was 0.91 for the perception and 0.89 for the imagery condition.
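A sketch of the tuning-peak criterion used above (points exceeding z > 3.1 on the z-scored tuning curve, kept only if more than one third of an octave apart); the greedy strongest-first selection is an assumption about how nearby candidate peaks are resolved:

```python
import numpy as np

def tuning_peaks(curve, freqs, z_thresh=3.1, min_sep_oct=1/3):
    """curve: z-scored tuning curve; freqs: acoustic frequency axis (Hz).
    Returns indices of significant peaks separated by > min_sep_oct octaves."""
    candidates = np.where(curve > z_thresh)[0]
    order = candidates[np.argsort(curve[candidates])[::-1]]  # strongest first
    peaks = []
    for i in order:
        # Keep a candidate only if far enough (in octaves) from accepted peaks.
        if all(abs(np.log2(freqs[i] / freqs[j])) > min_sep_oct for j in peaks):
            peaks.append(i)
    return np.array(sorted(peaks))
```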
Reconstruction of auditory features during music perception and imagery

To evaluate the ability to identify piano notes from brain activity, we reconstructed the same auditory spectrogram representation used in the encoding models. The overall reconstruction accuracy was higher than zero in both conditions (Fig 5A; p<0.001; randomization test), but did not differ between conditions (p>0.05; two-sample t-test). As a function of acoustic frequency, mean accuracy ranged from r=0 to r=0.45 (Fig 5B).

Fig 5. Reconstruction accuracy.
(A) Right panel: Overall reconstruction accuracy of the spectrogram representation for both the perception (blue) and imagery (orange) conditions. Error bars denote resampling SEM. Left panel: Reconstruction accuracy as a function of acoustic frequency. The shaded region denotes SEM over the resamples. (B) Examples of original and reconstructed segments for the perception (left) and imagery (right) models. (C) Distribution of identification ranks for all reconstructed spectrogram notes. The median identification rank is 0.65 and 0.63 for the perception and imagery decoding models, respectively, significantly higher than the 0.50 chance level (p<0.001; randomization test). Left panel: Receiver operating characteristic (ROC) plot of identification performance for the perception (blue curve) and imagery (orange curve) models. The diagonal black line indicates no predictive power.

We further assessed reconstruction accuracy by evaluating the ability to identify isolated piano notes from the test set auditory spectrogram reconstructions. Examples of original and reconstructed segments are depicted in Fig 5B for the perception (left) and imagery (right) models. For the identification, we extracted 0.5-second segments at piano note onsets from the original and reconstructed auditory spectrograms. Note onsets were defined with the MIRtoolbox [50]. We then computed the correlation coefficient between a target reconstructed spectrogram and each original spectrogram in the candidate set. Finally, we sorted the coefficients and computed the identification rank as the percentile rank of the correct spectrogram. This metric reflects how well the target reconstruction matched the correct original spectrogram out of all candidates. The median identification rank of individual piano notes was significantly higher than chance for both conditions (Fig 5C; median identification rank: perception = 0.72 and imagery = 0.69; p<0.001; randomization test). Similarly, the area under the curve (AUC) of identification performance for the perception and imagery models was well above chance level (p<0.001; randomization test).

Cross-condition analysis

Another way to evaluate the degree of overlap between the perception and imagery conditions is to apply the decoding model built in the perception condition to imagery neural data, and vice versa. This approach rests on the hypothesis that both tasks share neural mechanisms, and is useful when one of the models cannot be built directly because of the lack of observable measures. The technique has been successfully applied in various fields, such as vision [51-53] and speech [54]. Decoding performance improved by 50% when the model was trained and tested within the imagery condition (r=0.28), compared to when the perception-trained model was applied to imagery data (r=0.19) (Fig 6). This highlights the importance of having a model that is specific to each condition.

Fig 6. Cross-condition analysis.
Reconstruction accuracy when the decoding model was built on the perception condition and applied to the imagery neural data, and vice versa. Decoding performance improved by 50% when the model was trained and tested within the imagery condition (r=0.28), compared to when the perception-trained model was applied to imagery data (r=0.19).
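A sketch of the cross-condition evaluation, assuming lagged high gamma feature matrices and target spectrograms for each condition; ridge regression stands in here for the gradient-descent decoder described in Materials and Methods:

```python
import numpy as np
from sklearn.linear_model import Ridge

def cross_condition_accuracy(R_train, S_train, R_test, S_test):
    """R_*: (time x lagged electrodes) HG features; S_*: (time x freqs)
    spectrograms. Fit on one condition, evaluate on another."""
    model = Ridge(alpha=1.0).fit(R_train, S_train)
    S_pred = model.predict(R_test)
    # Mean Pearson r across acoustic frequency channels.
    r = [np.corrcoef(S_pred[:, f], S_test[:, f])[0, 1]
         for f in range(S_test.shape[1])]
    return float(np.mean(r))

# e.g., train on perception, test on imagery:
# r_cross = cross_condition_accuracy(R_perc, S_perc, R_imag, S_imag)
```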
Control analysis for sensorimotor confounds

In this study, the participant played the piano in two different conditions (music perception and imagery). In addition to the auditory percept, the arm, hand and finger movements of the active piano task could have introduced confounds into the decoding process. We controlled for possible motor confounds in several ways. First, the electrode grid did not cover hand or finger sensorimotor brain areas (Fig 7A), reducing the likelihood that the decoding model exploited hand-related motor activity. Second, examples of STRFs for two neighboring electrodes show different weight patterns across conditions (Fig 7B). For instance, the weights of the electrode depicted in the left example of Fig 7B are correlated between conditions (r=0.48), whereas the weights in the right example are not (r=0.04). Differences across conditions cannot be explained by motor confounds, because finger movement was similar in both tasks. Third, the brain areas that significantly encoded music perception and imagery overlapped with auditory sensory areas (Fig S1 and Fig S3), as revealed by the encoding accuracy and STRFs during passive listening to TIMIT sentences (no movements). These findings provide evidence against motor confounds and suggest that the brain responses were induced by auditory percepts rather than by the motor movements associated with pressing piano keys. Finally, we built two additional decoding models using 1) only temporal lobe electrodes and 2) only auditory-responsive electrodes (Fig 7C; see Materials and Methods for details). Both models showed significant reconstruction accuracy (p<0.001; randomization test) and median identification rank (p<0.001; randomization test). This suggests that even when all electrodes potentially unrelated to auditory processing were removed, the decoding model still performed well.

Fig 7. Control analysis for motor confounds.
(A) Electrode locations overlaid on cortical surface reconstructions of the participant's cerebrum. The view from the top of the brain shows that hand motor areas were not covered by the grid. Brain areas associated with the face, upper limbs and lower limbs are depicted in green, blue and pink, respectively. (B) Different STRFs for the two neighboring electrodes highlighted in (A), for both the perception and imagery encoding models. (C) Overall reconstruction accuracy (left panel) and median identification rank (right panel) when using all electrodes, only temporal electrodes, and only auditory-responsive electrodes (see Materials and Methods for details).
Discussion

Music imagery studies are daunting due to the subjective nature of imagery and the absence of verifiable, observable measures. Our task design allowed precise time-locking between the recorded neural activity and the spectrotemporal features of music imagery, and provided a unique opportunity to quantitatively study neural encoding during auditory imagery and to compare tuning properties with auditory perception. We provide the first evidence of spectrotemporal receptive fields and auditory feature encoding in the brain during music imagery, together with comparative measures of actual music perception encoding. We observed that neuronal ensembles were tuned to acoustic frequencies during imagined music, suggesting that spectral organization occurs in the absence of actual perceived sound. Supporting evidence has shown that restored speech (when a speech instance is replaced by noise, but the listener perceives a specific speech sound) is grounded in acoustic representations in the superior temporal gyrus [55,56]. In addition, results showed substantial, but not complete, overlap in neural properties (i.e., spectral and temporal tuning properties, and brain areas) during music perception and imagery. These findings agree with visual studies showing that visual imagery and perception involve many, but not all, overlapping brain areas [57,58]. We also showed that auditory features could be reconstructed from neural activity during imagined music. Because both the perception and imagery models are based on the same auditory stimulus representation, the correlated prediction accuracy provides strong evidence for a shared neural representation of sound based on spectrotemporal features. This confirms that the brain encodes spectrotemporal properties of imagined sounds, as previously suggested by behavioral and brain lesion studies (see [2] for a review).

Methodological issues in investigating imagery are numerous, including the lack of evidence that the desired mental task was actually performed. The current task design did not allow verifying how the mental task was performed, yet the behavioral index of key presses on the piano indicated the precise time and frequency content of the intended imagined sound. In addition, we recorded a skilled piano player, and it has been suggested that participants with musical training exhibit better pitch and temporal acuity during auditory imagery than participants with little or no musical training [47,59]. Furthermore, tonotopic maps located in the STG are enlarged in trained musicians [60]. Recording a trained piano player thus suggests improved auditory imagery ability (see also [7,9,14]) and reduced issues related to spectral and temporal errors.

Finally, the electrode grid was located over the left hemisphere of the participant. This raises the question of lateralization in the brain response to music perception and imagery.
Studies have shown the importance of both hemispheres for auditory perception and imagination [4,9,10,12-14], although brain patterns tend to shift toward the right hemisphere (see [15] for a review). In our task, the grid was located over the left hemisphere and allowed significant encoding and decoding accuracy within high gamma frequency ranges. This is consistent with the notion that music-related auditory processes are also evident in the left hemisphere.

Materials and Methods

Participant and data acquisition

Electrocorticographic (ECoG) recordings were obtained using subdural electrode arrays implanted in one patient undergoing neurosurgical treatment for refractory epilepsy. The participant gave his written informed consent prior to surgery and experimental testing. The experimental protocol was approved by the University of California, San Francisco and Berkeley Institutional Review Boards and Committees on Human Research. Inter-electrode spacing (center-to-center) was 4 mm. Grid placement (Fig 2A) was defined solely by clinical requirements. Localization and co-registration of electrodes were performed using the structural MRI. Multi-electrode ECoG data were amplified and digitally recorded at a sampling rate of 3,052 Hz. ECoG signals were re-referenced to a common average after removal of electrodes with epileptic artifacts or excessive noise (including broadband electromagnetic noise from hospital equipment or poor contact with the cortical surface). In addition to the ECoG signals, the audio output of the piano was recorded along with the multi-electrode ECoG data.

Experimental paradigm

The recording session included two conditions. In the first condition, the participant played an electronic piano with the sound turned on; that is, the music was played out loud through the speakers of the digital keyboard in the hospital room (volume at a comfortable and natural sound level; Fig 1A; perception condition). In the second condition, the participant played the piano with the speakers turned off and instead imagined hearing the corresponding music in his mind (Fig 1B; imagery condition). As Fig 1B illustrates, in both conditions the digitized sound output of the MIDI sound module was recorded in synchrony with the ECoG data (even when the speakers were turned off and the participant did not hear the music). The two music pieces were Chopin's Prelude in C minor, Op. 28, No. 20 and Bach's Prelude in C major, BWV 846.

Feature extraction

We extracted the ECoG signal in the high gamma frequency band using eight bandpass filters (Hamming-window non-causal filters of order 20, with logarithmically increasing center frequencies between 70 and 150 Hz and semi-logarithmically increasing bandwidths), and extracted the envelope of each band using the Hilbert transform. Prior to model fitting, the power was averaged across these eight bands, down-sampled to 100 Hz and z-scored.
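A simplified sketch of this feature extraction pipeline, assuming a raw ECoG array sampled at fs Hz; the fixed fractional bandwidth below is an approximation of the paper's semi-logarithmically increasing bandwidths:

```python
import numpy as np
from scipy.signal import firwin, filtfilt, hilbert, resample

def high_gamma(ecog, fs, n_bands=8, fmin=70.0, fmax=150.0, out_fs=100):
    """ecog: (n_samples x n_electrodes) array. Returns z-scored HG power."""
    centers = np.logspace(np.log10(fmin), np.log10(fmax), n_bands)
    envs = []
    for fc in centers:
        bw = 0.25 * fc  # assumed fractional bandwidth per band
        taps = firwin(21, [fc - bw / 2, fc + bw / 2], fs=fs, pass_zero=False)
        band = filtfilt(taps, 1.0, ecog, axis=0)    # zero-phase (non-causal)
        envs.append(np.abs(hilbert(band, axis=0)))  # Hilbert envelope
    hg = np.mean(envs, axis=0)                      # average the eight bands
    n_out = int(len(hg) * out_fs / fs)
    hg = resample(hg, n_out, axis=0)                # down-sample to 100 Hz
    return (hg - hg.mean(0)) / hg.std(0)            # z-score per electrode
```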
Auditory spectrogram representation

The auditory spectrogram representation was a time-varying representation of the amplitude envelope at acoustic frequencies logarithmically spaced between 200 and 7,000 Hz. This representation was calculated by affine wavelet transforms of the sound waveform using auditory filter banks that mimic neural processing in the human auditory periphery [31]. To compute these acoustic representations, we used the NSL MATLAB toolbox (http://www.isr.umd.edu/Labs/NSL/Software.htm).

Encoding model

The neural encoding model, based on the spectrotemporal receptive field (STRF) [34], describes the linear mapping between the music stimulus and the high gamma neural response at individual electrodes. The encoding model was estimated as follows:

$\hat{R}(t, n) = \sum_{\tau, f} h(\tau, f, n)\, S(t - \tau, f)$

where $\hat{R}(t, n)$ is the predicted high gamma neural activity at time $t$ and electrode $n$, and $S(t - \tau, f)$ is the spectrogram representation at time $(t - \tau)$ and acoustic frequency $f$. Finally, $h(\tau, f, n)$ is the linear transformation matrix that depends on the time lag $\tau$, the frequency $f$ and the electrode $n$; $h$ represents the spectrotemporal receptive field of each electrode. Neural tuning properties for a variety of stimulus parameters in different sensory systems have been assessed using STRFs [61]. We used ridge regression to fit the encoding model [62], with a 10-fold cross-validation resampling procedure and no overlap between training and test partitions within each resample. We performed a grid search on the training set to define the penalty coefficient $\alpha$ and the learning rate $\eta$, using a nested cross-validation loop. We standardized the parameter estimates to yield the final model.
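A minimal sketch of fitting one electrode's STRF by ridge regression on a lagged spectrogram design matrix; the number of lags and the single fixed penalty below are illustrative stand-ins for the grid-searched values described above:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(S, n_lags):
    """Stack delayed copies of the spectrogram so that row t contains
    S[t - tau, :] for tau = 0..n_lags-1 (zero-padded at the start)."""
    T, F = S.shape
    X = np.zeros((T, n_lags * F))
    for tau in range(n_lags):
        X[tau:, tau * F:(tau + 1) * F] = S[:T - tau]
    return X

def fit_strf(S, hg, n_lags=40, alpha=100.0):
    """S: (time x freqs) spectrogram at 100 Hz; hg: z-scored HG trace.
    40 lags at 100 Hz span 400 ms of stimulus history (an assumption)."""
    X = lagged_design(S, n_lags)
    model = Ridge(alpha=alpha).fit(X, hg)
    return model.coef_.reshape(n_lags, S.shape[1])  # (lags x freqs) STRF
```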
Decoding model

The decoding model linearly mapped the neural activity to the music representation, as a weighted sum of activity at each electrode:

$\hat{S}(t, f) = \sum_{\tau, n} g(\tau, f, n)\, R(t - \tau, n)$

where $\hat{S}(t, f)$ is the predicted music representation at time $t$ and frequency $f$, and $R(t - \tau, n)$ is the HG neural response of electrode $n$ at time $(t - \tau)$, with the time lag $\tau$ ranging between -100 ms and 400 ms. Finally, $g(\tau, f, n)$ is the linear transformation matrix that depends on the time lag $\tau$, frequency $f$ and electrode $n$. Both the neural response and the music representation were synchronized, downsampled to 100 Hz, and standardized to zero mean and unit standard deviation prior to model fitting. To fit the model parameters, we used gradient descent with early-stopping regularization. We used a 10-fold cross-validation resampling scheme, and 20% of the training data were used as a validation set to determine the early stopping criterion. Finally, model prediction accuracy was evaluated on the independent test set, and the parameter estimates were standardized to yield the final model.

Evaluation

Prediction accuracy was quantified using the correlation coefficient (Pearson's r) between the predicted and actual HG signal on data from the independent test set. Overall encoding accuracy was reported as the mean correlation over folds. The z-test was applied to all reported mean r values. Electrodes were defined as significant if the p-value was smaller than the significance threshold of $\alpha$=0.05 (95th percentile; FDR correction). To define auditory sensory areas, we built an encoding model on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63] for 10 min. Electrodes with significant encoding accuracy are highlighted in Fig S1.

To further investigate the neural encoding of spectrotemporal acoustic features during music perception and music imagery, we analyzed all electrodes that were significant in at least one condition (unless otherwise stated). Tuning curves were estimated from the spectrotemporal receptive fields by first setting all inhibitory weights to zero, then averaging across the time dimension and converting to standardized z-scores. Tuning peaks were identified as significant peak parameters in the acoustic frequency tuning curves (z>3.1; p<0.001), separated by more than one third of an octave.

Decoding accuracy was assessed by calculating the correlation coefficient (Pearson's r) between the reconstructed and original music spectrogram representations using test set data. Overall reconstruction accuracy was computed by averaging over acoustic frequencies and resamples, and the standard error of the mean (SEM) was computed by taking the standard deviation across resamples. Finally, to further assess reconstruction accuracy, we evaluated the ability to identify isolated piano notes from the test set auditory spectrogram reconstructions, using a similar approach as in [33,54].
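A sketch of the identification-rank metric, assuming index-aligned lists of reconstructed and original 0.5-s spectrogram segments (so that recon_segs[i] should match orig_segs[i]):

```python
import numpy as np

def identification_rank(recon_segs, orig_segs):
    """Median percentile rank of the correct original segment among all
    candidates, scored by correlation with each reconstructed segment."""
    ranks = []
    for i, target in enumerate(recon_segs):
        r = np.array([np.corrcoef(target.ravel(), cand.ravel())[0, 1]
                      for cand in orig_segs])
        # Fraction of incorrect candidates that the correct one outscores.
        ranks.append((r[i] > np.delete(r, i)).mean())
    return np.median(ranks)  # 0.5 corresponds to chance level
```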
Acknowledgments

This work was supported by the Zeno-Karl Schindler Foundation, NINDS Grant R3721135, NIH 5K99DC012804 and the Nielsen Corporation.

References

1. Meister IG, Krings T, Foltys H, Boroojerdi B, Müller M, Töpper R, et al. Playing piano in the mind--an fMRI study on music imagery and performance in pianists. Brain Res Cogn Brain Res. 2004;19: 219-228. doi:10.1016/j.cogbrainres.2003.12.005
2. Hubbard TL. Auditory imagery: Empirical findings. Psychol Bull. 2010;136: 302-329. doi:10.1037/a0018436
3. Halpern AR. Memory for the absolute pitch of familiar songs. Mem Cognit. 1989;17: 572-581.
4. Halpern AR, Zatorre RJ, Bouffard M, Johnson JA. Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia. 2004;42: 1281-1292. doi:10.1016/j.neuropsychologia.2003.12.017
5. Pitt MA, Crowder RG. The role of spectral and dynamic cues in imagery for musical timbre. J Exp Psychol Hum Percept Perform. 1992;18: 728-738.
6. Intons-Peterson M. Components of auditory imagery. In: Reisberg D, editor. Auditory imagery. Hillsdale, NJ: L. Erlbaum Associates; 1992. pp. 45-72.
7. Halpern AR. Mental scanning in auditory imagery for songs. J Exp Psychol Learn Mem Cogn. 1988;14: 434-443.
8. Kosslyn SM, Ganis G, Thompson WL. Neural foundations of imagery. Nat Rev Neurosci. 2001;2: 635-642. doi:10.1038/35090055
9. Zatorre RJ, Halpern AR. Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia. 1993;31: 221-232.
10. Griffiths TD. Human complex sound analysis. Clin Sci (Lond). 1999;96: 231-234.
11. Halpern AR. When That Tune Runs Through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cereb Cortex. 1999;9: 697-704. doi:10.1093/cercor/9.7.697
12. Kraemer DJM, Macrae CN, Green AE, Kelley WM. Musical imagery: Sound of silence activates auditory cortex. Nature. 2005;434: 158. doi:10.1038/434158a
13. Rauschecker JP. Cortical plasticity and music. Ann N Y Acad Sci. 2001;930: 330-336.
14. Zatorre RJ, Halpern AR, Perry DW, Meyer E, Evans AC. Hearing in the Mind's Ear: A PET Investigation of Musical Imagery and Perception. J Cogn Neurosci. 1996;8: 29-46. doi:10.1162/jocn.1996.8.1.29
15. Zatorre RJ, Halpern AR. Mental Concerts: Musical Imagery and Auditory Cortex. Neuron. 2005;47: 9-12. doi:10.1016/j.neuron.2005.06.013
16. Zatorre RJ, Halpern AR, Bouffard M. Mental Reversal of Imagined Melodies: A Role for the Posterior Parietal Cortex. J Cogn Neurosci. 2009;22: 775-789. doi:10.1162/jocn.2009.21239
17. Halpern AR, Zatorre RJ. When That Tune Runs Through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cereb Cortex. 1999;9: 697-704. doi:10.1093/cercor/9.7.697
18. Hickok G, Buchsbaum B, Humphries C, Muftuler T. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. J Cogn Neurosci. 2003;15: 673-682. doi:10.1162/089892903322307393
19. Meyer M, Elmer S, Baumann S, Jancke L. Short-term plasticity in the auditory system: differential neural responses to perception and imagery of speech and music. Restor Neurol Neurosci. 2007;25: 411-431.
20. Halpern AR. Cerebral substrates of musical imagery. Ann N Y Acad Sci. 2001;930: 179-192.
21. Mikumo M. Motor Encoding Strategy for Pitches of Melodies. Music Percept Interdiscip J. 1994;12: 175-197. doi:10.2307/40285650
22. Petsche H, von Stein A, Filz O. EEG aspects of mentally playing an instrument. Brain Res Cogn Brain Res. 1996;3: 115-123.
23. Brodsky W, Henik A, Rubinstein B-S, Zorman M. Auditory imagery from musical notation in expert musicians. Percept Psychophys. 2003;65: 602-612.
24. Schürmann M, Raij T, Fujiki N, Hari R. Mind's ear in a musician: where and when in the brain. NeuroImage. 2002;16: 434-440. doi:10.1006/nimg.2002.1098
25. Bunzeck N, Wuestenberg T, Lutz K, Heinze H-J, Jancke L. Scanning silence: mental imagery of complex sounds. NeuroImage. 2005;26: 1119-1127. doi:10.1016/j.neuroimage.2005.03.013
26. Yoo SS, Lee CU, Choi BG. Human brain mapping of auditory imagery: event-related functional MRI study. Neuroreport. 2001;12: 3045-3049.
27. Aertsen AMHJ, Olders JHJ, Johannesma PIM. Spectro-temporal receptive fields of auditory neurons in the grassfrog: III. Analysis of the stimulus-event relation for natural stimuli. Biol Cybern. 1981;39: 195-209. doi:10.1007/BF00342772
28. Eggermont JJ, Aertsen AMHJ, Johannesma PIM. Quantitative characterisation procedure for auditory neurons based on the spectro-temporal receptive field. Hear Res. 1983;10: 167-190. doi:10.1016/0378-5955(83)90052-7
29. Tian B. Processing of Frequency-Modulated Sounds in the Lateral Auditory Belt Cortex of the Rhesus Monkey. J Neurophysiol. 2004;92: 2993-3013. doi:10.1152/jn.00472.2003
30. Saenz M, Langers DRM. Tonotopic mapping of human auditory cortex. Hear Res. 2014;307: 42-52. doi:10.1016/j.heares.2013.07.016
31. Chi T, Ru P, Shamma SA. Multiresolution spectrotemporal analysis of complex sounds. J Acoust Soc Am. 2005;118: 887. doi:10.1121/1.1945807
32. Clopton BM, Backoff PM. Spectrotemporal receptive fields of neurons in cochlear nucleus of guinea pig. Hear Res. 1991;52: 329-344.
33. Pasley BN, David SV, Mesgarani N, Flinker A, Shamma SA, Crone NE, et al. Reconstructing Speech from Human Auditory Cortex. PLoS Biol. 2012;10: e1001251. doi:10.1371/journal.pbio.1001251
34. Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci. 2000;20: 2315-2331.
35. Lotte F, Brumberg JS, Brunner P, Gunduz A, Ritaccio AL, Guan C, et al. Electrocorticographic representations of segmental features in continuous speech. Front Hum Neurosci. 2015;9. doi:10.3389/fnhum.2015.00097
36. Mesgarani N, Cheung C, Johnson K, Chang EF. Phonetic Feature Encoding in Human Superior Temporal Gyrus. Science. 2014;343: 1006-1010. doi:10.1126/science.1245994
37. Tankus A, Fried I, Shoham S. Structured neuronal encoding and decoding of human speech features. Nat Commun. 2012;3: 1015. doi:10.1038/ncomms1995
38. Huth AG, de Heer WA, Griffiths TL, Theunissen FE, Gallant JL. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature. 2016;532: 453-458. doi:10.1038/nature17637
39. Lachaux J-P, Axmacher N, Mormann F, Halgren E, Crone NE. High-frequency neural activity and human cognition: past, present and possible future of intracranial EEG research. Prog Neurobiol. 2012;98: 279-301. doi:10.1016/j.pneurobio.2012.06.008
40. Miller KJ, Leuthardt EC, Schalk G, Rao RPN, Anderson NR, Moran DW, et al. Spectral changes in cortical surface potentials during motor movement. J Neurosci. 2007;27: 2424-2432. doi:10.1523/JNEUROSCI.3886-06.2007
41. Boonstra TW, Houweling S, Muskulus M. Does Asynchronous Neuronal Activity Average out on a Macroscopic Scale? J Neurosci. 2009;29: 8871-8874. doi:10.1523/JNEUROSCI.2020-09.2009
42. Towle VL, Yoon H-A, Castelle M, Edgar JC, Biassou NM, Frim DM, et al. ECoG gamma activity during a language task: differentiating expressive and receptive speech areas. Brain. 2008;131: 2013-2027. doi:10.1093/brain/awn147
43. Llorens A, Trébuchon A, Liégeois-Chauvel C, Alario F-X. Intra-Cranial Recordings of Brain Activity During Language Production. Front Psychol. 2011. doi:10.3389/fpsyg.2011.00375
44. Crone NE, Boatman D, Gordon B, Hao L. Induced electrocorticographic gamma activity during auditory perception. Brazier Award-winning article, 2001. Clin Neurophysiol. 2001;112: 565-582.
45. Sturm I, Blankertz B, Potes C, Schalk G, Curio G. ECoG high gamma activity reveals distinct cortical representations of lyrics passages, harmonic and timbre-related changes in a rock song. Front Hum Neurosci. 2014;8. doi:10.3389/fnhum.2014.00798
46. Aleman A, Nieuwenstein MR, Böcker KBE, de Haan EHF. Music training and mental imagery ability. Neuropsychologia. 2000;38: 1664-1668. doi:10.1016/S0028-3932(00)00079-8
47. Janata P, Paroo K. Acuity of auditory images in pitch and time. Percept Psychophys. 2006;68: 829-844.
48. Lotze M, Scheler G, Tan H-RM, Braun C, Birbaumer N. The musician's brain: functional imaging of amateurs and professionals during performance and imagery. NeuroImage. 2003;20: 1817-1829.
49. Ellis D. Dynamic time warping (DTW) in Matlab [Internet]. 2003. Available: http://www.ee.columbia.edu/~dpwe/resources/matlab/dtw/
50. Lartillot O, Toiviainen P, Eerola T. A Matlab Toolbox for Music Information Retrieval. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R, editors. Data Analysis, Machine Learning and Applications. Berlin, Heidelberg: Springer; 2008. pp. 261-268. Available: http://link.springer.com/10.1007/978-3-540-78246-9_31
51. Haynes J-D, Rees G. Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nat Neurosci. 2005;8: 686-691. doi:10.1038/nn1445
52. Horikawa T, Tamaki M, Miyawaki Y, Kamitani Y. Neural Decoding of Visual Imagery During Sleep. Science. 2013;340: 639-642. doi:10.1126/science.1234330
53. Reddy L, Tsuchiya N, Serre T. Reading the mind's eye: Decoding category information during mental imagery. NeuroImage. 2010;50: 818-825. doi:10.1016/j.neuroimage.2009.11.084
54. Martin S, Brunner P, Holdgraf C, Heinze H-J, Crone NE, Rieger J, et al. Decoding spectrotemporal features of overt and covert speech from the human cortex. Front Neuroeng. 2014;7. doi:10.3389/fneng.2014.00014
55. Leonard MK, Baud MO, Sjerps MJ, Chang EF. Perceptual restoration of masked speech in human cortex. Nat Commun. 2016. doi:10.1038/ncomms13619
56. Holdgraf CR, de Heer W, Pasley B, Rieger J, Crone N, Lin JJ, et al. Rapid tuning shifts in human auditory cortex enhance speech intelligibility. Nat Commun. 2016;7: 13654. doi:10.1038/ncomms13654
57. Kosslyn SM, Thompson WL. Shared mechanisms in visual imagery and visual perception: Insights from cognitive neuroscience. In: Gazzaniga MS, editor. The New Cognitive Neurosciences. 2nd ed. Cambridge, MA: MIT Press; 2000.
58. Kosslyn SM, Thompson WL. When is early visual cortex activated during visual mental imagery? Psychol Bull. 2003;129: 723-746. doi:10.1037/0033-2909.129.5.723
59. Herholz SC, Lappe C, Knief A, Pantev C. Neural basis of music imagery and the effect of musical expertise. Eur J Neurosci. 2008;28: 2352-2360. doi:10.1111/j.1460-9568.2008.06515.x
60. Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. Increased auditory cortical representation in musicians. Nature. 1998;392: 811-814. doi:10.1038/33918
61. Wu MC-K, David SV, Gallant JL. Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci. 2006;29: 477-505. doi:10.1146/annurev.neuro.29.051605.113024
62. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12: 2825-2830.
63. Garofolo JS. TIMIT acoustic-phonetic continuous speech corpus. 1993.

Supporting Information

Figure S1. Anatomical distribution of significant electrodes. Electrodes with significant encoding accuracy overlaid on a cortical surface reconstruction of the participant's cerebrum. To define auditory sensory areas (pink), we built an encoding model on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63]. Electrodes with significant encoding accuracy (p<0.05; FDR correction) are highlighted.

Figure S2. Spectrotemporal receptive fields. STRFs for the perception (top panel) and imagery (bottom panel) models. Grey electrodes were removed from the analysis due to excessive noise.

Figure S3. Neural encoding during passive listening. Standard STRFs for the passive listening model, built on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63]. Grey electrodes were removed from the analysis due to excessive noise. Encoding accuracy is plotted on the cortical surface reconstruction of the participant's cerebrum (map thresholded at p<0.05; FDR correction).