bioRxiv preprint first posted online Feb. 7, 2017; doi: http://dx.doi.org/10.1101/106617. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

Neural Encoding of Auditory Features during Music Perception and Imagery: Insight into the Brain of a Piano Player

Running title: Auditory Features and Music Imagery

Stephanie Martin 1,2,8, Christian Mikutta 2,3,4,8, Matthew K. Leonard 5, Dylan Hungate 5, Stefan Koelsch 6, Edward F. Chang 5, José del R. Millán 1, Robert T. Knight 2,7, Brian N. Pasley 2,9

1 Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Switzerland
2 Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720, USA
3 Translational Research Center and Division of Clinical Research Support, Psychiatric Services University of Bern (UPD), University Hospital of Psychiatry, Bern, Switzerland
4 Department of Neurology, Inselspital, Bern, University Hospital, University of Bern, Switzerland
5 Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA 94143, USA
6 Languages of Emotion, Freie Universität, Berlin, Germany
7 Department of Psychology, University of California, Berkeley, CA 94720, USA
8 Co-first author
9 Lead Contact

Corresponding Author: Brian Pasley, University of California, Berkeley, Helen Wills Neuroscience Institute, 210 Barker Hall, Berkeley, CA 94720, USA, [email protected]
Abstract

It remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in two conditions. First, the participant played two piano pieces on an electronic piano with the sound volume of the digital keyboard turned on. Second, the participant replayed the same piano pieces without auditory feedback and was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. For both conditions, we built encoding models to predict high gamma neural activity (70–150 Hz) from the spectrogram representation of the recorded sound. We found robust similarities between perception and imagery in the frequency and temporal tuning properties of auditory areas.
Keywords: electrocorticography, music perception and imagery, spectrotemporal receptive fields, frequency tuning, encoding models.

Abbreviations
ECoG: electrocorticography
HG: high gamma
IFG: inferior frontal gyrus
MTG: middle temporal gyrus
Post-CG: post-central gyrus
Pre-CG: pre-central gyrus
SMG: supramarginal gyrus
STG: superior temporal gyrus
STRF: spectrotemporal receptive field
Introduction

Auditory imagery is defined here as the mental representation of sound perception in the absence of external auditory stimulation. The experience of auditory imagery is common, such as when a song runs continually through someone's mind. On an advanced level, professional musicians are able to imagine a piece of music by looking at the sheet music [1]. Behavioral studies have shown that structural and temporal properties of auditory features (see [2] for a complete review), such as pitch [3], timbre [4,5], loudness [6] and rhythm [7], are preserved during auditory imagery. However, it is unclear how these auditory features are represented in the brain during imagery. Experimental investigation is difficult due to the lack of observable stimulus or behavioral markers during auditory imagery. Using a novel experimental paradigm to synchronize auditory imagery events to neural activity, we investigated the neural representation of spectrotemporal auditory features during auditory imagery in an epileptic patient with proficient music abilities.

Previous studies have identified anatomical regions active during auditory imagery [8] and how they compare to actual auditory perception. For instance, lesion [9] and brain imaging studies [4,10–14] have confirmed the involvement of bilateral temporal lobe regions during auditory imagery (see [15] for a review). Brain areas consistently activated with fMRI during auditory imagery include the secondary auditory cortex [10,12,16], the frontal cortex [16,17], the Sylvian parietal-temporal area [18], ventrolateral and dorsolateral cortices [19] and the supplementary motor area [17,20–24]. Anatomical regions active during auditory imagery have been compared to actual auditory perception to understand the interactions between externally and internally driven cortical processes. Several studies showed that auditory imagery has substantial, but not complete, overlap in brain areas with music perception [8]; for example, the secondary auditory cortex is consistently activated during both music imagery and perception, while the primary auditory areas appear to be activated solely during auditory perception [4,10,25,26].

These studies have helped to unravel the anatomical brain areas involved in auditory perception and imagery; however, there is a lack of evidence for the representation of specific acoustic features in the human cortex during auditory imagery. It remains a challenge to investigate neural processing during internal subjective experiences such as music imagery, due to the difficulty of time-locking brain activity to a measurable stimulus during auditory imagery. To address this issue, we recorded electrocorticographic (ECoG) neural signals from a proficient piano player in a novel task design that permitted robust marking of the spectrotemporal content of the intended music imagery to neural activity, thus allowing us to investigate specific auditory features during auditory imagery. In the first condition, the participant played an electronic piano with the sound output turned on. In this condition, the sound was played out loud through speakers at a comfortable sound volume that allowed auditory feedback (perception condition). In the second condition, the participant played the electronic piano with the speakers turned off, and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the digitized sound output of the MIDI-compatible sound module was recorded. This provided a measurable record of the content and timing of the participant's music imagery when the speakers of the keyboard were turned off and he did not hear the music. This task design allowed precise temporal alignment between the recorded neural activity and spectrogram representations of music perception and imagery, providing a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory imagery.

A well-established role of the early auditory system is to decompose complex sounds into their component frequencies [27–29], giving rise to tonotopic maps in the auditory cortex (see [30] for a review). Auditory perception has been extensively studied in animal models and humans using spectrotemporal receptive field (STRF) analysis [27,31–34], which identifies the time-frequency stimulus features encoded by a neuron or population of neurons. STRFs are consistently observed during auditory perception tasks, but the existence of STRFs during auditory imagery is unclear due to the experimental challenges associated with synchronizing neural activity and the imagined stimulus. To characterize and compare spectrotemporal tuning properties during auditory imagery and perception, we fit separate encoding models to data collected in the perception and imagery conditions. In this context, encoding models describe the linear mapping between a given auditory stimulus representation and its corresponding brain response. For instance, encoding models have revealed the neural tuning properties of various speech features, such as acoustic, phonetic and semantic representations [33,35–38].

In this study, the neural representation of music perception and imagery was quantified by spectrotemporal receptive fields (STRFs) that predict high gamma (HG; 70–150 Hz) neural activity. High gamma correlates with the spiking activity of the underlying neuronal ensemble [39–41] and reliably tracks speech and music features in auditory and motor cortex [33,42–45]. Results demonstrated the presence of robust spectrotemporal receptive fields during auditory imagery, with extensive overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception. These results provide a quantitative characterization of the shared neural representation underlying auditory perception and the subjective experience of auditory imagery.
Results

High gamma neural encoding during auditory perception and imagery

In this study, ECoG recordings were obtained using subdural electrode arrays implanted in a patient undergoing neurosurgical procedures for epilepsy. The participant was a proficient piano player (age of start: 7, years of music education: 10, hours of training per week: 5). Prior studies have shown that music training is associated with improved auditory imagery ability, such as pitch and temporal acuity [46–48]. We took advantage of this rare clinical case to investigate the neural correlates of music perception and imagery. In the first task, the participant played two music pieces on an electronic piano with the speakers of the digital keyboard turned on (perception condition; Fig 1A). In the second task, the participant played the same two piano pieces, but the volume of the speaker system was turned off to establish a silent room. The participant was asked to imagine hearing the corresponding music in his mind as he played the piano (imagery condition; Fig 1B). In both conditions, the sound produced by the MIDI-compatible sound module was digitally recorded in synchrony with the ECoG signal. The recorded sound allowed us to synchronize the auditory spectrotemporal patterns of the imagined music with the neural activity when the speaker system was turned off and no audible sounds were recorded in the room.
Fig 1. Experimental task design.
(A) The participant played an electronic piano with the sound of the digital keyboard turned on (perception condition). (B) In the second condition, the participant played the piano with the sound turned off and instead imagined the corresponding music in his mind (imagery condition). In both conditions, the digitized sound output of the MIDI-compatible sound module was recorded in synchrony with the neural signals (even when the participant did not hear any sound in the imagery condition). The models take as input a spectrogram consisting of time-varying spectral power across a range of acoustic frequencies (200–7,000 Hz, bottom left) and output time-varying neural signals. To assess the encoding accuracy, the predicted neural signal (light lines) is compared to the original neural signal (dark lines).
The participant performed two famous music pieces: Chopin's Prelude in C minor, Op. 28, No. 20, and Bach's Prelude in C major, BWV 846. Example auditory spectrograms from Chopin's Prelude, determined through the participant's key presses on the electronic piano, are shown in Fig 2B for both perception and imagery conditions. To evaluate how consistently the participant performed across perception and imagery tasks, we computed the realigned Euclidean distance [49] between the spectrograms of the same music piece played across conditions (within-stimulus distance). We compared the within-stimulus distance with the realigned Euclidean distance between the spectrograms of the different musical pieces (between-stimulus distance). The between-stimulus distance was 251.6% larger than the within-stimulus distance (p < 10⁻³; randomization test), suggesting that the spectrograms of the same musical piece played across conditions were more similar than the spectrograms of different musical pieces. This indicates that the participant performed the task with relative consistency and specificity across the two conditions. To compare spectrotemporal auditory representations during music perception and music imagery tasks, we fit separate encoding models in each condition. We used these models to quantify specific anatomical and neural tuning differences between auditory perception and imagery.
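For illustration, the sketch below shows one way such a realigned (dynamic-time-warping) Euclidean distance could be computed in Python with NumPy. It is a minimal stand-in for the MATLAB DTW routine of [49], not the exact implementation used here, and the spectrogram inputs at the bottom are synthetic placeholders.

    # Minimal sketch of a DTW-realigned Euclidean distance between two
    # spectrograms (time x frequency), loosely following the idea of [49].
    import numpy as np

    def realigned_euclidean_distance(spec_a, spec_b):
        """spec_a, spec_b: arrays of shape (n_frames, n_freqs)."""
        # Local cost: Euclidean distance between every pair of spectrogram frames.
        cost = np.linalg.norm(spec_a[:, None, :] - spec_b[None, :, :], axis=-1)
        n_a, n_b = cost.shape
        # Accumulated cost matrix for classic DTW with steps (1,0), (0,1), (1,1).
        acc = np.full((n_a + 1, n_b + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, n_a + 1):
            for j in range(1, n_b + 1):
                acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                     acc[i, j - 1],
                                                     acc[i - 1, j - 1])
        # Backtrack the optimal warping path.
        path, i, j = [], n_a, n_b
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        # Average frame-wise Euclidean distance along the aligned path.
        return np.mean([cost[a, b] for a, b in path])

    # Synthetic placeholders standing in for spectrograms of the same piece
    # played in the two conditions vs. spectrograms of different pieces.
    spec_within_1, spec_within_2 = np.random.rand(200, 32), np.random.rand(180, 32)
    print('within-stimulus distance:', realigned_euclidean_distance(spec_within_1, spec_within_2))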
Fig 2. Encoding accuracy.
(A) Electrode locations overlaid on a cortical surface reconstruction of the participant's cerebrum. (B) Overlay of the spectrogram contours for the perception (blue) and imagery (orange) conditions (10% of maximum energy from the spectrograms). (C) Actual and predicted high gamma band power (70–150 Hz) induced by the music perception and imagery segment in (B). Top electrodes have very similar predictive power, whereas bottom electrodes differ markedly between perception and imagery. Recordings are from two different STG sites, highlighted in pink in (A). (D) Encoding accuracy plotted on the cortical surface reconstruction of the participant's cerebrum (map thresholded at p<0.05; FDR correction). (E) Encoding accuracy of significant electrodes in the perception model as a function of the imagery model. Electrode-specific encoding accuracy is correlated between the perception and imagery models (r=0.65; p < 10⁻⁴; randomization test). (F) Encoding accuracy as a function of anatomical location (pre-central gyrus (pre-CG), post-central gyrus (post-CG), supramarginal gyrus (SMG), middle temporal gyrus (MTG) and superior temporal gyrus (STG)).
For both perception and imagery conditions, the observed and predicted high gamma neural responses are illustrated for two individual electrodes in the temporal lobe (Fig 2C), together with the corresponding music spectrogram (Fig 2B). The predicted neural response for the electrode shown in the upper panel of Fig 2C was significantly correlated with its corresponding measured neural response in both the perception (r=0.41; p < 10⁻⁷; one-sample z-test; FDR correction) and imagery (r=0.42; p < 10⁻⁴; one-sample z-test; FDR correction) conditions. The predicted neural response for the lower panel electrode was correlated with the actual neural response only in the perception condition (r=0.23; p<0.005; one-sample z-test; FDR correction) but not in the imagery condition (r=-0.02; p>0.5; one-sample z-test; FDR correction). The difference between the two conditions was significant for the electrode in the lower panel (p<0.05; two-sample t-test), but not for the electrode in the upper panel (p>0.5; two-sample t-test). This suggests that there is a strong continuous relationship between time-varying imagined sound features and STG neural activity, but that this relationship depends on cortical location.
To further investigate anatomical similarities and differences between the perception and imagery conditions, we plotted the anatomical layout of the prediction accuracy of individual electrodes. Results showed that sites with the highest prediction accuracy in both conditions were located in the superior and middle temporal gyri, the pre- and post-central gyri, and the supramarginal gyrus (Fig 2D; heat map thresholded at p<0.05; one-sample z-test; FDR correction), consistent with previous ECoG results [33]. Among the 256 electrodes recorded, 210 were fitted in the encoding model, while the remaining 46 electrodes were removed due to excessive noise. Of the fitted electrodes, 35 and 15 were significant in the perception and imagery conditions, respectively (p<0.05; one-sample z-test; FDR correction), of which 9 were significant in both conditions. Anatomical locations of the electrodes with significant encoding accuracy are depicted in Fig 3 and Fig S1. To compare the encoding accuracy across conditions, we performed additional analyses on the electrodes that had significant encoding accuracy in at least one condition (41 electrodes; unless otherwise stated). The prediction accuracy of individual electrodes was correlated between perception and imagery (Fig 2E; 41 electrodes; r=0.65; p < 10⁻⁴; randomization test). Because both perception and imagery models are based on the same auditory stimulus representation, the correlated prediction accuracy provides strong evidence for a shared neural representation of sound based on spectrotemporal features.
226
To assess how brain areas encoding auditory features varied across experimental conditions,
227
we analyzed the significant electrodes in the gyri highlighted in Fig 2A (pre-central gyrus
228
(pre-CG), post-central gyrus (post-CG), supramarginal gyrus (SMG), medial temporal gyrus
229
(MTG) and superior temporal gyrus (STG)) using Wilcoxon signed-rank test (p>0.05; one-
230
sample Kolmogorov-Smirnov test; Fig 2F). Results showed that the encoding accuracy in the
231
MTG and STG was higher for the perception (MTG: M = 0.16, STG: M = 0.13) than for the
232
imagery (MTG: M = 0.11, STG: M= 0.08; p<0.05; Wilcoxon signed-rank test; Bonferroni
233
correction). The encoding accuracy in the pre-CG, post-CG and SMG was not different
234
between the perception (pre-CG M=0.17; post-CG M=0.15; SMG M=0.12; p>0.5; Wilcoxon
235
signed-rank test; Bonferroni correction) and imagery (pre-CG M=0.14; post-CG M=0.13;
236
SMG M=0.12; p>0.5; Wilcoxon signed-rank test; Bonferroni correction) conditions. The
237
significant improvement of the perception vs. imagery model was thus specific to the
238
temporal lobe, which may reflect underlying differences in spectrotemporal encoding
239
mechanisms, or alternatively, a greater sensitivity to discrepancies between the actual content
240
of imagery and the recorded sound stimulus used in the model.
241
Spectrotemporal tuning during auditory perception and imagery

Auditory imagery and perception are distinct subjective experiences, yet both are characterized by a sense of sound. How does the auditory system encode sound during music perception and imagery? Examples of standard STRFs are shown in Fig 3 for temporal electrodes (see Fig S2 for all STRFs). These STRFs highlight neural stimulus preferences, as shown by the excitatory (warm colors) and inhibitory (cold colors) subregions. Fig 4A shows the latency (s) for the perception and imagery conditions, defined as the temporal coordinate of the maximum deviation in the STRF. The peak latency was significantly correlated between the two conditions (r=0.43; p<0.005; randomization test), suggesting that neural responses peaked at similar times during perception and imagery. Next, we analyzed frequency tuning curves estimated from the STRFs (see Materials and Methods for details). Examples of tuning curves for both perception and imagery encoding models are shown for the electrodes indicated by the black outlines in the anatomical brain (Fig 4B). Across conditions, the majority of individual electrodes exhibited a complex frequency tuning profile. For each electrode, the similarity between the tuning curves of the perception and imagery models was quantified using Pearson's correlation coefficient. The anatomical distribution of tuning curve similarity is plotted in Fig 4B, with the correlation at individual sites ranging from r = -0.3 to 0.6. The effect of anatomical location (pre-CG, post-CG, SMG, MTG and STG) on tuning curve similarity was not significant (Chi-square = 3.59; p>0.1; Kruskal-Wallis test). Similarities in tuning curve shape between auditory imagery and perception suggest a shared auditory representation, but there is no evidence that these similarities depended on gyral area.
Fig 3. Spectrotemporal receptive fields (STRFs).
Examples of standard STRFs for the perception (left panel) and imagery (right panel) models (warm colors indicate where the neuronal ensemble is excited, cold colors indicate where the neuronal ensemble is inhibited). In the lower left corner, electrode locations are overlaid on a cortical surface reconstruction of the participant's cerebrum. Electrodes whose STRFs are shown are outlined in black. Grey electrodes were removed from the analysis due to excessive noise (see Materials and Methods).
Fig 4. Auditory tuning.
(A) Latency peaks, estimated from the STRFs, were significantly correlated between perception and imagery conditions (r=0.43; p<0.005; randomization test). (B) Examples of tuning curves for both perception and imagery encoding models, defined as the average gain of the STRF as a function of acoustic frequency, for the electrodes indicated by the black outlines in the anatomical brain (left panel). Correlation coefficients between the perception and imagery tuning curves are plotted for significant electrodes on the cortical surface reconstruction of the participant's cerebrum. Grey electrodes were removed from the analysis due to excessive noise. The bottom panel is a histogram of electrode correlation coefficients between perception and imagery tuning. (C) Proportion of predictive electrode sites (N=41) with peak tuning at each frequency. Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z>3.1; p<0.001), separated by more than one third of an octave.
Different electrodes are sensitive to different acoustic frequencies important for music processing. We next quantitatively assessed how the frequency tuning of predictive electrodes (N=41) varied across the two conditions. First, to evaluate how the acoustic spectrum was covered at the population level, we quantified the proportion of significant electrodes with a tuning peak at each acoustic frequency (Fig 4C). Tuning peaks were identified as significant parameters in the acoustic frequency tuning curves (z>3.1; p<0.001; separated by more than one third of an octave). The proportion of electrodes with tuning peaks was significantly larger for the perception (mean = 0.19) than for the imagery (mean = 0.14) condition (Fig 4C; p<0.05; Wilcoxon signed-rank test). Despite this, both conditions exhibited reliable frequency selectivity, as nearly the full range of the acoustic frequency spectrum was encoded: the fraction of acoustic frequencies covered with peaks by predictive electrodes was 0.91 for the perception condition and 0.89 for the imagery condition.
Reconstruction of auditory features during music perception and imagery

To evaluate the ability to identify piano keys from the brain activity, we reconstructed the same auditory spectrogram representation used in the encoding models. Results showed that the overall reconstruction accuracy was higher than zero in both conditions (Fig 5A; p < 0.001; randomization test), but did not differ between conditions (p > 0.05; two-sample t-test). As a function of acoustic frequency, mean accuracy ranged from r = 0 to 0.45 (Fig 5A).
Fig 5. Reconstruction accuracy.
(A) Right panel: Overall reconstruction accuracy of the spectrogram representation for both perception (blue) and imagery (orange) conditions. Error bars denote resampling SEM. Left panel: Reconstruction accuracy as a function of acoustic frequency. Shaded regions denote SEM over the resamples. (B) Examples of original and reconstructed segments for the perception (left) and the imagery (right) model. (C) Distribution of identification ranks for all reconstructed spectrogram notes. The median identification rank is 0.65 and 0.63 for the perception and imagery decoding models, respectively, significantly higher than the 0.50 chance level (p<0.001; randomization test). Left panel: Receiver operating characteristic (ROC) plot of identification performance for the perception (blue curve) and imagery (orange curve) models. The diagonal black line indicates no predictive power.
We further assessed reconstruction accuracy by evaluating the ability to identify isolated piano notes from the test set auditory spectrogram reconstructions. Examples of original and reconstructed segments are depicted in Fig 5B for the perception (left) and imagery (right) models. For the identification, we extracted 0.5-second segments at piano note onsets from the original and reconstructed auditory spectrograms. Note onsets were defined with the MIRtoolbox [50]. Then, we computed the correlation coefficient between a target reconstructed spectrogram and each original spectrogram in the candidate set. Finally, we sorted the coefficients and computed the identification rank as the percentile rank of the correct spectrogram. This metric reflects how well the target reconstruction matched the correct original spectrogram out of all candidates. Results showed that the median identification rank of individual piano notes was significantly higher than chance for both conditions (Fig 5C; median identification rank: perception = 0.72 and imagery = 0.69; p < 0.001; randomization test). Similarly, the area under the curve (AUC) of identification performance for the perception (blue curve) and imagery (orange curve) models was well above chance level (the diagonal black dashed line indicates no predictive power; p<0.001; randomization test).
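For illustration, a minimal NumPy sketch of this identification procedure is shown below; the segment arrays are synthetic placeholders standing in for the 0.5-second original and reconstructed note segments.

    # Minimal sketch of the note-identification analysis: for each reconstructed
    # 0.5-s note segment, correlate it with every original segment in the
    # candidate set and compute the percentile rank of the correct match.
    import numpy as np

    def identification_ranks(recon_segments, orig_segments):
        """Both arrays have shape (n_notes, n_frames, n_freqs)."""
        n_notes = recon_segments.shape[0]
        ranks = np.zeros(n_notes)
        for k in range(n_notes):
            target = recon_segments[k].ravel()
            # Correlation between the target reconstruction and every candidate original.
            corrs = np.array([np.corrcoef(target, orig_segments[c].ravel())[0, 1]
                              for c in range(n_notes)])
            # Percentile rank of the correct candidate (1.0 = best, 0.0 = worst).
            ranks[k] = (corrs < corrs[k]).sum() / (n_notes - 1)
        return ranks

    # Synthetic placeholder segments (30 notes, 50 frames, 32 frequency bands).
    recon_segments = np.random.randn(30, 50, 32)
    orig_segments = np.random.randn(30, 50, 32)
    print('median identification rank:', np.median(identification_ranks(recon_segments, orig_segments)))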
Cross-condition analysis

Another way to evaluate the degree of overlap between the perception and imagery conditions is to apply the decoding model built in the perception condition to imagery neural data, and vice versa. This approach is based on the hypothesis that both tasks share neural mechanisms, and it is useful when one of the models cannot be built directly because of the lack of observable measures. This technique has been successfully applied in various fields, such as vision [51–53] and speech [54]. Decoding performance improved by 50% when a model specific to the imagery condition was used (r=0.28), compared to when the perception model was applied to imagery data (r=0.19) (Fig 6). This highlights the importance of having a model that is specific to each condition.
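For illustration, the sketch below outlines this cross-condition protocol in Python, using ridge regression as a simple stand-in for the gradient-descent decoder described in Materials and Methods; the data arrays are synthetic placeholders for lagged high gamma features (X) and spectrogram targets (Y).

    # Minimal sketch of the cross-condition evaluation: train a decoder on one
    # condition and test it on the other, and compare with a within-condition fit.
    import numpy as np
    from sklearn.linear_model import Ridge

    def decode_accuracy(model, X, Y):
        # Mean Pearson correlation between predicted and actual spectrogram channels.
        Y_hat = model.predict(X)
        rs = [np.corrcoef(Y_hat[:, f], Y[:, f])[0, 1] for f in range(Y.shape[1])]
        return np.nanmean(rs)

    rng = np.random.default_rng(0)
    # Synthetic placeholder data for the two conditions (1000 samples, 40 features, 32 bands).
    X_perc, Y_perc = rng.normal(size=(1000, 40)), rng.normal(size=(1000, 32))
    X_imag, Y_imag = rng.normal(size=(1000, 40)), rng.normal(size=(1000, 32))

    dec_perc = Ridge(alpha=1.0).fit(X_perc[:800], Y_perc[:800])   # trained on perception
    dec_imag = Ridge(alpha=1.0).fit(X_imag[:800], Y_imag[:800])   # trained on imagery

    r_cross = decode_accuracy(dec_perc, X_imag[800:], Y_imag[800:])   # perception -> imagery
    r_within = decode_accuracy(dec_imag, X_imag[800:], Y_imag[800:])  # imagery -> imagery
    print('cross-condition r = %.2f, within-condition r = %.2f' % (r_cross, r_within))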
Fig 6. Cross-condition analysis.
Reconstruction accuracy when the decoding model was built on the perception condition and applied to the imagery neural data, and vice versa. Decoding performance improved by 50% when a model specific to the imagery condition was used (r=0.28), compared to when the perception model was applied to imagery data (r=0.19).
Control analysis for sensorimotor confounds

In this study, the participant played the piano in two different conditions (music perception and imagery). In addition to the auditory percept, arm, hand and finger movements related to the active piano task could have presented potential confounds to the decoding process. We controlled for possible motor confounds in four different ways. First, the electrode grid did not cover hand or finger sensorimotor brain areas (Fig 7A). This reduces the likelihood that the decoding model relied on hand-related motor confounds. Second, examples of STRFs for two neighboring electrodes show different weight patterns across conditions (Fig 7B). For instance, the weights of the electrode depicted in the left example of Fig 7B are correlated between the two conditions (r=0.48), whereas the weights are not correlated in the right example (r=0.04). Differences across conditions cannot be explained by motor confounds, because finger movements were similar in both tasks. Third, brain areas that significantly encoded music perception and imagery overlapped with auditory sensory areas (Fig S1 and Fig S3), as revealed by the encoding accuracy and STRFs during passive listening to TIMIT sentences (no movements). These findings provide evidence against motor confounds and suggest that the brain responses were induced by auditory percepts rather than motor movements associated with pressing piano keys. Finally, we built two additional decoding models, using 1) only temporal lobe electrodes and 2) only auditory-responsive electrodes (Fig 7C; see Materials and Methods for details). Both models showed significant reconstruction accuracy (p<0.001; randomization test) and median identification rank (p<0.001; randomization test). This suggests that even when all electrodes potentially unrelated to auditory processing were removed, the decoding model still performed well.
Fig 7. Control analysis for motor confounds.
(A) Electrode locations overlaid on cortical surface reconstructions of the participant's cerebrum. The view from the top of the brain shows that hand motor areas were not covered by the grid. Brain areas associated with the face, upper limbs and lower limbs are depicted in green, blue and pink, respectively. (B) Different STRFs for two neighboring electrodes highlighted in (A), for both perception and imagery encoding models. (C) Overall reconstruction accuracy (left panel) and median identification rank (right panel) when using all electrodes, only temporal electrodes, and only auditory electrodes (see Materials and Methods for details).
Discussion

Music imagery studies are daunting due to their subjective nature and the absence of verifiable, observable measures. Our task design allowed precise time-locking between the recorded neural activity and the spectrotemporal features of music imagery, and provided a unique opportunity to quantitatively study neural encoding during auditory imagery and to compare tuning properties with auditory perception. We provide the first evidence of spectrotemporal receptive fields and auditory feature encoding in the brain during music imagery, and provide comparative measures with actual music perception encoding. We observed that neuronal ensembles were tuned to acoustic frequencies during imagined music, suggesting that spectral organization occurs in the absence of actual perceived sound. Supporting evidence has shown that restored speech (when a speech instance is replaced by noise, but the listener perceives a specific speech sound) is grounded in acoustic representations in the superior temporal gyrus [55,56]. In addition, results showed substantial, but not complete, overlap in neural properties (i.e., spectral and temporal tuning properties, and brain areas) during music perception and imagery. These findings are in agreement with visual studies showing that visual imagery and perception involve many, but not all, overlapping brain areas [57,58]. We also showed that auditory features could be reconstructed from neural activity during imagined music. Because both perception and imagery models are based on the same auditory stimulus representation, the correlated prediction accuracy provides strong evidence for a shared neural representation of sound based on spectrotemporal features. This confirms that the brain encodes spectrotemporal properties of imagined sounds, as previously suggested by behavioral and brain lesion studies (see [2] for a review).
Methodological issues in investigating imagery are numerous, including the lack of evidence that the desired mental task was actually performed. The current task design did not allow us to verify how the mental task was performed, yet the behavioral index of key presses on the piano indicated the precise timing and frequency content of the intended imagined sound. In addition, we recorded a skilled piano player, and it has been suggested that participants with musical training exhibit better pitch and temporal acuity during auditory imagery than participants with little or no musical training [47,59]. Furthermore, tonotopic maps located in the STG are enlarged in trained musicians [60]. Thus, recording a trained piano player suggests improved auditory imagery ability (see also [7,9,14]) and reduced issues related to spectral and temporal errors.
Finally, the electrode grid was located over the left hemisphere of the participant. This raises the question of lateralization in the brain response to music perception and imagery. Studies have shown the importance of both hemispheres for auditory perception and imagery [4,9,10,12–14], although brain patterns tend to shift toward the right hemisphere (see [15] for a review). In our task, the grid was located on the left hemisphere and allowed significant encoding and decoding accuracy within the high gamma frequency range. This is consistent with the notion that auditory processing of music is also evident in the left hemisphere.
Materials and Methods

Participant and data acquisition

Electrocorticographic (ECoG) recordings were obtained using subdural electrode arrays implanted in one patient undergoing neurosurgical treatment for refractory epilepsy. The participant gave his written informed consent prior to surgery and experimental testing. The experimental protocol was approved by the University of California, San Francisco and Berkeley Institutional Review Boards and Committees on Human Research. Inter-electrode spacing (center-to-center) was 4 mm. Grid placement (Fig 2A) was defined solely based on clinical requirements. Localization and co-registration of electrodes were performed using the structural MRI. Multi-electrode ECoG data were amplified and digitally recorded with a sampling rate of 3,052 Hz. ECoG signals were re-referenced to a common average after removal of electrodes with epileptic artifacts or excessive noise (including broadband electromagnetic noise from hospital equipment or poor contact with the cortical surface). In addition, the audio output of the piano was recorded in synchrony with the multi-electrode ECoG data.
Experimental paradigm

The recording session included two conditions. In the first condition, the participant played on an electronic piano with the sound turned on. That is, the music was played out loud through the speakers of the digital keyboard in the hospital room (volume at a comfortable and natural sound level; Fig 1A; perception condition). In the second condition, the participant played on the piano with the speakers turned off and instead imagined hearing the corresponding music in his mind (Fig 1B; imagery condition). Fig 1B illustrates that in both conditions, the digitized sound output of the MIDI sound module was recorded in synchrony with the ECoG data (even when the speakers were turned off and the participant did not hear the music). The two music pieces were Chopin's Prelude in C minor, Op. 28, No. 20, and Bach's Prelude in C major (BWV 846).
Feature extraction

We extracted the high gamma band ECoG signal using eight bandpass filters (non-causal Hamming-window filters of order 20, with logarithmically increasing center frequencies between 70 and 150 Hz and semi-logarithmically increasing bandwidths), and extracted the envelope of each band using the Hilbert transform. Prior to model fitting, the power was averaged across these eight bands, down-sampled to 100 Hz and z-scored.
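For illustration, a minimal SciPy/NumPy sketch of this high gamma extraction pipeline is shown below; the ECoG array is a synthetic placeholder, and the bandwidth rule is an assumption rather than the exact filter bank used in this study.

    # Minimal sketch of high gamma band extraction (filter bank + Hilbert envelope).
    import numpy as np
    from scipy.signal import firwin, filtfilt, hilbert, resample

    fs = 3052.0
    ecog = np.random.randn(int(10 * fs), 4)                       # placeholder: 10 s, 4 electrodes
    center_freqs = np.logspace(np.log10(70), np.log10(150), 8)    # 8 log-spaced bands
    bandwidths = 0.15 * center_freqs                               # assumed bandwidth rule

    band_env = []
    for cf, bw in zip(center_freqs, bandwidths):
        # Order-20 Hamming-window FIR bandpass; filtfilt makes the filtering non-causal.
        b = firwin(21, [cf - bw / 2, cf + bw / 2], window='hamming', pass_zero=False, fs=fs)
        filtered = filtfilt(b, [1.0], ecog, axis=0)
        band_env.append(np.abs(hilbert(filtered, axis=0)))         # Hilbert envelope per band

    hg = np.mean(band_env, axis=0)                                 # average across the 8 bands
    n_out = int(round(hg.shape[0] * 100.0 / fs))
    hg = resample(hg, n_out, axis=0)                               # down-sample to 100 Hz
    hg = (hg - hg.mean(axis=0)) / hg.std(axis=0)                   # z-score per electrode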
Auditory spectrogram representation

The auditory spectrogram representation was a time-varying representation of the amplitude envelope at acoustic frequencies logarithmically spaced between 200 and 7,000 Hz. This representation was calculated by affine wavelet transforms of the sound waveform using auditory filter banks that mimic neural processing in the human auditory periphery [31]. To compute these acoustic representations, we used the NSL MATLAB toolbox (http://www.isr.umd.edu/Labs/NSL/Software.htm).
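For illustration, the sketch below builds a simplified log-spaced spectrogram from an STFT as a rough Python stand-in for the NSL wavelet-based auditory spectrogram; the waveform, number of bands and window parameters are assumptions, not the toolbox's settings.

    # Simplified stand-in for the NSL auditory spectrogram: STFT magnitudes pooled
    # into 32 logarithmically spaced frequency bands between 200 and 7,000 Hz.
    import numpy as np
    from scipy.signal import stft

    sr = 16000
    audio = np.random.randn(5 * sr)                               # placeholder mono waveform
    f, t, Z = stft(audio, fs=sr, nperseg=int(0.025 * sr), noverlap=int(0.015 * sr))
    power = np.abs(Z)

    edges = np.logspace(np.log10(200), np.log10(7000), 33)        # 32 log-spaced band edges
    aud_spec = np.zeros((32, power.shape[1]))
    for k in range(32):
        band = (f >= edges[k]) & (f < edges[k + 1])
        if band.any():
            aud_spec[k] = power[band].mean(axis=0)                # pool STFT bins per band
    aud_spec = np.log(aud_spec + 1e-6)                            # compressive nonlinearity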
Encoding model

The neural encoding model, based on the spectrotemporal receptive field (STRF) [34], describes the linear mapping between the music stimulus and the high gamma neural response at individual electrodes. The encoding model was estimated as follows:

\hat{R}(t, n) = \sum_{\tau} \sum_{f} h(\tau, f, n)\, S(t - \tau, f)

where R̂(t, n) is the predicted high gamma neural activity at time t and electrode n, and S(t − τ, f) is the spectrogram representation at time (t − τ) and acoustic frequency f. Finally, h(τ, f, n) is the linear transformation matrix that depends on the time lag τ, the frequency f and the electrode n; h represents the spectrotemporal receptive field of each electrode. The neural tuning properties of a variety of stimulus parameters in different sensory systems have been assessed using STRFs [61]. We used ridge regression to fit the encoding model [62], with a 10-fold cross-validation resampling procedure and no overlap between training and test partitions within each resample. We performed a grid search on the training set to define the penalty coefficient α and the learning rate η, using a nested-loop cross-validation approach. We standardized the parameter estimates to yield the final model.
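For illustration, a minimal scikit-learn sketch of this STRF estimation is shown below; the lag range, regularization strength and placeholder data are assumptions rather than the exact settings used in this study.

    # Minimal sketch of the STRF encoding model: predict one electrode's high gamma
    # response from a lagged (time x frequency) spectrogram with ridge regression.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold

    def lagged_design(spec, n_lags):
        # Stack delayed copies of the spectrogram: column (lag, freq) holds S(t - lag, f).
        n_t, n_f = spec.shape
        X = np.zeros((n_t, n_lags * n_f))
        for lag in range(n_lags):
            X[lag:, lag * n_f:(lag + 1) * n_f] = spec[:n_t - lag]
        return X

    # Synthetic placeholders; in practice these come from the preprocessing steps above.
    aud_spec = np.random.randn(3000, 32)          # spectrogram at 100 Hz
    hg = np.random.randn(3000, 64)                # high gamma, one column per electrode
    elec = 0                                      # example electrode index
    n_lags = 40                                   # 400 ms of stimulus history (assumed)

    X = lagged_design(aud_spec, n_lags)
    y = hg[:, elec]

    scores = []
    for train, test in KFold(n_splits=10).split(X):
        model = Ridge(alpha=10.0).fit(X[train], y[train])   # alpha tuned by nested CV in the paper
        scores.append(np.corrcoef(model.predict(X[test]), y[test])[0, 1])

    strf = model.coef_.reshape(n_lags, aud_spec.shape[1])   # (lags x freqs) receptive field
    print('encoding accuracy r =', np.mean(scores))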
Decoding model

The decoding model linearly mapped the neural activity to the music representation, as a weighted sum of activity at each electrode, as follows:

\hat{S}(t, f) = \sum_{\tau} \sum_{n} g(\tau, f, n)\, R(t - \tau, n)

where Ŝ(t, f) is the predicted music representation at time t and acoustic frequency f, R(t − τ, n) is the HG neural response of electrode n at time (t − τ), and τ is the time lag, ranging between -100 ms and 400 ms. Finally, g(τ, f, n) is the linear transformation matrix that depends on the time lag τ, the frequency f and the electrode n. Both the neural response and the music representation were synchronized, downsampled to 100 Hz, and standardized to zero mean and unit standard deviation prior to model fitting. To fit the model parameters, we used gradient descent with early-stopping regularization. We used a 10-fold cross-validation resampling scheme, and 20% of the training data were used as a validation set to determine the early stopping criterion. Finally, model prediction accuracy was evaluated on the independent test set, and the parameter estimates were standardized to yield the final model.
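For illustration, a minimal NumPy sketch of such a gradient-descent fit with validation-based early stopping is shown below; the learning rate, patience and placeholder data are assumptions.

    # Minimal sketch of the decoding model: linear reconstruction of the spectrogram
    # from a lagged neural design matrix, trained by gradient descent with early stopping.
    import numpy as np

    def fit_decoder(X, S, lr=1e-3, max_iter=5000, patience=20, val_frac=0.2):
        """X: (n_times, n_lags * n_electrodes); S: (n_times, n_freqs); both z-scored."""
        n_val = int(val_frac * X.shape[0])
        X_tr, S_tr = X[:-n_val], S[:-n_val]
        X_val, S_val = X[-n_val:], S[-n_val:]
        G = np.zeros((X.shape[1], S.shape[1]))           # decoder weights g(tau, f, n)
        best_G, best_err, wait = G.copy(), np.inf, 0
        for _ in range(max_iter):
            err = X_tr @ G - S_tr                        # prediction error on training data
            G -= lr * X_tr.T @ err / len(X_tr)           # gradient step on squared error
            val_err = np.mean((X_val @ G - S_val) ** 2)  # monitor held-out validation error
            if val_err < best_err:
                best_G, best_err, wait = G.copy(), val_err, 0
            else:
                wait += 1
                if wait >= patience:                     # early stopping criterion
                    break
        return best_G

    # Synthetic placeholder data: lagged neural features and target spectrogram.
    X = np.random.randn(2000, 50)
    S = np.random.randn(2000, 32)
    G = fit_decoder(X, S)
    S_hat = X @ G                                        # reconstructed spectrogram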
Evaluation

Prediction accuracy was quantified using the correlation coefficient (Pearson's r) between the predicted and actual HG signal on data from the independent test set. Overall encoding accuracy was reported as the mean correlation over folds. The z-test was applied for all reported mean r values. Electrodes were defined as significant if the p-value was smaller than the significance threshold of α = 0.05 (95th percentile; FDR correction). To define auditory sensory areas, we built an encoding model on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63] for 10 minutes. Electrodes with significant encoding accuracy are highlighted in Fig S1.
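For illustration, a minimal sketch of the per-electrode significance assessment with Benjamini-Hochberg FDR correction is shown below; the p-value array is a synthetic placeholder, and the manual FDR routine is a generic implementation rather than the exact one used here.

    # Minimal sketch of FDR-corrected electrode selection at alpha = 0.05.
    import numpy as np

    def fdr_bh(p_vals, alpha=0.05):
        # Benjamini-Hochberg: reject all p-values up to the largest k with p_(k) <= k/m * alpha.
        p = np.asarray(p_vals)
        order = np.argsort(p)
        m = len(p)
        thresh = alpha * np.arange(1, m + 1) / m
        below = p[order] <= thresh
        significant = np.zeros(m, dtype=bool)
        if below.any():
            k_max = np.max(np.where(below)[0])
            significant[order[:k_max + 1]] = True
        return significant

    p_vals = np.random.uniform(size=210)          # placeholder per-electrode z-test p-values
    sig_electrodes = fdr_bh(p_vals, alpha=0.05)
    print('significant electrodes:', int(sig_electrodes.sum()))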
To further investigate the neural encoding of spectrotemporal acoustic features during music perception and music imagery, we analyzed all the electrodes that were significant in at least one condition (unless otherwise stated). Tuning curves were estimated from the spectrotemporal receptive fields by first setting all inhibitory weights to zero, then averaging across the time dimension and converting to standardized z-scores. Tuning peaks were identified as significant peak parameters in the acoustic frequency tuning curves (z>3.1; p<0.001), separated by more than one third of an octave.
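For illustration, a minimal NumPy sketch of the tuning curve and peak extraction is shown below; the STRF and frequency axis are synthetic placeholders.

    # Minimal sketch of the tuning-curve analysis: rectify the STRF (drop inhibitory
    # weights), average over time lags, z-score across frequency, and keep peaks
    # with z > 3.1 that are separated by more than one third of an octave.
    import numpy as np

    def tuning_curve(strf):
        excitatory = np.clip(strf, 0, None)             # set inhibitory weights to zero
        curve = excitatory.mean(axis=0)                 # average across the time dimension
        return (curve - curve.mean()) / curve.std()     # convert to z-scores

    def tuning_peaks(curve, freqs, z_thresh=3.1, min_sep_oct=1.0 / 3.0):
        peaks = []
        for i in np.argsort(curve)[::-1]:               # strongest candidates first
            if curve[i] < z_thresh:
                break
            # Keep the peak only if it lies more than 1/3 octave from accepted peaks.
            if all(abs(np.log2(freqs[i] / freqs[j])) > min_sep_oct for j in peaks):
                peaks.append(i)
        return freqs[np.array(peaks, dtype=int)]

    strf = np.random.randn(40, 32)                      # placeholder (lags x freqs) STRF
    freqs = np.logspace(np.log10(200), np.log10(7000), 32)
    print('tuning peaks (Hz):', tuning_peaks(tuning_curve(strf), freqs))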
Decoding accuracy was assessed by calculating the correlation coefficient (Pearson's r) between the reconstructed and original music spectrogram representations using test set data. Overall reconstruction accuracy was computed by averaging over acoustic frequencies and resamples, and the standard error of the mean (SEM) was computed by taking the standard deviation across resamples. Finally, to further assess the reconstruction accuracy, we evaluated the ability to identify isolated piano notes from the test set auditory spectrogram reconstructions, using a similar approach as in [33,54].
Acknowledgments

This work was supported by the Zeno-Karl Schindler Foundation, NINDS Grant R3721135, NIH 5K99DC012804 and the Nielsen Corporation.
References

1. Meister IG, Krings T, Foltys H, Boroojerdi B, Müller M, Töpper R, et al. Playing piano in the mind--an fMRI study on music imagery and performance in pianists. Brain Res Cogn Brain Res. 2004;19: 219–228. doi:10.1016/j.cogbrainres.2003.12.005
2. Hubbard TL. Auditory imagery: Empirical findings. Psychol Bull. 2010;136: 302–329. doi:10.1037/a0018436
3. Halpern AR. Memory for the absolute pitch of familiar songs. Mem Cognit. 1989;17: 572–581.
4. Halpern AR, Zatorre RJ, Bouffard M, Johnson JA. Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia. 2004;42: 1281–1292. doi:10.1016/j.neuropsychologia.2003.12.017
5. Pitt MA, Crowder RG. The role of spectral and dynamic cues in imagery for musical timbre. J Exp Psychol Hum Percept Perform. 1992;18: 728–738.
6. Intons-Peterson M. Components of auditory imagery. In: Reisberg D, editor. Auditory imagery. Hillsdale, NJ: L. Erlbaum Associates; 1992. pp. 45–72.
7. Halpern. Mental scanning in auditory imagery for songs. J Exp Psychol Learn Mem Cogn. 1988;14: 434–443.
8. Kosslyn SM, Ganis G, Thompson WL. Neural foundations of imagery. Nat Rev Neurosci. 2001;2: 635–642. doi:10.1038/35090055
9. Zatorre, Halpern. Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia. 1993;31: 221–232.
10. Griffiths TD. Human complex sound analysis. Clin Sci Lond Engl 1979. 1999;96: 231–234.
11. Halpern AR. When That Tune Runs Through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cereb Cortex. 1999;9: 697–704. doi:10.1093/cercor/9.7.697
12. Kraemer DJM, Macrae CN, Green AE, Kelley WM. Musical imagery: Sound of silence activates auditory cortex. Nature. 2005;434: 158. doi:10.1038/434158a
13. Rauschecker JP. Cortical plasticity and music. Ann N Y Acad Sci. 2001;930: 330–336.
14. Zatorre RJ, Halpern AR, Perry DW, Meyer E, Evans AC. Hearing in the Mind's Ear: A PET Investigation of Musical Imagery and Perception. J Cogn Neurosci. 1996;8: 29–46. doi:10.1162/jocn.1996.8.1.29
15. Zatorre RJ, Halpern AR. Mental Concerts: Musical Imagery and Auditory Cortex. Neuron. 2005;47: 9–12. doi:10.1016/j.neuron.2005.06.013
16. Zatorre RJ, Halpern AR, Bouffard M. Mental Reversal of Imagined Melodies: A Role for the Posterior Parietal Cortex. J Cogn Neurosci. 2009;22: 775–789. doi:10.1162/jocn.2009.21239
17. Halpern AR, Zatorre RJ. When That Tune Runs Through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cereb Cortex. 1999;9: 697–704. doi:10.1093/cercor/9.7.697
18. Hickok G, Buchsbaum B, Humphries C, Muftuler T. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. J Cogn Neurosci. 2003;15: 673–682. doi:10.1162/089892903322307393
19. Meyer M, Elmer S, Baumann S, Jancke L. Short-term plasticity in the auditory system: differential neural responses to perception and imagery of speech and music. Restor Neurol Neurosci. 2007;25: 411–431.
20. Halpern AR. Cerebral substrates of musical imagery. Ann N Y Acad Sci. 2001;930: 179–192.
21. Mikumo M. Motor Encoding Strategy for Pitches of Melodies. Music Percept Interdiscip J. 1994;12: 175–197. doi:10.2307/40285650
22. Petsche H, von Stein A, Filz O. EEG aspects of mentally playing an instrument. Brain Res Cogn Brain Res. 1996;3: 115–123.
23. Brodsky W, Henik A, Rubinstein B-S, Zorman M. Auditory imagery from musical notation in expert musicians. Percept Psychophys. 2003;65: 602–612.
24. Schürmann M, Raij T, Fujiki N, Hari R. Mind's ear in a musician: where and when in the brain. NeuroImage. 2002;16: 434–440. doi:10.1006/nimg.2002.1098
25. Bunzeck N, Wuestenberg T, Lutz K, Heinze H-J, Jancke L. Scanning silence: mental imagery of complex sounds. NeuroImage. 2005;26: 1119–1127. doi:10.1016/j.neuroimage.2005.03.013
26. Yoo SS, Lee CU, Choi BG. Human brain mapping of auditory imagery: event-related functional MRI study. Neuroreport. 2001;12: 3045–3049.
27. Aertsen AMHJ, Olders JHJ, Johannesma PIM. Spectro-temporal receptive fields of auditory neurons in the grassfrog: III. Analysis of the stimulus-event relation for natural stimuli. Biol Cybern. 1981;39: 195–209. doi:10.1007/BF00342772
28. Eggermont JJ, Aertsen AMHJ, Johannesma PIM. Quantitative characterisation procedure for auditory neurons based on the spectro-temporal receptive field. Hear Res. 1983;10: 167–190. doi:10.1016/0378-5955(83)90052-7
29. Tian B. Processing of Frequency-Modulated Sounds in the Lateral Auditory Belt Cortex of the Rhesus Monkey. J Neurophysiol. 2004;92: 2993–3013. doi:10.1152/jn.00472.2003
30. Saenz M, Langers DRM. Tonotopic mapping of human auditory cortex. Hear Res. 2014;307: 42–52. doi:10.1016/j.heares.2013.07.016
31. Chi T, Ru P, Shamma SA. Multiresolution spectrotemporal analysis of complex sounds. J Acoust Soc Am. 2005;118: 887. doi:10.1121/1.1945807
32. Clopton BM, Backoff PM. Spectrotemporal receptive fields of neurons in cochlear nucleus of guinea pig. Hear Res. 1991;52: 329–344.
33. Pasley BN, David SV, Mesgarani N, Flinker A, Shamma SA, Crone NE, et al. Reconstructing Speech from Human Auditory Cortex. Zatorre R, editor. PLoS Biol. 2012;10: e1001251. doi:10.1371/journal.pbio.1001251
34. Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci Off J Soc Neurosci. 2000;20: 2315–2331.
35. Lotte F, Brumberg JS, Brunner P, Gunduz A, Ritaccio AL, Guan C, et al. Electrocorticographic representations of segmental features in continuous speech. Front Hum Neurosci. 2015;9. doi:10.3389/fnhum.2015.00097
36. Mesgarani N, Cheung C, Johnson K, Chang EF. Phonetic Feature Encoding in Human Superior Temporal Gyrus. Science. 2014;343: 1006–1010. doi:10.1126/science.1245994
37. Tankus A, Fried I, Shoham S. Structured neuronal encoding and decoding of human speech features. Nat Commun. 2012;3: 1015. doi:10.1038/ncomms1995
38. Huth AG, de Heer WA, Griffiths TL, Theunissen FE, Gallant JL. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature. 2016;532: 453–458. doi:10.1038/nature17637
39. Lachaux J-P, Axmacher N, Mormann F, Halgren E, Crone NE. High-frequency neural activity and human cognition: past, present and possible future of intracranial EEG research. Prog Neurobiol. 2012;98: 279–301. doi:10.1016/j.pneurobio.2012.06.008
40. Miller KJ, Leuthardt EC, Schalk G, Rao RPN, Anderson NR, Moran DW, et al. Spectral changes in cortical surface potentials during motor movement. J Neurosci Off J Soc Neurosci. 2007;27: 2424–2432. doi:10.1523/JNEUROSCI.3886-06.2007
41. Boonstra TW, Houweling S, Muskulus M. Does Asynchronous Neuronal Activity Average out on a Macroscopic Scale? J Neurosci. 2009;29: 8871–8874. doi:10.1523/JNEUROSCI.2020-09.2009
42. Towle VL, Yoon H-A, Castelle M, Edgar JC, Biassou NM, Frim DM, et al. ECoG gamma activity during a language task: differentiating expressive and receptive speech areas. Brain. 2008;131: 2013–2027. doi:10.1093/brain/awn147
43. Llorens A, Trébuchon A, Liégeois-Chauvel C, Alario F-X. Intra-Cranial Recordings of Brain Activity During Language Production. Front Psychol. 2011; doi:10.3389/fpsyg.2011.00375
44. Crone NE, Boatman D, Gordon B, Hao L. Induced electrocorticographic gamma activity during auditory perception. Brazier Award-winning article, 2001. Clin Neurophysiol Off J Int Fed Clin Neurophysiol. 2001;112: 565–582.
45. Sturm I, Blankertz B, Potes C, Schalk G, Curio G. ECoG high gamma activity reveals distinct cortical representations of lyrics passages, harmonic and timbre-related changes in a rock song. Front Hum Neurosci. 2014;8. doi:10.3389/fnhum.2014.00798
46. Aleman A, Nieuwenstein MR, Böcker KB, de Haan EH. Music training and mental imagery ability. Neuropsychologia. 2000;38: 1664–1668. doi:10.1016/S0028-3932(00)00079-8
47. Janata P, Paroo K. Acuity of auditory images in pitch and time. Percept Psychophys. 2006;68: 829–844.
48. Lotze M, Scheler G, Tan H-RM, Braun C, Birbaumer N. The musician's brain: functional imaging of amateurs and professionals during performance and imagery. NeuroImage. 2003;20: 1817–1829.
49. Ellis D. Dynamic time warping (DTW) in Matlab [Internet]. 2003. Available: http://www.ee.columbia.edu/~dpwe/resources/matlab/dtw/
50. Lartillot O, Toiviainen P, Eerola T. A Matlab Toolbox for Music Information Retrieval. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R, editors. Data Analysis, Machine Learning and Applications. Berlin, Heidelberg: Springer Berlin Heidelberg; 2008. pp. 261–268. Available: http://link.springer.com/10.1007/978-3-540-78246-9_31
51. Haynes J-D, Rees G. Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nat Neurosci. 2005;8: 686–691. doi:10.1038/nn1445
52. Horikawa T, Tamaki M, Miyawaki Y, Kamitani Y. Neural Decoding of Visual Imagery During Sleep. Science. 2013;340: 639–642. doi:10.1126/science.1234330
53. Reddy L, Tsuchiya N, Serre T. Reading the mind's eye: Decoding category information during mental imagery. NeuroImage. 2010;50: 818–825. doi:10.1016/j.neuroimage.2009.11.084
54. Martin S, Brunner P, Holdgraf C, Heinze H-J, Crone NE, Rieger J, et al. Decoding spectrotemporal features of overt and covert speech from the human cortex. Front Neuroengineering. 2014; doi:10.3389/fneng.2014.00014
55. Leonard MK, Baud MO, Sjerps MJ, Chang EF. Perceptual restoration of masked speech in human cortex. Nat Commun. 2016; doi:10.1038/ncomms13619
56. Holdgraf CR, de Heer W, Pasley B, Rieger J, Crone N, Lin JJ, et al. Rapid tuning shifts in human auditory cortex enhance speech intelligibility. Nat Commun. 2016;7: 13654. doi:10.1038/ncomms13654
57. Kosslyn SM, Thompson WL. Shared mechanisms in visual imagery and visual perception: Insights from cognitive neuroscience. In: Gazzaniga MS, editor. The New Cognitive Neurosciences. 2nd ed. Cambridge, MA: MIT Press; 2000.
58. Kosslyn SM, Thompson WL. When is early visual cortex activated during visual mental imagery? Psychol Bull. 2003;129: 723–746. doi:10.1037/0033-2909.129.5.723
59. Herholz SC, Lappe C, Knief A, Pantev C. Neural basis of music imagery and the effect of musical expertise. Eur J Neurosci. 2008;28: 2352–2360. doi:10.1111/j.1460-9568.2008.06515.x
60. Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. Increased auditory cortical representation in musicians. Nature. 1998;392: 811–814. doi:10.1038/33918
61. Wu MC-K, David SV, Gallant JL. Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci. 2006;29: 477–505. doi:10.1146/annurev.neuro.29.051605.113024
62. Thirion B, Duschenay E, Michel V, Varoquaux G, Grisel O, VanderPlas J, et al. scikit-learn. J Mach Learn Res. 2011;12: 2825–2830.
63. Garofolo JS. TIMIT acoustic-phonetic continuous speech corpus. 1993.
Supporting Information

Figure S1. Anatomical distribution of significant electrodes. Electrodes with significant encoding accuracy overlaid on a cortical surface reconstruction of the participant's cerebrum. To define auditory sensory areas (pink), we built an encoding model on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63]. Electrodes with significant encoding accuracy (p<0.05; FDR correction) are highlighted.

Figure S2. Spectrotemporal receptive fields. STRFs for the perception (top panel) and imagery (bottom panel) models. Grey electrodes were removed from the analysis due to excessive noise.

Figure S3. Neural encoding during passive listening. Standard STRFs for the passive listening model, built on data recorded while the participant listened passively to speech sentences from the TIMIT corpus [63]. Grey electrodes were removed from the analysis due to excessive noise. Encoding accuracy is plotted on the cortical surface reconstruction of the participant's cerebrum (map thresholded at p<0.05; FDR correction).