Supplementary Information
Supplemental Methods
Computational Modeling of choice behavior. Several Reinforcement-Learning (RL) models were fit to the observed choice data: a model-free Single-Update (SU) model, a full Double-Update (DU) model, and a third model that individually weights the degree of double-updating via a parameter (iDU). Each model is described in detail below, together with its equations.
First, the SU algorithm updates the decision value $Q_{a,t}$ of the chosen stimulus via the reward prediction error (RPE) $\delta_{Q_{a,t}}$, which is defined as the difference between the received reward $R_t$ and the anticipated reward for the chosen stimulus, $Q_{a,t}$:
$$\delta_{Q_{a,t}} = R_t - Q_{a,t} \tag{1}$$
The RPE $\delta_{Q_{a,t}}$ is used to update the value of the chosen stimulus trial by trial:
$$Q_{a,t+1} = Q_{a,t} + \alpha\,\delta_{Q_{a,t}} \tag{2}$$
Here, $\alpha$ denotes the learning rate, which weights the influence of the RPE $\delta_{Q_{a,t}}$ on the updated value; $\alpha$ is bounded between 0 and 1. Importantly, this model neglects the anti-correlated task structure: only the value of the chosen stimulus is updated, while the value of the unchosen stimulus, $Q_{ua,t}$, remains unchanged:
$$Q_{ua,t+1} = Q_{ua,t} \tag{3}$$
Second, the DU algorithm updates both the chosen and the unchosen decision value on each trial, thereby taking the anti-correlated structure of the task into account. In our modeling approach, this is captured by additionally updating the value of the unchosen stimulus, treating it as if it had delivered the opposite outcome, $-R_t$. The RPE for the unchosen stimulus in the DU model is:
$$\delta_{Q_{ua,t}} = -R_t - Q_{ua,t} \tag{4}$$
The same learning rate $\alpha$ is used for updating unchosen values; thus, equation (5) gives the update of unchosen decision values the same weight as that of chosen decision values:
$$Q_{ua,t+1} = Q_{ua,t} + \alpha\,\delta_{Q_{ua,t}} \tag{5}$$
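Extending the same idea, the DU step adds equations (4) and (5) so that both values move on every trial. As before, this is a sketch with a hypothetical function name (`du_update`) and an assumed +1/-1 reward coding.

```python
def du_update(q_chosen: float, q_unchosen: float, reward: float,
              alpha: float) -> tuple[float, float]:
    """Double-update (DU) step following eqs. (1), (2), (4) and (5)."""
    delta_chosen = reward - q_chosen        # eq. (1)
    delta_unchosen = -reward - q_unchosen   # eq. (4): unchosen option treated as yielding -R_t
    return (q_chosen + alpha * delta_chosen,      # eq. (2)
            q_unchosen + alpha * delta_unchosen)  # eq. (5): same learning rate alpha


print(du_update(0.0, 0.0, reward=1.0, alpha=0.5))  # (0.5, -0.5)
```

Because both RPEs use the same $\alpha$, values that start as mirror images ($Q_{ua} = -Q_{a}$) stay mirror images after every update.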
Third, the iDU algorithm assumes that the degree to which the alternative choice option is updated differs across individuals. This is captured by the parameter $\kappa$, which weights the learning rate $\alpha$ for the unchosen RPE $\delta_{Q_{ua,t}}$:
$$Q_{ua,t+1} = Q_{ua,t} + \kappa\,\alpha\,\delta_{Q_{ua,t}} \tag{6}$$
Note that the three models described are nested. In the iDU model, the RPE $\delta_{Q_{ua,t}}$ is weighted by the product of the learning rate for the chosen value and the parameter $\kappa$, where $\kappa = 0$ reduces the model to the SU model and $\kappa = 1$ to the DU model. For $\kappa < 1$, this results in a slower effective learning rate ($\kappa\alpha$) for double-updating. In the present task, double-updating depends on inference derived from the feedback actually experienced: updating the unchosen stimulus always relies on learning from the feedback for the chosen stimulus, so it is unlikely to be a process that is independent of updating the chosen stimulus (compare Li and Daw, 2011, for an identical implementation).
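The nesting argument can be checked directly with a small sketch of equation (6): setting $\kappa = 0$ leaves the unchosen value untouched (SU), and $\kappa = 1$ reproduces the DU update. The function name `idu_update` and the +1/-1 reward coding are assumptions made for illustration.

```python
def idu_update(q_chosen: float, q_unchosen: float, reward: float,
               alpha: float, kappa: float) -> tuple[float, float]:
    """iDU step: like DU, but the unchosen update is scaled by kappa (eq. 6)."""
    delta_chosen = reward - q_chosen       # eq. (1)
    delta_unchosen = -reward - q_unchosen  # eq. (4)
    q_chosen_next = q_chosen + alpha * delta_chosen
    # Effective learning rate for the unchosen stimulus is kappa * alpha
    q_unchosen_next = q_unchosen + kappa * alpha * delta_unchosen
    return q_chosen_next, q_unchosen_next


# kappa = 0 reproduces SU, kappa = 1 reproduces DU
for kappa in (0.0, 0.2, 1.0):
    print(kappa, idu_update(0.0, 0.0, reward=1.0, alpha=0.5, kappa=kappa))
# 0.0 (0.5, 0.0)
# 0.2 (0.5, -0.1)
# 1.0 (0.5, -0.5)
```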
Additionally, we included a model with an adaptive learning rate, Sutton-K1 (Sutton, 1992), which has been discussed and used as a non-hierarchical approximation of a dynamic learning rate (Chumbley et al, 2012; Iglesias et al, 2013; Kepecs and Mainen, 2012; Landy et al, 2012; Mathys et al, 2014). By including it, we tested whether a dynamic learning rate captures the observed behavior better than algorithms with a fixed learning rate. In this model, values are also updated via prediction errors as in equations (1) and (2). In contrast, however, the learning rate $\alpha$ is updated dynamically as a function of the change in the prediction errors encountered (Sutton, 1992). The dynamic learning rate is passed through a logistic function so that it remains between 0 and 1:
$$\alpha_t = \frac{1}{1 + \exp(-\iota_t)} \tag{7}$$
$\iota$ is initialized at $\iota = 0$, corresponding to an initial learning rate of .5. The update of $\iota$ for the next trial depends on the change in reward prediction errors:
$$\iota_{t+1} = \iota_t + \mu\,\delta_{Q_{a,t}}\,h_t \tag{8}$$
and
$$h_{t+1} = \left(h_t + \alpha_t\,\delta_{Q_{a,t}}\right) \cdot \max\left(1 - \alpha_t,\ 0\right) \tag{9}$$
$\mu$ in equation (8) is a parameter that controls the dynamic update of the learning rate. $\iota$ acts as a sensitivity parameter of the learning rate that, as a function of $\mu$, controls trial by trial the influence of the RPE from the last trial. Again, note that this model is nested with RL models with a constant learning rate, because setting $\mu = 0$ keeps $\iota$, and hence $\alpha$, constant.
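The following Python sketch traces equations (1), (2) and (7) to (9) for a single value that is updated on every trial. It is an illustration under simplifying assumptions, not the toolbox implementation: the function name `sutton_k1` is hypothetical, rewards are assumed to be numeric outcomes such as +1/-1, and the initial value of $h$ (`h0`) is passed in rather than estimated. Note that equation (8) uses $h_t$ from before the update in equation (9).

```python
import math


def sutton_k1(rewards: list[float], mu: float, q0: float = 0.0,
              iota0: float = 0.0, h0: float = 0.0) -> tuple[list[float], list[float]]:
    """Sketch of Sutton-K1 (eqs. 1, 2 and 7-9) for one tracked value.

    Returns the learning rate used on each trial and the updated value.
    """
    q, iota, h = q0, iota0, h0
    alphas, values = [], []
    for r in rewards:
        alpha = 1.0 / (1.0 + math.exp(-iota))            # eq. (7): alpha stays in (0, 1)
        delta = r - q                                    # eq. (1)
        iota = iota + mu * delta * h                     # eq. (8): uses h_t before updating it
        h = (h + alpha * delta) * max(1.0 - alpha, 0.0)  # eq. (9)
        q = q + alpha * delta                            # eq. (2)
        alphas.append(alpha)
        values.append(q)
    return alphas, values


# With mu = 0 the learning rate stays at its initial value of .5
print(sutton_k1([1.0, 1.0, 1.0], mu=0.0))  # ([0.5, 0.5, 0.5], [0.5, 0.75, 0.875])
```

With $\mu > 0$ and consistently positive RPEs, $h$ becomes positive, so $\iota$ and therefore $\alpha_t$ increase over trials.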
Decision model. For all models, decision values are transformed into action probabilities by a softmax function. This includes a parameter $\beta$, which reflects the stochasticity of the choices and also captures aspects of the exploration–exploitation dimension (Daw, Cohen):
$$p(a) = \frac{\exp(\beta\,Q(a))}{\sum_{a'} \exp(\beta\,Q(a'))} \tag{10}$$
Model Fitting. Models were fitted using the HGF toolbox 2.0 (Mathys et al, 2011; Mathys et al, 2014) as part of TNU Algorithms for Psychiatry-Advancing Science (TAPAS, http://www.translationalneuromodeling.org/tapas/). For the priors on the parameters of the learning algorithms and of the observation model (softmax), please see S-Table 2. A quasi-Newton algorithm was used for optimization. For group-level model selection, the negative variational free energy (an approximation to the log model evidence) of each model and each individual was entered into a random-effects Bayesian Model Selection (BMS) procedure (Stephan et al, 2009). After comparing the parameters of the best-fitting model between groups, we also controlled for the possibility that parameter comparisons are confounded by poor absolute model fit, namely that a model cannot explain the data better than chance. To this end, we examined each individual's negative log-likelihood (based on the probability of the data given the parameters) relative to the number of trials. If this "percentage of explained trials" did not exceed .55, a subject was classified as not fit better than chance. This was the case for only two patients. This control analysis is also important because the stochasticity parameter $\beta$, which describes the steepness of the softmax, reaches levels of randomness when the fit is at chance, which also confounds an interpretation along the exploration–exploitation dimension captured by this parameter. As reported in the main manuscript, between-group findings remained significant when the two patients not fit better than chance were excluded.
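A minimal Python sketch of the softmax in equation (10) and of the per-subject fit check follows. The text does not spell out how the negative log-likelihood was converted into a "percentage of explained trials"; here we assume the geometric-mean probability of the observed choices, exp(-NLL/T), which equals .5 for random choices between two options. The function names are hypothetical.

```python
import math


def softmax_probs(q_values: list[float], beta: float) -> list[float]:
    """Choice probabilities from eq. (10)."""
    m = max(q_values)
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(q_per_trial: list[list[float]], choices: list[int],
                            beta: float) -> float:
    """Sum over trials of -log p(observed choice | values, beta)."""
    return -sum(math.log(softmax_probs(q, beta)[c])
                for q, c in zip(q_per_trial, choices))


def fits_better_than_chance(nll: float, n_trials: int, threshold: float = 0.55) -> bool:
    """Assumed criterion: geometric-mean choice probability exp(-NLL / T) > threshold."""
    return math.exp(-nll / n_trials) > threshold


# Two trials, two options, option 0 chosen on both trials
q_per_trial = [[0.0, 0.0], [0.5, -0.5]]
nll = negative_log_likelihood(q_per_trial, [0, 0], beta=2.0)
print(round(math.exp(-nll / 2), 3), fits_better_than_chance(nll, 2))  # 0.664 True
```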
MRI data acquisition. Functional imaging was conducted on a 3 Tesla Siemens Trio scanner to acquire gradient-echo T2*-weighted echo-planar images with blood-oxygenation-level-dependent contrast (40 slices tilted 20° to the AC-PC line, ascending order, 2.5-mm thickness, 3 × 3 mm² in-plane voxel resolution, 0.5-mm gap, TR = 2.09 s, TE = 22 ms, flip angle α = 90°). To account for individual differences in magnetic field inhomogeneity, we acquired a field distortion map. T1-weighted anatomical images were collected for normalization purposes.
Preprocessing of fMRI data. Data were preprocessed and analyzed using SPM8. Images were corrected for differences in slice acquisition time. Voxel-displacement maps were estimated from the acquired field maps. For motion correction, all images were realigned and additionally corrected for distortion and for the interaction of distortion and motion. Normalization parameters were derived from the segmentation of each individual's T1-weighted structural image (Ashburner and Friston, 2005) and used for spatial normalization of the functional images to Montreal Neurological Institute (MNI) space. Normalized images were spatially smoothed with an isotropic Gaussian kernel of 6 mm full width at half maximum.
Supplemental Results
Switching as a function of task phases. Following a reviewer's suggestion, switching behavior was analyzed as a function of feedback and phase, with group as a between-subjects factor. This revealed a significant main effect of feedback (F=263.81, p<.001), a significant main effect of phase (F=14.64, p<.001), and a significant feedback x phase interaction (F=7.07, p=.001). Descriptively, this interaction reflected switching after losses that became more pronounced from the pre-reversal through the reversal to the post-reversal phase. Regarding group differences, we observed neither a significant feedback x phase x group interaction, nor a phase x group interaction, nor a feedback x group interaction (all Fs ≤ .59, ps ≥ .56). The main effect of group on switching behavior remained significant (F=8.97, p=.005).
Further, we analyzed perseveration after losses as a function of phase and observed a significant main effect of phase (F=13.80, p<.001), but neither a main effect of group (F=.89, p=.35) nor a group x phase interaction (F=.86, p=.44).
Neuropsychological Measurements. In exploratory post-hoc t-tests, we found (uncorrected) group differences in verbal intelligence and visual attention (see Table 1). Next, we tested, in each group separately, for associations between indices from our decision-making task and the neurocognitive measures (see S-Table 1 for the correlation matrix). In BED patients, both verbal intelligence and cognitive speed were correlated with task measures (positively with correct responses and negatively with switching). To test whether these two cognitive measures affected the between-group differences in task performance, we performed analyses of covariance (ANCOVA) on correct choices with verbal intelligence or cognitive speed scores as covariates, respectively. Including verbal intelligence in the model confirmed a significant effect of group (F=5.19, p=.03), as well as a significant group x verbal intelligence interaction (F=4.47, p=.04), while there was no significant main effect of verbal intelligence (F=1.23, p=.27). Repeating this ANCOVA with switching as the dependent variable yielded consistent results: a significant main effect of group (F=11.63, p=.002) and a significant group x verbal intelligence interaction (F=10.25, p=.003), but no significant main effect of verbal intelligence (F=.60, p=.44). Regarding cognitive speed, we repeated the ANCOVAs described above. For correct choices, we observed a significant main effect of group (F=4.22, p=.047) as well as a main effect of cognitive speed; the group x cognitive speed interaction failed to reach significance (F=2.98, p=.09). With switching behavior as the dependent variable, we found significant main effects of group (F=6.92, p=.01) and of cognitive speed per se (F=4.93, p=.03), as well as a significant group x cognitive speed interaction (F=4.90, p=.03). S-Table 1 shows that the interactions between group and verbal IQ or cognitive speed are driven by correlations in the BED group, which were not significant in the control group. As might be expected, some cognitive measures co-vary to some degree with performance in the flexible decision-making task, most prominently in patients, but they do not account for the same variance, as the main effect of group remained significant in each case. As additionally requested by one of our reviewers, we also included the within-subject factor phase in these ANCOVA models, and the results reported above remained significant. Regardless of group, there was a phase x cognitive speed interaction but no phase x verbal IQ interaction, indicating that cognitive speed is positively related to inter-individual differences in performance between the phases, irrespective of between-group differences. This is in line with findings in other tasks probing flexible decision-making (Reiter et al, 2016; Schad et al, 2014).
Smoking. We tested whether smoking status influenced the observed results. Including smoking status as an additional factor in the repeated-measures ANOVA on correct choices (within-subject factor: phase; between-subject factor: group) revealed no evidence for a main effect of smoking status (F=.47, p=.50), no significant smoking status x group interaction (F=1.33, p=.26), no significant phase x smoking status interaction, and no significant three-way phase x group x smoking status interaction (F=.12, p=.89). The main effect of group remained significant (F=6.51, p=.02).
References
Ashburner J, Friston KJ (2005). Unified segmentation. Neuroimage 26(3): 839-851.
Bartra O, McGuire JT, Kable JW (2013). The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76: 412-427.
Chumbley JR, Flandin G, Bach DR, Daunizeau J, Fehr E, Dolan RJ, et al (2012). Learning and generalization under ambiguity: an fMRI study. PLoS Comput Biol 8(1): e1002346.
Iglesias S, Mathys C, Brodersen KH, Kasper L, Piccirelli M, den Ouden HE, et al (2013). Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron 80(2): 519-530.
Kepecs A, Mainen ZF (2012). A computational framework for the study of confidence in humans and animals. Philos Trans R Soc Lond B Biol Sci 367(1594): 1322-1337.
Landy MS, Trommershäuser J, Daw ND (2012). Dynamic estimation of task-relevant variance in movement under risk. J Neurosci 32(37): 12702-12711.
Li J, Daw ND (2011). Signals in human striatum are appropriate for policy update rather than value prediction. J Neurosci 31(14): 5504-5511.
Mathys C, Daunizeau J, Friston KJ, Stephan KE (2011). A Bayesian foundation for individual learning under uncertainty. Front Hum Neurosci 5: 39.
Mathys CD, Lomakina EI, Daunizeau J, Iglesias S, Brodersen KH, Friston KJ, et al (2014). Uncertainty in perception and the Hierarchical Gaussian Filter. Front Hum Neurosci 8: 825.
Reiter AMF, Deserno L, Wilbertz T, Heinze H-J, Schlagenhauf F (2016). Risk factors for addiction and their association with model-based behavioral control. Front Behav Neurosci 10.
Rushworth MF, Noonan MP, Boorman ED, Walton ME, Behrens TE (2011). Frontal cortex and reward-guided learning and decision-making. Neuron 70(6): 1054-1069.
Schad DJ, Junger E, Sebold M, Garbusow M, Bernhardt N, Javadi AH, et al (2014). Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning. Front Psychol 5: 1450.
Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009). Bayesian model selection for group studies. Neuroimage 46(4): 1004-1017.
Sutton RS (1992). Gain adaptation beats least squares? Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems, pp 161-166.
Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, et al (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15(1): 273-289.
Supplemental Tables
S-Table 1. Exploratory correlation matrix between behavioral outcome measures (correct responses, switching behavior) and measures of the neurocognitive test battery. Significant (uncorrected) correlations with task measures were found for cognitive speed and verbal intelligence in the BED group (marked with *).

Neurocognitive measure | Group | Correct responses (%) | Switching to alternative option (%)
Reasoning (Matrices) | HC | r=.038, p=.871 | r=-.098, p=.673
Reasoning (Matrices) | BED | r=.189, p=.400 | r=-.104, p=.646
Working Memory (Backward Digit Span) | HC | r=-.012, p=.960 | r=.172, p=.455
Working Memory (Backward Digit Span) | BED | r=.245, p=.271 | r=-.098, p=.664
Visual Attention (Trail Making A) | HC | r=-.031, p=.892 | r=-.068, p=.771
Visual Attention (Trail Making A) | BED | r=-.197, p=.379 | r=.020, p=.930
Complex Attention / Task Switching (Trail Making B) | HC | r=-.325, p=.150 | r=-.078, p=.736
Complex Attention / Task Switching (Trail Making B) | BED | r=-.051, p=.828 | r=-.016, p=.947
Cognitive Speed (Digit Symbol Substitution Test) | HC | r=.123, p=.596 | r=-.001, p=.996
Cognitive Speed (Digit Symbol Substitution Test) | BED | r=.480*, p=.024 | r=-.602*, p=.003
Verbal Intelligence (German Vocabulary Test) | HC | r=-.176, p=.446 | r=.307, p=.175
Verbal Intelligence (German Vocabulary Test) | BED | r=.469*, p=.032 | r=-.639*, p=.002
S-Table 2. Priors of parameters. The decision parameter β of the observation model was estimated in log-space, as were the parameters μ and h of Sutton-K1; the parameters κ and α of the learning algorithms were estimated in logit-space.

Model | Parameter | Prior mean | Prior variance
Observation model for all learning models
Softmax | β | 1 | 16
Learning models
SU | κ | 0 | 0
SU | α | .55 | 1
SU-WL | κ | 0 | 0
SU-WL | α (win / loss) | .5 / .6 | 1 / 1
DU | κ | 1 | 0
DU | α | .25 | 1
DU-WL | κ | 1 | 0
DU-WL | α (win / loss) | .4 / .1 | 1 / 1
iDU | κ | .1 | 1
iDU | α | .55 | 1
iDU-WL | κ | .1 | 1
iDU-WL | α (win / loss) | .55 / .45 | 1
Sutton-K1 | μ | 1 | 1
Sutton-K1 | h | .005 | 1
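The reparameterization described in the caption of S-Table 2 can be sketched as follows: the optimizer works on unconstrained real numbers, which are mapped to positive values via the exponential (log-space) and to values in (0, 1) via the logistic function (logit-space). This is a generic illustration with hypothetical helper names; it does not reproduce the HGF toolbox's own transformation code.

```python
import math


def to_log_space(x: float) -> float:
    """Positive parameter (e.g. beta) -> unconstrained value."""
    return math.log(x)


def from_log_space(y: float) -> float:
    """Unconstrained value -> positive parameter."""
    return math.exp(y)


def to_logit_space(p: float) -> float:
    """Parameter in (0, 1) (e.g. alpha, kappa) -> unconstrained value."""
    return math.log(p / (1.0 - p))


def from_logit_space(y: float) -> float:
    """Unconstrained value -> parameter in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


# Any real-valued optimizer step maps back into the admissible range
for y in (-5.0, 0.0, 5.0):
    print(from_log_space(y) > 0.0, 0.0 < from_logit_space(y) < 1.0)  # True True
```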
S-Table 3. Model selection: expected posterior probabilities (PP) and exceedance probabilities (XP) for all models. SU = single update, DU = double update, iDU = individually weighted double update; WL indicates that the model had separate learning rates for wins and losses.

Model | All (n=44) PP | All (n=44) XP | HC (n=22) PP | HC (n=22) XP | BED (n=22) PP | BED (n=22) XP
SU | .160 | .054 | .130 | .035 | .175 | .210
SU-WL | .072 | .001 | .073 | .005 | .117 | .058
DU | .052 | <.001 | .060 | .003 | .096 | .030
DU-WL | .150 | .039 | .180 | .113 | .099 | .033
iDU | .271 | .601 | .241 | .321 | .219 | .426
iDU-WL | .231 | .303 | .280 | .522 | .158 | .150
Sutton-K1 | .066 | <.001 | .040 | <.001 | .137 | .094
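The PP and XP values in S-Table 3 come from random-effects BMS (Stephan et al, 2009). For illustration, the variational scheme of that paper can be sketched in pure Python as below, assuming a flat Dirichlet prior (all prior counts equal to 1) and estimating XPs by sampling from the posterior Dirichlet distribution. This is an illustrative re-implementation with hypothetical function names, not the code that produced the table; its input would be the per-subject log model evidences (here, the negative free energies described under Model Fitting).

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma function via recurrence plus asymptotic series (x > 0)."""
    result = 0.0
    while x < 10.0:  # shift x upward: psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def random_effects_bms(log_evidence: list[list[float]], n_samples: int = 20000,
                       tol: float = 1e-6, max_iter: int = 1000,
                       seed: int = 0) -> tuple[list[float], list[float]]:
    """Return (expected posterior probabilities, exceedance probabilities).

    log_evidence[n][k] is the log model evidence of model k for subject n.
    """
    n_models = len(log_evidence[0])
    alpha0 = [1.0] * n_models  # flat Dirichlet prior (assumption)
    alpha = alpha0[:]
    for _ in range(max_iter):
        psi_total = digamma(sum(alpha))
        counts = [0.0] * n_models
        for row in log_evidence:
            log_u = [row[k] + digamma(alpha[k]) - psi_total for k in range(n_models)]
            m = max(log_u)
            u = [math.exp(v - m) for v in log_u]  # subtract max for numerical stability
            z = sum(u)
            for k in range(n_models):
                counts[k] += u[k] / z  # posterior probability that subject n uses model k
        new_alpha = [a0 + c for a0, c in zip(alpha0, counts)]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    total = sum(alpha)
    pp = [a / total for a in alpha]  # expected model frequencies
    # XP: share of Dirichlet draws in which each model has the largest frequency.
    # Normalizing the gamma draws is unnecessary because it does not change the argmax.
    rng = random.Random(seed)
    wins = [0] * n_models
    for _ in range(n_samples):
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        wins[draws.index(max(draws))] += 1
    xp = [w / n_samples for w in wins]
    return pp, xp


# Hypothetical data: 10 subjects, each favoring model 0 by 10 log units
pp, xp = random_effects_bms([[0.0, -10.0]] * 10)
print([round(p, 3) for p in pp], xp)  # pp is about [0.917, 0.083]; xp[0] is close to 1
```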
S-Table 4. Descriptive statistics of the parameters of iDU, the best-fitting model across HC and BED. The index c denotes the learning rate for the chosen stimulus. M = mean, SD = standard deviation.

Parameter | Healthy Controls (n=22) | Binge Eating Patients (n=22)
β | M=6.13, SD=2.67 | M=4.17, SD=2.50
α_c | M=0.54, SD=0.19 | M=0.49, SD=0.24
κ*α_c | M=0.07, SD=0.04 | M=0.07, SD=0.06
S-Table 5. Distribution of inferred parameters for iDU, the best-fitting model across HC and BED.

Parameter | 25th Percentile | 50th Percentile | 75th Percentile
β | 2.78 | 4.58 | 7.73
α_c | .37 | .56 | .67
κ*α_c | .03 | .07 | .09
S-Table 6. Effects of the single-update and double-update RPEs (for clarity, only clusters with k ≥ 10 are listed) and conjoint activation of both.
Region
MNI
coordinates
cluster
size
T
p
(FWE)
6.18
<0.001
5.98
<0.001
28 -4 4
6.58
<0.001
30 -12 4
6.22
0.001
Single-Update RPE
8 12 -10
Caudate
Putamen
14 6 -14
30 6 4
76
101
-12 6 -14
5.65
0.009
6.90
6.63
<0.001
<0.001
Caudate/Amygdala
-26 -4 -18
141
6.07
<0.001
Hippocampus
28 -18 -18
12
5.65
0.009
-18 -6 -18
6.36
0.001
Hippocampus
26 -6 -20
18 -8 -18
24
5.98
0.002
Inferior Frontal Gyrus
-34 36 -16
18
5.58
0.001
5.86
0.004
5.89
0.004
6.20
0.001
5.20
0.042
5.90
0.003
5.41
0.020
6.02
0.002
5.19
0.043
30 36 -14
Inferior Frontal Gyrus
26 30 -18
42
-14 60 -2
Superior Medial Gyrus
-14 66 -6
Middle Orbital Gyrus
-6 50 -14
Posterior Cingulate Cortex
-4 -36 38
24
-6 60 -8
19
6 -48 30
67
6.36
<0.001
Middle Temporal Gyrus
-60 -40 -12
-62 -32 -14
38
5.66
0.012
Inferior Temporal Gyrus
-56 -58 -8
14
5.34
Cerebellum
-40 -70 -42
7.29
0.004
<0.001
-40 -78 -30
6.58
<0.001
6.05
<0.001
6.81
<0.001
6.34
0.001
-28 -86 -28
204
44 -74 -32
Cerebellum
44 -68 -42
101
Double-Update RPE
Caudate
-4 18 -8
13
6.27
<0.001
Angular Gyrus
-40 -68 30
24
5.79
<0.001
Middle Orbital Gyrus
2 50 -12
16
5.60
<0.001
Middle Orbital Gyrus
-8 54 -2
19
5.62
0.009
Conjunction Single-Update and Double-Update RPE
Middle Orbital Gyrus
-6 52 -12
3
5.27
0.033
Inferior Frontal Gyrus
32 36 -12
1
5.79
0.040
S-Table 7. Activation in exploratory vs. exploitative trials (F-contrast)
Main effect: exploitation vs. exploration

Region | MNI coordinates | Cluster size | F | p (FWE-corrected)
anterior Insula/vlPFC | -28 26 -2 | 41 | 59.95 | <0.001
anterior Insula/vlPFC | 32 24 -8 | 17 | 53.34 | <0.001
dmPFC | -4 16 46 | 7 | 41.22 | <0.001
anterior Insula/vlPFC | 44 22 -10 | 5 | 38.87 | <0.001
anterior Insula/vlPFC | -32 20 -8 | 2 | 37.77 | <0.001
Supplemental Figures
S-Figure 1. For the ventromedial prefrontal cortex, the a priori region of interest, an anatomical search volume was defined according to the criteria described in Rushworth et al (2011), comprising the superior medial frontal gyrus and the medial orbitofrontal gyrus based on anatomical labeling (Tzourio-Mazoyer et al, 2002), truncated dorsally at MNI z = +10 (also compare Bartra et al, 2013). Blue numbers denote axial slices, and the region of interest is depicted in red (21,760 mm³, corresponding to 2,720 voxels at an isotropic voxel size of 2 mm).
S-Figure 2. Correct choices as a function of phase and group.