Supplementary Information

Supplemental Methods

Computational modeling of choice behavior. Different reinforcement-learning (RL) models were fit to the observed choice data: a model-free Single-Update (SU-) model, a full Double-Update (DU-) model, and a third model that individually weights the degree of double-updating via a parameter (iDU). In the following, we describe each model in detail, accompanied by equations.

First, the SU-algorithm updates a decision value $Q_{a,t}$ for the chosen stimulus via the reward prediction error (RPE) $\delta_{Q_a,t}$, which is defined as the difference between the received reward $R_t$ and the anticipated reward for the chosen stimulus $Q_{a,t}$:

(1) $\delta_{Q_a,t} = R_t - Q_{a,t}$

The RPE $\delta_{Q_a,t}$ is used to update the decision value of the chosen stimulus trial by trial:

(2) $Q_{a,t+1} = Q_{a,t} + \alpha \delta_{Q_a,t}$

Here, $\alpha$ denotes the learning rate, which weights the influence of the RPE $\delta_{Q_a,t}$ on the updated value. $\alpha$ is naturally bounded between 0 and 1. Importantly, this model neglects the anti-correlated task structure by only updating the decision value of the chosen stimulus, while the value of the unchosen stimulus $Q_{ua,t}$ remains unchanged:

(3) $Q_{ua,t+1} = Q_{ua,t}$

Second, the DU-algorithm updates chosen and unchosen decision values in each trial. This takes the anti-correlated structure of the task into account. In our modeling approach, this is captured by additionally updating the unchosen decision value. The RPE for the unchosen stimulus in the DU-model is:

(4) $\delta_{Q_{ua},t} = -R_t - Q_{ua,t}$

The same learning rate $\alpha$ is used for updating unchosen values; thus, equation (5) gives the same weight to the update of unchosen decision values as to that of chosen decision values:

(5) $Q_{ua,t+1} = Q_{ua,t} + \alpha \delta_{Q_{ua},t}$

Third, the iDU-algorithm assumes that the degree of updating the alternative choice option differs across individuals.
This is captured by the parameter $\kappa$, which weights the learning rate $\alpha$ for the unchosen RPE $\delta_{Q_{ua},t}$:

(6) $Q_{ua,t+1} = Q_{ua,t} + \kappa \alpha \delta_{Q_{ua},t}$

Please note that the three models described are nested. In the iDU-model, the RPE $\delta_{Q_{ua},t}$ is weighted by the product of the learning rate for the chosen value and the parameter $\kappa$, where $\kappa = 0$ reduces to the SU-model and $\kappa = 1$ to the DU-model. This results in slower learning rates for double-updating. For the task at hand, double-updating depends on inference derived from feedback actually experienced: updating the unchosen stimulus always relies on learning from feedback for the chosen stimulus, so it is rather unlikely to be a process that is independent of updating the chosen stimulus (compare Li and Daw, 2011 for the identical implementation).

Additionally, we included a model with an adaptive learning rate, Sutton-K1 (Sutton, 1992), which has been discussed and used as a non-hierarchical approximation of a dynamic learning rate (Chumbley et al, 2012; Iglesias et al, 2013; Kepecs and Mainen, 2012; Landy et al, 2012; Mathys et al, 2014). By including it, we tested whether a dynamic learning rate captures the observed behavior generally better than algorithms with a fixed learning rate. In this model, values are also updated via prediction errors as in equations (1) and (2). In contrast to the models above, the learning rate $\alpha$ is dynamically updated as a function of the change in prediction errors encountered (Sutton, 1992). The dynamic learning rate is transformed with a logistic function to remain bounded between 0 and 1:

(7) $\alpha_t = \frac{1}{1 + \exp(-\kappa_t)}$

This is initialized with $\kappa = 0$, corresponding to an initial learning rate of .5.
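As a minimal sketch, the update rules in equations (1) to (6) and the logistic transform in (7) can be written as follows. The function and argument names are illustrative and not taken from the original analysis code, and the example assumes that rewards are coded as +1 (win) and -1 (loss), in line with the sign flip in equation (4).

```python
import math


def idu_update(q_chosen, q_unchosen, reward, alpha, kappa):
    """Return updated (chosen, unchosen) values after one trial.

    kappa = 0 gives the SU-model, kappa = 1 the DU-model, and values in
    between the iDU-model. All names here are illustrative assumptions.
    """
    delta_chosen = reward - q_chosen        # equation (1)
    delta_unchosen = -reward - q_unchosen   # equation (4)
    q_chosen_next = q_chosen + alpha * delta_chosen  # equation (2)
    # Equation (6); covers (3) for kappa = 0 and (5) for kappa = 1.
    q_unchosen_next = q_unchosen + kappa * alpha * delta_unchosen
    return q_chosen_next, q_unchosen_next


def logistic_learning_rate(x):
    """Map an unbounded value to a learning rate between 0 and 1, equation (7)."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed example: a win (+1), both values starting at 0, alpha = 0.5.
for kappa in (0.0, 0.2, 1.0):
    print(kappa, idu_update(0.0, 0.0, 1.0, alpha=0.5, kappa=kappa))

# An initial value of 0 corresponds to a learning rate of 0.5.
print(logistic_learning_rate(0.0))
```

In this example, kappa = 0 leaves the unchosen value unchanged (single update), whereas kappa = 1 moves it by the same amount as the chosen value but in the opposite direction (double update).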
The update of $\kappa$ for the next trial depends on the change in reward prediction errors, where

(8) $\kappa_{t+1} = \kappa_t + \mu \delta_{Q_a,t} h_t$

and

(9) $h_{t+1} = (h_t + \alpha_t \delta_{Q_a,t}) \cdot \max(1 - \alpha_t, 0)$

The parameter $\mu$ in (8) controls the dynamic update of the learning rate: it is a sensitivity parameter that determines, on a trial-by-trial basis, how strongly the RPE from the last trial influences the learning rate. Again, note that this model is nested with RL-models with a constant learning rate because setting $\mu = 0$ keeps $\alpha$ constant.

Decision model. For all models, decision values are transformed into action probabilities by applying a softmax equation. This includes a parameter $\beta$, which reflects the stochasticity of the choices and also captures aspects of the exploration-exploitation dimension (Daw, Cohen).

(10) $p(a) = \frac{\exp(\beta Q(a))}{\sum_{a'} \exp(\beta Q(a'))}$

Model fitting. Models were fitted using the HGF toolbox 2.0 (Mathys et al, 2011; Mathys et al, 2014) as part of TNU Algorithms for Psychiatry-Advancing Science (TAPAS, http://www.translationalneuromodeling.org/tapas/). For priors on parameters of the learning algorithms and the observation model (softmax), please see S-Table 2. For optimization, a quasi-Newton optimization algorithm was applied. For group-level model selection, the negative variational free energy (as an approximation to the log model evidence) for each model and each individual was subjected to a random-effects Bayesian Model Selection (BMS) procedure (Stephan et al, 2009). After comparing best-fitting model parameters between groups, we also controlled for the possibility that parameter comparisons can be confounded by poor absolute model fit, namely that a model cannot explain the data better than chance. This was done by looking at each individual's negative log-likelihood (derived from the probability of the data given the parameters) relative to the number of trials.
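The dynamic learning rate in equations (7) to (9), the softmax rule, and the chance-level check can be sketched as follows. This is an illustration under stated assumptions rather than the TAPAS implementation: all names are hypothetical, the value update is assumed to use the current learning rate $\alpha_t$, and the fit measure is assumed to be exp(-NLL / number of trials), that is, the geometric mean of the predicted probabilities of the observed choices, which equals .5 for a two-option model at chance.

```python
import math


def sutton_k1_step(q_chosen, logit_alpha, h, reward, mu):
    """One trial of the adaptive learning rate model, equations (7) to (9)."""
    alpha = 1.0 / (1.0 + math.exp(-logit_alpha))           # equation (7)
    delta = reward - q_chosen                              # equation (1)
    q_next = q_chosen + alpha * delta                      # equation (2), assumed to use alpha_t
    logit_next = logit_alpha + mu * delta * h              # equation (8)
    h_next = (h + alpha * delta) * max(1.0 - alpha, 0.0)   # equation (9)
    return q_next, logit_next, h_next


def softmax_probabilities(q_values, beta):
    """Softmax over decision values; beta controls choice stochasticity."""
    scaled = [beta * q for q in q_values]
    offset = max(scaled)  # subtracting the maximum avoids overflow, result unchanged
    weights = [math.exp(s - offset) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def explained_fraction(observed_choice_probs):
    """Assumed fit measure: exp(-NLL / n), the geometric mean choice probability."""
    nll = -sum(math.log(p) for p in observed_choice_probs)
    return math.exp(-nll / len(observed_choice_probs))


# Assumed example values, chosen only for illustration.
print(sutton_k1_step(q_chosen=0.0, logit_alpha=0.0, h=0.005, reward=1.0, mu=1.0))
print(softmax_probabilities([0.4, -0.4], beta=4.0))
print(explained_fraction([0.5, 0.5, 0.5]))
```

Setting mu to 0 in this sketch leaves the logit-space learning rate untouched, which mirrors the nesting with constant learning rate models noted above.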
If this "percentage of explained trials" did not exceed .55, a subject was classified as not fit better than chance. This was the case for two patients only. This control analysis is also important because the stochasticity parameter $\beta$, which describes the steepness of the softmax, approaches values indicating random choice when the fit is no better than chance, which also confounds an interpretation along the exploration-exploitation dimension captured by this parameter. As reported in the main manuscript, between-group findings remained significant when excluding the two patients not fit better than chance.

MRI data acquisition. Functional imaging was conducted on a 3 Tesla Siemens Trio scanner to acquire gradient-echo T2*-weighted echo-planar images with blood oxygenation level dependent contrast (40 slices at 20° to the AC-PC line, ascending order, 2.5-mm thickness, 3x3 mm² in-plane voxel resolution, 0.5-mm gap, TR=2.09s, TE=22ms, flip angle α=90°). To account for individual differences in the homogeneity of the magnetic field, we acquired a field distortion map. T1-weighted anatomical images were collected for normalization purposes.

Preprocessing of fMRI data. Data were preprocessed and analyzed using SPM8. Images were corrected for differences in slice acquisition timing. Voxel-displacement maps were estimated based on the acquired field maps. For motion correction, all images were realigned and additionally corrected for distortion and the interaction of distortion and motion. Normalization parameters were derived from the segmentation of the individual T1-weighted structural image (Ashburner and Friston, 2005) and used for spatial normalization of the functional images to Montreal Neurological Institute space. Normalized images were spatially smoothed (isotropic Gaussian kernel, 6 mm full-width at half maximum).

Supplemental Results

Switching as a function of task phases.
Following a reviewer's suggestion, switching behavior was analyzed as a function of feedback and phase, with group as between-subjects factor. This revealed a significant main effect of feedback (F=263.81, p<.001), a significant main effect of phase (F=14.64, p<.001) and a significant feedback x phase interaction (F=7.07, p=.001). Descriptively, this interaction was due to more pronounced switching after losses from the pre-reversal over the reversal to the post-reversal phase. Regarding group differences, we observed neither a significant feedback x phase x group interaction, nor a phase x group interaction, nor a feedback x group interaction (all Fs<=.59, p>=.56). The main effect of group on switching behavior remained significant (F=8.97, p=.005).

Further, we analyzed perseveration in the context of loss as a function of phase and observed a significant main effect of phase (F=13.80, p<.001), but neither a main effect of group (F=.89, p=.35) nor a group x phase interaction (F=.86, p=.44).

Neuropsychological measurements. In exploratory post-hoc t-tests, we found uncorrected group differences in verbal intelligence as well as visual attention (see Table 1). Next, we tested, in both groups separately, for associations of indices from our decision-making task with the neurocognitive measures. See S-Table 1 for the correlation matrix. We observed that both verbal intelligence and cognitive speed were correlated with task measures in BED patients (positively with correct responses, negatively with switching).
Further, to test for any effect of these two cognitive measures on between-group differences in task performance, we performed analyses of covariance (ANCOVA) on correct choices including verbal intelligence or cognitive speed scores as covariates, respectively. Including verbal intelligence in the model confirmed a significant effect of group (F=5.19, p=.03) and revealed a significant interaction of group and verbal intelligence (F=4.47, p=.04), while there was no significant main effect of verbal intelligence (F=1.23, p=.27). Repeating this ANCOVA with switching as dependent variable yielded consistent results, showing a significant main effect of group (F=11.63, p=.002) and a significant group x verbal intelligence interaction (F=10.25, p=.003) but no significant main effect of verbal intelligence (F=.60, p=.44). Regarding effects of cognitive speed, we repeated the ANCOVAs described above. We observed a significant main effect of group on correct choices (F=4.22, p=.047) as well as a main effect of cognitive speed. The interaction between group and cognitive speed failed to reach significance (F=2.98, p=.09). With switching behavior as dependent variable, a significant main effect of group (F=6.92, p=.01) and of cognitive speed per se (F=4.93, p=.03) as well as a significant interaction of group and cognitive speed (F=4.90, p=.03) were found. S-Table 1 shows that the interaction effects between group and verbal IQ or cognitive speed are driven by correlations in the BED group that were not significant in the control group. As one might have expected, some cognitive measures co-vary to some degree with performance in the flexible decision-making task, most prominently in patients, but they do not account for the same variance as group, because the main effect of group remained significant in each of the cases.
As additionally requested by one of our reviewers, we also included the within-subject factor phase in these ANCOVA models, and the results reported above remained significant. Regardless of group, there was a phase x cognitive speed interaction but no phase x verbal IQ interaction, indicating that cognitive speed is positively correlated with inter-individual performance differences between the phases irrespective of between-group differences. This is in line with findings in other tasks probing flexible decision-making (Reiter et al, 2016; Schad et al, 2014).

Smoking. We tested whether smoking status influenced the observed results. Including the additional factor smoking status in the repeated-measures ANOVA on correct choices (within-subject factor: phase; between-subject factor: group) revealed no evidence for a main effect of smoking status (F=.47, p=.50), no significant smoking status x group interaction (F=1.33, p=.26), no significant phase x smoking status interaction, and no significant three-way interaction of phase x group x smoking status (F=.12, p=.89). The main effect of group remained significant (F=6.51, p=.02).

References

Ashburner J, Friston KJ (2005). Unified segmentation. Neuroimage 26(3): 839-851.

Bartra O, McGuire JT, Kable JW (2013). The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76: 412-427.

Chumbley JR, Flandin G, Bach DR, Daunizeau J, Fehr E, Dolan RJ, et al (2012). Learning and generalization under ambiguity: an fMRI study. PLoS Comput Biol 8(1): e1002346.

Iglesias S, Mathys C, Brodersen KH, Kasper L, Piccirelli M, den Ouden HE, et al (2013). Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron 80(2): 519-530.

Kepecs A, Mainen ZF (2012). A computational framework for the study of confidence in humans and animals. Philos Trans R Soc Lond B Biol Sci 367(1594): 1322-1337.

Landy MS, Trommershauser J, Daw ND (2012). Dynamic estimation of task-relevant variance in movement under risk. J Neurosci 32(37): 12702-12711.

Li J, Daw ND (2011). Signals in human striatum are appropriate for policy update rather than value prediction. J Neurosci 31(14): 5504-5511.

Mathys C, Daunizeau J, Friston KJ, Stephan KE (2011). A Bayesian foundation for individual learning under uncertainty. Front Hum Neurosci 5: 39.

Mathys CD, Lomakina EI, Daunizeau J, Iglesias S, Brodersen KH, Friston KJ, et al (2014). Uncertainty in perception and the Hierarchical Gaussian Filter. Front Hum Neurosci 8: 825.

Reiter AMF, Deserno L, Wilbertz T, Heinze H-J, Schlagenhauf F (2016). Risk factors for addiction and their association with model-based behavioral control. Front Behav Neurosci 10.

Rushworth MF, Noonan MP, Boorman ED, Walton ME, Behrens TE (2011). Frontal cortex and reward-guided learning and decision-making. Neuron 70(6): 1054-1069.

Schad DJ, Junger E, Sebold M, Garbusow M, Bernhardt N, Javadi AH, et al (2014). Processing speed enhances model-based over model-free reinforcement learning in the presence of high working memory functioning. Front Psychol 5: 1450.

Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009). Bayesian model selection for group studies. Neuroimage 46(4): 1004-1017.

Sutton RS (1992). Gain adaptation beats least squares? Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems, pp 161-166.

Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, et al (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15(1): 273-289.

Supplemental Tables

S-Table 1. Exploratory correlation matrix between behavioral outcome measures (correct responses, switching behavior) and measures of the neurocognitive test battery. Significant (uncorrected) correlations with task measures were found for cognitive speed and verbal intelligence in the BED group (marked with *).

Measure                                               Group   Correct responses (%)   Switching to alternative option (%)
Reasoning (Matrices)                                  HC      r=.038, p=.871          r=-.098, p=.673
                                                      BED     r=.189, p=.400          r=-.104, p=.646
Working Memory (Backward Digit Span)                  HC      r=-.012, p=.960         r=.172, p=.455
                                                      BED     r=.245, p=.271          r=-.098, p=.664
Visual Attention (Trail Making A)                     HC      r=-.031, p=.892         r=-.068, p=.771
                                                      BED     r=-.197, p=.379         r=.020, p=.930
Complex Attention / Task Switching (Trail Making B)   HC      r=-.325, p=.150         r=-.078, p=.736
                                                      BED     r=-.051, p=.828         r=-.016, p=.947
Cognitive Speed (Digit Symbol Substitution Test)      HC      r=.123, p=.596          r=-.001, p=.996
                                                      BED     r=.480*, p=.024         r=-.602*, p=.003
Verbal Intelligence (German Vocabulary Test)          HC      r=-.176, p=.446         r=.307, p=.175
                                                      BED     r=.469*, p=.032         r=-.639*, p=.002

S-Table 2. Priors of parameters. The decision parameter β of the observation model as well as the parameters μ and h of Sutton-K1 were estimated in log-space; the parameters α and κ of the learning algorithms were estimated in logit-space.

Model       Parameter   Prior Mean   Prior Variance
Observation model for all learning models
Softmax     β           1            16
Learning models
SU          κ           0            0
            α           .55          1
SU-WL       κ           0            0
            α (w/l)     .5 / .6      1 / 1
DU          κ           1            0
            α           .25          1
DU-WL       κ           1            0
            α (w/l)     .4 / .1      1 / 1
iDU         κ           .1           1
            α           .55          1
iDU-WL      κ           .1           1
            α (w/l)     .55 / .45    1
Sutton-K1   μ           1            1
            h           .005         1

S-Table 3. Model Selection: Expected Posterior Probabilities (PP) and Exceedance Probabilities (XP) for all models.
SU=single update, DU=double update, iDU=individually weighted double update; WL indicates that the model had separate learning rates for wins and losses.

             All (n=44)       HC (n=22)        BED (n=22)
Model        PP      XP       PP      XP       PP      XP
SU           .160    .054     .130    .035     .175    .210
SU-WL        .072    .001     .073    .005     .117    .058
DU           .052    <.001    .060    .003     .096    .030
DU-WL        .150    .039     .180    .113     .099    .033
iDU          .271    .601     .241    .321     .219    .426
iDU-WL       .231    .303     .280    .522     .158    .150
Sutton-K1    .066    <.001    .040    <.001    .137    .094

S-Table 4. Descriptive statistics of parameters for iDU, the best-fitting model across HC and BED. The index c indicates the learning rate for the chosen stimulus. M=mean, SD=standard deviation.

Parameter   Healthy Controls (n=22)   Binge Eating Patients (n=22)
β           M=6.13, SD=2.67           M=4.17, SD=2.50
αc          M=0.54, SD=0.19           M=0.49, SD=0.24
κ*αc        M=0.07, SD=0.04           M=0.07, SD=0.06

S-Table 5. Distribution of inferred parameters for iDU, the best-fitting model across HC and BED.

                  β       αc     κ*αc
25th Percentile   2.78    .37    .03
50th Percentile   4.58    .56    .07
75th Percentile   7.73    .67    .09

S-Table 6. Effects of Single-Update RPE, Double-Update RPE (only clusters k≥10 are listed for clarity) and conjoint activation of both.
Region   MNI coordinates   cluster size   T   p (FWE)

6.18 <0.001 5.98 <0.001 28 -4 4 6.58 <0.001 30 -12 4 6.22 0.001 Single-Update RPE 8 12 -10 Caudate Putamen 14 6 -14 30 6 4 76 101 -12 6 -14 5.65 0.009 6.90 6.63 <0.001 <0.001 Caudate/Amygdala -26 -4 -18 141 6.07 <0.001 Hippocampus 28 -18 -18 12 5.65 0.009 -18 -6 -18 6.36 0.001 Hippocampus 26 -6 -20 18 -8 -18 24 5.98 0.002 Inferior Frontal Gyrus -34 36 -16 18 5.58 0.001 5.86 0.004 5.89 0.004 6.20 0.001 5.20 0.042 5.90 0.003 5.41 0.020 6.02 0.002 5.19 0.043 30 36 -14 Inferior Frontal Gyrus 26 30 -18 42 -14 60 -2 Superior Medial Gyrus -14 66 -6 Middle Orbital Gyrus -6 50 -14 Posterior Cingulate Cortex -4 -36 38 24 -6 60 -8 19 6 -48 30 67 6.36 <0.001 Middle Temporal Gyrus -60 -40 -12 -62 -32 -14 38 5.66 0.012 Inferior Temporal Gyrus -56 -58 -8 14 5.34 Cerebellum -40 -70 -42 7.29 0.004 <0.001 -40 -78 -30 6.58 <0.001 6.05 <0.001 6.81 <0.001 6.34 0.001 -28 -86 -28 204 44 -74 -32 Cerebellum 44 -68 -42 101

Double-Update RPE
Region                   MNI coordinates   cluster size   T      p (FWE)
Caudate                  -4 18 -8          13             6.27   <0.001
Angular Gyrus            -40 -68 30        24             5.79   <0.001
Middle Orbital Gyrus     2 50 -12          16             5.60   <0.001
Middle Orbital Gyrus     -8 54 -2          19             5.62   0.009

Conjunction Single-Update and Double-Update RPE
Region                   MNI coordinates   cluster size   T      p (FWE)
Middle Orbital Gyrus     -6 52 -12         3              5.27   0.033
Inferior Frontal Gyrus   32 36 -12         1              5.79   0.040

S-Table 7. Activation in Exploratory vs. Exploitative Trials (F-contrast).

Main Effect: Exploitation vs. Exploration
Region                  MNI coordinates   Cluster size   F       p (FWE-corr)
anterior Insula/vlPFC   -28 26 -2         41             59.95   <0.001
anterior Insula/vlPFC   32 24 -8          17             53.34   <0.001
dmPFC                   -4 16 46          7              41.22   <0.001
anterior Insula/vlPFC   44 22 -10         5              38.87   <0.001
anterior Insula/vlPFC   -32 20 -8         2              37.77   <0.001

Supplemental Figures

S-Figure 1. For the ventro-medial prefrontal cortex, the a priori region of interest, an anatomical search volume was defined according to criteria described in Rushworth et al.
(2011), comprising the superior medial frontal gyrus and the medial orbitofrontal gyrus, based on anatomical labeling (Tzourio-Mazoyer et al, 2002) and truncated dorsally at MNI z=+10 (also compare Bartra et al, 2013). Blue numbers denote axial slices and the region of interest is depicted in red (21760 mm³, corresponding to 2720 voxels based on an isotropic voxel size of 2 mm).

S-Figure 2. Correct choices as a function of phase and group.