The algorithmic neuroanatomy of action-outcome learning

Supplemental Material

We performed simulations with generative versions of each model (prediction-error, MBRL, and Kalman algorithm) to determine whether each model could independently reproduce the selective effect of outcome-specific contingency degradation on action selection.

Simulations

Each simulation ran for 120 time points to mimic the 120 s block duration in the task. Two outcomes (O1, O2) were generated with equal conditional probabilities by each action, P(O1|A1) = 0.2 and P(O2|A2) = 0.2, and one outcome (O1) was generated in the absence of both actions, P(O1|~A1,~A2) = 0.2. In this manner we arranged ∆PA1O1 = 0.2 and ∆PA2O2 = 0, and ensured that the contingent action would not result in more outcomes overall, mimicking the experimental contingencies used in the task. Each model could generate two actions (A1, A2) as well as a third action (waiting), mirroring the option to wait in the experiment. We repeated each simulation 100 times and present the results in Supplementary Figure 2. The colorbars in this figure indicate the difference in learned action values (contingent − degraded) and the difference in predicted performance (contingent − degraded) across parameter space. The results demonstrate that both the prediction-error model and the Kalman algorithm discriminated the causal action, whereas the MBRL model did not. The learned values of the causal action converged to ∆P = 0.2 in the prediction-error model and the Kalman algorithm (bright yellow in Figures S2A and S2B), and the probability of selecting the causal action increased correspondingly (light blue to bright yellow). By contrast, the action values of the MBRL model, which were based on the covariance between each action and outcome, tended to converge to the positive contingencies, P(O1|A1) and P(O2|A2).
Furthermore, the value of waiting competed equally with both actions in this model, so the model did not select the most causal action. These simulation results did not depend substantially on the learning-rate parameters chosen; the differences appeared across a wide range of parameters (Supplementary Figure 2).

Model fitting

Causal Induction. To compare our model-fitting results against computational models of causal induction, we adapted the Causal Support model described by Griffiths & Tenenbaum (2005) and provided as a Matlab function online (http://web.mit.edu/cocosci/Papers/support.zip). While this model is usually applied to summary data, we adapted it to our time-series analysis by keeping a running tally of the observations required to assess contingencies and applying the model after each accumulation. This model compared a causal structure (Graph 0), in which the outcome is independent of the action (and so caused solely by the background), with an alternative structure (Graph 1), in which both the action and the background cause the outcome. The evidence for each action ("causal support") is therefore taken as the log likelihood ratio of P(D|Graph 1) over P(D|Graph 0). Because the solution to the Bayesian integral is intractable, the P(D|Graph 1) and P(D|Graph 0) terms were approximated by Monte Carlo simulation using 10,000 samples in the supportsampler Matlab script. The results are presented in Supplementary Table 1 for comparison with the null model (see main text).

fMRI. For the purpose of generating model-predicted time series for fMRI regression analysis, ∆AO and ∆XO values (taken from ∆µ) and ∆C values were generated by the Kalman algorithm using parameters (v, k and 𝝉) set to the maximum-likelihood estimate over the whole group (Daw, 2011).
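The Monte Carlo approximation of causal support described under Causal Induction above can be illustrated as follows. This is a minimal Python sketch of the same idea, not the supportsampler Matlab script itself; the noisy-OR parameterization with uniform priors on the causal strengths follows Griffiths & Tenenbaum (2005), and the function name and count arguments are ours.

```python
import math
import random

def causal_support(n_out_act, n_act, n_out_bg, n_bg, n_samples=10_000, seed=1):
    """Monte Carlo causal support: log P(D | Graph 1) - log P(D | Graph 0).

    Graph 0: the outcome is caused by the background alone (strength w0).
    Graph 1: the action and the background both cause the outcome,
             combined noisy-OR: P(outcome | action) = 1 - (1 - w0)(1 - w1).
    Marginal likelihoods are approximated by averaging the Bernoulli
    likelihood of the counts over uniform samples of the strengths.

    n_out_act / n_act : outcomes on action-present observations
    n_out_bg  / n_bg  : outcomes on action-absent observations
    """
    rng = random.Random(seed)

    def lik(p_act, p_bg):
        return (p_act ** n_out_act * (1 - p_act) ** (n_act - n_out_act)
                * p_bg ** n_out_bg * (1 - p_bg) ** (n_bg - n_out_bg))

    m1 = m0 = 0.0
    for _ in range(n_samples):
        w0, w1 = rng.random(), rng.random()
        m1 += lik(1 - (1 - w0) * (1 - w1), w0)  # Graph 1: action strength w1
        m0 += lik(w0, w0)                       # Graph 0: action is inert
    return math.log(m1 / n_samples) - math.log(m0 / n_samples)
```

Positive values favor Graph 1 (the action causes the outcome); applied after each accumulation of the running tally, this yields a trial-by-trial causal-support time series of the kind described above.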
The group estimates were v = 8.6 and 𝝉 = 11.74. Thirty simulations (Supplementary Table 2), representing the sample size N, confirmed that these parameters could be accurately recovered from choice data, with mean (SEM) estimates of 8.48 (1.19) and 11.76 (0.74) for v and 𝝉, respectively.

Sample size estimate

We conducted an fMRI-based power analysis (Mumford & Nichols, 2008) of the mPFC effects in Experiment 1 to determine a sample size sufficient to detect the same effects in a new sample. We selected the smallest mPFC effect reported in Experiment 1: the ∆AO response in BA9 (Z = 4.71). The power analysis was conducted on the BA9 fROI, with a p-value threshold of 0.05 for a one-sided hypothesis test. The analysis revealed that N > 20 would provide more than 95% power (Supplementary Figure 3).

References

Daw, N. D. (2011). Trial-by-trial data analysis using computational models. In M. R. Delgado, E. A. Phelps, & T. W. Robbins (Eds.), Decision making, affect, and learning: Attention and performance XXIII (Vol. 23, pp. 3-38). New York: Oxford University Press.

Mumford, J. A., & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. NeuroImage, 39(1), 261-268. doi:10.1016/j.neuroimage.2007.07.061

Supplementary Table 1. Model evidence and comparison scores for causal induction

Model            | Free parameters | Negative log likelihood | LRT 𝛸²(1) | LRT p   | Pseudo-R² | Relative Bayes Factor (H1 − H0) | No. favoring H1/H0
Causal induction | k and 𝝉         | 16,885                  | 4,040     | < 1E−30 | 0.11      | −1981                           | 11/19
Null model (H0)  | 𝝉               | 15,992                  | 5,826     | < 1E−30 | 0.13      |                                 |

Legend: Aggregate negative log likelihood scores (model evidence), each model's difference from chance (likelihood ratio test, LRT), model fit (pseudo-R²), and Bayesian model comparisons against the null model (H0). k is the AO delay temporal threshold, and 𝝉 is the inverse temperature (exploitation/exploration).

Supplementary Table 2.
Recovered Kalman parameters from 30 simulations

Simulation | v     | 𝝉
1          | 17.67 | 23.21
2          | 11.45 | 14.10
3          | 15.41 | 12.80
4          | 6.90  | 12.58
5          | 12.08 | 13.56
6          | 3.58  | 13.37
7          | 2.89  | 6.59
8          | 9.09  | 12.35
9          | 2.19  | 5.74
10         | 0.01  | 1.57
11         | 6.45  | 10.88
12         | 14.68 | 15.00
13         | 7.82  | 13.23
14         | 10.18 | 11.91
15         | 1.69  | 8.89
16         | 6.16  | 13.48
17         | 3.27  | 10.78
18         | 9.65  | 12.53
19         | 3.40  | 10.90
20         | 6.27  | 8.38
21         | 0.00  | 4.20
22         | 7.14  | 12.89
23         | 25.00 | 17.98
24         | 1.56  | 11.82
25         | 15.37 | 14.48
26         | 25.00 | 9.62
27         | 8.22  | 11.41
28         | 2.27  | 9.76
29         | 7.84  | 14.58
30         | 11.24 | 14.24
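As a sanity check, the mean (SEM) recovery statistics reported for v can be reproduced from the table above. A minimal Python sketch (the helper name mean_sem is ours; the values are transcribed from Supplementary Table 2):

```python
import math

def mean_sem(xs):
    """Mean and standard error of the mean across simulations."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, sd / math.sqrt(n)

# Recovered v values from the 30 simulations in Supplementary Table 2
v = [17.67, 11.45, 15.41, 6.90, 12.08, 3.58, 2.89, 9.09, 2.19, 0.01,
     6.45, 14.68, 7.82, 10.18, 1.69, 6.16, 3.27, 9.65, 3.40, 6.27,
     0.00, 7.14, 25.00, 1.56, 15.37, 25.00, 8.22, 2.27, 7.84, 11.24]

mean_v, sem_v = mean_sem(v)  # ≈ 8.48 and ≈ 1.19, matching the reported values
```

The same computation over the 𝝉 column yields the reported 11.76 (0.74).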