The algorithmic neuroanatomy of action-outcome learning
Supplemental Material
We performed simulations with generative versions of each model (the prediction-error model,
model-based reinforcement learning (MBRL) and the Kalman algorithm) to determine whether each
model could independently reproduce the selective effect of outcome-specific contingency
degradation on action selection.
Simulations
Each simulation occurred across 120 time points to mimic the 120 s block duration in the task. Two
outcomes (O1, O2) were generated with equal conditional probabilities by their respective actions,
P(O1|A1) = 0.2 and P(O2|A2) = 0.2, and the second outcome (O2) was also generated in the absence
of both actions, P(O2|~A1,~A2) = 0.2. In this manner we arranged ∆P(A1,O1) = 0.2 and
∆P(A2,O2) = 0, and ensured that the
contingent action would not result in more outcomes, which mimics the experimental contingencies
used in the task. Each model was able to generate two actions (A1,A2) as well as a third action
(waiting) in order to mimic the option of waiting in the experiment. We repeated each simulation
100 times and present the results in Supplementary Figure 2. The colorbars in this figure indicate the
difference in learned action values (contingent – degraded), and the difference in predicted
performance (contingent – degraded), across parameter space.
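These generative contingencies can be expressed in a few lines of code. The following is a minimal sketch in Python of one simulated block, assuming free O2 deliveries occur only on time points when neither action is made; all names (simulate_block, P_O2_FREE, and so on) are ours, not the authors' simulation code.

```python
import numpy as np

# Minimal sketch of one simulated block of the degradation environment.
# Assumed reconstruction for illustration; not the authors' code.
RNG = np.random.default_rng(0)
T = 120            # 120 one-second time points per block
P_O1_A1 = 0.2      # P(O1|A1): intact (contingent) pair
P_O2_A2 = 0.2      # P(O2|A2): same conditional probability
P_O2_FREE = 0.2    # P(O2|~A1,~A2): free deliveries degrade the A2-O2 relation

def simulate_block(policy, rng=RNG):
    """policy(t) returns 0 (wait), 1 (press A1) or 2 (press A2)."""
    history = []
    for t in range(T):
        a = policy(t)
        o1 = int(a == 1 and rng.random() < P_O1_A1)
        # O2 is earned by A2 and also delivered when neither action occurs,
        # so Delta-P(A2,O2) = 0.2 - 0.2 = 0 while Delta-P(A1,O1) = 0.2 - 0 = 0.2.
        o2 = int((a == 2 and rng.random() < P_O2_A2) or
                 (a == 0 and rng.random() < P_O2_FREE))
        history.append((a, o1, o2))
    return history

# Example: a policy that waits, presses A1, or presses A2 with equal probability.
trace = simulate_block(lambda t: int(RNG.integers(3)))
```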
The results demonstrate that both the prediction-error model and the Kalman algorithm
discriminated the causal action, while the MBRL model did not. The learned values of the causal
action converged to ∆P = 0.2 in the prediction-error model and the Kalman algorithm (bright yellow
in Supplementary Figure 2A and 2B), and the probability of selecting the causal action increased
correspondingly (light blue to bright yellow). By contrast, the action values in the MBRL model,
which were based on the covariance between each action and outcome, tended to converge to the
positive contingencies, P(O1|A1) and P(O2|A2). Furthermore, the value of waiting competed equally
with both actions in this model, so the model did not select the most causal action. These
simulation results did not depend substantially on the learning-rate parameters chosen; the
differences appeared across a wide range of parameter values (Supplementary Figure 2).
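To illustrate why a prediction-error scheme can isolate the causal action, the sketch below implements one simple variant: two delta-rule estimates, of P(O1|A1) and P(O1|~A1), whose difference converges on ∆P. This is a sketch of the principle only, not necessarily the update equations specified in the main text.

```python
# Hedged sketch of a Delta-P-style prediction-error learner. The learning
# rate alpha and the dual-estimate scheme are illustrative assumptions.
def learn_delta_p(history, alpha=0.1):
    """history: iterable of (action, o1, o2) tuples from simulate_block."""
    p_o_a = p_o_not_a = 0.0
    for a, o1, _ in history:
        if a == 1:
            p_o_a += alpha * (o1 - p_o_a)            # delta rule for P(O1|A1)
        else:
            p_o_not_a += alpha * (o1 - p_o_not_a)    # delta rule for P(O1|~A1)
    return p_o_a - p_o_not_a                         # tends toward Delta-P = 0.2

# Example with the simulator above (concatenating blocks for a long run):
print(learn_delta_p(trace * 50))
```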
Model fitting
Causal Induction. In order to compare our model-fitting results against computational models of
causal induction, we adapted the Causal Support model described by Griffiths & Tenenbaum (2005)
and provided as a Matlab function online (http://web.mit.edu/cocosci/Papers/support.zip).
While this model is usually applied to summary data, we adapted it to our time-series analysis by
keeping a running tally of the observations required to assess contingencies, and applying the
model after each new observation. The model compares a causal structure (Graph 0), in which the
outcome is independent of the action (and so caused solely by the background), with an alternative
structure (Graph 1), in which both the action and the background cause the outcome. The evidence
for each action ("causal support") is therefore taken as the log-likelihood ratio of P(D|Graph 1)
over P(D|Graph 0). Because the Bayesian integrals underlying these terms are intractable,
P(D|Graph 1) and P(D|Graph 0) were approximated by Monte Carlo simulation using 10,000 samples in
the supportsampler Matlab script.
The results are presented in Supplementary Table 1 for comparison with the null model (see main
text).
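For reference, the core computation can be sketched in a few lines. The noisy-OR parameterization of Graph 1 and the uniform priors over causal strengths follow Griffiths & Tenenbaum (2005), but this Python analogue of the supportsampler routine is our own reconstruction, not the distributed script.

```python
import numpy as np

# Hedged sketch of Monte Carlo causal support: log P(D|Graph 1) - log P(D|Graph 0)
# with uniform priors over a background strength w0 and a causal strength w1.
def causal_support(actions, outcomes, n_samples=10_000, rng=None):
    rng = rng or np.random.default_rng(0)
    a = np.asarray(actions, float)     # 1 where the action occurred
    o = np.asarray(outcomes, float)    # 1 where the outcome occurred

    def mc_log_evidence(p):            # p: (n_samples, T) outcome probabilities
        lik = np.prod(np.where(o == 1, p, 1 - p), axis=1)
        return np.log(lik.mean())      # average likelihood over prior samples

    w0 = rng.uniform(size=(n_samples, 1))   # background strength samples
    w1 = rng.uniform(size=(n_samples, 1))   # causal strength samples
    log_p_g0 = mc_log_evidence(np.broadcast_to(w0, (n_samples, a.size)))
    log_p_g1 = mc_log_evidence(1 - (1 - w0) * (1 - w1 * a))  # noisy-OR
    return log_p_g1 - log_p_g0

# Example: support is positive when the outcome tracks the action.
acts = np.tile([1, 0], 60)
outs = np.where(acts == 1, np.random.default_rng(1).random(120) < 0.4,
                np.random.default_rng(2).random(120) < 0.1).astype(int)
print(causal_support(acts, outs))
```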
fMRI. For the purpose of generating model-predicted time series for fMRI regression analysis, ∆AO
and ∆XO values (taken from ∆µ) and ∆C values were generated by the Kalman algorithm with the
parameters (v, k and 𝝉) set to their maximum-likelihood estimates over the whole group (Daw, 2011).
The group estimates were v = 8.6 and 𝝉 = 11.74. Thirty simulations (matching the sample size N;
Supplementary Table 2) confirmed that these parameters could be accurately recovered from choice
data, with mean (SEM) recovered values of 8.48 (1.19) for v and 11.76 (0.74) for 𝝉.
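The recovery check follows the usual simulate-and-refit logic, sketched below. The model used here (a Kalman-filter value learner with process-noise variance v and softmax inverse temperature 𝝉) is a toy stand-in of our own, since the authors' Kalman algorithm is specified in the main text; only the procedure, not the model, should be taken from this example.

```python
import numpy as np
from scipy.optimize import minimize

def kalman_step(mu, sig, a, o, v):
    """One Kalman update of the chosen action's value (obs. noise fixed at 1)."""
    sig[a] += v                      # diffusion: add process noise
    gain = sig[a] / (sig[a] + 1.0)   # Kalman gain
    mu[a] += gain * (o - mu[a])
    sig[a] *= 1 - gain

def simulate_choices(v, tau, rng, T=240, p_reward=(0.2, 0.0)):
    mu, sig = np.zeros(2), np.ones(2)
    data = []
    for _ in range(T):
        p0 = 1 / (1 + np.exp(-tau * (mu[0] - mu[1])))  # softmax over 2 actions
        a = 0 if rng.random() < p0 else 1
        o = float(rng.random() < p_reward[a])
        kalman_step(mu, sig, a, o, v)
        data.append((a, o))
    return data

def neg_log_lik(params, data):
    v, tau = np.exp(params)          # fit in log space to enforce positivity
    mu, sig = np.zeros(2), np.ones(2)
    nll = 0.0
    for a, o in data:
        p0 = np.clip(1 / (1 + np.exp(-tau * (mu[0] - mu[1]))), 1e-12, 1 - 1e-12)
        nll -= np.log(p0 if a == 0 else 1 - p0)
        kalman_step(mu, sig, a, o, v)
    return nll

# Simulate with known parameters, then recover them by maximum likelihood.
rng = np.random.default_rng(1)
data = simulate_choices(v=8.6, tau=11.74, rng=rng)
fit = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), args=(data,),
               method='Nelder-Mead')
v_hat, tau_hat = np.exp(fit.x)
```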
Sample size estimate
We conducted an fMRI-based power analysis (Mumford & Nichols, 2008) of the mPFC effects in
Experiment 1 to determine a sufficient sample size to detect the same effects in a new sample. We
selected the smallest mPFC effect reported in Experiment 1: the ∆AO response in BA9 (Z = 4.71).
The power analysis was conducted on the BA9 fROI with a p-value threshold of 0.05 for a one-sided
hypothesis test. The analysis revealed that N > 20 would provide more than 95% power
(Supplementary Figure 3).
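A simplified form of this calculation treats the group analysis as a one-sample t-test and computes power from the noncentral t distribution; the full Mumford & Nichols (2008) method additionally models the first-level design and temporal autocorrelation, which this sketch omits. The effect size d below is an illustrative placeholder, not the value estimated from the BA9 fROI.

```python
import numpy as np
from scipy import stats

# Hedged sketch: power of a one-sided, one-sample group t-test at alpha = 0.05
# as a function of N, for a standardized group effect size d.
def power_curve(d=1.0, alpha=0.05, n_range=range(5, 41)):
    power = {}
    for n in n_range:
        t_crit = stats.t.ppf(1 - alpha, df=n - 1)              # decision threshold
        power[n] = stats.nct.sf(t_crit, df=n - 1, nc=d * np.sqrt(n))
    return power

# Example: smallest N whose estimated power exceeds 95%.
curve = power_curve(d=1.0)
print(min(n for n, p in curve.items() if p > 0.95))
```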
References
Daw, N. D. (2011). Trial-by-trial data analysis using computational models. In M. R. Delgado, E. A.
Phelps, & T. W. Robbins (Eds.), Decision making, affect, and learning: Attention and
performance XXIII (Vol. 23, pp. 3-38). New York: Oxford University Press.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive
Psychology, 51(4), 334-384. doi:10.1016/j.cogpsych.2005.05.004
Mumford, J. A. & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for
arbitrary design and temporal autocorrelation. Neuroimage, 39(1), 261-268.
doi:10.1016/j.neuroimage.2007.07.061
Supplementary Table 1. Model evidence and comparison scores for Causal induction
Model            | Free parameters | Negative log likelihood | LRT 𝛸²        | LRT p   | Pseudo-R² | Relative Bayes Factor (H1 – H0) | No. favoring H1/H0 model
Causal induction | k and 𝝉         | 16,885                  | 𝛸²(1) = 4,040 | < 1E-30 | 0.11      | -1981                           | 11/19
Null model (H0)  | 𝝉               | 15,992                  | 𝛸²(1) = 5,826 | < 1E-30 | 0.13      | –                               | –
Legend: Aggregate model evidence (negative log-likelihood scores), significant differences of each
model from chance (likelihood ratio test, LRT), model fit (pseudo-R²), and the Bayesian model
comparison against the null model (H0). k is the AO-delay temporal threshold, and 𝝉 is the inverse
temperature (exploitation/exploration).
Supplementary Table 2. Recovered Kalman parameters from 30 simulations
Simulation | v     | 𝝉
1          | 17.67 | 23.21
2          | 11.45 | 14.1
3          | 15.41 | 12.8
4          | 6.9   | 12.58
5          | 12.08 | 13.56
6          | 3.58  | 13.37
7          | 2.89  | 6.59
8          | 9.09  | 12.35
9          | 2.19  | 5.74
10         | 0.01  | 1.57
11         | 6.45  | 10.88
12         | 14.68 | 15
13         | 7.82  | 13.23
14         | 10.18 | 11.91
15         | 1.69  | 8.89
16         | 6.16  | 13.48
17         | 3.27  | 10.78
18         | 9.65  | 12.53
19         | 3.4   | 10.9
20         | 6.27  | 8.38
21         | 0     | 4.2
22         | 7.14  | 12.89
23         | 25    | 17.98
24         | 1.56  | 11.82
25         | 15.37 | 14.48
26         | 25    | 9.62
27         | 8.22  | 11.41
28         | 2.27  | 9.76
29         | 7.84  | 14.58
30         | 11.24 | 14.24
Mean (SEM) | 8.48 (1.19) | 11.76 (0.74)