J Neurophysiol 111: 1823–1832, 2014.
First published February 12, 2014; doi:10.1152/jn.00393.2013.

Intelligence moderates neural responses to monetary reward and punishment

Daniel R. Hawes,1 Colin G. DeYoung,2 Jeremy R. Gray,3 and Aldo Rustichini4
1Department of Applied Economics, University of Minnesota, St. Paul, Minnesota; 2Department of Psychology, University of Minnesota, Minneapolis, Minnesota; 3Department of Psychology, Michigan State University, East Lansing, Michigan; and 4Department of Economics, University of Minnesota, Minneapolis, Minnesota

Submitted 31 May 2013; accepted in final form 6 February 2014

Hawes DR, DeYoung CG, Gray JR, Rustichini A. Intelligence moderates neural responses to monetary reward and punishment. J Neurophysiol 111: 1823–1832, 2014. First published February 12, 2014; doi:10.1152/jn.00393.2013.—The relations between intelligence (IQ) and neural responses to monetary gains and losses were investigated in a simple decision task. In 94 healthy adults, typical responses of striatal blood oxygen level-dependent (BOLD) signal after monetary reward and punishment were weaker for subjects with higher IQ. IQ-moderated differential responses to gains and losses were also found for regions in the medial prefrontal cortex, posterior cingulate cortex, and left inferior frontal cortex. These regions have previously been identified with the subjective utility of monetary outcomes. Analysis of subjects' behavior revealed a correlation between IQ and the extent to which choices were related to experienced decision outcomes in preceding trials. Specifically, higher IQ predicted behavior to be more strongly correlated with an extended period of previously experienced decision outcomes, whereas lower IQ predicted behavior to be correlated exclusively with the most recent decision outcomes. We link these behavioral and imaging findings to a theoretical model capable of describing a role for intelligence during the evaluation of rewards generated by unknown probabilistic processes. Our results demonstrate neural differences in how people of different intelligence respond to experienced monetary rewards and punishments. Our theoretical discussion offers a functional description for how these individual differences may be linked to choice behavior. Together, our results and model support the hypothesis that observed correlations between intelligence and preferences may be rooted in the way decision outcomes are experienced ex post, rather than deriving exclusively from how choices are evaluated ex ante.
intelligence; reinforcement learning; risk; decision making; reward; punishment
INDIVIDUAL DIFFERENCES in intelligence are linked to systematic differences in preferences and choice behavior during laboratory experiments (Benjamin et al. 2005; Burks et al. 2009; Rustichini 2009; Shamosh et al. 2008; Shamosh and Gray 2008). Furthermore, measures of intelligence (IQ) correlate with important life outcomes pertaining to educational achievement, job performance, wealth, and health status (Deary et al. 2007; Gottfredson 1997; Gottfredson and Deary 2004; Lawlor et al. 2006). Importantly for modern theories of decision making, differences in IQ consistently predict parameters describing individuals' preferences with respect to risk and temporally delayed rewards (Burks et al. 2009; Rustichini et al. 2009; Shamosh et al. 2008; Shamosh and Gray 2008), suggesting that these fundamental decision parameters may be critically influenced by common neurobiological mechanisms related to intelligence. An important step toward understanding
the functional role of intelligence in decision making is to
investigate the existence of a link between IQ and the basic
experience of rewarding and punishing outcomes of decisions.
Finding such a link would open the possibility that the relation
of IQ to preferences may be at least partly rooted in how
decision outcomes are experienced ex post, rather than being
limited to how options are evaluated ex ante.
The striatum and medial prefrontal cortex are essential
structures for human reward processing, and figure centrally in
the brain systems that mediate goal directed behavior and
experience-based learning (Balleine et al. 2007; Daw et al.
2011; Delgado 2007; Hawes et al. 2012; Schönberg et al.
2007), making these structures natural regions of interest for
investigating the influence of intelligence on decision making
and preferences. The present study focused on brain responses
in the striatum and medial prefrontal cortex during reward
processing to establish neural feasibility of a functional link
between intelligence and gain and loss processing. To do so,
we evaluated 94 subjects’ behavior and neural responses within
a paradigm very similar to a task developed by Delgado et al.
(2000). In our version of this well-known task, participants
guessed whether a computer-generated number would be high
or low and received monetary gains and punishments depending on whether their guesses were correct or incorrect. To
minimize differences in reward responses due to randomization
or systematic differences in guessing success, we experimentally manipulated the task so as to present each subject with the
same pseudorandom sequence of gains and losses. That is, the
computer-generated number was produced after the subject’s
guess and was chosen such that each subject would experience
the same sequence of gains and losses. Hence, our design
eliminated the opportunity for subjects to experience different
performance histories and thus minimized any potential concern that between-subject variance in choice behavior and
outcome evaluation could be caused by differences in the
history of obtained rewards. Instead, remaining individual
differences in responses to gains and losses during our task
were restricted to differences in preferences for reward/punishment or to individual differences in how the outcome of
reward/punishment is experienced.
Our analysis was primarily designed to assess whether an
association of IQ with neural responses to rewards and punishments exists. Toward this aim we focused attention on the
influence of intelligence on feedback processing in the rostral
part of the caudate. This region was chosen a priori because of
its joint implication for reward/punishment processing, reinforcement learning, and decision making in previous studies
with similar task design (Delgado et al. 2000; Li et al. 2011;
van den Bos et al. 2012a). Additionally, the caudate is identified as the only subcortical structure for which anatomic volume is correlated with IQ (Grazioplene RG, Ryman S, Gray JR, Rustichini A, Jung RE, and DeYoung CG, unpublished observations). Our main investigation entailed extracting the strength of neural responses to gains and losses in the caudate and demonstrating their statistical association with IQ.

Following our finding of these associations, we utilized more exploratory analysis of the behavioral data to investigate a potential theoretical model that might explain our findings and that might be tested more extensively in future research. In particular, we considered the relation of our results to known reinforcement learning functions of the regions of interest and considered the relation of IQ to how subjects may learn reward associations in a task like ours. Because the number of trials in our task was insufficient for fitting a reinforcement learning model for each subject individually, the suggested model is ultimately not critically tested by our data and should therefore be understood as an additional interpretational aid, complementing the central correlational analysis of this research.
METHODS
The research was approved and conducted in accordance with the
stipulations of the Yale University Institutional Review Board.
Participants
We collected data from 114 male, right-handed subjects. Subjects were recruited from Yale University (n = 25) and the surrounding
community by distribution of fliers and Internet advertisements. We
performed functional magnetic resonance imaging (fMRI) preprocessing on 104 subjects, leaving out 10 subjects who exhibited highly
irregular responses in parts of the experimental battery (e.g., reporting
having fallen asleep during the scan). Of the 104 subjects, data were
discarded for 6 subjects because of excessive head motion in the scanner
and for 4 subjects because of poor quality of obtained structural images.
These exclusions left 94 participants in our analysis. Subjects' average IQ was 122.9 (minimum: 95.5, maximum: 148.0, SD: 11.6). Median age of
our subjects was 22 yr (minimum: 18 yr, maximum: 38 yr). The sample
was selected to be all male for the purposes of genetic research unrelated
to the present study (Shehzad et al. 2012).
Measures
Subjects were administered the Wechsler Abbreviated Scale of
Intelligence (WASI; Wechsler 1999), which provides an estimate of
full-scale IQ using four subtests (vocabulary, similarities, block design, and matrix reasoning). Subjects further completed a battery of
questionnaires and cognitive tasks that included an n-back working memory task, in which subjects viewed a series of words and judged whether each word matched the one presented three items earlier. Performance on this task (d-prime) was used as an indicator of working memory.
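The text does not spell out how d-prime was computed; as a point of reference, a standard signal detection calculation from hit and false-alarm counts might look as follows in R (a sketch only; the counts shown are hypothetical).

# Illustrative sketch (not from the article): standard d-prime for the 3-back task.
# hits/misses refer to match trials; fas/crs refer to non-match trials.
dprime <- function(hits, misses, fas, crs) {
  # log-linear correction avoids infinite z scores when rates are 0 or 1
  hit_rate <- (hits + 0.5) / (hits + misses + 1)
  fa_rate  <- (fas + 0.5) / (fas + crs + 1)
  qnorm(hit_rate) - qnorm(fa_rate)
}

dprime(hits = 40, misses = 8, fas = 5, crs = 67)  # hypothetical example counts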
fMRI Procedures
Subjects performed the task described below and three additional
unrelated tasks for a scanning time of 1.25 h. Imaging data were
collected using a 3-Tesla Siemens Trio scanner at the Yale Magnetic
Resonance Research Center. For each participant, a high-resolution T1-weighted anatomic image [MPRAGE; repetition time (TR) = 2,500 ms; echo time (TE) = 3.34 ms; inversion time = 1,100 ms; flip angle = 7°; slices = 256; voxel size = 1 × 1 × 1 mm] and 180 contiguous functional volumes [gradient-echo EPI sequence; TR = 2,000 ms; TE = 25 ms; field of view (FOV) = 240 cm; flip angle = 80°; voxel size = 3.75 × 3.75 × 4 mm] were acquired. Participants
viewed stimuli projected onto a screen through a mirror mounted on
the head coil. Responses were made via fiber-optic response buttons
using the fingers of the right hand. Stimuli were presented in
PsyScope (Cohen et al. 1993). Caudate volume was calculated in
Freesurfer using the “asegstats2table” command in its default settings
(Fischl and Dale 2000).
Stimuli and Design
Subjects were instructed to guess whether an upcoming computer-generated number would be either Low (in the set 1, 2, 3) or High (4, 5, 6). Subjects received a reward of $2 for each correct guess and a punishment of −$1 for each incorrect guess. Forty of these reward-relevant trials were interspersed with 20 reward-neutral control trials,
during which subjects were also instructed to press a button but for which
they received neither feedback nor monetary rewards/punishments.
As depicted in Fig. 1, during reward-relevant trials subjects first
saw a one-dollar bill displayed on the screen for 3 s. During this time
subjects indicated their guess regarding the upcoming number. After
3 s, subjects first saw a computer-generated number for 1 s and then,
depending on trial type, a green upward or a red downward arrow
containing the words “you win” or “you lose” for another 1 s. All
trials were separated by fixation periods of 3, 5, or 7 s in duration
(jittered). Reward-neutral trials started with a 3-s display of a gray
rectangle of the same size as the dollar bill, followed by an asterisk for
1 s and a blue rectangle containing the word “same” for another 1 s.
Each subject saw the same sequence of gain and loss trials (20
gains and 20 losses in total) in the same order: the computer responded to the subject’s guess in each trial by presenting a high or low
number to match the predetermined outcome of the fixed trial sequence. Subjects were unaware that the computer’s number-generating process was fixed in this way and received instructions only about
the mechanics and incentives of the task. Subjects were not instructed
about the underlying reward-generating process. Therefore, reward-motivated subjects had a monetary incentive to try to discover possible
patterns or regularities in the computer’s number-generating behavior.
fMRI Data Analysis
All data were preprocessed and analyzed using FSL version 4.1.9
(Jenkinson et al. 2012). Motion correction was performed using
MCFLIRT. T1-weighted anatomic images were registered to MNI space using 12 degrees of freedom, per FSL registration defaults. Functional data were preprocessed by applying
slice time correction, spatial smoothing (using a 7-mm Gaussian
kernel), linear trend removal, and temporal high-pass filtering (using
FSL default settings). Subjects were excluded if motion correction indicated deviations in the estimated center of mass of >3 mm, which led to the exclusion of 6 subjects.
For statistical analysis, we computed a general linear model (GLM)
on the blood oxygen level-dependent (BOLD) signal time course of
94 subjects. Our model contained four predictors, each specified as a 0–1 boxcar variable of 2-s duration. One of these predictors matched the onset of relevant trials (Rel); another predictor matched the onset of control trials (Ctrl). The remaining two predictors indicated the valence of feedback as positive (FB+) or negative (FB−) and coincided with the onset of each type of feedback stimulus, respectively. Predictors were convolved with a double-gamma function estimate of the hemodynamic response. The model also contained the motion correction parameters (MCP) obtained during preprocessing. The final GLM was given by

BOLD = a + b × Rel + c × Ctrl + d × FB+ + e × FB− + X′ × MCP + error.
Significant differences in gain vs. loss responses were identified according to t-tests performed on the whole brain contrast FB+ − FB−.
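The GLM was estimated with FSL; purely to illustrate its structure, the following R sketch builds 2-s boxcar predictors, convolves them with a double-gamma hemodynamic response, and fits the model to one simulated voxel time course. The onsets, motion parameters, and BOLD series are simulated stand-ins, and the HRF parameters are illustrative rather than FSL's exact defaults.

# Illustrative sketch of the GLM structure described above (not the FSL implementation).
set.seed(1)
TR <- 2; n_vol <- 180; t_vol <- (seq_len(n_vol) - 1) * TR

# Double-gamma HRF sampled at the TR (illustrative canonical-style parameters)
hrf_t <- seq(0, 30, by = TR)
hrf <- dgamma(hrf_t, shape = 6, scale = 1) - 0.35 * dgamma(hrf_t, shape = 16, scale = 1)
hrf <- hrf / max(hrf)

# 2-s boxcar regressor built from onset times (s) and convolved with the HRF
make_regressor <- function(onsets) {
  box <- as.numeric(sapply(t_vol, function(t) any(t >= onsets & t < onsets + 2)))
  convolve(box, rev(hrf), type = "open")[seq_len(n_vol)]
}

onsets_rel  <- seq(10, 330, by = 16)           # hypothetical reward-relevant trial onsets
onsets_ctrl <- seq(18, 330, by = 32)           # hypothetical control trial onsets
onsets_fbp  <- onsets_rel[c(TRUE, FALSE)] + 4  # hypothetical gain feedback onsets
onsets_fbn  <- onsets_rel[c(FALSE, TRUE)] + 4  # hypothetical loss feedback onsets

Rel   <- make_regressor(onsets_rel);  Ctrl  <- make_regressor(onsets_ctrl)
FBpos <- make_regressor(onsets_fbp);  FBneg <- make_regressor(onsets_fbn)

mcp  <- matrix(rnorm(n_vol * 6), ncol = 6)      # stand-in motion-correction parameters
bold <- 2 * FBpos - 1.5 * FBneg + rnorm(n_vol)  # simulated voxel time course

fit <- lm(bold ~ Rel + Ctrl + FBpos + FBneg + mcp)
coef(fit)[c("FBpos", "FBneg")]  # per-subject gain and loss response estimates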
Using FSL defaults for computing the significance of contiguous clusters, based on the number of voxels and the smoothness of the data, we set a cluster-forming threshold at z > 7.5. Voxels within these clusters were significant at P < 0.01.
Regression weights for the GLM were extracted separately for each
subject and then correlated with IQ scores. Subject-specific regression
weights were obtained by performing the above GLM on voxels
falling into an anatomic template of the caudate head according to the
Talairach atlas (Talairach and Tournoux 1988). Importantly, we
performed individual differences analysis on this a priori anatomically
defined region of interest (ROI), rather than the functionally identified
region that showed the strongest contrast between gain and loss
feedback. Use of an a priori ROI for the investigation of individual differences is preferable to investigation of ROIs showing the strongest contrast, because important individual-difference effects may be accompanied by only small main (i.e., group average) effects. Conversely, identifying ROIs based on the presence of individual-difference effects renders any subsequent test of individual differences in that ROI nonindependent (Vul et al. 2009).
We performed additional exploratory analysis on three functionally
identified ROIs that showed significant activation for gains compared
with losses in our task. These regions comprised ROIs in the ventromedial prefrontal cortex [vmPFC; 575 contiguous voxels, maximum z-stat at 4, 50, −2 (MNI)], the posterior cingulate cortex [pCC; 257 contiguous voxels, maximum z-stat at 0, −32, 32 (MNI)], and also an ROI in the left inferior frontal lobe [liPFC; 23 contiguous voxels, maximum z-stat at −34, 54, −8 (MNI)]. The vmPFC and pCC both
have been linked to the subjective utility of monetary outcomes of
risky choices (Wu et al. 2011), and the liPFC has been associated with
neural representations of loss aversion (Tom et al. 2007).
Additional event-related averaging of BOLD signal was performed
for the depiction of BOLD time courses in RESULTS. Event-related
averages were computed relative to average BOLD at trial onset.
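A minimal sketch of this event-related averaging, assuming a vector of trial-onset volume indices and expressing each post-onset window relative to its value at onset (all data here are stand-ins, not the analysis code):

# Minimal sketch: average BOLD in a window after each trial onset, expressed
# relative to the BOLD value at onset (names and data are hypothetical).
event_related_average <- function(bold, onset_idx, n_after = 4) {
  windows <- sapply(onset_idx, function(i) bold[i:(i + n_after)] - bold[i])
  rowMeans(windows)   # one average time course, time-locked to trial onset
}

bold   <- cumsum(rnorm(180))            # stand-in BOLD time course (180 volumes)
onsets <- seq(5, 160, by = 8)           # stand-in trial-onset volume indices
event_related_average(bold, onsets)     # values at 0, 2, 4, 6, 8 s after onset (TR = 2 s)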
Regression Analysis
Regressions in Tables 2–5 were performed using the statistical programming language R. Regressions of behavioral data were computed using mixed-effects logistic regression with coefficients conditioned at the subject level, where appropriate (Tables 3–5), using the linear modeling package lme4 (Bates et al. 2012). The regressions reported in Table 2 were performed using robust regression, implemented in the package robust (Wang et al. 2008), to control for the potential influence of outliers.
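For concreteness, calls of the general form used for these analyses might look as follows; the data frames, variable names, and model formulas below are illustrative stand-ins rather than the exact published specifications (those are given with Tables 2–5).

library(lme4)    # mixed-effects logistic regression (Bates et al. 2012)
library(robust)  # robust regression (Wang et al. 2008)

set.seed(1)
# Hypothetical stand-in data with the same general structure as in the analyses
trials <- data.frame(
  subject = factor(rep(1:20, each = 37)),
  iq      = rep(round(rnorm(20, 120, 12)), each = 37),
  ev1     = runif(740, -0.5, 0.5),
  ev2     = runif(740, -0.5, 0.5)
)
trials$choice_high <- rbinom(740, 1, plogis(2 * trials$ev1 - 1.5 * trials$ev2))

subjects <- data.frame(iq = rnorm(94, 123, 12), wm = rnorm(94, 1.9, 0.8),
                       vol = rnorm(94, 81, 9))
subjects$fb_neg <- -120 + 0.9 * subjects$iq + rnorm(94, 0, 40)

# Mixed-effects logistic regression with subject-level random intercepts (cf. Tables 3-5)
m_behav <- glmer(choice_high ~ ev1 * iq + ev2 * iq + (1 | subject),
                 family = binomial, data = trials)

# Robust regression of the caudate loss response on IQ and covariates (cf. Table 2)
m_loss <- lmRob(fb_neg ~ iq + wm + vol, data = subjects)

summary(m_behav); summary(m_loss)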
RESULTS
Brain Imaging Results
Striatum. Averaged across all subjects, and ignoring for the
moment the effects of IQ, neuroimaging results for our large
sample replicate and extend findings reported by Delgado et al.
(2000) on a sample of 9 subjects. We found increased BOLD
response in the caudate at the onset of reward relevant trials,
and this BOLD response remained elevated after revelation of
gains but decreased steeply below baseline after revelation of
losses. Figure 2 shows this pattern of BOLD response for the
anatomically identified caudate head. Brain regions showing
significantly more activation after gains than after losses are
shown in Fig. 3 and listed in Table 1. No brain areas showed
significantly larger activation after losses compared with gains.
We compared subject-specific responses to gains and losses by obtaining each subject's predictor of percentage BOLD change following rewarding (FB+) and punishing (FB−) feedback using the regression model described in METHODS. For the anatomically defined caudate, the relation of gain and loss responses to subject IQ is illustrated in Fig. 2, D and E, as well as in the regressions of Table 2.
For the caudate [468 contiguous voxels, barycenter: 10, 4, −4 (MNI)], percentage BOLD change after losses is positively predicted by larger IQ (i.e., the effect is less negative with higher IQ) (β = 0.87, P = 0.03). Controlling for caudate volume, age, and performance in the working memory task does not significantly alter the coefficient obtained for IQ. The equivalent regressions for the BOLD predictor of gains (FB+) showed no significant correlation between IQ and gain responses (P = 0.26). The simple correlation between IQ and the neural loss response is significant at r = 0.21 (P = 0.047).
Fig. 1. Task design. Subjects engaged in 40 reward-relevant trials. At the beginning of these trials a U.S. dollar bill was displayed for 3 s, during which subjects pressed 1 of 2 buttons indicating their guess as to whether a computer-generated number would be Low (1–3) or High (4–6). Guesses were followed by 2 feedback screens, each 1 s in duration. Reward-relevant trials were interspersed with 20 reward-neutral trials, which were signaled by a gray rectangle. In the image shown (but not in the task itself), a green outline marks the sequence of screens seen during gain trials, a red outline marks the sequence of screens for loss trials, and a gray outline marks the sequence of screens for reward-neutral control trials. Correct or incorrect outcomes were rewarded or punished with $2 or −$1, respectively.
Fig. 2. Blood oxygen level-dependent (BOLD) response is FB+ > FB− (gain > loss) in the bilateral caudate. BOLD response in caudate after gains is significantly higher than after losses. This result holds for whole brain analysis as well as a masked regression on the anatomically defined caudate (shown). A: event-related average BOLD response in caudate replicates the time course identified by Delgado et al. (2000). B: separate event-related average BOLD for the 47 highest IQ and 47 lowest IQ subjects demonstrates that the BOLD decrease after losses is less pronounced for higher IQ subjects. C: the mask used for analysis of caudate. D: general linear model (GLM) regression coefficients for BOLD response after gains plotted against IQ (r = 0.13, P = 0.185). E: GLM regression coefficients for BOLD response after losses plotted against IQ (r = 0.20, P = 0.047).
The effect of IQ on gain and loss responses is further illustrated in
Fig. 2, A and B, which shows average BOLD time course for
the 47 highest and 47 lowest IQ subjects of our sample. The
results indicate reduced differential responses to monetary
outcomes for higher IQ, driven in particular by the subdued
response after losses.
vmPFC and liPFC. We performed the same analysis on BOLD responses after gains and losses for the vmPFC, pCC, and left inferior/middle frontal gyrus (liPFC). For the vmPFC, we found a significant relation between intelligence and the BOLD responses to gains as well as losses: r = 0.36 (P < 0.01) for losses, and r = 0.30 (P < 0.01) for gains. For the liPFC, the correlation with the loss predictor is r = 0.29 (P < 0.01), and that for gains is r = 0.31 (P < 0.01). A marginally significant correlation was found for loss responses in the pCC with r = 0.20 (P = 0.051), but not for gain responses: r = 0.11 (P = 0.27). These results are further illustrated in Figs. 4, 5, and 6.
Fig. 3. Regions of interest showing significantly higher activation for gains than losses in whole brain analysis. Tal, Talairach coordinates.
Our findings provide the first neurobiological evidence for a link between intelligence and ex post processing, or experience, of monetary rewards and punishments. The additional results for vmPFC, because of its role in coding expected utility, suggest a potential link to differences in experienced utility for the outcomes in our task.

Given the prominent roles of the caudate and the medial prefrontal cortex during reinforcement learning, and the conceptual link between our task and standard reinforcement learning tasks, our results suggest the hypothesis that outcomes of probabilistic events have differential impact on negative and positive reinforcement signals for subjects who differ in IQ. Support for this hypothesis would be an important next step toward explaining how individual differences in intelligence relate to long-term differences in attitudes to risk, and to economic preferences generally. Within the limitations of the experimental design, we therefore considered further evidence for this conceptual link between the reinforcement learning value of gain/loss responses and IQ in the behavioral data. In particular, we considered a link motivated by the following theoretical view of how prediction errors relate to cognitive ability in tasks like ours.

Model

Our task is analogous to a repeated, simultaneous-choice matching pennies game between a participant and a computer opponent. The participant chooses between two options (l and r), while the opponent chooses between L and R.

Suppose the opponent (the environment, or any reward-generating process) is choosing L with a fixed probability p independently in every period, and the participant believes that the choice of L is independent and fixed in every period but occurs with probability q, possibly different from p.
Table 1. Regions of interest differentiating between gain and loss feedback

Region Identified by FB+ > FB−              Peak (X, Y, Z)    No. of Voxels    Average z Statistic
Ventromedial prefrontal cortex              43, 45, 51        575              7.54
Posterior cingulate cortex                  88, 47, 67        257              7.46
Caudate (bilateral)                         4, 52, 33         2160             5.64
Left medial/inferior prefrontal cortex      61, 91, 33        23               7.21

Regions with significant activation for the gain feedback (FB+) > loss feedback (FB−) contrast in whole brain analysis (cluster correction z > 7.5, P < 0.01).
Prediction error is, as usual, the difference between the realized reward (according to p) and the participant's subjective expectation of the reward (according to q, when the action is the optimal one with respect to q). For example, if q > ½, the individual thinks that L is more likely, so he will then choose l all the time and have a q-expected payoff of 2q − 1. The prediction error is 1 − (2q − 1) with probability p and −1 − (2q − 1) with probability 1 − p.

In this simple case, it is easy to see that the expected (with respect to the true probability p) prediction error of a participant with belief q has some basic properties, which hold more generally: If the belief is correct, that is, q = p, then the expected prediction error is zero. If individuals learn about the true distribution by Bayesian updating and those with higher IQ are willing to consider a larger set of initial values, then they are more likely to hold a correct belief and thus are more likely to have a zero prediction error.
When the behavior of the opponent is partially predictable (that is, p is not ½), then the expected prediction error can be positive or negative. For example, if p > ½ and q > ½ (so the action chosen by the individual is the correct one), the p-expected prediction error is 2(p − q). This is positive when p is larger than q because the individual chooses the right action all the time (he is on the right side of ½), but he is guessing right more times than he thinks.
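The expectation behind this expression can be written out explicitly (a worked step, using the quantities defined above): with q > ½ the participant always plays l, the realized reward is 1 with probability p and −1 with probability 1 − p, and the subjective expectation is 2q − 1, so the p-expected prediction error is

p[1 − (2q − 1)] + (1 − p)[−1 − (2q − 1)] = (2p − 1) − (2q − 1) = 2(p − q).

Setting q = p recovers the zero expected prediction error noted above, and setting p = ½ gives 1 − 2q, which is negative for any q > ½ (the case taken up below).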
The case closest to the environment in our experiment is that where the behavior of the opponent is unpredictable, that is, p = ½.
Table 2. Impact of IQ on caudate BOLD signal after losses

Dependent variable: FB−        Model 1 (b/SE)      Model 2 (b/SE)      Model 3 (b/SE)
Intercept                      −115.6* (51.0)      −114.6* (56.7)      −141.6* (64.8)
IQ                             0.99* (0.41)        0.99* (0.5)         0.93* (0.44)
Working memory, d-prime                            −0.3 (6.6)          −1.2 (6.5)
Caudate volume, cm3                                                    4.5 (5.8)
R2                             0.061               0.062               0.069

The regression coefficient for neural responses to losses correlates significantly with intelligence. Controlling for working memory and caudate volume does not affect the independent effect of intelligence. BOLD responses after losses increase with higher IQ. Results are from a robust regression (*P < 0.05). IQ scores on standard scale: mean 100, SD 15; caudate volume: mean 81 cm3, SD 92 cm3; d-prime as a measure of working memory: mean 1.9, SD 0.83.
Fig. 4. Results for ventromedial prefrontal cortex. A: GLM regression coefficients for BOLD response after losses plotted against IQ (r = 0.36, P < 0.01). B: event-related average plot. C: GLM regression coefficients for BOLD response after gains plotted against IQ (r = 0.30, P < 0.01). D: location of regions of interest (ROI).
In this case (p = ½) the p-expected prediction error is negative no matter what q is. This is because the p-expected payoff is zero no matter what the policy of the individual is, and the q-expected payoff is positive or zero. So in the completely unpredictable case, any deviation of q from ½ makes the individual overconfident in his ability to predict; that is, he overestimates the expected payoff, which is bound to be zero, hence the expected disappointment. In our experimental environment, the computer's behavior has no exploitable pattern.
Thus, if participants differ in IQ, and if those with high IQ have
beliefs closer to the truth, then the prediction error of those
with lower IQ will be lower (more negative) than the prediction
error of participants with higher IQ. One way in which subjects
can search for patterns in the computer’s behavior in our task
and obtain predictions that are closer to the truth is to consider
longer histories of observed computer choices during the pattern search. We thus consider the relation between subject
behavior and observed computer choices in the behavioral
data.
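Both claims above, the 2(p − q) expression for a partially predictable opponent and the expected disappointment when p = ½, can be checked directly; the short R sketch below is purely illustrative and is not part of the reported analyses.

# Illustrative check of the model's expected-prediction-error claims (not from the article).
expected_pe <- function(p, q) {
  # Participant with belief q always plays the action favored by q;
  # that action wins with probability p if q > 1/2, and 1 - p if q < 1/2.
  win_prob   <- ifelse(q > 0.5, p, 1 - p)
  subjective <- abs(2 * q - 1)   # q-expected payoff of the chosen action
  win_prob * (1 - subjective) + (1 - win_prob) * (-1 - subjective)
}

expected_pe(p = 0.7, q = 0.6)   # 0.2 = 2 * (0.7 - 0.6): partially predictable opponent
expected_pe(p = 0.5, q = 0.6)   # -0.2: unpredictable opponent, expected disappointment
expected_pe(p = 0.5, q = 0.5)   # 0: correct belief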
Behavioral Results
Response times in our task did not substantially differ with respect to subject IQ (r = −0.14, P = 0.16). A mixed-effects
logistic panel regression (Table 3) grouped by subject showed
that subjects’ guesses in any given trial were influenced primarily by the computer choices in the most recent two trials,
with subjects more likely to choose the option not recently
selected by the computer. The payoff outcome (gain or loss) of
the previous trials did not exhibit a significant effect on
subjects’ choices; thus we did not find evidence of a pervasive
use of a win-stay/lose-shift heuristic. Additionally, the observed overall frequency of High choices by the computer did
not influence subjects’ behavior after controlling for the computer’s choice one and two trials back (Table 3, model 3),
indicating that subjects modified their choices with respect to
past observations of computer behavior in a manner that
extends beyond responding to current total frequency. Subjects
appeared engaged, consciously or unconsciously, in an attempt
to learn and exploit perceived patterns in the reward-generating
process by responding to recently observed computer choices.
We found that this behavior was systematically moderated by
intelligence.
For this moderation analysis, we first considered the trial-by-trial frequencies of observed computer choices for each subject. In particular we considered three frequencies: F0t, the unconditional frequency of observed computer choices at trial t; F1t, the frequency of observed computer choices up to trial t conditional on the computer's choice in the previous trial; and F2t, the frequency conditional on the computer's choice two trials back (considering no more than two previous trials is justified in light of the analysis described above).
Fig. 5. Results for left inferior frontal lobe. A: GLM regression coefficients for BOLD response after losses plotted against IQ (r = 0.31, P < 0.01). B: event-related average plot. C: GLM regression coefficients for BOLD response after gains plotted against IQ (r = 0.30, P < 0.01). D: location of ROI.
ter’s choices two trials back.1 Hence, a given value F0 describes the overall frequency of High/Low responses of the
computer. F1 indicates the frequency of pairs of High-High,
High-Low, Low-High, and Low-Low choices, which is to say
the expected probability of a High/Low number given the
identity of the previous number. Likewise, F2 considers the
expected probability of a High/Low number given the identity
of the number two periods back.
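As an illustration of how such running unconditional and conditional frequencies, and expected values derived from them, can be computed from a sequence of computer choices, consider the following R sketch (the sequence and the payoff weighting are hypothetical stand-ins, not the analysis code).

# Illustrative sketch: running unconditional (F0) and conditional (F1, F2) frequencies
# of "High" computer choices, computed from a hypothetical choice sequence.
set.seed(1)
cpu_high <- rbinom(40, 1, 0.5)   # 1 = computer chose High, 0 = Low (stand-in sequence)

freq_high <- function(x) if (length(x) == 0) 0.5 else mean(x)

F0 <- F1 <- F2 <- rep(0.5, 40)
for (t in 3:40) {
  past  <- cpu_high[1:(t - 1)]
  F0[t] <- freq_high(past)
  # F1: frequency of High among past trials whose previous trial matched the last observation
  idx1  <- which(head(past, -1) == cpu_high[t - 1]) + 1
  F1[t] <- freq_high(past[idx1])
  # F2: frequency of High among past trials whose trial two back matched the observation two back
  idx2  <- which(head(past, -2) == cpu_high[t - 2]) + 2
  F2[t] <- freq_high(past[idx2])
}

# One plausible expected value of guessing High at trial t ($2 gain vs. $1 loss)
EV0 <- 2 * F0 - (1 - F0); EV1 <- 2 * F1 - (1 - F1); EV2 <- 2 * F2 - (1 - F2)
head(cbind(F0, F1, F2, EV0, EV1, EV2), 10)   # equals 0.5 when a frequency is 0.5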
On the basis of each of these frequencies, we calculated the
trial-by-trial expected value of guessing High for each subject
and entered these values into a mixed-effects logistic panel
regression, grouped by subject, to assess the impact of these
conditional frequencies on subject choice (Table 4). As already
demonstrated by the results in Table 3, expected values based
on the unconditional frequency of computer choices, EV0,
were unrelated to subject choices. However, expected values
based on conditional frequencies relating to one and two trials
back, denoted EV1 and EV2, respectively, significantly predicted subjects’ guessing behavior. For subjects at the lower
range of intelligence for our sample, conditional frequencies
one trial back positively predicted guessing behavior, whereas
subjects at the higher range of intelligence displayed a larger
effect of the events two trials back. Note that individuals with lower IQ within our sample have an average IQ with respect to the overall population.
Because our experiment was not designed to differentiate
between competing possible models as to how subjects use past
information to determine future choices, our analysis remains
restricted to showing that intelligence predicts the extent to
which information farther back in time predicts subjects’ behavior. To illustrate this conclusion more clearly without
assuming any specific functional form for information integration, Table 5 combines information from one period and two periods back into a single regressor, EV1+2, consisting of the unweighted average of F1t and F2t. In this analysis, controlling
for F1t does not eliminate the hypothesized effect that the
influence of the composite information on subjects’ guesses
significantly increases with IQ. Hence, we conclude that subjects with higher IQ were influenced by events one and two
periods back, whereas subjects with lower IQ were chiefly
responding to events only one period back.
This conclusion is further illustrated in Table 6, which
reports predicted probabilities of guessing High based on the
true observed expected value of this guess. Table 6 shows that
the influence of the expected value computed by considering
only one period back is more influential for subjects when IQ
is low, whereas the influence of the expected value considering
events two periods back moderates subject choices when IQ is
high.
Fig. 6. Results for posterior cingulate cortex. A: GLM regression coefficients for BOLD response after losses plotted against IQ (r = 0.20, P = 0.051). B: event-related average plot. C: GLM regression coefficients for BOLD response after gains plotted against IQ (r = 0.11, P = 0.27). D: location of ROI.
An informative result is obtained by further comparing the average expected value of a reward-maximizing subject using the model based on one period back, mean(EV1), to that for a subject using the model based on one and two periods back, mean(EV1+2). For this comparison, using the actual observed histories of subjects in our task, we find that the model that considers the longer history produces a significantly smaller average expected value across all trials [mean(EV1) = 0.269, mean(EV1+2) = 0.253, P = 0.006]. This result should not be surprising, and it is implicit in the model we consider, since by considering a more complex relationship among observed data, the mean(EV1+2) essentially produces a less noisy estimate of the true expected value, which is zero in our task.
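The statistical intuition can be illustrated with a short sketch (not the reported computation): a noisier estimate of a truly 50/50 frequency implies a larger apparent advantage for the best guess, so a model that pools more past information, loosely analogous to averaging the one- and two-back frequencies, yields smaller average expected values.

# Illustrative statistical point only: with a truly random process, a coarser frequency
# estimate (fewer effective observations) inflates the apparent edge |2F - 1| of the
# best guess, while a finer estimate shrinks it toward the true value of zero.
set.seed(2)
apparent_edge <- function(n_obs)
  mean(replicate(5000, abs(2 * mean(rbinom(n_obs, 1, 0.5)) - 1)))

apparent_edge(10)   # coarser estimate (like conditioning on one recent outcome)
apparent_edge(20)   # finer estimate (like pooling one- and two-back information)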
In evaluating these findings it is important to remember that,
in our task, considering more information and a larger class of
models is useful only to learn that there is nothing meaningful
for the subject to learn, given the pseudorandom nature of the
outcomes. However, reaching this conclusion should produce
reduced prediction error signals for punishing outcomes, and
this is consistent with the caudate responses we report.
DISCUSSION
The results reported in this article provide an important first
step in understanding the functional role of intelligence during
decision making, by establishing the existence of a link between intelligence and neural reward processing. Our main
result demonstrates that IQ moderates BOLD responses to
monetary outcomes of decisions. In the caudate this moderating effect is in large part due to a relation between IQ and
responses to monetary losses. Our simple modeling account
extends a possible explanation for how these neural findings
may link to behavior in the context of prediction error-based
reward learning. The model conceptually describes higher IQ
subjects as considering richer sets of possibilities (or mental
models) during reward learning, and it predicts that for random
reward processes such as the one considered in our task,
subjects with higher IQ should experience, on average, reduced
negative prediction errors. Assuming disutility from negative
prediction errors, this account could be extended to an explanation of how cognitive ability affects willingness to take risks.
Specifically, our results provide initial support for the hypothesis
that observed correlations between IQ and economic preferences could be based on systematic differences in the way
rewards and punishments are experienced ex post. This effect
of intelligence would be in addition to, but distinct from, its
role during the ex ante evaluation of decision options. We find
support for this hypothesis in the observed correlations between IQ and neural activity in the caudate following punishment, as well as increased activity in utility tracking regions
(vmPFC).
Our findings and the discussed model are consistent and compatible with current research considering contributions of cognitive ability to reinforcement learning and the influence of experimentally induced increases in cognitive load on decision makers' reliance on mental models during reinforcement learning tasks (Collins et al. 2012; Otto et al. 2013; van den Bos et al. 2012b). However, the proposed model is not critically tested on our data and should therefore be viewed as an exploratory account and interpretative aid for a possible mechanism underlying the neural results that constitute the main result of this study.
Table 3. Influences of past computer actions on subject choices

Dependent variable: subject's choice of H    Model 1 (b/SE)      Model 2 (b/SE)      Model 3 (b/SE)
Intercept                                    0.091 (0.051)       0.453‡ (0.084)      0.386‡ (0.07)
Freq Hcpu                                    −1.136‡ (0.333)     −0.091 (0.374)      −0.23 (0.360)
Hcpu 1-back                                                      −0.210† (0.073)     −0.273† (0.087)
Hcpu 2-back                                                      −0.44‡ (0.073)      −0.373‡ (0.086)
Hcpu 3-back                                                      −0.110 (0.073)
Hcpu 1-back × Won                                                                    0.168 (0.10)
Hcpu 1-back × Lost                                                                   −0.12 (0.09)
BIC                                          4,802               4,784               4,790
LL                                           −2,389              −2,368              −2,367

Mixed-effects logistic panel regression grouped by subject [N = 3,478 observations (94 subjects × 37 trials; 3 trials were not used because of reference 3-back)]. Hcpu i-back is a dummy variable representing whether the computer chose "High" i periods back. Freq Hcpu is the experienced frequency of High choices by the computer up to each trial, expressed as the difference from 0.5. *P < 0.05; †P < 0.01; ‡P < 0.001. BIC, Bayesian Information Criterion; LL, log-likelihood.
Table 4. Influence of conditional transition probabilities on subject behavior

Dependent variable: subject's choice of H    Model 1 (b/SE)      Model 2 (b/SE)
Intercept                                    0.102 (0.05)        0.06 (0.05)
EVm0(H)                                      −1.039 (2.36)
EVm1(H)                                      2.66* (1.22)        2.45* (1.06)
EVm2(H)                                      −1.87* (1.00)       2.00* (0.93)
EVm0(H) × IQ                                 0.001 (0.02)
EVm1(H) × IQ                                 −0.02* (0.01)       −0.02* (0.01)
EVm2(H) × IQ                                 0.02* (0.01)        0.02* (0.01)
BIC                                          4,828               4,828
LL                                           −2,382              −2,389

Mixed-effects logistic panel regression grouped by subject (N = 3,478 observations). Expected values of choosing High [EVmi(H)] were calculated based on the frequencies, Ft, of past computer choices as described in the text. *P < 0.05. IQ scores on standard scale: mean 100, SD 15; EV centered: mean 0.5.
Table 5. Relation of IQ to conditional probabilities influencing choice behavior

Dependent variable: subject's choice of H    Model 1 (b/SE)
Intercept                                    0.066 (0.05)
EVm1(H)                                      4.45† (1.59)
EVm1+m2(H)                                   −4.00* (1.96)
EVm1(H) × IQ                                 −0.04† (0.01)
EVm1+m2(H) × IQ                              0.03* (0.015)
BIC                                          4,828
LL                                           −2,389

Mixed-effects logistic panel regression grouped by subject (N = 3,478 observations). EVmi(H) expresses the expected value as a simple average of the expected conditional probabilities i periods back. *P < 0.05; †P < 0.01. IQ scores on standard scale: mean 100, SD 15; EV centered: mean 0.5.
Our results add to an emerging literature suggesting the potential of investigating neural correlates of stable individual differences in cognitive ability as a means of better understanding neural computations underlying feedback-based learning (Collins et al. 2012; Otto et al. 2013; van den Bos et al. 2012b). Our results for medial prefrontal cortex, left inferior frontal gyrus, and posterior cingulate cortex further suggest that investigation into the relation between intelligence and experienced utility may benefit particularly from analyzing the functional integration of utility-coding prefrontal areas with subcortical regions during reinforcement learning.2

2Notably, a link between developmental changes in reinforcement learning and striatal-medial prefrontal cortex connectivity has been identified in a recent study (van den Bos et al. 2012a). Additionally, neural responses in medial orbitofrontal cortex and dorsomedial striatum have been shown to co-vary as a function of causal contingency (Tanaka et al. 2008).
In addition to the link to theories of behavior discussed in this article, our results mark a neurophysiological path toward a potential bridge between theoretical neuroscience and psychological research on intelligence: Although it has been shown that general intelligence is predicted by subjects' efficiency during associative learning (Kaufman et al. 2009), the mechanism via which associative learning contributes to general intelligence has remained essentially unexplored in psychology. Further investigation of the computational role of IQ during associative learning may therefore benefit from a targeted investigation of how IQ relates to the complexity of mental models held by decision makers.
Table 6. Predicted probabilities of choosing High given the experienced expected values for High

                                p(H | IQ, EVm1, EVm2)
Expected Value                  min IQ (IQ = 100)    max IQ (IQ = 140)
EVm1 = EVm2 = 0                 0.51                 0.51
EVm1 = 0.5 > EVm2 = 0           0.63                 0.51
EVm1 = 0 < EVm2 = 0.5           0.57                 0.64
EVm1 = EVm2 = 0.5               0.68                 0.65

Rows show different combinations of expected values computed for the model considering 1 period back (EVm1) and the model considering 2 periods back (EVm2). Min IQ data are for the lowest IQ subject in our sample; max IQ data are for the highest IQ subject in our sample. Predicted probabilities (p) are based on the regression results in Table 5.
ACKNOWLEDGMENTS
We thank technical support staff and seminar participants at the University
of Minnesota, as well as conference participants at the Annual Meeting for
Cognitive Neuroscience 2012, for valuable feedback. In particular, we thank
Edward Patzelt, Philip Burton, Piotr Evdokimov, Rachael Grazioplene, Kim-Sau Chung, Claudia Civai, and Mauricio Delgado.
GRANTS
This work was supported by National Institute of Mental Health Grant F32 MH077382 (to C. G. DeYoung) and National Science Foundation Grants DRL 0644131 (to J. R. Gray) and SES-1061817 (to A. Rustichini).
DISCLAIMER
Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the authors and do not necessarily reflect the views
of the National Science Foundation.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the authors.
AUTHOR CONTRIBUTIONS
D.R.H. analyzed data; D.R.H. and A.R. interpreted results of experiments;
D.R.H. prepared figures; D.R.H. and A.R. drafted manuscript; D.R.H., C.G.D.,
J.R.G., and A.R. edited and revised manuscript; D.R.H. and A.R. approved
final version of manuscript; C.G.D. and J.R.G. conception and design of
research; C.G.D. and J.R.G. performed experiments.
REFERENCES
Balleine BW, Delgado MR, Hikosaka O. The role of the dorsal striatum in
reward and decision-making. J Neurosci 27: 8161– 8165, 2007.
Bates D, Maechler M, Bolker B. lme4: Linear mixed-effects models using S4
classes (Online). Version 0.999999-0. http://CRAN.R-project.org/package=lme4 [2012].
Benjamin D, Brown S, Shapiro J. Who is ‘behavioral’? Cognitive ability and
anomalous preferences. J Eur Econ Assoc 11: 1231–1255, 2013.
Burks SV, Carpenter JP, Goette L, Rustichini A. Cognitive skills
affect economic preferences, strategic behavior and job attachment. Proc
Natl Acad Sci USA 106: 7745–7750, 2009.
Cohen J, MacWhinney B, Flatt M, Provost J. PsyScope: an interactive
graphic system for designing and controlling experiments in the psychology
laboratory using Macintosh computers. Behav Res Methods Instrum Comput
25: 257–271, 1993.
Collins AG, Frank MJ. How much of reinforcement learning is working
memory, not reinforcement learning? A behavioral, computational, and
neurogenetic analysis. Eur J Neurosci 35: 1024 –1035, 2012.
Daw N, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based
influences on humans’ choices and striatal prediction errors. Neuron 69:
1204 –1215, 2011.
Deary IJ, Strand S, Smith P, Fernandes C. Intelligence and educational
achievement. Intelligence 35: 13–21, 2007.
Delgado MR. Reward-related responses in the human striatum. Ann NY Acad
Sci 1104: 70 – 88, 2007.
Delgado MR, Nystrom L, Fissel C. Tracking the hemodynamic responses to
reward and punishment in the striatum. J Neurophysiol 84: 3072–3077,
2000.
Fischl B, Dale AM. Measuring the thickness of the human cerebral cortex
from magnetic resonance images. Proc Natl Acad Sci USA 97: 11050 –
11055, 2000.
Gottfredson LS. Why g matters: the complexity of everyday life. Intelligence
24: 79 –132, 1997.
Gottfredson LS, Deary IJ. Intelligence predicts health and longevity, but
why? Curr Dir Psychol Sci 13: 1– 4, 2004.
Hawes DR, Vostroknutov A, Rustichini A. Experience and abstract reasoning in learning backward induction. Front Neurosci 6: 23, 2012.
Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL.
Neuroimage 62: 782–790, 2012.
Lawlor D, Clark H, Smith GD, Leon D. Childhood intelligence, educational
attainment and adult body mass index: findings from a prospective cohort
and within sibling-pairs analysis. Int J Obes 30: 1758 –1765, 2006.
Li J, Delgado MR, Phelps EA. How instructed knowledge modulates the
neural systems of reward learning. Proc Natl Acad Sci USA 108: 55– 60,
2011.
Otto AR, Gershman SJ, Markman AB, Daw ND. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central
executive. Psychol Sci 24: 751–761, 2013.
Rustichini A. Neuroeconomics: what have we found, and what should we
search for. Curr Opin Neurobiol 19: 672– 677, 2009.
Schönberg T, Daw ND, Joel D, O’Doherty JP. Reinforcement learning
signals in the human striatum distinguish learners from nonlearners during
reward-based decision making. J Neurosci 27: 12860 –12867, 2007.
Shamosh NA, DeYoung CG, Green AE, Reis DL, Johnson MR, Conway
AR, Gray JR. Individual differences in delay discounting: relation to
intelligence, working memory, and anterior prefrontal cortex. Psychol Sci
19: 904 –911, 2008.
Shamosh NA, Gray JR. Delay discounting and intelligence: a meta-analysis.
Intelligence 36: 289 –305, 2008.
Shehzad Z, DeYoung CG, Kang Y, Grigorenko EL, Gray JR. Interaction
of COMT val158 met and externalizing behavior: relation to prefrontal brain
activity and behavioral performance. Neuroimage 60: 2158 –2168, 2012.
Talairach J, Tournoux P. 3-Dimensional proportional system: an approach to
cerebral imaging. In: Co-Planar Stereotaxic Atlas of the Human Brain. New
York: Thieme, 1988.
Tanaka SC, Balleine BW, O’Doherty JP. Calculating consequences: brain
systems that encode the causal effects of actions. J Neurosci 28: 6750 –
6755, 2008.
Tom SM, Fox CR, Trepel C, Poldrack RA. The neural basis of loss aversion
in decision-making under risk. Science 315: 515–518, 2007.
van den Bos W, Cohen MX, Kahnt T, Crone EA. Striatum-medial prefrontal
cortex connectivity predicts developmental changes in reinforcement learning. Cereb Cortex 22: 1247–1255, 2012a.
van den Bos W, Crone EA, Güroğlu B. Brain function during probabilistic
learning in relation to IQ and level of education. Dev Cogn Neurosci 2:
S78 –S89, 2012b.
Vul E, Harris C, Winkielman P, Pashler H. Puzzlingly high correlations in
fMRI studies of emotion, personality, and social cognition. Perspect Psychol
Sci 4: 274 –290, 2009.
Wang J, Zamar R, Marazzi A, Yohai V, Salibian-Barrera M, Maronna R,
Zivot E, Rocke D, Martin D, Konis K. robust: Insightful Robust Library
(Online); R package version 0.3-4. http://CRAN.R-project.org/package=robust [2008].
Wechsler D. Wechsler Abbreviated Scale of Intelligence (3rd ed.). San
Antonio, TX: Pearson Education, 1999.
Wu SW, Delgado MR, Maloney LT. The neural correlates of subjective
utility of monetary outcome and probability weight in economic and in
motor decision under risk. J Neurosci 31: 8822– 8831, 2011.