Neural Currencies for Valuation and Decision Making

Choosing the Greater of Two Goods:
Neural Currencies for Valuation and Decision Making
Leo P. Sugrue, Greg S. Corrado and William T. Newsome
Presenter: He Crane Huang
04/20/2010
Outline
• Studies on neural correlates of simple perceptual decisions
• Interactions between decision making and reward
• Studies of value-based decisions on 'free-choice' tasks
Neural correlates of perceptual decisions
Sensory input (sensation) → Motor output (action)
... identified sensory representations as well as decision-related signals in areas of the parietal and frontal cortices. At the neural level, differentiating sensory signals from decision-related signals is relatively straightforward. First, sensory signals require the presence of the sensory stimulus, and extinguish with stimulus offset. Second, and more importantly, in discrimination tasks in which behavioural decisions and neural activity are measured across a range of stimulus strengths, animals make both correct and incorrect judgements in response to the presentation of identical stimuli. On these trials, sensory neurons encode the visual stimulus itself, whereas the activity of decision-related neurons reflects ...
Figure 1 | Visual and oculomotor systems of the primate brain. Lateral view of the cerebral hemisphere of the macaque ... (labelled areas: V1, V4, IT, LIP, FEF, SEF, Area 46)
Conceptual framework for (2AFC) decision making
a Conceptual framework for simple perceptual decisions (2-alternative forced-choice discrimination task):
Sensory input → Sensory transformation → Sensory representation → Decision transformation → Probability of choice → Action implementation → Choice
b Conceptual framework for simple value-based decisions (actor):
Sensory input + physiological needs → Value transformation → Value representation → Decision transformation → Probability of choice → Action
(slide markers ①, ② and ③ appear near Common reward currency, Value transformation and Decision transformation, respectively)
(Perceptual) Decision-making tasks
Figure 3 | Decision-making tasks. a | General structure of a perceptual discrimination task, in which a monkey reports its judgement of the direction of motion in a random dot stimulus with ...
Panels: a Perceptual discrimination task (Motion: Left? Right?); b Linking behaviour to perception (% left choices vs. stimulus strength (left)); c Free-choice task (Value: Left? Right?); d Linking behaviour to valuation (% left choices vs. value of left option)

LIP activity during the RT-direction discrimination task
Roitman and Shadlen, J. Neurosci., November 1, 2002, 22(21):9475–9489
Figure 7. Time course of LIP activity in the RT-direction-discrimination task. A, Average response from 54 LIP neurons ... motion strength and choice as indicated by color and line type. The responses are aligned to two events in the trial. On ... to the onset of stimulus motion. Response averages in this portion of the graph are drawn to the median RT for each m... activity within 100 msec ... eye movement initiation. On the right, responses ... aligned to initiation of the eye movement ...
(Task difficulty decreases / coherence increases)

... and recordings from monkey caudate neurons during simple associative conditioning tasks show activity that is consistent with the creation of such stimulus–response bonds (refs 44–46). However, the direct yoking of stimuli to actions and outcomes implied by the current generation of these models fails to capture the facility with which higher organisms construct complex representations of value and flexibly link them to action selection (BOX 2).
Responding to these limitations, more recent theoretical proposals have expanded the role of the dopamine signal to include the shaping of more abstract models of valuation (refs 47–49). Consistent with this approach, FIG. 2b portrays the dopamine system as a critic whose influence extends beyond the generation of simple associative predictions to the construction and modification of complex value transformations. In this scheme, the striatum is considered to have the crucial role of liaison between actor and critic. If correct, this proposal indicates that dopamine neurons have access to the value representation depicted in FIG. 2b. Consistent with this idea, Nakahara and colleagues (ref. 50) recently showed that dopamine responses were strongly modulated by contextual information that pertained to the evolution of reward probability across successive trials in a task, even when this information was not accompanied by any explicit sensory ...
LIP activity in RT-motion discrimination task
• Implements the decision transformation.
• LIP reflects a general decision variable that is monotonically related to the log likelihood ratio that the animal will select one of the two alternatives. (Thursday's paper)
• Converts a sensory representation of visual motion into a decision variable.
• Predicts not only the decision, but also when the decision has been reached.
• [Box 1]: Encodes a mixture of sensory, motor and cognitive signals that might guide decisions about upcoming behavioral responses.
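One way to picture a decision variable that predicts both the choice and the time at which it is reached is a bounded accumulator. The sketch below is only an illustration under stated assumptions: the function names, the Gaussian noise, and the drift, bound and noise values are hypothetical and are not taken from Roitman and Shadlen or from the review.

```python
import random


def accumulate_to_bound(drift, noise_sd, bound, rng, max_steps=10_000):
    """Sum noisy momentary evidence until it reaches +bound or -bound.

    drift    : mean evidence per time step (stands in for motion strength)
    noise_sd : standard deviation of the per-step Gaussian noise
    Returns (choice, steps); steps stands in for the decision time.
    All names and values here are illustrative assumptions.
    """
    x = 0.0
    for step in range(1, max_steps + 1):
        x += drift + rng.gauss(0.0, noise_sd)
        if x >= bound:
            return "right", step
        if x <= -bound:
            return "left", step
    # No bound reached: report the sign of the accumulated evidence.
    return ("right" if x >= 0 else "left"), max_steps


def mean_decision_time(drift, n_trials=200, seed=0):
    """Average number of steps to reach a bound for a given drift."""
    rng = random.Random(seed)
    times = [accumulate_to_bound(drift, 1.0, 10.0, rng)[1] for _ in range(n_trials)]
    return sum(times) / len(times)
```

In this toy version, stronger evidence (larger drift) reaches the bound sooner, which mirrors the slide's point that the same variable carries information about when the decision is made.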
Decision making and reward
Adaptive decisions
- take 'reward' into account
Figure 2 | Conceptual frameworks for decision making. A conceptual framework that illustrates proposed processing stages for the formation of simple perceptual and value-based ...
a Conceptual framework for simple perceptual decisions:
Sensory input → Sensory transformation → Sensory representation → Decision transformation → Probability of choice → Action implementation → Choice
b Conceptual framework for simple value-based decisions:
Actor: Sensory input + physiological needs → Value transformation → Value representation → Decision transformation → Probability of choice → Action implementation → Choice
Critic (labels): Outcomes, Common reward currency, Predictions, Error signal, Dopamine (VTA)
Striatum: shown between actor and critic
Decision making and reward
A common neural currency for reward
• Reward: anything that an animal will work to acquire; it has motivational and affective dimensions.
• Brain stimulation reward (BSR): there is a dedicated neural network devoted to reward processing.
• Shizgal and colleagues: BSR contributes as a reward signal that is responsible for valuation.
Decision making and reward
Incentives and errors
• Dopamine system: a central role in processing the motivational aspect of reward
• Dopamine neurons do not signal the occurrence of rewards, but can be considered to code reward prediction error

... expectation of the likely abundance of fish.
Psychologists and economists have long appreciated the influence of reward and valuation on decision making in higher mammals (ref. 24), but these factors were notably absent from our preceding discussion. Although reward is an implicit variable in every operant task, most physiological studies of perceptual decision making hold reward contingencies constant to isolate activity that is specifically related to sensorimotor transformations (FIG. 2a). Only recently have investigators begun to manipulate reward independently in order to explore the neural basis of valuation and adaptive behaviour.
A conceptual framework within which to consider value-based choice is proposed in FIG. 2b. Neither conclusive nor complete, it is intended as a starting point from which to discuss the basic steps in building an internal representation of value and using that representation to guide behaviour. Focus first on the left-hand side of this diagram (labelled 'actor'). Like the proposed framework for perceptual decisions (FIG. 2a), this framework for value-based decision making comprises three key processing stages. At the first stage, a value transformation takes the input ...
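The idea of a reward prediction error can be made concrete with a simple delta-rule update. This is a minimal sketch assuming a Rescorla-Wagner-style rule, which is a standard textbook formulation swapped in for illustration; the function names and the learning rate of 0.5 are hypothetical and do not come from the slides.

```python
def prediction_error(reward, prediction):
    """Error signal: what was received minus what was predicted."""
    return reward - prediction


def update_prediction(prediction, reward, learning_rate=0.5):
    """Move the prediction a fraction of the way toward the outcome.

    Returns (new_prediction, error). The learning rate is an
    illustrative assumption, not a fitted value.
    """
    delta = prediction_error(reward, prediction)
    return prediction + learning_rate * delta, delta
```

Under this rule an unexpected reward gives a positive error, a fully predicted reward gives zero error, and an omitted but predicted reward gives a negative error, which is the sense in which the signal codes prediction error rather than reward occurrence.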
Value-based decision making
The cortex as the stage for valuation
• Anatomically, several regions within the prefrontal and parietal association cortices are positioned to link reward to behavioral responses (motor planning).
Value-based decision making
'Free-choice' task design
Limitations of the imperative tasks (which the free-choice design aims to improve on):
- The value transformation is all or none
- The 'decision' is a simple one-to-one mapping
Figure 3 panels: a Perceptual discrimination task (Motion: Left? Right?); b Linking behaviour to perception; c Free-choice task (Value: Left? Right?); d Linking behaviour to valuation
Value-based decision making
Demonstrating behavioral control
• Two different approaches:
  • Nash equilibrium, from the theory of competitive games
  • The matching law, from a general principle of animal foraging behavior
• Both provide a means of assessing behavioral control.
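As a rough illustration of how the matching law can serve as a behavioral benchmark, the sketch below computes the choice fraction the law predicts from the income earned on each option. The function name and the 0.5 fallback when no income has been earned are assumptions made for this example only.

```python
def matching_prediction(income_a, income_b):
    """Fraction of choices to option A predicted by the matching law.

    The matching law states that the fraction of choices allocated to
    an option equals the fraction of total income earned from it.
    Returns 0.5 when no income has been earned (an arbitrary convention).
    """
    total = income_a + income_b
    if total == 0:
        return 0.5
    return income_a / total
```

For example, if option A has yielded 30 rewards and option B 10, the law predicts that 75% of choices go to A; comparing this prediction with the observed choice fraction is one way to assess behavioral control.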
Value-based decision making
Understanding local strategy
Nash equilibrium and the matching law describe average behavior at equilibrium.
Which local strategies produce these average behavioral phenomena?
• Value-based behavioral control: behavior is under 'stimulus control'
• Quantitative modeling of local choice strategy: the 'variables' that link reward history to behavior
• Neurophysiological exploration of model variables: how to understand the model at the neural level
Value-based decision making
Three free-choice studies
• Value signals in frontal cortex
  • 'The matching pennies game', Barraclough and colleagues
• Value signals in parietal cortex
  • Behavioral dynamics: matching behavior and value representation, Leo P. Sugrue et al.
  • Behavioral equilibria: 'The inspection game', Dorris and Glimcher
Value-based decision making
free-choice study 1: Value signals in frontal cortex
Frequency histograms of P(right)
Barraclough and colleagues: matching pennies game
Figure 4 | Payoff matrices for competitive games. For a trial in the matching pennies game (a) or the inspection game (b), a payoff matrix defines the outcome for each player on the basis of the combined actions of both players. Green and blue represent the payoff experienced by the monkey and computer, respectively, for each possible combination of choices. In the inspection game, 'i' defines a cost to the computer for choosing ('inspecting') the red ('risky') target. By manipulating this cost across blocks, the mixed strategy predicted by the Nash equilibrium can be changed. Panel a adapted, with permission, from REF. 76 © (2004) Macmillan Publishers Ltd. Panel b adapted, with permission, from REF. 78 © (2004) Elsevier Science.
Panels: a Matching pennies (monkey chooses Left/Right; computer chooses Left/Right); b The inspection game (monkey chooses Red/Green; computer chooses Red/Green)
high, whether or not that target is ultimately chosen.
This effect of fractional income is independent of any
trial-to-trial variation in the fine details of the monkey’s eye movements. Importantly, these signals are
apparent to us only because our behavioural model
affords access to the animal’s underlying value transformation, which is local in time. Our previous attempts
to detect value signals in LIP on the basis of global
behavioural changes between blocks were marginally
successful at best. Like Barraclough and colleagues, we
owe our progress to an approach that combined valuation-based behavioural control, modelling of the proximal algorithm that generates individual choices and
the neurophysiological study of the variables revealed
by this model.
The perspective of behavioural equilibria. Dorris and
Glimcher (ref. 78) base their experiments on a well-characterized competitive interaction known as 'the inspection
game’. The general structure of this task is similar to
that used by Barraclough and colleagues: on every trial,
the monkey and the computer each select one of two
eye movement targets, and the outcome depends on
their combined choices. However, in this task the payoff matrix that defines the relationship between choices
and outcomes is more complex (FIG. 4b). The monkey
Value-based decision making
free-choice study 1: Value signals in frontal cortex
• Reinforcement learning algorithms provide a general framework for finding optimal strategies in a dynamic environment.
• Prefrontal cortex might have a key role in optimizing decision-making strategies.
... probability of choosing red will vary linearly with the local fractional income ... the unity line in Fig. 2E). Figure 2E shows this to be approximately true for the behavioral data. Second, because the model is strictly probabilistic, it predicts that the number of successive trials on which a player (monkey or model) will choose a given color before switching will be distributed as the average of a family of exponentials. Figure 2F plots these distributions of stay durations; not only is the ...
The model provides us a window into the animal's internal valuation of available options and gives us a metric (local fractional income) that allows us to estimate how the monkey values each of the two colors on every trial, even before it renders a decision. Equipped with this quantitative trial-by-trial measure, we are poised to explore how value is represented in the brain.
The representation of fractional income in the parietal cortex. The lateral intraparietal (LIP) area of the posterior pari...

Matching behavior in monkeys (Leo P. Sugrue et al., 2004)
The sequence of events of an oculomotor matching task: (i) Fixate. To begin a run of trials, the animal must fixate the central cross. (ii) Delay. Saccade targets appear (randomized spatially by color) in opposite hemifields while the animal maintains fixation. (iii) Go. Dimming of the fixation cross cues a saccadic response and hold. (iv) Return. Brightening of the fixation cross cues return, target colors are then rerandomized, and the delay period of the next trial begins. Reward is delivered at the time of the re...
Value-based decision making
free-choice study 2: Value signals in parietal cortex (area LIP): behavioral dynamics
* Rewards are assigned to the two colors
at rates that are independent and
stochastic.
* Once assigned, a reward remains
available until the associated color is
chosen.
* "This persistence of assigned rewards means that the likelihood of being rewarded increases with the time since a color was last chosen ..." It also ensures that matching approximates the optimal probabilistic strategy in this task.
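The reward schedule described in the three points above can be sketched as a small simulation: each color is baited independently and stochastically, and a baited reward stays available until that color is chosen. The class name, the per-trial baiting probabilities and the seed are hypothetical choices for illustration; the actual rates used in the experiment are not given on the slide.

```python
import random


class BaitedSchedule:
    """Two-option schedule in which assigned rewards persist until collected."""

    def __init__(self, rates, seed=0):
        # rates: per-trial probability that a reward is assigned to each color
        # (illustrative values; not the experiment's actual rates)
        self.rates = dict(rates)
        self.baited = {color: False for color in self.rates}
        self.rng = random.Random(seed)

    def step(self, choice):
        """Run one trial and return 1 if the chosen color was baited, else 0."""
        # Assign new rewards independently to any color not already baited.
        for color, rate in self.rates.items():
            if not self.baited[color] and self.rng.random() < rate:
                self.baited[color] = True
        # Collect the reward on the chosen color, if any; the other stays baited.
        reward = 1 if self.baited[choice] else 0
        self.baited[choice] = False
        return reward
```

Because an uncollected reward stays in place, the chance that a neglected color is baited grows with the number of trials since it was last chosen, which is the property the quoted passage emphasizes.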
Value-based decision making
free-choice study 2: Value signals in parietal cortex (area LIP): behavioral dynamics
Dynamic matching behavior
A local model of matching behavior
Figure 6 | A local model of matching behaviour. a | A linear–nonlinear probabilistic model uses leaky integration over recent reward experience to estimate the local income due to each response option (Ired, Igreen). In a local formulation of Herrnstein's matching law, these estimates are used to compute the local fractional income of each option (that is, FI), which directly ...
Panel a stages: Reward histories → Leaky integrators (time constant τ) → Local income (Ired, Igreen) → Local fractional income (FIred) → Probability of choice (PCred)
Panels b–d, axes (Model vs. Monkeys): Cumulative red choices vs. cumulative green choices; Probability of choice (red) vs. local fractional income (red); Neural response (spikes s–1) vs. local fractional income (RF target), for choices into and out of the RF

... contains activity appropriate for ... saccadic eye movements, signals that ... variously interpreted as working ... visual targets, attention to salient ... locations, or motor planning (20–23). ... context of more sophisticated eye ... tasks, investigators have documented modulation of LIP activity by the ... sensory evidence that supports a ... judgment (24–26) and by both the ... probability that a particular movement ... and the volume of juice associated ... movement (27). Such encoding ... information from diverse sources is a property of brain areas responsible for ... putative decision variables that ... information to motor responses (28). If this suggestion is correct, and LIP is indeed an important locus for oculomotor decisions, then in a setting where eye movement decisions are informed by reward history and expectation, we anticipate the appropriate decision variable to be represented in LIP. Accordingly, the following physiological experiments test the prediction that in the matching task, neurons in LIP encode the local fractional income (Fig. 2B) of competing target colors.
We selected for study LIP neurons that showed sustained, spatially selective activity in the context of a classical delayed saccade task (Fig. 3A). These neurons respond only when targets are presented within a circumscribed region of the visual field termed the cell's response field (RF). Approximately one-third of the cells that we encountered in LIP met this criterion, including 33 neurons from the left hemisphere of Monkey G and 29 from the right hemisphere of Monkey F.
Figure 4A illustrates how we studied these 62 LIP neurons in the matching context. Critically, in this setting, trials that shared an identical visual stimulus configuration and ended in the same motor response still varied widely in the local fractional income of the chosen target. Thus, on some trials the monkey chose the target inside the cell's RF and this target had a high fractional income, whereas on other trials the fractional income was much lower. Our experimental question was whether, within each category of motor response, activity in LIP is influenced by the local fractional income of the chosen target.
Figure 4B shows representative data from the same cell featured in Fig. 3, now recorded during performance of the matching task. For each trial, the cell's mean delay-period response is plotted against the local fractional income of the chosen target. Activity is shown separately for trials that end in saccades into (blue) and out of (green) the cell's RF. We observed a positive correlation between firing rate and fractional income for choices into the RF and a negative correlation for choices out of the RF. The solid lines are regressions fit to these two sets of data by the method of least squares and are charac...
Value-based decision making
free-choice study 2: Value signals in parietal cortex (area LIP): behavioral dynamics
• Local fractional income is the 'valuation variable' that modulates LIP firing rates.
• LIP neuron activity predicts the monkey's eye movement responses and contributes to planning shifts in gaze or visual attention.
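The local model behind this valuation variable (reward histories, leaky integrators, local income, local fractional income, probability of choice) can be sketched in a few lines. This is a simplified illustration, assuming an exponential leak with a single time constant and a choice probability equal to the fractional income; the function names, the value tau=5.0 and the example reward histories are hypothetical, not the fitted model from the paper.

```python
import math


def local_income(rewards, tau):
    """Leaky integration of a 0/1 reward history (oldest trial first).

    A reward earned k trials ago is weighted by exp(-k / tau), so recent
    rewards count more than old ones. tau is an assumed time constant.
    """
    n = len(rewards)
    return sum(r * math.exp(-(n - 1 - i) / tau) for i, r in enumerate(rewards))


def local_fractional_income(rewards_red, rewards_green, tau=5.0):
    """Local income from red divided by local income from both colors."""
    i_red = local_income(rewards_red, tau)
    i_green = local_income(rewards_green, tau)
    total = i_red + i_green
    if total == 0:
        return 0.5  # no recent income on either color: arbitrary convention
    return i_red / total


def probability_of_choosing_red(rewards_red, rewards_green, tau=5.0):
    """In this simplified sketch, choice probability equals fractional income."""
    return local_fractional_income(rewards_red, rewards_green, tau)
```

With histories of equal length, a reward earned on red one trial ago outweighs a reward earned on green several trials ago, so the estimated value of red rises and then decays as new trials arrive. That trial-by-trial variation is what allows firing rates to be compared against value while the visual stimulus and motor response are held fixed.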
Value-based decision making
free-choice study 3: Value signals in parietal cortex (area LIP): behavioral equilibria
Dorris and Glimcher, 2004
Human vs. Human in 3 blocks (with Nash prediction)
Figure 1. The Mixed Strategy Inspection Game. (A) General form of the payoff matrix. The variables in the bottom left of each cell determine the subject's payoffs, and the variables in the top right of each cell determine the opponent's payoffs for ...
Figure 2. Human versus Human Choice Behavior during Three Blocks of Inspection Game Trials. The thick black line denotes the 20 trial running average of the percentage of the risky option chosen by the subject. The horizontal black lines denote the subject's rate of choosing the risky option predicted at the Nash equilibrium. The thin gray line denotes the corresponding 20 trial running average of the percentage of the inspect option chosen by the opponent. The opponent's predicted rate of choosing the inspect option at Nash equilibrium was 50% for all blocks of trials. The opponent's costs of inspection were stepped sequentially from 0.5 to 0.9 to 0.3 across the three blocks ...
Legend: Risky (subject); Inspect (opponent)
Value-based decision making
free-choice study 3: Value signals in parietal cortex (area LIP): behavioral equilibria
• A player's overall choice distribution should equalize the average payoff resulting from the alternative actions (global Nash equilibrium).
• LIP encodes each alternative's average payoff (which is a constant), rather than the probability of choosing that alternative (which varies).
• LIP encodes an abstract representation of the stimuli, apart from a specific motor plan.
Figure 7 | Influence of global and local values on monkey choices and lateral intraparietal area activity. a | Average normalized neural response of 43 lateral intraparietal (LIP) neurons in the matching task as a function of both the global (abscissa) and local (ordinate) values of the response field target. b | Monkey's probability of choosing the response field target in the matching task as a function of both the global (abscissa) and ...
Panels c–d, axes: Probability of choice (%) and Neural response (spikes s–1) vs. trial number
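The equalization idea behind a mixed-strategy Nash equilibrium can be illustrated for a 2×2 game: one player mixes with the probability that leaves the opponent indifferent between its two actions. The sketch below is a generic textbook calculation, not the analysis from Dorris and Glimcher; the function name is hypothetical, and the matching-pennies payoffs in the usage note follow the usual convention (the computer wins when the choices differ), which is an assumption because the cell values are not legible in the figure.

```python
def indifference_probability(opponent_payoffs):
    """Probability of playing row 0 that equalizes the opponent's payoffs.

    opponent_payoffs[i][j] is the opponent's payoff when we play row i and
    the opponent plays column j. We solve for p in
        p*a + (1 - p)*c == p*b + (1 - p)*d
    where a, b are the row-0 payoffs and c, d the row-1 payoffs.
    """
    (a, b), (c, d) = opponent_payoffs
    denom = (a - c) - (b - d)
    if denom == 0:
        raise ValueError("no unique mixed-strategy solution")
    p = (d - c) / denom
    if not 0.0 <= p <= 1.0:
        raise ValueError("no interior mixed-strategy equilibrium")
    return p
```

For matching pennies with opponent payoffs `[[0, 1], [1, 0]]`, the function returns 0.5: choosing each target half the time makes both of the opponent's actions equally rewarding. Changing a payoff entry (for instance, adding a cost to one of the opponent's actions, as in the inspection game) shifts this probability.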
Conclusion
• New efforts to understand value-based decision making might bring together two areas of neuroscience that have traditionally existed in separate spheres: the study of perception and cognition, and the study of reward and motivation.
• The study of value-based choice might be uniquely positioned to lay the foundations for this unified neurobiology of choice behavior.