Choosing the Greater of Two Goods: Neural Currencies for Valuation and Decision Making
Leo P. Sugrue, Greg S. Corrado and William T. Newsome
Presenter: He Crane Huang
04/20/2010

Outline
• Studies on neural correlates of simple perceptual decisions
• Interactions between decision making and reward
• Studies of value-based decisions on ‘free-choice’ tasks

Neural correlates of perceptual decisions
Sensory input → Motor output
Sensation → Action

… identified sensory representations as well as decision-related signals in areas of the parietal and frontal cortices. At the neural level, differentiating sensory signals from decision-related signals is relatively straightforward. First, sensory signals require the presence of the sensory stimulus, and extinguish with stimulus offset. Second, and more importantly, in discrimination tasks in which behavioural decisions and neural activity are measured across a range of stimulus strengths, animals make both correct and incorrect judgements in response to the presentation of identical stimuli. On these trials, sensory neurons encode the visual stimulus itself, whereas the activity of decision-related neurons reflects …

Visual and oculomotor systems of the primate brain
Figure labels: SEF, Area 46, V4, V1, IT.
Figure 1 | Visual and oculomotor systems of the primate brain.
Lateral view of the cerebral hemisphere of the macaque. Labelled areas include LIP and FEF.

Conceptual framework for (2AFC) decision making
a Conceptual framework for simple perceptual decisions (2-alternative forced-choice discrimination task)
Sensory input → Sensory transformation → Sensory representation → Decision transformation → Probability of choice → Action implementation → Choice
b Conceptual framework for simple value-based decisions
Labels (actor side): Sensory input + physiological needs; ① Common reward currency; ② Value transformation; Value representation; ③ Decision transformation; Probability of choice; Action implementation.

(Perceptual) Decision-making tasks
Figure labels: Motion (Left? Right?); Value (Left? Right?).

… and recordings from monkey caudate neurons during simple associative conditioning tasks show activity that is consistent with the creation of such stimulus–response bonds (refs 44–46). However, the direct yoking of stimuli to actions and outcomes implied by the current generation of these models fails to capture the facility with which higher organisms construct complex representations of value and flexibly link them to action selection (BOX 2).

LIP activity during the RT-direction discrimination task
• Response
Roitman and Shadlen, J. Neurosci., November 1, 2002, 22(21):9475–9489
Responding to these limitations, more recent theoretical proposals have expanded the role of the dopamine signal to include the shaping of more abstract models of valuation (refs 47–49). Consistent with this approach, FIG. 2b portrays the dopamine system as a critic whose influence extends beyond the generation of simple associative predictions to the construction and modification of complex value transformations. In this scheme, the striatum is considered to have the crucial role of liaison between actor and critic. If correct, this proposal indicates that dopamine neurons have access to the value representation depicted in FIG. 2b. Consistent with this idea, Nakahara and colleagues (ref. 50) recently showed that dopamine responses were strongly modulated by contextual information that pertained to the evolution of reward probability in successive trials in a task, even when this information was not accompanied by any explicit sensory …

Figure 3 | Decision-making tasks. a | General structure of a perceptual discrimination task, in which a monkey reports its judgement of the direction of motion in a random dot stimulus with …
Panels: a Perceptual discrimination task; b Linking behaviour to perception (% left choices vs. stimulus strength (left)); c Free-choice task; d Linking behaviour to valuation (% left choices vs. value of left option).

Figure 7. Time course of LIP activity in the RT-direction-discrimination task. A, Average response from 54 LIP neurons … motion strength and choice as indicated by color and line type. The responses are aligned to two events in the trial. On … to the onset of stimulus motion. Response averages in this portion of the graph are drawn to the median RT for each … within 100 msec … eye movement initiation. On the right, responses are aligned to initiation of the eye movement …
(Roitman and Shadlen, 2002)
(Task difficulty decreases / coherence increases)

LIP activity in RT-motion discrimination task
• Implements the decision transformation.
• LIP reflects a general decision variable that is monotonically related to the log likelihood ratio that the animal will select one of the two alternatives. (Thursday’s paper)
• Converts a sensory representation of visual motion into a decision variable.
• Predicts not only the decision, but also when the decision has been reached.
• [Box 1]: Encodes a mixture of sensory, motor and cognitive signals that might guide decisions about upcoming behavioral responses.
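
To make the log likelihood ratio idea on this slide concrete, here is a minimal Python sketch. It rests on assumptions that are not in the slides: each momentary sample of motion evidence is taken to be Gaussian with mean +mu for leftward and -mu for rightward motion and a shared standard deviation sigma, the stopping bound is a hypothetical rule, and all function names are made up for illustration. Under those assumptions the log likelihood ratio adds up across samples, and with equal priors it maps monotonically onto the probability of a ‘left’ choice.

```python
import math

def sample_llr(x, mu=1.0, sigma=1.0):
    """Log likelihood ratio log[P(x | left) / P(x | right)] for one sample.

    Assumes x ~ Normal(+mu, sigma) under 'left' and Normal(-mu, sigma) under
    'right'; for this Gaussian pair the ratio reduces to 2 * mu * x / sigma**2.
    """
    return 2.0 * mu * x / sigma ** 2

def accumulated_llr(samples, mu=1.0, sigma=1.0):
    """Sum of per-sample LLRs (independent samples combine additively)."""
    return sum(sample_llr(x, mu, sigma) for x in samples)

def p_left(llr):
    """Probability that 'left' is correct given the LLR, assuming equal priors."""
    return 1.0 / (1.0 + math.exp(-llr))

def decide(samples, bound=3.0, mu=1.0, sigma=1.0):
    """Accumulate evidence until |LLR| reaches the bound (hypothetical rule).

    Returns (choice, samples_used); choice is None if the bound is never hit.
    """
    total = 0.0
    for n, x in enumerate(samples, start=1):
        total += sample_llr(x, mu, sigma)
        if total >= bound:
            return "left", n
        if total <= -bound:
            return "right", n
    return None, len(samples)
```

In this sketch, stronger evidence (larger |x| per sample) reaches the bound in fewer samples, which is one simple way a single variable could carry both the choice and the time at which it is made.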
Decision making and reward
Adaptive decisions take ‘reward’ into account.

Figure 2 | Conceptual frameworks for decision making. A conceptual framework that illustrates proposed processing stages for the formation of simple perceptual and value-based …
a Conceptual framework for simple perceptual decisions: Sensory input, Sensory transformation, Sensory representation, Decision transformation, Probability of choice, Action implementation, Choice.
b Conceptual framework for simple value-based decisions. Labels: Actor, Critic, Striatum, Sensory input + physiological needs, Common reward currency, Value transformation, Value representation, Decision transformation, Outcomes, Error signal, Dopamine (VTA), Predictions, Probability of choice, Action implementation, Choice.

Decision making and reward
A common neural currency for reward
• Reward: anything that an animal will work to acquire; it has motivational and affective dimensions.
• Brain stimulation reward (BSR): there is a dedicated neural network devoted to reward processing.
• Shizgal and colleagues: BSR contributes as a reward signal that is responsible for valuation.

Decision making and reward
Incentives and errors
• Dopamine system: a central role in processing the motivational aspect of reward.
• Dopamine neurons do not signal the occurrence of rewards, but can be considered to code reward prediction error.

… expectation of the likely abundance of fish. Psychologists and economists have long appreciated the influence of reward and valuation on decision making in higher mammals (ref. 24), but these factors were notably absent from our preceding discussion. Although reward is an implicit variable in every operant task, most physiological studies of perceptual decision making hold reward contingencies constant to isolate activity that is specifically related to sensorimotor transformations (FIG. 2a). Only recently have investigators begun to manipulate reward independently in order to explore the neural basis of valuation and adaptive behaviour.
A conceptual framework within which to consider value-based choice is proposed in FIG. 2b. Neither conclusive nor complete, it is intended as a starting point from which to discuss the basic steps in building an internal representation of value and using that representation to guide behaviour. Focus first on the left-hand side of this diagram (labelled ‘actor’). Like the proposed framework for perceptual decisions (FIG. 2a), this framework for value-based decision making comprises three key processing stages. At the first stage, a value transformation takes the input …

Value-based decision making
The cortex as the stage for valuation
• Anatomically, several regions within the prefrontal and parietal association cortices are positioned to link reward to behavioral responses (motor planning).
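
The ‘reward prediction error’ attributed to the dopamine system on the ‘Incentives and errors’ slide can be illustrated with a short sketch. The update used here is a simple delta rule, chosen only for illustration; the slides do not specify a learning rule, and the function names and the learning rate alpha are assumptions.

```python
def prediction_error(reward, predicted_value):
    """Reward prediction error: obtained reward minus predicted reward."""
    return reward - predicted_value

def update_value(predicted_value, reward, alpha=0.1):
    """Move the prediction a fraction alpha of the way toward the reward."""
    return predicted_value + alpha * prediction_error(reward, predicted_value)

def learn(rewards, initial_value=0.0, alpha=0.1):
    """Apply the delta rule over a reward sequence.

    Returns (final_value, errors), where errors[k] is the prediction error
    computed on trial k before the prediction is updated.
    """
    value = initial_value
    errors = []
    for r in rewards:
        errors.append(prediction_error(r, value))
        value = update_value(value, r, alpha)
    return value, errors
```

With a reward of 1 delivered on every trial, the error starts at 1 and shrinks as the prediction approaches 1; omitting a fully predicted reward gives an error of -1. This is the sense in which such a signal tracks the mismatch between outcome and expectation and not the reward itself.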
Value-based decision making
‘Free choice’ task design
Figure 3 panels: a Perceptual discrimination task (Motion: Left? Right?); b Linking behaviour to perception; c Free-choice task (Value: Left? Right?); d Linking behaviour to valuation.
Limitations of the imperative tasks, which the free-choice design improves on:
- The value transformation is all or none.
- The ‘decision’ is a simple one-to-one mapping.

Value-based decision making
Demonstrating behavioral control
• Two different approaches:
  • Nash equilibrium, from the theory of competitive games
  • The matching law, from a general principle of animal foraging behavior
• Both provide a means of assessing behavioral control.

Value-based decision making
Understanding local strategy
• Nash equations and the matching law describe average behavior at equilibrium. Which local strategies produce these average behavioral phenomena?
• Value-based behavior control: behavior is under ‘stimulus control’.
• Quantitative modeling of local choice strategy: the ‘variables’ link reward history to behavior.
• Neurophysiological exploration of model variables: how to understand the model at the neural level.

Value-based decision making
Three free-choice studies
• Value signals in frontal cortex
  • ‘The matching pennies game’, Barraclough and colleagues
• Value signals in parietal cortex
  • Behavioral dynamics: matching behavior and value representation, Leo P. Sugrue et al.
  • Behavioral equilibria: ‘the inspection game’, Dorris and Glimcher

Value-based decision making
Free-choice study 1: Value signals in frontal cortex
Barraclough and colleagues: matching pennies game
Figure labels: Frequency histograms of P(right); Foredelay period.

Figure 4 | Payoff matrices for competitive games. For a trial in the matching pennies game (a) or the inspection game (b), a payoff matrix defines the outcome for each player on the basis of the combined actions of both players. Green and blue represent the payoff experienced by the monkey and computer, respectively, for each possible combination of choices. In the inspection game, ‘i’ defines a cost to the computer for choosing (‘inspecting’) the red (‘risky’) target. By manipulating this cost across blocks, the mixed strategy predicted by the Nash equilibrium can be changed. Panel a adapted, with permission, from REF. 76 © (2004) Macmillan Publishers Ltd. Panel b adapted, with permission, from REF. 78 © (2004) Elsevier Science.
Panels: a Matching pennies (monkey chooses Left or Right; computer chooses Left or Right); b The inspection game (monkey chooses Red or Green; computer chooses Red or Green; payoff entries include the terms 1–i and 2–i).

… high, whether or not that target is ultimately chosen. This effect of fractional income is independent of any trial-to-trial variation in the fine details of the monkey’s eye movements. Importantly, these signals are apparent to us only because our behavioural model affords access to the animal’s underlying value transformation, which is local in time. Our previous attempts to detect value signals in LIP on the basis of global behavioural changes between blocks were marginally successful at best. Like Barraclough and colleagues, we owe our progress to an approach that combined valuation-based behavioural control, modelling of the proximal algorithm that generates individual choices and the neurophysiological study of the variables revealed by this model.

The perspective of behavioural equilibria. Dorris and Glimcher (ref. 78) base their experiments on a well-characterized competitive interaction known as ‘the inspection game’. The general structure of this task is similar to that used by Barraclough and colleagues: on every trial, the monkey and the computer each select one of two eye movement targets, and the outcome depends on their combined choices. However, in this task the payoff matrix that defines the relationship between choices and outcomes is more complex (FIG. 4b). The monkey …

Value-based decision making
Free-choice study 1: Value signals in frontal cortex
• Reinforcement learning algorithms provide a general framework for finding optimal strategies in a dynamic environment.
• Prefrontal cortex might have a key role in optimizing decision-making strategies.

… probability of choosing red will vary linearly with the local fractional income … from the unity line in Fig. 2E). Figure 2E shows this to be approximately true for the behavioral data. Second, because the model is strictly probabilistic, it predicts that the number of successive trials on which a player (monkey or model) will choose a given color before switching will be distributed as the average of a family of exponentials.
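
The local matching model discussed in this part of the talk (leaky integration of reward history into local income, then local fractional income, then probability of choice) can be sketched roughly as follows. This is a simplified illustration and not the authors’ fitted model: the exponential form of the leaky integrator, the time constant tau, the indifference default when no income has been earned, and all function names are assumptions. The choice probability is set equal to local fractional income, a local version of Herrnstein’s matching law.

```python
import math

def leaky_income(reward_history, tau=5.0):
    """Leaky integration of a binary reward history (oldest trial first).

    A reward earned `age` trials ago is weighted by exp(-age / tau), with
    age 0 for the most recent trial, so recent rewards count the most.
    """
    n = len(reward_history)
    return sum(r * math.exp(-(n - 1 - k) / tau)
               for k, r in enumerate(reward_history))

def local_fractional_income(rewards_red, rewards_green, tau=5.0):
    """Local fractional income of red: I_red / (I_red + I_green)."""
    i_red = leaky_income(rewards_red, tau)
    i_green = leaky_income(rewards_green, tau)
    total = i_red + i_green
    if total == 0.0:
        return 0.5  # no recent income from either color: assume indifference
    return i_red / total

def p_choose_red(rewards_red, rewards_green, tau=5.0):
    """Probability of choosing red, taken equal to its local fractional income."""
    return local_fractional_income(rewards_red, rewards_green, tau)
```

Here `rewards_red[k]` is 1 if a red choice was rewarded on trial k and 0 otherwise (likewise for green). Because the weights decay with age, a reward earned on red one trial ago pulls the choice probability toward red more strongly than a reward earned on green several trials ago.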
Figure 2F plots these distributions of stay durations; not only is the …

The model provides us a window into the animal’s internal valuation of available options and gives us a metric, local fractional income, that allows us to estimate how the monkey values each of the two colors on every trial, even before it renders a decision. Equipped with this quantitative trial-by-trial measure, we are poised to explore how value is represented in the brain.
The representation of fractional income in the parietal cortex. The lateral intraparietal (LIP) area of the posterior pari…

Matching behavior in monkeys
Leo P. Sugrue et al., 2004
Fig. 1. Matching behavior in monkeys. The sequence of events of an oculomotor matching task: (i) … To begin a run of trials, the animal … fixate the central … (ii) Delay. Saccade targets appear (randomized spatially by color) in opposite hemifields while the animal maintains fixation. (iii) Go. Dimming of the fixation cross … a saccadic response and hold. (iv) … Brightening of the fixation cross cues …, target colors are rerandomized, and the delay period of the next trial begins. Reward is delivered at the time of the re…

Value-based decision making
Free-choice study 2: Value signals in parietal cortex (area LIP) - behavioral dynamics
* Rewards are assigned to the two colors at rates that are independent and stochastic.
* Once assigned, a reward remains available until the associated color is chosen.
* “This persistence of assigned rewards means that the likelihood of being rewarded increases with the time since a color was last chosen....” and ensures that matching approximates the optimal probabilistic strategy in this task.

Value-based decision making
Free-choice study 2: Value signals in parietal cortex (area LIP) - behavioral dynamics
Dynamic matching behavior
A local model of matching behavior

Figure 6 | A local model of matching behaviour. a | A linear–nonlinear probabilistic model uses leaky integration over recent reward experience to estimate the local income due to each response option (Ired, Igreen). In a local formulation of Herrnstein’s matching law, these estimates are used to compute the local fractional income of each option (that is, FI), which directly …
Panel labels: Reward histories (binary sequences) → Leaky integrators (τ) → Local income (Ired, Igreen) → Local fractional income (FIred) → Probability of choice (PCred).
Plots: probability of choice (red) vs. local fractional income (red), model and monkeys; cumulative red choices vs. cumulative green choices, model and monkeys; neural response (spikes s–1) vs. local fractional income (RF target), for choices into the RF and out of the RF.
Figure 2 (Sugrue et al., 2004; caption cropped): model of dynamic matching behavior.

… contains activity appropriate for saccadic eye movements, signals that … variously interpreted as working … visual targets, attention to salient locations, or motor planning (20–23). … context of more sophisticated eye … tasks, investigators have documented modulation of LIP activity by the sensory evidence that supports a … judgment (24–26) and by both the probability that a particular movement … and the volume of juice associated … movement (27). Such encoding … from diverse sources is a property of brain areas responsible for … putative decision variables that … information to motor responses (28). If this suggestion is correct, and LIP is indeed an important locus for oculomotor decisions, then in a setting where eye movement decisions are informed by reward history and expectation, we anticipate the appropriate decision variable to be represented in LIP. Accordingly, the following physiological experiments test the prediction that in the matching task, neurons in LIP encode the local fractional income (Fig. 2B) of competing target colors.
We selected for study LIP neurons that showed sustained, spatially selective activity in the context of a classical delayed saccade task (Fig. 3A). These neurons respond only when targets are presented within a circumscribed region of the visual field termed the cell’s response field (RF). Approximately one-third of the cells that we encountered in LIP met this criterion, including 33 neurons from the left hemisphere of Monkey G and 29 from the right hemisphere of Monkey F.
Figure 4A illustrates how we studied these 62 LIP neurons in the matching context. Critically, in this setting, trials that shared an identical visual stimulus configuration and ended in the same motor response still varied widely in the local fractional income of the chosen target. Thus, on some trials the monkey chose the target inside the cell’s RF and this target had a high fractional income, whereas on other trials the fractional income was much lower. Our experimental question was whether, within each category of motor response, activity in LIP is influenced by the local fractional income of the chosen target.
Figure 4B shows representative data from the same cell featured in Fig. 3, now recorded during performance of the matching task. For each trial, the cell’s mean delay-period response is plotted against the local fractional income of the chosen target. Activity is shown separately for trials that end in saccades into (blue) and out of (green) the cell’s RF. We observed a positive correlation between firing rate and fractional income for choices into the RF and a negative correlation for choices out of the RF. The solid lines are regressions fit to these two sets of data by the method of least squares and are charac…

Value-based decision making
Free-choice study 2: Value signals in parietal cortex (area LIP) - behavioral dynamics
• Local fractional income is the ‘valuation variable’ that modulates LIP firing rates.
• LIP neuron activity predicts the monkey’s eye movement responses and contributes to planning shifts in gaze or visual attention.

Value-based decision making
Free-choice study 3: Value signals in parietal cortex (area LIP) - behavioral equilibria
Dorris and Glimcher, 2004
Human vs. Human in 3 blocks; Nash prediction
Figure legend: Risky (subject); Inspect (opponent).

Figure 1. The Mixed Strategy Inspection Game. (A) General form of the payoff matrix. The variables in the bottom left of each cell determine the subject’s payoffs, and the variables in the top right of each cell determine the opponent’s payoffs for …

Figure 2. Human versus Human Choice Behavior during Three Blocks of Inspection Game Trials. The thick black line denotes the 20 trial running average of the percentage of the risky option chosen by the subject.
The horizontal black lines denote the subject’s rate of choosing the risky option predicted at the Nash equilibrium. The thin gray line denotes the corresponding 20 trial running average of the percentage of the inspect option chosen by the opponent. The opponent’s predicted rate of choosing the inspect option at Nash equilibrium was 50% for all blocks of trials. The opponent’s costs of inspection were stepped sequentially from 0.5 to 0.9 to 0.3 across the three blocks …

Value-based decision making
Free-choice study 3: Value signals in parietal cortex (area LIP) - behavioral equilibria
• A player’s overall choice distribution should equalize the average payoff resulting from alternative actions (global Nash equilibrium).
• LIP encodes each alternative’s average payoff (which is a constant), rather than the probability of choosing that alternative (which varies).
• LIP encodes an abstract representation of the stimuli apart from the specific motor plan.

Figure 7 | Influence of global and local values on monkey choices and lateral intraparietal area activity. a | Average normalized neural response of 43 lateral intraparietal (LIP) neurons in the matching task as a function of both the global (abscissa) and local (ordinate) values of the response field target. b | Monkey’s probability of choosing the response field target in the matching task as a function of both the global (abscissa) and …
Remaining panel axes: neural response (spikes s–1) and probability of choice (%) vs. trial number; global value; local value.

Conclusion
• New efforts to understand value-based decision making might bring together two areas of neuroscience that have traditionally existed in separate spheres: the study of perception and cognition, and the study of reward and motivation.
• The study of value-based choice might be uniquely positioned to lay the foundations for this unified neurobiology of choice behavior.