Date of Draft: 7/1/03

Neuronal Studies of Decision Making in the Visual-Saccadic System

Paul W. Glimcher and Michael Dorris
Center for Neural Science, New York University, New York, NY 10003, USA

Paul W. Glimcher, Ph.D.
Center for Neural Science
4 Washington Place, 809
New York, NY 10013
Phone: 212-998-3904
FAX: 212-995-4011
[email protected]

Glimcher, Paul W. / page 2

Introduction

Making a behaviorally relevant decision of any kind involves selecting, and ultimately executing, a course of action. To accomplish this, organisms must combine available sensory data with stored information about the structure of the environment in a manner appropriate to the type of decision that they face. Over the last few decades neuroscientists have examined decision making by focusing on conditions in which sensory signals identify a single response as rewarded, or in which learned information about the probabilities and magnitudes of rewards associated with each possible action specifies the best single response. These studies have yielded insights into the sensory-motor pathways and computational processes that underlie these forms of decision making, and the basic outlines of the circuits responsible for simple decision making are now beginning to emerge. We have, however, only just begun to study the kinds of decisions that are made when environmental conditions do not uniquely identify a best single response from amongst a set of alternatives. It is at present unclear how the neural architecture produces decisions under such free choice conditions. Some of the most promising research aimed at this problem has begun to employ analytic techniques developed in the social sciences, and these studies have begun to define a rigorous approach that can be used to study even the most complicated forms of decision making.
While real theoretical and experimental challenges remain, these approaches are laying the biological foundations for studying one of the most elusive properties of mind, the neural basis of voluntary choice.

Simple Decision Making: Identifying and Executing the Best Response

Over the past decade studies in the primate visual-saccadic system, the brain network that uses visual data to guide the selection and execution of orienting eye movements, have made significant progress towards explaining the neurobiological basis of simple decision making (see Glimcher 2001; 2003a). Several sets of studies have, for example, succeeded in identifying the neuronal processes underlying the selection of a rewarded saccadic target from a number of unrewarded alternatives (Hikosaka et al. 2000; Newsome et al. 1995; Schall 2001). One line of this research has demonstrated that the extrastriate visual cortices play a critical role in stimulus analysis and that the outputs of these areas can be used to identify saccades that will yield rewards (Newsome et al. 1989; 1995). In the frontal cortices, another line of research has identified mechanisms that appear to initiate or withhold saccades in response to reward contingencies signaled by visual stimuli (Schall 2001). These experiments, together with others that have shown how the neural circuitry transforms sensory signals into coordinate frameworks appropriate for movement generation (Andersen et al. 2002; Colby et al. 1995; Colby and Goldberg 1999; Sparks and Mays 1990), have provided us with a preliminary understanding of how the nervous system selects courses of action based on sensory cues. In a similar way, neurobiological studies have also begun to describe the processes by which neuronal activity encodes variables that play an important role in guiding choice behavior but are not present in the immediate sensory environment (Hikosaka et al. 2000; Glimcher 2001; Gold and Shadlen 2001; 2002).
Several lines of evidence have identified neuronal circuits, lying between sensory and motor brain regions, that appear to encode the value of the behavioral responses available to an animal. Signals have been identified in parietal cortex and basal ganglia, for example, which encode either the amount of reward that a movement will produce or the likelihood that a movement will produce a reward (Kawagoe et al. 1998; Platt and Glimcher 1999; Handel and Glimcher 2000). There are, however, classes of behavior that these studies have failed to engage, behaviors in which a single most valuable response is not fully specified by the information available in the stimulus or environment. It is not yet clear how the neural architecture accomplishes movement selection under these free choice conditions. One problem faced by these inquiries is that traditional physiological conceptualizations of the sensory-to-motor process offer very few tools for describing such free choice behavior. This has recently led a group of physiologists to turn to social scientific theories of decision making, which provide a powerful corpus of mathematical techniques specifically designed for the study of these classes of behavior.

Economic Models of Decision Making

A central goal of the social sciences has been to define the decision making process in general. Economic models, in particular, have been quite successful in formally describing simple decision making for over a century (e.g., Kreps 1990). It is only recently, however, that these social scientists have developed tools for characterizing decision making under conditions in which subjects are free to make any of several responses that have incompletely specified values. Of particular interest to economists, in this regard, are situations in which humans interact with other decision makers whose behavior is unpredictable (e.g., Fudenberg and Tirole 1991).
Consider two opponents repeatedly playing the childhood game of rock-paper-scissors. In each round, both players must simultaneously choose either rock, paper or scissors; paper beats rock, scissors beats paper, and rock beats scissors. The responses of the players are not constrained because no response is uniquely correct. Without knowing in advance exactly how one's opponent will behave, a subject cannot produce a fixed single strategy that will always yield a maximal reward under a given set of conditions. The economic theory of games approaches the formal study of this type of behavior by assuming that all players desire strategies that will maximize their gains given the assumption that other players seek to do the same. Thus, when faced with the opportunity to make a decision, players are assumed to consider the sensory and environmental cues that might influence the values of the options available to them, and then to adopt a behavioral strategy that combines this information with a strategic consideration of their opponent's likely behavior. Economists refer to strategies of this type as rational. If two humans playing rock-paper-scissors behave rationally, they each settle on the strategy of choosing each possible action roughly one-third of the time.

Quantifying the Value of a Strategy

In games like rock-paper-scissors a stable behavioral strategy arises when the average subjective value of each available option (rock, paper, or scissors) is rendered equivalent by the behavior of one's opponent. As long as one's opponent is equally likely to choose rock or paper or scissors, then choosing any response has an equal probability of winning, and hence an equal subjective value. Economists employ two related but distinct measures to estimate the value of any course of action.
The first is an objective measure, known as expected value, which is determined by multiplying the gain that could be realized from an action by the probability that the gain would be realized. The second is a subjective measure, expected utility, computed by adjusting the expected value to reflect subjective considerations, typically an aversion to risky courses of action. In practice, economists presume that it is this second measure which guides choice. The rationale for the first of these measures derives from the work of Blaise Pascal (Arnauld and Nicole 1662/1994; Pascal, 1670/1966). If one chooses rock, there is a 50% chance of winning one dollar and a 50% chance of losing one dollar (assuming that if the other player also picks rock the game is repeated). Therefore, over many repeated plays the average value, or expected value, of rock is 0 cents. Behavioral studies (Bernoulli, 1738/1954; Stephens and Krebs, 1986; Kreps, 1990) have, however, demonstrated that in many situations humans and animals reliably select courses of action that do not yield the maximal expected value, particularly when the option yielding maximal expected value involves significant risk. Under these conditions subjective and objective measures of value can be shown to differ empirically. Consider choosing between two actions, one which offers a 100% chance of earning $250,000 and a second which offers a 50% chance of earning $500,000 and a 50% chance of earning nothing. Both actions have equal expected values ($250,000), but most humans do not view them as equally desirable, preferring the certain gain of $250,000. Most humans, however, do find a 50% chance of winning $8,000,000 preferable to a guaranteed $250,000. The subjective value, or utility, of $500,000 is thus less than twice the subjective value of $250,000 for most decision makers, whereas the subjective value of $8,000,000 is significantly more than twice the subjective value of $250,000.
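The preference pattern just described can be reproduced by any sufficiently concave utility function. The sketch below uses a square-root utility purely as an illustrative assumption (the text does not specify a functional form); it shows the two gambles matched in expected value but ranked differently by expected utility.

```python
import math

def expected_value(outcomes):
    """Expected value of a gamble given (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

def expected_utility(outcomes, u=math.sqrt):
    """Expected utility under a utility function u; sqrt is an
    illustrative concave choice that models risk aversion."""
    return sum(p * u(x) for p, x in outcomes)

certain = [(1.0, 250_000)]                    # guaranteed $250,000
gamble = [(0.5, 500_000), (0.5, 0)]           # coin flip for $500,000
big_gamble = [(0.5, 8_000_000), (0.5, 0)]     # coin flip for $8,000,000

# The certain option and the first gamble have equal expected values...
assert expected_value(certain) == expected_value(gamble) == 250_000

# ...but concave utility favors the certain gain over the gamble,
assert expected_utility(certain) > expected_utility(gamble)

# ...while the much larger prize overcomes the aversion to risk.
assert expected_utility(big_gamble) > expected_utility(certain)
```

Any concave utility function reproduces the first preference; the second additionally requires that u($8,000,000) exceed twice u($250,000), which the square root satisfies.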
The subjective estimate of average value, or expected utility, is presumed to reflect, amongst other things, a natural aversion to risk by human and animal decision makers. Thus a decision maker's utility function, which can vary with his internal state, provides a means for combining sensory data and a representation of environmental uncertainty in a manner that encapsulates subjective preference. Importantly, in tasks of the kind used most extensively by neuroscientists to study decision making, both the probabilities and values of all possible rewards are fully specified by the experimental paradigm. Under these conditions the probability and value of any reward can be viewed as fixed, if imperfectly known, quantities from which expected utility can be computed. During strategic interactions with an intelligent opponent, however, a new type of uncertainty enters the decision making process. The opponent may at any time alter the probability that he will produce a particular response, making expected utility more fundamentally uncertain and much more difficult to calculate on a trial-by-trial basis. While acknowledging this difficulty, the mathematician John Nash developed a powerful approach to the problem of computing expected utility during strategic interactions. Nash (1950) proved that whenever all the players engaged in a strategic interaction behave rationally, average behavior must converge to an equilibrium state at which the relative expected utilities of available courses of action can often be specified. Nash's approach abandoned any attempt to describe the trial-by-trial dynamics of strategic decision making and worked instead to at least describe the average, or molar, behavior of rational players.
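Nash's equilibrium logic can be illustrated with the rock-paper-scissors game introduced earlier: when one's opponent mixes uniformly, every pure action earns the same expected payoff, so no deviation can help. A minimal sketch (the +1/0/-1 win/tie/loss payoffs are the conventional scoring, not values from the text):

```python
# Which action each action defeats, from the rules of the game.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """Payoff to me: +1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoffs(opponent_mix):
    """Expected payoff of each pure action against a mixed strategy,
    given as a dict mapping actions to probabilities."""
    return {
        mine: sum(p * payoff(mine, theirs) for theirs, p in opponent_mix.items())
        for mine in BEATS
    }

# Against an opponent playing the uniform (Nash) mixture, every action
# has the same expected payoff of zero, so no deviation is profitable.
uniform = {a: 1 / 3 for a in BEATS}
print(expected_payoffs(uniform))
```

Because all three actions are rendered equally valuable by the opponent's behavior, the uniform mixture is exactly the stable one-third strategy described above.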
While not all strategic behavior is perfectly predicted by the mathematical formalisms that Nash and later theorists developed, under many conditions these theories do define rational decision making when that process involves an assessment of the unpredictable actions of one's opponents. Both empirical and theoretical studies have built on this foundation to show that game theory can be used both to describe the variables that must guide strategic behavior and to rigorously analyze the properties of empirically observed human voluntary actions. These observations suggest that approaches to the study of free choice behavior rooted in economic theory may ultimately provide the theoretical leverage necessary for a rigorous neurobiological study of unconstrained decision making.

Behavioral and Physiological Studies of Unconstrained Choice

Together, these observations led us to ask whether game theory could be used to develop an animal model for examining how the economic variables that should guide free choice toward behavioral equilibrium in strategic interactions might be represented in the primate nervous system. The larger goal of this approach was to examine the neurobiological substrate for decision making under conditions that begin to approximate human voluntary choice behavior. Our goal was to develop a behavioral task that i) engaged humans in what could be considered voluntary decision making, ii) could be well described by game theory, and iii) could also be employed in a neurophysiological setting with nonhuman primates. To this end, we had both human and monkey subjects play the inspection game. In this game, two players must each select one of two possible actions and the payoffs they receive on each trial depend on both their own choice and that of their opponent (Fig. 1A).
The experimental subject played the role of the employee and decided either to work, which resulted in a guaranteed payoff of one unit of reward, or to shirk, which resulted in either a reward twice that size or in no reward at all, depending on the action of the employer. The role of the employer was played by either another human or a dynamic computer algorithm that tracked the employee's behavior and tried to maximize its own virtual reward. The employer decided whether to inspect or not to inspect on each trial, and the utility of this action depended on the behavior of the employee. Like rock-paper-scissors, when this game is played repeatedly, rational players should converge on an equilibrium solution in which each response is produced a certain proportion of the time. However, unlike rock-paper-scissors as described above, the proportion of choices devoted to each response at equilibrium need not always be fixed at a single value but can be manipulated experimentally. Somewhat counter-intuitively, the proportion of choices that the employee should devote to each response at equilibrium is controlled, not by changing the employee's payoffs, but by changing those of the employer (Fudenberg and Tirole, 1991; Glimcher, 2003b). This reflects the fact that altering the employer payoffs changes the utility of the options available to the employer and thus changes employer behavior, a change for which the employee ultimately compensates. The employee uses his own behavior as a lever, driving the employer back towards the equilibrium state. By holding the payoff structure for the employee constant, we can therefore ensure that the employer's rational strategy will always be to inspect 50% of the time (Figure 1A) while systematically varying the rational strategy for the employee.
In the inspection game the employee faces a task in which the payoffs associated with each action remain constant while the proportion of responses that should be devoted to each action varies whenever we manipulate the cost of inspection to the employer. In games like this, trial-by-trial uncertainty derives, for both players, from incomplete knowledge of the future actions of one's opponent. The economic analysis presumes that rational decision makers will choose the option with the highest expected utility, but on a trial-by-trial basis there seems no obvious way for the choosers to compute this parameter. The equilibrium approach addresses this problem more globally by presuming that if both subjects act rationally, a stable average rate of working and shirking will be reached when the expected utilities for working and shirking are driven towards equality over many trials by the dynamic behavior of one's opponent [1].

Footnote 1: At Nash equilibrium for the employee,

    EU(Shirk) = EU(Work)    (1)

which, given the payoff matrix (Fig. 1A, left panel), expands to

    p(Inspect)*0 + (1 - p(Inspect))*W = p(Inspect)*(W - C) + (1 - p(Inspect))*(W - C)    (2)

Solving for p(Inspect):

    p(Inspect) = C/W    (3)

where EU(Shirk) is the expected utility for choosing to shirk, EU(Work) is the expected utility for choosing to work, p(Inspect) is the probability of the employer inspecting (and 1 - p(Inspect) the probability of the employer not inspecting) at equilibrium, W is the wage paid by the employer to the employee, and C is the cost of work to the employee. Similarly, at Nash equilibrium the expected utility for inspecting is equal to the expected utility for not inspecting for the employer. Solving for p(Shirk):

    p(Shirk) = I/W    (4)

where p(Shirk) is the probability of the employee shirking at equilibrium and I is the cost of inspection. Unfortunately, using these equations to predict the behavior of rational players with precision requires knowledge of the subjective functions that relate value to utility: the equilibrium points occur when expected utilities are precisely equivalent, even though it is objective value that is most easily measured by an experimenter. For the computations presented here we assume a linear utility function. Although this may introduce small metrical errors, it should not affect the ordinal relationships we compute, which form the core of this presentation.

Studies of Human and Monkey Behavior

Across blocks of trials we varied the employer's cost of inspection from 0.1 to 0.9 in steps of 0.2; according to the Nash formulation, this should have had the effect of varying the probability that the employee would shirk from roughly 10% to 90% in 20% steps. Humans competed in the inspection game for real monetary rewards, which were delivered at the end of the experiment, and in a typical session a subject would compete 300 times over about 30 minutes. Figure 1B shows a 20-trial running average of the typical behavior of a human employee playing a computer employer during two sequentially presented blocks of trials. The Nash equilibrium predicts a 70% shirk rate in the first block of trials (payoff matrix in Figure 1A, middle panel) and a 30% shirk rate in the second block of trials (Figure 1A, right panel). Although both players freely chose either of two actions on every trial, we found that the overall behavior of our human subjects was well predicted by these Nash equilibria (gray lines) [2]. During the last half of each block, once subjects had reached a stable strategy, we determined the average shirk rate produced in response to changes in employer inspection costs and plotted this against the shirk rate predicted at equilibrium (Figure 1C).
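The equilibrium conditions in equations (1) through (4) can be checked numerically. In the sketch below, the values of W, C, and I are illustrative stand-ins chosen to match the qualitative payoff structure described in the text (working pays one unit, undetected shirking pays two); they are not the experimental payoffs.

```python
# Inspection game equilibrium, assuming linear utility as in the text.
W, C = 2.0, 1.0   # wage and cost of work: work pays W - C = 1,
                  # undetected shirking pays W = 2 (illustrative values)

p_inspect = C / W   # equation (3): employer inspects 50% of the time

# Employee's expected utilities at that inspection rate:
eu_shirk = p_inspect * 0 + (1 - p_inspect) * W              # eq. (2), left side
eu_work = p_inspect * (W - C) + (1 - p_inspect) * (W - C)   # eq. (2), right side
assert eu_shirk == eu_work   # equation (1): indifference at equilibrium

# Varying the employer's inspection cost I sweeps the employee's
# equilibrium shirk rate via p(Shirk) = I / W, equation (4):
for I in (0.2, 0.6, 1.0, 1.4, 1.8):
    print(f"inspection cost {I}: equilibrium shirk rate {I / W:.0%}")
```

Note that the employee's own payoffs never change in the loop; only the employer's inspection cost moves, yet it is the employee's equilibrium shirk rate that sweeps from 10% to 90%, mirroring the counter-intuitive control described above.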
We found that the responses of humans generally tracked the theoretical shirk rate but tended to overshirk at the lowest predicted rates, a phenomenon that may reflect a sampling strategy intended to maximize the accuracy with which employees estimate the rate at which their employer inspects. We then trained monkeys to play a version of the inspection game against our computer employer and assessed whether their behavior was comparable to that of humans. In these experiments, thirsty monkeys competed for a water reward and indicated their choices on each trial with a saccadic eye movement directed to one of two eccentric visual targets. On all trials, a red shirk target appeared in the center of the neuronal response field (Gnadt and Andersen 1988; Platt and Glimcher 1998) and a green work target appeared opposite the neuronal response field. Despite the difference in species and response modality, monkeys tracked the Nash equilibrium solutions (Figure 1D and 1E) and deviated from those solutions when shirking rates of 30% or less were efficient strategies, just like humans playing the inspection game.

Studies of Neuronal Activity

We studied lateral intraparietal (area LIP) neurons with a mixture of inspection game trials and instructed trials. In instructed trials, after an initial delay the color of the fixation stimulus changed from yellow to either red or green with equal probability. The monkey was rewarded for making a saccade to the eccentric target (work target or shirk target) that matched the color of the fixation stimulus. By examining the same neurons with blocks of both instructed trials and inspection game trials, we were able to examine LIP neurons both inside and outside the context of a strategic game.

Footnote 2: Given the assumptions about the relationship of value and utility stated in footnote 1.

Figure 2A examines the relationship between expected utility, behavior, and firing rate of a single LIP neuron.
A great deal of work has suggested that the responses of these neurons reflect the intention to make an eye movement (Andersen et al. 2002) or the saliency of stimuli (Colby and Goldberg 1999; Kusunoki et al. 2000; Gottlieb 2002). Here we tested whether these neurons are in fact sensitive to the expected utility of movements or movement targets. For the remainder of this analysis, we restrict our discussion to trials that ended with a movement towards the target in the response field, trials on which all sensory stimuli and movements were essentially identical. This control ensures that any changes in neuronal activity were unlikely to result from differences in aspects of sensory or motor processing but instead reflected differences in the decision making process itself. The lower axis of Figure 2A plots the trial numbers during which 6 sequential blocks of trials were presented. In the first block, only instructed trials were presented, in which a visual cue specified what movement would be reinforced. For this block a movement to the shirk target was reinforced with twice as much water as a movement to the work target (0.5 ml vs. 0.25 ml). The second block also presented instructed trials, but this time the rewards were reversed such that a movement to the shirk target yielded half as much water as a movement to the other target. Blocks 3-6 presented game theoretic inspection trials in which the monkey was free to select any response and in which dynamic interactions of the two players should have maintained an expected utility for the two movements near equivalence. (During these trials working yielded 0.25 ml of water while shirking yielded either 0.5 or 0 ml of fluid.) The solid gray lines plot the trial-to-trial probability of the shirk target being the rewarded target during the first two instructed blocks followed by the Nash equilibrium response strategies during the four free choice inspection trial blocks.
At a purely behavioral level, the animal seemed to closely approximate the rational response strategies predicted by theory. Initially the probability of looking at the shirk target was fixed at 50% during the instructed blocks, and then shifted dynamically to each of the Nash equilibrium strategies in the subsequent 4 inspection trial blocks. The dots plot the running average of neuronal firing rate during the visual epoch, a period shortly after target onset, on each of these shirk trials. Note that when the expected utility of the shirk target is high in the first block, firing rate is high. When the expected utility is low in the second block, firing rate is low. Finally, when the expected utility is assumed to be at equivalence, according to the Nash formulation, the firing rate is at a fairly constant and intermediate level. This is the specific result that would be expected if LIP neurons encode the expected utility of movements into their response fields. The above result suggests that the activity of this LIP neuron is modulated by the expected utilities of the available courses of action. To assess whether this was consistently true, we performed a similar analysis across our full sample of neurons. Once again we only analyzed those trials in which the monkeys were either instructed (Figure 2B - instructed task) or freely chose (Figure 2C - inspection game) to look at the shirk target, which was placed inside the response field. Twenty neurons were tested in two blocks of the instructed task with a high and low level of expected utility associated with the shirk response, as in the first 2 blocks of Figure 2A. Average neuronal activity was high when the expected utility associated with the shirk target was larger than the expected utility of the work target (Figure 2B - black line).
This average neuronal activity was low when the expected utility associated with the shirk target was smaller than the expected utility of the work target (Figure 2B - gray line). Forty-one neurons were tested in 5 blocks of trials of the inspection game (Figure 2C) in which the strategy at Nash equilibrium ranged from responding 10% (lightest gray line) to 90% (darkest line) of the time in the neuronal response field. Of these 41 neurons, 13 were also tested in the instructed task described above. As discussed previously, at equilibrium the expected utilities are roughly equal between the two targets regardless of the actual proportion of responses devoted to the target in the neuronal response field. Correspondingly, we found that the average neuronal activity remained unchanged, as indicated by the superimposed post-stimulus time histograms that plot the average population firing rate across different Nash equilibrium blocks.

Dissociating decision variables

In a subset of 20 neurons we also examined the effects of reversing the locations of the work and shirk targets during 50% Nash equilibrium blocks of the inspection game, each of which was about 100 trials long (Figure 3A). This changed both the probability and magnitude of reward associated with the target in the neuronal response field while the relative expected utility remained constant. Firing rates should differ across blocks if they reflect either probability of reward or magnitude of reward alone, but they should remain constant if they reflect expected utility. In fact, the firing rates did not change, which bolsters the hypothesis that LIP firing rates encode the expected utility of choices.

Encoding relative versus absolute expected utility

The preceding results suggest that neurons in LIP encode expected utility, the product of probability and the subjective value of reward.
It is not clear from these observations, however, whether the firing rates of LIP neurons encode expected utility for the movement in the response field or the relative expected utilities of all available options. A number of authors have suggested that when humans and animals make decisions they consider the relative expected utility of each available action rather than considering the absolute expected utility of each action (Flaherty 1996; Herrnstein 1997). In order to test the hypothesis that LIP neurons encode the relative expected utility of movements rather than the absolute expected utility of movements, we examined 18 neurons while monkeys completed a block of about 100 trials in which the magnitude of both working and shirking rewards was doubled. If LIP activity is sensitive to absolute expected utility it should increase when the rewards are doubled. If, however, LIP activity is only sensitive to relative expected utility, then the firing rate should be the same for both blocks of trials. As Figure 3B shows, there is no change in the firing rate of LIP neurons when absolute reward magnitude is doubled. This suggests that LIP neurons encode the relative expected utility of movements.

Relative Expected Utility versus Relative Expected Value

Throughout this discussion we have assumed that the expected utilities of the monkeys' actions are reasonably approximated by the expected values of those actions. Although this may be reasonable, we did not test this assumption. It is critical to remember that this lack of information renders a direct quantitative comparison between the instructed task data and the inspection game data impossible. In the inspection game, according to the Nash equilibrium prediction, the expected utilities of the two available movements are roughly equal. However, in the instructed task we have no direct measure that would allow us to determine the expected utility of each response.
Instead we can only compute the expected value of each response from the actual juice volumes and probabilities we employed and then presume that the subjective values of these responses approximate that objective measure. While it is probably reasonable to assume that the utility of juice is close to the value of juice in the range of volumes and at the range of animal satiety selected for these experiments, the inability to directly compare these two experiments highlights an outstanding issue in most neurobiological studies of decision making. The underlying utility functions, on which choices in decision making experiments are based, are rarely measured. Instead, experimentalists report expected values, or closely related quantities. One exception is work by Gallistel and colleagues, who have used elegant techniques based on Herrnstein's (1997) Matching Law to directly measure the utility of electrical stimulation of the medial forebrain bundle in rats (reviewed in Gallistel, 1994). It seems clear that similar techniques could also be used to quantify expected utility during decision making in other species. Future studies of decision making will have to begin to include direct measurements of utility.

Summary

One of the problems that some neurobiological studies of decision making have faced is the absence of a theoretical framework for describing the computational process involved in generating free choice behavior. This has been evident in studies of voluntary behavior, where the relationship between events in the outside world and internally generated decisions often appears unpredictable. Social scientists working in economics and psychology have, however, developed a theoretical corpus for describing choice behavior both when it is predictable on a trial-by-trial basis and when it is predictable only on an average, or molar, level.
The data discussed above suggest that we can begin to use game theoretic approaches to examine the control of free choice behavior at the level of single neurons. Recently, a number of other closely related techniques have also been used to achieve this same goal in other electrophysiological studies. For example, Coe and colleagues (2002) used a dynamic, free choice task to show that the activity of neurons in the frontal and supplementary eye fields and area LIP predicted the choice a monkey would make well before the movement was executed. Researchers using functional magnetic resonance imaging are also beginning to adopt closely related approaches. In one experiment, Montague and Berns (2002) used a free choice task to divide human subjects into two groups, risky and conservative, depending on their willingness to accept negative payoffs. They were able to show that the nucleus accumbens was differentially active in the risky and conservative subjects. These examples present a small sample of the growing number of studies that are beginning to use economic-style models and techniques for studying voluntary choice behavior (Breiter et al. 2001; McCabe et al. 2001; Montague and Berns 2002; Montague et al. 2002; Sugrue et al. 2001). Together, these parallel lines of inquiry suggest a growing synthesis of social scientific and neuroscientific approaches that are beginning to define the outlines of the neural system for unconstrained decision making.

Implications for the Neural Basis of Unconstrained Choice

These experiments suggest that we can begin to use theoretical approaches from the social sciences to examine the macroscopic pattern of individual free choice behavior at a neurobiological level, but they tell us very little about the trial-by-trial process from which this aggregate behavior emerges.
The inability of equilibrium formulations like these to describe the dynamics of choice behavior is, however, hardly unique to neurophysiology. Almost since the inception of equilibrium models, psychologists and economists have been developing alternative frameworks that seek to complement the equilibrium approach with explanations of how free choice behaviors are generated at a trial-by-trial level. Unfortunately, even for the social scientists who have devoted significant resources to achieving this goal, it is not yet possible to accurately predict whether a subject will select rock, paper, or scissors on the next round of play (see for example Bush and Mosteller 1955; Luce 1959; Herrnstein 1997; Erev and Roth 1998; Fudenberg and Levine 1998; Dragoi and Staddon 1999; Camerer 2003; McKelvey and Palfrey 1998). Does behavior produced under these conditions defy trial-by-trial prediction because we still lack adequate models for describing these processes, or are some classes of behavior truly and irreducibly unpredictable, defying trial-by-trial prediction in principle? Traditionally that has been a difficult question to answer, but neurobiologists may now be able to engage this issue in a novel way. It may now be possible to ask whether behavior can be driven by irreducibly stochastic processes that operate at the neuronal or subneuronal level. We may now be able to determine whether the apparent unpredictability with which a subject chooses to play rock reflects the action of a fundamentally stochastic underlying process. Randomness at the Neuronal and Subneuronal Level One known source of stochasticity at the neuronal level is the mechanism by which synaptic inputs give rise to action potentials in cortical neurons. Abundant evidence indicates that when cortical neurons are repeatedly activated by precisely the same stimulus, the neurons do not deterministically generate action potentials in precisely the same pattern.
Instead, the pattern of stimulation delivered to cortical neurons appears to determine only the average firing rates of those neurons; the instant-by-instant dynamics of action potential generation are highly variable and appear to defy precise prediction (Tolhurst et al., 1981; Dean, 1981). The available data suggest that this moment-by-moment variation, the overall variance in cortical firing rate, is related to mean firing rate by a roughly fixed constant of proportionality that has a value near 1.07 over a very broad range of mean rates (Tolhurst et al., 1981; Dean, 1981; Zohary et al., 1994; Lee et al., 1998), and this seems to be true of essentially all cortical areas that have been examined, including parietal cortex (Lee et al., 1998). This has led to the suggestion that action potential production can be described as something like a stochastic Poisson process3, a truly probabilistic operation for which the stimulus specifies an average rate but which generates action potentials in a fundamentally stochastic manner. More recently, there have even been several efforts to identify the biophysical source of this Poisson-like stochasticity. Mainen and Sejnowski (1995), for example, sought to determine whether the process of action potential generation in the cell body was the source of this stochasticity. Their work led to the conclusion that action potential generation is quite deterministically tied to membrane voltage, and thus that this process was not a source of intrinsic action potential variability. Subsequent studies have begun to suggest that it may instead be the process of synaptic transmission which imposes a stochastic pattern on cortical action potential production (for review see Stevens, 1994).
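The Poisson description has a simple, checkable signature: for a Poisson process, the spike count in a fixed window has variance equal to its mean, so the variance-to-mean ratio sits near 1. A minimal simulation illustrates this; the rate, window, and trial count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
rate_hz, duration_s, n_trials = 40.0, 1.0, 5000

# Draw spike counts for many repeated "trials" at the same stimulus-
# specified mean rate. Only the average is fixed; individual counts vary.
counts = rng.poisson(rate_hz * duration_s, size=n_trials)

# Variance-to-mean ratio of the counts: near 1.0 for a Poisson process.
fano = counts.var() / counts.mean()
```

A pure Poisson process predicts a ratio of exactly 1; the empirically reported constant near 1.07 is slightly higher, which is one reason the text describes cortical firing only as "something like" a Poisson process.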
We now know, for example, that presynaptic action potentials lead to post-synaptic depolarizations with surprisingly low probabilities in many cortical synapses, and that the sizes of the post-synaptic depolarizations that do occur can be quite variable. The actual pattern of instant-by-instant membrane voltage thus seems to be influenced by irreducible stochasticity at the level of the synapse, a stochasticity imposed by fluctuations in the amount of transmitter packaged into synaptic vesicles by the kinetic processes that fill them and by the dynamics of calcium diffusion, amongst other things. All of these data suggest that the precise pattern of activity in cortical neurons is stochastic. Exactly when an action potential is generated seems to depend on apparently random molecular and atomic-level processes. So the nervous system does seem to include stochastic elements at a very low level. The times at which action potentials occur seem to be fundamentally stochastic. What implications, if any, might this have for the generation of behavior? Randomness in Computational Systems One of the most influential studies of how this randomness in the activity of individual neurons might affect behavior is Shadlen and colleagues’ (1996) landmark model of visual-saccadic decision making. Their model sought to explain, at a computational level, a series of experiments (reviewed in Newsome et al. 1995) in which trained monkeys viewed a display of chaotically moving spots of light. On any given trial, a subset of the spots moved coherently in a single direction while the remaining spots moved randomly. The direction of this coherent motion indicated which of two possible saccadic eye movements would yield a reward and at the end of each trial animals were free to make a saccade. If they made the correct movement, they then received that reward.
3 To be more precise, the process of action potential generation appears to be a skewed Poisson process.
Physiological data from those experiments indicated that the firing rates of single neurons in the middle temporal visual area (area MT) were correlated with the fraction of spots that moved coherently in a particular direction and thus with the movement produced by the subject at the end of the trial. Shadlen and colleagues (1996) found, however, that the combination of signals from as few as 50-100 of these area MT neurons could be used to identify the reinforced direction of motion with greater accuracy than was actually evidenced by the choice behavior of the monkeys. The trial-by-trial choices of the monkeys seemed to be slightly less accurate, or more unpredictable, than might be expected from an analysis of the area MT firing rates. To account for this finding Shadlen and colleagues proposed that the MT signal was, at a later stage in the neuronal architecture, corrupted by a noise source that effectively placed an upper limit on the efficiency with which the cortical signals could be combined during the moving spot task. Their model proposed that the cortical targets of MT neurons further randomized the behavior of the animals under the circumstances they had examined. From this, one might speculate that the physiological cost of more deterministically generating behavior from MT activity in later cortical areas may simply have been greater than the benefits which could have been accrued by the animal had the stochasticity of those later elements in the cortical architecture been reduced. To further explore this notion that computational elements may impose quite specific levels of unpredictability on behavior, consider a variant of Platt and Glimcher’s (1999) free choice experiments that we recently performed. In this behavioral experiment, monkeys once again chose to make one of two possible saccades and the expected utility of each movement was manipulated by varying both the magnitude and probability of fluid reward (associated
with each movement) across blocks of trials. Figure 4 (black line) plots the behavior of a monkey performing this task. In an effort to examine the unpredictability of this relatively simple non-strategic behavior we modeled the monkey’s decisions on a trial-by-trial basis as an estimation process followed by a decision rule. In the estimation process, an exponentially weighted average of the recently obtained rewards was used to determine the expected utilities of each of the two possible movements. The time constant of this exponentially weighted average, which determined how many previous trials influenced the current estimate of movement value, was left as a free parameter. The decision rule used a sigmoidal function to convert the difference in value of the two movements to a probability of choosing each movement. The slope of the sigmoid, which we refer to as the stochastic transfer function, was left as the second free parameter in this model. The steepness of the slope thus described the model’s sensitivity to the differences in the utility of the two possible movements. Put another way, the slope of the stochastic transfer function employed by the model quantified the level of trial-by-trial unpredictability evidenced by the monkey's decisions as a function of the relative utility of the two possible responses. (This is a variant of models that have been used extensively to describe choice behavior in animals and humans4. For examples see Luce 1959; Killeen 1981; Dow and Lea 1987; Shadlen et al. 1996; Egelman et al. 1998; Sugrue et al. 2001; 2002; Montague and Berns 2002.) The grey line in Figure 4 plots the predictions of the model when both the time constant of the exponentially weighted average and the stochastic transfer function were fit to the accompanying data. The model does a reasonable job of predicting the monkey’s choice behavior under these non-strategic conditions by employing these two free parameters.
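The two-parameter model described above can be sketched in code. This is a minimal illustration, not the fitted model itself: the function names, the always-rewarded reward schedule, and the particular values of the time constant and slope are our own assumptions.

```python
import math
import random

def choice_probability(v_a, v_b, slope):
    """Stochastic transfer function: a sigmoid of the value difference.
    A steep slope makes choices nearly deterministic; a shallow slope
    leaves substantial trial-by-trial unpredictability."""
    return 1.0 / (1.0 + math.exp(-slope * (v_a - v_b)))

def run_model(n_trials, tau, slope, reward_fn, seed=0):
    """Estimation process (an exponentially weighted average of obtained
    rewards, with time constant tau in trials) followed by the sigmoidal
    decision rule."""
    rng = random.Random(seed)
    alpha = 1.0 - math.exp(-1.0 / tau)  # per-trial update weight from tau
    values = {"a": 0.0, "b": 0.0}
    choices = []
    for _ in range(n_trials):
        p_a = choice_probability(values["a"], values["b"], slope)
        action = "a" if rng.random() < p_a else "b"
        reward = reward_fn(action)
        # Exponentially weighted running average of rewards for the
        # chosen movement only.
        values[action] += alpha * (reward - values[action])
        choices.append(action)
    return choices

# With movement "a" always rewarded and a steep slope, the model rapidly
# converges on choosing "a"; a shallow slope would leave its choices far
# more random despite identical value estimates.
choices = run_model(500, tau=5.0, slope=5.0,
                    reward_fn=lambda act: 1.0 if act == "a" else 0.0)
```

The key observation in the text maps onto the `slope` parameter: fitting the monkey's behavior required a shallow sigmoid, i.e., a choice rule that retains considerable randomness even when the value estimates clearly favor one movement.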
4 In Shadlen and colleagues' model, for example, the magnitude of the secondary noise source is essentially equivalent to the slope term for our stochastic transfer function.
This suggests that under these circumstances decisions may be based on a dynamic estimate of relative expected utility computed as a weighted average of recent reward history. But much more interesting is the observation that the slope of the sigmoid, which stochastically relates expected utility to behavior, is quite shallow. The model achieves the best possible prediction by incorporating a significant degree of randomness which would, in principle, defy trial-by-trial prediction. Even given a biophysical basis for neural stochasticity, and given that successful models employ this stochasticity to generate behavior, should we actually believe that real animals are unpredictable because stochastic neural elements make them so, or is it more realistic to assume that behavior is predictable and that with the appropriate model this predictability will become obvious? At one level this is a philosophical question, but at another it can certainly be viewed as an evolutionary issue; could natural selection have preserved stochastic neural mechanisms that produce unpredictable behaviors if unpredictable behaviors yield greater evolutionary fitness? To begin to answer that question we need to be able to more quantitatively determine the costs and benefits of behavioral stochasticity to real animals. Assessing the Costs and Benefits of Randomness Consider once more the game of rock-paper-scissors. If one player uses a determinate strategy of playing rock, then scissors, then paper repeatedly in that order, their opponent could win every time by detecting this pattern and playing paper, then rock, then scissors. For this reason the production of any trial-by-trial pattern, no matter how subtle, puts a player at a potential disadvantage.
This highlights the fact that an efficient mixed strategy equilibrium of the type Nash described does not simply require specific proportions of each choice, but also requires that the dynamic process by which choices are allocated be unpredictable. Unlike the Newsome (1995), Shadlen et al. (1996), and Platt and Glimcher (1999) experiments, under these specific conditions increasing the unpredictability of behavior would increase the gains achieved by a player. This seems a critical point because if under some conditions unpredictability is efficient, and we know of stochastic subneuronal mechanisms which could generate unpredictable behavior, then we might usefully begin to search for environmental conditions which call for specific levels of unpredictability. Under these conditions we might measure the difference between observed levels of unpredictability and efficient levels of unpredictability in order to begin to test the hypothesis that unpredictability is an evolved feature of behavior. Daeyeol Lee and his colleagues (Barraclough et al. 2002) have recently begun to examine this issue by studying the decisions of monkeys playing a game called matching pennies against a variable computer opponent. They found that the behavior of these monkeys did indeed depend on the properties of the computer opponent they faced. If the computer opponent was constructed with an ability to identify and exploit non-random patterns in the behavior of the monkeys, the animals produced behaviors which were more random. In contrast, if the computer opponent was only weakly able to detect patterns in the trial-by-trial dynamics of the monkeys’ behavior, then the animals adopted a less random strategy.
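The cost of a detectable pattern can be made concrete with a toy simulation. This is a sketch under our own assumptions: the transition-count predictor below is a crude stand-in for the far more sophisticated pattern exploiters used in experiments like those of Barraclough and colleagues.

```python
import random
from collections import defaultdict

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
MOVES = list(BEATS)

def exploiter_score(player_moves, seed=0):
    """Net wins for an opponent that predicts each move from the player's
    previous move, using observed transition counts, then plays the
    counter to its prediction."""
    rng = random.Random(seed)
    transitions = defaultdict(lambda: defaultdict(int))
    score, prev = 0, None
    for move in player_moves:
        if prev is not None and transitions[prev]:
            predicted = max(transitions[prev], key=transitions[prev].get)
        else:
            predicted = rng.choice(MOVES)      # nothing learned yet: guess
        opponent = BEATS[predicted]            # counter the predicted move
        if opponent == BEATS[move]:
            score += 1                         # opponent wins the round
        elif move == BEATS[opponent]:
            score -= 1                         # opponent loses the round
        if prev is not None:
            transitions[prev][move] += 1       # learn the player's pattern
        prev = move
    return score

cycler = ["rock", "scissors", "paper"] * 200           # deterministic pattern
rng = random.Random(1)
unpredictable = [rng.choice(MOVES) for _ in range(600)]
# The exploiter wins nearly every round against the cycler, but against an
# unpredictable player it can do no better than break even on average.
```

This is qualitatively the contrast the matching-pennies monkeys appear sensitive to: against a strong pattern detector, only unpredictable play avoids being exploited.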
These observations indicate that the level of trial-by-trial randomness produced by an animal can reflect the task that it faces; the level of randomness expressed by behavior may represent an adjustable process governed by an internal set of costs at each level of the neural architecture that we have not yet measured. Stochastic events occur at the subneuronal level. Models of behavior often must employ stochastic components if they are to simulate behavior accurately. In order to be efficient, some behaviors must be unpredictable. In sum, it seems that a number of elements point towards true randomness as an important feature of vertebrate behavior. What remains unclear, however, is how all of these processes are connected. How might largely fixed stochastic subneuronal processes give rise to variably random behavior? The answer to that critical last question is far from certain, but there are some hints that we may be beginning to uncover at least one basic mechanism that could accomplish this linkage. Whether this mechanism actually serves to link neuronal stochasticity and behavioral unpredictability is still very unclear, but the existence of a mechanism of this general type within the primate neural architecture suggests that these linkages are at least possible. Linking the Stochasticity of Neurons and Behavior Shadlen and colleagues’ 1996 model demonstrated that relating neuronal firing rates to behavior required a knowledge of two critical parameters: the intrinsic variance in instantaneous firing rate evidenced by each cortical neuron (the Poisson-like variability of the action potential generation process) and the correlation in action potential patterns between the many neurons that participate in any neural computation (the inter-neuronal cross-correlation). Shadlen and colleagues demonstrated that both of these properties contribute to the unpredictability evidenced by behavior.
The variability in the firing rate of each neuron contributes to the unpredictability of behavior by producing an initial stochasticity in the neuronal architecture, and the degree to which that stochasticity influences behavior depends on how tightly correlated are the firing patterns of the many neurons in a population. To make this insight clear, consider a population of 1000 neurons all of which fire with the same mean rate and which have the same level of intrinsic variability, but are generating action potentials independently of each other. The members of such a population would be generating moment-by-moment patterns of action potentials that were completely uncorrelated; the only thing that they would share is a common underlying mean firing rate. Because of this independence, globally averaging the activity of all of these independent neurons would allow one to recover the underlying mean rate at any instant. A cortical target receiving diffuse inputs from these 1000 source neurons would therefore accurately and instantaneously have access to the underlying mean rate at which the population was firing; there would be nothing necessarily stochastic about the behavior of such a neuron. Consider, as an alternative, a circuit in which a population of 1000 neurons all still fire with the same mean rate and still have the same level of intrinsic variability, but in which each of the 1000 source neurons were tightly correlated in their activity patterns. Under these conditions, it is the stochastic and synchronous pattern of activity shared by all of the neurons in the population that is available to the target neuron at any moment, rather than the underlying mean rate. In a highly correlated system of this type the output at any moment is irreducibly stochastic. Of course these are just two extreme conditions along a continuum.
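These two extreme cases can be checked numerically. The sketch below uses Gaussian rate fluctuations as an illustrative stand-in for real spiking statistics; the rates, noise level, and sample counts are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_instants = 1000, 2000
mean_rate, noise_sd = 50.0, 10.0   # spikes/s; illustrative values only

# Independent noise: each neuron fluctuates on its own, so averaging
# across the population recovers the underlying mean rate at each instant.
independent = mean_rate + rng.normal(0.0, noise_sd, (n_instants, n_neurons))

# Perfectly correlated noise: one shared fluctuation per instant, which
# no amount of averaging can remove.
correlated = (mean_rate
              + rng.normal(0.0, noise_sd, (n_instants, 1)) * np.ones(n_neurons))

# Fluctuation of the population average across instants:
sd_independent = independent.mean(axis=1).std()  # ~ noise_sd / sqrt(1000)
sd_correlated = correlated.mean(axis=1).std()    # ~ noise_sd
```

For intermediate pairwise correlations r, the variance of the population average approaches roughly r times the single-neuron variance as the population grows, which is why the residual inter-neuronal correlation, rather than the population size, sets the floor on how faithfully the target can read out the mean rate.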
Many levels of correlation between neurons are possible and each would provide the target with a slightly different level of access to the underlying mean rate, and a different level of intrinsic randomness. To address these issues of stochasticity in their original model, Shadlen and his colleagues (1996) were able to use available data to estimate both the intrinsic stochasticity of cortical neurons and the actual level of inter-neuronal correlation in area MT during the moving dot task they studied. A number of studies had shown that the intrinsic variance in the firing rates of cortical neurons, the cortical coefficient of variation, is largely fixed at 1.07 (Tolhurst et al., 1981; Dean, 1981; Zohary et al., 1994; Lee et al., 1998) and Zohary and colleagues (1994) had demonstrated that under the behavioral conditions being modeled, pairs of MT neurons that were close enough to be studied with the same electrode showed an inter-neuronal correlation of about 0.19. It was by using this number and knowledge of the unpredictability of the animal’s actual behavior that Shadlen and his colleagues were able to estimate the magnitude of the later randomizing element that they believed intervened between MT activity and the generation of behavior. More recently Parker and colleagues (2002; Dodd et al., 2001) examined the activity of this same population of MT neurons but in a different behavioral task that imposed different environmental contingencies. Like Zohary and colleagues, they were also able to record the activity of pairs of MT neurons and to determine both the coefficient of variation and the inter-neuronal correlation between these pairs under their behavioral conditions. They found that the coefficient of variation was essentially the same in their task but that the inter-neuronal correlation was quite different, a correlation coefficient of 0.44.
At a behavioral level they also found that the stochastic firing rates of individual neurons were more tightly correlated with the stochastic behavior of their subjects than in the Zohary study (a choice probability of 0.67 rather than the 0.56 measured by Britten et al. (1996)). In other words, MT neurons in the Parker task showed a higher level of inter-neuronal correlation and the behavior of the animals was more tightly coupled to the stochastic behavior of individual neurons. Just as one might have predicted, the level of observed inter-neuronal correlation in a single cortical area and the level of randomness in behavior appear to be related. Furthermore, the level of inter-neuronal correlation appears to be variable, dependent on the task which the animal is asked to perform. These may be very important results, not because they definitively explain how neurons and behavior are linked but because they demonstrate that such linkages are at least conceptually possible. We now know that in order to be efficient some behaviors must be unpredictable, and that the level of this unpredictability is, and should be, adjustable. We also know that there are intrinsic sources of stochasticity in the vertebrate nervous system. Evolution could, at least in principle, have yielded mechanisms that link these processes. Summary Over the last century social scientists have made significant progress towards describing the underlying computational processes that guide decision making. While their early successes focused on predictable forms of decision making, more recent studies have examined the kinds of unpredictable decision making that occur under conditions like strategic interaction. The equilibrium approaches used in the social sciences to describe unpredictable decision making have, however, been unable to determine the ultimate source of the randomness evidenced in strategic behavior.
Over the last decade neuroscientists have begun to employ many of the mathematical formulations developed by social scientists. This rich set of computational mechanisms has proven to be a powerful tool for understanding the neural architecture. The most recent studies of this architecture seem, however, to go beyond the insights available from the social sciences. These newest studies suggest that some irreducible level of randomness may be an essential feature of the vertebrate nervous system and may play a critical role in the generation of behavior. If the mechanism by which neuronal firing rates yield behavior can preserve a variable fraction of the neuronal stochasticity that we and others have observed, then the level of unpredictability expressed by behavior could be a reflection of this variable underlying physical process. Limitations imposed by that process could reflect an implicit cost function against which behavior is optimized. These observations may therefore hint that the randomness captured in our neuroscientific models by elements like the stochastic transfer function may be the instantiation of an intrinsic stochasticity in the neurobiological architecture. Indeed, these observations may even suggest that the precise slope of the stochastic transfer function under a given set of environmental conditions represents some kind of adjustable neurophysiological process by which stochastic neuronal firing rates lead to the efficient generation of unpredictable behavior. Neuroscience may thus soon be able to provide a final answer to the social scientific question of whether some classes of behavior are truly and irreducibly unpredictable. Under some conditions behavior may well be irreducibly unpredictable and this unpredictability may extend down to the molecular level at which synapses operate.
Conclusions The ultimate goal of neurobiological studies of decision making is to explain human voluntary choice, a process often attributed to the agency of free will. When a real human employee faces a real human employer she must make a voluntary decision about whether to go to work or to stay at home and shirk. Many clear factors influence her decision: how recently and how often she has been inspected, how much she stands to gain by successfully shirking, and her own predispositions or biases. Were she, however, always to work and then shirk and then work and then shirk, alternating deterministically between these two actions, her behavior would seem less than voluntary. In large measure what makes the decision seem voluntary to an outside observer is that her response defies prediction on a decision-by-decision basis. Explaining the neurobiological source of that unpredictability will probably pose the greatest challenge for students of this process and will yield fundamental insights into the causal processes that underlie human action. Acknowledgements The authors wish to thank Brian Lau for helpful discussions, thoughtful comments on earlier drafts of the manuscript, and for providing the model fit illustrated in Figure 4. We would also like to thank David Heeger, Hannah Bayer, Michael Platt, Daeyeol Lee, and Maggie Grantner for helpful discussions. This work was supported by the Klingenstein Foundation and the National Eye Institute. References Andersen, R. A. and C. A. Buneo (2002). Intentional maps in posterior parietal cortex. Annu Rev Neurosci 25: 189-220. Arnauld, A. and P. Nicole (1994). Logic or the Art of Thinking. Cambridge, Cambridge University Press. Barraclough, D. J., M. L. Conroy, et al. (2002). Stochastic decision-making in a two-player competitive game. Society for Neuroscience Abstracts. 285.16. Britten, K.H., Newsome, W.T., Shadlen, M.N., Celebrini, S., and Movshon, J.A.
(1996) A relationship between behavioral choice and the visual responses of neurons in macaque area MT. Vis. Neurosci. 13: 87-100. Bernoulli, D. (1954). Exposition on a new theory on the measurement of risk. Econometrica 22(1): 23-36. Breiter, H. C., I. Aharon, et al. (2001). Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30(2): 619-39. Bush, R. R. and F. Mosteller (1955). Stochastic Models for Learning. New York, Wiley. Camerer, C. F. (2003). Behavioral Game Theory: Experiments in Strategic Interaction. Princeton, Princeton University Press. Coe, B., K. Tomihara, et al. (2002). Visual and anticipatory bias in three cortical eye fields of the monkey during an adaptive decision-making task. J Neurosci 22(12): 5081-90. Colby, C. L., J. R. Duhamel, et al. (1995). Oculocentric spatial representation in parietal cortex. Cereb Cortex 5(5): 470-81. Colby, C. L. and M. E. Goldberg (1999). Space and attention in parietal cortex. Annu Rev Neurosci 22: 319-49. Dean, A.F. (1981). The variability of discharge of simple cells in the cat striate cortex. Exp. Brain Res. 44: 437-40. Dodd, J. V., Krug, K., Cumming, B. G. and Parker, A. J. (2001) Perceptually bistable figures lead to high choice probabilities in cortical area MT. J. Neurophys. 21: 4809-4821. Dorris, M. C. and P. W. Glimcher. (2002) A neural correlate for the relative expected value of choices in the lateral intraparietal area. Soc. Neurosci. Abstr. 28: 280.6. Dow, S. M. and S. E. G. Lea (1987). Foraging in a changing environment: simulations in the operant laboratory. Quantitative Analyses of Behavior. M. L. Commons, A. Kacelnik and S. J. Shettleworth. Hillsdale, Lawrence Erlbaum Associates, Inc. VI. Dragoi, V. and J. E. Staddon (1999). The dynamics of operant conditioning. Psychol Rev 106(1): 20-61. Egelman, D. M., C. Person, et al. (1998). A computational role for dopamine delivery in human decision-making.
J Cogn Neurosci 10(5): 623-30. Erev, I. and A. Roth (1998). Predicting how people play games: Reinforcement learning in games with unique strategy equilibrium. American Economic Review 88: 848-881. Flaherty, C. F. (1996). Incentive Relativity. New York, Cambridge University Press. Fudenberg, D. and D. K. Levine (1998). The Theory of Learning in Games. Cambridge, The MIT Press. Fudenberg, D. and J. Tirole (1991). Game Theory. Cambridge, The MIT Press. Gallistel, C. R. (1994). Foraging for brain stimulation: toward a neurobiology of computation. Cognition 50(1-3): 151-70. Glimcher, P. W. (2001). Making choices: the neurophysiology of visual-saccadic decision making. Trends Neurosci 24(11): 654-9. Glimcher, P. W. (2003a). Neural correlates of primate decision-making. Annu Rev Neurosci 26: in press. Glimcher, P. W. (2003b). Decisions, Uncertainty, and the Brain: The Science of Neuroeconomics. Cambridge, The MIT Press. Gnadt, J. W. and R. A. Andersen (1988). Memory related motor planning activity in posterior parietal cortex of macaque. Exp Brain Res 70(1): 216-20. Gold, J. I. and M. N. Shadlen (2001). Neural computations that underlie decisions about sensory stimuli. Trends Cogn Sci 5(1): 10-16. Gottlieb, J. (2002). Parietal mechanisms of target representation. Curr Opin Neurobiol 12(2): 134-40. Handel, A. and P. W. Glimcher (2000). Contextual modulation of substantia nigra pars reticulata neurons. J Neurophysiol 83(5): 3042-8. Herrnstein, R. J. (1997). The Matching Law: Papers in Psychology and Economics, Harvard University Press. Hikosaka, O., Y. Takikawa, et al. (2000). Role of the basal ganglia in the control of purposive saccadic eye movements. Physiol Rev 80(3): 953-78. Kawagoe, R., Y. Takikawa, et al. (1998). Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1(5): 411-6. Killeen, P. R. (1981). Averaging theory. Recent developments in the quantification of steady-state operant behavior. C.
M. Bradshaw, E. Szabadi and C. F. Lowe, Elsevier. Kreps, D. M. (1990). A Course in Microeconomic Theory. Princeton, Princeton University Press. Kusunoki, M., J. Gottlieb, et al. (2000). The lateral intraparietal area as a salience map: the representation of abrupt onset, stimulus motion, and task relevance. Vision Res 40(10-12): 1459-68. Lee, D., N. L. Port, W. Kruse and A. P. Georgopoulos. (1998) Variability and correlated noise in the discharge of neurons in motor and parietal areas of the primate cortex. J Neurosci 18(3): 1161-70. Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. New York, John Wiley & Sons. Luce, R. D. and H. Raiffa (1957). Games and Decisions. New York, John Wiley & Sons. McCabe, K., D. Houser, et al. (2001). A functional imaging study of cooperation in two-person reciprocal exchange. Proc Natl Acad Sci U S A 98(20): 11832-5. McKelvey, R. D. and T. R. Palfrey (1998) Quantal response equilibria in extensive form games. Experimental Economics 1: 9-41. Miller, G. F. (1997). Protean primates: The evolution of adaptive unpredictability in competition and courtship. Machiavellian Intelligence II: Extensions and Evaluations. A. Whiten and R. W. Byrne. Cambridge, Cambridge University Press: 312-340. Mainen, Z. F. and T. J. Sejnowski. (1995) Reliability of spike timing in neocortical neurons. Science 268(5216): 1503-6. Montague, P. R. and G. S. Berns (2002). Neural economics and the biological substrates of valuation. Neuron 36(2): 265-84. Montague, P. R., G. S. Berns, et al. (2002). Hyperscanning: simultaneous fMRI during linked social interactions. Neuroimage 16(4): 1159-64. Myers, J. L. (1976). Probability learning and sequence learning. Handbook of Learning and Cognitive Processes: Approaches to Human Learning and Motivation. W. K. Estes. Hillsdale, Lawrence Erlbaum. 3: 171-205. Nash, J. F. (1950). Equilibrium points in n-person games. PNAS 36: 48-49. Neuringer, A. (2002).
Operant variability: evidence, functions, and theory. Psychon Bull Rev 9(4): 672-705. Newsome, W. T., K. H. Britten, et al. (1989). Neuronal correlates of a perceptual decision. Nature 341(6237): 52-4. Newsome, W. T., M. N. Shadlen, et al. (1995). Visual motion: linking neuronal activity to psychophysical performance. The Cognitive Neurosciences. M. S. Gazzaniga. Cambridge, The MIT Press. Parker, A.J., Krug, K. and Cumming, B.G. (2002) Neuronal activity and its links with the perception of multi-stable figures. Phil. Trans. R. Soc. Lond. B 357: 1053-1062. Pascal, B. (1966). Pensees. New York, Penguin Books. Platt, M. L. and P. W. Glimcher (1998). Response fields of intraparietal neurons quantified with multiple saccadic targets. Exp Brain Res 121(1): 65-75. Platt, M. L. and P. W. Glimcher (1999). Neural correlates of decision variables in parietal cortex. Nature 400(6741): 233-8. Rapoport, A. and D. V. Budescu (1992). Generation of random binary series in strictly competitive games. Journal of Experimental Psychology: General 121: 352-364. Rapoport, A. and D. V. Budescu (1997). Randomization in individual choice behavior. Psychological Review 104: 603-617. Schall, J. D. (2001). Neural basis of deciding, choosing and acting. Nat Rev Neurosci 2(1): 33-42. Shadlen, M. N., K. H. Britten, et al. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. J Neurosci 16(4): 1486-510. Sparks, D. L. and L. E. Mays (1990). Signal transformations required for the generation of saccadic eye movements. Annu Rev Neurosci 13: 309-36. Stevens, C. F. (1994) Neuronal communication. Cooperativity of unreliable neurons. Curr Biol 4(3): 268-9. Stephens, D. W. and J. R. Krebs (1986). Foraging Theory. Princeton, N.J., Princeton University Press. Sugrue, L. P., W. T. Newsome, et al. (2001). Matching behavior in rhesus monkeys. Society for Neuroscience Abstracts. 59.3. Sugrue, L. P. and W. T. Newsome (2002).
Neural correlates of experienced value in area LIP of the rhesus monkey. Society for Neuroscience Abstracts. 121.5.
Tolhurst, D. J., J. A. Movshon and A. F. Dean (1983). The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Res 23: 775-785.
Zohary, E., M. N. Shadlen and W. T. Newsome (1994). Correlated neuronal discharge and its implications for psychophysical performance. Nature 370: 140-143.

Figure Legends

Figure 1. Behavior during the inspection game. A. (left panel) General form of the payoff matrix for the inspection game for both the experimental subject (employee) and the opponent (employer). The variables in the bottom left of each cell give the employee's payoffs, and the variables in the top right of each cell give the employer's payoffs, for each combination of the players' responses. V = value of the hypothetical product to the employer, fixed at 4; W = wage paid by the employer to the employee, fixed at 2; C = cost of working to the employee, fixed at 1; I = cost of inspection to the employer, varied from 0.1 to 0.9 in steps of 0.2. The middle and right panels show the payoff matrices for 70% and 30% employee shirk rates. The predicted equilibrium strategy for the employer remains constant at a 50% inspection rate across all blocks of trials. One unit of payoff = 0.25 mL of water for the monkeys and $0.05 for the humans. B. The behavior of an individual human subject playing the role of employee during two Nash equilibrium blocks of the inspection game. The jagged black line is a running average of shirk choices over the last 20 trials; the gray bars mark the predicted Nash equilibrium strategy. C. The average shirk rate (±SEM) for human subjects, calculated over the last half of each Nash equilibrium block. The proportion of shirking predicted at Nash equilibrium is denoted by the black line of unity. Filled squares, human vs. human (N = 6 subjects); filled circles, human vs. computer (N = 5 subjects). D.
The same plot as (B) for an individual monkey subject. E. The same plot as (C) for monkey subjects; 29 blocks per point (13 blocks from monkey 1, 16 blocks from monkey 2).

Figure 2. Activity of LIP neurons during the instructed and free choice tasks. A. The proportion of the monkey's choices devoted to shirking, and the corresponding activity of a single LIP neuron. The monkey performed six successive blocks of trials; the first two used the instructed task and the final four used the free choice task with four different payoff matrices. During both blocks of the instructed task, the rate of shirking was fixed at 50% (gray bars). In the first block, the reward associated with the shirk target was twice as large as that associated with the work target (high expected utility, E.U.); in the second block the rewards were switched, so that the reward associated with the shirk target was half as large as that associated with the work target (low E.U.). During the four free choice blocks, the monkey's shirk rate was near that predicted by the Nash equilibrium (gray bars), and the expected utilities of the two movements are assumed to be approximately equal (~equal E.U.) in these blocks. The black line is the running average of shirking over the last 20 trials. The black dots are the running average of neuronal activity on the shirk trials among the last 20 trials; this activity was sampled 50-350 ms after the visual stimuli were presented (see the gray bars in B and C). B. The average post-stimulus time histograms (bin width 50 ms) for 20 neurons tested in the two blocks of the instructed task with different expected utilities in the response field, as in A. The dark gray line is the average activity during the high E.U. block and the light gray line the average activity during the low E.U. block. C.
The average post-stimulus time histograms for 41 neurons tested in five blocks of the free choice task in which the Nash equilibrium strategy ranged from a shirk rate of 10% (lightest line) to 90% (darkest line) in steps of 20%. A direct comparison of B and C is not possible because they describe separate populations of neurons; however, similar results were obtained for 13 neurons that were tested in both the instructed and free choice tasks (not shown).

Figure 3. Two additional experiments support the notion that LIP activity is correlated with relative expected utility. A. Switching the work and shirk targets. Average neuronal activity in the standard inspection game when the shirk target was placed in the neuronal response field (black line), compared with a block of trials in which the work target was placed in the neuronal response field (gray line). In both blocks the Nash equilibrium strategy was to choose each response 50% of the time; across blocks, the expected utility remained constant despite differences in the probability and magnitude of reward. B. Relative versus absolute expected utility. The monkeys performed two blocks with the shirk target in the neuronal response field. In one block, the magnitude of reward was 1 unit on work trials and 2 units on shirk trials (gray); in the other block, the absolute magnitudes of reward were doubled for both movements (black). Although the absolute expected utility in the neuronal response field changed across blocks, the relative expected utility of the two choices remained approximately equal (N = 18).

Figure 4. Monkey free choice behavior on a variant of the Platt and Glimcher (1999) task. The monkey chose between two possible movements, each of which provided a different magnitude and probability of fluid reward. The black line plots an 11-trial running average of the monkey's choice behavior over eight sequential blocks.
Each block presented a different expected utility for each of the two movements; block transitions were unsignaled. The gray line plots the trial-by-trial prediction of a reinforcement learning model that estimates the utilities of the two movements and employs a simple stochastic decision rule. See text for details.

[Figures 1-4: graphical panels (payoff matrices, choice time courses, and peri-stimulus time histograms) are not reproduced here; see the legends above.]
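The equilibrium predictions quoted in the Figure 1 legend follow from standard indifference algebra for 2x2 games and are easy to check numerically. The sketch below is a minimal Python illustration, not code from the paper: the function name, the action ordering, and the illustrative inspection cost I = 0.7 are our own choices. With the legend's payoff entries and V = 4, W = 2, C = 1, the employer's equilibrium inspection rate works out to C/W = 50% in every block, as the legend states, while the employee's equilibrium shirk rate varies with the inspection cost I.

```python
from fractions import Fraction as F

def mixed_equilibrium_2x2(row_payoffs, col_payoffs):
    """Mixed-strategy Nash equilibrium of a 2x2 bimatrix game.

    Entry [i][j] is that player's payoff when the row player takes
    action i and the column player takes action j. Returns (p, q):
    p = P(row player takes action 0), chosen to make the column
    player indifferent between her actions, and q = P(column player
    takes action 0), chosen to make the row player indifferent.
    """
    (a0, a1), (b0, b1) = col_payoffs
    # p*a0 + (1-p)*b0 = p*a1 + (1-p)*b1  (column player indifference)
    p = (b1 - b0) / ((a0 - a1) + (b1 - b0))
    (c0, c1), (d0, d1) = row_payoffs
    # q*c0 + (1-q)*c1 = q*d0 + (1-q)*d1  (row player indifference)
    q = (d1 - c1) / ((c0 - d0) + (d1 - c1))
    return p, q

# Parameters from the Figure 1 legend; I = 0.7 is an arbitrary example.
V, W, C, I = F(4), F(2), F(1), F(7, 10)
# Row player = employee (action 0 = shirk, 1 = work);
# column player = employer (action 0 = inspect, 1 = no inspect).
employee = [[F(0), W], [W - C, W - C]]          # employee payoffs
employer = [[-I, -W], [V - I - W, V - W]]       # employer payoffs
shirk_rate, inspect_rate = mixed_equilibrium_2x2(employee, employer)
# shirk_rate = I/W = 7/20; inspect_rate = C/W = 1/2 for any I.
```

Because the employee's payoffs do not involve I, the employer's equilibrium mix (C/W) is the same in every block, which is why the legend's predicted inspection rate is fixed at 50% while the employee's equilibrium shirk rate is moved around by varying I.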
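The Figure 4 legend describes its reinforcement learning model only as estimating the utilities of the two movements and applying a simple stochastic decision rule. The following is a minimal sketch of that class of model, not the fitted model from the text: the delta-rule update, the softmax choice rule, and all parameter values (learning rate, inverse temperature, payoff schedule) are illustrative assumptions.

```python
import math
import random

def softmax_choice(values, beta, rng):
    """Pick action 0 or 1 with probability proportional to exp(beta * value)."""
    p0 = 1.0 / (1.0 + math.exp(-beta * (values[0] - values[1])))
    return 0 if rng.random() < p0 else 1

def run_block(payoffs, n_trials=100, alpha=0.3, beta=2.0, seed=1):
    """Simulate one block of the two-target free choice task.

    payoffs[a] = (magnitude, probability) of fluid reward for action a.
    Each trial: choose stochastically from the running utility
    estimates, collect the (probabilistic) reward, then nudge the
    chosen action's estimate toward the obtained reward (delta rule).
    """
    rng = random.Random(seed)
    v = [0.0, 0.0]                         # running utility estimates
    choices = []
    for _ in range(n_trials):
        a = softmax_choice(v, beta, rng)
        mag, prob = payoffs[a]
        r = mag if rng.random() < prob else 0.0
        v[a] += alpha * (r - v[a])         # delta-rule update
        choices.append(a)
    return choices, v

# Example block: action 0 pays 2 units with p = 0.5; action 1 pays
# 1 unit with p = 0.5, so action 0 has the higher expected utility.
choices, v = run_block({0: (2.0, 0.5), 1: (1.0, 0.5)})
```

In a fit to real behavior, the learning rate and temperature would be estimated from the monkey's trial-by-trial choices, and the model's choice probabilities, rather than sampled choices, would be plotted against the running average, as in Figure 4.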