Saturday evening, Poster III-40

Representation of Choice and Reward Action Values in Decision-Making Neural Circuits

Namjung Huh and Min Whan Jung, Ajou University

We studied the effects of past choices and rewards on decision-making. The reinforcement learning framework estimates an action value, representing the expected reward associated with a particular action, and this value determines future choices in a probabilistic manner. Hence, accurate estimation of action values is critical for understanding an animal's behavior and its relationship to neural activity. Previous studies estimated action values from past rewards (1, 2). However, it is unclear whether past rewards are sufficient, or whether other factors must be considered, for accurate estimation of action values. A recent study showed that an animal's behavioral choices are well predicted by a dynamic model that takes into account not only past rewards but also past choices (3), suggesting that the effect of past choices should be included in the estimation of action values. In this work, we divided the action value into two parts: a choice action value and a reward action value, which quantify the effects of past choices and past rewards, respectively. Past choices are represented by the number of consecutive identical choices (run-length). The choice action value for a given run-length can be estimated with a generalized linear model from behavioral data during the steady state, in which the mean reward contingency associated with each choice stays constant, because the reward action value term is constant under this condition (3). Assuming the animal's behavior depends on the latest run-length, the coefficient for the latest run-length becomes the inverse temperature (i.e., the degree of noise) in action selection. To estimate the reward action value, we introduce a state-space model with a softmax action selection rule that incorporates the estimated choice action value (4, 5). This model estimates the reward action value with confidence intervals, so that the animal's discrimination ability can be evaluated statistically. This analysis reveals the independent effects of past choices and rewards, and indicates which factor determines a particular behavioral choice and whether that choice is exploitation (choosing the larger action value) or exploration driven by noise. The reward action value stabilizes as the animal discovers the reward contingencies associated with each choice, whereas the choice action value varies with past choices (run-lengths). Simulation results show that a model including both effects estimates action values more accurately than a model that neglects past choices. These results suggest that the two effects must be estimated separately for accurate estimation of action values, regardless of the degree of learning. The choice and reward action values estimated in this work can be used to classify neurons by their roles in a reinforcement learning task and to model neural circuits involved in decision-making.
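The following Python sketch is not part of the original work; it is a minimal illustration, under stated assumptions, of how a choice action value indexed by run-length and a reward action value might be combined under a softmax rule with an inverse temperature. The run-length weights, reward contingencies, learning rate, and inverse temperature are illustrative placeholders, and a simple delta-rule update stands in for the state-space estimator of the reward action value described above.

    import numpy as np

    # Minimal sketch (not the authors' code): choice and reward action values
    # combined under a softmax rule. All numeric settings are assumptions.
    rng = np.random.default_rng(0)

    n_trials = 500
    beta = 3.0                      # inverse temperature (noise in action selection)
    alpha = 0.1                     # stand-in learning rate for the reward-value update
    reward_prob = {0: 0.7, 1: 0.3}  # hypothetical reward contingencies for two choices

    # Hypothetical choice-action-value weights indexed by run-length (1, 2, 3+);
    # in the abstract these are estimated with a GLM from steady-state behavior.
    choice_value_by_runlength = np.array([0.0, -0.2, -0.4])

    Q_reward = np.zeros(2)          # reward action values for the two choices
    prev_choice, run_length = None, 0

    for t in range(n_trials):
        # The choice action value applies to repeating the previous choice.
        Q_choice = np.zeros(2)
        if prev_choice is not None:
            idx = min(run_length, len(choice_value_by_runlength)) - 1
            Q_choice[prev_choice] = choice_value_by_runlength[idx]

        # Softmax over the summed choice and reward action values.
        Q_total = Q_reward + Q_choice
        p = np.exp(beta * Q_total) / np.exp(beta * Q_total).sum()
        choice = rng.choice(2, p=p)

        reward = float(rng.random() < reward_prob[choice])

        # Delta-rule update used here as a stand-in for the state-space
        # estimator of the reward action value described in the abstract.
        Q_reward[choice] += alpha * (reward - Q_reward[choice])

        run_length = run_length + 1 if choice == prev_choice else 1
        prev_choice = choice

    print("final reward action values:", Q_reward)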
References

[1] L. P. Sugrue, G. S. Corrado, and W. T. Newsome, Matching behavior and the representation of value in the parietal cortex. Science 304:1782–1787, 2004.
[2] K. Samejima, Y. Ueda, K. Doya, and M. Kimura, Representations of action-specific reward values in the striatum. Science 310:1337–1340, November 2005.
[3] B. Lau and P. W. Glimcher, Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior 84(3):555–579, November 2005.
[4] A. C. Smith, L. M. Frank, S. Wirth, M. Yanike, D. Hu, Y. Kubota, A. M. Graybiel, W. A. Suzuki, and E. N. Brown, Dynamic analysis of learning in behavioral experiments. The Journal of Neuroscience 24(2):447–461, January 2004.
[5] R. S. Sutton and A. G. Barto, Reinforcement Learning. London: The MIT Press, 2002.