Saturday evening, Poster III-40
Cosyne
Representation of Choice and Reward Action Values in Decision-Making Neural Circuits
Namjung Huh and Min Whan Jung
Ajou University
We studied the effects of past choices and rewards in decision-making. The reinforcement learning paradigm estimates an action value that represents the amount of reward expected from a particular action, and this value determines the choice of future actions in a probabilistic manner. Hence, accurate estimation of action values is critical for understanding an animal's behavior and its relationship to neural activity. Previous studies estimated action values based on past rewards (1, 2). However, it is unclear whether past rewards are sufficient, or whether other factors must be considered, for accurate estimation of action values. A recent study showed that the behavioral choices of animals are well predicted by a dynamic model that takes into account not only past rewards but also past choices (3), suggesting that the effect of past choices must be included in the estimation of action values.
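
For reference, the reward-only formulation assumed in these earlier studies (1, 2, 5) can be sketched as an incremental value update followed by softmax action selection; the learning rate \(\alpha\) and inverse temperature \(\beta\) below are generic symbols introduced here for illustration, not parameters fitted in this work:
\[
Q_{t+1}(a_t) \;=\; Q_t(a_t) + \alpha\,\bigl(r_t - Q_t(a_t)\bigr),
\qquad
P(a_t = a) \;=\; \frac{\exp\bigl(\beta\,Q_t(a)\bigr)}{\sum_{a'}\exp\bigl(\beta\,Q_t(a')\bigr)},
\]
where \(Q_t(a)\) is the action value of action \(a\) on trial \(t\) and \(r_t\) is the reward outcome. In this formulation, the action value is driven by past rewards alone.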
In this work, we divided the action value into two parts, choice and reward action values, which quantify the effects of past choices and past rewards, respectively. Past choices are represented by the length of a run of identical choices (the run-length). The choice action value for a given run-length can be estimated from behavioral data during the steady state, in which the mean reward contingency associated with each choice stays constant, using a generalized linear model, because the reward action value term becomes constant under this condition (3). Assuming that the animal's behavior depends on the latest run-length, the coefficient for the latest run-length becomes the inverse temperature (i.e., the degree of noise) in action selection.
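
In schematic form (the notation here is ours and only illustrative; the regression itself follows (3)), the steady-state choice behavior for a two-alternative task can be written as
\[
\ln\frac{P(a_t = R)}{P(a_t = L)} \;=\; \beta\,\bigl[\,Q^{\mathrm{cho}}(n_t) + \Delta Q^{\mathrm{rew}}\,\bigr],
\qquad \Delta Q^{\mathrm{rew}} \approx \text{constant during steady state},
\]
where \(n_t\) is the run-length entering trial \(t\), \(Q^{\mathrm{cho}}(n_t)\) is the choice action value difference, and \(\Delta Q^{\mathrm{rew}}\) is the reward action value difference. Because the reward term is constant, the fitted run-length coefficients identify \(\beta\,Q^{\mathrm{cho}}(n_t)\), with the coefficient for the latest run-length playing the role of the inverse temperature \(\beta\).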
To estimate the reward action value, we introduce a state-space model with a softmax action selection rule that incorporates the estimated choice action value (4, 5). This model defines the reward action value with confidence intervals, so that the animal's discrimination ability can be evaluated in a statistical manner.
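
One concrete way to write such a state-space model (schematic only; a Gaussian random-walk state equation is assumed here for illustration, in the spirit of (4)) is
\[
x_{t+1} \;=\; x_t + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0,\sigma_\varepsilon^2),
\qquad
P(a_t = R \mid x_t) \;=\; \frac{1}{1 + \exp\!\bigl(-\beta\,[\,x_t + Q^{\mathrm{cho}}(n_t)\,]\bigr)},
\]
where the latent state \(x_t\) is the reward action value difference between the two choices (the time-varying counterpart of \(\Delta Q^{\mathrm{rew}}\) above) and the previously estimated choice action value \(Q^{\mathrm{cho}}(n_t)\) enters as a known offset. The posterior distribution of \(x_t\) given the observed choices then yields the confidence intervals mentioned above.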
This analysis reveals the independent effects of past choices and past rewards, and explains which factor determines a particular behavioral choice and whether that choice is exploitation (choosing the action with the larger action value) or exploration caused by noise. The reward action value stabilizes as the animal discovers the reward contingencies associated with each choice, whereas the choice action value varies according to past choices (run-lengths). Simulation results show that a model including both effects estimates action values more accurately than a model neglecting past choices.
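
A toy version of this comparison might look as follows. This is an illustrative sketch only, not the authors' simulation: the simulated agent, the regressors, the scikit-learn logistic regression, and the use of held-out choice prediction as a stand-in for action-value estimation accuracy are all assumptions made for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Illustrative parameter values only; none of these come from the study itself.
rng = np.random.default_rng(0)
n_trials = 5000
p_reward = (0.7, 0.3)                # reward probability of action 0 and action 1
alpha, beta, kappa = 0.2, 3.0, 0.8   # learning rate, inverse temperature, choice-history weight

# Simulate an agent whose choices depend on reward values and on its recent run of choices.
Q = np.zeros(2)
choices, rewards, run_lengths = [], [], []
run = 0                              # signed run-length: +k after k repeats of action 1, -k for action 0
for t in range(n_trials):
    logit = beta * (Q[1] - Q[0]) + kappa * np.tanh(run)   # saturating choice-history effect
    a = int(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
    r = int(rng.random() < p_reward[a])
    run_lengths.append(run)
    choices.append(a)
    rewards.append(r)
    Q[a] += alpha * (r - Q[a])
    if a == 1:
        run = run + 1 if run > 0 else 1
    else:
        run = run - 1 if run < 0 else -1

choices, rewards, run_lengths = map(np.asarray, (choices, rewards, run_lengths))

# Signed reward-history regressors for the last 5 trials
# (+1 if that past trial was a rewarded choice of action 1, -1 if a rewarded choice of action 0).
n_lags = 5
signed = (2 * choices - 1) * rewards
X_rew = np.zeros((n_trials, n_lags))
for lag in range(1, n_lags + 1):
    X_rew[lag:, lag - 1] = signed[:-lag]
X_full = np.column_stack([X_rew, run_lengths])            # reward history plus choice history

# Fit each model on the first half of trials and score choice prediction on the second half.
half = n_trials // 2
for name, X in [("past rewards only", X_rew), ("past rewards + past choices", X_full)]:
    model = LogisticRegression().fit(X[:half], choices[:half])
    nll = log_loss(choices[half:], model.predict_proba(X[half:])[:, 1])
    print(f"{name:28s} held-out log loss: {nll:.3f}")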
These results suggest that the two effects must be estimated separately for accurate estimation of action values, regardless of the degree of learning. The choice and reward action values estimated in this work can be used to classify neurons according to their roles in a reinforcement learning task and to model the neural circuits underlying decision making.
References
[1] Matching behavior and the representation of value in the parietal cortex. L. P. Sugrue, G. S. Corrado, and W. T. Newsome, Science 304:1782-1787, 2004.
[2] Representations of action-specific reward values in the striatum. K. Samejima, Y. Ueda, K. Doya, and M. Kimura, Science 310:1337-1340, 2005.
[3] Dynamic response-by-response models of matching behavior in rhesus monkeys. B. Lau and P. W. Glimcher, Journal of the Experimental Analysis of Behavior 84(3):555-579, 2005.
[4] Dynamic analysis of learning in behavioral experiments. A. C. Smith, L. M. Frank, S. Wirth, M. Yanike, D. Hu, Y. Kubota, A. M. Graybiel, W. A. Suzuki, and E. N. Brown, The Journal of Neuroscience 24(2):447-461, 2004.
[5] Reinforcement Learning: An Introduction. R. S. Sutton and A. G. Barto, The MIT Press, 2002.