Learning, Volatility and the ACC
Tim Behrens
FMRIB + Psychology, University of Oxford; FIL - UCL

[Figure: regression weights (β) on past rewards, trials i-1 to i-8 into the past, control (CON) animals; Kennerley et al., Nature Neuroscience, 2006]

[Figure: the same reward-history weights for the control (CON) and ACC-sulcus (ACCs) groups; Kennerley et al., Nature Neuroscience, 2006]

Monkeys will sacrifice food opportunities to look at other monkeys (ACCg).
Rudebeck et al., Science, 2006

Interest in other individuals is reduced after an ACC gyrus (ACCg) lesion.
Rudebeck et al., Science, 2006

Anatomy - differences in connections between ACCs and ACCg
• Connections unique to the sulcus are mainly with motor regions:
  • Primary motor cortex
  • Premotor cortex
  • Parietal motor areas
  • Spinal cord
• ACCs has information about our own actions.

Anatomy - differences in connections between ACCs and ACCg
• Connections unique to the gyrus are mainly with regions that process emotional and biological stimuli:
  • Periaqueductal grey
  • Hypothalamus
  • STS/STG
  • Insula/temporal pole (connections are stronger to the gyrus)
• ACCg has access to information about other agents.

Anatomy - shared connections between ACCs and ACCg
• Some connections are shared:
  • Orbitofrontal cortex
  • Amygdala
  • Ventral striatum
• ACCg and ACCs are strongly interconnected.
• Both regions have access to, and influence over, reward and value processing.

ACC sulcus and learning about your actions
[Figure: reward-history weights (β), trials i-1 to i-8 into the past, CON vs ACCs; Kennerley et al., Nature Neuroscience, 2006]

What determines the integration length?
[Figure: reward-history weights (β), trials i-1 to i-8 into the past; Kennerley et al., Nature Neuroscience, 2006; Sugrue et al., Science, 2005]
VOLATILE: reward probabilities change approximately every 25 trials.
STABLE: reward probabilities change only after hundreds of trials.

Reinforcement learning
• We need to continually re-appraise the value of an action based on each new experience.
• Updating beliefs on the basis of new information: the outcome rt and the current prediction Vt give a new prediction Vt+1:
  Vt+1 = Vt + α (rt − Vt)
• The prediction error (rt − Vt) is the information available from this event.
• The learning rate (α) is the weight given to the current information.

The learning rate and the value of information
Vt+1 = Vt + α (rt − Vt)
The learning rate should represent the value of the current information for guiding future beliefs.

Relationship with integration length
[Figure: delta-rule predictions with learning rates α = 0.01, 0.1 and 0.4 on a 37% / 63% reward schedule, stable block]
Behrens et al., Nature Neuroscience, 2007

Vt+1 = Vt + α (rt − Vt)
[Figure: model estimates across the task - changes in reward estimates occur throughout the task, as do changes in volatility estimates]
Behrens, Woolrich, Walton & Rushworth, Nature Neuroscience, 2007

[Schematic: a monitor-decide loop in which the estimated volatility sets the learning rate used to update predictions]
Behrens et al., Nature Neuroscience, 2007

ACC effect size predicts learning rate across subjects.
Behrens, Woolrich, Walton & Rushworth, Nature Neuroscience, 2007
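A minimal sketch of the delta rule discussed above (illustration only: the schedule parameters and variable names are mine, not taken from the papers). It tracks a volatile reward schedule with three fixed learning rates and shows why the learning rate sets the integration length: with a small α the estimate barely moves within a 25-trial block, while a large α follows recent outcomes closely.

```python
import numpy as np

def delta_rule(outcomes, alpha, v0=0.5):
    """Delta rule: V(t+1) = V(t) + alpha * (r(t) - V(t))."""
    v = np.empty(len(outcomes) + 1)
    v[0] = v0
    for t, r in enumerate(outcomes):
        v[t + 1] = v[t] + alpha * (r - v[t])  # prediction error weighted by the learning rate
    return v

rng = np.random.default_rng(0)

# Volatile schedule: the reward probability of one option flips between 0.75 and 0.25
# every 25 trials (illustrative numbers, not the schedule used in the experiments).
p_true = np.tile(np.repeat([0.75, 0.25], 25), 4)          # 200 trials
outcomes = (rng.random(p_true.size) < p_true).astype(float)

for alpha in (0.01, 0.1, 0.4):
    v = delta_rule(outcomes, alpha)
    # A small alpha integrates over many past trials and lags behind the changes;
    # a large alpha weights recent outcomes heavily and tracks the changes quickly.
    print(f"alpha = {alpha:4.2f}   mean |V - p_true| = {np.abs(v[1:] - p_true).mean():.3f}")
```

The weight the delta rule places on an outcome k trials in the past is α(1 − α)^k, so the effective integration window is roughly 1/α trials. This is the link between the learning rate and the reward-history weights in the Kennerley et al. figures, and it is why a volatile environment calls for a larger learning rate than a stable one.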
ACC gyrus and learning about your social partners

Interest in other individuals is reduced after an ACC gyrus (ACCg) lesion.
Rudebeck et al., Science, 2006

Learning about other agents
[Task schematic: probabilistic reward (e.g. 37% / 63%) plus advice from a confederate]
Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

Sources of information
• Probability that the correct colour is blue
• Probability that the confederate's advice is good
• Value of the action information
• Value of the social information
Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

Social information is integrated over time - behaviour
Vt+1 = Vt + α (rt − Vt)
[fMRI time course: effect size of the reward prediction error (reward − expectation) locked to outcome time]
Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

Prediction error on a social partner
The same delta rule is applied to the confederate; the prediction error is lie event − lie prediction.
[fMRI time course: effect size of the social prediction error (lie event − lie prediction) locked to outcome time]
Behrens, Hunt, Woolrich & Rushworth, Nature, 2008

The value of information and the ACC
• Value of reward information
• Value of social information

Combining information to drive behaviour (a worked sketch follows after the acknowledgments)

Conclusions
• ACC codes a learning signal when information is observed.
• This signal predicts the speed of learning.
• Learning from our own and from others' actions is processed in parallel in ACCs and ACCg.
• The outputs of these parallel learning processes are combined to drive behaviour.

Acknowledgments
• Matthew Rushworth
• Mark Woolrich
• Laurence Hunt
• Mark Walton
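As a closing aside on the "Combining information to drive behaviour" slide: one simple way to merge the two learned quantities (the reward-history estimate that blue is correct and the estimate that the confederate's advice is reliable) is a Bayesian product rule. This is an illustrative sketch under that assumption, not necessarily the combination used in Behrens, Hunt, Woolrich & Rushworth (2008); the function and argument names are hypothetical.

```python
def combine(p_blue, p_advice_good, advice_is_blue=True):
    """Combine a reward-history estimate with a confederate's advice via Bayes' rule.

    p_blue         -- current estimate that the blue option is correct (from reward history)
    p_advice_good  -- current estimate that the confederate's advice is reliable
    advice_is_blue -- True if the confederate recommends blue on this trial
    """
    if advice_is_blue:
        like_blue, like_green = p_advice_good, 1.0 - p_advice_good
    else:
        like_blue, like_green = 1.0 - p_advice_good, p_advice_good
    # Posterior probability that blue is correct, given both sources of information.
    return p_blue * like_blue / (p_blue * like_blue + (1.0 - p_blue) * like_green)

# Example: reward history mildly favours blue, but a fairly reliable confederate
# recommends the other option.
print(combine(0.63, 0.75, advice_is_blue=False))
```

With combine(0.63, 0.75, advice_is_blue=False), the combined probability that blue is correct falls to about 0.36, so under this rule a sufficiently reliable confederate can overturn the reward-history preference.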