Role of dopamine in natural reward

Role of dopamine in Pavlovian reward conditioning
Sotiris Masmanidis
May 8, 2017
Outline
1. Review of anatomy & physiology
2. Associative learning models & computational properties of dopamine neurons
3. Behavioral role of dopamine neurons
Overview of dopamine function
Role in healthy brain:
1. Cognition
2. Control of movement
3. Reward-guided learning and behavior
Role in disease:
1. Cognition: Schizophrenia, ADHD
2. Movement: Parkinson's disease
3. Reward: Addiction, Depression
Note: PubMed or DOI links to the cited papers are provided in the presenter notes.
History of dopamine
Timeline (Marsden, 2006):
• 2003: "Dopamine" the movie.
• 1990s-2000s: Electrophysiological properties of the dopamine system; discovery of reward prediction error coding; genetic tools to study dopaminergic function.
• 1960s-80s: Discovery of dopamine receptors; discovery of the principal dopaminergic pathways; implication of dopamine in reward and addiction; model of basal ganglia function (direct/indirect pathways).
• 1957-1960: Discovery of DA in the human brain; discovery of DA concentration in the striatum; discovery of DA depletion in Parkinson's disease; implication of dopamine in movement.
• 1950: Compound name coined as dopamine; synthesis of chlorpromazine (antipsychotic, D2R).
• 1910: First laboratory synthesis of 3,4-dihydroxyphenylethylamine.
Anatomy of dopaminergic systems – Cell bodies
Main dopaminergic nuclei:
1. Ventral tegmental area (VTA)
2. Substantia nigra pars compacta (SNc)
[Figure: tyrosine hydroxylase (TH) immunostaining showing the VTA and SNc; scale bar, 0.5 mm]
TH: enzyme used in dopamine synthesis.
Anatomy of dopaminergic systems - Inputs
Dopaminergic neurons receive input from both external and local excitatory
and inhibitory sources.
It is thought that the combined effect of these inputs is what gives rise to the
reward processing properties of dopaminergic neurons.
[Figure: VTA input diagram showing dopamine, glutamate, and GABA connections]
Abbreviations: mPFC, medial prefrontal cortex; VP, ventral pallidum; LHb, lateral habenula; LHT, lateral hypothalamus.
Morales & Margolis, 2017
Anatomy of dopaminergic systems - Projections
Main dopaminergic pathways:
1. Mesolimbic: VTA to nucleus accumbens (reward)
2. Mesocortical: VTA to prefrontal cortex (cognition)
3. Nigrostriatal: SNc to dorsal striatum (movement)
4. Other tracts: VTA to amygdala and hippocampus
Mesocorticolimbic pathway (Russo & Nestler, 2013)
Anatomy of dopaminergic systems - Projections
Dopaminergic neurons densely project
to the striatum
[Figure: TH immunostaining showing dense dopaminergic innervation of the striatum; scale bar, 1 mm]
Actions of dopamine on brain function
1. Acts on dopamine receptors (2 major categories).
• D1 receptor-expressing neurons increase their excitability in the presence of DA.
• D2 receptor-expressing neurons decrease their excitability in the presence of DA.
2. Modulates neuronal excitability.
• Basal ganglia output regulates cortical activity.
• Thought to be important for movement control.
3. Modulates neuronal plasticity.
• Thought to be important for reward learning.
Actions of dopamine on brain function - Excitability
Classical model of basal ganglia function:
1. Dopamine enhances direct pathway activity to promote movement.
2. Dopamine reduces indirect pathway activity to promote movement.
Parkinson’s disease: loss of DA suppresses movement by reducing
direct pathway and increasing indirect pathway activity.
Direct pathway: D1 receptor
Indirect pathway: D2 receptor
Albin & Penney, 1989.
Gerfen & Surmeier, 2011.
Actions of dopamine on brain function - Excitability
Dopamine has opposing effects on D1/D2 receptors
In addition, some dopamine neurons co-release other
neurotransmitters that regulate excitability:
•Glutamate (Stuber & Bonci 2010; Tecuapetla & Koos 2010)
•GABA (Tritsch & Sabatini 2016)
Gerfen, 2006
Actions of dopamine on brain function - Plasticity
Striatum:
1. Striatum is an important site of plasticity in reward-based learning.
2. Dopamine modulates both LTP and LTD at corticostriatal synapses.
3. LTP is thought to increase the efficiency of corticostriatal communication.
4. Higher corticostriatal communication is associated with improved motor task performance.
5. Corticostriatal communication increases with reward-based motor learning.
Glutamatergic projections to the striatum: cortex, thalamus, amygdala, hippocampus.
Dopaminergic projections to the striatum: VTA/SNc.
Reynolds & Wickens, 2001; Kreitzer & Malenka, 2008; Gerfen & Surmeier, 2011; Yin & Costa, 2009; Koralek & Carmena, 2012
Dual mechanisms for striatal dopamine release
Dopamine release in the striatum is independently controlled by:
1. VTA/SNc dopaminergic neuron activity (the familiar route).
2. Cholinergic interneurons acting on nicotinic acetylcholine
receptors on dopaminergic axon terminals.
Threlfell & Cragg, 2012
Cachope & Cheer, 2012
Mamaligas & Ford, 2016
Summary of dopaminergic system actions
 Serves to modulate both neuronal excitability and
plasticity.
 Promotes movement and reinforcement by enhancing
direct pathway and suppressing indirect pathway of
basal ganglia.
 Dopaminergic neurons are not just squeeze bottles with
dopamine: multiple neurotransmitters & release
mechanisms.
 Although much has been discovered, our
understanding of how dopamine influences brain
activity is still incomplete, and an active area of
research.
Latest research trends – From cells to systems
 New viral tracing approaches make it possible to identify
the inputs and outputs of brain regions with
unprecedented specificity and detail.
 These anatomical maps allow us to make informed
hypotheses about the function of specific brain circuits.
Map data repositories: Allen Brain Projection Atlas
Associative learning: Pavlovian conditioning
 Pairing of a conditioned stimulus (CS) with an unconditioned stimulus (US).
• CS: a sensory cue (tone, light, odor).
• US: a reinforcer (drop of water, juice, money, etc.).
 After repeated CS-US pairings, animals acquire a conditioned response (CR) upon exposure to the CS alone.
• CR: reward-anticipatory response (salivation, licking, approach).
• The presence of a CR indicates that animals have learned the CS-US association and are predicting the reward.
 In extinction, the CS is no longer followed by the US, and the CR is abolished.
Pavlovian vs operant conditioning – how distinct?
Pavlovian: stimuli allow animals to prepare for reward presented
irrespective of behavior.
Operant: stimuli elicit actions needed to obtain the reward.
There is a large amount of literature pointing to both similarities and
differences in the brain circuits mediating Pavlovian and operant learning
and behavior. In general:
•The associative learning process is thought to be mediated by dopamine in
both cases.
•However, the site of plasticity for performing Pavlovian and operant responses
is thought to often lie in different circuits (e.g., ventral and dorsal striatum).
•These distinctions can also be task-dependent (e.g., licking vs lever-pressing).
•Keep an open mind: Do not over-generalize claims that particular brain areas
do or don’t mediate Pavlovian or operant learning and behavior.
Further reading on Pavlovian conditioning: Fanselow & Wassum, 2016
Example: Pavlovian trace conditioning in mice
•Trace conditioning: delay between CS and US.
•Delay conditioning: timing of CS and US overlap.
•CS: olfactory cue (amyl acetate).
•Reward (US): drop of sweetened milk.
•Food restriction to increase motivation.
Bakhurin & Masmanidis, 2016
Advantages of trial-based learning tasks
Most of what we have learned about the computational properties of dopamine neurons over the last 25 years comes from trial-based learning tasks.
Can model how behavior will evolve over successive trials.
Can deliver precisely timed stimuli: thus can also model how behavior
will evolve as a function of time.
Can vary the dosage and probability of stimulus delivery.
Example of single-trial Pavlovian learning task: conditioned place preference
•Effective for studying the reinforcing properties of various compounds.
•Can vary dosage, but not timing or probability.
Rescorla-Wagner model of associative learning
 Used to model the strength of a CS-US association.
 Association strength is related to likelihood of executing a conditioned
response.
 Trial-based learning model (association strength gets updated on each
successive trial).
 Learning is driven by errors (discrepancy between predicted and actual
reward).
Model equation:
V(i+1) = V(i) + αβ(λ − Vtot)
where:
V(i+1) is the associative strength of a US to a specific CS on trial i+1.
V(i) is the associative strength of a US to a specific CS on trial i.
Vtot is the total associative strength of a US to all associated CS types.
α is the salience (constant from 0 to 1).
β is the learning rate (constant from 0 to 1).
λ is the maximum possible association strength to the US (λ is related to the reward value).
Matlab tutorial: Rescorla-Wagner model
1. Open Matlab → New → Script.
2. Copy the script below into the script editor → save the file → run the file.
3. Alter the initial parameters and rerun the script.
%***Initialize parameters***
alpha=0.9;  %salience (parameter from 0 to 1)
beta=0.1;   %learning rate (parameter from 0 to 1)
lambda=100; %maximum possible association strength to US (depends on reward value)
Vinit=0;    %initial cue-reward association strength (Vi=0 for naive animals)
n=100;      %number of trials
%****************************
close all
V=zeros(1,n);
V(1)=Vinit;
for i=1:(n-1)
    deltaV=alpha*beta*(lambda-V(i));
    V(i+1)=V(i)+deltaV;
end
figure(1); clf;
plot(1:n, V, '.-')
xlabel('Trial #', 'FontSize', 12)
ylabel('Association strength', 'FontSize', 12)
title('Rescorla-Wagner Model', 'FontSize', 12)
set(gca,'FontSize',12,'TickDir','out')
Rescorla-Wagner model of acquisition
Rescorla-Wagner model of extinction
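The acquisition and extinction curves illustrated on these two slides can be reproduced with the Rescorla-Wagner script above. Below is a minimal sketch of one way to do this; the specific parameter values are illustrative assumptions rather than values taken from the slides.

% Sketch: simulating acquisition and extinction with the same
% Rescorla-Wagner update as the tutorial script above.
alpha=0.9; beta=0.1; n=100;
phases = {0, 100, 'Acquisition (Vinit=0, lambda=100)'; ...
          100, 0, 'Extinction (Vinit=100, lambda=0)'};
figure; hold on
for p=1:size(phases,1)
    Vinit=phases{p,1}; lambda=phases{p,2};
    V=zeros(1,n); V(1)=Vinit;
    for i=1:(n-1)
        V(i+1)=V(i)+alpha*beta*(lambda-V(i)); %same update rule as the tutorial
    end
    plot(1:n, V, '.-', 'DisplayName', phases{p,3})
end
xlabel('Trial #'); ylabel('Association strength'); legend('show')

Note that in this model extinction is simply the same update rule run with λ = 0, which is one reason the model treats extinction as unlearning (see the caveats on the next slide).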
Pros and cons of Rescorla-Wagner model
Advantages:
 Simple, intuitive, few free parameters.
 Has made some successful predictions (e.g., blocking).
Disadvantages:
 Fails to explain some behavioral effects.
 Extinction: treats extinction as simple unlearning, whereas behavioral evidence indicates extinction is not merely the unlearning of previous associations.
 Does not treat time as a variable.
When is dopamine released in the brain?
Microelectrode recordings of dopaminergic neurons:
Dopamine neurons fire to uncued (i.e., unpredicted) rewards.
So, do dopamine neurons just signal the presence of reward?
NO!
Schultz et al., 1997
Dopamine neurons encode reward prediction error
RPE coding:
[Figure: dopamine neuron responses under positive, zero, and negative RPE conditions]
Schultz et al., 1997
How does DA RPE signal fit into Rescorla-Wagner model?
V(i+1) = V(i) + αβ(λ − Vtot)
 Dopamine's ability to modulate plasticity is qualitatively related to λ.
 The parameter λ represents the available reward value.
 Any error between the association strength and λ will lead to a change in association strength.
λ > Vinitial: positive RPE, association strength increases.
λ = Vinitial: zero RPE, no learning.
λ < Vinitial: negative RPE, association strength decreases (extinction).
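As a quick numerical illustration of these three cases, here is a minimal sketch using the same Rescorla-Wagner update as the earlier Matlab tutorial; the specific values of Vinitial are arbitrary assumptions.

% Sketch: one Rescorla-Wagner update for each RPE case. Values are illustrative.
alpha=0.9; beta=0.1; lambda=100;       %same parameter roles as in the earlier script
for Vinit = [20 100 150]               %below, equal to, and above lambda
    rpe   = lambda - Vinit;            %prediction error: +80, 0, -50
    Vnext = Vinit + alpha*beta*rpe;    %updated association strength
    fprintf('V=%3d  RPE=%4d  Vnext=%5.1f\n', Vinit, rpe, Vnext);
end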
Directly measuring dopamine release
Fast scan cyclic voltammetry (FSCV) in nucleus accumbens:
• Early learning, unpredicted reward: positive RPE.
• Late learning, predicted reward: diminished RPE.
Notice that the DA reward response is reduced but not zero. This is a fairly common observation.
Day & Carelli, 2007
What is the CS response for?
 With more learning, dopamine signaling shifts to coincide with the CS.
 This indicates that the CS has now acquired the ability to predict that a reward is
likely to occur at a particular time.
 This predictive property allows animals to initiate anticipatory behavioral
responses.
 Sometimes, this is taken to mean that the CS has the same hedonic value as the
actual reward, but that’s not known to be universally true.
[Figures: CS responses; Schultz et al., 1997; Day & Carelli, 2007]
Caveat 1: Striatal dopamine does not just signal RPE
In a spatial navigation task, striatal dopamine shows a ramping profile that signals proximity to reward.
Howe & Graybiel, 2013. Also see Gershman, 2014.
Caveat 2: Striatal dopamine signals are not uniform
Ventral striatal areas contain more reward-related dopamine signals, whereas dorsal areas contain more movement-related dopamine signals.
[Figure panels: dorsal, central, and ventral striatum]
Howe & Dombeck, 2016
Predictive properties of dopamine neurons
 The RPE coding properties of dopamine neurons can be modeled
using temporal-difference (TD) models of learning.
TD models of learning:
• Incorporate time as a variable within the trial, which the RW model does not (the RW model is trial-based).
• Thus, TD models can be used to predict the time of reward.
• Can be viewed as an extension of the RW model, since the timing of the reward signal influences the learning rate.
Further reading on TD models in neuroscience: Schultz et al., 1997; Suri, 2002
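For reference, the core temporal-difference update used in the upcoming Matlab tutorial can be written as follows (notation matches the code: x is the time-shifted stimulus vector, W the weight vector, γ the temporal discount factor, α the learning rate):
V(t) = Σj Wj·xj(t)
δ(t) = r(t) + γ·V(t) − V(t−1)
Wj ← Wj + α·δ(t)·xj(t−1)
A positive δ strengthens the weights that were active just before the current time bin, which is how the prediction (and the dopamine-like δ signal) gradually moves back from the time of reward toward the time of the CS.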
TD model of reward prediction error
[Figure: model vs. data comparison; Suri, 2002]
Matlab tutorial: TD model
1. Open Matlab → New → Script.
2. Copy the script below into the script editor → save the file → run the file.
3. Alter the initial parameters and rerun the script.
%****Acknowledgment: Code was adapted from David S. Touretzky (October, 1998). Original code: www.cs.cmu.edu/afs/cs/academic/class/15883-s99/handouts/td.html****
%****For explanation of model see Suri, Neural Networks 2002.
%********Set parameters******
stimtime=5;         %time of CS
rewardtime=25;      %time of US
numberofbins=30;    %number of time bins. make sure this is greater than rewardtime.
numberoftrials=50;  %number of trials.
alpha=0.9;          %learning rate (0 to 1).
gamma=0.99;         %temporal discount factor (0 to 1). More distant rewards are weighed less. default=0.99.
reset_learning='y'; %if not 'y', then will use final {W, delta, and V} values from the last run.
%*****************************
stim=zeros(numberofbins,1);
reward=zeros(numberofbins,1);
stim(stimtime)=1;     %defines stimulus vector (value of 1 at stimtime). For surprise reward, set this value to zero.
reward(rewardtime)=1; %defines reward vector (value of 1 at rewardtime). For extinction, set this value to zero.
if reset_learning=='y'
    W=zeros(numberofbins,1); %predictive synaptic weight of each time bin. Initially all weights are zero, but they get updated every time bin.
    delta=zeros(numberoftrials,numberofbins);
    V=zeros(numberoftrials,numberofbins);
else
    %use final {W, delta, and V} values from the last run.
    delta=delta(numberoftrials,:);
    V=V(numberoftrials,:);
end
Vij=V(1,1); %prediction of cue x at time t. initial value=0.
for i=1:numberoftrials
    x=zeros(numberofbins,1); %Initialize time-shifted stimulus vector.
    for j=stimtime:numberofbins %start at j=stimtime, but could also set initial j=1; result is the same.
        x_prev=x; %note that the vector x is zero for j = 1 through stimtime-1, thus as expected there is no predictive value for times before stimtime.
        Vij_prev=Vij;
        %Generate new time-shifted stimulus vector:
        x(2:end)=x(1:(end-1)); %vector shifts forward by one time bin (the value 1 moves forward in time).
        x(1)=stim(j); %assigns first element of x to correspond to the stimulus value at time bin j. x has the value 1 in one time bin.
        R=reward(j);  %R will be 1 at j=rewardtime, 0 at all other times. note that when j=rewardtime, x will have the value 1 at time rewardtime-stimtime
        %TD learning rules:
        Vij=sum(W.*x); %updated prediction of current time bin.
        deltaij=R+gamma*Vij-Vij_prev; %prediction error at current time bin. On the first trial that R=1, deltaij=1 at j=rewardtime (strong positive RPE).
        V(i,j)=Vij;
        delta(i,j)=deltaij;
        %Update the synaptic weight vector for the next time bin:
        W=W+alpha*deltaij*x_prev;
        %With successive iterations W increases with time from stimtime and reaches a peak value at t=rewardtime-stimtime.
        %A key property is that W becomes nonzero for times earlier than reward (because of the term x_prev).
    end
end
figure(1); clf;
subplot(2,1,1)
plot(1:numberofbins,V'); xlabel('Time bin #','FontSize',12); ylabel('Prediction','FontSize',12); set(gca,'FontSize',12,'TickDir','out')
subplot(2,1,2)
plot(1:numberofbins,delta'); xlabel('Time bin #','FontSize',12); ylabel('Prediction Error','FontSize',12); set(gca,'FontSize',12,'TickDir','out')
Matlab: TD model of positive RPE
Initially: the CS has no predictive value.
After learning: the reward prediction increases with time during the trial and reaches its maximum just before the time of reward.
Matlab: TD model of negative RPE
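One way to reproduce a negative-RPE simulation with the TD script on the earlier tutorial slide is sketched below. It follows the script's own comments; the exact two-step procedure is an illustration rather than something taken from the slides.

% Step 1: run the TD script once with its default settings, so that the weight
%         vector W is learned for the rewarded CS.
% Step 2: edit these two lines of the script and run it again:
reset_learning='n';   %keep W, delta, and V from the previous run instead of resetting
reward(rewardtime)=0; %omit the US ("For extinction, set this value to zero")
% On the rerun, delta dips below zero around the expected reward time (a negative
% prediction error), and the reward prediction is gradually extinguished over trials.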
How do dopamine neurons compute RPE?
Hypothesis:
•The combined effect of inputs allows dopaminergic neurons to
compute RPE signals.
Unknown:
•What kind of information is provided by each input.
[Figure: VTA input diagram (dopamine, glutamate, and GABA inputs)]
Morales & Margolis, 2017
Theoretical model of how dopamine neurons compute RPE
Keiflin & Janak, 2015
VTA GABAergic neurons inhibit dopamine neurons
Because of their diverse receptor expression and strong inhibitory
influence over DA neurons, much attention has focused on local
GABAergic neurons.
Relevance to addiction (see paper below)
These cells are projection neurons so they don’t just couple to DA
neurons.
VTA GABAergic neurons encode reward prediction
In order to compute RPE, the brain also needs to compute the reward prediction (RP).
[Figure: firing rates of GABAergic and dopaminergic neurons]
Cohen & Uchida, 2012
From correlative to causal analysis approaches
Neural recordings provide a correlative link between brain
activity and computation, but ultimately, we want to establish a
causal relationship.
•Correlative: GABAergic neurons encode reward prediction
signals, which may be necessary for DA neurons to encode
RPE.
•Causal: Inhibiting GABAergic neurons alters RPE encoding in
DA neurons.
Establishing causality requires experiments involving loss-of-function or gain-of-function manipulations of specific brain circuits.
•Approaches: pharmacology, gene knockout/rescue, lesions, optogenetics, chemogenetics.
Targeting genetically defined cell types with
Cre-Lox recombination
Cardin & Moore, 2010
Viral vectors for spatially confined cell targeting
1. Choose the region and cell type of interest.
2. Identify genes that are selectively expressed in that cell, but not other neighboring cells.
3. Obtain animal strains selectively expressing Cre recombinase in the gene of interest (e.g., VGAT-Cre or TH-Cre).
4. Inject a Cre-dependent virus in the region of interest.
   • Example: AAV-Flex-ChR2-YFP
   • There are several viral types with different expression properties.
5. Histologically confirm that the YFP reporter is selectively expressed in the cells of interest.
[Figure: VGAT and TH expression maps; Allen Brain Atlas]
Be aware of potential pitfalls
Cre expression is not perfectly confined to the cell type of interest; if the
selectivity is poor this can limit how the results are interpreted.
Optogenetic manipulation of GABAergic neurons
alters dopamine RPE signals
Activating GABAergic neurons reduces dopamine RPE signal.
Inhibiting GABAergic neurons increases dopamine RPE signal.
[Figure: GABAergic neurons activated with ChR2 or inhibited with Arch, and the resulting dopamine neuron responses]
Eshel & Uchida, 2015
External sources of dopaminergic input – Frontal cortex
Lesioning orbitofrontal cortex reduces magnitude of positive and
negative reward prediction error signals encoded by dopamine neurons.
Takahashi & Schoenbaum, 2011
External sources of input - Habenula
Lesioning habenula selectively attenuates the negative RPE signal.
Reward omission trials:
Tian & Uchida, 2015
Summary of dopamine computational properties
 Encode reward prediction error (RPE).
 Rescorla-Wagner and TD models can be used to simulate
RPE signals.
 Higher than expected reward (positive RPE) promotes
stronger cue-reward associations.
 Lower than expected reward (negative RPE) promotes
extinction.
 The computational properties of dopamine neurons are
thought to be generated by signals from functionally
diverse inputs.
Latest research trends – embracing diversity
 Increasingly, studies show that dopamine signals in the
brain are heterogeneous, suggesting a diverse set of
functions beyond just reward learning.
 The diversity appears to be largely related to anatomy, and
thus, new brain maps greatly help us understand the
organizational principles of dopamine functional diversity.
Behavioral role of dopamine in learning
We will briefly sample the dopamine literature
relying on the following approaches:
1. Pharmacology
2. Genetic models (dopamine KO mice)
3. Optogenetics
DA receptor blockade, or DA pathway lesions, impair
Pavlovian learning
 There is extensive literature on these effects, dating back to the 1980s. Rather than covering a specific paper, I refer anyone interested in reading further to the following review papers:
 Caveat of pharmacology: lacks temporal specificity.
Pavlovian learning deficits in TH knockout mice
lacking dopamine
Darvas & Palmiter, 2014
Selectively restoring striatal dopamine rescues
Pavlovian learning
 Ventral striatal dopamine is necessary for Pavlovian reward learning, and restoring dopamine in that region is sufficient to rescue learning.
 Caveat: lacks temporal specificity.
Darvas & Palmiter, 2014
Optogenetically mimicking a positive RPE
signal drives associative learning
Used a blocking task.
Provided a well-timed laser stimulus paired with reward.
Unpaired laser had no effect on behavior.
Steinberg & Janak, 2013
Keiflin & Janak, 2015
Optogenetically mimicking a negative RPE
signal drives extinction
Used an over-expectation task.
Provided a well-timed laser stimulus coinciding with
expected reward.
Together with the Steinberg & Janak paper, these results
show that dopamine RPE signals bidirectionally control
learning.
Summary – Some general principles emerging
from optogenetic manipulations
Activation of VTA dopamine neurons drives stronger
learning (or is appetitive).
Inhibition of VTA dopamine neurons drives weaker learning
(or is aversive).
Inputs that increase DA activity thus tend to drive stronger
learning (or are appetitive).
Inputs that decrease DA activity thus tend to drive weaker
learning (or are aversive).
VTA preferentially controls reward learning, while the SNc
preferentially controls movement.
See “Other Recent Literature” slide at the end of this presentation.
Sufficiency versus Necessity in Optogenetics
 When reading the literature (or designing your own study), it is
important to distinguish between two fundamentally different
types of experiments:
 Test for sufficiency of a brain circuit in behavior.
 Test for necessity of a brain circuit in behavior.
 One does not imply the other.
 Sufficiency is often tested via optogenetic activation.
 Necessity is often tested via optogenetic inhibition.
 Be aware of potential caveats. For example, activation may not be a pure gain-of-function experiment, and inhibition may not be a pure loss-of-function experiment.
 For a thoughtful review of caveats, see Allen & Boyden, 2015.
 Bottom line: there are pros and cons to using optogenetics, and if there is uncertainty in the interpretation of the results, other methods (e.g., knockout/rescue, chemogenetics) could be used to confirm the results.
Another thing about sufficiency
 If a study concludes that a particular circuit is sufficient to
drive a certain behavior, that does not mean that other
circuits are not involved in the behavior.
Latest research trends – Multiple circuits and
functions
 There is major interest in understanding how dopamine
signaling is orchestrated with other neuromodulatory &
neurotransmitter signaling mechanisms in different brain areas
to mediate associative learning, as well as other behavioral
functions.
 More precise cell mapping and targeting approaches allow us to
dissect these functions in genetically and anatomically defined
circuits.
Russo & Nestler, 2013
Other Recent Literature
(this is not meant to be a complete list)
Cholinergic inputs to VTA and SNc:
Xiao & Gradinaru, 2016.
Cholinergic Mesopontine Signals Govern Locomotion and Reward through Dissociable
Midbrain Pathways
Hypothalamic inputs to VTA:
Nieh & Tye, 2015.
Decoding neural circuits that control compulsive sucrose seeking
Excitatory and inhibitory inputs to VTA:
Lammel & Malenka, 2012.
Input-specific control of reward and aversion in the ventral tegmental area
VTA dopamine neuron inhibition by local GABAergic neurons:
Tan & Lüscher, 2012.
GABA neurons of the VTA drive conditioned place aversion