Inferring learning rules in cortical circuits

Nicolas Brunel
Mechanisms of learning and memory in cortical circuits
• External stimuli trigger changes of neuronal activity in specific cortical circuits
• Changes of activity ⇒ changes in synaptic connectivity (synaptic/structural plasticity)
• Changes in synaptic connectivity ⇒ change the dynamics of the neural circuit
What are the rules governing synaptic plasticity?
1. Inferring plasticity rules from in vitro data
2. Inferring plasticity rules from in vivo data
3. Statistics of connectivity in networks optimizing information storage
1. Inferring plasticity rules from in vitro data
Synaptic plasticity experiments (cortical slices)
Markram et al 1997
Spike timing dependent plasticity (STDP) protocol
[Figure: PRE and POST spike trains; pairs of pre/post spikes separated by ∆t, repeated with inter-spike interval T]
• T : inter-spike interval (f = 1/T : frequency)
• ∆t: difference between timings of post and pre spikes
Synaptic plasticity depends on both spike timing and firing rate
[Figure: dependence on spike timing (Bi and Poo 1998); dependence on firing rate (Sjostrom et al 2001)]
Biophysical mechanisms
• Induced by calcium entry through NMDA/VDCC channels in spine;
• Leads to cascade of processes in protein interaction networks of the PSD;
• Change properties of AMPA receptors (phosphorylation);
• Add/remove AMPA receptors on the membrane;
• Structural changes of the spine;
• Changes at pre-synaptic level (affecting probability of release)
A minimal calcium-based model
• Minimal model for calcium concentration:
$$\frac{dc}{dt} = -\frac{c}{\tau_{Ca}} + C_{pre} \sum_{pre} \delta(t - t_{pre} - D) + C_{post} \sum_{post} \delta(t - t_{post})$$
• Calcium drives a synaptic efficacy variable ρ(t) (simulated in the sketch below):
$$\tau \frac{d\rho}{dt} = -\frac{dU(\rho)}{d\rho} + \gamma_p (1 - \rho)\,\Theta\!\left(c(t) - \theta_p\right) - \gamma_d\, \rho\, \Theta\!\left(c(t) - \theta_d\right) + \mathrm{Noise}(t)$$
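A minimal sketch of this model in code, integrating both equations with Euler steps. All parameter values, and the cubic dU/dρ with minima at ρ = 0 and ρ = 1, are illustrative assumptions rather than the fitted values of Graupner and Brunel (2012); the noise term is omitted.

```python
import numpy as np

# Euler integration of the calcium-based model above.
# Parameters are illustrative placeholders, not fitted values; noise omitted.
tau_ca, C_pre, C_post, D = 20.0, 1.0, 2.0, 5.0   # ms; calcium dynamics
tau, gamma_p, gamma_d = 1000.0, 0.5, 0.3         # ms; efficacy dynamics
theta_p, theta_d = 1.3, 1.0                      # potentiation/depression thresholds
dt = 0.1                                         # ms; time step

def run_pair_protocol(delta_t, n_pairs=20, period=500.0, rho0=0.5):
    """Pre/post spike pairs separated by delta_t (ms); returns final rho."""
    n = int(n_pairs * period / dt)
    pre = np.arange(n_pairs) * period + 100.0       # pre-synaptic spike times
    post = pre + delta_t                            # post-synaptic spike times
    pulses = np.zeros(n)
    pulses[((pre + D) / dt).astype(int)] += C_pre   # pre calcium acts after delay D
    pulses[(post / dt).astype(int)] += C_post
    # exponentially filtered calcium transient
    c = np.zeros(n)
    decay = np.exp(-dt / tau_ca)
    for i in range(1, n):
        c[i] = c[i - 1] * decay + pulses[i]
    # efficacy: assumed double-well potential with minima at rho = 0 and rho = 1
    rho = rho0
    for ci in c:
        dU = rho * (1.0 - rho) * (0.5 - rho)        # dU/drho (assumed cubic)
        rho += dt / tau * (-dU
                           + gamma_p * (1.0 - rho) * (ci > theta_p)
                           - gamma_d * rho * (ci > theta_d))
    return rho

for delta_t in (-20.0, -10.0, 10.0, 20.0):
    print(f"dt = {delta_t:+.0f} ms  ->  final rho = {run_pair_protocol(delta_t):.3f}")
```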
Dependence on spike timing - diversity of STDP curves
Dependence on firing rate
[Figure: plasticity vs firing rate, for periodic and Poisson spike trains; data from Sjostrom et al 2001]
Predictions
• Model predicts that shape of STDP curve should depend strongly on extracellular calcium concentration;
• Reconciles conflicting studies in hippocampus:
– Wittenberg and Wang (2006): [Ca2+] = 2 mM
– Campanac and Debanne (2008): [Ca2+] = 3 mM
• In vivo, [Ca2+] ∼ 1.5 mM or lower ⇒ in the model, yields no synaptic change for the standard low-frequency STDP protocol, for any ∆t.
2. Inferring learning rules from in vivo data
Inferring learning rules from in vivo data
• How does neuronal activity change as an initially novel stimulus progressively becomes familiar?
• On average, visual responses decrease with familiarity (Baylis and Rolls 1987; Li, Miller and Desimone 1993)
• Selectivity increases with familiarity (Freedman et al 2006; Woloszyn and Sheinberg 2012)
ITC data - responses to ‘best’ stimuli
• In each session, use 125 novel stimuli and 125 familiar stimuli
• Rank visual responses separately for the two stimulus sets
• ‘Best’ familiar stimulus elicits a stronger response than ‘best’ novel stimulus in ‘putative excitatory’ neurons (broad spikes)
Woloszyn and Sheinberg (2012)
How do distributions of firing rates evolve with learning?
• Take a rate model, with N ≫ 1 neurons described by a firing rate ri ;
• Total input to neuron i:
$$h_i = I_i + \frac{1}{N} \sum_j w_{ij}\, r_j$$
• Firing rate ri = Φ(hi )
• When a novel stimulus is shown, ri = vi where vi is drawn from Pnov (v)
• Induces changes in synaptic connectivity:
$$w_{ij} \to w_{ij} + \Delta w(v_i, v_j)$$
• We assume ∆w(vi , vj ) = f (vi )g(vj )
• What is the new distribution of rates for the (now familiar) stimulus?
How do distributions of rates evolve with learning?
• Firing rate for the (now familiar) stimulus: ri = vi + ∆vi
• Change in total input due to learning:
$$\Delta h_i = \frac{1}{N} \sum_j \Delta w_{ij}\, v_j + \frac{1}{N} \sum_j w_{ij}\, \Delta v_j + O(\Delta^2) = \underbrace{f(v_i)\,\overline{g(v)\,v}}_{\text{Neuron-specific}} + \underbrace{\bar{w}\,\overline{\Delta v}}_{\text{Global}}$$
• Change in rate of neuron i:
$$\Delta v_i = \Phi'_i \left( \bar{w}\,\overline{\Delta v} + f(v_i)\,\overline{g(v)\,v} \right)$$
• Mean change in rate (checked numerically in the sketch below):
$$\overline{\Delta v} = \frac{\overline{\Phi' f(v)}\;\overline{g(v)\,v}}{1 - \overline{\Phi'}\,\bar{w}}$$
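Because the global term contains the mean rate change itself, the last formula is a self-consistent solution. It can be checked in a toy rate network; in the sketch below, Φ, f, g, the weight statistics and the perturbation scale ε are all assumptions for illustration.

```python
import numpy as np

# Numerical check of the mean-rate-change formula in a toy rate network.
# Phi, f, g and all parameters below are illustrative assumptions.
rng = np.random.default_rng(1)
N, eps = 2000, 0.05
Phi = lambda h: np.log1p(np.exp(h))        # softplus transfer function
dPhi = lambda h: 1.0 / (1.0 + np.exp(-h))  # its derivative
f = lambda v: v - 1.0                      # post-synaptic dependence (assumed)
g = lambda v: v                            # pre-synaptic dependence (assumed)

W = rng.normal(0.5, 0.2, (N, N))           # recurrent weights, O(1) entries
I = rng.normal(0.0, 1.0, N)                # external input for one stimulus

def fixed_point(W, I, r0, iters=500):
    """Damped iteration of r = Phi(I + W r / N)."""
    r = r0.copy()
    for _ in range(iters):
        r = 0.9 * r + 0.1 * Phi(I + W @ r / N)
    return r

v = fixed_point(W, I, np.ones(N))          # rates, novel stimulus
dW = eps * np.outer(f(v), g(v))            # Delta w_ij = eps f(v_i) g(v_j)
v_fam = fixed_point(W + dW, I, v)          # rates, familiar stimulus

h = I + W @ v / N
pred = eps * (dPhi(h) * f(v)).mean() * (g(v) * v).mean() \
       / (1.0 - dPhi(h).mean() * W.mean())
print("simulated mean rate change:", (v_fam - v).mean())
print("predicted mean rate change:", pred)
```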
Inferring learning rule from distribution of rates?
1. Infer transfer function Φ, from
   • Empirical distribution of rates for novel stimuli;
   • Assumption 1: Gaussian distribution of inputs for novel stimuli
2. Infer ∆hi as a function of vi , from
   • Distribution of inputs for familiar stimuli, obtained from the empirical distribution of rates for familiar stimuli + Φ;
   • Assumption 2: Ranks are preserved through learning
3. Infer f (v) as a function of v (sketched below), using
$$\Delta h_i = f(v_i)\,\overline{g(v)\,v} + \bar{w}\,\overline{\Delta v}$$
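A quantile-matching sketch of these three steps on synthetic data. The two lognormal samples stand in for the recorded rate distributions; everything else follows the two assumptions above.

```python
import numpy as np
from statistics import NormalDist

# Quantile-matching implementation of steps 1-3 above.
# The lognormal samples are placeholders for the recorded distributions.
rng = np.random.default_rng(0)
rates_novel = rng.lognormal(1.0, 0.8, 5000)      # stand-in for novel-stimulus rates
rates_familiar = rng.lognormal(0.9, 1.0, 5000)   # stand-in for familiar-stimulus rates

q = np.arange(1, 100) / 100.0                    # quantile grid
z = np.array([NormalDist().inv_cdf(float(p)) for p in q])  # Gaussian input quantiles

# Step 1: transfer function Phi by quantile matching
# (Assumption 1: standard Gaussian inputs for novel stimuli): Phi(z[k]) = v_nov[k]
v_nov = np.quantile(rates_novel, q)

# Step 2: inputs for familiar stimuli, inverting Phi numerically
# (Assumption 2: learning preserves the rank order of responses)
v_fam = np.quantile(rates_familiar, q)
h_fam = np.interp(v_fam, v_nov, z)               # Phi^{-1}(v_fam), clamped at range ends
delta_h = h_fam - z                              # change in input per quantile

# Step 3: delta_h as a function of the novel rate gives f(v), up to the
# global offset w*mean(delta_v) and the overall scale mean(g(v) v)
for vi, dh in zip(v_nov[::24], delta_h[::24]):
    print(f"v = {vi:7.2f}   delta_h = {dh:+.3f}")
```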
Learning rules of individual IT neurons
A BCM rule in IT cortex?
• Threshold correlated with both mean rate and variance
• Consistent with Bienenstock-Cooper-Munro (1982) rule
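For reference, a minimal numerical illustration of a BCM-type rate dependence: f(v) = v(v − θ), depressing below a sliding threshold and potentiating above it. With θ = ⟨v²⟩/⟨v⟩ (one common convention, assumed here), the threshold grows with both the mean rate and the variance of the rates.

```python
import numpy as np

# BCM-style rate dependence f(v) = v (v - theta), with a sliding
# threshold theta = <v^2>/<v> (one common convention, assumed here).
rng = np.random.default_rng(2)
rates = rng.lognormal(1.0, 0.5, 1000)        # stand-in rate sample

theta = np.mean(rates**2) / np.mean(rates)   # sliding threshold
f = lambda v: v * (v - theta)                # depression below, potentiation above

print("threshold:", theta)
print("f at 1 Hz :", f(1.0))                 # below threshold -> depression
print("f at 10 Hz:", f(10.0))                # above threshold -> potentiation
```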
A network model with the rule inferred from the data
• Network subjected to a long stream of uncorrelated inputs, drawn randomly from the empirical distribution;
• Implements the rule inferred from the data, at excitatory-to-excitatory synapses only;
• No plasticity in inhibitory neurons;
• Add normalization of weights (see the sketch below)
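A minimal sketch of this kind of simulation, in a purely excitatory toy network with a BCM-like stand-in for the inferred rule; the transfer function, learning rate, and multiplicative normalization scheme are all assumptions.

```python
import numpy as np

# Rate network driven by a stream of random patterns; Hebbian updates at
# E-to-E synapses with multiplicative weight normalization. The BCM-like
# rule, Phi, and all parameters are stand-ins for the inferred rule.
rng = np.random.default_rng(3)
N, n_stimuli, eta = 300, 100, 1e-3
Phi = lambda h: np.log1p(np.exp(h))              # transfer function (assumed)

W = np.abs(rng.normal(0.5, 0.1, (N, N)))         # excitatory weights, all >= 0
row_sum = W.sum(axis=1, keepdims=True)           # normalization target

for _ in range(n_stimuli):
    I = rng.normal(0.0, 1.0, N)                  # one random input pattern
    v = np.ones(N)
    for _ in range(100):                         # relax to the fixed point
        v = 0.9 * v + 0.1 * Phi(I + W @ v / N)
    theta = np.mean(v**2) / np.mean(v)           # sliding BCM threshold
    W += eta * np.outer(v * (v - theta), v)      # Delta w_ij = f(v_i) g(v_j)
    W = np.clip(W, 0.0, None)                    # weights stay excitatory
    W *= row_sum / W.sum(axis=1, keepdims=True)  # multiplicative normalization

print("mean weight:", W.mean())
print("mean rate on last stimulus:", v.mean())
```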
Conclusions
• Inferred post-synaptic dependence of the learning rule from in vivo data
• Firing-rate dependence is consistent with the BCM rule, and also with the calcium-based rule, both derived from in vitro data
• Data consistent with Hebbian plasticity in E neurons, no plasticity in I neurons;
• Learning rule inferred from data leads to stable learning of a stream of random input patterns in a large network simulation
• Distribution of rates in simulated networks quantitatively reproduces the in vivo data
3. Statistics of connectivity in networks optimizing information storage
Attractor network model
• Fully connected network of N ≫ 1 binary neurons;
• Goal: store a large number (p ≡ αN ) of fixed point attractor states (stable representations of external stimuli)
• Each attractor state: random binary pattern with coding level f
• Learning in E-E synaptic weights (sign constraint)
• Robustness level κ (measures size of basin of attraction of each attractor)
Gardner approach
• Subspace of solutions to the learning problem in w space (see the perceptron sketch below):
$$\vec{w}_i \cdot \vec{\xi}^{\,\mu} > \theta + \kappa \quad \text{if} \quad \xi_i^{\mu} = 1$$
$$\vec{w}_i \cdot \vec{\xi}^{\,\mu} < \theta - \kappa \quad \text{if} \quad \xi_i^{\mu} = 0$$
• The volume of this subspace is:
$$V = \int d\mu(\vec{w}_i) \prod_{\mu=1}^{p} \Theta\!\left[ \left(2\xi_i^{\mu} - 1\right)\left( \vec{w}_i \cdot \vec{\xi}^{\,\mu} - \theta \right) - \kappa \right]$$
• Compute ‘typical’ volume of the subspace using methods from statistical physics (replica or cavity method);
• Storage capacity obtained when volume goes to zero;
• Compute the distribution of weights in the subspace.
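The constraint-satisfaction problem for a single neuron i can also be explored directly with a perceptron learning rule under the sign constraint w ≥ 0. This is a numerical sketch of the storage problem, not the replica calculation; N, α, f, θ, κ and η below are illustrative choices.

```python
import numpy as np

# Perceptron learning under the margin constraints above, with the sign
# constraint w >= 0. N, alpha, f, theta, kappa, eta are illustrative.
rng = np.random.default_rng(4)
N, alpha, f, kappa, eta = 400, 0.2, 0.1, 3.0, 0.05
p = int(alpha * N)

xi = (rng.random((p, N)) < f).astype(float)   # p random patterns, coding level f
target = rng.random(p) < f                    # desired state of neuron i per pattern
w = np.ones(N)                                # non-negative weights
theta = f * N                                 # fixed threshold (assumed scale)

for epoch in range(5000):
    errors = 0
    for mu in range(p):
        h = w @ xi[mu]
        if target[mu] and h <= theta + kappa:        # should be ON with margin kappa
            w += eta * xi[mu]; errors += 1
        elif not target[mu] and h >= theta - kappa:  # should be OFF with margin kappa
            w -= eta * xi[mu]; errors += 1
        np.clip(w, 0.0, None, out=w)                 # sign constraint
    if errors == 0:
        break

print("epochs used:", epoch + 1)
print("fraction of silent synapses:", np.mean(w == 0.0))
```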
The synaptic weight distribution at maximal capacity
At maximal capacity:
$$P(w_i = W) = S\,\delta(W) + \frac{1}{\sqrt{2\pi}\,\sigma_W} \exp\left[ -\frac{1}{2}\left( \frac{W}{\sigma_W} + W_0(S) \right)^2 \right] \Theta(W)$$
• The fraction of zero-weight synapses S depends on the robustness parameter
$$\rho = \frac{\kappa}{\bar{W}\sqrt{f(1-f)N}}$$
where $\bar{W} \sim \theta/(fN)$ is the average synaptic weight
• The width of the truncated Gaussian $\sigma_W$ depends on S and $\bar{W}$.
• Large fraction of zero-weight synapses is consistent with data:
– Anatomy: nearby pyramidal cells are potentially fully connected (Kalisman et al 2005)
– Electrophysiology: nearby pyramidal cells have a connection probability of ∼ 10% (Mason et al 1991, Markram et al 1997, Sjostrom et al 2001, Holmgren et al 2003)
⇒ Large fraction of zero-weight (‘potential’ or ‘silent’) synapses.
[Figure: probability density of synaptic weights for ρ = 0, 1, 2, 3; connection probability vs robustness parameter ρ, for f = 0.5, 0.1, 0.01]
Distribution of weights: theory vs experiment
[Figure: number of synapses vs EPSP amplitude (mV), theory vs data; Sjostrom et al 2001; Song et al 2005]
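A sketch of drawing weights from the delta-plus-truncated-Gaussian distribution above, e.g. to compare histograms against data. Here S, W0 and σW are treated as free parameters; in the theory, S and W0 are linked through the robustness parameter ρ.

```python
import numpy as np

# Sample synaptic weights from the delta-plus-truncated-Gaussian form above.
# S, W0, sigma_W are free parameters here (in the theory, tied to rho).
rng = np.random.default_rng(5)
S, W0, sigma_W, n = 0.6, 0.3, 1.0, 100_000

w = np.zeros(n)                                   # delta function at W = 0
todo = np.flatnonzero(rng.random(n) >= S)         # indices of nonzero weights
while todo.size:                                  # rejection-sample the W > 0 part
    # continuous part: Gaussian with mean -W0*sigma_W, std sigma_W, truncated at 0
    draw = rng.normal(-W0 * sigma_W, sigma_W, todo.size)
    ok = draw > 0.0
    w[todo[ok]] = draw[ok]
    todo = todo[~ok]

print("fraction of silent synapses:", np.mean(w == 0.0))
print("mean nonzero weight:", w[w > 0].mean())
```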
Two-neuron connectivity in an attractor network
• Calculation of the joint distribution P (wij , wji ) using the cavity method;
• Leads to a truncated correlated 2-D Gaussian, with delta functions on the wij = 0 and wji = 0 axes (and a product of two delta functions at wij = wji = 0);
• Over-representation of bidirectionally connected pairs of neurons, compared to a random uncorrelated network with the same connection probability (see the sketch below)
[Figure: joint distribution of wij and wji ; excess of reciprocal connections vs robustness parameter ρ, for f = 0.5, 0.1, 0.01]
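A sketch of the measurement itself: count reciprocal pairs in a connectivity matrix and compare with the p² expected in an uncorrelated network with the same connection probability. The matrix below is a synthetic stand-in with a tunable symmetric component, not the optimal-storage solution.

```python
import numpy as np

# Measure over-representation of reciprocal connections relative to an
# uncorrelated network with the same connection probability. The weight
# matrix is a synthetic stand-in with a tunable symmetric component.
rng = np.random.default_rng(6)
N, p_conn, corr = 300, 0.2, 0.5            # corr = correlation of (i,j) and (j,i)

G = rng.standard_normal((N, N))
sym = (G + G.T) / np.sqrt(2.0)             # fully symmetric Gaussian part
X = np.sqrt(corr) * sym + np.sqrt(1 - corr) * rng.standard_normal((N, N))

thresh = np.quantile(X, 1.0 - p_conn)      # threshold fixing connection prob.
conn = X > thresh
np.fill_diagonal(conn, False)

p_hat = conn.sum() / (N * (N - 1))                 # measured connection prob.
p_recip = (conn & conn.T).sum() / (N * (N - 1))    # ordered reciprocal pairs
print("connection probability:", p_hat)
print("P(reciprocal) / p^2   :", p_recip / p_hat**2)   # > 1: over-representation
```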
Two-neuron connectivity: experiment vs theory
[Figure: reciprocal connection probability vs connection probability, experiment vs theory]
Bidirectional vs unidirectional connections
[Figure: probability density (1/mV) vs EPSP amplitude (mV), for bidirectional and unidirectional connections, experiment and theory]
Conclusions
• A network optimized to store a large number of attractors has
– A sparse connectivity matrix; the sparser the matrix, the more robust the network;
– Strong over-representation of bidirectional connections, compared to a random network;
– An optimal connectivity matrix approximately half-way between a fully random and a fully symmetric network
• All these features are consistent with the available statistics of connectivity in cortex
• A network optimized to store a large number of sequences has
– Again a sparse connectivity matrix;
– No over-representation of bidirectional connections
Acknowledgements
Calcium-based rule
Michael Graupner (Paris), David Higgins (Chicago), Yonatan Aljadeff (Chicago), Boris
Barbour (Paris), Dominique Debanne (Marseille)
Graupner and Brunel (PNAS 2012)
Plasticity rules from in vivo data
Sukbin Lim (Chicago), Jill McKee (Chicago), Luke Woloszyn (Brown), Yali Amit (Chicago),
David Freedman (Chicago), David Sheinberg (Brown)
Lim et al (Nature Neuroscience, in press)
Connectivity in optimal networks
Brunel (submitted, 2015)