Lecture notes

Learning Rules 1
Computational Neuroscience 03
Lecture 8
Last week showed how to model neurons and networks of
neurons using firing rate models.
dv
r
 v  F ( w.r )
dt
And we then discussed how to add them together to form networks
of neurons
Feedforward and Recurrent networks
Where the weight vector is replaced by a matrix. Also often replace
feedforward input with a vector
dv
r
 v  F (W r  M r )  v  F (h  M r )
dt
Recurrent networks can also do this but have much more complex
dynamics than feedforward nets. Also more difficult to analyse
Much analysis focuses on looking at the eigenvectors of the matrix M
Can show for instance that networks can exhibit selective
amplification if there is one dominant eigenvector (cf PCA)
Or if an eigenvalue is exactly equal to 1 and others < 1can get
integration of inputs and therefore persistent activity as activity does
not stop when input stops
But how are we to generate such precise weight changes? Need
some synaptic modification rules
Hebb’s postulate
Hebb's postulate of learning (or simply Hebb's rule) (1949), is
the following:
"When an axon of cell A is near enough to excite cell B and
repeatedly or persistently takes part in firing it, some growth
processes or metabolic changes take place in one or both cells
such that A's efficiency as one of the cells firing B, is increased".
This rule forms the basis of much of the research on role of
synaptic plasticity in memory and learning
Has been generalised to include decreases of strength when
neuron A repeatedly fails to be involved in activation of B and
generally look at the correlation or covariance of activities of
pre-and postsynaptic neurons
Learning in the hippocampus
Here see examples of long-term potentiation and depression
(LTP and LTD).
High frequency stimulation induces LTP while long-lasting low
frequency leads to LTD
‘Long Term’ refers to changes > 10 mins
Learning Rules
We will consider 3 types of learning and will focus on Hebbian type
learning though others exist (modifcations on pre/postsynaptic
activity only etc)
• Unsupervised learning: Network responds during training
solely as a result of its connections and intrinsic dynamics. Net
self-organises in a manner dependent on inputs and synaptic
plasticity rule
• Supervised learning: Here the network also has a “teacher” in
the form of a set of desired ‘target’ outputs for each input. Not
especially biologically plausible, but good for existence proofs
• Reinforcement learning: Somewhere in between the 2. The net
does not know the target output but gets feedback via reward
punishment
Unsupervised Learning
dv
r
 v  w.u
dt
Start with single postsynaptic neuron and a linear activation
function. As synaptic changes will be much longer time scale than
these dynamics firing rate equation reduces to:
Nu
v  w.u   wb ub
b 1
Start with simplest Hebbian style plasticity for a single neuron:
w
dw
 vu
dt
As u, v denote firing rates vu can be interpreted as a probability that
pre- and post fire together.
Each different set of u is known as an input pattern. As weight
changes are slow, rather than summing all changes separately can
average the input patterns and thus compute the average change.
To do this use < > to denote averages over the ensemble of input
patterns. Thus get:
w
dw
v u
dt
Remembering that v = w.u this gives the correlation-based rule
dw
w
 Q.w
dt
Nu
dwb
or :  w
  Qbb' wb '
dt
b '1
Where Q is correlation matrix: Qbb’ = <ubub’>
eg for 2 patterns u1 and u2, Qij = ½(u1i u1j + u2i u2j)
So suppose
u1=
(1, 0) and
u2=
(0, 1) then
 0.5 0 

Q  
 0 0.5 
Alternatively u1= (1, 1) and u2= (1, 0) then Q   1 0.5 
 0.5 0.5 


Suppose w=10 and w=(0.1,0.1) and we have the 2nd matrix
then:
w 
1
w
 1
Q w  0.1
 0.5

0.5  0.1  0.015 









0.5  0.1  0.01 

As wn+1= wn + w, w1=(0.115, 0.11), w2=(0.13, 0.12) ….
Which leads to instability and uncontrolled growth of w
W=(0.1, 0.1)
W=(0.1, -0.3)
Outcome dependent on
eiegenvectors of Q and initial
conditions, but unstable
To avoid unbounded growth can impose a saturation constraint.
However, this means all weights go to max or min and thus we
have no competition between different synapses
This means that the neuron cannot distinguish between presynaptic
inputs
Also since u, v are firing rates and therefore positive, Hebb rule
only describes LTP.
However, earlier figure showed that synapses can depress in
strength if presynaptic activity is accompanied by a low level of
postsynaptic activity
Can also get results where opposite is true
Analysis focuses on looking at eigenvectors of Q the correlation
matrix of the input vectors
For instance, can show that after training Hebbian rule leads to v a
e1.u for arbitary vector u and that weight vector expressed as a sum
of eigenvectors is dominated by e1 ie w a e1
That is v is projection of u onto the principal eigenvector of Q
Eg for a Q with principal
eigenvector (1, -1)/sqrt(2) we
would expect w to end up as
(wmax,0) or (0, wmax).
However because of saturation
constraints can get (wmax, wmax)
Covariance rule
To get LTD can introduce a postsynaptic or presynaptic threshold:
dw
w
 (v   v )u
dt
dw
w
 v (u   u )
dt
Below thresholds get depression, above potentiation. A convenient
choice for the thresholds is the average pre/post synaptic input <u>
or <v>. If we now replace v with w.u we get the covariance rule
w
dw
 C .w
dt
Where C is the covariance matrix of the input data ie:
C  (u - u )(u - u )  uu  u  E[(u  u )(u  u ) ]
2
T
Note that as <v> keeps changing we need to keep updating the
postsynaptic threshold while the presynaptic one is independent of
the weights
Although both average to the same thing, they do have differences.
The postsynaptic threshold means that only modifies weights for
non-zero presynaptic activities. If v is below threshold then this
results in homosynaptic depression
Alternatively, presynaptic threshold reduces the strength of inactive
synapses for v>0: heterosynaptic depression
Although the covariance rule allows LTD it is still unstable due to
positive feedback
Also we do not have competition, but this can be introduced to
allowing threshold to slide as follows
BCM rule
As covariance rule allows LTD without postsynaptic/presynaptic
activity, Bienenstock, Cooper and Munro (82) proposeed an
alternative for which there is experimental evidence where the
postsynaptic threshold is dynamic
dw
w
 vu (v   v )
dt
This is again unstable if  is fixed.
However, if we allow the threshold to grow faster than v we get
stability. For instance use  as low pass filtered version of v2
d v

 v2  v
dt
Usually set  to be less than w so that changes in  faster than
changes in v
Now get competition between
synapses since strengthening
some synapses results in threshold
increasing meaning that it is
harder for others to be
strengthened
Synaptic normalisation
A more direct way of enforcing competition is through synaptic
normalisation
Idea is that postsynaptic neuron can only support a certain amount
of total synaptic weight so strengthening one leads to weakening
others
Can either hold the sum of weights constant if all are +ve or –ve or
can constrain the sum of squares of the weights (cf ANN network
pruning)
2 types: subtractive normalisation and multiplicative normalisation
Subtractive normalisation
dw
v( n.u )n
w
 vu 
dt
Nu
Where n is a vector of ones of length Nu so n.u is the sum of all the
inputs u.
Thus the second term is simply a vector Nu long with the same values in
ie (k, k, ….., k) whose sum over all the elements is equal to the sum
over all the elements of vu. Thus the total increase in the weights is 0
This rule must be augmented by a saturation constraint to prevent the
weights becoming negative, that if a weight becomes zero, it is not
moved downwards
Also, without upper saturation often leads to all weights bar one being
zero. Note also rule involves global knowledge of weights
Multiplicative normalisation
dw
w
 vu  av 2 w
dt
Where a is a +ve constant: known as Oja’s rule (1982)
This rule is more local than previous as it only involves the weight in
question and pre-and post synaptic activities.
However, its form is based on theoretical arguments rather than
experimental data
Previous rule was rigid as it had to be satisfied at all times whereas
this is more dynamic with |w|2 gradually relaxing to 1/ a
This induces competition as if one weight increases, the maintenance
of constant length of the weight vector forces others to decrease
Both Hebbian and Oja’s rule run for long enough generate vectors
parallel to principal eigenvector of correlation matrix as in A
This is basically principal component analaysis (PCA) which is
theoretically the optimal in terms of retaining info way to encode
high dimensional info onto lower dimensional subspace
However, B shows what happens if input vectors don’t have zero
mean (as in real systems), but this problem is alleviated by using
covariance-based rules
Timing based rules
Previous rules don’t take timing into account
Can be crucial since if pre-synaptic spike occurs after postsynaptic get
LTD rather than LTP
Therfore need to integrate over time as in the following:

dw
w
 v  ( H ( )v(t )u (t   )  H ( )v(t   )u (t )) d
0
dt
Where H is a function like the solid line in previous figure
Such functions still require saturation constraints but timing can
generate competition
Multiple Postsynaptic Neurons
Can extend the rules defined previously to nets with multiple
postsynaptic neurons. In these networks the output rates are:
v Wu  M v
Thus
v  KW u
where
K  (I  M )
And the Hebbian rule becomes:
w
dW
 vu  KWQ
dt
1
Can also use feature-based models where the net is indexed by input
features rather than buy individual inputs
Network models can have adaptive feedforward wieghts and fixed
recurrent ones, or vice versa or both layers can be adaptive
Can get competition through mainly inhibitory recurrent connections
Can then get eg self-organising maps and elastic nets where K can
have forms like: