
ARTICLE
Communicated by Jochen Triesch
Supervised Spike-Timing-Dependent Plasticity:
A Spatiotemporal Neuronal Learning Rule for Function
Approximation and Decisions
Jan-Moritz P. Franosch
[email protected]
Google Switzerland GmbH, 8002 Zurich, Switzerland
Sebastian Urban
[email protected]
J. Leo van Hemmen
[email protected]
Physik Department, Technische Universität München, 85748 Garching bei München, Germany

Neural Computation 25, 3113–3130 (2013)
doi:10.1162/NECO_a_00520
© 2013 Massachusetts Institute of Technology
How can an animal learn from experience? How can it train sensors, such
as the auditory or tactile system, based on other sensory input such as
the visual system? Supervised spike-timing-dependent plasticity (supervised STDP) is a possible answer. Supervised STDP trains one modality using input from another one as "supervisor." Quite complex time-dependent relationships between the senses can be learned. Here we
prove that under very general conditions, supervised STDP converges to
a stable configuration of synaptic weights leading to a reconstruction of
primary sensory input.
1 Introduction
Humans and animals are able to use multiple senses simultaneously or
separately so as to perform a task, for example, the visual, auditory, and
proprioceptive sense to localize or grasp a target. The input to various
senses therefore has to be combined so as to form a consistent internal representation of the environment. Accordingly, the cooperation of different
senses has to be, and can be, learned (Knudsen, Knudsen, & Esterly, 1982).
Learning processes are assumed to be based on synaptic changes. An alternative mechanism for learning is modifying neuronal responsiveness rather
than synaptic weights (Swinehart & Abbott, 2005). Here, however, we restrict ourselves to learning rules that modify synaptic weights. A possible
biological mechanism is spike-timing-dependent plasticity (STDP) (Gerstner, Kempter, van Hemmen, & Wagner, 1996; Markram, Lübke, Frotscher,
& Sakmann, 1997; van Hemmen, 2001), where synapses change according
to temporal input-output correlations experienced by a neuron. STDP can
be used in unsupervised learning, for example, to attain high-precision auditory localization and form auditory maps (Kempter, Leibold, Wagner, &
van Hemmen, 2001; Leibold, Kempter, & van Hemmen, 2001).
In this article, we specify an algorithm showing how through supervised
STDP, a supervised version of STDP, a neuronal network can learn how to localize and discriminate prey that emits waves on a water surface. The prey-localization technique may consist, for instance (Franosch, Lingenheil, & van Hemmen, 2005), of space- and time-dependent measurements of water velocity on the skin of a frog such as Xenopus, which has 180 lateral-line detectors all over its body. The appropriate supervisor depends on the context.
By forcing a neuron to fire when it should according to a reference neuron
that already has the desired synaptic weights J∗ , Legenstein, Naeger, and
Maass (2005) have shown that a second neuron can “learn” J∗ by STDP
under certain conditions regarding the correlations of the input spike trains
for Poisson neurons (Kempter, Gerstner, van Hemmen, & Wagner, 1998;
van Hemmen, 2001). Supervised STDP, as we show here, converges under
far more general conditions, completely independent of the input and a
neuron’s spike generation process.
Supervised STDP uses spike times and the postsynaptic membrane potential only to determine changes to synaptic weights and is therefore biologically plausible. Moreover, it features supervised learning of time-dependent signals using time-dependent and time-delayed input signals, which distinguishes it from most classical learning rules found in the literature, such as Oja's rule (Oja, 1982), feedback-error learning (Kawato, 1990), and iso-learning (Porr & Wörgötter, 2003).
While the tempotron (Gütig & Sompolinsky, 2006) uses a voltage
convolution–based learning rule to discriminate between different spatiotemporal sequences, the supervised STDP learning rule we present here is
also voltage convolution–based (Franosch et al., 2005) but far more general.
Supervised STDP is designed not only to discriminate between different
spike sequences but even to reconstruct a teacher signal. Moreover, supervised STDP includes delayed spike times, which are necessary to learn
time-dependent tasks. A perceptron converges to stable synaptic weights
only if the input vectors are linearly separable (Hertz, Krogh, & Palmer,
1991). It is therefore not quite clear under which conditions the tempotron
converges, because the tempotron is a generalization of the perceptron.
Artificial recurrent neural networks can learn online by using local information in each neuron only (Kühn, Beyn, & Cruse, 2007). A recurrent
network is able to learn linear systems of equations and simple patterns
of activation. The network can even learn dynamics described by linear
differential equations (Kühn & Cruse, 2007). How the hypothesized artificial neurons could, however, be implemented by real biological neurons
remains unclear since the neuron required to perform the task has a rather
complex internal structure. The supervised STDP that we investigate here
is an alternative learning mechanism that we believe could be implemented
Figure 1: Circuit diagram of a reconstruction neuron (large open circle) with membrane potential $W^p$. The neuron is connected to sensors (light gray circles) measuring the time-dependent variables $y_i$ and thus generating spike trains $\tilde y_i$. The axons of the sensors cause a delay $\Delta^p_{ij}$ of the spike train $\tilde y_i$ and terminate in synapses (small black circles) with strength $J^p_{ij}$. The reconstruction neuron also gets feedback from a second system of sensors (dark gray circles) measuring the time-dependent variables $x_q$ and generating the spike trains $\tilde x_q$. The strength of the feedback is $-F(p, q)$, and the delay is $\Delta$. Although in theory we could have multiple delays for the feedback as well, the convergence proof and simulations show that a single delayed feedback is sufficient. For each position p, there is a different reconstruction neuron (the figure shows only one of them). It learns to reconstruct the stimulus $x_p$ through its membrane potential $W^p$ with the help of feedback.
in real neurons, although, to our knowledge, there is no direct biological
evidence.
2 Neuronal Hardware
The neuronal hardware needed for supervised spike-timing-dependent
plasticity (SSTDP) is very simple and quite general, and it may be found in
nature frequently. We assume that the animal has sensors (such as those in the retina, or cochlea, or the lateral line) measuring some time-dependent variables $x_p$ at positions p in the environment. For instance, those positions can be the places where visual sensors measure light intensity or, for the auditory system, where the two tympani measure sound pressure. For each position p, there is a different sensor generating the spike train $\tilde x_p$ (see Figure 1).
In addition, related to each perception system, an animal typically has at least one second system, such as a tactile or auditory system, that performs neuronal information processing based on sensory input from the outside world. The second system has yet to be trained to compute the same variables $x_p$ internally, in the brain. For that purpose, it has reconstruction neurons for each position p trying to reconstruct the original signal $x_p$. The second system gets sensory input from n sensors i, measuring signals $y_i$ and generating spike trains $\tilde y_i$. Throughout what follows, the tilde, as in $\tilde y_i$, indicates a spike train. Peripheral nerves conduct the action potentials with time delays $\Delta^p_{ik}$ to synapses with weights $J^p_{ik}$. Here the variable k numbers different branchings of the same axon and thus different synaptic weights and delay times. The synapses are connected to the reconstruction neuron responsible for position p.
To calculate the membrane potential $W^p$ at a reconstruction neuron, we denote by $t_i$ the times when a spike is generated by sensor i. In the spike response neuron model (Gerstner & van Hemmen, 1994; Kistler, Gerstner, & van Hemmen, 1997), which we adopt here for convenience, each spike generates a postsynaptic potential $J^p_{ik}\,\varepsilon$ with synaptic strength $J^p_{ik}$. The time-dependent function ε describes the form of an excitatory postsynaptic potential generated by a synapse of strength 1. For the numerical simulations,

$$\varepsilon(t) = \frac{t}{t_r}\, e^{-t/t_r} \qquad (2.1)$$

with the rise time of the response $t_r = 3$ ms.
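For a discrete-time simulation, the kernel of equation 2.1 can be tabulated once and reused for every synapse. A minimal Python/NumPy sketch; the time step and cutoff are our own choices, not specified in the article:

```python
import numpy as np

def epsilon_kernel(t_r=3.0, dt=0.1, t_max=30.0):
    """Tabulate the EPSP kernel eps(t) = (t / t_r) * exp(-t / t_r) of equation 2.1.

    t_r   -- rise time in ms (3 ms in the article's simulations)
    dt    -- time step in ms (an assumption of this sketch)
    t_max -- cutoff in ms after which the kernel is truncated
    """
    t = np.arange(0.0, t_max, dt)
    return (t / t_r) * np.exp(-t / t_r)
```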
The part of the membrane potential generated by the sensory nerves, $V^p$, reads

$$V^p(t) = \sum_{i,k} \sum_{t_i} J^p_{ik}\, \varepsilon\bigl(t - t_i - \Delta^p_{ik}\bigr). \qquad (2.2)$$

Using spike trains $\tilde y_i$ described by

$$\tilde y_i(t) = \sum_{t_i} \delta(t - t_i), \qquad (2.3)$$

we may also write, with $*$ denoting a convolution,

$$V^p(t) = \sum_{i,k} J^p_{ik}\, (\tilde y_i * \varepsilon)\bigl(t - \Delta^p_{ik}\bigr). \qquad (2.4)$$
The reconstruction neurons get feedback signals from the sensors directly measuring the stimulus $x_p$ or from more precise modalities such as—most of the time but not always—vision (Bürck, Friedel, Sichert, Vossen, & van Hemmen, 2010); here the Tectum opticum plays a key role. Let F(p, q) be the synaptic strength connecting the sensor measuring its stimulus at position q with the reconstruction neuron responsible for position p. Then the membrane potential at the reconstruction neurons is supposed to be given by

$$W^p(t) = V^p(t) - \sum_q F(p, q)\, (\tilde x_q * \varepsilon)(t - \Delta). \qquad (2.5)$$

It is necessary to allow connections between sensors and reconstruction neurons with $p \neq q$ since the distributed representation of a signal originating at p may be spread out over sensors at several locations q.
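In discrete time, equations 2.4 and 2.5 amount to convolving binned spike trains with the tabulated kernel, shifting by the delays, and weighting. The following sketch assumes binned 0/1 spike trains and delays measured in time bins; all function and variable names are ours, and terms with $q \neq p$ are omitted for brevity:

```python
import numpy as np

def membrane_potential(J, delays, y_spikes, F_pp, x_spikes, Delta, eps):
    """Discrete-time version of equations 2.4 and 2.5 for one position p.

    J        -- synaptic weights J[i, k]
    delays   -- delays Delta[i, k], in time bins
    y_spikes -- binned spike trains y[i, t] of the sensory nerves (0/1)
    F_pp     -- feedback weight F(p, p); off-diagonal terms omitted here
    x_spikes -- binned spike train of the feedback sensor at position p
    Delta    -- feedback delay, in time bins
    eps      -- tabulated EPSP kernel (see epsilon_kernel above)
    """
    T = y_spikes.shape[1]
    V = np.zeros(T)
    for i in range(y_spikes.shape[0]):
        psp = np.convolve(y_spikes[i], eps)[:T]      # (y_i * eps)(t)
        for k in range(J.shape[1]):
            d = int(delays[i, k])
            V[d:] += J[i, k] * psp[:T - d]           # shift by Delta[i, k]
    fb = np.convolve(x_spikes, eps)[:T]              # (x_p * eps)(t)
    W = V.copy()
    W[Delta:] -= F_pp * fb[:T - Delta]               # negative feedback, eq. 2.5
    return W
```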
3 Supervised Spike-Timing-Dependent Plasticity Neuronal
Learning Algorithm
Here we investigate what at first sight seems to be a slight modification of the supervised spike-timing-dependent plasticity algorithm (Franosch et al., 2005) for neuronal learning. In learning step ν, the synaptic weights $J^p_{ik}$ are updated according to

$$J^p_{ik,\nu+1} = J^p_{ik,\nu} + \Delta J^p_{ik,\nu}, \qquad (3.1)$$

where $\Delta J^p_{ik,\nu}$ is the change of synaptic weight. This change is caused by a correlation of the input spikes arriving at times $t^p_{ik} := t_i + \Delta^p_{ik}$ and the output signal of the neuron. Here we put

$$\Delta J^p_{ik,\nu} = -2\eta_\nu \left[ \gamma J^p_{ik,\nu} + \sum_{t^p_{ik}} \int \varepsilon\bigl(t - t^p_{ik}\bigr)\, W^p(t)\, dt \right]. \qquad (3.2)$$
The small learning constant $\eta_\nu \geq 0$ decreases as the learning step ν proceeds in time. The integration is taken over one stimulus sample duration, and we notice the minus sign in front of $\eta_\nu$. Here a "stimulus sample" is meant to be a presentation of a meaningful stimulus that typically requires some action. Of course in reality, synapses may well change during stimulus presentation, but as $\eta_\nu$ is small, these changes can be neglected (van Hemmen, 2001). For γ ≈ 0 but γ > 0, the first term of equation 3.2 is a small decay of the synaptic weights proportional to the actual synaptic strength $J^p_{ik,\nu}$. As shown later, the slight modification γ > 0 of supervised STDP ensures convergence of the learning algorithm. It is also biologically plausible that synapses may lose some of their strength as time proceeds.
The second term on the right in equation 3.2 is the supervised spike-timing-dependent plasticity term. It correlates the postsynaptic potential ε caused by incoming spikes with the output signal, that is, the membrane potential $W^p$. If $W^p(t)$ is high and there is an incoming spike arriving at time $t^p_{ik}$ just before t ($t^p_{ik} < t$ but $t^p_{ik} \approx t$), the component $\int \varepsilon(t - t^p_{ik})\, W^p(t)\, dt$ of the SSTDP term is positive, decreasing the synaptic strength $J^p_{ik,\nu}$. As a positive correlation between input and output causes a decrease in synaptic strength, the proposed learning rule is anti-Hebbian. Figure 2 illustrates the supervised STDP learning rule.

Figure 2: Illustration of supervised spike-timing-dependent plasticity, equation 3.2, for γ = 0. If a spike arrives at an excitatory synapse at time t = 0 and the membrane potential W just afterward is positive, synaptic strength decreases. If the membrane potential after spike arrival is negative, synaptic strength increases. The supervised STDP learning rule is therefore anti-Hebbian. An inhibitory synapse increases its inhibitory strength in case of a positive membrane potential after spike arrival. Finally, no output (W = 0) induces no effect.
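In the same discrete-time setting, the update of equations 3.1 and 3.2 reduces to an inner product: the term $\sum \int \varepsilon(t - t^p_{ik})\, W^p(t)\, dt$ is the kernel-filtered, delayed spike train of line (i, k) correlated with the membrane potential. A sketch under the assumptions above; the learning constants are illustrative, not taken from the article:

```python
import numpy as np

def sstdp_step(J, delays, y_spikes, W, eps, eta=1e-4, gamma=0.0, dt=0.1):
    """One supervised STDP step, equations 3.1 and 3.2, for one neuron.

    The correlation sum_{t_ik} int eps(t - t_ik) W(t) dt equals the inner
    product of the delayed, kernel-filtered input with W; the minus sign
    makes the rule anti-Hebbian.
    """
    T = W.shape[0]
    for i in range(y_spikes.shape[0]):
        psp = np.convolve(y_spikes[i], eps)[:T]
        for k in range(J.shape[1]):
            d = int(delays[i, k])
            corr = np.dot(psp[:T - d], W[d:]) * dt            # integral of eq. 3.2
            J[i, k] += -2.0 * eta * (gamma * J[i, k] + corr)  # eqs. 3.1 and 3.2
    return J
```

One such call per stimulus sample, with W computed while the feedback is on, corresponds to one learning step ν.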
To make the learning rule Hebbian instead of anti-Hebbian, one could
change the feedback from negative to positive, which would invert the sign
of W in equation 2.5. Then the secondary sensory system would learn to
cancel the positive feedback by generating a negative postsynaptic potential. The mechanism to read out the result of the secondary system after
learning that we propose here is to simply switch off the feedback in such
a way that the (positive) output of the secondary system remains. With the
alternative Hebbian approach, however, it is hard to see how one could read
out the learned signal because a negative membrane potential generates no
spikes. Further on, we therefore stick to negative feedback.
4 Convergence of Supervised STDP
Convergence of the supervised STDP algorithm has already been shown
numerically for γ = 0 and Poisson neurons (Franosch et al., 2005). In the
following, we prove that for the seemingly slight modification of the supervised STDP algorithm with γ > 0 introduced here, convergence almost
surely happens under rather general conditions.
In the following, the vector J indicates all synaptic weights $(J^p_{ik})_{p,i,k}$. By the variable z, we denote a learning example. A learning example z consists of a stimulus position p together with a random time-dependent stimulus $x : t \mapsto x(t)$, causing a spike train $\tilde x$ in the feedback system and spike trains $\tilde y_i$ in the nerves of the secondary sensory system. That is, z is the (n + 2)-tuple $z = (p, \tilde x, \tilde y_1, \ldots, \tilde y_n)$. The membrane potential of the reconstruction neuron responsible for position p at time t in response to a training sample z is $W^p(z, J, t)$. According to equations 2.5 and 2.4 we then obtain

$$W^p(z, J, t) = \sum_{i,k} \sum_{t^p_{ik}} J^p_{ik}\, \varepsilon\bigl(t - t^p_{ik}\bigr) - \sum_q F(p, q)\, (\tilde x_q * \varepsilon)(t - \Delta). \qquad (4.1)$$

Furthermore,

$$Q(z, J) := \sum_p \int W^p(z, J, t)^2\, dt + \gamma J^2 \qquad (4.2)$$

denotes the quadratic error of the neuronal reconstruction mechanism when a learning sample z is presented, plus a penalty $\gamma J^2$ for synaptic weights that are too strong. This penalty is natural and ensures convergence of the learning algorithm with γ > 0, as shown later. Let P(z) be the probability distribution for the training samples z. In the following, we show that the supervised STDP learning algorithm minimizes the cost function

$$C(J) := \int Q(z, J)\, dP(z) \qquad (4.3)$$

under the condition that the learning rate η decreases at every learning step ν with

$$\sum_\nu \eta_\nu^2 < \infty \quad \text{yet} \quad \sum_\nu \eta_\nu = \infty. \qquad (4.4)$$
Of course, the learning rate has to approach zero, since otherwise the system does not converge but just fluctuates around the optimum. But the learning rate should not decrease too fast; otherwise, the synaptic weights may never leave an area with radius $\sum_\nu \eta_\nu$ from the starting position. The following three conditions, together with equation 4.4, are sufficient (Bottou, 1998, 2004) for almost sure convergence of the synaptic weights J to optimal synaptic weights $J^*$ minimizing C(J):¹
1. At learning step ν, the synaptic weights J are updated according to the following rule,
$$J_{\nu+1} = J_\nu - \eta_\nu \nabla_J Q(z_\nu, J_\nu). \qquad (4.5)$$

2. Everywhere in the parameter space, the opposite gradient $-\nabla_J C(J)$ of the cost function C points toward a unique, global minimum with synaptic weights $J^*$—more specifically, denoting by the superscript T the transpose of a vector,
$$\inf_{(J - J^*)^2 > \delta} (J - J^*)^T \nabla_J C(J) > 0 \quad \text{for all } \delta > 0. \qquad (4.6)$$

3. The variance of the learning steps does not grow faster than quadratically with the parameters J; constants $C_1$ and $C_2$ exist such that
$$\int [\nabla_J Q(z, J)]^2\, dP(z) < C_1 + C_2 (J - J^*)^2. \qquad (4.7)$$
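A standard schedule satisfying equation 4.4 is $\eta_\nu = a/(b + \nu)$, for which $\sum_\nu \eta_\nu$ diverges while $\sum_\nu \eta_\nu^2$ converges. As a one-line sketch (the constants a and b are arbitrary choices of ours):

```python
def eta(nu, a=1e-3, b=100.0):
    """Robbins-Monro learning rate: sum eta = infinity, sum eta^2 < infinity (eq. 4.4)."""
    return a / (b + nu)
```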
We now verify that the above three conditions of Bottou (1998, 2004) hold in the present context, so that equation 4.4, in conjunction with supervised STDP as treated here, leads to convergence. We therefore need to prove only that supervised STDP implies equations 4.5 to 4.7. Invoking equation 4.4, we will then be done.

Regarding condition 1, equation 4.5 is the supervised STDP learning rule of equation 3.2 since, due to equations 4.1 and 4.2, we have

$$\frac{\partial}{\partial J^p_{ik}} Q(z_\nu, J) = 2\gamma J^p_{ik} + 2 \sum_{t^p_{ik}} \int \varepsilon\bigl(t - t^p_{ik}\bigr)\, W^p(t)\, dt.$$
Regarding condition 2, because of equations 4.1, 4.2, and 4.3, the cost function C is quadratic in the parameters J and, with the help of a matrix A and a vector a, it can be expressed as

$$C(J) = (AJ + a)^2 + \gamma J^2 = J^T (A^T A + \gamma I) J + 2a^T A J + a^2,$$

¹ For a slight revision of the detailed proof of equation 4.4 as a sufficient condition of convergence, see http://leon.bottou.org/papers/bottou-98x.
where I is the identity matrix. At an extremum, the gradient is zero,

$$\nabla_J C(J^*) = 2(A^T A + \gamma I) J^* + 2A^T a = 0. \qquad (4.8)$$

Since the matrix $A^T A + \gamma I$ with γ > 0 is positive definite and therefore invertible, the cost function C has a unique minimum $J^* = -(A^T A + \gamma I)^{-1} A^T a$. To prove equation 4.6, we write

$$C(J) = (J - J^*)^T (A^T A + \gamma I)(J - J^*) + C(J^*)$$

and calculate

$$(J - J^*)^T \nabla_J C(J) = (J - J^*)^T\, 2(A^T A + \gamma I)(J - J^*) > 0 \quad \text{for } J - J^* \neq 0.$$
Regarding condition 3, equation 4.7 holds because $\nabla_J Q(z, J)$ is at most linear in J since Q(z, J) is at most quadratic in J as a consequence of equation 4.2. This completes the proof that the supervised STDP learning algorithm minimizes the cost function of equation 4.3 provided equation 4.4 holds. In the simulations, we have chosen a constant learning rate and show empirically that even though the conditions of equation 4.4 then do not hold, the system nevertheless learns the given tasks.
5 Numerical Results
5.1 Learning a Linear Combination of Delayed Signals. In hand-eye coordination tasks like hitting a tennis ball, decisions have to be made by analyzing sensory input during a certain period of time—here, to determine the likely position where the ball will hit the racket. To determine whether a neuronal network can in principle learn space-time-dependent tasks with the proposed SSTDP learning rule, we investigate the simplest possible time-dependent task of learning a linear combination of two delayed signals. In the tennis player's brain, possibly multiple retinal signals have to be combined for different points in time to determine the likely future trajectory of a tennis ball. The neuronal implementation of delays is easy, as there should already be a whole distribution of different time delays that naturally occur in neuronal systems.
In our simple setup, a neuron was connected to only two input channels, A and B, representing two sensory inputs. The signals $n_A(t)$ and $n_B(t)$ at channels A and B were gaussian white noise with standard deviation 1. Signal A was measured by 10 independent sensors. Five sensors generated spikes according to an inhomogeneous Poisson process with density $[n_A(t)]_+$ and five with density $[n_A(t)]_-$, where $[n(t)]_\pm := \pm n(t)$ if $n(t) \gtrless 0$ and 0 otherwise. The spikes originating from sensors measuring signal B were generated in the same way. Each sensor provided input to 35 delay lines with delays 0, 1, 2, . . . , 34 ms, connected to a reconstruction neuron as shown in Figure 1. The feedback system consisted of 10 independent inhomogeneous Poisson processes, 5 of them with density $[f(t)]_+$ and the other 5 with density $[f(t)]_-$, where the function f(t) was a linear combination of the delayed signals A and B, in particular, $f(t) := 2n_A(t - 3) + n_B(t - 11)$. The network was trained for 1 million iterations using a constant learning rate $\eta_\nu = 10^{-4}$ and without weight decay. The evolution of the weights during learning is shown in Figure 3. The weights with the correct delays (3 and 11 ms) grow, whereas the weights with wrong delays decay to zero. After 1000 learning steps, we already see a clear structure. After 1 million learning steps, the weights have maxima at the "correct" delays 3 and 11 ms, and weight A is about two times stronger than weight B. Thus, the system can indeed learn that to reconstruct the feedback, it has to calculate $2n_A(t - 3) + n_B(t - 11)$, and it converges to a steady state. While 1 million training samples is a number at which humans can reach mastery in hand-eye coordination tasks, 1000 training samples could be accomplished in a few hours. We leave systematic investigation of how complex tasks can be learned in how many steps for subsequent analysis because the primary purpose of this article is to prove the convergence of the SSTDP learning rule.

Figure 3: (a) The weight distribution at an input line coming from channels A (black solid line) and B (gray dashed line) in dependence on the delay, for the indicated numbers of learning steps. Only weights with delays between 0 ms and 17 ms are shown here; weights for delays longer than 17 ms are negligibly small at all times. (b) Development of the weights during the learning process. After 1000 learning steps, we already see a clear structure, and after 1 million learning steps, only the weights with the correct delays, 3 and 11 ms, are significantly larger than zero.
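The rectified densities $[n(t)]_\pm$ can be realized in discrete time by drawing, in each bin, a spike with probability rate × bin width (Bernoulli thinning of an inhomogeneous Poisson process). A sketch; the bin width and rate scaling are our own assumptions:

```python
import numpy as np

def rectified_poisson_spikes(n, dt=1.0, rate_scale=1.0, rng=None):
    """Spike trains with densities [n(t)]_+ and [n(t)]_- for one signal.

    n -- sampled signal n(t), one value per time bin of width dt (ms)
    Returns two 0/1 arrays: the 'positive' and 'negative' sensor spikes.
    """
    rng = rng or np.random.default_rng()
    lam_plus = np.maximum(n, 0.0) * rate_scale           # [n(t)]_+
    lam_minus = np.maximum(-n, 0.0) * rate_scale         # [n(t)]_-
    spikes_plus = rng.random(n.shape) < lam_plus * dt    # P(spike) ~ rate * dt
    spikes_minus = rng.random(n.shape) < lam_minus * dt  # (assumes rate * dt <= 1)
    return spikes_plus.astype(int), spikes_minus.astype(int)
```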
5.2 Learning the XOR Function. The network for learning binary logic functions, Figure 4, consists of two logic inputs, A and B, a layer of intermediate neurons, and a single output neuron. The logic inputs are measured by sensors generating a spike if the respective input is active and no spike otherwise. The primary afferent nerves coming from the sensors connect to a layer of 10 intermediate neurons with thresholds (TH) 0.1, 0.2, . . . , 1.0. As usual, these neurons generate a spike when their membrane potential reaches the individual threshold. The outputs of the intermediate layer converge onto a single output neuron. For training, the activations of the logic inputs A and B are drawn independently from a Bernoulli distribution with p = 0.5 at each iteration. As feedback, a single spike is provided to the output neuron as well as to the intermediate layer if A XOR B is true. Since every neuron in our simulation takes 1 ms to update its output after the input changes, it was necessary to provide the feedback spike 2 ms after providing the sensory input spikes to compensate for the delay caused by the intermediate and output layers.

Figure 4: The structure of a network that can learn logic functions. The neurons in the intermediate layer have the indicated thresholds (TH).
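The reason an intermediate threshold layer suffices is that XOR becomes linear in the thresholded features: a low-threshold unit fires for A OR B, a high-threshold unit only for A AND B, and the output can represent OR minus AND. A static sketch of this decomposition; the two thresholds and unit weights are illustrative, not the learned values:

```python
def xor_via_thresholds(A, B):
    """XOR as OR minus AND over threshold units, as in the network of Figure 4."""
    drive = A + B                        # summed input to every intermediate unit
    unit_or = 1 if drive >= 0.5 else 0   # low threshold: fires for A OR B
    unit_and = 1 if drive >= 1.5 else 0  # high threshold: fires only for A AND B
    return unit_or - unit_and            # = A XOR B

assert [xor_via_thresholds(a, b)
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```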
The network was trained for 100,000 iterations using a constant learning rate $\eta_\nu = 10^{-2}$ and without weight decay. The membrane potential of the output neuron after the indicated number of training steps is shown in Figure 5. In Figure 5a, the potential over time for the input A = 1 and B = 1 is shown; thus, the desired output is 1 XOR 1 = 0. Figure 5b shows the maximum membrane potential over time. From the start of learning, the membrane potential increases until it reaches a maximum after about 350 learning steps. When the membrane potential is maximal, the network has learned the logic function "or," as the output membrane potential is high
whether the input is (A, B) = (0, 1), (1, 0), or (1, 1). The output "or" is an intermediate state in the learning procedure and still wrong. A neuron in the intermediate layer with a high threshold has learned "and." As learning continues, the output neuron begins to learn that "and" has to be subtracted from "or." Thus, the membrane potential for the input (A, B) = (1, 1) decreases again and clearly becomes lower than that for the inputs (0, 1) and (1, 0). Thus the network has learned the correct output for input (1, 1), as 1 XOR 1 = 0.

Figure 5: (a) The time-dependent membrane potential of the output neuron in Figure 4 after the indicated number of learning steps. The logic network learns the XOR function. The input was A = 1 and B = 1; thus, the desired output is 1 XOR 1 = 0. (b) The maximum of the membrane potential over time for the inputs (A, B) = (1, 0), (0, 1), and (1, 1) in dependence on the number of learning steps. The desired outputs are 1 XOR 0 = 0 XOR 1 = 1 and 1 XOR 1 = 0.
5.3 Learning a Dynamical System. A network as in Figure 1, consisting of one reconstruction neuron, two sensors connected to the reconstruction neuron, each using 301 axons with delays of 0, 1, . . . , 300 ms, and one feedback sensor, is trained to calculate the motion of a damped linear pendulum from the applied force. The motion of the pendulum was determined by the differential equation

$$\frac{d^2 x}{dt^2} + 0.06\, \frac{dx}{dt} + 0.04\, x = F(t),$$

where x(t) is the displacement and F(t) the applied force. Two sensors measuring the force F on the pendulum are the input of the network. One sensor generates spikes according to an inhomogeneous Poisson process with density $[F(t)]_+$ and the other one with density $[F(t)]_-$, where $[F(t)]_\pm := \pm F(t)$ if $F(t) \gtrless 0$ and 0 otherwise. The feedback is a sensor measuring the displacement of the pendulum and generating random spikes through an inhomogeneous Poisson process where the probability density of a spike is proportional to the displacement of the pendulum.
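To generate training data, the pendulum equation can be integrated with a simple semi-implicit Euler scheme. The article specifies only the coefficients 0.06 and 0.04; the step size and force statistics below are our own choices:

```python
import numpy as np

def simulate_pendulum(T=800, dt=1.0, rng=None):
    """Integrate x'' + 0.06 x' + 0.04 x = F(t) (section 5.3).

    T  -- duration in ms; dt -- time step in ms.
    Returns the driving force F(t) and the displacement x(t).
    """
    rng = rng or np.random.default_rng()
    steps = int(T / dt)
    F = rng.normal(0.0, 1.0, steps)   # random driving force (our assumption)
    x = np.zeros(steps)
    v = 0.0
    for t in range(1, steps):
        a = F[t - 1] - 0.06 * v - 0.04 * x[t - 1]   # acceleration from the ODE
        v += a * dt                                  # update velocity first
        x[t] = x[t - 1] + v * dt                     # then position
    return F, x
```

The displacement x(t) would then drive the feedback sensor, and F(t) the two rectified input sensors, as described above.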
The network was trained for 1000 iterations using a constant learning rate $\eta_\nu = 10^{-6}$ and without weight decay. Figure 6 shows that after training, the membrane potential is very similar to the actual position of the pendulum; thus, the system has learned to calculate the displacement of the pendulum given the history of forces acting on it.

Figure 6: A network as in Figure 1 has learned to calculate the motion of a damped linear pendulum. The force F on the pendulum is shown in the lower panel. The upper panel compares the membrane potential of the output neuron (solid black line) that is generated by the feedforward process using the force measurement to the membrane potential that comes from the feedback process (dashed gray line) based on the actual position of the pendulum.
6 Discussion
Supervised STDP converges, but where does convergence lead the neuronal system to, that is, what are the system properties in the final state? The learning algorithm minimizes C(J) defined by equation 4.3, where C(J) is the expectation value of Q(z, J) with respect to all learning examples z according to equation 4.2. For γ → 0, we can neglect the term $\gamma J^2$ in equation 4.2. What remains is

$$\sum_p \int W^p(z, J, t)^2\, dt.$$

Accordingly, as γ → 0, the learning algorithm minimizes the quadratic difference between the membrane potential $V^p$ at a reconstruction neuron caused by the secondary sensory system and the feedback signal $\sum_q F(p, q)\, (\tilde x_q * \varepsilon)(t - \Delta)$ (see equation 4.1). If we choose F(p, q) = 1 for p = q and F(p, q) ≪ 1 elsewhere, then

$$\sum_q F(p, q)\, (\tilde x_q * \varepsilon)(t - \Delta) \approx (\tilde x_p * \varepsilon)(t - \Delta),$$

and at the minimum,

$$V^p(t) \approx (\tilde x_p * \varepsilon)(t - \Delta).$$
In the following, we discuss what happens in the fully annealed state, when learning is finished and the learning constant is assumed to be zero. If the feedback x vanishes (e.g., at night), when Xenopus's visual system cannot give feedback anymore, then the membrane potential $W^p(t)$ of the reconstruction neurons is generated by the secondary system alone and $W^p(t) = V^p(t)$ (see equation 2.5). Therefore,

$$W^p(t) \approx (\tilde x_p * \varepsilon)(t - \Delta),$$

that is, through their membrane potential $W^p(t)$, the reconstruction neurons reconstruct the potential $(\tilde x_p * \varepsilon)$ caused by the primary sensors (e.g., the visual system) a time Δ before. The reconstruction neurons can therefore perfectly substitute for the primary system in case of primary system failure. In addition, when feedback is switched off, they can complement the primary system, for example, under poor visual conditions.
A potential problem of the proposed learning mechanism is twofold.
First, learning might go on even when feedback is not available. Second,
to make actual use of the secondary system, feedback has to be switched
off. While we do not claim to have completely solved these problems, it
is imaginable that the system as a whole can determine when learning is
feasible and switch on or off learning and feedback by chemical or electrical
mechanisms.
Another potential caveat is that the method learns to reconstruct the teacher's spike train with a delay Δ. This may be a problem for certain sensory systems that need to create an instantaneous representation of the environment. However, if it is possible to compute the teacher's signal instantaneously by using a secondary sensory system, the proposed mechanism can learn to do so by setting Δ to zero while still accessing sensory input from the past using delay lines in the input. For Δ = 0, the results may not be as accurate as for, say, Δ = 100 ms, because some useful information is still missing from the potentially slow inputs. In this case, the actual sensory system may contain several learning networks, each with a different delay Δ, and the system may optimize the trade-off between accuracy and latency by choosing an output from a specific Δ as needed.
The supervised STDP algorithm we have just proposed can discern stimuli at different positions p because after convergence, if $x_p \neq 0$ and $x_q = 0$ for $q \neq p$,

$$W^q \approx (\tilde x_q * \varepsilon)(t - \Delta) = 0,$$

whereas

$$W^p \approx (\tilde x_p * \varepsilon)(t - \Delta) \neq 0.$$

The neuron with maximal membrane potential in the sense of maximal $\int [W^p(t)]^2\, dt$ therefore indicates the position p of the stimulus.
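Given the membrane potentials of all reconstruction neurons, this readout is a one-liner (a sketch; W[p, t] is assumed to hold $W^p$ sampled in time bins of width dt):

```python
import numpy as np

def decode_position(W, dt=0.1):
    """Return the position index p maximizing int W_p(t)^2 dt."""
    return int(np.argmax(np.sum(W ** 2, axis=1) * dt))
```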
6.1 Biological Implementation. Can real neurons implement the supervised STDP learning rule given by equation 3.2? One example of a learning
mechanism based on the membrane potential of the postsynaptic neuron
is long-term plasticity of glutamatergic synapses under participation of
NMDA receptors (Andersen, 1987). When the membrane potential on the
postsynaptic neuron is at or near the resting potential, the NMDA receptor is blocked. As soon as the membrane potential increases, the block is
released and the NMDA receptor may be activated by glutamate. A presynaptic action potential activates the NMDA receptor, which causes a Ca2+
influx into the neuron, which may lead to a strengthening or a weakening
of the synapse (Yang, Tang, & Zucker, 1999; Shouval, Bear, & Cooper, 2002;
Mu & Poo, 2006).
We could imagine two problems with a biological implementation of the supervised STDP learning rule, equation 3.2, though. First, the algorithm would require inhibitory ($J^p_{ij} < 0$) as well as excitatory ($J^p_{ij} > 0$) synapses at the same axon (see Figure 1). This problem can be overcome easily by additional interneurons. Second, during learning, a synapse can change its sign: an inhibitory synapse can become excitatory, and vice versa. This is no problem at all as long as for each inhibitory synapse with delay $\Delta^p_{ij}$, there is also an excitatory synapse with the same delay. Then, if according to learning rule 3.2, the inhibitory synapse grows above $J^p_{ij} = 0$, the inhibitory synapse is in fact stuck to zero and the excitatory synapse with the same delay takes over. In a real system, however, time delays should be more or less random, and thus the above condition is not satisfied. This does not matter either, because as soon as a synapse hits zero, it vanishes, and then the minimization takes place in a vector space where the vanished synapses no longer take part in the minimization.
In numerical experiments, the system has successfully learned relevant tasks. It determined that the feedback can be reconstructed by linearly combining two inputs with different delays and thus has learned a spatiotemporal problem (see Figure 3). Successful learning of the XOR problem in another numerical experiment shows that SSTDP with an additional intermediate layer is suitable even for learning nonlinear problems (see Figures 4 and 5). As a real-world problem, SSTDP has successfully learned to predict the movement of a linear pendulum driven by a random force, with the force as input and the position of the pendulum as feedback (see Figure 6). It is therefore fair to state that supervised STDP is not only a provably converging learning mechanism but can also be implemented straightforwardly in biological systems. Moreover, if the neuron is able to reconstruct the primary sensory input, that is, if synaptic weights $J^*$ exist that make the error C(J) approach zero, the neuron is also able to learn these optimal synaptic weights by supervised STDP. Finally, supervised STDP is able to discriminate among stimuli coming from different positions.
Acknowledgments
This work has been supported by the BMBF through the Bernstein Center
for Computational Neuroscience, Munich.
References
Andersen, P. (1987). Long-term potentiation: Outstanding problems. In J.-P. Changeux & M. Konishi (Eds.), The neural and molecular basis of learning (pp. 239–262). New York: Wiley.
Bottou, L. (1998). Online learning and stochastic approximations. In D. Saad (Ed.),
Online learning in neural networks. Cambridge: Cambridge University Press.
Bottou, L. (2004). Stochastic learning. In O. Bousquet & U. von Luxburg (Eds.),
Advanced lectures on machine learning (pp. 146–168). Berlin: Springer.
Bürck, M., Friedel, P., Sichert, A. B., Vossen, C., & van Hemmen, J. L. (2010). Optimality in mono- and multisensory map formation. Biol. Cybern., 103, 1–20.
Franosch, J.-M. P., Lingenheil, M., & van Hemmen, J. L. (2005). How a frog can learn what is where in the dark. Phys. Rev. Lett., 95, 078106.
Gerstner, W., & van Hemmen, J. L. (1994). In E. Domany, J. L. van Hemmen, &
K. Schulten (Eds.), Models of neural networks II (pp. 39–47). New York: Springer.
Gerstner, W., Kempter, R., van Hemmen, J. L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383, 76–81.
Gütig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike
timing-based decisions. Nat. Neurosci., 9, 420–428.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.
Kawato, M. (1990). Feedback-error-learning neural network for supervised learning. In R. Eckmiller (Ed.), Advanced neural computers (pp. 365–372). Amsterdam:
Elsevier.
Kempter, R., Gerstner, W., van Hemmen, J. L., & Wagner, H. (1998). Extracting
oscillations: Neuronal coincidence detection with noisy periodic spike input.
Neural Comput., 10, 1987–2017.
Kempter, R., Leibold, C., Wagner, H., & van Hemmen, J. L. (2001). Formation of
temporal-feature maps by axonal propagation of synaptic learning. Proc. Natl.
Acad. Sci. USA, 98, 4166–4171.
Kistler, W. M., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Comput., 9, 1015–
1045.
Knudsen, E. I., Knudsen, P. F., & Esterly, S. D. (1982). Early auditory experience
modifies sound localization in barn owls. Nature, 295, 238–240.
Kühn, S., Beyn, W.-J., & Cruse, H. (2007). Modelling memory functions with recurrent
neural networks consisting of input compensation units: I. Static situations. Biol.
Cybern., 96, 455–470.
Kühn, S., & Cruse, H. (2007). Modelling memory functions with recurrent neural
networks consisting of input compensation units: II. Dynamic situations. Biol.
Cybern., 96, 471–486.
Legenstein, R., Naeger, C., & Maass, W. (2005). What can a neuron learn with spike-timing-dependent plasticity? Neural Comput., 17, 2337–2382.
Leibold, C., Kempter, R., & van Hemmen, J. L. (2001). Temporal map formation in
the barn owl’s brain. Phys. Rev. Lett., 87, 248101.
Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regulation of synaptic
efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275, 213–215.
Mu, Y., & Poo, M.-m. (2006). Spike timing-dependent LTP/LTD mediates visual
experience-dependent plasticity in a developing retinotectal system. Neuron, 50,
115–125.
Oja, E. (1982). Simplified neuron model as a principal component analyzer. J. Math. Biol., 15, 267–273.
Porr, B., & Wörgötter, F. (2003). Isotropic sequence order learning. Neural Comput., 15, 831–864.
Shouval, H. Z., Bear, M. F., & Cooper, L. N. (2002). A unified model of NMDA
receptor-dependent bidirectional synaptic plasticity. Proc. Natl. Acad. Sci. USA,
99, 10831–10836.
Swinehart, C. D., & Abbott, L. F. (2005). Supervised learning through neuronal
response modulation. Neural Comput., 17, 609–631.
van Hemmen, J. L. (2001). Theory of synaptic plasticity. In F. Moss & S. Gielen
(Eds.), Handbook of Biological Physics, Vol. 4. Neuro-Informatics, Neural Modelling
(pp. 771–823). Amsterdam: Elsevier.
Yang, S.-N., Tang, Y.-G., & Zucker, R. S. (1999). Selective induction of LTP and LTD
by postsynaptic Ca elevation. J. Neurophysiol., 81, 781–787.
Received December 17, 2011; accepted June 10, 2013.