ARTICLE — Communicated by Jochen Triesch

Supervised Spike-Timing-Dependent Plasticity: A Spatiotemporal Neuronal Learning Rule for Function Approximation and Decisions

Jan-Moritz P. Franosch, [email protected], Google Switzerland GmbH, 8002 Zurich, Switzerland
Sebastian Urban, [email protected]
J. Leo van Hemmen, [email protected], Physik Department, Technische Universität München, 85747 Garching bei München, Germany

Neural Computation 25, 3113–3130 (2013). doi:10.1162/NECO_a_00520. © 2013 Massachusetts Institute of Technology.

How can an animal learn from experience? How can it train sensors, such as the auditory or tactile system, based on other sensory input such as the visual system? Supervised spike-timing-dependent plasticity (supervised STDP) is a possible answer. Supervised STDP trains one modality using input from another one as "supervisor." Quite complex time-dependent relationships between the senses can be learned. Here we prove that under very general conditions, supervised STDP converges to a stable configuration of synaptic weights leading to a reconstruction of primary sensory input.

1 Introduction

Humans and animals are able to use multiple senses simultaneously or separately so as to perform a task, for example, the visual, auditory, and proprioceptive sense to localize or grasp a target. The input to various senses therefore has to be combined so as to form a consistent internal representation of the environment. Accordingly, the cooperation of different senses has to be, and can be, learned (Knudsen, Knudsen, & Esterly, 1982).

Learning processes are assumed to be based on synaptic changes. An alternative mechanism for learning is modifying neuronal responsiveness rather than synaptic weights (Swinehart & Abbott, 2005). Here, however, we restrict ourselves to learning rules that modify synaptic weights. A possible biological mechanism is spike-timing-dependent plasticity (STDP) (Gerstner, Kempter, van Hemmen, & Wagner, 1996; Markram, Lübke, Frotscher, & Sakmann, 1997; van Hemmen, 2001), where synapses change according to the temporal input-output correlations experienced by a neuron. STDP can be used in unsupervised learning, for example, to attain high-precision auditory localization and form auditory maps (Kempter, Leibold, Wagner, & van Hemmen, 2001; Leibold, Kempter, & van Hemmen, 2001).

In this article, we specify an algorithm showing how, through supervised STDP, a supervised version of STDP, a neuronal network can learn to localize and discriminate prey that emits waves on a water surface. The prey-localization technique may consist, for instance (Franosch, Lingenheil, & van Hemmen, 2005), of space- and time-dependent measurements of water velocity on the skin of a frog such as Xenopus, which has 180 lateral-line detectors all over its body. The appropriate supervisor depends on the context. By forcing a neuron to fire when it should according to a reference neuron that already has the desired synaptic weights J*, Legenstein, Naeger, and Maass (2005) have shown that a second neuron can "learn" J* by STDP under certain conditions regarding the correlations of the input spike trains for Poisson neurons (Kempter, Gerstner, van Hemmen, & Wagner, 1998; van Hemmen, 2001). Supervised STDP, as we show here, converges under far more general conditions, completely independent of the input and a neuron's spike generation process.
Supervised STDP uses only spike times and the postsynaptic membrane potential to determine changes to synaptic weights and is therefore biologically plausible. Moreover, it features supervised learning of time-dependent signals using time-dependent and time-delayed input signals, which distinguishes it from most classical learning rules found in the literature, such as Oja's rule (Oja, 1982), feedback-error learning (Kawato, 1990), and iso-learning (Porr & Wörgötter, 2003). While the tempotron (Gütig & Sompolinsky, 2006) uses a voltage convolution–based learning rule to discriminate between different spatiotemporal sequences, the supervised STDP learning rule we present here is also voltage convolution–based (Franosch et al., 2005) but far more general. Supervised STDP is designed not only to discriminate between different spike sequences but even to reconstruct a teacher signal. Moreover, supervised STDP includes delayed spike times, which are necessary to learn time-dependent tasks.

A perceptron converges to stable synaptic weights only if the input vectors are linearly separable (Hertz, Krogh, & Palmer, 1991). It is therefore not quite clear under which conditions the tempotron converges, because the tempotron is a generalization of the perceptron.

Artificial recurrent neural networks can learn online by using only local information in each neuron (Kühn, Beyn, & Cruse, 2007). A recurrent network is able to learn linear systems of equations and simple patterns of activation. The network can even learn dynamics described by linear differential equations (Kühn & Cruse, 2007). How the hypothesized artificial neurons could be implemented by real biological neurons, however, remains unclear, since the neuron required to perform the task has a rather complex internal structure. The supervised STDP that we investigate here is an alternative learning mechanism that we believe could be implemented in real neurons, although, to our knowledge, there is no direct biological evidence.

Figure 1: Circuit diagram of a reconstruction neuron (large open circle) with membrane potential W^p. The neuron is connected to sensors (light gray circles) measuring the time-dependent variables y_i and thus generating spike trains ỹ_i. The axons of the sensors cause delays Δ_{ij} of the spike train ỹ_i and terminate in synapses (small black circles) with strengths J_{ij}. The reconstruction neuron also gets feedback from a second system of sensors (dark gray circles) measuring the time-dependent variables x^q and generating the spike trains x̃^q. The strength of the feedback is −F(p, q), and the delay is Δ. Although in theory we could have multiple delays for the feedback as well, the convergence proof and simulations show that a single delayed feedback is sufficient. For each position p, there is a different reconstruction neuron (the figure shows only one of them). It learns to reconstruct the stimulus x^p through its membrane potential W^p with the help of feedback.

2 Neuronal Hardware

The neuronal hardware needed for supervised spike-timing-dependent plasticity (SSTDP) is very simple and quite general, and it may be found in nature frequently. We assume that the animal has sensors (such as those in the retina, cochlea, or lateral line) measuring some time-dependent variables x^p at positions p in the environment.
For instance, those positions can be the places where visual sensors measure light intensity or, for the auditory system, where the two tympani measure sound pressure. For each position p, there is a different sensor generating the spike train x̃^p (see Figure 1).

In addition, related to each perception system, an animal typically has at least one second system, such as a tactile or auditory system, that performs neuronal information processing based on sensory input from the outside world. The second system has yet to be trained to compute the same variables x^p internally, in the brain. For that purpose, it has reconstruction neurons for each position p trying to reconstruct the original signal x^p. The second system gets sensory input from n sensors i, measuring signals y_i and generating spike trains ỹ_i. Throughout what follows, the tilde, as in ỹ_i, indicates a spike train. Peripheral nerves conduct the action potentials with time delay Δ^p_{ik} to synapses with weights J^p_{ik}. Here the variable k numbers different branchings of the same axon and thus different synaptic weights and delay times. The synapses are connected to the reconstruction neuron responsible for position p.

To calculate the membrane potential W^p at a reconstruction neuron, we denote by t_i the times when a spike is generated by sensor i. In the spike response neuron model (Gerstner & van Hemmen, 1994; Kistler, Gerstner, & van Hemmen, 1997), which we adapt here for convenience, each spike generates a postsynaptic potential J^p_{ik} ε with synaptic strength J^p_{ik}. The time-dependent function ε describes the form of an excitatory postsynaptic potential generated by a synapse of strength 1. For the numerical simulations,

    \varepsilon(t) = \frac{t}{t_r} \, e^{-t/t_r}    (2.1)

with the rise time of the response t_r = 3 ms. The part of the membrane potential generated by the sensory nerves, V^p, reads

    V^p(t) = \sum_{i,k} \sum_{t_i} J^p_{ik} \, \varepsilon(t - t_i - \Delta^p_{ik}).    (2.2)

Using spike trains ỹ_i described by

    \tilde{y}_i(t) = \sum_{t_i} \delta(t - t_i),    (2.3)

we may also write, with ∗ denoting a convolution,

    V^p(t) = \sum_{i,k} J^p_{ik} \, (\tilde{y}_i * \varepsilon)(t - \Delta^p_{ik}).    (2.4)

The reconstruction neurons get feedback signals from the sensors directly measuring the stimulus x^p or from more precise modalities, most of the time but not always vision (Bürck, Friedel, Sichert, Vossen, & van Hemmen, 2010); here the tectum opticum plays a key role. Let F(p, q) be the synaptic strength connecting the sensor measuring its stimulus at position q with the reconstruction neuron responsible for position p. Then the membrane potential at the reconstruction neurons is supposed to be given by

    W^p(t) = V^p(t) - \sum_q F(p, q) \, (\tilde{x}^q * \varepsilon)(t - \Delta).    (2.5)

It is necessary to allow connections between sensors and reconstruction neurons with p ≠ q since the distributed representation of a signal originating at p may be spread out over sensors at several locations q.
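To make the model concrete, here is a minimal sketch of how equations 2.1 to 2.5 can be evaluated on a discrete time grid. It is our own illustration, not the authors' code; the helper names (epsp_kernel, potential), the time step, and the toy spike trains are assumptions of the example. The same routine serves for both the sensory term of equation 2.4 and the delayed feedback term of equation 2.5.

```python
import numpy as np

DT = 1.0       # time step in ms (our choice)
T_RISE = 3.0   # EPSP rise time t_r in ms, equation 2.1

def epsp_kernel(duration=50.0):
    """Postsynaptic potential of a unit-strength synapse, equation 2.1."""
    t = np.arange(0.0, duration, DT)
    return (t / T_RISE) * np.exp(-t / T_RISE)

def potential(spike_train, weights, delays, kernel):
    """Summed potential sum_k J_k (spike_train * eps)(t - Delta_k), eq. 2.4."""
    psp = np.convolve(spike_train, kernel)[: len(spike_train)]
    v = np.zeros_like(psp)
    for J, d in zip(weights, delays):
        shifted = np.roll(psp, d)
        shifted[:d] = 0.0          # no acausal contributions
        v += J * shifted
    return v

# Toy example: one input line with three delayed branches and one feedback line.
rng = np.random.default_rng(0)
n_steps = 200
y = (rng.random(n_steps) < 0.05).astype(float)   # spike train y~_i
x = (rng.random(n_steps) < 0.05).astype(float)   # feedback spike train x~_q

kernel = epsp_kernel()
V = potential(y, weights=[0.5, 1.0, -0.3], delays=[0, 5, 10], kernel=kernel)
feedback = potential(x, weights=[1.0], delays=[20], kernel=kernel)  # F(p,q)=1, Delta=20 ms
W = V - feedback                                  # equation 2.5
```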
3 Supervised Spike-Timing-Dependent Plasticity Neuronal Learning Algorithm

Here we investigate what at first sight seems to be a slight modification of the supervised spike-timing-dependent plasticity algorithm (Franosch et al., 2005) for neuronal learning. In learning step ν, the synaptic weights J^p_{ik} are updated according to

    J^p_{ik,\nu+1} = J^p_{ik,\nu} + \Delta J^p_{ik,\nu},    (3.1)

where ΔJ^p_{ik,ν} is the change of synaptic weight. This change is caused by a correlation of the input spikes arriving at times t^p_{ik} := t_i + Δ^p_{ik} and the output signal of the neuron. Here we put

    \Delta J^p_{ik,\nu} = -2\eta_\nu \Bigl[ \gamma J^p_{ik,\nu} + \sum_{t^p_{ik}} \int \varepsilon(t - t^p_{ik}) \, W^p(t) \, dt \Bigr].    (3.2)

The small learning constant η_ν ≥ 0 decreases as the learning step ν proceeds in time. The integration is taken over one stimulus sample duration, and we notice the minus sign in front of η_ν. Here a "stimulus sample" is meant to be a presentation of a meaningful stimulus that typically requires some action. Of course, in reality synapses may well change during stimulus presentation, but as η_ν is small, these changes can be neglected (van Hemmen, 2001).

For γ ≈ 0 but γ > 0, the first term of equation 3.2 is a small decay of the synaptic weights proportional to the actual synaptic strength J^p_{ik,ν}. As shown later, the slight modification γ > 0 of supervised STDP ensures convergence of the learning algorithm. It is also biologically plausible that synapses may lose some of their strength as time proceeds.

The second term on the right in equation 3.2 is the supervised spike-timing-dependent plasticity term. It correlates the postsynaptic potential ε caused by incoming spikes with the output signal, that is, the membrane potential W^p. If W^p(t) is high and there is an incoming spike arriving at time t^p_{ik} just before t (t^p_{ik} < t but t^p_{ik} ≈ t), the component ∫ ε(t − t^p_{ik}) W^p(t) dt of the SSTDP term is positive, decreasing the synaptic strength J^p_{ik,ν}. As a positive correlation between input and output causes a decrease in synaptic strength, the proposed learning rule is anti-Hebbian. Figure 2 illustrates the supervised STDP learning rule.

Figure 2: Illustration of supervised spike-timing-dependent plasticity, equation 3.2, for γ = 0. If a spike arrives at an excitatory synapse at time t = 0 just before the membrane potential W is positive, synaptic strength decreases. If the membrane potential after spike arrival is negative, synaptic strength increases. The supervised STDP learning rule is therefore anti-Hebbian. An inhibitory synapse increases its inhibitory strength in case of positive membrane potential after spike arrival. Finally, no output (W = 0) induces no effect.

To make the learning rule Hebbian instead of anti-Hebbian, one could change the feedback from negative to positive, which would invert the sign of W in equation 2.5. Then the secondary sensory system would learn to cancel the positive feedback by generating a negative postsynaptic potential. The mechanism we propose here to read out the result of the secondary system after learning is to simply switch off the feedback in such a way that the (positive) output of the secondary system remains. With the alternative Hebbian approach, however, it is hard to see how one could read out the learned signal, because a negative membrane potential generates no spikes. Further on, we therefore stick to negative feedback.
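A discrete-time version of the update in equations 3.1 and 3.2 might look as follows. This is a sketch under our own conventions (sampled signals, spike arrival times given in time steps, hypothetical parameter values), not the implementation used in the paper.

```python
import numpy as np

def sstdp_update(J, spike_times, W, kernel, eta=1e-4, gamma=1e-3, dt=1.0):
    """One supervised STDP step, a discrete-time version of equation 3.2.

    J           -- synaptic weights J[i], one per delayed input line
    spike_times -- list of arrays of delayed arrival times t_ik (in steps)
    W           -- sampled membrane potential W(t) over one stimulus sample
    kernel      -- sampled EPSP kernel eps(t), equation 2.1
    """
    J_new = J.copy()
    for i, times in enumerate(spike_times):
        corr = 0.0
        for t_ik in times:
            # component  integral eps(t - t_ik) W(t) dt  of the SSTDP term
            seg = W[t_ik : t_ik + len(kernel)]
            corr += np.dot(kernel[: len(seg)], seg) * dt
        # anti-Hebbian update with weight decay gamma, equations 3.1 and 3.2
        J_new[i] = J[i] - 2.0 * eta * (gamma * J[i] + corr)
    return J_new
```

Note how the anti-Hebbian sign appears directly: a positive correlation between the EPSP and the membrane potential yields corr > 0 and thus a weight decrease.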
4 Convergence of Supervised STDP

Convergence of the supervised STDP algorithm has already been shown numerically for γ = 0 and Poisson neurons (Franosch et al., 2005). In the following, we prove that for the seemingly slight modification of the supervised STDP algorithm with γ > 0 introduced here, convergence almost surely happens under rather general conditions.

In the following, the vector J indicates all synaptic weights (J^p_{ik})_{p,i,k}. By the variable z, we denote a learning example. A learning example z consists of a stimulus position p together with a random time-dependent stimulus x : t ↦ x(t), causing a spike train x̃ in the feedback system and spike trains ỹ_i in the nerves of the secondary sensory system. That is, z is the (n + 2)-tuple z = (p, x̃, ỹ_1, ..., ỹ_n). The membrane potential of the reconstruction neuron responsible for position p at time t in response to a training sample z is W^p(z, J, t). According to equations 2.5 and 2.4, we then obtain

    W^p(z, J, t) = \sum_{i,k} \sum_{t^p_{ik}} J^p_{ik} \, \varepsilon(t - t^p_{ik}) - \sum_q F(p, q) \, (\tilde{x}^q * \varepsilon)(t - \Delta).    (4.1)

Furthermore,

    Q(z, J) := \int W^p(z, J, t)^2 \, dt + \gamma J^2    (4.2)

denotes the quadratic error of the neuronal reconstruction mechanism when a learning sample z is presented, plus a penalty γJ² for synaptic weights that are too strong. This penalty is natural and ensures convergence of the learning algorithm with γ > 0, as shown later.

Let P(z) be the probability distribution of the training samples z. In the following, we show that the supervised STDP learning algorithm minimizes the cost function

    C(J) := \int Q(z, J) \, dP(z)    (4.3)

under the condition that the learning rate η decreases at every learning step ν with

    \sum_\nu \eta_\nu^2 < \infty \quad \text{yet} \quad \sum_\nu \eta_\nu = \infty.    (4.4)

Of course, the learning rate has to approach zero, since otherwise the system does not converge but just fluctuates around the optimum. But the learning rate should not decrease too fast; otherwise, the synaptic weights may never leave an area with radius \sum_\nu \eta_\nu around the starting position.

The following three conditions, together with equation 4.4, are sufficient (Bottou, 1998, 2004) for almost sure convergence of the synaptic weights J to optimal synaptic weights J* minimizing C(J):¹

1. At learning step ν, the synaptic weights J are updated according to the following rule,

    J_{\nu+1} = J_\nu - \eta_\nu \nabla_J Q(z_\nu, J_\nu).    (4.5)

2. Everywhere in the parameter space, the opposite gradient −∇_J C(J) of the cost function C points toward a unique, global minimum with synaptic weights J*; more specifically, denoting by the superscript T the transpose of a vector,

    \inf_{(J - J^*)^2 > \delta} \, (J - J^*)^T \nabla_J C(J) > 0 \quad \text{for all } \delta > 0.    (4.6)

3. The variance of the learning steps does not grow faster than quadratically with the parameters J; that is, constants C_1 and C_2 exist such that

    \int [\nabla_J Q(z, J)]^2 \, dP(z) < C_1 + C_2 (J - J^*)^2.    (4.7)

We now verify, in the present context, the above three conditions of Bottou (1998, 2004); together with equation 4.4, they imply that supervised STDP as treated here leads to convergence. We therefore need to prove only that supervised STDP implies equations 4.5 to 4.7. Invoking equation 4.4, we will then be done.

Regarding condition 1, equation 4.5 is the supervised STDP learning rule of equation 3.2 since, due to equations 4.1 and 4.2, we have

    \frac{\partial}{\partial J^p_{ik}} Q(z_\nu, J) = 2\gamma J^p_{ik} + 2 \sum_{t^p_{ik}} \int \varepsilon(t - t^p_{ik}) \, W^p(t) \, dt.

Regarding condition 2, because of equations 4.1, 4.2, and 4.3, the cost function C is quadratic in the parameters J and, with the help of a matrix A and a vector a, it can be expressed as

    C(J) = (AJ + a)^2 + \gamma J^2 = J^T (A^T A + \gamma I) J + 2 a^T A J + a^2,

where I is the identity matrix. At an extremum, the gradient is zero,

    \nabla_J C(J^*) = 2 (A^T A + \gamma I) J^* + 2 A^T a = 0.    (4.8)

Since the matrix A^T A + γI with γ > 0 is positive definite and therefore invertible, the cost function C has a unique minimum J* = −(A^T A + γI)^{−1} A^T a. To prove equation 4.6, we write C(J) = (J − J*)^T (A^T A + γI)(J − J*) + C(J*) and calculate

    (J - J^*)^T \nabla_J C(J) = 2 (J - J^*)^T (A^T A + \gamma I)(J - J^*) > 0 \quad \text{for } J - J^* \neq 0.

Regarding condition 3, equation 4.7 holds because ∇_J Q(z, J) is at most linear in J, since Q(z, J) is at most quadratic in J as a consequence of equation 4.2. This completes the proof that the supervised STDP learning algorithm minimizes the cost function of equation 4.3, provided equation 4.4 holds.

In the simulations, we have chosen a constant learning rate and show empirically that, even though the conditions of equation 4.4 then do not hold, the system nevertheless learns the given tasks.

¹ For a slight revision of the detailed proof of equation 4.4 as a sufficient condition of convergence, see http://leon.bottou.org/papers/bottou-98x.
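The role of equation 4.4 can be exercised on a toy quadratic problem. The sketch below, our own construction rather than anything from the paper, runs the stochastic update of equation 4.5 with the schedule η_ν = 1/ν, which satisfies equation 4.4, on noisy samples of a quadratic cost; the iterate approaches the unique minimum J* of equation 4.8. The noise model and all constants are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))   # random quadratic problem C(J) = E[(A J + a + noise)^2] + gamma |J|^2
a = rng.normal(size=5)
gamma = 0.1

# Unique minimum from equation 4.8 (zero-mean noise shifts C only by a constant):
J_star = -np.linalg.solve(A.T @ A + gamma * np.eye(3), A.T @ a)

J = np.zeros(3)
for nu in range(1, 200001):
    eta = 1.0 / nu                       # sum eta = inf, sum eta^2 < inf: equation 4.4
    noise = rng.normal(size=5)           # stochastic learning sample z_nu
    grad = 2 * A.T @ (A @ J + a + noise) + 2 * gamma * J   # unbiased estimate of grad C
    J = J - eta * grad                   # update rule, equation 4.5

print(np.linalg.norm(J - J_star))        # small: J has converged close to J*
```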
5 Numerical Results

5.1 Learning a Linear Combination of Delayed Signals. In hand-eye coordination tasks like hitting a tennis ball, decisions have to be made by analyzing sensory input during a certain period of time; here, to determine the likely position where the ball will hit the racket. To determine whether a neuronal network can in principle learn space-time-dependent tasks with the proposed SSTDP learning rule, we investigate the simplest possible time-dependent task: learning a linear combination of two delayed signals. In the tennis player's brain, multiple retinal signals possibly have to be combined for different points in time to determine the likely future trajectory of the tennis ball. The neuronal implementation of delays is easy, as a whole distribution of different time delays occurs naturally in neuronal systems.

In our simple setup, a neuron was connected to only two input channels, A and B, representing two sensory inputs. The signals n_A(t) and n_B(t) at channels A and B were gaussian white noise with standard deviation 1. Signal A was measured by 10 independent sensors. Five sensors generated spikes according to an inhomogeneous Poisson process with density [n_A(t)]_+ and five with density [n_A(t)]_−, where [n(t)]_± := ±n(t) if n(t) ≷ 0 and 0 otherwise. The spikes originating from sensors measuring signal B were generated in the same way. Each sensor provided input to 35 delay lines with delays 0, 1, 2, ..., 34 ms, connected to a reconstruction neuron as shown in Figure 1. The feedback system consisted of 10 independent inhomogeneous Poisson processes, 5 of them with density [f(t)]_+ and the other 5 with density [f(t)]_−, where the function f(t) was a linear combination of the delayed signals A and B, in particular, f(t) := 2 n_A(t − 3) + n_B(t − 11), with delays in milliseconds. The network was trained for 1 million iterations using a constant learning rate η_ν = 10^{−4} and without weight decay.

The evolution of the weights during learning is shown in Figure 3. The weights with the correct delays (3 and 11 ms) grow, whereas the weights with wrong delays decay to zero. After 1000 learning steps, we already see a clear structure. After 1 million learning steps, the weights have maxima at the "correct" delays of 3 and 11 ms, and weight A is about two times stronger than weight B. Thus, the system can indeed learn that to reconstruct the feedback it has to calculate 2 n_A(t − 3) + n_B(t − 11), and it converges to a steady state.

Figure 3: (a) The weight distribution at an input line coming from channels A (black solid line) and B (gray dashed line) in dependence on the delay for the indicated numbers of learning steps. Only weights with delays between 0 ms and 17 ms are shown here; weights for delays longer than 17 ms are negligibly small at all times. (b) Development of weights during the learning process. After 1000 learning steps, we already see a clear structure, and after 1 million learning steps, only weights with correct delays, 3 and 11 ms, are significantly larger than zero.

While 1 million training samples is a number at which humans can reach mastery in hand-eye coordination tasks, 1000 training samples could be accomplished in a few hours. We leave systematic investigation of how complex tasks can be learned in how many steps to subsequent analysis, because the primary purpose of this article is to prove the convergence of the SSTDP learning rule.
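For readers who want to reproduce the setup of section 5.1, the input generation could be sketched as follows. The rate scale, time step, and the use of circular shifts for the delays are our own simplifications, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000      # duration of one stimulus sample in ms (1 ms resolution)
RATE = 0.05   # spike probability per ms per unit signal; our choice

def rectified_poisson(signal, n_each=5):
    """ON/OFF sensor banks firing with densities [n(t)]_+ and [n(t)]_-."""
    p_on = np.clip(signal, 0.0, None) * RATE
    p_off = np.clip(-signal, 0.0, None) * RATE
    on = (rng.random((n_each, len(signal))) < p_on).astype(float)
    off = (rng.random((n_each, len(signal))) < p_off).astype(float)
    return on, off

n_A = rng.normal(size=T)                      # gaussian white noise, std 1
n_B = rng.normal(size=T)
f = 2 * np.roll(n_A, 3) + np.roll(n_B, 11)    # teacher signal; circular shift
                                              # stands in for the 3/11 ms delays
sensors_A = rectified_poisson(n_A)            # 5 ON + 5 OFF sensors for A
sensors_B = rectified_poisson(n_B)
feedback = rectified_poisson(f)               # 10 feedback Poisson processes

# Each sensor feeds 35 delay lines with delays 0..34 ms:
delays = np.arange(35)
delayed_A_on = np.stack([np.roll(sensors_A[0], d, axis=1) for d in delays])
```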
5.2 Learning the XOR Function. The network for learning binary logic functions, shown in Figure 4, consists of two logic inputs, A and B, a layer of intermediate neurons, and a single output neuron. The logic inputs are measured by sensors generating a spike if the respective input is active and generating no spike otherwise. The primary afferent nerves coming from the sensors connect to a layer of 10 intermediate neurons with thresholds (TH) 0.1, 0.2, ..., 1.0. As usual, these neurons generate a spike when their membrane potential reaches the individual threshold. The outputs of the intermediate layer converge onto a single output neuron.

Figure 4: The structure of a network that can learn logic functions. The neurons in the intermediate layer have the indicated thresholds (TH).

For training, the activations of the logic inputs A and B are drawn independently from a Bernoulli distribution with p = 0.5 at each iteration. As feedback, a single spike is provided to the output neuron as well as to the intermediate layer if A XOR B is true. Since every neuron in our simulation takes 1 ms to update its output after the input changes, it was necessary to provide the feedback spike 2 ms after providing the sensory input spikes to accommodate the delay caused by the intermediate and output layers. The network was trained for 100,000 iterations using a constant learning rate η_ν = 10^{−2} and without weight decay.

The membrane potential of the output neuron after the indicated number of training steps is shown in Figure 5. In Figure 5a, the potential over time for the input A = 1 and B = 1 is shown; thus, the desired output is 1 XOR 1 = 0. Figure 5b shows the maximum membrane potential over time. From the start of learning, the membrane potential increases until it reaches a maximum after about 350 learning steps. When the membrane potential is maximal, the network has learned the logic function "or," as the output membrane potential is high whether the input is (A, B) = (0, 1), (1, 0), or (1, 1). The output "or" is an intermediate state in the learning procedure and still wrong. A neuron in the intermediate layer with high threshold has learned "and." As learning continues, the output neuron begins to learn that "and" has to be subtracted from "or." Thus, the membrane potential for the input (A, B) = (1, 1) decreases again and clearly becomes lower than that for the inputs (0, 1) and (1, 0). Thus the network learned the correct output for input (1, 1), as 1 XOR 1 = 0.

Figure 5: (a) The time-dependent membrane potential of the output neuron in Figure 4 after the indicated number of learning steps. The logic network learns the XOR function. The input was A = 1 and B = 1; thus, the desired output is 1 XOR 1 = 0. (b) The maximum of the membrane potential over time for the inputs (A, B) = (1, 0), (0, 1), and (1, 1) in dependence on the number of learning steps. The desired outputs are 1 XOR 0 = 0 XOR 1 = 1 and 1 XOR 1 = 0.
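To see why the intermediate threshold layer makes XOR learnable, consider the final configuration the text describes: a unit with threshold 0.5 acts as "or," the unit with threshold 1.0 acts as "and," and the output subtracts the two. The sketch below is our own reconstruction with hand-picked weights standing in for the learned ones.

```python
def intermediate_layer(a, b, thresholds):
    """Each unit fires if its summed input 0.5*(a+b) reaches its threshold."""
    drive = 0.5 * (a + b)            # both inputs connect with weight 0.5
    return [1.0 if drive >= th else 0.0 for th in thresholds]

def output_neuron(hidden, weights):
    return sum(w * h for w, h in zip(weights, hidden))

thresholds = [0.1 * k for k in range(1, 11)]   # TH = 0.1, 0.2, ..., 1.0
# Hand-picked output weights: +1 from the threshold-0.5 unit ("or"), which is
# active for (0,1), (1,0), (1,1), and -1 from the threshold-1.0 unit ("and"),
# active only for (1,1); all other units get weight 0.
weights = [0.0] * 10
weights[4] = 1.0     # threshold 0.5: the "or" unit
weights[9] = -1.0    # threshold 1.0: the "and" unit

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), output_neuron(intermediate_layer(a, b, thresholds), weights))
# -> 0, 1, 1, 0 : the XOR function, "or" minus "and"
```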
5.3 Learning a Dynamical System. A network as in Figure 1, consisting of one reconstruction neuron, two sensors connected to the reconstruction neuron using 301 axons per sensor with delays 0, 1, ..., 300 ms, and one feedback sensor, is trained to calculate the motion of a damped linear pendulum from the applied force. The motion of the pendulum was determined by the differential equation

    \frac{d^2 x}{dt^2} + 0.06 \, \frac{dx}{dt} + 0.04 \, x = F(t),

where x(t) is the displacement and F(t) is the applied force, the input of the network. Two sensors measuring the force F on the pendulum are the input. One sensor generates spikes according to an inhomogeneous Poisson process with density [F(t)]_+ and the other one with density [F(t)]_−, where [F(t)]_± := ±F(t) if F(t) ≷ 0 and 0 otherwise. The feedback is a sensor measuring the displacement of the pendulum and generating random spikes through an inhomogeneous Poisson process where the probability density of a spike is proportional to the displacement of the pendulum. The network was trained for 1000 iterations using a constant learning rate η_ν = 10^{−6} and without weight decay.

Figure 6 shows that after training, the membrane potential is very similar to the actual position of the pendulum; thus, the system has learned to calculate the displacement of a pendulum given the history of forces acting on the pendulum.

Figure 6: A network as in Figure 1 has learned to calculate the motion of a damped linear pendulum. The force F on the pendulum is shown in the lower panel. The upper panel compares the membrane potential of the output neuron (solid black line) that is generated by the feedforward process using the force measurement to the membrane potential that comes from the feedback process (dashed gray line) based on the actual position of the pendulum.
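The pendulum task fits the architecture of Figure 1 because a damped linear pendulum is a linear time-invariant system: its displacement is the convolution of the force with the pendulum's impulse response, which is precisely the weighted-delay-line form of equation 2.4. The sketch below, our own construction using a simple semi-implicit Euler integrator, makes this explicit; the step size and duration are assumptions of the example.

```python
import numpy as np

DT = 1.0   # integration step in ms
N = 800    # simulated duration in ms

def pendulum(F):
    """Semi-implicit Euler integration of x'' + 0.06 x' + 0.04 x = F(t)."""
    x, v = 0.0, 0.0
    xs = np.empty(len(F))
    for n, f in enumerate(F):
        a = f - 0.06 * v - 0.04 * x
        v += a * DT
        x += v * DT
        xs[n] = x
    return xs

rng = np.random.default_rng(3)
F = rng.normal(size=N)            # random driving force

# Impulse response h: displacement after a unit force pulse at t = 0.
impulse = np.zeros(N)
impulse[0] = 1.0
h = pendulum(impulse)

# Linearity: x(t) is the convolution of F with h, that is, a weighted sum of
# delayed force values -- the structure the delay lines of Figure 1 provide.
x_direct = pendulum(F)
x_conv = np.convolve(F, h)[:N]
print(np.allclose(x_direct, x_conv))   # True: the delay-line form is exact
```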
6 Discussion

Supervised STDP converges, but where does convergence lead the neuronal system to, that is, what are the system properties in the final state? The learning algorithm minimizes C(J) defined by equation 4.3, where C(J) is the expectation value of Q(z, J) with respect to all learning examples z according to equation 4.2. For γ → 0, we can neglect the term γJ² in equation 4.2. What remains is

    \int W^p(z, J, t)^2 \, dt.

Accordingly, as γ → 0, the learning algorithm minimizes the quadratic difference between the membrane potential V^p at a reconstruction neuron of the secondary sensory system and the feedback signal caused by \sum_q F(p, q) (\tilde{x}^q * \varepsilon)(t) (see equation 4.1). If we choose F(p, q) = 1 for p = q and F(p, q) ≪ 1 elsewhere, then

    \sum_q F(p, q) \, (\tilde{x}^q * \varepsilon)(t) \approx (\tilde{x}^p * \varepsilon)(t),

and at the minimum,

    V^p(t) \approx (\tilde{x}^p * \varepsilon)(t - \Delta).

In the following, we discuss what happens in the fully annealed state, when learning is finished and the learning constant is assumed to be zero. If the feedback x vanishes (e.g., at night, when Xenopus's visual system cannot give feedback anymore), then the membrane potential W^p(t) of the reconstruction neurons is generated by the secondary system alone and W^p(t) = V^p(t) (see equation 2.5). Therefore,

    W^p(t) \approx (\tilde{x}^p * \varepsilon)(t - \Delta),

that is, through their membrane potential W^p(t), the reconstruction neurons reconstruct the potential (x̃^p ∗ ε) caused by the primary sensors (e.g., the visual system) a time Δ earlier. The reconstruction neurons can therefore perfectly substitute the primary system in case of primary-system failure. In addition, when feedback is switched off, they can complement the primary system, for example, under poor visual conditions.

A potential problem of the proposed learning mechanism is twofold. First, learning might go on even when feedback is not available. Second, to make actual use of the secondary system, feedback has to be switched off. While we do not claim to have completely solved these problems, it is imaginable that the system as a whole can determine when learning is feasible and switch learning and feedback on or off by chemical or electrical mechanisms.

Another potential caveat is that the method learns to reconstruct the teacher's spike train with a delay Δ. This may be a problem for certain sensory systems that need to create an instantaneous representation of the environment. However, if it is possible to compute the teacher's signal instantaneously by using a secondary sensory system, the proposed mechanism can learn to do so by setting Δ to zero while still accessing sensory input from the past using delay lines in the input. For Δ = 0, the results may not be as accurate as for, say, Δ = 100 ms, because some useful information is still missing from the potentially slow inputs. In this case, the actual sensory system may contain several learning networks, each with a different delay Δ, and the system may optimize the trade-off between accuracy and latency by choosing an output from a specific Δ as needed.

The supervised STDP algorithm we have just proposed can discern stimuli at different positions p because after convergence, if x^p ≠ 0 and x^q = 0 for q ≠ p,

    W^q \approx (\tilde{x}^q * \varepsilon)(t - \Delta) = 0, \quad \text{whereas} \quad W^p \approx (\tilde{x}^p * \varepsilon)(t - \Delta) \neq 0.

The neuron with maximal membrane potential in the sense of maximal ∫ [W^p(t)]² dt therefore indicates the position p of the stimulus.
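The readout just described amounts to a winner-take-all over reconstruction neurons. A minimal sketch, assuming the membrane potentials W^p have been sampled over one stimulus window (the function name and array layout are our own):

```python
import numpy as np

def stimulus_position(W):
    """Pick the reconstruction neuron with maximal integral of W^p(t)^2.

    W -- array of shape (n_positions, n_timesteps): sampled membrane
         potentials W^p(t) of all reconstruction neurons.
    """
    energy = np.sum(W**2, axis=1)    # integral [W^p(t)]^2 dt per neuron
    return int(np.argmax(energy))    # position p of the stimulus
```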
6.1 Biological Implementation. Can real neurons implement the supervised STDP learning rule given by equation 3.2? One example of a learning mechanism based on the membrane potential of the postsynaptic neuron is long-term plasticity of glutamatergic synapses under participation of NMDA receptors (Andersen, 1987). When the membrane potential of the postsynaptic neuron is at or near the resting potential, the NMDA receptor is blocked. As soon as the membrane potential increases, the block is released and the NMDA receptor may be activated by glutamate. A presynaptic action potential then activates the NMDA receptor, which causes a Ca²⁺ influx into the neuron, which may lead to a strengthening or a weakening of the synapse (Yang, Tang, & Zucker, 1999; Shouval, Bear, & Cooper, 2002; Mu & Poo, 2006).

We could imagine two problems with a biological implementation of the supervised STDP learning rule, equation 3.2, though. First, the algorithm would require inhibitory (J^p_{ij} < 0) as well as excitatory (J^p_{ij} > 0) synapses at the same axon (see Figure 1). This problem can be overcome easily by additional interneurons. Second, during learning, a synapse can change its sign: an inhibitory synapse can become excitatory, and vice versa. This is no problem at all as long as for each inhibitory synapse with delay Δ^p_{ij}, there is also an excitatory synapse with the same delay. Then, if according to learning rule 3.2 the inhibitory synapse grows above J^p_{ij} = 0, the inhibitory synapse is in fact stuck at zero and the excitatory synapse with the same delay takes over. In a real system, however, time delays should be more or less random, and thus the above condition is not satisfied. This does not matter either, because as soon as a synapse hits zero, it vanishes, and the minimization then takes place in a vector space where the vanished synapses no longer take part.

In numerical experiments, the system has successfully learned relevant tasks. It determined that the feedback can be reconstructed by linearly combining two inputs with different delays and thus has learned a spatiotemporal problem (see Figure 3). Successful learning of the XOR problem in another numerical experiment shows that SSTDP with an additional intermediate layer is suitable even for learning nonlinear problems (see Figures 4 and 5). As a real-world problem, SSTDP has successfully learned to predict the movement of a linear pendulum driven by a random force, with the force as input and the position of the pendulum as feedback (see Figure 6).

It is therefore fair to state that supervised STDP is not only a provably converging learning mechanism but can also be implemented straightforwardly in biological systems. Moreover, if the neuron is able to reconstruct the primary sensory input, that is, if synaptic weights J* exist that make the error C(J) approach zero, the neuron is also able to learn these optimal synaptic weights by supervised STDP. Finally, supervised STDP is able to discriminate among stimuli coming from different positions.

Acknowledgments

This work has been supported by the BMBF through the Bernstein Center for Computational Neuroscience, Munich.

References

Andersen, P. (1987). Long-term potentiation: Outstanding problems. In J.-P. Changeux & M. Konishi (Eds.), The neural and molecular basis of learning (pp. 239–262). New York: Wiley.
Bottou, L. (1998). Online learning and stochastic approximations. In D. Saad (Ed.), Online learning in neural networks. Cambridge: Cambridge University Press.
Bottou, L. (2004). Stochastic learning. In O. Bousquet & U. von Luxburg (Eds.), Advanced lectures on machine learning (pp. 146–168). Berlin: Springer.
Bürck, M., Friedel, P., Sichert, A. B., Vossen, C., & van Hemmen, J. L. (2010). Optimality in mono- and multisensory map formation. Biol. Cybern., 103, 1–20.
Franosch, J.-M. P., Lingenheil, M., & van Hemmen, J. L. (2005). How a frog can learn what is where in the dark. Phys. Rev. Lett., 95, 078106.
Gerstner, W., & van Hemmen, J. L. (1994). In E. Domany, J. L. van Hemmen, & K. Schulten (Eds.), Models of neural networks II (pp. 39–47). New York: Springer.
Gerstner, W., Kempter, R., van Hemmen, J. L., & Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 383, 76–81.
Gütig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike timing-based decisions. Nat. Neurosci., 9, 420–428.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.
Kawato, M. (1990). Feedback-error-learning neural network for supervised learning. In R. Eckmiller (Ed.), Advanced neural computers (pp. 365–372). Amsterdam: Elsevier.
Kempter, R., Gerstner, W., van Hemmen, J. L., & Wagner, H. (1998). Extracting oscillations: Neuronal coincidence detection with noisy periodic spike input. Neural Comput., 10, 1987–2017.
Kempter, R., Leibold, C., Wagner, H., & van Hemmen, J. L. (2001). Formation of temporal-feature maps by axonal propagation of synaptic learning. Proc. Natl. Acad. Sci. USA, 98, 4166–4171.
Kistler, W. M., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Comput., 9, 1015–1045.
Knudsen, E. I., Knudsen, P. F., & Esterly, S. D. (1982). Early auditory experience modifies sound localization in barn owls. Nature, 295, 238–240.
Kühn, S., Beyn, W.-J., & Cruse, H. (2007). Modelling memory functions with recurrent neural networks consisting of input compensation units: I. Static situations. Biol. Cybern., 96, 455–470.
Kühn, S., & Cruse, H. (2007). Modelling memory functions with recurrent neural networks consisting of input compensation units: II. Dynamic situations. Biol. Cybern., 96, 471–486.
Legenstein, R., Naeger, C., & Maass, W. (2005). What can a neuron learn with spike-timing-dependent plasticity? Neural Comput., 17, 2337–2382.
Leibold, C., Kempter, R., & van Hemmen, J. L. (2001). Temporal map formation in the barn owl's brain. Phys. Rev. Lett., 87, 248101.
Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275, 213–215.
Mu, Y., & Poo, M.-m. (2006). Spike timing-dependent LTP/LTD mediates visual experience-dependent plasticity in a developing retinotectal system. Neuron, 50, 115–125.
Oja, E. (1982). Simplified neuron model as a principal component analyzer. J. Math. Biol., 15, 267–273.
Porr, B., & Wörgötter, F. (2003). Isotropic sequence order learning. Neural Comput., 15, 831–864.
Shouval, H. Z., Bear, M. F., & Cooper, L. N. (2002). A unified model of NMDA receptor-dependent bidirectional synaptic plasticity. Proc. Natl. Acad. Sci. USA, 99, 10831–10836.
Swinehart, C. D., & Abbott, L. F. (2005). Supervised learning through neuronal response modulation. Neural Comput., 17, 609–631.
van Hemmen, J. L. (2001). Theory of synaptic plasticity. In F. Moss & S. Gielen (Eds.), Handbook of biological physics, Vol. 4: Neuro-informatics, neural modelling (pp. 771–823). Amsterdam: Elsevier.
Yang, S.-N., Tang, Y.-G., & Zucker, R. S. (1999). Selective induction of LTP and LTD by postsynaptic [Ca2+]i elevation. J. Neurophysiol., 81, 781–787.

Received December 17, 2011; accepted June 10, 2013.