MACHINE LEARNING - Doctoral Class - EDIC
Information Theory and The Neuron - II
Aude Billard, EPFL - LASA, 2006 — http://lasa.epfl.ch

Overview
LECTURE I:
• Neuron – Biological Inspiration
• Information Theory and the Neuron
• Weight Decay + Anti-Hebbian Learning → PCA
• Anti-Hebbian Learning → ICA
LECTURE II:
• Capacity of the Single Neuron
• Capacity of Associative Memories (Willshaw Net, Extended Hopfield Network)
LECTURE III:
• Continuous Time-Delay NN
• Limit Cycles, Stability and Convergence

Neural Processing - The Brain
[Figure: schematic neuron — dendrites, synapses and cell body; the electrical potential E integrates the inputs over time, decays, depolarizes, and is followed by a refractory time.]
A neuron receives and integrates input from other neurons. Once the integrated input exceeds a critical level, the neuron discharges a spike. This spiking event is also called depolarization, and is followed by a refractory period during which the neuron is unable to fire.

Information Theory and The Neuron
A single neuron computes the output $y = f\big(\sum_i w_i x_i\big)$ from its inputs $X = (x_1, \dots, x_N)$ through the weights $W = (w_1, \dots, w_N)$.
You can view the neuron as a memory:
• What can you store in this memory?
• What is its maximal capacity?
• How can you find a learning rule that maximizes this capacity?

A fundamental requirement on learning systems is robustness to noise. One way to measure a system's robustness to noise is to determine the mutual information between its inputs $X$ and its output $y = f(X, \nu)$, where $\nu$ denotes the noise.

Consider the neuron as a sender-receiver system, with $X$ the message sent and $y$ the message received. Information theory gives a measure of the information conveyed by $y$ about $X$. If the transmission system is imperfect (noisy), one must find a way to ensure minimal disturbance in the transmission.

Case 1 — noise added to the output: $y = \sum_i w_i x_i + \nu$. The mutual information between the neuron output $y$ and its inputs $X$ is
$$I(x, y) = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2},$$
where $\sigma_y^2 / \sigma_\nu^2$ is the signal-to-noise ratio. To maximize this ratio, one can simply increase the magnitude of the weights.

Case 2 — noise added to each input: $y = \sum_i w_i (x_i + \nu_i)$. The mutual information becomes
$$I(x, y) = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2 \sum_j w_j^2}.$$
This time, one cannot simply increase the magnitude of the weights: scaling the weights scales the noise term $\sigma_\nu^2 \sum_j w_j^2$ (and $\sigma_y^2$) as well, leaving the ratio unchanged.

Case 3 — two noisy outputs: $y_j = \sum_i w_{ij} (x_i + \nu_i)$, $j = 1, 2$. Then
$$I(x, y) = \frac{1}{2} \log \det(R), \qquad \det(R) = \sigma_\nu^4 + \sigma_\nu^2 (\sigma_1^2 + \sigma_2^2) + \sigma_1^2 \sigma_2^2 (1 - \rho_{12}^2),$$
where $R$ is the covariance matrix of the outputs, $\sigma_1^2, \sigma_2^2$ their variances and $\rho_{12}$ their correlation coefficient.

How can we define a learning rule that optimizes the mutual information?
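Before turning to learning rules, here is a quick numerical check of Case 2. It is a minimal sketch, assuming Gaussian inputs and i.i.d. Gaussian input noise (the names w, C_x and sigma_nu2 are mine, not from the slides); it shows that rescaling the weights leaves I(x,y) unchanged.

```python
import numpy as np

def mutual_info_noisy_inputs(w, C_x, sigma_nu2):
    """I(x,y) = 1/2 log( sigma_y^2 / (sigma_nu^2 * sum_j w_j^2) )
    for a linear neuron y = sum_i w_i (x_i + nu_i), with input covariance
    C_x and i.i.d. Gaussian input noise of variance sigma_nu2."""
    w = np.asarray(w, dtype=float)
    # output variance = signal part + noise part
    sigma_y2 = w @ C_x @ w + sigma_nu2 * np.sum(w ** 2)
    return 0.5 * np.log(sigma_y2 / (sigma_nu2 * np.sum(w ** 2)))

# Two correlated inputs (toy covariance), small input noise
C_x = np.array([[1.0, 0.8],
                [0.8, 1.0]])
w = np.array([0.6, 0.4])

I1 = mutual_info_noisy_inputs(w, C_x, sigma_nu2=0.1)
I2 = mutual_info_noisy_inputs(10 * w, C_x, sigma_nu2=0.1)   # weights scaled by 10
print(I1, I2)   # identical: blowing up the weights does not increase I(x,y)
```

Only the direction of the weight vector matters for the mutual information in this case, which is why the learning rules below combine a Hebbian term with some form of normalization.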
Hebbian Learning
Input $x_i$, output $y_j = \sum_i w_{ij} x_i$. The Hebbian rule updates the weights as
$$\Delta w_{ij} = \eta \, x_i y_j, \qquad \eta: \text{learning rate}.$$
If $x_i$ and $y_j$ fire simultaneously, the weight of the connection between them is strengthened in proportion to the strength of their firing.

Hebbian Learning – Limit Cycle
Averaged over the inputs, the weights follow
$$\frac{d}{dt} W(t) = C \, W(t),$$
where $C$ is the input correlation matrix. Stability? A fixed point $\frac{d}{dt} W(t) = 0$ requires weights $w_i^*$ such that $E[\Delta w_i^*] = 0$ for all $i$:
$$E[\Delta w_i^*] = E[y \, x_i] = E\Big[\sum_j w_j^* x_j x_i\Big] = \sum_j C_{ij} w_j^* = 0.$$
This is true for all $i$; thus $w^*$ is an eigenvector of $C$ with associated eigenvalue 0. But $C$ is a symmetric, positive semi-definite matrix, so all its eigenvalues are $\geq 0$: under a small disturbance $\epsilon$,
$$E[\Delta w] = C (w^* + \epsilon) = C \epsilon \neq 0,$$
so the fixed point is not stable. The weights grow without bound, and tend to grow in the direction of the eigenvector of $C$ with the largest eigenvalue.

Hebbian Learning – Weight Decay
The simple weight-decay rule belongs to a class of decay rules called subtractive rules:
$$\Delta w_{ij} = \eta \, x_i y_j - \gamma \, w_{ij}.$$
The only advantage of subtractive rules over simply clipping the weights is that they eliminate weights that have little importance.
Another important type of decay rule is the multiplicative rule:
$$\Delta w_{ij} = \eta \, x_i y_j - \gamma(w_{ij}) \, w_{ij}, \qquad \gamma(w_{ij}): \text{a function of the weight}.$$
The advantage of multiplicative rules is that, in addition to keeping the weights small, they also yield useful weights.

Information Theory and The Neuron – Oja's one-neuron model
For the noisy-input neuron $y = \sum_i w_i (x_i + \nu_i)$, maximizing
$$I(x, y) = \frac{1}{2} \log \frac{\sigma_y^2}{\sigma_\nu^2 \sum_j w_j^2}$$
amounts to maximizing $J(w) \sim \frac{w^T C w}{w^T w}$. Oja's one-neuron model,
$$\Delta w_i = \eta \, y \, (x_i - y \, w_i),$$
performs exactly this: the weights converge toward the first eigenvector of the input covariance matrix and are normalized.

Hebbian Learning – Weight Decay: Oja's subspace algorithm
$$\Delta w_{ij} = \eta \, y_i \Big(x_j - \sum_k w_{kj} y_k\Big).$$
This is equivalent to optimizing a generalized form of $J$, $J(w) \sim w_i^T C w_j$ under the normalization constraint $w_i^T w_i = 1$, and thereby to maximizing the multi-output mutual information $I(x, y) \propto \log \det(R)$.

Why PCA, LDA, ICA with ANN?
• They suggest how the brain could derive important properties of the sensory and motor space.
• They allow one to discover new modes of computation with simple, iterative and local learning rules (see the sketch below).
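As an example of such a simple, local, iterative rule, here is a minimal sketch of Oja's one-neuron model. The 2-D correlated Gaussian data, learning rate and number of passes are my own choices, not from the slides; the rule itself is the one above, $\Delta w = \eta\, y\,(x - y\,w)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated, zero-mean 2-D inputs (toy data)
C = np.array([[3.0, 1.0],
              [1.0, 1.0]])
X = rng.multivariate_normal(mean=[0, 0], cov=C, size=5000)

# Oja's one-neuron rule: dw = eta * y * (x - y * w)
w = rng.normal(size=2)
eta = 0.01
for epoch in range(20):
    for x in X:
        y = w @ x
        w += eta * y * (x - y * w)

# Leading eigenvector of the input covariance matrix, for comparison
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
pc1 = eigvecs[:, -1]

print("learned w :", w, " norm:", np.linalg.norm(w))   # norm close to 1
print("1st eigvec:", pc1)                              # same direction, up to sign
```

The weight-decay term $-\eta y^2 w$ is what keeps the norm bounded, in contrast with the plain Hebbian rule whose weights grow without limit.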
Recurrence in Neural Networks
So far, we have considered only feed-forward neural networks. Most biological networks, however, have recurrent connections. This change in the direction of the information flow is interesting, as it allows the network:
• to keep a memory of the activation of the neurons;
• to propagate information across output neurons.

Anti-Hebbian Learning
Consider a network with input $x$ and two laterally connected outputs $y_1, y_2$. How can we maximize information transmission in such a network, i.e. maximize $I(x; y)$?
Anti-Hebbian learning, also known as lateral inhibition, updates the lateral weights with
$$\Delta w_{ij} = -\eta \, \langle y_i y_j \rangle,$$
where $\langle \cdot \rangle$ denotes the average over all training patterns.
If the two outputs are highly correlated, the weight between them grows to a large negative value and each output tends to turn the other off:
$$\Delta w_{ij} = -\eta \, \langle y_i y_j \rangle \;\;\Rightarrow\;\; \Delta w_{ij} = 0 \iff \langle y_i y_j \rangle = 0.$$
There is no need for weight decay or renormalization on anti-Hebbian weights, as they are automatically self-limiting.

Anti-Hebbian Learning – Foldiak's first model
$$y_i = x_i + \sum_{j \neq i} w_{ij} y_j, \qquad \Delta w_{ij} = -\alpha \, y_i y_j \;\; (i \neq j), \qquad w_{ii} = 0.$$
In matrix terms: $y = x + W y$, hence $y = (I - W)^{-1} x$.
One can further show that there is a stable point $w_f$ in the weight space.

Anti-Hebbian Learning – Foldiak's second model
The self-connections are no longer zero: each neuron also receives its own output, with a self-weight driven toward 1 by $\Delta w_{ii} = \alpha (1 - y_i y_i)$. In matrix terms,
$$\Delta W = \alpha \big(I - \langle y \, y^T \rangle\big).$$
This network converges when: 1) the outputs are decorrelated, and 2) the expected variance of each output equals 1.

PCA versus ICA
PCA looks at the covariance matrix only. What if the data are not well described by the covariance matrix? The only distribution uniquely specified by its covariance (once the mean is subtracted) is the Gaussian distribution. Distributions that deviate from the Gaussian are poorly described by their covariance.
Even with non-Gaussian data, variance maximization leads to the most faithful representation in the reconstruction-error sense. The mean-square error measure implicitly assumes Gaussianity, since it penalizes data points close to the mean less than those that are far away. But it does not, in general, lead to the most meaningful representation. We therefore need to perform gradient descent on some function other than the reconstruction error.

Uncorrelated versus Statistically Independent
Uncorrelated: $E(y_1 y_2) = E(y_1) \, E(y_2)$.
Independent: $E\big(f(y_1) \, g(y_2)\big) = E\big(f(y_1)\big) \, E\big(g(y_2)\big)$, true for any (non-linear) transformations $f$ and $g$.
Statistical independence is a stronger constraint than decorrelation.

Objective Function of ICA
We want to ensure that the outputs $y_i$ are maximally independent. This is identical to requiring that the mutual information between them be small, or alternatively that their joint entropy be large.
[Venn diagram: the joint entropy H(x,y) decomposes into H(x|y), I(x,y) and H(y|x), with H(x) = H(x|y) + I(x,y) and H(y) = H(y|x) + I(x,y).]

Anti-Hebbian Learning and ICA
Anti-Hebbian learning can also lead to a decomposition into statistically independent components, and as such allows a decomposition of the ICA type. To ensure independence, the network must converge to a solution that satisfies
$$E\big(f(y_1) \, f(y_2)\big) = E\big(f(y_1)\big) \, E\big(f(y_2)\big)$$
for any given function $f$.

ICA for Time-Dependent Signals
Two original signals $s_1(t), s_2(t)$ are mixed through an unknown matrix $A$:
$$X(t) = A \, S(t),$$
and only the mixed signals $x_1(t), x_2(t)$ are observed. The goal is to recover $S(t) = A^{-1} X(t)$, where both $A^{-1}$ and $S(t)$ are unknown. (Figures adapted from Hyvärinen, 2000; a small mixing sketch follows below.)
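The mixing model itself is easy to illustrate. This is a minimal sketch with toy signals and a mixing matrix of my own choosing (not from the slides); it only sets up the problem that the networks below try to solve.

```python
import numpy as np

# Two toy source signals (unknown to the unmixing algorithm)
t = np.linspace(0, 1, 1000)
S = np.vstack([np.sin(2 * np.pi * 5 * t),             # s1(t): sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t))])    # s2(t): square wave

# Unknown mixing matrix A: only X(t) = A S(t) is observed
A = np.array([[0.8, 0.3],
              [0.4, 0.9]])
X = A @ S

# If A were known, the sources would be recovered exactly:
S_rec = np.linalg.inv(A) @ X
print(np.allclose(S_rec, S))   # True — the whole difficulty of ICA is
                               # estimating A^{-1} from X alone.
```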
Anti-Hebbian Learning and ICA – Jutten and Hérault's model
$$y_i = x_i + \sum_{j \neq i} w_{ij} y_j, \qquad y = x + W y, \qquad y = (I - W)^{-1} x.$$
Non-linear learning rule:
$$\Delta w_{ij} = -\eta \, f(y_i) \, g(y_j) \quad (i \neq j).$$
If $f$ and $g$ are the identity, we recover the anti-Hebbian rule above, which ensures convergence to uncorrelated outputs: $E(y_1 y_2) = 0$. To ensure independence, the network must instead converge to a solution that satisfies
$$E\big(f(y_1) \, g(y_2)\big) = E\big(f(y_1)\big) \, E\big(g(y_2)\big)$$
for any given functions $f$ and $g$.

HINT: use two odd functions for $f$ and $g$ (i.e. $f(-x) = -f(x)$); their Taylor series expansions then consist solely of odd terms:
$$f(x) = \sum_{j \geq 0} a_{2j+1} \, x^{2j+1}, \qquad g(x) = \sum_{j \geq 0} b_{2j+1} \, x^{2j+1},$$
so that
$$\Delta w_{ij} = -\eta \, f(y_1) \, g(y_2) = -\eta \sum_{j \geq 0} \sum_{k \geq 0} a_{2j+1} b_{2k+1} \; y_1^{2j+1} y_2^{2k+1},$$
and at convergence $\Delta w_{ij} = 0 \;\Rightarrow\; E\big[y_1^{2j+1} y_2^{2k+1}\big] = 0$ for all $j, k$.
Since most (audio) signals have an even (symmetric) distribution, $E\big[y^{2j+1}\big] = 0$, so at convergence
$$E\big[y_1^{2j+1} y_2^{2k+1}\big] = E\big[y_1^{2j+1}\big] \, E\big[y_2^{2k+1}\big],$$
i.e. the independence condition is satisfied for all odd moments.

Anti-Hebbian Learning and ICA – Application to Blind Source Separation
[Figures: mixed signals, and the same signals unmixed through generalized anti-Hebbian learning.]
Hsiao-Chun Wu et al., ICNN 1996, MWSCAS 1998, ICASSP 1999.
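A minimal sketch of the Jutten–Hérault network on the toy mixtures above. The choices $f(y) = y^3$, $g(y) = y$, the batch averaging over patterns, the learning rate and the number of iterations are my assumptions (the slides only require $f$ and $g$ to be odd), and convergence is not guaranteed for arbitrary signals; the sketch shows the structure of the algorithm rather than a tuned implementation.

```python
import numpy as np

# Toy mixtures, as in the earlier mixing sketch
t = np.linspace(0, 1, 1000)
S = np.vstack([np.sin(2 * np.pi * 5 * t),
               np.sign(np.sin(2 * np.pi * 3 * t))])
X = np.array([[0.8, 0.3], [0.4, 0.9]]) @ S

def jutten_herault(X, eta=0.2, iters=500):
    """Recurrent net y = x + W y with anti-Hebbian updates
    dw_ij = -eta <f(y_i) g(y_j)>, i != j (batch-averaged for stability)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    f = lambda y: y ** 3          # odd nonlinearity (assumed choice)
    g = lambda y: y               # odd nonlinearity (assumed choice)
    for _ in range(iters):
        Y = np.linalg.solve(np.eye(n) - W, X)      # y = (I - W)^{-1} x, all samples
        dW = -eta * (f(Y) @ g(Y).T) / Y.shape[1]   # average over training patterns
        np.fill_diagonal(dW, 0.0)                  # lateral weights only
        W += dW
    return W

W = jutten_herault(X)
Y = np.linalg.solve(np.eye(2) - W, X)   # recovered signals (up to scale/permutation)
print(np.corrcoef(Y))                   # off-diagonal terms near 0 if separation worked
```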
Information Maximization
Bell & Sejnowski proposed a network that maximizes the mutual information between output and input when these are not subject to noise — or rather, when the input and the noise can no longer be distinguished, in which case $H(y|x)$ tends to negative infinity. The output unit is a sigmoid:
$$y = \frac{1}{1 + e^{-(W X + w_0)}}.$$
Since
$$I(x, y) = H(y) - H(y|x)$$
and $H(y|x)$ is independent of the weights $W$, maximizing the mutual information amounts to maximizing the output entropy $H(y)$.
The entropy of a distribution is maximized when all outcomes are equally likely. We must therefore choose an activation function at the output neurons that equalizes each neuron's chances of firing, and so maximizes their collective entropy.
[Bell A.J. and Sejnowski T.J., 1995. An information maximisation approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159.]

Anti-Hebbian Learning and ICA
The sigmoid is the optimal solution to even out a Gaussian distribution so that all output values are equally probable.
[Figure: a Gaussian input density passed through the matched sigmoid yields an approximately uniform output density.]
For a single input $x$ and output $y = \frac{1}{1 + e^{-(w x + w_0)}}$, the pdf of the output can be written as
$$p(y) = \frac{p(x)}{\left|\partial y / \partial x\right|},$$
so the entropy of the output is
$$H(y) = -E[\ln p(y)] = E\Big[\ln \Big|\frac{\partial y}{\partial x}\Big|\Big] + H(x).$$
The learning rules that perform gradient ascent on this entropy are (Bell & Sejnowski, 1995):
$$\Delta w \propto \frac{1}{w} + x \,(1 - 2y), \qquad \Delta w_0 \propto 1 - 2y.$$
The term $x(1 - 2y)$ is anti-Hebbian (it avoids the saturated solution $y = 1$); the term $1/w$ acts as an anti-weight-decay (it moves the weight away from the trivial solution $w = 0$).

This can be generalized to a many-inputs, many-outputs network with sigmoidal outputs. The learning rule that maximizes the mutual information between input and output is then
$$\Delta W \propto \big(W^T\big)^{-1} + (1 - 2y)\, x^T.$$
Such a network can linearly decompose up to 10 sources.
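To make the infomax rule concrete, here is a minimal sketch of the many-inputs, many-outputs update on toy data. The Laplacian (super-Gaussian) sources, the mixing matrix, the learning rate and the per-sample (stochastic) updates are my own assumptions, not from the slides, and the step size may need tuning; natural-gradient variants of this rule are more stable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two super-Gaussian (Laplacian) toy sources, mixed by an assumed matrix A
S = rng.laplace(size=(2, 5000))
A = np.array([[0.8, 0.3],
              [0.4, 0.9]])
X = A @ S

# Bell & Sejnowski infomax rule for a sigmoidal 2-in / 2-out network:
#   dW ~ (W^T)^{-1} + (1 - 2y) x^T,   dw0 ~ 1 - 2y,   y = sigmoid(W x + w0)
W, w0 = np.eye(2), np.zeros(2)
eta = 0.001
for epoch in range(20):
    for x in X.T:
        y = 1.0 / (1.0 + np.exp(-(W @ x + w0)))
        W += eta * (np.linalg.inv(W.T) + np.outer(1.0 - 2.0 * y, x))
        w0 += eta * (1.0 - 2.0 * y)

U = W @ X   # unmixed signals (up to scale and permutation)
print(np.abs(np.corrcoef(U[0], S[0]))[0, 1],
      np.abs(np.corrcoef(U[0], S[1]))[0, 1])   # one near 1, the other near 0
```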