Self-Organization: Hebbian Learning
CS/CMPE 333 – Neural Networks

Introduction
So far we have studied neural networks that learn from their environment in a supervised manner. Neural networks can also learn in an unsupervised manner; this is known as self-organized learning.
Self-organized learning discovers significant features or patterns in the input data through general rules that operate locally.
Self-organizing networks typically consist of two layers with feedforward connections and elements that facilitate 'local' learning.

Self-Organization
"Global order can arise from local interactions" – Turing (1952)
An input signal produces certain activity patterns in the network, and these patterns in turn modify the weights (a feedback loop).
Principles of self-organization:
1. Modifications in weights tend to self-amplify.
2. Limitation of resources leads to competition: the most active synapses are selected and the less active ones are disregarded.
3. Modifications in weights tend to cooperate.

Hebbian Learning
A self-organizing principle was proposed by Hebb in 1949 in the context of biological neurons.
Hebb's principle: when a neuron repeatedly excites another neuron, the threshold of the latter neuron is decreased, or the synaptic weight between the neurons is increased, in effect increasing the likelihood that the second neuron will fire.
Hebbian learning rule: Δw_ji = η y_j x_i
No desired or target signal is required in the Hebbian rule, hence it is unsupervised learning.
The update rule is local to the weight.

Hebbian Update
Consider the update of a single weight w (x and y are the pre- and post-synaptic activities):
w(n+1) = w(n) + η x(n) y(n)
For a linear activation function, y(n) = w(n) x(n), so
w(n+1) = w(n) [1 + η x^2(n)]
The weight grows without bound: if the initial weight is negative it grows in the negative direction, and if it is positive it grows in the positive direction. (A small numerical sketch appears after the geometric interpretation below.)
Hebbian learning is intrinsically unstable, unlike error-correction learning with the BP algorithm.

Geometric Interpretation of Hebbian Learning
Consider a single linear neuron with p inputs:
y = w^T x = x^T w   and   Δw = η [x_1 y  x_2 y  …  x_p y]^T
The dot product can be written as y = |w| |x| cos(α), where α is the angle between the vectors x and w.
If α is zero (x and w are 'close'), y is large; if α is 90° (x and w are 'far' apart), y is zero.
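To make the instability concrete, here is a minimal numerical sketch (not from the original slides; the data, learning rate, and dimensions are arbitrary assumptions) that applies the plain Hebbian update Δw = η y x to a single linear neuron and tracks the weight norm over a few passes through the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary zero-mean 2-D input data (illustrative assumption).
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])

eta = 0.005
w = np.array([0.1, 0.1])                # small initial weight vector

for epoch in range(5):
    for x in X:
        y = w @ x                       # linear neuron: y = w^T x
        w = w + eta * y * x             # plain Hebbian rule: delta w = eta * y * x
    print(f"pass {epoch + 1}: |w| = {np.linalg.norm(w):.3e}")
# |w| grows by orders of magnitude on every pass -- the rule has no stable fixed point.
```

The norm grows by orders of magnitude per pass, which is exactly the unbounded growth noted above; Oja's rule, introduced below, adds the normalization that removes it.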
Similarity Measure
A network trained with Hebbian learning creates a similarity measure (the inner product) in its input space, according to the information contained in the weights.
The weights capture (memorize) the information in the data during training.
During operation, when the weights are fixed, a large output y signifies that the present input is "similar" to the inputs x that created the weights during training.
Other similarity measures include the Hamming distance and correlation.

Hebbian Learning as Correlation Learning
Hebbian learning (pattern-by-pattern mode): Δw(n) = η x(n) y(n) = η x(n) x^T(n) w(n)
In batch mode: Δw = η [Σ_{n=1..N} x(n) x^T(n)] w(0)
The term Σ_{n=1..N} x(n) x^T(n) is a sample approximation of the autocorrelation matrix of the input data.
Thus Hebbian learning can be thought of as learning the autocorrelation of the input space. (A numerical check of the batch form appears after the eigenfilter discussion below.)
Correlation is a well-known operation in signal processing and statistics; in particular, it completely describes signals with Gaussian distributions.
It has many applications in signal processing.

Oja's Rule
The simple Hebbian rule causes the weights to increase (or decrease) without bound.
The weights can be normalized at every step:
w_ji(n+1) = [w_ji(n) + η x_i(n) y_j(n)] / sqrt( Σ_i [w_ji(n) + η x_i(n) y_j(n)]^2 )
This normalization effectively constrains the weight vector of each neuron to have unit norm.
Oja approximated the normalization (for small η) as:
w_ji(n+1) = w_ji(n) + η y_j(n) [x_i(n) − y_j(n) w_ji(n)]
This is Oja's rule, also called the normalized Hebbian rule.
It contains a 'forgetting term' that prevents the weights from growing without bound.

Oja's Rule – Geometric Interpretation
The simple Hebbian rule turns the weight vector toward the direction of largest variance in the input data; however, the magnitude of the weight vector increases without bound.
Oja's rule has the same interpretation: the normalization only changes the magnitude, while the direction of the weight vector is the same.
The magnitude converges to one.
Oja's rule converges asymptotically, unlike the Hebbian rule, which is unstable.

The Maximum Eigenfilter
A linear neuron trained with Oja's rule produces a weight vector that is the eigenvector of the input autocorrelation matrix associated with the largest eigenvalue, and the variance of its output equals that largest eigenvalue.
In other words, a linear neuron trained with Oja's rule solves the eigenproblem
R e_1 = λ_1 e_1
where R is the autocorrelation matrix of the input data, e_1 is the eigenvector corresponding to the largest eigenvalue (the weight vector w obtained by Oja's rule), and λ_1 is the largest eigenvalue (the variance of the network's output).
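As a quick numerical check of the batch form given in the correlation-learning section above (an illustrative sketch, not part of the slides; data and learning rate are arbitrary), the accumulated Hebbian updates with the weights frozen at w(0) equal η R̂ w(0), where R̂ = Σ_n x(n) x^T(n) is the sample autocorrelation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(100, 3))           # arbitrary input patterns x(1)..x(N)
w0 = rng.normal(size=3)                 # initial weight vector w(0)
eta = 0.1

# Batch-mode Hebbian update: accumulate eta * x(n) * y(n) with w frozen at w(0).
delta_w = sum(eta * x * (x @ w0) for x in X)

# Sample autocorrelation matrix applied to w(0).
R_hat = X.T @ X                         # = sum_n x(n) x(n)^T
print(np.allclose(delta_w, eta * R_hat @ w0))   # True
```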
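The eigenfilter property can also be checked numerically. The sketch below (illustrative; the data distribution, learning rate, and number of passes are assumptions) trains a single linear neuron with Oja's rule and compares the learned weight vector and output variance with the dominant eigenvector and largest eigenvalue of the sample autocorrelation matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

# Zero-mean 2-D data with most of its variance along the direction (1, 1)/sqrt(2).
Q = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
X = rng.normal(size=(2000, 2)) @ np.diag([2.0, 0.3]) @ Q.T

eta = 0.01
w = rng.normal(scale=0.1, size=2)

for _ in range(20):                      # a few passes over the data
    for x in X:
        y = w @ x
        w = w + eta * y * (x - y * w)    # Oja's rule: Hebbian term plus forgetting term

# Compare with the dominant eigenvector of the sample autocorrelation matrix.
R = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
e1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue

print("norm of w:", np.linalg.norm(w))                       # close to 1
print("cosine with e1:", abs(w @ e1) / np.linalg.norm(w))    # close to 1
print("largest eigenvalue:", eigvals[-1])
print("output variance:", np.var(X @ w))                     # close to the largest eigenvalue
```

Unlike the plain Hebbian sketch earlier, the weight norm settles near one and the direction aligns with e_1, which is the asymptotic convergence claimed above.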
Principal Component Analysis (1)
Oja's rule applied to a single neuron creates a principal component of the input space in the form of the weight vector.
How can we find other directions in the input space with significant variance?
In statistics, PCA is used to obtain the significant components of data in the form of orthogonal principal axes.
PCA is also known as Karhunen–Loève (K-L) filtering in signal processing.
It was first proposed in 1901; later developments occurred in the 1930s, 1940s and 1960s.
A Hebbian network with Oja's rule can perform PCA.

Principal Component Analysis (2)
Consider a set of vectors x with zero mean and unit variance. There exists an orthogonal transformation y = Q^T x such that the covariance matrix of y,
Λ = E[y y^T], is diagonal: Λ_ij = λ_i if i = j and Λ_ij = 0 otherwise.
λ_1 > λ_2 > … > λ_p are the eigenvalues of the covariance matrix of x, C = E[x x^T].
The columns of Q are the corresponding eigenvectors.
The first component of y is the principal component: it has the maximum variance among all the components.

PCA – Example
(figure)

Hebbian Network for PCA
Procedure:
1. Use Oja's rule to find the principal component.
2. Project the data onto the subspace orthogonal to the principal component.
3. Use Oja's rule on the projected data to find the next major component.
4. Repeat the above for m <= p components (m = number of desired components; p = input space dimensionality).
How do we project onto the orthogonal directions? Deflation method: subtract the principal component from the input.
Oja's rule can be modified to perform this operation: Sanger's rule.

Sanger's Rule
Sanger's rule is a modification of Oja's rule that implements the deflation method for PCA.
Classical PCA involves matrix operations; Sanger's rule implements PCA in an iterative fashion suited to neural networks.
Consider p inputs and m outputs, where m < p:
y_j(n) = Σ_{i=1..p} w_ji(n) x_i(n),   j = 1, …, m
and the update (Sanger's rule) is:
Δw_ji(n) = η [ y_j(n) x_i(n) − y_j(n) Σ_{k=1..j} w_ki(n) y_k(n) ]
(A code sketch of this update appears after the classification discussion below.)

PCA for Feature Extraction
Feature extraction: transform the p-dimensional input space into an m-dimensional space (m < p) such that the m dimensions capture the information with minimal loss.
PCA is the optimal linear feature extractor: no other linear system provides better features for reconstruction.
PCA may or may not be the best preprocessing for pattern classification or recognition; classification requires good discrimination, which PCA may not provide.
The reconstruction error e is given by: e^2 = Σ_{i=m+1..p} λ_i

PCA for Data Compression
PCA identifies an orthogonal coordinate system for the input data such that the variance of the projection onto the principal axis is largest, followed by the next major axis, and so on.
By discarding some of the minor components, PCA can be used for data compression, where a p-dimensional input is encoded in an m < p dimensional space.
The weights are computed by Sanger's rule on typical inputs.
The de-compressor (receiver) must know the weights of the network to reconstruct the original signal: x' = W^T y
(A sketch after the classification discussion below illustrates this compression and the reconstruction-error formula above.)

PCA for Classification
Can PCA enhance classification? In general, no: PCA is good for reconstruction, not for feature discrimination or classification.
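As referenced in the Sanger's Rule section above, here is a minimal sketch of that update (illustrative assumptions: synthetic data with a diagonal covariance, arbitrary learning rate and iteration counts). The rows of the learned weight matrix should line up with the leading eigenvectors, which for this data are the coordinate axes with the largest variances.

```python
import numpy as np

rng = np.random.default_rng(3)

p, m, N = 5, 2, 4000                      # input dimension, components wanted, samples
stds = np.array([5.0, 3.0, 1.0, 0.5, 0.2]) ** 0.5
X = rng.normal(size=(N, p)) * stds        # diagonal covariance: eigenvectors = coordinate axes

W = rng.normal(scale=0.1, size=(m, p))    # row j is the weight vector of output j
eta = 0.002

for _ in range(20):
    for x in X:
        y = W @ x                                         # y_j = sum_i w_ji(n) x_i(n)
        # Sanger's rule: delta w_ji = eta * (y_j x_i - y_j * sum_{k<=j} w_ki y_k);
        # the lower-triangular mask implements the sum over k = 1..j (the deflation).
        W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

print(np.round(W, 2))   # rows approximate the two leading eigenvectors (+/- the first two axes)
```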
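And a second sketch for the feature-extraction and compression discussion (again illustrative; it uses numpy's eigendecomposition rather than the network update): project the data onto the first m principal axes, reconstruct x' = W^T y, and compare the mean squared reconstruction error with the sum of the discarded eigenvalues, Σ_{i=m+1..p} λ_i.

```python
import numpy as np

rng = np.random.default_rng(4)

p, m, N = 5, 2, 20000
stds = np.array([5.0, 3.0, 1.0, 0.5, 0.2]) ** 0.5
X = rng.normal(size=(N, p)) * stds                   # zero-mean data

# Classical PCA: eigendecomposition of the sample covariance/autocorrelation matrix.
R = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                    # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

W = eigvecs[:, :m].T        # rows = the m leading principal directions
Y = X @ W.T                 # compression: y = W x
X_rec = Y @ W               # reconstruction: x' = W^T y

mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print("mean squared reconstruction error:", mse)
print("sum of discarded eigenvalues:     ", eigvals[m:].sum())   # the two numbers match closely
```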
PCA – Some Remarks
Practical uses of PCA: data compression, cluster analysis, feature extraction, and preprocessing for classification/recognition (e.g. preprocessing for MLP training).
Biological basis: it is unlikely that the processing performed by biological neurons in, say, perception involves PCA only; more complex feature extraction processes are involved.

Anti-Hebbian Learning
Modify the Hebbian rule as: Δw_ji(n) = −η x_i(n) y_j(n)
The anti-Hebbian rule finds the direction in the input space that has the minimum variance; in other words, it is the complement of the Hebbian rule.
Anti-Hebbian learning performs de-correlation: it de-correlates the output from the input.
The Hebbian rule is unstable, since it tries to maximize the variance; the anti-Hebbian rule, on the other hand, is stable and converges.
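A minimal sketch of the anti-Hebbian update (data and learning rate are arbitrary assumptions). With Δw = −η y x alone the weight vector shrinks toward zero, and the component that decays slowest lies along the minimum-variance direction, consistent with the stability and minimum-variance statements above.

```python
import numpy as np

rng = np.random.default_rng(5)

# 2-D data with large variance along (1, 1)/sqrt(2) and small variance along (1, -1)/sqrt(2).
Q = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
X = rng.normal(size=(2000, 2)) @ np.diag([2.0, 0.3]) @ Q.T

eta = 0.005
w = np.array([1.0, 0.0])

for _ in range(4):
    for x in X:
        y = w @ x
        w = w - eta * y * x               # anti-Hebbian rule: delta w = -eta * y * x

R = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(R)
e_min = eigvecs[:, 0]                     # eigenvector of the smallest eigenvalue

print("norm of w:", np.linalg.norm(w))                           # shrinks toward zero (stable)
print("cosine with e_min:", abs(w @ e_min) / np.linalg.norm(w))  # close to 1
```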