The simple neuron model

Consider a simplified neuron model that has n inputs x_1, ..., x_n, each with a weight w_j. The neuron first computes the weighted sum of the inputs,

    y = \sum_{j=1}^{n} w_j x_j    (1)

and then passes this sum to the next neurons in the network. This neuron is shown in the adjoining Figure. The problem of learning is how to change the weights w_j when a stream of input vectors x is given to this neuron, one vector at a time.

From Hebbian learning to the Oja learning rule

Using the mathematical notation above, the Hebbian learning principle can be stated as

    \Delta w_j = \alpha\, y\, x_j    (2)

where \Delta w_j denotes the change in the value of the weight w_j, x_j is the input coming through the weight w_j, and y is the output of the neuron as given in equation (1) (see the Figure). The coefficient \alpha is called the learning rate and it is typically small. Because of this, one input vector x (whose j-th component is x_j) causes only a small instantaneous change in the weights, but as the small changes accumulate over time, the weights settle to some values. Equation (2) represents the Hebbian principle, because the term y x_j is the product of the input and the output.

However, this learning rule has a severe problem: nothing stops the connections from growing all the time, eventually leading to very large values. Another term is needed to balance this growth. In many neuron models, a term representing "forgetting" has been used: the value of the weight itself is subtracted from the right-hand side. The central idea in the Oja learning rule is to make this forgetting term proportional not only to the value of the weight, but also to the square of the output of the neuron. The Oja rule reads

    \Delta w_j = \alpha\, (y\, x_j - y^2 w_j)    (3)

Now the forgetting term -y^2 w_j balances the growth of the weight. The squared output guarantees that the larger the output of the neuron becomes, the stronger this balancing effect is.

Oja learning rule and Principal Component Analysis

A mathematical analysis of the Oja learning rule in (3) goes as follows (a much more thorough and rigorous analysis appears in the book (Oja, 1983)). First, change to vector notation, in which x is the column vector with elements x_1, ..., x_n and w is the column vector with elements w_1, ..., w_n. They are called the input vector and the weight vector, respectively. In vector-matrix notation, equation (1) reads

    y = w^T x = x^T w    (4)

where T denotes the transpose, which changes a column vector into a row vector. This is the well-known inner product between two vectors, defined as the sum of the products of their elements (see equation (1)). Next, write equation (3) in vector notation:

    \Delta w = \alpha\, (y\, x - y^2 w)    (5)

Then, substitute y from equation (4) into equation (5):

    \Delta w = \alpha\, (x x^T w - (w^T x x^T w)\, w)

This is the incremental change for just one input vector x. When the algorithm is run for a long time, with the input vector changing at every step, one can look at the average behaviour. An especially interesting question is what the value of the weights is when the average change in the weights is zero; this is the point of convergence of the algorithm. Averaging the right-hand side over the input vectors x and setting the average to zero gives the following equation for the weight vector w at the point of convergence:

    C w = (w^T C w)\, w    (6)

where the matrix C is the average of x x^T. Assuming the input vectors have zero means, C is in fact the well-known covariance matrix of the inputs. Because the quadratic form w^T C w is a scalar, equation (6) is clearly the eigenvalue-eigenvector equation for the covariance matrix C.

This analysis shows that if the weights converge in the Oja learning rule, then the weight vector becomes one of the eigenvectors of the input covariance matrix, and the output of the neuron becomes the corresponding principal component. Principal components are defined as the inner products between the eigenvectors and the input vectors. For this reason, the simple neuron learning by the Oja rule becomes a principal component analyzer (PCA). Although not shown here, it has been proven that it is the first principal component that the neuron will find, and that the norm of the weight vector tends to one. For details, see (Oja, 1983; 1992).
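To make the convergence result concrete, the following is a minimal NumPy sketch of the single-neuron rule in equation (5). The data dimension, covariance structure, learning rate, and number of passes are illustrative assumptions made for this sketch, not part of the original analysis; the point is that the learned weight vector ends up aligned with the leading eigenvector of the input covariance matrix, with norm close to one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean toy inputs with an anisotropic covariance (assumed data).
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.2])      # component standard deviations
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))      # random rotation
X = (rng.normal(size=(2000, 5)) * scales) @ Q.T
X -= X.mean(axis=0)

alpha = 0.005                                     # small learning rate
w = rng.normal(size=5)
w /= np.linalg.norm(w)

# Oja rule, equation (5): delta w = alpha * (y * x - y^2 * w)
for _ in range(20):                               # repeated passes over the data
    for x in X:
        y = w @ x                                 # equation (4): y = w^T x
        w += alpha * (y * x - y**2 * w)

# Compare with the leading eigenvector of C (the average of x x^T), equation (6).
C = (X.T @ X) / len(X)
eigvals, eigvecs = np.linalg.eigh(C)
v1 = eigvecs[:, -1]                               # eigenvector of the largest eigenvalue

print("norm of w:", np.linalg.norm(w))                    # tends to 1
print("|cos(w, v1)|:", abs(w @ v1) / np.linalg.norm(w))   # tends to 1
```

Because the learning rate is small, the accumulated updates approximate the averaged dynamics discussed above, which is why the final weight vector can be compared directly with the leading eigenvector of the sample covariance matrix.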
Extensions of the Oja learning rule

This learning rule has been extended in several directions. Two extensions are briefly reviewed here: the Oja rule for several parallel neurons, and nonlinearities in the rule.

Oja rule for several neurons

It is possible to define this learning rule for a layer of parallel neurons, each receiving the same input vector x. Then, in order to prevent all the neurons from learning the same thing, parallel connections between them are needed. The result is that a subset, or all, of the principal components are learned. Such neural layers have been considered by (Oja, 1983, 1992; Sanger, 1989).

Nonlinear Hebbian learning and Independent component analysis

Independent component analysis (ICA) is a technique that is related to PCA but is potentially much more powerful: instead of finding uncorrelated components as in PCA, it finds statistically independent components. It turns out that quite small changes in the Oja rule can produce independent, instead of principal, components. What is needed is to change the linear output factor y in the Hebbian term y x_j to a suitable nonlinearity, such as y^3. The forgetting term must then be changed accordingly. The ensuing learning rule

    \Delta w_j = \alpha\, (y^3 x_j - y^4 w_j)    (7)

can be shown to give one of the independent hidden factors under suitable assumptions (Hyvärinen and Oja, 1998). The main requirement is that, prior to entering this algorithm, the input vectors must be zero mean and whitened so that their covariance matrix equals the identity matrix. This can be achieved with a simple linear transformation (see also Hyvärinen et al., 2001).
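As an illustration, here is a minimal NumPy sketch of the nonlinear rule in equation (7) applied to whitened mixtures of two independent sources. The mixing matrix, learning rate, number of passes, and the choice of Laplace-distributed (super-Gaussian) sources are assumptions made for this sketch; with this sign of the cubic update, it is sources with positive kurtosis that are recovered here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent super-Gaussian (Laplace) sources, linearly mixed (assumed setup).
n = 2000
S = rng.laplace(size=(n, 2))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                 # arbitrary mixing matrix
X = S @ A.T

# Preprocessing required by the rule: zero mean and whitening (covariance = I).
X -= X.mean(axis=0)
C = np.cov(X, rowvar=False)
d, E = np.linalg.eigh(C)
X = X @ E @ np.diag(d ** -0.5) @ E.T       # symmetric whitening transform

alpha = 1e-4                               # small learning rate
w = rng.normal(size=2)
w /= np.linalg.norm(w)

# Nonlinear Hebbian rule, equation (7): delta w = alpha * (y^3 * x - y^4 * w)
for _ in range(60):                        # repeated passes over the data
    for x in X:
        y = w @ x
        w += alpha * (y**3 * x - y**4 * w)

# The output y = w^T x should match one of the sources up to sign and scale.
y_all = X @ w
corr = [abs(np.corrcoef(y_all, S[:, k])[0, 1]) for k in range(2)]
print("norm of w:", np.linalg.norm(w))     # tends to 1
print("max |corr(y, s_k)|:", max(corr))    # close to 1 for one of the sources
```

The whitening step implements the simple linear transformation mentioned above, and the correlation check at the end verifies that the neuron's output matches one of the original independent sources up to sign and scale.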
References

Oja E. (1982). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267-273.
Oja E. (1983). Subspace Methods of Pattern Recognition. Research Studies Press.
Oja E. (1992). Principal components, minor components, and linear neural networks. Neural Networks, 5:927-935.
Sanger T. D. (1989). Optimal unsupervised learning in a single-layered linear feedforward network. Neural Networks, 2:459-473.
Hyvärinen A. and Oja E. (1998). Independent component analysis by general nonlinear Hebbian-like learning rules. Signal Processing, 64:301-313.
Hyvärinen A., Karhunen J., and Oja E. (2001). Independent Component Analysis. Wiley.