The simple neuron model
Consider a simplified neuron model that has $n$ inputs $x_1, \dots, x_n$, each with a weight $w_1, \dots, w_n$. The neuron first computes the weighted sum of the inputs,

$$y = \sum_{j=1}^{n} w_j x_j \qquad (1)$$

and then passes this sum to the next neurons in the network. This neuron is shown in the adjoining Figure. The problem of learning is how to change the weights $w_j$ when a stream of input vectors is given to this neuron as inputs, one at a time.
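As a minimal illustration (not part of the original formulation), the weighted sum in equation (1) can be computed in a few lines of Python; the numerical values below are arbitrary.

import numpy as np

# arbitrary example: three inputs and their weights
x = np.array([0.5, -1.2, 0.3])   # inputs x_1, ..., x_n
w = np.array([0.2, 0.4, -0.1])   # weights w_1, ..., w_n

# equation (1): the neuron output is the weighted sum of the inputs
y = np.sum(w * x)
print(y)   # prints approximately -0.41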
From Hebbian learning to the Oja learning rule
Using the mathematical notation above, the Hebbian learning principle could be stated as

$$\Delta w_j = \alpha\, y\, x_j \qquad (2)$$

where $\Delta w_j$ denotes the change in the value of the weight $w_j$, $x_j$ is the input coming through the weight $w_j$, and $y$ is the output of the neuron as given in equation (1) (see the Figure). The coefficient $\alpha$ is called the learning rate and it is typically small. Due to this, one input vector (whose $j$-th component is the term $x_j$) only causes a small instantaneous change in the weights, but when the small changes accumulate over time, the weights will settle to some values.
Equation (2) represents the Hebbian principle, because the term $y x_j$ is the product of the input and the output. However, this learning rule has a severe problem: there is nothing to stop the connections from growing all the time, eventually leading to very large values. There should be another term to balance this growth. In many neuron models, another term representing "forgetting" has been used: the value of the weight $w_j$ itself should be subtracted from the right-hand side. The central idea in the Oja learning rule is to make this forgetting term proportional, not only to the value of the weight, but also to the square of the output of the neuron. The Oja rule reads

$$\Delta w_j = \alpha\,(y\, x_j - y^2 w_j) \qquad (3)$$

Now, the forgetting term $-\alpha\, y^2 w_j$ balances the growth of the weight. The squared output $y^2$ guarantees that the larger the output of the neuron becomes, the stronger is this balancing effect.
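For concreteness, a sketch of single update steps for the plain Hebbian rule (2) and the Oja rule (3) is given below in Python; the function names and the learning rate value are illustrative choices, not part of the original formulation.

import numpy as np

def hebbian_step(w, x, alpha=0.01):
    # equation (2): Delta w_j = alpha * y * x_j; the weights can grow without bound
    y = np.dot(w, x)
    return w + alpha * y * x

def oja_step(w, x, alpha=0.01):
    # equation (3): Delta w_j = alpha * (y * x_j - y**2 * w_j)
    # the forgetting term y**2 * w_j balances the Hebbian growth
    y = np.dot(w, x)
    return w + alpha * (y * x - y**2 * w)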
Oja learning rule and Principal Component Analysis
A mathematical analysis of the Oja learning rule in (3) goes as follows (a much more thorough and rigorous analysis appears in the book (Oja, 1983)). First, change into vector notation, in which $\mathbf{x}$ is the column vector with elements $x_j$ and $\mathbf{w}$ is the column vector with elements $w_j$. They are called the input vector and the weight vector, respectively. In vector-matrix notation, equation (1) then reads

$$y = \mathbf{w}^T \mathbf{x} \qquad (4)$$

where $T$ means the transpose, changing a column vector into a row vector. This is the well-known inner product between two vectors, defined as the sum of products of their elements (see equation (1)).
Next, write equation (3) in vector notation:

$$\Delta \mathbf{w} = \alpha\,(y\,\mathbf{x} - y^2\,\mathbf{w}) \qquad (5)$$

Then, substitute $y$ from equation (4) into equation (5):

$$\Delta \mathbf{w} = \alpha\,\left(\mathbf{x}\mathbf{x}^T \mathbf{w} - (\mathbf{w}^T \mathbf{x}\mathbf{x}^T \mathbf{w})\,\mathbf{w}\right)$$

This is the incremental change for just one input vector $\mathbf{x}$. When the algorithm is run for a long time, changing the input vector at every step, one can look at the average behaviour. An especially interesting question is what the value of the weights is when the average change in the weight is zero. This is the point of convergence of the algorithm.
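The substitution above can be checked numerically; the following sketch (with arbitrary random vectors) verifies that the two ways of writing the single-step update coincide.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)      # one input vector
w = rng.normal(size=4)      # current weight vector
alpha = 0.01

y = w @ x                                  # equation (4)
dw = alpha * (y * x - y**2 * w)            # equation (5)

# equation (5) with y = w^T x substituted from equation (4)
xxT = np.outer(x, x)                       # the matrix x x^T
dw_substituted = alpha * (xxT @ w - (w @ xxT @ w) * w)

print(np.allclose(dw, dw_substituted))     # True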
Averaging the right-hand side over the input vectors $\mathbf{x}$ and setting this average to zero gives the following equation for the weight vector at the point of convergence:

$$\mathbf{C}\mathbf{w} = (\mathbf{w}^T \mathbf{C}\mathbf{w})\,\mathbf{w} \qquad (6)$$

where the matrix $\mathbf{C}$ is the average of $\mathbf{x}\mathbf{x}^T$. Assuming the input vectors have zero means, this is in fact the well-known covariance matrix of the inputs.
Considering that the quadratic form $\mathbf{w}^T \mathbf{C}\mathbf{w}$ is a scalar, this equation clearly is the eigenvalue-eigenvector equation for the covariance matrix $\mathbf{C}$. This analysis shows that if the weights converge in the Oja learning rule, then the weight vector becomes one of the eigenvectors of the input covariance matrix, and the output of the neuron becomes the corresponding principal component. Principal components are defined as the inner products between the eigenvectors and the input vectors. For this reason, the simple neuron learning by the Oja rule becomes a principal component analyzer (PCA). Although not shown here, it has been proven that it is the first principal component that the neuron will find, and that the norm of the weight vector tends to one. For details, see (Oja, 1983; 1992).
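These convergence properties can be illustrated with a small simulation; the data model, learning rate and number of steps below are arbitrary choices made only for this sketch.

import numpy as np

rng = np.random.default_rng(0)

# zero-mean toy data with one dominant direction of variance
n, dim = 10000, 3
A = np.diag([3.0, 1.0, 0.5])
X = rng.normal(size=(n, dim)) @ A.T          # covariance is diag(9, 1, 0.25)

# run the Oja rule, presenting the input vectors one at a time
w = rng.normal(size=dim)
alpha = 0.001
for x in X:
    y = w @ x
    w += alpha * (y * x - y**2 * w)          # equation (5)

# compare with the leading eigenvector of the covariance matrix C
C = (X.T @ X) / n
eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
first_pc_direction = eigvecs[:, -1]

print(np.linalg.norm(w))                     # close to 1
print(abs(w @ first_pc_direction))           # close to 1 (up to sign)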
Extensions of the Oja learning rule
This learning rule has been extended in several directions. Two extensions are briefly reviewed here: the Oja rule for several parallel neurons, and nonlinearities in the rule.
Oja rule for several neurons
It is possible to define this learning rule for a layer of parallel neurons, each receiving the same input vector $\mathbf{x}$. Then, in order to prevent all the neurons from learning the same thing, parallel connections between them are needed. The result is that a subset or all of the principal components are learned. Such neural layers have been considered in (Oja, 1983; 1992; Sanger, 1989).
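As one concrete example of such a layer, a sketch of a single update step of Sanger's generalized Hebbian algorithm (Sanger, 1989) is given below; the matrix form and the parameter values are illustrative and not quoted verbatim from the source.

import numpy as np

def sanger_step(W, x, alpha=0.01):
    # W has one weight row per neuron; all neurons see the same input vector x
    y = W @ x                                # outputs of the parallel neurons
    # the lower-triangular coupling between the neurons prevents them from
    # all converging to the same (first) principal component
    dW = alpha * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W + dW

In this sketch the first neuron behaves exactly like the single Oja neuron above, while each further neuron is pushed towards the next principal component.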
Nonlinear Hebbian learning and Independent component analysis
Independent component analysis (ICA) is a technique that is related to PCA, but is potentially much more powerful: instead of finding uncorrelated components as in PCA, statistically independent components are found. It turns out that quite small changes in the Oja rule can produce independent, instead of principal, components. What is needed is to change the linear output factor $y$ in the Hebbian term to a suitable nonlinearity, such as $y^3$. Also the forgetting term must be changed accordingly. The ensuing learning rule

$$\Delta \mathbf{w} = \alpha\,(y^3\,\mathbf{x} - y^4\,\mathbf{w}) \qquad (7)$$

can be shown to give one of the independent hidden factors under suitable assumptions (Hyvärinen and Oja, 1998). The main requirement is that prior to entering this algorithm, the input vectors have to be zero mean and whitened so that their covariance matrix is equal to the identity matrix. This can be achieved with a simple linear transformation (see also Hyvärinen et al., 2001).
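A small end-to-end sketch of this procedure is given below; the sparse toy sources, the mixing matrix and the learning rate are arbitrary illustrative choices, and the update implements equation (7) as reconstructed above.

import numpy as np

rng = np.random.default_rng(0)

# sparse, super-Gaussian toy sources (illustrative choice): mostly zero,
# occasionally +-sqrt(10), giving unit variance and positive kurtosis
n = 20000
active = rng.random(size=(n, 2)) < 0.1
S = active * rng.choice([-1.0, 1.0], size=(n, 2)) * np.sqrt(10.0)
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])                   # arbitrary mixing matrix
X = S @ A.T                                  # observed mixtures of the sources

# centre and whiten the inputs so that their covariance is the identity
X = X - X.mean(axis=0)
d, E = np.linalg.eigh(np.cov(X.T))
Z = X @ E @ np.diag(1.0 / np.sqrt(d)) @ E.T

# nonlinear Hebbian rule of equation (7), one whitened input vector at a time
w = rng.normal(size=2)
w /= np.linalg.norm(w)
alpha = 0.001
for z in Z:
    y = w @ z
    w += alpha * (y**3 * z - y**4 * w)

# the output w^T z now follows (approximately) one of the independent sources
print(np.corrcoef(Z @ w, S[:, 0])[0, 1], np.corrcoef(Z @ w, S[:, 1])[0, 1])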
References
Oja E. (1982) A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267-273
Oja E. (1983) Subspace Methods of Pattern Recognition. Research Studies Press
Oja E. (1992) Principal components, minor components, and linear neural networks. Neural Networks, 5:927-935
Sanger T. D. (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459-473
Hyvärinen A. and Oja E. (1998) Independent component analysis by general nonlinear Hebbian-like learning rules. Signal Processing, 64:301-313
Hyvärinen A., Karhunen J., and Oja E. (2001) Independent Component Analysis. Wiley