
This journal is © The Royal Society of Chemistry 2002
APPENDIX
The NIPALS algorithm
For factors $k = 1, 2, \ldots, K$ compute $p_k$ and $l_k$ from $R_{k-1}$.
Step 0. Initialise: selection of start values, e.g. $p_k$ = the column in $R_{k-1}$ with the highest sum of squares.
Repeat until convergence {
Step 1. Improve the estimate of the loading vector by projecting $R_{k-1}$ on $p_k$:
$$\hat{l}_k^{\,t} = (p_k^t p_k)^{-1} p_k^t R_{k-1} \qquad (a.1)$$
Step 2. Scale the length of $l_k$ to unity:
$$l_k = \hat{l}_k (\hat{l}_k^{\,t} \hat{l}_k)^{-0.5} \qquad (a.2)$$
Step 3. Improve the estimate of the score vector by projecting $R_{k-1}$ on $l_k$:
$$p_k = R_{k-1} l_k (l_k^t l_k)^{-1} \qquad (a.3)$$
Step 4. Improve the estimate of the eigenvalue $\lambda_k$:
$$\lambda_k = p_k^t p_k \qquad (a.4)$$
Step 5. Check convergence: If the difference between the value of the eigenvalue in this
iteration and the value in the previous iteration is below a specified small value, the
method has converged for factor k.
Then compute the residual matrix
$$R_k = R_{k-1} - p_k l_k^t \qquad (a.5)$$
and go to Step 0 to repeat the process for factor $k+1$.
If the convergence criterion is not fulfilled, return to Step 1. }
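As an illustration, the loop above can be written as a short NumPy sketch in the document's notation ($R$ data matrix, $p_k$ scores, $l_k$ loadings); the function name, the tolerance and the iteration cap are assumptions made for this sketch, not part of the original description.

```python
import numpy as np

def nipals(R, K, tol=1e-10, max_iter=500):
    """Extract K factors from R using the NIPALS iteration sketched above."""
    R_k = R.astype(float).copy()
    scores, loadings, eigenvalues = [], [], []
    for _ in range(K):
        # Step 0: start from the column of R_k with the highest sum of squares
        p = R_k[:, np.argmax((R_k ** 2).sum(axis=0))].copy()
        lam_old = 0.0
        for _ in range(max_iter):
            l = R_k.T @ p / (p @ p)        # Step 1 (a.1): project R_k on p
            l /= np.sqrt(l @ l)            # Step 2 (a.2): scale l to unit length
            p = R_k @ l / (l @ l)          # Step 3 (a.3): project R_k on l
            lam = p @ p                    # Step 4 (a.4): eigenvalue estimate
            if abs(lam - lam_old) < tol:   # Step 5: convergence check
                break
            lam_old = lam
        R_k = R_k - np.outer(p, l)         # residual matrix (a.5)
        scores.append(p); loadings.append(l); eigenvalues.append(lam)
    return np.array(scores).T, np.array(loadings).T, np.array(eigenvalues)
```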
The PLS algorithm for one C variable
The input variables $R$ and $c$ are first mean-centred, giving $R_0$ and $c_0$.
For each factor $k = 1, \ldots, K$ do
Begin
Step 1. Compute the loading weights $l_k$ using least squares and the local model
$$R_{k-1} = c_{k-1} l_k^t + E \qquad (a.6)$$
and scale the vector to length 1:
$$\hat{l}_k = s\, R_{k-1}^t c_{k-1} \qquad (a.7)$$
where $s$ is a scaling factor that makes the length of the vector $\hat{l}_k$ equal to 1.
Step 2. Estimate the scores $p_k$ using the local model
$$R_{k-1} = p_k \hat{l}_k^{\,t} + E \qquad (a.8)$$
Since $\hat{l}_k^{\,t} \hat{l}_k = 1$, the least squares solution is
$$p_k = R_{k-1} \hat{l}_k \qquad (a.9)$$
Step 3. Estimate the spectral loading $w_k$ using the local model
$$R_{k-1} = p_k w_k^t + E \qquad (a.10)$$
which gives the least squares solution
$$w_k^t = p_k^t R_{k-1} / (p_k^t p_k) \qquad (a.11)$$
Step 4. Estimate the chemical loading $u_k$ using the local model
$$c_{k-1} = p_k u_k + f$$
which leads to
$$u_k = c_{k-1}^t p_k / (p_k^t p_k)$$
Step 5. Create new residuals by subtracting the estimated effect of this factor, replace $R_{k-1}$ and $c_{k-1}$ by the new residuals, and increase the value of $k$:
$$E = R_{k-1} - p_k w_k^t = R_k \qquad (a.12)$$
$$f = c_{k-1} - p_k u_k = c_k \qquad (a.13)$$
$$k = k + 1 \qquad (a.14)$$
end
Final step. Once the $K$ factors have been estimated, compute $b_0$ and $b$, to be used for prediction:
$$b = L\, (W^t L)^{-1} Q \qquad (a.15)$$
$$b_0 = \bar{c} - \bar{R}^{\,t} b \qquad (a.16)$$
where $L$ and $W$ are the matrices whose columns are the loading weights $\hat{l}_k$ and the spectral loadings $w_k$, $Q$ is the vector of chemical loadings $u_k$, and $\bar{c}$ and $\bar{R}$ are the mean concentration and mean response vector used for centring.
Therefore, during the prediction phase the concentration estimate $\hat{c}_i$ associated with the mean-centred response vector $R_i$ can be obtained as follows:
$$\hat{c}_i = b_0 + R_i^t b \qquad (a.17)$$
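A compact NumPy sketch of these PLS1 steps is given below, in the document's notation ($l$ loading weights, $p$ scores, $w$ spectral loadings, $u$ chemical loadings). The function name is an assumption, and the sketch folds the mean-centring into $b_0$ so that the prediction line at the end can be applied to a raw response vector.

```python
import numpy as np

def pls1_fit(R, c, K):
    """Fit a one-variable PLS model with K factors; returns b0 and b (a.15, a.16)."""
    R_mean, c_mean = R.mean(axis=0), c.mean()
    Rk, ck = R - R_mean, c - c_mean          # mean-centring
    L, W, Q = [], [], []
    for _ in range(K):
        l_hat = Rk.T @ ck                    # (a.7), up to the scaling factor s
        l_hat /= np.linalg.norm(l_hat)       # s makes the length of l_hat equal to 1
        p = Rk @ l_hat                       # (a.9): scores
        w = Rk.T @ p / (p @ p)               # (a.11): spectral loading
        u = ck @ p / (p @ p)                 # chemical loading
        Rk = Rk - np.outer(p, w)             # (a.12): new residual matrix
        ck = ck - p * u                      # (a.13): new concentration residual
        L.append(l_hat); W.append(w); Q.append(u)
    L, W, Q = np.array(L).T, np.array(W).T, np.array(Q)
    b = L @ np.linalg.inv(W.T @ L) @ Q       # (a.15)
    b0 = c_mean - R_mean @ b                 # (a.16)
    return b0, b

# Prediction (a.17) for a single response vector r_i:  c_hat = b0 + r_i @ b
```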
Backpropagation learning algorithm
Consider a three-layer network (i.e. with input, hidden and output layers). Output neurones have linear activation functions, while hidden neurones have logistic sigmoid activation functions given by:
$$f(a) = \frac{1}{1 + \exp(-a)} \qquad (a.18)$$
Its first derivative, $f'(a)$, can be expressed in the simple form:
$$f'(a) = f(a)\,(1 - f(a)) \qquad (a.19)$$
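This follows directly from (a.18):
$$f'(a) = \frac{\exp(-a)}{\bigl(1 + \exp(-a)\bigr)^2} = \frac{1}{1 + \exp(-a)} \cdot \frac{\exp(-a)}{1 + \exp(-a)} = f(a)\,(1 - f(a))$$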
For a standard sum-of-squares error function, the error for training vector $k$ is given by
$$E_k = \frac{1}{2} \sum_{j=1}^{c} (y_j - t_j)^2 \qquad (a.20)$$
where $y_j$ is the response of output neurone $j$, $t_j$ is the corresponding target for learning vector (i.e. input vector) $k$, and $c$ is the number of neurones in the output layer.
The error for an output neurone $j$ is $\delta_j = y_j - t_j$, while for a neurone in the hidden layer it is
$$\delta_j = z_j (1 - z_j) \sum_{i=1}^{c} w_{ji} \delta_i \qquad (a.21)$$
where $z_j$ is the output of hidden neurone $j$, and $w_{ji}$ is the weight from hidden neurone $j$ to output neurone $i$.
The error derivatives with respect to the weights of the hidden and output layers are given by
$$\frac{\partial E_k}{\partial w_{ji}} = \delta_j r_i, \qquad \frac{\partial E_k}{\partial w_{hj}} = \delta_h z_j \qquad (a.22)$$
If the weights of the hidden layer neurones are updated after the presentation of each learning vector (fixed-step gradient descent technique), then:
$$\Delta w_{ji} = -\eta\, \delta_j r_i \qquad (a.23)$$
where $\eta$ ($0 < \eta < 1$) is a scalar called the learning rate.
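The update can be illustrated with a small NumPy sketch of one presentation for the three-layer network described above (linear outputs, sigmoid hidden units); the array shapes, the function name and the default learning rate are assumptions made for the example.

```python
import numpy as np

def backprop_step(r, t, W_hidden, W_output, eta=0.1):
    """One fixed-step gradient-descent update for a single learning vector (r, t)."""
    # Forward pass
    a_hidden = W_hidden @ r                    # activations of hidden neurones
    z = 1.0 / (1.0 + np.exp(-a_hidden))        # logistic sigmoid (a.18)
    y = W_output @ z                           # linear output neurones

    # Errors (deltas)
    delta_out = y - t                          # output-layer error
    delta_hid = z * (1.0 - z) * (W_output.T @ delta_out)   # hidden-layer error (a.21)

    # Fixed-step gradient-descent updates (a.22, a.23)
    W_output -= eta * np.outer(delta_out, z)
    W_hidden -= eta * np.outer(delta_hid, r)
    return W_hidden, W_output
```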
RBF network training algorithm
In a first phase of training, the available learning vectors should be used to determine the
number of radial basis neurones and their governing parameters in an unsupervised way
[26]. In the second step, supervised training is used to adjust the parameters of the output
layer.
The radial basis function network mapping is:
$$z_j(r) = \sum_{i=1}^{d} w_{ji} f_i(r) + w_{j0} \qquad (a.24)$$
The biases $w_{j0}$ can be absorbed into the summation by including an extra basis function $f_0$ whose activation is set to 1, so that (a.24) can be rewritten in matrix notation:
$$z(r) = W f \qquad (a.25)$$
where $W$ is the weights matrix and the basis functions are considered fixed. Consider a sum-of-squares error function given by
$$E = \frac{1}{2} \sum_{n} \sum_{j} \bigl\{ z_j(r^n) - t_j^n \bigr\}^2 \qquad (a.26)$$
where $t_j^n$ is the target value for output $j$ when the network is presented with learning vector $r^n$.
Since the error function is a quadratic function of the weights, the formal solution for the weights is
$$W^t = f^{*} T \qquad (a.27)$$
where the element in position $(n, j)$ of $T$ is $t_j^n$, the element in position $(n, j)$ of $f$ is $f_j(r^n)$, and the superscripts $t$ and $*$ denote the transpose and pseudo-inverse, respectively.
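The second, supervised phase can be sketched in NumPy as follows: with the basis functions fixed (here assumed Gaussian, with centres and a common width taken from the unsupervised first phase), the output-layer weights follow from the pseudo-inverse as in (a.27). The Gaussian form, the common width and the function names are assumptions made for the sketch.

```python
import numpy as np

def rbf_output_weights(R, T, centres, sigma):
    """Solve the output layer of an RBF network with fixed Gaussian basis functions.

    R:        (N, D) learning vectors r^n
    T:        (N, C) targets t_j^n
    centres:  (d, D) basis-function centres (from the unsupervised first phase)
    sigma:    common width of the Gaussian basis functions (an assumed choice)
    """
    # Design matrix: element (n, j) is f_j(r^n); a constant column absorbs the biases
    dist2 = ((R[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    F = np.exp(-dist2 / (2.0 * sigma ** 2))
    F = np.hstack([F, np.ones((R.shape[0], 1))])   # extra basis function f_0 = 1

    # Formal least-squares solution W^t = f* T  (a.27)
    Wt = np.linalg.pinv(F) @ T
    return Wt.T   # W, including the bias column

def rbf_predict(r, W, centres, sigma):
    """Network mapping (a.24)-(a.25): z(r) = W f(r)."""
    f = np.exp(-((r - centres) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
    return W @ np.append(f, 1.0)
```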
ART1 and Fuzzy ART mathematical models
ART1
Let $I = (I_1, \ldots, I_M)$ be the input vector, where $I_i = 1$ or $0$ (ART1 requires binary inputs), and let $X = (X_1, \ldots, X_M)$ be the vector of F1 nodes, where $X_i = 1$ or $0$.
Let $w_{Ji}$ be the top-down weight from the winning node $J$ in the F2 layer (F2 is a competitive layer) to a node $i$ in the F1 layer, and let $z_{iJ}$ be the corresponding bottom-up weight. Assuming fast learning (i.e. the weight update equations reach their asymptotic values before the next training vector is presented), the weights take the values:
$$w_{Ji} = \begin{cases} 1 & \text{if } i \in X \\ 0 & \text{otherwise} \end{cases} \qquad (a.28)$$
$$z_{iJ} = \begin{cases} \dfrac{L}{L - 1 + |X|} & \text{if } i \in X \\ 0 & \text{otherwise} \end{cases} \qquad (a.29)$$
where $|X|$ is the cardinality of $X$, and $L$ is a network parameter ($L > 1$).
If $W_J$ is the vector of top-down weights from the winning neurone in F2 and $Z_J$ is the vector of bottom-up weights, (a.28) and (a.29) can be rewritten as follows:
$$W_J = X \qquad (a.30)$$
$$Z_J = \frac{L\, W_J}{L - 1 + |W_J|} = \frac{L\, X}{L - 1 + |X|} \qquad (a.31)$$
A given node $j$ in the F2 layer gets the following input from the F1 layer:
$$t_j = \sum_i z_{ij} I_i \qquad (a.32)$$
Using (a.31), equation (a.32) can be rewritten in terms of the top-down weights:
$$t_j = \left( \frac{L}{L - 1 + |W_j|} \right) \left( \sum_i w_{ji} I_i \right) \qquad (a.33)$$
The summation term in (a.33) is the number of common 1s that the vectors $W_j$ and $I$ have in corresponding positions. Then equation (a.33) can be rewritten as:
$$t_j = \frac{L\, |I \cap W_j|}{L - 1 + |W_j|} \qquad (a.34)$$
where $|I \cap W_j|$ is the cardinality of the intersection set of $W_j$ and $I$.
Category choice is made by selecting the neurone in F2 with the maximum value of $t_j$. Thus the Choice Function can be defined as:
$$T_j(I) = \frac{|I \cap W_j|}{L - 1 + |W_j|} \qquad (a.35)$$
A given node $i$ in the F1 layer is active only if both the top-down weight $w_{Ji}$ from the winning F2 node and the input to node $i$ are non-zero:
$$X = I \cap W_J \qquad (a.36)$$
The winning node $J$ in the F2 layer is reset by the orienting subsystem if:
$$\frac{|I \cap W_J|}{|I|} < \rho \qquad (a.37)$$
where $\rho$ is the vigilance parameter. Once a node is reset, it remains inactive for the duration of the trial.
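A minimal NumPy sketch of one ART1 presentation with fast learning follows, combining the choice function (a.35), the vigilance test (a.37) and the template update (a.30, a.36); the function name, the parameter defaults and the handling of uncommitted nodes (rows of ones) are assumptions made for the sketch.

```python
import numpy as np

def art1_present(I, W, L=2.0, rho=0.7):
    """Present a binary input I to an ART1 layer with top-down weight rows W.

    W is an (n_categories, M) binary matrix; rows of ones act as uncommitted nodes,
    which always pass the vigilance test. Returns the chosen category and updated W.
    """
    reset = np.zeros(len(W), dtype=bool)
    while True:
        # Choice function (a.35) for all non-reset F2 nodes
        T = np.array([np.minimum(I, Wj).sum() / (L - 1.0 + Wj.sum()) for Wj in W])
        T[reset] = -np.inf
        if not np.isfinite(T).any():
            raise RuntimeError("all F2 nodes reset; add an uncommitted node")
        J = int(np.argmax(T))

        # Vigilance test (a.37): reset the winner if the match is too poor
        if np.minimum(I, W[J]).sum() / I.sum() < rho:
            reset[J] = True          # node stays inactive for the rest of the trial
            continue

        # Fast learning (a.30, a.36): the template becomes X = I AND W_J
        W[J] = np.minimum(I, W[J])
        return J, W
```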
Fuzzy ART
Fuzzy ART is a generalisation of equations (a.35) to (a.37). The generalisation is achieved by using fuzzy set theory operations rather than binary set theory operations. Input nodes can take values between 0 and 1 (analog patterns). The winning neurone in the F2 layer (i.e. the winning category) is the one for which the choice function attains its maximum value. The new choice function, derived from (a.35), is as follows:
$$T_j(I) = \frac{|I \wedge W_j|}{L - 1 + |W_j|} \qquad (a.38)$$
where $I \wedge W_j$ is the equivalent operation in fuzzy set theory of the intersection of $W_j$ and $I$ in standard set theory. Thus,
$$I \wedge W_j = \bigl( \min(I_1, w_{j1}), \ldots, \min(I_i, w_{ji}), \ldots, \min(I_M, w_{jM}) \bigr) \qquad (a.39)$$
and the expression for the cardinality is as follows:
$$|X| = \sum_{i=1}^{M} X_i \qquad (a.40)$$
The winning category is reset by the vigilance subsystem if:
$$\frac{|I \wedge W_J|}{|I|} < \rho \qquad (a.41)$$
When a node in F2 is first committed (i.e. when novelty has been detected), a fast-commit equation is used:
$$W_J^{(\text{new})} = I \qquad (a.42)$$
Once a node has been committed, a slow-recode equation is used to change the weights towards the spatial position of the actual input vector:
$$W_J^{(\text{new})} = \beta \bigl( I \wedge W_J^{(\text{old})} \bigr) + (1 - \beta)\, W_J^{(\text{old})}, \qquad 0 < \beta \le 1 \qquad (a.43)$$
The slow recoding of committed nodes prevents noisy data from erroneously recoding them. Small values of $\beta$ cause the system to base its results on a long-term average of its experience, while values of $\beta$ near one allow adaptation to a rapidly changing environment.
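A matching NumPy sketch of one Fuzzy ART presentation is given below, combining the choice function (a.38), the vigilance test (a.41) and the fast-commit/slow-recode equations (a.42, a.43); the parameter defaults and the function name are assumptions made for the sketch.

```python
import numpy as np

def fuzzy_art_present(I, W, committed, L=2.0, rho=0.75, beta=0.1):
    """Present an analog input I (components in [0, 1]) to a Fuzzy ART layer.

    W:         (n_categories, M) weight matrix, uncommitted rows initialised to ones
    committed: boolean array marking which F2 nodes have already been committed
    """
    reset = np.zeros(len(W), dtype=bool)
    while True:
        # Choice function (a.38), using the fuzzy AND (a.39) and cardinality (a.40)
        T = np.array([np.minimum(I, Wj).sum() / (L - 1.0 + Wj.sum()) for Wj in W])
        T[reset] = -np.inf
        if not np.isfinite(T).any():
            raise RuntimeError("all F2 nodes reset; add an uncommitted node")
        J = int(np.argmax(T))

        # Vigilance test (a.41)
        if np.minimum(I, W[J]).sum() / I.sum() < rho:
            reset[J] = True
            continue

        if not committed[J]:
            W[J] = I.copy()                                            # fast commit (a.42)
            committed[J] = True
        else:
            W[J] = beta * np.minimum(I, W[J]) + (1.0 - beta) * W[J]    # slow recode (a.43)
        return J, W, committed
```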