
System Identification using Augmented Principal Component Analysis
P. Vijaysai¹, R. D. Gudi¹* and S. Lakshminarayanan²
¹ Dept. of Chemical Engineering, IIT Bombay, Powai, Mumbai 400 076, India
² Dept. of Chemical and Environmental Engg., National University of Singapore, Singapore
* Corresponding author: [email protected]
Abstract

The total least squares (TLS) technique has been extensively used for the identification of dynamic systems when both the inputs and outputs are corrupted with noise. The major limitation of this technique has been the difficulty in identifying the actual parameters when collinearity in the input data leads to several "small" eigenvalues. This paper proposes a novel technique, namely augmented principal component analysis (APCA), to deal with collinearity problems in the error-in-variables formulation. The APCA formulation can also be used to determine the least squares prediction error when an appropriate operator is chosen. This property has been used for nonlinear structure selection through a forward selection methodology. The efficacy of the new technique is illustrated through representative case studies taken from the literature.

Keywords: Augmented PCA, Collinearity problems, Error-in-variables

1. INTRODUCTION

The generalized least squares (GLS) and prediction error methods (PEM) have been proven to be statistically efficient tools in system identification (Ljung, 1987; Soderstrom and Stoica, 1989). These techniques use nonlinear numerical optimization; under the assumption of noisy inputs, the minimization can be time consuming and may even converge to local optima.

An alternative and efficient approach that inherently assumes noise in all the variables is total least squares (Roorda and Heij, 1995). Recently, last principal component analysis (LPCA) was shown to yield a class of estimators when additional constraints were imposed on TLS (Huang, 2001). Soderstrom and Mahata (2002) compared the asymptotic covariance matrix of the TLS estimates with that of the instrumental variable method (IVM).
Though TLS and its derivatives are very powerful techniques, they do not account for collinearity in the causal data (Huang, 2001; Nounou et al., 2002). The model parameters using TLS are obtained by the singular value decomposition of the augmented data block that includes both the causal variables and the output variable. The right singular vector corresponding to the smallest singular value, and hence the last eigenvalue, describes the possible linear relation between the columns under the error-in-variables formulation. However, any collinearity in the causal block may yield two or more eigenvalues of small magnitude, creating an ambiguous situation. This can pose difficulties in determining the last singular vector (last principal component). In this paper, we propose a novel scheme to effectively circumvent the problems of collinearity in the causal data block.
This paper is organized as follows. A detailed mathematical formulation of TLS that leads to the simplified analysis of APCA is presented in Section 2. Section 3 discusses some of the interesting mathematical properties of APCA. Section 4 dwells upon determining an operator for the APCA formulation that yields the least squares prediction error. This property is further used for nonlinear structure selection in the APCA framework. In Section 5, the superiority of APCA over other conventional techniques is shown using illustrative examples taken from the literature.
2. FORMULATION
The mathematical formulation of TLS is already well documented in the literature. For an effective interpretation of the proposed APCA, the formulation of TLS is briefly revisited here.
CASE 1: TLS

Let $X_m \in \mathbb{R}^{N \times n}$ be a well-conditioned, mean-centered input variable (causal) block such that $N > n$. The subscript $m$ denotes the measured values. Let $r$ be the rank of the measured block, $\mathrm{rank}(X_m) = r$. Similarly, let $y_m \in \mathbb{R}^{N}$ be a mean-centered vector of the measured response (output variable). The augmented block is hence given by

$$Z_m = [X_m \;\; y_m] \quad (1)$$

Equation (1) can be expanded as

$$Z_m = [X_a \;\; y_a] + [E_X \;\; E_y] \quad (2)$$

The subscript $a$ denotes the actual values of all the variables. $E_X$ and $E_y$ are the errors associated with the causal (input) and response (output) blocks respectively. The objective is to optimize the covariance in $Z_m$ subject to the constraint

$$\hat{X}\theta = \hat{y} \quad (3)$$

Here $\hat{\;}$ denotes the predicted value of the respective variable. The above constraint can be redefined as

$$[\hat{X} \;\; \hat{y}]\begin{bmatrix} \theta \\ -1 \end{bmatrix} = 0 \quad (4)$$

$[\theta^T \; -1]^T$ in Equation (4) is called the extended parameter vector and, after it is normalized,

$$l = \frac{[\theta^T \; -1]^T}{\left\|[\theta^T \; -1]^T\right\|} \quad (5)$$

Hence

$$\hat{Z}_m l = 0, \quad \text{where } \hat{Z}_m = [\hat{X} \;\; \hat{y}] \quad (6)$$

The normalization yields an additional constraint such that

$$l^T l = 1 \quad (7)$$

Hence, the objective function in the unconstrained form is

$$J = l^T (Z_m - \hat{Z}_m)^T (Z_m - \hat{Z}_m)\, l - \sigma^2 (l^T l - 1) \quad (8)$$

Differentiating Equation (8) with respect to $l$,

$$\left[(Z_m - \hat{Z}_m)^T (Z_m - \hat{Z}_m) - \sigma^2 I\right] l = 0 \quad (9)$$

or,

$$\left[(Z_m - \hat{Z}_m)^T (Z_m - \hat{Z}_m)\right] l = \sigma^2 l \quad (10)$$

Substituting Equation (6) in (10),

$$Z_m^T Z_m\, l = \sigma^2 l \quad (11)$$

where $l$ is an eigenvector of the covariance of $Z_m$. As $Z_m$ is a full rank matrix, the covariance will have $n+1$ nonzero eigenvalues. The bracketed expression in Equation (10) is the covariance of the prediction error, and $l$ is also an eigenvector of the error covariance. Thus, if the variance due to the error is to be minimum, $\sigma^2$ should be the last (smallest) eigenvalue of the covariance of $Z_m$ and, accordingly, $l$ has to be the last eigenvector (last principal component).

CASE 2: APCA

In this case, we seek to overcome the drawbacks of the TLS technique when the causal block is ill-conditioned. This is done by reformulating the problem using components, which we term augmented principal components (APC). The formulation is as follows.

Let us assume that $X_m$ is a matrix of rank $r$ such that $r < n$. This condition can mislead the criterion shown in Equation (11), as the problem can have $n + 1 - r$ possible solutions. We therefore reformulate the problem in two steps.

1. Maximize the variance in $X_m$ such that the information in $X_m$ is compressed into $r$ orthogonal vectors $T_m \in \mathbb{R}^{N \times r}$.
2. The last principal component of the augmented matrix $Z_m = [T_m \;\; y_m]$ is hence the $(r+1)$th vector.

The first step can be posed as the maximization of

$$J_i = \gamma_i^T X_m^T X_m \gamma_i - \lambda_i (\gamma_i^T \gamma_i - 1); \quad 1 \le i \le r \quad (12)$$

where the $\gamma_i$ are orthonormal vectors. Hence

$$\frac{\partial J_i}{\partial \gamma_i} = 2 X_m^T X_m \gamma_i - 2\lambda_i \gamma_i = 0 \quad (13)$$

or,

$$X_m^T X_m \gamma_i = \lambda_i \gamma_i \quad (14)$$

$\gamma_i$ in Equation (14) is hence the $i$th principal component (loading) of $X_m$. In other words, $\Gamma_m = [\gamma_1 \; \gamma_2 \; \ldots \; \gamma_r] \in \mathbb{R}^{n \times r}$ gives the orthogonal directions along which the variance of $X_m$ is maximum. Thus, by rotating $X_m$ by $\Gamma_m$ we obtain the scores $T_m = X_m \Gamma_m$, which are the $r$-dimensional orthogonal components. Therefore, $Z_m$ can now be redefined as

$$Z_m = [T_m \;\; y_m] \quad (15)$$

Thus the last principal component of $Z_m$ captures the linear dependency between the transformed, well-conditioned causal block and the response. The last principal component of the transformed block is therefore given by

$$Z_m^T Z_m\, l = \bar{\lambda}_{r+1}\, l \quad (16)$$

The coefficients of Equation (15) are

$$\theta_T = -\left[l_{1,r+1} \;\; l_{2,r+1} \;\; \ldots \;\; l_{r,r+1}\right]^T / \, l_{r+1,r+1} \quad (17)$$

However, the above coefficients do not exemplify the linear dependency between the original causal and response blocks. Correcting these values through the earlier rotation yields the coefficients corresponding to the original variables, as shown in Equation (18):

$$\hat{\theta} = \Gamma_m \theta_T \quad (18)$$
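To make Equations (12)-(18) concrete, the sketch below implements the APCA estimate in NumPy. This is our minimal reading of the procedure, not the authors' code; the function name apca_estimate is ours, and the rank r is assumed known in advance (in practice it would come from a scree-plot, as in Section 5).

```python
import numpy as np

def apca_estimate(Xm, ym, r):
    """APCA parameters for a (possibly collinear) mean-centered block Xm."""
    # Loadings of Xm (Equation (14)) via SVD of the mean-centered block.
    _, _, Vt = np.linalg.svd(Xm, full_matrices=False)
    Gamma = Vt[:r].T                     # n x r loading matrix Gamma_m
    Tm = Xm @ Gamma                      # N x r scores T_m
    # Last principal component of the augmented block (Equations (15)-(16)).
    Zm = np.hstack([Tm, ym.reshape(-1, 1)])
    _, vecs = np.linalg.eigh(Zm.T @ Zm)  # eigh sorts eigenvalues ascending
    l = vecs[:, 0]                       # last PC = smallest eigenvalue
    theta_T = -l[:r] / l[r]              # Equation (17)
    return Gamma @ theta_T               # Equation (18)

# Example: four measured inputs, only three independent directions.
rng = np.random.default_rng(0)
X3 = rng.standard_normal((200, 3))
Xm = np.hstack([X3, X3[:, :1] + X3[:, 1:2]])    # collinear 4th column
ym = X3 @ np.array([1.0, -2.0, 0.5])
Xm, ym = Xm - Xm.mean(0), ym - ym.mean()
theta = apca_estimate(Xm, ym, r=3)              # well-posed despite rank 3
```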
3. SOME ADDITIONAL FEATURES OF THE FORMULATION

In this section, we propose an alternate method to estimate the parameter vector $\theta_T$. The formulation presented below is further used in proving the other properties of the method.

As shown in Case 2, $X_m$ can be decomposed as

$$X_m = T_m \Gamma_m^T \quad (19)$$

We know that

$$\Sigma_X = X_m^T X_m \quad (20)$$

Therefore, the covariance of the augmented PCs, as shown in Equation (15), reduces to

$$\Sigma_Z = Z_m^T Z_m = \begin{bmatrix} \Lambda & T_m^T y_m \\ y_m^T T_m & y_m^T y_m \end{bmatrix}, \quad \Lambda = \Gamma_m^T \Sigma_X \Gamma_m = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r) \quad (21)$$

where the $\lambda_i$ are the eigenvalues of the covariance of $X_m$. The eigenvector decomposition of Equation (21) yields

$$\Sigma_Z\, p_i = \bar{\lambda}_i\, p_i; \quad 1 \le i \le r+1 \quad (22)$$

It is to be noticed that $p_{r+1} = l$. If $\bar{\lambda}_{r+1}$ is the last eigenvalue, $\theta_T$ can be obtained by solving $(\Lambda - \bar{\lambda}_{r+1} I_r)\, \theta_T = T_m^T y_m$, or

$$\theta_T = (\Lambda - \bar{\lambda}_{r+1} I_r)^{-1} T_m^T y_m \quad (23)$$

Note: It can be shown that $\bar{\lambda}_{r+1}$ can also be determined by explicitly solving Equation (24); $\theta_T$ can then be estimated by substituting $\bar{\lambda}_{r+1}$ in Equation (23).

$$y_m^T T_m (\Lambda - \bar{\lambda}_{r+1} I_r)^{-1} T_m^T y_m + \bar{\lambda}_{r+1} = y_m^T y_m \quad (24)$$

Remark 1. If $X_m$ is a full rank matrix ($r = n$), the coefficients derived using APCA are the same as the TLS estimates. See Appendix A for the proof.

Remark 2. If $X_m$ is a rank-deficient matrix ($r < n$), the APCA estimates are the same as the solution obtained by the pseudo-inverse if $y_m$ lies in the range of $X_m$ ($E_X = 0$, $E_y = 0$). See Appendix B for the proof.
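The closed form of Equation (23) and the secular characterization of Equation (24) are easy to verify numerically. The snippet below is a sanity check on synthetic, invented data (full-rank case, r = n); the variable names Lam, lam, and theta_T are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
Xm = rng.standard_normal((200, 4)); Xm -= Xm.mean(0)
ym = Xm @ np.array([0.8, -1.2, 0.3, 2.0]) + 0.1 * rng.standard_normal(200)
ym -= ym.mean()
r = 4

_, s, Vt = np.linalg.svd(Xm, full_matrices=False)
Gamma = Vt.T
Tm = Xm @ Gamma
Lam = np.diag(s**2)                       # Lambda in Equation (21)

Zm = np.hstack([Tm, ym[:, None]])
lam = np.linalg.eigvalsh(Zm.T @ Zm)[0]    # smallest eigenvalue of Sigma_Z
theta_T = np.linalg.solve(Lam - lam * np.eye(r), Tm.T @ ym)   # Equation (23)
theta = Gamma @ theta_T                   # back to original variables, Eq. (18)

# Equation (24): the same eigenvalue solves the scalar secular equation.
lhs = ym @ Tm @ np.linalg.solve(Lam - lam * np.eye(r), Tm.T @ ym) + lam
assert np.isclose(lhs, ym @ ym)

# Cross-check against the last-principal-component route of Equation (17).
l = np.linalg.eigh(Zm.T @ Zm)[1][:, 0]
assert np.allclose(theta_T, -l[:r] / l[r])
```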
4. LEAST SQUARES BASED METHODS AND APCA

$\Sigma_Z$ is a symmetric matrix which holds a diagonal matrix of size $r$, with the $(r+1)$th row and column being nonzero vectors. Using the laws of matrices, the sum of squared prediction errors (SSPE) of least squares can be proved to be equal to

$$\mathrm{SSPE} = \frac{\det(\Sigma_Z)}{\det(\Lambda)} \quad (25)$$

We know that

$$\det(\Lambda) = \prod_{i=1}^{r} \lambda_i \quad (26)$$

However, if $\det(\Sigma_Z)$ needs to be evaluated, it is necessary to diagonalize $\Sigma_Z$. This is done by selecting an appropriate operator $\Omega \in \mathbb{R}^{(r+1) \times (r+1)}$ such that

$$\Omega = \begin{bmatrix} I_r & 0_r \\ -\left(\Lambda^{-1} T_m^T y_m\right)^T & 1 \end{bmatrix} \quad (27)$$

where $I_r$ is an identity matrix of rank $r$ and $0_r$ is a null vector of size $r$. Treating $\Sigma_Z$ with the operator yields

$$\Lambda_Z = \Omega\, \Sigma_Z\, \Omega^T = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r, \sigma_E) \quad (28)$$

Since $\Lambda_Z$ is obtained by diagonalizing $\Sigma_Z$ through a unimodular transformation ($\det(\Omega) = 1$),

$$\det(\Sigma_Z) = \det(\Lambda_Z) = \sigma_E \prod_{i=1}^{r} \lambda_i \quad (29)$$

Substituting Equation (26) and Equation (29) in Equation (25),

$$\mathrm{SSPE} = \sigma_E \quad (30)$$

which is nothing but the last element of the matrix $\Lambda_Z$.

Remark 3: If $X_m$ is a full rank matrix, then the SSPE obtained through Equation (30) is equal to the SSPE of ordinary least squares.

Remark 4: However, if $X_m$ is a rank-deficient matrix, the SSPE is equal to that of the predictions obtained through PCR, where the number of PCs chosen is equal to the rank of the matrix.

Remark 5: It is to be noted that APCA assumes error in all the variables. On the contrary, the SSPE shown above is for the case when all the variables of the causal block are assumed to be error free and only the response is corrupted.
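A numerical check of Equations (25)-(30) and Remark 3, on invented full-rank data: the operator Ω of Equation (27) block-diagonalizes Σ_Z, its last diagonal entry σ_E equals the ordinary least squares SSPE, and the determinant identity of Equation (29) holds. A sketch under our reading, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(2)
Xm = rng.standard_normal((200, 4)); Xm -= Xm.mean(0)
ym = Xm @ np.array([1.0, 0.5, -0.7, 0.2]) + 0.2 * rng.standard_normal(200)
ym -= ym.mean()
r = 4

_, s, Vt = np.linalg.svd(Xm, full_matrices=False)
Tm = Xm @ Vt.T                            # scores; Tm.T @ Tm = Lambda
Lam = np.diag(s**2)
Zm = np.hstack([Tm, ym[:, None]])
Sigma_Z = Zm.T @ Zm

# Operator of Equation (27): unit lower triangular, so det(Omega) = 1.
Omega = np.eye(r + 1)
Omega[r, :r] = -np.linalg.solve(Lam, Tm.T @ ym)
Lam_Z = Omega @ Sigma_Z @ Omega.T         # Equation (28)
sigma_E = Lam_Z[r, r]                     # Equation (30): the SSPE

theta_ls, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
sspe_ls = np.sum((ym - Xm @ theta_ls) ** 2)
assert np.isclose(sigma_E, sspe_ls)                                  # Remark 3
assert np.isclose(np.linalg.det(Sigma_Z), sigma_E * np.prod(s**2))   # Eq. (29)
```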
The above formulation was successfully used for the identification of nonlinear systems using NARX and NARMAX structures. The proposed algorithm is identical in structure to the QR-based algorithms proposed by Chen et al. (1989) but uses SVD-based methods, which are numerically superior to QR decomposition. The algorithm is presented in Table 1.

This algorithm requires the re-estimation of the PCs each time a new variable is added (see step 4 in Table 1). The standard algorithm used to determine the PCs is either SVD or NIPALS. In order to improve the computational efficiency, a plane-rotation algorithm coupled with a modified NIPALS has been proposed; the details of the algorithm are reported in Vijaysai and Gudi (2002).
Table 1. Structure selection algorithm for NARX using APCA

1. Let $X_m = [x_{1,m} \; x_{2,m} \; \ldots \; x_{n,m}]$ be the measured variables.
2. Formulate the augmented matrices $Z_i = [x_{i,m} \;\; y_m]$ s.t. $1 \le i \le n$. Determine the variable $(k)$ that is maximally correlated with $y_m$ $(1 \le k \le n)$.
3. Choose $X_{s,m} = x_{k,m}$ and let $X_{i,m} = [X_{s,m} \;\; x_{i,m}]$; $1 \le i \le n$, $i \ne k$.
4. Decompose $X_{i,m}$ into its principal components.
5. Augment the PCs such that $Z_{i,m} = [T_{i,m} \;\; y_m]$. Calculate $\Sigma_{Z,i} = Z_{i,m}^T Z_{i,m}$.
6. Determine the SSPE as shown in Equations (26) to (30).
7. Determine the variable $i$ that yields the least SSPE.
8. Include that variable in the set of selected variables, $X_{s,m} = [X_{s,m} \;\; x_{i,m}]$, and repeat from step 4.
9. Use the AIC to terminate the algorithm.

It is important to note that dependency in the variables of time series models can completely mislead the proposed algorithm. For example, let the structure under investigation be

$$y_t = 3y_{t-1} - 2.5u_{t-1} + 0.35y_{t-1}^2 - 0.4y_{t-1}u_{t-1} \quad (31)$$

Using the white input signal $u_t$, the output $y_t$ is generated using Equation (31). Even if the order of the nonlinearity is precisely known and no additional noise is added, the structure identified using the QR-based methods and the proposed algorithm could be entirely different and non-parsimonious. This is mainly because, when the output lag in Equation (31) is further expanded, the lags of higher order receive greater weightage. In other words, the coefficients associated with the higher order lags will be significantly greater than the coefficients of the lower order lags. Therefore, the algorithm starts with selecting variables that are essentially higher order lags and could end up identifying a structure that has completely missed the actual/true variables.

Though there are algorithms for estimating parsimonious structures (Billings and Voon, 1986), they are computationally expensive. A more reliable and computationally superior approach for parsimonious structure selection is under investigation.
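In the spirit of Table 1, a compact forward-selection loop can be written around the SSPE of Equation (30). The sketch below is a simplified reading: it scores every candidate by SSPE from the first pass onwards (rather than seeding with the maximally correlated variable, steps 2-3) and terminates on the AIC (step 9). The function names and the AIC form N ln(SSPE/N) + 2k are our choices.

```python
import numpy as np

def sspe(Xs, ym):
    """SSPE of Equations (25)-(30) for a candidate causal block Xs."""
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    r = int(np.sum(s > s[0] * 1e-10))         # numerical rank
    Tm = Xs @ Vt[:r].T                        # scores; Tm.T @ Tm = diag(s[:r]**2)
    yT = Tm.T @ ym
    return float(ym @ ym - yT @ (yT / s[:r] ** 2))

def forward_select(Xm, ym):
    """Greedy structure selection in the spirit of Table 1."""
    N, n = Xm.shape
    selected, best_aic = [], np.inf
    while len(selected) < n:
        scores = {i: sspe(Xm[:, selected + [i]], ym)
                  for i in range(n) if i not in selected}
        i_best = min(scores, key=scores.get)  # step 7: least SSPE
        aic = N * np.log(scores[i_best] / N) + 2 * (len(selected) + 1)
        if aic >= best_aic:                   # step 9: AIC termination
            break
        best_aic = aic
        selected.append(i_best)               # step 8
    return selected
```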
5. CASE STUDY

In this section, the predictive ability of APCA is compared with the other standard techniques reported in the literature for the inferential estimation of distillate composition.

Inferential estimation

In this study, we consider the high purity distillation column of Mejdell and Skogestad (1991). The column has 41 theoretical trays, including the condenser and re-boiler. The temperatures of the 5th, 7th, 9th, 11th and 13th trays from the top, the feed rate, and the reflux rate were used for the static inferential estimation of the distillate composition. The reflux and the feed rate were perturbed with PRBS signals and the change in temperature on each tray was recorded. The outputs were assumed to be corrupted with Gaussian measurement noise to the extent of 5% of the true values.

A linear structure with a sufficiently large number of lags was used for the prediction. Figure 1 shows the cross-validation results for 50 samples obtained by TLS and APCA. It can be seen in the figure that the predictions using TLS are not only poor but also incoherent (the predicted mole fraction of the distillate is more than one). This is mainly due to the inability of TLS to handle the correlation in the tray temperatures.
The improvement in prediction using APCA is mainly due to the effective utilization of the information content in the causal block. The performance of APCA is also compared with other popular techniques used for inferential estimation, namely PCR and PLS.
The performance is reported in Table 2 in terms of the explained prediction variance (EPV) for a sample size of 300.

Figure 1. Cross-validation results from TLS and APCA.

Table 2. Cross-validation results: average EPV for 50 different runs

            APCA    PCR     PLS
EPV (%)     98.6    96.5    91.2

Initially, the data used for PCR and PLS were randomly sorted and, using the leave-one-out method, the required numbers of latent variables were determined for the cross-validation. 50 runs using inputs generated with different seeds were conducted. The average EPV estimated from all the runs shows that APCA has a better predictive ability than the other two techniques. The number of PCs to be used for APCA can be determined from the scree-plot, as shown in Figure 2. For the above problem, the first 30 PCs were selected, as the contributions of the remaining PCs towards explaining the variance in the causal block are almost insignificant.

Figure 2. Scree-plot to determine the number of PCs to be used for APCA.
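For readers who want a self-contained illustration of the Table 2 comparison, the snippet below sets up a synthetic stand-in for the case study (strongly collinear "tray temperature" inputs, noise on both sides) rather than the Skogestad column model, and reports the EPV of TLS and APCA. The data, dimensions, and noise levels are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
base = rng.standard_normal((300, 2))
X_true = base @ rng.standard_normal((2, 7))     # 7 collinear inputs, rank 2
y_true = X_true @ rng.standard_normal(7)
Xm = X_true + 0.05 * rng.standard_normal(X_true.shape)   # EIV: noisy inputs
ym = y_true + 0.05 * rng.standard_normal(300)            # and noisy output

def last_pc_coeffs(Z):
    l = np.linalg.eigh(Z.T @ Z)[1][:, 0]        # last principal component
    return -l[:-1] / l[-1]

def tls(Xm, ym):                                # Equation (11)
    return last_pc_coeffs(np.hstack([Xm, ym[:, None]]))

def apca(Xm, ym, r):                            # Equations (12)-(18)
    Vt = np.linalg.svd(Xm, full_matrices=False)[2]
    theta_T = last_pc_coeffs(np.hstack([Xm @ Vt[:r].T, ym[:, None]]))
    return Vt[:r].T @ theta_T

def epv(y, yhat):                               # explained prediction variance, %
    return 100.0 * (1.0 - np.var(y - yhat) / np.var(y))

for name, th in [("TLS", tls(Xm, ym)), ("APCA", apca(Xm, ym, r=2))]:
    print(f"{name:5s} EPV: {epv(y_true, X_true @ th):6.1f} %")
```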
CONCLUSION

APCA is a promising technique for EIV modeling and can successfully circumvent the problems of ill-conditioning that have been the major limitation of TLS-based methods. This paper shows the superiority of the proposed technique over conventional multivariate statistical (MVS) methods, namely PCR and PLS, under the EIV formulation. However, like TLS, the technique would suffer when the noise level in each of the measured variables is different.

Appendix A

Proof of Remark 1

Let $R$ be a new matrix of rank $r+1$ such that

$$R = \begin{bmatrix} \Gamma_m & 0_n \\ 0_n^T & 1 \end{bmatrix} \quad (A1)$$

Like $\Gamma_m$, $R$ is also an orthonormal and unitary matrix, of size $n+1$ (because the rank of $X_m$ is $r = n$). Noting that $[T_m \;\; y_m] = [X_m \;\; y_m] R = Z_m R$, the covariance of the APCA block is

$$R^T \Sigma_Z R = (Z_m R)^T (Z_m R) = [T_m \;\; y_m]^T [T_m \;\; y_m] \quad (A2)$$

$R^T \Sigma_Z R$ is therefore a similarity transformation, which does not affect the eigenvalues of $\Sigma_Z$, which means that

$$\mathrm{eigenvalues}(R^T \Sigma_Z R) = \mathrm{eigenvalues}(\Sigma_Z) = \{\bar{\lambda}_1, \bar{\lambda}_2, \ldots, \bar{\lambda}_{n+1}\} \quad (A3)$$

Simplifying (A3) further, the last principal component of the APCA block is $R^T l$, where $l$ is the last principal component of $Z_m = [X_m \;\; y_m]$, so the coefficients of Equation (17) become $\theta_T = \Gamma_m^T \theta_{TLS}$. Rotating back through Equation (18), $\hat{\theta} = \Gamma_m \Gamma_m^T \theta_{TLS} = \theta_{TLS}$, since $\Gamma_m$ is square and orthogonal. Therefore, if the causal block is a full rank matrix, the solution obtained through APCA is identical to the TLS solution.

Appendix B

Proof of Remark 2

$X_m$ can be decomposed into $r$ independent components. Therefore, the augmented covariance matrix is given by

$$\Sigma_Z = \begin{bmatrix} \Lambda & T_m^T y_m \\ y_m^T T_m & y_m^T y_m \end{bmatrix} \quad (B1)$$

Since $y_m$ lies in the range of $X_m$, the block $Z_m = [T_m \;\; y_m]$ has rank $r$ and the $(r+1)$th eigenvalue of $\Sigma_Z$ is zero. Substituting $\bar{\lambda}_{r+1} = 0$ in Equation (23),

$$\theta_T = \Lambda^{-1} T_m^T y_m \quad (B2)$$

or

$$\hat{\theta} = \Gamma_m \Lambda^{-1} T_m^T y_m = X_m^{+} y_m \quad (B3)$$

which is the pseudo-inverse solution.
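As a companion to the appendix proofs, Remarks 1 and 2 can also be confirmed numerically. The sketch below (invented data; helper names ours) checks that APCA reproduces TLS on a full-rank block and the pseudo-inverse solution on a rank-deficient, error-free block with y_m in the range of X_m.

```python
import numpy as np

def last_pc_coeffs(Z):
    l = np.linalg.eigh(Z.T @ Z)[1][:, 0]          # last principal component
    return -l[:-1] / l[-1]

def apca(Xm, ym, r):
    Vt = np.linalg.svd(Xm, full_matrices=False)[2]
    return Vt[:r].T @ last_pc_coeffs(np.hstack([Xm @ Vt[:r].T, ym[:, None]]))

rng = np.random.default_rng(4)

# Remark 1: full-rank causal block -> APCA coincides with TLS.
Xm = rng.standard_normal((100, 3)); Xm -= Xm.mean(0)
ym = Xm @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)
ym -= ym.mean()
theta_tls = last_pc_coeffs(np.hstack([Xm, ym[:, None]]))
assert np.allclose(apca(Xm, ym, r=3), theta_tls)

# Remark 2: rank-deficient, error-free block with ym in range(Xm)
# -> APCA coincides with the pseudo-inverse solution.
Xm = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 4))
ym = Xm @ np.array([0.3, -0.2, 1.1, 0.7])
assert np.allclose(apca(Xm, ym, r=2), np.linalg.pinv(Xm) @ ym)
```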
REFERENCES

Billings, S.A. and W.S.F. Voon (1986). Stepwise regression/prediction-error estimation for non-linear systems. Int. J. Control, 44(3), 803-822.

Chen, S., S.A. Billings and W. Luo (1989). Orthogonal Least Squares Methods and Their Application to Non-linear System Identification. Int. J. Control, 50(5), 1873-1896.

Huang, B. (2001). Process Identification Based on Last Principal Component Analysis. Journal of Process Control, 11, 19-33.

Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, New Jersey.

Mejdell, T. and S. Skogestad (1991). Estimation of Distillation Composition from Multiple Temperature Measurements using Partial-Least-Squares Regression. Ind. Eng. Chem. Res., 30, 2543-2555.

Niu, S., D.G. Fisher and D. Xiao (1992). An Augmented UD Identification Algorithm. Int. J. Control, 56(1), 193-211.

Nounou, M.N., B.R. Bakshi, P.K. Goel and X. Shen (2002). Process Modeling by Bayesian Latent Variable Regression. AIChE Journal, 48(8), 1775-1793.

Roorda, B. and C. Heij (1995). Global total least squares modeling for multivariable time series. IEEE Transactions on Automatic Control, 40(1), 50-63.

Soderstrom, T. and K. Mahata (2002). On Instrumental Variable and Total Least Squares Approaches for Identification of Noisy Systems. Int. J. Control, 75(6), 381-389.

Soderstrom, T. and P. Stoica (1989). System Identification. Prentice-Hall, Hemel Hempstead, UK.

Vijaysai, P. and R.D. Gudi (2002). A New Subset Selection Methodology for the Identification of Linear/Non-linear Systems. In Proceedings of the International Symposium on Advanced Control of Industrial Processes (AdCONIP'02), Kumamoto, Japan, 401-407.