System Identification using Augmented Principal Component Analysis

P. Vijaysai(1), R.D. Gudi(1,*) and S. Lakshminarayanan(2)
(1) Dept. of Chemical Engineering, IIT Bombay, Powai, Mumbai 400 076, India
(2) Dept. of Chemical and Environmental Engg., National University of Singapore, Singapore
* Corresponding author: [email protected]

Abstract

The total least squares (TLS) technique has been extensively used for the identification of dynamic systems when both the inputs and outputs are corrupted with noise. However, the major limitation of this technique has been the difficulty in identifying the actual parameters when collinearity in the input data leads to several "small" eigenvalues. This paper proposes a novel technique, namely augmented principal component analysis (APCA), to deal with collinearity problems in the error-in-variables formulation. The APCA formulation can also be used to determine the least squares prediction error when an appropriate operator is chosen. This property has been used for nonlinear structure selection through a forward selection methodology. The efficacy of the new technique is illustrated through representative case studies taken from the literature.

Keywords: Augmented PCA, Collinearity problems, Error-in-variables

1. INTRODUCTION

The generalized least squares (GLS) and prediction error methods (PEM) have been proven to be statistically efficient tools in system identification (Ljung, 1987; Soderstrom and Stoica, 1989). These techniques use nonlinear numerical optimization and, under the assumption of noisy inputs, the minimization can be time consuming and may even converge to local optima. An alternative and efficient approach that inherently assumes noise in all the variables is total least squares (Roorda and Heij, 1995). Recently, last principal component analysis (LPCA) was shown to yield a class of estimators when additional constraints were imposed on TLS (Huang, 2001). Soderstrom and Mahata (2002) compared the asymptotic covariance matrix of TLS estimates with that of the instrumental variable method (IVM).

Though TLS and its derivatives are very powerful techniques, they do not account for collinearity in the causal data (Huang, 2001; Nounou et al., 2002). The model parameters in TLS are obtained by the singular value decomposition of the augmented data block that includes both the causal variables and the output variable. The right singular vector corresponding to the smallest singular value, and hence the last eigenvalue, describes the possible linear relation between the columns under the error-in-variables formulation. However, any collinearity in the causal block may yield two or more eigenvalues of small magnitude, thus creating an ambiguous situation. This could pose difficulties in determining the last singular vector (last principal component). In this paper, we propose a novel scheme to effectively circumvent the problems of collinearity in the causal data block.

This paper is organized as follows. A detailed mathematical formulation of TLS that leads to the simplified analysis of APCA is presented in Section 2. Section 3 discusses some of the interesting mathematical properties of APCA. Section 4 essentially dwells upon determining an operator for the APCA formulation that yields the least squares prediction error; this property is further used for nonlinear structure selection in the APCA framework.
In Section 5, the superiority of APCA over other conventional techniques is shown using illustrative examples taken from the literature.

2. FORMULATION

The mathematical formulation of TLS has already been well documented in the literature. For an effective interpretation of the proposed APCA, the formulation of TLS is briefly revisited here.

CASE 1: TLS

Let $X_m \in \Re^{N \times \bar{n}}$ be a well-conditioned, mean-centered input (causal) block such that $N > \bar{n}$. The subscript $m$ denotes the measured values. Let $r$ be the rank of the measured block, i.e. $\mathrm{rank}(X_m) = r$. Similarly, let $y_m \in \Re^{N}$ be a mean-centered vector of the measured response (output variable). The augmented block is hence given by

  $Z_m = [X_m \;\; y_m]$    (1)

Equation (1) can be expanded as

  $Z_m = [X_a \;\; y_a] + [E_X \;\; E_y]$    (2)

The subscript $a$ denotes the actual values of all the variables. $E_X$ and $E_y$ are the errors associated with the causal (input) and response (output) blocks respectively. The objective is to minimize the covariance of the error block in $Z_m$ subject to the constraint

  $\hat{X}\theta = \hat{y}$    (3)

Here the hat denotes the predicted value of the respective variable. The above constraint can be redefined as

  $[\hat{X} \;\; \hat{y}] \begin{bmatrix} \theta \\ -1 \end{bmatrix} = 0$    (4)

The vector $[\theta^T \; -1]^T$ in Equation (4), after it is normalized, is called the extended parameter vector:

  $\hat{Z} l = 0 \,, \quad l = \gamma \, [\theta^T \; -1]^T$    (5)

where

  $\gamma = (\theta^T \theta + 1)^{-1/2}$    (6)

Equation (6) yields the additional constraint

  $l^T l = 1$    (7)

Hence, the objective function in unconstrained (Lagrangian) form is

  $J = l^T Z_m^T Z_m l - c^2 (l^T l - 1)$    (8)

Differentiating Equation (8) with respect to $l$ and equating to zero gives

  $(Z_m^T Z_m - c^2 I)\, l = 0$    (9)

or

  $Z_m^T Z_m \, l = c^2 l$    (10)

where $l$ is an eigenvector of the covariance of $Z_m$. As $Z_m$ is a full-rank matrix, the covariance has $\bar{n}+1$ nonzero eigenvalues. The bracketed expression in Equation (9) involves the covariance of the prediction error, and $l$ is also an eigenvector of that error covariance. Thus, if the variance due to error is to be minimum, $c^2$ should be the last (smallest) eigenvalue of the covariance of $Z_m$, and accordingly $l$ has to be the last eigenvector (last principal component). The TLS parameter estimates then follow from Equation (5) by rescaling $l$ so that its final entry is $-1$.
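To make Case 1 concrete, the following is a minimal NumPy sketch of the TLS estimate via the last principal component of the augmented block. The function name and the synthetic data are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def tls_estimate(X_m, y_m):
    """TLS parameters via the last principal component of Z_m = [X_m y_m].

    Follows Equations (1)-(10): the eigenvector of Z_m^T Z_m with the
    smallest eigenvalue (equivalently, the last right singular vector
    of Z_m) is rescaled so that its final entry is -1.
    """
    Z_m = np.column_stack([X_m, y_m])        # Eq. (1): augmented block
    _, _, Vt = np.linalg.svd(Z_m, full_matrices=False)
    l = Vt[-1, :]                            # last principal component
    return -l[:-1] / l[-1]                   # theta from l = gamma [theta; -1]

# Illustrative use on synthetic errors-in-variables data
rng = np.random.default_rng(0)
X_a = rng.standard_normal((500, 3))          # well-conditioned causal block
theta_true = np.array([1.0, -0.5, 2.0])
X_m = X_a + 0.05 * rng.standard_normal(X_a.shape)         # noisy inputs
y_m = X_a @ theta_true + 0.05 * rng.standard_normal(500)  # noisy output
X_m = X_m - X_m.mean(axis=0)                 # mean centering
y_m = y_m - y_m.mean()
print(tls_estimate(X_m, y_m))                # close to theta_true
```

When two or more singular values of $Z_m$ are comparably small, the choice of the last row of Vt becomes ambiguous; this is precisely the collinearity problem that motivates Case 2 below.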
CASE 2: APCA

In this case, we seek to overcome the drawbacks of the TLS technique when the causal block is ill-conditioned. This is done by reformulating the problem in terms of components which we term augmented principal components (APC). The formulation is as follows. Let us assume that $X_m$ is a matrix of rank $r$ such that $r < \bar{n}$. This condition can mislead the criterion shown in Equation (10), as the problem can have $\bar{n}+1-r$ possible solutions. We therefore reformulate the problem in two steps:

1. Maximize the variance in $X_m$ such that the information in $X_m$ is compressed into $r$ orthogonal score vectors $T_m \in \Re^{N \times r}$.
2. The last principal component of the augmented matrix

  $\bar{Z}_m = [T_m \;\; y_m]$    (11)

is then the $(r+1)$th vector.

The first step can be posed as the maximization of

  $J_i = p_i^T X_m^T X_m p_i \,, \quad 1 \le i \le r$    (12)

subject to $p_i^T p_i = 1$, where the $p_i$ are orthonormal vectors. In unconstrained form,

  $J_i = p_i^T X_m^T X_m p_i - \lambda_i (p_i^T p_i - 1)$    (13)

  $\frac{\partial J_i}{\partial p_i} = 2 X_m^T X_m p_i - 2 \lambda_i p_i = 0 \,, \quad \text{or} \quad X_m^T X_m p_i = \lambda_i p_i$    (14)

The $p_i$ in Equation (14) are hence the principal components (loadings) of $X_m$. In other words, $P = [p_1 \; p_2 \; \ldots \; p_r] \in \Re^{\bar{n} \times r}$ gives the orthogonal directions along which the variance of $X_m$ is maximum. Thus, by rotating $X_m$ by $P$ we obtain the scores

  $T_m = X_m P$    (15)

which are the $r$-dimensional orthogonal components. The last principal component of $\bar{Z}_m$ therefore captures the linear dependency between the transformed, well-conditioned causal block and the response, and is given by

  $\bar{Z}_m^T \bar{Z}_m \bar{l} = \bar{\lambda}_{r+1} \bar{l}$    (16)

The coefficients relating the scores to the response are

  $\bar{\theta} = -[\bar{l}_1 \; \bar{l}_2 \; \ldots \; \bar{l}_r]^T / \bar{l}_{r+1}$    (17)

However, these coefficients do not exemplify the linear dependency between the original causal block and the response. Correcting them through the earlier rotation yields the coefficients corresponding to the original variables:

  $\theta = P \bar{\theta}$    (18)

Remark 1: If $X_m$ is a full-rank matrix ($r = \bar{n}$), the coefficients derived using APCA are the same as the TLS estimates. See Appendix A for the proof.

Remark 2: If $X_m$ is a rank-deficient matrix ($r < \bar{n}$) and $y_m$ lies in the range of $X_m$ ($E_X = 0$, $E_y = 0$), the APCA estimates are the same as the solution obtained by the pseudo-inverse. See Appendix B for the proof.

3. SOME ADDITIONAL FEATURES OF THE FORMULATION

In this section, we propose an alternative method to estimate the parameter vector $\theta$. The formulation presented below is further used in proving the other properties of the method. As shown in Case 2, $X_m$ can be decomposed as

  $X_m = T_m P^T$    (19)

so that the covariance of the augmented PCs in Equation (11) takes the structure

  $\Sigma_{\bar{Z}} = \bar{Z}_m^T \bar{Z}_m = \begin{bmatrix} \Lambda & T_m^T y_m \\ y_m^T T_m & y_m^T y_m \end{bmatrix}$    (20)

where $\Lambda = T_m^T T_m = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r)$ and the $\lambda_i$ are the eigenvalues of the covariance $\Sigma_X = X_m^T X_m$. The eigenvector decomposition of Equation (20) yields

  $\Sigma_{\bar{Z}} \, \bar{l}_i = \bar{\lambda}_i \bar{l}_i \,, \quad 1 \le i \le r+1$    (21)

It is to be noted that the $(r+1)$th eigenvector is the last principal component $\bar{l}$ of Equation (16). If $\bar{\lambda}_{r+1}$ is the last (smallest) eigenvalue, $\bar{\theta}$ can be obtained by solving

  $(\Lambda - \bar{\lambda}_{r+1} I)\, \bar{\theta} = T_m^T y_m$    (22)

  $\bar{\theta} = (\Lambda - \bar{\lambda}_{r+1} I)^{-1} T_m^T y_m$    (23)

$\theta$ can then be estimated by substituting $\bar{\theta}$ from Equation (23) into Equation (18).

Note: $\bar{\lambda}_{r+1}$ can also be determined by explicitly solving Equation (24):

  $y_m^T T_m (\Lambda - \bar{\lambda}_{r+1} I)^{-1} T_m^T y_m + \bar{\lambda}_{r+1} = y_m^T y_m$    (24)

4. LEAST SQUARES BASED METHODS AND APCA

$\Sigma_{\bar{Z}}$ is a symmetric matrix that contains a diagonal block of size $r$, with the $(r+1)$th row and column being nonzero vectors. Using the properties of determinants, the sum of squared prediction errors (SSPE) of least squares can be shown to equal

  $\mathrm{SSPE} = \det(\Sigma_{\bar{Z}}) \,/\, \det(\Lambda)$    (25)

where

  $\det(\Lambda) = \prod_{i=1}^{r} \lambda_i$    (26)

However, to evaluate $\det(\Sigma_{\bar{Z}})$ it is necessary to diagonalize $\Sigma_{\bar{Z}}$. This is done by selecting an appropriate operator $O \in \Re^{(r+1) \times (r+1)}$ such that

  $O = \begin{bmatrix} I & 0 \\ -\bar{\theta}_{LS}^T & 1 \end{bmatrix}$    (27)

where $I$ is an identity matrix of rank $r$, $0$ is a null vector of size $r$, and $\bar{\theta}_{LS} = \Lambda^{-1} T_m^T y_m$ is the least squares solution in the score space (cheaply obtained, since $\Lambda$ is diagonal). Treating $\Sigma_{\bar{Z}}$ with the operator yields

  $\Lambda_{\bar{Z}} = O \, \Sigma_{\bar{Z}} \, O^T = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r, \sigma_E^2)$    (28)

Since $O$ is unit lower triangular, $\det(O) = 1$, and hence

  $\det(\Sigma_{\bar{Z}}) = \det(\Lambda_{\bar{Z}}) = \sigma_E^2 \prod_{i=1}^{r} \lambda_i$    (29)

Substituting Equations (26) and (29) in Equation (25),

  $\mathrm{SSPE} = \sigma_E^2$    (30)

which is nothing but the last element of the matrix $\Lambda_{\bar{Z}}$.

Remark 3: If $X_m$ is a full-rank matrix, the SSPE obtained through Equation (30) is equal to the SSPE of ordinary least squares.

Remark 4: However, if $X_m$ is a rank-deficient matrix, the SSPE is equal to that of the predictions obtained through PCR, with the number of PCs chosen equal to the rank of the matrix.

Remark 5: It is to be noted that APCA assumes error in all the variables. On the contrary, the SSPE shown above is for the case when all the variables of the causal block are assumed to be error free and only the response is corrupted.
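Collecting Equations (11)-(18) and the SSPE operator of Equations (27)-(30), the following is a minimal NumPy sketch of the APCA estimate and of the SSPE computation. The helper names and the hard rank cut-off r are illustrative assumptions; the paper itself leaves the choice of r to a scree plot (see Section 5).

```python
import numpy as np

def apca_estimate(X_m, y_m, r):
    """APCA coefficients for a possibly ill-conditioned causal block.

    X_m and y_m are assumed mean centered; r is the rank retained for
    the causal block. Implements Equations (11)-(18).
    """
    # Step 1: compress X_m into r orthogonal scores (Eqs. (12)-(15))
    _, _, Vt = np.linalg.svd(X_m, full_matrices=False)
    P = Vt[:r, :].T                       # loadings, n x r
    T_m = X_m @ P                         # scores, Eq. (15)
    # Step 2: last principal component of Z_bar = [T_m y_m] (Eqs. (11), (16))
    Z_bar = np.column_stack([T_m, y_m])
    _, V = np.linalg.eigh(Z_bar.T @ Z_bar)   # eigenvalues in ascending order
    l_bar = V[:, 0]                       # eigenvector of the smallest eigenvalue
    theta_bar = -l_bar[:-1] / l_bar[-1]   # Eq. (17)
    return P @ theta_bar                  # Eq. (18): back-rotation

def apca_sspe(T_m, y_m):
    """Least squares SSPE via the diagonalizing operator, Eqs. (27)-(30).

    With Lambda = T_m^T T_m diagonal, O = [[I, 0], [-theta_ls^T, 1]]
    block-diagonalizes Sigma, and the last element of O Sigma O^T is
    sigma_E^2, i.e. the SSPE.
    """
    Z_bar = np.column_stack([T_m, y_m])
    Sigma = Z_bar.T @ Z_bar               # structure of Eq. (20)
    r = T_m.shape[1]
    theta_ls = np.linalg.solve(Sigma[:r, :r], Sigma[:r, r])  # Lambda^{-1} T^T y
    O = np.eye(r + 1)
    O[r, :r] = -theta_ls                  # Eq. (27)
    return (O @ Sigma @ O.T)[r, r]        # Eq. (30)
```

Note that np.linalg.eigh returns eigenvalues in ascending order, so the first column of V is the last principal component in the sense of Equation (16).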
The above formulation was successfully used for the identification of nonlinear systems using NARX and NARMAX structures. The proposed algorithm is analogous to the QR-based algorithms proposed by Chen et al. (1989) but uses SVD-based methods, which are numerically superior to QR decomposition. The algorithm is presented in Table 1 (a simplified code sketch of the selection loop appears at the end of this section). It requires the re-estimation of the PCs each time a new variable is added; see step 4 in Table 1. The standard algorithms used to determine the PCs are SVD and NIPALS. In order to improve the computational efficiency, however, a plane-rotation algorithm coupled with modified NIPALS has been proposed; the details are reported in Vijaysai and Gudi (2002).

Table 1. Structure selection algorithm for NARX using APCA

1. Let $X_m = [x_{1,m} \; x_{2,m} \; \ldots \; x_{\bar{n},m}]$ be the measured variables. Formulate the augmented matrices $Z_i = [x_{i,m} \; y_m]$ s.t. $1 \le i \le \bar{n}$.
2. Determine the variable $k$ ($1 \le k \le \bar{n}$) that is maximally correlated with $y_m$.
3. Choose $X_{sel} = x_{k,m}$ and form the candidate sets $X_i = [X_{sel} \; x_{i,m}]$ such that $1 \le i \le \bar{n}$, $i \ne k$.
4. Decompose each $X_i$ into its principal components. Augment the PCs such that $\bar{Z}_i = [T_i \; y_m]$. Calculate $\Sigma_{\bar{Z}_i} = \bar{Z}_i^T \bar{Z}_i$.
5. Determine the SSPE as shown in Equations (25) to (30).
6. Determine the variable $i$ that yields the least SSPE.
7. Include that variable in the set of selected variables, $X_{sel} = [X_{sel} \; x_{i,m}]$, and repeat from step 4.
8. Use the AIC to terminate the algorithm.

It is important to note that dependency among the variables of time series models can completely mislead the proposed algorithm. For example, let the structure under investigation be

  $y_t = 3 y_{t-1} - 2.5 u_{t-1} + 0.35 y_{t-1}^2 - 0.4 y_{t-1} u_{t-1}$    (31)

Using a white input signal $u_t$, the output $y_t$ is generated using Equation (31). Even if the order of the nonlinearity is precisely known and no additional noise is added, the structures identified by the QR-based methods and by the proposed algorithm can be entirely different and non-parsimonious. This is mainly because, when the output lag in Equation (31) is further expanded, the higher-order lags receive greater weightage; in other words, the coefficients associated with the higher-order lags become significantly greater than those of the lower-order lags. The algorithm therefore starts by selecting variables that are essentially higher-order lags and can end up identifying a structure that has completely missed the actual (true) variables. Though there are algorithms for estimating parsimonious structures (Billings and Voon, 1986), they are computationally expensive. A more reliable and computationally superior approach for parsimonious structure selection is under investigation.
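The following is a simplified sketch of the selection loop of Table 1, ranking candidate regressors by the SSPE of Equations (25)-(30) via the determinant identity. A fixed term budget stands in for the AIC stopping rule of step 8, and the plane-rotation/modified-NIPALS acceleration of Vijaysai and Gudi (2002) is not reproduced, so this is an assumption-laden illustration rather than the authors' implementation.

```python
import numpy as np

def sspe_det(X_cand, y_m):
    """Least squares SSPE via Eqs. (25)-(26):
    SSPE = det(Z_bar^T Z_bar) / det(T^T T), with Z_bar = [T y_m]."""
    U, s, _ = np.linalg.svd(X_cand, full_matrices=False)
    T = U * s                                 # orthogonal scores of the block
    Z_bar = np.column_stack([T, y_m])
    return np.linalg.det(Z_bar.T @ Z_bar) / np.prod(s ** 2)

def forward_select(X_m, y_m, max_terms=4):
    """Greedy forward selection over the columns of X_m (Table 1, simplified).

    The first pass implicitly performs step 2, since for a single
    centered column minimizing the SSPE is equivalent to maximizing
    the correlation with y_m.
    """
    selected = []
    remaining = list(range(X_m.shape[1]))
    while remaining and len(selected) < max_terms:
        # Steps 4-7: re-estimate the PCs of every candidate set and
        # keep the variable with the least SSPE
        best = min(remaining,
                   key=lambda i: sspe_det(X_m[:, selected + [i]], y_m))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For a model such as Equation (31), running this loop on an expanded lag/monomial dictionary reproduces the non-parsimony discussed above: the higher-order output lags tend to enter first.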
5. CASE STUDY

In this section, the predictive ability of APCA is compared with other standard techniques reported in the literature for the inferential estimation of distillate composition.

Inferential estimation

In this study, we consider the high-purity distillation column of Mejdell and Skogestad (1991). The column has 41 theoretical trays including the condenser and reboiler. The temperatures of the 5th, 7th, 9th, 11th and 13th trays from the top, the feed rate, and the reflux rate were used for the static inferential estimation of distillate composition. The reflux and feed rates were perturbed with PRBS signals and the change in temperature on each tray was recorded. The outputs were assumed to be corrupted with Gaussian measurement noise to the extent of 5% of the true values.

A linear structure with a sufficiently large number of lags was used for the prediction. Figure 1 shows the cross-validation results for 50 samples obtained by TLS and APCA. It can be seen from the figure that the predictions using TLS are not only poor but also incoherent (the predicted mole fraction of the distillate exceeds one). This is mainly due to the inability of TLS to handle the correlation in the tray temperatures.

[Figure 1. Cross-validation results from TLS and APCA]

The improvement in prediction using APCA is mainly due to the effective utilization of the information content in the causal block. The performance of APCA is also compared with other popular techniques used for inferential estimation, namely PCR and PLS. The performance is reported in Table 2 in terms of the explained prediction variance (EPV) for a sample size of 300.

Table 2. Cross-validation results: average EPV for 50 different runs

        APCA    PCR     PLS
  EPV   98.6    96.5    91.2

Initially, the data used for PCR and PLS were randomly sorted and, using the leave-one-out method, the required number of latent variables was determined for cross-validation. Fifty runs using inputs generated with different seeds were conducted. The average EPV estimated over all the runs shows that APCA has better predictive ability than the other two techniques. The number of PCs to be used for APCA can be determined from the scree plot shown in Figure 2; for the above problem, the first 30 PCs were selected, as the contributions of the remaining PCs towards explaining the variance in the causal block are almost insignificant.

[Figure 2. Scree plot (eigenvalue contribution versus number of components) used to determine the number of PCs for APCA]

CONCLUSION

APCA is a promising technique for EIV modeling and can successfully circumvent the ill-conditioning problems that have been the major limitation of TLS-based methods. This paper has shown the superiority of the proposed technique over conventional multivariate statistical (MVS) methods, namely PCR and PLS, under the EIV formulation. However, like TLS, the technique would suffer when the noise levels in the individual measured variables are different.

Appendix A

Proof of Remark 1. Let

  $R = \begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix} \in \Re^{(\bar{n}+1) \times (\bar{n}+1)}$    (A1)

be a new matrix of rank $\bar{n}+1$. Like $P$, $R$ is an orthonormal (unitary) matrix (because the rank of $X_m$ is $r = \bar{n}$, so $P$ is itself square and orthonormal). Hence

  $R^T \Sigma_Z R = \begin{bmatrix} P^T X_m^T X_m P & P^T X_m^T y_m \\ y_m^T X_m P & y_m^T y_m \end{bmatrix} = \Sigma_{\bar{Z}}$    (A2)

is a similarity transformation, which does not affect the eigenvalues of the covariance in (A2):

  $\mathrm{eigenvalues}(\Sigma_{\bar{Z}}) = \mathrm{eigenvalues}(\Sigma_Z) = \{\bar{\lambda}_1, \bar{\lambda}_2, \ldots, \bar{\lambda}_{\bar{n}+1}\}$    (A3)

In particular, the last principal component of $\Sigma_{\bar{Z}}$ is the rotation by $R$ of the last principal component of $\Sigma_Z$, and Equation (18) undoes exactly this rotation. Therefore, if the causal block is a full-rank matrix, the solution obtained through APCA is identical to the TLS solution.

Appendix B

Proof of Remark 2. $X_m$ can be decomposed into $r$ independent components. Therefore, the augmented covariance matrix is given by

  $\Sigma_{\bar{Z}} = \begin{bmatrix} \Lambda & T_m^T y_m \\ y_m^T T_m & y_m^T y_m \end{bmatrix}$    (B1)
Since the $(r+1)$th eigenvalue of $\Sigma_{\bar{Z}}$ is zero ($y_m$ lies in the range of $X_m$, with $E_X = 0$ and $E_y = 0$), substituting $\bar{\lambda}_{r+1} = 0$ in Equation (23) gives

  $\bar{\theta} = \Lambda^{-1} T_m^T y_m$    (B2)

which is the pseudo-inverse solution in the score space; Equation (18) then returns the corresponding least squares solution in the original variables.

REFERENCES

Billings, S.A. and W.S.F. Voon (1986). Stepwise regression/prediction-error estimation for non-linear systems. Int. J. Control, 44(3), 803-822.

Chen, S., S.A. Billings and W. Luo (1989). Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control, 50(5), 1873-1896.

Huang, B. (2001). Process identification based on last principal component analysis. Journal of Process Control, 11, 19-33.

Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, New Jersey.

Mejdell, T. and S. Skogestad (1991). Estimation of distillation compositions from multiple temperature measurements using partial-least-squares regression. Ind. Eng. Chem. Res., 30, 2543-2555.

Niu, S., D.G. Fisher and D. Xiao (1992). An augmented UD identification algorithm. Int. J. Control, 56(1), 193-211.

Nounou, M.N., B.R. Bakshi, P.K. Goel and X. Shen (2002). Process modeling by Bayesian latent variable regression. AIChE Journal, 48(8), 1775-1793.

Roorda, B. and C. Heij (1995). Global total least squares modeling for multivariable time series. IEEE Transactions on Automatic Control, 40(1), 50-63.

Soderstrom, T. and K. Mahata (2002). On instrumental variable and total least squares approaches for identification of noisy systems. Int. J. Control, 75(6), 381-389.

Soderstrom, T. and P. Stoica (1989). System Identification. Prentice-Hall, Hemel Hempstead, UK.

Vijaysai, P. and R.D. Gudi (2002). A new subset selection methodology for the identification of linear/nonlinear systems. In Proceedings of the International Symposium on Advanced Control of Industrial Processes (AdCONIP'02), Kumamoto, Japan, 401-407.