VI. Principal Components Analysis

A. The Basic Principle

We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. The objectives of principal components analysis are

- data reduction
- interpretation

The results of principal components analysis are often used as inputs to

- regression analysis
- cluster analysis

B. Population Principal Components

Suppose we have a population measured on p random variables $X_1, \ldots, X_p$. These random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) lying in the directions of greatest variability; this is accomplished by rotating the original axes.

Consider the random vector $\mathbf{X} = (X_1, X_2, \ldots, X_p)'$ with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. We can construct the p linear combinations

$$\begin{aligned}
Y_1 &= \mathbf{a}_1'\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\
Y_2 &= \mathbf{a}_2'\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p \\
&\ \vdots \\
Y_p &= \mathbf{a}_p'\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p
\end{aligned}$$

It is easy to show that

$$\mathrm{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \quad i = 1, \ldots, p$$
$$\mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k, \quad i, k = 1, \ldots, p$$

The principal components are those uncorrelated linear combinations $Y_1, \ldots, Y_p$ whose variances are as large as possible. Thus the first principal component is the linear combination of maximum variance, i.e., the solution of the nonlinear optimization problem (the quadratic objective is the source of the nonlinearity)

$$\max_{\mathbf{a}_1}\ \mathbf{a}_1'\Sigma\mathbf{a}_1 \quad \text{s.t.} \quad \mathbf{a}_1'\mathbf{a}_1 = 1 \qquad \text{(restricts attention to coefficient vectors of unit length)}$$

The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component, i.e., the solution of the nonlinear optimization problem

$$\max_{\mathbf{a}_2}\ \mathbf{a}_2'\Sigma\mathbf{a}_2 \quad \text{s.t.} \quad \mathbf{a}_2'\mathbf{a}_2 = 1,\ \ \mathbf{a}_1'\Sigma\mathbf{a}_2 = 0 \qquad \text{(restricts the covariance to zero)}$$

The third principal component is the solution of the nonlinear optimization problem

$$\max_{\mathbf{a}_3}\ \mathbf{a}_3'\Sigma\mathbf{a}_3 \quad \text{s.t.} \quad \mathbf{a}_3'\mathbf{a}_3 = 1,\ \ \mathbf{a}_1'\Sigma\mathbf{a}_3 = 0,\ \ \mathbf{a}_2'\Sigma\mathbf{a}_3 = 0 \qquad \text{(restricts the covariances to zero)}$$

Generally, the ith principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components, i.e., the solution of the nonlinear optimization problem

$$\max_{\mathbf{a}_i}\ \mathbf{a}_i'\Sigma\mathbf{a}_i \quad \text{s.t.} \quad \mathbf{a}_i'\mathbf{a}_i = 1,\ \ \mathbf{a}_k'\Sigma\mathbf{a}_i = 0 \ \text{ for all } k < i$$

We can show that, for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, the ith principal component is given by

$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, \ldots, p$$

where $\mathbf{e}_i$ is the eigenvector associated with $\lambda_i$. Note that the principal components are not unique if some eigenvalues are equal.

We can also show that, for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$ and eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$ where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$,

$$\sigma_{11} + \cdots + \sigma_{pp} = \sum_{i=1}^{p}\mathrm{Var}(X_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p}\mathrm{Var}(Y_i)$$

so we can assess how well a subset of the principal components $Y_i$ summarizes the original random variables $X_i$. One common method of doing so is

$$\frac{\lambda_k}{\sum_{i=1}^{p}\lambda_i} = \text{proportion of total population variance due to the kth principal component}$$

If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information!

We can also easily find the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}$$

These values are often used in interpreting the principal components $Y_i$.
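The following short SAS/IML sketch is an added illustration, not part of the original notes; it assumes SAS/IML is available and borrows the covariance matrix from the example that follows. CALL EIGEN returns the eigenvalues in descending order and the corresponding orthonormal eigenvectors as the columns of E, so it verifies the results above numerically:

PROC IML;
   /* population covariance matrix from the example below */
   Sigma = {1.50 2.50 1.00,
            2.50 6.00 3.50,
            1.00 3.50 5.25};
   CALL EIGEN(lambda, E, Sigma);       /* eigenvalues and eigenvectors of Sigma */
   varY = VECDIAG(E`*Sigma*E);         /* Var(Y_i) = e_i' Sigma e_i = lambda_i */
   PRINT lambda varY;
   PRINT (TRACE(Sigma)) (SUM(lambda)); /* sigma_11 + ... + sigma_pp = sum of eigenvalues */
   /* rho(Y_i, X_k) = e_ik sqrt(lambda_i) / sqrt(sigma_kk); row k, column i below */
   rho = DIAG(1/SQRT(VECDIAG(Sigma))) * E * DIAG(SQRT(lambda));
   PRINT rho;
QUIT;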
Example: Suppose we have the following population of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

         Obs 1   Obs 2   Obs 3   Obs 4
   X1      1.0     4.0     3.0     4.0
   X2      6.0    12.0    12.0    10.0
   X3      9.0    10.0    15.0    12.0

Find the three population principal components $Y_1$, $Y_2$, and $Y_3$.

First we need the population covariance matrix $\Sigma$:

$$\Sigma = \begin{pmatrix} 1.50 & 2.50 & 1.00 \\ 2.50 & 6.00 & 3.50 \\ 1.00 & 3.50 & 5.25 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:

$$\lambda_1 = 9.9145474, \quad \mathbf{e}_1 = (0.2910381,\ 0.7342493,\ 0.6133309)'$$
$$\lambda_2 = 2.5344988, \quad \mathbf{e}_2 = (0.4150386,\ 0.4807165,\ -0.7724340)'$$
$$\lambda_3 = 0.3009542, \quad \mathbf{e}_3 = (0.8619976,\ -0.4793640,\ 0.1648350)'$$

so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3$$
$$Y_2 = \mathbf{e}_2'\mathbf{X} = 0.4150386X_1 + 0.4807165X_2 - 0.7724340X_3$$
$$Y_3 = \mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3$$

Note that

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.50 + 6.00 + 5.25 = 12.75 = 9.9145474 + 2.5344988 + 0.3009542 = \lambda_1 + \lambda_2 + \lambda_3$$

and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{\sum\lambda_i} = \frac{9.9145474}{12.75} = 0.777612 \qquad \frac{\lambda_2}{\sum\lambda_i} = \frac{2.5344988}{12.75} = 0.198784 \qquad \frac{\lambda_3}{\sum\lambda_i} = \frac{0.3009542}{12.75} = 0.023604$$

Note that the third principal component is relatively irrelevant!

Next we obtain the correlations between the original random variables $X_k$ and the principal components $Y_i$ (here $e_{ik}$ denotes the kth element of $\mathbf{e}_i$):

$$\rho_{Y_1,X_1} = \frac{e_{11}\sqrt{\lambda_1}}{\sqrt{\sigma_{11}}} = \frac{0.2910381\sqrt{9.9145474}}{\sqrt{1.50}} = 0.748240$$
$$\rho_{Y_1,X_2} = \frac{e_{12}\sqrt{\lambda_1}}{\sqrt{\sigma_{22}}} = \frac{0.7342493\sqrt{9.9145474}}{\sqrt{6.00}} = 0.943853$$
$$\rho_{Y_1,X_3} = \frac{e_{13}\sqrt{\lambda_1}}{\sqrt{\sigma_{33}}} = \frac{0.6133309\sqrt{9.9145474}}{\sqrt{5.25}} = 0.842852$$
$$\rho_{Y_2,X_1} = \frac{e_{21}\sqrt{\lambda_2}}{\sqrt{\sigma_{11}}} = \frac{0.4150386\sqrt{2.5344988}}{\sqrt{1.50}} = 0.539497$$
$$\rho_{Y_2,X_2} = \frac{e_{22}\sqrt{\lambda_2}}{\sqrt{\sigma_{22}}} = \frac{0.4807165\sqrt{2.5344988}}{\sqrt{6.00}} = 0.312435$$
$$\rho_{Y_2,X_3} = \frac{e_{23}\sqrt{\lambda_2}}{\sqrt{\sigma_{33}}} = \frac{-0.7724340\sqrt{2.5344988}}{\sqrt{5.25}} = -0.536694$$
$$\rho_{Y_3,X_1} = \frac{e_{31}\sqrt{\lambda_3}}{\sqrt{\sigma_{11}}} = \frac{0.8619976\sqrt{0.3009542}}{\sqrt{1.50}} = 0.386110$$
$$\rho_{Y_3,X_2} = \frac{e_{32}\sqrt{\lambda_3}}{\sqrt{\sigma_{22}}} = \frac{-0.4793640\sqrt{0.3009542}}{\sqrt{6.00}} = -0.107359$$
$$\rho_{Y_3,X_3} = \frac{e_{33}\sqrt{\lambda_3}}{\sqrt{\sigma_{33}}} = \frac{0.1648350\sqrt{0.3009542}}{\sqrt{5.25}} = 0.039466$$

We can display these results in a correlation matrix:

              X1          X2          X3
   Y1    0.748240    0.943853    0.842852
   Y2    0.539497    0.312435   -0.536694
   Y3    0.386110   -0.107359    0.039466

Here we can easily see that

- the first principal component ($Y_1$) is a mixture of all three random variables ($X_1$, $X_2$, and $X_3$)
- the second principal component ($Y_2$) is a trade-off between $X_1$ and $X_3$
- the third principal component ($Y_3$) is a residual of $X_1$

When the principal components are derived from an $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \Sigma)$ distributed population, the density of $\mathbf{X}$ is constant on the $\boldsymbol{\mu}$-centered ellipsoids

$$(\mathbf{x} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) = c^2$$

which have axes $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$, $i = 1, \ldots, p$, where $(\lambda_i, \mathbf{e}_i)$ are the eigenvalue-eigenvector pairs of $\Sigma$.

We can set $\boldsymbol{\mu} = \mathbf{0}$ without loss of generality; we can then write

$$c^2 = \mathbf{x}'\Sigma^{-1}\mathbf{x} = \frac{(\mathbf{e}_1'\mathbf{x})^2}{\lambda_1} + \cdots + \frac{(\mathbf{e}_p'\mathbf{x})^2}{\lambda_p}$$

where the $\mathbf{e}_i'\mathbf{x}$ are the principal components of $\mathbf{x}$. Setting $y_i = \mathbf{e}_i'\mathbf{x}$ and substituting into the previous expression yields

$$c^2 = \frac{y_1^2}{\lambda_1} + \cdots + \frac{y_p^2}{\lambda_p}$$

which defines an ellipsoid (note that $\lambda_i > 0$ for all $i$) in a coordinate system with axes $y_1, \ldots, y_p$ lying in the directions of $\mathbf{e}_1, \ldots, \mathbf{e}_p$, respectively. The major axis lies in the direction determined by the eigenvector $\mathbf{e}_1$ associated with the largest eigenvalue $\lambda_1$; the remaining minor axes lie in the directions determined by the other eigenvectors.
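To make the axis computation concrete, here is a small SAS/IML sketch (an added illustration, not from the original notes; the constant c is arbitrary) that computes the half-lengths $c\sqrt{\lambda_i}$ and the directions of the ellipsoid axes for the example covariance matrix above:

PROC IML;
   Sigma = {1.50 2.50 1.00,
            2.50 6.00 3.50,
            1.00 3.50 5.25};
   CALL EIGEN(lambda, E, Sigma);
   c = 1;                          /* any fixed constant c */
   halfLen = c # SQRT(lambda);     /* half-length of axis i is c*sqrt(lambda_i) */
   PRINT halfLen E;                /* axis i points along column i of E */
QUIT;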
Example: For the principal components derived from the following population of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

         Obs 1   Obs 2   Obs 3   Obs 4
   X1      1.0     4.0     3.0     4.0
   X2      6.0    12.0    12.0    10.0
   X3      9.0    10.0    15.0    12.0

plot the major and minor axes.

We will need the centroid $\boldsymbol{\mu}$:

$$\boldsymbol{\mu} = (3.0,\ 10.0,\ 11.5)'$$

The direction of the major axis is given by

$$\mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3$$

while the directions of the two minor axes are given by

$$\mathbf{e}_2'\mathbf{X} = 0.4150386X_1 + 0.4807165X_2 - 0.7724340X_3$$
$$\mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3$$

We first graph the centroid $(3.0, 10.0, 11.5)$ in the $(X_1, X_2, X_3)$ coordinate system, then use the first eigenvector to find a second point on the first principal axis; the line connecting these two points is the $Y_1$ axis. We then do the same thing with the second eigenvector (the connecting line is the $Y_2$ axis) and with the third eigenvector (the connecting line is the $Y_3$ axis). [The original slides show this construction as a sequence of three-dimensional plots.] What we have done is a rotation and a translation in $p = 3$ dimensions. Note that the rotated axes remain orthogonal!

Note that we can also construct principal components for the standardized variables $Z_i$:

$$Z_i = \frac{X_i - \mu_i}{\sqrt{\sigma_{ii}}}, \quad i = 1, \ldots, p$$

which in matrix notation is

$$\mathbf{Z} = \left(\mathbf{V}^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu})$$

where $\mathbf{V}^{1/2}$ is the diagonal standard deviation matrix. Obviously

$$E(\mathbf{Z}) = \mathbf{0} \qquad \mathrm{Cov}(\mathbf{Z}) = \left(\mathbf{V}^{1/2}\right)^{-1}\Sigma\left(\mathbf{V}^{1/2}\right)^{-1} = \boldsymbol{\rho}$$

This suggests that the principal components for the standardized variables $Z_i$ may be obtained from the eigenvectors of the correlation matrix $\boldsymbol{\rho}$! The operations are analogous to those used in conjunction with the covariance matrix.

We can show that, for a random vector $\mathbf{Z}$ of standardized variables with covariance matrix $\boldsymbol{\rho}$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, the ith principal component is given by

$$Y_i = \mathbf{e}_i'\mathbf{Z} = \mathbf{e}_i'\left(\mathbf{V}^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu}), \quad i = 1, \ldots, p$$

where $(\lambda_i, \mathbf{e}_i)$ are now the eigenvalue-eigenvector pairs of $\boldsymbol{\rho}$, which generally differ from those of $\Sigma$. Note again that the principal components are not unique if some eigenvalues are equal.

We can also show that, for a random vector $\mathbf{Z}$ with covariance matrix $\boldsymbol{\rho}$ and eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$ where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$,

$$\sum_{i=1}^{p}\mathrm{Var}(Z_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p}\mathrm{Var}(Y_i) = p$$

and we can again assess how well a subset of the principal components $Y_i$ summarizes the original random variables $X_i$ by using

$$\frac{\lambda_k}{p} = \text{proportion of total population variance due to the kth principal component}$$

If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information!
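A short SAS/IML sketch (an added illustration, not from the original notes; the divisor n is used to match the notes' population covariance matrix) confirms that the covariance matrix of the standardized variables is exactly the correlation matrix $\boldsymbol{\rho}$:

PROC IML;
   X = {1  6  9,
        4 12 10,
        3 12 15,
        4 10 12};                          /* rows are the four observations */
   n = NROW(X);
   Xc = X - REPEAT(X[:,], n, 1);           /* center the data at the mean vector */
   Sigma = Xc` * Xc / n;                   /* population covariance (divisor n) */
   Vinv = DIAG(1/SQRT(VECDIAG(Sigma)));    /* (V^{1/2})^{-1} */
   rho = Vinv * Sigma * Vinv;              /* = Cov(Z), the correlation matrix */
   CALL EIGEN(lambda, E, rho);             /* eigenpairs for the standardized PCs */
   PRINT rho, lambda, E;
QUIT;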
Example: Suppose we have the same population of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

         Obs 1   Obs 2   Obs 3   Obs 4
   X1      1.0     4.0     3.0     4.0
   X2      6.0    12.0    12.0    10.0
   X3      9.0    10.0    15.0    12.0

Find the three population principal components $Y_1$, $Y_2$, and $Y_3$ for the standardized random variables $Z_1$, $Z_2$, and $Z_3$.

We could standardize the variables $X_1$, $X_2$, and $X_3$ and then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix $\boldsymbol{\rho}$:

$$\boldsymbol{\rho} = \begin{pmatrix} 1.000 & 0.833 & 0.356 \\ 0.833 & 1.000 & 0.624 \\ 0.356 & 0.624 & 1.000 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:

$$\lambda_1 = 2.2149347, \quad \mathbf{e}_1 = (0.5843738,\ 0.6345775,\ 0.5057853)'$$
$$\lambda_2 = 0.6226418, \quad \mathbf{e}_2 = (-0.5449250,\ -0.1549791,\ 0.8240377)'$$
$$\lambda_3 = 0.1624235, \quad \mathbf{e}_3 = (0.6013018,\ -0.7571610,\ 0.2552315)'$$

These results differ from the covariance-based principal components! The principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{Z} = 0.5843738Z_1 + 0.6345775Z_2 + 0.5057853Z_3$$
$$Y_2 = \mathbf{e}_2'\mathbf{Z} = -0.5449250Z_1 - 0.1549791Z_2 + 0.8240377Z_3$$
$$Y_3 = \mathbf{e}_3'\mathbf{Z} = 0.6013018Z_1 - 0.7571610Z_2 + 0.2552315Z_3$$

Note that

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.0 + 1.0 + 1.0 = 3.0 = 2.2149347 + 0.6226418 + 0.1624235 = \lambda_1 + \lambda_2 + \lambda_3$$

and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{p} = \frac{2.2149347}{3.0} = 0.738312 \qquad \frac{\lambda_2}{p} = \frac{0.6226418}{3.0} = 0.207547 \qquad \frac{\lambda_3}{p} = \frac{0.1624235}{3.0} = 0.054141$$

Note that the third principal component is again relatively irrelevant!

Next we obtain the correlations between the standardized variables $Z_k$ and the principal components $Y_i$ (since $\sigma_{kk} = 1$ for standardized variables, $\rho_{Y_i,Z_k} = e_{ik}\sqrt{\lambda_i}$):

$$\rho_{Y_1,Z_1} = e_{11}\sqrt{\lambda_1} = 0.5843738\sqrt{2.2149347} = 0.8697035$$
$$\rho_{Y_1,Z_2} = e_{12}\sqrt{\lambda_1} = 0.6345775\sqrt{2.2149347} = 0.9444199$$
$$\rho_{Y_1,Z_3} = e_{13}\sqrt{\lambda_1} = 0.5057853\sqrt{2.2149347} = 0.7527427$$
$$\rho_{Y_2,Z_1} = e_{21}\sqrt{\lambda_2} = -0.5449250\sqrt{0.6226418} = -0.4299875$$
$$\rho_{Y_2,Z_2} = e_{22}\sqrt{\lambda_2} = -0.1549791\sqrt{0.6226418} = -0.1222903$$
$$\rho_{Y_2,Z_3} = e_{23}\sqrt{\lambda_2} = 0.8240377\sqrt{0.6226418} = 0.6502288$$
$$\rho_{Y_3,Z_1} = e_{31}\sqrt{\lambda_3} = 0.6013018\sqrt{0.1624235} = 0.2423354$$
$$\rho_{Y_3,Z_2} = e_{32}\sqrt{\lambda_3} = -0.7571610\sqrt{0.1624235} = -0.3051495$$
$$\rho_{Y_3,Z_3} = e_{33}\sqrt{\lambda_3} = 0.2552315\sqrt{0.1624235} = 0.1028629$$

We can display these results in a correlation matrix:

              Z1          Z2          Z3
   Y1    0.8697035   0.9444199   0.7527427
   Y2   -0.4299875  -0.1222903   0.6502288
   Y3    0.2423354  -0.3051495   0.1028629

Here we can easily see that

- the first principal component ($Y_1$) is a mixture of all three standardized variables ($Z_1$, $Z_2$, and $Z_3$)
- the second principal component ($Y_2$) is a trade-off between $Z_1$ and $Z_3$
- the third principal component ($Y_3$) is a trade-off between $Z_1$ and $Z_2$

SAS code for Principal Components Analysis:

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
 INPUT x1 x2 x3;
 LABEL x1='Random Variable 1'
       x2='Random Variable 2'
       x3='Random Variable 3';
 CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3;
 VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
 VAR x1 x2 x3;
 WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE;
 VAR x1 x2 x3;
RUN;

Note that the SAS default is to use the correlation matrix to perform this analysis!

SAS output for Principal Components Analysis:

The PRINCOMP Procedure

Observations  4
Variables     3

                     Simple Statistics
                x1             x2             x3
Mean   3.000000000    10.00000000    11.50000000
StD    1.414213562     2.82842712     2.64575131

                    Correlation Matrix
                            x1       x2       x3
x1  Random Variable 1   1.0000   0.8333   0.3563
x2  Random Variable 2   0.8333   1.0000   0.6236
x3  Random Variable 3   0.3563   0.6236   1.0000

          Eigenvalues of the Correlation Matrix
     Eigenvalue    Difference    Proportion    Cumulative
1    2.22945702    1.56733894        0.7432        0.7432
2    0.66211808    0.55369318        0.2207        0.9639
3    0.10842490                      0.0361        1.0000

                        Eigenvectors
                           Prin1       Prin2       Prin3
x1  Random Variable 1   0.581128   -0.562643    0.587982
x2  Random Variable 2   0.645363   -0.121542   -0.754145
x3  Random Variable 3   0.495779    0.817717    0.292477

(The hand computations above used the correlation matrix rounded to three decimal places, so the eigenvalues and eigenvectors differ slightly from this SAS output, which works with the unrounded correlations.)
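As a cross-check (an added sketch, not part of the original notes), standardizing the variables first and then requesting a covariance-based analysis should reproduce the correlation-based components, since the covariance matrix of standardized data is the correlation matrix of the original data:

PROC STANDARD DATA=stuff MEAN=0 STD=1 OUT=zstuff;
 VAR x1 x2 x3;    * standardize to mean 0, standard deviation 1;
RUN;
PROC PRINCOMP DATA=zstuff COV;
 VAR x1 x2 x3;    * COV on standardized data = default correlation analysis;
RUN;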
SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables:  Prin1 Prin2 Prin3
3      Variables:  x1 x2 x3

                          Simple Statistics
Variable   N       Mean    Std Dev         Sum    Minimum    Maximum
Prin1      4          0    1.49314           0   -2.20299    1.11219
Prin2      4          0    0.81371           0   -0.94739    0.99579
Prin3      4          0    0.32928           0   -0.28331    0.47104
x1         4    3.00000    1.41421    12.00000    1.00000    4.00000
x2         4   10.00000    2.82843    40.00000    6.00000   12.00000
x3         4   11.50000    2.64575    46.00000    9.00000   15.00000

       Pearson Correlation Coefficients, N = 4
           Prob > |r| under H0: Rho=0
              x1          x2          x3
Prin1    0.86770     0.96362     0.74027
          0.1323      0.0364      0.2597
Prin2   -0.45783    -0.09890     0.66538
          0.5422      0.9011      0.3346
Prin3    0.19361    -0.24832     0.09631
          0.8064      0.7517      0.9037

SAS output for Factor Analysis:

PRINCIPAL COMPONENTS ANALYSIS FOR QA 610 SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 3  Average = 1
     Eigenvalue    Difference    Proportion    Cumulative
1    2.22945702    1.56733894        0.7432        0.7432
2    0.66211808    0.55369318        0.2207        0.9639
3    0.10842490                      0.0361        1.0000

1 factor will be retained by the MINEIGEN criterion. Note that this is consistent with the results from PCA.

[Scree plot of eigenvalues vs. component number, with points at (1, 2.229), (2, 0.662), and (3, 0.108).]

                  Factor Pattern
                               Factor1
x1  Random Variable 1          0.86770
x2  Random Variable 2          0.96362
x3  Random Variable 3          0.74027

(These are the Pearson correlation coefficients of the first principal component with the three original variables X1, X2, and X3.)

Variance Explained by Each Factor
Factor1
2.2294570     (the first eigenvalue λ1)

Final Communality Estimates: Total = 2.229457
        x1            x2            x3
0.75291032    0.92855392    0.54799278

SAS code for Principal Components Analysis:

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
 INPUT x1 x2 x3;
 LABEL x1='Random Variable 1'
       x2='Random Variable 2'
       x3='Random Variable 3';
 CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3 COV;
 VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
 VAR x1 x2 x3;
 WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE COV;
 VAR x1 x2 x3;
RUN;

Note that here the COV option instructs SAS to derive the covariance-matrix-based principal components!

SAS output for Principal Components Analysis:

The PRINCOMP Procedure

Observations  4
Variables     3

                     Simple Statistics
                x1             x2             x3
Mean   3.000000000    10.00000000    11.50000000
StD    1.414213562     2.82842712     2.64575131

                     Covariance Matrix
                                x1             x2             x3
x1  Random Variable 1  2.000000000    3.333333333    1.333333333
x2  Random Variable 2  3.333333333    8.000000000    4.666666667
x3  Random Variable 3  1.333333333    4.666666667    7.000000000

Total Variance  17

          Eigenvalues of the Covariance Matrix
     Eigenvalue    Difference    Proportion    Cumulative
1    13.2193960     9.8400643        0.7776        0.7776
2     3.3793317     2.9780594        0.1988        0.9764
3     0.4012723                      0.0236        1.0000

                        Eigenvectors
                           Prin1       Prin2       Prin3
x1  Random Variable 1   0.291038    0.415039    0.861998
x2  Random Variable 2   0.734249    0.480716   -0.479364
x3  Random Variable 3   0.613331   -0.772434    0.164835
SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables:  Prin1 Prin2 Prin3
3      Variables:  x1 x2 x3

                          Simple Statistics
Variable   N       Mean    Std Dev         Sum    Minimum    Maximum
Prin1      4          0    3.63585           0   -5.05240    3.61516
Prin2      4          0    1.83830           0   -1.74209    2.53512
Prin3      4          0    0.63346           0   -0.38181    0.94442
x1         4    3.00000    1.41421    12.00000    1.00000    4.00000
x2         4   10.00000    2.82843    40.00000    6.00000   12.00000
x3         4   11.50000    2.64575    46.00000    9.00000   15.00000

       Pearson Correlation Coefficients, N = 4
           Prob > |r| under H0: Rho=0
              x1          x2          x3
Prin1    0.74824     0.94385     0.84285
          0.2518      0.0561      0.1571
Prin2    0.53950     0.31243    -0.53670
          0.4605      0.6876      0.4633
Prin3    0.38611    -0.10736     0.03947
          0.6139      0.8926      0.9605

SAS output for Factor Analysis:

PRINCIPAL COMPONENTS ANALYSIS FOR QA 610 SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Covariance Matrix: Total = 17  Average = 5.66666667
     Eigenvalue    Difference    Proportion    Cumulative
1    13.2193960     9.8400643        0.7776        0.7776
2     3.3793317     2.9780594        0.1988        0.9764
3     0.4012723                      0.0236        1.0000

1 factor will be retained by the MINEIGEN criterion. Note that this is consistent with the results from PCA.

[Scree plot of eigenvalues vs. component number, with points at (1, 13.22), (2, 3.38), and (3, 0.40).]

                  Factor Pattern
                               Factor1
x1  Random Variable 1          0.74824
x2  Random Variable 2          0.94385
x3  Random Variable 3          0.84285

(These are the Pearson correlation coefficients of the first principal component with the three original variables X1, X2, and X3.)

Variance Explained by Each Factor
Factor        Weighted    Unweighted
Factor1     13.2193960    2.16112149     (the weighted value is the first eigenvalue λ1)

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 13.219396  Unweighted = 2.161121
Variable    Communality        Weight
x1           0.55986257    2.00000000
x2           0.89085847    8.00000000
x3           0.71040045    7.00000000

Covariance matrices with special structures yield particularly interesting principal components:

- Diagonal covariance matrices. Suppose $\Sigma$ is the diagonal matrix

$$\Sigma = \begin{pmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{pp} \end{pmatrix}$$

  Since the eigenvector $\mathbf{e}_i$ has a value of 1 in the ith position and 0 in all other positions, we have $\Sigma\mathbf{e}_i = \sigma_{ii}\mathbf{e}_i$, so $(\sigma_{ii}, \mathbf{e}_i)$ is the ith eigenvalue-eigenvector pair. The linear combination $Y_i = \mathbf{e}_i'\mathbf{X} = X_i$ then demonstrates that the set of principal components and the original set of (uncorrelated) random variables are the same! Note that this result is also true if we work with the correlation matrix.

- Constant variances and covariances. Suppose $\Sigma$ is the patterned matrix

$$\Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$$

  Here the resulting correlation matrix

$$\boldsymbol{\rho} = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$

  is also the covariance matrix of the standardized variables $\mathbf{Z}$. For $\rho > 0$ the largest eigenvalue of this matrix is $\lambda_1 = 1 + (p-1)\rho$ with $\mathbf{e}_1 = (1/\sqrt{p})(1, 1, \ldots, 1)'$, so the first principal component is proportional to the average of the standardized variables; the remaining $p - 1$ eigenvalues are all equal to $1 - \rho$ (see the sketch below).
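A quick SAS/IML illustration of these two special cases (an added sketch; the particular numeric entries are arbitrary):

PROC IML;
   /* diagonal covariance: the PCs are the original variables */
   Sigma = DIAG({2 8 7});
   CALL EIGEN(lambda, E, Sigma);
   PRINT lambda E;        /* eigenvalues are the variances (sorted descending);
                             each eigenvector is a coordinate unit vector */
   /* equicorrelation: the first PC is the average of the standardized variables */
   p = 3;  r = 0.5;
   rho = (1-r)#I(p) + r#J(p,p,1);
   CALL EIGEN(lambda2, E2, rho);
   PRINT lambda2 E2;      /* lambda1 = 1+(p-1)r = 2; the rest equal 1-r = 0.5 */
QUIT;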
C. Using Principal Components to Summarize Sample Variation

Suppose the data $\mathbf{x}_1, \ldots, \mathbf{x}_n$ represent n independent observations from a p-dimensional population with some mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$. These data yield a sample mean vector $\bar{\mathbf{x}}$, sample covariance matrix $\mathbf{S}$, and sample correlation matrix $\mathbf{R}$.

As in the population case, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:

$$\begin{aligned}
y_1 &= \mathbf{a}_1'\mathbf{x} = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\
y_2 &= \mathbf{a}_2'\mathbf{x} = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p \\
&\ \vdots \\
y_p &= \mathbf{a}_p'\mathbf{x} = a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pp}x_p
\end{aligned}$$

Again it is easy to show that the linear combinations $\mathbf{a}_i'\mathbf{x}_j = a_{i1}x_{j1} + a_{i2}x_{j2} + \cdots + a_{ip}x_{jp}$ have sample means $\mathbf{a}_i'\bar{\mathbf{x}}$ and

$$\widehat{\mathrm{Var}}(\mathbf{a}_i'\mathbf{x}) = \mathbf{a}_i'\mathbf{S}\mathbf{a}_i, \quad i = 1, \ldots, p$$
$$\widehat{\mathrm{Cov}}(\mathbf{a}_i'\mathbf{x}, \mathbf{a}_k'\mathbf{x}) = \mathbf{a}_i'\mathbf{S}\mathbf{a}_k, \quad i, k = 1, \ldots, p$$

The sample principal components are those uncorrelated linear combinations $\hat{y}_1, \ldots, \hat{y}_p$ whose sample variances are as large as possible. Thus the first principal component is the linear combination of maximum sample variance, i.e., the solution of the nonlinear optimization problem (the quadratic objective is again the source of the nonlinearity)

$$\max_{\mathbf{a}_1}\ \mathbf{a}_1'\mathbf{S}\mathbf{a}_1 \quad \text{s.t.} \quad \mathbf{a}_1'\mathbf{a}_1 = 1 \qquad \text{(restricts attention to coefficient vectors of unit length)}$$

The second principal component is the linear combination of maximum sample variance that is uncorrelated with the first principal component, i.e., the solution of the nonlinear optimization problem

$$\max_{\mathbf{a}_2}\ \mathbf{a}_2'\mathbf{S}\mathbf{a}_2 \quad \text{s.t.} \quad \mathbf{a}_2'\mathbf{a}_2 = 1,\ \ \mathbf{a}_1'\mathbf{S}\mathbf{a}_2 = 0 \qquad \text{(restricts the covariance to zero)}$$

The third principal component is the solution of the nonlinear optimization problem

$$\max_{\mathbf{a}_3}\ \mathbf{a}_3'\mathbf{S}\mathbf{a}_3 \quad \text{s.t.} \quad \mathbf{a}_3'\mathbf{a}_3 = 1,\ \ \mathbf{a}_1'\mathbf{S}\mathbf{a}_3 = 0,\ \ \mathbf{a}_2'\mathbf{S}\mathbf{a}_3 = 0 \qquad \text{(restricts the covariances to zero)}$$

Generally, the ith principal component is the linear combination of maximum sample variance that is uncorrelated with all previous principal components, i.e., the solution of the nonlinear optimization problem

$$\max_{\mathbf{a}_i}\ \mathbf{a}_i'\mathbf{S}\mathbf{a}_i \quad \text{s.t.} \quad \mathbf{a}_i'\mathbf{a}_i = 1,\ \ \mathbf{a}_k'\mathbf{S}\mathbf{a}_i = 0 \ \text{ for all } k < i$$

We can show that, for a random sample with sample covariance matrix $\mathbf{S}$ and eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p \geq 0$, the ith sample principal component is given by

$$\hat{y}_i = \hat{\mathbf{e}}_i'\mathbf{x} = \hat{e}_{i1}x_1 + \hat{e}_{i2}x_2 + \cdots + \hat{e}_{ip}x_p, \quad i = 1, \ldots, p$$

Note that the principal components are not unique if some eigenvalues are equal.

We can also show that, for a random sample with sample covariance matrix $\mathbf{S}$ and eigenvalue-eigenvector pairs $(\hat{\lambda}_1, \hat{\mathbf{e}}_1), \ldots, (\hat{\lambda}_p, \hat{\mathbf{e}}_p)$ where $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$,

$$s_{11} + \cdots + s_{pp} = \sum_{i=1}^{p} s_{ii} = \hat{\lambda}_1 + \cdots + \hat{\lambda}_p = \sum_{i=1}^{p}\widehat{\mathrm{Var}}(\hat{y}_i)$$

so we can assess how well a subset of the principal components $\hat{y}_i$ summarizes the original random sample. One common method of doing so is

$$\frac{\hat{\lambda}_k}{\sum_{i=1}^{p}\hat{\lambda}_i} = \text{proportion of total sample variance due to the kth principal component}$$

If a large proportion of the total sample variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information!

We can also easily find the correlations between the original variables $x_k$ and the principal components $\hat{y}_i$:

$$r_{\hat{y}_i, x_k} = \frac{\hat{e}_{ik}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}}$$

These values are often used in interpreting the principal components $\hat{y}_i$.
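The following SAS/IML sketch (added for illustration; it uses the data from the example below) computes $\mathbf{S}$ with the divisor n - 1, its eigenpairs, and the centered principal component scores:

PROC IML;
   X = {1  6  9,
        4 12 10,
        3 12 15,
        4 10 12};                     /* rows are the four observations */
   n = NROW(X);
   Xc = X - REPEAT(X[:,], n, 1);      /* center the data at xbar */
   S = Xc` * Xc / (n-1);              /* sample covariance matrix (divisor n-1) */
   CALL EIGEN(lambda, E, S);
   scores = Xc * E;                   /* sample principal component scores */
   PRINT S, lambda, E, scores;
QUIT;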
Note that

- the approach for standardized data (i.e., principal components derived from the sample correlation matrix $\mathbf{R}$) is analogous to the population approach
- when principal components are derived from sample data, the sample data are frequently centered, $\mathbf{x}_j - \bar{\mathbf{x}}$, which has no effect on the sample covariance matrix $\mathbf{S}$ and yields the derived principal components $\hat{y}_i = \hat{\mathbf{e}}_i'(\mathbf{x}_j - \bar{\mathbf{x}})$. Under these circumstances, the mean value of the ith principal component over all n observations in the data set is

$$\bar{\hat{y}}_i = \frac{1}{n}\sum_{j=1}^{n}\hat{\mathbf{e}}_i'(\mathbf{x}_j - \bar{\mathbf{x}}) = \hat{\mathbf{e}}_i'\left(\frac{1}{n}\sum_{j=1}^{n}(\mathbf{x}_j - \bar{\mathbf{x}})\right) = \hat{\mathbf{e}}_i'\mathbf{0} = 0$$

Example: Suppose we have the following sample of four observations made on three random variables $X_1$, $X_2$, and $X_3$:

         Obs 1   Obs 2   Obs 3   Obs 4
   X1      1.0     4.0     3.0     4.0
   X2      6.0    12.0    12.0    10.0
   X3      9.0    10.0    15.0    12.0

Find the three sample principal components $\hat{y}_1$, $\hat{y}_2$, and $\hat{y}_3$ based on the sample covariance matrix $\mathbf{S}$.

First we need the sample covariance matrix $\mathbf{S}$:

$$\mathbf{S} = \begin{pmatrix} 2.00 & 3.33 & 1.33 \\ 3.33 & 8.00 & 4.67 \\ 1.33 & 4.67 & 7.00 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:

$$\hat{\lambda}_1 = 13.21944, \quad \hat{\mathbf{e}}_1 = (0.291000,\ 0.734253,\ 0.613345)'$$
$$\hat{\lambda}_2 = 3.37916, \quad \hat{\mathbf{e}}_2 = (0.415126,\ 0.480690,\ -0.772403)'$$
$$\hat{\lambda}_3 = 0.40140, \quad \hat{\mathbf{e}}_3 = (0.861968,\ -0.479385,\ 0.164927)'$$

so the principal components are:

$$\hat{y}_1 = \hat{\mathbf{e}}_1'\mathbf{x} = 0.291000x_1 + 0.734253x_2 + 0.613345x_3$$
$$\hat{y}_2 = \hat{\mathbf{e}}_2'\mathbf{x} = 0.415126x_1 + 0.480690x_2 - 0.772403x_3$$
$$\hat{y}_3 = \hat{\mathbf{e}}_3'\mathbf{x} = 0.861968x_1 - 0.479385x_2 + 0.164927x_3$$

Note that

$$s_{11} + s_{22} + s_{33} = 2.0 + 8.0 + 7.0 = 17.0 = 13.21944 + 3.37916 + 0.40140 = \hat{\lambda}_1 + \hat{\lambda}_2 + \hat{\lambda}_3$$

and the proportion of total sample variance due to each principal component is

$$\frac{\hat{\lambda}_1}{\sum\hat{\lambda}_i} = \frac{13.21944}{17.0} = 0.777614 \qquad \frac{\hat{\lambda}_2}{\sum\hat{\lambda}_i} = \frac{3.37916}{17.0} = 0.198774 \qquad \frac{\hat{\lambda}_3}{\sum\hat{\lambda}_i} = \frac{0.40140}{17.0} = 0.023612$$

Note that the third principal component is relatively irrelevant!

Next we obtain the correlations between the observed values $x_k$ of the original random variables and the sample principal components $\hat{y}_i$:

$$r_{\hat{y}_1,x_1} = \frac{\hat{e}_{11}\sqrt{\hat{\lambda}_1}}{\sqrt{s_{11}}} = \frac{0.291000\sqrt{13.21944}}{\sqrt{2.0}} = 0.748142$$
$$r_{\hat{y}_1,x_2} = \frac{\hat{e}_{12}\sqrt{\hat{\lambda}_1}}{\sqrt{s_{22}}} = \frac{0.734253\sqrt{13.21944}}{\sqrt{8.0}} = 0.943856$$
$$r_{\hat{y}_1,x_3} = \frac{\hat{e}_{13}\sqrt{\hat{\lambda}_1}}{\sqrt{s_{33}}} = \frac{0.613345\sqrt{13.21944}}{\sqrt{7.0}} = 0.842862$$
$$r_{\hat{y}_2,x_1} = \frac{\hat{e}_{21}\sqrt{\hat{\lambda}_2}}{\sqrt{s_{11}}} = \frac{0.415126\sqrt{3.37916}}{\sqrt{2.0}} = 0.539611$$
$$r_{\hat{y}_2,x_2} = \frac{\hat{e}_{22}\sqrt{\hat{\lambda}_2}}{\sqrt{s_{22}}} = \frac{0.480690\sqrt{3.37916}}{\sqrt{8.0}} = 0.312428$$
$$r_{\hat{y}_2,x_3} = \frac{\hat{e}_{23}\sqrt{\hat{\lambda}_2}}{\sqrt{s_{33}}} = \frac{-0.772403\sqrt{3.37916}}{\sqrt{7.0}} = -0.536660$$
$$r_{\hat{y}_3,x_1} = \frac{\hat{e}_{31}\sqrt{\hat{\lambda}_3}}{\sqrt{s_{11}}} = \frac{0.861968\sqrt{0.40140}}{\sqrt{2.0}} = 0.386157$$
$$r_{\hat{y}_3,x_2} = \frac{\hat{e}_{32}\sqrt{\hat{\lambda}_3}}{\sqrt{s_{22}}} = \frac{-0.479385\sqrt{0.40140}}{\sqrt{8.0}} = -0.107381$$
$$r_{\hat{y}_3,x_3} = \frac{\hat{e}_{33}\sqrt{\hat{\lambda}_3}}{\sqrt{s_{33}}} = \frac{0.164927\sqrt{0.40140}}{\sqrt{7.0}} = 0.039494$$

We can display these results in a correlation matrix:

              x1          x2          x3
   y1    0.748142    0.943856    0.842862
   y2    0.539611    0.312428   -0.536660
   y3    0.386157   -0.107381    0.039494

How would we interpret these results? (Note also that these correlations match, up to rounding, those computed earlier from the population covariance matrix: the n versus n - 1 divisor scales $\hat{\lambda}_i$ and $s_{kk}$ by the same factor, which cancels in the ratio.)

Note that results based on the sample correlation matrix $\mathbf{R}$ will not differ from results based on the population correlation matrix $\boldsymbol{\rho}$ (why?).
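The following SAS/IML sketch (an added illustration) suggests the answer: the divisor used in forming the covariance matrix cancels when we convert to correlations, so the two correlation matrices are identical:

PROC IML;
   X = {1  6  9,
        4 12 10,
        3 12 15,
        4 10 12};
   n = NROW(X);
   Xc = X - REPEAT(X[:,], n, 1);
   Spop = Xc` * Xc / n;                  /* population covariance (divisor n) */
   Ssam = Xc` * Xc / (n-1);              /* sample covariance (divisor n-1) */
   Dp = DIAG(1/SQRT(VECDIAG(Spop)));
   Ds = DIAG(1/SQRT(VECDIAG(Ssam)));
   PRINT (Dp*Spop*Dp) (Ds*Ssam*Ds);      /* the two correlation matrices coincide */
QUIT;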
SAS code for Principal Components Analysis:

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
 INPUT x1 x2 x3;
 LABEL x1='Random Variable 1'
       x2='Random Variable 2'
       x3='Random Variable 3';
 CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff COV OUT=pcstuff;
 VAR x1 x2 x3;
 TITLE4 'Using PROC PRINCOMP for Principal Components Analysis';
RUN;
PROC CORR DATA=pcstuff;
 VAR x1 x2 x3;
 WITH prin1 prin2 prin3;
RUN;

The COV option is used to instruct SAS to perform the principal components analysis on the sample covariance matrix rather than the default (the correlation matrix)!

SAS output for Principal Components Analysis:

The PRINCOMP Procedure

Observations  4
Variables     3

                     Simple Statistics
                x1             x2             x3
Mean   3.000000000    10.00000000    11.50000000
StD    1.414213562     2.82842712     2.64575131

                     Covariance Matrix
                                x1             x2             x3
x1  Random Variable 1  2.000000000    3.333333333    1.333333333
x2  Random Variable 2  3.333333333    8.000000000    4.666666667
x3  Random Variable 3  1.333333333    4.666666667    7.000000000

Total Variance  17

          Eigenvalues of the Covariance Matrix
     Eigenvalue    Difference    Proportion    Cumulative
1    13.2193960     9.8400643        0.7776        0.7776
2     3.3793317     2.9780594        0.1988        0.9764
3     0.4012723                      0.0236        1.0000

                        Eigenvectors
                           Prin1       Prin2       Prin3
x1  Random Variable 1   0.291038    0.415039    0.861998
x2  Random Variable 2   0.734249    0.480716   -0.479364
x3  Random Variable 3   0.613331   -0.772434    0.164835

SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables:  Prin1 Prin2 Prin3
3      Variables:  x1 x2 x3

                          Simple Statistics
Variable   N       Mean    Std Dev         Sum    Minimum    Maximum
Prin1      4          0    3.63585           0   -5.05240    3.61516
Prin2      4          0    1.83830           0   -1.74209    2.53512
Prin3      4          0    0.63346           0   -0.38181    0.94442
x1         4    3.00000    1.41421    12.00000    1.00000    4.00000
x2         4   10.00000    2.82843    40.00000    6.00000   12.00000
x3         4   11.50000    2.64575    46.00000    9.00000   15.00000

       Pearson Correlation Coefficients, N = 4
           Prob > |r| under H0: Rho=0
              x1          x2          x3
Prin1    0.74824     0.94385     0.84285
          0.2518      0.0561      0.1571
Prin2    0.53950     0.31243    -0.53670
          0.4605      0.6876      0.4633
Prin3    0.38611    -0.10736     0.03947
          0.6139      0.8926      0.9605