"The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set." – I. Jolliffe
Agenda
- Motivations for PCA
- Analysis of covariance structure
- PCA as variance maximization
- Linear dimension reduction
- MSE optimality of PCA as a linear dimension reduction technique
- Example
Principal Components Analysis
- Two high-level motivations for PCA:
  - Explanation of covariance structure: are there important features in the variation of the data?
  - Dimension reduction: can we reduce the p variables into a smaller number d without losing much information?
Covariance Structure
- The covariance matrix can be difficult to interpret when there are a large number of variables (large p).
- PCA provides a smaller number (d << p) of derived variables that preserve most of the information given by the covariance matrix.
Variance Maximization Idea
- Let X be a p-variate random vector with mean µ and covariance matrix Σ.
- Look for linear combinations of X,
      u_1^T X, u_2^T X, . . .
  that
  - are uncorrelated: Cov(u_i^T X, u_j^T X) = 0 for i ≠ j
  - have maximal variance: maximize Var(u_i^T X)
Example 1
- Generate X from MVN_2(0, Σ) where
      Σ = ( 1    0.9 )
          ( 0.9  1   )
- [Figure: (a) scatter plot of X1 vs X2; (c) histogram of a linear combination u^T X of (X1, X2).]
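A minimal simulation sketch of Example 1 (my own addition, not from the original slides; it assumes NumPy, and the seed, sample size, and variable names are arbitrary choices). It draws from MVN_2(0, Σ) and compares the variance of a single coordinate with the variance of the leading linear combination u^T X.

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=2000)

# Leading eigenvector of Sigma = direction of maximal variance.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
u = eigvecs[:, -1]                         # eigenvector for the largest eigenvalue
proj = X @ u                               # the linear combination u^T X for each observation

print(X[:, 0].var(), proj.var())           # roughly 1 vs roughly 1.9 (= largest eigenvalue)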
Definition
- Definition: The principal components are the uncorrelated linear combinations
      u_1^T X, u_2^T X, . . . , u_p^T X
  with maximal variance, subject to the restriction ||u_i||_2^2 = 1 for i = 1, . . . , p.
Variance Maximization (step 1 of 3)
      Var(u_i^T X) = u_i^T Σ u_i
      Cov(u_i^T X, u_j^T X) = u_i^T Σ u_j.
- Let's find the 1st principal component direction:
      max_{||u||_2^2 = 1} u^T Σ u
- The maximum is equal to the largest eigenvalue of Σ, and is attained by the corresponding eigenvector v_1:
      max_{||u||_2^2 = 1} u^T Σ u = λ_1 = v_1^T Σ v_1.
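A quick numerical check of this claim (my own sketch, assuming NumPy; the covariance matrix and number of random directions are arbitrary): random unit vectors never exceed λ_1, and the leading eigenvector attains it.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T                                 # an arbitrary covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma)        # ascending order
lam1, v1 = eigvals[-1], eigvecs[:, -1]

U = rng.standard_normal((10000, 4))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # 10000 random unit vectors
quad = np.einsum('ij,jk,ik->i', U, Sigma, U)    # u^T Sigma u for each row u

print(quad.max() <= lam1 + 1e-9)                # True: no unit vector beats lambda_1
print(np.isclose(v1 @ Sigma @ v1, lam1))        # True: v_1 attains it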
Variance Maximization (step 2 of 3)
- To find the remaining directions, we use the spectral decomposition of Σ:
      Σ = ∑_{i=1}^p λ_i v_i v_i^T = V Λ V^T,   λ_1 ≥ . . . ≥ λ_p,   v_i^T v_j = 1(i = j).
- Recall from linear algebra that
      max_{||u||_2^2 = 1, u ⊥ v_1, v_2, . . . , v_{k−1}} u^T Σ u = λ_k = v_k^T Σ v_k
Variance Maximization (step 3 of 3)
- Now let's find the 2nd principal component direction:
      max_{||u||_2^2 = 1, u^T Σ v_1 = 0} u^T Σ u
        = max_{||u||_2^2 = 1, u^T v_1 = 0} u^T Σ u
        = max_{||u||_2^2 = 1, u ⊥ v_1} u^T Σ u
        = λ_2 = v_2^T Σ v_2.
- This generalizes to the k-th direction, provided λ_k > 0:
      max_{||u||_2^2 = 1, u^T Σ v_1 = 0, . . . , u^T Σ v_{k−1} = 0} u^T Σ u
        = max_{||u||_2^2 = 1, u ⊥ v_1, . . . , u ⊥ v_{k−1}} u^T Σ u
        = λ_k = v_k^T Σ v_k.
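The same kind of numerical check works for the constrained problem (again my own sketch, assuming NumPy): restricting to unit vectors orthogonal to v_1, the maximum of u^T Σ u drops to λ_2.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T                                   # arbitrary covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma)
lam = eigvals[::-1]                               # descending: lam[0] >= lam[1] >= ...
V = eigvecs[:, ::-1]
v1 = V[:, 0]

U = rng.standard_normal((10000, 5))
U -= np.outer(U @ v1, v1)                         # remove the v_1 component, so u ⊥ v_1
U /= np.linalg.norm(U, axis=1, keepdims=True)
quad = np.einsum('ij,jk,ik->i', U, Sigma, U)      # u^T Sigma u for each constrained u

print(quad.max() <= lam[1] + 1e-9)                    # True: constrained max is at most lambda_2
print(np.isclose(V[:, 1] @ Sigma @ V[:, 1], lam[1]))  # True: v_2 attains it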
Principal Components
- Let the spectral decomposition of Σ be given by
      Σ = V Λ V^T = ∑_{i=1}^p λ_i v_i v_i^T
- The principal component directions are the eigenvectors v_1, . . . , v_p.
- The principal components are the coordinates of
      Y = (v_1^T X, . . . , v_p^T X)^T
Properties of PCs
Var(Y ) = Var(V X ) = Λ
Cov(X , Y ) = Cov(X , V X ) = V Λ.
I
Principal components are uncorrelated
I
Their variances are the eigenvalues
Var(v T
i X ) = λi
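These properties are easy to confirm on simulated data. The sketch below (my own, assuming NumPy; seed and sample size are arbitrary) computes Y = V^T X observation by observation and checks that the sample covariance of Y is approximately Λ.

import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=100000)

eigvals, eigvecs = np.linalg.eigh(Sigma)
V = eigvecs[:, ::-1]                # columns ordered by decreasing eigenvalue
Lam = np.diag(eigvals[::-1])

Y = X @ V                           # each row is (v_1^T x, ..., v_p^T x), i.e. V^T x
print(np.cov(Y, rowvar=False))      # approximately Lam = diag(1.9, 0.1): uncorrelated PCs
print(Lam)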
Proportion of Variance Explained
- By the i-th principal component:
      Var(v_i^T X) / Tr(Var(X)) = λ_i / (λ_1 + λ_2 + . . . + λ_p)
- By the first i principal components:
      (λ_1 + λ_2 + . . . + λ_i) / (λ_1 + λ_2 + . . . + λ_p)
- If the eigenvalues decay quickly, then the first few PCs explain most of the variance.
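As a tiny worked example (my own, assuming NumPy): the Σ from Example 1 has eigenvalues 1.9 and 0.1, so the first PC already explains 95% of the total variance.

import numpy as np

Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
lam = np.linalg.eigvalsh(Sigma)[::-1]   # eigenvalues in decreasing order: [1.9, 0.1]

pve = lam / lam.sum()                   # lambda_i / Tr(Sigma) for each PC
print(pve)                              # [0.95, 0.05]
print(np.cumsum(pve))                   # [0.95, 1.00]: variance explained by the first i PCs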
Example 2
- [Figure: scree plot of the six eigenvalues against their index, with the proportion of variance explained.]

Example 2 (Contd)
- [Figure: cumulative proportion of variance explained plotted against PC number (about 0.88 up to 1.00).]

Example 2 (Contd)
- [Figure: proportion of variance explained plotted against PC number (1-6).]
MSE Optimality of Principal Components Analysis

Ellipsoids
- The set {x : x^T Σ^{−1} x = 1} is an ellipsoid.
- This ellipsoid gives a geometric picture of the variation of X:
  - its principal axes are the principal modes of variation of X;
  - the squared radius of each axis is proportional to Var(PC).
Example 2
- [Figure: scatter plots of X1 vs X2 with the covariance ellipse and its principal axes overlaid, each paired with a bar plot of the cumulative proportion of variance explained by PCs 1 and 2.]
Linear Dimension Reduction
- Project the data onto a lower, d-dimensional subspace without losing important information.
- This enables:
  - simplification of further analyses (use the d-dimensional projection as input for subsequent analysis);
  - visualization (when d = 2 or 3).
Orthogonal Projection
- Consider orthogonal projections of the data onto a d-dimensional subspace.
- All such orthogonal projectors are of the form
      U U^T
  where U is a p × d orthonormal matrix.
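A small sketch (my own, assuming NumPy) builds such a projector from a random p × d orthonormal matrix and checks the defining properties, symmetry and idempotence.

import numpy as np

rng = np.random.default_rng(4)
p, d = 5, 2
U, _ = np.linalg.qr(rng.standard_normal((p, d)))   # p x d matrix with orthonormal columns
P = U @ U.T                                        # orthogonal projector onto the column space of U

print(np.allclose(P, P.T))           # symmetric
print(np.allclose(P @ P, P))         # idempotent: projecting twice changes nothing
print(np.linalg.matrix_rank(P))      # rank d = 2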
How to choose the linear transformation?
- We want c and U (p × d orthonormal) so that
      X ≈ X̂ = c + U Y = c + U U^T (X − c)
- Least squares: choose c and U to minimize
      E||X − X̂||_2^2 = E||X − {c + U U^T (X − c)}||_2^2
                     = E||(X − c) − U U^T (X − c)||_2^2
                     = E||(I_p − U U^T)(X − c)||_2^2
Minimization of the MSE (step 1 of 4)
First minimize with respect to c. Reparameterize c = µ − t, where µ = E(X):
      E||X − X̂||_2^2 = E||(I_p − U U^T)(X − c)||_2^2
                     = E||(I_p − U U^T)(X − µ) + (I_p − U U^T) t||_2^2
                     = E||(I_p − U U^T)(X − µ)||_2^2 + ||(I_p − U U^T) t||_2^2
                       + 2 E⟨(I_p − U U^T)(X − µ), (I_p − U U^T) t⟩
                     = E||(I_p − U U^T)(X − µ)||_2^2 + ||(I_p − U U^T) t||_2^2
The cross term vanishes because E(X − µ) = 0.
Minimization of the MSE (step 2 of 4)
      E||X − X̂||_2^2 = E||(I_p − U U^T)(X − µ)||_2^2 + ||(I_p − U U^T) t||_2^2
Both terms on the right-hand side are ≥ 0, and the second is minimized when t = 0. So c = µ is optimal:
      µ = arg min_c E||X − X̂||_2^2
Minimization of the MSE (step 3 of 4)
Some algebra will help us minimize over U:
      min_c E||X − X̂||_2^2 = E||(I_p − U U^T)(X − µ)||_2^2
                           = E Tr{(X − µ)^T (I_p − U U^T)^T (I_p − U U^T)(X − µ)}
                           = E Tr{(I_p − U U^T)^T (I_p − U U^T)(X − µ)(X − µ)^T}
                           = Tr{(I_p − U U^T)^T (I_p − U U^T) Σ}
                           = Tr(Σ) − Tr(U U^T Σ)
                           = Tr(Σ) − Tr(U^T Σ U)
Minimization of the MSE (step 3 of 4)
- So far we have shown that
      min_{c ∈ R^p} E||X − X̂||_2^2 = Tr(Σ) − Tr(U^T Σ U),
  with c = µ optimal.
- It remains for us to minimize over p × d orthonormal matrices U:
      min_{U : U^T U = I_d} min_{c ∈ R^p} E||X − X̂||_2^2
        = min_{U : U^T U = I_d} [Tr(Σ) − Tr(U^T Σ U)]
        = Tr(Σ) − max_{U : U^T U = I_d} Tr(U^T Σ U)
- Theorem (Ky Fan, 1949): Let Σ be a p × p real symmetric matrix with eigenvalues λ_1 ≥ . . . ≥ λ_p and a corresponding set of orthonormal eigenvectors v_1, . . . , v_p. Then
      max_{U^T U = I_d} Tr(U^T Σ U) = ∑_{i=1}^d λ_i,
  with a maximizer given by V_{1:d} = (v_1, . . . , v_d).
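The Ky Fan bound is easy to illustrate numerically (my own sketch, assuming NumPy; the matrix and dimensions are arbitrary): random orthonormal U never exceed the sum of the d largest eigenvalues, and V_{1:d} attains it.

import numpy as np

rng = np.random.default_rng(5)
p, d = 6, 2
A = rng.standard_normal((p, p))
Sigma = A @ A.T                                    # arbitrary symmetric PSD matrix

lam = np.linalg.eigvalsh(Sigma)[::-1]              # descending eigenvalues
V = np.linalg.eigh(Sigma)[1][:, ::-1]              # eigenvectors in matching order
bound = lam[:d].sum()                              # lambda_1 + ... + lambda_d

vals = []
for _ in range(2000):
    U, _ = np.linalg.qr(rng.standard_normal((p, d)))   # random p x d orthonormal matrix
    vals.append(np.trace(U.T @ Sigma @ U))

print(max(vals) <= bound + 1e-9)                                    # True: never exceeds the bound
print(np.isclose(np.trace(V[:, :d].T @ Sigma @ V[:, :d]), bound))   # True: attained at V_{1:d}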
Minimization of the MSE (step 4 of 4)
Use the spectral decomposition of Σ,
      Σ = V Λ V^T = ∑_{i=1}^p λ_i v_i v_i^T,
and then apply the Ky Fan theorem to get
      min_{U : U^T U = I_d} min_c E||X − X̂||_2^2 = Tr(Σ) − max_{U^T U = I_d} Tr(U^T Σ U)
                                                 = ∑_{i=1}^p λ_i − ∑_{i=1}^d λ_i
                                                 = ∑_{i=d+1}^p λ_i
MSE Optimal Linear Dimension Reduction
- The eigenvectors associated with the d largest eigenvalues of Σ are called the leading eigenvectors of Σ.
- MSE-optimal linear dimension reduction is projection onto the subspace spanned by the leading eigenvectors of Σ.
MSE Optimal Linear Dimension Reduction (contd)
- Let Y_{1:d} = V_{1:d}^T (X − µ) be the vector whose coordinates are the first d principal components of X.
- The MSE-optimal rank-d approximation of X is given by
      µ + V_{1:d} V_{1:d}^T (X − µ) = µ + V_{1:d} Y_{1:d},
  with optimal MSE ∑_{i=d+1}^p λ_i.
- If the eigenvalues of the population covariance matrix decay rapidly, then we can project X onto a lower-dimensional subspace without losing much information (as measured by MSE).
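The sketch below (my own, assuming NumPy; the covariance matrix, dimensions, and seed are arbitrary) forms the rank-d approximation µ + V_{1:d} V_{1:d}^T (X − µ) on simulated data and checks that its mean squared error matches the trailing eigenvalue sum ∑_{i>d} λ_i.

import numpy as np

rng = np.random.default_rng(6)
p, d, n = 6, 2, 100000
B = rng.standard_normal((p, p)) * np.array([3.0, 2.0, 1.0, 0.3, 0.2, 0.1])  # decaying spectrum
Sigma = B @ B.T
mu = np.zeros(p)
X = rng.multivariate_normal(mu, Sigma, size=n)

lam = np.linalg.eigvalsh(Sigma)[::-1]          # descending eigenvalues
V = np.linalg.eigh(Sigma)[1][:, ::-1]
V1d = V[:, :d]                                 # leading d eigenvectors

Xhat = mu + (X - mu) @ V1d @ V1d.T             # rank-d reconstruction, observation by observation
mse = np.mean(np.sum((X - Xhat) ** 2, axis=1)) # empirical E||X - Xhat||^2
print(mse, lam[d:].sum())                      # the two agree up to sampling error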
Example 2 (Contd)
- Data: 6 variables, 961 observations.
- [Figure: observed values x_ij plotted against variable index (Var 1-6).]

Example 2 (Contd)
- [Figure: 0-dimensional projection of X, values x̂_ij plotted against variable index.]

Example 2 (Contd)
- [Figure: 1-dimensional projection of X, values x̂_ij plotted against variable index.]

Example 2 (Contd)
- [Figure: 2-dimensional projection of X, values x̂_ij plotted against variable index.]

Example 2 (Contd)
- [Figure: 3-dimensional projection of X, values x̂_ij plotted against variable index.]
Population PCA: High-Level Summary
- If the eigenvalues of the population covariance matrix decay rapidly, then:
  - the covariance structure of X is explained by a smaller number of variables (the principal components);
  - X can be approximated by a lower-dimensional projection (onto the subspace spanned by the PC directions).