
Principal Components Analysis
(PCA)
273A Intro Machine Learning
Principal Components Analysis
• We search for those directions in space that have the highest variance.
• We then project the data onto the subspace of highest variance.
• This structure is encoded in the sample covariance of the data:
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i , \qquad
C = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T
• Note that PCA is an unsupervised learning method (why?)
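
A minimal NumPy sketch of this computation, assuming the data are stacked as the rows of a hypothetical N x d matrix X (note that no labels appear anywhere, which is one way to see why PCA is unsupervised):

import numpy as np

# Hypothetical data matrix: N samples as rows, d features as columns (no labels).
rng = np.random.default_rng(0)
N, d = 500, 3
X = rng.normal(size=(N, d)) @ np.array([[3.0, 0.0, 0.0],
                                        [0.5, 1.0, 0.0],
                                        [0.0, 0.2, 0.3]])

# Sample mean and sample covariance: C = (1/N) sum_i (x_i - mu)(x_i - mu)^T.
mu = X.mean(axis=0)
Xc = X - mu                                   # centered data
C = (Xc.T @ Xc) / N                           # d x d covariance matrix

# Sanity check against NumPy's own estimator (bias=True divides by N).
assert np.allclose(C, np.cov(X, rowvar=False, bias=True))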
PCA
• We want to find the eigenvectors and eigenvalues of this covariance:
C = U \Lambda U^T , \qquad
U = [\, u_1 \; u_2 \; \cdots \; u_d \,], \qquad
\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)

• eigenvalue = variance in the direction of the corresponding eigenvector
• the columns of U are orthogonal, unit-length eigenvectors
• ( in Matlab: [U,L] = eig(C) )
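
A hedged continuation of the sketch above (C and the centered data Xc come from that sketch); np.linalg.eigh plays the role of Matlab's eig(C) for the symmetric matrix C:

import numpy as np

# Eigendecomposition C = U Lambda U^T; eigh exploits the symmetry of C.
lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]                 # sort by decreasing eigenvalue
lam, U = lam[order], U[:, order]

# Columns of U are orthogonal, unit-length eigenvectors.
assert np.allclose(U.T @ U, np.eye(d))

# Each eigenvalue is the variance of the centered data along its eigenvector.
assert np.allclose(lam, ((Xc @ U) ** 2).mean(axis=0))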
PCA properties
C = \sum_{i=1}^{d} \lambda_i \, u_i u_i^T

C u_j = \sum_{i=1}^{d} \lambda_i (u_i u_i^T) u_j = \sum_{i=1}^{d} \lambda_i u_i (u_i^T u_j) = \lambda_j u_j

U^T U = U U^T = I        (u_i orthonormal  ⇒  U is a rotation; U holds the eigenvectors)

C \approx U_{1:k} \Lambda_{1:k} U_{1:k}^T        (rank-k approximation)

y_i = U_{1:k}^T x_i        (projection)

C_y = \frac{1}{N}\sum_{i=1}^{N} U_{1:k}^T x_i x_i^T U_{1:k}
    = U_{1:k}^T \left( \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T \right) U_{1:k}
    = U_{1:k}^T U \Lambda U^T U_{1:k} = \Lambda_{1:k}

e.g. for k = 3:   U_{1:3} = [\, u_1 \; u_2 \; u_3 \,],   \Lambda_{1:3} = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)
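
Continuing the same sketch, a short check of the projection y_i = U_{1:k}^T x_i and of C_y = Lambda_{1:k}; the choice k = 2 is arbitrary and just for illustration:

import numpy as np

# Project onto the top-k eigenvectors.
k = 2
U_k, lam_k = U[:, :k], lam[:k]

Y = Xc @ U_k                                  # rows are y_i = U_{1:k}^T x_i
C_y = (Y.T @ Y) / N                           # covariance of the projections

# C_y = Lambda_{1:k}: the projected coordinates are decorrelated, with
# variances given by the top-k eigenvalues.
assert np.allclose(C_y, np.diag(lam_k))

# Rank-k approximation of the covariance: C ~ U_{1:k} Lambda_{1:k} U_{1:k}^T.
C_k = U_k @ np.diag(lam_k) @ U_k.T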
PCA properties
C_{1:k} = U_{1:k} \Lambda_{1:k} U_{1:k}^T
is the optimal rank-k approximation of C in the Frobenius norm.
I.e. it minimizes the cost function:

\sum_{i=1}^{d} \sum_{j=1}^{d} \left( C_{ij} - \sum_{l=1}^{k} A_{il} A^T_{lj} \right)^2
\qquad \text{with} \quad A = U \Lambda^{1/2}
Note that there are infinitely many solutions that minimize this norm:
if A is a solution, then AR with R R^T = I is also a solution.
The solution provided by PCA is unique because U is orthogonal and its columns are
ordered by decreasing eigenvalue.
The solution is also nested: if we solve for a rank-(k+1) approximation, the first k
eigenvectors are the same ones found by the rank-k approximation, and so on.
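
A small continuation of the earlier sketch that illustrates the rotation ambiguity and the Frobenius cost of the rank-k approximation; the closed-form value of the cost (the sum of the squared discarded eigenvalues) is not stated on the slide but follows from the eigendecomposition:

import numpy as np

# A = U_{1:k} Lambda_{1:k}^{1/2}: the factor whose outer product gives the
# rank-k approximation (U_k, lam_k, lam, C come from the sketches above).
A_k = U_k @ np.diag(np.sqrt(lam_k))

# Rotation ambiguity: any R with R R^T = I gives the same approximation.
R, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(k, k)))
assert np.allclose(A_k @ A_k.T, (A_k @ R) @ (A_k @ R).T)

# Frobenius cost of the rank-k approximation: the squared discarded eigenvalues.
err = np.linalg.norm(C - A_k @ A_k.T, ord='fro') ** 2
assert np.allclose(err, np.sum(lam[k:] ** 2))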