Principal Components Analysis (PCA)
273A Intro Machine Learning

Principal Components Analysis
• We search for those directions in space that have the highest variance.
• We then project the data onto the subspace of highest variance.
• This structure is encoded in the sample covariance of the data:

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
    C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T

• Note that PCA is an unsupervised learning method (why?)

PCA
• We want to find the eigenvectors and eigenvalues of this covariance:

    C = U \Lambda U^T, \qquad
    U = [u_1, u_2, \ldots, u_d], \qquad
    \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)

• The columns u_1, ..., u_d are orthogonal, unit-length eigenvectors; each eigenvalue \lambda_i is the variance of the data in the direction of its eigenvector u_i.
• (In Matlab: [U,L] = eig(C).)
[Figure: data cloud with the eigenvectors u_1, u_2, ..., u_d drawn as its principal axes.]

PCA properties
• C = \sum_{i=1}^{d} \lambda_i \, u_i u_i^T   (U eigenvectors)
• C u_j = \sum_{i=1}^{d} \lambda_i (u_i u_i^T) u_j = \sum_{i=1}^{d} \lambda_i u_i (u_i^T u_j) = \lambda_j u_j
• U^T U = U U^T = I   (u_i orthonormal, U a rotation)
• C \approx U_{1:k} \Lambda_{1:k} U_{1:k}^T   (rank-k approximation)
• y_i = U_{1:k}^T x_i   (projection)
• For centered data (\bar{x} = 0), the covariance of the projections is

    C_y = \frac{1}{N} \sum_{i=1}^{N} y_i y_i^T
        = U_{1:k}^T \Big( \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T \Big) U_{1:k}
        = U_{1:k}^T U \Lambda U^T U_{1:k} = \Lambda_{1:k}

  (illustrated on the slide for k = 3, with U_{1:3} = [u_1, u_2, u_3] and \Lambda_{1:3} = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3))

PCA properties
• C_{1:k} = U_{1:k} \Lambda_{1:k} U_{1:k}^T is the optimal rank-k approximation of C in Frobenius norm, i.e. it minimizes the cost function

    \sum_{i=1}^{d} \sum_{j=1}^{d} \Big( C_{ij} - \sum_{l=1}^{k} A_{il} A_{lj}^T \Big)^2
    \qquad \text{with } A = U_{1:k} \Lambda_{1:k}^{1/2}

• Note that there are infinitely many solutions that minimize this norm: if A is a solution, then AR with R R^T = I is also a solution.
• The solution provided by PCA is unique because U is orthogonal and its columns are ordered by decreasing eigenvalue.
• The solution is also nested: if I solve for a rank-(k+1) approximation, the first k eigenvectors are those found by a rank-k approximation (etc.)
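To make the steps above concrete, here is a minimal Matlab sketch of the pipeline: center the data, form the sample covariance, eigendecompose it with eig as on the slide, and project onto the top-k eigenvectors. The data matrix X, its size, and the choice k = 2 are placeholders for illustration only, not part of the slides.

    % Minimal PCA sketch; X and k are illustrative placeholders.
    X = randn(100, 5);               % placeholder data: N = 100 examples, d = 5 dimensions
    N = size(X, 1);
    xbar = mean(X, 1);               % sample mean
    Xc = X - xbar;                   % center the data
    C = (Xc' * Xc) / N;              % sample covariance (d-by-d)

    [U, L] = eig(C);                 % eigenvectors U, eigenvalues on diag(L)
    [lambda, order] = sort(diag(L), 'descend');
    U = U(:, order);                 % order columns by decreasing eigenvalue

    k = 2;                           % number of components to keep (example)
    Y = Xc * U(:, 1:k);              % projection: rows of Y are y_i = U_{1:k}' (x_i - xbar)
    Cy = (Y' * Y) / N;               % covariance of the projections
    % Up to numerical error, Cy equals diag(lambda(1:k)), i.e. Lambda_{1:k}.

The slide's derivation of C_y uses (1/N) sum_i x_i x_i^T, which matches this sketch because the data are centered before the covariance is formed.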
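As a sanity check on the last slide, the following standalone Matlab sketch (with a randomly generated positive semi-definite C, purely for illustration) computes the rank-k approximation U_{1:k} Lambda_{1:k} U_{1:k}^T and verifies that a rotated factor AR with R R^T = I attains exactly the same Frobenius error, so the factor A is only unique up to rotation.

    % Illustrative check of the Frobenius-norm claims; sizes and C are made up.
    d = 5; k = 2;
    M = randn(d); C = (M * M') / d;                   % random symmetric PSD "covariance"

    [U, L] = eig(C);
    [lambda, order] = sort(diag(L), 'descend');
    U = U(:, order);

    Ck = U(:, 1:k) * diag(lambda(1:k)) * U(:, 1:k)';  % C_{1:k}, the rank-k approximation
    err_pca = norm(C - Ck, 'fro');                    % minimal Frobenius error

    A = U(:, 1:k) * diag(sqrt(lambda(1:k)));          % A = U_{1:k} * Lambda_{1:k}^(1/2)
    [R, ~] = qr(randn(k));                            % orthogonal k-by-k matrix, R*R' = I
    err_rot = norm(C - (A*R)*(A*R)', 'fro');          % identical error: (AR)(AR)' = AA' = Ck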