PCA with Outliers and Missing Data

Sujay Sanghavi
Electrical and Computer Engineering, University of Texas at Austin
Joint work with C. Caramanis, Y. Chen, H. Xu

Outline
- PCA and outliers: why SVD fails; corrupted features vs. corrupted points
- Our idea + algorithms
- Results: full observation; missing data
- A framework for robustness in high dimensions

Principal Components Analysis
Given points that lie on or near a lower-dimensional subspace, find this subspace.
Classical technique:
1. Organize the points as a matrix
2. Take the SVD
3. The top singular vectors span the subspace
[Figure: data points stacked as the columns of a low-rank matrix]

Fragility
Gross errors in even one or a few points can completely throw off PCA.
Reason: classical PCA minimizes ℓ2 error, which is susceptible to gross outliers.

Two Types of Gross Errors
- Corrupted features: individual entries corrupted
- Outliers: entire columns corrupted
... and missing-data versions of both.

PCA with Outliers
[Figure: inlier points on a low-dimensional subspace, plus outlier points]
Objective: find the identities of the outliers (and hence the column space of the true matrix).

Outlier Pursuit - Idea
Standard PCA:
  min_{L} ||M − L||_F   s.t.  rank(L) = r
Our idea: also fit a column-sparse matrix C that absorbs the outliers:
  min_{L,C} ||M − L − C||_F   s.t.  rank(L) = r,  C has at most c nonzero columns

Outlier Pursuit - Method
We propose the convex program
  min_{L,C} ||M − L − C||_F + λ1 ||L||_* + λ2 ||C||_{1,2}
where the nuclear norm ||L||_* is a convex surrogate for the rank constraint, and ||C||_{1,2} (the sum of the columns' ℓ2 norms) is a convex surrogate for column sparsity.

When Does It (Not) Work?
It fails when certain directions of the column space of L are poorly represented: writing L = U Σ V^T, some coordinate axis has a large inner product with the row space, i.e. max_i ||V^T e_i|| is large.

Results
Assumption: the columns of the true L are incoherent:
  max_i ||V^T e_i||^2 ≤ µr/n    (note that r/n ≤ µr/n ≤ 1)
First consider the noiseless case:
  min_{L,C} ||L||_* + λ ||C||_{1,2}   s.t.  L + C = M
Theorem (noiseless case): with an appropriate choice of λ, the convex program identifies the outliers provided their fraction is at most a constant times 1/(µr).
Outer bound: a fraction of outliers greater than 1/(r+1) makes the problem unidentifiable.

Proof Technique
A point x is the optimum of a convex function f iff zero lies in the subgradient of f at x: 0 ∈ ∂f(x).
Steps:
1. Guess a "nice" point (via an oracle problem).
2. Show it is the optimum by showing that zero is in the subgradient there.

Proof Technique - Guessing a "Nice" Optimum
(Note: in "single structure" problems such as matrix completion and compressed sensing, this step is not an issue.)
Oracle problem:
  min_{L,C} ||M − L − C||_F + λ1 ||L||_* + λ2 ||C||_{1,2}
  s.t.  ColSupp(C) ⊆ ColSupp(C*),  ColSpace(L) ⊆ ColSpace(L*)
Its solution (L, C) is, by definition, a nice point.
Rest of the proof: show it is also the optimum of the original program, under our assumption.

Performance
[Figure: phase-transition plots of rank r (13 to 21) against the fraction of outliers (0.05 to 0.4), for the L + C formulation and for the L + S formulation of [Chandrasekaran et al.] and [Candès et al.].]

Another View
- The mean solves min_x Σ_i (x_i − x)^2; standard PCA of M solves min_L Σ_j ||M_j − L_j||^2 s.t. rank(L) ≤ r. Both are fragile: one or a few points can easily skew them.
- The median solves min_x Σ_i |x_i − x|; our method is (a convex relaxation of) min_L Σ_j ||M_j − L_j|| s.t. rank(L) ≤ r. Both are robust: skewing them requires errors in a constant fraction of the points.

Collaborative Filtering with Adversaries
The users' ratings form a low-rank matrix that
- is partially observed, and
- has some corrupted columns == outliers with missing data!
Our setting:
- Good users == random sampling of an incoherent matrix (as in matrix completion)
- Manipulators == completely arbitrary sampling and values

Outlier Pursuit with Missing Data
  min_{L,C} ||L||_* + λ ||C||_{1,2}
  s.t.  L_ij + C_ij = M_ij  for all observed (i, j)
Now the row space needs to be incoherent as well, since we are simultaneously doing matrix completion and manipulator identification.
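The missing-data program above is an ordinary convex problem, so small instances can be prototyped with an off-the-shelf modeling tool. Below is a minimal sketch, assuming cvxpy as the front end; the function name outlier_pursuit_missing and the way the weight lam is chosen are illustrative assumptions, not part of the talk.

```python
# Minimal sketch (not the authors' implementation) of the partially observed
# outlier-pursuit program:
#   min_{L,C}  ||L||_* + lam * ||C||_{1,2}
#   s.t.       L_ij + C_ij = M_ij  for all observed (i, j)
import cvxpy as cp
import numpy as np

def outlier_pursuit_missing(M, observed, lam):
    """M: data matrix; observed: boolean mask of observed entries; lam: weight
    on the column-sparse term (problem dependent; treat it as a tuning knob)."""
    L = cp.Variable(M.shape)
    C = cp.Variable(M.shape)
    # Nuclear norm = convex surrogate for rank; sum of column l2 norms = the
    # l_{1,2} norm, a convex surrogate for column sparsity.
    objective = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    # Agree with M only on the observed entries.
    mask = observed.astype(float)
    constraints = [cp.multiply(mask, L + C) == mask * M]
    cp.Problem(objective, constraints).solve()
    return L.value, C.value
```

Columns of the recovered C with non-negligible norm flag the manipulators, and the column space of the recovered L estimates the good users' subspace, which is exactly the guarantee stated next.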
Our Result
Theorem: with high probability, the optimum (L, C) of the convex program is such that L has the correct column space and the column support of C is exactly the set of manipulators, provided
- Sampling density: the fraction p of observed entries exceeds a threshold of order µ^2 r^2 log^3(4n) / n (with constant c_1), and
- Fraction of manipulators: the fraction of users that are manipulators is below a threshold that grows with p and shrinks with µ^2 r^2 log^6(4n) (with constant c_2).
Note: no assumptions on the manipulators.

Robust Collaborative Filtering
[Figure: phase-transition plots of the fraction of observed entries (0.3 to 1) against the fraction of manipulators (0.02 to 0.4), for the partially observed low-rank + column-sparse algorithm and the partially observed sparse + low-rank algorithm.]

More Generally ...
Several methods in high-dimensional statistics solve
  min_X  L(y, A; X) + λ r(X)
(a loss function plus a regularizer). Our approach:
  min_{X1, X2}  L(y, A; X1 + X2) + λ1 r1(X1) + λ2 r2(X2)
(the same loss function plus a weighted sum of regularizers). This yields robustness and flexibility in several settings; a minimal numerical sketch of this pattern appears at the end of the deck.

Today: PCA with outliers and missing data. Other instances of the framework:
- Latent factors in time series (ICML 2012)
- Matrix completion from errors and erasures (ISIT 2011, SIAM J. Optim. 2011)
- Graph clustering (ICML 2011)
- Robust recommender systems (ICML 2011)
- Multiple sparse regression (NIPS 2010)
- PCA that is robust to outliers (NIPS 2010, IEEE Trans. Inform. Theory)
All papers are on my website and on arXiv.
Collaborators: Ali Jalali, Yudong Chen, Huan Xu, Constantine Caramanis, Pradeep Ravikumar.
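To make the loss-plus-two-regularizers pattern above concrete, here is a minimal proximal-gradient sketch for a noisy variant of outlier pursuit, min_{L,C} 0.5 ||M − L − C||_F^2 + λ1 ||L||_* + λ2 ||C||_{1,2}. The squared-loss relaxation (instead of the deck's equality constraint or unsquared Frobenius fit), the step size, the iteration count, and the function names are all illustrative assumptions.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(X, tau):
    """Column-wise shrinkage: the prox operator of tau * ||.||_{1,2}."""
    norms = np.maximum(np.linalg.norm(X, axis=0, keepdims=True), 1e-12)
    return X * np.maximum(1.0 - tau / norms, 0.0)

def noisy_outlier_pursuit(M, lam1, lam2, step=0.5, iters=500):
    """Proximal gradient on 0.5*||M - L - C||_F^2 + lam1*||L||_* + lam2*||C||_{1,2}.
    The smooth term's gradient is 2-Lipschitz in (L, C), so step <= 0.5 is safe."""
    L = np.zeros_like(M, dtype=float)
    C = np.zeros_like(M, dtype=float)
    for _ in range(iters):
        R = L + C - M                                 # gradient of the smooth term
        L = svt(L - step * R, step * lam1)            # prox step on the low-rank block
        C = column_shrink(C - step * R, step * lam2)  # prox step on the column-sparse block
    return L, C
```

Columns of the returned C with large norm are the candidate outliers, and the column space of L estimates the underlying subspace; swapping in the prox operators of other regularizers r1 and r2 gives the other instances of the framework listed above.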