Theoretical Analysis of PCA for Heteroscedastic Data

David Hong, Laura Balzano, Jeffrey A. Fessler
Department of EECS, University of Michigan, Ann Arbor, Michigan 48105

I. INTRODUCTION

Principal Component Analysis (PCA) is a method for estimating a subspace from noisy samples. It is useful in a variety of applications [1], [2], [3], ranging from dimensionality reduction to anomaly detection and the visualization of high-dimensional data. Effective use of PCA requires a rigorous understanding of its performance. This work analyzes PCA for samples with heteroscedastic noise, that is, samples with nonuniform noise variances. In particular, we provide a simple asymptotic prediction of the recovery of a low-dimensional subspace from noisy heteroscedastic samples. The prediction enables: (a) easy and efficient calculation of the asymptotic performance, (b) reasoning about the asymptotic performance under heteroscedasticity (e.g., in the presence of outliers), and (c) a deeper understanding of the fact that PCA performs best when the noise is homoscedastic (all samples share the same noise level).

II. MODEL FOR HETEROSCEDASTIC DATA

We model $n$ heteroscedastic samples $y_1, \dots, y_n \in \mathbb{R}^d$ drawn from a $k$-dimensional subspace as

\[
y_i = \tilde{U} \Theta z_i + \eta_i \varepsilon_i \qquad (1)
\]

where
- $\tilde{U} \in \mathbb{R}^{d \times k}$ is a matrix representation of the subspace basis,
- $\Theta \in \mathbb{R}_+^{k \times k}$ is a diagonal matrix of amplitudes $\theta_1, \dots, \theta_k$,
- $z_i \overset{iid}{\sim} \mathcal{N}(0, I_k)$ are independent coefficient vectors,
- $\eta_i \in \mathbb{R}_+$ are the noise standard deviations,
- $\varepsilon_i \overset{iid}{\sim} \mathcal{N}(0, I_d)$ are independent noise vectors,

and the $L$ noise levels $\sigma_1, \dots, \sigma_L$ occur in proportions $p_1, \dots, p_L$ (i.e., $p_1$ of the samples have $\eta_i = \sigma_1$, $p_2$ have $\eta_i = \sigma_2$, and so on).

Remark 1: The following results also hold without modification for distributions on $z_i$ and $\varepsilon_i$ satisfying a certain set of technical conditions (described in [4]). We present the Gaussian case here for simplicity.

III. MAIN RESULT

The following theorem provides our main result: a simple expression for the asymptotic recovery $\frac{1}{k}\|\hat{U}^T \tilde{U}\|_F^2$ of the subspace $\tilde{U}$.

Theorem 1: For a fixed sample-to-dimension ratio $c > 1$, the PCA estimate $\hat{U}$ is such that

\[
\frac{1}{k}\|\hat{U}^T \tilde{U}\|_F^2 \;\xrightarrow{a.s.}\; \frac{1}{k}\sum_{i=1}^{k} \frac{A(\beta_i)}{\beta_i\, B_i'(\beta_i)} \qquad (2)
\]

as $n, d \to \infty$ with $n/d = c$, provided $A(\beta_1), \dots, A(\beta_k) > 0$, where

\[
A(x) := 1 - c \sum_{\ell=1}^{L} \frac{p_\ell\, \sigma_\ell^4}{(x - \sigma_\ell^2)^2},
\qquad
B_i(x) := 1 - c\, \theta_i^2 \sum_{\ell=1}^{L} \frac{p_\ell}{x - \sigma_\ell^2},
\]

and $\beta_i$ is the largest real root of $B_i$.

Figure 1 shows simulations displaying these results for $L = 2$. Figure 2 illustrates the behavior of the largest real root of $B_i$ in relation to the noise levels.

Fig. 1: Numerical simulation results for $c = 10$, $\theta_1 = 1$, $\sigma = (1.8, 0.2)$, where $p_2$ is swept from 0 to 1 with $p_1 = 1 - p_2$. (a) $d = 10^2$, $n = 10^3$, 10000 trials; (b) $d = 10^3$, $n = 10^4$, 1000 trials. Simulation mean (blue curve) and interquartile interval (light blue ribbon) are shown with the asymptotic prediction (red curve). Going from (a) to (b), the problem size increases with the same model parameters; the simulation mean moves closer to the asymptotic prediction and the interquartile interval shrinks, indicating concentration to the asymptotic prediction.

Fig. 2: Location of the largest real root $\beta_i$ of $B_i(x)$ for $\sigma_1^2 = 2.04$, $\sigma_2^2 = 0.874$, $c = 10$, $p = (0.7, 0.3)$, and $\theta_i = 1$. The root $\beta_i$ lies to the right of the largest pole $\sigma_1^2$, where the decreasing curve $\sum_\ell p_\ell / (x - \sigma_\ell^2)$ crosses the level $1/(c\,\theta_i^2)$.
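To make the prediction concrete, the sketch below (ours, not the authors' code; the function name `asymptotic_recovery` and the bracketing choices are our own) evaluates (2) numerically. It exploits the fact that the largest real root of each $B_i$ is its unique root above $\max_\ell \sigma_\ell^2$, so a bracketed root-finder such as SciPy's `brentq` suffices.

```python
# A minimal sketch (ours, not the authors' code) for evaluating the
# Theorem 1 prediction (2). Assumes NumPy and SciPy are available.
import numpy as np
from scipy.optimize import brentq

def asymptotic_recovery(c, thetas, sigma2s, props):
    """Asymptotic (1/k) * ||Uhat^T Utilde||_F^2 predicted by Theorem 1.

    c       -- sample-to-dimension ratio n/d
    thetas  -- amplitudes theta_1, ..., theta_k
    sigma2s -- noise variances sigma_1^2, ..., sigma_L^2
    props   -- proportions p_1, ..., p_L (summing to 1)
    """
    sigma2s = np.asarray(sigma2s, dtype=float)
    props = np.asarray(props, dtype=float)
    A = lambda x: 1 - c * np.sum(props * sigma2s**2 / (x - sigma2s)**2)
    total = 0.0
    for theta in thetas:
        B = lambda x: 1 - c * theta**2 * np.sum(props / (x - sigma2s))
        dB = lambda x: c * theta**2 * np.sum(props / (x - sigma2s)**2)
        # The largest real root of B_i is its unique root above max(sigma_l^2):
        # B_i -> -inf just above the largest pole and B_i > 0 at the upper
        # bracket below, so brentq is guaranteed a sign change.
        lo = sigma2s.max() + 1e-12
        hi = sigma2s.max() + c * theta**2 + 1.0
        beta = brentq(B, lo, hi)
        # Theorem 1 requires A(beta_i) > 0; following the figures, the
        # region where A(beta_i) <= 0 is treated as zero recovery.
        total += max(0.0, A(beta)) / (beta * dB(beta))
    return total / len(thetas)

# Example: one point on the red curve of Figure 1 (p_2 = 1/2).
print(asymptotic_recovery(c=10, thetas=[1.0],
                          sigma2s=[1.8**2, 0.2**2], props=[0.5, 0.5]))  # ~0.245
```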
IV. DEPENDENCE ON PARAMETERS

The main result (Theorem 1) enables us to reason about how PCA performance depends on the various parameters.

Dependence on sample-to-dimension ratio and average noise variance. Figure 3 shows the asymptotic recovery as we sweep over $c > 1$ and $\bar{\sigma}^2 > 0$, where the subspace is one-dimensional with amplitude $\theta_1 = 1$ and there are two noise levels, $\sigma_1^2 = \bar{\sigma}^2/(2p_1)$ and $\sigma_2^2 = \bar{\sigma}^2/(2p_2)$, with $p = (0.8, 0.2)$. As expected, the recovery generally improves with increasing $c$ (more samples per dimension) and decreasing $\bar{\sigma}^2$ (less noise on average). The recovery surface has the same general shape as in the homoscedastic case (analyzed as an example in [5]), but more samples are generally needed to achieve the same recovery.

Fig. 3: Asymptotic recovery as a function of average noise variance $\bar{\sigma}^2$ and sample-to-dimension ratio $c$ with $\theta_1 = 1$, $\sigma_1^2 = \bar{\sigma}^2/(2p_1)$, $\sigma_2^2 = \bar{\sigma}^2/(2p_2)$, and $p = (0.8, 0.2)$. Contours are overlaid in black and the region where $A(\beta_i) \le 0$ is shown as zero.

Dependence on sample proportions. The red curves in Figure 1 show the asymptotic recovery as $p_2 \in (0, 1)$ varies with $p_1 = 1 - p_2$, $c = 10$, $\theta_1 = 1$, and $\sigma = (1.8, 0.2)$. As expected, recovery generally improves with increasing $p_2$: having more low-noise samples is better.

Dependence on noise variances. Figure 4 shows the asymptotic recovery as $\sigma_1^2$ and $\sigma_2^2$ vary with $c = 10$, $p = (0.7, 0.3)$, and $\theta_1 = 1$. As expected, the recovery generally improves with decreasing noise variances. Interestingly, for a fixed average noise variance (i.e., along the dotted line in the plot), the best recovery occurs when $\sigma_1^2 = \sigma_2^2$. It turns out this is true in general, a fact which we state next.

Fig. 4: Asymptotic recovery as a function of noise variances $\sigma_1^2$ and $\sigma_2^2$ with $c = 10$, $p = (0.7, 0.3)$, $\theta_1 = 1$. Namely, 70% of samples have noise variance $\sigma_1^2$ and 30% have noise variance $\sigma_2^2$. Contours are overlaid in black and the region where $A(\beta_i) \le 0$ is shown as zero. The dotted line indicates points with the same average noise variance. As observed in Section IV, the best performance along the line occurs when $\sigma_1^2 = \sigma_2^2$, and this is true in general as discussed in Section V.

V. AN UPPER BOUND ON PERFORMANCE

The following theorem provides an upper bound on the asymptotic recovery of the subspace $\tilde{U}$. The bound is a function of the average noise variance and is attained when the noise is homoscedastic.

Theorem 2: If $A(\beta_i) \ge 0$ for each $i$, the asymptotic recovery is bounded as

\[
\frac{1}{k}\sum_{i=1}^{k} \frac{A(\beta_i)}{\beta_i\, B_i'(\beta_i)}
\;\le\;
\frac{1}{k}\sum_{i=1}^{k} \max\!\left(0,\; \frac{c - \bar{\sigma}^4/\theta_i^4}{c + \bar{\sigma}^2/\theta_i^2}\right),
\]

where $\bar{\sigma}^2 := \sum_{\ell=1}^{L} p_\ell\, \sigma_\ell^2$ is the average noise variance. The bound is attained when $\sigma_1^2 = \cdots = \sigma_L^2$ (i.e., the noise is homoscedastic).

Remark 2: It follows that, for a fixed average noise variance, the asymptotic recovery is maximized when the noise is homoscedastic. Thus, using the average noise variance to predict performance gives an optimistic proxy; actual recovery will be worse than such predictions for heteroscedastic data. Our expression (2) is more accurate.
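The bound is equally easy to evaluate. The sketch below (again ours, reusing the hypothetical `asymptotic_recovery` from the previous sketch) checks it in the Figure 4 setting and verifies that a homoscedastic configuration with the same average noise variance attains it.

```python
# A minimal sketch (ours) of the Theorem 2 upper bound, which depends on
# the noise only through its average variance. Reuses asymptotic_recovery
# from the previous sketch.
import numpy as np

def homoscedastic_bound(c, thetas, sigma2s, props):
    """(1/k) * sum_i max(0, (c - sbar^4/theta_i^4) / (c + sbar^2/theta_i^2)),
    where sbar^2 = sum_l p_l sigma_l^2 is the average noise variance."""
    sbar2 = float(np.dot(props, sigma2s))
    vals = [max(0.0, (c - sbar2**2 / t**4) / (c + sbar2 / t**2)) for t in thetas]
    return sum(vals) / len(thetas)

# The Figure 4 setting: the heteroscedastic recovery sits below the bound...
c, thetas, props = 10, [1.0], [0.7, 0.3]
print(asymptotic_recovery(c, thetas, [2.04, 0.874], props))  # ~0.570
print(homoscedastic_bound(c, thetas, [2.04, 0.874], props))  # ~0.611
# ...and the bound is attained when both levels equal the average variance.
sbar2 = 0.7 * 2.04 + 0.3 * 0.874
print(asymptotic_recovery(c, thetas, [sbar2, sbar2], props))  # ~0.611
```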
VI. CONCLUSION

This work provides a simple expression for the asymptotic recovery of a subspace from heteroscedastic samples by PCA. We use the result to gain insight into the performance of PCA as a function of the parameters, and we derive an upper bound showing that, for a fixed average noise variance, performance is optimal when the noise is homoscedastic. There are many avenues for potential extensions and future work, including further study of the algebraic structure of the expressions in the prediction, considering a weighted version of PCA, and analyzing the non-asymptotic recovery.

REFERENCES

[1] B. A. Ardekani, J. Kershaw, K. Kashikura, and I. Kanno, "Activation detection in functional MRI using subspace modeling and maximum likelihood estimation," IEEE Transactions on Medical Imaging, vol. 18, no. 2, pp. 101–114, 1999.
[2] A. Lakhina, M. Crovella, and C. Diot, "Diagnosing network-wide traffic anomalies," in Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM '04, 2004, pp. 219–230.
[3] N. Sharma and K. Saroha, "A novel dimensionality reduction method for cancer dataset using PCA and feature ranking," in 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2015, pp. 2261–2264.
[4] D. Hong, L. Balzano, and J. A. Fessler, "Towards a theoretical analysis of PCA for heteroscedastic data," in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2016.
[5] F. Benaych-Georges and R. R. Nadakuditi, "The singular values and vectors of low rank perturbations of large rectangular random matrices," Journal of Multivariate Analysis, vol. 111, pp. 120–135, 2012.