Empirical Testing of Sparse Approximation and Matrix Completion Algorithms
Jared Tanner, University of Oxford
Workshop on Sparsity, Compressed Sensing and Applications
Joint with Blanchard, Donoho, and Wei

Three sparse approximation questions to test
- Sparse approximation: min_x ||x||_0 subject to ||Ax − b||_2 ≤ τ, with A ∈ R^{m×n}
  1. Are there algorithms that have the same behaviour for different A?
  2. Which algorithm is fastest, and with a high recovery probability?
- Matrix completion: min_X rank(X) subject to ||A(X) − b||_2 ≤ τ, with A mapping R^{m×n} to R^p
  3. What is the largest rank that is recovered with an efficient algorithm?
- Information about each question can be gleaned from large-scale empirical testing. Let's use some HPC resources.

Sparse approximation phase transition
- Problem characterized by three numbers: k ≤ m ≤ n
  • n, signal length, "Nyquist" sampling rate
  • m, number of inner product measurements
  • k, signal complexity (sparsity), k := min_x ||x||_0
- Mixed under/over-sampling rates compared to naive/optimal:
  undersampling δ_m := m/n, oversampling ρ_m := k/m
- Testing model: for a matrix ensemble and algorithm, draw A and a k-sparse x_0, and let Π(k, m, n) be the probability of recovery (a minimal testing loop is sketched below)
- For fixed (δ_m, ρ_m), Π(k, m, n) converges to 1 or 0 with increasing m; the two regions are separated by a phase transition curve ρ(δ)
- Is there an algorithm with ρ(δ) large and Π(k, m, n) insensitive to the matrix?
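The testing model above reduces to a small Monte Carlo driver: fix (k, m, n), draw random problems, and count recoveries. Below is a minimal sketch of one such cell in NumPy with a Gaussian ensemble for concreteness; the function names, the success tolerance, and the `solver(A, b, k)` interface are illustrative assumptions, not the talk's GPU harness.

```python
import numpy as np

def recovery_probability(solver, k, m, n, trials=100, tol=1e-4, seed=None):
    """Estimate Pi(k, m, n): the fraction of random k-sparse problems a given
    solver recovers, for one (delta, rho) cell.  `solver(A, b, k)` is any
    sparse-approximation routine returning x_hat; names here are illustrative."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        # Draw a Gaussian measurement matrix and a k-sparse signal x0.
        A = rng.standard_normal((m, n)) / np.sqrt(m)
        x0 = np.zeros(n)
        support = rng.choice(n, size=k, replace=False)
        x0[support] = rng.standard_normal(k)
        b = A @ x0
        x_hat = solver(A, b, k)
        # Declare success when the relative error is below the tolerance.
        if np.linalg.norm(x_hat - x0) <= tol * np.linalg.norm(x0):
            successes += 1
    return successes / trials
```

Sweeping k/m at each fixed m/n and locating where this estimate crosses 1/2 traces the empirical transition curve ρ(δ).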
Phase Transition: ℓ1 ball, C^n
- With overwhelming probability on the measurements A_{m,n}: for any ε > 0, as (k, m, n) → ∞,
  • all k-sparse signals are recovered if k/m ≤ ρ_S(m/n; C)(1 − ε)
  • most k-sparse signals are recovered if k/m ≤ ρ_W(m/n; C)(1 − ε)
  • failure is typical if k/m ≥ ρ_W(m/n; C)(1 + ε)
- [Figure: weak (ρ_W, recovery of most signals) and strong (ρ_S, recovery of all signals) transition curves, k/m versus δ = m/n]
- Asymptotic behaviour as δ → 0: ρ(m/n) ∼ [2(e) log(n/m)]^{−1}

Phase Transition: Simplex, T^{n−1}, x ≥ 0
- With overwhelming probability on the measurements A_{m,n}: for any ε > 0 and x ≥ 0, as (k, m, n) → ∞,
  • all k-sparse signals are recovered if k/m ≤ ρ_S(m/n; T)(1 − ε)
  • most k-sparse signals are recovered if k/m ≤ ρ_W(m/n; T)(1 − ε)
  • failure is typical if k/m ≥ ρ_W(m/n; T)(1 + ε)
- [Figure: weak (ρ_W, recovery of most signals) and strong (ρ_S, recovery of all signals) transition curves, k/m versus δ = m/n]
- Asymptotic behaviour as δ → 0: ρ(m/n) ∼ [2(e) log(n/m)]^{−1}
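As a point of reference for the ℓ1 and simplex transitions above, the basis-pursuit decoder can be posed as a linear program. The sketch below uses SciPy's linprog with the standard splitting x = u − v, u, v ≥ 0; it is a small-scale stand-in for illustration, not the solvers behind the 6.4 CPU years of testing reported next.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Solve min ||x||_1 subject to Ax = b as a linear program.

    Split x = u - v with u, v >= 0, so the objective is sum(u) + sum(v);
    a small-scale sketch, not the talk's experimental solver."""
    m, n = A.shape
    c = np.ones(2 * n)            # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])     # enforce A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v
```

For the nonnegative (simplex) case one drops the v block and solves min 1ᵀx subject to Ax = b, x ≥ 0.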
ℓ1-Weak Phase Transitions: Visual agreement
- Testing beyond the proven theory, 6.4 CPU years later...
- Black: weak phase transition, x ≥ 0 (top) and x signed (bottom)
- Overlaid empirical evidence of the 50% success rate
  [Figure: empirical 50% success curves, ρ = k/n versus δ = n/N, for Gaussian, Bernoulli, Fourier, Ternary p = 2/3, Ternary p = 2/5, Ternary p = 1/10, Hadamard, Expander p = 1/5, and Rademacher, overlaid on ρ(δ, Q)]
- Gaussian, Bernoulli, Fourier, Hadamard, Rademacher
- Ternary (p): P(0) = 1 − p and P(±1) = p/2
- Expander (p): ⌈p·n⌉ ones per column, otherwise zeros
- Rigorous statistical comparison shows n^{−1/2} convergence

Bulk Z-scores
[Figure: Z-scores versus δ = n/N for (a) Bernoulli, (b) Fourier, (c) Ternary (1/3), (d) Rademacher]
- n = 200, n = 400 and n = 1600
- Linear trend with δ = m/n, decaying at rate n^{−1/2}
- Proven for matrices with subgaussian tails, Montanari 2012
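For concreteness, here is one way two of the ensembles compared above might be sampled. The column weight used for the expander-like matrix is an interpretation of "⌈p·n⌉ ones per column" (with n counting rows in the figure's notation), so treat the parameterization as an assumption.

```python
import numpy as np

def ternary_matrix(m, n, p, seed=None):
    """Ternary(p) ensemble: each entry is 0 with probability 1 - p,
    and +1 or -1 each with probability p/2."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    mask = rng.random((m, n)) < p
    return signs * mask

def expander_matrix(m, n, p, seed=None):
    """Expander(p)-style ensemble: each column has d = ceil(p * m) ones in
    random positions and zeros elsewhere (the per-column weight is an
    assumption about the slide's notation)."""
    rng = np.random.default_rng(seed)
    d = int(np.ceil(p * m))
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=d, replace=False)
        A[rows, j] = 1.0
    return A
```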
Which algorithm is fastest and with a high phase transition?

State-of-the-art algorithms for sparse approximation
- Hard Thresholding, H_k(A^T b), followed by a subspace-restricted linear solver: Conjugate Gradient
- Normalized IHT: H_k(x^t + κ A^T(b − A x^t)) (steepest descent)
- Hard Thresholding Pursuit: NIHT with a pseudo-inverse
- CSMPSP (hybrid of CoSaMP and Subspace Pursuit):
    v^{t+1} = H_{αk}(x^t + κ A^T(b − A x^t))
    I_t = supp(v^{t+1}) ∪ supp(x^t)                  (join support sets)
    w_{I_t} = (A_{I_t}^T A_{I_t})^{−1} A_{I_t}^T b   (least squares fit)
    x^{t+1} = H_{βk}(w_{I_t})                        (second threshold)
- SpaRSA [Lee and Wright '08]
- Testing environment with random problem generation, or passing a matrix and measurements.
- Matrix ensembles: Discrete Cosine Transform, sparse matrices, Gaussian

Ingredients of greedy CS algorithms
- Descent: ν^t := x^t + κ A^T(b − A x^t) with
    κ = ||A_{Λ_t}^T (b − A x^t)||_2^2 / ||A_{Λ_t} A_{Λ_t}^T (b − A x^t)||_2^2;
  requires two matvecs and one transpose matvec, plus vector adds (a NumPy sketch of this update appears below).
- Support: identification of the support set for x^{t+1} = H_k(ν^t), hard thresholding, and calculating κ. Use linear binning for a fast parallel order-statistic calculation, and only do so when the support set could change. Reduced the support-set time to a small fraction of one DCT matvec.
- Generation: when testing millions of problems, problem generation can become slow, especially using MATLAB's randn. Total time (for large problems) reduced to essentially the matvecs.

Computing environment
CPU:
- Intel Xeon 5650 (released March 2010)
- 6 cores, 2.66 GHz
- 12 GB of DDR3 PC3-1066, 6.4 GT/s
- Matlab 2010a, 64 bit (inherent multi-core threading)
GPU:
- NVIDIA Tesla C2050 (released April 2010)
- 448 cores, peak performance 1.03 Tflop/s
- 3 GB GDDR5 (on-device memory)
- Error correction
Is it faster?

Multiplicative acceleration factor for NIHT: CPU/GPU

  Ensemble   n      nonZeros   Descent   Support   Generation
  dct        2^14      -        63.21     42.16       1.04
  dct        2^16      -        64.46     41.59       1.77
  dct        2^18      -        54.11     38.45       3.20
  dct        2^20      -        57.94     38.82       5.80
  smv        2^12      4         0.52      4.10      32.32
  smv        2^14      4         1.41     14.64     135.08
  smv        2^16      4         4.29     43.04     521.60
  smv        2^18      4        10.43     71.50    1630.08
  smv        2^12      7         0.63      3.48      33.92
  smv        2^14      7         1.86     12.86     142.53
  smv        2^16      7         5.42     37.11     526.82
  smv        2^18      7        10.80     55.60    1556.44
  gen        2^10      -         1.06      2.07       0.34
  gen        2^12      -        10.36      4.09       2.53
  gen        2^14      -        16.75      6.17       5.85

Algorithm Selection for DCT, map, n = 2^16
[Figure: fastest successful algorithm per (m/n, k/m) cell; NIHT: circle, HTP: plus, CSMPSP: square, ThresholdCG: times]
NIHT dominant near the phase transition.
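The descent and support steps itemized in the "Ingredients" slide combine into a short NIHT loop. The NumPy sketch below implements x^{t+1} = H_k(x^t + κ A^T(b − A x^t)) with the support-restricted stepsize κ defined above; it uses an explicit dense matrix rather than the DCT/SMV operators and GPU kernels of the talk, and omits the support-change safeguard of the full algorithm.

```python
import numpy as np

def hard_threshold(x, k):
    """H_k: keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def niht(A, b, k, max_iter=300, tol=1e-6):
    """Normalized IHT sketch: x^{t+1} = H_k(x^t + kappa * A^T(b - A x^t)),
    with kappa the exact line-search stepsize restricted to the current
    support Lambda_t (the support-change safeguard is omitted here)."""
    n = A.shape[1]
    x = hard_threshold(A.T @ b, k)        # thresholded correlations as a start
    for _ in range(max_iter):
        r = b - A @ x                     # residual
        g = A.T @ r                       # gradient direction A^T(b - Ax)
        support = np.flatnonzero(x)       # current support Lambda_t
        gs = np.zeros(n)
        gs[support] = g[support]          # gradient restricted to Lambda_t
        denom = np.linalg.norm(A @ gs) ** 2
        kappa = np.linalg.norm(gs) ** 2 / denom if denom > 0 else 1.0
        x = hard_threshold(x + kappa * g, k)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x
```

With a fast transform in place of the dense products, the same loop is dominated by the matvec cost discussed above.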
Algorithm Selection for DCT, map, n = 2^18
[Figure: fastest successful algorithm per (m/n, k/m) cell; NIHT: circle, HTP: plus, CSMPSP: square, ThresholdCG: times]
NIHT dominant near the phase transition.

Algorithm Selection for DCT, map, n = 2^20
[Figure: fastest successful algorithm per (m/n, k/m) cell; NIHT: circle, HTP: plus, CSMPSP: square, ThresholdCG: times]
NIHT dominant near the phase transition, though HTP nearly as fast.

HTP / best time for DCT, n = 2^20
[Figure: ratio of HTP time to the fastest algorithm's time over the (m/n, k/m) plane]
NIHT and HTP have essentially identical average-case behaviour.

Best time for DCT, n = 2^14
[Figure: time (ms) of the fastest algorithm over the (m/n, k/m) plane]

Best time for DCT, n = 2^16
[Figure: time (ms) of the fastest algorithm over the (m/n, k/m) plane]

Best time for DCT, n = 2^18
[Figure: time (ms) of the fastest algorithm over the (m/n, k/m) plane]

Best time for DCT, n = 2^20
[Figure: time (ms) of the fastest algorithm over the (m/n, k/m) plane]

Concentration phenomenon: NIHT for DCT, δ = 0.25
- Logit fit, exp(β_0 + β_1 k) / (1 + exp(β_0 + β_1 k)), to data from about 10^5 tests
- ρ_W^{niht}(1/4) ≈ 0.25967 (note ρ_W(1/4; C) = 0.2674)
- Transition width proportional to n^{−1/2}
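The logit fit above can be reproduced with a few lines of maximum-likelihood fitting. The sketch below fits P(success | k) = exp(β_0 + β_1 k)/(1 + exp(β_0 + β_1 k)) to binary recovery outcomes; the optimizer choice and helper names are assumptions for illustration, not the talk's pipeline.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logit(k_vals, successes):
    """Fit P(success | k) = exp(b0 + b1 k) / (1 + exp(b0 + b1 k)) by maximum
    likelihood; k_vals are sparsity levels, successes are 0/1 outcomes."""
    k_vals = np.asarray(k_vals, dtype=float)
    y = np.asarray(successes, dtype=float)

    def neg_log_likelihood(beta):
        z = beta[0] + beta[1] * k_vals
        # Bernoulli negative log-likelihood; log(1 + exp(z)) via logaddexp.
        return np.sum(np.logaddexp(0.0, z) - y * z)

    res = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
    return res.x  # (beta0, beta1)

def transition_location(beta):
    """Sparsity k at which the fitted success probability crosses 1/2."""
    return -beta[0] / beta[1]
```

The 50% crossing −β_0/β_1, divided by m, gives an empirical estimate such as ρ_W^{niht}(1/4) above, and β_1 controls the transition width.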
Optimal order recovery: matrix completion
- Four defining numbers: r ≤ m ≤ n and p
  • m × n, matrix size; mn is the "Nyquist" sampling rate
  • p, number of inner product measurements
  • r, matrix complexity, the rank
- For what (r, m, n, p) does an encoder/decoder pair recover a suitable approximation of X from (b, A)?
  • p = r(m + n − r) is the optimal oracle rate
  • p ∼ r(m + n − r) is possible using efficient algorithms
- Mixed under/over-sampling rates compared to naive/optimal:
  undersampling δ := p/(mn), oversampling ρ := r(m + n − r)/p

Largest rank recoverable with an efficient algorithm
- Compressed sensing algorithms all behave about the same
- How about matrix completion: do simple methods work well?
- NIHT: alternating projection with a column-subspace stepsize
    X^{j+1} = H_r(X^j + μ_j A*(b − A(X^j)))
  with
    μ_j := ||P_{U_j} A*(b − A(X^j))||_F^2 / ||A(P_{U_j} A*(b − A(X^j)))||_2^2,
  where P_{U_j} := U_j U_j*. (Column-and-row projection doesn't work.)
- Contrast NIHT with nuclear norm minimization via semi-definite programming and simple Power Factorization.

Three matrix completion algorithms to compare
- Nuclear norm minimization (extension of ℓ1 in CS):
    min_X ||X||_* := Σ_i σ_i(X) subject to A(X) = b
- NIHT for matrix completion (how to select μ_j):
    X^{j+1} = H_r(X^j + μ_j A*(b − A(X^j)))
- Power Factorization: write X := RV and solve
    min_{R,V} ||A(RV) − b||_2
- Benchmark the algorithms' ability to recover low-rank matrices, and contrast speed and memory requirements. 4.3 CPU years later...
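In the NIHT update above, H_r is a truncated SVD and μ_j uses the current column subspace U_j. The sketch below specializes the sampling operator A to entry sensing on an observed index set (one of the measurement models tested); the function names and the boolean-mask interface are assumptions made for illustration, not the released code.

```python
import numpy as np

def hard_threshold_rank(X, r):
    """H_r: best rank-r approximation of X via the truncated SVD.
    Returns the thresholded matrix and its column subspace U_r."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :], U[:, :r]

def niht_matrix_completion(b, mask, r, shape, max_iter=500, tol=1e-6):
    """NIHT sketch for entry sensing: A(X) = X[mask], A*(y) puts the residual
    back on the observed entries.  X^{j+1} = H_r(X^j + mu_j * R_j) with
    mu_j = ||P_Uj R_j||_F^2 / ||A(P_Uj R_j)||_2^2 (column projection).
    `mask` is a boolean array of the given shape; `b` lists the observed
    entries in the same (row-major) order."""
    m, n = shape
    X = np.zeros((m, n))
    X[mask] = b                              # A*(b) as the starting point
    X, U = hard_threshold_rank(X, r)
    for _ in range(max_iter):
        R = np.zeros((m, n))
        R[mask] = b - X[mask]                # R_j = A*(b - A(X^j))
        PG = U @ (U.T @ R)                   # project onto current column space
        denom = np.linalg.norm(PG[mask]) ** 2
        mu = np.linalg.norm(PG, "fro") ** 2 / denom if denom > 0 else 1.0
        X, U = hard_threshold_rank(X + mu * R, r)
        if np.linalg.norm(b - X[mask]) <= tol * np.linalg.norm(b):
            break
    return X
```

Power Factorization and nuclear norm minimization can be benchmarked against the same (b, mask) inputs to compare recovery region, speed, and memory.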
NIHT vs "state of the art", Gaussian sensing (m = n = 80)
[Figure: recovery phase transition with γ = 1.000, ρ versus p/mn, for NIHT: Column Projection (0.999), Power Factorization, and Nuclear Norm Minimization, all with Gaussian measurements]
- Simple NIHT has nearly optimal recovery ability
- Convex relaxation is consistent with the theory of Hassibi et al.

NIHT vs "state of the art", entry sensing (m = n = 800)
[Figure: recovery phase transition with γ = 1.000, ρ versus p/mn, for NIHT: Column Projection (0.999), Power Factorization, and Nuclear Norm Minimization, all with entry measurements]
- Simple NIHT has nearly optimal recovery ability
- Convex relaxation is slow and has a small recovery region.

Conclusions
- There are many algorithms for sparse approximation and matrix completion, all proven to have optimal-order recovery: m ≥ Const·k log(n/m) and p ≥ Const·r(m + n − r).
- Empirical testing can suggest conjectures and point us to the "best" methods.
- High performance computing tools allow testing large numbers of problems, and each problem quickly: the GPU software solves problems of size n = 10^6 in under one second.
Two new findings:
- Near universality of CS algorithms' phase transitions, ℓ1
- Convexification is less effective for matrix completion; simple methods for min rank have a higher phase transition

References
- Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing (2009), Phil. Trans. Roy. Soc. A, Donoho and Tanner.
- GPU accelerated greedy algorithms for compressed sensing (2012), Blanchard and Tanner.
- Normalized iterative hard thresholding for matrix completion (2012), Tanner and Wei.

Thanks for your time