Sparse recovery using sparse random matrices

Sparse Recovery Using
Sparse (Random) Matrices
Piotr Indyk
MIT
Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic
Goal of this talk
“Stay awake until the end”
Linear Compression
(learning Fourier coeffs, linear sketching, finite rate of innovation, compressed sensing...)
• Setup:
– Data/signal in n-dimensional space: x
E.g., x is a 1000x1000 image ⇒ n = 1,000,000
– Goal: compress x into a “sketch” Ax,
where A is an m x n “sketch matrix”, m << n
• Requirements:
– Plan A: want to recover x from Ax
• Impossible: underdetermined system of equations
– Plan B: want to recover an “approximation” x* of x
• Want:
– Good compression (small m)
– Efficient algorithms for encoding and recovery
• Why linear compression?
• Sparsity parameter k
• Want x* such that ||x*-x||p ≤ C(k) ||x’-x||q over all x’ that are k-sparse (at most k non-zero entries) (the lp/lq guarantee, illustrated below)
• The best such x* contains the k coordinates of x with the largest absolute value
[Figure: a signal x, its best k-sparse approximation x* for k=2, and the sketch Ax]
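To make the lp/lq benchmark concrete, here is a minimal NumPy sketch (illustrative code, not from the talk; the sizes n, m, k and the dense Gaussian A are arbitrary choices) that forms a sketch Ax and computes the benchmark x*, the best k-sparse approximation of x:

import numpy as np

def best_k_sparse(x, k):
    # keep the k largest-magnitude entries of x, zero out the rest
    x_star = np.zeros_like(x)
    top = np.argsort(np.abs(x))[-k:]
    x_star[top] = x[top]
    return x_star

rng = np.random.default_rng(0)
n, m, k = 1000, 100, 10
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)   # k large coordinates
x += 0.01 * rng.normal(size=n)                            # small "tail"

A = rng.normal(size=(m, n))    # a dense m x n sketch matrix (for illustration)
b = A @ x                      # the sketch: m numbers instead of n

x_star = best_k_sparse(x, k)   # the benchmark in the lp/lq guarantee
print(np.abs(x - x_star).sum())    # ||x - x*||_1 = l1 mass of the tail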
Linear compression: applications
– Efficient increments: A(x+Δ) = Ax + AΔ (see the small sketch after this slide)
• Single pixel camera
[Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly,
Baraniuk’06]
• Pooling, microarray experiments
[Kainkaryam, Woolf], [Hassibi et al.], [Dai-Sheikh, Milenkovic, Baraniuk]
• Data stream algorithms
(e.g. for network monitoring)
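The “efficient increments” property is just linearity of the sketch. A small illustrative sketch (NumPy; the matrix and the stream of updates are arbitrary assumptions) of how coordinate increments can be folded into Ax without ever storing x:

import numpy as np

rng = np.random.default_rng(1)
n, m = 1000, 100
A = rng.normal(size=(m, n))

sketch = np.zeros(m)      # maintains A·x under a stream of increments
x = np.zeros(n)           # kept here only to check correctness

for i, delta in [(3, 1.0), (17, 2.5), (3, -0.5)]:   # e.g. packet counts per flow
    sketch += delta * A[:, i]    # A(x + delta·e_i) = Ax + delta·A_i
    x[i] += delta

assert np.allclose(sketch, A @ x)   # the sketch stays consistent with x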
Constructing matrix A
• Choose encoding matrix A at random
(the algorithms for recovering x* are more complex)
– Sparse matrices:
• Data stream algorithms
• Coding theory (LDPCs)
– Dense matrices:
• Compressed sensing
• Complexity/learning theory
(Fourier matrices)
• “Traditional” tradeoffs:
– Sparse: computationally more efficient, explicit
– Dense: shorter sketches
• Goal: the “best of both worlds”
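For concreteness, a small sketch (illustrative NumPy code, not from the talk; sizes and d are arbitrary) of the two families of random matrices mentioned above: a sparse 0-1 matrix with d ones per column versus a dense Gaussian matrix:

import numpy as np

def sparse_binary_matrix(m, n, d, rng):
    # random m x n 0-1 matrix with exactly d ones per column
    # (adjacency matrix of a random bipartite graph with left degree d)
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=d, replace=False)   # d distinct neighbors
        A[rows, j] = 1
    return A

rng = np.random.default_rng(2)
n, m, d = 1000, 200, 8
A_sparse = sparse_binary_matrix(m, n, d, rng)   # nd nonzeros, fast to apply
A_dense = rng.normal(size=(m, n))               # mn nonzeros (Gaussian)
print(A_sparse.sum(axis=0)[:5])                 # each column has d ones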
Prior and New Results
Scale used on the slides (color-coded): Excellent / Very Good / Good / Fair.
(Here log^c n denotes a polylogarithmic factor; T is the number of iterations of the iterative algorithms.)

Paper                 | R/D | Sketch length        | Encode time  | Column sparsity | Recovery time          | Approx
[CCF’02], [CM’06]     | R   | k log n              | n log n      | log n           | n log n                | l2 / l2
                      | R   | k log^c n            | n log^c n    | log^c n         | k log^c n              | l2 / l2
[CM’04]               | R   | k log n              | n log n      | log n           | n log n                | l1 / l1
                      | R   | k log^c n            | n log^c n    | log^c n         | k log^c n              | l1 / l1
[CRT’04], [RV’05]     | D   | k log(n/k)           | nk log(n/k)  | k log(n/k)      | n^c                    | l2 / l1
                      | D   | k log^c n            | n log n      | k log^c n       | n^c                    | l2 / l1
[GSTV’06], [GSTV’07]  | D   | k log^c n            | n log^c n    | log^c n         | k log^c n              | l1 / l1
                      | D   | k log^c n            | n log^c n    | k log^c n       | k^2 log^c n            | l2 / l1
[BGIKS’08]            | D   | k log(n/k)           | n log(n/k)   | log(n/k)        | n^c                    | l1 / l1
[GLR’08]              | D   | k log n · logloglog n | k n^(1-a)   | n^(1-a)         | n^c                    | l2 / l1
[NV’07], [DM’08], [NT’08], [BD’08], [GK’09], …
                      | D   | k log(n/k)           | nk log(n/k)  | k log(n/k)      | nk log(n/k) * T        | l2 / l1
                      | D   | k log^c n            | n log n      | k log^c n       | n log n * T            | l2 / l1
[IR’08], [BIR’08]     | D   | k log(n/k)           | n log(n/k)   | log(n/k)        | n log(n/k)             | l1 / l1
                      | D   | k log(n/k)           | n log(n/k)   | log(n/k)        | n log(n/k) * T         | l1 / l1
[BI’09]               | D   | k log(n/k)           | n log(n/k)   | log(n/k)        | n log(n/k) * T * stuff | l1 / l1
[GLSP’09]             | R   | k log(n/k)           | n log^c n    | log^c n         | k log^c n              | l2 / l1
Recovery “in principle”
(when is a matrix “good”)
[Figure: a dense matrix vs. a sparse matrix]
• Restricted Isometry Property (RIP)* - sufficient property of a dense matrix A:
Δ is k-sparse ⇒ ||Δ||2 ≤ ||AΔ||2 ≤ C ||Δ||2
• Holds w.h.p. for:
– Random Gaussian/Bernoulli: m = O(k log(n/k))
– Random Fourier: m = O(k log^O(1) n)
• Consider random m x n 0-1 matrices with d ones per column
• Do they satisfy RIP?
– No, unless m = Ω(k^2) [Chandar’07]
• However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss’08]:
Δ is k-sparse ⇒ d(1-ε) ||Δ||1 ≤ ||AΔ||1 ≤ d ||Δ||1
• Sufficient (and necessary) condition: the underlying graph is a (k, d(1-ε/2))-expander
* Other (weaker) conditions exist, e.g., “kernel of A is a near-Euclidean subspace of l1”. More later.
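A quick numerical sanity check of the RIP-1 scaling is easy to run. The sketch below (illustrative; it samples random k-sparse Δ, so it provides evidence rather than the worst-case guarantee, and all parameters are arbitrary) estimates the ratio ||AΔ||1 / ||Δ||1 for a random 0-1 matrix with d ones per column:

import numpy as np

rng = np.random.default_rng(3)
n, m, d, k = 2000, 400, 8, 10

# random m x n 0-1 matrix with d ones per column
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1

ratios = []
for _ in range(1000):
    delta = np.zeros(n)
    delta[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)  # k-sparse
    ratios.append(np.abs(A @ delta).sum() / np.abs(delta).sum())

# RIP-1 predicts  d(1-eps) <= ||A·delta||_1 / ||delta||_1 <= d
print(min(ratios) / d, max(ratios) / d)   # the smaller value estimates 1-eps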
Expanders
• A bipartite graph (n left nodes, m right nodes, left degree d) is a (k, d(1-ε))-expander if for any left set S, |S| ≤ k, we have |N(S)| ≥ (1-ε) d |S|
• Constructions:
– Randomized: m = O(k log(n/k))
– Explicit: m = k quasipolylog n
• Plenty of applications in computer science, coding theory etc.
• In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x
[Xu-Hassibi’07, Indyk’08, Jafarpour-Xu-Hassibi-Calderbank’08]
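Below is a toy check of the expansion condition on a random left-d-regular bipartite graph (illustrative only: it samples sets S rather than checking all of them, so it estimates rather than certifies expansion; parameters are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
n, m, d, k = 200, 60, 6, 5

# neighbors[j] = the d right-neighbors of left node j
neighbors = [rng.choice(m, size=d, replace=False) for j in range(n)]

worst = 1.0
for _ in range(5000):
    S = rng.choice(n, size=k, replace=False)
    NS = set()
    for j in S:
        NS.update(neighbors[j].tolist())
    worst = min(worst, len(NS) / (d * len(S)))   # want |N(S)| >= (1-eps)·d·|S|

print("empirical expansion (1-eps) over sampled sets:", worst)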
Proof: d(1-ε/2)-expansion ⇒ RIP-1
• Want to show that for any k-sparse Δ we have
d(1-ε) ||Δ||1 ≤ ||AΔ||1 ≤ d ||Δ||1
• RHS inequality holds for any Δ
• LHS inequality:
– W.l.o.g. assume |Δ1| ≥ … ≥ |Δk| ≥ |Δk+1| = … = |Δn| = 0
– Consider the edges e=(i,j) in a lexicographic order
– For each edge e=(i,j) define r(e) s.t.
• r(e) = -1 if there exists an edge (i’,j) < (i,j)
• r(e) = 1 if there is no such edge
• Claim 1: ||AΔ||1 ≥ ∑e=(i,j) |Δi| r(e)
• Claim 2: ∑e=(i,j) |Δi| r(e) ≥ (1-ε) d ||Δ||1
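The edge labeling and Claim 1 are easy to check numerically on a small random instance (illustrative code; it only tests Claim 1 on random inputs, it is not a proof, and all parameters are arbitrary):

import numpy as np

rng = np.random.default_rng(5)
n, m, d, k = 50, 30, 4, 5

neighbors = [rng.choice(m, size=d, replace=False) for j in range(n)]
A = np.zeros((m, n))
for j in range(n):
    A[neighbors[j], j] = 1

delta = np.zeros(n)
delta[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

# process columns in decreasing |delta_i|; an edge gets r(e)=+1 if it is the first
# edge (in this order) touching its right node, and r(e)=-1 otherwise
order = np.argsort(-np.abs(delta))
seen, lower_bound = set(), 0.0
for i in order:
    if delta[i] == 0:
        break
    for j in neighbors[i]:
        r = 1 if j not in seen else -1
        seen.add(j)
        lower_bound += abs(delta[i]) * r

print(lower_bound, np.abs(A @ delta).sum())   # Claim 1: lower_bound <= ||A·delta||_1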
Recovery: algorithms
Matching Pursuit(s)
[Figure: A applied to x*-x; the residual Ax-Ax* and column i of A are highlighted]
• Iterative algorithm: given current approximation x*:
– Find (possibly several) i s.t. Ai “correlates” with Ax-Ax*. This yields i and z s.t.
||x* + z·ei - x||p << ||x* - x||p
– Update x*
– Sparsify x* (keep only k largest entries)
– Repeat
• Norms:
– p=2 : CoSaMP, SP, IHT etc (dense matrices)
– p=1 : SMP, this paper (sparse matrices)
– p=0 : LDPC bit flipping (sparse matrices)
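As one concrete member of the p=2 family, here is a bare-bones Iterative Hard Thresholding (IHT) loop (an illustrative sketch, not the talk's algorithm; it assumes A is scaled so that A^T A is close to the identity on sparse supports, e.g. Gaussian entries with variance 1/m, and the sizes are arbitrary):

import numpy as np

def iht(A, b, k, iterations=100, step=1.0):
    # IHT: correlate columns of A with the residual, take a gradient-like step,
    # then keep only the k largest-magnitude entries
    x_star = np.zeros(A.shape[1])
    for _ in range(iterations):
        residual = b - A @ x_star                        # = A(x - x*)
        x_star = x_star + step * (A.T @ residual)        # update
        x_star[np.argsort(np.abs(x_star))[:-k]] = 0.0    # sparsify
    return x_star

rng = np.random.default_rng(6)
n, m, k = 400, 200, 5
A = rng.normal(size=(m, n)) / np.sqrt(m)   # columns roughly unit norm
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
print(np.abs(x - iht(A, A @ x, k)).sum())  # residual error; small when recovery succeeds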
Sequential Sparse Matching Pursuit (SSMP)
• Algorithm:
– x* = 0
– Repeat T times:
• Repeat S=O(k) times:
– Find i and z that minimize* ||A(x* + z·ei) - b||1
– x* = x* + z·ei
• Sparsify x* (set all but the k largest entries of x* to 0)
• Similar to SMP, but updates are done sequentially
[Figure: column i of A, its neighborhood N(i), and the residuals Ax-Ax* and x-x*]
* Set z = median[ (Ax - Ax*)N(i) ]. Instead, one could first optimize (gradient) i and then z [Fuchs’09]
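A direct, unoptimized transcription of the loop above (illustrative NumPy code: the heap-based bookkeeping from the next slide is replaced by a naive scan over all i, the choice S = 2k and the graph representation are assumptions, and b is taken to be Ax):

import numpy as np

def ssmp(neighbors, m, b, k, T=10, S=None):
    # neighbors[i] = rows of the m x n 0-1 matrix A that contain a 1 in column i
    n = len(neighbors)
    S = S if S is not None else 2 * k          # S = O(k); the constant is a guess
    x_star = np.zeros(n)
    Ax_star = np.zeros(m)
    for _ in range(T):
        for _ in range(S):
            best = (0.0, None, 0.0)            # (gain, i, z)
            for i in range(n):                 # naive scan; the paper uses a heap
                N = neighbors[i]
                z = np.median(b[N] - Ax_star[N])   # minimizes ||A(x*+z·e_i) - b||_1
                gain = np.abs(Ax_star[N] - b[N]).sum() \
                     - np.abs(Ax_star[N] + z - b[N]).sum()
                if gain > best[0]:
                    best = (gain, i, z)
            if best[1] is None:
                break
            _, i, z = best
            x_star[i] += z
            Ax_star[neighbors[i]] += z
        # sparsify: keep only the k largest-magnitude entries, then refresh Ax*
        x_star[np.argsort(np.abs(x_star))[:-k]] = 0.0
        Ax_star = np.zeros(m)
        for i in np.nonzero(x_star)[0]:
            Ax_star[neighbors[i]] += x_star[i]
    return x_star

The naive scan makes each update cost about nd; the heap-based variant on the next slide recomputes gains only for the columns whose neighborhoods intersect N(i).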
SSMP: Running time
• Algorithm:
– x* = 0
– Repeat T times:
• For each i=1…n compute* zi that achieves
Di = minz ||A(x* + z·ei) - b||1
and store Di in a heap
• Repeat S=O(k) times:
– Pick i, z that yield the best gain
– Update x* = x* + z·ei
– Recompute and store Di’ for all i’ such that N(i) and N(i’) intersect
• Sparsify x* (set all but the k largest entries of x* to 0)
• Running time:
T [ n(d + log n) + k · (nd/m) · d · (d + log n) ]
= T [ n(d + log n) + nd(d + log n) ] = T [ nd(d + log n) ]
(each update changes d coordinates of Ax*, each of which has about nd/m left neighbors whose Di’ must be recomputed; since m = Θ(k·d), the k·(nd/m)·d term is about nd)
SSMP: Approximation
guarantee
• Want to find a k-sparse x* that minimizes ||x-x*||1
• By RIP-1, this is approximately the same as minimizing ||Ax-Ax*||1
• Need to show we can do it greedily
[Figure: x, a1, a2; supports of a1 and a2 have small overlap (typically)]
Experiments
• 256x256 image
• SSMP is run with S=10000, T=20. SMP is run for 100 iterations. Matrix sparsity is d=8.
Conclusions
• Even better algorithms for sparse
approximation (using sparse matrices)
• State of the art: can do 2 out of 3:
– Near-linear encoding/decoding
– O(k log(n/k)) measurements
– Approximation guarantee with respect to the l2/l1 norm
(this talk: the first two)
• Questions:
– 3 out of 3 ?
• “Kernel of A is a near-Euclidean subspace of l1” [Kashin et al]
• Constructions of sparse A exist [Guruswami-Lee-Razborov,
Guruswami-Lee-Wigderson]
• Challenge: match the optimal measurement bound
– Explicit constructions