Signal Recovery from Random Measurements
❦
Joel A. Tropp
Anna C. Gilbert
{jtropp|annacg}@umich.edu
Department of Mathematics
The University of Michigan
The Signal Recovery Problem
❦
Let s be an m-sparse signal in R^d, for example

    s = [ 0  −7.3  0  0  0  2.7  0  1.5  0  . . . ]ᵀ
Use measurement vectors x1, . . . , xN to collect N nonadaptive linear
measurements of the signal
    ⟨s, x1⟩,  ⟨s, x2⟩,  . . . ,  ⟨s, xN⟩
Q1. How many measurements are necessary to determine the signal?
Q2. How should the measurement vectors be chosen?
Q3. What algorithms can perform the reconstruction task?
Motivations I
❦
Medical Imaging
❧ Tomography provides incomplete, nonadaptive frequency information
❧ The images typically have a sparse gradient
❧ Reference: [Candès–Romberg–Tao 2004]
Sensor Networks
❧ Limited communication favors nonadaptive measurements
❧ Some types of natural data are approximately sparse
❧ References: [Haupt–Nowak 2005, Baraniuk et al. 2005]
Motivations II
❦
Sparse, High-Bandwidth A/D Conversion
❧ Signals of interest have few important frequencies
❧ Locations of frequencies are unknown a priori
❧ Frequencies are spread across gigahertz of bandwidth
❧ Current analog-to-digital converters cannot provide resolution and
  bandwidth simultaneously
❧ Must develop new sampling techniques
❧ References: [Healy 2005]
Q1: How many measurements?
❦
Adaptive measurements
Consider the class of m-sparse signals in R^d that have 0–1 entries
It is clear that log2 (d choose m) bits suffice to distinguish members of this class.
By Stirling’s approximation,
Storage per signal: O(m log(d/m)) bits
A simple adaptive coding scheme can achieve this rate
Nonadaptive measurements
The naı̈ve approach uses d orthogonal measurement vectors
Storage per signal: O(d) bits
But we can do exponentially better. . .
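To make the storage comparison concrete, here is a minimal sketch (not from the slides) that evaluates the three costs for one illustrative choice of d and m:

```python
from math import comb, log2

d, m = 1024, 10   # illustrative values, not from the slides

adaptive_bits = log2(comb(d, m))    # log2 (d choose m): identify the support within the 0-1 class
stirling_bits = m * log2(d / m)     # the O(m log(d/m)) rate suggested by Stirling's approximation
naive_bits = d                      # naive nonadaptive scheme: one orthogonal measurement per coordinate

print(f"log2 (d choose m) ~ {adaptive_bits:.1f} bits")
print(f"m log2(d/m)       ~ {stirling_bits:.1f} bits")
print(f"naive             = {naive_bits} bits")
```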
Q2: What type of measurements?
❦
Idea: Use randomness
Random measurement vectors yield summary statistics that are
nonadaptive yet highly informative. Examples:
Bernoulli measurement vectors
Independently draw each xn uniformly from {−1, +1}d
Gaussian measurement vectors
Independently draw each xn from the distribution
    (2π)^{−d/2} exp(−‖x‖₂² / 2)
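For illustration (not part of the original slides), one way to draw these measurement ensembles with NumPy, using the N × d layout introduced on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 256, 80   # illustrative sizes

# Bernoulli ensemble: each entry of each x_n is independently +1 or -1
X_bernoulli = rng.choice([-1.0, 1.0], size=(N, d))

# Gaussian ensemble: each x_n drawn from the density (2*pi)^{-d/2} exp(-||x||^2 / 2)
X_gaussian = rng.standard_normal((N, d))
```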
Connection with Sparse Approximation
❦
Define the fat N × d measurement matrix

    Φ = [ x1  x2  · · ·  xN ]ᵀ        (the n-th row of Φ is xnᵀ)

The columns of Φ are denoted ϕ1, . . . , ϕd
Given an m-sparse signal s, form the data vector v = Φ s

    [ v1  v2  · · ·  vN ]ᵀ = [ ϕ1  ϕ2  ϕ3  · · ·  ϕd ] [ s1  s2  s3  · · ·  sd ]ᵀ
Note that v is a linear combination of m columns from Φ
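A short continuation of the same illustrative sketch, checking that v = Φ s uses only the m columns of Φ indexed by the support of s:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, m = 256, 80, 8   # illustrative sizes

Phi = rng.standard_normal((N, d))           # rows of Phi are the measurement vectors x_n

s = np.zeros(d)
support = rng.choice(d, size=m, replace=False)
s[support] = rng.standard_normal(m)         # an m-sparse signal

v = Phi @ s                                 # the N measurements <s, x_n>
assert np.allclose(v, Phi[:, support] @ s[support])   # v is a combination of m columns of Phi
```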
Orthogonal Matching Pursuit (OMP)
❦
Input: A measurement matrix Φ, data vector v, and sparsity level m
Initialize the residual r0 = v
For t = 1, . . . , m do
A. Find the column index ωt that solves
    ωt = arg max_{j=1,...,d} |⟨rt−1, ϕj⟩|
B. Calculate the next residual
rt = v − Pt v
where Pt is the orthogonal projector onto span {ϕω1 , . . . , ϕωt }
Output: An m-sparse estimate ŝ with nonzero entries in components
ω1, . . . , ωm. These entries appear in the expansion

    Pm v = Σ_{t=1}^{m} ŝ_{ωt} ϕ_{ωt}
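A minimal NumPy sketch of the loop above (an illustration, not the authors' reference implementation); the orthogonal projection in Step B is applied through a least-squares solve over the selected columns:

```python
import numpy as np

def omp(Phi, v, m):
    """Return an m-sparse estimate s_hat from the data vector v = Phi @ s."""
    N, d = Phi.shape
    residual = v.copy()
    support = []
    for _ in range(m):
        # Step A: index of the column most correlated with the current residual
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        # Step B: project v onto the span of the chosen columns and update the residual
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], v, rcond=None)
        residual = v - Phi[:, support] @ coeffs
    s_hat = np.zeros(d)
    s_hat[support] = coeffs
    return s_hat
```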
Advantages of OMP
❦
We propose OMP as an effective method for signal recovery because
❧ OMP is fast
❧ OMP is easy to implement
❧ OMP is surprisingly powerful
❧ OMP is provably correct
The goal of this lecture is to justify these assertions
Theoretical Performance of OMP
❦
Theorem 1. [T–G 2005] Choose an error exponent p.
❧ Let s be an arbitrary m-sparse signal in R^d
❧ Draw N = O(p m log d) Gaussian or Bernoulli(?) measurements of s
❧ Execute OMP with the data vector to obtain an estimate ŝ
The estimate ŝ equals the signal s with probability exceeding 1 − 2d^{−p}.
To achieve 99% success probability in practice, take
N ≈ 2 m ln d
Flowchart for Algorithm
❧ Specify a coin-tossing algorithm, including the distribution of coin flips
❧ Flip coins and determine the measurement vectors (with no knowledge of the
  signal choice)
❧ An adversary chooses an arbitrary m-sparse signal, with knowledge of the
  algorithm and the distribution of coin flips but no knowledge of the
  measurement vectors
❧ Measure the signal, run the greedy pursuit algorithm, and output the signal
Empirical Results on OMP
❦
For each trial. . .
❧ Generate an m-sparse signal s in Rd by choosing m components and
setting each to one
❧ Draw N Gaussian measurements of s
❧ Execute OMP to obtain an estimate ŝ
❧ Check whether ŝ = s
Perform 1000 independent trials for each triple (m, N, d)
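A hedged sketch of a single trial, reusing the omp function sketched earlier (the parameter grids and trial counts reported in the plots that follow are from the original experiments):

```python
import numpy as np

def one_trial(d, m, N, rng):
    s = np.zeros(d)
    s[rng.choice(d, size=m, replace=False)] = 1.0   # m-sparse 0-1 signal
    Phi = rng.standard_normal((N, d))               # N Gaussian measurements
    return np.allclose(omp(Phi, Phi @ s, m), s)     # exact recovery?

rng = np.random.default_rng(2)
successes = sum(one_trial(d=256, m=4, N=100, rng=rng) for _ in range(1000))
print(f"recovered {successes} of 1000 signals")
```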
Percentage Recovered vs. Number of Gaussian Measurements
[Plot: Percentage of input signals recovered correctly (d = 256, Gaussian).
Percentage recovered vs. number of measurements N, for m = 4, 12, 20, 28, 36.]
Percentage Recovered vs. Number of Bernoulli Measurements
[Plot: Percentage of input signals recovered correctly (d = 256, Bernoulli).
Percentage recovered vs. number of measurements N, for m = 4, 12, 20, 28, 36.]
Percentage Recovered vs. Level of Sparsity
[Plot: Percentage of input signals recovered correctly (d = 256, Gaussian).
Percentage recovered vs. sparsity level m, for N = 52, 100, 148, 196, 244.]
Number of Measurements for 95% Recovery
Regression Line: N = 1.5 m ln d + 15.4
[Plot: Number of measurements N needed to achieve 95% reconstruction
probability (Gaussian, d = 256) vs. sparsity level m, showing empirical
values and the linear regression line.]
Number of Measurements for 99% Recovery
        d = 256                      d = 1024
  m     N     N/(m ln d)       m     N     N/(m ln d)
  4     56    2.52             5     80    2.31
  8     96    2.16            10    140    2.02
 12    136    2.04            15    210    2.02
 16    184    2.07
 20    228    2.05

These data justify the rule of thumb

    N ≈ 2 m ln d
Percentage Recovered: Empirical vs. Theoretical
[Plot: Percentage of input signals recovered correctly (d = 1024, Gaussian).
Empirical and theoretical curves of percentage recovered vs. number of
measurements N, for m = 5, 10, 15.]
Execution Time for 1000 Complete Trials
[Plot: Execution time in seconds for 1000 instances (Bernoulli) vs. sparsity
level m, with quadratic fits, for d = 1024 (N = 400) and d = 256 (N = 250).]
Elements of the Proof I
❦
A Thought Experiment
❧ Fix an m-sparse signal s and draw a measurement matrix Φ
❧ Let Φopt consist of the m correct columns of Φ
❧ Imagine we could run OMP with the data vector and the matrix Φopt
❧ It would choose all m columns of Φopt in some order
❧ If we run OMP with the full matrix Φ and it succeeds, then it must
select columns in exactly the same order
Elements of the Proof II
❦
The Sequence of Residuals
❧ If OMP succeeds, we know the sequence of residuals r1, . . . , rm
❧ Each residual lies in the span of the correct columns of Φ
❧ Each residual is stochastically independent of the incorrect columns
Elements of the Proof III
❦
The Greedy Selection Ratio
❧ Suppose that r is the residual in Step A of OMP
❧ The algorithm picks a correct column of Φ whenever
    ρ(r) = max_{j : sj = 0} |⟨r, ϕj⟩| / max_{j : sj ≠ 0} |⟨r, ϕj⟩| < 1
❧ The proof shows that ρ(rt) < 1 for all t with high probability
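For concreteness, a small helper (illustrative, not from the paper) that evaluates the greedy selection ratio for a residual r and the true support:

```python
import numpy as np

def greedy_ratio(Phi, r, support):
    """rho(r): largest correlation over incorrect columns divided by largest over correct columns."""
    corr = np.abs(Phi.T @ r)
    incorrect = np.setdiff1d(np.arange(Phi.shape[1]), support)
    return corr[incorrect].max() / corr[list(support)].max()

# OMP picks a correct column at the current step exactly when greedy_ratio(...) < 1
```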
Elements of the Proof IV
❦
Measure Concentration
❧ The incorrect columns of Φ are probably almost orthogonal to rt
❧ One of the correct columns is probably somewhat correlated with rt
❧ So the numerator of the greedy selection ratio is probably small
    Prob{ max_{j : sj = 0} |⟨rt, ϕj⟩| > ε ‖rt‖2 } ≲ d e^{−ε²/2}

❧ But the denominator is probably not too small

    Prob{ max_{j : sj ≠ 0} |⟨rt, ϕj⟩| < (√(N/m) − 1 − ε) ‖rt‖2 } ≲ e^{−ε² m/2}
Another Method: ℓ1 Minimization
❦
❧ Suppose s is an m-sparse signal in R^d
❧ The vector v = Φ s is a linear combination of m columns of Φ
❧ For Gaussian measurements, this m-term representation is unique
Signal Recovery as a Combinatorial Problem

    min_ŝ ‖ŝ‖0   subject to Φ ŝ = v        (ℓ0)

Relax to a Convex Program

    min_ŝ ‖ŝ‖1   subject to Φ ŝ = v        (ℓ1)
References: [Donoho et al. 1999, 2004] and [Candès et al. 2004]
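As a point of comparison, here is one standard way to pose (ℓ1) as a linear program and solve it with scipy.optimize.linprog; this is an illustrative sketch, not the specialized solvers timed on a later slide. The reformulation introduces auxiliary variables t with −t ≤ ŝ ≤ t and minimizes Σ tj subject to Φ ŝ = v.

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(Phi, v):
    """Solve min ||s_hat||_1 subject to Phi @ s_hat = v as a linear program in (s_hat, t)."""
    N, d = Phi.shape
    c = np.concatenate([np.zeros(d), np.ones(d)])      # objective: sum of the t variables
    A_ub = np.block([[ np.eye(d), -np.eye(d)],          #  s_hat - t <= 0
                     [-np.eye(d), -np.eye(d)]])         # -s_hat - t <= 0
    b_ub = np.zeros(2 * d)
    A_eq = np.hstack([Phi, np.zeros((N, d))])           # Phi @ s_hat = v
    bounds = [(None, None)] * d + [(0, None)] * d       # s_hat free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=v, bounds=bounds)
    return res.x[:d]
```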
A Result for ℓ1 Minimization
❦
Theorem 2. [Rudelson–Vershynin 2005] Draw N = O(m log(d/m))
Gaussian measurement vectors. With probability at least 1 − e^{−d}, the
following statement holds. For every m-sparse signal in R^d, the solution to
(ℓ1) is identical with the solution to (ℓ0).
Notes:
❧ One set of measurement vectors works for all m-sparse signals
❧ Related results have been established in [Candès et al. 2004–2005] and
in [Donoho et al. 2004–2005]
So, why use OMP?
❦
Ease of implementation and speed
❧ Writing software to solve (ℓ1) is difficult
❧ Even specialized software for solving (ℓ1) is slow
Sample Execution Times

    m      N       d        OMP Time    (ℓ1) Time
    14     175     512      0.02 s      1.5 s
    28     500     2048     0.17        14.9
    56     1024    8192     2.50        212.6
    84     1700    16384    11.94       481.0
    112    2400    32768    43.15       1315.6
Randomness
❦
In contrast with ℓ1, OMP may require randomness during the algorithm
Randomness can be reduced by
❧ Amortizing over many input signals
❧ Using a smaller probability space
❧ Accepting a small failure probability
Research Directions
❦
❧ (Dis)prove existence of deterministic measurement ensembles
❧ Extend OMP results to approximately sparse signals
❧ Applications of signal recovery
❧ Develop new algorithms
Related Papers and Contact Information
❦
❧ “Signal recovery from partial information via Orthogonal Matching
Pursuit,” submitted April 2005
❧ “Algorithms for simultaneous sparse approximation. Parts I and II,”
accepted to EURASIP J. Applied Signal Processing, April 2005
❧ “Greed is good: Algorithmic results for sparse approximation,” IEEE
Trans. Info. Theory, October 2004
❧ “Just Relax: Convex programming methods for identifying sparse
signals,” IEEE Trans. Info. Theory, March 2006
❧ ...
All papers available from http://www.umich.edu/~jtropp
E-mail: {jtropp|annacg}@umich.edu