Reconstruction from Anisotropic Random
Measurements
Mark Rudelson and Shuheng Zhou
The University of Michigan, Ann Arbor
Coding, Complexity, and Sparsity Workshop 2013, Ann Arbor, Michigan
August 7, 2013
Want to estimate a parameter β ∈ Rp
Example: how is a response y ∈ R, related to Parkinson's disease, affected by a set of genes in the Chinese population?
Construct a linear model: $y = \beta^T \vec{x} + \epsilon$, where $\mathbb{E}(y \mid \vec{x}) = \beta^T \vec{x}$.
Parameter: Non-zero entries in β (sparsity of β) identify a subset of
genes and indicate how much they influence y .
Take a random sample of (X, Y), and use the sample to estimate β; that is, we have $Y = X\beta + \epsilon$.
Model selection and parameter estimation
When can we approximately recover β from n noisy observations Y ?
Questions: How many measurements n do we need in order to
recover the non-zero positions in β?
How does n scale with p or s, where s is the number of non-zero
entries of β?
What assumptions about the data matrix X are reasonable?
Sparse recovery
When β is known to be s-sparse for some 1 ≤ s ≤ n, which means that
at most s of the coefficients of β can be non-zero:
Assume every 2s columns of X are linearly independent (identifiability condition, reasonable once n ≥ 2s):
$$\Lambda_{\min}(2s) \;\triangleq\; \min_{\upsilon \neq 0,\ 2s\text{-sparse}} \frac{\|X\upsilon\|_2^2}{n\,\|\upsilon\|_2^2} \;>\; 0.$$
Proposition (Candès-Tao 05). Suppose that any 2s columns of the n × p matrix X are linearly independent. Then any s-sparse signal β ∈ Rp can be reconstructed uniquely from Xβ. (Indeed, if Xβ1 = Xβ2 for two s-sparse vectors β1, β2, then β1 − β2 is 2s-sparse and lies in the kernel of X, so β1 = β2.)
ℓ0-minimization
How to reconstruct an s-sparse signal β ∈ Rp from the measurements Y = Xβ, given Λmin(2s) > 0?
Let β be the unique sparsest solution to Xβ = Y:
$$\beta = \arg\min_{\beta:\, X\beta = Y} \|\beta\|_0,$$
where $\|\beta\|_0 := \#\{1 \le i \le p : \beta_i \neq 0\}$ is the sparsity of β.
Unfortunately, ℓ0-minimization is computationally intractable (in fact, it is an NP-complete problem).
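As an aside (not part of the slides), a minimal sketch of exhaustive ℓ0 recovery: it searches every support of size at most s and solves a least-squares problem on each, which is exactly the combinatorial search that makes the problem intractable for realistic p. The dimensions, seed, and tolerance below are illustrative assumptions.

```python
import itertools
import numpy as np

def l0_recover(X, Y, s, tol=1e-8):
    """Brute-force l0-minimization: try every support of size <= s and
    return the sparsest beta consistent with X @ beta = Y (up to tol)."""
    n, p = X.shape
    for k in range(s + 1):                          # sparsity 0, 1, ..., s
        for T in itertools.combinations(range(p), k):
            cols = list(T)
            beta = np.zeros(p)
            if k > 0:
                sol, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
                beta[cols] = sol
            if np.linalg.norm(X @ beta - Y) <= tol:
                return beta                         # sparsest consistent solution
    return None

# Toy example: p is small enough for exhaustive search to finish quickly.
rng = np.random.default_rng(0)
n, p, s = 8, 20, 2
X = rng.standard_normal((n, p))                     # any 2s columns are a.s. independent
beta = np.zeros(p); beta[[3, 11]] = [1.5, -2.0]
print(l0_recover(X, X @ beta, s)[[3, 11]])          # recovers the two nonzero entries
```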
Basis pursuit
Consider the following convex optimization problem:
$$\beta^* := \arg\min_{\beta:\, X\beta = Y} \|\beta\|_1.$$
Basis pursuit works whenever the n × p measurement matrix X is
sufficiently incoherent:
RIP (Candès-Tao 05) requires that for all T ⊂ {1, . . . , p} with |T| ≤ s and for all coefficient sequences (cj)j∈T,
$$(1 - \delta_s)\,\|c\|_2^2 \;\le\; \frac{\|X_T c\|_2^2}{n} \;\le\; (1 + \delta_s)\,\|c\|_2^2$$
holds for some 0 < δs < 1 (the s-restricted isometry constant).
The “good” matrices for compressed sensing should satisfy the
inequalities for the largest possible s.
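A minimal sketch (not from the slides) of basis pursuit cast as a linear program via the standard split β = β⁺ − β⁻ with β⁺, β⁻ ≥ 0; the dimensions and the sparsity level below are illustrative assumptions, and with these values recovery typically succeeds.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, Y):
    """Solve min ||beta||_1 s.t. X beta = Y as the LP
    min sum(b_plus + b_minus) s.t. X (b_plus - b_minus) = Y, b_plus, b_minus >= 0."""
    n, p = X.shape
    c = np.ones(2 * p)                               # objective = l1 norm of beta
    A_eq = np.hstack([X, -X])                        # X (b_plus - b_minus) = Y
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=(0, None), method="highs")
    z = res.x
    return z[:p] - z[p:]                             # recombine beta = b_plus - b_minus

rng = np.random.default_rng(1)
n, p, s = 40, 128, 5
X = rng.standard_normal((n, p)) / np.sqrt(n)         # subgaussian design, columns ~ unit norm
beta = np.zeros(p)
beta[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
beta_hat = basis_pursuit(X, X @ beta)
print(np.linalg.norm(beta_hat - beta))               # ~0: exact noiseless recovery (w.h.p.)
```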
Restricted Isometry Property (RIP): examples
For a Gaussian random matrix, or any subgaussian ensemble, RIP holds with $s \sim n/\log(p/n)$.
For the random Fourier ensemble, or randomly sampled rows of orthonormal matrices, RIP holds for $s = O(n/\log^4 p)$.
For a random matrix composed of columns that are independent isotropic vectors with log-concave densities, RIP holds for $s = O(n/\log^2(p/n))$.
References: Candès-Tao 05, 06, Rudelson-Vershynin 05, Donoho 06,
Baraniuk et al. 08, Mendelson et al. 08, Adamczak et al. 09.
Basis pursuit for high dimensional data
These algorithms are also robust with respect to noise, and RIP will be replaced by more relaxed conditions.
In particular, the isotropy condition, which has been assumed in all the literature cited above, needs to be dropped. Let Xi ∈ Rp, i = 1, . . . , n, be the i.i.d. random row vectors of the design matrix X.
Covariance matrix and its empirical version:
$$\Sigma(X_i) = \mathbb{E}\, X_i \otimes X_i = \mathbb{E}\, X_i X_i^T, \qquad \widehat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^n X_i \otimes X_i = \frac{1}{n}\sum_{i=1}^n X_i X_i^T.$$
Xi is isotropic if Σ(Xi) = I (so that $\mathbb{E}\|X_i\|_2^2 = p$).
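A small numerical illustration (not part of the slides): for i.i.d. isotropic rows, the empirical covariance approaches the identity and the average of ‖Xi‖₂² approaches p. The Rademacher choice and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 30
X = rng.choice([-1.0, 1.0], size=(n, p))            # Rademacher rows: isotropic, subgaussian
Sigma_hat = X.T @ X / n                              # empirical covariance (1/n) sum X_i X_i^T
print(np.linalg.norm(Sigma_hat - np.eye(p), 2))      # operator-norm error, small for large n
print((X ** 2).sum(axis=1).mean(), "vs p =", p)      # E||X_i||_2^2 = trace(I_p) = p
```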
Sparse recovery for Y = Xβ + ε
Lasso (Tibshirani 96), a.k.a. Basis Pursuit (Chen, Donoho and Saunders 98, and others):
$$\widetilde\beta = \arg\min_{\beta} \ \frac{\|Y - X\beta\|_2^2}{2n} + \lambda_n \|\beta\|_1,$$
where the scaling factor 1/(2n) is chosen for convenience.
Dantzig selector (Candès-Tao 07):
$$(DS)\qquad \arg\min_{\widetilde\beta \in \mathbb{R}^p} \|\widetilde\beta\|_1 \quad \text{subject to} \quad \big\|X^T (Y - X\widetilde\beta)/n\big\|_\infty \le \lambda_n.$$
References: Greenshtein-Ritov 04, Meinshausen-Bühlmann 06, Zhao-Yu 06,
Bunea et al. 07, Candès-Tao 07, van de Geer 08, Zhang-Huang 08,
Wainwright 09, Koltchinskii 09, Meinshausen-Yu 09, Bickel et al. 09, and
others.
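A minimal sketch (not from the slides) of the Lasso above. scikit-learn's Lasso minimizes ‖Y − Xβ‖₂²/(2n) + α‖β‖₁, which matches the slide's 1/(2n) scaling with α = λn; the noise level, the choice λn of order σ√(2 log p / n), and all dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, s, sigma = 200, 500, 10, 0.5
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 3.0
Y = X @ beta + sigma * rng.standard_normal(n)

# lambda_n of order sigma * sqrt(2 log p / n), a standard theoretical choice
lam = sigma * np.sqrt(2 * np.log(p) / n)
# scikit-learn's objective is ||Y - X b||_2^2 / (2n) + alpha * ||b||_1,
# i.e. the slide's scaling with alpha = lambda_n
fit = Lasso(alpha=lam, fit_intercept=False, max_iter=50_000).fit(X, Y)
beta_tilde = fit.coef_
print("support found:", np.flatnonzero(np.abs(beta_tilde) > 1e-6))
print("l2 error:", np.linalg.norm(beta_tilde - beta))
```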
The Cone Constraint
For an appropriately chosen λn, the solution of the Lasso or the Dantzig selector satisfies (under i.i.d. Gaussian noise), with high probability,
$$\upsilon := \hat\beta - \beta \in \mathcal{C}(s, k_0),$$
with k0 = 1 for the Dantzig selector and k0 = 3 for the Lasso.
Object of interest: for 1 ≤ s0 ≤ p and a positive number k0,
$$\mathcal{C}(s_0, k_0) = \big\{ x \in \mathbb{R}^p \ \big|\ \exists J \subset \{1, \dots, p\},\ |J| = s_0,\ \text{s.t. } \|x_{J^c}\|_1 \le k_0 \|x_J\|_1 \big\}.$$
This object has appeared in earlier work in the noiseless setting
References: Donoho-Huo 01, Elad-Bruckstein 02, Feuer-Nemirovski 03,
Candès-Tao 07, Bickel-Ritov-Tsybakov 09, Cohen-Dahmen-DeVore 09.
The Lasso solution
Restricted Eigenvalue (RE) condition
Object of interest:
$$\mathcal{C}(s_0, k_0) = \big\{ x \in \mathbb{R}^p \ \big|\ \exists J \subset \{1, \dots, p\},\ |J| = s_0,\ \text{s.t. } \|x_{J^c}\|_1 \le k_0 \|x_J\|_1 \big\}.$$
Definition
Matrix A_{q×p} satisfies the RE(s0, k0, A) condition with parameter K(s0, k0, A) if for any υ ≠ 0,
$$\frac{1}{K(s_0, k_0, A)} := \min_{\substack{J \subseteq \{1,\dots,p\},\\ |J| \le s_0}} \ \min_{\substack{\upsilon \neq 0,\\ \|\upsilon_{J^c}\|_1 \le k_0 \|\upsilon_J\|_1}} \frac{\|A\upsilon\|_2}{\|\upsilon_J\|_2} \;>\; 0.
$$
References: van de Geer 07, Bickel-Ritov-Tsybakov 09, van de
Geer-Bühlmann 09.
An elementary estimate
Lemma
For each vector υ ∈ C(s0, k0), let T0 denote the locations of the s0 largest coefficients of υ in absolute value. Then $\|\upsilon_{T_0^c}\|_1 \le k_0 \|\upsilon_{T_0}\|_1$ and
$$\|\upsilon_{T_0}\|_2 \ \ge\ \frac{\|\upsilon\|_2}{\sqrt{1 + k_0}}.$$
Implication: Let A be a q × p matrix such that the RE(s0, 3k0, A) condition holds with 0 < K(s0, 3k0, A) < ∞. Then for all υ ∈ C(s0, k0) ∩ S^{p−1},
$$\|A\upsilon\|_2 \ \ge\ \frac{\|\upsilon_{T_0}\|_2}{K(s_0, k_0, A)} \ \ge\ \frac{1}{K(s_0, k_0, A)\,\sqrt{1 + k_0}} \ >\ 0.$$
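A quick numerical check of the lemma (an illustration, not part of the slides): we build vectors that satisfy the cone constraint by rescaling the tail, then verify ‖υ_{T0}‖₂ ≥ ‖υ‖₂/√(1 + k0). The dimensions and the construction are illustrative assumptions.

```python
import numpy as np

def cone_vector(p, s0, k0, rng):
    """Draw a random vector and rescale its tail so that ||v_{J^c}||_1 = k0 ||v_J||_1
    for J = indices of the s0 largest coordinates, hence v lies in C(s0, k0)."""
    v = rng.standard_normal(p)
    J = np.argsort(-np.abs(v))[:s0]
    mask = np.ones(p, dtype=bool); mask[J] = False
    tail_budget = k0 * np.abs(v[J]).sum()
    v[mask] *= tail_budget / np.abs(v[mask]).sum()   # force equality in the cone constraint
    return v

rng = np.random.default_rng(3)
p, s0, k0 = 200, 5, 3
for _ in range(5):
    v = cone_vector(p, s0, k0, rng)
    T0 = np.argsort(-np.abs(v))[:s0]                 # s0 largest coordinates of v
    lhs = np.linalg.norm(v[T0])
    rhs = np.linalg.norm(v) / np.sqrt(1 + k0)
    print(lhs >= rhs - 1e-12)                        # True: the lemma's lower bound holds
```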
Sparse eigenvalues
Definition
For m ≤ p, we define the largest and smallest m-sparse eigenvalues of a q × p matrix A as
$$\rho_{\max}(m, A) := \max_{t \in \mathbb{R}^p,\ t \neq 0,\ m\text{-sparse}} \frac{\|At\|_2^2}{\|t\|_2^2}, \qquad
\rho_{\min}(m, A) := \min_{t \in \mathbb{R}^p,\ t \neq 0,\ m\text{-sparse}} \frac{\|At\|_2^2}{\|t\|_2^2}.$$
If RE(s0 , k0 , A) is satisfied with k0 ≥ 1, then the square submatrices of
size 2s0 of AT A are necessarily positive definite, that is,
ρmin (2s0 , A) > 0.
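A brute-force sketch (not from the slides) of the m-sparse eigenvalues: over every support S of size m they are the extreme eigenvalues of the principal submatrices of A^T A, which is feasible only for small p. Matrix sizes below are illustrative.

```python
import itertools
import numpy as np

def sparse_eigs(A, m):
    """Brute-force m-sparse eigenvalues: extreme eigenvalues of the m x m
    principal submatrices of A^T A (exponential in p, for illustration only)."""
    G = A.T @ A
    p = G.shape[0]
    lo, hi = np.inf, -np.inf
    for S in itertools.combinations(range(p), m):
        w = np.linalg.eigvalsh(G[np.ix_(S, S)])
        lo, hi = min(lo, w[0]), max(hi, w[-1])
    return lo, hi                                    # (rho_min(m, A), rho_max(m, A))

rng = np.random.default_rng(4)
A = rng.standard_normal((15, 10)) / np.sqrt(15)
print(sparse_eigs(A, 3))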
Examples of matrices A that satisfy the Restricted Eigenvalue condition but not RIP (Raskutti, Wainwright, and Yu 10):
Spiked identity matrix: for a ∈ [0, 1),
$$\Sigma_{p\times p} = (1 - a)\, I_{p\times p} + a\,\vec{1}\vec{1}^T,$$
where $\vec{1} \in \mathbb{R}^p$ is the vector of all ones. Then ρmin(Σ) > 0, and for every s0 × s0 principal submatrix ΣSS we have
$$\frac{\rho_{\max}(\Sigma_{SS})}{\rho_{\min}(\Sigma_{SS})} = \frac{1 + a(s_0 - 1)}{1 - a}.$$
The largest sparse eigenvalue tends to ∞ as s0 → ∞, but $\|\Sigma^{1/2} e_j\|_2 = 1$ stays bounded.
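A short numerical companion (not part of the slides) for the spiked identity example: it compares the eigenvalue ratio of ΣSS with the closed form (1 + a(s0 − 1))/(1 − a), and checks that the diagonal of Σ, hence ‖Σ^{1/2} e_j‖₂², stays equal to 1. The value of a and the sparsity levels are illustrative.

```python
import numpy as np

a, s_values = 0.9, [5, 20, 80]
for s0 in s_values:
    Sigma_SS = (1 - a) * np.eye(s0) + a * np.ones((s0, s0))   # s0 x s0 principal submatrix
    w = np.linalg.eigvalsh(Sigma_SS)
    print(s0, w[-1] / w[0], (1 + a * (s0 - 1)) / (1 - a))     # numeric ratio vs closed form

# columns of Sigma^{1/2} stay bounded: ||Sigma^{1/2} e_j||_2^2 = Sigma_jj = 1
p = 80
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))
print(np.diag(Sigma)[:3])                                     # all equal to 1
```

The ratio grows linearly in s0, so RIP-type conditioning fails for large supports even though the RE condition survives.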
Motivation: to construct classes of design matrices such that the
Restricted Eigenvalue condition will be satisfied.
Design matrix X has just independent rows, rather than independent entries: e.g., consider, for some matrix A_{q×p},
$$X = \Psi A,$$
where the rows of the matrix Ψ_{n×q} are independent isotropic vectors with subgaussian marginals, and RE(s0, (1 + ε)k0, A) holds for some ε > 0, p > s0 ≥ 0, and k0 > 0.
Design matrix X consists of independent identically distributed rows with bounded entries, whose covariance matrix Σ(Xi) = EXi Xi^T satisfies RE(s0, (1 + ε)k0, Σ^{1/2}).
The rows of X will be sampled from some distribution in Rp; the distribution may be highly non-Gaussian and perhaps discrete.
Outline
Introduction
The main results
The reduction principle
Applications of the reduction principle
Ingredients of the proof
Conclusion
Notation
Let e1 , . . . , ep be the canonical basis of Rp . For a set
J ⊂ {1, . . . , p}, denote EJ = span {ej : j ∈ J}.
For a matrix A, we use kAk2 to denote its operator norm.
For a set V ⊂ Rp , we let conv V denote the convex hull of V .
For a finite set Y , the cardinality is denoted by |Y |.
Let $B_2^p$ and $S^{p-1}$ be the unit Euclidean ball and the unit sphere, respectively.
The reduction principle:
Theorem
Let $E = \bigcup_{|J| = d} E_J$ for d = d(3k0) < p, where
$$d(3k_0) = s_0 + s_0 \max_j \|Ae_j\|_2^2 \cdot \frac{16\, K^2(s_0, 3k_0, A)\,(3k_0)^2 (3k_0 + 1)}{\delta^2},$$
and E denotes Rp otherwise. Let $\widetilde\Psi$ be a matrix such that
$$\forall x \in AE \qquad (1 - \delta)\|x\|_2 \ \le\ \|\widetilde\Psi x\|_2 \ \le\ (1 + \delta)\|x\|_2.$$
Then RE(s0, k0, $\widetilde\Psi A$) holds with
$$0 < K(s_0, k_0, \widetilde\Psi A) \ \le\ \frac{K(s_0, k_0, A)}{1 - 5\delta}.$$
If the matrix $\widetilde\Psi$ acts as an almost isometry on the images of the d-sparse vectors under A, then the product $\widetilde\Psi A$ satisfies the RE condition with the smaller parameter k0.
Reformulation of the reduction principle:
Theorem
Restrictive Isometry. Let $\widetilde\Psi$ be a matrix such that
$$\forall x \in AE \qquad (1 - \delta)\|x\|_2 \ \le\ \|\widetilde\Psi x\|_2 \ \le\ (1 + \delta)\|x\|_2.$$
Then for any $x \in A\,\mathcal{C}(s_0, k_0) \cap S^{q-1}$,
$$(1 - 5\delta) \ \le\ \|\widetilde\Psi x\|_2 \ \le\ (1 + 3\delta).$$
If the matrix $\widetilde\Psi$ acts as an almost isometry on the images of the d-sparse vectors under A, then it acts the same way on the images of C(s0, k0).
The problem is thus reduced to checking that the almost isometry property holds for all vectors from some low-dimensional subspaces, which is easier than checking the RE property directly.
Definition: subgaussian random vectors
Let Y be a random vector in Rp.
1. Y is called isotropic if for every $y \in \mathbb{R}^p$, $\mathbb{E}\,|\langle Y, y\rangle|^2 = \|y\|_2^2$.
2. Y is ψ2 with a constant α if for every $y \in \mathbb{R}^p$,
$$\|\langle Y, y\rangle\|_{\psi_2} := \inf\{t : \mathbb{E} \exp(\langle Y, y\rangle^2 / t^2) \le 2\} \ \le\ \alpha\, \|y\|_2.$$
The ψ2 condition on a scalar random variable V is equivalent to the
subgaussian tail decay of V , which means for some constant c,
P (|V | > t) ≤ 2 exp(−t 2 /c 2 ), for all t > 0.
A random vector Y in Rp is subgaussian if the one-dimensional
marginals h Y , y i are sub-gaussian random variables for all y ∈ Rp .
The first application of the reduction principle
Let A be a q × p matrix satisfying the RE(s0, 3k0, A) condition. Let m = min(d, p), where
$$d = s_0 + s_0 \max_j \|Ae_j\|_2^2 \cdot \frac{16\, K^2(s_0, 3k_0, A)\,(3k_0)^2 (3k_0 + 1)}{\delta^2}.$$
Theorem
Let Ψ be an n × q matrix whose rows are independent isotropic ψ2 random vectors in Rq with constant α. Suppose
$$n \ \ge\ \frac{2000\, m\, \alpha^4}{\delta^2}\, \log\!\left(\frac{60\,e\,p}{m\,\delta}\right).$$
Then with probability at least $1 - 2\exp(-\delta^2 n/(2000\alpha^4))$, the RE(s0, k0, $(1/\sqrt{n})\Psi A$) condition holds with
$$0 < K(s_0, k_0, (1/\sqrt{n})\Psi A) \ \le\ \frac{K(s_0, k_0, A)}{1 - \delta}.$$
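To get a feel for the sample-size requirement, a small sketch (not from the slides) that plugs illustrative parameter values into d(3k0) and the bound above. The absolute constants 2000 and 60e in the theorem are not optimized, so the numbers are best read for their scaling n of order m log(p/m); all parameter values (K, α, δ, max_j‖Ae_j‖₂²) are assumptions chosen for illustration.

```python
import numpy as np

def sample_size(s0, K, k0, max_Aej_sq, alpha, delta, p):
    """Evaluate d(3k0), m = min(d, p) and the theorem's sample-size bound."""
    d = s0 + s0 * max_Aej_sq * 16 * K**2 * (3 * k0)**2 * (3 * k0 + 1) / delta**2
    m = min(d, p)
    n = 2000 * m * alpha**4 / delta**2 * np.log(60 * np.e * p / (m * delta))
    return d, m, n

# illustrative constants: K = RE constant of A, alpha = psi_2 constant, etc.
for s0 in (2, 5, 10):
    print(s0, sample_size(s0, K=1.0, k0=1, max_Aej_sq=1.0, alpha=1.0, delta=0.5, p=10_000))
```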
Examples of subgaussian vectors
The random vector Y with i.i.d N(0, 1) random coordinates.
A discrete Gaussian vector, i.e. a random vector taking values on the integer lattice $\mathbb{Z}^p$ with distribution $\mathbb{P}(X = m) = C \exp(-\|m\|_2^2/2)$ for $m \in \mathbb{Z}^p$.
A vector with independent centered bounded random coordinates.
In particular, vectors with random symmetric Bernoulli
coordinates, in other words, random vertices of the discrete cube.
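A small sampling sketch (not from the slides) for the three listed examples. The discrete Gaussian factorizes over coordinates, so it is sampled coordinate-wise here on a truncated range of integers, which is an approximation used only for illustration; all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 10, 2000

# (1) standard Gaussian coordinates
Y_gauss = rng.standard_normal((n, p))

# (2) discrete Gaussian on the integer lattice, truncated to [-10, 10] per coordinate
support = np.arange(-10, 11)
probs = np.exp(-support.astype(float) ** 2 / 2)
probs /= probs.sum()                                 # normalizing constant per coordinate
Y_disc = rng.choice(support, size=(n, p), p=probs)

# (3) symmetric Bernoulli (Rademacher) coordinates: random vertices of the discrete cube
Y_rad = rng.choice([-1.0, 1.0], size=(n, p))

# each ensemble is centered with covariance equal to a multiple of the identity
for Y in (Y_gauss, Y_disc, Y_rad):
    print(np.round(np.cov(Y, rowvar=False)[:2, :2], 2))
```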
Previous results on (sub)Gaussian random vectors
Raskutti, Wainwright, and Yu 10: RE(s0 , k0 , X ) holds for random
Gaussian measurements / design matrix X which consists of
n = O(s0 log p) independent copies of a Gaussian random vector
Y ∼ Np(0, Σ), assuming that the RE condition holds for Σ^{1/2}.
Their proof relies on a deep result from the theory of Gaussian
random processes – Gordon’s Minimax Lemma
To establish the RE condition for more general classes of random
matrices we had to introduce a new approach based on geometric
functional analysis, namely, the reduction principle.
The bound n = O(s0 log p) can be improved to the optimal one
n = O(s0 log(p/s0)) when RE(s0, k0, Σ^{1/2}) is replaced with RE(s0, (1 + ε)k0, Σ^{1/2}) for any ε > 0.
In Zhou 09, subgaussian random matrices of the form X = ΨΣ^{1/2} were considered, where Σ is a p × p positive semidefinite matrix: X satisfies the RE(s0, k0) condition with overwhelming probability if, for K := K(s0, k0, Σ^{1/2}),
$$n \ >\ \frac{9\, c' \alpha^4}{\delta^2}\,(2 + k_0)^2 K^2 \left( 4\,\rho_{\max}(s_0, \Sigma^{1/2})\, s_0 \log\frac{5ep}{s_0} \ \wedge\ s_0 \log p \right).$$
Analysis there used a result in Mendelson et al. 07, 08.
The current result does not involve ρmax(s0, A), nor the global parameters of the matrices A and $\widetilde\Psi$, such as the norm or the smallest singular value.
Recall the spiked identity matrix: for a ∈ [0, 1),
$$\Sigma_{p\times p} = (1 - a)\, I_{p\times p} + a\,\vec{1}\vec{1}^T,$$
which satisfies the RE condition, yet ρmax(s0, A) (here A = Σ^{1/2}) grows linearly with s0 while $\max_j \|\Sigma^{1/2} e_j\|_2 = 1$.
Design matrices with uniformly bounded entries
Let Y ∈ Rp be a random vector such that ‖Y‖∞ ≤ M a.s., and denote Σ = E YY^T. Let X be an n × p matrix whose rows X1, . . . , Xn are independent copies of Y. Set
$$d = s_0 + s_0 \max_j \|\Sigma^{1/2} e_j\|_2^2 \cdot \frac{16\, K^2(s_0, 3k_0, \Sigma^{1/2})\,(3k_0)^2 (3k_0 + 1)}{\delta^2}.$$
Theorem
Assume that d ≤ p and ρ = ρmin(d, Σ^{1/2}) > 0. Let Σ satisfy the RE(s0, 3k0, Σ^{1/2}) condition. Suppose
$$n \ \ge\ \frac{C M^2 d \log p}{\rho\,\delta^2} \cdot \log^3\!\left( \frac{C M^2 d \log p}{\rho\,\delta^2} \right).$$
Then with probability at least $1 - \exp\!\big(-\delta\rho n/(6 M^2 d)\big)$, the RE(s0, k0, X/√n) condition holds with
$$0 < K(s_0, k_0, X/\sqrt{n}) \ \le\ \frac{K(s_0, k_0, \Sigma^{1/2})}{1 - \delta}.$$
Remarks on applying the reduction principle
To analyze different classes of random design matrices:
Unlike the case of a random matrix with subgaussian marginals, the estimate of the second example contains the minimal sparse singular value ρmin(d, Σ^{1/2}).
The reconstruction of sparse signals by subgaussian design matrices or by the random Fourier ensemble was analyzed in the literature before, but only under RIP assumptions.
The reduction principle can be applied to other types of random
variables: e.g., random vectors with heavy-tailed marginals, or
random vectors with log-concave densities.
References: Rudelson-Vershynin 05, Baraniuk et al. 08, Mendelson et al. 08,
Vershynin 11a, b, Adamczak et al. 09.
Maurey’s empirical approximation argument (Pisier 81)
Let u1, . . . , uM ∈ Rq and let y ∈ conv(u1, . . . , uM):
$$y = \sum_{j \in \{1,\dots,M\}} \alpha_j u_j, \quad \text{where } \alpha_j \ge 0 \text{ and } \sum_j \alpha_j = 1.$$
There exists a set L ⊂ {1, 2, . . . , M} with
$$|L| \ \le\ m = \frac{4 \max_{j \in \{1,\dots,M\}} \|u_j\|_2^2}{\varepsilon^2}$$
and a vector y′ ∈ conv(uj, j ∈ L) such that $\|y - y'\|_2 \le \varepsilon$.
Proof. An application of the probabilistic method: if we only want to approximate y rather than represent it exactly as a convex combination of u1, . . . , uM, this is possible with far fewer points, namely at most m of them.
Let $y = \sum_{j \in \{1,\dots,M\}} \alpha_j u_j$, where $\alpha_j \ge 0$ and $\sum_j \alpha_j = 1$.
[Illustration: a cloud of points u1, . . . , uM with y inside their convex hull.]
Goal: to find a vector y′ ∈ conv(uj, j ∈ L) such that $\|y - y'\|_2 \le \varepsilon$.
Let Y be a random vector in Rq such that
$$\mathbb{P}(Y = u_\ell) = \alpha_\ell, \qquad \ell \in \{1, \dots, M\}.$$
Then $\mathbb{E}(Y) = \sum_{\ell \in \{1,\dots,M\}} \alpha_\ell u_\ell = y$.
[Illustration: the point cloud u1, . . . , uM viewed as the support of the random vector Y.]
Let Y1 , . . . , Ym be independent copies of Y and let ε1 , . . . , εm be ±1
i.i.d. mean zero Bernoulli random variables, chosen independently of
Y1 , . . . , Ym .
By the standard symmetrization argument we have
$$\mathbb{E}\left\| y - \frac{1}{m}\sum_{j=1}^m Y_j \right\|_2^2 \ \le\ 4\,\mathbb{E}\left\| \frac{1}{m}\sum_{j=1}^m \varepsilon_j Y_j \right\|_2^2 \ =\ \frac{4}{m^2}\sum_{j=1}^m \mathbb{E}\|Y_j\|_2^2 \ \le\ \frac{4 \max_{\ell \in \{1,\dots,M\}} \|u_\ell\|_2^2}{m} \ \le\ \varepsilon^2, \qquad (1)$$
where $\mathbb{E}\|Y_j\|_2^2 \le \sup \|Y_j\|_2^2 \le \max_{\ell \in \{1,\dots,M\}} \|u_\ell\|_2^2$, and the last inequality in (1) follows from the definition of m.
Fix a realization $Y_j = u_{k_j}$, $j = 1, \dots, m$, for which
$$\left\| y - \frac{1}{m}\sum_{j=1}^m Y_j \right\|_2 \ \le\ \varepsilon.$$
[Illustration: the selected points u_{k_1}, . . . , u_{k_m} and their average (1/m) ∑_{j=1}^m Y_j lying close to y.]
The vector $\frac{1}{m}\sum_{j=1}^m Y_j$ belongs to the convex hull of $\{u_\ell : \ell \in L\}$, where L is the set of distinct elements of the sequence k1, . . . , km. Obviously |L| ≤ m, and the lemma is proved. Q.E.D.
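A simulation sketch of Maurey's argument (not from the slides; sizes and the choice of points on the unit sphere are illustrative assumptions): sample m indices according to the weights α, average the corresponding points, and compare the error to the ε guaranteed by the choice of m. The bound (1) holds in expectation, so a single draw is typically, though not always, within ε.

```python
import numpy as np

rng = np.random.default_rng(6)
q, M, eps = 50, 5000, 0.25

U = rng.standard_normal((M, q))
U /= np.linalg.norm(U, axis=1, keepdims=True)        # points u_1, ..., u_M on the unit sphere
alpha = rng.dirichlet(np.ones(M))                    # convex weights alpha_j
y = alpha @ U                                        # y in conv(u_1, ..., u_M)

m = int(np.ceil(4 * (np.linalg.norm(U, axis=1) ** 2).max() / eps ** 2))  # Maurey's m
idx = rng.choice(M, size=m, p=alpha)                 # Y_1, ..., Y_m i.i.d., P(Y = u_l) = alpha_l
y_prime = U[idx].mean(axis=0)                        # empirical average, in conv(u_l : l in L)

print("m =", m, "|L| =", len(set(idx)))
print("error:", np.linalg.norm(y - y_prime), "vs eps =", eps)
```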
The Inclusion Lemma
To prove the restricted isometry of Ψ over the set of vectors in
A C(s0 , k0 ) ∩ S q−1 , we show that this set is contained in the convex
hull of the images of the sparse vectors with norms not exceeding
(1 − δ)−1 .
Lemma
Let 1 > δ > 0. Suppose the RE(s0, k0, A) condition holds for the matrix A_{q×p}. For a set J ⊂ {1, . . . , p}, let EJ = span{ej : j ∈ J}. Set
$$d = d(k_0, A) = s_0 + s_0 \max_j \|Ae_j\|_2^2 \cdot \frac{16\, K^2(s_0, k_0, A)\, k_0^2 (k_0 + 1)}{\delta^2}.$$
Then
$$A\,\mathcal{C}(s_0, k_0) \cap S^{q-1} \ \subset\ (1 - \delta)^{-1}\, \mathrm{conv}\Big( \bigcup_{|J| \le d} A E_J \cap S^{q-1} \Big),$$
where for d ≥ p, EJ is understood to be Rp.
Conclusion
We prove a general reduction principle showing that if the matrix $\widetilde\Psi$ acts as an almost isometry on the images of the sparse vectors under A, then the product $\widetilde\Psi A$ satisfies the RE condition with the smaller parameter k0.
We apply the reduction principle to analyze different classes of
random design matrices.
This analysis is reduced to checking that the almost isometry
property holds for all vectors from some low-dimensional
subspaces, which is easier than checking the RE property directly.
Thank you!