
LECTURE 5
1. Compressive sensing
This part follows [1] carefully.
The basic setting of compressed sensing is the following. Assume we have a high-dimensional signal, which we model by a vector x ∈ R^N (think of these being, for example, the Fourier coefficients of a signal). Assume further that we have some way of making "linear measurements", which is modelled by a matrix A ∈ R^{m×N} acting on x. We assume that m ≪ N and ask when we can recover x from y = Ax. Of course, in general the system of equations is vastly underdetermined, so there are infinitely many solutions. But if we impose the assumption that x is inherently sparse - i.e. only, say, s ≪ N entries of it are significantly different from zero - then things are different. We will prove some very simple results about what one needs to know about A to be able to carry this out. In practice, the signals x won't be exactly sparse, but one may hope that if the signal is nearly sparse, then pretending it is sparse could work to some degree.
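To make this concrete, here is a minimal numerical sketch of the measurement model y = Ax with an s-sparse signal. The dimensions, the Gaussian choice of A, and the variable names are illustrative assumptions, not something prescribed above.

import numpy as np

rng = np.random.default_rng(0)
N, m, s = 200, 50, 5                     # ambient dimension, number of measurements, sparsity

# Build an s-sparse signal x: s randomly placed non-zero entries.
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)

# Linear measurements: a single m x N matrix applied to x.
A = rng.standard_normal((m, N)) / np.sqrt(m)
y = A @ x                                # the m numbers we actually get to observe
print(y.shape)                           # (50,)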
We’ll be making use of different ways to measure the size of a vector in R^N, so it will be convenient to introduce some notation for this.
Definition 1. For x = (x_1, ..., x_N) ∈ R^N, write
(1)    ||x||_0 = Σ_{k=1}^N 1{x_k ≠ 0} = #{k ∈ {1, ..., N} : x_k ≠ 0},
(2)    ||x||_1 = Σ_{k=1}^N |x_k|,
and
(3)    ||x||_2 = (Σ_{k=1}^N |x_k|^2)^{1/2}.
If ||x||_0 ≤ s for some s ∈ Z_+ (or if x has at most s non-zero entries), then we say that x is s-sparse. Moreover, we define the support of a vector x ∈ R^N as the set of indices k for which x_k ≠ 0:
(4)    supp x = {k ∈ {1, ..., N} : x_k ≠ 0}.
Remark 2. Here ||x||_0 is just the number of non-zero entries x has (or in other words, the number of elements in the support of x). Even though the notation might suggest it, this is not a norm on R^N. ||x||_2 is the Euclidean length of the vector x. All of these quantities satisfy a triangle inequality: for x, y ∈ R^N
(5)    ||x + y||_i ≤ ||x||_i + ||y||_i.
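For concreteness, these quantities are easy to compute numerically; the following is an illustrative sketch (with 0-based indices, unlike the 1-based indexing used in the text).

import numpy as np

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])

norm0 = np.count_nonzero(x)              # ||x||_0, the number of non-zero entries
norm1 = np.sum(np.abs(x))                # ||x||_1
norm2 = np.sqrt(np.sum(x**2))            # ||x||_2, same as np.linalg.norm(x)
support = np.flatnonzero(x)              # supp x

print(norm0, norm1, norm2, support)      # 2 7.0 5.0 [1 3]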
If we assume that x is s-sparse, and A ∈ R^{m×N}, then it can be shown that under some conditions on s, m, N, and A (and possibly some further conditions on x), x can be recovered from Ax. In particular, under suitable conditions, it turns out that x is given by the unique minimizer of the following problem with y = Ax:
(6)    min ||z||_0 subject to Az = y.
Or in words: find the sparsest vector z ∈ R^N such that Az = y. For more information, see [1] and, in particular, the references within it.
This optimization problem is still rather problematic in that it is NP-hard, so it might be completely unreasonable to expect that one could solve it numerically. For this reason, it is natural to look for modifications of the optimization problem and hope that under suitable conditions on A, we can find an optimization problem that is efficient to solve and recovers x. It turns out that one such optimization problem is given by
(7)    argmin ||z||_1 subject to Az = y,
for which there exist efficient algorithms (argmin means that we are looking for the minimizing vector, not the minimal value of its ℓ_1-norm). The remainder of our brief review of compressed sensing will describe some conditions on A under which the unique solution to (7) with y = Ax is given by x, when x is sparse. We will then show that if A is random (and of high enough dimension), with suitable distribution, and x is sparse enough, these conditions will be satisfied with high probability.
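Problem (7) can be written as a linear program by splitting z = u − v with u, v ≥ 0, so that ||z||_1 becomes the linear objective Σ_k (u_k + v_k). The sketch below uses scipy.optimize.linprog for this; the function name basis_pursuit and the solver choice are our own illustrative assumptions, not a prescription from the text.

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to Az = y via the split z = u - v, u, v >= 0."""
    m, N = A.shape
    c = np.ones(2 * N)                     # objective: sum(u) + sum(v) = ||z||_1 at the optimum
    A_eq = np.hstack([A, -A])              # equality constraint: A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:N], res.x[N:]
    return u - v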
The first condition under which we’ll be able to recover x from this type of
problem is called the null space property (null space referring to the vectors
v for which Av = 0).
Definition 3. A matrix A ∈ R^{m×N} satisfies the null space property of order s if for all subsets S ⊂ {1, ..., N} of size s (i.e. S has s elements)
(8)    ||v_S||_1 < ||v_{S^c}||_1
for all v ∈ Ker A \ {0} = {u ∈ R^N \ {0} : Au = 0}. Here v_S denotes the R^N-valued vector whose entries are v_k for k ∈ S and zero for k ∉ S. Similarly, v_{S^c} denotes the R^N-valued vector whose entries are v_k for k ∉ S and v_k = 0 for k ∈ S.
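The null space property is a statement about every non-zero kernel vector, so it cannot be verified by finite sampling; still, a randomized search over the kernel can refute it, or give some reassurance on small examples. The following sketch (the function name and parameters are ours) tests, for each sampled kernel vector, the worst-case support S, namely the indices of its s largest-modulus entries.

import numpy as np
from scipy.linalg import null_space

def nsp_counterexample(A, s, trials=1000, seed=0):
    """Search for a non-zero v in Ker A with ||v_S||_1 >= ||v_{S^c}||_1 for some |S| = s."""
    rng = np.random.default_rng(seed)
    V = null_space(A)                       # orthonormal basis of Ker A, shape (N, d)
    if V.shape[1] == 0:
        return None                         # trivial kernel: the property holds vacuously
    for _ in range(trials):
        v = V @ rng.standard_normal(V.shape[1])
        idx = np.argsort(np.abs(v))[::-1]
        S = idx[:s]                         # worst-case S: the s largest-modulus entries of v
        if np.abs(v[S]).sum() >= np.abs(v).sum() - np.abs(v[S]).sum():
            return v                        # candidate violation of the null space property
    return None                             # no violation found among the sampled vectors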
Let us now prove that, indeed, if y = Ax with x an s-sparse vector and A satisfies the null space property of order s, then x can be recovered from the optimization problem (7).
Proposition 4. Let A ∈ R^{m×N}. Then every s-sparse vector x ∈ R^N (i.e. ||x||_0 ≤ s) is the unique solution to the minimization problem
(9)    argmin ||z||_1 subject to Az = Ax,
if and only if A satisfies the null space property of order s.
Proof. Assume first that for each s-sparse vector x ∈ R^N, the optimization problem
(10)    min ||z||_1 subject to Az = Ax
has a unique minimizer: z = x. Now take S ⊂ {1, ..., N} such that S has s elements. Also let v ∈ Ker A \ {0}. By our assumption, v_S (which is an s-sparse vector) is the unique minimizer of the minimization problem
(11)    min ||z||_1 subject to Az = Av_S.
Since v = v_S + v_{S^c} and Av = 0, we see that Av_S = A(−v_{S^c}). Also, since v ≠ 0, v_S ≠ −v_{S^c}. Thus we must have ||v_{S^c}||_1 > ||v_S||_1: if instead ||v_{S^c}||_1 ≤ ||v_S||_1, then −v_{S^c} would be a feasible vector (since Av_S = A(−v_{S^c})) distinct from v_S and with ℓ_1-norm at most ||v_S||_1, contrary to our assumption that v_S is the unique minimizer. Thus the null space property of order s holds for A.
Assume now that the null space property of order s holds for the matrix A. Let x be an s-sparse vector. Consider now some z ∈ R^N such that z ≠ x and Az = Ax. Thus v = x − z ∈ Ker A \ {0}. Set S = {k : x_k ≠ 0} (and enlarge it in an arbitrary way if ||x||_0 < s, so that S has s elements). Due to the null space property of order s (and the fact that x = x_S), we have
(12)    ||x||_1 ≤ ||x − z_S||_1 + ||z_S||_1
              = ||x_S − z_S||_1 + ||z_S||_1
              = ||v_S||_1 + ||z_S||_1
              < ||v_{S^c}||_1 + ||z_S||_1
              = ||−z_{S^c}||_1 + ||z_S||_1
              = ||z||_1,
or in other words, x is the unique minimizer of the minimization problem.
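As a quick numerical illustration of Proposition 4, one can draw a random A and a sparse x, solve (9), and compare the minimizer with x. The snippet below reuses the hypothetical basis_pursuit helper sketched after (7); whether recovery actually succeeds of course depends on A satisfying the null space property.

import numpy as np

rng = np.random.default_rng(1)
N, m, s = 100, 40, 4
A = rng.standard_normal((m, N)) / np.sqrt(m)

x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

# basis_pursuit: the hypothetical l1 solver sketched after (7).
z = basis_pursuit(A, A @ x)                # argmin ||z||_1 subject to Az = Ax
print(np.max(np.abs(z - x)))               # typically of the order of the solver tolerance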
The null space property is not very easy to check in practice, so we consider a further condition on A which will be more accessible and will imply
the null space property in some cases. To state the condition, we introduce
the concept of a restricted isometry constant.
Definition 5. The restricted isometry constant δ_s of a matrix A ∈ R^{m×N} is defined to be the smallest δ_s ≥ 0 such that
(13)    (1 − δ_s)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ_s)||x||_2^2
for all s-sparse x ∈ R^N.
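For x supported on a set S of size s, ||Ax||_2^2 − ||x||_2^2 = ⟨(A_S^T A_S − I)x_S, x_S⟩, where A_S collects the columns of A indexed by S; hence δ_s is the maximum over all such S of the largest absolute eigenvalue of A_S^T A_S − I (this is essentially part b) of Lemma 6 below). For tiny matrices this can be computed by brute force, as in the following sketch (the function name is ours; the enumeration over supports is exponential in general).

import numpy as np
from itertools import combinations

def restricted_isometry_constant(A, s):
    """Compute delta_s exactly by enumerating all supports of size s (tiny N only)."""
    N = A.shape[1]
    delta = 0.0
    for S in combinations(range(N), s):
        A_S = A[:, list(S)]
        G = A_S.T @ A_S - np.eye(s)          # restricted Gram matrix minus the identity
        delta = max(delta, np.max(np.abs(np.linalg.eigvalsh(G))))
    return delta

# Example: a small Gaussian matrix, normalized so that columns have unit length on average.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 30)) / np.sqrt(20)
print(restricted_isometry_constant(A, 2))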
The statement that we’ll prove is that when δ_{2s} is small enough, A will satisfy the null space property of order s. To prove this, we’ll need some way to estimate δ_s.
Lemma 6. Let A ∈ R^{m×N} with restricted isometry constants δ_s. Then
a) The restricted isometry constants are ordered: δ_1 ≤ δ_2 ≤ ....
b) The restricted isometry constant δ_s can be expressed in the following way:
(14)    δ_s = sup_{x ∈ T_s} |⟨(A^T A − I_{N×N})x, x⟩|,
where T_s = {x ∈ R^N : ||x||_2 = 1, ||x||_0 ≤ s}, ⟨x, y⟩ = Σ_{i=1}^N x_i y_i for x, y ∈ R^N, and I_{N×N} is the N × N identity matrix. Note that as A ∈ R^{m×N}, A^T A ∈ R^{N×N}.
c) Let u, v ∈ R^N be such that supp u and supp v are disjoint (i.e. u_k v_k = 0 for all k). Let s = ||u||_0 + ||v||_0. Then
(15)    |⟨Au, Av⟩| ≤ δ_s ||u||_2 ||v||_2.
Proof. Recall that due to our definition of a vector x being s-sparse, an s-sparse vector is also (s + 1)-sparse. Thus we see from (13) that δ_{s+1} has strictly more constraints to satisfy than δ_s, so it must be at least as large, which is the statement in a) (this result is also immediate from the representation in b)).
Note that the inequality (13) is equivalent to
(16)    | ||Ax||_2^2 − ||x||_2^2 | ≤ δ_s ||x||_2^2.
Note that we can write the left side of this as
(17)    | ⟨Ax, Ax⟩ − ⟨x, x⟩ | = |⟨(A^T A − I_{N×N})x, x⟩|.
Thus we can reformulate the definition of δ_s as
(18)    δ_s = sup_{x ∈ R^N \ {0}, ||x||_0 ≤ s} |⟨(A^T A − I_{N×N})x, x⟩| / ||x||_2^2
            = sup_{x ∈ R^N \ {0}, ||x||_0 ≤ s} |⟨(A^T A − I_{N×N}) x/||x||_2 , x/||x||_2⟩|
            = sup_{y ∈ T_s} |⟨(A^T A − I_{N×N})y, y⟩|.
This was precisely the claim in b). Finally, for c), note that due to the assumption that the supports of u and v are disjoint, ⟨u, v⟩ = 0. Thus
(19)    |⟨Au, Av⟩| = |⟨u, (A^T A − I_{N×N})v⟩|
                  = ||u||_2 ||v||_2 |⟨ u/||u||_2 , (A^T A − I_{N×N}) v/||v||_2 ⟩|.
Let us now define S = supp u ∪ supp v and V_S to be the space of vectors w ∈ R^N with supp w ⊂ S. Or in words, V_S is the space of vectors w ∈ R^N whose entries w_k are zero for k ∉ S. Write P_{V_S} for the orthogonal projection from R^N to V_S. Then look at the linear mapping H_S = P_{V_S}(A^T A − I_{N×N})P_{V_S}^T : V_S → V_S (actually P_{V_S}^T = P_{V_S}, which is the identity on V_S, but we write H_S in this way to underline that it is symmetric). This is a symmetric linear map: H_S^T = H_S. Thus its (normalized) eigenvectors (note that we are now talking about eigenvectors in V_S!) form an orthonormal basis for V_S. Let us write (φ_k)_{k=1}^s for these eigenvectors and let (λ_k)_{k=1}^s be the corresponding eigenvalues. Now if we write x, y ∈ V_S with ||x||_2 = ||y||_2 = 1 as x = Σ_{k=1}^s α_k φ_k and y = Σ_{k=1}^s β_k φ_k, where Σ_k α_k^2 = Σ_k β_k^2 = 1, we see that
(20)    |⟨x, H_S y⟩| = |Σ_{k=1}^s α_k β_k λ_k| ≤ max_k |λ_k|,
and one has equality here for x = y = φ_{k_max}, where k_max is the index satisfying |λ_{k_max}| = max_k |λ_k|. Moreover, we see that |⟨x, H_S x⟩| ≤ max_k |λ_k| and equality is achieved with x = φ_{k_max}. We conclude that
(21)    sup_{x, y ∈ V_S, ||x||_2 = ||y||_2 = 1} |⟨x, H_S y⟩| = sup_{x ∈ V_S, ||x||_2 = 1} |⟨x, H_S x⟩|.
This in turn implies that
(22)    |⟨ u/||u||_2 , (A^T A − I_{N×N}) v/||v||_2 ⟩| ≤ sup_{x ∈ V_S, ||x||_2 = 1} |⟨x, (A^T A − I_{N×N})x⟩|
                                                    ≤ sup_{x ∈ T_s} |⟨x, (A^T A − I_{N×N})x⟩|
                                                    = δ_s.
Combining with (19) yields the claim in c).
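Part c) is also easy to check numerically on a tiny example, reusing the hypothetical restricted_isometry_constant helper sketched after Definition 5.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((15, 20)) / np.sqrt(15)

u = np.zeros(20)
u[[0, 3]] = rng.standard_normal(2)         # supp u = {0, 3}
v = np.zeros(20)
v[[7, 11]] = rng.standard_normal(2)        # supp v = {7, 11}, disjoint from supp u

s = np.count_nonzero(u) + np.count_nonzero(v)          # s = 4, as in the lemma
lhs = abs((A @ u) @ (A @ v))                           # |<Au, Av>|
# restricted_isometry_constant: the hypothetical brute-force helper from the earlier sketch.
rhs = restricted_isometry_constant(A, s) * np.linalg.norm(u) * np.linalg.norm(v)
print(lhs <= rhs + 1e-12)                              # True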
We are now in a position to prove that some constraint on the restricted
isometry constants implies a suitable null space property.
Theorem 7. Suppose that for A ∈ R^{m×N}, the restricted isometry constant δ_{2s} satisfies δ_{2s} < 1/3. Then the null space property of order s is satisfied.
Proof. Fix v ∈ Ker A \ {0}. Let S_0 ⊂ {1, ..., N} be the index set of the s largest-modulus entries of v. More precisely, S_0 is a set of s elements and |v_k| ≥ |v_l| for all k ∈ S_0 and l ∉ S_0. Define S_1 in a similar manner: it is the index set of the s largest-modulus entries of v_{S_0^c} (that is, it contains the next s largest entries). We keep going on in this way, defining S_i for as long as there are entries left. We then have
(23)    v = Σ_{i=0}^∞ v_{S_i},
where S_i is an empty set for large enough i and v_{S_i} is zero in that case. Now, as v ∈ Ker A, or Av = 0, we have
(24)    Av_{S_0} = Σ_{i=1}^∞ A(−v_{S_i}).
We thus find from the definition of δ_{2s} (we could use δ_s in the first step here, but it won’t be useful later on)
(25)    ||v_{S_0}||_2^2 ≤ (1/(1 − δ_{2s})) ||Av_{S_0}||_2^2 = (1/(1 − δ_{2s})) Σ_{i=1}^∞ ⟨Av_{S_0}, A(−v_{S_i})⟩.
Now, by construction, the supports of the v_{S_i} are disjoint, so by part c) of Lemma 6 (this is where the 2s comes into play)
(26)    ⟨Av_{S_0}, A(−v_{S_i})⟩ ≤ δ_{2s} ||v_{S_0}||_2 ||v_{S_i}||_2.
Plugging this into (25) and dividing by ||v_{S_0}||_2, we find
(27)    ||v_{S_0}||_2 ≤ (δ_{2s}/(1 − δ_{2s})) Σ_{i=1}^∞ ||v_{S_i}||_2.
Now recall that we constructed the S_i so that the corresponding entries are decreasing in magnitude; in particular, every entry of v on S_{j+1} is bounded in modulus by the average modulus of the entries on S_j, so for k ∈ S_{j+1} we can write
(28)    |v_k| ≤ (1/s) Σ_{l ∈ S_j} |v_l|.
Thus
(29)    ||v_{S_{j+1}}||_2 = (Σ_{k ∈ S_{j+1}} |v_k|^2)^{1/2} ≤ (1/√s) ||v_{S_j}||_1.
Now by Cauchy-Schwarz, (27), and (29),
(30)    ||v_{S_0}||_1 = Σ_{k ∈ S_0} |1 · v_k|
                     ≤ (Σ_{k ∈ S_0} 1)^{1/2} (Σ_{k ∈ S_0} |v_k|^2)^{1/2}
                     = √s ||v_{S_0}||_2
                     ≤ (δ_{2s}/(1 − δ_{2s})) Σ_{k=1}^∞ ||v_{S_{k−1}}||_1
                     = (δ_{2s}/(1 − δ_{2s})) (||v_{S_0}||_1 + ||v_{S_0^c}||_1).
Solving for ||v_{S_0}||_1 from this, we see (using δ_{2s} < 1/3, so that 1 − δ_{2s}/(1 − δ_{2s}) > 0) that
(31)    ||v_{S_0}||_1 ≤ (1/(1 − δ_{2s}/(1 − δ_{2s}))) · (δ_{2s}/(1 − δ_{2s})) ||v_{S_0^c}||_1 < ||v_{S_0^c}||_1
for δ_{2s} < 1/3.
So far we considered only ||v_S||_1 for S = S_0, but this is obviously the worst-case scenario: for any other S with s elements, ||v_S||_1 ≤ ||v_{S_0}||_1 and ||v_{S^c}||_1 ≥ ||v_{S_0^c}||_1, so in fact we have already proven the null space property of order s.
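Theorem 7 can also be illustrated numerically on small examples, reusing the hypothetical restricted_isometry_constant and nsp_counterexample helpers sketched earlier: a violating kernel vector can only be found when δ_{2s} ≥ 1/3 (for small random matrices δ_{2s} is often well above 1/3, in which case the theorem simply makes no prediction).

import numpy as np

rng = np.random.default_rng(4)
s = 1
m, N = 30, 40
A = rng.standard_normal((m, N)) / np.sqrt(m)

delta_2s = restricted_isometry_constant(A, 2 * s)      # exact, by enumeration (tiny N only)
violation = nsp_counterexample(A, s)                   # randomized search in Ker A
# Theorem 7: if delta_2s < 1/3, then no violating kernel vector can exist.
print("delta_2s =", round(delta_2s, 3), "violation found:", violation is not None)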
References
[1] H. Rauhut: Compressive sensing and structured random matrices. Theoretical foundations and numerical methods for sparse recovery, 1–92, Radon Ser. Comput. Appl. Math., 9, Walter de Gruyter, Berlin, 2010.