Sparse Optimization Lecture: Dual Certificate in $\ell_1$ Minimization
Instructor: Wotao Yin
Department of Mathematics, UCLA
July 2013
Note scriber: Zheng Sun
Those who complete this lecture will know
• what a dual certificate for $\ell_1$ minimization is
• that a strictly complementary dual certificate guarantees exact recovery
• that it also guarantees stable recovery
What is covered

• A review of the dual certificate in the context of conic programming
• A condition that guarantees recovering a set of sparse vectors (whose entries have the same signs), not all $k$-sparse vectors
• The condition depends on $\mathrm{sign}(x^o)$, but not on $x^o$ itself or on $b$
• The condition is sufficient and necessary
• It also guarantees robust recovery against measurement errors
• The condition can be numerically verified (in polynomial time)
• The underlying techniques are Lagrange duality, strict complementarity, and LP strong duality.

Results in this lecture are drawn from various papers. For references, see:
H. Zhang, M. Yan, and W. Yin, "One condition for all: solution uniqueness and robustness of $\ell_1$-synthesis and $\ell_1$-analysis minimizations."
Lagrange dual for conic programs

Let $K_i$ be a first-orthant, second-order, or semi-definite cone. It is self-dual.
(Suppose $a, b \in K_i$. Then $a^T b \ge 0$. If $a^T b = 0$, the pair is complementary; in particular, for the one-dimensional cone $K_i = \mathbb{R}_+$, either $a = 0$ or $b = 0$.)

Primal:
$$\min_x\ c^T x \quad \text{s.t.}\ Ax = b,\ x_i \in K_i\ \forall i.$$

Lagrangian relaxation:
$$L(x; s) = c^T x + s^T (Ax - b).$$

Dual function:
$$d(s) = \min_x \{ L(x; s) : x_i \in K_i\ \forall i \} = -b^T s - \iota_{\{(A^T s + c)_i \in K_i\ \forall i\}}.$$

Dual problem:
$$\min_s\ -d(s) \iff \min_s\ b^T s \quad \text{s.t.}\ (A^T s + c)_i \in K_i\ \forall i.$$

One problem might be simpler to solve than the other; solving one might help solve the other.

Scribe notes:
• $\min_s -d(s) \Leftrightarrow \max_s d(s)$.
• $d(s) = \inf_x \{ (c + A^T s)^T x - s^T b : x_i \in K_i\ \forall i \}$. If $(c + A^T s)_i \in K_i$ for all $i$, the inner product $(c + A^T s)^T x$ is no less than zero; otherwise, the infimum crashes down to $-\infty$. Thus $d(s) = -b^T s - \iota_{\{(A^T s + c)_i \in K_i\ \forall i\}}$, where
$$\iota_{\{(A^T s + c)_i \in K_i\ \forall i\}} = \begin{cases} 0, & (A^T s + c)_i \in K_i\ \forall i, \\ \infty, & \text{otherwise.} \end{cases}$$
Dual certificate

Given that $x^*$ is primal feasible, i.e., obeying $Ax^* = b$ and $x^*_i \in K_i\ \forall i$.
Question: is $x^*$ optimal?
Answer: one does not need to compare $x^*$ to all other feasible $x$.
A dual vector $y^*$ will certify the optimality of $x^*$.
Theorem
Suppose $x^*$ is feasible (i.e., $Ax^* = b$, $x^*_i \in K_i\ \forall i$). If $s^*$ obeys
1. vanished duality gap: $-b^T s^* = c^T x^*$, and
2. dual feasibility: $(A^T s^* + c)_i \in K_i\ \forall i$,
then $x^*$ is primal optimal.

Proof. Pick any primal feasible $x$ (i.e., $Ax = b$, $x_i \in K_i\ \forall i$). We have
$$(c + A^T s^*)^T x = \sum_i \underbrace{(c + A^T s^*)_i^T}_{\in K_i} \underbrace{x_i}_{\in K_i} \ge 0,$$
and thus, due to $Ax = b$,
$$c^T x = (c + A^T s^*)^T x - (A^T s^*)^T x \ge -(A^T s^*)^T x = -b^T s^* = c^T x^*.$$
Therefore, $x^*$ is optimal.

Corollary: $(c + A^T s^*)^T x^* = 0$ and $(c + A^T s^*)_i^T x^*_i = 0\ \forall i$.

Bottom line: the dual vector $y^* = A^T s^*$ certifies the optimality of $x^*$.

Scribe notes:
• (Figure: an illustration of the "vanished gap".)
• To verify $\sum_i (c + A^T s^*)_i^T x_i \ge 0$, we use the property of $K_i$: if $a, b \in K_i$, then $a^T b \ge 0$.
• The corollary follows by substituting $x = x^*$ into the inequality above: $(A^T s^*)^T x^* = b^T s^* = -c^T x^*$, which forces $(c + A^T s^*)^T x^* = 0$.
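The two certificate conditions are just equalities and cone memberships, so they can be checked mechanically. Below is a minimal sketch for the first-orthant case $K_i = \mathbb{R}_+$ (i.e., a standard-form LP); the function name and tolerance are illustrative assumptions, not from the lecture.

```python
import numpy as np

def is_dual_certificate(A, b, c, x, s, tol=1e-8):
    """Check that s certifies the optimality of x for
    min c'x s.t. Ax = b, x >= 0 (the case K_i = R_+)."""
    primal_feasible = np.linalg.norm(A @ x - b) <= tol and np.all(x >= -tol)
    dual_feasible = np.all(A.T @ s + c >= -tol)   # (A's* + c)_i in K_i
    zero_gap = abs(c @ x + b @ s) <= tol          # -b's* = c'x*
    return primal_feasible and dual_feasible and zero_gap
```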
A related claim:

Theorem
If a primal feasible $x^*$ and a dual feasible $s^*$ have no duality gap, then $x^*$ is primal optimal and $s^*$ is dual optimal.

Reason: the primal objective value of any primal feasible $x$ is $\ge$ the dual objective value of any dual feasible $s$. Therefore, assuming both primal and dual feasibility, a pair of equal primal/dual objective values must be optimal.
Complementarity and strict complementarity

From
$$\sum_i (c + A^T s^*)_i^T x^*_i = (c + A^T s^*)^T x^* = c^T x^* + b^T s^* = 0$$
and
$$\underbrace{(c + A^T s^*)_i^T}_{\in K_i} \underbrace{x^*_i}_{\in K_i} \ge 0\ \ \forall i,$$
we get
$$(c + A^T s^*)_i^T x^*_i = 0\ \ \forall i.$$

Hence, at least one of $(c + A^T s^*)_i$ and $x^*_i$ is 0 (but they can both be zero).

If exactly one of $(c + A^T s^*)_i$ and $x^*_i$ is zero (the other is nonzero), then they are strictly complementary.

Certifying the uniqueness of $x^*$ requires a strictly complementary $s^*$.
$\ell_1$ duality and dual certificate

Primal:
$$\min_x\ \|x\|_1 \quad \text{s.t.}\ Ax = b. \tag{1}$$

Dual:
$$\max_s\ b^T s \quad \text{s.t.}\ \|A^T s\|_\infty \le 1.$$

Given a feasible $x^*$, if $s^*$ obeys
1. $\|A^T s^*\|_\infty \le 1$, and
2. $\|x^*\|_1 - b^T s^* = 0$,
then $y^* = A^T s^*$ certifies the optimality of $x^*$.

Any primal optimal $x^*$ must satisfy $\|x^*\|_1 - b^T s^* = 0$.

Scribe notes:
• This is the dual problem for basis pursuit, a special case of the conic program on the earlier slide. The LP formulation of $\min_x \{\|x\|_1 : Ax = b\}$ is
$$\min_{x_1, x_2}\ \{ 1^T x_1 + 1^T x_2 : A x_1 - A x_2 = b,\ x_1 \ge 0,\ x_2 \ge 0 \}.$$
• Since $L(x, s) = \|x\|_1 + s^T (Ax - b)$, minimizing $L(x, s)$ over $x$ amounts to $\min_x \{\|x\|_1 + s^T A x\} - s^T b$. Now $\|x\|_1 + s^T A x = \sum_i \big(|x_i| + (A^T s)_i x_i\big)$, which is bounded below if and only if $|(A^T s)_i| \le 1$ for each $i$. Thus the dual function is
$$d(s) = -s^T b - \iota_{\{\|A^T s\|_\infty \le 1\}},$$
and the dual problem is $\max_s\, -b^T s$, which after the substitution $s \to -s$ (the constraint is symmetric in the sign of $s$) becomes $\max_s\ b^T s$ s.t. $\|A^T s\|_\infty \le 1$.
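The dual is itself an LP, so a candidate dual certificate can be computed with an off-the-shelf LP solver. A minimal sketch, assuming SciPy is available ($\|A^T s\|_\infty \le 1$ is rewritten as the two-sided linear inequalities $-1 \le (A^T s)_i \le 1$):

```python
import numpy as np
from scipy.optimize import linprog

def l1_dual_certificate(A, b):
    """Solve max b's s.t. ||A's||_inf <= 1, the dual of basis pursuit."""
    m, n = A.shape
    A_ub = np.vstack([A.T, -A.T])   # A's <= 1 and -(A's) <= 1
    b_ub = np.ones(2 * n)
    res = linprog(-b, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * m)
    return res.x                    # s*; the certificate is y* = A.T @ s*
```

Usage sketch: if `x_star` solves $\min\{\|x\|_1 : Ax = b\}$, then `np.linalg.norm(x_star, 1)` should match `b @ l1_dual_certificate(A, b)` up to solver tolerance.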
$\ell_1$ duality and complementarity

$|a| \le 1 \implies ab \le |b|$. If $ab = |b|$, then
1. $|a| < 1 \Rightarrow b = 0$
2. $a = 1 \Rightarrow b \ge 0$
3. $a = -1 \Rightarrow b \le 0$

From $\|A^T s^*\|_\infty \le 1$, we get $\|x^*\|_1 = b^T s^* = (A^T s^*)^T x^* \le \|x^*\|_1$ and
$$(A^T s^*)_i \cdot x^*_i = |x^*_i|\ \ \forall i.$$
Therefore,
1. if $|(A^T s^*)_i| < 1$, then $x^*_i = 0$
2. if $(A^T s^*)_i = 1$, then $x^*_i \ge 0$
3. if $(A^T s^*)_i = -1$, then $x^*_i \le 0$

Strict complementarity holds if, for each $i$, $1 - |(A^T s^*)_i|$ or $x^*_i$ is zero but not both.

Scribe note: $(A^T s^*)_i \cdot x^*_i = |x^*_i|$ implies that $(A^T s^*)_i$ and $x^*_i$ have the same sign, so $|x^*_i| - (A^T s^*)_i \cdot x^*_i = (1 - |(A^T s^*)_i|)\,|x^*_i|$. The definition of strict complementarity requires that $1 - |(A^T s^*)_i|$ or $x^*_i$ be zero, but not both.
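These three cases give a direct numerical test for strict complementarity of a primal-dual pair; a minimal sketch (the tolerance and function name are illustrative):

```python
import numpy as np

def strictly_complementary(x, y, tol=1e-8):
    """Given primal x* and y* = A's*, check that for each i exactly one of
    1 - |y_i| and x_i vanishes. Indices with both zero are degenerate."""
    slack = 1.0 - np.abs(y)                        # 1 - |(A's*)_i| >= 0
    degenerate = (slack <= tol) & (np.abs(x) <= tol)
    violated = (slack > tol) & (np.abs(x) > tol)   # contradicts optimality
    return not degenerate.any() and not violated.any()
```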
Uniqueness of $x^*$

Suppose $x^*$ is a solution to the basis pursuit model.
Question: is it the unique solution?

Define $I := \mathrm{supp}(x^*) = \{i : x^*_i \ne 0\}$ and $J = I^c$.

If $s^*$ is a dual certificate and $\|(A^T s^*)_J\|_\infty < 1$, then $x_J = 0$ for all optimal $x$.

For $i \in I$, $(A^T s^*)_i = \pm 1$ cannot determine whether $x_i \ne 0$ for optimal $x$. It is possible that $(A^T s^*)_i = \pm 1$ yet $x_i = 0$ (this is called degenerate).

On the other hand, if $A_I x_I = b$ has a unique solution, denoted by $x^*_I$, then since $x^*_J = 0$ is unique, $x^* = [x^*_I; x^*_J] = [x^*_I; 0]$ is the unique solution to the basis pursuit model.

$A_I x_I = b$ has a unique solution provided that $A_I$ has independent columns, or equivalently, $\ker(A_I) = \{0\}$.

(Figure: another illustration of the problem.)
Optimality and uniqueness

Condition
For a given $\bar x$, the index sets $I = \mathrm{supp}(\bar x)$ and $J = I^c$ satisfy
1. $\ker(A_I) = \{0\}$;
2. there exists $y$ such that $y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\bar x_I)$, and $\|y_J\|_\infty < 1$.

Comments:
• part 1 guarantees a unique $x^*_I$ as the solution to $A_I x_I = b$
• part 2 guarantees $x^*_J = 0$
• $y \in \mathcal{R}(A^T)$ means $y = A^T s$ for some $s$
• the condition involves $I$ and $\mathrm{sign}(\bar x_I)$, not the values of $\bar x_I$ or $b$; but different $I$ and $\mathrm{sign}(\bar x_I)$ require a different condition
• RIP guarantees that the condition holds for all small $I$ and arbitrary signs
• the condition is easy to verify (see the sketch after this slide)

Scribe note: Now we care about whether the optimal solution $x^*$ is unique. Suppose $\|(A^T s^*)_J\|_\infty < 1$; then all the entries on $J$ are forced to be zero, so uniqueness is determined by the property of $A_I$. If $\ker(A_I) = \{0\}$, the nonhomogeneous equation $A_I x_I = b$ has a unique solution, so the entries on $I$ are also uniquely determined.
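Verifying the Condition reduces to a rank check for part 1 plus a single LP for part 2: minimize $t$ over $(s, t)$ subject to $(A^T s)_I = \mathrm{sign}(\bar x_I)$ and $|(A^T s)_j| \le t$ for $j \in J$; part 2 holds iff the optimal $t < 1$. A minimal sketch, assuming SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def verify_condition(A, x_bar, tol=1e-8):
    """Check the Condition at x_bar: (1) ker(A_I) = {0};
    (2) exists y = A's with y_I = sign(x_bar_I) and ||y_J||_inf < 1."""
    m, n = A.shape
    I = np.flatnonzero(np.abs(x_bar) > tol)
    J = np.setdiff1d(np.arange(n), I)
    part1 = np.linalg.matrix_rank(A[:, I]) == len(I)
    # LP over (s, t): min t  s.t.  (A's)_I = sign(x_bar_I), |(A's)_j| <= t
    c = np.r_[np.zeros(m), 1.0]
    A_eq = np.c_[A[:, I].T, np.zeros(len(I))]
    b_eq = np.sign(x_bar[I])
    A_ub = np.r_[np.c_[ A[:, J].T, -np.ones(len(J))],
                 np.c_[-A[:, J].T, -np.ones(len(J))]]
    b_ub = np.zeros(2 * len(J))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * m + [(0, None)])
    part2 = res.success and res.fun < 1 - tol
    return part1 and part2
```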
Theorem
Suppose $\bar x$ obeys $A\bar x = b$ and the above Condition. Then $\bar x$ is the unique solution to $\min\{\|x\|_1 : Ax = b\}$.

In fact, the converse is also true; namely, the Condition is also necessary.

Scribe note: To ensure uniqueness, $A_I$ should be thin, at most square. In fact, for a good recovery, we usually require the number of columns of $A_I$ (the sparsity of the signal) to be fewer than $(m+1)/2$, which can be interpreted through the concept of "spark".
Uniqueness of $x^*$

Part 1, $\ker(A_I) = \{0\}$, is necessary.

Lemma
If $0 \ne h \in \ker(A_I)$, then $x_\alpha = x^* + \alpha [h; 0]$ is optimal for all sufficiently small $\alpha$.

Proof.
• $x_\alpha$ is feasible since $Ax_\alpha = Ax^* = b$.
• We know $\|x_\alpha\|_1 \ge \|x^*\|_1$, but for small $\alpha$ around 0 (so that no sign of $x^*_I$ flips), we also have
$$\|x_\alpha\|_1 = \|x^*_I + \alpha h\|_1 = (A^T s^*)_I^T (x^*_I + \alpha h) = \|x^*\|_1 + \alpha (A^T s^*)_I^T h.$$
• Since $\alpha$ may take either sign, $(A^T s^*)_I^T h = 0$ and thus $\|x_\alpha\|_1 = \|x^*\|_1$. So $x_\alpha$ is also optimal.
Necessity

Is part 2 necessary?

Introduce
$$\min_y\ \|y_J\|_\infty \quad \text{s.t.}\ y \in \mathcal{R}(A^T),\ y_I = \mathrm{sign}(\bar x_I). \tag{2}$$

If the optimal objective value is $< 1$, then there exists $y$ obeying part 2, so part 2 is also necessary.

We shall translate (2) and rewrite $y \in \mathcal{R}(A^T)$.

(Figure: the intersection of the hyperplane and the diamond.)

Scribe note: $\|y_J\|_\infty$ determines the stability of the signal recovery. As the figure suggests, $\|y_J\|_\infty$ is related to the angle $\theta$ between the hyperplane and the diamond: the smaller $\|y_J\|_\infty$ is, the larger the angle $\theta$ is, and the more stable a recovery we may have.
Necessity

Define $a = [\mathrm{sign}(\bar x_I); 0]$ and a basis $Q$ of $\mathrm{Null}(A)$.

If $a \in \mathcal{R}(A^T)$, set $y = a$; done.

Otherwise, let $y = a + z$. Then
• $y \in \mathcal{R}(A^T) \Leftrightarrow Q^T y = 0 \Leftrightarrow Q^T z = -Q^T a$
• $y_I = \mathrm{sign}(\bar x_I) = a_I \Leftrightarrow z_I = 0$
• $a_J = 0 \Rightarrow \|y_J\|_\infty = \|z_J\|_\infty$

Equivalent problem:
$$\min_z\ \|z_J\|_\infty \quad \text{s.t.}\ Q^T z = -Q^T a,\ z_I = 0. \tag{3}$$

If the optimal objective value is $< 1$, then part 2 is necessary.
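As a sanity check, (3) can be solved with the same LP machinery once a nullspace basis is computed (scipy.linalg.null_space returns an orthonormal basis of $\ker(A)$ as columns); its optimal value should match that of the LP for (2), since the two problems are equivalent. A minimal sketch:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

def solve_problem_3(A, x_bar, tol=1e-8):
    """Solve (3): min ||z_J||_inf s.t. Q'z = -Q'a, z_I = 0,
    where a = [sign(x_bar_I); 0] and the columns of Q span Null(A)."""
    n = A.shape[1]
    I = np.flatnonzero(np.abs(x_bar) > tol)
    J = np.setdiff1d(np.arange(n), I)
    a = np.zeros(n)
    a[I] = np.sign(x_bar[I])
    Q = null_space(A)                      # n x d
    d = Q.shape[1]
    # variables (z_J, t); z_I = 0 has been eliminated
    c = np.r_[np.zeros(len(J)), 1.0]
    A_eq = np.c_[Q[J, :].T, np.zeros(d)]   # Q'z = Q_J' z_J
    b_eq = -Q.T @ a
    A_ub = np.r_[np.c_[ np.eye(len(J)), -np.ones(len(J))],
                 np.c_[-np.eye(len(J)), -np.ones(len(J))]]
    b_ub = np.zeros(2 * len(J))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * len(J) + [(0, None)])
    return res.fun                         # part 2 verified iff this is < 1
```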
Necessity

Theorem (LP strong duality)
If a linear program has a finite solution, its Lagrange dual has a finite solution. The two solutions achieve the same primal and dual optimal objectives.

Problem (3) is feasible and has a finite objective value. The dual of (3) is
$$\max_p\ (Q^T a)^T p \quad \text{s.t.}\ \|(Qp)_J\|_1 \le 1.$$

If its optimal objective value is $< 1$, then part 2 is necessary.
Necessity

Lemma
If $x^*$ is unique, then the optimal objective of the following primal-dual pair of problems is strictly less than 1:
$$\min_z\ \|z_J\|_\infty \quad \text{s.t.}\ Q^T z = -Q^T a,\ z_I = 0,$$
$$\max_p\ (Q^T a)^T p \quad \text{s.t.}\ \|(Qp)_J\|_1 \le 1.$$

Proof.
Uniqueness of $x^*$ $\implies$ $\forall\, h \in \ker(A) \setminus \{0\}$, $\|x^*\|_1 < \|x^* + h\|_1$ $\implies$ $a_I^T h_I < \|h_J\|_1$ (apply the strict inequality to $\pm\varepsilon h$ with $\varepsilon > 0$ small enough that no sign of $x^*_I$ flips).

Therefore,
• if $p^* = 0$, then $\|z^*_J\|_\infty = (Q^T a)^T p^* = 0$;
• if $p^* \ne 0$, then $h := Qp^* \in \ker(A) \setminus \{0\}$ obeys
$$\|z^*_J\|_\infty = (Q^T a)^T p^* = a_I^T h_I < \|h_J\|_1 = \|(Qp^*)_J\|_1 \le 1.$$

In both cases, the optimal objective value is $< 1$.
Theorem
Suppose $\bar x$ obeys $A\bar x = b$. Then $\bar x$ is the unique solution to $\min\{\|x\|_1 : Ax = b\}$ if and only if the Condition holds.

Comments:
• the uniqueness result requires strong duality for problems involving $\|z_J\|_\infty$
• strong duality does not hold for all convex programs
• strong duality does hold for convex polyhedral functions $f(z_J)$, as well as for problems with constraint qualifications (e.g., the Slater condition)
• indeed, the theorem generalizes to analysis $\ell_1$ minimization: $\|\Psi^T x\|_1$
• does it generalize to $\sum_i \|x_{G_i}\|_2$ or $\|X\|_*$? the key is strong duality for $\|\cdot\|_2$ and $\|\cdot\|_*$
• also, the theorem generalizes to the noisy $\ell_1$ models (next part)
Noisy measurements

Suppose $b$ is contaminated by noise: $b = Ax + w$.

Appropriate models to recover a sparse $x$ include
$$\min_x\ \lambda\|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2 \tag{4}$$
$$\min_x\ \|x\|_1 \quad \text{s.t.}\ \|Ax - b\|_2 \le \delta \tag{5}$$

Theorem
Suppose $\bar x$ is a solution to either (4) or (5). Then $\bar x$ is the unique solution if and only if the Condition holds for $\bar x$.

Key intuition: reduce (4) to (1) with a specific $b$. Let $\hat x$ be any solution to (4) and $b^* := A\hat x$. All solutions to (4) are solutions to
$$\min_x\ \|x\|_1 \quad \text{s.t.}\ Ax = b^*.$$
The same applies to (5). Recall that the Condition does not involve $b$.
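For completeness, model (4) can be minimized by proximal gradient descent (ISTA); this is one standard choice of solver, not one prescribed by the lecture. A minimal NumPy sketch (the step size, iteration count, and handling of $\lambda$ are illustrative):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, iters=5000):
    """Solve model (4): min_x lam*||x||_1 + 0.5*||Ax - b||_2^2."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2    # step size 1/||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - t * (A.T @ (A @ x - b)), t * lam)
    return x
```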
Stable recovery

Assumptions:
• $\bar x$ and $y$ satisfy the Condition; $\bar x$ is the original signal.
• $b = A\bar x + w$, where $\|w\|_2 \le \delta$.
• $x^*$ is the solution to
$$\min_x\ \|x\|_1 \quad \text{s.t.}\ \|Ax - b\|_2 \le \delta.$$

Goal: obtain a bound $\|x^* - \bar x\|_2 \le C\delta$, where the constant $C$ shall be independent of $\delta$.
Stable recovery

Define $I = \mathrm{supp}(\bar x)$ and $J = I^c$.

Lemma
$$\|x^* - \bar x\|_1 \le C_3 \delta + C_4 \|x^*_J\|_1,$$
where $C_3 = 2\sqrt{|I|}\, r(I)$ and $C_4 = \|A\|\sqrt{|I|}\, r(I) + 1$.

Proof.
$$\|x^* - \bar x\|_1 = \|x^*_I - \bar x_I\|_1 + \|x^*_J\|_1.$$
First, $\|x^*_I - \bar x_I\|_1 \le \sqrt{|I|}\cdot\|x^*_I - \bar x_I\|_2 \le \sqrt{|I|}\cdot r(I)\cdot\|A_I (x^*_I - \bar x_I)\|_2$, where
$$r(I) := \sup_{\mathrm{supp}(u) = I,\ u \ne 0} \frac{\|u\|}{\|Au\|}$$
($r(I)$ is related to one side of the RIP bound).

Introduce $\hat x = [x^*_I; 0]$. Then
$$\|A_I(x^*_I - \bar x_I)\|_2 = \|A(\hat x - \bar x)\|_2 \le \|A(\hat x - x^*)\|_2 + \underbrace{\|A(x^* - \bar x)\|_2}_{\le\, 2\delta},$$
where $\|A(x^* - \bar x)\|_2 \le 2\delta$ because both $\|Ax^* - b\|_2 \le \delta$ and $\|A\bar x - b\|_2 = \|w\|_2 \le \delta$, and
$$\|A(\hat x - x^*)\|_2 \le \|A\|\,\|\hat x - x^*\|_2 \le \|A\|\,\|\hat x - x^*\|_1 = \|A\|\,\|x^*_J\|_1.$$
Combining these inequalities proves the lemma.
Stable recovery

Recall that, in the Condition, $y_I = \mathrm{sign}(\bar x_I)$ and $\|y_J\|_\infty < 1$. Hence
$$\|x^*_I\|_1 \ge \langle y_I, x^*_I \rangle$$
and
$$\|x^*_J\|_1 \le (1 - \|y_J\|_\infty)^{-1} \big(\|x^*_J\|_1 - \langle y_J, x^*_J \rangle\big).$$
Therefore,
$$\|x^*_J\|_1 \le (1 - \|y_J\|_\infty)^{-1} \big(\|x^*\|_1 - \langle y, x^* \rangle\big) = (1 - \|y_J\|_\infty)^{-1}\, d_y(x^*, \bar x),$$
where
$$d_y(x^*, \bar x) = \|x^*\|_1 - \|\bar x\|_1 - \langle y, x^* - \bar x \rangle$$
is the Bregman distance induced by $\|\cdot\|_1$ (the two expressions agree since $\langle y, \bar x \rangle = \|\bar x\|_1$).

Recall that, in the Condition, $y \in \mathcal{R}(A^T)$, so $y = A^T \beta$ for some vector $\beta$. Since $\|x^*\|_1 \le \|\bar x\|_1$ and $\|A(x^* - \bar x)\|_2 \le 2\delta$,
$$d_y(x^*, \bar x) \le \langle \beta, A(\bar x - x^*) \rangle \le 2\|\beta\|_2 \delta.$$

Lemma
Under the above assumptions,
$$\|x^*_J\|_1 \le 2(1 - \|y_J\|_\infty)^{-1} \|\beta\|_2\, \delta.$$
Stable recovery

Assumptions:
• $\bar x$ and $y$ satisfy the Condition; $\bar x$ is the original signal; $y = A^T\beta$.
• $b = A\bar x + w$, where $\|w\|_2 \le \delta$.
• $x^*$ is the solution to
$$\min_x\ \|x\|_1 \quad \text{s.t.}\ \|Ax - b\|_2 \le \delta.$$

Theorem
Conclusion:
$$\|x^* - \bar x\|_1 \le C\delta, \quad \text{where}\quad C = 2\sqrt{|I|}\, r(I) + \frac{2\|\beta\|_2 \big(\|A\|\sqrt{|I|}\, r(I) + 1\big)}{1 - \|y_J\|_\infty}.$$

Comment: a similar bound can be obtained for $\min_x\ \lambda\|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2$ with a condition on $\lambda$.

Scribe note: if $\|y_J\|_\infty$ approaches 1, the constant $C$ blows up.
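The constant $C$ is computable from the problem data: when $\ker(A_I) = \{0\}$, $r(I) = 1/\sigma_{\min}(A_I)$, and $\|A\|$ is the spectral norm. A minimal sketch (it assumes $y$ and $\beta$ are already available, e.g., from the certificate LP above):

```python
import numpy as np

def stability_constant(A, I, y, beta):
    """C such that ||x* - x_bar||_1 <= C*delta (stable-recovery theorem).
    r(I) = 1/sigma_min(A_I): the sup of ||u||/||Au|| over supp(u) = I is
    attained at the smallest singular value of A_I."""
    J = np.setdiff1d(np.arange(A.shape[1]), I)
    r_I = 1.0 / np.linalg.svd(A[:, I], compute_uv=False).min()
    k = np.sqrt(len(I)) * r_I
    normA = np.linalg.norm(A, 2)
    return 2 * k + 2 * np.linalg.norm(beta) * (normA * k + 1) / (1 - np.abs(y[J]).max())
```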
Generalization

All the previous results (exact and stable recovery) generalize to the following models:
$$\min_x\ \|\Psi^T x\|_1 \quad \text{s.t.}\ Ax = b$$
$$\min_x\ \lambda\|\Psi^T x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2$$
$$\min_x\ \|\Psi^T x\|_1 \quad \text{s.t.}\ \|Ax - b\|_2 \le \delta$$

Assuming that $A$ and $\Psi$ each have independent rows, the Condition updates to:

Condition
For a given $\bar x$, the index sets $I = \mathrm{supp}(\Psi^T \bar x)$ and $J = I^c$ satisfy
1. $\ker(\Psi_J^T) \cap \ker(A) = \{0\}$;
2. there exists $y$ such that $\Psi y \in \mathcal{R}(A^T)$, $y_I = \mathrm{sign}(\Psi_I^T \bar x)$, and $\|y_J\|_\infty < 1$.
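Part 2 of the generalized Condition is again LP-checkable: minimize $t$ over $(y_J, s, t)$ subject to $\Psi_J y_J - A^T s = -\Psi_I\, \mathrm{sign}(\Psi_I^T \bar x)$ and $|y_j| \le t$ for $j \in J$. A minimal sketch under the same SciPy assumption as before (here $\Psi \in \mathbb{R}^{n \times p}$, so $\Psi^T x \in \mathbb{R}^p$):

```python
import numpy as np
from scipy.optimize import linprog

def verify_generalized_part2(A, Psi, x_bar, tol=1e-8):
    """Part 2 of the generalized Condition: exists y with Psi y in R(A'),
    y_I = sign((Psi' x_bar)_I), and ||y_J||_inf < 1."""
    n, p = Psi.shape
    m = A.shape[0]
    w = Psi.T @ x_bar
    I = np.flatnonzero(np.abs(w) > tol)
    J = np.setdiff1d(np.arange(p), I)
    nJ = len(J)
    # min t over (y_J, s, t)  s.t.  Psi_J y_J - A's = -Psi_I sign(w_I), |y_j| <= t
    c = np.r_[np.zeros(nJ + m), 1.0]
    A_eq = np.c_[Psi[:, J], -A.T, np.zeros(n)]
    b_eq = -Psi[:, I] @ np.sign(w[I])
    A_ub = np.r_[np.c_[ np.eye(nJ), np.zeros((nJ, m)), -np.ones(nJ)],
                 np.c_[-np.eye(nJ), np.zeros((nJ, m)), -np.ones(nJ)]]
    b_ub = np.zeros(2 * nJ)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * (nJ + m) + [(0, None)])
    return res.success and res.fun < 1 - tol
```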