Complexity Analysis beyond Convex Optimization
Yinyu Ye
K. T. Li Professor of Engineering
Department of Management Science and Engineering
Stanford University
http://www.stanford.edu/~yyye
August 1, 2013
Outline
Applications arising from Non-Convex Regularization
Theory of the Lp-norm Regularization
Selected Complexity Results for Non-Convex Optimization
High-Level Complexity Analyses for a Few Cases
Open Questions
Unconstrained L2+Lp Minimization

Consider the problem:

    minimize_{x ∈ R^n}  f_{2p}(x) := ||Ax − b||_2^2 + λ||x||_p^p,    (1)

where data A ∈ R^{m×n}, b ∈ R^m, parameter 0 ≤ p ≤ 1, and

    ||x||_p^p = Σ_{j=1}^n |x_j|^p,    ||x||_0 := |{j : x_j ≠ 0}|,

that is, ||x||_0 is the number of nonzero entries in x.

A more general model: for q ≥ 1,

    minimize_{x ∈ R^n}  f_{qp}(x) := ||Ax − b||_q^q + λ||x||_p^p.
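For concreteness, a minimal numerical sketch of objective (1); the instance below is hypothetical and numpy is assumed:

```python
import numpy as np

def f2p(x, A, b, lam, p):
    """L2+Lp objective (1): ||Ax - b||_2^2 + lam * ||x||_p^p, for 0 < p <= 1."""
    r = A @ x - b
    return r @ r + lam * np.sum(np.abs(x) ** p)

# Hypothetical sparse-recovery instance with m < n.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[:3] = 1.0                          # a 3-sparse signal
b = A @ x_true
print(f2p(x_true, A, b, lam=0.1, p=0.5))  # = lam * 3 at the true signal
```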
Constrained Lp Minimization

Consider another problem:

    minimize    Σ_{1≤j≤n} x_j^p
    subject to  Ax = b,  x ≥ 0,    (2)

or

    minimize    Σ_{1≤j≤n} |x_j|^p
    subject to  Ax = b.    (3)
Application and Motivation

The original goal is to minimize ||x||_0 = |{j : x_j ≠ 0}|, the size of
the support set of x, such that Ax = b, for

    Sparse image reconstruction
    Sparse signal recovery
    Sensor network localization

which is known to be an NP-hard problem.
Approximation of ||x||_0

||x||_1 has been used to approximate ||x||_0, and the regularization
can be exact under certain strong conditions (Donoho 2004, Candès and
Tao 2005, etc.). This regularization model is actually a linear program.

Theoretical and empirical computational results indicate that ||x||_p
regularization, say with p = 0.5, has better performance under weaker
conditions, and it is solvable equally efficiently in practice
(Chartrand 2009, Xu et al. 2009, etc.).
The Hardness of Lp (Ge et al. 2011, Chen et al. 2012)

Question: is L2 + Lp minimization easier than L2 + L0 minimization?

Theorem
Deciding the global minimum of the Lq + Lp optimization problem is
strongly NP-hard for any given 0 ≤ p < 1, q ≥ 1 and λ > 0.

Nevertheless, practitioners solve them using non-linear solvers to
compute a KKT solution...
Recovery Result: L0.5-Norm vs. L1-Norm
Figure: Frequency of successful recovery vs. sparsity for Gaussian
random matrices, comparing L1 and Lp regularization.
Sensor Network Localization
Given a graph G = (V , E ) and sets of non–negative weights, say
{dij : (i , j) ∈ E }, the goal is to compute a realization of G in the
Euclidean space Rd for a given low dimension d , i.e.
to place the vertexes of G in Rd such that
the Euclidean distance between a pair of adjacent vertexes
(i , j) equals to (or bounded by) the prescribed weight dij ∈ E .
Yinyu Ye
ICCOPT 2013
Figure: 50-node 2-D Sensor Localization.
Application to SNL

SNL-SDP:

    minimize    Σ_{(i,j)∈E} α_ij^2
    subject to  (e_i − e_j)(e_i − e_j)^T • Y = d_ij^2 + α_ij, ∀ (i, j) ∈ E,
                Y ⪰ 0.

Regularized SNL-SDP:

    minimize    Σ_{(i,j)∈E} α_ij^2 + λ||Y||_p^p
    subject to  (e_i − e_j)(e_i − e_j)^T • Y = d_ij^2 + α_ij, ∀ (i, j) ∈ E,
                Y ⪰ 0.
Schatten p-norm (Ji et al. 2013)

For any given symmetric matrix Y ∈ S^n,

    ||Y||_p = ( Σ_j |λ(Y)_j|^p )^{1/p},   0 < p ≤ 1,

is known as the Schatten p-quasi-norm of Y.

When p = 1, it is called the nuclear norm.

The Schatten p-quasi-norm has several nice analytical properties
that make it a natural candidate for a regularizer.
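As a quick illustration, a sketch assuming numpy (for symmetric Y the singular values are the absolute eigenvalues):

```python
import numpy as np

def schatten_p(Y, p):
    """Schatten p-quasi-norm of symmetric Y: (sum_j |lambda_j(Y)|^p)^(1/p),
    0 < p <= 1; p = 1 recovers the nuclear norm."""
    eigvals = np.linalg.eigvalsh(Y)            # eigenvalues of symmetric Y
    return np.sum(np.abs(eigvals) ** p) ** (1.0 / p)

Y = np.diag([4.0, 1.0, 0.0])                   # rank-2 example
print(schatten_p(Y, 1.0))                      # nuclear norm: 5.0
print(schatten_p(Y, 0.5))                      # (2 + 1)^2 = 9.0
```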
Recovery Result: Schatten 0.5-Norm vs. Nuclear Norm
Figure: Recovered vs. exact sensor positions (with anchors) under the
two regularizers.

Figure: Number of exactly recovered cases (50 sensors) vs. number of
extra edges added, comparing the Trace(Y) and Trace(Y^p) regularizers.
Theory of L2+Lp Minimization I

Theorem
(The first-order bound) Let x* be any first-order KKT point and let

    L_i = ( λp / (2 ||a_i|| √(f_{2p}(x*))) )^{1/(1−p)}.

Then, for any i ∈ N,

    x*_i ∈ (−L_i, L_i)  ⇒  x*_i = 0.

"Lower Bound Theory of Nonzero Entries in Solutions of L2-Lp
Minimization" (Chen, Xu and Y.), SIAM J. Scientific Computing
32:5 (2010) 2832-2852.
Theory of L2+Lp Minimization II

Theorem
(The second-order bound) Let x* be any second-order KKT point and let

    L_i = ( λp(1−p) / (2||a_i||^2) )^{1/(2−p)},   i ∈ N.

Then
(1) for any i ∈ N,  x*_i ∈ (−L_i, L_i)  ⇒  x*_i = 0;
(2) the support columns of x* are linearly independent.
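The second-order bound suggests a simple post-processing step: entries of a computed KKT point below the threshold L_i can be set exactly to zero. A minimal sketch assuming numpy; the random x_star below is only a hypothetical stand-in for a solver-produced KKT point:

```python
import numpy as np

def second_order_threshold(A, lam, p):
    """Per-coordinate threshold L_i = (lam*p*(1-p) / (2*||a_i||^2))^(1/(2-p))
    from the second-order bound; |x*_i| < L_i forces x*_i = 0."""
    col_norms_sq = np.sum(A * A, axis=0)       # ||a_i||^2 for each column
    return (lam * p * (1 - p) / (2 * col_norms_sq)) ** (1.0 / (2 - p))

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x_star = 1e-6 * rng.standard_normal(50)        # hypothetical KKT point
L = second_order_threshold(A, lam=0.1, p=0.5)
x_star[np.abs(x_star) < L] = 0.0               # prune to exact zeros
print(np.count_nonzero(x_star))
```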
More Theoretical Developments ...

Markowitz Portfolio Model:

    minimize    (1/2) x^T Q x + c^T x
    subject to  e^T x = 1,  x ≥ 0.

Regularized Markowitz Portfolio Model:

    minimize    (1/2) x^T Q x + c^T x + λ||x||_p^p
    subject to  e^T x = 1,  x ≥ 0.
Linearly Constrained Optimization Problem

(LCOP)
    minimize    f(x)
    subject to  Ax = b,  x ≥ 0.

The first-order KKT conditions:

    x_j (∇f(x) − A^T y)_j = 0, ∀j,
    Ax = b,
    ∇f(x) − A^T y ≥ 0,  x ≥ 0.

First-order ε-KKT solution: |x_j (∇f(x) − A^T y)_j| ≤ ε for all j.

Second-order ε-KKT solution: additionally, the Hessian in the null
space of the active constraints is ε-positive-semidefinite.
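A sketch of the first-order ε-KKT test, assuming the gradient and a multiplier estimate y are available (numpy assumed; tol is a hypothetical numerical tolerance for the exact conditions):

```python
import numpy as np

def is_first_order_eps_kkt(x, y, grad_f, A, b, eps, tol=1e-9):
    """Check the first-order eps-KKT conditions of LCOP at (x, y)."""
    s = grad_f - A.T @ y                        # reduced gradient
    primal_ok = np.allclose(A @ x, b, atol=tol) and np.all(x >= -tol)
    dual_ok = np.all(s >= -tol)                 # grad f(x) - A'y >= 0
    comp_ok = np.max(np.abs(x * s)) <= eps      # |x_j s_j| <= eps for all j
    return primal_ok and dual_ok and comp_ok
```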
Iteration Bound for an ε-KKT Solution

    log log(ε^{-1}):      [Y 1992] Ball-IQP (smooth)
    ε^{-1} log(ε^{-1}):   [Y 1998] IQP (smooth); [Ge et al 2011]
                          (constrained Lp)
    ε^{-3/2}:             [Nesterov et al 2006], [Cartis et al 2011]
                          (smooth); [Bian et al 2012] (non-Lipschitz)
    ε^{-2}:               [Vavasis 1991], [Nesterov 2004], [Gratton et al
                          2008] (smooth); [Vavasis 1991], [Cartis et al
                          2011] (Lipschitz); [Bian et al 2012]
                          (non-Lipschitz)
    ε^{-3} log(ε^{-1}):   [Garmanjani et al 2012] (Lipschitz);
                          [Bian et al 2012] (non-Lipschitz)

Table: Selected worst-case complexity results for nonconvex optimization
Ball or Sphere-Constrained Indefinite QP

(BQP)
    minimize    (1/2) x^T Q x + c^T x
    subject to  ||x||^2 = (≤) 1.

The solution x of problem (BQP) satisfies the following necessary
and sufficient conditions (S-Lemma): for some multiplier μ,

    (Q + μI) x = −c,   (Q + μI) ⪰ 0,   and   ||x|| = 1.

This is an SDP problem, and the simplest trust-region sub-problem
(Moré, Sorensen, Dennis and Schnabel, etc., 1980s).
The Bisection Method

For any μ > −λ(Q), where λ(Q) is the minimal eigenvalue of Q,
denote by x(μ) the solution to

    (Q + μI) x = −c.

Assume ||x(−λ(Q))|| > 1 (the other case can be handled easily)
and note that

    μ ≤ −λ(Q) + ||c||.

Thus, one can apply the bisection method to find the right μ and
solve the problem in polynomial time, in log(ε^{-1}) steps.

We can do it in log-polynomial time using the Steve Smale 1986
work on Newton's method ...
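A minimal sketch of the bisection, assuming numpy and the easy case ||x(−λ(Q))|| > 1 described above; on this interval ||x(μ)|| is decreasing in μ:

```python
import numpy as np

def bqp_bisection(Q, c, tol=1e-12, max_iter=100):
    """Sphere-constrained QP via bisection on mu: find mu in
    (-lambda_min(Q), -lambda_min(Q) + ||c||] with ||x(mu)|| = 1,
    where (Q + mu*I) x(mu) = -c."""
    n = Q.shape[0]
    lam_min = np.linalg.eigvalsh(Q)[0]
    lo, hi = -lam_min, -lam_min + np.linalg.norm(c)
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        x = np.linalg.solve(Q + mu * np.eye(n), -c)
        if np.linalg.norm(x) > 1.0:
            lo = mu            # step too long: increase mu
        else:
            hi = mu            # step too short: decrease mu
        if hi - lo < tol:
            break
    return x, mu
```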
Combined Bisection and Newton’s Method
Potential Reduction Algorithm for LCOP

Consider the (concave+convex) Karmarkar potential function

    φ(x) = ρ log(f(x)) − Σ_{j=1}^n log x_j,

where we assume that f(x) is nonnegative in the feasible region.

We start from the analytic center x^0 of the feasible region F_p, so
that if

    φ(x^k) − φ(x^0) ≤ ρ log ε,    (4)

then

    f(x^k) / f(x^0) ≤ ε,

which implies that x^k is an ε-global minimizer.
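A small sketch of the potential and the stopping test (4), numpy assumed:

```python
import numpy as np

def karmarkar_potential(x, f_val, rho):
    """phi(x) = rho*log(f(x)) - sum_j log(x_j); needs x > 0 and f(x) > 0."""
    return rho * np.log(f_val) - np.sum(np.log(x))

def certifies_eps_global(phi_k, phi_0, rho, eps):
    """Test (4): since x^0 is the analytic center, phi(x^k) - phi(x^0)
    <= rho*log(eps) implies f(x^k)/f(x^0) <= eps."""
    return phi_k - phi_0 <= rho * np.log(eps)
```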
Quadratic Over-Estimate of Potential Function I

Consider

    f(x) = q(x) = (1/2) x^T Q x + c^T x.

Given 0 < x ∈ F_p, let Δ = q(x) and let d_x, with A d_x = 0, be a
vector such that x^+ := x + d_x > 0. Then the non-convex part

    ρ log(q(x^+)) − ρ log(q(x))
        = ρ log(Δ + (1/2) d_x^T Q d_x + (Qx + c)^T d_x) − ρ log Δ
        = ρ log(1 + ((1/2) d_x^T Q d_x + (Qx + c)^T d_x) / Δ)
        ≤ (ρ/Δ) ((1/2) d_x^T Q d_x + (Qx + c)^T d_x),

using log(1 + t) ≤ t.
Quadratic Over-Estimate of Potential Function II

On the other hand, if ||X^{-1} d_x|| ≤ β < 1, the convex part

    − Σ_{j=1}^n log(x_j^+) + Σ_{j=1}^n log(x_j)
        ≤ −e^T X^{-1} d_x + β^2 / (2(1−β)).

Thus, if ||X^{-1} d_x|| ≤ β < 1, then x^+ = x + d_x > 0 and

    φ(x^+) − φ(x) ≤ (ρ/Δ) ((1/2) d_x^T Q d_x + (Qx + c − (Δ/ρ) X^{-1} e)^T d_x)
                    + β^2 / (2(1−β)).    (5)
A Ball-Constrained Quadratic Subproblem I

We solve the following problem at the k-th iteration:

    minimize    (1/2) d_x^T Q d_x + (Q x^k + c − (Δ^k/ρ)(X^k)^{-1} e)^T d_x
    subject to  A d_x = 0,
                ||(X^k)^{-1} d_x||^2 ≤ β^2.

Using the affine scaling d = (X^k)^{-1} d_x, this problem can be
reduced to the ball-constrained quadratic program, where the radius
of the ball is β.
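A sketch of that reduction, assuming numpy, full-row-rank A, and an interior iterate x > 0; g stands for the linear term Qx^k + c − (Δ^k/ρ)(X^k)^{-1}e, and Z below is any orthonormal null-space basis of AX (here via SVD), so d_x = X Z u and the ellipsoidal constraint becomes ||u|| ≤ β:

```python
import numpy as np

def reduce_to_ball_qp(Q, g, A, x):
    """Rewrite  min (1/2) d'Qd + g'd  s.t. A d = 0, ||X^{-1} d|| <= beta
    as a ball-constrained QP in u, where d = X Z u."""
    X = np.diag(x)
    _, s, Vt = np.linalg.svd(A @ X)
    rank = int(np.sum(s > 1e-12))
    Z = Vt[rank:].T                     # orthonormal basis of null(A X)
    Q_red = Z.T @ X @ Q @ X @ Z         # reduced Hessian
    g_red = Z.T @ (X @ g)               # reduced linear term
    # Remaining problem: min (1/2) u'Q_red u + g_red'u  s.t. ||u|| <= beta,
    # solvable, e.g., with the bisection sketch above (after rescaling).
    return Q_red, g_red, Z
```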
Complexity Analysis

Each iteration either makes a constant reduction of the potential,
or not. In the latter case, the new iterate x^+ becomes a
second-order ε-KKT solution with a suitable choice of ρ.

Theorem
Let β = 1/3 and ρ = 3q(x^0)/ε. Then the potential reduction
algorithm returns a second-order ε-KKT solution or an ε-global
minimizer in no more than O((q(x^0)/ε) log(q(x^0)/ε)) iterations.

This type of algorithm is called a fully polynomial-time
approximation scheme.
The PRA for Concave Minimization

The case when f(x) is a concave function is even easier.

Let x^+ = x + d_x. Then, we have a linear over-estimate of the
potential function:

    φ(x^+) − φ(x) ≤ ((ρ/f(x)) ∇f(x)^T − e^T X^{-1}) d_x + β^2 / (2(1−β)),

as long as ||X^{-1} d_x|| ≤ β.

Let the affine scaling be d = X^{-1} d_x. Then, one can solve:

    z(d) := minimize    ((ρ/f(x)) ∇f(x)^T X − e^T) d
            subject to  A X d = 0,
                        ||d||^2 ≤ β^2.
Affine Scaling Direction

The optimal direction of the affine scaling sub-problem is given by

    d = (β / ||p(x)||) · p(x),

where

    p(x) = −(I − X A^T (A X^2 A^T)^{-1} A X) ((ρ/f(x)) X ∇f(x) − e)
         = e − (ρ/f(x)) X (∇f(x) − A^T y).

And the minimal value of the sub-problem is

    z(d) = −β · ||p(x)||.
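A minimal sketch of one such step, assuming numpy, full-row-rank A, and an interior iterate x > 0; the least-squares multiplier computed below agrees with the slide's y up to the ρ/f(x) scaling:

```python
import numpy as np

def affine_scaling_direction(x, grad_f, f_val, A, rho, beta):
    """Compute p(x) and the step d_x = X d of the concave-case PRA."""
    X = np.diag(x)
    AX = A @ X
    g = (rho / f_val) * (X @ grad_f) - np.ones_like(x)
    y = np.linalg.solve(AX @ AX.T, AX @ g)   # least-squares multiplier
    p = -(g - AX.T @ y)                      # projection of -g onto null(A X)
    d = (beta / np.linalg.norm(p)) * p       # optimal scaled direction
    return X @ d, p                          # step d_x, and p(x) for the test

# If ||p(x)|| >= 1, take the step (constant potential reduction);
# otherwise x is already an eps-KKT point for suitable rho (next slides).
```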
Complexity Analysis I

If ||p(x)|| ≥ 1, then the minimal objective value of the affine
scaling sub-problem is at most −β, so that

    φ(x^+) − φ(x) ≤ −β + β^2 / (2(1−β)).

Thus, the potential value is reduced by a constant for a suitable
choice of β.

If this case were to hold for O(ρ log(f(x^0)/ε)) iterations, we would
have produced an ε-global minimizer of LCOP.
Complexity Analysis II

On the other hand, if ||p(x)|| < 1, then from

    p(x) = e − (ρ/f(x)) X (∇f(x) − A^T y),

we must have

    (ρ/f(x)) X (∇f(x) − A^T y) ≥ 0

and

    (ρ/f(x)) X (∇f(x) − A^T y) ≤ 2e.
Complexity Analysis III

In other words,

    ∇f(x) − A^T y ≥ 0

and

    x_j (∇f(x) − A^T y)_j < 2 f(x) / ρ,  ∀j.

The first condition indicates that the Lagrange multiplier y is
valid, and the second inequality implies that the complementarity
condition is approximately satisfied when ρ is chosen sufficiently
large.
Complexity Analyses IV

In particular, if we choose ρ ≥ 2 f(x^0) / ε, then

    ||X (∇f(x) − A^T y)||_∞ ≤ ε,

which implies that x is a first-order ε-KKT solution.

Theorem
The algorithm then will provably return a first-order ε-KKT solution
of LCOP in no more than O((f(x^0)/ε) log(f(x^0)/ε)) iterations for
any given ε < 1, if the objective function is concave.
More Applications and Questions

Could the time bound be further improved for QP or concave
minimization?

Would the O(ε^{-1} log ε^{-1}) bound be applicable to more general
non-convex optimization problems? We are not even able to prove the
O(ε^{-1} log ε^{-1}) bound when

    f(x) = q(x) + ||x||_p^p.

More structural properties of the final KKT solution.

Applications to general sparse-solution optimization, such as
cardinality-constrained portfolio selection, sparse pricing reduction
for revenue management, etc.