Sparse Approximation via Penalty Decomposition Methods
Zhaosong Lu
Simon Fraser University
SIAM Conference on Imaging Science / May 20, 2012
Joint work with Yong Zhang (SFU)
Outline
Some sparse decision problems
Penalty decomposition methods for l0 minimization
Computational results
Sparse least squares regression

Consider a regression model:

    y = f(ξ, x) + η.

Given a data set consisting of n points (ξ^i, y_i), i = 1, ..., n, the least squares model is:

    min_x  Σ_{i=1}^n (y_i − f(ξ^i, x))².

The sparse least squares models are:

    min_x  Σ_{i=1}^n (y_i − f(ξ^i, x))² + ν‖x‖₀,

    min_x  { Σ_{i=1}^n (y_i − f(ξ^i, x))² : ‖x‖₀ ≤ r }.
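The sketch below is a minimal illustration (an assumption, not part of the talk) of how the two sparse least squares objectives can be evaluated for a linear model f(ξ, x) = ξᵀx; the stacked data matrix Xi, the tolerance tol, and NumPy itself are conventions introduced here for illustration.

# Hypothetical evaluation of the l0-penalized and l0-constrained least squares
# objectives for a linear model f(xi, x) = xi^T x.
import numpy as np

def l0_norm(x, tol=1e-10):
    """Number of entries of x whose magnitude exceeds tol, i.e. ||x||_0."""
    return int(np.sum(np.abs(x) > tol))

def penalized_objective(Xi, y, x, nu):
    """sum_i (y_i - xi_i^T x)^2 + nu * ||x||_0, with the points xi_i stacked as rows of Xi."""
    residual = y - Xi @ x
    return float(residual @ residual) + nu * l0_norm(x)

def constrained_objective(Xi, y, x, r):
    """sum_i (y_i - xi_i^T x)^2 if ||x||_0 <= r; infeasible points are reported as inf."""
    if l0_norm(x) > r:
        return np.inf
    residual = y - Xi @ x
    return float(residual @ residual)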
Sparse logistic regression

Given scaled samples {a^1, ..., a^n} and binary outcomes b_1, ..., b_n, the average logistic loss function is defined as

    l_avg(v, w) := (1/n) Σ_{i=1}^n θ(wᵀa^i + v b_i),

where θ is the logistic loss function

    θ(t) := log(1 + exp(−t)).

The sparse logistic regression models are:

    min_{v,w}  l_avg(v, w) + ν‖w‖₀,

    min_{v,w}  { l_avg(v, w) : ‖w‖₀ ≤ r }.
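As a small sketch (an assumption on the data layout, not from the slides), the average logistic loss can be evaluated as follows; the matrix A stacking the scaled samples a^i row-wise is a hypothetical convention, and np.logaddexp is used for numerical stability.

# Average logistic loss l_avg(v, w) = (1/n) * sum_i theta(w^T a_i + v * b_i),
# with theta(t) = log(1 + exp(-t)).
import numpy as np

def average_logistic_loss(A, b, v, w):
    """A: n-by-p matrix of scaled samples (rows a_i); b: length-n vector of +/-1 outcomes."""
    t = A @ w + v * b
    # log(1 + exp(-t)) computed stably as logaddexp(0, -t)
    return float(np.mean(np.logaddexp(0.0, -t)))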
Sparse inverse covariance selection

Given a sample covariance matrix Σ̂ and a set Ω consisting of pairs of known conditionally independent nodes, the sparse inverse covariance selection models are:

    max_{X ≻ 0}  { log det X − ⟨Σ̂, X⟩ − ν Σ_{(i,j)∈Ω̄} ‖X_ij‖₀ : X_ij = 0 ∀(i,j) ∈ Ω },

    max_{X ≻ 0}  { log det X − ⟨Σ̂, X⟩ : X_ij = 0 ∀(i,j) ∈ Ω,  Σ_{(i,j)∈Ω̄} ‖X_ij‖₀ ≤ r },

where Ω̄ = {(i,j) : (i,j) ∉ Ω, i ≠ j}.
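A minimal sketch (an assumption, not the talk's code) of evaluating the l0-penalized objective for a candidate symmetric matrix X; representing Ω as a Python set of index pairs is a convention introduced here.

# log det X - <Sigma_hat, X> - nu * (number of nonzero off-diagonal entries outside Omega)
import numpy as np

def sicov_objective(X, Sigma_hat, Omega, nu, tol=1e-10):
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        return -np.inf                       # X must be positive definite
    loglik = logdet - np.sum(Sigma_hat * X)  # <Sigma_hat, X> = trace(Sigma_hat X) for symmetric X
    n = X.shape[0]
    card = sum(1 for i in range(n) for j in range(n)
               if i != j and (i, j) not in Omega and abs(X[i, j]) > tol)
    return loglik - nu * card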
l0 minimization

Consider

    min_{x∈X}  { f(x) : g(x) ≤ 0, h(x) = 0, ‖x_J‖₀ ≤ r },          (1)

    min_{x∈X}  { f(x) + ν‖x_J‖₀ : g(x) ≤ 0, h(x) = 0 }.            (2)

Assumption:
◮ X is a closed convex set.
◮ f : ℜ^n → ℜ, g : ℜ^n → ℜ^m, h : ℜ^n → ℜ^p are continuously differentiable.
◮ A feasible point x^feas is known.
Special l0 minimization

Proposition
Consider

    min { φ(x) = Σ_{i=1}^n φ_i(x_i) : ‖x‖₀ ≤ r, x ∈ X_1 × ··· × X_n }.    (3)

Suppose that
◮ 0 ∈ X_i for all i;
◮ x̃_i* ∈ Arg min{ φ_i(x_i) : x_i ∈ X_i };
◮ I* is the index set corresponding to the r largest values of {φ_i(0) − φ_i(x̃_i*)}_{i=1}^n.
Then, x* is an optimal solution of (3), where

    x_i* = x̃_i*  if i ∈ I*;   x_i* = 0  otherwise,   i = 1, ..., n.
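A minimal sketch (assumed NumPy setup, not from the slides) of the closed-form rule above: given the one-dimensional minimizers x̃_i* and the values φ_i(0) and φ_i(x̃_i*) as arrays, keep the r coordinates with the largest reductions and zero out the rest.

import numpy as np

def solve_separable_l0_constrained(phi_at_zero, phi_at_min, x_tilde, r):
    """phi_at_zero[i] = phi_i(0), phi_at_min[i] = phi_i(x_tilde[i]); all inputs are 1-D arrays."""
    gain = phi_at_zero - phi_at_min        # reduction from using x_tilde[i] instead of 0
    I_star = np.argsort(gain)[::-1][:r]    # indices of the r largest reductions
    x_star = np.zeros_like(x_tilde)
    x_star[I_star] = x_tilde[I_star]       # keep the r most valuable coordinates, zero the rest
    return x_star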
Special l0 minimization

Proposition
Consider

    min { ν‖x‖₀ + Σ_{i=1}^n φ_i(x_i) : x ∈ X_1 × ··· × X_n }.    (4)

Suppose that
◮ 0 ∈ X_i for all i;
◮ x̃_i* ∈ Arg min{ φ_i(x_i) : x_i ∈ X_i };
◮ v_i* = φ_i(0) − ν − φ_i(x̃_i*) for i = 1, ..., n.
Then, x* is an optimal solution of (4), where

    x_i* = x̃_i*  if v_i* ≥ 0;   x_i* = 0  otherwise,   i = 1, ..., n.
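Similarly, a minimal sketch (assumed NumPy setup) of the rule for the penalized problem (4): a coordinate keeps its one-dimensional minimizer whenever the reduction φ_i(0) − φ_i(x̃_i*) outweighs the penalty ν.

import numpy as np

def solve_separable_l0_penalized(phi_at_zero, phi_at_min, x_tilde, nu):
    """Keep x_tilde[i] whenever v_i = phi_i(0) - nu - phi_i(x_tilde[i]) >= 0, else set 0."""
    v = phi_at_zero - nu - phi_at_min
    return np.where(v >= 0, x_tilde, 0.0)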
Necessary conditions for general l0 min (1)

Theorem
Assume that x* is a local minimizer of (1). Let J* ⊆ J be an index set with |J*| = r such that x_j* = 0 for all j ∈ J̄* := J \ J*. Suppose that the Robinson condition

    { (g′(x*)d − v, h′(x*)d, (I_{J̄*})ᵀd) : d ∈ T_X(x*), v ∈ ℜ^m, v_i ≤ 0, i ∈ A(x*) } = ℜ^m × ℜ^p × ℜ^{|J|−r}    (5)

holds. Then, there exists (λ*, µ*, z*) together with x* satisfying the KKT conditions

    −∇f(x*) − ∇g(x*)λ* − ∇h(x*)µ* − z* ∈ N_X(x*),
    λ_i* ≥ 0,  λ_i* g_i(x*) = 0,  i = 1, ..., m;    z_j* = 0,  j ∈ J̄ ∪ J*,    (6)

where J̄ := {1, ..., n} \ J.
Necessary conditions for general l0 min (2)

Theorem
Assume that x* is a local minimizer of (2). Let J* = {j ∈ J : x_j* ≠ 0} and J̄* = J \ J*. Suppose that the following Robinson condition

    { (g′(x*)d − v, h′(x*)d, (I_{J̄*})ᵀd) : d ∈ T_X(x*), v ∈ ℜ^m, v_i ≤ 0, i ∈ A(x*) } = ℜ^m × ℜ^p × ℜ^{|J̄*|}    (7)

holds. Then, there exists (λ*, µ*, z*) together with x* satisfying the KKT conditions (6).
Sufficient conditions for general l0 min

Theorem
Assume that the h's are affine functions, and f and the g's are convex functions. Let x* be a feasible point of (1), and let 𝒥* = {J* ⊆ J : |J*| = r, x_j* = 0 ∀j ∈ J \ J*}. Suppose that for any J* ∈ 𝒥*, there exists some (λ*, µ*, z*) such that the KKT conditions (6) hold. Then, x* is a local minimizer of (1).

Theorem
Assume that the h's are affine functions, and f and the g's are convex functions. Let x* be a feasible point of (2), and let J* = {j ∈ J : x_j* ≠ 0}. Suppose that for such J*, there exists some (λ*, µ*, z*) such that the KKT conditions (6) hold. Then, x* is a local minimizer of (2).
PD method for general l0 min (1)

Observe

    min_{x∈X}  { f(x) : g(x) ≤ 0, h(x) = 0, ‖x_J‖₀ ≤ r }
        ⇕
    min_{x∈X, y∈Y}  { f(x) : g(x) ≤ 0, h(x) = 0, x_J − y = 0 },    (8)

where

    Y = { y ∈ ℜ^{|J|} : ‖y‖₀ ≤ r }.

Define

    q_ρ(x, y) = f(x) + (ρ/2) ( ‖[g(x)]₊‖² + ‖h(x)‖² + ‖x_J − y‖² )    ∀ x, y.
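A small sketch (an assumption on how f, g, h, and J are represented, not the authors' code) of evaluating the quadratic penalty function q_ρ defined above.

import numpy as np

def q_rho(f, g, h, J, x, y, rho):
    """q_rho(x, y) = f(x) + (rho/2) * (||[g(x)]_+||^2 + ||h(x)||^2 + ||x_J - y||^2).

    f, g, h are callables returning a scalar, an m-vector, and a p-vector; J is an index array."""
    g_plus = np.maximum(g(x), 0.0)          # componentwise positive part [g(x)]_+
    viol = (np.dot(g_plus, g_plus)
            + np.dot(h(x), h(x))
            + np.dot(x[J] - y, x[J] - y))
    return f(x) + 0.5 * rho * viol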
PD method for (1)

Choose ρ_0 > 0, σ > 1, y^0_0 ∈ Y, and Υ ≥ max{ f(x^feas), min_{x∈X} q_{ρ_0}(x, y^0_0) }. Set k = 0.

1) Set l = 0. Find an approximate solution (x^k, y^k) to

       min{ q_{ρ_k}(x, y) : x ∈ X, y ∈ Y }

   by performing:
   1a) Solve x^k_{l+1} ∈ Arg min_{x∈X} q_{ρ_k}(x, y^k_l).
   1b) Solve y^k_{l+1} ∈ Arg min_{y∈Y} q_{ρ_k}(x^k_{l+1}, y).
   1c) Set (x^k, y^k) := (x^k_{l+1}, y^k_{l+1}). If (x^k, y^k) satisfies

           ‖P_X(x^k − ∇_x q_{ρ_k}(x^k, y^k)) − x^k‖ ≤ ε_k,

       then go to step 2).
   1d) Set l ← l + 1 and go to step 1a).
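Because q_ρ depends on y only through ‖x_J − y‖², step 1b) reduces to hard thresholding by the proposition for problem (3): keep the r largest-magnitude entries of x_J and zero the rest. A minimal sketch (assumed NumPy setup):

import numpy as np

def hard_threshold(xJ, r):
    """Closest point to xJ with at most r nonzero entries."""
    y = np.zeros_like(xJ)
    keep = np.argsort(np.abs(xJ))[::-1][:r]  # indices of the r largest magnitudes
    y[keep] = xJ[keep]
    return y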
PD method for (1)

2) Set ρ_{k+1} := σ ρ_k.
3) If min_{x∈X} q_{ρ_{k+1}}(x, y^k) > Υ, set y^{k+1}_0 := x^feas_J. Otherwise, set y^{k+1}_0 := y^k.
4) Set k ← k + 1 and go to step 1).
end
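To make the whole loop concrete, here is a minimal sketch (an assumption, not the authors' implementation) of the PD method specialized to l0-constrained least squares, min{ ‖Ax − b‖² : ‖x‖₀ ≤ r }, i.e. X = ℜ^n, J = {1, ..., n}, and no g or h constraints; the parameter values (ρ_0, σ, the tolerance schedule) and the feasible point x^feas = 0 are hypothetical choices.

import numpy as np

def hard_threshold(x, r):
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[::-1][:r]
    y[keep] = x[keep]
    return y

def pd_l0_least_squares(A, b, r, rho0=1.0, sigma=np.sqrt(10.0), eps0=1e-1,
                        max_outer=30, max_inner=200):
    n = A.shape[1]
    f = lambda x: float(np.dot(A @ x - b, A @ x - b))
    x_feas = np.zeros(n)                     # feasible point: ||0||_0 <= r
    y = x_feas.copy()                        # y^0_0
    Upsilon = f(x_feas)                      # here f(0) = ||b||^2 dominates min_x q_rho0(x, y^0_0)
    rho, eps = rho0, eps0
    AtA, Atb = A.T @ A, A.T @ b
    x = x_feas.copy()
    for _ in range(max_outer):
        for _ in range(max_inner):
            # 1a) x-step: min_x ||Ax - b||^2 + (rho/2)||x - y||^2 is a ridge-type linear solve
            x = np.linalg.solve(2.0 * AtA + rho * np.eye(n), 2.0 * Atb + rho * y)
            # 1b) y-step: min_{||y||_0 <= r} ||x - y||^2 is hard thresholding
            y = hard_threshold(x, r)
            # 1c) with X = R^n the projection P_X is the identity, so the test is a gradient norm
            grad_x = 2.0 * (AtA @ x - Atb) + rho * (x - y)
            if np.linalg.norm(grad_x) <= eps:
                break
        # 2) increase the penalty parameter
        rho *= sigma
        # 3) safeguard: if min_x q_rho(x, y) exceeds Upsilon, restart y from the feasible point
        x_test = np.linalg.solve(2.0 * AtA + rho * np.eye(n), 2.0 * Atb + rho * y)
        if f(x_test) + 0.5 * rho * np.dot(x_test - y, x_test - y) > Upsilon:
            y = x_feas.copy()
        eps = max(0.1 * eps, 1e-8)           # drive the inner tolerance epsilon_k toward zero
    return hard_threshold(x, r)              # report a feasible (r-sparse) point

# Example usage (hypothetical data):
# A = np.random.randn(100, 50); x_true = np.zeros(50); x_true[:5] = 1.0
# b = A @ x_true; x_hat = pd_l0_least_squares(A, b, r=5)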
Convergence theorem for (1)

Assume that ε_k → 0. Let I_k = {i^k_1, ..., i^k_r} be a set of r distinct indices in {1, ..., |J|} such that (y^k)_i = 0 for any i ∉ I_k, and let J_k = {J(i) : i ∈ I_k}. Suppose that X_Υ := {x ∈ X : f(x) ≤ Υ} is compact. Then, the following statements hold:
(a) {(x^k, y^k)} is bounded.
(b) Suppose (x*, y*) is a limit point of {(x^k, y^k)}. Then, x* is a feasible point of (1). Moreover, there exists a subsequence K such that {(x^k, y^k)}_{k∈K} → (x*, y*), I_k = I* and J_k = J* := {J(i) : i ∈ I*} for some index set I* ⊆ {1, ..., |J|} when k ∈ K is sufficiently large.
(c) Let x*, K and J* be defined above, and let J̄* = J \ J*. Suppose that the Robinson condition (5) holds at x* for such J̄*. Then, {(λ^k, µ^k, ϖ^k)}_{k∈K} is bounded, where

    λ^k = ρ_k [g(x^k)]₊,    µ^k = ρ_k h(x^k),    ϖ^k = ρ_k (x^k_J − y^k).
Convergence theorem for (1) (cont'd)

    Moreover, each limit point (λ*, µ*, ϖ*) of {(λ^k, µ^k, ϖ^k)}_{k∈K} together with x* satisfies the KKT conditions (6) with z_j* = ϖ_i* for all j = J(i) ∈ J̄*.
(d) Further, if ‖x*_J‖₀ = r, the h's are affine functions, and f and the g's are convex functions, then x* is a local minimizer of (1).
Convergence theorem for (2)

Assume that ε_k → 0. Suppose that X_Υ := {x ∈ X : f(x) ≤ Υ} is compact. Then, the following statements hold:
(a) {(x^k, y^k)} is bounded.
(b) Suppose (x*, y*) is a limit point of {(x^k, y^k)}. Then, x* is a feasible point of problem (2).
(c) Let (x*, y*) be defined above. Suppose that {(x^k, y^k)}_{k∈K} → (x*, y*) for some subsequence K. Let J* = {j ∈ J : x_j* ≠ 0} and J̄* = J \ J*. Assume that the Robinson condition (7) holds at x* for such J̄*. Then, {(λ^k, µ^k, ϖ^k)}_{k∈K} is bounded, where

    λ^k = ρ_k [g(x^k)]₊,    µ^k = ρ_k h(x^k),    ϖ^k = ρ_k (x^k_J − y^k).

    Moreover, each limit point (λ*, µ*, ϖ*) of {(λ^k, µ^k, ϖ^k)}_{k∈K} together with x* satisfies the KKT conditions (6) with z_j* = ϖ_i* for all j = J(i) ∈ J̄*.
(d) Further, if the h's are affine functions, and f and the g's are convex functions, then x* is a local minimizer of (2).
Sparse logistic regression (real data)

    min_{v,w}  { l_avg(v, w) : ‖w‖₀ ≤ r },
    min_{v,w}  l_avg(v, w) + λ‖w‖₁

◮ compare the solution quality of the l0 and l1 models

Table: Computational results on three real data sets

Data    p     n     | λ/λmax  r    | SLEP: lavg  Error (%)  Time  | PD: lavg  Error (%)  Time
Colon   2000  62    | 0.5     7    | 0.4398      17.74      0.2   | 0.4126    12.9       9.1
                    | 0.1     22   | 0.1326      1.61       0.5   | 0.0150    0          6.0
                    | 0.05    25   | 0.0664      0          0.6   | 0.0108    0          5.0
                    | 0.01    28   | 0.0134      0          1.3   | 0.0057    0          5.4
Iono    34    351   | 0.5     3    | 0.4804      17.38      0.1   | 0.3466    13.39      0.7
                    | 0.1     11   | 0.3062      11.40      0.1   | 0.2490    9.12       1.0
                    | 0.05    14   | 0.2505      9.12       0.1   | 0.2002    8.26       1.1
                    | 0.01    24   | 0.1846      6.55       0.4   | 0.1710    5.98       1.7
Ad      1430  2359  | 0.5     3    | 0.2915      12.04      2.3   | 0.2578    7.21       31.9
                    | 0.1     36   | 0.1399      4.11       14.2  | 0.1110    4.11       56.0
                    | 0.05    67   | 0.1042      2.92       21.6  | 0.0681    2.92       74.1
                    | 0.01    197  | 0.0475      1.10       153.0 | 0.0249    1.10       77.4
Sparse logistic regression (random data)

Table: Computational results on random data sets

Size (n × p)   | λ/λmax  r      | SLEP: lavg  Error (%)  Time | PD: lavg  Error (%)  Time
1000 × 2000    | 0.9     17.0   | 0.6411      9.76       0.4  | 0.2145    8.49       9.9
               | 0.7     52.9   | 0.5090      3.96       1.0  | 0.0588    2.66       20.0
               | 0.5     96.6   | 0.3838      2.23       1.7  | 0.0060    0.02       34.9
               | 0.3     138.7  | 0.2611      1.22       2.1  | 0.0022    0          25.5
               | 0.1     192.0  | 0.1228      0.31       2.0  | 0.0013    0          16.0
2000 × 1000    | 0.9     11.0   | 0.6441      11.46      0.4  | 0.2763    10.67      15.2
               | 0.7     42.8   | 0.5083      3.63       1.1  | 0.0376    1.49       38.9
               | 0.5     78.0   | 0.3776      1.65       2.0  | 0.0032    0          34.4
               | 0.3     115.5  | 0.2490      0.6        2.6  | 0.0015    0          25.3
               | 0.1     160.8  | 0.1056      0.03       3.1  | 0.0010    0          15.8
1000 × 1000    | 0.9     11.7   | 0.6417      11.00      0.1  | 0.2444    9.67       2.3
               | 0.7     37.2   | 0.5086      3.95       0.2  | 0.0572    2.46       5.8
               | 0.5     67.6   | 0.3805      2.15       0.3  | 0.0060    0.01       6.2
               | 0.3     100.1  | 0.2544      0.81       0.4  | 0.0016    0          4.6
               | 0.1     137.9  | 0.1124      0.12       0.5  | 0.0011    0          3.3
Sparse inverse covariance selection (real data)

    max_{X ≻ 0}  { log det X − ⟨Σ̂, X⟩ : X_ij = 0 ∀(i,j) ∈ Ω,  Σ_{(i,j)∈Ω̄} ‖X_ij‖₀ ≤ r },

    max_{X ≻ 0}  { log det X − ⟨Σ̂, X⟩ − ρ Σ_{(i,j)∈Ω̄} |X_ij| : X_ij = 0 ∀(i,j) ∈ Ω }.

Table: Computational results on two real data sets

Genes (p)  Samples (n)  | ρ     r       | PPA: Likelihood  Loss   Time  | PD: Likelihood  Loss   Time
587        148          | 0.01  144294  | 790.12           23.24  101.5 | 1035.24         22.79  38.0
                        | 0.05  67474   | 174.86           24.35  85.2  | 716.97          23.27  31.5
                        | 0.10  38504   | −47.03           24.73  66.7  | 389.65          23.85  26.1
                        | 0.50  4440    | −561.38          25.52  33.2  | −260.32         24.91  24.8
                        | 0.70  940     | −642.05          25.63  26.9  | −511.70         25.30  22.0
                        | 0.90  146     | −684.59          25.70  22.0  | −598.05         25.51  14.9
1255       72           | 0.01  249216  | 3229.75          28.25  705.7 | 3555.38         28.12  177.1
                        | 0.05  169144  | 1308.38          29.85  491.1 | 2996.95         28.45  189.2
                        | 0.10  107180  | 505.02           30.53  501.4 | 2531.62         28.82  202.8
                        | 0.50  37914   | −931.59          31.65  345.9 | 797.23          30.16  256.6
                        | 0.70  4764    | −1367.22         31.84  125.7 | −1012.48        31.48  271.6
                        | 0.90  24      | −1465.70         31.90  110.6 | −1301.99        31.68  187.8
Sparse inverse covariance selection (synthetic data)

[Figure: (a) True inverse Σ⁻¹; (b) Noisy inverse Σ̂⁻¹; (c) Approximate solution of PD; (d) Approximate solution of PPA.]
Summary:
◮ Study optimality conditions for l0 minimization
◮ Propose PD methods for l0 minimization
◮ Converge to a KKT point for general l0 minimization
◮ Converge to a local minimizer for “convex” l0 minimization

Reference:
Sparse Approximation via Penalty Decomposition Methods (with Yong Zhang).

Thank you!