
LOWER BOUNDS OF NONZERO ENTRIES IN SOLUTIONS
XIAOJUN CHEN∗, LEI GUO†, ZHAOSONG LU‡, AND JANE J. YE§

∗ Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. Email: [email protected]. This author's work was supported in part by Hong Kong Research Grants Council grant PolyU5002/13p.
† Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Shanghai 200030, China. Email: [email protected]. This author's work was supported by NSFC grant No. 11401379 and China Postdoctoral Science Foundation grant No. 2015T80428.
‡ Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada. Email: [email protected]. This author's work was supported in part by NSERC.
§ Department of Mathematics and Statistics, University of Victoria, Victoria, BC V8W 2Y2, Canada. Email: [email protected]. This author's work was supported in part by NSERC.
Abstract. This is supplementary material for the paper [2].
1. Lower bound for nonzero entries of local minimizers. In [2], we consider the following non-Lipschitz nonlinear programming problem:
\[
\begin{aligned}
\min\ \ & f(x) + \Phi(x) \\
\text{s.t.}\ \ & c(x) = 0, \\
& d(x) \le 0,
\end{aligned}
\tag{1.1}
\]
where f : IR^n → IR, c : IR^n → IR^m and d : IR^n → IR^p are continuously differentiable functions, and Φ : IR^n → IR is a continuous function (possibly nonconvex and non-Lipschitz).
As observed in the literature and in our numerical experiments in [2], the local minimizers of problem (1.1) are often sparse when Φ is nonconvex and non-Lipschitz. This has made problem (1.1) a popular model for seeking sparse solutions (see, e.g., [1, 3]). In this report, we derive a lower bound on the nonzero entries of any local minimizer of problem (1.1) under suitable assumptions on its objective and constraint functions; the bound shows that the magnitude of every nonzero entry of a local minimizer must exceed a certain positive number. This provides a possible explanation of why local minimizers of problem (1.1) are often sparse. To this end, we assume in this report that the function f is twice continuously differentiable, c(x) := Ax − b with A ∈ IR^{m×n}, b ∈ IR^m, and
\[
d(x) := \begin{pmatrix} l - x \\ x - u \end{pmatrix}
\tag{1.2}
\]
with l, u ∈ IR^n and l < u. It is clear that d(x) ≤ 0 is equivalent to x ∈ [l, u].
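To make the setting concrete, here is a minimal Python sketch (our added illustration; the data A, b, l, u and the bridge-type choice of Φ are arbitrary assumptions, not taken from [2]) instantiating problem (1.1) with the linear constraints and the bound structure (1.2):

# Minimal sketch of an instance of problem (1.1) with c(x) = Ax - b and
# d(x) <= 0  <=>  l <= x <= u; all data below are illustrative assumptions.
import numpy as np

n, m = 4, 2
A = np.array([[1., 1., 1., 1.],
              [1., -1., 0., 2.]])
b = np.array([1., 0.])
l, u = np.zeros(n), np.ones(n)
lam, q = 0.1, 0.5                              # bridge penalty parameters

f   = lambda x: 0.5 * np.sum((x - 0.3)**2)     # smooth, twice differentiable part
Phi = lambda x: lam * np.sum(np.abs(x)**q)     # nonconvex, non-Lipschitz at 0
c   = lambda x: A @ x - b                      # equality constraints
d   = lambda x: np.concatenate([l - x, x - u]) # box constraints as d(x) <= 0

x = np.array([0.5, 0.0, 0.2, 0.3])             # a point satisfying the box constraints
print(f(x) + Phi(x), c(x), bool(np.all(d(x) <= 0)))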
Lower bounds on the nonzero entries of local minimizers have been derived in the literature for some special cases of problem (1.1) with the regularization term Φ(x) = ∥x∥_q^q for q ∈ (0, 1). In particular, Chen et al. [3] derived lower bounds for ℓ_q-regularized unconstrained convex quadratic programming. Lu [6] studied lower bounds for general ℓ_q-regularized unconstrained nonlinear programming. Recently, Chen et al. [1] developed lower bounds for the case where f is a convex quadratic function, A = e^T or A = [e^T  −e^T] where e is the vector of all ones, b = 1, m = 1, l = 0 and u = ∞. For convenience of presentation, we first introduce some notation that will be used in this section only.
We set the infimum or minimum over an empty set to be +∞. We let S^* be the set of local minimizers of problem (1.1) and assume that L_f := sup_{x∈S^*} ∥∇²f(x)∥ is
finite, which clearly holds provided that either the gradient of f is globally Lipschitz continuous over [l, u] or S^* is bounded.
Given a function φ : IR → IR and a ∈ IR, we define
\[
\varphi^\diamond(a) := \inf\big\{\, |t| \;\big|\; \varphi(t) \ge a \,\big\}.
\]
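As an added illustration of this definition, consider the bridge penalty ϕ(t) = λ|t|^q with q ∈ (0, 1) and λ > 0, for which ϕ''(t) = λq(q − 1)|t|^{q−2} < 0 for t ≠ 0 and ϕ''(t) increases toward 0 as |t| grows. A direct computation then gives, for a < 0,
\[
(\phi'')^\diamond(a) = \inf\big\{\, |t| \;\big|\; \phi''(t) \ge a \,\big\}
= \left(\frac{\lambda q (1-q)}{-a}\right)^{\frac{1}{2-q}},
\]
while (ϕ'')^⋄(a) = +∞ for a ≥ 0, since no t ≠ 0 satisfies ϕ''(t) ≥ a in that case.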
Given ∅ ≠ I ⊆ {1, . . . , n}, let I^c be the complement of I with respect to {1, . . . , n}, and define
\[
\Gamma := (A_I)^+ A_I, \qquad
\theta(I) := \Big\{\, i \in I \;\Big|\; \sum_{j \in I} \big[(e_i - \Gamma_i)^T (e_j - \Gamma_j)\big]^2 > 0 \,\Big\},
\tag{1.3}
\]
\[
\sigma(I) := \Big\{\, i \in I \;\Big|\; \Gamma_i = e_i \text{ and } e_i^T (A_I)^+ \Big(b - \sum_{j \in I^c} A_j x_j\Big) \ne 0 \text{ for some } x_j \in \{l_j, 0, u_j\},\ j \in I^c \,\Big\},
\tag{1.4}
\]
\[
\bar\lambda_i(I) := -\,\frac{L_f \sum_{j \in I} \big[(e_i - \Gamma_i)^T (e_j - \Gamma_j)\big]^2}{\|e_i - \Gamma_i\|^4},
\qquad i \in \theta(I) \ne \emptyset,
\]
\[
\alpha(I) := \min_{i \in \theta(I)} \, (\phi_i'')^\diamond\big(\bar\lambda_i(I)\big),
\tag{1.5}
\]
\[
\beta(I) := \min_{i \in \sigma(I)} \Big\{\, \Big| e_i^T (A_I)^+ \Big(b - \sum_{j \in I^c} A_j x_j\Big) \Big| \;\Big|\; e_i^T (A_I)^+ \Big(b - \sum_{j \in I^c} A_j x_j\Big) \ne 0,\ x_j \in \{l_j, 0, u_j\},\ j \in I^c \,\Big\},
\tag{1.6}
\]
\[
\delta := \min\Big\{ \min_{I \subseteq \{1,\dots,n\}} \{\alpha(I), \beta(I)\},\ \min_{u_i \ne 0} |u_i|,\ \min_{l_i \ne 0} |l_i| \Big\},
\qquad
\lambda_0 := \min_{I \subseteq \{1,\dots,n\}} \, \min_{i \in \theta(I)} \bar\lambda_i(I).
\tag{1.7}
\]
Here, (A_I)^+ denotes the Moore–Penrose pseudoinverse of the matrix A_I.
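Because the quantities in (1.3)–(1.7) are defined combinatorially over index sets, a small numerical sketch may help fix ideas. The following Python snippet (our added illustration; A, L_f, λ and q are arbitrary assumptions) evaluates Γ, θ(I) and λ̄_i(I) for one index set I, and then α(I) using the closed form of (ϕ'')^⋄ for the bridge penalty derived above:

# Sketch: evaluate Gamma, theta(I), bar-lambda_i(I), and alpha(I) from
# (1.3)-(1.5) for one index set I; all data are illustrative assumptions.
import numpy as np

A = np.array([[1., 1., 0.]])                 # a single sum-type constraint row
I = [0, 1]                                   # a candidate index set
AI = A[:, I]
Gamma = np.linalg.pinv(AI) @ AI              # Gamma = (A_I)^+ A_I (a projector)
R = np.eye(len(I)) - Gamma                   # column i of R is e_i - Gamma_i

Lf = 1.0                                     # stands in for sup ||nabla^2 f||
sq = lambda i: sum((R[:, i] @ R[:, j])**2 for j in range(len(I)))
theta = [i for i in range(len(I)) if sq(i) > 0]
bar_lam = {i: -Lf * sq(i) / np.linalg.norm(R[:, i])**4 for i in theta}

lam, q = 0.1, 0.5                            # bridge penalty parameters
phi2_dia = lambda a: (lam * q * (1 - q) / (-a))**(1 / (2 - q))  # valid for a < 0
alpha = min(phi2_dia(a) for a in bar_lam.values()) if theta else float('inf')
print(theta, bar_lam, alpha)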
It is not hard to observe that δ and λ0 depend only on the data of problem (1.1). Moreover, we can see that λ0 < 0 if λ0 ≠ ∞.
Before establishing the main result of this section, we make some further assumptions on Φ that will be used in this section only.
Assumption 1.1. Assume that Φ(x) := ∑_{i=1}^n ϕ_i(x_i), where each ϕ_i : IR → IR is lower semicontinuous, twice continuously differentiable everywhere except at 0, and satisfies
\[
\phi_i''(t) < 0 \ \ \forall t \ne 0
\qquad \text{and} \qquad
\limsup_{t \to 0} \phi_i''(t) < \lambda_0,
\]
where λ0 is defined in (1.7).
It is easy to verify that the following widely used regularization functions satisfy Assumption 1.1. Note that the bridge penalty is non-Lipschitz: it is locally Lipschitz everywhere except at 0, while both the logistic penalty and the fraction penalty are locally Lipschitz everywhere.
(i) Bridge penalty [4]: ϕ_i(t) = λ|t|^q for any q ∈ (0, 1) and λ > 0;
(ii) Logistic penalty [5]: ϕ_i(t) = log(1 + λ|t|) for any λ > √(−λ0) if λ0 < 0, and any λ ≥ 0 if λ0 = ∞;
(iii) Fraction penalty [7]: ϕ_i(t) = λ|t|/(1 + λ|t|) for any λ > √(−λ0/2) if λ0 < 0, and any λ ≥ 0 if λ0 = ∞.
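For completeness, the second derivatives behind these claims are (a routine computation we add here, stated for t > 0 and holding symmetrically for t < 0):
\[
\text{bridge: } \phi_i''(t) = \lambda q(q-1)\,t^{q-2} \to -\infty, \qquad
\text{logistic: } \phi_i''(t) = \frac{-\lambda^2}{(1+\lambda t)^2} \to -\lambda^2, \qquad
\text{fraction: } \phi_i''(t) = \frac{-2\lambda^2}{(1+\lambda t)^3} \to -2\lambda^2
\]
as t → 0^+, so the stated thresholds on λ are exactly the conditions ensuring limsup_{t→0} ϕ''_i(t) < λ0 in Assumption 1.1.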
Under Assumption 1.1, we can show δ > 0 as follows.
Proposition 1.1. Let δ be defined in (1.7). If Assumption 1.1 holds, then
δ > 0.
Proof. By the definition of δ, one can easily see that δ ≥ 0, so it suffices to show δ ≠ 0. Suppose for contradiction that δ = 0. Notice that α(I) ≥ 0 and β(I) > 0 for any I ⊆ {1, . . . , n}. These facts, together with the definition of δ and the assumption δ = 0, imply that α(I) = 0 for some I ⊆ {1, . . . , n}. It then follows from (1.5) that there exists some i0 ∈ θ(I) such that (ϕ''_{i0})^⋄(λ̄_{i0}(I)) = 0, which along with the definitions of (ϕ''_{i0})^⋄ and λ0 implies that there exists a sequence {t_k} converging to 0 such that ϕ''_{i0}(t_k) ≥ λ̄_{i0}(I) ≥ λ0. This contradicts the second relation in Assumption 1.1.
We are now ready to establish the main result of this section, which shows that δ defined in (1.7) is a lower bound for the nonzero entries of local minimizers of problem (1.1).
Theorem 1.1. Suppose that Assumption 1.1 holds. Let δ be defined in (1.7) and let x^* be a local minimizer of problem (1.1). Then |x_i^*| ≥ δ > 0 for every i with x_i^* ≠ 0.
Proof. Without loss of generality, we assume for simplicity that l, u ∈ IR^n. Let I_l = {i | x_i^* = l_i}, I_u = {i | x_i^* = u_i}, and
\[
I = \big\{\, i \;\big|\; x_i^* \notin \{l_i, 0, u_i\} \,\big\}, \qquad
\Theta = (I - \Gamma)\big[\nabla^2 f(x^*)_{II}\big](I - \Gamma),
\]
where, with slight abuse of notation, I also denotes the |I| × |I| identity matrix and Γ is defined as in (1.3). In addition, let the associated θ(I), α(I) and β(I) be defined according to (1.3), (1.5) and (1.6), respectively.
Without loss of generality, assume that x^* = (l_{I_l}, x_I^*, 0, u_{I_u}). Since x^* is a local minimizer of problem (1.1), one can observe that x_I^* is a local minimizer of the problem
\[
\begin{aligned}
\min_{x_I}\ \ & f(l_{I_l}, x_I, 0, u_{I_u}) + \sum_{i \in I} \phi_i(x_i) \\
\text{s.t.}\ \ & A_I x_I = b - \sum_{j \in I_l} A_j l_j - \sum_{j \in I_u} A_j u_j.
\end{aligned}
\tag{1.8}
\]
Notice that the objective function of problem (1.8) is twice continuously differentiable at x_I^*. By the second-order necessary optimality condition, we have
\[
d^T \big[\nabla^2 f(x^*)_{II} + \mathrm{Diag}(\phi''(x_I^*))\big]\, d \ge 0
\quad \forall d \in \{d \mid A_I d = 0\},
\tag{1.9}
\]
where ϕ''(x_I^*) := (ϕ''_i(x_i^*) : i ∈ I). Since I − (A_I)^+ A_I is the orthogonal projector onto the null space of A_I, we have
\[
\{d \mid A_I d = 0\} = \{d \mid d = (I - (A_I)^+ A_I)\, w,\ w \in \mathbb{R}^{|I|}\},
\]
which together with (1.9) implies
\[
w^T \big(I - (A_I)^+ A_I\big) \big[\nabla^2 f(x^*)_{II} + \mathrm{Diag}(\phi''(x_I^*))\big] \big(I - (A_I)^+ A_I\big)\, w \ge 0
\quad \forall w \in \mathbb{R}^{|I|}.
\]
This inequality can be rewritten as
\[
w^T \Theta\, w + w^T (I - \Gamma)\, \mathrm{Diag}(\phi''(x_I^*))\, (I - \Gamma)\, w \ge 0
\quad \forall w \in \mathbb{R}^{|I|}.
\]
Letting w = e_i − Γ_i and using this inequality together with ϕ''_i(x_i^*) < 0 for i ∈ I, we see that
\[
(e_i - \Gamma_i)^T \Theta\, (e_i - \Gamma_i) + \phi_i''(x_i^*) \|e_i - \Gamma_i\|^4 \ge 0,
\qquad i \in I.
\tag{1.10}
\]
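The passage from the last displayed inequality to (1.10) uses two elementary facts about the orthogonal projector I − Γ, which we record here for completeness: since (I − Γ)² = I − Γ and e_i − Γ_i = (I − Γ)e_i,
\[
(I-\Gamma)(e_i - \Gamma_i) = e_i - \Gamma_i, \qquad
(e_i - \Gamma_i)_j = (e_i - \Gamma_i)^T (e_j - \Gamma_j) \ \text{ for all } j \in I,
\]
so with w = e_i − Γ_i the second quadratic form equals ∑_{j∈I} ϕ''_j(x_j^*) [(e_i − Γ_i)^T(e_j − Γ_j)]²; dropping the nonpositive terms with j ≠ i and noting (e_i − Γ_i)_i = ∥e_i − Γ_i∥² yields (1.10).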
Suppose that |x_i^*| > 0 for some i. If x_i^* ∈ {l_i, u_i}, it immediately follows from the definition of δ that |x_i^*| ≥ δ. Otherwise, we have i ∈ I, and we show that |x_i^*| ≥ δ also holds by considering two separate cases.
Case 1): Γ_i ≠ e_i. Since ϕ''_i(x_i^*) < 0 and e_i − Γ_i ≠ 0, the inequality (1.10) gives (e_i − Γ_i)^T Θ (e_i − Γ_i) ≥ −ϕ''_i(x_i^*)∥e_i − Γ_i∥⁴ > 0. Noting further that ∥∇²f(x^*)_{II}∥ ≤ L_f, it follows that
\[
0 < (e_i - \Gamma_i)^T \Theta\, (e_i - \Gamma_i) \le L_f \sum_{j \in I} \big[(e_i - \Gamma_i)^T (e_j - \Gamma_j)\big]^2.
\tag{1.11}
\]
This together with (1.3) implies that i ∈ θ(I) ≠ ∅. Moreover, (1.10) and (1.11) imply
\[
\phi_i''(x_i^*) \ge -\,\frac{L_f \sum_{j \in I} \big[(e_i - \Gamma_i)^T (e_j - \Gamma_j)\big]^2}{\|e_i - \Gamma_i\|^4} = \bar\lambda_i(I).
\]
By the definitions of (ϕ''_i)^⋄ and α(I), we obtain
\[
|x_i^*| \ge (\phi_i'')^\diamond\big(\bar\lambda_i(I)\big) \ge \alpha(I).
\]
It then follows from this inequality and the definition of δ that |x_i^*| ≥ δ.
Case 2): Γ_i = e_i. It follows from the relation Ax^* = b that
\[
A_I x_I^* = b - \sum_{j \in I_l} A_j l_j - \sum_{j \in I_u} A_j u_j.
\]
Pre-multiplying this relation by e_i^T (A_I)^+ on both sides and using Γ = (A_I)^+ A_I, we have
\[
e_i^T \Gamma\, x_I^* = e_i^T (A_I)^+ \Big(b - \sum_{j \in I_l} A_j l_j - \sum_{j \in I_u} A_j u_j\Big),
\]
which together with Γ_i = e_i and x_i^* ≠ 0 yields
\[
x_i^* = e_i^T (A_I)^+ \Big(b - \sum_{j \in I_l} A_j l_j - \sum_{j \in I_u} A_j u_j\Big) \ne 0,
\]
and hence i ∈ σ(I) ≠ ∅, where σ(I) is defined in (1.4). It follows from these relations and (1.6) that |x_i^*| ≥ β(I), which together with the definition of δ implies that |x_i^*| ≥ δ.
Since δ > 0 by Proposition 1.1, the proof is complete.
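As an added numerical sanity check of this lower-bound phenomenon (in a simpler one-dimensional unconstrained setting in the spirit of [6], not the exact constrained setting of Theorem 1.1), one can scan a grid for local minimizers of 0.5(x − c)² + λ|x|^q, whose smooth part has L_f = 1, and compare their nonzero magnitudes with the bridge-penalty bound (λq(1 − q)/L_f)^{1/(2−q)}:

# Illustration only (assumed 1-D unconstrained analogue, cf. [6]): nonzero
# local minimizers of 0.5*(x - c)^2 + lam*|x|^q should exceed the bound below.
import numpy as np

lam, q, c = 0.1, 0.5, 1.0
g = lambda x: 0.5 * (x - c)**2 + lam * np.abs(x)**q

xs = np.linspace(-2.0, 2.0, 400001)          # fine grid containing x = 0
vals = g(xs)
local_mins = [xs[i] for i in range(1, len(xs) - 1)
              if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]
bound = (lam * q * (1 - q))**(1 / (2 - q))   # here L_f = 1
print(bound, local_mins)   # x = 0 and one nonzero minimizer above the bound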
Acknowledgments. The first author and the last two authors thank the American Institute of Mathematics, where this work was initiated, for its support and hospitality during the summers of 2010 and 2011. The authors are also grateful to Jiming Peng at the University of Houston and Ting Kei Pong at the Hong Kong Polytechnic University for helpful discussions, and to Caihua Chen at Nanjing University for providing the Matlab code of the IP method in [1] for solving the ℓ_{1/2}-norm regularized models.
REFERENCES
[1] C. Chen, X. Li, C. Tolman, S. Wang and Y. Ye, Sparse portfolio selection via quasi-norm regularization, arXiv preprint arXiv:1312.6350, 2014.
[2] X. Chen, L. Guo, Z. Lu and J. Ye, A feasible augmented Lagrangian method for non-Lipschitz nonconvex programming, preprint, 2016.
[3] X. Chen, F. Xu and Y. Ye, Lower bound theory of nonzero entries in solutions of l2-lp minimization, SIAM J. Sci. Comput., 32 (2010), pp. 2832–2852.
[4] W. J. Fu, Penalized regression: The bridge versus the lasso, J. Comput. Graph. Stat., 7 (1998), pp. 397–416.
[5] D. Geman and G. Reynolds, Constrained restoration and the recovery of discontinuities, IEEE Trans. Pattern Anal. Mach. Intell., 14 (1992), pp. 357–383.
[6] Z. Lu, Iterative reweighted minimization methods for lp regularized unconstrained nonlinear programming, Math. Program., 147 (2014), pp. 277–307.
[7] M. Nikolova, M. K. Ng, S. Zhang and W. Ching, Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 1 (2008), pp. 2–25.