Lecture Note 2: Convex function

Lecture Note 2: Convex function
Xiaoqun Zhang
Shanghai Jiao Tong University
Last updated: March 15, 2017
Lecture note 2
Convex optimization
2
Contents
1.1
1.2
1.3
1.4
1.5
1.1
Convex functions . . . . . . . . . . . . . . . . . .
Elemental properties of convex function . . . . .
Operations that preserving convexity of functions
Differential criteria of convexity . . . . . . . . . .
Conjugacy . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3
7
7
9
12
Convex functions
• Definition on one variable: f defined on a nonempty interval I is said to be
convex on I if
f (αx + (1 − α)x0 ) ≤ αf (x) + (1 − α)f (x0 )
for all pairs of points (x, x0 ) in I and all α ∈ (0, 1).
• Criterion of increasing slopes: A function f is convex on an interval I iff for
(x0 )
all x0 ∈ I, the slope-function x 7→ f (x)−f
=: s(x) is increasing on I/{x0 }.
x−x0
Proof. Let x0 = αx1 + (1 − α)x2 . By convexity, we have αf (x0 ) + (1 −
α)f (x0 ) ≤ αf (x1 ) + (1 − α)f (x2 ), then s(x1 ) ≤ s(x2 ).
• Let a function f be differentiable with an increasing derivative on an open
interval I, then f is convex on I.
• Let a function f be twice differentiable with a non-negative second derivative
on an open interval I. Then f is convex on I.
Definition 1 A function f : C → R, where C ⊂ Rn and C 6= ∅, is convex if
• C is convex;
• For every x, y ∈ C and every λ ∈ [0, 1] one has
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)
More definitions:
3
Lecture note 2
Convex optimization
• Concave: if −f is convex, f is called concave.
• Strictly convex: if the inequality is strict whenever x 6= y, and 0 < λ < 1,
f is called strictly convex.
• Strongly convex: if there exist c > 0 s.t.
1
f (αx + (1 − α)x0 ) ≤ αf (x) + (1 − α)f (x0 ) − cα(1 − α)kx − x0 k2
2
for all x, x0 ∈ C and α ∈ (0, 1). We say that f is strongly convex on C with
modulus of strong convexity c.
Theorem 1 The function f is strongly convex on C with modulus c if and
only if the function f − 21 ck · k2 is convex on C.
• Extended value of f . Define
f˜(x) =
f (x),
+∞,
x ∈ dom(f );
x∈
/ dom(f ).
1. For all x, y ∈ Rn , all α ∈ [0, 1], f˜(αx + (1 − α)y) ≤ αf˜(x) + (1 − α)f˜(y).
2. Arithmetic operations involving ∞: (+∞) + (+∞) = +∞; (−∞) +
(−∞) = −∞; (+∞) + (−∞) undefined. (+∞) × (±∞) = ±∞;
• The domain (or effective domain) of f is the nonempty set such as f (x) < ∞,
denoted as dom(f ) = {x ∈ C|f (x) < ∞} and the epigraph f is a subset in
Rn × R defined as epi(f )(x) = {(x, t) ∈ Rn+1 : x ∈ dom(f ); f (x) ≤ t}.
Strict epigraph: ≤ is replaced by <.
Theorem 2 Let f : Rn → R ∪ {+∞} be not identically equal to +∞. The
three properties below are equivalent:
1. f is convex (in the sense of definition).
2. its epigraph epi(f ) is a convex set in Rn × R.
3. its strict epigraph is a convex set in Rn × R.
Proof.
”only if”: If f is convex, f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y). let (x, t) ∈
epi(f ) and (y, t2 ) ∈ epi(f ). then
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)
≤ λt1 + (1 − λ)t2 .
Thus (λx + (1 − λ)y, λt1 + (1 − λ)t2 ) ∈ epi(f ). It yields epi(f ) is convex.
”if part”: if epi(f ) is convex, for (x, f (x)) ∈ epi(f ) and (y, f (y)) ∈ epi(f ),
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)). Thus f is convex.
4
Lecture note 2
Convex optimization
• The sub level sets Sr (f ) := {x ∈ Rn : f (x) ≤ r} of a (proper) convex function
f are convex (possibly empty). Conversely it is not necessarily true (such a
function is called quasi-convex).
• Closed convex function: If we want to minimize a function f on some compact
set K, we do not need to bother with existence if f is known to be closed (or
l.s.c) and this holds even if K is not contained in dom(f ).
Lower semi-continuous (ls.c): a function f is l.s.c if, for each x ∈ Rn ,
lim inf f (y) ≥ f (x).
y→x
Proposition 1 For f : Rn ×R∪{+∞}, the following three properties are equivalent:
• f is l.s.c on Rn ;
• epi(f ) is a closed set in Rn ;
• the sublevel-sets Sr (f ) are closed (possibly empty) for all r ∈ R.
Definition 2 The function f : Rn → R ∪ {∞} is said to be closed if it is l.s.c
everywhere, or if its epigraph is closed, or if its sublevel-sets are closed.
Examples of convex functions:
1. Example on R:
• Linear function f (x) = ax + b
• eax for any a ∈ R.
• xα on R++ for α ≥ 1 or α ≤ 0. (f 0 (x) = αxα−1 and f 00 (x) = α(α −
1)xα−2 .
• |x|p on R for p ≥ 1.
• x ln x on R++ (negative entropy).
• Some concave functions: ax + b, xα on R++ for 0 ≤ α ≤ 1, log(x) on
R++ .
2. Example on Rn .
• All affine function f (x) = a> x + b.
• All norms are convex
p
X
kxkp = (
|xi kp )1/p ,
for p ≥ 1
i=1
kxk∞ = max |xi |
i
• Max function f (x) = max{x1 , · · · , xn } is convex on Rn .
5
Lecture note 2
Convex optimization
• Quadratic-over-linear function: f (x, y) =
x1
x2
y ,y
> 0 domf = R × R++
xn
• Log-sum-exp: f (x) = log(e + · + e ) is convex on Rn .
Note: Log-sum-exp function is a differentiable approximation of max
function fβ (x) = β1 log(eβx1 + · · · + eβxn )
max ≤ fβ (x) ≤ max{x1 , · · · , xn } + 1/β log n
x1 ,··· ,xn
i
1
• Geometry mean f (x) = (Πni=1 xi ) n is concave on dom(f ) = Rn++ .
3. Examples in Rm×n
• Affine function X ∈ Rm×n :
f (X) =
XX
i
=
Aij Xij + b
j
X
(A> X)ii + b
i
= tr(AT X) + b
• Spectral norm
f (x) = kxk2 = (λ1 (x> x))1/2
• Nuclear norm
f (x) = kxk∗
Pm
• Sum of largest eigenvalues of a symmetric matrix A . fm (A) := j=1 λj (A)
has also the representation fm (A) = sup{Q> Q=Im } trace(QAQT ) = trace(QQT A).
• Volume of ellipsoids: still in symmetric matrices Sn (R), define the function f (A) := log(det(A−1 )) for positive definite A, otherwise +∞. is
convex.
4. Indicator and support functions: given a nonempty subset S ⊂ Rn , the
function χS : Rn → Rn × {∞} defined by
0;
if x ∈ S,
χS :=
(1.1)
∞
if not
is called the indicator function of S. χS is (closed and) convex if and only if
S is (closed and) convex. Indeed, epiχS = S × R+ by definition.
The support function of a nonempty subset S is defined as
σS (x) := sup{hs, xi : s ∈ S}
is convex. In fact its epigraph epi(σS ) is convex cone in R × R.
6
Lecture note 2
1.2
Convex optimization
Elemental properties of convex function
• Jensen’s inequality
PropositionP2 Let f be convex and dom(f ) = Q, then for every convex
combination i λi xi of points from Q, one has
X
X
f(
λi xi ) ≤
λi f (xi )
i
i
Proof. Using the equivalence definition epi(f ) is convex.
• Convexity of sub-level sets.
Let f be a convex function with the domain Q, then any real α, the set
Levα (f ) := {x ∈ Q, f (x) ≤ α}
is convex.
(convexity⇒ convex sub-level sets, but converse isn’t true).
(any sub-level set is convex ⇒ quasiconvex).
1.3
Operations that preserving convexity of functions
• Non-negative weighted sum: if f and g are convex, then λf + µg is convex for
any λ, µ ≥ 0.
• Affine substitutions is convex: If f is convex, then f (Ax + b) is convex.
• Pointwise max/sup: let {fj }j∈J for J be an arbitray familly of conve functions.
Then f :
sup{fj : j ∈ J} os convex (assume it is not identically ∞).
• Convex monotone superposition : Let f : Rn → R be convex, and g : R → R
be convex and increasing. Then the composite g ◦ f : x → g(f (x)) is convex
(set g(∞) = ∞ and g ◦ f is not identically ∞).
• Partial (marginal) minimization: if f (x, y) is jointly convex, and let C be
nonempty closed convex set and let g(x) = inf y∈C f (x, y)is proper and if f is
bounded below on the set {x} × C for any x ∈ Rn , then g is convex.
Proof.
By the boundedness of {x} × C for any x ∈ Rn . The inf on y is attained on
any x. Thus epi(g) = {(x, t)|for somey ∈ C, (x, y, t) ∈ epi(f )} and it is the
projection of epi(f ) onto Rn × R. Therefore epi(g) is the image of a convex
under a linear mapping and it is convex.
Example:
7
Lecture note 2
Convex optimization
A B
– consider f (x, y) = x Ax+2x By +y Cy with
positive semiBT C
definite, C is positive definite minimizing over y gives g(x) = inf y f (x, y) =
xT (A − BC −1 B T )x, g is convex, hence Schur complement A − BC −1 B T
is positive semi-definite.
T
T
T
– Distance to a set: dist(x, S) = inf y∈S kx − yk is convex if S is convex.
• Dilation and perspectives of a function:
Dilation: fu = uf (x/u) is still convex (epi(fu ) = uepi(f ), Sr (fu ) = uSr/u (f ).)
Perspectives: f˜(u, x) : R × Rn → R ∪ {+∞} defined by
uf (x/u) u > 0
f˜(u, x) :=
+∞
if not
is convex in Rn+1 .
Proof.
It is better to look at f˜ with ”‘epigraph”’.
n
epi(f˜) = {(u, x, r) ∈ R+
∗ × R × R : f (x/u) ≤ r/u}
= {u(1, x0 , r0 ) : u > 0, (x0 , r0 ) ∈ epi(f )}
= ∪u>0 {u({1} × epi(f )}
=
R+
∗ ({1}
× epi(f ))
and it is therefore a convex cone.
Example:
– f (x) = xT x convex, f (x, t) = xT x/t = tf (x/t) is convex.
– f (x) = − log(x) is convex, hence relative entropy g(x, t) = t log t − t log x
is convex on R2++ .
– KullbackLeibler (KL) divergence divergence between u, v ∈ R+
++ :
Dkl (u, v) =
n
X
(ui log(ui /vi ) − ui + vi )
i=1
is convex since it is negative entropy plus linear function of u and v.
– If f is convex, then g(x) = (cT x + d)f ((Ax + b)/(cT x + d)) is convex on
dom(g) = {cT x + d > 0(Ax + b)/(cT x + d) ∈ dom(f )}.
• Restriction of a convex function to a line: If f : Rn → R is convex if and only
if the function g : R → R,
g(t) = f (x + tv),
dom(g) = {t|x + tv ∈ domf }
8
Lecture note 2
Convex optimization
is convex for any x ∈ dom(f ), v ∈ Rn . This means that one can check
convexity of f by checking convexity of functions of one variable.
Proof.
When f (x) is convex, derive g(t) is convex by checking the definition. Conversely, for any x0 , x1 , consider g(t) = f (x0 + t(x1 − x0 )) and let t = 0 and
t = 1.
Example:
n
Consider f : S n → R with f (X) = log det(X), dom(f ) = S++
. g(t) =
1
1
−2
−2
log det(X + tV ) = log det(X) + log det(I + tX V X ) = log det(X) +
Pn
1
− 21
V X − 2 . g is concave in
i=1 log(1 + tλi ) where λi are the eigenvalues of X
n
t (for any choice of X ∈ S++
, V ); hence f is concave.
1.4
Differential criteria of convexity
Consider C ⊂ Rn be nonempty and convex and a function f is defined on C (f (x) <
∞ for all x ∈ C). We consider f is convex and differentiable on C.
Theorem 3 (First order condition) Let f be a function differentiable on an
open set Ω ⊂ Rn , and let C be a convex subset of Ω. Then
• f is convex on C if and only if
f (x) ≥ f (x0 ) + h∇f (x0 ), x − x0 i,
for all (x0 , x) ∈ C × C
• f is strictly convex on C if and only if strict inequality holds in the above
inequality whenever x 6= x0 .
• f is strongly convex with modulus c on C if and only if for all (x0 , x) ∈ C × C,
1
f (x) ≥ f (x0 ) + h∇f (x0 ), x − x0 i + ckx − x0 k2
2
Proof.
1. Let f be convex on C, for arbitrary (x0 , x) ∈ C × C and α ∈ (0, 1), we have
f (αx + (1 − α)x0 ) − f (x0 ) ≤ α(f (x) − f (x0 ))
Divide by α and let α → 0, the lefthand side reduces to
f (x0 + α(x − x0 )) − f (x0 )
→ h∇f (x0 ), x − x0 i
α
and the inequality is established.
9
Lecture note 2
Convex optimization
Conversely, take x1 and x2 in C, α ∈ (0, 1) and set x0 = x2 + α(x1 − x2 ), by
assumption
f (xi ) ≥ f (x0 ) + h∇f (x0 ), xi − x0 i
we obtain the convex combination
αf (x1 ) + (1 − α)f (x2 ) ≥ f (x0 ) + h∇f (x0 ), αx1 + (1 − α)x2 − x0 i
which is the definition of convex function.
2. Similar for strictly convex for x0 6= x
3. Apply the proof in 1) on the function f − 21 ck · k2 .
Example:
Non-negativity of KullbackLeibler (KL) divergence
Dkl (u, v) =
n
X
(ui log(ui /vi ) − ui + vi )
i=1
as
Dkl (u, v) = φ(u) − φ(v) − hφ(v), u − vi
where φ(x) =
P
i
ui log ui is convex.
Definition 3 Let C ⊂ Rn be convex. The mapping F : C → Rn is said to be
monotone [resp. strictly monotone, resp. strongly monotone with modulus c > 0]
on C when, for all x, x0 ∈ C,
hF (x) − F (x), x − x0 i ≥ 0
[resp.hF (x) − F (x), x − x0 i > 0
whenever x 6= x0 ,
resp.hF (x) − F (x), x − x0 i ≥ ckx − x0 k2 ].
Theorem 4 (monotonicity of gradient) Let f be a function differentiable on an
open set Ω ⊂ Rn , and let C be convex subset of Ω. Then f is convex [resp. strictly
convex, resp. strongly convex with modulus c] on C if and only if its gradient ∇f
is monotone [resp. strictly monotone, strongly monotone with modulus c].
Proof.
Combine the case for ”‘convex≡ monotone”’ and ”strongly convex≡ strongly
monotone” allowing c = 0.
Let f be strongly convex on C, and for arbitrary x, x0 ∈ C, we have
1
f (x) ≥ f (x0 ) + h∇f (x0 ), x − x0 i + ckx − x0 k2
2
1
f (x0 ) ≥ f (x) − h∇f (x), x − x0 i + ckx − x0 k2
2
and mere addition leads to ∇f is strongly monotone.
10
Lecture note 2
Convex optimization
Conversely, let (x0 , x1 ) be a pair in C and consider φ(t) := f (xt ) where xt :=
x0 + t(x1 − x0 ) for t in an open interval containing [0, 1] and φ is well-defined
differentiable. Its derivative at t is φ0 (t) = h∇f (xt ), x1 −x0 i. Thus for 0 ≤ t0 < t ≤ 1.
We have
φ0 (t) − φ0 (t0 ) = h∇f (xt ) − ∇f (xt0 ), x1 − x0 i =
1
h∇f (xt ) − ∇f (xt0 ), xt − xt0 i
t − t0
and the monotonicity for ∇f shows that φ0 is increasing, φ is therefore convex.
For strongly convex, set t0 = 0, and use the strongly monotonicity φ0 (t0 )−φ0 (0) ≥
1
2
2
t ckxt − x0 k = tckx1 − x0 k On the other hand, we have
0
Z
φ(1) − φ(0) − φ (0) =
1
[φ0 (t) − φ0 (0)]dt ≥
0
1
ckx1 − x0 k2
2
which by definition of φ, is just the definition of strongly convex function.
Theorem 5 (Second order differentiation) Let f be twice differentiable on an
open convex set Ω ⊂ Rn . Then
• f is convex on Ω iff ∇2 f (x0 ) is positive semi-definite for all x0 ∈ Ω.
• if ∇f 2 (x0 ) is positive definite for all x0 ∈ Ω, then f is strictly convex on Ω.
• f is strongly convex with modulus c on Ω if and only if the smallest eigenvalue
of ∇2 f (·) is minorized by c on Ω: for all x0 ∈ Ω and all d ∈ Rn ,
h∇2 d, di ≥ ckdk2
Proof.Suppose f is convex. As f is twice differentiable, we have
1
f (x + δx) = f (x) + ∇f (x)T δx + (δx)T ∇2 f (x)δx + R(x; δx)||δx||2
2
where R(x; δx) → 0 as δx → 0. As f is convex, by the first-order condition,
f (x + δx) ≥ f (x) + ∇f (x)T δx
Hence
δxT ∇2 f (x)δx + R(x; δx)||δx||2 ≥ 0
for any δx. Let δx = d and taking → 0 yields dT ∇2 f (x)d ≥ 0 for any d, thus
∇2 f (x) 0. It is easy to check when ∇2 f (x) 0, the first order condition hold,
thus f is convex. Note: similarly f is concave iff dom(f ) is convex and ∇2 f (x) is
negative semi-definite for x ∈ dom(f ).
Examples:
n
• Quadratic function f (x) = 21 xT P x + q T x + r with P ∈ S n is convex if P ∈ S+
f
2
2
(∇ (x) = P x + q, ∇ f (x) = P ). Special case: f (x) = kAx − bk is convex as
∇2 f (x) = 2AT A is positive semi-definite for any A.
11
Lecture note 2
Convex optimization
2
• Quadratic over linear f (x, y) = xy for y > 0. Show the Hessian matrix is
2
y
y
−yx
2
2
2
∇ f (x, y) = y3
= y3 y x
.
−yx x2
−x
Pn
• Log-sum-exp: f (x) = log( k=1 exp(xk )) is convex. Show the Hessian matrix
∇2 f (x) =
1
1
diag(z) − T 2 zz T
1T z
(1 z)
is positive semi-definite for zk = exp(xk ). In fact
max{x1 , x2 , · · · , xn } ≤ f (x) ≤ max{x1 , · · · , xn } + log n,
so f can be viewed as a differentiable approximation of the max function.
Qn
• Geometric mean: f (x) = ( k=1 xk )1/n on Rn++ is concave.
1.5
Conjugacy
The conjugate of f :
f ∗ (y) :=
sup
hy, xi − f (x)
x∈dom(f )
Figure 1.1: A function f : R → R, and a value y ∈ R. The conjugate function f ∗ (y)
is the maximum gap between the linear function yx and f (x). If f is differentiable,
this occurs at a point x where f 0 (x) = y.
• The domain of f ∗ (y) consists of y ∈ Rn for which the supermum is finite.
• f ∗ is convex even f is not convex since f ∗ is a set supermum of a convex
(affine function of y).
• For differentiable function, the mapping f → f ∗ is called Legendre-Fenchel
transform.
Examples:
12
Lecture note 2
Convex optimization
• f (x) = aT x+b, f ∗ (y) = supx y T x−ax−b is bounded iff y = a. dom(f ∗ ) = {a}
and f ∗ (a) = b.
• Negative logarithm f (x) = − log(x), dom(f ) = R++ .
unbounded
f ∗ (y) =
sup y T x + log(x) =
− log(−y) − 1
x∈dom(f )
y≥0
y<0
(for x = − y1 ) when dom(f ∗ ) = {y < 0}.
• f (x) = ex . For y < 0, xy − ex is unbounded (let x → −∞); For y > 0, when
x = log y, xy − ex reaches its maximum and f ∗ (y) = y log y − y. For y = 0,
f ∗ (y) = sup −ex = 0, thus f ∗ (y) = y log y − y (with 0 log 0 = 0).
n
. y T x − 21 xT Qx is
• Strictly convex quadratic: f (x) = 1/2xT Qx, Q ∈ S++
−1
∗
bounded above for all x, maxium at Q y = x, thus f (y) = 21 y T Q−1 y.
0
x∈S
• Indicator function: let IS (x) =
, thus IS∗ (y) = supx∈S y T x
+∞ if not
(support function of S).
Pn
T
Pn
i=1 yi log yi y ≥ 0, 1 y = 1
• Log-sum-sup: f (x) = log( i=1 exi ), f ∗ (y) =
+∞
if not
• Norm: let k · k be a norm in Rn , with dual norm k · k∗ . We will show that the
conjugate of f (x) = kxk is
0
kyk∗ ≤ 1
∗
f (y) =
+∞ if not
Recall that kyk∗ = sup{|y T x|, kyk ≤ 1}. For kyk∗ > 1, by definition, there
exists z scuh that kzk ≤ 1 and y T z > 1. Let x = tz amd t → ∞, then
y T x − kxk = t(y T z − kzk) → ∞, thus f ∗ (y) is unbounded. Conversly, if
kyk∗ ≤ 1, then y T x ≤ kyk∗ kxk and y T x − kxk ≤ 0 and the maximum is
atteined at x = 0. Thus
0
kyk∗ ≤ 1
∗
f (y) =
+∞ if not
• Norm square: let f (x) = 12 kxk2 for some norm. Then f ∗ (y) = 21 kyk2∗ .
Properties of conjugate functions
• Fenchel’s inequality: f ∗ (y)+f (x) ≥ xT y (Young’s inequality for differentiable
f ). Example: xT y ≤ 1/2xT Qx + 1/2y T Q−1 y.
• Conjugate of conjugate: if f is convex and f is closed, then f ∗∗ = f .
13
Lecture note 2
Convex optimization
• Suppose f is convex and differentiable with dom(f ) = Rn . Any maximizer
x∗ s.t. ∇f (x∗ ) = y and conversely if x∗ s.t y = ∇f (x∗ ), then x∗ maximize
y T x − f (x) and f ∗ (y) = ∇f (x∗ )T x∗ − f (x∗ ) = y T (∇f )−1 (y) − f ((∇f )−1 (y)).
Thus df ∗ (y) = hy, dxi+hx∗ , yi−h∇f (x∗ ), dx∗ i = hdy, x∗ i. Thus ∇f ∗ (y) = x∗ .
∇f ∗ (y) = x∗ ⇐⇒ y = ∇f (x∗ )
• Scaling and composition with affine transformation: for a > 0, b ∈ R g(x) =
ax + b, thus g ∗ (y) = af ∗ (y/a) − b. Suppose that A ∈ Rn×n is nonsingular and
b ∈ Rn , then the conjugate of f (Ax + b) is g ∗ (y) = f ∗ (A−T y) − bT A−T y with
domg ∗ = AT domf ∗ .
• Sum of independent function f (u, v) = f1 (u) + f2 (v) where f1 , f2 are both
convex with conjugate f ∗ (w, z) = f1∗ (w) + f2∗ (z)
T
• Convexity of the conjugation: if dom(f1 ) dom(f2 ) 6= ∅, and α ∈ (0, 1), then
[αf1 + (1 − α)f2 ]∗ ≤ αf1∗ + (1 − α)f2∗
14

Download Report

Lecture Note 2: Convex function

Paperzz.com

Your Paperzz