3 Convex Functions

3.1 Convexity notions for functions and basic properties
We start the chapter with the basic definition of a convex function.
Definition 3.1.1 (Convex function) A function f : E → R is said to be convex if epi f is a
convex set.
Note that in the above definition we could have substituted the strict epigraph
epi < f := {(x, α) ∈ E × R | f (x) < α } for the epigraph of f , see Exercise 3.3. Moreover, note that convex
functions have convex level sets, see Exercise 3.6.
Recall that the domain of a function f : E → R is defined by dom f := {x ∈ E | f (x) < ∞ }.
Using the linear mapping
L : (x, α) ∈ E × R ↦ x ∈ E,    (3.1)
we have dom f = L(epi f ), and hence Proposition 2.1.2 yields the following immediate but
important result.
Proposition 3.1.2 (Domain of a convex function) The domain of a convex function is convex.
Recall that a (convex) function f : E → R is proper if dom f ≠ ∅ and f (x) > −∞ for all
x ∈ E.
Improper convex functions are somewhat pathological (cf. Exercise 3.4.), but they do occur,
rather as by-products than as primary objects of study. For example, the function
f : x ∈ R ↦  −∞ if |x| < 1,   0 if |x| = 1,   +∞ if |x| > 1
is improper and convex.
Convex functions have an important interpolation property, which we summarize in the next
result for the case that f does not take the value −∞.
Proposition 3.1.3 (Characterizing convexity) A function f : E → R ∪ {+∞} is convex if
and only if for all x, y ∈ E we have
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)    (λ ∈ [0, 1]).    (3.2)
Proof: First, let f be convex. Take x, y ∈ E and λ ∈ [0, 1]. If x ∉ dom f or y ∉ dom f , the
inequality (3.2) holds trivially, since the right-hand side is +∞. If, on the other
hand, x, y ∈ dom f , then (x, f (x)), (y, f (y)) ∈ epi f , hence by convexity
(λx + (1 − λ)y, λf (x) + (1 − λ)f (y)) ∈ epi f,
i.e. f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y), which proves the first implication.
In turn, let (3.2) hold for all x, y ∈ E. Now, take (x, α), (y, β) ∈ epi f and let λ ∈ [0, 1]. Due
to (3.2) we obtain
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ≤ λα + (1 − λ)β,
i.e. λ(x, α) + (1 − λ)(y, β) ∈ epi f , which shows the converse implication.
We move the analogous characterization of convexity for functions E → R to Exercise 3.3.,
because these kinds of functions are not our primary object of study.
The next result is an extension of Proposition 3.1.3, which can be seen in various ways.
Corollary 3.1.4 (Jensen’s Inequality) A function f : E → R ∪ {+∞} is convex if and only
if
f (∑_{i=1}^p λi xi ) ≤ ∑_{i=1}^p λi f (xi )    (xi ∈ E (i = 1, . . . , p), λ ∈ ∆p ).
Proof: Exercise 3.5.
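As an informal illustration (not part of the formal development), the following minimal Python sketch checks Jensen's inequality numerically for the convex function f(x) = exp(x); the sample size, the choice of f, and the use of numpy are assumptions made purely for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # a convex function on R; exp is convex since f'' = exp > 0
    return np.exp(x)

p = 5
x = rng.normal(size=p)            # points x_1, ..., x_p in R
lam = rng.random(p)
lam /= lam.sum()                  # weights in the unit simplex Delta_p

lhs = f(np.dot(lam, x))           # f(sum_i lam_i x_i)
rhs = np.dot(lam, f(x))           # sum_i lam_i f(x_i)
assert lhs <= rhs + 1e-12         # Jensen's inequality (Corollary 3.1.4)
print(lhs, "<=", rhs)
```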
It is sometimes expedient to consider convexity of a function restricted to a subset of its domain.
Definition 3.1.5 (Convexity on a set) For a nonempty convex set C ⊂ dom f , we call f :
E → R ∪ {+∞} convex on C if (3.2) holds for all x, y ∈ C.
Corollary 3.1.6 Let f : E → R ∪ {+∞}. Then the following are equivalent.
i) f is convex.
ii) f is convex on its domain.
Proof: The implication 'i)⇒ii)' is obvious from the characterization of convexity in Proposition 3.1.3.
For the converse implication note that (3.2) always holds for any pair of points x, y if one of
them is not in the domain. This completes the proof.
Remark 3.1.7 As an immediate consequence of Corollary 3.1.6, we can make the following
statement about proper, convex functions:
”The proper and convex functions E → R ∪ {+∞} are those for which there exists a nonempty,
convex set C ⊂ E such that (3.2) holds on C and f takes the value +∞ outside of C.”
We are mainly interested in proper, convex (even lsc) functions E → R ∪ {+∞}. Hence, we
introduce the abbreviations
Γ := Γ(E) := {f : E → R ∪ {+∞} | f proper and convex }
and
Γ0 := Γ0 (E) := {f : E → R ∪ {+∞} | f proper, lsc and convex }
which we will use frequently in the remainder.
Every so often, stronger notions of convexity are needed, which we establish now.
Definition 3.1.8 (Strict and strong convexity) Let f be proper and convex and C ⊂ dom f
convex. Then f is said to be
• strictly convex on C if
f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y)    (x, y ∈ C, x ≠ y, λ ∈ (0, 1));
• strongly convex on C if there exists σ > 0 such that
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) − (σ/2) λ(1 − λ)‖x − y‖²    (x, y ∈ C, λ ∈ (0, 1)).
The scalar σ > 0 is called the modulus of strong convexity of f (on C).
For C = dom f we simply call f strictly and strongly convex, respectively.
Proposition 3.1.9 (Characterization of strong convexity) Let f be proper and convex and
C ⊂ dom f . Then f is strongly convex on C with modulus σ > 0 if and only if f − (σ/2)‖ · ‖² is
convex on C.
Proof: First, let f be strongly convex on C with modulus σ > 0. Then for any λ ∈ (0, 1) and
x, y ∈ C we have
f (λx + (1 − λ)y) − (σ/2)‖λx + (1 − λ)y‖²
  ≤ λf (x) + (1 − λ)f (y) − (σ/2)λ(1 − λ)‖x − y‖² − (σ/2)‖λx + (1 − λ)y‖²
  = λ(f (x) − (σ/2)‖x‖²) + (1 − λ)(f (y) − (σ/2)‖y‖²),
where the last step uses the identity λ‖x‖² + (1 − λ)‖y‖² = ‖λx + (1 − λ)y‖² + λ(1 − λ)‖x − y‖²,
i.e. f − (σ/2)‖ · ‖² is convex on C.
If, in turn, f − (σ/2)‖ · ‖² is convex on C, we compute that
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) + (σ/2)[‖λx + (1 − λ)y‖² − λ‖x‖² − (1 − λ)‖y‖²]
                 = λf (x) + (1 − λ)f (y) − (σ/2) λ(1 − λ)‖x − y‖²
for all x, y ∈ C, i.e. f is strongly convex on C with modulus σ > 0.
We stop our analysis for a short list of obvious convex functions. In Section 3.1.1 we learn how
to build new convex functions from old ones.
Example 3.1.10 (Examples of convex functions)
a) (Affine functions) Every affine function F : E → R is convex.
b) (Indicator of convex sets) For a set C ⊂ E its indicator function δC is convex if and only
if C is convex.
c) (Norms) Any norm ‖ · ‖∗ on E is convex.
3.1.1 Functional operations preserving convexity
Proposition 3.1.11 (Positive combinations of convex functions) For p ∈ N let fi : E →
R ∪ {+∞} be convex (and lsc) and αi ≥ 0 for i = 1, . . . , p. Then
f := ∑_{i=1}^p αi fi
is convex (and lsc). If, in addition, ∩_{i=1}^p dom fi ≠ ∅, then f is also proper.
Proof: The convexity assertion is an immediate consequence of the characterization in (3.2).
For the additional closedness see Exercise 1.12. The properness statement is obvious.
Note that the latter result tells us that Γ and Γ0 are convex cones.
Proposition 3.1.12 (Pointwise supremum of convex functions) For an arbitrary index set
I let fi be convex (and lsc) for all i ∈ I. Then the function f = sup_{i∈I} fi , i.e.
f (x) = sup_{i∈I} fi (x)    (x ∈ E),
is convex (and lsc).
Proof: It holds that
epi f = {(x, α) | sup_{i∈I} fi (x) ≤ α } = {(x, α) | ∀i ∈ I : fi (x) ≤ α } = ∩_{i∈I} epi fi .
Since the intersection of (closed) convex sets is (closed) convex, this gives the assertion.
Proposition 3.1.13 (Pre-composition with an affine mapping) Let H : E1 → E2 be
affine and g : E2 → R ∪ {+∞} (lsc and) convex. Then the function f := g ◦ H is (lsc and)
convex.
Proof: Let x, y ∈ E1 and λ ∈ (0, 1). Then we have
f (λx+(1−λ)y) = g(λH(x)+(1−λ)H(y)) ≤ λg(H(x))+(1−λ)g(H(y)) = λf (x)+(1−λ)f (y),
which gives the convexity of f . The closedness of f , under the closedness of g, follows from
the continuity (as a consequence of affineness) of H, cf. Exercise 1.13.
Proposition 3.1.14 (Post-composition with monotonically increasing, convex functions)
Let f be convex (and lsc) and let g : R → R ∪ {+∞} be convex (and lsc) and increasing. Under
the convention g(+∞) := +∞ and limx→∞ g(x) = +∞, the function g ◦ f is convex (and lsc).
If in addition, there exists x0 such that f (x0 ) ∈ dom g, then g ◦ f is proper.
Proof: Exercise 3.8.
Proposition 3.1.15 (Convexity under epi-composition) Let f ∈ Γ and L ∈ L(E, E0 ). Then
the function Lf : E0 → R defined by
(Lf )(y) := inf {f (x) | L(x) = y }
is convex.
Proof: We first show that, with T : (x, α) ↦ (Lx, α), we have
epi < Lf = T (epi < f ).    (3.3)
To this end, recall that
epi < Lf = {(y, α) | (Lf )(y) < α }  and  epi < f = {(x, α) | f (x) < α } .
First, let (x, α) ∈ epi < f . Then T (x, α) = (L(x), α) and
(Lf )(L(x)) = inf {f (z) | L(z) = L(x) } ≤ f (x) < α,
thus, T (x, α) ∈ epi < Lf .
In turn, if (y, α) ∈ epi < Lf , i.e. inf {f (z) | L(z) = y } < α, then L⁻¹(y) ≠ ∅, hence there
exists x ∈ L⁻¹(y) with f (x) < α. Thus, we have T (x, α) = (y, α) and (x, α) ∈ epi < f . This
proves (3.3).
Now, as f is convex, epi < f is convex (see Exercise 3.3.). But, since T is linear, from (3.3) it
follows that also epi < Lf is convex, which proves the convexity of Lf .
3.1.2 Differentiable convex functions
We want to apply the notion of differentiability to extended-real valued functions. This only
makes sense at points for which there exists a whole neighborhood on which the function in
question is at least finitely valued, i.e. at points in the interior of the domain:
For f : E → R we say that f is differentiable at x ∈ int (dom f ) if f restricted to int (dom f )
is differentiable at x. Stronger notions of differentiability are defined accordingly.
Convexity of differentiable functions can be handily characterized.
Theorem 3.1.16 (First-order characterizations) Let f : E → R ∪ {+∞} be differentiable
on a convex, open set C ⊂ int (dom f ). Then the following hold:
a) f is convex on C if and only if
f (x) ≥ f (x̄) + ⟨∇f (x̄), x − x̄⟩    (x, x̄ ∈ C).    (3.4)
b) f is strictly convex on C if and only if (3.4) holds with strict inequality whenever x ≠ x̄.
c) f is strongly convex with modulus σ > 0 on C if and only if
f (x) ≥ f (x̄) + ⟨∇f (x̄), x − x̄⟩ + (σ/2)‖x − x̄‖²    (x, x̄ ∈ C).    (3.5)
Proof:
a) First, let f be convex and take x, x̄ ∈ C and λ ∈ (0, 1). Then by convexity it holds that
f (x̄ + λ(x − x̄)) − f (x̄) ≤ λ(f (x) − f (x̄)).
As f is differentiable on C, dividing by λ and letting λ → 0 gives
⟨∇f (x̄), x − x̄⟩ ≤ f (x) − f (x̄),
which establishes (3.4).
In turn, if (3.4) holds, we take x1 , x2 ∈ C, λ ∈ (0, 1) and put x̄ := λx1 + (1 − λ)x2 ∈ C.
By (3.4) it follows that
f (xi ) ≥ f (x̄) + ⟨∇f (x̄), xi − x̄⟩    (i = 1, 2).
Multiplying these two inequalities by λ and (1 − λ), respectively, and summing the
resulting inequalities yields
λf (x1 ) + (1 − λ)f (x2 ) ≥ f (x̄) + ⟨∇f (x̄), λx1 + (1 − λ)x2 − x̄⟩ = f (λx1 + (1 − λ)x2 ).
As x1 , x2 were taken arbitrarily from C, f is convex on C.
b) If f is strictly convex on C, then for x, x̄ ∈ C, x ≠ x̄ and λ ∈ (0, 1), we have
f (x̄ + λ(x − x̄)) − f (x̄) < λ(f (x) − f (x̄)).
In addition, since f is, in particular, convex, part a) implies that
⟨∇f (x̄), λ(x − x̄)⟩ ≤ f (x̄ + λ(x − x̄)) − f (x̄).
Combining these inequalities and dividing by λ gives the desired strict inequality.
The converse implication is proven analogously to the respective implication in a), starting from the strict inequality.
c) Using Proposition 3.1.9, applying part a) to f − (σ/2)‖ · ‖² gives the assertion.
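The gradient inequality (3.4) is easy to probe numerically. The following Python sketch (an illustration only; the test function, the sampling scheme and numpy are assumptions made for this example) checks it for the smooth convex function f(x) = log(1 + eˣ) at randomly drawn points.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # softplus, a smooth convex function on R
    return np.log1p(np.exp(x))

def grad_f(x):
    # derivative of softplus: the logistic function
    return 1.0 / (1.0 + np.exp(-x))

# check f(x) >= f(xbar) + f'(xbar)(x - xbar) for many random pairs, cf. (3.4)
for _ in range(1000):
    x, xbar = rng.normal(scale=3.0, size=2)
    assert f(x) >= f(xbar) + grad_f(xbar) * (x - xbar) - 1e-10
print("first-order characterization verified on random samples")
```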
Theorem 3.1.16 opens the door for another characterization of convexity of differentiable functions on open sets in terms of so-called monotonicity properties of the gradient mapping.
Before we prove it we would like to remind the reader of the chain rule for differentiable
functions:
For i = 1, 2 let Ωi ⊂ Ei be open. If f : Ω1 ⊂ E1 → E2 is differentiable at x̄ ∈ Ω1 and
g : Ω2 → E3 is differentiable at f (x̄) ∈ Ω2 , then g ◦ f : Ω1 → E3 is differentiable at x̄ with
(g ◦ f )′(x̄) = g′(f (x̄)) ◦ f ′(x̄).
Corollary 3.1.17 (Monotonicity of gradient mappings) Let f : E → R ∪ {+∞} be differentiable on the open set Ω ⊂ int (dom f ) and let C ⊂ Ω be convex. Then the following hold:
a) f is convex on C if and only if
⟨∇f (x) − ∇f (y), x − y⟩ ≥ 0    (x, y ∈ C).    (3.6)
b) f is strictly convex on C if and only if (3.6) holds with a strict inequality whenever x ≠ y.
c) f is strongly convex with modulus σ > 0 on C if and only if
⟨∇f (x) − ∇f (y), x − y⟩ ≥ σ‖x − y‖²    (x, y ∈ C).    (3.7)
Proof: We are first going to show one direction in c) and a), respectively: To this end, first,
let f be strongly convex with modulus σ > 0 on C. Hence, by Theorem 3.1.16 c), for x, y ∈ C,
we obtain
f (x) ≥ f (y) + ⟨∇f (y), x − y⟩ + (σ/2)‖x − y‖²
and
f (y) ≥ f (x) + ⟨∇f (x), y − x⟩ + (σ/2)‖x − y‖².
Adding these two inequalities yields (3.7), which shows one implication in c). Setting σ = 0
gives the same implication in a).
We now show the converse directions in a) and c): For these purposes, let x, y ∈ C be given,
and consider the function
ϕ : I → R,    ϕ(t) := f (x + t(y − x)),
with I an open interval containing [0, 1]. We put xt := x + t(y − x) ∈ C for all t ∈ [0, 1] and
realize that ϕ is differentiable on I with ϕ′(t) = ⟨∇f (xt ), y − x⟩ for all t ∈ [0, 1] (chain rule).
Hence, we obtain
ϕ′(t) − ϕ′(s) = ⟨∇f (xt ) − ∇f (xs ), y − x⟩ = (1/(t − s)) ⟨∇f (xt ) − ∇f (xs ), xt − xs ⟩    (3.8)
for all 0 ≤ s < t ≤ 1.
If (3.6) holds, this implies that ϕ′ is nondecreasing on [0, 1], hence ϕ is convex on (0, 1), cf.
Exercise 3.2., i.e. f is convex on (x, y). Since x, y ∈ C were chosen arbitrarily this implies that
f is actually convex on C.
For the strong convexity, set s := 0 in (3.8) and use
ϕ′(t) − ϕ′(0) ≥ (σ/t)‖xt − x‖² = tσ‖y − x‖².    (3.9)
Integrating and exploiting the definition of ϕ then yields
f (y) − f (x) − ⟨∇f (x), y − x⟩ = ϕ(1) − ϕ(0) − ϕ′(0)
                               = ∫_0^1 (ϕ′(t) − ϕ′(0)) dt
                               ≥ ∫_0^1 tσ‖y − x‖² dt
                               = (σ/2)‖y − x‖²,
which gives (3.5) for x, y. As they were chosen arbitrarily in C, f is strongly convex on C by
Theorem 3.1.16 c).
The same technique of proof gives part b), where (3.9) becomes a strict inequality with σ = 0,
and remains strict after integration.
We now investigate convexity criteria for even twice differentiable functions.
Theorem 3.1.18 (Twice differentiable convex functions) Let f : E → R ∪ {+∞} be twice
differentiable on the open convex set Ω ⊂ int (dom f ). Then the following hold:
a) f is convex on Ω if and only if ∇²f (x) is positive semidefinite for all x ∈ Ω.
b) If ∇²f (x) is positive definite for all x ∈ Ω, then f is strictly convex on Ω.
c) f is strongly convex with modulus σ > 0 on Ω if and only if, for all x ∈ Ω, the smallest
eigenvalue of ∇²f (x) is bounded by σ from below.
Proof: Let x ∈ Ω, d ∈ E. Since Ω is open, the interval I := I(x, d) := {t ∈ R | x + td ∈ Ω }
is open. We define
ϕ : R → R,    ϕ(t) := f (x + td).    (3.10)
Then, in particular, ϕ is twice differentiable on I with ϕ″(t) = ⟨∇²f (x + td)d, d⟩ for all t ∈ I.
a) First, assume that f is convex on Ω. Now, let x ∈ Ω and d ∈ E \ {0}. Then ϕ from (3.10)
is convex on I by Proposition 3.1.13. Using Exercise 3.2. it follows that
0 ≤ ϕ″(t) = ⟨∇²f (x + td)d, d⟩,
which gives the first implication.
Conversely, take x, y ∈ Ω arbitrarily, put d := y − x and assume that ∇2 f (x + td) is
positive semidefinite. Then for ϕ from (3.10) we have ϕ″(t) ≥ 0 for all t ∈ [0, 1] ⊂ I.
Therefore Exercise 3.2. tells us that ϕ is convex on (0, 1), i.e. f is convex on (x, y). Since
x, y ∈ Ω were chosen arbitrarily, f is convex on Ω.
b) Again take x, y ∈ Ω with x ≠ y and put d := y − x. Applying the mean-value theorem
to the function ϕ′, which is differentiable on (0, 1), yields some τ ∈ (0, 1) such that
⟨∇f (y) − ∇f (x), y − x⟩ = ϕ′(1) − ϕ′(0) = ϕ″(τ ) = ⟨∇²f (x + τ d)d, d⟩ > 0.
Corollary 3.1.17 then gives the assertion.
c) Using Proposition 3.1.9, we apply a) to the function f − (σ/2)‖ · ‖², whose Hessian at x ∈ Ω
is ∇²f (x) − σI, which has the eigenvalues λi − σ with λ1 , . . . , λN the eigenvalues of
∇²f (x). This gives the assertion, as a symmetric matrix is positive semidefinite if and
only if all of its (real) eigenvalues are nonnegative.
Note that the condition in Theorem 3.1.18 b) is only sufficient for strict convexity. To see that it is not necessary, notice that f : x ↦ (1/4)x⁴ is strictly convex, but f ″(0) = 0.
We continue with an example where we can successfully apply a second-order criterion to
detect convexity of an important function.
Example 3.1.19 (The log-determinant function) Consider the function
f : Sn → R ∪ {+∞},    f (X) := − log(det X) if X ≻ 0,  and  f (X) := +∞ else,    (3.11)
which we call the (negative) log-determinant or the (negative) logdet function, for short. Then
f is proper, continuous and strictly convex, in particular, f ∈ Γ0 (Sn ): The continuity of f is
easily verified and, as dom f = Sn++ , f is proper and twice differentiable on dom f with
∇f (X) = −X⁻¹  and  ∇²f (X) = X⁻¹(·)X⁻¹    (X ≻ 0),
see Example 1.1.5 and Exercise 1.6.
In particular, it holds for all X ∈ dom f and H ∈ Sn \ {0} that
⟨∇²f (X)(H), H⟩ = tr(X⁻¹HX⁻¹H) = tr((HX^{−1/2})ᵀ X⁻¹ (HX^{−1/2})) > 0,
as X⁻¹ ≻ 0 and hence also HX^{−1/2} ≠ 0. Thus, by Theorem 3.1.18, f is strictly convex.
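As a quick numerical illustration (an informal sketch, not part of the text; numpy and the random sampling below are assumptions of this example), one can sample positive definite matrices and verify the convexity inequality (3.2) for f(X) = −log det X:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def neg_log_det(X):
    # f(X) = -log det X on the positive definite cone
    sign, logabsdet = np.linalg.slogdet(X)
    assert sign > 0
    return -logabsdet

def random_spd(n):
    # a random symmetric positive definite matrix
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

for _ in range(200):
    X, Y = random_spd(n), random_spd(n)
    lam = rng.random()
    lhs = neg_log_det(lam * X + (1 - lam) * Y)
    rhs = lam * neg_log_det(X) + (1 - lam) * neg_log_det(Y)
    assert lhs <= rhs + 1e-10   # inequality (3.2) for the logdet function
print("convexity inequality holds on all samples")
```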
3.2 Minimization and convexity
We turn our attention to minimization problems of the form
inf_{x∈C} f (x),    (3.12)
where C ⊂ E is nonempty and closed and f : E → R ∪ {+∞} at least lsc. Note that (3.12) is
equivalent to the problem
inf_{x∈E} f (x) + δC (x),
a simple fact that we are going to exploit frequently. If f ∈ Γ0 and C is convex, we call (3.12)
a convex minimization (optimization) problem.
When talking about minimizers, the questions of existence and uniqueness arise naturally.
We start our study with some general existence results.
3.2.1 General existence results
Existence results traditionally employ coercivity properties of the objective function and
largely do not depend on convexity.
Definition 3.2.1 (Coercivity and supercoercivity) Let f : E → R. Then f is called
i) coercive if
lim_{‖x‖→+∞} f (x) = +∞;
ii) supercoercive if
lim_{‖x‖→+∞} f (x)/‖x‖ = +∞.
The nomenclature for the above coercivity concepts is not unified in the literature. We use the
same naming as in [1]. In [3], for instance, the authors use 0-coercive and 1-coercive for coercive
and supercoercive instead.
In fact, we have already dealt with coercivity under a different moniker, as the following
result, whose elementary proof is left to the reader as an exercise, shows.
Lemma 3.2.2 (Level-boundedness = coercivity) A function f : E → R is coercive if and
only if it is level-bounded.
Proof: Exercise 3.7.
In the lsc and convex case, coercivity is much easier to check.
Proposition 3.2.3 (Coercivity of convex functions) Let f ∈ Γ0 . Then f is coercive if and
only if there exists α ∈ R such that lev≤α f is nonempty and bounded.
Proof: By Lemma 3.2.2, coercivity implies that all level sets are bounded and, as f is proper,
there is a nonempty one, too.
In turn, assume that lev≤α f is nonempty and bounded for some α ∈ R and pick x ∈ lev≤α f .
Clearly, all level sets to levels smaller than α are bounded, too. Hence, we still need to show
that lev≤γ f is bounded for all γ > α. To this end take v ∈ (lev≤γ f )∞ .
Since lev≤γ f is closed and convex (as f is lsc and convex, cf. Proposition 1.2.4 and Exercise
3.6.), Corollary 2.4.24 yields
x + λv ∈ lev≤γ f    (λ ≥ 0).    (3.13)
Hence, for all λ > 1, it can be seen that
x + v = (1 − 1/λ) x + (1/λ)(x + λv),
and hence, by convexity and (3.13), we obtain
f (x + v) ≤ (1 − 1/λ) f (x) + (1/λ) f (x + λv) ≤ (1 − 1/λ) f (x) + (1/λ) γ.    (3.14)
Letting λ → +∞ and recalling that x ∈ lev≤α f , (3.14) gives
f (x + v) ≤ f (x) ≤ α.
As v ∈ (lev≤γ f )∞ was chosen arbitrarily, we infer that
x + (lev≤γ f )∞ ⊂ lev≤α f.
However, by the choice of α, lev≤α f is bounded, and hence, necessarily the cone (lev≤γ f )∞
is bounded, too. That leaves only (lev≤γ f )∞ = {0}, and hence, by Proposition 2.4.21, lev≤γ f
is bounded, which completes the proof.
We now present the main existence result for minimization problems, which is, in fact, only
a corollary to the existence result Theorem 1.2.6 using our new terminology and stating the
constrained case explicitly.
Corollary 3.2.4 (Existence of minimizers) Let f : E → R ∪ {+∞} be lsc and let C ⊂ E be
closed such that dom f ∩ C 6= ∅ and suppose that one of the following holds:
i) f is coercive;
ii) C is bounded.
Then f has a minimizer over C.
Proof: Consider the function g = f + δC . Then it holds that
lev≤α g = C ∩ lev≤α f    (α ∈ R).
Hence, under either assumption i) (cf. Lemma 3.2.2) or ii), g has closed and bounded level-sets
and is hence lsc and level-bounded. The assertion hence follows from Theorem 1.2.6.
We now apply this result to the sum of functions:
Corollary 3.2.5 (Existence of minimizers II) Let f, g : E → R ∪ {+∞} be lsc such that
dom f ∩ dom g ≠ ∅. If f is coercive and g is bounded from below, then f + g is coercive and has
a minimizer (over E).
Proof: In view of Corollary 3.2.4 it suffices to show that f + g is coercive, as it is already lsc,
cf. Exercise 1.12. Putting g∗ := inf_E g > −∞, we see that
f (x) + g(x) ≥ f (x) + g∗ → +∞  as ‖x‖ → ∞,
which proves the result.
3.2.2 Convex minimization
We now turn our attention to convex minimization problems: Recall the notion of global and
local minimizers from Definition 2.4.16. It turns out that there is no distinction needed in the
convex setting.
Proposition 3.2.6 Let f ∈ Γ. Then every local minimizer of f (over E) is a global minimizer.
Proof: Let x̄ be a local minimizer of f and suppose there exists x̂ such that f (x̂) < f (x̄). Now
let λ ∈ (0, 1) and put xλ := λx̂ + (1 − λ)x̄. By convexity, we have
f (xλ ) ≤ λf (x̂) + (1 − λ)f (x̄) < f (x̄)
∀λ ∈ (0, 1).
On the other hand, we see that xλ → x̄ as λ ↓ 0, which all in all contradicts the fact that x̄ is a
local minimizer of f . Hence, x̂ cannot exist, which means that x̄ is even a global minimizer of
f.
Using our usual technique of casting a constrained optimization problem as an unconstrained
problem by means of the indicator function of the constraint set, we immediately get the following result.
Corollary 3.2.7 (Minimizers in convex minimization) Let f ∈ Γ and C ⊂ E nonempty
and convex. Then every local minimizer of f over C is a global minimizer of f over C.
Proof: Apply Proposition 3.2.6 to the function f + δC , which is convex by Proposition 3.1.11.
The following results show that convex minimization problems have convex solution sets.
Proposition 3.2.8 Let f ∈ Γ. Then argmin f is a convex set.
Proof: If f ∗ := inf f ∈ R, we have that argmin f = lev≤f ∗ f . But as a convex function, f
has convex level sets, cf. Exercise 3.6.
We state the constrained case explicitly.
Corollary 3.2.9 Let f ∈ Γ and C ⊂ E convex. Then argminC f is convex.
Proof: Apply Proposition 3.2.8 to f + δC ∈ Γ.
Uniqueness of minimizers of (convex) minimization problems comes into play with strict convexity, see Definition 3.1.8.
Proposition 3.2.10 (Uniqueness of minimizers) Let f ∈ Γ be strictly convex. Then f has at
most one minimizer.
Proof: Assume that x, y ∈ argmin f , i.e. inf f = f (x) = f (y). If x ≠ y, then strict convexity
of f implies for all λ ∈ (0, 1) that
f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y) = inf f,
which is a contradiction; hence x = y.
Corollary 3.2.11 (Minimizing the sum of convex functions) Let f, g ∈ Γ0 such that dom f ∩
dom g ≠ ∅. Suppose that one of the following holds:
i) f is supercoercive;
ii) f is coercive and g is bounded from below.
Then f + g is coercive and has a minimizer (over E). If f or g is strictly convex, f + g has exactly
one minimizer.
Proof: Since f + g ∈ Γ0 , f + g is, in particular, lsc, hence for the first assertion, in view of
Corollary 3.2.4, we only need to prove that f + g is coercive in either of the cases i) or ii). If f
is supercoercive, then f + g is supercoercive by Exercise 3.9., hence in particular, coercive. In
the second case, everything works also without convexity, see Corollary 3.2.5.
The uniqueness result follows immediately from Proposition 3.2.10, realizing that f +g ∈ Γ0
is strictly convex if one of the summands is.
We close out the section with a very powerful result on optimal value functions of parameter-dependent convex minimization problems.
Theorem 3.2.12 (Parametric minimization) Let h : E1 × E2 → R ∪ {+∞} be convex. Then
the optimal value function
ϕ : E1 → R,    ϕ(x) := inf_{y∈E2} h(x, y)
is convex. Moreover, the set-valued mapping
x ↦ argmin_{y∈E2} h(x, y) ⊂ E2
is convex-valued.
Proof: It can easily be shown that epi < ϕ = L(epi < h) under the linear mapping L :
(x, y, α) ↦ (x, α). This immediately gives the convexity of ϕ.
The remaining assertion follows immediately from Proposition 3.2.8, since y 7→ h(x, y) is
convex for all x ∈ E1 .
3.3 Affine minorization of convex functions
In this section we will prove that every proper, convex function that does not take the value
−∞ is minorized by an affine mapping at every point of the relative interior of its domain,
and this minorant can actually be chosen such that it coincides with the convex function at the
point in question.
This result is a very useful tool for proofs involving convex functions and has tremendous
consequences for subdifferential and duality theory of convex function as we will see later on.
For these purposes we need to study the relative interior of the epigraph of a convex function
and how it is related to the relative interior of the domain of the function in question. Note that
we can actually speak of these relative interiors, since a convex function has (by definition) a
convex epigraph and (see Proposition 3.1.2) also a convex domain.
Proposition 3.3.1 (Relative interior of epigraph) Let f : E → R be convex. Then
ri (epi f ) = {(x, α) ∈ E × R | x ∈ ri (dom f ), f (x) < α } .
Proof: Let L be the linear mapping from (3.1). By Proposition 2.3.15 we have
ri (dom f ) = L(ri (epi f )).    (3.15)
Now, take x ∈ ri (dom f ). For the subset of ri (epi f ) that is mapped to x under L, we compute
L⁻¹({x}) ∩ ri (epi f ) = ({x} × R) ∩ ri (epi f )
                      = ri [({x} × R) ∩ epi f ]
                      = ri [{x} × [f (x), +∞)]
                      = {x} × (f (x), +∞),
where the third equality uses the fact that {x} × R is relatively open and Proposition 2.3.14 b).
Thus, for (x, α) ∈ ri (epi f ), we have x ∈ ri (dom f ) by (3.15), and hence (x, α) ∈ L−1 ({x})∩
ri (epi f ) = {x} × (f (x), +∞), in particular, α > f (x).
In turn, if x ∈ ri (dom f ) and f (x) < α then (x, α) ∈ {x} × (f (x), +∞) = L−1 (x) ∩
ri (epi f ), in particular, (x, α) ∈ ri (epi f ).
Note that, by the description from Proposition 3.3.1, the relative interior of the epigraph of a given convex
function f does not necessarily coincide with its strict epigraph {(x, α) ∈ E × R | f (x) < α } .
We now come to the promised main theorem of this paragraph.
Theorem 3.3.2 (Affine minorization theorem) Let f ∈ Γ and x̄ ∈ ri (dom f ) (6= ∅). Then
there exists g ∈ E such that
f (x) ≥ f (x̄) + ⟨g, x − x̄⟩    (x ∈ E).    (3.16)
In particular, there exists an affine mapping, namely
F : x ∈ E ↦ ⟨g, x − x̄⟩ + f (x̄) ∈ R,
which minorizes f everywhere and coincides with f at x̄.
Proof: By Proposition 3.3.1, we have ri (epi f ) = {(x, α) | x ∈ ri (dom f ), f (x) < α } .
Hence, (x̄, f (x̄)) ∈ rbd (epi f ). Thus, we can properly separate (x̄, f (x̄)) from epi f using
Proposition 2.6.10, i.e. there exists (s, η) ∈ (E × R) \ {0} such that
⟨(s, η), (x, α)⟩ ≤ ⟨(s, η), (x̄, f (x̄))⟩    ((x, α) ∈ epi f )    (3.17)
and
⟨(s, η), (x, α)⟩ < ⟨(s, η), (x̄, f (x̄))⟩    ((x, α) ∈ ri (epi f )).
For α > f (x̄) we have (x, α) := (x̄, α) ∈ ri (epi f ), hence, the latter yields
η(α − f (x̄)) < 0,
which immediately implies that η < 0.
Now put g := s/|η|. Dividing (3.17) by |η| then yields
⟨(g, −1), (x, α)⟩ ≤ ⟨(g, −1), (x̄, f (x̄))⟩    ((x, α) ∈ epi f )
or, equivalently,
α ≥ f (x̄) + ⟨g, x − x̄⟩    ((x, α) ∈ epi f ).
As f is proper, (x, f (x)) ∈ epi f for all x ∈ dom f , thus
f (x) ≥ f (x̄) + ⟨g, x − x̄⟩    (x ∈ dom f ).
For x ∉ dom f , this inequality holds trivially, hence the result is proven.
3.4 Infimal convolution of convex functions
Definition 3.4.1 (Infimal convolution) Let f, g : E → R ∪ {+∞}. Then the function
f #g : E → R,    (f #g)(x) := inf_{u∈E} {f (u) + g(x − u)}
is called the infimal convolution of f and g. We call the infimal convolution f #g exact at x ∈ E
if
argmin_{u∈E} {f (u) + g(x − u)} ≠ ∅.
We simply call f #g exact if it is exact at every x ∈ dom f #g.
Observe that we have the representation
(f #g)(x) = inf_{u1 ,u2 : u1 +u2 =x} {f (u1 ) + g(u2 )}.    (3.18)
This has some obvious, yet useful consequences.
Lemma 3.4.2 Let f, g : E → R ∪ {+∞}. Then the following hold:
a) dom f #g = dom f + dom g;
b) f #g = g#f .
Moreover, observe the trivial inequality
(f #g)(x) ≤ f (u) + g(x − u)    (u ∈ E).    (3.19)
Infimal convolution preserves convexity, as can be seen in the next result.
Proposition 3.4.3 (Infimal convolution of convex functions) Let f, g : E → R ∪ {+∞}
be convex. Then f #g is convex.
Proof: Defining
h : E × E → R ∪ {+∞},    h(x, y) := f (y) + g(x − y),
we see that h is convex (jointly in (x, y)) as a sum of the convex functions (x, y) ↦ f (y) and
(x, y) ↦ g(x − y), the latter being convex by Proposition 3.1.13. By definition of the infimal
convolution, we have
(f #g)(x) = inf_{y∈E} h(x, y),
hence, Theorem 3.2.12 yields the assertion.
hence, Theorem 3.2.12 yields the assertion.
We continue with an important class of functions that can be constructed using infimal convolution, and that is intimately tied to projection mappings.
Example 3.4.4 (Distance functions) Let C ⊂ E. Then the function dist_C := δC # ‖ · ‖ is called
the distance (function) to the set C. It holds that
dist_C (x) = inf_{u∈C} ‖x − u‖.
Hence, from Lemma 2.5.1 it is clear that, if C ⊂ E is closed and convex, we have
dist_C (x) = ‖x − PC (x)‖.
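For concreteness, here is a small Python sketch (illustrative only; the choice of a Euclidean ball and the use of numpy are assumptions of this example) computing dist_C via the projection formula:

```python
import numpy as np

def project_ball(x, center, radius):
    # Euclidean projection onto the closed ball B(center, radius)
    d = x - center
    nrm = np.linalg.norm(d)
    return x if nrm <= radius else center + radius * d / nrm

def dist_ball(x, center, radius):
    # dist_C(x) = ||x - P_C(x)|| for the closed convex set C = B(center, radius)
    return np.linalg.norm(x - project_ball(x, center, radius))

x = np.array([3.0, 4.0])
print(dist_ball(x, center=np.zeros(2), radius=1.0))   # ||x|| - 1 = 4.0
```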
In order to preserve lower semicontinuity as well, it is not enough to simply assume that the
functions that are convoluted are lsc (and convex). Section 3.2.1, however, provides us with
the necessary tools to deal with this issue.
Theorem 3.4.5 (Infimal convolution in Γ0 ) Let f, g ∈ Γ0 and suppose that one of the following conditions holds:
i) f is supercoercive;
ii) f is coercive and g is bounded from below.
Then f #g ∈ Γ0 and is exact.
Proof: By Lemma 3.4.2, dom f #g = dom f + dom g ≠ ∅. Now, take x ∈ dom f #g. Then, by
the definition of f #g, we have dom f ∩ dom g(x − (·)) ≠ ∅. Hence, Corollary 3.2.11 implies
that f + g(x − (·)) has a minimizer. Thus, for all x ∈ dom f #g there exists u ∈ E such that
(f #g)(x) = f (u) + g(x − u) ∈ R.
In particular, f #g is proper and exact. Since, by Proposition 3.4.3, f #g ∈ Γ, it remains to be
shown that f #g is lsc. For these purposes, let x̄ ∈ E and {xk } → x̄ such that (f #g)(xk ) → α.
We need to show that α ≥ (f #g)(x̄); hence, w.l.o.g. we can assume that α < +∞ (since otherwise
there is nothing to prove), in particular, xk ∈ dom f #g for all k ∈ N (sufficiently large).
Then, by our recent findings, there exists {uk ∈ E} such that
(f #g)(xk ) = f (uk ) + g(xk − uk )    (k ∈ N).
We claim that {uk } is bounded: Assume this were false; then (after passing to a subsequence
if necessary) we have 0 ≠ ‖uk ‖ → +∞. We now show that under either of the assumptions i)
and ii), respectively, this yields a contradiction:
i): By Theorem 3.3.2, there exists an affine minorant of g, say x ↦ ⟨b, x⟩ + γ. Using the
Cauchy-Schwarz inequality, we have
‖uk ‖ ( f (uk )/‖uk ‖ − ‖b‖ ) + ⟨b, xk ⟩ + γ ≤ f (uk ) + ⟨b, xk − uk ⟩ + γ
                                          ≤ f (uk ) + g(xk − uk )
                                          = (f #g)(xk )
                                          → α < +∞.
But, as f is supercoercive and we have (by assumption) that ‖uk ‖ → ∞, the term on the
left-hand side is unbounded from above, which is a contradiction, and hence, {uk } must
be bounded.
ii): Since f is coercive, we have f (uk ) → +∞ under the assumption that kuk k → +∞. But,
since f (uk ) + g(xk − uk ) → α < +∞, we necessarily have g(xk − uk ) → −∞, which
is impossible if g is bounded from below, hence {uk } must be bounded.
All in all, we get in either case that {uk } is bounded and w.l.o.g. we can assume that uk → u.
Relabeling the sequence {xk } if necessary, we obtain
α = lim_{k→∞} (f #g)(xk )
  = lim_{k→∞} ( f (uk ) + g(xk − uk ) )
  ≥ lim inf_{k→∞} f (uk ) + lim inf_{k→∞} g(xk − uk )
  ≥ f (u) + g(x̄ − u)
  ≥ (f #g)(x̄).
This concludes the proof.
3.4.1 Moreau envelopes
One of the most important and frequently used instances of infimal convolutions is defined
below.
Definition 3.4.6 (Moreau envelope and proximal mapping) Let f : E → R. The Moreau
envelope (or Moreau-Yosida regularization) of f (to the parameter λ > 0) is the function eλ f :
E → R defined by
eλ f (x) := inf_{u∈E} { f (u) + (1/(2λ)) ‖x − u‖² }.
The (possibly set-valued) mapping Pλ f : E ⇒ E,
Pλ f (x) := argmin_{u∈E} { f (u) + (1/(2λ)) ‖x − u‖² },
is called the proximal mapping or prox-operator to the parameter λ > 0 of f .
Note that it is easily seen that
eλ (αf ) = α e_{αλ} f    (α, λ > 0).    (3.20)
From our findings in Section 3.2 and from above we can immediately state the following
result.
Proposition 3.4.7 Let f ∈ Γ0 and λ > 0. Then eλ f ∈ Γ0 and Pλ f is single-valued (in particular
nonempty).
Proof: For the prox-operator everything follows from Corollary 3.2.11 since (1/(2λ))‖x − (·)‖² is
strongly convex, hence supercoercive and strictly convex, and it is continuous, hence lsc.
The fact that eλ f ∈ Γ0 follows from Theorem 3.4.5.
Note that by definition and the above result, for f ∈ Γ0 , λ > 0 and x ∈ E, we have
eλ f (x) = f (Pλ f (x)) + (1/(2λ)) ‖x − Pλ f (x)‖² ≤ f (y) + (1/(2λ)) ‖x − y‖²    (y ∈ E).    (3.21)
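To make these definitions concrete, the following Python sketch (an illustration only; the closed-form soft-thresholding expression for the absolute value and the numpy-based grid check are assumptions of this example, not part of the text) computes Pλ f and eλ f for f = | · | on R, where the envelope turns out to be the Huber function:

```python
import numpy as np

lam = 0.5

def prox_abs(x, lam):
    # proximal mapping of f = |.| : soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_env_abs(x, lam):
    # e_lam f(x) = f(P_lam f(x)) + (1/(2 lam)) |x - P_lam f(x)|^2, cf. (3.21)
    p = prox_abs(x, lam)
    return np.abs(p) + (x - p) ** 2 / (2 * lam)

# brute-force check of the infimum definition on a grid of candidate points u
xs = np.linspace(-3, 3, 13)
us = np.linspace(-10, 10, 40001)
for x in xs:
    brute = np.min(np.abs(us) + (x - us) ** 2 / (2 * lam))
    assert abs(brute - moreau_env_abs(x, lam)) < 1e-3   # grid accuracy
print("closed form matches the brute-force infimum")
```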
The next results show that the prox-operator for closed, proper convex functions is in fact a
generalization of the projection onto closed, convex sets.
Proposition 3.4.8 Let f ∈ Γ0 and let x, p ∈ E. Then p = P1 f (x) if and only if
⟨y − p, x − p⟩ + f (p) ≤ f (y)    (y ∈ E).    (3.22)
Proof: First, assume that p = P1 f (x) and let y ∈ E. Then put pα := αy+(1−α)p (α ∈ (0, 1)).
Then, for every α ∈ (0, 1), by convexity and (3.21) we have
f (p) ≤ f (pα ) + (1/2)‖x − pα ‖² − (1/2)‖x − p‖²
     ≤ αf (y) + (1 − α)f (p) − α ⟨x − p, y − p⟩ + (α²/2) ‖y − p‖²,
and hence
⟨y − p, x − p⟩ + f (p) ≤ f (y) + (α/2) ‖y − p‖²    (y ∈ E, α ∈ (0, 1)).
Letting α ↓ 0, we obtain (3.22).
Conversely, suppose that (3.22) holds. Then we deduce
f (p) + (1/2)‖x − p‖² ≤ f (y) + (1/2)‖x − p‖² + ⟨x − p, p − y⟩ + (1/2)‖p − y‖²
                      = f (y) + (1/2)‖x − y‖²
for all y ∈ E. Thus, p = P1 f (x).
Note that (3.22) is a generalization of (2.13), which can be seen by simply plugging in f = δC
for some closed convex set C.
The next result shows that the prox-operator is globally Lipschitz continuous.
Proposition 3.4.9 (Lipschitz continuity of prox-operator) Let f ∈ Γ0 . Then
‖P1 f (x) − P1 f (y)‖ ≤ ‖x − y‖    (x, y ∈ E),
i.e. P1 f is globally Lipschitz continuous with Lipschitz modulus 1.
Proof: Let x, y ∈ E and put p := P1 f (x) and q := P1 f (y). Then Proposition 3.4.8 yields
⟨q − p, x − p⟩ + f (p) ≤ f (q)  and  ⟨p − q, y − q⟩ + f (q) ≤ f (p).
Since p, q ∈ dom f , adding these two inequalities, we get
0 ≤ ⟨p − q, (x − y) − (p − q)⟩ = ⟨p − q, x − y⟩ − ‖p − q‖².
Using the Cauchy-Schwarz inequality we obtain the desired result.
We can easily apply the results on the prox-operator of some f ∈ Γ0 and the parameter λ = 1 to
arbitrary parameters λ > 0 through the identity
Pλ f = P1 (λf ),    (3.23)
which is an immediate consequence of (3.20).
Theorem 3.4.10 (Differentiability of Moreau envelopes in Γ0 ) Let f ∈ Γ0 and λ > 0.
Then eλ f is differentiable with gradient
∇(eλ f ) = (1/λ)(id − Pλ f ),
which is globally Lipschitz with modulus 1/λ.
Proof: Assume that x, y ∈ E are distinct points and set p := Pλ f (x) and q := Pλ f (y). Using
(3.21), (3.23) and Proposition 3.4.8, we obtain
eλ f (y) − eλ f (x) = f (q) − f (p) + (1/(2λ)) ( ‖y − q‖² − ‖x − p‖² )
                   = (1/(2λ)) ( 2[(λf )(q) − (λf )(p)] + ‖y − q‖² − ‖x − p‖² )
                   ≥ (1/(2λ)) ( 2 ⟨q − p, x − p⟩ + ‖y − q‖² − ‖x − p‖² )
                   = (1/(2λ)) ( ‖y − q − x + p‖² + 2 ⟨y − x, x − p⟩ )
                   ≥ (1/λ) ⟨y − x, x − p⟩ .
Analogously, by simply changing the roles of x and y in the application of Proposition 3.4.8,
we obtain
eλ f (y) − eλ f (x) ≤ (1/λ) ⟨y − x, y − q⟩ .
Using the last two inequalities and invoking Proposition 3.4.9, we obtain
0 ≤ eλ f (y) − eλ f (x) − (1/λ) ⟨y − x, x − p⟩
  ≤ (1/λ) ⟨y − x, (y − x) − (q − p)⟩
  ≤ (1/λ) ( ‖x − y‖² − ‖p − q‖² )
  ≤ (1/λ) ‖y − x‖².
Therefore,
lim_{y→x} ( eλ f (y) − eλ f (x) − ⟨y − x, (1/λ)(x − p)⟩ ) / ‖x − y‖ = 0,
which proves the differentiability of eλ f and the gradient formula. The Lipschitz continuity of
the gradient is then due to Proposition 3.4.9.
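A quick finite-difference check of the gradient formula ∇(eλ f) = (1/λ)(id − Pλ f) for f = | · | (a sketch under the same assumptions as the example after (3.21); numpy only):

```python
import numpy as np

lam, h = 0.5, 1e-6

def prox_abs(x, lam):
    # P_lam f for f = |.| (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def env_abs(x, lam):
    # Moreau envelope of |.|
    p = prox_abs(x, lam)
    return np.abs(p) + (x - p) ** 2 / (2 * lam)

for x in np.linspace(-2, 2, 9):
    fd_grad = (env_abs(x + h, lam) - env_abs(x - h, lam)) / (2 * h)
    formula = (x - prox_abs(x, lam)) / lam       # (1/lambda)(id - P_lambda f)(x)
    assert abs(fd_grad - formula) < 1e-5
print("finite differences agree with the gradient formula of Theorem 3.4.10")
```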
As a nice application we can prove the differentiability of the squared Euclidean distance function.
Example 3.4.11 (Differentiability of squared distance function) Let C ⊂ E be nonempty,
closed and convex. Then (1/2) dist²_C = e1 δC is convex and differentiable with Lipschitz gradient
∇( (1/2) dist²_C ) = id − PC .
We close out the section with a result that is tremendously interesting from an optimization
perspective.
Proposition 3.4.12 (Minimizers of Moreau envelope) Let f ∈ Γ0 and λ > 0. Then
argmin_E f = argmin_E eλ f  and  inf_E f = inf_E eλ f.
Proof: Let x̄ ∈ argmin f . Then x̄ also minimizes f + (1/(2λ))‖x̄ − (·)‖². But since the unique
minimizer of the latter is Pλ f (x̄), we must have x̄ = Pλ f (x̄). Hence, by Theorem 3.4.10, we
have ∇eλ f (x̄) = 0. Thus, as eλ f is convex, x̄ ∈ argmin_E eλ f , cf. Exercise 3.11.
In turn, if x̄ ∈ argmin eλ f , then ∇eλ f (x̄) = (1/λ)(x̄ − Pλ f (x̄)) = 0, hence Pλ f (x̄) = x̄ and
therefore
f (x̄) = eλ f (x̄) ≤ eλ f (y) ≤ f (y)    (y ∈ E).
All in all we have proven the equality for the argmin sets and the identity for the infima if
attained.
Since eλ f ≤ f , we always have inf eλ f ≤ inf f . Conversely, fix x ∈ E. Then
inf f ≤ inf_{u∈E} { f (u) + (1/(2λ))‖u − x‖² } = eλ f (x).
Taking the infimum over x ∈ E gives the converse inequality.
Note that, implicitly, we proved above that x̄ ∈ E is a minimizer of f if and only if Pλ f (x̄) = x̄,
i.e. the fixpoints of Pλ f are exactly the minimizers of f .
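This fixed-point characterization underlies the proximal point method, x_{k+1} = Pλ f(x_k). The following minimal Python sketch (an illustration only; the test function f(u) = |u − 2| and its shifted soft-thresholding prox are assumptions chosen for this example) shows the iteration settling at the minimizer:

```python
import numpy as np

lam = 0.3

def prox(x, lam):
    # P_lam f for f(u) = |u - 2|: a shifted soft-thresholding step
    return 2.0 + np.sign(x - 2.0) * max(abs(x - 2.0) - lam, 0.0)

x = -5.0
for k in range(40):
    x = prox(x, lam)          # proximal point iteration x_{k+1} = P_lam f(x_k)
print(x)                      # converges to the minimizer x* = 2, a fixed point of P_lam f
```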
3.5 Continuity properties of convex functions
In this section we want to study continuity properties of convex functions.
We start by defining continuity notions relative to a set.
Definition 3.5.1 (Continuity relative to a set) Let S ⊂ E. A function f : E → R is said to
be continuous relative to (or on ) S if
lim_{k→∞} f (xk ) = f (x)    (x ∈ S, {xk ∈ S} → x).
In addition, we call f Lipschitz (continuous) relative to S if there exists L ≥ 0 such that
|f (x) − f (y)| ≤ L‖x − y‖    (x, y ∈ S).
As a preparatory result, we compare a proper convex function with its closure.
Proposition 3.5.2 (Closure of convex functions) Let f ∈ Γ. Then cl f ∈ Γ0 . Moreover, cl f
agrees with f except perhaps on rbd (dom f ).
Proof: Since epi (cl f ) = cl (epi f ), cl f is lsc and convex. Now let x̄ ∈ ri (dom f ). Since
f ∈ Γ there is an affine function h ≤ f with h(x̄) = f (x̄) , cf. Theorem 3.3.2. Since every
affine function is continuous, in particular closed, we have cl f ≥ cl h = h. Hence,
(cl f )(x̄) ≤ f (x̄) = h(x̄) ≤ (cl f )(x̄),
therefore (cl f )(x̄) = f (x̄). This shows that f and cl f agree on ri (dom f ). In particular, we
see that cl f is proper.
Now, let x ∉ cl (dom f ). Clearly, every sequence {xk } → x has xk ∉ dom f for all k
sufficiently large, i.e. f (xk ) = +∞, hence lim inf_{k→∞} f (xk ) = +∞, i.e. (cl f )(x) = +∞. This
proves the assertion.
proves the assertion.
Just like we argued in Remark 2.3.8 that for a single, given nonempty convex set, we can always
assume w.l.o.g. that it has full dimension, we can do the same for the domain of a given proper
convex function.
Remark 3.5.3 When dealing with a proper convex function f : E → R, we know that dom f
is nonempty and convex. Let U be the subspace parallel to aff (dom f ) (or any other subspace of E
of the same dimension). By Theorem 1.4.18, there exists an invertible affine mapping F : E →
E such that F (aff (dom f )) = U . Defining g : U → R ∪ {+∞} by g := f ◦ F⁻¹, we see that g is proper
convex with dom g = F (dom f ), i.e. aff (dom g) = aff (F (dom f )) = F (aff (dom f )) = U .
Hence, dom g has full dimension in U .
Clearly, Remark 3.5.3 does not apply if at least two functions are involved whose domains do
not generate the same affine hull.
We are now in a position to prove our first main result on continuity of convex functions.
We encourage the reader to recast the proof without using the assumption (justified through
Remark 3.5.3) that f have a domain of full dimension.
Theorem 3.5.4 (Continuity of convex functions) A convex function f : E → R is continuous
relative to any relatively open convex subset of dom f . In particular, it is continuous relative to
ri (dom f ).
Proof: Let C ⊂ dom f be relatively open and convex and consider g := f +δC . Then dom g =
C and g agrees with f on C. Hence, w.l.o.g. we may assume that C = dom f = ri (dom f );
otherwise we replace f by g. Moreover, in view of Remark 3.5.3, we can assume that
C is N -dimensional, hence open instead of merely relatively open. If f is improper, we have
by Exercise 3.4. that f is identically −∞ on C and hence continuous on C. We can therefore
assume that f is proper. Hence, Proposition 3.5.2 guarantees that cl f = f on C, i.e. f is lsc
on C. To prove the result, it suffices to show that f is usc: By Proposition 3.3.1 and openness
of C = dom f , we have
int (epi f ) = {(x, α) | f (x) < α } .
Therefore, for γ ∈ R and with L : (x, α) → x, we find that
{x | f (x) < γ } = L (int (epi f ) ∩ {(x, α) | α < γ }) .
Since L is surjective and the intersection that it is applied to is open, the set {x | f (x) < γ }
is open, cf. Exercise 2.9. Thus, its complement, {x | f (x) ≥ γ } is closed. This is equivalent to
saying that f is usc, which concludes the proof.
Since finite functions have the whole space as their domain, the next result follows trivially.
Corollary 3.5.5 (Continuity of finite convex functions) A convex function f : E → R is
continuous.
We close out the section with our second main result, which is concerned with Lipschitz
continuity of convex functions.
Theorem 3.5.6 Let f ∈ Γ and let S ⊂ ri (dom f ) be compact. Then f is Lipschitz relative to S.
Proof: By Remark 3.5.3 we can assume w.l.o.g. that dom f is N -dimensional so that S actually
lies in int (dom f ). By compactness of S, the sets S + εB are compact for all ε > 0, cf. Exercise
1.7. or Corollary 2.4.29. Clearly, for ε > 0 small enough, S + εB ⊂ int (dom f ). Fix such an
ε. By Theorem 3.5.4, f is continuous on conv (S + εB) ⊂ int (dom f ), hence, in particular, on
S + εB. As S + εB is compact, f is bounded on S + εB; let l and u be a lower and an upper
bound, respectively. Now, take x, y ∈ S with x ≠ y and put
z := y + (ε/‖x − y‖)(y − x).
Then z ∈ S + εB and for λ := ‖x − y‖/(ε + ‖x − y‖) we have y = (1 − λ)x + λz, and hence, by
convexity of f , we see that
f (y) ≤ (1 − λ)f (x) + λf (z) = f (x) + λ(f (z) − f (x)),
and consequently, for L := (u − l)/ε, we have
f (y) − f (x) ≤ λ(u − l) ≤ L‖x − y‖.
Interchanging the roles of x and y gives the desired inequality.
3.6 Conjugacy of convex functions
3.6.1 Affine approximation and convex hulls of functions
We have spent a significant amount of time studying affine functions. Affine mappings are very
closely tied to half-spaces. In fact, given an affine mapping F : E → R, F (x) = ⟨b, x⟩ − β
(cf. Exercise 1.2.), we have
epi F = {(x, α) ∈ E × R | ⟨b, x⟩ − β ≤ α } = H^≤_{(b,−1),β} ⊂ E × R.
Actually, it can be seen that every half-space in E × R has one of the following three forms:
1) {(x, α) | ⟨b, x⟩ ≤ β }    (vertical),
2) {(x, α) | ⟨b, x⟩ − α ≤ β }    (upper),
3) {(x, α) | ⟨b, x⟩ − α ≥ β }    (lower),
for some (b, β) ∈ E × R.
Theorem 3.6.1 (Envelope representation in Γ0 ) Let f ∈ Γ0 . Then f is the pointwise supremum of all affine functions minorizing f , i.e.
f (x) = sup {h(x) | h ≤ f, h affine } .
Proof: Since f is lsc and convex, epi f is a closed convex set in E × R, and therefore, by
Theorem 2.7.1, it is the intersection of all closed half-spaces in E × R containing it. No lower
half-space can possibly contain epi f . Hence, only vertical and upper half-spaces can be involved in the intersection. We argue that not all of these half-spaces can be vertical: Since f
is proper, there exists x ∈ dom f . Then (x, f (x) − ε) (ε > 0) lies in every vertical half-space
containing epi f , hence also in their intersection. On the other hand (x, f (x) − ε) does not lie
in epi f , hence not all half-spaces containing epi f can be vertical.
The upper half-spaces containing epi f , in turn, are simply the epigraphs of affine mappings
h ≤ f . The function that has the intersection of these epigraphs as its epigraph is just the
pointwise supremum of all these functions. Hence, to prove the theorem, we must show that
the intersection of the upper half-spaces containing epi f equals the intersection of all upper
and vertical half-spaces containing epi f , i.e. that the first intersection excludes every point
that also the latter intersection excludes:
To this end, suppose that
V := {(x, α) | h1 (x) ≤ 0 } ,    h1 : x ↦ ⟨b1 , x⟩ − β1 ,
is a vertical half-space containing epi f , and that (x0 , α0 ) ∉ V . It suffices to show that there
exists an upper half-space containing epi f that does not contain (x0 , α0 ), i.e. we need to find
an affine mapping h : E → R such that h ≤ f and h(x0 ) > α0 . We already know that
there exists at least one affine function h2 : x ↦ ⟨b2 , x⟩ − β2 such that epi f ⊂ epi h2 , i.e.
h2 ≤ f . For every x ∈ dom f we have h1 (x) ≤ 0 and h2 (x) ≤ f (x), and hence
λh1 (x) + h2 (x) ≤ f (x)    (λ ≥ 0).
The above inequality holds trivially also for x ∉ dom f . Now, fix any λ ≥ 0 and define
hλ : E → R by
hλ (x) := λh1 (x) + h2 (x) = ⟨λb1 + b2 , x⟩ − (λβ1 + β2 ).
Then, clearly, hλ is affine with hλ ≤ f . Since h1 (x0 ) > 0, choosing λ̄ > 0 sufficiently large
guarantees that hλ̄ (x0 ) > α0 . Then h := hλ̄ has the desired properties, which concludes the
proof.
Let f : E → R and recall from Section 1.2.2 that the lower semicontinuous hull cl f of f is the
largest lower semicontinuous function that minorizes f or, equivalently,
cl (epi f ) = epi (cl f ).
With the same approach, we can build the convex hull of f .
Definition 3.6.2 (Convex hull of a function) Let f : E → R. Then the pointwise supremum
of all convex functions minorizing f , i.e.
conv f := sup { h : E → R | h ≤ f, h convex },
is called the convex hull of f .
Moreover, we define the closed convex hull of f to be the closure cl (conv f ) of its convex hull;
it is the largest lsc convex function that minorizes f .
Note that, for f : E → R, we have
epi (cl (conv f )) = cl (conv (epi f )),    (3.24)
cf. Exercise 3.12. An analogous statement does not hold for the (non-closed) convex hull, see the
discussion in [3].
Corollary 3.6.3 (Envelope representation of the closed convex hull of proper functions)
Let f : E → R such that conv f is proper. Then cl (conv f ) is the pointwise supremum of all affine
functions minorizing f .
Proof: Since conv f is proper, so is cl (conv f ), by Exercise 3.4. d). Hence, cl (conv f ) ∈ Γ0 ; thus,
by Theorem 2.7.1, cl (conv f ) is the pointwise supremum of all its affine minorants. Moreover, we
have cl (conv f ) ≤ f . On the other hand, since all affine functions are lsc and convex, there cannot be
an affine minorant of f which is not a minorant of cl (conv f ). Hence, the affine minorants of cl (conv f )
and f coincide, which gives the desired result.
Note that the assumption that conv f be proper implies that f and cl f are proper, and is
equivalent to demanding that f has an affine minorant, cf. Exercise 3.13.
3.6.2 The conjugate of a function
We start with the central definition of this section.
Definition 3.6.4 (Conjugate of a function) Let f : E → R. Then its conjugate is the function
f ∗ : E → R defined by
f ∗ (y) := sup_{x∈E} {⟨x, y⟩ − f (x)}.
The function f ∗∗ := (f ∗ )∗ is called the biconjugate of f .
Note that, clearly, we can restrict the supremum in the above definition of the conjugate to the
domain of the underlying function f , i.e.
f ∗ (y) = sup_{x∈dom f} {⟨x, y⟩ − f (x)}.
Moreover, by definition, we always have
f (x) + f ∗ (y) ≥ ⟨x, y⟩    (x, y ∈ E),    (3.25)
which is known as the Fenchel-Young inequality.
The mapping f 7→ f ∗ from the space of extended real-valued functions to itself is called the
Legendre-Fenchel transform.
We always have
f ≤ g =⇒ f ∗ ≥ g ∗ ,
i.e. the Legendre-Fenchel transform is order-reversing.
Before we start analyzing the conjugate function in-depth, we want to motivate why we
would be interested in studying it: Let f : E → R. We notice that
epi f ∗ = {(y, β) | ⟨x, y⟩ − f (x) ≤ β  (x ∈ E) } .    (3.26)
This means that the conjugate of f is the function whose epigraph is the set of all (y, β) defining
affine functions x ↦ ⟨y, x⟩ − β that minorize f . In view of Corollary 3.6.3, if conv f is proper,
the pointwise supremum of these affine mappings is the closed convex hull of f , i.e., through
its epigraph, f ∗ encodes the family of affine minorants of cl (conv f ), i.e. of f itself.
Since
f ∗ (y) = sup_{x∈E} {⟨x, y⟩ − f (x)} = sup_{(x,α)∈epi f} {⟨y, x⟩ − α}    (y ∈ E),    (3.27)
we also have
epi f ∗ = {(y, β) | ⟨x, y⟩ − α ≤ β  ((x, α) ∈ epi f ) } .
We use our recent findings to establish the first major result on conjugates and biconjugates:
Theorem 3.6.5 (Fenchel-Moreau Theorem) Let f : E → R such that conv f is proper (hence,
so is f ). Then the following hold:
a) f ∗ and f ∗∗ are closed, proper and convex;
b) f ∗∗ = cl (conv f );
c) f ∗ = (conv f )∗ = (cl f )∗ = (cl (conv f ))∗ .
Proof: First note that the assumption that conv f is proper implies that both f and cl (conv f )
are proper, cf. Exercise 3.13. and Exercise 3.4.
a) Applying Proposition 3.1.12 to (3.27), we see that f ∗ is lsc and convex. If f ∗ attained the
value −∞, f would be constantly +∞, which is false. On the other hand, f ∗ is not identically
+∞, since that would imply that its epigraph, which, as conv f is proper, encodes all minorizing
affine mappings of cl (conv f ), were empty, which is also false. Hence, f ∗ is proper.
Applying the same arguments to f ∗∗ = (f ∗ )∗ gives that f ∗∗ is closed, proper and convex,
too.
b) Applying (3.27) to f ∗∗ , for x ∈ E, we have
f ∗∗ (x) = sup_{(y,β)∈epi f ∗} {⟨y, x⟩ − β}.
Hence, in view of (3.26), f ∗∗ is the pointwise supremum of all affine minorants of f .
Therefore, by Corollary 3.6.3, we see that f ∗∗ = cl (conv f ).
c) Since the affine minorants of f , conv f , cl f and cl (conv f ) coincide, their conjugates have
the same epigraph and hence are equal.
Note that due to item b) from Theorem 3.6.5 we always have f ≥ f ∗∗ for a function f : E → R
with conv f proper, and it holds that f ∗∗ = f if and only if f is closed and convex. Thus, the
Legendre-Fenchel transform induces a one-to-one correspondence on Γ0 : For f, g ∈ Γ0 , f is
conjugate to g if and only if g is conjugate to f , and we write f ↔∗ g in this case. This is
called the conjugacy correspondence. A property on one side is reflected by a dual property on
the other.
A list of some elementary cases of conjugacy is given below.
Proposition 3.6.6 (Elementary cases of conjugacy) Let f ↔∗ g. Then the following hold:
a) f − ⟨a, ·⟩ ↔∗ g((·) + a)    (a ∈ E);
b) f + γ ↔∗ g − γ    (γ ∈ R);
c) λf ↔∗ λ g((·)/λ)    (λ > 0).
3.6.3 Special cases of conjugacy
Convex quadratic functions
For Q ∈ Sn , b ∈ Rn we consider the quadratic function q : Rn → R defined by
q(x) := (1/2) xᵀQx + bᵀx.    (3.28)
From Theorem 3.1.18 we know that q is convex (strongly convex) if and only if Q is positive
semidefinite (positive definite). Hence, for the remainder we assume that Q ⪰ 0.
We are interested in computing the conjugate of q. This is easy if Q is positive definite. In
the merely semidefinite case the following tool is very useful:
Theorem 3.6.7 (Moore-Penrose pseudoinverse) Let A ∈ Sn+ with rank A = r and the spectral
decomposition
A = QΛQᵀ  with  Λ = diag(λ1 , . . . , λr , 0, . . . , 0),  Q ∈ O(n).
Then the matrix
A† := QΛ†Qᵀ  with  Λ† := diag(1/λ1 , . . . , 1/λr , 0, . . . , 0),
called the (Moore-Penrose) pseudoinverse of A, has the following properties:
a) AA† A = A and A† AA† = A† ;
b) (AA† )T = AA† and (A† A)T = A† A;
c) (A† )T = (AT )† ;
d) If A 0, then A† = A−1 ;
e) AA† = Pim A , i.e. AA† is the projection onto the image of A. In particular, if b ∈ rge A,
we have
{x ∈ Rn | Ax = b } = A† b + ker A.
In fact, the Moore-Penrose pseudoinverse can be uniquely defined through properties a) and b)
from above for any matrix A ∈ Cm×n , see, e.g., [4], but we confine ourselves to the positive
semidefinite case.
We are now in a position to state the desired conjugacy result for convex quadratics.
Proposition 3.6.8 (Conjugate of convex quadratic functions) For q from (3.28) with Q ∈
Sn+ we have
q ∗ (y) = (1/2)(y − b)ᵀ Q† (y − b)  if y − b ∈ rge Q,  and  q ∗ (y) = +∞  else.
In particular, if Q ≻ 0, we have
q ∗ (y) = (1/2)(y − b)ᵀ Q⁻¹ (y − b).
Proof: By definition, we have
q ∗ (y) = sup_{x∈Rn} { xᵀy − (1/2)xᵀQx − bᵀx } = − inf_{x∈Rn} { (1/2)xᵀQx − (y − b)ᵀx }.    (3.29)
The necessary and sufficient optimality condition for x̄ to be a minimizer of the convex function
x ↦ (1/2)xᵀQx − (y − b)ᵀx reads
Qx̄ = y − b,    (3.30)
cf. Exercise 3.11. Hence, if y − b ∉ im Q, we know from Exercise 1.8. that the infimum in (3.29)
equals −∞, hence q ∗ (y) = +∞ in that case.
Otherwise, we have y − b ∈ im Q, hence, in view of Theorem 3.6.7, (3.30) is equivalent to
x̄ = Q† (y − b) + z,    z ∈ ker Q.
Inserting x̄ = Q† (y − b) (we can choose z = 0) in (3.29) yields
q ∗ (y) = (Q† (y − b))ᵀ y − (1/2)(Q† (y − b))ᵀ QQ† (y − b) − bᵀ Q† (y − b)
       = (y − b)ᵀ Q† (y − b) − (1/2)(y − b)ᵀ Q† QQ† (y − b)
       = (1/2)(y − b)ᵀ Q† (y − b),
where we make use of Theorem 3.6.7 a) and c). Part d) of the latter result gives the remaining
assertion.
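A numerical sanity check of Proposition 3.6.8 (a sketch only; the random low-rank matrix, numpy's pinv, and the Fenchel-Young-based test below are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 5, 3

# random positive semidefinite Q of rank r, and a linear term b
B = rng.normal(size=(n, r))
Q = B @ B.T
b = rng.normal(size=n)

def q(x):
    return 0.5 * x @ Q @ x + b @ x

# pick y with y - b in the range of Q, so that q*(y) is finite
y = b + Q @ rng.normal(size=n)
q_star = 0.5 * (y - b) @ np.linalg.pinv(Q) @ (y - b)   # formula of Prop. 3.6.8

# Fenchel-Young: <x, y> - q(x) <= q*(y) for all x, with equality at x = Q^dagger (y - b)
for _ in range(1000):
    x = rng.normal(scale=5.0, size=n)
    assert x @ y - q(x) <= q_star + 1e-6
xbar = np.linalg.pinv(Q) @ (y - b)
print(abs(xbar @ y - q(xbar) - q_star) < 1e-8)   # True: the supremum is attained
```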
We point out that, by the foregoing result, the function f = (1/2)‖ · ‖² is self-conjugate in the
sense that f ∗ = f . Exercise 3.14. shows that this is the only function on Rn that has this
property. Clearly, by an isomorphy argument, the same holds for the respective function on
an arbitrary Euclidean space.
Support functions
Definition 3.6.9 (Positive homogeneity, subadditivity, and sublinearity) Let f : E →
R. Then we call f with 0 ∈ dom f
i) positively homogeneous if
f (λx) = λf (x)    (λ > 0, x ∈ E);
ii) subadditive if
f (x + y) ≤ f (x) + f (y)    (x, y ∈ E);
iii) sublinear if
f (λx + µy) ≤ λf (x) + µf (y)    (x, y ∈ E, λ, µ > 0).
Note that in the definition of positive homogeneity we could have also just demanded an inequality,
since f (λx) ≤ λf (x) for all λ > 0 implies that
f (x) = f (λ⁻¹ λx) ≤ (1/λ) f (λx).
We note that norms are sublinear.
Example 3.6.10 Every norm k · k is sublinear.
We next provide a useful list of characterizations of positive homogeneity and sublinearity,
respectively.
Proposition 3.6.11 (Positive homogeneity, sublinearity and subadditivity) Let f : E →
R. Then the following hold:
a) f is positively homogeneous if and only if epi f is a cone. In this case f (0) ∈ {0, −∞}.
b) If f is lsc and positively homogeneous with f (0) = 0 it must be proper.
c) The following are equivalent:
i) f is sublinear;
ii) f is positively homogeneous and convex;
iii) f is positively homogeneous and subadditive;
iv) epi f is a convex cone.
Proof: Exercise 3.16.
We continue with the prototype of sublinear functions, the so-called support functions, which
will from now on occur ubiquitously.
Definition 3.6.12 (Support functions) Let C ⊂ E be nonempty. The support function of C is
defined by
σC : x ∈ E ↦ sup_{s∈C} ⟨s, x⟩ .
We start our investigation of support functions with a list of elementary properties.
Proposition 3.6.13 (Support functions) Let C ⊂ E be nonempty. Then
a) σC = σ_{cl C} = σ_{conv C} = σ_{cl (conv C)} .
b) σC is proper, lsc and sublinear.
c) (δC )∗ = σC and (σC )∗ = δ_{cl (conv C)} .
d) If C is closed and convex, then σC ↔∗ δC .
Proof:
a) Obviously, closures do not make a difference. On the other hand, we have
⟨∑_{i=1}^{N+1} λi si , x⟩ = ∑_{i=1}^{N+1} λi ⟨si , x⟩ ≤ max_{i=1,...,N+1} ⟨si , x⟩
for all si ∈ C, λ ∈ ∆_{N+1} , which shows that convex hulls also do not change anything.
b) By Proposition 3.1.12, σC is lsc and convex, and as 0 ∈ dom σC and since λσC (x) =
σC (λx) for all x ∈ E and λ > 0, this shows properness and positive homogeneity, which
gives the assertion in view of Proposition 3.6.11 c).
c) Clearly, (δC )∗ = σC . Hence, (σC )∗ = (δC )∗∗ = cl (conv δC ) = δ_{cl (conv C)} , since
cl (conv (epi δC )) = cl (conv (C × R+ )) = cl (conv C) × R+ = epi (δ_{cl (conv C)} ).
d) Follows immediately from c).
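Two standard instances (a minimal numerical sketch; the specific sets, the well-known formulas σ_B(x) = ‖x‖₂ for the Euclidean unit ball and σ_{[−1,1]ⁿ}(x) = ‖x‖₁ for the box, and numpy are assumptions of this example):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n = 3
x = rng.normal(size=n)

def support_fn(points, x):
    # sigma_C(x) = sup_{s in C} <s, x>, evaluated over a finite subset of C
    return np.max(points @ x)

# box [-1,1]^n: by Proposition 3.6.13 a) it suffices to take the vertices;
# the support function is the l1-norm
vertices = np.array(list(product([-1.0, 1.0], repeat=n)))
print(support_fn(vertices, x), "=", np.linalg.norm(x, 1))

# Euclidean unit ball: sampled boundary points approximate sigma_B(x) = ||x||_2
dirs = rng.normal(size=(50000, n))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(support_fn(dirs, x), "~", np.linalg.norm(x))
```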
One of our main goals in this paragraph is to show that, in fact, part b) of Proposition 3.6.13
can be reversed, in the sense that every proper, lsc and sublinear function is a
support function. As a preparation we need the following result.
Proposition 3.6.14 Let f : E → R be closed, proper and convex. Then the following are equivalent:
i) f only takes the values 0 and +∞;
ii) f ∗ is positively homogeneous (i.e. sublinear, since convex).
Proof: 'i)⇒ii):' In this case f = δC for some closed convex set C ⊂ E. Hence, f ∗ = σC ,
which is sublinear, cf. Proposition 3.6.13.
In turn, let f ∗ be positively homogeneous (hence sublinear). Then, for λ > 0 and y ∈ E, we
have
f ∗ (y) = λf ∗ (λ⁻¹ y) = λ sup_{x∈E} {⟨x, λ⁻¹ y⟩ − f (x)} = sup_{x∈E} {⟨x, y⟩ − λf (x)} = (λf )∗ (y).
Thus, (λf )∗ = f ∗ for all λ > 0 and hence, by the Fenchel-Moreau Theorem, we have
λf = (λf )∗∗ = f ∗∗ = f    (λ > 0).
But as f is proper, hence does not take the value −∞, this immediately implies that f only
takes the values +∞ and 0.
Theorem 3.6.15 (Hörmander’s Theorem) A function f : E → R is proper, lsc and sublinear
if and only if it is a support function.
Proof: By Proposition 3.6.13 b), every support function is proper, lsc and sublinear.
In turn, if f is proper, lsc and sublinear (hence f = f ∗∗ ), then, by Proposition 3.6.14, f ∗ is the
indicator of some set C ⊂ E, which necessarily is nonempty, closed and convex, as f ∗ ∈ Γ0 .
Hence, f = f ∗∗ = (δC )∗ = σC .
We now want to give a slight refinement of Hörmander's Theorem, in that we describe the set that a proper, lsc and sublinear function supports.
Corollary 3.6.16 Let f : E → R be proper and sublinear. Then cl f is the support function of
the closed convex set
{s ∈ E | hs, xi ≤ f (x) (x ∈ E) } .
Proof: Since cl f is proper (cf. Exercise 3.4.), closed and sublinear, it is the support function of a closed convex set C. Therefore, we have cl f = δ_C^* and thus f* = (cl f)* = δ_C. Hence, C = {s ∈ E | f*(s) ≤ 0}. But f*(s) ≤ 0 if and only if ⟨s, x⟩ − f(x) ≤ 0 for all x ∈ E.
Gauges, polar sets and dual norms
We now present a class of functions that makes a connection between support functions and
norms.
Definition 3.6.17 (Gauge function) Let C ⊂ E. The gauge (function) of C is defined by
γC : x ∈ E 7→ inf {λ ≥ 0 | x ∈ λC } .
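Whenever a membership test for C is available, the infimum in this definition can be approximated numerically. The sketch below (Python with numpy; the choice C = closed Euclidean unit ball and the tolerances are ad hoc) bisects on λ, which is justified because for convex C with 0 ∈ C the feasible set {λ ≥ 0 | x ∈ λC} is an interval; for the unit ball one recovers γ_B = ‖·‖, cf. Exercise 3.19.

import numpy as np

def in_unit_ball(y):
    # membership oracle for C = closed Euclidean unit ball
    return np.linalg.norm(y) <= 1.0

def gauge(x, member, lam_max=1e6, tol=1e-9):
    # gamma_C(x) = inf{lam >= 0 | x in lam*C}, approximated by bisection;
    # relies on 0 in C and convexity of C (so the feasible lam form an interval)
    if np.allclose(x, 0.0):
        return 0.0
    lo, hi = 0.0, lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid > 0.0 and member(x / mid):   # x in mid*C  iff  x/mid in C
            hi = mid
        else:
            lo = mid
    return hi

x = np.array([3.0, 4.0])
print(gauge(x, in_unit_ball), np.linalg.norm(x))   # both approx. 5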
For a closed convex set that contains the origin, its gauge has very desirable convex-analytical
properties.
Proposition 3.6.18 Let C ⊂ E be nonempty, closed and convex with 0 ∈ C. Then γC is proper,
lsc and sublinear.
Proof: γ_C is obviously proper as γ_C(0) = 0. Moreover, for t > 0 and x ∈ E, we have
γ_C(tx) = inf {λ ≥ 0 | tx ∈ λC} = inf {λ ≥ 0 | x ∈ (λ/t)C} = inf {tµ ≥ 0 | x ∈ µC} = t inf {µ ≥ 0 | x ∈ µC} = t γ_C(x),
i.e. γ_C is positively homogeneous (since also 0 ∈ dom γ_C). We now show that it is also subadditive, hence altogether sublinear: To this end, take x, y ∈ dom γ_C (otherwise there is nothing to prove). Due to the identity
(x + y)/(λ + µ) = (λ/(λ + µ)) · (x/λ) + (µ/(λ + µ)) · (y/µ)   (λ, µ > 0),
we realize, by convexity of C, that x + y ∈ (λ + µ)C whenever x ∈ λC and y ∈ µC for λ, µ ≥ 0 (the case λ = 0 or µ = 0 being trivial since 0 · C = {0}). This implies that γ_C(x + y) ≤ γ_C(x) + γ_C(y).
In order to prove lower semicontinuity of γ_C, notice that (by Exercise 3.19. and positive homogeneity) we have lev_{≤α} γ_C = αC for α > 0, lev_{≤α} γ_C = ∅ for α < 0 and lev_{≤0} γ_C = C^∞ (again by Exercise 3.19.), hence all level sets of γ_C are closed, i.e. γ_C is lsc.
This concludes the proof.
Note that in the proof of Proposition 3.6.18, we do not need the assumption that C contains the
origin to prove sublinearity. We do need it, though, to get lower semicontinuity, cf. Exercise
3.19.
Since the gauge of a closed convex set that contains 0 is proper, lsc and sublinear we know,
in view of Hörmander’s Theorem (see Theorem 3.6.15), that it is the support function of some
closed convex set. It can be described beautifully using the concept of polar sets which generalizes the notion of polar cones, cf. Definition 2.4.5.
Definition 3.6.19 (Polar sets) Let C ⊂ E. Then its polar set is defined by
C ◦ := {v ∈ E | hv, xi ≤ 1 (x ∈ C) } .
Moreover, we put C ◦◦ := (C ◦ )◦ and call it the bipolar set of C.
Note that there is no ambiguity in notation, since the polar cone and the polar set of a cone
coincide, see Exercise 3.18. Moreover, as an intersection of closed half-spaces, C ◦ is a closed,
convex set containing 0. In addition, as we would expect, we have
C ⊂ D ⇒ D° ⊂ C°,   and   C ⊂ C°°.
Before we continue to pursue our question for the support function representation of gauges,
we provide the famous bipolar theorem which generalizes Exercise 2.23. Its proof is based once
more on separation.
Theorem 3.6.20 (Bipolar Theorem) Let C ⊂ E. Then C°° = cl conv (C ∪ {0}).
Proof: Since C ∪ {0} ⊂ C°° and C°° is closed and convex, we clearly have cl conv (C ∪ {0}) ⊂ C°°. Now assume there were x̄ ∈ C°° \ cl conv (C ∪ {0}). By strong separation, there exists s ∈ E \ {0} such that
⟨s, x̄⟩ > σ_{cl conv (C∪{0})}(s) ≥ max{σ_C(s), 0}.
After rescaling s accordingly (cf. Remark 2.6.2) we can assume that
⟨s, x̄⟩ > 1 ≥ σ_C(s),
in particular, s ∈ C°. On the other hand, ⟨s, x̄⟩ > 1 and x̄ ∈ C°°, which is a contradiction.
As a consequence of the bipolar theorem we see that every closed convex set C ⊂ E containing
0 satisfies C = C ◦◦ . Hence, the mapping C 7→ C ◦ establishes a one-to-one correspondence on
the closed convex sets that contain the origin. This is connected to conjugacy through gauge
functions as is highlighted by the next result.
Proposition 3.6.21 Let C ⊂ E be closed and convex with 0 ∈ C. Then
γ_C = σ_{C°} ∗←→ δ_{C°}   and   γ_{C°} = σ_C ∗←→ δ_C.
Proof: Since, by Proposition 3.6.18, γC is proper, lsc and sublinear we have
γC = σD ,
D = {v ∈ E | hv, xi ≤ γC (x) (x ∈ E) }
in view of Corollary 3.6.16. To prove that γC = σC ◦ , we need to show that D = C ◦ . Since
γC (x) ≤ 1 if (and only if; see Exercise 3.19.) x ∈ C, the inclusion D ⊂ C ◦ is clear. In turn,
let v ∈ C ◦ , i.e. hv, xi ≤ 1 for all x ∈ C. Now let x ∈ E. By the definition of γC , there exists
λk → γC (x) and ck ∈ C such that x = λk ck for all k ∈ N. But then
hv, xi = λk hv, ck i ≤ λk → γC (x),
hence v ∈ D, which proves γC = σC ◦ . Since C ◦◦ = C, this implies γC ◦ = σC . The conjugacy
relations are due to Proposition 3.6.13.
Exercise 3.19. tells us that the gauge of a symmetric, compact convex set with nonempty interior is a norm. This justifies the following definition.
Definition 3.6.22 (Dual norm) Let ‖·‖_* be a norm on E with closed unit ball B_*. Then we call
‖·‖_*° := γ_{B_*°}
its dual norm.
Corollary 3.6.23 (Dual norms) For any norm ‖·‖_* with (closed) unit ball B_*, its dual norm is σ_{B_*}, the support function of its unit ball. In particular, we have ‖·‖° = ‖·‖, i.e. the Euclidean norm is self-dual.
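As a concrete illustration (a small Python/numpy sketch, not part of the formal development; the vector y is arbitrary), the dual norm of the ℓ1-norm on R^n can be evaluated as the support function of its unit ball B_1 = conv{±e_1, . . . , ±e_n}: the supremum over a polytope is attained at a vertex, which reproduces the ℓ∞-norm.

import numpy as np

def dual_norm_l1(y):
    # sigma_{B_1}(y) with B_1 = conv{+-e_i}; the sup over a polytope
    # is attained at a vertex, so it suffices to scan the vertices.
    n = y.size
    vertices = np.vstack([np.eye(n), -np.eye(n)])
    return np.max(vertices @ y)

y = np.array([0.5, -2.0, 1.0])
print(dual_norm_l1(y), np.max(np.abs(y)))   # both equal 2.0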
3.6.4 Some dual operations
In this section, we give a list of conjugate functions obtained through convexity-preserving
operations that we have studied earlier.
We start by conjugacy on product sets.
Proposition 3.6.24 (Conjugacy on product sets) Let f_i : E_i → R (i = 1, . . . , p), put E := E_1 × · · · × E_p and define f : (x_1, . . . , x_p) ∈ E ↦ ∑_{i=1}^p f_i(x_i). Then
f* : (y_1, . . . , y_p) ∈ E ↦ ∑_{i=1}^p f_i*(y_i).
Proof: For y = (y_1, . . . , y_p) ∈ E we have
f*(y) = sup_{(x_1,...,x_p)∈E} { ∑_{i=1}^p ⟨x_i, y_i⟩ − ∑_{i=1}^p f_i(x_i) } = ∑_{i=1}^p sup_{x_i∈E_i} { ⟨x_i, y_i⟩ − f_i(x_i) } = ∑_{i=1}^p f_i*(y_i).
One of the convexity-preserving operations that we have studied thoroughly is infimal convolution, which is, as we will now see, paired in duality with simple addition of functions.
Proposition 3.6.25 (Conjugacy of inf-convolution) Let f, g : E → R. Then the following
hold:
a) (f #g)∗ = f ∗ + g ∗ ;
b) If f, g ∈ Γ0 such that dom f ∩ dom g 6= ∅ then (f + g)∗ = cl (f ∗ #g ∗ ).
Proof:
a) By definition, for all y ∈ E, we have
(f#g)*(y) = sup_x { ⟨x, y⟩ − inf_u {f(u) + g(x − u)} }
= sup_{x,u} { ⟨x, y⟩ − f(u) − g(x − u) }
= sup_{x,u} { (⟨u, y⟩ − f(u)) + (⟨x − u, y⟩ − g(x − u)) }
= f*(y) + g*(y).
b) From a) and the fact that f, g are closed, proper and convex, we have
(f*#g*)* = f** + g** = f + g,
which is proper (as dom f meets dom g), closed and convex. Thus,
cl conv (f*#g*) = (f*#g*)** = (f + g)*.
By Proposition 3.4.3 the convex hull on the left can be omitted, hence cl (f*#g*) = (f + g)*.
Note that it can be shown that the closure operation in Proposition 3.6.25 can be omitted under the qualification condition
ri (dom f) ∩ ri (dom g) ≠ ∅.   (3.31)
This in fact is a prominent theorem which we now state and whose proof we postpone to the
Appendix.
Theorem 3.6.26 (Attouch-Brézis) Let f, g ∈ Γ0 such that (3.31) holds. Then (f + g)∗ =
f ∗ #g ∗ , and the infimal convolution is exact, i.e. the infimum in the infimal convolution is attained.
Some very important cases of infimal convolutions that have occurred in our study are considered below from a duality perspective.
Corollary 3.6.27 (Conjugacy for distance functions and Moreau envelopes) Let f ∈ Γ0 , λ >
0 and C nonempty, closed and convex. Then the following hold:
a) dist_C ∗←→ σ_C + δ_B;
b) e_λ f ∗←→ f* + (λ/2)‖·‖²;
c) e_λ f(x) + e_{λ^{-1}} f*(x/λ) = (1/(2λ)) ‖x‖²   (x ∈ E).
Proof:
a) Since dist_C = δ_C # ‖·‖, δ_C^* = σ_C (see Proposition 3.6.13) and ‖·‖* = σ_B^* = δ_B, the assertion follows from Proposition 3.6.25 and Theorem 3.6.26.
b) Since e_λ f = f # ((1/(2λ)) ‖·‖²), the assertion follows from Proposition 3.6.25 and Theorem 3.6.26, also using Proposition 3.6.6 c).
c) From b) we have for all x ∈ E that
e_λ f(x) = sup_y { ⟨x, y⟩ − f*(y) − (λ/2)‖y‖² }
= (1/(2λ)) ‖x‖² − inf_y { f*(y) + (λ/2) ‖y − (1/λ)x‖² }
= (1/(2λ)) ‖x‖² − e_{λ^{-1}} f*(x/λ).
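Identity c) lends itself to a quick numerical sanity check. The sketch below (Python with numpy; the choice f = |·| on R, for which f* = δ_{[−1,1]}, and the grid resolution are ad hoc) approximates both Moreau envelopes by minimizing over a fine grid and compares the sum with ‖x‖²/(2λ).

import numpy as np

lam = 0.5
ygrid = np.linspace(-20.0, 20.0, 200001)

def env(h, t, x):
    # e_t h(x) = inf_y { h(y) + (1/(2t)) (y - x)^2 }, approximated on the grid
    return np.min(h(ygrid) + (ygrid - x) ** 2 / (2.0 * t))

f = np.abs                                                   # f = |.|
f_conj = lambda y: np.where(np.abs(y) <= 1.0, 0.0, np.inf)   # f* = indicator of [-1, 1]

for x in [-3.0, 0.2, 1.7]:
    lhs = env(f, lam, x) + env(f_conj, 1.0 / lam, x / lam)
    print(x, lhs, x ** 2 / (2.0 * lam))   # lhs matches the right-hand side up to grid error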
We continue with a conjugacy correspondence between pointwise infima and suprema.
Proposition 3.6.28 (Pointwise inf/sup) Let fi : E → R (i ∈ I). Then the following hold:
a) (inf_{i∈I} f_i)* = sup_{i∈I} f_i*.
b) (sup_{i∈I} f_i)* = cl conv (inf_{i∈I} f_i*) for f_i ∈ Γ0 (i ∈ I) and sup_{i∈I} f_i proper.
Proof:
a) For y ∈ E we have
(inf_{i∈I} f_i)*(y) = sup_{x∈E} { ⟨x, y⟩ − inf_{i∈I} f_i(x) } = sup_{i∈I} sup_{x∈E} { ⟨x, y⟩ − f_i(x) } = sup_{i∈I} f_i*(y).
b) Since f_i = f_i** (i ∈ I), from a) we infer that (inf_{i∈I} f_i*)* = sup_{i∈I} f_i. Since the latter is lsc and convex (Proposition 3.1.12) and proper (by assumption), the convex hull of inf_{i∈I} f_i* is proper, and thus we have
cl conv (inf_{i∈I} f_i*) = (inf_{i∈I} f_i*)** = (sup_{i∈I} f_i)*.
We proceed with a duality correspondence for parametric minimization.
Proposition 3.6.29 (Parametric minimization) Let f : E1 × E2 → R. Then the following
hold:
a) For p := inf_{x∈E1} f(x, ·) we have p* = f*(0, ·).
b) For f ∈ Γ0 and ū ∈ E2 such that ϕ := f(·, ū) is proper, and q := inf_{y∈E2} {f*(·, y) − ⟨y, ū⟩}, we have ϕ* = cl q.
Proof:
a) For u ∈ E2 we compute
p*(u) = sup_y { ⟨y, u⟩ − inf_x f(x, y) } = sup_{x,y} { ⟨(x, y), (0, u)⟩ − f(x, y) } = f*(0, u).
b) For z ∈ E1 we have
q*(z) = sup_v { ⟨v, z⟩ − inf_y { f*(v, y) − ⟨y, ū⟩ } }
= sup_{v,y} { ⟨(v, y), (z, ū)⟩ − f*(v, y) }
= f**(z, ū)
= f(z, ū)
= ϕ(z).
Here, the fourth equality is due to the fact that f ∈ Γ0. Noticing that q is convex by Theorem 3.2.12, we obtain
cl q = cl (conv q) = q** = ϕ*.
A sufficient condition such that the closure in Proposition 3.6.29 b) can be omitted is that ū lies in the interior of U := {u ∈ E2 | ∃x ∈ E1 : f(x, u) < ∞}, see, e.g., [7, Theorem 11.23 (c)].
We close this brief section with a result on epi-composition, cf. Proposition 3.1.15.
Proposition 3.6.30 Let f : E → R be proper and L ∈ L(E, E0 ) and T ∈ L(E0 , E). Then the
following hold:
a) (Lf )∗ = f ∗ ◦ L∗ .
b) (f ◦ T )∗ = cl (T ∗ f ∗ ) if f ∈ Γ.
Proof:
a) For y ∈ E0 we have
(Lf)*(y) = sup_{z∈E0} { ⟨z, y⟩ − inf_{x: L(x)=z} f(x) } = sup_{z∈E0, x∈L^{-1}({z})} { ⟨z, y⟩ − f(x) } = sup_{x∈E} { ⟨x, L*(y)⟩ − f(x) } = f*(L*(y)).
b) Follows from a) and the Fenchel-Moreau Theorem.
3.7 Fenchel-Rockafellar duality
In this section, we associate a very general (convex) minimization problem (the primal problem) with a concave maximization problem (the dual problem) that is built on conjugates of the functions occurring in the original problem.
Here the following notation is useful: For a function h : E → R we define the function
h∨ : x ∈ E 7→ h(−x) ∈ R.
We start our study with a basic duality result that goes back to Werner Fenchel, the founding
father of convex analysis.
Theorem 3.7.1 (Fenchel Duality Theorem) Let f, g ∈ Γ0 such that ri (dom f) ∩ ri (dom g) ≠ ∅. Then
inf (f + g) = max −(f* + g*∨).
Proof: It is easily seen that inf (f + g) = −(f + g)*(0). Using Theorem 3.6.26 (Attouch-Brézis), we infer that
inf (f + g) = −(f*#g*)(0) = −min (f* + g*∨),
which proves the statement.
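A one-dimensional sanity check (a Python/numpy sketch; the functions, their conjugates and the grids are chosen ad hoc): for f(x) = x² and g(x) = |x − 1| one has f*(y) = y²/4 and g*(y) = y + δ_{[−1,1]}(y), and both sides of the theorem can be approximated by grid optimization.

import numpy as np

grid = np.linspace(-10.0, 10.0, 200001)

f = lambda x: x ** 2
g = lambda x: np.abs(x - 1.0)
f_conj = lambda y: y ** 2 / 4.0
g_conj = lambda y: np.where(np.abs(y) <= 1.0, y, np.inf)   # y + delta_[-1,1](y)

primal = np.min(f(grid) + g(grid))               # inf (f + g)
dual = np.max(-(f_conj(grid) + g_conj(-grid)))   # max -(f* + g*v)
print(primal, dual)                              # both approx. 0.75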
An interesting special case of the foregoing theorem is the following.
Corollary 3.7.2 Let f ∈ Γ0 and K ⊂ E be a closed, convex cone such that ri (dom f) ∩ ri K ≠ ∅. Then
inf_K f = max_{−K°} −f*.
Proof: Define g := δ_K. Then by Exercise 3.20. we have g* = δ_{K°}, and hence by Theorem 3.7.1 we have
inf_K f = inf (f + g) = max −(f* + g*∨) = max_{−K°} −f*.
We now want to add a little more structure to the problem by introducing a linear operator.
Definition 3.7.3 (Fenchel-Rockafellar duality) Let f : E1 → R ∪ {+∞} and g : E2 →
R ∪ {+∞} be proper and L ∈ L(E1 , E2 ). We call
inf_{E1} (f + g ◦ L)
the primal problem and
sup_{E2} −(f* ◦ L* + g*∨)   (3.32)
the dual problem. We call
∆(f, g, L) := inf_{E1} (f + g ◦ L) − sup_{E2} −(f* ◦ L* + g*∨)
the duality gap between the primal and the dual problem.
It is very easy to see that the duality gap in the sense of Definition 3.7.3 is always nonnegative, i.e. the dual optimal value is always a lower bound for the primal optimal value.
Proposition 3.7.4 (Weak duality) Let f : E1 → R ∪ {+∞} and g : E2 → R ∪ {+∞} be
proper and L ∈ L(E1 , E2 ). Then we have
inf_{E1} (f + g ◦ L) ≥ sup_{E2} −(f* ◦ L* + g*∨),
i.e. ∆(f, g, L) ≥ 0.
Proof: Let x ∈ E1 and y ∈ E2 . Then by the Fenchel-Young inequality we have
f (x) + g(L(x)) ≥ −f ∗ (L∗ (y)) + hx, L∗ (y)i − g ∗ (−y) + h−y, L(x)i = −(f ∗ ◦ L∗ + g ∗∨ )(y).
This already proves the statement.
The weak duality theorem tells us that the duality gap ∆(f, g, L) is always nonnegative. We
now want to investigate under which assumptions it is, in fact, zero.
The following result is known as the Fenchel-Rockafellar duality theorem. In its proof, for
f : E1 → R and g : E2 → R, we use the convenient notation
f ⊕ g : (x, y) ∈ E1 × E2 7→ f (x) + g(y)
and call f ⊕ g the separable sum of f and g.
Theorem 3.7.5 (Strong duality) Let f ∈ Γ0 (E1 ), g ∈ Γ0 (E2 ) and L ∈ L(E1 , E2 ) such that
0 ∈ ri (dom g − L(dom f )). Then ∆(f, g, L) = 0.
Proof: Define
C := dom f × dom g − gph L ⊂ E1 × E2   and   D := dom g − L(dom f) ⊂ E2.
Using the calculus rules for affine hulls, see Corollary 1.4.17 and Exercise 1.15., and the fact that gph L is a subspace, we see that
aff C = aff (dom f) × aff (dom g) − gph L   and   aff D = aff (dom g) − L(aff (dom f)).
Now, let (x, y) ∈ aff C, i.e. there exist r ∈ aff (dom f), s ∈ aff (dom g) and u ∈ E1 such that (x, y) = (r, s) − (u, L(u)). Hence,
y − L(x) = s − L(u) − L(r − u) = s − L(r) ∈ aff D.
Since by assumption 0 ∈ ri D, there exists t > 0 such that t(y − L(x)) ∈ D. Hence, there exist a ∈ dom f and b ∈ dom g such that y − L(x) = (1/t)(b − L(a)). Putting z := a − tx, we have x = (a − z)/t and y = (b − L(z))/t, thus
(x, y) = (1/t)[(a, b) − (z, L(z))] ∈ R_+ C.
Since (x, y) ∈ aff C was chosen arbitrarily, we have thus proven R_+ C = aff C. Since by assumption 0 ∈ D, we have 0 ∈ C, hence aff C = span C, i.e. R_+ C = span C. By Exercise 2.5., we have 0 ∈ ri C. Defining ϕ := f ⊕ g and V := gph L, we hence have
ri (dom ϕ) ∩ V = (ri (dom f) × ri (dom g)) ∩ gph L ≠ ∅.
Hence, using Corollary 3.7.2, we obtain
inf_V ϕ = max_{V^⊥} −ϕ*.
By Proposition 3.6.24 we have ϕ* = f* ⊕ g*. Moreover, we easily compute that
V^⊥ = {(u, v) ∈ E1 × E2 | u = −L*(v)}.
Therefore, we obtain
inf_{E1} (f + g ◦ L) = inf_V ϕ = max_{V^⊥} −ϕ* = max_{(u,v): u=−L*(v)} {−f*(u) − g*(v)} = max_{w∈E2} {−f*(L*(w)) − g*(−w)} = max_{E2} −(f* ◦ L* + g*∨).
This proves the assertion.
Corollary 3.7.6 Let f ∈ Γ0 (E1 ), g ∈ Γ0 (E2 ) and L ∈ L(E1 , E2 ) such that 0 ∈ ri (dom g −
L(dom f )). Then
(f + g ◦ L)*(u) = min_{v∈E2} {f*(u − L*(v)) + g*(v)}.
Proof: For u ∈ E1 we have
(f + g ◦ L)*(u) = sup_{x∈E1} { ⟨x, u⟩ − f(x) − g(L(x)) }
= −inf_{x∈E1} { (f − ⟨·, u⟩)(x) + g(L(x)) }
= min_{v∈E2} { f*(L*(v) + u) + g*(−v) },
where the last equality follows from Theorem 3.7.5 applied to f − ⟨·, u⟩ and g, using that (f − ⟨·, u⟩)* = f*(· + u). The substitution v ↦ −v now yields the asserted formula.
As one of the many applications of Fenchel-Rockafellar duality we would like to study linear
programs in this regard.
Example 3.7.7 (Linear Programming duality) Let A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m. The standard linear program reads
inf c^T x   s.t.   Ax ≥ b.   (3.33)
Using the functions
f : x ∈ R^n ↦ c^T x   and   g : y ∈ R^m ↦ δ_{R^m_+}(y − b)
we can write (3.33) as
inf_{x∈R^n} {f(x) + g(Ax)}.
Its dual program, in the sense of Definition 3.7.3, reads
sup_{y∈R^m} −f*(A^T y) − g*(−y)   ⇐⇒   sup_{y∈R^m} −δ_{{c}}(A^T y) − δ_{R^m_−}(−y) + b^T y   ⇐⇒   sup_{A^T y = c, y ≥ 0} b^T y.
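The zero duality gap in this example can be observed numerically. Below is a minimal sketch (Python, assuming numpy and scipy are installed; the data A, b, c are invented so that both problems are feasible and bounded) that solves the primal (3.33) and the dual derived above with scipy.optimize.linprog.

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data (any feasible, bounded LP would do here).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
c = np.array([1.0, 2.0])

# Primal: min c^T x  s.t.  A x >= b   (rewritten as -A x <= -b for linprog).
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)

# Dual (as derived above): max b^T y  s.t.  A^T y = c, y >= 0.
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3)

print(primal.fun, -dual.fun)   # both optimal values coincide: the duality gap is zero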
3.8 The convex subdifferential
In this section we would like to present a generalized notion of differentiability for (usually nondifferentiable) convex functions. The idea of so-called subdifferentiability is based on affine minorization properties of convex functions and is deeply connected to conjugation.
3.8.1 Definition and basic properties
For f ∈ Γ, Theorem 3.3.2 tells us that at each point x̄ ∈ ri (dom f ) there exists g ∈ E such that
f(x) ≥ f(x̄) + ⟨g, x − x̄⟩   (x ∈ E).   (3.34)
We take this as a motivation for the following central concept.
Definition 3.8.1 (Subdifferential of a convex function) Let f : E → R be convex and x̄ ∈
E. Then g ∈ E is called a subgradient of f at x̄ if the subgradient inequality (3.34) holds at x̄.
The set
∂f (x̄) := {v ∈ E | f (x) ≥ f (x̄) + hv, x − x̄i (x ∈ E) }
of all subgradients is called the subdifferential of f at x̄. We denote the domain of the set-valued mapping ∂f : E ⇒ E by
dom ∂f := {x ∈ E | ∂f (x) 6= ∅ } .
Notice that, clearly, in the subgradient inequality (3.34), we can restrict ourselves to points
x ∈ dom f , since the inequality holds trivially outside of dom f .
We start our study of the subdifferential with some elementary properties.
Proposition 3.8.2 (Elementary properties of the subdifferential) Let f : E → R be convex and x̄ ∈ dom f. Then the following hold:
a) ∂f (x̄) is closed and convex for all x̄ ∈ dom f .
b) If f is proper then ∂f (x) = ∅ for x ∈
/ dom f .
c) If f is proper and x̄ ∈ ri (dom f ) then ∂f (x̄) is nonempty.
d) We have 0 ∈ ∂f (x̄) if and only if x̄ minimizes f (over E).
(Generalized Fermat’s rule)
e) ∂f (x̄) = {v ∈ E | (v, −1) ∈ Nepi f (x̄, f (x̄)) }.
Proof:
a) We have
∂f(x̄) = ⋂_{x∈E} { v ∈ E | ⟨v, x − x̄⟩ ≤ f(x) − f(x̄) },
and intersections preserve closedness and convexity.
b) Obvious.
c) Follows immediately from Theorem 3.3.2.
d) By definition we have
0 ∈ ∂f(x̄) ⇐⇒ f(x) ≥ f(x̄)   (x ∈ E).
e) Notice that
v ∈ ∂f(x̄) ⇐⇒ f(x) ≥ f(x̄) + ⟨v, x − x̄⟩   (x ∈ dom f)
⇐⇒ α ≥ f(x̄) + ⟨v, x − x̄⟩   ((x, α) ∈ epi f)
⇐⇒ 0 ≥ ⟨(v, −1), (x − x̄, α − f(x̄))⟩   ((x, α) ∈ epi f)
⇐⇒ (v, −1) ∈ N_{epi f}(x̄, f(x̄)).
Parts b) and c) of the above proposition imply that
ri (dom f ) ⊂ dom ∂f ⊂ dom f
(f ∈ Γ).
The subdifferential of a convex function might well be empty, contain only a single point, be
bounded (hence compact) or unbounded as the following examples illustrate.
Example 3.8.3
a) (Indicator function) Let C ⊂ E be convex and x̄ ∈ C. Then
g ∈ ∂δ_C(x̄) ⇐⇒ δ_C(x) ≥ δ_C(x̄) + ⟨g, x − x̄⟩   (x ∈ E)
⇐⇒ 0 ≥ ⟨g, x − x̄⟩   (x ∈ C),
i.e. ∂δ_C(x̄) = N_C(x̄).
b) (Euclidean norm) We have
∂‖·‖(x̄) = { x̄/‖x̄‖ } if x̄ ≠ 0,   and   ∂‖·‖(0) = B,
as can be verified by elementary considerations.
c) (Empty subdifferential) Consider
f : x ∈ R ↦ −(1 − |x|²)^{1/2} if |x| ≤ 1,   +∞ else.
Then ∂f(x) = ∅ for |x| ≥ 1.
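Part b) of the example can be probed numerically: away from the origin the unique subgradient is x̄/‖x̄‖, while at the origin every g with ‖g‖ ≤ 1 satisfies the subgradient inequality. A brief sketch (Python with numpy; the random test points are ad hoc):

import numpy as np

rng = np.random.default_rng(0)

def is_subgradient(g, xbar, num=1000):
    # check ||y|| >= ||xbar|| + <g, y - xbar> on random test points y
    Y = rng.normal(size=(num, xbar.size)) * 5.0
    return bool(np.all(np.linalg.norm(Y, axis=1)
                       >= np.linalg.norm(xbar) + (Y - xbar) @ g - 1e-12))

xbar = np.array([3.0, -4.0])
print(is_subgradient(xbar / np.linalg.norm(xbar), xbar))   # True

origin = np.zeros(2)
g = rng.normal(size=2)
g = g / max(1.0, np.linalg.norm(g))   # force ||g|| <= 1
print(is_subgradient(g, origin))      # True for any g in the unit ball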
There is a tight connection between subdifferentiation and conjugation of convex functions.
Theorem 3.8.4 (Subdifferential and conjugate function) Let f ∈ Γ0 . Then the following
are equivalent:
i) y ∈ ∂f (x);
ii) x ∈ argmaxz {hz, yi − f (z)};
iii) f (x) + f ∗ (y) = hx, yi;
iv) x ∈ ∂f ∗ (y);
v) y ∈ argmaxw {hx, wi − f ∗ (w)}.
Proof: Notice that
y ∈ ∂f(x) ⇐⇒ f(z) ≥ f(x) + ⟨y, z − x⟩   (z ∈ E)
⇐⇒ ⟨y, x⟩ − f(x) ≥ sup_z { ⟨y, z⟩ − f(z) }
⇐⇒ f(x) + f*(y) ≤ ⟨x, y⟩
⇐⇒ f(x) + f*(y) = ⟨x, y⟩,
where the last equivalence makes use of the Fenchel-Young inequality (3.25). This establishes the equivalences between i), ii) and iii). Applying the same reasoning to f* and noticing that f** = f gives the missing equivalences.
One consequence of Theorem 3.8.4 is that the set-valued mappings ∂f and ∂f ∗ are inverse to
each other. We notice some other interesting implications of the latter theorem.
Corollary 3.8.5 Let C ⊂ E. Then the following hold:
a) For x ∈ dom σ_C, we have ∂σ_C(x) = argmax_C ⟨·, x⟩.
b) If C is a closed, convex cone, the following are equivalent:
i) y ∈ ∂δ_C(x);
ii) x ∈ ∂δ_{C°}(y);
iii) x ∈ C, y ∈ C° and ⟨x, y⟩ = 0.
As another consequence we obtain the very desirable property that the subdifferential operator
of a closed, proper and convex function f has a closed graph
gph ∂f := {(x, y) ∈ E × E | y ∈ ∂f (x) } ,
which is also referred to as outer semicontinuity of ∂f .
Corollary 3.8.6 (Outer semicontinuity of ∂f ) Let f ∈ Γ0 and suppose {xk } → x and
{y_k ∈ ∂f(x_k)} → y. Then y ∈ ∂f(x), i.e. gph ∂f ⊂ E × E is closed.
Proof: By Theorem 3.8.4 we have
f (xk ) + f ∗ (yk ) = hxk , yk i
(k ∈ N).
Using that f and f ∗ are lsc we obtain
f (x) + f ∗ (y) ≤ hx, yi ,
which together with the Fenchel-Young inequality gives
f (x) + f ∗ (y) = hx, yi .
But then, again, Theorem 3.8.4 implies that y ∈ ∂f (x).
We close out the section with some useful boundedness properties of the subdifferential operator.
Theorem 3.8.7 (Boundedness properties ∂f ) Let f ∈ Γ0 and X ⊂ int (dom f ) nonempty,
open and convex. Then the following hold:
a) f is Lipschitz continuous with modulus L > 0 on X if and only if kvk ≤ L for all x ∈ X
and v ∈ ∂f (x).
b) ∂f maps bounded sets which are compactly contained in int (dom f ) to bounded sets.
Proof:
a) First, assume that f is Lipschitz on X with modulus L > 0. Now take x ∈ X and v ∈ ∂f(x); we may assume v ≠ 0 (otherwise there is nothing to prove). By the subgradient inequality we have
f(y) ≥ f(x) + ⟨v, y − x⟩   (y ∈ E).   (3.35)
Since X is open, there exists r > 0 such that B_r(x) ⊂ X. Inserting the vector
y = x + (r/‖v‖) v ∈ B_r(x)
in (3.35) yields
f(x + (r/‖v‖) v) ≥ f(x) + r‖v‖.
Rearranging these terms gives
‖v‖ ≤ (1/r) [ f(x + (r/‖v‖) v) − f(x) ] ≤ L,
which shows the first implication in a).
Conversely, assume that ‖v‖ ≤ L for all x ∈ X and v ∈ ∂f(x). For x, y ∈ X and v ∈ ∂f(x) we hence have
f(x) − f(y) ≤ ⟨v, x − y⟩ ≤ ‖v‖ · ‖x − y‖ ≤ L‖x − y‖,
where we use the subgradient inequality and the Cauchy-Schwarz inequality. Interchanging the roles of x and y yields
f(y) − f(x) ≤ L‖x − y‖,
which all in all gives
|f(x) − f(y)| ≤ L‖x − y‖.
Therefore, f is Lipschitz continuous on X with modulus L > 0.
b) Let K be compactly contained in int (dom f). Hence, we can assume w.l.o.g. that K is compact. Now, suppose there were sequences {x_k ∈ K} and {v_k ∈ ∂f(x_k)} such that ‖v_k‖ → ∞. Since K is compact, we can assume w.l.o.g. that x_k → x ∈ K. Now take r > 0 such that B_r(x) ⊂ int (dom f). By Theorem 3.5.6, f is Lipschitz on B_r(x) with modulus, say, L > 0. In view of part a) we infer that ‖v‖ ≤ L for all v ∈ ∂f(y) and y ∈ B_r(x). Since x_k ∈ B_r(x) for all k sufficiently large, we thus have ‖v_k‖ ≤ L for these k, which contradicts the assumption that {v_k} is unbounded.
3.8.2 Connection to the directional derivative
The subdifferential of a convex function is intimately tied to its directional derivative, which we
define now.
Definition 3.8.8 (Directional derivative) Let f : E → R be proper. For x ∈ dom f we say that f is directionally differentiable at x in the direction d ∈ E if
lim_{t↓0} [f(x + td) − f(x)]/t
exists (in the extended real-valued sense). In this case we call
f′(x; d) := lim_{t↓0} [f(x + td) − f(x)]/t
the directional derivative of f at x in the direction d.
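Numerically, the limit can be approximated by evaluating the difference quotient for decreasing t; for convex f the quotient is monotone in t (see Proposition 3.8.9 below), so the values decrease towards f′(x; d). A tiny sketch (Python with numpy; the convex function f = exp on R and the data are arbitrary):

import numpy as np

f = np.exp          # a convex function on R
x, d = 0.0, 1.0

for t in [1.0, 0.1, 0.01, 0.001]:
    q = (f(x + t * d) - f(x)) / t
    print(t, q)     # decreases monotonically towards f'(x; d) = exp(0) = 1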
Proposition 3.8.9 (Directional derivative of a convex function) Let f ∈ Γ, x ∈ dom f
and d ∈ E. Then the following hold:
a) The difference quotient
q : t ∈ (0, ∞) ↦ [f(x + td) − f(x)]/t
is nondecreasing.
b) f′(x; d) exists (in R) with
f′(x; d) = inf_{t>0} q(t).
c) f′(x; ·) is sublinear with dom f′(x; ·) = R_+(dom f − x).
d) f′(x; ·) is proper and lsc for x ∈ ri (dom f).
Proof:
a) Fix 0 < s < t and put λ := s/t ∈ (0, 1) and z := x + td. If f(z) = +∞, then q(s) ≤ q(t) = +∞. Otherwise, by convexity of f, we have
f(x + sd) = f(λz + (1 − λ)x) ≤ λf(z) + (1 − λ)f(x) = f(x) + λ(f(z) − f(x)),
hence q(s) ≤ q(t) also in this case.
b) The infimum representation follows from a) since q(t) decreases as t ↓ 0. This also gives the existence statement, since an infimum always exists in the extended real-valued sense.
c) First notice that 0 ∈ dom f′(x; ·) as f′(x; 0) = 0, and that f′(x; αd) = αf′(x; d) for all α > 0 and d ∈ E, i.e. f′(x; ·) is positively homogeneous. We now show that f′(x; ·) is also convex, which then proves sublinearity: To this end, let (d, α), (h, β) ∈ epi_< f′(x; ·). Then
[f(x + td) − f(x)]/t < α   and   [f(x + th) − f(x)]/t < β
for all t > 0 sufficiently small. For such t > 0, by convexity of f, we compute
f(x + t(λd + (1 − λ)h)) − f(x) = f(λ(x + td) + (1 − λ)(x + th)) − f(x) ≤ λ(f(x + td) − f(x)) + (1 − λ)(f(x + th) − f(x))
for all λ ∈ (0, 1). This implies
[f(x + t(λd + (1 − λ)h)) − f(x)]/t ≤ λ [f(x + td) − f(x)]/t + (1 − λ) [f(x + th) − f(x)]/t
for all t > 0 sufficiently small and λ ∈ (0, 1). Letting t ↓ 0 gives
f′(x; λd + (1 − λ)h) ≤ λf′(x; d) + (1 − λ)f′(x; h) < λα + (1 − λ)β   (λ ∈ (0, 1)),
which shows convexity of epi_< f′(x; ·) and thus of f′(x; ·). Hence, as f′(x; ·) was proven to be positively homogeneous as well, it is sublinear, cf. Proposition 3.6.11.
The fact that dom f′(x; ·) = R_+(dom f − x) follows from b):
d ∈ dom f′(x; ·) ⇐⇒ ∃t > 0 : [f(x + td) − f(x)]/t < +∞ ⇐⇒ ∃t > 0 : f(x + td) < +∞ ⇐⇒ ∃t > 0 : x + td ∈ dom f ⇐⇒ d ∈ R_+(dom f − x).
d) From c) we know that f′(x; ·) is, in particular, convex with dom f′(x; ·) = R_+(dom f − x), which is a subspace by Exercise 2.5. Since f′(x; 0) = 0, by Exercise 3.4. we now see that f′(x; ·) must be proper. Moreover, by Proposition 3.5.2 it follows that it agrees with its closure everywhere since its domain has no relative boundary. Thus, f′(x; ·) is lsc.
We now establish the connection between the subdifferential and the directional derivative of a proper convex function. The first result in this regard characterizes subgradients using directional derivatives and shows that the latter is even proper on the domain of the subdifferential
operator.
Proposition 3.8.10 Let f ∈ Γ and x ∈ dom ∂f . Then we have:
a) The following are equivalent.
i) v ∈ ∂f (x);
ii) f′(x; d) ≥ ⟨v, d⟩   (d ∈ E).
b) f′(x; ·) is proper and sublinear.
Proof:
a) We realize that the subgradient inequality for v ∈ E is equivalent to
[f(x + λd) − f(x)]/λ ≥ ⟨d, v⟩   (λ > 0, d ∈ E).   (3.36)
As the left-hand side decreases to f′(x; d) as λ ↓ 0, this is equivalent to ii). This shows the equivalence of i) and ii).
b) Take v ∈ ∂f(x). Then a) yields f′(x; ·) ≥ ⟨·, v⟩ and therefore f′(x; ·) does not take the value −∞. Hence, in view of Proposition 3.8.9 c) and the fact that f′(x; 0) = 0, f′(x; ·) is proper and sublinear.
We continue with our main result of this section.
Theorem 3.8.11 (Directional derivative and subdifferential) Let f ∈ Γ and x ∈ dom ∂f .
Then
cl (f′(x; ·)) = σ_{∂f(x)},
i.e. the lower semicontinuous hull of f′(x; ·) is the support function of ∂f(x).
Proof: Due to Proposition 3.8.10 b) we know that f′(x; ·) is proper and sublinear. By Corollary 3.6.16, we thus see that cl (f′(x; ·)) = σ_C for C = {v | ⟨v, d⟩ ≤ f′(x; d) (d ∈ E)}. But in view of Proposition 3.8.10 a), C = ∂f(x), which concludes the proof.
Theorem 3.8.11 has a list of very important consequences.
Corollary 3.8.12 Let f ∈ Γ and x ∈ ri (dom f ). Then
f′(x; ·) = σ_{∂f(x)}.
Proof: Follows from Theorem 3.8.11 and Proposition 3.8.9 d).
Corollary 3.8.13 Let f ∈ Γ and x ∈ dom f . Then ∂f (x) is nonempty and bounded if and only
if x ∈ int (dom f ).
Proof: If x ∈ int (dom f ), we know from Corollary 3.8.12 that f 0 (x; ·) = σ∂f (x) > −∞. From
Proposition 3.8.9 we know that dom f 0 (x; ·) = R+ (dom f − x). Since x ∈ int (dom f ), we
have R+ (dom f − x) = E, hence f 0 (x; ·) is finite, i.e. σ∂f (x) is finite, therefore (see Exercise
3.17.), ∂f (x) is bounded (and nonempty).
In turn, if ∂f (x) is bounded and nonempty then, by Theorem 3.8.11 and Exercise 3.17.,
cl (f 0 (x; ·)) = σ∂f (x) is finite. Hence, f 0 (x; ·) must be finite, thus R+ (dom f −x) = dom f 0 (x; ·) =
E. Using Exercise 2.5., this implies that 0 ∈ int (dom f − x), i.e. x ∈ int (dom f ).
Corollary 3.8.14 (Max formula) Let f ∈ Γ and x ∈ int (dom f ). Then
f′(x; ·) = max_{v∈∂f(x)} ⟨v, ·⟩.
3.8.3 Subgradients of differentiable functions
In this section we want to study the subdifferential of convex functions at points of differentiability. We will ultimately prove that a convex function is differentiable, in fact continuously differentiable, at a point in the interior of its domain if and only if its subdifferential is a singleton, which then consists of the gradient only. Moreover, we will show that a differentiable convex function is in fact continuously differentiable.
Theorem 3.8.15 Let f ∈ Γ and x ∈ int (dom f ). Then ∂f (x) is a singleton if and only if f is
differentiable at x. In this case we have ∂f (x) = {∇f (x)}.
Proof: If f is differentiable at x, then f′(x; ·) = ⟨∇f(x), ·⟩. Thus, by Proposition 3.8.10 a), the elements v ∈ ∂f(x) are characterized through
⟨∇f(x), d⟩ ≥ ⟨v, d⟩   (d ∈ E),
which implies that v = ∇f(x), i.e. ∂f(x) = {∇f(x)}. This proves the first implication.
Conversely, assume that ∂f(x) = {v}. We have to show that
lim_{d→0} [f(x + d) − f(x) − ⟨v, d⟩]/‖d‖ = 0.   (3.37)
To this end, take {d_k} → 0 arbitrarily (with d_k ≠ 0) and put
t_k := ‖d_k‖   and   p_k := d_k/‖d_k‖ = d_k/t_k   (k ∈ N).
Then there exist an infinite set K ⊂ N and p ∈ E \ {0} such that p_k →_K p. We compute that
[f(x + d_k) − f(x) − ⟨v, d_k⟩]/‖d_k‖ = [f(x + t_k p_k) − f(x) − t_k ⟨v, p_k⟩]/t_k
= [f(x + t_k p) − f(x)]/t_k + [f(x + t_k p_k) − f(x + t_k p)]/t_k − ⟨v, p_k⟩.
As we pass to the limit on K, the first summand tends to f′(x; p), cf. Proposition 3.8.9. The second one goes to 0, since f is Lipschitz around x ∈ int (dom f), cf. Theorem 3.5.6. Thus, we have
lim_{k∈K} [f(x + d_k) − f(x) − ⟨v, d_k⟩]/‖d_k‖ = f′(x; p) − ⟨v, p⟩ = max_{w∈∂f(x)} ⟨w, p⟩ − ⟨v, p⟩ = 0,
where the second equality uses Corollary 3.8.14 and the last one exploits the fact that ∂f(x) = {v}. Since p was an arbitrary accumulation point of the bounded sequence {p_k}, we have that
lim_{k∈N} [f(x + d_k) − f(x) − ⟨v, d_k⟩]/‖d_k‖ = 0.
Since {d_k} → 0 was chosen arbitrarily, this gives (3.37) and hence concludes the proof.
Theorem 3.8.16 Let f ∈ Γ and x ∈ int (dom f ). Then f is continuously differentiable on
int (dom f ) if and only if ∂f (x) is a singleton for all x ∈ int (dom f ).
Proof: If f is continuously differentiable on int (dom f ), Theorem 3.8.15 immediately implies
that ∂f (x) = {∇f (x)} for all x ∈ int (dom f ).
Conversely, let ∂f (x) be a singleton for all x ∈ int (dom f ). By Theorem 3.8.15, f is
differentiable at every point x ∈ int (dom f ). Now, fix x ∈ int (dom f ) and take {xk ∈
int (dom f )} → x. Then we have ∇f (xk ) ∈ ∂f (xk ) for all k ∈ N. (In fact, we have
∂f (xk ) = {∇f (xk )}, but that is unimportant to our reasoning.) Now choose r > 0 such that
B r (x) ⊂ int (dom f ). Since xk ∈ B r (x) for all k sufficiently large, we also have ∇f (xk ) ∈
∂f (B r (x)), which is bounded due to Theorem 3.8.7 b). Hence, {∇f (xk )} has an accumulation
point g ∈ E which, by Corollary 3.8.6 lies in ∂f (x) = {∇f (x)}. Hence, ∇f (xk ) → g = ∇f (x)
on the respective subsequence. Since this holds true for any accumulation point, we acutally
have ∇f (xk ) → ∇f (x) on the whole sequence. As xk → x was chosen arbitrarily, this proves
the statement.
Corollary 3.8.17 (Differentiability of finite convex functions) Let f : E → R be convex.
Then f is differentiable if and only if f is continuously differentiable.
3.8.4 Subdifferential calculus
In this section we want to compute the subdifferential for various convex functions that come
out of convexity-preserving operations.
We start with the subdifferential of the separable sum of convex functions.
Proposition 3.8.18 (Subdifferential of separable sum) Let fi ∈ Γ(Ei ) (i = 1, 2). Then
∂(f1 ⊕ f2 ) = ∂f1 × ∂f2 .
Proof: For (x_1, x_2) ∈ E1 × E2 we have
(v_1, v_2) ∈ ∂f_1(x_1) × ∂f_2(x_2)
⇔ f_i(y_i) ≥ f_i(x_i) + ⟨v_i, y_i − x_i⟩   (y_i ∈ E_i, i = 1, 2)
⇔ f_1(y_1) + f_2(y_2) ≥ f_1(x_1) + f_2(x_2) + ⟨v_1, y_1 − x_1⟩ + ⟨v_2, y_2 − x_2⟩   ((y_1, y_2) ∈ E1 × E2)
⇔ (f_1 ⊕ f_2)(y_1, y_2) ≥ (f_1 ⊕ f_2)(x_1, x_2) + ⟨(v_1, v_2), (y_1, y_2) − (x_1, x_2)⟩   ((y_1, y_2) ∈ E1 × E2)
⇔ (v_1, v_2) ∈ ∂(f_1 ⊕ f_2)(x_1, x_2).
Here, the '⇐'-implication in the second equivalence follows from setting y_j = x_j for j ≠ i (i = 1, 2).
Note that the above result, by induction, extends to arbitrary finite separable sums of convex functions and, without any more effort, can be proven for separable sums over much more general index sets than only finite ones, cf. [1, Proposition 16.8]; but we only need the case of two functions in our study.
We continue with a subdifferential rule for epi-compositions, cf. Proposition 3.1.15 and Proposition 3.6.30.
Proposition 3.8.19 (Subdifferential of epi-composition) Let f ∈ Γ0 (E) and L ∈ L(E, E0 ).
Then for y ∈ dom Lf and x ∈ L−1 ({y}) the following hold:
a) If (Lf )(y) = f (x) then ∂(Lf )(y) = (L∗ )−1 (∂f (x)).
b) If (L∗ )−1 (∂f (x)) 6= ∅ then (Lf )(y) = f (x).
Proof: Let v ∈ E0. From Proposition 3.6.30 a) and Theorem 3.8.4 we infer that
f(x) + (Lf)*(v) = ⟨y, v⟩ ⇔ f(x) + f*(L*(v)) = ⟨L(x), v⟩
⇔ f(x) + f*(L*(v)) = ⟨x, L*(v)⟩
⇔ L*(v) ∈ ∂f(x)
⇔ v ∈ (L*)^{-1}(∂f(x)).   (3.38)
a) Theorem 3.8.4 and Proposition 3.6.30 a) imply that
v ∈ ∂(Lf)(y) ⇔ (Lf)(y) + (Lf)*(v) = ⟨y, v⟩ ⇔ f(x) + f*(L*(v)) = ⟨L(x), v⟩.   (3.39)
Combining (3.38) and (3.39) gives a).
b) Suppose v ∈ (L*)^{-1}(∂f(x)). Then the Fenchel-Young inequality, the fact that L(x) = y and (3.38) imply that
⟨y, v⟩ ≤ (Lf)(y) + (Lf)*(v) ≤ f(x) + (Lf)*(v) = ⟨y, v⟩,
hence (Lf)(y) = f(x).
We exploit Proposition 3.8.19 to obtain an important subdifferential result for infimal convolutions.
Theorem 3.8.20 (Subdifferentiation of infimal convolutions) Let f, g ∈ Γ0 as well as
x ∈ dom (f #g)(= dom f + dom g). Then the following hold:
a) We have
∂(f#g)(x) = ∂f(y) ∩ ∂g(x − y)   (y ∈ argmin_{u∈E} {f(u) + g(x − u)}).
b) If ∂f(y) ∩ ∂g(x − y) ≠ ∅ for some y ∈ E, then (f#g)(x) = f(y) + g(x − y), i.e. y ∈ argmin_{u∈E} {f(u) + g(x − u)}.
Proof: Consider the linear mapping L : (a, b) ∈ E × E ↦ a + b ∈ E. Then L* : z ∈ E ↦ (z, z) ∈ E × E. By definition of the respective operations we have f#g = L(f ⊕ g), in particular, dom L(f ⊕ g) = dom (f#g). Thus, L(y, x − y) = x ∈ dom L(f ⊕ g).
a) Let y ∈ argmin_{u∈E} {f(u) + g(x − u)}. Since (L(f ⊕ g))(x) = (f ⊕ g)(y, x − y), Proposition 3.8.19 a) and Proposition 3.8.18 imply that
∂(f#g)(x) = ∂(L(f ⊕ g))(x) = (L*)^{-1}(∂(f ⊕ g)(y, x − y)) = (L*)^{-1}(∂f(y) × ∂g(x − y)) = ∂f(y) ∩ ∂g(x − y).
b) By assumption we have
∅ ≠ ∂f(y) ∩ ∂g(x − y) = (L*)^{-1}(∂f(y) × ∂g(x − y)) = (L*)^{-1}(∂(f ⊕ g)(y, x − y)).
Thus, Proposition 3.8.19 b) implies
(f#g)(x) = (L(f ⊕ g))(x) = (f ⊕ g)(y, x − y) = f(y) + g(x − y).
As a first application we obtain the subdifferential of the (Euclidean) distance function.
Example 3.8.21 (Subdifferential of Euclidean distance) Let C ⊂ E be nonempty, closed
and convex. Then
∂dist_C(x) = { (x − P_C(x)) / dist_C(x) }   if x ∉ C,
∂dist_C(x) = N_C(x) ∩ B   if x ∈ bd C,
∂dist_C(x) = {0}   if x ∈ int C.
This can be seen using the Examples 3.4.4, 3.4.11 and 3.8.3 a) and Theorem 3.8.20.
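For C the closed Euclidean unit ball one has P_C(x) = x/max{1, ‖x‖}, so the first case of the formula can be tested directly. A short sketch (Python with numpy; the test point and the random sample are arbitrary):

import numpy as np

def proj_ball(x):
    # projection onto the closed Euclidean unit ball
    return x / max(1.0, np.linalg.norm(x))

def dist_ball(x):
    return np.linalg.norm(x - proj_ball(x))

x = np.array([3.0, 4.0])                   # a point outside C
g = (x - proj_ball(x)) / dist_ball(x)      # claimed subgradient of dist_C at x

rng = np.random.default_rng(1)
Y = rng.normal(size=(1000, 2)) * 3.0
lhs = np.array([dist_ball(y) for y in Y])
rhs = dist_ball(x) + (Y - x) @ g
print(bool(np.all(lhs >= rhs - 1e-10)))    # True: subgradient inequality holds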
Our next goal is to establish a subdifferential rule for the sum of convex functions as well as for the composition of a convex function with a linear mapping. In fact, both of these problems will be answered by a general result on the subdifferentiation of the convex function
f + g ◦ L,   where f ∈ Γ0(E1), g ∈ Γ0(E2), L ∈ L(E1, E2).
To establish the subdifferential calculus for the latter function we need some preliminary results.
Lemma 3.8.22 Let f ∈ Γ(E1 ), g ∈ Γ(E2 ) and L ∈ L(E1 , E2 ). Then
∂(f + g ◦ L) ⊃ ∂f + L∗ ◦ (∂g) ◦ L.
Proof: Let x ∈ E. A generic point in ∂f (x) + (L∗ ◦ (∂g) ◦ L)(x) is of the form u + L∗ v with
u ∈ ∂f (x) and v ∈ ∂g(L(x)). The subdifferential inequality yields
f (y) ≥ f (x) + hu, y − xi
and g(L(y)) ≥ g(L(x)) + hv, L(y) − L(x)i
(y ∈ E1 ).
Combining these two inequalities gives
f (y) + g(L(y)) ≥ f (x) + g(L(x)) + hu + L∗ (v), y − xi
(y ∈ E1 ),
i.e. u + L∗ v ∈ ∂(f + g ◦ L)(x).
Proposition 3.8.23 Let f ∈ Γ0 (E1 ), g ∈ Γ0 (E2 ) and L ∈ L(E1 , E2 ) such that (f + g ◦ L)∗ =
minv∈E2 {f ∗ ((·) − L∗ (v)) + g ∗ (v)}. Then
∂(f + g ◦ L) = ∂f + L∗ ◦ (∂g) ◦ L.
Proof: In view of Lemma 3.8.22 it remains to show that gph ∂(f +g◦L) ⊂ gph (∂f +L∗ (∂g)◦
L): To this end, take (x, u) ∈ gph ∂(f + g ◦ L). By Theorem 3.8.4, we have
(f + g ◦ L)(x) + (f + g ◦ L)∗ (u) = hx, ui .
(3.40)
On the other hand, by assumption, there exists v ∈ E2 such that
(f + g ◦ L)∗ (u) = f ∗ (u − L∗ (v)) + g ∗ (v).
Combining this with (3.40), we obtain
[f (x) + f ∗ (u − L∗ (v)) − hx, u − L∗ (v)i] + [g(L(x)) + g ∗ (v) − hx, L∗ (v)i] = 0.
By the Fenchel-Young inequality (3.25) we thus obtain
f (x) + f ∗ (u − L∗ (v)) = hx, u − L∗ (v)i
and g(L(x)) + g ∗ (v) − hx, L∗ (v)i = 0.
Invoking Theorem 3.8.4 again yields
u − L∗ (v) ∈ ∂f (x)
and v ∈ ∂g(L(x)),
hence, u ∈ ∂f (x) + L∗ ∂g(L(x)) as desired.
We now come to the announced main result.
Theorem 3.8.24 (Generalized subdifferential sum rule) Let f ∈ Γ0(E1), g ∈ Γ0(E2) and L ∈ L(E1, E2). Then
∂(f + g ◦ L) ⊃ ∂f + L* ◦ (∂g) ◦ L.   (3.41)
Under the qualification condition
L(ri (dom f)) ∩ ri (dom g) ≠ ∅   (3.42)
equality holds in (3.41).
Proof: Our qualification condition (3.42) yields 0 ∈ ri (dom g) − L(ri (dom f)) = ri (dom g − L(dom f)), so we infer from Corollary 3.7.6 that (f + g ◦ L)* = min_{v∈E2} {f*((·) − L*(v)) + g*(v)}. Hence, the assertion follows from Lemma 3.8.22 and Proposition 3.8.23.
The generalized subdifferential sum rule has many important consequences, two of which we present now. Its proof is, once more, ultimately based on a (strong) separation argument.
Corollary 3.8.25 (Subdifferential sum rule) Let f, g ∈ Γ. Then
∂(f + g)(x) ⊃ ∂f(x) + ∂g(x)   (x ∈ E).   (3.43)
Under the qualification condition
ri (dom f) ∩ ri (dom g) ≠ ∅   (3.44)
equality holds in (3.43).
Corollary 3.8.26 (Subdifferential chain rule) Let g ∈ Γ(E2) and L ∈ L(E1, E2). Then
∂(g ◦ L) ⊃ L* ◦ (∂g) ◦ L.   (3.45)
Under the qualification condition
rge L ∩ ri (dom g) ≠ ∅   (3.46)
equality holds in (3.45).
We proceed with the subdifferential of the pointwise maximum of a finite collection of convex
functions.
Theorem 3.8.27 (Subdifferential of maximum of convex functions) For i ∈ I := {1, . . . , m}
let fi ∈ Γ and x ∈ ∩i∈I int (dom fi ) and set f := maxi∈I fi and I(x) := {i ∈ I | fi (x) = f (x) }.
Then
∂f(x) = conv ⋃_{i∈I(x)} ∂f_i(x).
Proof: Let i ∈ I(x) and u ∈ ∂f_i(x). Then, by the subdifferential inequality, we have
⟨u, y − x⟩ ≤ f_i(y) − f_i(x) ≤ f(y) − f(x)   (y ∈ E),
i.e. u ∈ ∂f(x). Using this and the fact that ∂f(x) is closed and convex, cf. Proposition 3.8.2 a), we have
∂f(x) ⊃ conv ⋃_{i∈I(x)} ∂f_i(x).
Now assume the inclusion were strict, i.e. there exists
u ∈ ∂f(x) \ conv ⋃_{i∈I(x)} ∂f_i(x).   (3.47)
By strong separation there hence exists s ∈ E \ {0} such that
⟨s, u⟩ > max_{i∈I(x)} sup_{z∈∂f_i(x)} ⟨s, z⟩ = max_{i∈I(x)} f_i′(x; s),   (3.48)
where we use Corollary 3.8.14 for the second identity. In view of Remark 2.6.2 and the fact that x ∈ int (dom f_i) for all i ∈ I, we realize that we can rescale s such that
x + s ∈ ⋂_{i∈I} dom f_i = dom f.   (3.49)
Now let {αk ∈ (0, 1)} ↓ 0. Since I is finite, we can assume w.l.o.g that there exists j ∈ I such
that
fj (x + αk s) = f (x + αk s) (k ∈ N).
(3.50)
For k ∈ N we hence have fj (x + αk s) ≤ (1 − αk )fj (x) + αk fj (x + s) and thus
(1 − αk )fj (x) ≥ fj (x + αk s) − αk fj (x + s)
≥ f (x + αk s) − αk f (x + s)
≥ f (x) + hu, αk si − αk f (x + s)
≥ fj (x) + αk hu, si − αk f (x + s).
Here, the second inequality uses (3.50) and the definition of f . The third one uses that u ∈
∂f (x) (see 3.47) and the last one is again due to the definition of f . Now letting k → ∞ and
using (3.49) yields
fj (x) = f (x).
(3.51)
Finally, using (3.50), (3.51), (3.47) and (3.48), we obtain
f_j′(x; s) < ⟨s, u⟩ ≤ [f(x + α_k s) − f(x)]/α_k = [f_j(x + α_k s) − f_j(x)]/α_k → f_j′(x; s),
which is the desired contradiction.
A frequently occurring special case of the foregoing result is the following.
Corollary 3.8.28 For i ∈ I := {1, . . . , m} let f_i ∈ Γ be differentiable at x ∈ ⋂_{i∈I} int (dom f_i), and set f := max_{i∈I} f_i and I(x) := {i ∈ I | f_i(x) = f(x)}. Then
∂f(x) = conv {∇f_i(x) | i ∈ I(x)}.
Proof: Combine Theorem 3.8.27 and Theorem 3.8.15.
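As a quick numerical illustration of the corollary (a Python/numpy sketch; the two smooth convex functions and the test point are invented), take f = max{f_1, f_2} with f_1(x) = ‖x‖² and f_2(x) = 2x_1 at the point x = (1, 1), where both functions are active; every convex combination of ∇f_1(x) and ∇f_2(x) then satisfies the subgradient inequality.

import numpy as np

f1 = lambda x: x @ x          # f1(x) = ||x||^2, gradient 2x
f2 = lambda x: 2.0 * x[0]     # f2(x) = 2 x_1,   gradient (2, 0)
f = lambda x: max(f1(x), f2(x))

x = np.array([1.0, 1.0])      # f1(x) = f2(x) = 2: both functions are active
g1, g2 = 2.0 * x, np.array([2.0, 0.0])

rng = np.random.default_rng(2)
Y = rng.normal(size=(1000, 2)) * 3.0
ok = True
for theta in [0.0, 0.3, 0.7, 1.0]:
    g = theta * g1 + (1.0 - theta) * g2
    ok &= all(f(y) >= f(x) + g @ (y - x) - 1e-10 for y in Y)
print(ok)                     # True: conv{grad f1(x), grad f2(x)} consists of subgradients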
Exercises to Chapter 3
3.1. (Domain of an lsc function) Is the domain of an lsc function closed?
3.2. (Univariate convex functions) Let f : R → R ∪ {+∞} and I ⊂ dom f be an open interval. Show the following (without using results from Section 3.1.2):
a) f is convex on I if and only if, for every x_0 ∈ I, the slope function
x ↦ [f(x) − f(x_0)]/(x − x_0)
is nondecreasing on I \ {x_0}.
b) Let f be differentiable on I: Then f is convex on I if f′ is nondecreasing on I, i.e.
f′(s) ≤ f′(t)   (s, t ∈ I : s ≤ t).
c) Let f be twice differentiable on I. Then f is convex on I if and only if f″(x) ≥ 0 for all x ∈ I.
3.3. (Characterization of convexity) Let f : E → R. Show the equivalence of:
i) f is convex;
ii) The strict epigraph epi < f := {(x, α) ∈ E × R | f (x) < α } of f is convex;
iii) For all λ ∈ (0, 1) we have f (λx + (1 − λ)y) < λα + (1 − λ)β whenever f (x) < α
and f (y) < β.
3.4. (Properness and closedness of convex functions) Prove the following:
a) An improper convex function f : E → R must have f (x) = −∞ for all x ∈
ri (dom f ).
b) An improper convex function which is lsc, can only have infinite values.
c) If f is convex then cl f is proper if and only if f is proper.
3.5. (Jensen's Inequality) Show that f : E → R ∪ {+∞} is convex if and only if
f(∑_{i=1}^p λ_i x_i) ≤ ∑_{i=1}^p λ_i f(x_i)   for all x_i ∈ E (i = 1, . . . , p), λ ∈ ∆_p.
3.6. (Quasiconvex functions) A function f : E → R is called quasiconvex if the level sets
lev≤α f are convex for every α ∈ R. Show:
a) Every convex function is quasiconvex.
b) f : E → R ∪ {+∞} is quasiconvex if and only if
f (λx + (1 − λ)y) ≤ max{f (x), f (y)}
(x, y ∈ dom f, λ ∈ [0, 1]).
c) If f : E → R ∪ {+∞} is quasiconvex then argmin f is a convex set.
3.7. (Coercivity is level-boundedness) Show that a function f : E → R is coercive if and
only if it is level-bounded.
3.8. (Post-composition with monotonically increasing, convex functions) Let f : E →
R ∪ {+∞} be convex (and lsc) and let g : R → R ∪ {+∞} be convex (and lsc) and
nondecreasing. We put g(+∞) := +∞ and assume that limx→∞ g(x) = +∞.
a) Show that g ◦ f is convex (and lsc);
b) Give a necessary and sufficient condition for g ◦ f to be proper.
3.9. (Supercoercivity in sums) Let f ∈ Γ and g : E → R ∪ {+∞} supercoercive. Show
that f + g is supercoercive.
3.10. (Convergence of prox-operator) Let f ∈ Γ0 and x̄ ∈ dom f . Prove that
P_λ f(x̄) → x̄   and   f(P_λ f(x̄)) → f(x̄)   (λ ↓ 0).
3.11. (Minimizing differentiable convex functions) Let f : E → R be convex and differentiable and C ⊂ E. Show that x̄ ∈ C is a minimizer of f over C if and only if
−∇f (x̄) ∈ NC (x̄).
3.12. (Convex hulls of functions) Let f : E → R. Show the following:
a) epi (conv f ) = conv (epi f );
b) (conv f)(x) = inf { ∑_{i=1}^{N+2} λ_i f(x_i) | λ ∈ ∆_{N+2}, x = ∑_{i=1}^{N+2} λ_i x_i }   (x ∈ E).
3.13. (Properness of convex hull) Let f : E → R.
a) Show that f is proper if conv f is. Does the reverse implication hold as well?
b) Show that conv f is proper if and only if f has an affine minorant.
3.14. (Self-conjugacy) Show that (1/2)‖·‖² is the only function f : R^n → R with f* = f.
3.15. (Conjugate of negative logdet) Compute f* and f** for
f : X ∈ S^n ↦ −log(det X) if X ≻ 0,   +∞ else.
3.16. (Positive homogeneity, sublinearity and subadditivity) Let f : E → R. Show the
following:
a) f is positively homogeneous if and only if epi f is a cone. In this case f(0) ∈ {0, −∞}.
b) If f is lsc and positively homogeneous with f (0) = 0 it must be proper.
c) The following are equivalent:
i) f is sublinear;
ii) f is positively homogeneous and convex;
iii) f is positively homogeneous and subadditive;
iv) epi f is a convex cone.
3.17. (Finiteness of support functions) Let S ⊂ E be nonempty. Then σ_S is finite if and only if S is bounded.
3.18. (Polar sets) Show the following:
a) If C ⊂ E is a cone, we have {v | ⟨v, x⟩ ≤ 0 (x ∈ C)} = {v | ⟨v, x⟩ ≤ 1 (x ∈ C)}.
b) C ⊂ E is bounded if and only if 0 ∈ int C ◦ .
c) For any closed half-space H containing 0 we have H ◦◦ = H.
3.19. (Gauge functions) Let C ⊂ E be nonempty, closed and convex with 0 ∈ C. Prove:
a) C = lev_{≤1} γ_C,   C^∞ = γ_C^{-1}({0}),   dom γ_C = R_+ C.
b) The following are equivalent:
i) γC is a norm (with C as its unit ball);
ii) C is bounded, symmetric (C = −C) with nonempty interior.
3.20. (Cone polarity and conjugacy) Let K ⊂ E be a convex cone. Then δ_K ∗←→ δ_{K°}.
3.21. (Soft thresholding) For f : x ∈ Rn 7→ kxk1 compute ∂f and eλ f (λ > 0).