3 Convex Functions

3.1 Convexity notions for functions and basic properties

We start the chapter with the basic definition of a convex function.

Definition 3.1.1 (Convex function) A function f : E → R̄ is said to be convex if epi f is a convex set.

Note that in the above definition we could have substituted the strict epigraph
epi_< f := {(x, α) ∈ E × R | f(x) < α}
of f for the epigraph, see Exercise 3.3. Moreover, note that convex functions have convex level sets, see Exercise 3.6.

Recall that the domain of a function f : E → R̄ is defined by dom f := {x ∈ E | f(x) < +∞}. Using the linear mapping
L : (x, α) ∈ E × R ↦ x ∈ E, (3.1)
we have dom f = L(epi f), and hence Proposition 2.1.2 yields the following immediate but important result.

Proposition 3.1.2 (Domain of a convex function) The domain of a convex function is convex.

Recall that a (convex) function f : E → R̄ is proper if dom f ≠ ∅ and f(x) > −∞ for all x ∈ E. Improper convex functions are somewhat pathological (cf. Exercise 3.4), but they do occur, rather as by-products than as primary objects of study. For example, the function
f : x ∈ R ↦ −∞ if |x| < 1, 0 if |x| = 1, +∞ if |x| > 1
is improper and convex.

Convex functions have an important interpolation property, which we summarize in the next result for the case that f does not take the value −∞.

Proposition 3.1.3 (Characterizing convexity) A function f : E → R ∪ {+∞} is convex if and only if for all x, y ∈ E we have
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) (λ ∈ [0, 1]). (3.2)

Proof: First, let f be convex. Take x, y ∈ E and λ ∈ [0, 1]. If x ∉ dom f or y ∉ dom f, the inequality (3.2) holds trivially, since the right-hand side is +∞. If, on the other hand, x, y ∈ dom f, then (x, f(x)), (y, f(y)) ∈ epi f, hence by convexity
(λx + (1 − λ)y, λf(x) + (1 − λ)f(y)) ∈ epi f,
i.e. f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y), which proves the first implication. In turn, let (3.2) hold for all x, y ∈ E.
Now, take (x, α), (y, β) ∈ epi f and let λ ∈ [0, 1]. Due to (3.2) we obtain
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λα + (1 − λ)β,
i.e. λ(x, α) + (1 − λ)(y, β) ∈ epi f, which shows the converse implication.

We move the analogous characterization of convexity for functions E → R̄ to Exercise 3.3, because these kinds of functions are not our primary object of study. The next result is an extension of Proposition 3.1.3, which can be proven in various ways.

Corollary 3.1.4 (Jensen's Inequality) A function f : E → R ∪ {+∞} is convex if and only if
f(Σ_{i=1}^p λ_i x_i) ≤ Σ_{i=1}^p λ_i f(x_i) (x_i ∈ E (i = 1, ..., p), λ ∈ Δ_p).

Proof: Exercise 3.5.

It is sometimes expedient to consider convexity of a function restricted to a subset of its domain.

Definition 3.1.5 (Convexity on a set) For a nonempty convex set C ⊂ dom f, we call f : E → R ∪ {+∞} convex on C if (3.2) holds for all x, y ∈ C.

Corollary 3.1.6 Let f : E → R ∪ {+∞}. Then the following are equivalent.
i) f is convex.
ii) f is convex on its domain.

Proof: The implication 'i) ⇒ ii)' is obvious from the characterization of convexity in Proposition 3.1.3. For the converse implication note that (3.2) always holds for any pair of points x, y if one of them is not in the domain. This completes the proof.

Remark 3.1.7 As an immediate consequence of Corollary 3.1.6, we can make the following statement about proper, convex functions: "The proper and convex functions E → R ∪ {+∞} are those for which there exists a nonempty, convex set C ⊂ E such that (3.2) holds on C and f takes the value +∞ outside of C."

We are mainly interested in proper, convex (even lsc) functions E → R ∪ {+∞}. Hence, we introduce the abbreviations
Γ := Γ(E) := {f : E → R ∪ {+∞} | f proper and convex}
and
Γ_0 := Γ_0(E) := {f : E → R ∪ {+∞} | f proper, lsc and convex},
which we will use frequently in the remainder. Every so often, stronger notions of convexity are needed, which we introduce now.
Definition 3.1.8 (Strict and strong convexity) Let f be proper and convex and C ⊂ dom f convex. Then f is said to be
• strictly convex on C if
f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y) (x, y ∈ C, x ≠ y, λ ∈ (0, 1));
• strongly convex on C if there exists σ > 0 such that
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) − (σ/2)λ(1 − λ)‖x − y‖² (x, y ∈ C, λ ∈ (0, 1)).
The scalar σ > 0 is called the modulus of strong convexity of f (on C). For C = dom f we simply call f strictly and strongly convex, respectively.

Proposition 3.1.9 (Characterization of strong convexity) Let f be proper and convex and C ⊂ dom f convex. Then f is strongly convex on C with modulus σ > 0 if and only if f − (σ/2)‖·‖² is convex on C.

Proof: Both directions rest on the identity
‖λx + (1 − λ)y‖² = λ‖x‖² + (1 − λ)‖y‖² − λ(1 − λ)‖x − y‖² (x, y ∈ E, λ ∈ R).
First, let f be strongly convex on C with modulus σ > 0. Then for any λ ∈ (0, 1) and x, y ∈ C we have
f(λx + (1 − λ)y) − (σ/2)‖λx + (1 − λ)y‖²
≤ λf(x) + (1 − λ)f(y) − (σ/2)[λ(1 − λ)‖x − y‖² + ‖λx + (1 − λ)y‖²]
= λ(f(x) − (σ/2)‖x‖²) + (1 − λ)(f(y) − (σ/2)‖y‖²),
i.e. f − (σ/2)‖·‖² is convex on C. If, in turn, f − (σ/2)‖·‖² is convex on C, we compute that
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) + (σ/2)[‖λx + (1 − λ)y‖² − λ‖x‖² − (1 − λ)‖y‖²]
= λf(x) + (1 − λ)f(y) − (σ/2)λ(1 − λ)‖x − y‖²
for all x, y ∈ C and λ ∈ (0, 1), i.e. f is strongly convex on C with modulus σ > 0.

We pause our analysis for a short list of obvious convex functions. In Section 3.1.1 we learn how to build new convex functions from old ones.

Example 3.1.10 (Examples of convex functions)
a) (Affine functions) Every affine function F : E → R is convex.
b) (Indicator of convex sets) For a set C ⊂ E its indicator function δ_C is convex if and only if C is convex.
c) (Norms) Any norm ‖·‖_∗ on E is convex.

3.1.1 Functional operations preserving convexity

Proposition 3.1.11 (Positive combinations of convex functions) For p ∈ N let f_i : E → R ∪ {+∞} be convex (and lsc) and α_i ≥ 0 for i = 1, ..., p. Then
f := Σ_{i=1}^p α_i f_i
is convex (and lsc). If, in addition, ∩_{i=1}^p dom f_i ≠ ∅, then f is also proper.
Proof: The convexity assertion is an immediate consequence of the characterization in (3.2). For the additional closedness see Exercise 1.12. The properness statement is obvious.

Note that the latter result tells us that Γ and Γ_0 are convex cones.

Proposition 3.1.12 (Pointwise supremum of convex functions) For an arbitrary index set I let f_i be convex (and lsc) for all i ∈ I. Then the function f = sup_{i∈I} f_i, i.e.
f(x) = sup_{i∈I} f_i(x) (x ∈ E),
is convex (and lsc).

Proof: It holds that
epi f = {(x, α) | sup_{i∈I} f_i(x) ≤ α} = {(x, α) | ∀i ∈ I : f_i(x) ≤ α} = ∩_{i∈I} epi f_i.
Since the intersection of (closed) convex sets is (closed) convex, this gives the assertion.

Proposition 3.1.13 (Pre-composition with an affine mapping) Let H : E_1 → E_2 be affine and g : E_2 → R ∪ {+∞} (lsc and) convex. Then the function f := g ∘ H is (lsc and) convex.

Proof: Let x, y ∈ E_1 and λ ∈ (0, 1). Then we have
f(λx + (1 − λ)y) = g(λH(x) + (1 − λ)H(y)) ≤ λg(H(x)) + (1 − λ)g(H(y)) = λf(x) + (1 − λ)f(y),
which gives the convexity of f. The closedness of f, under the closedness of g, follows from the continuity (as a consequence of affineness) of H, cf. Exercise 1.13.

Proposition 3.1.14 (Post-composition with monotonically increasing, convex functions) Let f be convex (and lsc) and let g : R → R ∪ {+∞} be convex (and lsc) and increasing. Under the convention g(+∞) := +∞ and the assumption that lim_{x→+∞} g(x) = +∞, the function g ∘ f is convex (and lsc). If, in addition, there exists x_0 such that f(x_0) ∈ dom g, then g ∘ f is proper.

Proof: Exercise 3.8.

Proposition 3.1.15 (Convexity under epi-composition) Let f ∈ Γ and L ∈ L(E, E′). Then the function Lf : E′ → R̄ defined by
(Lf)(y) := inf {f(x) | L(x) = y}
is convex.

Proof: We first show that, with T : (x, α) ↦ (L(x), α), we have
epi_< Lf = T(epi_< f). (3.3)
To this end, recall that
epi_< Lf = {(y, α) | (Lf)(y) < α} and epi_< f = {(x, α) | f(x) < α}.
First, let (x, α) ∈ epi_< f.
Then T(x, α) = (L(x), α) and
(Lf)(L(x)) = inf_z {f(z) | L(z) = L(x)} ≤ f(x) < α,
thus T(x, α) ∈ epi_< Lf. In turn, if (y, α) ∈ epi_< Lf, i.e. inf {f(z) | L(z) = y} < α, then L⁻¹(y) ≠ ∅, hence there exists x ∈ L⁻¹(y) with f(x) < α. Thus, we have T(x, α) = (y, α) and (x, α) ∈ epi_< f. This proves (3.3).

Now, as f is convex, epi_< f is convex (see Exercise 3.3). But, since T is linear, from (3.3) it follows that epi_< Lf is convex as well, which proves the convexity of Lf.

3.1.2 Differentiable convex functions

We want to apply the notion of differentiability to extended real-valued functions. This only makes sense at points for which there exists a whole neighborhood on which the function in question is finite-valued, i.e. at points in the interior of the domain: For f : E → R̄ we say that f is differentiable at x ∈ int(dom f) if f restricted to int(dom f) is differentiable at x. Stronger notions of differentiability are defined accordingly.

Convexity of differentiable functions can be handily characterized.

Theorem 3.1.16 (First-order characterizations) Let f : E → R ∪ {+∞} be differentiable on a convex, open set C ⊂ int(dom f). Then the following hold:
a) f is convex on C if and only if
f(x) ≥ f(x̄) + ⟨∇f(x̄), x − x̄⟩ (x, x̄ ∈ C). (3.4)
b) f is strictly convex on C if and only if (3.4) holds with strict inequality whenever x ≠ x̄.
c) f is strongly convex with modulus σ > 0 on C if and only if
f(x) ≥ f(x̄) + ⟨∇f(x̄), x − x̄⟩ + (σ/2)‖x − x̄‖² (x, x̄ ∈ C). (3.5)

Proof: a) First, let f be convex and take x, x̄ ∈ C and λ ∈ (0, 1). Then by convexity it holds that
f(x̄ + λ(x − x̄)) − f(x̄) ≤ λ(f(x) − f(x̄)).
As f is differentiable on C, dividing by λ and letting λ ↓ 0 gives
⟨∇f(x̄), x − x̄⟩ ≤ f(x) − f(x̄),
which establishes (3.4). In turn, if (3.4) holds, we take x_1, x_2 ∈ C, λ ∈ (0, 1) and put x̄ := λx_1 + (1 − λ)x_2 ∈ C. By (3.4) it follows that
f(x_i) ≥ f(x̄) + ⟨∇f(x̄), x_i − x̄⟩ (i = 1, 2).
Multiplying these two inequalities by λ and (1 − λ), respectively, and summing the resulting inequalities yields
λf(x_1) + (1 − λ)f(x_2) ≥ f(x̄) + ⟨∇f(x̄), λx_1 + (1 − λ)x_2 − x̄⟩ = f(λx_1 + (1 − λ)x_2),
where the inner product vanishes by the choice of x̄. As x_1, x_2 were taken arbitrarily from C, f is convex on C.

b) If f is strictly convex on C, for x, x̄ ∈ C, x ≠ x̄ and λ ∈ (0, 1), we have
f(x̄ + λ(x − x̄)) − f(x̄) < λ(f(x) − f(x̄)).
In addition, since f is, in particular, convex, part a) implies that
⟨∇f(x̄), λ(x − x̄)⟩ ≤ f(x̄ + λ(x − x̄)) − f(x̄).
Combining these inequalities and dividing by λ gives the desired strict inequality. The converse implication is proven analogously to the respective implication in a), starting from the strict inequality.

c) Using Proposition 3.1.9, applying part a) to f − (σ/2)‖·‖² gives the assertion.

Theorem 3.1.16 opens the door for another characterization of convexity of differentiable functions on open sets in terms of so-called monotonicity properties of the gradient mapping. Before we prove it, we would like to remind the reader of the chain rule for differentiable functions. For i = 1, 2 let Ω_i ⊂ E_i be open. If f : Ω_1 ⊂ E_1 → E_2 is differentiable at x̄ ∈ Ω_1 and g : Ω_2 → E_3 is differentiable at f(x̄) ∈ Ω_2, then g ∘ f : Ω_1 → E_3 is differentiable at x̄ with
(g ∘ f)′(x̄) = g′(f(x̄)) ∘ f′(x̄).

Corollary 3.1.17 (Monotonicity of gradient mappings) Let f : E → R ∪ {+∞} be differentiable on the open set Ω ⊂ int(dom f) and let C ⊂ Ω be convex. Then the following hold:
a) f is convex on C if and only if
⟨∇f(x) − ∇f(y), x − y⟩ ≥ 0 (x, y ∈ C). (3.6)
b) f is strictly convex on C if and only if (3.6) holds with a strict inequality whenever x ≠ y.
c) f is strongly convex with modulus σ > 0 on C if and only if
⟨∇f(x) − ∇f(y), x − y⟩ ≥ σ‖x − y‖² (x, y ∈ C). (3.7)

Proof: We first show one direction in c) and a), respectively: To this end, let f be strongly convex with modulus σ > 0 on C.
Hence, by Theorem 3.1.16 c), for x, y ∈ C, we obtain
f(x) ≥ f(y) + ⟨∇f(y), x − y⟩ + (σ/2)‖x − y‖²
and
f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (σ/2)‖x − y‖².
Adding these two inequalities yields (3.7), which shows one implication in c). Setting σ = 0 gives the same implication in a).

We now show the converse directions in a) and c): For these purposes, let x, y ∈ C be given, and consider the function
ϕ : I → R, ϕ(t) := f(x + t(y − x)),
with I an open interval containing [0, 1]. We put x_t := x + t(y − x) ∈ C for all t ∈ [0, 1] and realize that ϕ is differentiable on I with ϕ′(t) = ⟨∇f(x_t), y − x⟩ for all t ∈ [0, 1] (chain rule). Hence, we obtain
ϕ′(t) − ϕ′(s) = ⟨∇f(x_t) − ∇f(x_s), y − x⟩ = (1/(t − s)) ⟨∇f(x_t) − ∇f(x_s), x_t − x_s⟩ (3.8)
for all 0 ≤ s < t ≤ 1. If (3.6) holds, this implies that ϕ′ is nondecreasing on [0, 1], hence ϕ is convex on (0, 1), cf. Exercise 3.2, i.e. f is convex on (x, y). Since x, y ∈ C were chosen arbitrarily, this implies that f is actually convex on C.

For the strong convexity, set s := 0 in (3.8) and use (3.7) to obtain
ϕ′(t) − ϕ′(0) ≥ (σ/t)‖x_t − x‖² = tσ‖y − x‖². (3.9)
Integrating and exploiting the definition of ϕ then yields
f(y) − f(x) − ⟨∇f(x), y − x⟩ = ϕ(1) − ϕ(0) − ϕ′(0)
= ∫_0^1 (ϕ′(t) − ϕ′(0)) dt
≥ ∫_0^1 tσ‖y − x‖² dt
= (σ/2)‖y − x‖²,
which gives (3.5) for x, y. As they were chosen arbitrarily in C, f is strongly convex on C by Theorem 3.1.16 c). The same technique of proof gives part b), where (3.9) becomes a strict inequality with σ = 0, and remains strict after integration.

We now investigate convexity criteria for twice differentiable functions.

Theorem 3.1.18 (Twice differentiable convex functions) Let f : E → R ∪ {+∞} be twice differentiable on the open convex set Ω ⊂ int(dom f). Then the following hold:
a) f is convex on Ω if and only if ∇²f(x) is positive semidefinite for all x ∈ Ω.
b) If ∇²f(x) is positive definite for all x ∈ Ω, then f is strictly convex on Ω.
c) f is strongly convex with modulus σ > 0 on Ω if and only if, for all x ∈ Ω, the smallest eigenvalue of ∇²f(x) is bounded by σ from below.

Proof: Let x ∈ Ω, d ∈ E. Since Ω is open and convex, the set I := I(x, d) := {t ∈ R | x + td ∈ Ω} is an open interval. We define
ϕ : I → R, ϕ(t) := f(x + td). (3.10)
Then, in particular, ϕ is twice differentiable on I with ϕ″(t) = ⟨∇²f(x + td)d, d⟩ for all t ∈ I.

a) First, assume that f is convex on Ω. Now, let x ∈ Ω and d ∈ E \ {0}. Then ϕ from (3.10) is convex on I by Proposition 3.1.13. Using Exercise 3.2 it follows that
0 ≤ ϕ″(t) = ⟨∇²f(x + td)d, d⟩ (t ∈ I),
which, for t = 0, gives the first implication. Conversely, take x, y ∈ Ω arbitrarily, put d := y − x and assume that ∇²f(x + td) is positive semidefinite for all t ∈ [0, 1]. Then for ϕ from (3.10) we have ϕ″(t) ≥ 0 for all t ∈ [0, 1] ⊂ I. Therefore Exercise 3.2 tells us that ϕ is convex on (0, 1), i.e. f is convex on (x, y). Since x, y ∈ Ω were chosen arbitrarily, f is convex on Ω.

b) Again take x, y ∈ Ω with x ≠ y and put d := y − x. Applying the mean-value theorem to the function ϕ′, which is differentiable on (0, 1), yields some τ ∈ (0, 1) such that
⟨∇f(y) − ∇f(x), y − x⟩ = ϕ′(1) − ϕ′(0) = ϕ″(τ) = ⟨∇²f(x + τd)d, d⟩ > 0.
Corollary 3.1.17 then gives the assertion.

c) Using Proposition 3.1.9, we apply a) to the function f − (σ/2)‖·‖², whose Hessian at x ∈ Ω is ∇²f(x) − σI, which has the eigenvalues λ_i − σ with λ_1, ..., λ_N the eigenvalues of ∇²f(x). This gives the assertion, as a symmetric matrix is positive semidefinite if and only if all of its (real) eigenvalues are nonnegative.

Note that the condition in Theorem 3.1.18 b) is only sufficient for strict convexity. As an example that it is not necessary, notice that f : x ↦ (1/4)x⁴ is strictly convex, but f″(0) = 0.

We continue with an example where we can successfully apply a second-order criterion to detect convexity of an important function.
Example 3.1.19 (The log-determinant function) Consider the function
f : Sⁿ → R ∪ {+∞}, f(X) := −log(det X) if X ≻ 0, +∞ else, (3.11)
which we call the (negative) log-determinant or the (negative) logdet function, for short. Then f is proper, continuous and strictly convex, in particular, f ∈ Γ_0(Sⁿ): The continuity of f is easily verified and, as dom f = Sⁿ₊₊, f is proper and twice differentiable on dom f with
∇f(X) = −X⁻¹ and ∇²f(X) = X⁻¹(·)X⁻¹ (X ≻ 0),
see Example 1.1.5 and Exercise 1.6. In particular, it holds for all X ∈ dom f and H ∈ Sⁿ \ {0} that
⟨∇²f(X)(H), H⟩ = tr(X⁻¹HX⁻¹H) = tr((HX^{−1/2})ᵀ X⁻¹ (HX^{−1/2})) > 0,
as X⁻¹ ≻ 0 and hence also HX^{−1/2} ≠ 0. Thus, by Theorem 3.1.18, f is strictly convex.

3.2 Minimization and convexity

We turn our attention to minimization problems of the form
inf_{x∈C} f(x), (3.12)
where C ⊂ E is nonempty and closed and f : E → R ∪ {+∞} is at least lsc. Note that (3.12) is equivalent to the problem
inf_{x∈E} f(x) + δ_C(x),
a simple fact that we are going to exploit frequently. If f ∈ Γ_0 and C is convex, we call (3.12) a convex minimization (optimization) problem.

When talking about minimizers, the questions of uniqueness and existence arise naturally. We start our study with some general existence results.

3.2.1 General existence results

Existence results traditionally employ coercivity properties of the objective function and depend only mildly on convexity.

Definition 3.2.1 (Coercivity and supercoercivity) Let f : E → R̄. Then f is called
i) coercive if
lim_{‖x‖→+∞} f(x) = +∞;
ii) supercoercive if
lim_{‖x‖→+∞} f(x)/‖x‖ = +∞.

The nomenclature for the above coercivity concepts is not unified in the literature. We use the same naming as in [1]. In [3], for instance, the authors use 0-coercive and 1-coercive for coercive and supercoercive instead.
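To make the gap between the two notions in Definition 3.2.1 tangible, here is a minimal Python sketch on E = R. It samples f(x) = |x| and f(x) = x² at growing radii and records both f and the quotient f(x)/|x|. The helper name growth_profile and the choice of radii are illustrative assumptions, not part of the text, and finitely many samples can only suggest the limiting behaviour, not prove it.

```python
def growth_profile(f, radii):
    """Return the pairs (f(r), f(r) / r) for each sample radius r > 0.

    Both test functions below are even, so sampling the positive
    half-line is enough to see how they behave as |x| grows.
    """
    return [(f(r), f(r) / r) for r in radii]


radii = [10.0 ** k for k in range(1, 7)]  # 10, 100, ..., 10^6

abs_profile = growth_profile(abs, radii)                  # f(x) = |x|
square_profile = growth_profile(lambda x: x * x, radii)   # f(x) = x^2

# |x| is coercive: its values keep growing along the samples ...
abs_values_increase = all(
    a[0] < b[0] for a, b in zip(abs_profile, abs_profile[1:])
)
# ... but the quotient |x| / |x| is constantly 1, so |x| is not supercoercive.
abs_ratios_constant = all(ratio == 1.0 for _, ratio in abs_profile)

# For x^2 the quotient x^2 / |x| = |x| keeps growing, as supercoercivity requires.
square_ratios_increase = all(
    a[1] < b[1] for a, b in zip(square_profile, square_profile[1:])
)
```

In this sketch |x| serves as a coercive function that fails to be supercoercive, whereas x² is both; such quadratic growth is what makes the quadratic penalty in the Moreau envelope (Section 3.4.1) supercoercive.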
In fact, we have already dealt with coercivity under a different name, as the following result shows, whose elementary proof is left to the reader as an exercise.

Lemma 3.2.2 (Level-boundedness = coercivity) A function f : E → R̄ is coercive if and only if it is level-bounded.

Proof: Exercise 3.7.

In the lsc and convex case, coercivity is much easier to check.

Proposition 3.2.3 (Coercivity of convex functions) Let f ∈ Γ_0. Then f is coercive if and only if there exists α ∈ R such that lev_{≤α} f is nonempty and bounded.

Proof: By Lemma 3.2.2, coercivity implies that all level sets are bounded and, as f is proper, there is a nonempty one, too.

In turn, assume that lev_{≤α} f is nonempty and bounded for some α ∈ R and pick x ∈ lev_{≤α} f. Clearly, all level sets to levels smaller than α are bounded, too. Hence, we still need to show that lev_{≤γ} f is bounded for all γ > α. To this end take v ∈ (lev_{≤γ} f)^∞. Since lev_{≤γ} f is closed and convex (as f is lsc and convex, cf. Proposition 1.2.4 and Exercise 3.6), Corollary 2.4.24 yields
x + λv ∈ lev_{≤γ} f (λ ≥ 0). (3.13)
Moreover, for all λ > 1, we have
x + v = (1 − 1/λ)x + (1/λ)(x + λv),
and hence, by convexity and (3.13), we obtain
f(x + v) ≤ (1 − 1/λ)f(x) + (1/λ)f(x + λv) ≤ (1 − 1/λ)f(x) + (1/λ)γ. (3.14)
Letting λ → +∞ and recalling that x ∈ lev_{≤α} f, (3.14) gives
f(x + v) ≤ f(x) ≤ α.
As v ∈ (lev_{≤γ} f)^∞ was chosen arbitrarily, we infer that
x + (lev_{≤γ} f)^∞ ⊂ lev_{≤α} f.
However, by the choice of α, lev_{≤α} f is bounded, and hence, necessarily the cone (lev_{≤γ} f)^∞ is bounded, too. That leaves only (lev_{≤γ} f)^∞ = {0}, and hence, by Proposition 2.4.21, lev_{≤γ} f is bounded, which completes the proof.

We now present the main existence result for minimization problems, which is, in fact, only a corollary to the existence result Theorem 1.2.6 using our new terminology and stating the constrained case explicitly.
Corollary 3.2.4 (Existence of minimizers) Let f : E → R ∪ {+∞} be lsc and let C ⊂ E be closed such that dom f ∩ C ≠ ∅ and suppose that one of the following holds:
i) f is coercive;
ii) C is bounded.
Then f has a minimizer over C.

Proof: Consider the function g = f + δ_C. Then it holds that
lev_{≤α} g = C ∩ lev_{≤α} f (α ∈ R).
Hence, under either assumption i) (cf. Lemma 3.2.2) or ii), g has closed and bounded level sets and is hence lsc and level-bounded. The assertion then follows from Theorem 1.2.6.

We now apply this result to the sum of functions:

Corollary 3.2.5 (Existence of minimizers II) Let f, g : E → R ∪ {+∞} be lsc such that dom f ∩ dom g ≠ ∅. If f is coercive and g is bounded from below, then f + g is coercive and has a minimizer (over E).

Proof: In view of Corollary 3.2.4 it suffices to show that f + g is coercive, as it is already lsc, cf. Exercise 1.12. Putting g∗ := inf_E g > −∞, we see that
f(x) + g(x) ≥ f(x) + g∗ → +∞ as ‖x‖ → ∞,
which proves the result.

3.2.2 Convex minimization

We now turn our attention to convex minimization problems. Recall the notion of global and local minimizers from Definition 2.4.16. It turns out that there is no distinction needed in the convex setting.

Proposition 3.2.6 Let f ∈ Γ. Then every local minimizer of f (over E) is a global minimizer.

Proof: Let x̄ be a local minimizer of f and suppose there exists x̂ such that f(x̂) < f(x̄). Now let λ ∈ (0, 1) and put x_λ := λx̂ + (1 − λ)x̄. By convexity, we have
f(x_λ) ≤ λf(x̂) + (1 − λ)f(x̄) < f(x̄) (λ ∈ (0, 1)).
On the other hand, we see that x_λ → x̄ as λ ↓ 0, which contradicts the fact that x̄ is a local minimizer of f. Hence, x̂ cannot exist, which means that x̄ is even a global minimizer of f.

Using our usual technique of casting a constrained optimization problem as an unconstrained problem by means of the indicator function of the constraint set, we immediately get the following result.
Corollary 3.2.7 (Minimizers in convex minimization) Let f ∈ Γ and C ⊂ E nonempty and convex. Then every local minimizer of f over C is a global minimizer of f over C.

Proof: Apply Proposition 3.2.6 to the function f + δ_C, which is convex by Proposition 3.1.11.

The following results show that convex minimization problems have convex solution sets.

Proposition 3.2.8 Let f ∈ Γ. Then argmin f is a convex set.

Proof: If f∗ := inf f ∈ R, we have that argmin f = lev_{≤f∗} f. But as a convex function, f has convex level sets, cf. Exercise 3.6. Otherwise, as f is proper, we have inf f = −∞, hence argmin f = ∅, which is trivially convex.

We state the constrained case explicitly.

Corollary 3.2.9 Let f ∈ Γ and C ⊂ E convex. Then argmin_C f is convex.

Proof: Apply Proposition 3.2.8 to f + δ_C ∈ Γ.

Uniqueness of minimizers of (convex) minimization problems comes into play with strict convexity, see Definition 3.1.8.

Proposition 3.2.10 (Uniqueness of minimizers) Let f ∈ Γ be strictly convex. Then f has at most one minimizer.

Proof: Assume that x, y ∈ argmin f, i.e. inf f = f(x) = f(y). If x ≠ y, then strict convexity of f implies for all λ ∈ (0, 1) that
f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y) = inf f.
This is a contradiction, hence x = y.

Corollary 3.2.11 (Minimizing the sum of convex functions) Let f, g ∈ Γ_0 such that dom f ∩ dom g ≠ ∅. Suppose that one of the following holds:
i) f is supercoercive;
ii) f is coercive and g is bounded from below.
Then f + g is coercive and has a minimizer (over E). If f or g is strictly convex, f + g has exactly one minimizer.

Proof: Since f + g ∈ Γ_0, f + g is, in particular, lsc, hence for the first assertion, in view of Corollary 3.2.4, we only need to prove that f + g is coercive in either of the cases i) or ii). If f is supercoercive, then f + g is supercoercive by Exercise 3.9, hence, in particular, coercive. In the second case, everything works also without convexity, see Corollary 3.2.5.
The uniqueness result follows immediately from Proposition 3.2.10, realizing that f + g ∈ Γ_0 is strictly convex if one of the summands is.

We close out the section with a very powerful result on optimal value functions of parameter-dependent convex minimization problems.

Theorem 3.2.12 (Parametric minimization) Let h : E_1 × E_2 → R ∪ {+∞} be convex. Then the optimal value function
ϕ : E_1 → R̄, ϕ(x) := inf_{y∈E_2} h(x, y)
is convex. Moreover, the set-valued mapping
x ↦ argmin_{y∈E_2} h(x, y) ⊂ E_2
is convex-valued.

Proof: It can easily be shown that epi_< ϕ = L(epi_< h) under the linear mapping L : (x, y, α) ↦ (x, α). This immediately gives the convexity of ϕ. The remaining assertion follows immediately from Proposition 3.2.8, since y ↦ h(x, y) is convex for all x ∈ E_1.

3.3 Affine minorization of convex functions

In this section we will prove that every proper convex function is minorized by an affine mapping at every point of the relative interior of its domain, and that this minorant can actually be chosen such that it coincides with the convex function at the point in question. This result is a very useful tool for proofs involving convex functions and has tremendous consequences for subdifferential and duality theory of convex functions, as we will see later on.

For these purposes we need to study the relative interior of the epigraph of a convex function and how it is related to the relative interior of the domain of the function in question. Note that we can actually speak of these relative interiors, since a convex function has (by definition) a convex epigraph and (see Proposition 3.1.2) also a convex domain.

Proposition 3.3.1 (Relative interior of epigraph) Let f : E → R̄ be convex. Then
ri(epi f) = {(x, α) ∈ E × R | x ∈ ri(dom f), f(x) < α}.

Proof: Let L be the linear mapping from (3.1). By Proposition 2.3.15 we have
ri(dom f) = L(ri(epi f)). (3.15)
Now, take x ∈ ri(dom f).
For the subset of ri(epi f) that is mapped to x under L, we compute
L⁻¹({x}) ∩ ri(epi f) = ({x} × R) ∩ ri(epi f)
= ri[({x} × R) ∩ epi f]
= ri[{x} × [f(x), +∞)]
= {x} × (f(x), +∞),
where the second equality uses the fact that {x} × R is relatively open and Proposition 2.3.14 b). Thus, for (x, α) ∈ ri(epi f), we have x ∈ ri(dom f) by (3.15), and hence (x, α) ∈ L⁻¹({x}) ∩ ri(epi f) = {x} × (f(x), +∞), in particular, α > f(x). In turn, if x ∈ ri(dom f) and f(x) < α, then (x, α) ∈ {x} × (f(x), +∞) = L⁻¹({x}) ∩ ri(epi f), in particular, (x, α) ∈ ri(epi f).

Note that, by the description from Proposition 3.3.1, the relative interior of the epigraph of a given convex function f does not necessarily coincide with its strict epigraph
{(x, α) ∈ E × R | f(x) < α}.

We now come to the promised main theorem of this section.

Theorem 3.3.2 (Affine minorization theorem) Let f ∈ Γ and x̄ ∈ ri(dom f) (≠ ∅). Then there exists g ∈ E such that
f(x) ≥ f(x̄) + ⟨g, x − x̄⟩ (x ∈ E). (3.16)
In particular, there exists an affine mapping, namely F : x ∈ E ↦ ⟨g, x − x̄⟩ + f(x̄) ∈ R, which minorizes f everywhere and coincides with f at x̄.

Proof: By Proposition 3.3.1, we have
ri(epi f) = {(x, α) | x ∈ ri(dom f), f(x) < α}.
Hence, (x̄, f(x̄)) ∈ rbd(epi f). Thus, we can properly separate (x̄, f(x̄)) from epi f using Proposition 2.6.10, i.e. there exists (s, η) ∈ (E × R) \ {0} such that
⟨(s, η), (x, α)⟩ ≤ ⟨(s, η), (x̄, f(x̄))⟩ ((x, α) ∈ epi f) (3.17)
and
⟨(s, η), (x, α)⟩ < ⟨(s, η), (x̄, f(x̄))⟩ ((x, α) ∈ ri(epi f)).
For α > f(x̄) we have (x̄, α) ∈ ri(epi f), hence the latter yields
η(α − f(x̄)) < 0,
which immediately implies that η < 0. Now put g := s/|η|. Dividing (3.17) by |η| then yields
⟨(g, −1), (x, α)⟩ ≤ ⟨(g, −1), (x̄, f(x̄))⟩ ((x, α) ∈ epi f)
or, equivalently,
α ≥ f(x̄) + ⟨g, x − x̄⟩ ((x, α) ∈ epi f).
As f is proper, (x, f(x)) ∈ epi f for all x ∈ dom f, thus
f(x) ≥ f(x̄) + ⟨g, x − x̄⟩ (x ∈ dom f).
For x ∉ dom f, this inequality holds trivially, hence the result is proven.

3.4 Infimal convolution of convex functions

Definition 3.4.1 (Infimal convolution) Let f, g : E → R ∪ {+∞}. Then the function
f#g : E → R̄, (f#g)(x) := inf_{u∈E} {f(u) + g(x − u)}
is called the infimal convolution of f and g. We call the infimal convolution f#g exact at x ∈ E if
argmin_{u∈E} {f(u) + g(x − u)} ≠ ∅.
We simply call f#g exact if it is exact at every x ∈ dom f#g.

Observe that we have the representation
(f#g)(x) = inf_{u_1, u_2 : u_1 + u_2 = x} {f(u_1) + g(u_2)}. (3.18)
This has some obvious, yet useful consequences.

Lemma 3.4.2 Let f, g : E → R ∪ {+∞}. Then the following hold:
a) dom f#g = dom f + dom g;
b) f#g = g#f.

Moreover, observe the trivial inequality
(f#g)(x) ≤ f(u) + g(x − u) (u ∈ E). (3.19)

Infimal convolution preserves convexity, as can be seen in the next result.

Proposition 3.4.3 (Infimal convolution of convex functions) Let f, g : E → R ∪ {+∞} be convex. Then f#g is convex.

Proof: Defining
h : E × E → R ∪ {+∞}, h(x, y) := f(y) + g(x − y),
we see that h is convex (jointly in (x, y)) as a sum of the convex functions (x, y) ↦ f(y) and (x, y) ↦ g(x − y), the latter being convex by Proposition 3.1.13. By definition of the infimal convolution, we have
(f#g)(x) = inf_{y∈E} h(x, y),
hence, Theorem 3.2.12 yields the assertion.

We continue with an important class of functions that can be constructed using infimal convolution, and that is intimately tied to projection mappings.

Example 3.4.4 (Distance functions) Let C ⊂ E. Then the function dist_C := δ_C # ‖·‖ is called the distance (function) to the set C. It holds that
dist_C(x) = inf_{u∈C} ‖x − u‖.
Hence, from Lemma 2.5.1 it is clear that, if C ⊂ E is closed and convex, we have
dist_C(x) = ‖x − P_C(x)‖.
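The two descriptions of the distance function in Example 3.4.4 can be compared numerically. The following Python sketch does so for the box C = [lo, hi]ⁿ, a closed convex set whose projection is a coordinate-wise clamp (a standard fact, not taken from the text). The function names, the box, and the test point are illustrative assumptions; the brute-force infimum is only approximated on a grid.

```python
import math
from itertools import product


def project_box(x, lo, hi):
    """Euclidean projection onto the box C = [lo, hi]^n: clamp each coordinate."""
    return [min(max(xi, lo), hi) for xi in x]


def dist_box(x, lo, hi):
    """dist_C(x) = ||x - P_C(x)||, valid because the box is closed and convex."""
    return math.dist(x, project_box(x, lo, hi))


def dist_box_bruteforce(x, lo, hi, steps=100):
    """Approximate inf over u in C of ||x - u|| by sampling a uniform grid in C."""
    grid_1d = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(math.dist(x, u) for u in product(grid_1d, repeat=len(x)))


x = [2.0, 3.0]
exact = dist_box(x, 0.0, 1.0)               # the projection of x is (1, 1)
approx = dist_box_bruteforce(x, 0.0, 1.0)   # the grid contains the corner (1, 1)
inside = dist_box([0.5, 0.5], 0.0, 1.0)     # points of C have distance 0
```

Here the grid happens to contain the projection point, so both values agree up to rounding; for a general point the grid value only bounds dist_C(x) from above, in line with the trivial inequality (3.19).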
In order to preserve lower semicontinuity as well, it is not enough to simply assume that the functions being convolved are lsc (and convex). Section 3.2.1, however, provides us with the necessary tools to deal with this issue.

Theorem 3.4.5 (Infimal convolution in Γ_0) Let f, g ∈ Γ_0 and suppose that one of the following conditions holds:
i) f is supercoercive;
ii) f is coercive and g is bounded from below.
Then f#g ∈ Γ_0 and f#g is exact.

Proof: By Lemma 3.4.2, dom f#g = dom f + dom g ≠ ∅. Now, take x ∈ dom f#g. Then, by the definition of f#g, we have dom f ∩ dom g(x − (·)) ≠ ∅. Hence, Corollary 3.2.11 implies that f + g(x − (·)) has a minimizer. Thus, for all x ∈ dom f#g there exists u ∈ E such that
(f#g)(x) = f(u) + g(x − u) ∈ R.
In particular, f#g is proper and exact. Since, by Proposition 3.4.3, f#g ∈ Γ, it remains to be shown that f#g is lsc. For these purposes, let x̄ ∈ E and {x_k} → x̄ such that (f#g)(x_k) → α. We need to show that α ≥ (f#g)(x̄), hence, w.l.o.g. we can assume that α < +∞ (since otherwise there is nothing to prove), in particular, x_k ∈ dom f#g for all k ∈ N (sufficiently large). Then, by our recent findings, there exists {u_k ∈ E} such that
(f#g)(x_k) = f(u_k) + g(x_k − u_k) (k ∈ N).
We claim that {u_k} is bounded: Assume this were false, then (after passing to a subsequence if necessary) we have 0 ≠ ‖u_k‖ → +∞. We now show that under either of the assumptions i) and ii), respectively, this yields a contradiction:

i): By Theorem 3.3.2, there exists an affine minorant of g, say x ↦ ⟨b, x⟩ + γ. Using the Cauchy-Schwarz inequality, we have
‖u_k‖ (f(u_k)/‖u_k‖ − ‖b‖) + ⟨b, x_k⟩ + γ ≤ f(u_k) + ⟨b, x_k − u_k⟩ + γ
≤ f(u_k) + g(x_k − u_k)
= (f#g)(x_k)
→ α < +∞.
But, as f is supercoercive and we have (by assumption) that ‖u_k‖ → ∞, the term on the left-hand side is unbounded from above, which is a contradiction, and hence, {u_k} must be bounded.
ii): Since f is coercive, we have f(u_k) → +∞ under the assumption that ‖u_k‖ → +∞. But, since f(u_k) + g(x_k − u_k) → α < +∞, we necessarily have g(x_k − u_k) → −∞, which is impossible if g is bounded from below, hence {u_k} must be bounded.

All in all, we get in either case that {u_k} is bounded, and w.l.o.g. we can assume that u_k → u. Relabeling the sequence {x_k} if necessary, we obtain
α = lim_{k→∞} (f#g)(x_k)
= lim_{k→∞} [f(u_k) + g(x_k − u_k)]
≥ lim inf_{k→∞} f(u_k) + lim inf_{k→∞} g(x_k − u_k)
≥ f(u) + g(x̄ − u)
≥ (f#g)(x̄).
This concludes the proof.

3.4.1 Moreau envelopes

One of the most important and frequently used instances of infimal convolutions is defined below.

Definition 3.4.6 (Moreau envelope and proximal mapping) Let f : E → R̄. The Moreau envelope (or Moreau-Yosida regularization) of f (to the parameter λ > 0) is the function e_λ f : E → R̄ defined by
e_λ f(x) := inf_{u∈E} {f(u) + (1/(2λ))‖x − u‖²}.
The (possibly set-valued) mapping P_λ f : E ⇒ E,
P_λ f(x) := argmin_{u∈E} {f(u) + (1/(2λ))‖x − u‖²},
is called the proximal mapping or prox-operator to the parameter λ > 0 of f.

Note that it is easily seen that
e_λ(αf) = α e_{αλ} f (α, λ > 0). (3.20)
From our findings in Section 3.2 and from above we can immediately state the following result.

Proposition 3.4.7 Let f ∈ Γ_0 and λ > 0. Then e_λ f ∈ Γ_0 and P_λ f is single-valued (in particular nonempty).

Proof: For the prox-operator everything follows from Corollary 3.2.11, since (1/(2λ))‖x − (·)‖² is strongly convex, hence supercoercive and strictly convex, and it is continuous, hence lsc. The fact that e_λ f ∈ Γ_0 follows from Theorem 3.4.5.

Note that by definition and the above result, for f ∈ Γ_0, λ > 0 and x ∈ E, we have
e_λ f(x) = f(P_λ f(x)) + (1/(2λ))‖x − P_λ f(x)‖² ≤ f(y) + (1/(2λ))‖x − y‖² (y ∈ E). (3.21)
The next results show that the prox-operator for closed, proper convex functions is in fact a generalization of the projection onto closed, convex sets.

Proposition 3.4.8 Let f ∈ Γ_0 and let x, p ∈ E.
Then $p = P_1 f(x)$ if and only if
\[ \langle y - p, x - p \rangle + f(p) \le f(y) \quad (y \in E). \tag{3.22} \]

Proof: First, assume that $p = P_1 f(x)$ and let $y \in E$. Put $p_\alpha := \alpha y + (1 - \alpha)p$ $(\alpha \in (0, 1))$. Then, for every $\alpha \in (0, 1)$, by convexity and (3.21) we have
\[ \begin{aligned} f(p) &\le f(p_\alpha) + \tfrac{1}{2} \|x - p_\alpha\|^2 - \tfrac{1}{2} \|x - p\|^2 \\ &\le \alpha f(y) + (1 - \alpha) f(p) - \alpha \langle x - p, y - p \rangle + \tfrac{\alpha^2}{2} \|y - p\|^2, \end{aligned} \]
and hence, after subtracting $(1 - \alpha) f(p)$ and dividing by $\alpha$,
\[ \langle y - p, x - p \rangle + f(p) \le f(y) + \frac{\alpha}{2} \|y - p\|^2 \quad (y \in E, \ \alpha \in (0, 1)). \]
Letting $\alpha \downarrow 0$, we obtain (3.22). Conversely, suppose that (3.22) holds. Then we deduce
\[ \begin{aligned} f(p) + \tfrac{1}{2} \|x - p\|^2 &\le f(y) + \tfrac{1}{2} \|x - p\|^2 + \langle x - p, p - y \rangle + \tfrac{1}{2} \|p - y\|^2 \\ &= f(y) + \tfrac{1}{2} \|x - y\|^2 \end{aligned} \]
for all $y \in E$. Thus, $p = P_1 f(x)$.

Note that (3.22) is a generalization of (2.13), which can be seen by simply plugging in $f = \delta_C$ for some closed convex set $C$. The next result shows that the prox-operator is globally Lipschitz continuous.

Proposition 3.4.9 (Lipschitz continuity of prox-operator) Let $f \in \Gamma_0$. Then
\[ \|P_1 f(x) - P_1 f(y)\| \le \|x - y\| \quad (x, y \in E), \]
i.e. $P_1 f$ is globally Lipschitz continuous with Lipschitz modulus 1.

Proof: Let $x, y \in E$ and put $p := P_1 f(x)$ and $q := P_1 f(y)$. Then Proposition 3.4.8 yields
\[ \langle q - p, x - p \rangle + f(p) \le f(q) \quad \text{and} \quad \langle p - q, y - q \rangle + f(q) \le f(p). \]
Since $p, q \in \operatorname{dom} f$, adding these two inequalities, we get
\[ 0 \le \langle p - q, (x - y) - (p - q) \rangle = \langle p - q, x - y \rangle - \|p - q\|^2. \]
Using the Cauchy-Schwarz inequality we obtain the desired result.

We can easily apply the results on the prox-operator of some $f \in \Gamma_0$ for the parameter $\lambda = 1$ to arbitrary parameters $\lambda > 0$ through the identity
\[ P_\lambda f = P_1(\lambda f), \tag{3.23} \]
which is an immediate consequence of (3.20).

Theorem 3.4.10 (Differentiability of Moreau envelopes in $\Gamma_0$) Let $f \in \Gamma_0$ and $\lambda > 0$. Then $e_\lambda f$ is differentiable with gradient
\[ \nabla(e_\lambda f) = \frac{1}{\lambda} (\operatorname{id} - P_\lambda f), \]
which is globally Lipschitz with modulus $\frac{1}{\lambda}$.

Proof: Assume that $x, y \in E$ are distinct points and set $p := P_\lambda f(x)$ and $q := P_\lambda f(y)$.
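Using (3.21), (3.23) and Proposition 3.4.8, we obtain
\[ \begin{aligned} e_\lambda f(y) - e_\lambda f(x) &= f(q) - f(p) + \frac{1}{2\lambda} \left( \|y - q\|^2 - \|x - p\|^2 \right) \\ &= \frac{1}{2\lambda} \left( 2[(\lambda f)(q) - (\lambda f)(p)] + \|y - q\|^2 - \|x - p\|^2 \right) \\ &\ge \frac{1}{2\lambda} \left( 2 \langle q - p, x - p \rangle + \|y - q\|^2 - \|x - p\|^2 \right) \\ &= \frac{1}{2\lambda} \left( \|y - q - x + p\|^2 + 2 \langle y - x, x - p \rangle \right) \\ &\ge \frac{1}{\lambda} \langle y - x, x - p \rangle. \end{aligned} \]
Analogously, by simply interchanging the roles of $x$ and $y$ in the application of Proposition 3.4.8, we obtain
\[ e_\lambda f(y) - e_\lambda f(x) \le \frac{1}{\lambda} \langle y - x, y - q \rangle. \]
Using the last two inequalities and invoking Proposition 3.4.9 (more precisely, the inequality $\langle p - q, x - y \rangle \ge \|p - q\|^2$ from its proof, applied to $\lambda f$), we obtain
\[ \begin{aligned} 0 &\le e_\lambda f(y) - e_\lambda f(x) - \frac{1}{\lambda} \langle y - x, x - p \rangle \\ &\le \frac{1}{\lambda} \langle y - x, (y - x) - (q - p) \rangle \\ &\le \frac{1}{\lambda} \left( \|x - y\|^2 - \|p - q\|^2 \right) \\ &\le \frac{1}{\lambda} \|y - x\|^2. \end{aligned} \]
Therefore,
\[ \lim_{y \to x} \frac{e_\lambda f(y) - e_\lambda f(x) - \left\langle y - x, \frac{1}{\lambda}(x - p) \right\rangle}{\|x - y\|} = 0, \]
which proves the differentiability of $e_\lambda f$ and the gradient formula. The Lipschitz continuity of the gradient is then due to Proposition 3.4.9.

As a nice application we can prove the differentiability of the squared Euclidean distance function.

Example 3.4.11 (Differentiability of squared distance function) Let $C \subset E$ be nonempty, closed and convex. Then $\frac{1}{2} \operatorname{dist}_C^2 = e_1 \delta_C$ is convex and differentiable with Lipschitz gradient
\[ \nabla \left( \tfrac{1}{2} \operatorname{dist}_C^2 \right) = \operatorname{id} - P_C. \]

We close out the section with a result that is tremendously interesting from an optimization perspective.

Proposition 3.4.12 (Minimizers of Moreau envelope) Let $f \in \Gamma_0$ and $\lambda > 0$. Then
\[ \operatorname*{argmin}_E f = \operatorname*{argmin}_E e_\lambda f \quad \text{and} \quad \inf_E f = \inf_E e_\lambda f. \]

Proof: Let $\bar{x} \in \operatorname{argmin} f$. Then $\bar{x}$ also minimizes $f + \frac{1}{2\lambda} \|\bar{x} - (\cdot)\|^2$. But since the unique minimizer of the latter is $P_\lambda f(\bar{x})$, we must have $\bar{x} = P_\lambda f(\bar{x})$. Hence, by Theorem 3.4.10, we have $\nabla e_\lambda f(\bar{x}) = 0$. Thus, as $e_\lambda f$ is convex, $\bar{x} \in \operatorname{argmin}_E e_\lambda f$, cf. Exercise 3.11. In turn, if $\bar{x} \in \operatorname{argmin} e_\lambda f$, then $\nabla e_\lambda f(\bar{x}) = \frac{1}{\lambda}(\bar{x} - P_\lambda f(\bar{x})) = 0$, hence $P_\lambda f(\bar{x}) = \bar{x}$, and therefore
\[ f(\bar{x}) = e_\lambda f(\bar{x}) \le e_\lambda f(y) \le f(y) \quad (y \in E). \]
All in all we have proven the equality for the argmin sets and the identity for the infima if attained.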
Since $e_\lambda f \le f$, we always have $\inf e_\lambda f \le \inf f$. Conversely, fix $x \in E$. Then
\[ \inf_E f \le \inf_{u \in E} \left\{ f(u) + \frac{1}{2\lambda} \|u - x\|^2 \right\} = e_\lambda f(x). \]
Taking the infimum over $x \in E$ gives the converse inequality.

Note that, implicitly, we proved above that $\bar{x} \in E$ is a minimizer of $f$ if and only if $P_\lambda f(\bar{x}) = \bar{x}$, i.e. the fixed points of $P_\lambda f$ are exactly the minimizers of $f$.

3.5 Continuity properties of convex functions

In this section we study continuity properties of convex functions. We start by defining continuity notions relative to a set.

Definition 3.5.1 (Continuity relative to a set) Let $S \subset E$. A function $f : E \to \overline{\mathbb{R}}$ is said to be continuous relative to (or on) $S$ if
\[ \lim_{k \to \infty} f(x_k) = f(x) \quad (x \in S, \ \{x_k \in S\} \to x). \]
In addition, we call $f$ Lipschitz (continuous) relative to $S$ if there exists $L \ge 0$ such that
\[ |f(x) - f(y)| \le L \|x - y\| \quad (x, y \in S). \]

As a preparatory result, we compare a proper convex function with its closure.

Proposition 3.5.2 (Closure of convex functions) Let $f \in \Gamma$. Then $\operatorname{cl} f \in \Gamma_0$. Moreover, $\operatorname{cl} f$ agrees with $f$ except perhaps on $\operatorname{rbd}(\operatorname{dom} f)$.

Proof: Since $\operatorname{epi}(\operatorname{cl} f) = \operatorname{cl}(\operatorname{epi} f)$, $\operatorname{cl} f$ is lsc and convex. Now let $\bar{x} \in \operatorname{ri}(\operatorname{dom} f)$. Since $f \in \Gamma$, there is an affine function $h \le f$ with $h(\bar{x}) = f(\bar{x})$, cf. Theorem 3.3.2. Since every affine function is continuous, in particular closed, we have $\operatorname{cl} f \ge \operatorname{cl} h = h$. Hence,
\[ (\operatorname{cl} f)(\bar{x}) \le f(\bar{x}) = h(\bar{x}) \le (\operatorname{cl} f)(\bar{x}), \]
therefore $(\operatorname{cl} f)(\bar{x}) = f(\bar{x})$. This shows that $f$ and $\operatorname{cl} f$ agree on $\operatorname{ri}(\operatorname{dom} f)$. In particular, we see that $\operatorname{cl} f$ is proper. Now, let $x \notin \operatorname{cl}(\operatorname{dom} f)$. Clearly, every sequence $\{x_k\} \to x$ satisfies $x_k \notin \operatorname{dom} f$ for all $k$ sufficiently large, i.e. $f(x_k) = +\infty$, hence $\liminf_{k \to \infty} f(x_k) = +\infty$, i.e. $(\operatorname{cl} f)(x) = +\infty$. This proves the assertion.

Just like we argued in Remark 2.3.8 that for a single, given nonempty convex set, we can always assume w.l.o.g. that it has full dimension, we can do the same for the domain of a given proper convex function.
Remark 3.5.3 When dealing with a proper convex function $f : E \to \overline{\mathbb{R}}$, we know that $\operatorname{dom} f$ is nonempty and convex. Let $U$ be the subspace parallel to $\operatorname{aff}(\operatorname{dom} f)$ (or any other subspace of $E$ of the same dimension). By Theorem 1.4.18, there exists an invertible affine mapping $F : E \to E$ such that $F(\operatorname{aff}(\operatorname{dom} f)) = U$. Defining $g : U \to \overline{\mathbb{R}}$ by $g := f \circ F^{-1}$, we see that $g$ is proper convex with $\operatorname{dom} g = F(\operatorname{dom} f)$, i.e. $\operatorname{aff}(\operatorname{dom} g) = \operatorname{aff}(F(\operatorname{dom} f)) = F(\operatorname{aff}(\operatorname{dom} f)) = U$. Hence, $\operatorname{dom} g$ has full dimension in $U$.

Clearly, Remark 3.5.3 does not apply if at least two functions are involved whose domains do not generate the same affine hull.

We are now in a position to prove our first main result on continuity of convex functions. We encourage the reader to recast the proof without using the assumption (justified through Remark 3.5.3) that $f$ has a domain of full dimension.

Theorem 3.5.4 (Continuity of convex functions) A convex function $f : E \to \overline{\mathbb{R}}$ is continuous relative to any relatively open convex subset of $\operatorname{dom} f$. In particular, it is continuous relative to $\operatorname{ri}(\operatorname{dom} f)$.

Proof: Let $C \subset \operatorname{dom} f$ be relatively open and convex and consider $g := f + \delta_C$. Then $\operatorname{dom} g = C$ and $g$ agrees with $f$ on $C$. Hence, w.l.o.g. we may assume that $C = \operatorname{dom} f = \operatorname{ri}(\operatorname{dom} f)$; otherwise we replace $f$ by $g$. Moreover, in view of Remark 3.5.3, we can assume that $C$ is $N$-dimensional, hence open instead of merely relatively open. If $f$ is improper, we have by Exercise 3.4 that $f$ is identically $-\infty$ on $C$ and hence continuous on $C$. We can therefore assume that $f$ is proper. Hence, Proposition 3.5.2 guarantees that $\operatorname{cl} f = f$ on $C$, i.e. $f$ is lsc on $C$. To prove the result, it suffices to show that $f$ is usc: By Proposition 3.3.1 and openness of $C = \operatorname{dom} f$, we have
\[ \operatorname{int}(\operatorname{epi} f) = \{(x, \alpha) \mid f(x) < \alpha\}. \]
Therefore, for $\gamma \in \mathbb{R}$ and with $L : (x, \alpha) \mapsto x$, we find that
\[ \{x \mid f(x) < \gamma\} = L\left( \operatorname{int}(\operatorname{epi} f) \cap \{(x, \alpha) \mid \alpha < \gamma\} \right). \]
Since $L$ is surjective and the intersection that it is applied to is open, the set $\{x \mid f(x) < \gamma\}$ is open, cf. Exercise 2.9. Thus, its complement $\{x \mid f(x) \ge \gamma\}$ is closed. This is equivalent to saying that $f$ is usc, which concludes the proof.

Since finite functions have the whole space as their domain, the next result follows trivially.

Corollary 3.5.5 (Continuity of finite convex functions) A convex function $f : E \to \mathbb{R}$ is continuous.

We close out the section with our second main result, which is concerned with Lipschitz continuity of convex functions.

Theorem 3.5.6 Let $f \in \Gamma$ and let $S \subset \operatorname{ri}(\operatorname{dom} f)$ be compact. Then $f$ is Lipschitz relative to $S$.

Proof: By Remark 3.5.3 we can assume w.l.o.g. that $\operatorname{dom} f$ is $N$-dimensional, so that $S$ actually lies in $\operatorname{int}(\operatorname{dom} f)$. By compactness of $S$, the sets $S + \varepsilon \mathbb{B}$ are compact for all $\varepsilon > 0$, cf. Exercise 1.7 or Corollary 2.4.29. Clearly, for $\varepsilon > 0$ small enough, $S + \varepsilon \mathbb{B} \subset \operatorname{int}(\operatorname{dom} f)$. Fix such an $\varepsilon$. By Theorem 3.5.4, $f$ is continuous on $\operatorname{int}(\operatorname{dom} f)$, hence, in particular, on $S + \varepsilon \mathbb{B}$. As $S + \varepsilon \mathbb{B}$ is compact, $f$ is bounded on $S + \varepsilon \mathbb{B}$; let $l$ and $u$ be a lower and an upper bound, respectively. Now, take $x, y \in S$ with $x \neq y$ and put
\[ z := y + \frac{\varepsilon}{\|x - y\|} (y - x). \]
Then $z \in S + \varepsilon \mathbb{B}$, and for $\lambda := \frac{\|x - y\|}{\varepsilon + \|x - y\|}$ we have $y = (1 - \lambda)x + \lambda z$. Hence, by convexity of $f$, we see that
\[ f(y) \le (1 - \lambda) f(x) + \lambda f(z) = f(x) + \lambda (f(z) - f(x)), \]
and consequently, for $L := \frac{u - l}{\varepsilon}$, we have
\[ f(y) - f(x) \le \lambda (u - l) \le L \|x - y\|. \]
Interchanging the roles of $x$ and $y$ gives the desired inequality.

3.6 Conjugacy of convex functions

3.6.1 Affine approximation and convex hulls of functions

We have spent a significant amount of time studying affine functions. Affine mappings are closely tied to half-spaces. In fact, given an affine mapping $F : E \to \mathbb{R}$, $F(x) = \langle b, x \rangle - \beta$ (cf. Exercise 1.2), we have
\[ \operatorname{epi} F = \{(x, \alpha) \in E \times \mathbb{R} \mid \langle b, x \rangle - \beta \le \alpha\} = H^{\le}_{(b, -1), \beta} \subset E \times \mathbb{R}. \]
Actually, it can be seen that every half-space in $E \times \mathbb{R}$ has one of the following three forms:
1) $\{(x, \alpha) \mid \langle b, x \rangle \le \beta\}$ (vertical),
2) $\{(x, \alpha) \mid \langle b, x \rangle - \alpha \le \beta\}$ (upper),
3) $\{(x, \alpha) \mid \langle b, x \rangle - \alpha \ge \beta\}$ (lower),
for some $(b, \beta) \in E \times \mathbb{R}$.

Theorem 3.6.1 (Envelope representation in $\Gamma_0$) Let $f \in \Gamma_0$. Then $f$ is the pointwise supremum of all affine functions minorizing $f$, i.e.
\[ f(x) = \sup \{h(x) \mid h \le f, \ h \text{ affine}\}. \]

Proof: Since $f$ is lsc and convex, $\operatorname{epi} f$ is a closed convex set in $E \times \mathbb{R}$, and therefore, by Theorem 2.7.1, it is the intersection of all closed half-spaces in $E \times \mathbb{R}$ containing it. No lower half-space can possibly contain $\operatorname{epi} f$. Hence, only vertical and upper half-spaces can be involved in the intersection. We argue that not all of these half-spaces can be vertical: Since $f$ is proper, there exists $x \in \operatorname{dom} f$. Then $(x, f(x) - \varepsilon)$ $(\varepsilon > 0)$ lies in every vertical half-space containing $\operatorname{epi} f$, hence also in their intersection. On the other hand, $(x, f(x) - \varepsilon)$ does not lie in $\operatorname{epi} f$, hence not all half-spaces containing $\operatorname{epi} f$ can be vertical.

The upper half-spaces containing $\operatorname{epi} f$, in turn, are simply the epigraphs of affine mappings $h \le f$. The function that has the intersection of these epigraphs as its epigraph is just the pointwise supremum of all these functions. Hence, to prove the theorem, we must show that the intersection of the upper half-spaces containing $\operatorname{epi} f$ equals the intersection of all upper and vertical half-spaces containing $\operatorname{epi} f$, i.e. that the first intersection excludes every point that the latter intersection excludes. To this end, suppose that
\[ V := \{(x, \alpha) \mid h_1(x) \le 0\}, \quad h_1 : x \mapsto \langle b_1, x \rangle - \beta_1, \]
is a vertical half-space containing $\operatorname{epi} f$, and that $(x_0, \alpha_0) \notin V$. It suffices to show that there exists an upper half-space containing $\operatorname{epi} f$ that does not contain $(x_0, \alpha_0)$, i.e. we need to find an affine mapping $h : E \to \mathbb{R}$ such that $h \le f$ and $h(x_0) > \alpha_0$.
We already know that there exists at least one affine function $h_2 : x \mapsto \langle b_2, x \rangle - \beta_2$ such that $\operatorname{epi} f \subset \operatorname{epi} h_2$, i.e. $h_2 \le f$. For every $x \in \operatorname{dom} f$ we have $h_1(x) \le 0$ and $h_2(x) \le f(x)$, and hence
\[ \lambda h_1(x) + h_2(x) \le f(x) \quad (\lambda \ge 0). \]
The above inequality holds trivially also for $x \notin \operatorname{dom} f$. Now, fix any $\lambda \ge 0$ and define $h_\lambda : E \to \mathbb{R}$ by
\[ h_\lambda(x) := \lambda h_1(x) + h_2(x) = \langle \lambda b_1 + b_2, x \rangle - (\lambda \beta_1 + \beta_2). \]
Then, clearly, $h_\lambda$ is affine with $h_\lambda \le f$. Since $h_1(x_0) > 0$, choosing $\bar{\lambda} > 0$ sufficiently large guarantees that $h_{\bar{\lambda}}(x_0) > \alpha_0$. Then $h := h_{\bar{\lambda}}$ has the desired properties, which concludes the proof.

Let $f : E \to \overline{\mathbb{R}}$ and recall from Section 1.2.2 that the lower semicontinuous hull $\operatorname{cl} f$ of $f$ is the largest lower semicontinuous function that minorizes $f$ or, equivalently, $\operatorname{cl}(\operatorname{epi} f) = \operatorname{epi}(\operatorname{cl} f)$. With the same approach, we can build the convex hull of $f$.

Definition 3.6.2 (Convex hull of a function) Let $f : E \to \overline{\mathbb{R}}$. Then the pointwise supremum of all convex functions minorizing $f$, i.e.
\[ \operatorname{conv} f := \sup \{h : E \to \overline{\mathbb{R}} \mid h \le f, \ h \text{ convex}\}, \]
is called the convex hull of $f$. Moreover, we define the closed convex hull of $f$ to be $\overline{\operatorname{conv}} f := \operatorname{cl}(\operatorname{conv} f)$, i.e. $\overline{\operatorname{conv}} f$ is the largest lsc convex function that minorizes $f$.

Note that, for $f : E \to \overline{\mathbb{R}}$, we have
\[ \operatorname{epi}(\overline{\operatorname{conv}} f) = \overline{\operatorname{conv}}(\operatorname{epi} f), \tag{3.24} \]
cf. Exercise 3.12. An analogous statement does not hold for the convex hull, see the discussion in [3].

Corollary 3.6.3 (Envelope representation of closed, convex hull of proper functions) Let $f : E \to \overline{\mathbb{R}}$ such that $\operatorname{conv} f$ is proper. Then $\overline{\operatorname{conv}} f$ is the pointwise supremum of all affine functions minorizing $f$.

Proof: Since $\operatorname{conv} f$ is proper, so is $\overline{\operatorname{conv}} f$, by Exercise 3.4 d). Hence, $\overline{\operatorname{conv}} f \in \Gamma_0$ and thus, by Theorem 3.6.1, $\overline{\operatorname{conv}} f$ is the pointwise supremum of all its affine minorants. Moreover, we have $\overline{\operatorname{conv}} f \le f$. On the other hand, since all affine functions are lsc and convex, there cannot be an affine minorant of $f$ which is not a minorant of $\overline{\operatorname{conv}} f$.
Hence, the affine minorants of $\overline{\operatorname{conv}} f$ and $f$ coincide, which gives the desired result.

Note that the assumption that $\operatorname{conv} f$ be proper implies that $f$ and $\operatorname{cl} f$ are proper, and is equivalent to demanding that $f$ has an affine minorant, cf. Exercise 3.13.

3.6.2 The conjugate of a function

We start with the central definition of this section.

Definition 3.6.4 (Conjugate of a function) Let $f : E \to \overline{\mathbb{R}}$. Then its conjugate is the function $f^* : E \to \overline{\mathbb{R}}$ defined by
\[ f^*(y) := \sup_{x \in E} \{\langle x, y \rangle - f(x)\}. \]
The function $f^{**} := (f^*)^*$ is called the biconjugate of $f$.

Note that, clearly, we can restrict the supremum in the above definition of the conjugate to the domain of the underlying function $f$, i.e.
\[ f^*(y) = \sup_{x \in \operatorname{dom} f} \{\langle x, y \rangle - f(x)\}. \]
Moreover, by definition, we always have
\[ f(x) + f^*(y) \ge \langle x, y \rangle \quad (x, y \in E), \tag{3.25} \]
which is known as the Fenchel-Young inequality. The mapping $f \mapsto f^*$ from the space of extended real-valued functions to itself is called the Legendre-Fenchel transform. We always have
\[ f \le g \implies f^* \ge g^*, \]
i.e. the Legendre-Fenchel transform is order-reversing.

Before we start analyzing the conjugate function in depth, we want to motivate why we would be interested in studying it: Let $f : E \to \overline{\mathbb{R}}$. We notice that
\[ \operatorname{epi} f^* = \{(y, \beta) \mid \langle x, y \rangle - f(x) \le \beta \ (x \in E)\}. \tag{3.26} \]
This means that the conjugate of $f$ is the function whose epigraph is the set of all $(y, \beta)$ defining affine functions $x \mapsto \langle y, x \rangle - \beta$ that minorize $f$. In view of Corollary 3.6.3, if $\operatorname{conv} f$ is proper, the pointwise supremum of these affine mappings is the closed convex hull of $f$, i.e., through its epigraph, $f^*$ encodes the family of affine minorants of $\overline{\operatorname{conv}} f$, i.e. of $f$ itself.
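To make the definition of the conjugate concrete, the following sketch approximates $f^*$ numerically for $f = \frac{1}{2}|\cdot|^2$ on $E = \mathbb{R}$ and checks the Fenchel-Young inequality (3.25). It is only an illustration under stated assumptions: the supremum over $E$ is replaced by a maximum over a finite grid (which in general gives merely a lower bound for $f^*$), and the names `conjugate_on_grid` and `half_square`, the grid and the tolerance are hypothetical choices that do not appear in the text.

```python
def conjugate_on_grid(f, xs, y):
    """Lower approximation of f*(y) = sup_x { x*y - f(x) }: maximize over the finite grid xs."""
    return max(x * y - f(x) for x in xs)


def half_square(x):
    """f(x) = x^2 / 2 on E = R; its conjugate is y^2 / 2."""
    return 0.5 * x * x


# Hypothetical grid on [-10, 10] with step 0.01 (an assumption for illustration).
xs = [i / 100.0 for i in range(-1000, 1001)]

for y in (-3.0, -1.0, 0.0, 0.5, 2.0):
    approx = conjugate_on_grid(half_square, xs, y)
    print(f"y = {y:5.2f}   grid value = {approx:.6f}   y^2/2 = {0.5 * y * y:.6f}")

# Fenchel-Young inequality on the grid: f(x) + f*(y) >= x*y, up to rounding.
y = 2.0
f_star_y = conjugate_on_grid(half_square, xs, y)
fenchel_young_ok = all(half_square(x) + f_star_y >= x * y - 1e-9 for x in xs)
print("Fenchel-Young holds on the grid:", fenchel_young_ok)
```

For the slopes used here the maximizing point $x = y$ lies on the grid, so the grid value agrees with $\frac{1}{2} y^2$ up to rounding; for slopes whose maximizer falls outside $[-10, 10]$ the grid value would underestimate $f^*(y)$.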
Since
\[ f^*(y) = \sup_{x \in E} \{\langle x, y \rangle - f(x)\} = \sup_{(x, \alpha) \in \operatorname{epi} f} \{\langle y, x \rangle - \alpha\} \quad (y \in E), \tag{3.27} \]
we also have
\[ \operatorname{epi} f^* = \{(y, \beta) \mid \langle x, y \rangle - \alpha \le \beta \ ((x, \alpha) \in \operatorname{epi} f)\}. \]
We use our recent findings to establish the first major result on conjugates and biconjugates.

Theorem 3.6.5 (Fenchel-Moreau Theorem) Let $f : E \to \overline{\mathbb{R}}$ such that $\operatorname{conv} f$ is proper (hence, so is $f$). Then the following hold:
a) $f^*$ and $f^{**}$ are closed, proper and convex;
b) $f^{**} = \overline{\operatorname{conv}} f$;
c) $f^* = (\operatorname{conv} f)^* = (\operatorname{cl} f)^* = (\overline{\operatorname{conv}} f)^*$.

Proof: First note that the assumption that $\operatorname{conv} f$ is proper implies that both $f$ and $\overline{\operatorname{conv}} f$ are proper, cf. Exercise 3.13 and Exercise 3.4.

a) Applying Proposition 3.1.12 to (3.27), we see that $f^*$ is lsc and convex. If $f^*$ attained the value $-\infty$, $f$ would be constantly $+\infty$, which is false. On the other hand, $f^*$ is not identically $+\infty$, since that would imply that its epigraph, which, as $\operatorname{conv} f$ is proper, encodes all minorizing affine mappings of $\overline{\operatorname{conv}} f$, were empty, which is also false. Hence, $f^*$ is proper. Applying the same arguments to $f^{**} = (f^*)^*$ gives that $f^{**}$ is closed, proper and convex, too.

b) Applying (3.27) to $f^{**}$, for $x \in E$, we have
\[ f^{**}(x) = \sup_{(y, \beta) \in \operatorname{epi} f^*} \{\langle y, x \rangle - \beta\}. \]
Hence, in view of (3.26), $f^{**}$ is the pointwise supremum of all affine minorants of $f$. Therefore, by Corollary 3.6.3, we see that $f^{**} = \overline{\operatorname{conv}} f$.

c) Since the affine minorants of $f$, $\operatorname{conv} f$, $\operatorname{cl} f$ and $\overline{\operatorname{conv}} f$ coincide, their conjugates have the same epigraph and hence are equal.

Note that due to item b) of Theorem 3.6.5 we always have $f \ge f^{**}$ for a function $f : E \to \overline{\mathbb{R}}$ with $\operatorname{conv} f$ proper, and it holds that $f^{**} = f$ if and only if $f$ is closed and convex. Thus, the Legendre-Fenchel transform induces a one-to-one correspondence on $\Gamma_0$: For $f, g \in \Gamma_0$, $f$ is conjugate to $g$ if and only if $g$ is conjugate to $f$, and we write $f \overset{*}{\longleftrightarrow} g$ in this case. This is called the conjugacy correspondence. A property on one side is reflected by a dual property on the other.
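The identity $f^{**} = \overline{\operatorname{conv}} f$ can be observed numerically on a nonconvex example. The sketch below applies a discrete Legendre-Fenchel transform twice to the double-well function $f(x) = (x^2 - 1)^2$ on $\mathbb{R}$, whose closed convex hull vanishes on $[-1, 1]$ and agrees with $f$ elsewhere. Everything specific here is an assumption made for illustration: the function, the helper `discrete_conjugate`, the truncation of $x$ to $[-2, 2]$ and of the slopes $y$ to $[-30, 30]$, and the grid spacings.

```python
def double_well(x):
    """Nonconvex example f(x) = (x^2 - 1)^2; its convex hull vanishes on [-1, 1]."""
    return (x * x - 1.0) ** 2


def discrete_conjugate(points, values, slopes):
    """For h with values[i] = h(points[i]), return max_i { s*points[i] - values[i] } for each slope s."""
    return [max(s * p - v for p, v in zip(points, values)) for s in slopes]


# Hypothetical grids (assumptions): x in [-2, 2], slopes y in [-30, 30].
xs = [i / 100.0 for i in range(-200, 201)]
ys = [j / 10.0 for j in range(-300, 301)]

f_vals = [double_well(x) for x in xs]
f_star = discrete_conjugate(xs, f_vals, ys)    # approximates f* on the slope grid
f_bistar = discrete_conjugate(ys, f_star, xs)  # approximates f** on the x grid

for x in (0.0, 0.5, 1.0, 1.5):
    i = xs.index(x)
    print(f"x = {x:3.1f}   f(x) = {f_vals[i]:.4f}   f**(x) ~ {f_bistar[i]:.4f}")
```

On the grid, the computed biconjugate never exceeds $f$, fills in the region between the two wells, and coincides with $f$ where $f$ is already convex, in line with $f \ge f^{**}$ and item b) of the Fenchel-Moreau Theorem.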
A list of some elementary cases of conjugacy is given below.

Proposition 3.6.6 (Elementary cases of conjugacy) Let $f \overset{*}{\longleftrightarrow} g$. Then the following hold:
a) $f - \langle a, \cdot \rangle \overset{*}{\longleftrightarrow} g((\cdot) + a)$ $(a \in E)$;
b) $f + \gamma \overset{*}{\longleftrightarrow} g - \gamma$ $(\gamma \in \mathbb{R})$;
c) $\lambda f \overset{*}{\longleftrightarrow} \lambda g\left(\frac{(\cdot)}{\lambda}\right)$ $(\lambda > 0)$.

3.6.3 Special cases of conjugacy

Convex quadratic functions

For $Q \in \mathbb{S}^n$, $b \in \mathbb{R}^n$ we consider the quadratic function $q : \mathbb{R}^n \to \mathbb{R}$ defined by
\[ q(x) := \frac{1}{2} x^T Q x + b^T x. \tag{3.28} \]
From Theorem 3.1.18 we know that $q$ is (strongly) convex if and only if $Q$ is positive semidefinite (definite). Hence, for the remainder we assume that $Q \succeq 0$. We are interested in computing the conjugate of $q$. This is easy if $Q$ is positive definite. In the merely semidefinite case the following tool is very useful.

Theorem 3.6.7 (Moore-Penrose pseudoinverse) Let $A \in \mathbb{S}^n_+$ with $\operatorname{rank} A = r$ and the spectral decomposition
\[ A = Q \Lambda Q^T \quad \text{with} \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_r, 0, \dots, 0), \quad Q \in O(n). \]
Then the matrix
\[ A^\dagger := Q \Lambda^\dagger Q^T \quad \text{with} \quad \Lambda^\dagger := \operatorname{diag}(\lambda_1^{-1}, \dots, \lambda_r^{-1}, 0, \dots, 0), \]
called the (Moore-Penrose) pseudoinverse of $A$, has the following properties:
a) $A A^\dagger A = A$ and $A^\dagger A A^\dagger = A^\dagger$;
b) $(A A^\dagger)^T = A A^\dagger$ and $(A^\dagger A)^T = A^\dagger A$;
c) $(A^\dagger)^T = (A^T)^\dagger$;
d) If $A \succ 0$, then $A^\dagger = A^{-1}$;
e) $A A^\dagger = P_{\operatorname{rge} A}$, i.e. $A A^\dagger$ is the projection onto the image of $A$. In particular, if $b \in \operatorname{rge} A$, we have
\[ \{x \in \mathbb{R}^n \mid Ax = b\} = A^\dagger b + \ker A. \]

In fact, the Moore-Penrose pseudoinverse can be uniquely defined through properties a) and b) from above for any matrix $A \in \mathbb{C}^{m \times n}$, see, e.g., [4], but we confine ourselves to the positive semidefinite case.

We are now in a position to state the desired conjugacy result for convex quadratics.

Proposition 3.6.8 (Conjugate of convex quadratic functions) For $q$ from (3.28) with $Q \in \mathbb{S}^n_+$ we have
\[ q^*(y) = \begin{cases} \frac{1}{2} (y - b)^T Q^\dagger (y - b) & \text{if } y - b \in \operatorname{rge} Q, \\ +\infty & \text{else}. \end{cases} \]
In particular, if $Q \succ 0$, we have
\[ q^*(y) = \frac{1}{2} (y - b)^T Q^{-1} (y - b). \]

Proof: By definition, we have
\[ q^*(y) = \sup_{x \in \mathbb{R}^n} \left\{ x^T y - \frac{1}{2} x^T Q x - b^T x \right\} = - \inf_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} x^T Q x - (y - b)^T x \right\}. \tag{3.29} \]
The necessary and sufficient optimality condition for $\bar{x}$ to be a minimizer of the convex function $x \mapsto \frac{1}{2} x^T Q x - (y - b)^T x$ reads
\[ Q \bar{x} = y - b, \tag{3.30} \]
cf. Exercise 3.11. Hence, if $y - b \notin \operatorname{rge} Q$, from Exercise 1.8 we know that the infimum in (3.29) is $-\infty$, hence $q^*(y) = +\infty$ in that case. Otherwise, we have $y - b \in \operatorname{rge} Q$; hence, in view of Theorem 3.6.7, (3.30) is equivalent to
\[ \bar{x} = Q^\dagger (y - b) + z, \quad z \in \ker Q. \]
Inserting $\bar{x} = Q^\dagger (y - b)$ (we can choose $z = 0$) in (3.29) yields
\[ \begin{aligned} q^*(y) &= (Q^\dagger (y - b))^T y - \frac{1}{2} (Q^\dagger (y - b))^T Q Q^\dagger (y - b) - b^T Q^\dagger (y - b) \\ &= (y - b)^T Q^\dagger (y - b) - \frac{1}{2} (y - b)^T Q^\dagger Q Q^\dagger (y - b) \\ &= \frac{1}{2} (y - b)^T Q^\dagger (y - b), \end{aligned} \]
where we make use of Theorem 3.6.7 a) and c). Part d) of the latter result gives the remaining assertion.

We point out that, by the foregoing result, the function $f = \frac{1}{2} \|\cdot\|^2$ is self-conjugate in the sense that $f^* = f$. Exercise 3.14 shows that this is the only function on $\mathbb{R}^n$ that has this property. Clearly, by an isomorphism argument, the same holds for the respective function on an arbitrary Euclidean space.

Support functions

Definition 3.6.9 (Positive homogeneity, subadditivity, and sublinearity) Let $f : E \to \overline{\mathbb{R}}$ with $0 \in \operatorname{dom} f$. Then we call $f$
a) positively homogeneous if $f(\lambda x) = \lambda f(x)$ $(\lambda > 0, \ x \in E)$;
b) subadditive if $f(x + y) \le f(x) + f(y)$ $(x, y \in E)$;
c) sublinear if $f(\lambda x + \mu y) \le \lambda f(x) + \mu f(y)$ $(x, y \in E, \ \lambda, \mu > 0)$.

Note that in the definition of positive homogeneity we could have also just demanded an inequality, since $f(\lambda x) \le \lambda f(x)$ for all $\lambda > 0$ implies that
\[ f(x) = f(\lambda^{-1} \lambda x) \le \frac{1}{\lambda} f(\lambda x). \]

Example 3.6.10 Every norm $\|\cdot\|$ is sublinear.

We next provide a useful list of characterizations of positive homogeneity and sublinearity, respectively.

Proposition 3.6.11 (Positive homogeneity, sublinearity and subadditivity) Let $f : E \to \overline{\mathbb{R}}$. Then the following hold:
a) $f$ is positively homogeneous if and only if $\operatorname{epi} f$ is a cone. In this case $f(0) \in \{0, -\infty\}$.
b) If $f$ is lsc and positively homogeneous with $f(0) = 0$, it must be proper.
c) The following are equivalent:
i) $f$ is sublinear;
ii) $f$ is positively homogeneous and convex;
iii) $f$ is positively homogeneous and subadditive;
iv) $\operatorname{epi} f$ is a convex cone.

Proof: Exercise 3.16.

We continue with the prototype of a sublinear function, the so-called support function, which will from now on occur ubiquitously.

Definition 3.6.12 (Support functions) Let $C \subset E$ be nonempty. The support function of $C$ is defined by
\[ \sigma_C : x \in E \mapsto \sup_{s \in C} \langle s, x \rangle. \]

We start our investigation of support functions with a list of elementary properties.

Proposition 3.6.13 (Support functions) Let $C \subset E$ be nonempty. Then
a) $\sigma_C = \sigma_{\operatorname{cl} C} = \sigma_{\operatorname{conv} C} = \sigma_{\overline{\operatorname{conv}} C}$.
b) $\sigma_C$ is proper, lsc and sublinear.
c) $\delta_C^* = \sigma_C$ and $\sigma_C^* = \delta_{\overline{\operatorname{conv}} C}$.
d) If $C$ is closed and convex then $\sigma_C \overset{*}{\longleftrightarrow} \delta_C$.

Proof:
a) Obviously, closures do not make a difference. On the other hand, we have
\[ \left\langle \sum_{i=1}^{N+1} \lambda_i s_i, x \right\rangle = \sum_{i=1}^{N+1} \lambda_i \langle s_i, x \rangle \le \max_{i = 1, \dots, N+1} \langle s_i, x \rangle \]
for all $s_i \in C$, $\lambda \in \Delta_{N+1}$, which shows that convex hulls also do not change anything.
b) By Proposition 3.1.12, $\sigma_C$ is lsc and convex. Since $C$ is nonempty, $\sigma_C$ does not take the value $-\infty$, and $\sigma_C(0) = 0$, so $0 \in \operatorname{dom} \sigma_C$ and $\sigma_C$ is proper. Moreover, $\lambda \sigma_C(x) = \sigma_C(\lambda x)$ for all $x \in E$ and $\lambda > 0$, which shows positive homogeneity. This gives the assertion in view of Proposition 3.6.11 c).
c) Clearly, $\delta_C^* = \sigma_C$. Hence, $\sigma_C^* = \delta_C^{**} = \overline{\operatorname{conv}} \, \delta_C = \delta_{\overline{\operatorname{conv}} C}$, since
\[ \overline{\operatorname{conv}}(\operatorname{epi} \delta_C) = \overline{\operatorname{conv}}(C \times \mathbb{R}_+) = \overline{\operatorname{conv}} \, C \times \mathbb{R}_+ = \operatorname{epi}(\delta_{\overline{\operatorname{conv}} C}). \]
d) Follows immediately from c).

One of our main goals in this paragraph is to show that, in fact, part b) of Proposition 3.6.13 can be reversed in the sense that every proper, lsc and sublinear function is a support function. As a preparation we need the following result.

Proposition 3.6.14 Let $f : E \to \overline{\mathbb{R}}$ be closed, proper and convex. Then the following are equivalent:
i) $f$ only takes the values $0$ and $+\infty$;
ii) $f^*$ is positively homogeneous (i.e. sublinear, since convex).
Proof: 'i)⇒ii)': In this case $f = \delta_C$ for some closed convex set $C \subset E$. Hence, $f^* = \sigma_C$, which is sublinear, cf. Proposition 3.6.13.

'ii)⇒i)': Let $f^*$ be positively homogeneous (hence sublinear). Then, for $\lambda > 0$ and $y \in E$, we have
\[ \begin{aligned} f^*(y) &= \lambda f^*(\lambda^{-1} y) \\ &= \lambda \sup_{x \in E} \left\{ \langle x, \lambda^{-1} y \rangle - f(x) \right\} \\ &= \sup_{x \in E} \{\langle x, y \rangle - \lambda f(x)\} \\ &= (\lambda f)^*(y). \end{aligned} \]
Thus, $(\lambda f)^* = f^*$ for all $\lambda > 0$ and hence, by the Fenchel-Moreau Theorem, we have
\[ \lambda f = (\lambda f)^{**} = f^{**} = f \quad (\lambda > 0). \]
But as $f$ is proper, hence does not take the value $-\infty$, this immediately implies that $f$ only takes the values $+\infty$ and $0$.

Theorem 3.6.15 (Hörmander's Theorem) A function $f : E \to \overline{\mathbb{R}}$ is proper, lsc and sublinear if and only if it is a support function.

Proof: By Proposition 3.6.13 b), every support function is proper, lsc and sublinear. In turn, if $f$ is proper, lsc and sublinear (hence $f = f^{**}$), then, by Proposition 3.6.14, $f^*$ is the indicator of some set $C \subset E$, which necessarily is nonempty, closed and convex, as $f^* \in \Gamma_0$. Hence, $f = f^{**} = \delta_C^* = \sigma_C$.

We now want to give a slight refinement of Hörmander's Theorem, in that we describe the set that a proper, lsc sublinear function supports.

Corollary 3.6.16 Let $f : E \to \overline{\mathbb{R}}$ be proper and sublinear. Then $\operatorname{cl} f$ is the support function of the closed convex set
\[ \{s \in E \mid \langle s, x \rangle \le f(x) \ (x \in E)\}. \]

Proof: Since $\operatorname{cl} f$ is proper (cf. Exercise 3.4), closed and sublinear, it is the support function of a closed convex set $C$. Therefore, we have $\operatorname{cl} f = \sigma_C = \delta_C^*$ and thus $f^* = (\operatorname{cl} f)^* = \delta_C^{**} = \delta_C$. Hence, $C = \{s \in E \mid f^*(s) \le 0\}$. But $f^*(s) \le 0$ if and only if $\langle s, x \rangle - f(x) \le 0$ for all $x \in E$.

Gauges, polar sets and dual norms

We now present a class of functions that makes a connection between support functions and norms.

Definition 3.6.17 (Gauge function) Let $C \subset E$. The gauge (function) of $C$ is defined by
\[ \gamma_C : x \in E \mapsto \inf \{\lambda \ge 0 \mid x \in \lambda C\}. \]

For a closed convex set that contains the origin, its gauge has very desirable convex-analytical properties.
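As an illustration of Definition 3.6.17, the following sketch evaluates $\gamma_C(x)$ approximately for a closed convex set $C \subset \mathbb{R}^n$ containing the origin. It assumes that $C$ is given by a membership test `in_C` (a hypothetical oracle, not something defined in the text) and uses bisection in $\lambda$; this relies on the fact that for such $C$ we have $\lambda C \subset \mu C$ whenever $0 < \lambda \le \mu$, so the admissible $\lambda$ form an interval that is unbounded above. The tolerance and the cap `lam_max`, beyond which the sketch reports $+\infty$, are arbitrary choices.

```python
import math


def gauge(in_C, x, tol=1e-10, lam_max=1e12):
    """Approximate gamma_C(x) = inf{lam >= 0 : x in lam*C} for closed convex C containing 0.

    in_C is a membership oracle for C (an assumption of this sketch).
    """
    if all(xi == 0.0 for xi in x):
        return 0.0
    # Find some hi > 0 with x in hi*C by doubling; give up beyond lam_max.
    hi = 1.0
    while not in_C([xi / hi for xi in x]):
        hi *= 2.0
        if hi > lam_max:
            return math.inf
    # Bisection: x is not in lo*C (or lo = 0), and x is in hi*C.
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if in_C([xi / mid for xi in x]):
            hi = mid
        else:
            lo = mid
    return hi


def in_box(p):
    """Unit ball of the maximum norm in R^2, i.e. the square [-1, 1]^2."""
    return max(abs(t) for t in p) <= 1.0


def in_ellipse(p):
    """The ellipse {(a, b) : a^2/4 + b^2 <= 1}."""
    return p[0] ** 2 / 4.0 + p[1] ** 2 <= 1.0


print(gauge(in_box, [3.0, -2.0]))      # gauge of the square is the maximum norm
print(gauge(in_ellipse, [0.0, 3.0]))
print(gauge(in_box, [6.0, -4.0]))      # positive homogeneity: twice the first value
```

For the square the gauge reproduces the maximum norm, which matches the later observation that gauges of symmetric, compact convex sets with nonempty interior are norms.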
Proposition 3.6.18 Let $C \subset E$ be nonempty, closed and convex with $0 \in C$. Then $\gamma_C$ is proper, lsc and sublinear.

Proof: $\gamma_C$ is obviously proper as $\gamma_C(0) = 0$. Moreover, for $t > 0$ and $x \in E$, we have
\[ \begin{aligned} \gamma_C(tx) &= \inf \{\lambda \ge 0 \mid tx \in \lambda C\} \\ &= \inf \left\{ \lambda \ge 0 \;\middle|\; x \in \tfrac{\lambda}{t} C \right\} \\ &= \inf \{t\mu \ge 0 \mid x \in \mu C\} \\ &= t \inf \{\mu \ge 0 \mid x \in \mu C\} \\ &= t \gamma_C(x), \end{aligned} \]
i.e. $\gamma_C$ is positively homogeneous (since also $0 \in \operatorname{dom} \gamma_C$). We now show that it is also subadditive, hence altogether sublinear: To this end, take $x, y \in \operatorname{dom} \gamma_C$ (otherwise there is nothing to prove). Due to the identity
\[ \frac{x + y}{\lambda + \mu} = \frac{\lambda}{\lambda + \mu} \cdot \frac{x}{\lambda} + \frac{\mu}{\lambda + \mu} \cdot \frac{y}{\mu} \quad (\lambda, \mu > 0), \]
we realize, by convexity of $C$, that $x + y \in (\lambda + \mu) C$ if $x \in \lambda C$ and $y \in \mu C$ for all $\lambda, \mu \ge 0$. This implies that $\gamma_C(x + y) \le \gamma_C(x) + \gamma_C(y)$.

In order to prove lower semicontinuity of $\gamma_C$, notice that (by Exercise 3.19 and positive homogeneity) we have $\operatorname{lev}_{\le \alpha} \gamma_C = \alpha C$ for $\alpha > 0$, $\operatorname{lev}_{\le \alpha} \gamma_C = \emptyset$ for $\alpha < 0$ and $\operatorname{lev}_{\le 0} \gamma_C = C^\infty$ (again by Exercise 3.19). Hence all level sets of $\gamma_C$ are closed, i.e. $\gamma_C$ is lsc. This concludes the proof.

Note that in the proof of Proposition 3.6.18, we do not need the assumption that $C$ contains the origin to prove sublinearity. We do need it, though, to get lower semicontinuity, cf. Exercise 3.19.

Since the gauge of a closed convex set that contains $0$ is proper, lsc and sublinear, we know, in view of Hörmander's Theorem (see Theorem 3.6.15), that it is the support function of some closed convex set. This set can be described elegantly using the concept of polar sets, which generalizes the notion of polar cones, cf. Definition 2.4.5.

Definition 3.6.19 (Polar sets) Let $C \subset E$. Then its polar set is defined by
\[ C^\circ := \{v \in E \mid \langle v, x \rangle \le 1 \ (x \in C)\}. \]
Moreover, we put $C^{\circ\circ} := (C^\circ)^\circ$ and call it the bipolar set of $C$.

Note that there is no ambiguity in notation, since the polar cone and the polar set of a cone coincide, see Exercise 3.18. Moreover, as an intersection of closed half-spaces, $C^\circ$ is a closed, convex set containing $0$.
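In addition, as we would expect, we have
\[ C \subset D \implies D^\circ \subset C^\circ, \quad \text{and} \quad C \subset C^{\circ\circ}. \]
Before we continue our search for the support function representation of gauges, we provide the famous bipolar theorem, which generalizes Exercise 2.23. Its proof is based once more on separation.

Theorem 3.6.20 (Bipolar Theorem) Let $C \subset E$. Then $C^{\circ\circ} = \overline{\operatorname{conv}}(C \cup \{0\})$.

Proof: Since $C \cup \{0\} \subset C^{\circ\circ}$ and $C^{\circ\circ}$ is closed and convex, we clearly have $\overline{\operatorname{conv}}(C \cup \{0\}) \subset C^{\circ\circ}$. Now assume there were $\bar{x} \in C^{\circ\circ} \setminus \overline{\operatorname{conv}}(C \cup \{0\})$. By strong separation, there exists $s \in E \setminus \{0\}$ such that
\[ \langle s, \bar{x} \rangle > \sigma_{\overline{\operatorname{conv}}(C \cup \{0\})}(s) \ge \max\{\sigma_C(s), 0\}. \]
After rescaling $s$ accordingly (cf. Remark 2.6.2) we can assume that
\[ \langle s, \bar{x} \rangle > 1 \ge \sigma_C(s), \]
in particular, $s \in C^\circ$. On the other hand, $\langle s, \bar{x} \rangle > 1$ and $\bar{x} \in C^{\circ\circ}$, which is a contradiction.

As a consequence of the bipolar theorem we see that every closed convex set $C \subset E$ containing $0$ satisfies $C = C^{\circ\circ}$. Hence, the mapping $C \mapsto C^\circ$ establishes a one-to-one correspondence on the closed convex sets that contain the origin. This is connected to conjugacy through gauge functions, as is highlighted by the next result.

Proposition 3.6.21 Let $C \subset E$ be closed and convex with $0 \in C$. Then
\[ \gamma_C = \sigma_{C^\circ} \overset{*}{\longleftrightarrow} \delta_{C^\circ} \quad \text{and} \quad \gamma_{C^\circ} = \sigma_C \overset{*}{\longleftrightarrow} \delta_C. \]

Proof: Since, by Proposition 3.6.18, $\gamma_C$ is proper, lsc and sublinear, we have
\[ \gamma_C = \sigma_D, \quad D = \{v \in E \mid \langle v, x \rangle \le \gamma_C(x) \ (x \in E)\} \]
in view of Corollary 3.6.16. To prove that $\gamma_C = \sigma_{C^\circ}$, we need to show that $D = C^\circ$. Since $\gamma_C(x) \le 1$ if (and only if; see Exercise 3.19) $x \in C$, the inclusion $D \subset C^\circ$ is clear. In turn, let $v \in C^\circ$, i.e. $\langle v, x \rangle \le 1$ for all $x \in C$. Now let $x \in E$; if $\gamma_C(x) = +\infty$ there is nothing to show, so let $x \in \operatorname{dom} \gamma_C$. By the definition of $\gamma_C$, there exist $\lambda_k \to \gamma_C(x)$ and $c_k \in C$ such that $x = \lambda_k c_k$ for all $k \in \mathbb{N}$. But then
\[ \langle v, x \rangle = \lambda_k \langle v, c_k \rangle \le \lambda_k \to \gamma_C(x), \]
hence $v \in D$, which proves $\gamma_C = \sigma_{C^\circ}$. Since $C^{\circ\circ} = C$, this implies $\gamma_{C^\circ} = \sigma_C$. The conjugacy relations are due to Proposition 3.6.13.

Exercise 3.19.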
tells us that the gauge of a symmetric, compact convex set with nonempty interior is a norm. This justifies the following definition.

Definition 3.6.22 (Dual norm) Let $\|\cdot\|_*$ be a norm on $E$ with closed unit ball $\mathbb{B}_*$. Then we call
\[ \|\cdot\|_*^\circ := \gamma_{\mathbb{B}_*^\circ} \]
its dual norm.

Corollary 3.6.23 (Dual norms) For any norm $\|\cdot\|_*$ with (closed) unit ball $\mathbb{B}_*$, its dual norm is $\sigma_{\mathbb{B}_*}$, the support function of its unit ball. In particular, we have $\|\cdot\|^\circ = \|\cdot\|$, i.e. the Euclidean norm is self-dual.

3.6.4 Some dual operations

In this section, we give a list of conjugate functions obtained through convexity-preserving operations that we have studied earlier. We start with conjugacy on product sets.

Proposition 3.6.24 (Conjugacy on product sets) Let $f_i : E_i \to \overline{\mathbb{R}}$ $(i = 1, \dots, p)$, put $E := E_1 \times \dots \times E_p$ and define $f : (x_1, \dots, x_p) \in E \mapsto \sum_{i=1}^p f_i(x_i)$. Then
\[ f^* : (y_1, \dots, y_p) \in E \mapsto \sum_{i=1}^p f_i^*(y_i). \]

Proof: For $y = (y_1, \dots, y_p) \in E$ we have
\[ f^*(y) = \sup_{(x_1, \dots, x_p) \in E} \left\{ \sum_{i=1}^p \langle x_i, y_i \rangle - \sum_{i=1}^p f_i(x_i) \right\} = \sum_{i=1}^p \sup_{x_i \in E_i} \{\langle x_i, y_i \rangle - f_i(x_i)\} = \sum_{i=1}^p f_i^*(y_i). \]

One of the convexity-preserving operations that we have studied thoroughly is infimal convolution, which is, as we will now see, paired in duality with simple addition of functions.

Proposition 3.6.25 (Conjugacy of inf-convolution) Let $f, g : E \to \overline{\mathbb{R}}$. Then the following hold:
a) $(f \# g)^* = f^* + g^*$;
b) If $f, g \in \Gamma_0$ such that $\operatorname{dom} f \cap \operatorname{dom} g \neq \emptyset$, then $(f + g)^* = \operatorname{cl}(f^* \# g^*)$.

Proof:
a) By definition, for all $y \in E$, we have
\[ \begin{aligned} (f \# g)^*(y) &= \sup_x \left\{ \langle x, y \rangle - \inf_u \{f(u) + g(x - u)\} \right\} \\ &= \sup_{x, u} \{\langle x, y \rangle - f(u) - g(x - u)\} \\ &= \sup_{x, u} \{(\langle u, y \rangle - f(u)) + (\langle x - u, y \rangle - g(x - u))\} \\ &= f^*(y) + g^*(y). \end{aligned} \]
b) From a) and the fact that $f, g$ are closed, proper and convex, we have
\[ (f^* \# g^*)^* = f^{**} + g^{**} = f + g, \]
which is proper (as $\operatorname{dom} f$ meets $\operatorname{dom} g$), closed and convex. Thus,
\[ \overline{\operatorname{conv}}(f^* \# g^*) = (f^* \# g^*)^{**} = (f + g)^*. \]
By Proposition 3.4.3 the convex hull on the left can be omitted, hence $\operatorname{cl}(f^* \# g^*) = (f + g)^*$.

Note that it can be shown that the closure operation in Proposition 3.6.25 can be omitted under the qualification condition
\[ \operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset. \tag{3.31} \]
This is in fact a prominent theorem, which we now state and whose proof we postpone to the Appendix.

Theorem 3.6.26 (Attouch-Brézis) Let $f, g \in \Gamma_0$ such that (3.31) holds. Then $(f + g)^* = f^* \# g^*$, and the infimal convolution is exact, i.e. the infimum in the infimal convolution is attained.

Some very important cases of infimal convolutions that have occurred in our study are considered below from a duality perspective.

Corollary 3.6.27 (Conjugacy for distance functions and Moreau envelopes) Let $f \in \Gamma_0$, $\lambda > 0$ and let $C \subset E$ be nonempty, closed and convex. Then the following hold:
a) $\operatorname{dist}_C \overset{*}{\longleftrightarrow} \sigma_C + \delta_{\mathbb{B}}$;
b) $e_\lambda f \overset{*}{\longleftrightarrow} f^* + \frac{\lambda}{2} \|\cdot\|^2$;
c) $e_\lambda f(x) + e_{\lambda^{-1}} f^*\left(\frac{x}{\lambda}\right) = \frac{1}{2\lambda} \|x\|^2$ $(x \in E)$.

Proof:
a) Since $\operatorname{dist}_C = \delta_C \# \|\cdot\|$, $\delta_C^* = \sigma_C$ (see Proposition 3.6.13) and $\|\cdot\|^* = \sigma_{\mathbb{B}}^* = \delta_{\mathbb{B}}$, the assertion follows from Proposition 3.6.25 and Theorem 3.6.26.
b) Since $e_\lambda f = f \# \left(\frac{1}{2\lambda} \|\cdot\|^2\right)$, the assertion follows from Proposition 3.6.25 and Theorem 3.6.26, also using Proposition 3.6.6 c).
c) From b) we have for all $x \in E$ that
\[ \begin{aligned} e_\lambda f(x) &= \sup_y \left\{ \langle x, y \rangle - f^*(y) - \frac{\lambda}{2} \|y\|^2 \right\} \\ &= \frac{1}{2\lambda} \|x\|^2 - \inf_y \left\{ f^*(y) + \frac{\lambda}{2} \left\| y - \frac{x}{\lambda} \right\|^2 \right\} \\ &= \frac{1}{2\lambda} \|x\|^2 - e_{\lambda^{-1}} f^*\left(\frac{x}{\lambda}\right). \end{aligned} \]

We continue with a conjugacy correspondence between pointwise infima and suprema.

Proposition 3.6.28 (Pointwise inf/sup) Let $f_i : E \to \overline{\mathbb{R}}$ $(i \in I)$. Then the following hold:
a) $(\inf_{i \in I} f_i)^* = \sup_{i \in I} f_i^*$.
b) $(\sup_{i \in I} f_i)^* = \overline{\operatorname{conv}}(\inf_{i \in I} f_i^*)$ for $f_i \in \Gamma_0$ $(i \in I)$ and $\sup_{i \in I} f_i$ proper.

Proof:
a) For $y \in E$ we have
\[ \left(\inf_{i \in I} f_i\right)^*(y) = \sup_{x \in E} \left\{ \langle x, y \rangle - \inf_{i \in I} f_i(x) \right\} = \sup_{i \in I} \sup_{x \in E} \{\langle x, y \rangle - f_i(x)\} = \sup_{i \in I} f_i^*(y). \]
b) Since $f_i = f_i^{**}$ $(i \in I)$, from a) we infer that $(\inf_{i \in I} f_i^*)^* = \sup_{i \in I} f_i$.
Since the latter is lsc and convex (Proposition 3.1.12) and proper (by assumption), its conjugate is proper as well; hence so is the closed convex hull of $\inf_{i\in I} f_i^*$, and we have
\[ \overline{\operatorname{conv}}\Big(\inf_{i\in I} f_i^*\Big) = \Big(\inf_{i\in I} f_i^*\Big)^{**} = \Big(\sup_{i\in I} f_i\Big)^*. \]

We proceed with a duality correspondence for parametric minimization.

Proposition 3.6.29 (Parametric minimization) Let $f : E_1 \times E_2 \to \overline{\mathbb{R}}$. Then the following hold:
a) For $p := \inf_{x\in E_1} f(x,\cdot)$ we have $p^* = f^*(0,\cdot)$.
b) For $f \in \Gamma_0$, $\bar u \in E_2$ such that $\varphi := f(\cdot,\bar u)$ is proper and $q := \inf_{y\in E_2}\{f^*(\cdot,y) - \langle y, \bar u\rangle\}$, we have $\varphi^* = \operatorname{cl} q$.

Proof:
a) For $u \in E_2$ we compute
\[ p^*(u) = \sup_{y}\Big\{\langle y, u\rangle - \inf_{x} f(x,y)\Big\} = \sup_{x,y}\{\langle (x,y),(0,u)\rangle - f(x,y)\} = f^*(0,u). \]
b) For $z \in E_1$ we have
\[ \begin{aligned} q^*(z) &= \sup_{v}\Big\{\langle v, z\rangle - \inf_{y}\{f^*(v,y) - \langle y,\bar u\rangle\}\Big\} \\ &= \sup_{v,y}\{\langle (v,y),(z,\bar u)\rangle - f^*(v,y)\} \\ &= f^{**}(z,\bar u) \\ &= f(z,\bar u) \\ &= \varphi(z). \end{aligned} \]
Here, the fourth equality is due to the fact that $f \in \Gamma_0$. Noticing that $q$ is convex by Theorem 3.2.12, we obtain $\operatorname{cl} q = \overline{\operatorname{conv}}\, q = q^{**} = \varphi^*$.

A sufficient condition such that the closure in Proposition 3.6.29 b) can be omitted is that $\bar u$ lies in the interior of $U := \{u \in E_2 \mid \exists x \in E_1 : f(x,u) < +\infty\}$, see, e.g., [7, Theorem 11.23 (c)].

We close this brief section with a result on epi-composition, cf. Proposition 3.1.15.

Proposition 3.6.30 Let $f : E \to \overline{\mathbb{R}}$ be proper and $L \in \mathcal{L}(E, E')$ and $T \in \mathcal{L}(E', E)$. Then the following hold:
a) $(Lf)^* = f^* \circ L^*$.
b) $(f \circ T)^* = \operatorname{cl}(T^* f^*)$ if $f \in \Gamma$.

Proof:
a) For $y \in E'$ we have
\[ \begin{aligned} (Lf)^*(y) &= \sup_{z\in E'}\Big\{\langle z, y\rangle - \inf_{x:\,L(x)=z} f(x)\Big\} \\ &= \sup_{z\in E',\ x\in L^{-1}(\{z\})}\{\langle z, y\rangle - f(x)\} \\ &= \sup_{x\in E}\{\langle x, L^*(y)\rangle - f(x)\} \\ &= f^*(L^*(y)). \end{aligned} \]
b) Follows from a) and the Fenchel-Moreau Theorem.

3.7 Fenchel-Rockafellar duality

In this section, we associate a very general (convex) minimization problem (the primal program) with a concave maximization problem (the dual problem) that is built on conjugates of the functions occurring in the original problem.
Here the following notation is useful: For a function $h : E \to \overline{\mathbb{R}}$ we define the function
\[ h^\vee : x \in E \mapsto h(-x) \in \overline{\mathbb{R}}. \]
We start our study with a basic duality result that goes back to Werner Fenchel, the founding father of convex analysis.

Theorem 3.7.1 (Fenchel Duality Theorem) Let $f, g \in \Gamma_0$ such that $\operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset$. Then
\[ \inf\,(f+g) = \max\, -(f^* + g^{*\vee}). \]

Proof: It is easily seen that $\inf\,(f+g) = -(f+g)^*(0)$. Using Theorem 3.6.26 (Attouch-Brézis), we infer that
\[ \inf\,(f+g) = -(f^*\#g^*)(0) = -\min\,(f^* + g^{*\vee}), \]
which proves the statement.

An interesting special case of the foregoing theorem is the following.

Corollary 3.7.2 Let $f \in \Gamma_0$ and $K \subset E$ be a closed, convex cone such that $\operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri} K \neq \emptyset$. Then
\[ \inf_{K} f = \max_{-K^\circ} -f^*. \]

Proof: Define $g = \delta_K$. Then by Exercise 3.20 we have $g^* = \delta_{K^\circ}$ and hence by Theorem 3.7.1 we have
\[ \inf_{K} f = \inf\,(f+g) = \max\, -(f^* + g^{*\vee}) = \max_{-K^\circ} -f^*. \]

We now want to add a little more structure to the problem by introducing a linear operator.

Definition 3.7.3 (Fenchel-Rockafellar duality) Let $f : E_1 \to \mathbb{R}\cup\{+\infty\}$ and $g : E_2 \to \mathbb{R}\cup\{+\infty\}$ be proper and $L \in \mathcal{L}(E_1, E_2)$. We call
\[ \inf_{E_1}\,(f + g\circ L) \]
the primal problem and
\[ \sup_{E_2}\, -(f^*\circ L^* + g^{*\vee}) \tag{3.32} \]
the dual problem. We call
\[ \Delta(f,g,L) = \inf_{E_1}\,(f + g\circ L) - \sup_{E_2}\, -(f^*\circ L^* + g^{*\vee}) \]
the duality gap between primal and dual problem.

It is very easy to see that the duality gap in the sense of Definition 3.7.3 is always nonnegative, i.e. the dual optimal value is always a lower bound for the primal optimal value and, conversely, the primal optimal value is an upper bound for the dual one.

Proposition 3.7.4 (Weak duality) Let $f : E_1 \to \mathbb{R}\cup\{+\infty\}$ and $g : E_2 \to \mathbb{R}\cup\{+\infty\}$ be proper and $L \in \mathcal{L}(E_1, E_2)$. Then we have
\[ \inf_{E_1}\,(f + g\circ L) \geq \sup_{E_2}\, -(f^*\circ L^* + g^{*\vee}), \]
i.e. $\Delta(f,g,L) \geq 0$.

Proof: Let $x \in E_1$ and $y \in E_2$. Then by the Fenchel-Young inequality we have
\[ \begin{aligned} f(x) + g(L(x)) &\geq -f^*(L^*(y)) + \langle x, L^*(y)\rangle - g^*(-y) + \langle -y, L(x)\rangle \\ &= -(f^*\circ L^* + g^{*\vee})(y). \end{aligned} \]
This already proves the statement.
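The weak duality inequality can be checked numerically on a toy instance. The following is only a sketch under assumptions that are not part of the text: we take $E_1 = E_2 = \mathbb{R}$, $L(x) = 2x$, $f(x) = \frac12 x^2$ and $g(z) = |z - 1|$, so that $f^*(u) = \frac12 u^2$ and $g^*(v) = v + \delta_{[-1,1]}(v)$. The names `primal_objective`, `dual_objective` and the grid are hypothetical, and a grid comparison illustrates the inequality but does not prove it.

```python
# Illustrative toy instance (an assumption, not from the text):
# E1 = E2 = R, L(x) = A * x, f(x) = x^2 / 2, g(z) = |z - 1|.
A = 2.0


def primal_objective(x):
    """Value of f(x) + g(L(x))."""
    return 0.5 * x * x + abs(A * x - 1.0)


def dual_objective(y):
    """Value of -(f*(L*(y)) + g*(-y)), using f*(u) = u^2 / 2 and
    g*(v) = v + indicator of [-1, 1]."""
    if abs(y) > 1.0:
        return float("-inf")  # g*(-y) = +infinity outside [-1, 1]
    return y - 0.5 * (A * y) ** 2


grid = [i / 100.0 for i in range(-300, 301)]
primal_values = [primal_objective(x) for x in grid]
dual_values = [dual_objective(y) for y in grid]

# Weak duality: the smallest sampled primal value dominates
# the largest sampled dual value.
weak_duality_holds = min(primal_values) >= max(dual_values)
print(weak_duality_holds, min(primal_values), max(dual_values))
```

On this particular instance the two sampled optimal values coincide, so the duality gap is zero here; in general only the inequality is guaranteed by Proposition 3.7.4.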
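The weak duality theorem tells us that the duality gap $\Delta(f,g,L)$ is always nonnegative. We now want to investigate under which assumptions it is, in fact, zero. The following result is known as the Fenchel-Rockafellar duality theorem. In its proof, for $f : E_1 \to \overline{\mathbb{R}}$ and $g : E_2 \to \overline{\mathbb{R}}$, we use the convenient notation
\[ f \oplus g : (x,y) \in E_1 \times E_2 \mapsto f(x) + g(y) \]
and call $f \oplus g$ the separable sum of $f$ and $g$.

Theorem 3.7.5 (Strong duality) Let $f \in \Gamma_0(E_1)$, $g \in \Gamma_0(E_2)$ and $L \in \mathcal{L}(E_1, E_2)$ such that
\[ 0 \in \operatorname{ri}(\operatorname{dom} g - L(\operatorname{dom} f)). \]
Then $\Delta(f,g,L) = 0$.

Proof: Define $C := \operatorname{dom} f \times \operatorname{dom} g - \operatorname{gph} L \subset E_1 \times E_2$ and $D := \operatorname{dom} g - L(\operatorname{dom} f) \subset E_2$. Using the calculus rules for affine hulls, see Corollary 1.4.17 and Exercise 1.15, and the fact that $\operatorname{gph} L$ is a subspace, we see that
\[ \operatorname{aff} C = \operatorname{aff}(\operatorname{dom} f) \times \operatorname{aff}(\operatorname{dom} g) - \operatorname{gph} L \quad\text{and}\quad \operatorname{aff} D = \operatorname{aff}(\operatorname{dom} g) - L(\operatorname{aff}(\operatorname{dom} f)). \]
Now, let $(x,y) \in \operatorname{aff} C$, i.e. there exist $r \in \operatorname{aff}(\operatorname{dom} f)$, $s \in \operatorname{aff}(\operatorname{dom} g)$ and $u \in E_1$ such that $(x,y) = (r,s) - (u, L(u))$. Hence,
\[ y - L(x) = s - L(u) - L(r-u) = s - L(r) \in \operatorname{aff} D. \]
Since by assumption $0 \in \operatorname{ri} D$, there exists $t > 0$ such that $t(y - L(x)) \in D$. Hence, there exist $a \in \operatorname{dom} f$ and $b \in \operatorname{dom} g$ such that $y - L(x) = \frac{1}{t}(b - L(a))$. Putting $z := a - tx$, we have $x = \frac{a - z}{t}$ and $y = \frac{b - L(z)}{t}$, thus
\[ (x,y) = \frac{1}{t}\big[(a,b) - (z, L(z))\big] \in \mathbb{R}_+ C. \]
Since $(x,y) \in \operatorname{aff} C$ was chosen arbitrarily, we thus have proven $\mathbb{R}_+ C = \operatorname{aff} C$. Since by assumption $0 \in D$, we have $0 \in C$, hence $\operatorname{aff} C = \operatorname{span} C$, i.e. $\mathbb{R}_+ C = \operatorname{span} C$. By Exercise 2.5, we have $0 \in \operatorname{ri} C$.

Defining $\varphi := f \oplus g$ and $V := \operatorname{gph} L$, we hence have
\[ \operatorname{ri}(\operatorname{dom}\varphi) \cap V = \big(\operatorname{ri}(\operatorname{dom} f) \times \operatorname{ri}(\operatorname{dom} g)\big) \cap \operatorname{gph} L \neq \emptyset. \]
Hence, using Corollary 3.7.2, we obtain
\[ \inf_{V} \varphi = \max_{V^\perp} -\varphi^*. \]
By Proposition 3.6.24 we have $\varphi^* = f^* \oplus g^*$. Moreover, we easily compute that
\[ V^\perp = \{(u,v) \in E_1 \times E_2 \mid u = -L^*(v)\}. \]
Therefore, we obtain
\[ \begin{aligned} \inf_{E_1}\,(f + g\circ L) &= \inf_{V}\varphi \\ &= \max_{V^\perp} -\varphi^* \\ &= \max_{(u,v):\ u = -L^*(v)} \{-f^*(u) - g^*(v)\} \\ &= \max_{w\in E_2}\{-f^*(L^*(w)) - g^*(-w)\} \\ &= \max_{E_2}\, -(f^*\circ L^* + g^{*\vee}). \end{aligned} \]
This proves the assertion.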
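A common practical use of strong duality is as an optimality certificate: by weak duality, a primal point and a dual point with equal objective values must both be optimal. The sketch below illustrates this on a hypothetical instance that is not taken from the text: $E_1 = \mathbb{R}^2$, $E_2 = \mathbb{R}$, $L(x) = \langle a, x\rangle$ with $a = (3,4)$, $f = \frac12\|\cdot\|^2$ and $g = \delta_{[1,\infty)}$. Here $\operatorname{dom} g - L(\operatorname{dom} f) = \mathbb{R}$, so the qualification condition of Theorem 3.7.5 holds. The candidates `x_bar` and `y_bar` come from a hand computation and are assumptions of this example.

```python
import math

# Hypothetical toy data (not from the text): E1 = R^2, E2 = R, L(x) = <a, x>.
a = (3.0, 4.0)
norm_sq = a[0] ** 2 + a[1] ** 2


def primal_objective(x):
    """f(x) + g(L(x)) with f = 0.5 * ||.||^2 and g the indicator of [1, +inf)."""
    if a[0] * x[0] + a[1] * x[1] < 1.0 - 1e-12:  # small tolerance for rounding
        return math.inf
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def dual_objective(y):
    """-(f*(L*(y)) + g*(-y)) with f* = 0.5 * ||.||^2, L*(y) = y * a and
    g*(v) = v for v <= 0, +inf otherwise."""
    if y < 0.0:
        return -math.inf
    return y - 0.5 * norm_sq * y * y


# Candidate solutions obtained by hand for this instance.
x_bar = (a[0] / norm_sq, a[1] / norm_sq)
y_bar = 1.0 / norm_sq

gap = primal_objective(x_bar) - dual_objective(y_bar)
print(primal_objective(x_bar), dual_objective(y_bar), gap)
```

A gap of (numerically) zero between these two values certifies that `x_bar` solves the primal and `y_bar` solves the dual problem of this instance.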
Corollary 3.7.6 Let $f \in \Gamma_0(E_1)$, $g \in \Gamma_0(E_2)$ and $L \in \mathcal{L}(E_1, E_2)$ such that
\[ 0 \in \operatorname{ri}(\operatorname{dom} g - L(\operatorname{dom} f)). \]
Then
\[ (f + g\circ L)^*(u) = \min_{v\in E_2}\{f^*(u - L^*(v)) + g^*(v)\}. \]

Proof: For $u \in E_1$ we have, applying Theorem 3.7.5 to $f - \langle\cdot,u\rangle$ and $g$,
\[ \begin{aligned} (f + g\circ L)^*(u) &= \sup_{x\in E_1}\{\langle x, u\rangle - f(x) - g(L(x))\} \\ &= -\inf_{x\in E_1}\{f(x) - \langle x, u\rangle + g(L(x))\} \\ &= \min_{v\in E_2}\{f^*(L^*(v) + u) + g^*(-v)\}, \end{aligned} \]
which, after substituting $-v$ for $v$, is the claim.

As one of the many applications of Fenchel-Rockafellar duality we would like to study linear programs in this regard.

Example 3.7.7 (Linear Programming duality) Let $A \in \mathbb{R}^{m\times n}$, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. The standard linear program reads
\[ \inf\ c^T x \quad \text{s.t.} \quad Ax \geq b. \tag{3.33} \]
Using the functions
\[ f : x \in \mathbb{R}^n \mapsto c^T x \quad\text{and}\quad g : y \in \mathbb{R}^m \mapsto \delta_{\mathbb{R}^m_+}(y - b) \]
we can write (3.33) as
\[ \inf_{x\in\mathbb{R}^n}\{f(x) + g(Ax)\}. \]
Its dual program, in the sense of Definition 3.7.3, reads
\[ \begin{aligned} & \sup_{y\in\mathbb{R}^m}\ -f^*(A^T y) - g^*(-y) \\ \Longleftrightarrow\quad & \sup_{y\in\mathbb{R}^m}\ -\delta_{\{c\}}(A^T y) - \delta_{\mathbb{R}^m_-}(-y) - b^T(-y) \\ \Longleftrightarrow\quad & \sup_{y\geq 0,\ A^T y = c}\ b^T y. \end{aligned} \]

3.8 The convex subdifferential

In this section we would like to present a generalized notion of differentiability for (usually nondifferentiable) convex functions. The idea of so-called subdifferentiability is based on affine minorization properties of convex functions and is deeply connected to conjugation.

3.8.1 Definition and basic properties

For $f \in \Gamma$, Theorem 3.3.2 tells us that at each point $\bar x \in \operatorname{ri}(\operatorname{dom} f)$ there exists $g \in E$ such that
\[ f(x) \geq f(\bar x) + \langle g, x - \bar x\rangle \quad (x \in E). \tag{3.34} \]
We take this as a motivation for the following central concept.

Definition 3.8.1 (Subdifferential of a convex function) Let $f : E \to \overline{\mathbb{R}}$ be convex and $\bar x \in E$. Then $g \in E$ is called a subgradient of $f$ at $\bar x$ if the subgradient inequality (3.34) holds at $\bar x$. The set
\[ \partial f(\bar x) := \{v \in E \mid f(x) \geq f(\bar x) + \langle v, x - \bar x\rangle \ (x \in E)\} \]
of all subgradients is called the subdifferential of $f$ at $\bar x$. We denote the domain of the set-valued mapping $\partial f : E \rightrightarrows E$ by
\[ \operatorname{dom}\partial f := \{x \in E \mid \partial f(x) \neq \emptyset\}. \]
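The subgradient inequality (3.34) is easy to probe numerically in one dimension. The following sketch uses a hypothetical helper `is_subgradient` (not from the text) that tests the inequality on finitely many sample points for $f = |\cdot|$. Passing such a sampled test is only a necessary condition for being a subgradient, whereas a single violated sample point proves that the candidate is not one.

```python
def is_subgradient(f, x_bar, v, test_points, tol=1e-12):
    """Check f(x) >= f(x_bar) + v * (x - x_bar) on finitely many test points.

    Passing the check is only necessary for v to be a subgradient of f at
    x_bar; failing it at one point shows that v is not a subgradient.
    """
    return all(f(x) >= f(x_bar) + v * (x - x_bar) - tol for x in test_points)


test_points = [i / 10.0 for i in range(-50, 51)]

# Candidates at the kink x_bar = 0 of the absolute value ...
at_kink = [v for v in (-1.5, -1.0, 0.0, 0.5, 1.0, 1.5)
           if is_subgradient(abs, 0.0, v, test_points)]
# ... and at the point x_bar = 2, where the absolute value is differentiable.
at_two = [v for v in (0.5, 1.0, 1.5)
          if is_subgradient(abs, 2.0, v, test_points)]
print(at_kink, at_two)
```

The surviving candidates are consistent with $\partial|\cdot|(0) = [-1,1]$ and $\partial|\cdot|(2) = \{1\}$.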
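Notice that, clearly, in the subgradient inequality (3.34), we can restrict ourselves to points $x \in \operatorname{dom} f$, since the inequality holds trivially outside of $\operatorname{dom} f$.

We start our study of the subdifferential with some elementary properties.

Proposition 3.8.2 (Elementary properties of the subdifferential) Let $f : E \to \overline{\mathbb{R}}$ be convex and $\bar x \in \operatorname{dom} f$. Then the following hold:
a) $\partial f(\bar x)$ is closed and convex.
b) If $f$ is proper then $\partial f(x) = \emptyset$ for $x \notin \operatorname{dom} f$.
c) If $f$ is proper and $\bar x \in \operatorname{ri}(\operatorname{dom} f)$ then $\partial f(\bar x)$ is nonempty.
d) We have $0 \in \partial f(\bar x)$ if and only if $\bar x$ minimizes $f$ (over $E$). (Generalized Fermat's rule)
e) $\partial f(\bar x) = \{v \in E \mid (v,-1) \in N_{\operatorname{epi} f}(\bar x, f(\bar x))\}$.

Proof:
a) We have
\[ \partial f(\bar x) = \bigcap_{x\in E}\{v \mid \langle x - \bar x, v\rangle \leq f(x) - f(\bar x)\}, \]
and intersection preserves closedness and convexity.
b) Obvious.
c) Follows immediately from Theorem 3.3.2.
d) By definition we have $0 \in \partial f(\bar x) \iff f(x) \geq f(\bar x)$ $(x \in E)$.
e) Notice that
\[ \begin{aligned} v \in \partial f(\bar x) &\iff f(x) \geq f(\bar x) + \langle v, x - \bar x\rangle \quad (x \in \operatorname{dom} f) \\ &\iff \alpha \geq f(\bar x) + \langle v, x - \bar x\rangle \quad ((x,\alpha) \in \operatorname{epi} f) \\ &\iff 0 \geq \langle (v,-1), (x - \bar x, \alpha - f(\bar x))\rangle \quad ((x,\alpha) \in \operatorname{epi} f) \\ &\iff (v,-1) \in N_{\operatorname{epi} f}(\bar x, f(\bar x)). \end{aligned} \]

Parts b) and c) of the above proposition imply that
\[ \operatorname{ri}(\operatorname{dom} f) \subset \operatorname{dom}\partial f \subset \operatorname{dom} f \quad (f \in \Gamma). \]
The subdifferential of a convex function might well be empty, contain only a single point, be bounded (hence compact) or unbounded, as the following examples illustrate.

Example 3.8.3
a) (Indicator function) Let $C \subset E$ be convex and $\bar x \in C$. Then
\[ \begin{aligned} g \in \partial\delta_C(\bar x) &\iff \delta_C(x) \geq \delta_C(\bar x) + \langle g, x - \bar x\rangle \quad (x \in E) \\ &\iff 0 \geq \langle g, x - \bar x\rangle \quad (x \in C), \end{aligned} \]
i.e. $\partial\delta_C(\bar x) = N_C(\bar x)$.
b) (Euclidean norm) We have
\[ \partial\|\cdot\|(\bar x) = \begin{cases} \Big\{\dfrac{\bar x}{\|\bar x\|}\Big\} & \text{if } \bar x \neq 0, \\ B & \text{if } \bar x = 0, \end{cases} \]
as can be verified by elementary considerations.
c) (Empty subdifferential) Consider
\[ f : x \mapsto \begin{cases} -(1 - |x|^2)^{1/2} & \text{if } |x| \leq 1, \\ +\infty & \text{else.} \end{cases} \]
Then $\partial f(x) = \emptyset$ for $|x| \geq 1$.

There is a tight connection between subdifferentiation and conjugation of convex functions.

Theorem 3.8.4 (Subdifferential and conjugate function) Let $f \in \Gamma_0$.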
Then the following are equivalent:
i) $y \in \partial f(x)$;
ii) $x \in \operatorname{argmax}_z\{\langle z, y\rangle - f(z)\}$;
iii) $f(x) + f^*(y) = \langle x, y\rangle$;
iv) $x \in \partial f^*(y)$;
v) $y \in \operatorname{argmax}_w\{\langle x, w\rangle - f^*(w)\}$.

Proof: Notice that
\[ \begin{aligned} y \in \partial f(x) &\iff f(z) \geq f(x) + \langle y, z - x\rangle \quad (z \in E) \\ &\iff \langle y, x\rangle - f(x) \geq \sup_{z}\{\langle y, z\rangle - f(z)\} \\ &\iff f(x) + f^*(y) \leq \langle x, y\rangle \\ &\iff f(x) + f^*(y) = \langle x, y\rangle, \end{aligned} \]
where the last equivalence makes use of the Fenchel-Young inequality (3.25). This establishes the equivalences between i), ii) and iii). Applying the same reasoning to $f^*$ and noticing that $f^{**} = f$ gives the missing equivalences.

One consequence of Theorem 3.8.4 is that the set-valued mappings $\partial f$ and $\partial f^*$ are inverse to each other. We note some other interesting implications of the latter theorem.

Corollary 3.8.5 Let $C \subset E$. Then the following hold:
a) For $x \in \operatorname{dom}\sigma_C$, we have $\partial\sigma_C(x) = \operatorname{argmax}_C \langle\cdot, x\rangle$.
b) If $C$ is a closed, convex cone the following are equivalent:
i) $y \in \partial\delta_C(x)$;
ii) $x \in \partial\delta_{C^\circ}(y)$;
iii) $x \in C$, $y \in C^\circ$ and $\langle x, y\rangle = 0$.

As another consequence we obtain the very desirable property that the subdifferential operator of a closed, proper and convex function $f$ has a closed graph
\[ \operatorname{gph}\partial f := \{(x,y) \in E \times E \mid y \in \partial f(x)\}, \]
which is also referred to as outer semicontinuity of $\partial f$.

Corollary 3.8.6 (Outer semicontinuity of $\partial f$) Let $f \in \Gamma_0$ and suppose $\{x_k\} \to x$ and $\{y_k \in \partial f(x_k)\} \to y$. Then $y \in \partial f(x)$, i.e. $\operatorname{gph}\partial f \subset E \times E$ is closed.

Proof: By Theorem 3.8.4 we have
\[ f(x_k) + f^*(y_k) = \langle x_k, y_k\rangle \quad (k \in \mathbb{N}). \]
Using that $f$ and $f^*$ are lsc we obtain
\[ f(x) + f^*(y) \leq \langle x, y\rangle, \]
which together with the Fenchel-Young inequality gives
\[ f(x) + f^*(y) = \langle x, y\rangle. \]
But then, again, Theorem 3.8.4 implies that $y \in \partial f(x)$.

We close out the section with some useful boundedness properties of the subdifferential operator.

Theorem 3.8.7 (Boundedness properties of $\partial f$) Let $f \in \Gamma_0$ and $X \subset \operatorname{int}(\operatorname{dom} f)$ nonempty, open and convex.
Then the following hold:
a) $f$ is Lipschitz continuous with modulus $L > 0$ on $X$ if and only if $\|v\| \leq L$ for all $x \in X$ and $v \in \partial f(x)$.
b) $\partial f$ maps bounded sets which are compactly contained in $\operatorname{int}(\operatorname{dom} f)$ to bounded sets.

Proof:
a) First, assume that $f$ is Lipschitz on $X$ with modulus $L > 0$. Now take $x \in X$ and $v \in \partial f(x)$, where we may assume $v \neq 0$ since otherwise there is nothing to show. By the subgradient inequality we have
\[ f(y) \geq f(x) + \langle v, y - x\rangle \quad (y \in E). \tag{3.35} \]
Since $X$ is open, there exists $r > 0$ such that $B_r(x) \subset X$. Inserting the vector
\[ y = x + \frac{r}{\|v\|}v \in B_r(x) \]
in (3.35) yields
\[ f\Big(x + \frac{r}{\|v\|}v\Big) \geq f(x) + r\|v\|. \]
Rearranging these terms gives
\[ \|v\| \leq \frac{1}{r}\Big[f\Big(x + \frac{r}{\|v\|}v\Big) - f(x)\Big] \leq L, \]
which shows the first implication in a).
Conversely, assume that $\|v\| \leq L$ for all $x \in X$ and $v \in \partial f(x)$. For $x, y \in X$ and $v \in \partial f(x)$ we hence have
\[ f(x) - f(y) \leq \langle v, x - y\rangle \leq \|v\|\cdot\|x - y\| \leq L\|x - y\|, \]
where we use the subgradient inequality and Cauchy-Schwarz. Interchanging the roles of $x$ and $y$ yields
\[ f(y) - f(x) \leq L\|x - y\|, \]
which all in all gives
\[ |f(x) - f(y)| \leq L\|x - y\|. \]
Therefore, $f$ is Lipschitz continuous on $X$ with modulus $L > 0$.
b) Let $K$ be compactly contained in $\operatorname{int}(\operatorname{dom} f)$. Hence, we can assume w.l.o.g. that $K$ is compact. Now, suppose there were sequences $\{x_k \in K\}$ and $\{v_k \in \partial f(x_k)\}$ such that $\|v_k\| \to \infty$. Since $K$ is compact, we can assume w.l.o.g. that $x_k \to x \in K$. Now take $r > 0$ such that $B_r(x) \subset \operatorname{int}(\operatorname{dom} f)$. By Theorem 3.5.6, $f$ is Lipschitz on $B_r(x)$ with modulus, say, $L > 0$. In view of part a) we infer that $\|v\| \leq L$ for all $v \in \partial f(y)$ and $y \in B_r(x)$. Since $x_k \in B_r(x)$ for all $k$ sufficiently large, we thus have $\|v_k\| \leq L$ for these $k$, which contradicts the assumption that $\{v_k\}$ is unbounded.

3.8.2 Connection to the directional derivative

The subdifferential of a convex function is intimately tied to its directional derivative, which we define now.

Definition 3.8.8 (Directional derivative) Let $f : E \to \overline{\mathbb{R}}$ be proper.
For $x \in \operatorname{dom} f$ we say that $f$ is directionally differentiable at $x$ in the direction $d \in E$ if
\[ \lim_{t\downarrow 0}\frac{f(x+td) - f(x)}{t} \]
exists (in an extended real-valued sense). In this case we call
\[ f'(x;d) := \lim_{t\downarrow 0}\frac{f(x+td) - f(x)}{t} \]
the directional derivative of $f$ at $x$ in the direction of $d$.

Proposition 3.8.9 (Directional derivative of a convex function) Let $f \in \Gamma$, $x \in \operatorname{dom} f$ and $d \in E$. Then the following hold:
a) The difference quotient
\[ t > 0 \mapsto q(t) := \frac{f(x+td) - f(x)}{t} \]
is nondecreasing.
b) $f'(x;d)$ exists (in $\overline{\mathbb{R}}$) with
\[ f'(x;d) = \inf_{t>0} q(t). \]
c) $f'(x;\cdot)$ is sublinear with $\operatorname{dom} f'(x;\cdot) = \mathbb{R}_+(\operatorname{dom} f - x)$.
d) $f'(x;\cdot)$ is proper and lsc for $x \in \operatorname{ri}(\operatorname{dom} f)$.

Proof:
a) Fix $0 < s < t$ and put $\lambda := \frac{s}{t} \in (0,1)$ and $z := x + td$. If $f(z) = +\infty$, then $q(s) \leq q(t) = +\infty$. Otherwise, by convexity of $f$, we have
\[ f(x + sd) = f(\lambda z + (1-\lambda)x) \leq \lambda f(z) + (1-\lambda) f(x) = f(x) + \lambda(f(z) - f(x)), \]
hence $q(s) \leq q(t)$ also in this case.
b) The infimum representation follows from a) since $q(t)$ decreases as $t \downarrow 0$. This also gives the existence statement, since an infimum always exists in the extended real-valued sense.
c) First notice that $0 \in \operatorname{dom} f'(x;\cdot)$ as $f'(x;0) = 0$, and that $f'(x;\alpha d) = \alpha f'(x;d)$ for all $\alpha > 0$ and $d \in E$, i.e. $f'(x;\cdot)$ is positively homogeneous. We now show that $f'(x;\cdot)$ is also convex, which then proves sublinearity: To this end, let $(d,\alpha), (h,\beta) \in \operatorname{epi}_< f'(x;\cdot)$. Then
\[ \frac{f(x+td) - f(x)}{t} < \alpha \quad\text{and}\quad \frac{f(x+th) - f(x)}{t} < \beta \]
for all $t > 0$ sufficiently small. For such $t > 0$, by convexity of $f$, we compute
\[ \begin{aligned} f(x + t(\lambda d + (1-\lambda)h)) - f(x) &= f(\lambda(x+td) + (1-\lambda)(x+th)) - f(x) \\ &\leq \lambda(f(x+td) - f(x)) + (1-\lambda)(f(x+th) - f(x)) \end{aligned} \]
for all $\lambda \in (0,1)$. This implies
\[ \frac{f(x + t(\lambda d + (1-\lambda)h)) - f(x)}{t} \leq \lambda\,\frac{f(x+td) - f(x)}{t} + (1-\lambda)\,\frac{f(x+th) - f(x)}{t} \]
for all $t > 0$ sufficiently small and $\lambda \in (0,1)$.
Letting $t \downarrow 0$ gives
\[ f'(x;\lambda d + (1-\lambda)h) \leq \lambda f'(x;d) + (1-\lambda) f'(x;h) < \lambda\alpha + (1-\lambda)\beta \quad (\lambda \in (0,1)), \]
which shows convexity of $\operatorname{epi}_< f'(x;\cdot)$ and thus of $f'(x;\cdot)$. Hence, as $f'(x;\cdot)$ was proven to be positively homogeneous as well, it is sublinear, cf. Proposition 3.6.11. The fact that $\operatorname{dom} f'(x;\cdot) = \mathbb{R}_+(\operatorname{dom} f - x)$ follows from b):
\[ \begin{aligned} d \in \operatorname{dom} f'(x;\cdot) &\iff \exists t > 0 : \frac{f(x+td) - f(x)}{t} < +\infty \\ &\iff \exists t > 0 : f(x+td) - f(x) < +\infty \\ &\iff \exists t > 0 : x + td \in \operatorname{dom} f \\ &\iff d \in \mathbb{R}_+(\operatorname{dom} f - x). \end{aligned} \]
d) From c) we know that $f'(x;\cdot)$ is, in particular, convex with $\operatorname{dom} f'(x;\cdot) = \mathbb{R}_+(\operatorname{dom} f - x)$, which is a subspace by Exercise 2.5. Since $f'(x;0) = 0$, by Exercise 3.4 we now see that $f'(x;\cdot)$ must be proper. Moreover, by Proposition 3.5.2 it follows that it agrees with its closure everywhere since its domain has no relative boundary. Thus, $f'(x;\cdot)$ is lsc.

We now establish the connection between the subdifferential and the directional derivative of a proper convex function. The first result in this regard characterizes subgradients using directional derivatives and shows that the latter is even proper on the domain of the subdifferential operator.

Proposition 3.8.10 Let $f \in \Gamma$ and $x \in \operatorname{dom}\partial f$. Then we have:
a) The following are equivalent.
i) $v \in \partial f(x)$;
ii) $f'(x;d) \geq \langle v, d\rangle$ $(d \in E)$.
b) $f'(x;\cdot)$ is proper and sublinear.

Proof:
a) We realize that the subgradient inequality for $v \in E$ is equivalent to
\[ \frac{f(x + \lambda d) - f(x)}{\lambda} \geq \langle d, v\rangle \quad (\lambda > 0,\ d \in E). \tag{3.36} \]
As the left-hand side decreases to $f'(x;d)$ as $\lambda \downarrow 0$, this is equivalent to ii). This shows the equivalence of i) and ii).
b) Take $v \in \partial f(x)$. Then a) yields $f'(x;\cdot) \geq \langle\cdot, v\rangle$ and therefore, $f'(x;\cdot)$ does not take the value $-\infty$. Hence, in view of Proposition 3.8.9 c) and the fact that $f'(x;0) = 0$, $f'(x;\cdot)$ is proper and sublinear.

We continue with our main result of this section.

Theorem 3.8.11 (Directional derivative and subdifferential) Let $f \in \Gamma$ and $x \in \operatorname{dom}\partial f$.
Then
\[ \operatorname{cl}(f'(x;\cdot)) = \sigma_{\partial f(x)}, \]
i.e. the lower semicontinuous hull of $f'(x;\cdot)$ is the support function of $\partial f(x)$.

Proof: Due to Proposition 3.8.10 b) we know that $f'(x;\cdot)$ is proper and sublinear. By Corollary 3.6.16, we thus see that
\[ \operatorname{cl}(f'(x;\cdot)) = \sigma_C \]
for $C = \{v \mid \langle v, d\rangle \leq f'(x;d)\ (d \in E)\}$. But in view of Proposition 3.8.10 a) we have $C = \partial f(x)$, which concludes the proof.

Theorem 3.8.11 has a list of very important consequences.

Corollary 3.8.12 Let $f \in \Gamma$ and $x \in \operatorname{ri}(\operatorname{dom} f)$. Then
\[ f'(x;\cdot) = \sigma_{\partial f(x)}. \]

Proof: Follows from Theorem 3.8.11 and Proposition 3.8.9 d).

Corollary 3.8.13 Let $f \in \Gamma$ and $x \in \operatorname{dom} f$. Then $\partial f(x)$ is nonempty and bounded if and only if $x \in \operatorname{int}(\operatorname{dom} f)$.

Proof: If $x \in \operatorname{int}(\operatorname{dom} f)$, we know from Corollary 3.8.12 that $f'(x;\cdot) = \sigma_{\partial f(x)} > -\infty$. From Proposition 3.8.9 we know that $\operatorname{dom} f'(x;\cdot) = \mathbb{R}_+(\operatorname{dom} f - x)$. Since $x \in \operatorname{int}(\operatorname{dom} f)$, we have $\mathbb{R}_+(\operatorname{dom} f - x) = E$, hence $f'(x;\cdot)$ is finite, i.e. $\sigma_{\partial f(x)}$ is finite; therefore (see Exercise 3.17), $\partial f(x)$ is bounded (and nonempty).
In turn, if $\partial f(x)$ is bounded and nonempty then, by Theorem 3.8.11 and Exercise 3.17, $\operatorname{cl}(f'(x;\cdot)) = \sigma_{\partial f(x)}$ is finite. Hence, $f'(x;\cdot)$ must be finite, thus $\mathbb{R}_+(\operatorname{dom} f - x) = \operatorname{dom} f'(x;\cdot) = E$. Using Exercise 2.5, this implies that $0 \in \operatorname{int}(\operatorname{dom} f - x)$, i.e. $x \in \operatorname{int}(\operatorname{dom} f)$.

Corollary 3.8.14 (Max formula) Let $f \in \Gamma$ and $x \in \operatorname{int}(\operatorname{dom} f)$. Then
\[ f'(x;\cdot) = \max_{v\in\partial f(x)}\langle v, \cdot\rangle. \]

3.8.3 Subgradients of differentiable functions

In this section we want to study the subdifferential of convex functions at points of differentiability. We will ultimately prove that a convex function is differentiable, in fact continuously differentiable, at a point in the interior of its domain if and only if its subdifferential is a singleton, which then consists of the gradient only. Moreover, we will show that a differentiable convex function is in fact continuously differentiable.

Theorem 3.8.15 Let $f \in \Gamma$ and $x \in \operatorname{int}(\operatorname{dom} f)$.
Then $\partial f(x)$ is a singleton if and only if $f$ is differentiable at $x$. In this case we have $\partial f(x) = \{\nabla f(x)\}$.

Proof: If $f$ is differentiable at $x$ then $f'(x;\cdot) = \langle\nabla f(x),\cdot\rangle$. Thus, by Proposition 3.8.10 a) the elements $v \in \partial f(x)$ are characterized through
\[ \langle\nabla f(x), d\rangle \geq \langle v, d\rangle \quad (d \in E), \]
which implies that $v = \nabla f(x)$, i.e. $\partial f(x) = \{\nabla f(x)\}$. This proves the first implication.
Conversely, assume that $\partial f(x) = \{v\}$. We have to show that
\[ \lim_{d\to 0}\frac{f(x+d) - f(x) - \langle v, d\rangle}{\|d\|} = 0. \tag{3.37} \]
To this end, take $\{d_k\} \to 0$ (with $d_k \neq 0$) arbitrarily and put
\[ t_k := \|d_k\| \quad\text{and}\quad p_k := \frac{d_k}{\|d_k\|} = \frac{d_k}{t_k} \quad (k \in \mathbb{N}). \]
Then there exist an infinite subset $K \subset \mathbb{N}$ and $p \in E\setminus\{0\}$ such that $p_k \to_K p$. Then we compute that
\[ \begin{aligned} \frac{f(x+d_k) - f(x) - \langle v, d_k\rangle}{\|d_k\|} &= \frac{f(x + t_k p_k) - f(x) - t_k\langle v, p_k\rangle}{t_k} \\ &= \frac{f(x + t_k p) - f(x)}{t_k} + \frac{f(x + t_k p_k) - f(x + t_k p)}{t_k} - \langle v, p_k\rangle. \end{aligned} \]
As we pass to the limit on $K$, the first summand tends to $f'(x;p)$, cf. Proposition 3.8.9. The second one goes to $0$, since $f$ is Lipschitz around $x \in \operatorname{int}(\operatorname{dom} f)$, cf. Theorem 3.5.6. Thus, we have
\[ \begin{aligned} \lim_{k\in K}\frac{f(x+d_k) - f(x) - \langle v, d_k\rangle}{\|d_k\|} &= f'(x;p) - \langle v, p\rangle \\ &= \max_{w\in\partial f(x)}\langle w, p\rangle - \langle v, p\rangle \\ &= 0, \end{aligned} \]
where the second equality uses Corollary 3.8.14 and the last one exploits the fact that $\partial f(x) = \{v\}$. Since $p$ was an arbitrary accumulation point of the bounded sequence $\{p_k\}$, we have that
\[ \lim_{k\in\mathbb{N}}\frac{f(x+d_k) - f(x) - \langle v, d_k\rangle}{\|d_k\|} = 0. \]
Since $\{d_k\} \to 0$ was chosen arbitrarily, this gives (3.37) and hence concludes the proof.

Theorem 3.8.16 Let $f \in \Gamma$. Then $f$ is continuously differentiable on $\operatorname{int}(\operatorname{dom} f)$ if and only if $\partial f(x)$ is a singleton for all $x \in \operatorname{int}(\operatorname{dom} f)$.

Proof: If $f$ is continuously differentiable on $\operatorname{int}(\operatorname{dom} f)$, Theorem 3.8.15 immediately implies that $\partial f(x) = \{\nabla f(x)\}$ for all $x \in \operatorname{int}(\operatorname{dom} f)$.
Conversely, let $\partial f(x)$ be a singleton for all $x \in \operatorname{int}(\operatorname{dom} f)$. By Theorem 3.8.15, $f$ is differentiable at every point $x \in \operatorname{int}(\operatorname{dom} f)$. Now, fix $x \in \operatorname{int}(\operatorname{dom} f)$ and take $\{x_k \in \operatorname{int}(\operatorname{dom} f)\} \to x$.
Then we have $\nabla f(x_k) \in \partial f(x_k)$ for all $k \in \mathbb{N}$. (In fact, we have $\partial f(x_k) = \{\nabla f(x_k)\}$, but that is unimportant to our reasoning.) Now choose $r > 0$ such that $B_r(x) \subset \operatorname{int}(\operatorname{dom} f)$. Since $x_k \in B_r(x)$ for all $k$ sufficiently large, we also have $\nabla f(x_k) \in \partial f(B_r(x))$, which is bounded due to Theorem 3.8.7 b). Hence, $\{\nabla f(x_k)\}$ has an accumulation point $g \in E$ which, by Corollary 3.8.6, lies in $\partial f(x) = \{\nabla f(x)\}$. Hence, $\nabla f(x_k) \to g = \nabla f(x)$ on the respective subsequence. Since this holds true for any accumulation point, we actually have $\nabla f(x_k) \to \nabla f(x)$ on the whole sequence. As $x_k \to x$ was chosen arbitrarily, this proves the statement.

Corollary 3.8.17 (Differentiability of finite convex functions) Let $f : E \to \mathbb{R}$ be convex. Then $f$ is differentiable if and only if $f$ is continuously differentiable.

3.8.4 Subdifferential calculus

In this section we want to compute the subdifferential for various convex functions that come out of convexity-preserving operations. We start with the subdifferential of the separable sum of convex functions.

Proposition 3.8.18 (Subdifferential of separable sum) Let $f_i \in \Gamma(E_i)$ $(i = 1,2)$. Then
\[ \partial(f_1 \oplus f_2) = \partial f_1 \times \partial f_2. \]

Proof: For $(x_1,x_2) \in E_1 \times E_2$ we have
\[ \begin{aligned} (v_1,v_2) \in \partial f_1(x_1) \times \partial f_2(x_2) &\iff f_i(y_i) \geq f_i(x_i) + \langle v_i, y_i - x_i\rangle \quad (y_i \in E_i,\ i = 1,2) \\ &\iff f_1(y_1) + f_2(y_2) \geq f_1(x_1) + f_2(x_2) + \langle v_1, y_1 - x_1\rangle + \langle v_2, y_2 - x_2\rangle \quad ((y_1,y_2) \in E_1\times E_2) \\ &\iff (f_1\oplus f_2)(y_1,y_2) \geq (f_1\oplus f_2)(x_1,x_2) + \langle (v_1,v_2), (y_1,y_2) - (x_1,x_2)\rangle \quad ((y_1,y_2) \in E_1\times E_2) \\ &\iff (v_1,v_2) \in \partial(f_1\oplus f_2)(x_1,x_2). \end{aligned} \]
Here, the '$\Leftarrow$'-implication in the second equivalence follows from setting $x_j = y_j$ for $j \neq i$, $i = 1,2$.

Note that the above result, by induction, extends to arbitrary finite separable sums of convex functions and, without any more effort, can be proven for a separable sum over much more general than only finite index sets, cf. [1, Proposition 16.8], but we only need the case of two functions in our study.
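The product structure of the subdifferential of a separable sum can be illustrated on a small example. The sketch below makes assumptions that are not part of the text: it takes $E_1 = E_2 = \mathbb{R}$, $f_1 = |\cdot|$ and $f_2 = \frac12(\cdot)^2$, so that $\partial f_1(0) = [-1,1]$ and $\partial f_2(1) = \{1\}$, and it tests the subgradient inequality for $f_1 \oplus f_2$ at $(0,1)$ on a finite grid only. The helper name `satisfies_subgradient_inequality` is hypothetical.

```python
def f1(t):
    return abs(t)


def f2(t):
    return 0.5 * t * t


def separable_sum(y1, y2):
    """(f1 (+) f2)(y1, y2) = f1(y1) + f2(y2)."""
    return f1(y1) + f2(y2)


def satisfies_subgradient_inequality(v, x, points, tol=1e-12):
    """Sampled check of (f1 (+) f2)(y) >= (f1 (+) f2)(x) + <v, y - x>.

    Only finitely many points y are tested, so a positive answer is a
    necessary condition, while a negative answer is conclusive.
    """
    (v1, v2), (x1, x2) = v, x
    base = separable_sum(x1, x2)
    return all(
        separable_sum(y1, y2) >= base + v1 * (y1 - x1) + v2 * (y2 - x2) - tol
        for y1, y2 in points
    )


grid = [i / 10.0 for i in range(-30, 31)]
points = [(y1, y2) for y1 in grid for y2 in grid]
x = (0.0, 1.0)

# (0.3, 1.0) lies in [-1, 1] x {1}; the other two candidates violate
# the first or the second component condition, respectively.
in_product = satisfies_subgradient_inequality((0.3, 1.0), x, points)
bad_first = satisfies_subgradient_inequality((1.2, 1.0), x, points)
bad_second = satisfies_subgradient_inequality((0.3, 0.5), x, points)
print(in_product, bad_first, bad_second)
```

Only the candidate taken from $\partial f_1(0) \times \partial f_2(1)$ passes the sampled test, in line with Proposition 3.8.18.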
We continue with a subdifferential rule for epi-compositions, cf. Proposition 3.1.15 and Proposition 3.6.30.

Proposition 3.8.19 (Subdifferential of epi-composition) Let $f \in \Gamma_0(E)$ and $L \in \mathcal{L}(E, E')$. Then for $y \in \operatorname{dom} Lf$ and $x \in L^{-1}(\{y\})$ the following hold:
a) If $(Lf)(y) = f(x)$ then $\partial(Lf)(y) = (L^*)^{-1}(\partial f(x))$.
b) If $(L^*)^{-1}(\partial f(x)) \neq \emptyset$ then $(Lf)(y) = f(x)$.

Proof: Let $v \in E'$. From Proposition 3.6.30 a) and Theorem 3.8.4 we infer that
\[ \begin{aligned} f(x) + (Lf)^*(v) = \langle y, v\rangle &\iff f(x) + f^*(L^*(v)) = \langle L(x), v\rangle \\ &\iff f(x) + f^*(L^*(v)) = \langle x, L^*(v)\rangle \\ &\iff L^*(v) \in \partial f(x) \\ &\iff v \in (L^*)^{-1}(\partial f(x)). \end{aligned} \tag{3.38} \]
a) Theorem 3.8.4 and Proposition 3.6.30 a) imply that
\[ \begin{aligned} v \in \partial(Lf)(y) &\iff (Lf)(y) + (Lf)^*(v) = \langle y, v\rangle \\ &\iff f(x) + f^*(L^*(v)) = \langle L(x), v\rangle. \end{aligned} \tag{3.39} \]
Combining (3.38) and (3.39) gives a).
b) Suppose $v \in (L^*)^{-1}(\partial f(x))$. Then the Fenchel-Young inequality, the fact that $L(x) = y$ and (3.38) imply that
\[ \langle y, v\rangle \leq (Lf)(y) + (Lf)^*(v) \leq f(x) + (Lf)^*(v) = \langle y, v\rangle, \]
hence $(Lf)(y) = f(x)$.

We exploit Proposition 3.8.19 to obtain an important subdifferential result for infimal convolutions.

Theorem 3.8.20 (Subdifferentiation of infimal convolutions) Let $f, g \in \Gamma_0$ as well as $x \in \operatorname{dom}(f\#g)$ $(= \operatorname{dom} f + \operatorname{dom} g)$. Then the following hold:
a) We have
\[ \partial(f\#g)(x) = \partial f(y) \cap \partial g(x - y) \quad \big(y \in \operatorname{argmin}_{u\in E}\{f(u) + g(x-u)\}\big). \]
b) If $\partial f(y) \cap \partial g(x-y) \neq \emptyset$ for some $y \in E$ then $(f\#g)(x) = f(y) + g(x-y)$, i.e. $y \in \operatorname{argmin}_{u\in E}\{f(u) + g(x-u)\}$.

Proof: Consider the linear mapping $L : (a,b) \in E\times E \mapsto a + b \in E$. Then $L^* : z \in E \mapsto (z,z) \in E\times E$. By definition of the respective operations we have $f\#g = L(f\oplus g)$, in particular, $\operatorname{dom} L(f\oplus g) = \operatorname{dom} f\#g$. Thus, $L(y, x-y) = x \in \operatorname{dom} L(f\oplus g)$.
a) Let $y \in \operatorname{argmin}_{u\in E}\{f(u) + g(x-u)\}$. Since $(L(f\oplus g))(x) = (f\oplus g)(y, x-y)$, Proposition 3.8.19 a) and Proposition 3.8.18 imply that
\[ \begin{aligned} \partial(f\#g)(x) &= \partial(L(f\oplus g))(x) \\ &= (L^*)^{-1}(\partial(f\oplus g)(y, x-y)) \\ &= (L^*)^{-1}(\partial f(y) \times \partial g(x-y)) \\ &= \partial f(y) \cap \partial g(x-y). \end{aligned} \]
b) By assumption we have
\[ \emptyset \neq \partial f(y) \cap \partial g(x-y) = (L^*)^{-1}(\partial f(y) \times \partial g(x-y)) = (L^*)^{-1}(\partial(f\oplus g)(y, x-y)). \]
Thus, Proposition 3.8.19 b) implies
\[ (f\#g)(x) = (L(f\oplus g))(x) = (f\oplus g)(y, x-y) = f(y) + g(x-y). \]

As a first application we obtain the subdifferential of the (Euclidean) distance function.

Example 3.8.21 (Subdifferential of Euclidean distance) Let $C \subset E$ be nonempty, closed and convex. Then
\[ \partial\operatorname{dist}_C(x) = \begin{cases} \Big\{\dfrac{x - P_C(x)}{\operatorname{dist}_C(x)}\Big\} & \text{if } x \notin C, \\ N_C(x) \cap B & \text{if } x \in \operatorname{bd} C, \\ \{0\} & \text{else.} \end{cases} \]
This can be seen using Examples 3.4.4, 3.4.11 and 3.8.3 a) and Theorem 3.8.20.

Our next goal is to establish a subdifferential rule for the sum of convex functions as well as for the composition of a convex function with a linear mapping. In fact, both of these problems will be answered by a general result for the subdifferentiation of the convex function $f + g\circ L$ where $f \in \Gamma_0(E_1)$, $g \in \Gamma_0(E_2)$, $L \in \mathcal{L}(E_1,E_2)$. To establish the subdifferential calculus for the latter function we need some preliminary results.

Lemma 3.8.22 Let $f \in \Gamma(E_1)$, $g \in \Gamma(E_2)$ and $L \in \mathcal{L}(E_1,E_2)$. Then
\[ \partial(f + g\circ L) \supset \partial f + L^*\circ(\partial g)\circ L. \]

Proof: Let $x \in E_1$. A generic point in $\partial f(x) + (L^*\circ(\partial g)\circ L)(x)$ is of the form $u + L^*(v)$ with $u \in \partial f(x)$ and $v \in \partial g(L(x))$. The subgradient inequality yields
\[ f(y) \geq f(x) + \langle u, y - x\rangle \quad\text{and}\quad g(L(y)) \geq g(L(x)) + \langle v, L(y) - L(x)\rangle \quad (y \in E_1). \]
Combining these two inequalities gives
\[ f(y) + g(L(y)) \geq f(x) + g(L(x)) + \langle u + L^*(v), y - x\rangle \quad (y \in E_1), \]
i.e. $u + L^*(v) \in \partial(f + g\circ L)(x)$.

Proposition 3.8.23 Let $f \in \Gamma_0(E_1)$, $g \in \Gamma_0(E_2)$ and $L \in \mathcal{L}(E_1,E_2)$ such that $(f + g\circ L)^* = \min_{v\in E_2}\{f^*((\cdot) - L^*(v)) + g^*(v)\}$. Then
\[ \partial(f + g\circ L) = \partial f + L^*\circ(\partial g)\circ L. \]

Proof: In view of Lemma 3.8.22 it remains to show that $\operatorname{gph}\partial(f + g\circ L) \subset \operatorname{gph}(\partial f + L^*\circ(\partial g)\circ L)$: To this end, take $(x,u) \in \operatorname{gph}\partial(f + g\circ L)$. By Theorem 3.8.4, we have
\[ (f + g\circ L)(x) + (f + g\circ L)^*(u) = \langle x, u\rangle. \tag{3.40} \]
On the other hand, by assumption, there exists $v \in E_2$ such that
\[ (f + g\circ L)^*(u) = f^*(u - L^*(v)) + g^*(v). \]
Combining this with (3.40), we obtain
\[ \big[f(x) + f^*(u - L^*(v)) - \langle x, u - L^*(v)\rangle\big] + \big[g(L(x)) + g^*(v) - \langle x, L^*(v)\rangle\big] = 0. \]
Both brackets are nonnegative by the Fenchel-Young inequality (3.25), so we obtain
\[ f(x) + f^*(u - L^*(v)) = \langle x, u - L^*(v)\rangle \quad\text{and}\quad g(L(x)) + g^*(v) - \langle x, L^*(v)\rangle = 0. \]
Invoking Theorem 3.8.4 again yields $u - L^*(v) \in \partial f(x)$ and $v \in \partial g(L(x))$, hence $u \in \partial f(x) + L^*(\partial g(L(x)))$ as desired.

We now come to the announced main result.

Theorem 3.8.24 (Generalized subdifferential sum rule) Let $f \in \Gamma_0(E_1)$, $g \in \Gamma_0(E_2)$ and $L \in \mathcal{L}(E_1,E_2)$. Then
\[ \partial(f + g\circ L) \supset \partial f + L^*\circ(\partial g)\circ L. \tag{3.41} \]
Under the qualification condition
\[ L(\operatorname{ri}(\operatorname{dom} f)) \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset \tag{3.42} \]
equality holds in (3.41).

Proof: The qualification condition (3.42) is equivalent to $0 \in \operatorname{ri}(\operatorname{dom} g - L(\operatorname{dom} f))$, so we infer from Corollary 3.7.6 that $(f + g\circ L)^* = \min_{v\in E_2}\{f^*((\cdot) - L^*(v)) + g^*(v)\}$. Hence, the assertion follows from Proposition 3.8.23.

The generalized subdifferential sum rule has many important consequences, two of which we present now.

Corollary 3.8.25 (Subdifferential sum rule) Let $f, g \in \Gamma$. Then
\[ \partial(f + g)(x) \supset \partial f(x) + \partial g(x) \quad (x \in E). \tag{3.43} \]
Under the qualification condition
\[ \operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset \tag{3.44} \]
equality holds in (3.43).

Corollary 3.8.26 (Subdifferential chain rule) Let $g \in \Gamma(E_2)$ and $L \in \mathcal{L}(E_1,E_2)$. Then
\[ \partial(g\circ L) \supset L^*\circ(\partial g)\circ L. \tag{3.45} \]
Under the qualification condition
\[ \operatorname{rge} L \cap \operatorname{ri}(\operatorname{dom} g) \neq \emptyset \tag{3.46} \]
equality holds in (3.45).

We proceed with the subdifferential of the pointwise maximum of a finite collection of convex functions. The proof is, once more, based on a (strong) separation argument.

Theorem 3.8.27 (Subdifferential of maximum of convex functions) For $i \in I := \{1,\dots,m\}$ let $f_i \in \Gamma$ and $x \in \bigcap_{i\in I}\operatorname{int}(\operatorname{dom} f_i)$ and set $f := \max_{i\in I} f_i$ and $I(x) := \{i \in I \mid f_i(x) = f(x)\}$. Then
\[ \partial f(x) = \operatorname{conv}\bigcup_{i\in I(x)}\partial f_i(x). \]

Proof: Let $i \in I(x)$ and $u \in \partial f_i(x)$.
Then, by the subgradient inequality, we have
\[ \langle u, y - x\rangle \leq f_i(y) - f_i(x) \leq f(y) - f(x) \quad (y \in E), \]
i.e. $u \in \partial f(x)$. Using this and the fact that $\partial f(x)$ is closed and convex, cf. Proposition 3.8.2 a), we have
\[ \partial f(x) \supset \operatorname{conv}\bigcup_{i\in I(x)}\partial f_i(x). \]
Now assume the inclusion were strict, i.e. there exists
\[ u \in \partial f(x) \setminus \operatorname{conv}\bigcup_{i\in I(x)}\partial f_i(x). \tag{3.47} \]
By strong separation there hence exists $s \in E\setminus\{0\}$ such that
\[ \langle s, u\rangle > \max_{i\in I(x)}\ \sup_{z\in\partial f_i(x)}\langle s, z\rangle = \max_{i\in I(x)} f_i'(x;s), \tag{3.48} \]
where we use Corollary 3.8.14 for the second identity. In view of Remark 2.6.2 and the fact that $x \in \operatorname{int}(\operatorname{dom} f_i)$ for all $i \in I$, we realize that we can rescale $s$ such that
\[ x + s \in \bigcap_{i\in I}\operatorname{dom} f_i = \operatorname{dom} f. \tag{3.49} \]
Now let $\{\alpha_k \in (0,1)\} \downarrow 0$. Since $I$ is finite, we can assume w.l.o.g. that there exists $j \in I$ such that
\[ f_j(x + \alpha_k s) = f(x + \alpha_k s) \quad (k \in \mathbb{N}). \tag{3.50} \]
For $k \in \mathbb{N}$ we hence have
\[ f_j(x + \alpha_k s) \leq (1-\alpha_k) f_j(x) + \alpha_k f_j(x+s) \]
and thus
\[ \begin{aligned} (1-\alpha_k) f_j(x) &\geq f_j(x + \alpha_k s) - \alpha_k f_j(x+s) \\ &\geq f(x + \alpha_k s) - \alpha_k f(x+s) \\ &\geq f(x) + \langle u, \alpha_k s\rangle - \alpha_k f(x+s) \\ &\geq f_j(x) + \alpha_k\langle u, s\rangle - \alpha_k f(x+s). \end{aligned} \]
Here, the second inequality uses (3.50) and the definition of $f$. The third one uses that $u \in \partial f(x)$ (see (3.47)) and the last one is again due to the definition of $f$. Now letting $k \to \infty$ and using (3.49) yields
\[ f_j(x) = f(x). \tag{3.51} \]
Finally, using (3.50), (3.51), (3.47) and (3.48), we obtain
\[ f_j'(x;s) < \langle s, u\rangle \leq \frac{f_j(x + \alpha_k s) - f_j(x)}{\alpha_k} = \frac{f(x + \alpha_k s) - f(x)}{\alpha_k} \to f_j'(x;s), \]
which is the desired contradiction.

A frequently occurring special case of the foregoing result is the following.

Corollary 3.8.28 For $i \in I := \{1,\dots,m\}$ let $f_i \in \Gamma$ be differentiable at $x \in \bigcap_{i\in I}\operatorname{int}(\operatorname{dom} f_i)$ and set $f := \max_{i\in I} f_i$ and $I(x) := \{i \in I \mid f_i(x) = f(x)\}$. Then
\[ \partial f(x) = \operatorname{conv}\{\nabla f_i(x) \mid i \in I(x)\}. \]

Proof: Combine Theorem 3.8.27 and Theorem 3.8.15.

Exercises to Chapter 3

3.1. (Domain of an lsc function) Is the domain of an lsc function closed?

3.2.
(Univariate convex functions) Let $f : \mathbb{R} \to \mathbb{R}\cup\{+\infty\}$ and $I \subset \operatorname{dom} f$ be an open interval. Show the following (without using results from Section 3.1.2):
a) $f$ is convex on $I$ if and only if, for every $x_0 \in I$, the slope-function
\[ x \mapsto \frac{f(x) - f(x_0)}{x - x_0} \]
is nondecreasing on $I\setminus\{x_0\}$.
b) Let $f$ be differentiable on $I$. Then $f$ is convex on $I$ if $f'$ is nondecreasing on $I$, i.e.
\[ f'(s) \leq f'(t) \quad (s,t \in I : s \leq t). \]
c) Let $f$ be twice differentiable on $I$. Then $f$ is convex on $I$ if and only if $f''(x) \geq 0$ for all $x \in I$.

3.3. (Characterization of convexity) Let $f : E \to \overline{\mathbb{R}}$. Show the equivalence of:
i) $f$ is convex;
ii) The strict epigraph $\operatorname{epi}_< f := \{(x,\alpha) \in E\times\mathbb{R} \mid f(x) < \alpha\}$ of $f$ is convex;
iii) For all $\lambda \in (0,1)$ we have $f(\lambda x + (1-\lambda)y) < \lambda\alpha + (1-\lambda)\beta$ whenever $f(x) < \alpha$ and $f(y) < \beta$.

3.4. (Properness and closedness of convex functions) Prove the following:
a) An improper convex function $f : E \to \overline{\mathbb{R}}$ must have $f(x) = -\infty$ for all $x \in \operatorname{ri}(\operatorname{dom} f)$.
b) An improper convex function which is lsc can only have infinite values.
c) If $f$ is convex then $\operatorname{cl} f$ is proper if and only if $f$ is proper.

3.5. (Jensen's Inequality) Show that $f : E \to \mathbb{R}\cup\{+\infty\}$ is convex if and only if
\[ f\Big(\sum_{i=1}^p \lambda_i x_i\Big) \leq \sum_{i=1}^p \lambda_i f(x_i) \quad \forall x_i \in E\ (i = 1,\dots,p),\ \lambda \in \Delta_p. \]

3.6. (Quasiconvex functions) A function $f : E \to \overline{\mathbb{R}}$ is called quasiconvex if the level sets $\operatorname{lev}_{\leq\alpha} f$ are convex for every $\alpha \in \mathbb{R}$. Show:
a) Every convex function is quasiconvex.
b) $f : E \to \mathbb{R}\cup\{+\infty\}$ is quasiconvex if and only if
\[ f(\lambda x + (1-\lambda)y) \leq \max\{f(x), f(y)\} \quad (x,y \in \operatorname{dom} f,\ \lambda \in [0,1]). \]
c) If $f : E \to \mathbb{R}\cup\{+\infty\}$ is quasiconvex then $\operatorname{argmin} f$ is a convex set.

3.7. (Coercivity is level-boundedness) Show that a function $f : E \to \overline{\mathbb{R}}$ is coercive if and only if it is level-bounded.

3.8. (Post-composition with monotonically increasing, convex functions) Let $f : E \to \mathbb{R}\cup\{+\infty\}$ be convex (and lsc) and let $g : \mathbb{R} \to \mathbb{R}\cup\{+\infty\}$ be convex (and lsc) and nondecreasing. We put $g(+\infty) := +\infty$ and assume that $\lim_{x\to\infty} g(x) = +\infty$.
a) Show that $g \circ f$ is convex (and lsc);
b) Give a necessary and sufficient condition for $g \circ f$ to be proper.

3.9. (Supercoercivity in sums) Let $f \in \Gamma$ and let $g : E \to \mathbb{R} \cup \{+\infty\}$ be supercoercive. Show that $f + g$ is supercoercive.

3.10. (Convergence of prox-operator) Let $f \in \Gamma_0$ and $\bar{x} \in \operatorname{dom} f$. Prove that
\[ P_\lambda f(\bar{x}) \to \bar{x} \quad \text{and} \quad f(P_\lambda f(\bar{x})) \to f(\bar{x}) \quad (\lambda \downarrow 0). \]

3.11. (Minimizing differentiable convex functions) Let $f : E \to \mathbb{R}$ be convex and differentiable and let $C \subset E$. Show that $\bar{x} \in C$ is a minimizer of $f$ over $C$ if and only if $-\nabla f(\bar{x}) \in N_C(\bar{x})$.

3.12. (Convex hulls of functions) Let $f : E \to \overline{\mathbb{R}}$. Show the following:
a) $\operatorname{epi}(\operatorname{conv} f) = \operatorname{conv}(\operatorname{epi} f)$;
b) $(\operatorname{conv} f)(x) = \inf \Big\{ \sum_{i=1}^{N+2} \lambda_i f(x_i) \ \Big|\ \lambda \in \Delta_{N+2}, \ x = \sum_{i=1}^{N+2} \lambda_i x_i \Big\}$ $(x \in E)$.

3.13. (Properness of convex hull) Let $f : E \to \overline{\mathbb{R}}$.
a) Show that $f$ is proper if $\operatorname{conv} f$ is. Does the reverse implication hold as well?
b) Show that $\operatorname{conv} f$ is proper if and only if $f$ has an affine minorant.

3.14. (Self-conjugacy) Show that $\frac{1}{2} \| \cdot \|^2$ is the only function $f : \mathbb{R}^n \to \mathbb{R}$ with $f^* = f$.

3.15. (Conjugate of negative logdet) Compute $f^*$ and $f^{**}$ for
\[ f : X \in \mathbb{S}^n \mapsto \begin{cases} -\log(\det X) & \text{if } X \succ 0, \\ +\infty & \text{else.} \end{cases} \]

3.16. (Positive homogeneity, sublinearity and subadditivity) Let $f : E \to \overline{\mathbb{R}}$. Show the following:
a) $f$ is positively homogeneous if and only if $\operatorname{epi} f$ is a cone. In this case $f(0) \in \{0, -\infty\}$.
b) If $f$ is lsc and positively homogeneous with $f(0) = 0$, it must be proper.
c) The following are equivalent:
i) $f$ is sublinear;
ii) $f$ is positively homogeneous and convex;
iii) $f$ is positively homogeneous and subadditive;
iv) $\operatorname{epi} f$ is a convex cone.

3.17. (Finiteness of support functions) Let $S \subset E$ be nonempty. Then $\sigma_S$ is finite if and only if $S$ is bounded.

3.18. (Polar sets) Show the following:
a) If $C \subset E$ is a cone, we have $\{ v \mid \langle v, x \rangle \le 0 \ (x \in C) \} = \{ v \mid \langle v, x \rangle \le 1 \ (x \in C) \}$.
b) $C \subset E$ is bounded if and only if $0 \in \operatorname{int} C^\circ$.
c) For any closed half-space $H$ containing $0$ we have $H^{\circ\circ} = H$.

3.19.
(Gauge functions) Let $C \subset E$ be nonempty, closed and convex with $0 \in C$. Prove:
a) $C = \operatorname{lev}_{\le 1} \gamma_C$, $C^\infty = \gamma_C^{-1}(\{0\})$, $\operatorname{dom} \gamma_C = \mathbb{R}_+ C$.
b) The following are equivalent:
i) $\gamma_C$ is a norm (with $C$ as its unit ball);
ii) $C$ is bounded, symmetric ($C = -C$) with nonempty interior.

3.20. (Cone polarity and conjugacy) Let $K \subset E$ be a convex cone. Then $\delta_K \overset{*}{\longleftrightarrow} \delta_{K^\circ}$.

3.21. (Soft thresholding) For $f : x \in \mathbb{R}^n \mapsto \|x\|_1$ compute $\partial f$ and $e_\lambda f$ $(\lambda > 0)$.
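
A candidate answer to Exercise 3.21 can be sanity-checked numerically before attempting a proof. The following is a minimal sketch, not a solution: it assumes the convention $e_\lambda f(x) = \inf_y \{ f(y) + \frac{1}{2\lambda} \|x - y\|^2 \}$ with $P_\lambda f(x)$ the minimizer, and it tests the standard candidates (componentwise soft thresholding for the prox, a sum of Huber functions for the envelope). All function names below are hypothetical and chosen for illustration only.

```python
import math
import random


def soft_threshold(x, lam):
    """Componentwise soft thresholding: sign(t) * max(|t| - lam, 0)."""
    return [math.copysign(max(abs(t) - lam, 0.0), t) for t in x]


def prox_objective(y, x, lam):
    """||y||_1 + ||y - x||^2 / (2 * lam), the function minimized in the envelope."""
    l1 = sum(abs(t) for t in y)
    sq = sum((a - b) ** 2 for a, b in zip(y, x))
    return l1 + sq / (2.0 * lam)


def huber_sum(x, lam):
    """Candidate closed form of the envelope: a sum of Huber functions."""
    return sum(
        t * t / (2.0 * lam) if abs(t) <= lam else abs(t) - lam / 2.0
        for t in x
    )


def check(x, lam, trials=2000, seed=0):
    """Return True if the candidates pass a randomized consistency test."""
    rng = random.Random(seed)
    p = soft_threshold(x, lam)
    best = prox_objective(p, x, lam)
    # The objective value at the candidate prox should match the candidate envelope.
    if abs(best - huber_sum(x, lam)) > 1e-12:
        return False
    # No randomly perturbed point should achieve a smaller objective value.
    for _ in range(trials):
        y = [t + rng.uniform(-1.0, 1.0) for t in p]
        if prox_objective(y, x, lam) < best - 1e-12:
            return False
    return True
```

For instance, `check([3.0, -0.5, 1.0], 1.0)` compares the two candidates at one point and then tries random perturbations around the candidate minimizer. A passing check is only numerical evidence; the exercise still asks for a derivation, for example via the optimality condition $0 \in \partial f(y) + \frac{1}{\lambda}(y - x)$.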