Convex sets and functions
Dimitar Dimitrov
Örebro University
May 2011

Topics addressed in this material
- Convex sets
- Convex functions
The presentation is mainly based on [1], Chapters 2 and 3.

Lines and line segments
[Figure: the line through $x_1$ and $x_2$ in the plane, with the points for $\theta = -0.2,\ 0,\ 0.5,\ 1,\ 1.3$ marked.]
Given two distinct points $x_1, x_2 \in \mathbb{R}^n$, any point $x$ on the line passing through $x_1$ and $x_2$ can be expressed as
\[ x = (1 - \theta)x_1 + \theta x_2, \quad \text{for some } \theta \in \mathbb{R}. \]
$\theta = 0$ corresponds to $x_1$ and $\theta = 1$ corresponds to $x_2$. Values of $\theta$ between 0 and 1 correspond to points in the (closed) line segment between $x_1$ and $x_2$.

[Figure: the same line, drawn through $x_1$ along the direction $\Delta x$.]
Alternatively, we can represent any point on the line passing through $x_1$ and parallel to $\Delta x = x_2 - x_1$ as
\[ x = x_1 + \theta \underbrace{(x_2 - x_1)}_{\Delta x}, \]
which is clearly equivalent to $x = (1 - \theta)x_1 + \theta x_2$.

Affine combination
Note that when a line $\ell$ is defined using
\[ x = (1 - \theta)x_1 + \theta x_2, \quad \text{for some } \theta \in \mathbb{R}, \]
implicit in the definition is that the coefficients in the linear combination of $x_1$ and $x_2$ sum to one, i.e., $\theta + (1 - \theta) = 1$. This is called an affine combination of the two vectors, and the line $\ell$ is called an affine set (because it contains every affine combination of two points in it). Hence, we can define all points on a line $\ell$ using
\[ x = \theta_1 x_1 + \theta_2 x_2, \quad \theta_1 + \theta_2 = 1. \]
If the constraint $\theta_1 + \theta_2 = 1$ is not imposed, then $\theta_1 x_1 + \theta_2 x_2$ is simply a linear combination of $x_1$ and $x_2$ (which can generate any point in the plane $\mathbb{R}^2$ with a proper choice of $\theta_1$ and $\theta_2$, provided that the two vectors are linearly independent).
Recall that $k$ vectors $x_1, \dots, x_k \in \mathbb{R}^n$ are linearly independent if
\[ \theta_1 x_1 + \dots + \theta_k x_k = 0 \quad \text{holds only if} \quad \theta_1 = \dots = \theta_k = 0, \]
i.e., no vector can be expressed as a linear combination of the others.
In general, a point
\[ x = \theta_1 x_1 + \dots + \theta_k x_k, \quad \text{where } \sum_{i=1}^{k} \theta_i = 1, \]
is called an affine combination of the points $x_1, \dots, x_k$.
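As a small illustration of the definitions above, the following Python sketch evaluates points on a line as affine combinations of two points. The helper names (`combine`, `is_affine`, `point_on_line`) and the sample points are assumptions made for this example only; they do not come from the slides.

```python
from typing import List, Sequence


def combine(points: Sequence[Sequence[float]], thetas: Sequence[float]) -> List[float]:
    """Return theta_1*x_1 + ... + theta_k*x_k (a plain linear combination)."""
    if len(points) != len(thetas):
        raise ValueError("need exactly one coefficient per point")
    dim = len(points[0])
    return [sum(t * p[i] for t, p in zip(thetas, points)) for i in range(dim)]


def is_affine(thetas: Sequence[float], tol: float = 1e-9) -> bool:
    """A combination is affine when its coefficients sum to one."""
    return abs(sum(thetas) - 1.0) <= tol


def point_on_line(x1: Sequence[float], x2: Sequence[float], theta: float) -> List[float]:
    """Point (1 - theta)*x1 + theta*x2 on the line through x1 and x2."""
    return combine([x1, x2], [1.0 - theta, theta])


# Hypothetical sample points in R^2.
x1 = [1.0, 1.0]
x2 = [3.0, 2.0]

midpoint = point_on_line(x1, x2, 0.5)  # theta in [0, 1]: inside the segment
beyond = point_on_line(x1, x2, 1.5)    # theta > 1: on the line, past x2
```

Values of `theta` outside [0, 1] still give affine combinations (the coefficients sum to one), but the resulting point lies outside the segment between `x1` and `x2`.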

Affine sets
A set $\mathcal{X} \subseteq \mathbb{R}^n$ is affine if the line through any two distinct points in $\mathcal{X}$ lies in $\mathcal{X}$. Moreover, if $\mathcal{X}$ is an affine set, it contains all affine combinations of its points, i.e., if $x_1, \dots, x_k \in \mathcal{X}$ and $\sum_{i=1}^{k} \theta_i = 1$, then $\theta_1 x_1 + \dots + \theta_k x_k \in \mathcal{X}$.
Let $\mathcal{S}$ be a subspace of $\mathbb{R}^n$ and let $x \in \mathbb{R}^n$. Then the set
\[ \mathcal{X} = \{s + x : s \in \mathcal{S}\} \]
is an affine set. In particular, if $x = 0$, we see that every subspace of $\mathbb{R}^n$ is an affine set as well.
Conversely, if $\mathcal{X}$ is an affine set and $x_0 \in \mathcal{X}$, then the set
\[ \mathcal{S} = \{x - x_0 : x \in \mathcal{X}\} \]
is a subspace.
Recall that if $\mathcal{V}$ is a vector space and $\mathcal{S} \subseteq \mathcal{V}$ is a subset of $\mathcal{V}$ (i.e., $\mathcal{S}$ contains some of the vectors in $\mathcal{V}$), then $\mathcal{S}$ is a subspace of $\mathcal{V}$ if:
- $0 \in \mathcal{S}$ ($\mathcal{S}$ contains the "zero" element),
- $\mathcal{S}$ is closed under addition and scalar multiplication, i.e., for any $s_1, s_2 \in \mathcal{S}$ and $\theta_1, \theta_2 \in \mathbb{R}$, we have $\theta_1 s_1 + \theta_2 s_2 \in \mathcal{S}$.

Example (affine set)
[Figure: the affine set $\{x : Ax = b\}$ in the $(x_1, x_2)$ plane, with $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$ and $b = 4$, drawn together with the parallel subspace $\mathcal{N}(A)$ of $\mathbb{R}^2$.]
The solution set of a system of linear equations
\[ \mathcal{X} = \{x : Ax = b\}, \]
where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, is an affine set. The subspace "associated with" $\mathcal{X}$ is the nullspace of $A$.

Convex sets
A set $\mathcal{X}$ is convex if the line segment between any two points in $\mathcal{X}$ lies in $\mathcal{X}$, i.e., if for any $x_1, x_2 \in \mathcal{X}$ and any $\theta \in [0, 1]$, we have
\[ (1 - \theta)x_1 + \theta x_2 \in \mathcal{X}. \]
[Figure: examples of three sets (only the first one is convex).]
Is the set $\{0, 1, 2, \dots\}$ convex?

Convex combination
A point $x$ of the form
\[ x = \theta_1 x_1 + \dots + \theta_k x_k, \]
where $\sum_{i=1}^{k} \theta_i = 1$ and $\theta_i \ge 0$, $i = 1, \dots, k$, is called a convex combination of the points $x_1, \dots, x_k$. A set is convex if and only if it contains every convex combination of its points.

Convex hulls
The convex hull of a set $\mathcal{X}$, denoted $\mathrm{conv}(\mathcal{X})$, is the set of all convex combinations of points in $\mathcal{X}$, i.e.,
\[ \mathrm{conv}(\mathcal{X}) = \Big\{ \theta_1 x_1 + \dots + \theta_k x_k : x_i \in \mathcal{X},\ \theta_i \ge 0,\ i = 1, \dots, k,\ \sum_{i=1}^{k} \theta_i = 1 \Big\}. \]
The convex hull of a set $\mathcal{X}$ is the smallest convex set that contains $\mathcal{X}$.
[Figure: the convex hull of a set $\mathcal{X}$ containing 11 points (black circles).]
The convex hull $\mathrm{conv}(\mathcal{X})$ has infinitely many points, since it contains all convex combinations of the elements of $\mathcal{X}$. All convex combinations of the two points $x_1$ and $x_2$ are depicted with a dashed line.
The point depicted in red is not a convex combination of any two elements of $\mathcal{X}$, but it is a convex combination of $x_1$, $x_3$ and $x_4$:
\[ \text{red point} = 0.1 x_1 + 0.6 x_3 + 0.3 x_4. \]

Cones
A set $\mathcal{X}$ is a cone if for every $x \in \mathcal{X}$ and $\theta \ge 0$ we have $\theta x \in \mathcal{X}$.
If in addition for any $x_1, x_2 \in \mathcal{X}$ and $\theta_1, \theta_2 \ge 0$ we have $\theta_1 x_1 + \theta_2 x_2 \in \mathcal{X}$, then the set $\mathcal{X}$ is called a convex cone.
A point of the form $\theta_1 x_1 + \dots + \theta_k x_k$, with $\theta_1, \dots, \theta_k \ge 0$, is called a conic combination (or nonnegative linear combination) of $x_1, \dots, x_k$.
[Figure: examples of a convex (left) and a non-convex (right) cone.]
The non-convex cone is defined as $\mathcal{X} := \{x \in \mathbb{R}^2 : x_1 \ge 0,\ x_2 \ge 0,\ x_1 x_2 \le 0\}$, i.e., the union of the two nonnegative coordinate axes.

Example (positive semidefinite cone) [1], p. 35
[Figure: the boundary of the positive semidefinite cone in $\mathbb{S}^2$, plotted as $(x, y, z)$ in $\mathbb{R}^3$.]
\[ A = \begin{bmatrix} x & y \\ y & z \end{bmatrix} \in \mathbb{S}^2_+ \iff x, z \ge 0,\ xz \ge y^2. \]
Consider the set of positive semidefinite symmetric matrices
\[ \mathbb{S}^n_+ = \{A \in \mathbb{S}^n : A \text{ is positive semidefinite}\}. \]
The set $\mathbb{S}^n_+$ is a convex cone, since if $\theta_1, \theta_2 \ge 0$ and $A, B \in \mathbb{S}^n_+$, then $\theta_1 A + \theta_2 B \in \mathbb{S}^n_+$. This follows directly from the properties of positive semidefinite matrices. Let $x \in \mathbb{R}^n$, then
\[ x^T(\theta_1 A + \theta_2 B)x = \theta_1 x^T A x + \theta_2 x^T B x \ge 0. \]
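The characterization of $\mathbb{S}^2_+$ above can be turned into a quick numerical check. The sketch below is only an illustration: the function names and the sample matrices are assumptions, and each symmetric 2x2 matrix [[x, y], [y, z]] is stored as the tuple (x, y, z).

```python
from typing import Tuple

Sym2 = Tuple[float, float, float]  # (x, y, z) stands for [[x, y], [y, z]]


def is_psd_2x2(m: Sym2, tol: float = 1e-12) -> bool:
    """Test x >= 0, z >= 0 and x*z >= y**2, up to a small tolerance."""
    x, y, z = m
    return x >= -tol and z >= -tol and x * z - y * y >= -tol


def conic_combination(a: Sym2, b: Sym2, theta1: float, theta2: float) -> Sym2:
    """Return theta1*a + theta2*b for nonnegative weights."""
    if theta1 < 0 or theta2 < 0:
        raise ValueError("conic combinations need nonnegative weights")
    return (theta1 * a[0] + theta2 * b[0],
            theta1 * a[1] + theta2 * b[1],
            theta1 * a[2] + theta2 * b[2])


# Hypothetical sample matrices.
A = (2.0, 1.0, 1.0)    # 2*1 >= 1**2, so A is positive semidefinite
B = (1.0, -1.0, 4.0)   # 1*4 >= (-1)**2, so B is positive semidefinite
C = conic_combination(A, B, 0.5, 3.0)
N = (1.0, 2.0, 1.0)    # 1*1 < 2**2, so N is not positive semidefinite

print(is_psd_2x2(A), is_psd_2x2(B), is_psd_2x2(C), is_psd_2x2(N))
```

Checking a few conic combinations this way is consistent with the cone property, but it is of course not a proof; the argument via $x^T(\theta_1 A + \theta_2 B)x \ge 0$ is.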

Example (norm cones)
Recall that a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is called a norm if the following conditions are satisfied:
- $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$,
- $\|x\| = 0$ if and only if $x = 0$,
- $\|\alpha x\| = |\alpha| \|x\|$ for all $x \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$ (homogeneity),
- $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$ (triangle inequality).
A norm is a measure of the length of a vector. The distance between two vectors $x$ and $y$ can be measured as the norm of their difference $\|x - y\|$.
Some examples of norms:
- $\ell_2$ norm (Euclidean norm): $\|x\|_2 = \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2} = \sqrt{x^T x}$,
- $\ell_1$ norm: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$,
- $\ell_\infty$ norm: $\|x\|_\infty = \max\{|x_1|, \dots, |x_n|\}$.
The set
\[ \mathcal{X} = \{(x, t) : \|x\| \le t\} \subseteq \mathbb{R}^{n+1} \]
is a cone associated with the norm $\|\cdot\|$.

Norm cones associated with $\ell_1$, $\ell_2$ and $\ell_\infty$ norms
[Figure: the three norm cones plotted in $(x, y, z)$, and their cross-sections at $t = 1$ for the $\ell_1$, $\ell_2$, $\ell_5$ and $\ell_\infty$ norms.]

Hulls - summary
The convex hull of $x_1, \dots, x_k$ is defined as
\[ \Big\{ \theta_1 x_1 + \dots + \theta_k x_k \ \Big|\ \theta_1, \dots, \theta_k \ge 0,\ \sum_{i=1}^{k} \theta_i = 1 \Big\}, \]
i.e., the set of all convex combinations of $\{x_i\}$.
The affine hull of $x_1, \dots, x_k$ is defined as
\[ \Big\{ \theta_1 x_1 + \dots + \theta_k x_k \ \Big|\ \theta_1, \dots, \theta_k \in \mathbb{R},\ \sum_{i=1}^{k} \theta_i = 1 \Big\}, \]
i.e., the set of all affine combinations of $\{x_i\}$. The affine hull of a set of vectors $\{x_i\}$ is the smallest affine set that contains $\{x_i\}$.
The conic hull of $x_1, \dots, x_k$ is defined as
\[ \{ \theta_1 x_1 + \dots + \theta_k x_k \mid \theta_1, \dots, \theta_k \ge 0 \}, \]
i.e., the set of all conic combinations of $\{x_i\}$.
For example, the affine hull of three (or more) distinct points in $\mathbb{R}^2$ not all lying on the same line is $\mathbb{R}^2$ itself. The convex hull of three such points is the triangle whose vertices are the points themselves.

Hyperplanes
A hyperplane $\mathcal{H}$ is a set of the form
\[ \mathcal{H} = \{x : a^T x = b\}, \]
where $a \in \mathbb{R}^n$ ($a \ne 0$) and $b \in \mathbb{R}$ are given constants. In words, the above definition states that $\mathcal{H}$ is the set of all points whose inner product with $a$ is equal to $b$.
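The hyperplane definition amounts to a single inner-product test. A minimal Python sketch follows; the helper names and the values $a = (1, 2)$, $b = 4$ are chosen for illustration only, and a tolerance is used because the test is done in floating point.

```python
from typing import Sequence


def dot(a: Sequence[float], x: Sequence[float]) -> float:
    """Inner product a^T x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def on_hyperplane(a: Sequence[float], b: float, x: Sequence[float],
                  tol: float = 1e-9) -> bool:
    """Return True when a^T x equals b up to the tolerance tol."""
    if all(ai == 0 for ai in a):
        raise ValueError("the vector a must be nonzero")
    return abs(dot(a, x) - b) <= tol


# Illustrative data: the line x_1 + 2*x_2 = 4 in R^2.
a = [1.0, 2.0]
b = 4.0

print(on_hyperplane(a, b, [4.0, 0.0]))  # 1*4 + 2*0 = 4
print(on_hyperplane(a, b, [0.0, 2.0]))  # 1*0 + 2*2 = 4
print(on_hyperplane(a, b, [1.0, 1.0]))  # 1*1 + 2*1 = 3, not on the hyperplane
```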
Furthermore, note that since $\mathcal{H}$ is the solution set of $a^T x = b$, it is an affine set and has an associated subspace
\[ \mathcal{S} = \{x - x_0 : x \in \mathcal{H}\}, \]
where $x_0$ is an arbitrary point of $\mathcal{H}$.
Let $x_0 \in \mathcal{H}$. Then we can define the set of all points that belong to $\mathcal{H}$ as
\[ \mathcal{H} = \{x : a^T(x - x_0) = 0\}. \]
This is because $x_0 \in \mathcal{H}$ implies $a^T x_0 = b$. From this definition we can conclude that any vector of the form $x - x_0$ with $x \in \mathcal{H}$ is orthogonal to $a$. In other words, $a$ is orthogonal to $\mathcal{S}$. Commonly, we say that $a$ is the normal to the hyperplane $\mathcal{H}$.

Halfspaces
A hyperplane $\mathcal{H}$ divides $\mathbb{R}^n$ into two halfspaces
\[ \mathcal{H}_+ = \{x : a^T x \ge b\}, \qquad \mathcal{H}_- = \{x : a^T x \le b\}. \]
Both $\mathcal{H}_+$ and $\mathcal{H}_-$ are closed and unbounded sets. The set $\{x : a^T x < b\}$ is the interior of $\mathcal{H}_-$ and is called an open halfspace (likewise for $\{x : a^T x > b\}$).
Let $x_0 \in \mathcal{H}$. Alternatively, a halfspace can be defined as
\[ \mathcal{H}_+ = \{x : a^T(x - x_0) \ge 0\}, \quad \text{i.e., all vectors } x \text{ such that } \angle(a, x - x_0) \le \tfrac{\pi}{2}, \]
\[ \mathcal{H}_- = \{x : a^T(x - x_0) \le 0\}, \quad \text{i.e., all vectors } x \text{ such that } \angle(a, x - x_0) \ge \tfrac{\pi}{2}. \]
[Figure: the hyperplane $a^T x = b$ through $x_0$ with normal $a$, the halfspaces $a^T x \ge b$ and $a^T x \le b$, and two points $x_1$, $x_2$.]

Polyhedra
A polyhedron is the solution set of a finite number of linear equalities and inequalities
\[ \mathcal{P} = \{x \in \mathbb{R}^n : Ax \le b,\ Cx = d\}, \]
where $A \in \mathbb{R}^{m_i \times n}$ and $C \in \mathbb{R}^{m_e \times n}$. In other words, it is the intersection of a finite number of halfspaces and hyperplanes.
[Figure: an example of $\mathcal{P} = \{x \in \mathbb{R}^2 : Ax \le b\}$ in the $(x_1, x_2)$ plane.]

Example (polyhedron)
The polyhedron in the figure is defined as the intersection of three halfspaces and one hyperplane
\[ \mathcal{P} = \{(x_1, x_2, x_3) : x_1, x_2, x_3 \ge 0,\ x_1 + x_2 + x_3 = 1\}. \]
Each of the points
\[ v_1 = (1, 0, 0), \quad v_2 = (0, 1, 0), \quad v_3 = (0, 0, 1) \]
is called a vertex of $\mathcal{P}$. $\mathcal{P}$ is a special type of polyhedron called a simplex (in fact, this particular simplex is called a probability simplex).
Note that the hyperplane removes one "degree of freedom" from the choice of $x$.
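To make the description $\{x : Ax \le b,\ Cx = d\}$ concrete, the sketch below tests membership in the probability simplex written in that form. The function name `in_polyhedron`, the tolerance, and the way the nonnegativity constraints are encoded as $-x \le 0$ are choices made for this illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def dot(row: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of one constraint row with x."""
    return sum(r * xi for r, xi in zip(row, x))


def in_polyhedron(x: Sequence[float], A: Matrix, b: Sequence[float],
                  C: Matrix, d: Sequence[float], tol: float = 1e-9) -> bool:
    """Check A x <= b and C x = d row by row, up to the tolerance tol."""
    ineq_ok = all(dot(row, x) <= bi + tol for row, bi in zip(A, b))
    eq_ok = all(abs(dot(row, x) - di) <= tol for row, di in zip(C, d))
    return ineq_ok and eq_ok


# Probability simplex in R^3: x >= 0 is written as -x <= 0, and sum(x) = 1.
A = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
b = [0.0, 0.0, 0.0]
C = [[1.0, 1.0, 1.0]]
d = [1.0]

vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print([in_polyhedron(v, A, b, C, d) for v in vertices])
print(in_polyhedron((0.2, 0.3, 0.5), A, b, C, d))   # a convex combination of the vertices
print(in_polyhedron((0.5, 0.5, 0.5), A, b, C, d))   # violates the equality
print(in_polyhedron((1.5, -0.5, 0.0), A, b, C, d))  # violates nonnegativity
```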
[Figure: the simplex $\mathcal{P}$ with vertices $v_1$, $v_2$, $v_3$ in the $(x_1, x_2, x_3)$ coordinates.]

Polyhedron in standard form
A polyhedron described as
\[ \mathcal{P} = \{x : Ax \le b,\ Cx = d\} \]
can always be represented as
\[ \mathcal{P} = \{\tilde{x} : \tilde{C}\tilde{x} = \tilde{d},\ \tilde{x} \ge 0\}. \tag{1} \]
We say that a polyhedron is in standard form if it is represented in the form (1).
For example, consider
\[ \mathcal{P} = \{x \in \mathbb{R}^n : Ax \le b\}. \]
First, we introduce a vector of (nonnegative) variables $s$ (called slack variables) to obtain the following equivalent representation of $\mathcal{P}$:
\[ \mathcal{P} = \{(x, s) : Ax + s = b,\ s \ge 0\}. \]
The above definition is still not in standard form, since there is no nonnegativity constraint for $x$. Next, we note that we can represent any vector $x$ as $x = v - w$, for some $v, w \ge 0$. Introducing $v$ and $w$ leads to
\[ \mathcal{P} = \{(v, w, s) : A(v - w) + s = b,\ (v, w, s) \ge 0\}, \]
hence $\tilde{x} = (v, w, s)$, $\tilde{C} = \begin{bmatrix} A & -A & I \end{bmatrix}$, and $\tilde{d} = b$.

Vertices
There are multiple ways to define a vertex of a polyhedron. The definition below is purely geometric, i.e., it does not depend on the specific representation of the polyhedron in terms of linear constraints [3], p. 46.
Let $\mathcal{P}$ be a polyhedron. A point $v \in \mathcal{P}$ is a vertex (extreme point) of $\mathcal{P}$ if there are no two points $x_1, x_2 \in \mathcal{P}$, both different from $v$, and a scalar $\theta \in [0, 1]$, such that $v = (1 - \theta)x_1 + \theta x_2$.
[Figure: a polyhedron $\mathcal{P}$ with a vertex $v$, and points $x_1$, $x_2$, $x_3$, $x_4$ and $w$.]
The point $w$ in the figure is not a vertex of $\mathcal{P}$ because it is a convex combination of the points $x_3$ and $x_4$.
Note that not every polyhedron has a vertex (for example, a halfspace in $\mathbb{R}^n$ with $n \ge 2$ does not have a vertex).
A nonempty and bounded polyhedron is the convex hull of its vertices [3], p. 68.

Operations that preserve convexity of sets
There is a variety of operations that preserve convexity of sets. We outline the following three:
- The intersection of convex sets is convex. As an example, consider a polyhedron, which is the intersection of halfspaces.
- The projection of a convex set on some of its coordinates is convex. For example, the projections of a cube in $\mathbb{R}^3$ on the $x$-$y$ plane and on the $x$ axis are convex.
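  The figures below depict two different views.
- Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set, and $f : \mathbb{R}^n \to \mathbb{R}^m$ be an affine function, i.e., $f(x) = Ax + b$, with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then the image of $\mathcal{X}$ under $f$, i.e., $f(\mathcal{X}) = \{f(x) : x \in \mathcal{X}\}$, is convex.
[Figure: two views of a cube in $(x, y, z)$ coordinates and its projections.]

Example (image of $\mathcal{P}$ under an affine function)
[Figure: the polyhedron $\mathcal{P}$ and its image $f(\mathcal{P})$ in the $(x_1, x_2)$ plane.]
The figure depicts the polyhedron $\mathcal{P}$ and the polyhedron $f(\mathcal{P}) = \{f(x) : x \in \mathcal{P}\}$, where $f(x) = Ax + b$ with
\[ A = \begin{bmatrix} -0.5 & -0.9 \\ 1.7 & -0.5 \end{bmatrix}, \quad b = \begin{bmatrix} -0.7 \\ -1.2 \end{bmatrix}. \]
Note that $f$ maps the colored points of $\mathcal{P}$ to the points of the same color in $f(\mathcal{P})$.

Example (image of a unit ball under an affine function)
[Figure: the unit ball $\mathcal{B}$ and the ellipsoid $\mathcal{E}$ with center $x_c$ in the $(x_1, x_2)$ plane.]
An origin-centered unit ball in $\mathbb{R}^n$ is a convex set defined as
\[ \mathcal{B} = \{x : \|x\|_2 \le 1\}, \]
i.e., the set of points within Euclidean distance 1 of the origin.
Consider the affine function $f(x) = Ax + x_c$, with $A$ being a square and nonsingular matrix. The following set is an ellipsoid with center $x_c$ (which is a convex set as well):
\[ \mathcal{E} = \{f(x) : \|x\|_2 \le 1\}. \]

Separating hyperplanes
[Figure: two disjoint convex sets $\mathcal{B}$ and $\mathcal{E}$ separated by a hyperplane with normal $a$, with the halfspaces $\mathcal{H}_-$ and $\mathcal{H}_+$.]
If $\mathcal{B}$ and $\mathcal{E}$ are two convex subsets of $\mathbb{R}^n$ that do not intersect (i.e., $\mathcal{B} \cap \mathcal{E} = \emptyset$), then there exist a nonzero vector $a \in \mathbb{R}^n$ and a scalar $b$ such that
\[ a^T x \le b \ \text{for all } x \in \mathcal{E} \quad \text{and} \quad a^T x \ge b \ \text{for all } x \in \mathcal{B}. \]
If there exist a nonzero vector $a \in \mathbb{R}^n$ and a scalar $b$ such that
\[ a^T x < b \ \text{for all } x \in \mathcal{E} \quad \text{and} \quad a^T x > b \ \text{for all } x \in \mathcal{B}, \]
we say that the sets $\mathcal{B}$ and $\mathcal{E}$ are strictly separable.
In general, it turns out that two disjoint convex sets need not be strictly separable by a hyperplane [1], p. 49. For example, consider the sets $\mathcal{E} = \{x : x_2 \le 0\}$ and $\mathcal{B} = \{x : x \ge 0,\ x_1 x_2 \ge 1\}$ in $\mathbb{R}^2$.
In the special case when one of the (two disjoint) sets is a singleton (i.e., it contains only a single point, say $x_0$) and the other set is closed, $x_0$ is strictly separable from the other set [3], p. 170, [1], p. 49.

Supporting hyperplanes
Let $\mathcal{X} \subseteq \mathbb{R}^n$, and let $x_0$ be a point on the boundary of $\mathcal{X}$.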
If a nonzero vector $a$ satisfies $a^T x \le a^T x_0$ for all $x \in \mathcal{X}$, then the hyperplane
\[ \mathcal{H} = \{x : a^T x = a^T x_0\} \]
is called a supporting hyperplane to $\mathcal{X}$ at the point $x_0$ [1], p. 50.
Geometric interpretation
If $\mathcal{H}$ is a supporting hyperplane to $\mathcal{X}$ at $x_0$, then the halfspace $\mathcal{H}_-$ contains $\mathcal{X}$.
[Figure: a set $\mathcal{X}$ with a supporting hyperplane at $x_0$, and two further boundary points $x_1$ and $x_2$.]
Note that in the figure, no supporting hyperplane to $\mathcal{X}$ at $x_1$ exists. On the other hand, there are infinitely many supporting hyperplanes at the point $x_2$.
The supporting hyperplane theorem
For any nonempty convex set $\mathcal{P}$, and any $x_0$ on the boundary of $\mathcal{P}$, there exists a supporting hyperplane to $\mathcal{P}$ at $x_0$ (of course, it need not be unique).

The projection theorem [2], p. 704
Suppose that $\mathcal{X} \subseteq \mathbb{R}^n$ is a closed convex (nonempty) set.
- For every $v \in \mathbb{R}^n$, there exists a unique vector $x \in \mathcal{X}$ that minimizes $\|x - v\|_2$ over all $x \in \mathcal{X}$. This vector is called the projection of $v$ on $\mathcal{X}$.
- Given some $v \in \mathbb{R}^n$, a vector $x \in \mathcal{X}$ is equal to the projection of $v$ on $\mathcal{X}$ if and only if
\[ (y - x)^T (v - x) \le 0 \quad \text{for all } y \in \mathcal{X} \]
(see the figure).
Why do we need the assumption that $\mathcal{X}$ is a closed subset of $\mathbb{R}^n$?
[Figure: points $v$, $v_1$, $v_2$, $v_3$ and their projections on $\mathcal{X}$.]

Convex functions
Definition (Jensen's inequality)
A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if $\mathrm{dom}(f)$ is a convex set and if for all $x_1, x_2 \in \mathrm{dom}(f)$ and $\theta \in [0, 1]$, we have
\[ f((1 - \theta)x_1 + \theta x_2) \le (1 - \theta)f(x_1) + \theta f(x_2). \]
Geometric interpretation
If $f$ is a convex function, then the line segment between $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies on or above the graph of $f$.
[Figure: $f(x) = x^2 + 0.1$ with $\theta = 0.7$ and $x_3 = (1 - \theta)x_1 + \theta x_2$; the chord value $(1 - \theta)f(x_1) + \theta f(x_2)$ lies above $f(x_3)$.]

A function $f : \mathbb{R}^n \to \mathbb{R}$ is called strictly convex if $\mathrm{dom}(f)$ is a convex set and
\[ f((1 - \theta)x_1 + \theta x_2) < (1 - \theta)f(x_1) + \theta f(x_2) \]
for all $x_1, x_2 \in \mathrm{dom}(f)$ with $x_1 \ne x_2$ and all $\theta \in (0, 1)$.
A function $f$ is concave if $-f$ is convex.
An affine function is both convex and concave, hence it satisfies
\[ f((1 - \theta)x_1 + \theta x_2) = (1 - \theta)f(x_1) + \theta f(x_2) \]
for all $x_1, x_2 \in \mathrm{dom}(f)$ and $\theta \in (0, 1)$.
In fact, affine functions are the only functions that are both convex and concave [3], p. 15.
A function $f$ is convex if and only if it is convex when restricted to any line intersecting $\mathrm{dom}(f)$. For example, consider the strictly convex quadratic function depicted in the figure.

A function can be neither convex nor concave
[Figure: $f(x) = x^3 + 0.1$ on $[-1, 1]$, with two points $x_1$ and $x_2$.]

First-order (necessary and sufficient) condition
Assume that $f$ is differentiable everywhere in $\mathrm{dom}(f)$ (implication: $\mathrm{dom}(f)$ is open). $f$ is convex if and only if $\mathrm{dom}(f)$ is convex and
\[ f(x_2) \ge f(x_1) + \nabla f(x_1)^T (x_2 - x_1) \tag{2} \]
holds for all $x_1, x_2 \in \mathrm{dom}(f)$.
Since the right-hand side of the above inequality is the first-order Taylor-series expansion of $f(x)$ at the point $x_1$ in the direction $\Delta x = x_2 - x_1$, we can rewrite (2) as
\[ f(x_1 + \Delta x) \ge f(x_1) + \nabla f(x_1)^T \Delta x. \]
[Figure: the graph of $f$ and the line $f(x_1) + \nabla f(x_1)^T (x_2 - x_1)$, a global underestimator of $f$, with normal $(\nabla f(x_1), -1)$.]

Some important points to remember
- The inequality
\[ f(x_2) \ge f(x_1) + \nabla f(x_1)^T (x_2 - x_1) \]
  states that the first-order Taylor approximation of a convex function is always a global underestimator of the function.
- Important: this means that using only local information about a convex function (i.e., its gradient at $x_1$) we can derive global properties.
- Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.
- Inequality (2) shows that if $\nabla f(\tilde{x}) = 0$, then for all $x \in \mathrm{dom}(f)$, $f(x) \ge f(\tilde{x})$, i.e., $\tilde{x}$ is a global minimizer of the convex function $f$.
- If $f$ is a strictly convex function, and if $\nabla f(\tilde{x}) = 0$, then for all $x \in \mathrm{dom}(f)$ with $x \ne \tilde{x}$, $f(x) > f(\tilde{x})$, i.e., $\tilde{x}$ is the unique global minimizer of $f$.
- If $\mathrm{dom}(f)$ is a convex subset of $\mathbb{R}^n$ and $f$ is a convex function, then
  - a local minimum of $f$ is also a global minimum;
  - in addition, if $f$ is strictly convex, then there exists at most one global minimum.
- Jensen's inequality can be easily extended to convex combinations of more than two points, i.e.,
\[ f(\theta_1 x_1 + \dots + \theta_k x_k) \le \theta_1 f(x_1) + \dots + \theta_k f(x_k), \quad \theta_1, \dots, \theta_k \ge 0,\ \sum_{i=1}^{k} \theta_i = 1. \]

A local minimum of $f$ is also a global minimum - proof [2], p. 703
If $\mathrm{dom}(f)$ is a convex subset of $\mathbb{R}^n$ and $f$ is a convex function, then a local minimum of $f$ is also a global minimum.
This can be proved by contradiction. Suppose that $x^\star$ is a local minimizer of $f$ that is not a global minimizer. Then there must exist some $x \ne x^\star$ such that $f(x) < f(x^\star)$. Using Jensen's inequality we have
\[ f((1 - \theta)x^\star + \theta x) \le (1 - \theta)f(x^\star) + \theta f(x) < f(x^\star) \quad \text{for all } \theta \in (0, 1]. \]
For $\theta$ close to zero the point $(1 - \theta)x^\star + \theta x$ is arbitrarily close to $x^\star$ and has a smaller function value, which contradicts the assumption that $x^\star$ is a local minimizer.
In addition, if $f$ is strictly convex, then there exists at most one global minimum.
Again we use a proof by contradiction. Suppose that two distinct global minima $x_1^\star$ and $x_2^\star$ exist ($f(x_1^\star) = f(x_2^\star)$). Then their average
\[ \tfrac{1}{2} x_1^\star + \tfrac{1}{2} x_2^\star \]
must belong to $\mathrm{dom}(f)$ (since it is assumed to be convex). However, by the strict convexity of $f$ we have
\[ f\big(\tfrac{1}{2} x_1^\star + \tfrac{1}{2} x_2^\star\big) < \tfrac{1}{2} f(x_1^\star) + \tfrac{1}{2} f(x_2^\star) = f(x_1^\star) = f(x_2^\star). \]
But this is a contradiction, since $x_1^\star$ and $x_2^\star$ are assumed to be global minima.

Second-order (necessary and sufficient) conditions
Assume that $f$ is twice differentiable everywhere in $\mathrm{dom}(f)$ (i.e., $\mathrm{dom}(f)$ is open). $f$ is convex if and only if $\mathrm{dom}(f)$ is convex and its Hessian matrix $\nabla^2 f(x)$ is positive semidefinite for all $x \in \mathrm{dom}(f)$.
If $\nabla^2 f(x)$ is positive definite for all $x \in \mathrm{dom}(f)$, then $f$ is strictly convex. The converse is not true; for example, $f(x) = x^4$ is strictly convex but has a zero second derivative at $x = 0$. For more details see [2], p. 693.
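The remark about $f(x) = x^4$ can be explored numerically. The Python sketch below evaluates the gap in Jensen's inequality on a handful of sample points and compares it with the second derivative $12x^2$. The sample points, the values of $\theta$, and the helper names are arbitrary choices for this illustration, and a finite sample can only suggest strict convexity, not prove it.

```python
from typing import Callable


def f(x: float) -> float:
    """The function f(x) = x^4 from the remark above."""
    return x ** 4


def second_derivative(x: float) -> float:
    """Analytic second derivative of x^4, i.e. 12 x^2."""
    return 12.0 * x ** 2


def jensen_gap(func: Callable[[float], float], x1: float, x2: float,
               theta: float) -> float:
    """(1 - theta) f(x1) + theta f(x2) - f((1 - theta) x1 + theta x2)."""
    mix = (1.0 - theta) * x1 + theta * x2
    return (1.0 - theta) * func(x1) + theta * func(x2) - func(mix)


samples = [-2.0, -0.5, 0.0, 0.5, 1.5]
thetas = [0.1, 0.5, 0.9]

# Gaps for all pairs of distinct sample points and all sampled theta in (0, 1).
gaps = [jensen_gap(f, a, b, t)
        for a in samples for b in samples if a != b
        for t in thetas]

strict_on_samples = all(g > 0.0 for g in gaps)
print(strict_on_samples)       # strict inequality holds on every sampled pair
print(second_derivative(0.0))  # yet the second derivative vanishes at x = 0
```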
Example
A quadratic function $f : \mathbb{R}^n \to \mathbb{R}$ with $\mathrm{dom}(f) = \mathbb{R}^n$ given by
\[ f(x) = \tfrac{1}{2} x^T H x + x^T g, \]
with $H$ symmetric, is convex if and only if $\nabla^2 f = H$ is positive semidefinite, and is strictly convex if and only if $H$ is positive definite. In the latter case, there is a unique global minimizer given as the solution of $\nabla f(x) = Hx + g = 0$.

Examples of convex/concave functions
- Exponential. $f(x) = e^{px}$ is convex on $\mathbb{R}$, for any $p \in \mathbb{R}$.
- Powers. $f(x) = x^p$ is convex on $\mathbb{R}_+$ for $p \ge 1$ and concave for $p \in [0, 1]$.
- Powers of absolute value. $f(x) = |x|^p$ is convex on $\mathbb{R}$ for $p \ge 1$.
- Logarithm. $f(x) = \log(x)$ is concave on $\mathbb{R}_{++}$.
- Max function. $f(x) = \max\{x_1, \dots, x_n\}$ is convex on $\mathbb{R}^n$.
- Norms. Every norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is convex. For $\theta \in [0, 1]$ we have
\[ \|(1 - \theta)x_1 + \theta x_2\| \underbrace{\le}_{\text{triangle inequality}} \|(1 - \theta)x_1\| + \|\theta x_2\| = (1 - \theta)\|x_1\| + \theta\|x_2\|. \]
- Affine function. $f(x) = a^T x + b$, with $a \in \mathbb{R}^n$, is both convex and concave.
- Quadratic function. $f(x) = \tfrac{1}{2} x^T H x + x^T g$, with $H \in \mathbb{S}^n_+$, is convex.
Examples of nonconvex functions
- $f(x) = x_1^2 - x_2^2$
- $f(x) = x_1 x_2$
Are the following functions convex?
- $f(x) = x_1^2 + x_2^2 - x_1 x_2$
- $f(x) = x_1^2 + x_2^2 + 5 x_1 x_2$

Epigraph of a function
The graph of a function $f : \mathbb{R}^n \to \mathbb{R}$ is defined as
\[ \{(x, f(x)) : x \in \mathrm{dom}(f)\} \subseteq \mathbb{R}^{n+1}. \]
The epigraph of a function $f : \mathbb{R}^n \to \mathbb{R}$ is defined as
\[ \mathrm{epi}(f) = \{(x, t) : t \ge f(x),\ x \in \mathrm{dom}(f)\} \subseteq \mathbb{R}^{n+1}. \]
"Epi" means "above", so $\mathrm{epi}(f)$ is the set of points lying on or above the graph of $f$.
[Figure: the epigraph $\mathrm{epi}(f)$ above the graph point $(x, f(x))$, with $\mathrm{dom}(f)$ on the horizontal axis.]

Convex functions and sets
Link between convex functions and convex sets
A function is convex if and only if its epigraph is a convex set.
[Figure: $\mathrm{epi}(f)$ with the supporting hyperplane at $(x_1, f(x_1))$, its normal $(\nabla f(x_1), -1)$, and a point $(x_2, t)$ in the epigraph.]
Let $f$ be convex and differentiable. Since in the definition of the epigraph $t \ge f(x)$ for any $x \in \mathrm{dom}(f)$, the first-order condition gives
\[ t \ge f(x_2) \ge f(x_1) + \nabla f(x_1)^T (x_2 - x_1) \]
for any $x_1, x_2 \in \mathrm{dom}(f)$.
The above relation can be expressed as follows:
\[ (x_2, t) \in \mathrm{epi}(f) \ \Rightarrow\ \begin{bmatrix} \nabla f(x_1) \\ -1 \end{bmatrix}^T \left( \begin{bmatrix} x_2 \\ t \end{bmatrix} - \begin{bmatrix} x_1 \\ f(x_1) \end{bmatrix} \right) \le 0, \]
clearly showing that the hyperplane with normal $(\nabla f(x_1), -1)$ supports $\mathrm{epi}(f)$ at $(x_1, f(x_1))$.

Why assume that $\mathrm{dom}(f)$ is convex?
[Figure: two epigraphs $\mathrm{epi}(f)$ drawn over their domains $\mathrm{dom}(f)$.]

Sublevel sets of a convex function
Sublevel sets
The sublevel set of a function $f : \mathbb{R}^n \to \mathbb{R}$ corresponding to a real value $c$ is the set of points
\[ \{x \in \mathrm{dom}(f) : f(x) \le c\}. \]
Superlevel sets
The superlevel set of a function $f : \mathbb{R}^n \to \mathbb{R}$ corresponding to a real value $c$ is the set of points
\[ \{x \in \mathrm{dom}(f) : f(x) \ge c\}. \]
Convex functions
The sublevel sets of a convex function are convex. The converse is not true: even if a function has all its sublevel sets convex, it need not be convex. For example, consider $f(x) = -e^x$, which is a strictly concave function; nevertheless, all its sublevel sets are convex (in fact, they are rays for $c < 0$ and all of $\mathbb{R}$ for $c \ge 0$).
Concave functions
The superlevel sets of a concave function are convex.

Verifying convexity of a function
We can verify that a given function $f$ is convex by
- using the definition (Jensen's inequality),
- using the first-order condition (if $f$ is differentiable),
- using the second-order condition (if $f$ is twice differentiable),
- restricting the function to a line. Recall that $f$ is convex $\Leftrightarrow$ $f(x_0 + t\Delta x)$ is convex in $t \in \mathbb{R}$ for all $x_0$ and $\Delta x$,
- showing that $f$ is obtained through operations preserving convexity.
Example of simple operations
- A nonnegative multiple of a convex function is convex: $f$ is convex, $\theta \ge 0$ $\Rightarrow$ $\theta f$ is convex.
- The sum of convex functions is convex: $f_1, f_2$ are convex $\Rightarrow$ $f_1 + f_2$ is convex.
- Nonnegative weighted sum (i.e., conic combination): $f_1, \dots, f_k$ are convex, $\theta_1, \dots, \theta_k \ge 0$ $\Rightarrow$ $\theta_1 f_1 + \dots + \theta_k f_k$ is convex.

Example (sum of convex functions)
[Figure: $f_1(x) = x^2$, $f_2(x) = x^2 + x$, and their sum $f_1 + f_2$.]

Pointwise maximum
[Figure: $f_1(x) = x^2$, $f_2(x) = x^2 + x$, and $\mathrm{epi}(\max\{f_1(x), f_2(x)\})$.]
If $f_1$ and $f_2$ are convex functions, their pointwise maximum $f$ defined by
\[ f(x) = \max\{f_1(x), f_2(x)\}, \]
with $\mathrm{dom}(f) = \mathrm{dom}(f_1) \cap \mathrm{dom}(f_2)$, is convex.
The epigraph of $f$ is given by the intersection of the epigraphs of $f_1$ and $f_2$, i.e.,
\[ \mathrm{epi}(f) = \mathrm{epi}(f_1) \cap \mathrm{epi}(f_2). \]
This property extends to the pointwise maximum of $k$ convex functions [1], p. 80.

Example (pointwise maximum of affine functions)
[Figure: three affine functions $a_1^T x + b_1$, $a_2^T x + b_2$, $a_3^T x + b_3$, together with $\max_{i=1,\dots,m}(a_i^T x + b_i)$ and $\min_{i=1,\dots,m}(a_i^T x + b_i)$.]
The function $f(x) = \max\{a_1^T x + b_1, \dots, a_m^T x + b_m\}$ is convex and $\mathrm{epi}(f)$ is a polyhedron. $f(x)$ is called a piecewise linear convex function.
The function $\min\{a_1^T x + b_1, \dots, a_m^T x + b_m\}$ is concave. The set of points lying on or below its graph (i.e., its hypograph) is a polyhedron.

Distance to a convex set
Consider again the projection of $v \in \mathbb{R}^n$ on a closed convex (nonempty) set $\mathcal{X}$. Let us denote the operator projecting $v$ on $\mathcal{X}$ by $x = \mathrm{proj}_{\mathcal{X}}(v)$. The following distance function is convex [4], p. 67 (see [1], p. 88 for a more general case):
\[ \mathrm{dist}_{\mathcal{X}}(v) = \|v - \mathrm{proj}_{\mathcal{X}}(v)\|_2. \]
This can be demonstrated as follows. Let $v_1, v_2 \in \mathbb{R}^n$ and $\theta \in [0, 1]$, then
\[ \begin{aligned}
\mathrm{dist}_{\mathcal{X}}((1 - \theta)v_1 + \theta v_2) &= \|(1 - \theta)v_1 + \theta v_2 - \mathrm{proj}_{\mathcal{X}}((1 - \theta)v_1 + \theta v_2)\|_2 \\
&\le \|(1 - \theta)v_1 + \theta v_2 - (1 - \theta)\mathrm{proj}_{\mathcal{X}}(v_1) - \theta\,\mathrm{proj}_{\mathcal{X}}(v_2)\|_2 && \text{(see figure)} \\
&\le (1 - \theta)\|v_1 - \mathrm{proj}_{\mathcal{X}}(v_1)\|_2 + \theta\|v_2 - \mathrm{proj}_{\mathcal{X}}(v_2)\|_2 && \text{(triangle inequality)} \\
&= (1 - \theta)\,\mathrm{dist}_{\mathcal{X}}(v_1) + \theta\,\mathrm{dist}_{\mathcal{X}}(v_2).
\end{aligned} \]
The first inequality holds because $(1 - \theta)\mathrm{proj}_{\mathcal{X}}(v_1) + \theta\,\mathrm{proj}_{\mathcal{X}}(v_2)$ belongs to the convex set $\mathcal{X}$, and the projection is the point of $\mathcal{X}$ closest to $(1 - \theta)v_1 + \theta v_2$.
[Figure: points $v_1$, $v_2$, $v_3$ and their projections $x_1$, $x_2$, $x_3$ on the set.]

Composition with an affine mapping
Let $g : \mathbb{R}^n \to \mathbb{R}^m$ be an affine function, i.e., $g(x) = Ax + b$ for some $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, and let $f : \mathbb{R}^m \to \mathbb{R}$. The composition of the two functions
\[ f(g(x)) = f(Ax + b) \]
has the domain $\{x : Ax + b \in \mathrm{dom}(f)\}$.
Then,
- $f$ is convex $\Rightarrow$ $f(g(x))$ is convex,
- $f$ is concave $\Rightarrow$ $f(g(x))$ is concave.
Example
Consider the logarithm function $\log : \mathbb{R} \to \mathbb{R}$ with domain $\mathrm{dom}(\log) = \mathbb{R}_{++}$. This is a concave function. Let $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Then,
- the function $\log(b - a^T x)$ is concave with domain $\{x : a^T x < b\}$,
- the function $-\log(b - a^T x)$ is convex,
- the function $\sum_{i=1}^{k} -\log(b_i - a_i^T x)$ is convex because it is a sum of $k$ convex functions.
There are other operations that preserve convexity and many other interesting examples of convex functions; for more details see [1].

[1] S. Boyd and L. Vandenberghe, "Convex Optimization," Cambridge, 2004.
[2] D. P. Bertsekas, "Nonlinear Programming," Athena Scientific, (3rd print) 2008.
[3] D. Bertsimas and J. N. Tsitsiklis, "Introduction to Linear Optimization," Athena Scientific, 1997.
[4] N. Andréasson, A. Evgrafov, and M. Patriksson, "An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms," 2005.