Convex sets and functions

Dimitar Dimitrov
Örebro University
May, 2011
Topics addressed in this material
Convex sets
Convex functions
The presentation is mainly based on [1], Chapters 2 and 3.
Lines and line segments
[Figure: the line through x1 and x2 in the (x, y) plane, with the points corresponding to θ = −0.2, θ = 0 (x1), θ = 0.5, θ = 1 (x2) and θ = 1.3 marked.]
Given two distinct points x1 , x2 ∈ Rn , any point x on the line passing through x1
and x2 can be expressed as
x = (1 − θ)x1 + θx2 ,
for some θ ∈ R.
θ = 0 corresponds to x1 and θ = 1 corresponds to x2 . Values of θ between 0 and 1
correspond to points in the (closed) line segment between x1 and x2 .
[Figure: the same line, drawn from x1 (θ = 0) along the direction ∆x, with the points for θ = −0.2, 0.5, 1 and 1.3 marked.]
Alternatively, we can represent any point on the line passing through x1 and parallel
to ∆x as
x = x1 + θ∆x, with ∆x = x2 − x1,
which is clearly equivalent to x = (1 − θ)x1 + θx2.
Affine combination
Note that when a line ℓ is defined using
x = (1 − θ)x1 + θx2 ,
for some θ ∈ R, implicit in the definition is that the coefficients in the linear
combination of x1 and x2 sum to one, i.e., θ + (1 − θ) = 1. This is called an affine
combination of the two vectors, and the line ℓ is called an affine set (because it
contains every affine combination of two points in it). Hence, we can define all points
on a line ℓ using
x = θ1 x1 + θ2 x2 ,
θ1 + θ2 = 1.
If the constraint θ1 + θ2 = 1 is not imposed, then θ1 x1 + θ2 x2 is simply a linear
combination of x1 and x2 (which can generate any point on the plane R2 with a
proper choice of θ1 and θ2 , provided that the two vectors are linearly independent).
Recall that
k vectors x1 , . . . , xk ∈ Rn are linearly independent if
θ1 x1 + · · · + θk xk = 0,
only if θ1 = · · · = θk = 0,
i.e., no vector can be expressed as a linear combination of the others.
In general, a point
x = θ1 x1 + · · · + θk xk,
where θ1 + · · · + θk = 1, is called an affine combination of the points x1, . . . , xk.
Affine sets
A set X ⊆ Rn is affine if the line through any two distinct points in X lies in X .
Moreover, if X is an affine set, it contains all affine combinations of its points, i.e., if
x1, . . . , xk ∈ X and θ1 + · · · + θk = 1, then
θ1 x1 + · · · + θk xk ∈ X.
Let S be a subspace of R^n and let x ∈ R^n. Then the set
X = {s + x : s ∈ S}
is an affine set. In particular, taking x = 0 shows that every subspace of R^n is an affine set as well. Conversely, if X is an affine set and x0 ∈ X, then the set
S = {x − x0 : x ∈ X}
is a subspace.
Recall that
if V is a vector space and S ⊆ V is a subset of V (i.e., S contains some of the vectors in V), then S is a subspace of V if:
0 ∈ S (S contains the “zero” element),
S is closed under addition and scalar multiplication, i.e., for any s1, s2 ∈ S and θ1, θ2 ∈ R, we have that θ1 s1 + θ2 s2 ∈ S.
Example (affine set)
[Figure: the affine set {x : Ax = b} in the (x1, x2) plane for A = [1 2] and b = 4, i.e., the line x1 + 2x2 = 4, together with the parallel subspace N(A) of R^2 through the origin.]
The solution set of a system of linear equations X = {x : Ax = b}, where A ∈ Rm×n
and b ∈ Rm , is an affine set. The subspace “associated with” X is the nullspace of A.
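As a small numerical sketch of this statement (not part of the original slides), consider the system x1 + 2x2 = 4. Any solution can be written as a particular solution plus an element of the nullspace of A, and affine combinations of solutions remain solutions. The names `x_p`, `N` and the sample coefficients are illustrative assumptions.

```python
import numpy as np

A = np.array([[1.0, 2.0]])   # illustrative 1x2 system: x1 + 2*x2 = 4
b = np.array([4.0])

# A particular solution (the minimum-norm least-squares solution).
x_p, *_ = np.linalg.lstsq(A, b, rcond=None)

# Basis of the nullspace N(A): right singular vectors beyond rank(A).
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
N = Vt[rank:].T              # columns span N(A)

# Every point x_p + N z solves Ax = b ...
x1 = x_p + N @ np.array([3.7])
x2 = x_p + N @ np.array([-1.2])

# ... and so does any affine combination of two solutions (theta is any real).
theta = 2.5
x_aff = (1 - theta) * x1 + theta * x2
print(np.allclose(A @ x1, b), np.allclose(A @ x_aff, b))
```

The nullspace basis is taken from the SVD here only because it needs nothing beyond NumPy; any other basis of N(A) would serve equally well.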
Convex sets
A set X is convex if the line segment between any two points in X lies in X , i.e., if
for any x1 , x2 ∈ X and any θ ∈ [0, 1], we have
(1 − θ)x1 + θx2 ∈ X .
[Figure: examples of three sets; only the first one is convex.]
Is the set {0, 1, 2, . . . } convex?
Convex combination
A point x of the form
x = θ1 x1 + · · · + θk xk,
where θ1 + · · · + θk = 1 and θi ≥ 0, i = 1, . . . , k, is called a convex combination of the points x1, . . . , xk.
A set is convex if and only if it contains every convex combination of its points.
Convex hulls
The convex hull of a set X , denoted conv(X ), is the set of all convex combinations of
points in X , i.e.,
conv(X) = {θ1 x1 + · · · + θk xk : xi ∈ X, θi ≥ 0, i = 1, . . . , k, θ1 + · · · + θk = 1}.
The convex hull of a set X is the smallest convex set that contains X.
[Figure: a set X of 11 points (black circles) and its convex hull; the points x1, x2, x3 and x4 are elements of X.]
The figure depicts the convex hull of a set X containing 11 points (black circles). The convex hull conv(X) has infinitely many points, since it contains all convex combinations of the elements of X. All convex combinations of the two points x1 and x2 are depicted with a dashed line.
The point depicted in red is not a convex combination of any two elements of X, but it is a convex combination of x1, x3, x4:
red point = 0.1x1 + 0.6x3 + 0.3x4.
Cones
A set X is a cone if for every x ∈ X , and θ ≥ 0 we have θx ∈ X . If in addition for
any x1 , x2 ∈ X , and θ1 , θ2 ≥ 0 we have
θ1 x1 + θ2 x2 ∈ X ,
then the set X is called a convex cone. A point of the form θ1 x1 + · · · + θk xk , with
θ1 , . . . , θk ≥ 0 is called a conic combination (or nonnegative linear combination) of
x1 , . . . , xk .
Examples of a convex (left) and non-convex (right) cone. The latter one is defined as
X := {x ∈ R2 : x1 ≥ 0, x2 ≥ 0, x1 x2 ≤ 0}.
Example (positive semidefinite cone) [1], pp. 35
[Figure: the boundary of the positive semidefinite cone in S^2, plotted as (x, y, z) in R^3.]
The figure depicts the boundary of a positive semidefinite cone in S^2 plotted as (x, y, z) in R^3:
A = [x y; y z] ∈ S^2_+ ⇔ x, z ≥ 0, xz ≥ y^2.
Consider the set of positive semidefinite symmetric matrices
S^n_+ = {A ∈ S^n : A is positive semidefinite}.
The set S^n_+ is a convex cone, since if θ1, θ2 ≥ 0 and A, B ∈ S^n_+, then θ1 A + θ2 B ∈ S^n_+. This follows directly from the properties of positive semidefinite matrices. Let x ∈ R^n, then
x^T (θ1 A + θ2 B)x = θ1 x^T Ax + θ2 x^T Bx ≥ 0.
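The cone property can be checked numerically. The sketch below is an illustration, not part of the original slides: the helper names `random_psd` and `is_psd`, the weights, and the 2×2 entries are all assumptions chosen for the example. It forms a conic combination of two random positive semidefinite matrices and tests the result through its smallest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Return a random symmetric positive semidefinite n-by-n matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T           # M M^T is always positive semidefinite

def is_psd(A, tol=1e-10):
    """Check positive semidefiniteness via the smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

A, B = random_psd(3), random_psd(3)
theta1, theta2 = 0.3, 2.0    # arbitrary nonnegative weights
C = theta1 * A + theta2 * B  # conic combination of A and B

# 2x2 characterization: x, z >= 0 and x*z >= y^2 (boundary case here).
x, y, z = 1.0, 0.5, 0.25
S = np.array([[x, y], [y, z]])
print(is_psd(C), is_psd(S), x * z >= y**2)
```

A small tolerance is used in `is_psd` because eigenvalues of a matrix on the boundary of the cone may come out as tiny negative numbers in floating point.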
Example (norm cones)
Recall that
a function ‖·‖ : R^n → R is called a norm if the following conditions are satisfied:
‖x‖ ≥ 0 for all x ∈ R^n,
‖x‖ = 0 if and only if x = 0,
‖αx‖ = |α|‖x‖, for all x ∈ R^n, α ∈ R (homogeneity),
‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n (triangle inequality).
A norm is a measure of the length of a vector. The distance between two vectors x and y can be measured as the norm of their difference ‖x − y‖.
Some examples of norms
ℓ2 norm (Euclidean norm)
‖x‖2 = (x1^2 + · · · + xn^2)^(1/2) = √(x^T x)
ℓ1 norm
‖x‖1 = |x1| + · · · + |xn|
ℓ∞ norm
‖x‖∞ = max{|x1|, · · · , |xn|}
The set
X = {(x, t) : ‖x‖ ≤ t} ⊆ R^(n+1)
is a cone associated with the norm ‖·‖.
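For concreteness, the three norms and membership in a norm cone can be evaluated with NumPy. This is a minimal sketch; the vector, the helper name `in_norm_cone` and the scaling factor are illustrative assumptions rather than material from the slides.

```python
import numpy as np

x = np.array([3.0, -4.0])
l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7
l2 = np.linalg.norm(x, 2)         # sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4

def in_norm_cone(x, t, p=2):
    """True if (x, t) lies in the cone {(x, t) : ||x||_p <= t}."""
    return bool(np.linalg.norm(x, p) <= t)

# Scaling a member of the cone by theta >= 0 keeps it in the cone.
theta = 2.5
print(l1, l2, linf)
print(in_norm_cone(x, 5.0), in_norm_cone(theta * x, theta * 5.0))
```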
Norm cones associated with ℓ1 , ℓ2 and ℓ∞ norms
[Figure: the norm cones associated with the ℓ1, ℓ2 and ℓ∞ norms, plotted in (x, y, z), and their cross-sections at t = 1 in the (x, y) plane (the ℓ1, ℓ2, ℓ5 and ℓ∞ unit balls).]
Hulls - summary
The convex hull of x1, · · · , xk is defined as
{θ1 x1 + · · · + θk xk | θ1, . . . , θk ≥ 0, θ1 + · · · + θk = 1},
i.e., the set of all convex combinations of {xi}.
The affine hull of x1, · · · , xk is defined as
{θ1 x1 + · · · + θk xk | θ1, . . . , θk ∈ R, θ1 + · · · + θk = 1},
i.e., the set of all affine combinations of {xi}. The affine hull of a set of vectors {xi} is the smallest affine set that contains {xi}.
The conic hull of x1, · · · , xk is defined as
{θ1 x1 + · · · + θk xk | θ1, . . . , θk ≥ 0},
i.e., the set of all conic combinations of {xi}.
For example, the affine hull of three (or more) distinct points in R^2 not all lying on the same line is R^2 itself. The convex hull of three such points is the triangle whose vertices are the points themselves.
Hyperplanes
A hyperplane H is a set of the form
H = {x : a^T x = b},
where a ∈ R^n (a ≠ 0) and b ∈ R are given constants. In words, H is the set of all points whose inner product with a is equal to b. Furthermore, since H is the solution set of a^T x = b, it is an affine set and has an associated subspace
S = {x − x0 : x ∈ H},
where x0 is an arbitrary point from H.
Let x0 ∈ H, then we can define the set of all points that belong to H as
H = {x : a^T (x − x0) = 0}.
This is because x0 ∈ H implies a^T x0 = b. From this definition we can conclude that any vector of the form x − x0 with x ∈ H is orthogonal to a; in other words, a is orthogonal to S. Commonly, we say that a is the normal to the hyperplane H.
Halfspaces
A hyperplane H divides Rn into two halfspaces
H+ = {x : aT x ≥ b}
H− = {x : aT x ≤ b}
Both H+ and H− are closed and unbounded sets. The set {x : aT x < b} is the
interior of H− and is called an open halfspace (likewise for {x : aT x > b}).
Let x0 ∈ H. Alternatively, a halfspace can be defined as
H+ = {x : a^T (x − x0) ≥ 0}, i.e., all vectors x such that ∠(a, x − x0) ≤ π/2
H− = {x : a^T (x − x0) ≤ 0}, i.e., all vectors x such that ∠(a, x − x0) ≥ π/2
[Figure: the hyperplane a^T x = b through x0 with normal a, the halfspaces a^T x ≥ b and a^T x ≤ b, and two points x1 and x2.]
Polyhedra
A polyhedron is the solution set of a finite number of linear equalities and inequalities
P = {x ∈ Rn : Ax ≤ b, Cx = d},
where A ∈ R^(mi×n) and C ∈ R^(me×n). In other words, a polyhedron is the intersection of a finite number of halfspaces and hyperplanes. The figure shows an example of
P = {x ∈ R^2 : Ax ≤ b}.
[Figure: a polyhedron P in the (x1, x2) plane, together with the matrix A and the vector b of the inequalities Ax ≤ b that define it.]
Example (polyhedron)
The polyhedron in the figure is defined as the intersection of three halfspaces and one
hyperplane
P = {(x1 , x2 , x3 ) : x1 , x2 , x3 ≥ 0, x1 + x2 + x3 = 1}.
Each of the points v1 = (1, 0, 0), v2 = (0, 1, 0), v3 = (0, 0, 1) is called a vertex of P. P is a special type of polyhedron called a simplex (in fact, this particular simplex is called a probability simplex). Note that the hyperplane removes one “degree of freedom” from the choice of x.
[Figure: the probability simplex in (x1, x2, x3) coordinates with vertices v1, v2 and v3.]
Polyhedron in standard form
A polyhedron described as
P = {x : Ax ≤ b, Cx = d}
can always be represented as
P = {x̃ : C̃ x̃ = d̃, x̃ ≥ 0}.    (1)
We say that a polyhedron is in standard form if it is represented in the form (1).
For example, consider P = {x ∈ R^n : Ax ≤ b}.
First, we introduce a vector of (nonnegative) variables s (called slack variables) to obtain the following equivalent representation of P
P = {(x, s) : Ax + s = b, s ≥ 0}.
The above representation is still not in standard form, since there is no non-negativity constraint for x. Next, we note that we can represent any vector x as x = v − w, for some v, w ≥ 0. Introducing v and w leads to
P = {(v, w, s) : A(v − w) + s = b, (v, w, s) ≥ 0},
hence, x̃ = (v, w, s), C̃ = [A −A I], and d̃ = b.
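The conversion to standard form can be sketched in a few lines. The data A and b and the helper `to_standard` below are illustrative assumptions, not taken from the slides; the split of x into positive and negative parts is one possible choice of v and w among many.

```python
import numpy as np

# Illustrative data (not from the slides): P = {x in R^2 : A x <= b}.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 1.0, 1.0])
m, n = A.shape

# Standard-form data: C_tilde = [A, -A, I], d_tilde = b.
C_tilde = np.hstack([A, -A, np.eye(m)])
d_tilde = b

def to_standard(x):
    """Map a point x with A x <= b to x_tilde = (v, w, s) >= 0."""
    v = np.maximum(x, 0.0)    # positive part of x
    w = np.maximum(-x, 0.0)   # negative part of x, so that x = v - w
    s = b - A @ x             # slack, nonnegative whenever A x <= b
    return np.concatenate([v, w, s])

x = np.array([-0.5, 1.5])     # a point of P with a negative coordinate
x_tilde = to_standard(x)
print(np.allclose(C_tilde @ x_tilde, d_tilde), bool(np.all(x_tilde >= 0)))
```

The original point is recovered from the standard-form variables as x = v − w, i.e., `x_tilde[:n] - x_tilde[n:2*n]`.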
Vertices
There are multiple ways to define what is a vertex of a polyhedron. The definition
below is purely geometric i.e., it does not depend on the specific representation of the
polyhedron in terms of linear constraints.
[3], pp. 46
Let P be a polyhedron. A point v ∈ P is a vertex (extreme point) of P if there are no
two points x1 , x2 ∈ P, both different from v, and a scalar θ ∈ [0, 1], such that
v = (1 − θ)x1 + θx2 .
[Figure: a polyhedron P with a vertex v and the points x1, x2, x3, x4 and w.]
The point w on the figure is not a vertex of P because it is a convex combination of the points x3 and x4. Note that not every polyhedron has a vertex (for example, a halfspace in R^n does not have a vertex).
A nonempty and bounded polyhedron is the convex hull of its vertices [3], pp. 68.
Operations that preserve convexity of sets
There is a variety of operations that preserve convexity of sets. We outline the
following three:
The intersection of convex sets is convex. As an example, consider a polyhedron, which is the intersection of halfspaces and hyperplanes.
The projection of a convex set on some of its coordinates is convex. For example, the projections of a cube in R^3 on the x-y plane and on the x axis are convex. The figures below depict two different views.
Let X ⊆ R^n be a convex set, and f : R^n → R^m be an affine function, i.e., f(x) = Ax + b, with A ∈ R^(m×n) and b ∈ R^m. Then, the image of X under f, i.e., f(X) = {f(x) : x ∈ X}, is convex.
[Figure: two views of a cube in (x, y, z) coordinates and its projections.]
Example (image of P under an affine function)
[Figure: the polyhedron P and its image f(P) in the (x1, x2) plane.]
The figure depicts the polyhedron P and the polyhedron f(P) = {f(x) : x ∈ P}, where f(x) = Ax + b, with
A = [−0.5 −0.9; 1.7 −0.5], b = (−0.7, −1.2).
Note that f maps the colored points of P to the colored points of f(P).
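The reason an affine map preserves convexity is that it maps a convex combination of two points to the same convex combination of their images. The following sketch checks this numerically; the entries of A and b and the two points are illustrative assumptions.

```python
import numpy as np

A = np.array([[-0.5, -0.9], [1.7, -0.5]])   # illustrative values
b = np.array([-0.7, -1.2])

def f(x):
    """Affine map f(x) = A x + b."""
    return A @ x + b

# Two points of a convex set and a convex combination of them.
x1, x2 = np.array([1.0, 3.0]), np.array([2.0, 0.5])
theta = 0.3
x = (1 - theta) * x1 + theta * x2

# f maps the combination to the same combination of the images,
# so segments in X are mapped onto segments in f(X).
lhs = f(x)
rhs = (1 - theta) * f(x1) + theta * f(x2)
print(np.allclose(lhs, rhs))
```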
Example (image of a unit ball under an affine function)
[Figure: the unit ball B and the ellipsoid E = f(B) with center xc in the (x1, x2) plane.]
An origin-centered unit ball in R^n is a convex set defined as
B = {x : ‖x‖2 ≤ 1},
i.e., the set of points within Euclidean distance 1 from the origin. Consider the affine function f(x) = Ax + xc, with A being a square and non-singular matrix. The following set is an ellipsoid with center xc (which is a convex set as well):
E = {f(x) : ‖x‖2 ≤ 1}.
Separating hyperplanes
If B and E are two convex subsets of R^n that do not intersect (i.e., B ∩ E = ∅), then there exists a nonzero vector a ∈ R^n and a scalar b such that
a^T x ≤ b for all x ∈ E and
a^T x ≥ b for all x ∈ B.
[Figure: two disjoint convex sets B and E separated by a hyperplane with normal a, with E in H− and B in H+.]
If there exists a nonzero vector a ∈ R^n and a scalar b such that
a^T x < b for all x ∈ E and
a^T x > b for all x ∈ B,
we say that the sets B and E are strictly separable. In general, two disjoint convex sets need not be strictly separable by a hyperplane [1], pp. 49. For example, consider the sets E = {x : x2 ≤ 0} and B = {x : x ≥ 0, x1 x2 ≥ 1} in R^2.
In the special case when one of the two disjoint sets is a singleton (i.e., it contains only a single point, say x0) and the other set is closed, x0 is strictly separable from the other set [3], pp. 170, [1], pp. 49.
Supporting hyperplanes
Let X ⊆ R^n and let x0 be a point on the boundary of X. If a nonzero vector a satisfies a^T x ≤ a^T x0 for all x ∈ X, then the hyperplane H = {x : a^T x = a^T x0} is called a supporting hyperplane to X at the point x0 [1], pp. 50.
Geometric interpretation
If H is a supporting hyperplane to X at x0, then the halfspace H− contains X.
[Figure: a set X with boundary points x0, x1 and x2, and a supporting hyperplane at x0 with X contained in H−.]
Note that in the figure, no supporting hyperplane to X at x1 exists. On the other hand, there are infinitely many supporting hyperplanes at the point x2.
The supporting hyperplane theorem
For any nonempty convex set P and any x0 on the boundary of P, there exists a supporting hyperplane to P at x0 (it need not be unique).
The projection theorem [2], pp. 704
Suppose that X ⊆ Rn is a closed convex (nonempty) set.
For every v ∈ R^n, there exists a unique vector x ∈ X that minimizes ‖y − v‖2 over all y ∈ X. This vector is called the projection of v on X.
Given some v ∈ R^n, a vector x ∈ X is equal to the projection of v on X if and only if
(y − x)^T (v − x) ≤ 0, for all y ∈ X (see the figure).
Why do we need the assumption that X is a closed subset of R^n?
[Figure: points v, v1, v2, v3 outside the set X, their projections x, x1, x2, x3 on X, and a point y ∈ X.]
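The characterization of the projection can be illustrated on a set whose projection is known in closed form. The sketch below assumes that X is the Euclidean unit ball (a choice made only for this example, as are the helper name `project_ball` and the sample points) and checks the inequality (y − x)^T (v − x) ≤ 0 on randomly sampled points y of X.

```python
import numpy as np

def project_ball(v):
    """Euclidean projection of v onto the unit ball {x : ||x||_2 <= 1}."""
    nv = np.linalg.norm(v)
    return v if nv <= 1.0 else v / nv

rng = np.random.default_rng(1)
v = np.array([2.0, 1.0])      # a point outside the ball
x = project_ball(v)           # its projection lies on the boundary

# Sample points y of the ball (points outside are scaled back onto it)
# and evaluate (y - x)^T (v - x) for each of them.
Y = rng.standard_normal((1000, 2))
Y /= np.maximum(1.0, np.linalg.norm(Y, axis=1, keepdims=True))
inner = (Y - x) @ (v - x)
print(bool(inner.max() <= 1e-12))
```

Sampling cannot prove the inequality for all y ∈ X; it only fails to find a counterexample, which is what the theorem predicts.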
Convex functions
Definition (Jensen’s inequality)
A function f : Rn → R is convex if dom(f ) is a convex set and if for all
x1 , x2 ∈ dom(f ), and θ ∈ [0, 1], we have
f ((1 − θ)x1 + θx2 ) ≤ (1 − θ)f (x1 ) + θf (x2 ).
Geometric interpretation
If f is a convex function, then the line segment between (x1, f(x1)) and (x2, f(x2)) lies on or above the graph of f.
[Figure: the graph of f(x) = x^2 + 0.1 on [−1, 1] and the chord between (x1, f(x1)) and (x2, f(x2)). For θ = 0.7, the point x3 = (1 − θ)x1 + θx2 satisfies f(x3) ≤ (1 − θ)f(x1) + θf(x2).]
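Jensen's inequality is easy to check numerically for a specific function. The sketch below uses f(x) = x^2 + 0.1; the endpoints x1 and x2 and the grid of θ values are illustrative assumptions.

```python
import numpy as np

def f(x):
    """A convex function, f(x) = x^2 + 0.1."""
    return x**2 + 0.1

x1, x2 = -0.8, 0.9                            # illustrative endpoints
thetas = np.linspace(0.0, 1.0, 101)

lhs = f((1 - thetas) * x1 + thetas * x2)      # f along the segment
rhs = (1 - thetas) * f(x1) + thetas * f(x2)   # values on the chord
print(bool(np.all(lhs <= rhs + 1e-12)))
```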
A function f : Rn → R is called strictly convex if dom(f ) is a convex set and
f ((1 − θ)x1 + θx2 ) < (1 − θ)f (x1 ) + θf (x2 )
for all x1 , x2 ∈ dom(f ) and θ ∈ (0, 1).
A function f is concave if −f is convex. An affine function is both convex and
concave, hence it satisfies
f ((1 − θ)x1 + θx2 ) = (1 − θ)f (x1 ) + θf (x2 )
for all x1 , x2 ∈ dom(f ) and θ ∈ (0, 1). In fact, affine functions are the only functions
that are both convex and concave [3], pp. 15.
A function f is convex if and only if it is
convex when restricted to any line
intersecting dom(f ). For example, consider
the strictly convex quadratic function
depicted on the figure.
A function can be neither convex nor concave
[Figure: the graph of f(x) = x^3 + 0.1 on [−1, 1], with two points x1 and x2 marked.]
First-order (necessary and sufficient) condition
Assume that f is differentiable everywhere in dom(f) (implication: dom(f) is open).
f is convex if and only if dom(f) is convex and
f(x2) ≥ f(x1) + ∇f(x1)^T (x2 − x1)    (2)
holds for all x1, x2 ∈ dom(f). Since the RHS of the above inequality is the first-order Taylor-series expansion of f at the point x1 in the direction ∆x = x2 − x1, we can rewrite (2) as
f(x1 + ∆x) ≥ f(x1) + ∇f(x1)^T ∆x.
[Figure: the graph of f and its tangent f(x1) + ∇f(x1)^T (x2 − x1) at x1, which is a global underestimator of f; the vector (∇f(x1), −1) is normal to the tangent.]
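The first-order condition can be tested on random pairs of points. The function below, f(x) = ‖x‖^2 + exp(x_1), is an illustrative assumption (it is convex as a sum of convex functions), as are the sampling range and the number of trials.

```python
import numpy as np

def f(x):
    """Illustrative convex function f(x) = ||x||^2 + exp(x[0])."""
    return x @ x + np.exp(x[0])

def grad_f(x):
    """Gradient of f: 2x plus exp(x[0]) in the first component."""
    g = 2.0 * x
    g[0] += np.exp(x[0])
    return g

rng = np.random.default_rng(2)
gaps = []
for _ in range(1000):
    x1 = rng.uniform(-2, 2, size=2)
    x2 = rng.uniform(-2, 2, size=2)
    tangent = f(x1) + grad_f(x1) @ (x2 - x1)   # first-order model at x1
    gaps.append(f(x2) - tangent)               # nonnegative if (2) holds
print(min(gaps) >= -1e-12)
```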
Some important points to remember
The inequality f (x2 ) ≥ f (x1 ) + ∇f (x1 )T (x2 − x1 ) states that the first-order
Taylor approximation of a function is always a global underestimator of the
function. Important: this actually means that using only local information about
a convex function (i.e., its gradient at x1 ) we can derive global properties.
Conversely, if the first-order Taylor approximation of a function is always a global
underestimator of the function, then the function is convex.
Inequality (2) shows that if ∇f (x̃) = 0, then for all x ∈ dom(f ), f (x) ≥ f (x̃),
i.e., x̃ is a global minimizer of the convex function f .
If f is a strictly convex function, and if ∇f (x̃) = 0, then for all x ∈ dom(f ),
f (x) > f (x̃), i.e., x̃ is a unique global minimizer of f .
If dom(f) is a convex subset of R^n, and f is a convex function, then
a local minimum of f is also a global minimum;
in addition, if f is strictly convex, then there exists at most one global minimum.
Jensen's inequality can be easily extended to convex combinations of more than two points, i.e.,
f(θ1 x1 + · · · + θk xk) ≤ θ1 f(x1) + · · · + θk f(xk),
where θ1, · · · , θk ≥ 0 and θ1 + · · · + θk = 1.
A local minimum of f is also a global minimum - proof [2], pp. 703
If dom(f) is a convex subset of R^n, and f is a convex function, then
a local minimum of f is also a global minimum
This can be proved by contradiction. Suppose that x⋆ is a local minimizer of f that is not a global minimizer. Then, there must exist some x ≠ x⋆ such that f(x) < f(x⋆). Using Jensen's inequality we have
f((1 − θ)x⋆ + θx) ≤ (1 − θ)f(x⋆) + θf(x) < f(x⋆), for all θ ∈ (0, 1].
Taking θ arbitrarily small gives points arbitrarily close to x⋆ with a strictly smaller function value, which contradicts the assumption that x⋆ is a local minimizer.
in addition, if f is strictly convex, then there exists at most one global minimum
Again we use a proof by contradiction. Suppose that two distinct global minima x⋆1 and x⋆2 exist (f(x⋆1) = f(x⋆2)). Then their average
(1/2)x⋆1 + (1/2)x⋆2
must belong to dom(f) (since it is assumed to be convex). However, by the strict convexity of f we have
f((1/2)x⋆1 + (1/2)x⋆2) < (1/2)f(x⋆1) + (1/2)f(x⋆2) = f(x⋆1) = f(x⋆2).
But this is a contradiction, since x⋆1 and x⋆2 are assumed to be global minima.
Second-order (necessary and sufficient) conditions
Assume that f is twice differentiable everywhere in dom(f) (i.e., dom(f) is open).
f is convex if and only if dom(f) is convex and its Hessian matrix ∇^2 f(x) is positive semidefinite for all x ∈ dom(f).
If ∇^2 f(x) is positive definite for all x ∈ dom(f), then f is strictly convex. The converse is not true: for example, f(x) = x^4 is strictly convex, but has a zero second derivative at x = 0. For more details see [2], pp. 693.
Example
A quadratic function f : R^n → R with dom(f) = R^n given by
f(x) = (1/2) x^T Hx + x^T g,
with H symmetric, is convex if and only if ∇^2 f = H is positive semidefinite, and is strictly convex if and only if H is positive definite. In the latter case, there is a unique global minimizer given as the solution of
∇f(x) = Hx + g = 0.
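For a strictly convex quadratic, the minimizer is obtained by solving one linear system. The sketch below is a minimal illustration: the matrix H, the vector g and the perturbation test are assumptions chosen for the example.

```python
import numpy as np

H = np.array([[2.0, -1.0], [-1.0, 2.0]])   # illustrative symmetric matrix
g = np.array([1.0, -3.0])

def f(x):
    """Quadratic f(x) = 0.5 x^T H x + x^T g."""
    return 0.5 * x @ H @ x + x @ g

# H is positive definite iff all of its eigenvalues are strictly positive.
eigs = np.linalg.eigvalsh(H)
strictly_convex = bool(eigs.min() > 0)

# Unique minimizer: solve the stationarity condition H x + g = 0.
x_star = np.linalg.solve(H, -g)
grad_at_star = H @ x_star + g

# Randomly perturbed points should not have a smaller function value.
rng = np.random.default_rng(3)
trial = x_star + 0.1 * rng.standard_normal((100, 2))
no_better = all(f(t) >= f(x_star) for t in trial)
print(strictly_convex, np.allclose(grad_at_star, 0.0), no_better)
```

`np.linalg.solve` is used instead of forming the inverse of H explicitly, which is the usual choice for numerical reasons.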
Examples of convex/concave functions
Exponential. f(x) = e^(px) is convex on R, for any p ∈ R.
Powers. f(x) = x^p is convex on R+ for p ≥ 1 and concave for p ∈ [0, 1].
Powers of absolute value. f(x) = |x|^p is convex on R for p ≥ 1.
Logarithm. f(x) = log(x) is concave on R++.
Max function. f(x) = max{x1, . . . , xn} is convex on R^n.
Norms. Every norm ‖·‖ : R^n → R is convex. For θ ∈ [0, 1] we have
‖(1 − θ)x1 + θx2‖ ≤ ‖(1 − θ)x1‖ + ‖θx2‖ = (1 − θ)‖x1‖ + θ‖x2‖,
where the inequality is the triangle inequality.
Affine function. f(x) = a^T x + b, with a ∈ R^n, is both convex and concave.
Quadratic function. f(x) = (1/2) x^T Hx + x^T g, with H ∈ S^n_+, is convex.
Examples of nonconvex functions
f(x) = x1^2 − x2^2
f(x) = x1 x2
Are the following functions convex?
f(x) = x1^2 + x2^2 − x1 x2
f(x) = x1^2 + x2^2 + 5x1 x2
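Quadratic forms in two variables can be classified with the second-order condition, since their Hessians are constant. The following sketch writes down the Hessian of each function by hand and tests its smallest eigenvalue; the helper name `is_convex_quadratic` and the tolerance are illustrative assumptions.

```python
import numpy as np

# Hessians of four quadratic forms in (x1, x2); each is constant in x.
hessians = {
    "x1^2 - x2^2":           np.array([[2.0, 0.0], [0.0, -2.0]]),
    "x1*x2":                 np.array([[0.0, 1.0], [1.0, 0.0]]),
    "x1^2 + x2^2 - x1*x2":   np.array([[2.0, -1.0], [-1.0, 2.0]]),
    "x1^2 + x2^2 + 5*x1*x2": np.array([[2.0, 5.0], [5.0, 2.0]]),
}

def is_convex_quadratic(H, tol=1e-12):
    """Second-order test: convex iff the smallest eigenvalue is >= 0."""
    return bool(np.linalg.eigvalsh(H).min() >= -tol)

results = {name: is_convex_quadratic(H) for name, H in hessians.items()}
for name, ok in results.items():
    print(f"{name}: {'convex' if ok else 'not convex'}")
```

Only the third form has a positive semidefinite Hessian (eigenvalues 1 and 3); the fourth has eigenvalues −3 and 7, so the cross term 5x1 x2 is large enough to destroy convexity.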
Epigraph of a function
The graph of a function f : Rn → R is defined as
{(x, f (x)) : x ∈ dom(f )} ⊆ Rn+1 .
The epigraph of a function f : Rn → R is defined as
epi(f ) = {(x, t) : t ≥ f (x), x ∈ dom(f )} ⊆ Rn+1 .
“Epi” means “above” so epi(f ) is the set of points lying on or above the graph of f .
[Figure: the graph of f, i.e., the points (x, f(x)) for x ∈ dom(f), and the region epi(f) above it.]
Convex function and sets
Link between convex functions and convex sets
A function is convex if and only if its epigraph is a convex set.
[Figure: epi(f) with a supporting hyperplane at (x1, f(x1)) whose normal is (∇f(x1), −1), and a point (x2, t) ∈ epi(f).]
Since in the definition of the epigraph t ≥ f(x) for any x ∈ dom(f), for a differentiable convex function f we have
t ≥ f(x2) ≥ f(x1) + ∇f(x1)^T (x2 − x1),
for any x1, x2 ∈ dom(f) and any (x2, t) ∈ epi(f). The above relation can be expressed as follows
(x2, t) ∈ epi(f) ⇒ (∇f(x1), −1)^T ((x2, t) − (x1, f(x1))) ≤ 0,
clearly showing that the hyperplane with normal (∇f(x1), −1) supports epi(f) at (x1, f(x1)).
Why assume that dom(f ) is convex?
[Figure: two examples of epi(f) drawn above the corresponding dom(f).]
Sublevel sets of a convex function
Sublevel sets
The sublevel set of a function f : Rn → R corresponding to a real value c is the set of
points {x ∈ dom(f ) : f (x) ≤ c}.
Superlevel sets
The superlevel set of a function f : Rn → R corresponding to a real value c is the set
of points {x ∈ dom(f ) : f (x) ≥ c}.
Convex functions
The sublevel sets of a convex function are convex.
The converse is not true. Even if all sublevel sets of a function are convex, the function need not be convex. For example, consider f(x) = −e^x, which is a strictly concave function; however, all its sublevel sets are convex (in fact, they are rays or the whole real line).
Concave functions
The superlevel sets of a concave function are convex.
Verifying convexity of a function
We can verify that a given function f is convex by
using the definition (Jensen’s inequality)
using the first-order condition (if f is differentiable)
using the second-order condition (if f is twice differentiable)
restricting the function to a line. Recall that
f is convex ⇔ f(x0 + t∆x) is convex in t for all x0 ∈ dom(f) and all ∆x
showing that f is obtained through operations preserving convexity.
Example of simple operations
A nonnegative multiple of a convex function is convex
f is convex, θ ≥ 0 ⇒ θf is convex
A sum of convex functions is convex
f1, f2 are convex ⇒ f1 + f2 is convex
Nonnegative weighted sum (i.e., conic combination)
f1, . . . , fk are convex, θ1, . . . , θk ≥ 0 ⇒ θ1 f1 + · · · + θk fk is convex
Example (sum of convex functions)
[Figure: the graphs of f1(x) = x^2, f2(x) = x^2 + x and their sum f1 + f2.]
Pointwise maximum
[Figure: the graphs of f1(x) = x^2 and f2(x) = x^2 + x, and the region epi(max{f1(x), f2(x)}).]
If f1 and f2 are convex functions, their pointwise maximum f defined by
f(x) = max{f1(x), f2(x)},
with dom(f) = dom(f1) ∩ dom(f2), is convex. The epigraph of f is given by the intersection of the epigraphs of f1 and f2, i.e., epi(f) = epi(f1) ∩ epi(f2). This property extends to the pointwise maximum of k convex functions [1], pp. 80.
Example (pointwise maximum of affine functions)
[Figure: three affine functions a1^T x + b1, a2^T x + b2 and a3^T x + b3, their pointwise maximum max_{i=1,...,m} (ai^T x + bi) and their pointwise minimum min_{i=1,...,m} (ai^T x + bi).]
The function f(x) = max{a1^T x + b1, . . . , am^T x + bm} is convex and epi(f) is a polyhedron. f(x) is called a piecewise linear convex function.
The function min{a1^T x + b1, . . . , am^T x + bm} is concave. The set of points lying on or below its graph (i.e., its hypograph) is a polyhedron.
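A piecewise linear function of a scalar variable makes both claims easy to test. In the sketch below the slopes `a`, the offsets `b` and the sampling range are illustrative assumptions; the loop checks Jensen's inequality for the maximum and the reversed inequality for the minimum.

```python
import numpy as np

# Illustrative affine pieces a_i * x + b_i for scalar x (m = 3).
a = np.array([-1.0, 0.2, 1.5])
b = np.array([1.0, 0.0, -4.0])

def f_max(x):
    """Pointwise maximum of the affine pieces (convex)."""
    return np.max(a * x + b)

def f_min(x):
    """Pointwise minimum of the affine pieces (concave)."""
    return np.min(a * x + b)

rng = np.random.default_rng(4)
ok_max = ok_min = True
for _ in range(1000):
    x1, x2 = rng.uniform(-10, 10, size=2)
    t = rng.uniform()
    xm = (1 - t) * x1 + t * x2
    ok_max &= f_max(xm) <= (1 - t) * f_max(x1) + t * f_max(x2) + 1e-9
    ok_min &= f_min(xm) >= (1 - t) * f_min(x1) + t * f_min(x2) - 1e-9
print(bool(ok_max), bool(ok_min))
```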
Distance to a convex set
Consider again the projection of v ∈ R^n on a closed convex (nonempty) set X. Let us denote the operator projecting v on X by x = projX(v). The following distance function is convex [4], pp. 67 (see [1], pp. 88 for a more general case)
distX(v) = ‖v − projX(v)‖2.
This can be demonstrated as follows. Let v1, v2 ∈ R^n and θ ∈ [0, 1], then
distX((1 − θ)v1 + θv2) = ‖(1 − θ)v1 + θv2 − projX((1 − θ)v1 + θv2)‖2
≤ ‖(1 − θ)v1 + θv2 − (1 − θ)projX(v1) − θprojX(v2)‖2 (see figure)
≤ (1 − θ)‖v1 − projX(v1)‖2 + θ‖v2 − projX(v2)‖2 (triangle inequality)
= (1 − θ)distX(v1) + θdistX(v2).
The first inequality holds because (1 − θ)projX(v1) + θprojX(v2) belongs to X (by convexity) and the projection is the point of X closest to (1 − θ)v1 + θv2.
[Figure: a polyhedron P with points v1, v2, v3 and their projections x1, x2, x3.]
Composition with an affine mapping
Let g : Rn → Rm be an affine function, i.e., g(x) = Ax + b for some A ∈ Rm×n and
b ∈ Rm and let f : Rm → R. The composition of the two functions
f (g(x)) = f (Ax + b)
has a domain {x : Ax + b ∈ dom(f )}. Then,
f is convex ⇒ f (g(x)) is convex
f is concave ⇒ f (g(x)) is concave
Example
Consider the logarithm function log : R → R with domain dom(log) = R++ . This is a
concave function. Let a ∈ Rn and b ∈ R. Then,
the function log(b − a^T x) is concave with domain {x : a^T x < b}
the function − log(b − a^T x) is convex
the function − log(b1 − a1^T x) − · · · − log(bk − ak^T x) is convex because it is a sum of k convex functions
There are other operations that preserve convexity and many other interesting examples of convex functions; for more details see [1].
[1] S. Boyd and L. Vandenberghe, “Convex Optimization,” Cambridge, 2004.
[2] D. P. Bertsekas, “Nonlinear Programming,” Athena Scientific, (3rd print) 2008.
[3] D. Bertsimas and J. N. Tsitsiklis, “Introduction to Linear Optimization,” Athena Scientific, 1997.
[4] N. Andréasson, A. Evgrafov, and M. Patriksson, “An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms,” 2005.