Optimization Theory
Petr Lachout
- working text for the lecture
“NMSA403 Optimization Theory (Teorie optimalizace)”
Version - November 27, 2016
Contents

1 Geometry in Rn
  1.1 Euclidean space Rn
  1.2 Convex sets
    1.2.1 Particular convex sets
    1.2.2 Directions of convex sets
  1.3 Extremal points and extremal directions
  1.4 Separation of sets
  1.5 Results from linear algebra

2 Functions
  2.1 General notions
  2.2 Differentiability of a function
    2.2.1 On the real line
    2.2.2 Several arguments
    2.2.3 Vector valued functions
    2.2.4 Chain rule
    2.2.5 Second derivative
    2.2.6 Arguments for differentiability
  2.3 Convex functions
    2.3.1 Definition of a convex function
    2.3.2 Convex functions of one variable
    2.3.3 Convex function of several variables
    2.3.4 Vector valued convex functions
    2.3.5 Generalization of convex functions

3 Mathematical programming
  3.1 Convex objective function
  3.2 Concave objective function
  3.3 Unconstrained problem
  3.4 General objective function

4 Nonlinear programming - inequalities
  4.1 Auxiliary results and observations
  4.2 General results
  4.3 Karush-Kuhn-Tucker optimality condition
  4.4 Constraint qualifications

5 Nonlinear programming - equalities and inequalities
  5.1 General constraints
  5.2 Karush-Kuhn-Tucker optimality condition
  5.3 Constraint qualifications
  5.4 Second order optimality conditions

6 Polar of a cone and normal cone
  6.1 Polar of a cone
  6.2 Clarke's regularity
  6.3 Mathematical program
  6.4 Lagrange coefficients

7 Appendix
  7.1 Basics on Cone Calculus
Used symbols
int (A) . . . . . . interior of A
clo (A) . . . . . . closure of A
∂ (A) . . . . . . boundary of A
Fin (A) . . . . . . set of all finite subsets of A
Chapter 1
Geometry in Rn
1.1 Euclidean space Rn
We will consider a finite dimensional Euclidean space Rn containing column vectors of
reals. The space is equipped with a scalar product (or, inner product) (cz. skalární součin)

    ⟨x, y⟩ = ∑_{i=1}^n xi yi .                                         (1.1)
We adopt this notation to recall the general situation based on the duality between a space X
and

    X* = {f : f : X → R is a continuous linear function} ,

where the scalar product ⟨x, x*⟩, x ∈ X, x* ∈ X*, means evaluation of a function at a given
point, i.e. ⟨x, x*⟩ = x*(x).
Duality of a finite dimensional Euclidean space is simple. The space is a finite dimensional
Hilbert space, and Hilbert spaces are self-dual, i.e. the dual space can be identified
with the original space.
The topology of the space is given by the norm ‖x‖ = √⟨x, x⟩.
In some places we employ transposition of vectors or matrices, denoted by xᵀ, Aᵀ.
We distinguish two types of linear functions: a linear function (cz. lineární funkce)
fulfilling f(ax + by) = af(x) + bf(y), and an affine linear function (cz. afinně lineární
funkce) fulfilling f(ax + (1 − a)y) = af(x) + (1 − a)f(y). These notions are closely
related: each affine linear function is a linear function plus a real constant.
1.2 Convex sets
We recall the definition of convex sets and their basic properties. Properties are stated
without proofs; the interested reader can find proofs and additional information in [8],
[LachKonv].
Definition 1.1 We say a set A ⊂ Rn is convex (cz. konvexní) if for any couple of
points x, y ∈ A and 0 < λ < 1 we have λx + (1 − λ)y ∈ A.
Note that the empty set fulfills the definition and, therefore, is a convex set.
The definition of convex sets can be expressed verbally: "Each convex linear combination
of two points from A belongs again to A.", or geometrically: "The whole segment between
two points from A belongs to A."
Immediately, we observe that a convex set contains each convex linear combination of a
finite number of its points.
Lemma 1.2 Let A ⊂ Rn be a convex set. Then, for each k ∈ N, a1, a2, . . . , ak ∈ A,
λ1, λ2, . . . , λk ∈ [0, 1], ∑_{i=1}^k λi = 1, we have ∑_{i=1}^k λi ai ∈ A.
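As a small computational aside (not part of the original text), the following Python sketch
spot-checks Lemma 1.2 on an assumed example: convex combinations of points of the closed
unit ball, a convex set, stay in the ball.

    import numpy as np

    rng = np.random.default_rng(0)
    k, n = 6, 3
    a = rng.normal(size=(k, n))
    a /= np.maximum(1.0, np.linalg.norm(a, axis=1, keepdims=True))  # points of the unit ball
    lam = rng.random(k)
    lam /= lam.sum()                              # λ_i ≥ 0 with sum 1, as in Lemma 1.2
    point = lam @ a                               # the convex combination Σ λ_i a_i
    print(np.linalg.norm(point) <= 1.0 + 1e-12)   # True: the ball is convex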
Some basic operations preserve set convexity. Consider arithmetic operations.
Lemma 1.3 If A, B ⊂ Rn are convex sets and α, β ∈ R, then

    αA + βB = {αa + βb : a ∈ A, b ∈ B}

is a convex set. In particular, αA, A + B, A − B (not to be confused with the set
difference A \ B!) are convex sets.
Consider set operations. Intersection preserves convexity.
Lemma 1.4 Let I ≠ ∅ be an index set and for each i ∈ I let a convex set Ai ⊂ Rn be
given. Then ⋂_{i∈I} Ai is a convex set.
Unfortunately, the results of the other set operations, i.e. union, complement and set
difference, are typically nonconvex.
Consider topological operations.
Lemma 1.5 If A ⊂ Rn is a convex set, then clo (A), int (A), rint (A) are convex sets.
Unfortunately, the boundary ∂(A) is typically nonconvex.
1.2.1 Particular convex sets
Recall other helpful notions.
Definition 1.6 If A ⊂ Rn, then the symbol conv(A) denotes the smallest convex set
containing the set A.
The set conv(A) is called the convex hull of A (cz. konvexní obal).
The convex hull can be constructed.
Lemma 1.7 If A ⊂ Rn then

    conv(A) = { ∑_{s∈I} λ(s)s : λ(s) ≥ 0 ∀ s ∈ I , ∑_{s∈I} λ(s) = 1 , I ∈ Fin(A) } .

In n-dimensional Euclidean space, convex linear combinations of at most n + 1 points
from the set are sufficient.
Theorem 1.8 (Carathéodory) : For A ⊂ Rn we have

    conv(A) = { ∑_{s∈I} λ(s)s : λ(s) ≥ 0 ∀ s ∈ I , ∑_{s∈I} λ(s) = 1 , I ⊂ A , card(I) ≤ n + 1 } .
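As a computational aside (not in the original text), the following Python sketch illustrates
the Carathéodory bound: a point of conv(A) is re-expressed through a linear feasibility
program whose optimal vertex carries at most n + 1 nonzero weights. The data A and x are
assumed example inputs; scipy.optimize.linprog serves as the LP solver.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, k = 2, 10
    A = rng.normal(size=(k, n))            # the finite set A, one point per row
    w = rng.random(k); w /= w.sum()
    x = w @ A                              # a point of conv(A), built from 10 points

    # feasibility LP:  Σ λ(s) s = x,  Σ λ(s) = 1,  λ ≥ 0; a random objective
    # steers the solver to a vertex, which has at most n + 1 nonzero weights
    A_eq = np.vstack([A.T, np.ones(k)])
    b_eq = np.append(x, 1.0)
    res = linprog(c=rng.random(k), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    lam = res.x
    print("nonzero weights:", int(np.sum(lam > 1e-9)), "<= n+1 =", n + 1)
    print("reconstruction error:", np.linalg.norm(lam @ A - x))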
An important role is played by polyhedral sets.
Definition 1.9 A set A ⊂ Rn is called
i) a convex polyhedral set (cz. konvexní polyedrická množina), if there is a finite
number of closed halfspaces H1, H2, . . . , Hk such that A = ⋂_{i=1}^k Hi.
ii) a convex polyhedron (cz. konvexní polyedr), if there is a finite set S ⊂ Rn such
that A = conv(S).
Let us note that each closed halfspace can be expressed as {x ∈ Rn : ⟨γ, x⟩ ≥ b} or
{x ∈ Rn : ⟨γ, x⟩ ≤ b}, where γ ∈ Rn, γ ≠ 0, b ∈ R.
Let us note that there is a family of locally simplicial sets; e.g. a union of two convex
polyhedral sets is a locally simplicial set. Such sets can be considered as "nonconvex
polyhedral sets".
Theorem 1.10 : A convex polyhedron is compact.
Theorem 1.11 : Let A ⊂ Rn. Then, A is a convex polyhedron if and only if A is a
bounded convex polyhedral set.
Recall the definition and properties of cones.
Definition 1.12 A set A ⊂ Rn is called
i) a cone (with vertex at the origin) (cz. kužel), if 0 ∈ A and for each point s ∈ A and
α > 0 we have αs ∈ A.
ii) a cone with vertex at a point p ∈ Rn, whenever A − p is a cone.
iii) a convex cone (cz. konvexní kužel), whenever A is a cone and a convex set.
Definition 1.13 If A ⊂ Rn, then pos(A) denotes the smallest convex cone containing
the set A.
The set pos(A) is called the nonnegative hull of the set A (cz. nezáporný obal).
The nonnegative hull of a set can be constructed.
Lemma 1.14 If A ⊂ Rn then

    pos(A) = { ∑_{s∈I} λ(s)s : λ(s) ≥ 0 ∀ s ∈ I , I ⊂ A is finite } .

Working in a finite dimension, the construction can be simplified.
Theorem 1.15 (Carathéodory) : If A ⊂ Rn then

    pos(A) = { ∑_{s∈I} λ(s)s : λ(s) ≥ 0 ∀ s ∈ I , I ⊂ A , card(I) ≤ n } .
Finally, we have to introduce a polyhedral cone.
Definition 1.16 A set A ⊂ Rn is called a convex polyhedral cone (cz. konvexní polyedrický
kužel), if there is a finite set S ⊂ Rn such that A = pos(S).
Theorem 1.17 : Let A ⊂ Rn contain no line. Then, A is a convex polyhedral cone if
and only if A is a cone and a convex polyhedral set.
1.2.2 Directions of convex sets
Directions in which a convex set is unbounded are important for optimization.
Definition 1.18 Let A ⊂ Rn be a nonempty set. We call s ∈ Rn a direction of A (cz.
směr), whenever there is a point a ∈ A such that for each α > 0 we have a + αs ∈ A.
The set of all directions of A will be denoted by direct(A).
We understand direct(∅) = {0}.
Directions of sets possess the following properties.
Lemma 1.19 If A ⊂ Rn then direct (A) is a cone.
Lemma 1.20 If A ⊂ Rn is a convex set then direct (A) is a convex cone.
Lemma 1.21 If A ⊂ Rn is a closed convex set, then direct (A) is a closed convex cone.
Lemma 1.22 If A ⊂ Rn is a convex set, s ∈ direct (A) and a ∈ rint (A), then for each
α > 0 we have a + αs ∈ A.
Lemma 1.23 If A ⊂ Rn is a closed convex set, s ∈ direct(A) and a ∈ A, then for each
α > 0 we have a + αs ∈ A.
A closed convex set can be decomposed.
Theorem 1.24 : A closed convex set A ⊂ Rn can be decomposed as algebraic sums of
sets

    A = L(A) + K(A) + btt(A) ,                                        (1.2)
    A = L(A) + K(A) + conv(btt(A)) ,                                  (1.3)

where

    L(A)   = direct(A) ∩ −direct(A) is a subspace of Rn ,
    L(A)⊥  is the subspace of Rn perpendicular to L(A), with Rn = L(A) ⊕ L(A)⊥ ,
    D      = A ∩ L(A)⊥ is the projection of A onto L(A)⊥ ,
    K(A)   = direct(D) = direct(A) ∩ L(A)⊥ is the projection of direct(A) onto L(A)⊥ ,
    btt(A) = {x ∈ D : ∀ s ∈ K(A), s ≠ 0 we have x − s ∉ D} .

1.3 Extremal points and extremal directions
Convex polyhedral sets can be fully characterized by their extremal points and extremal
directions.
Definition 1.25 Let A ⊂ Rn be a nonempty convex set. We call a ∈ A an extremal point
of A (cz. krajní bod), whenever there are no x, y ∈ A, x ≠ y, and 0 < λ < 1 such that
a = λx + (1 − λ)y.
The set of all extremal points of A will be denoted by ext(A).
Definition 1.26 Let A ⊂ Rn be a nonempty convex set. We call s ∈ direct(A)
an extremal direction of A (cz. krajní směr), whenever s ≠ 0 and there are no x, y ∈
direct(A), x, y ∉ pos({s}), and λ > 0, ϕ > 0 such that s = λx + ϕy.
We denote

    extd(A) = {s ∈ direct(A) : ‖s‖ = 1, s is an extremal direction of A} .
Hence, we have a characterization.
Lemma 1.27 Let A ⊂ Rn be a nonempty convex set. Then, A is a convex polyhedron
if and only if ext (A) is a finite set and A = conv (ext (A)).
Lemma 1.28 Let A ⊂ Rn be a nonempty convex set containing no line. Then, A is a
convex polyhedral cone if and only if extd (A) is a finite set and A = pos (extd (A)).
Lemma 1.29 Let A ⊂ Rn be a nonempty convex set containing no line. Then, A is a
convex polyhedral set if and only if ext (A), extd (A) are finite sets and
A = conv (ext (A)) + pos (extd (A)).
1.4 Separation of sets
Developing the theory of mathematical programming, and describing, formulating and
justifying methodology convenient for solving optimization programs, requires the Hahn-
Banach theorem; see [4], Chapter 2. Actually, we do not need this theorem in its full
general setting. For our purposes, a theorem on separation of sets in finite dimension is
sufficient.
Let us begin with the geometrical parallelogram law.
Lemma 1.30 If x, y ∈ Rn, then we have ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².
Proof: The formula is straightforward, since

    ‖x + y‖² + ‖x − y‖² = ‖x‖² + 2⟨x, y⟩ + ‖y‖² + ‖x‖² − 2⟨x, y⟩ + ‖y‖²
                        = 2‖x‖² + 2‖y‖² .
Q.E.D.
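As a quick numeric aside (not in the original text), the parallelogram law can be verified
on random vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=4), rng.normal(size=4)
    lhs = np.linalg.norm(x + y) ** 2 + np.linalg.norm(x - y) ** 2
    rhs = 2 * np.linalg.norm(x) ** 2 + 2 * np.linalg.norm(y) ** 2
    print(abs(lhs - rhs))                  # ~1e-15: equality up to rounding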
The theory is based on a theorem on the projection of a point onto a closed convex set.
Theorem 1.31 (on projection) : Let K ⊂ Rn be a closed convex set, K ≠ ∅, and
x ∈ Rn be a given point. Then, there is a uniquely determined point x̂ ∈ K such that
‖x − x̂‖ = min {‖x − y‖ : y ∈ K}.
The point is uniquely determined by the condition

    ⟨x − x̂, y − x̂⟩ ≤ 0 for each y ∈ K.                                (1.4)

The point x̂ is called the projection of the point x on the set K (cz. projekce na množinu).
Proof: We begin our proof by denoting ∆ = inf {‖x − y‖ : y ∈ K}.
Evidently, 0 ≤ ∆ < +∞, since K is non-empty.
1. Existence.
Take a sequence yi ∈ K, i ∈ N, such that lim_{i→+∞} ‖x − yi‖ = ∆.
Using Lemma 1.30, we receive an equality for each couple of indexes i, j ∈ N:

    ‖yi − yj‖² = ‖(x − yj) − (x − yi)‖²
               = 2‖x − yi‖² + 2‖x − yj‖² − ‖2x − (yi + yj)‖²
               = 2‖x − yi‖² + 2‖x − yj‖² − 4‖x − ½(yi + yj)‖² .

We know ½(yi + yj) ∈ K because K is convex. Hence,

    ‖yi − yj‖² ≤ 2‖x − yi‖² + 2‖x − yj‖² − 4∆² −→ 0 whenever i, j → +∞.

The sequence yi, i ∈ N, is Cauchy in Rn; therefore, there is x̂ ∈ Rn such that
yi −→ x̂. Since K is a closed set, we have x̂ ∈ K. Moreover, ‖x − x̂‖ = ∆ because
the norm is continuous.
2. Uniqueness.
Let x*, x** ∈ K and ‖x − x*‖ = ‖x − x**‖ = ∆.
Using similar arguments as in the first part of the proof, we receive:

    ‖x* − x**‖² = 2‖x − x*‖² + 2‖x − x**‖² − 4‖x − ½(x* + x**)‖²
                ≤ 2‖x − x*‖² + 2‖x − x**‖² − 4∆² = 0.

Consequently, x* = x**.
3. Equivalent characterization using condition (1.4).
(a) Let x̂ ∈ K be such that ‖x − x̂‖ = ∆, and assume the condition (1.4) is not valid.
Then, there is y ∈ K such that ⟨x − x̂, y − x̂⟩ > 0 (the angle is acute).
For α ∈ R, we denote z(α) = (1 − α)x̂ + αy. Then, the function

    f(α) = ‖x − z(α)‖² = ‖(1 − α)(x − x̂) + α(x − y)‖²
         = (1 − α)²‖x − x̂‖² + 2α(1 − α)⟨x − x̂, x − y⟩ + α²‖x − y‖²

possesses a first derivative

    f′(α) = −2(1 − α)‖x − x̂‖² + 2(1 − 2α)⟨x − x̂, x − y⟩ + 2α‖x − y‖² .

Plugging in α = 0, we obtain

    f(0) = ∆² ,
    f′(0) = −2‖x − x̂‖² + 2⟨x − x̂, x − y⟩
          = −2⟨x − x̂, x − x̂⟩ + 2⟨x − x̂, x − y⟩
          = 2⟨x − x̂, x̂ − y⟩ = −2⟨x − x̂, y − x̂⟩ < 0.

Consequently, f(α) < ∆² in a right neighborhood of zero. That is a contradiction
with the definition of the constant ∆.
(b) Let the condition (1.4) be fulfilled for x̂ ∈ K and all y ∈ K. Then,

    ‖x − y‖² = ‖(x − x̂) + (x̂ − y)‖²
             = ‖x − x̂‖² + 2⟨x − x̂, x̂ − y⟩ + ‖x̂ − y‖² ≥ ‖x − x̂‖² .

Consequently, ‖x − x̂‖ = ∆.
Q.E.D.
The geometrical meaning of the condition (1.4) is that the angle between x − x̂ and y − x̂
is not acute for any y ∈ K (the angle is right or obtuse).
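As a computational aside (not in the original text), the next Python sketch illustrates
Theorem 1.31 on an assumed example set: for the box K = [0,1]^n the projection has the
closed form x̂ = clip(x, 0, 1), and condition (1.4) can be checked on sampled points of K.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5
    x = rng.normal(scale=2.0, size=n)      # a point, typically outside the box
    x_hat = np.clip(x, 0.0, 1.0)           # projection of x on K = [0,1]^n

    ys = rng.random((1000, n))             # random points y of K
    vals = (ys - x_hat) @ (x - x_hat)      # <x - x̂, y - x̂> for each sample
    print("max over sampled y:", vals.max())   # ≤ 0, as condition (1.4) requires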
Theorem 1.32 (separation of a point and a set) : Let K ⊂ Rn, K ≠ ∅, be a convex set
and x ∉ clo(K). Then, there is γ ∈ Rn, γ ≠ 0, such that

    inf {⟨γ, y⟩ : y ∈ K} > ⟨γ, x⟩ .

For example, one can choose γ = x̂ − x, where x̂ is the projection of x on the set
clo(K). Hence,

    inf {⟨γ, y⟩ : y ∈ K} = min {⟨γ, y⟩ : y ∈ clo(K)} = ⟨γ, x̂⟩ > ⟨γ, x⟩ .
Proof: According to Theorem 1.31, there is a point x̂ ∈ clo(K) uniquely determined
by property (1.4), i.e.

    ⟨x − x̂, y − x̂⟩ ≤ 0 for all y ∈ clo(K) .

Set γ = x̂ − x. Then,

    ⟨γ, y − x̂⟩ = −⟨x − x̂, y − x̂⟩ ≥ 0 for each y ∈ K,
    ⟨γ, x̂ − x⟩ = ‖γ‖² = ⟨x̂ − x, x̂ − x⟩ > 0 since γ ≠ 0.

Consequently,

    ⟨γ, y⟩ ≥ ⟨γ, x̂⟩ > ⟨γ, x⟩ for each y ∈ K.
Q.E.D.
Theorem 1.32 says there is a closed halfspace H = {ξ ∈ Rn : ⟨γ, ξ⟩ ≥ c} such
that K ⊂ H and the distance of x from H is positive.
Theorem 1.33 (supporting hyperplane) : Let K ⊂ Rn, K ≠ ∅, be a convex set and
x ∈ ∂(K). Then, there is γ ∈ Rn, γ ≠ 0, such that inf {⟨γ, ξ⟩ : ξ ∈ K} = ⟨γ, x⟩.
Proof: Let x ∈ ∂(K).
1. There is a sequence of points yk ∉ clo(K), k ∈ N, such that yk → x.
According to Theorem 1.32, for each k ∈ N there is γk ∈ Rn, γk ≠ 0, such that
inf {⟨γk, ξ⟩ : ξ ∈ K} > ⟨γk, yk⟩.
The normalized vectors βk = γk/‖γk‖, k ∈ N, belong to the compact sphere
S = {x ∈ Rn : ‖x‖ = 1}.
Hence, we can select a convergent subsequence βkm → γ ∈ S letting m → +∞.
Then, for each ξ ∈ K we receive:

    ⟨γ, ξ⟩ = lim_{m→+∞} ⟨βkm, ξ⟩ ≥ lim_{m→+∞} ⟨βkm, ykm⟩ = ⟨γ, x⟩ .

We have shown inf {⟨γ, ξ⟩ : ξ ∈ K} ≥ ⟨γ, x⟩.
2. There is a sequence of points zk ∈ rint(K), k ∈ N, such that zk → x.
Then, ⟨γ, zk⟩ → ⟨γ, x⟩, and since zk ∈ K, this yields inf {⟨γ, ξ⟩ : ξ ∈ K} ≤ ⟨γ, x⟩.
We have derived inf {⟨γ, ξ⟩ : ξ ∈ K} = ⟨γ, x⟩; therefore, the required statement is proved.
Q.E.D.
Theorem 1.33 says there is a hyperplane L = {ξ ∈ Rn : ⟨γ, ξ⟩ = c} and its
closed halfspace H = {ξ ∈ Rn : ⟨γ, ξ⟩ ≥ c} such that x ∈ L ∩ clo(K) and K ⊂ H.
The hyperplane L is called a supporting hyperplane of K at x (cz. opěrná nadrovina) and
the halfspace H is called a supporting halfspace of K at x (cz. opěrný poloprostor).
Theorem 1.32 gives a nice characterization of closed convex sets.
Theorem 1.34 (characterization of closed convex sets) : Let K ⊂ Rn, K ≠ ∅. Then,
K is a closed convex set if and only if K = ⋂_{H∈H_K} H, where

    H_K = {H ⊂ Rn : H is a closed halfspace containing K} .
Proof: Closed halfspaces are closed convex sets. Hence, each intersection of them
is a closed convex set. We need to show the opposite implication only.
Assume K is a closed convex set.
Denote C = ⋂_{H∈H_K} H.
1. Evidently C ⊃ K.
2. Take y ∉ K.
According to Theorem 1.32 there is a vector γ ∈ Rn, γ ≠ 0, such that
inf {⟨γ, x⟩ : x ∈ K} > ⟨γ, y⟩.
Denoting ∆ := inf {⟨γ, x⟩ : x ∈ K}, we obtain a closed halfspace
H := {x ∈ Rn : ⟨γ, x⟩ ≥ ∆} with the properties K ⊂ H and y ∉ H.
Consequently, y ∉ C and, therefore, C ⊂ K.
Q.E.D.
The characterization can be improved.
Theorem 1.35 (characterization of closed convex sets 2) : Let K ⊂ Rn be a non-empty
set. Then, K is a closed convex set if and only if K = ⋂_{H∈S_K} H, where

    S_K = {H ⊂ Rn : H is a supporting halfspace of K} .
Proof: We have to show the implication from the left-hand side to the right-hand side only.
Assume for that K is a closed convex set and denote C = ⋂_{H∈S_K} H.
1. Evidently C ⊃ K.
2. Take y ∉ K.
Set γ = ŷ − y, where ŷ is the projection of y on K.
According to Theorem 1.32 or Theorem 1.33, we have

    min {⟨γ, x⟩ : x ∈ K} = ⟨γ, ŷ⟩ > ⟨γ, y⟩ .

Then, H := {x ∈ Rn : ⟨γ, x⟩ ≥ ⟨γ, ŷ⟩} is a supporting halfspace of K and y ∉ H.
Consequently, y ∉ C and, therefore, C ⊂ K.
We have derived K = C.
Q.E.D.
Theorems 1.32 and 1.33 fully solve the separation of a point and a convex set. Now, we
start to consider the separation of two sets. Two cases will be distinguished.
Definition 1.36 Let A, B ⊂ Rn be non-empty sets.
i) Sets A, B are properly separable (cz. neostře oddělitelné), if there is γ ∈ Rn,
γ ≠ 0, such that

    ⟨γ, a⟩ ≥ ⟨γ, b⟩ for all a ∈ A , b ∈ B.                            (1.5)

ii) Sets A, B are strictly separable (cz. ostře oddělitelné), if there is γ ∈ Rn, γ ≠ 0,
and c, d ∈ R such that

    ⟨γ, a⟩ ≥ c > d ≥ ⟨γ, b⟩ for all a ∈ A , b ∈ B.                    (1.6)
The introduced separations are simply related.
Lemma 1.37 If two sets are strictly separable, then they are properly separable.
Proof: The implication is evident.
Q.E.D.
Separation of two sets can be expressed in an equivalent way.
Lemma 1.38 Non-empty sets A, B ⊂ Rn are properly separable if and only if there is
γ ∈ Rn, γ ≠ 0, such that

    inf {⟨γ, a⟩ : a ∈ A} ≥ sup {⟨γ, b⟩ : b ∈ B} .                     (1.7)

Proof: Property (1.7) is an equivalent rewriting of (1.5).
Q.E.D.
Lemma 1.39 Non-empty sets A, B ⊂ Rn are strictly separable if and only if there is
γ ∈ Rn, γ ≠ 0, such that

    inf {⟨γ, a⟩ : a ∈ A} > sup {⟨γ, b⟩ : b ∈ B} .                     (1.8)

Proof: Property (1.6) implies (1.8).
If (1.8) is in force, it is sufficient to set

    c := inf {⟨γ, a⟩ : a ∈ A} , d := sup {⟨γ, b⟩ : b ∈ B}

and we receive (1.6).
Q.E.D.
Separation of sets is equivalent to separation of their convex hulls.
Lemma 1.40 Let A, B ⊂ Rn be non-empty sets. Then, we have:
• Sets A, B are properly separable if and only if conv(A), conv(B) are properly
separable.
• Sets A, B are strictly separable if and only if conv(A), conv(B) are strictly separable.
Proof: The equivalences are evident, since for each γ ∈ Rn and C ⊂ Rn we have

    inf {⟨γ, c⟩ : c ∈ C} = inf {⟨γ, c⟩ : c ∈ conv(C)} .
Q.E.D.
In the following case, we are able to strictly separate the given sets.
Theorem 1.41 (strict separation of convex sets) : Let A ⊂ Rn be a non-empty closed
convex set and B ⊂ Rn be a non-empty compact convex set. If A ∩ B = ∅, then A, B
can be strictly separated.
Proof: Set

    K = A − B = {a − b : a ∈ A, b ∈ B} .

1. We know the set K is non-empty and convex.
2. Also, 0 ∉ K, since A ∩ B = ∅.
3. We need to show the closedness of K.
Let yi ∈ K, i ∈ N, be such that yi → ŷ ∈ Rn as i → +∞.
According to the definition of K, for each i ∈ N there are ai ∈ A, bi ∈ B such that
yi = ai − bi.
Since B is compact, we can select a subsequence bik, k ∈ N, such that bik → b̂ ∈ B
as k → +∞.
Then, aik = yik + bik → â = ŷ + b̂ ∈ Rn as k → +∞.
The set A is closed; therefore, â ∈ A.
We have found ŷ = â − b̂ ∈ K and, consequently, K is a closed set.
The set K and the point 0 fulfill the assumptions of Theorem 1.32. Then, there is γ ∈ Rn,
γ ≠ 0, such that inf {⟨γ, x⟩ : x ∈ K} > ⟨γ, 0⟩ = 0.
According to the definition of K, we have

    0 < inf {⟨γ, x⟩ : x ∈ K} = inf {⟨γ, a − b⟩ : a ∈ A , b ∈ B}
      = inf {⟨γ, a⟩ : a ∈ A} − sup {⟨γ, b⟩ : b ∈ B} .

Hence, inf {⟨γ, a⟩ : a ∈ A} > sup {⟨γ, b⟩ : b ∈ B}, which means strict separability of
the sets A and B, according to Lemma 1.39.
Q.E.D.
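As a computational aside (not in the original text), the following sketch carries out the
construction of the proof for an assumed example: two disjoint closed balls A, B, for which
K = A − B is again a ball, so the projection of 0 on K, and hence the separating vector γ,
is available in closed form.

    import numpy as np

    a, ra = np.array([4.0, 1.0]), 1.0      # ball A = B(a, ra)
    b, rb = np.array([0.0, 0.0]), 1.5      # ball B = B(b, rb), disjoint from A
    c = a - b                              # K = A - B is the ball B(c, ra + rb)
    assert np.linalg.norm(c) > ra + rb     # disjointness of A and B

    x_hat = c - (ra + rb) * c / np.linalg.norm(c)   # projection of 0 on K
    gamma = x_hat                                   # separating direction γ = x̂ - 0

    inf_A = gamma @ a - ra * np.linalg.norm(gamma)  # inf over A of <γ, ·>
    sup_B = gamma @ b + rb * np.linalg.norm(gamma)  # sup over B of <γ, ·>
    print(inf_A, ">", sup_B, ":", inf_A > sup_B)    # strict separation holds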
Also, there is a nice case in which the given sets can be properly separated.
Theorem 1.42 (proper separation of convex sets) : Let A, B ⊂ Rn be non-empty
convex sets. If rint (A) ∩ rint (B) = ∅, then A, B can be properly separated.
Proof: Set

    K = A − B = {a − b : a ∈ A, b ∈ B} .

1. The set K is non-empty and convex.
2. We need to show 0 ∉ rint(K).
Without any loss of generality we will work in the space Aff(K), where
rint(K) = int(K).
Assume 0 ∈ int(K).
Then, there is x̄ ∈ A ∩ B and ε > 0 such that Uε(0) ⊂ K.
The sets are convex; therefore, there are points a1 ∈ rint(A) and b1 ∈ rint(B) such
that ‖a1 − x̄‖ < ε/2, ‖b1 − x̄‖ < ε/2.
Then, a1 − b1, b1 − a1 ∈ Uε(0) ⊂ K.
Then, there are a2 ∈ A and b2 ∈ B such that a2 − b2 = b1 − a1.
After this, the point (a1 + a2)/2 = (b1 + b2)/2 ∈ rint(A) ∩ rint(B), which is a contradiction
with the assumption that the intersection of the relative interiors of the considered sets
is empty.
Consequently, the set K and the point 0 can be properly separated; either according to
Theorem 1.32, if 0 ∉ clo(K), or according to Theorem 1.33, if 0 ∈ ∂(K).
Thus, there is γ ∈ Rn, γ ≠ 0, such that inf {⟨γ, x⟩ : x ∈ K} ≥ ⟨γ, 0⟩ = 0.
Applying the definition of the set K, we receive

    0 ≤ inf {⟨γ, x⟩ : x ∈ K} = inf {⟨γ, a − b⟩ : a ∈ A , b ∈ B}
      = inf {⟨γ, a⟩ : a ∈ A} − sup {⟨γ, b⟩ : b ∈ B} .

Therefore, inf {⟨γ, a⟩ : a ∈ A} ≥ sup {⟨γ, b⟩ : b ∈ B}, which means proper separability
of A, B, according to Lemma 1.38.
Q.E.D.
1.5 Results from linear algebra
Lemma 1.43 Let G ⊂ Rn be open, 0 ∈ G, and x ∈ Rn.
If ⟨y, x⟩ ≥ 0 for each y ∈ G, then x = 0.
Proof: Since G ⊂ Rn is open and 0 ∈ G, there is δ > 0 such that −δx ∈ G.
Hence, 0 ≤ ⟨x, −δx⟩ = −δ⟨x, x⟩ = −δ‖x‖².
Consequently, x = 0.
Q.E.D.
Lemma 1.44 Let G ⊂ Rn be open, 0 ∈ G, and x ∈ Rn.
If ⟨y, x⟩ ≥ 0 for each y ≥ 0, y ∈ G, then x ≥ 0.
Proof: Define ξ ∈ Rn such that for i ∈ {1, 2, . . . , n} we set ξi = max{0, −xi}.
Since G ⊂ Rn is open and 0 ∈ G, there is δ > 0 such that δξ ∈ G.
Moreover, δξ ≥ 0; therefore,

    0 ≤ ⟨x, δξ⟩ = −δ ∑_{i : xi<0} xi² = −δ‖ξ‖².

Thus, ξ = 0 and, consequently, x ≥ 0.
Q.E.D.
Lemma 1.45 Let A ∈ Rn×n be a symmetric matrix.
If ⟨y, Ay⟩ = 0 for each y ≥ 0, y ∈ Rn, then A = 0.
Proof:
1. Take i ∈ {1, 2, . . . , n}; then

    0 = ⟨ei:n, A ei:n⟩ = Ai,i .

2. Take i, j ∈ {1, 2, . . . , n}, i ≠ j; then

    0 = ⟨ei:n + ej:n, A(ei:n + ej:n)⟩
      = ⟨ei:n, A ei:n⟩ + ⟨ei:n, A ej:n⟩ + ⟨ej:n, A ei:n⟩ + ⟨ej:n, A ej:n⟩
      = Ai,i + Ai,j + Aj,i + Aj,j = 2Ai,j ,

because of symmetry.
We have proved A = 0.
Q.E.D.
Theorem 1.46 (Farkas) : Let A ∈ Rm×n be a matrix and b ∈ Rm be a vector. Then, the
equation Ax = b possesses a non-negative solution if and only if for all u ∈ Rm fulfilling
Aᵀu ≥ 0 we have ⟨b, u⟩ ≥ 0.
Proof:
1. If x ≥ 0, Ax = b, u ∈ Rm, Aᵀu ≥ 0, then we have

    ⟨b, u⟩ = ⟨Ax, u⟩ = ⟨x, Aᵀu⟩ ≥ 0,

since the scalar product of non-negative vectors is non-negative.
2. Assume Ax = b possesses no non-negative solution.
Denote K := {Ax : x ≥ 0}.
Then, b ∉ K and K is a non-empty closed convex cone. Hence, the assumptions of
Theorem 1.32 are fulfilled and there is u ∈ Rm, u ≠ 0, such that
⟨u, b⟩ < inf {⟨u, y⟩ : y ∈ K}.
(a) Choosing x = 0 we receive 0 = A0 ∈ K. Consequently, ⟨u, b⟩ < 0.
(b) Take y ∈ K and ξ > 0. Then, ξy ∈ K and we have

    ⟨u, b⟩ < ⟨u, ξy⟩ = ξ⟨u, y⟩ .

Therefore,

    ⟨u, y⟩ > (1/ξ)⟨u, b⟩ −→ 0 as ξ → +∞.

We have found ⟨u, y⟩ ≥ 0.
Hence, for all x ≥ 0 we have ⟨u, Ax⟩ = ⟨Aᵀu, x⟩ ≥ 0.
According to Lemma 1.44, Aᵀu ≥ 0.
We have found u such that ⟨b, u⟩ < 0 and Aᵀu ≥ 0.
Therefore, the right-hand side is also violated.
Q.E.D.
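As a computational aside (not in the original text), the Farkas alternative can be tested
numerically with a linear programming solver; the matrix A and vector b below are assumed
example data.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 2.0], [3.0, 1.0]])
    b = np.array([3.0, 4.0])

    # Alternative 1: does Ax = b have a solution x >= 0?
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b, bounds=(0, None))
    print("non-negative solution exists:", res.status == 0)

    # Alternative 2 (active when 1 fails): look for u with A^T u >= 0 and <b, u> < 0,
    # i.e. minimize <b, u> subject to -A^T u <= 0, with u boxed to keep the LP finite.
    res2 = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(A.shape[1]), bounds=(-1, 1))
    print("certificate value <b, u>:", res2.fun)   # < 0 exactly when alt. 1 fails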
Later, another theorem based on set separation will be employed.
Theorem 1.47 (Gordan) : Let A ∈ Rm×n. Then, exactly one of the following problems
possesses a solution:
1. There is p ∈ Rm such that pᵀA < 0.
2. There is x ∈ Rn, x ≥ 0, x ≠ 0, such that Ax = 0.
Proof:
1. Let p ∈ Rm be such that pᵀA < 0.
Then, for each x ∈ Rn, x ≥ 0, x ≠ 0, we have pᵀAx < 0.
Consequently, Ax ≠ 0 and the second problem possesses no solution.
2. Let there be no p ∈ Rm such that pᵀA < 0.
Consider two sets

    M = {Aᵀp : p ∈ Rm} ,  N = {z ∈ Rn : z < 0} .

The sets M, N are non-empty, convex, and M ∩ N = ∅.
According to Theorem 1.42, they can be properly separated. Then there is a
q ∈ Rn, q ≠ 0, such that ∀ u ∈ M, ∀ v ∈ N we have ⟨q, u⟩ ≥ ⟨q, v⟩.
(a) Consider a sequence vk = −(1/k)𝟙, k ∈ N. Hence, vk ∈ N, k ∈ N, and ∀ u ∈ M,
∀ k ∈ N we have

    ⟨q, u⟩ ≥ ⟨q, vk⟩ = −(1/k) ∑_{j=1}^n qj .

Letting k go to +∞, ∀ u ∈ M we have ⟨q, u⟩ ≥ 0.
(b) Consider, for an index i ∈ {1, 2, . . . , n}, a sequence wk = −𝟙 − k ei:n, k ∈ N.
Hence, wk ∈ N, k ∈ N, and for a fixed u ∈ M, ∀ k ∈ N we have

    ⟨q, u⟩ ≥ ⟨q, wk⟩ = − ∑_{j=1}^n qj − k qi .

Consequently, ∀ k ∈ N we have

    qi ≥ −(1/k) ( ∑_{j=1}^n qj + ⟨q, u⟩ ) .

Letting k go to +∞, we receive q ≥ 0.
(c) For every p ∈ Rm, we have Aᵀp ∈ M.
Consequently, ⟨q, Aᵀp⟩ ≥ 0.
Therefore, for each p ∈ Rm, we have ⟨q, Aᵀp⟩ = ⟨Aq, p⟩ ≥ 0.
According to Lemma 1.43, Aq = 0.
We have found that q ∈ Rn is a solution of the second problem.
Q.E.D.
Chapter 2
Functions
2.1 General notions
We consider functions defined on a finite dimensional Euclidean space with values in the
extended real line, i.e. the real values extended with +∞ and −∞. The extended real line
is denoted by R*.
Definition 2.1 For a function f : Rn → R*, we define its epigraph (cz. epigraf) and
hypograph (cz. hypograf)

    epi(f)  = { (x, η) : f(x) ≤ η, x ∈ Rn, η ∈ R } ,                  (2.1)
    hypo(f) = { (x, η) : f(x) ≥ η, x ∈ Rn, η ∈ R } ,                  (2.2)

and its domain (cz. doména)

    Dom(f) = {x : f(x) < +∞, x ∈ Rn} .                                (2.3)
Definition 2.2 We say a function f : Rn → R* is proper (cz. vlastní), if Dom(f) ≠ ∅
and f(x) > −∞ for all x ∈ Rn.
Acceptance of the value +∞ is important for optimization, particularly for its theory. It
allows a simpler and more readable description of an optimization program.
For example, the optimization program inf {f(x) : x ∈ D} can be rewritten as the
unconstrained problem inf {f̃(x) : x ∈ Rn}, where

    f̃(x) = f(x)  if x ∈ D,                                           (2.4)
         = +∞    otherwise.                                           (2.5)
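As a small programming aside (not in the original text), the reformulation (2.4)-(2.5) is
exactly how extended-valued objectives are coded in practice; the function f and the set D
below are assumed examples.

    import math

    def f(x):
        return (x - 2.0) ** 2

    def f_tilde(x, D=(0.0, 1.0)):          # D = [0, 1] is an assumed example set
        lo, hi = D
        return f(x) if lo <= x <= hi else math.inf

    # crude grid minimization of the unconstrained reformulation
    xs = [i / 1000.0 for i in range(-2000, 4000)]
    best = min(xs, key=f_tilde)
    print(best, f_tilde(best))             # minimizer 1.0, the boundary of D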
The epigraph of a function is a particular set.
Lemma 2.3 A set E ⊂ Rn+1 is the epigraph of a function f : Rn → R* if and only if for
all x ∈ Rn the set

    {η : (x, η) ∈ E}

is either ∅ or R or [η̂, +∞) for a proper η̂ ∈ R.
If E is the epigraph of a function f : Rn → R*, then f(x) = min {η : (x, η) ∈ E}.
Proof: The property is evident.
Q.E.D.
The correspondence between a function and its epigraph is one-to-one.
Lemma 2.4 Let I be an index set and for each i ∈ I let a function fi : Rn → R* be given.
Then,

    epi (sup_{i∈I} fi) = ⋂_{i∈I} epi(fi) ,  hypo (inf_{i∈I} fi) = ⋂_{i∈I} hypo(fi) ,   (2.6)
    epi (inf_{i∈I} fi) ⊃ ⋃_{i∈I} epi(fi) ,  hypo (sup_{i∈I} fi) ⊃ ⋃_{i∈I} hypo(fi) .   (2.7)

If I is finite, we receive the equalities

    epi (inf_{i∈I} fi) = ⋃_{i∈I} epi(fi) ,  hypo (sup_{i∈I} fi) = ⋃_{i∈I} hypo(fi) .   (2.8)

Proof: The statement is a direct consequence of Lemma 2.3. An intersection or union of a
finite number of intervals of the type [ξ, +∞) gives again an interval of the same type; a
union of an infinite number of such intervals can violate this property. Similarly, an
intersection or union of intervals of the type (−∞, ξ] is again an interval of the same type,
and a union of an infinite number of such intervals can violate this property.
Q.E.D.
2.2 Differentiability of a function
2.2.1 On the real line
Definition 2.5 Let D ⊂ R, D ≠ ∅, f : D → R and x ∈ int(D). We say
f is differentiable at x (cz. diferencovatelná v bodě x) if there is f′(x) ∈ R such that for
all y ∈ D we have

    f(y) = f(x) + f′(x)(y − x) + |y − x| R1(y − x; f, x) ,             (2.9)

where lim_{h→0} R1(h; f, x) = 0.
Equivalently, f is differentiable at x if and only if lim_{h→0} (f(x+h) − f(x))/h = f′(x) ∈ R.
If S ⊂ int(D), then we say f is differentiable at S (cz. diferencovatelná v množině
S), if it is differentiable at each point x ∈ S.
Lemma 2.6 If D ⊂ R, D ≠ ∅, and f : D → R is differentiable at x ∈ int(D), then f is
continuous at x.
Proof: Continuity of f at x follows immediately from (2.9).
Q.E.D.
Lemma 2.7 Let a, b ∈ R, a < b, and let f : [a, b] → R be differentiable at (a, b), right-
continuous at a and left-continuous at b. Then,

    ∫_a^b f′(s) ds = f(b) − f(a) .                                    (2.10)
2.2.2 Several arguments
Definition 2.8 Let D ⊂ Rn, D ≠ ∅, f : D → R, x ∈ int(D) and h ∈ Rn. We say
f is differentiable at x in direction h (cz. diferencovatelná v bodě x ve směru h) if there
is f′(x; h) ∈ R such that for all t ∈ R with x + th ∈ D we have

    f(x + th) = f(x) + f′(x; h) t + |t| R1(t; f, x, h) ,               (2.11)

where lim_{s→0} R1(s; f, x, h) = 0.
Equivalently, f is differentiable at x in direction h if and only if
lim_{t→0} (f(x + th) − f(x))/t = f′(x; h) ∈ R.
Definition 2.9 Let D ⊂ Rn, D ≠ ∅, f : D → R, x ∈ int(D). For i ∈ {1, 2, . . . , n}
we say f possesses a partial derivative at x w.r.t. xi (cz. parciální derivace v bodě x
vzhledem k xi) if f is differentiable at x in direction ei:n, and we set

    ∂f/∂xi (x) = f′(x; ei:n) .

If f possesses a partial derivative at x w.r.t. xi for all i ∈ {1, 2, . . . , n}, we say f
possesses a gradient at x (cz. gradient v bodě x) and we denote

    ∇f(x) = ( ∂f/∂xi (x) )_{i=1}^n .
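As a computational aside (not in the original text), partial derivatives in the sense of
Definition 2.9 can be cross-checked by central finite differences; the function f below is
an assumed example with known gradient 2x.

    import numpy as np

    def f(x):
        return x @ x

    def num_gradient(f, x, eps=1e-6):
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x); e[i] = 1.0            # direction e_{i:n}
            g[i] = (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
        return g

    x = np.array([1.0, -2.0, 0.5])
    print(num_gradient(f, x))   # close to 2x = [2, -4, 1]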
In the text, we use a notion of differentiability of a function convenient for optimization;
see e.g. [1], [9]. We will introduce the necessary terminology and basic properties of
differentiable functions.
Definition 2.10 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). We say
f is differentiable at x (or possesses a total differential at x, or is Fréchet differentiable
at x) (cz. diferencovatelná v bodě x) if f possesses a gradient ∇f(x) ∈ Rn and for all
y ∈ D we have

    f(y) = f(x) + ⟨∇f(x), y − x⟩ + ‖y − x‖ R1(y − x; f, x) ,           (2.12)

where lim_{h→0} R1(h; f, x) = 0.
If S ⊂ int(D), then we say f is differentiable at S (cz. diferencovatelná v množině
S), if it is differentiable at each point x ∈ S.
Definition 2.11 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D).
We say f is continuously differentiable at x (cz. spojitě diferencovatelná v bodě x),
if there is δ > 0 such that U(x, δ) ⊂ D, f is differentiable at U(x, δ) and the gradient ∇f
is continuous at x.
We say f is continuously differentiable at a neighborhood of x (cz. spojitě diferencovatelná
v okolí bodu x), if there is δ > 0 such that U(x, δ) ⊂ D, f is differentiable at U(x, δ) and
the gradient ∇f is continuous at U(x, δ).
The gradient is necessarily the vector appearing in the expansion (2.12).
Lemma 2.12 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). Let f fulfill the
expansion, for all y ∈ D,

    f(y) = f(x) + ⟨ξ, y − x⟩ + ‖y − x‖ R1(y − x; f, x) ,               (2.13)

where ξ ∈ Rn and lim_{h→0} R1(h; f, x) = 0.
Then f is differentiable at x, ξ = ∇f(x), and f′(x; h) = ⟨∇f(x), h⟩ for all directions
h ∈ Rn.
Proof: Using (2.13) for a direction h ∈ Rn and t ∈ R small enough, we have

    f(x + th) = f(x) + ⟨ξ, th⟩ + ‖th‖ R1(th; f, x) ,

where lim_{h→0} R1(h; f, x) = 0.
Consider the difference quotient and let t → 0:

    lim_{t→0} (f(x + th) − f(x))/t = ⟨ξ, h⟩ + ‖h‖ lim_{t→0} (|t|/t) R1(th; f, x) = ⟨ξ, h⟩ .

Setting h = ei:n, we receive ξi = ∂f/∂xi (x).
We have verified that ξ is the gradient of f at x, that f is differentiable at x, and that
the directional derivatives possess the announced form.
Lemma 2.13 If D ⊂ Rn, D ≠ ∅, and f : D → R is differentiable at x ∈ int(D), then f
is continuous at x.
Proof: Continuity of f at x follows immediately from (2.12).
Q.E.D.
Lemma 2.14 Let D ⊂ Rn, D ≠ ∅, and f : D → R. Consider x ∈ D and h ∈ Rn such
that x + th ∈ D for all 0 ≤ t ≤ 1. Define the function ϕ : [0, 1] → R : t ∈ [0, 1] ↦ f(x + th).
(i) If 0 < t < 1, x + th ∈ int(D) and f is differentiable at x + th, then ϕ is
differentiable at t and ϕ′(t) = ⟨∇f(x + th), h⟩.
(ii) If x + th ∈ int(D) and f is differentiable at x + th for all 0 < t < 1, ϕ is
continuous at 0 from the right and ϕ is continuous at 1 from the left, then

    f(x + h) − f(x) = ϕ(1) − ϕ(0) = ∫_0^1 ⟨∇f(x + th), h⟩ dt.          (2.14)

Proof: (i) follows from Lemma 2.12 and (ii) is a consequence of Lemma 2.7.
Q.E.D.
2.2.3 Vector valued functions
Start with a curve.
Definition 2.15 Let D ⊂ R, D ≠ ∅, f : D → Rm and t ∈ int(D). Express the function
as a vector of functions f = (f1, f2, . . . , fm)ᵀ. We say
• f is differentiable at t if fj is differentiable at t for each j ∈ {1, 2, . . . , m}. We
denote the derivative by f′(t) = (f1′(t), f2′(t), . . . , fm′(t))ᵀ.
• If S ⊂ int(D), f is differentiable at S if fj is differentiable at S for each j ∈
{1, 2, . . . , m}.
And now the general case. We start with a notion of a multidimensional scalar product.
Definition 2.16 Let n, m ∈ N, x ∈ Rn and A ∈ Rn×m. We define the denotation

    ⟦A, x⟧ = (⟨A·,1, x⟩, ⟨A·,2, x⟩, . . . , ⟨A·,m, x⟩)ᵀ .

Using matrix notation, we can write ⟦A, x⟧ = Aᵀx.
Definition 2.17 Let D ⊂ Rn, D ≠ ∅, n ≥ 2, f : D → Rm and x ∈ int(D). Express the
function as a vector of functions f = (f1, f2, . . . , fm)ᵀ. We say
• f possesses a gradient at x if fj possesses a gradient at x for each j ∈ {1, 2, . . . , m}.
We denote ∇f(x) = (∇f1(x), ∇f2(x), . . . , ∇fm(x)).
• f is differentiable at x if fj is differentiable at x for each j ∈ {1, 2, . . . , m}.
• If S ⊂ int(D), f is differentiable at S if fj is differentiable at S for each j ∈
{1, 2, . . . , m}.
Lemma 2.18 Let D ⊂ Rn, D ≠ ∅, f : D → Rm and x ∈ int(D). Then, f is differentiable
at x if and only if f possesses a gradient ∇f(x) ∈ Rn×m and for all y ∈ D we have

    f(y) = f(x) + ⟦∇f(x), y − x⟧ + ‖y − x‖ R1(y − x; f, x) ,           (2.15)

where R1(·; f, x) : Rn → Rm and lim_{h→0} R1(h; f, x) = 0.
The expression is simpler for n = 1. Let D ⊂ R, D ≠ ∅, f : D → Rm
and t ∈ int(D). Then, f is differentiable at t if and only if f possesses a derivative
f′(t) ∈ Rm and for all s ∈ D we have

    f(s) = f(t) + (s − t)f′(t) + |s − t| R1(s − t; f, t) ,             (2.16)

where R1(·; f, t) : R → Rm and lim_{h→0} R1(h; f, t) = 0.
Proof: It is a straightforward rewriting of the definition.
Q.E.D.
2.2.4 Chain rule
Differentiability directly implies the chain rule (cz. řetízkové pravidlo).
Lemma 2.19 Let I ⊂ R, int(I) ≠ ∅, D ⊂ Rn, int(D) ≠ ∅, g : I → D,
f : D → R and t ∈ int(I) be such that g(t) ∈ int(D). If f is differentiable at g(t) and g
is differentiable at t, then f ∘ g is differentiable at t and

    (f ∘ g)′(t) = ∑_{i=1}^n ∂f/∂xi (g(t)) gi′(t) = ⟨∇f(g(t)), g′(t)⟩ .   (2.17)

Proof: Take s ∈ I, s ≠ t. According to the differentiability of f at g(t) and the
differentiability of g at t, we have

    f(g(s)) − f(g(t))
      = ⟨∇f(g(t)), g(s) − g(t)⟩ + ‖g(s) − g(t)‖ R1(g(s) − g(t); f, g(t))
      = ⟨∇f(g(t)), (s − t) g′(t) + |s − t| R1(s − t; g, t)⟩
        + ‖(s − t) g′(t) + |s − t| R1(s − t; g, t)‖ R1(g(s) − g(t); f, g(t))
      = ⟨∇f(g(t)), g′(t)⟩ (s − t) + |s − t| ⟨∇f(g(t)), R1(s − t; g, t)⟩
        + |s − t| ‖ ((s − t)/|s − t|) g′(t) + R1(s − t; g, t) ‖ R1(g(s) − g(t); f, g(t))
      = ⟨∇f(g(t)), g′(t)⟩ (s − t) + |s − t| R1(s − t; f ∘ g, t) ,

where

    R1(w; f ∘ g, t) = ⟨∇f(g(t)), R1(w; g, t)⟩
                      + ‖g′(t) + R1(w; g, t)‖ R1(g(t + w) − g(t); f, g(t))   if w > 0,
                    = ⟨∇f(g(t)), R1(w; g, t)⟩
                      + ‖−g′(t) + R1(w; g, t)‖ R1(g(t + w) − g(t); f, g(t))  if w < 0,
                    = 0                                                      if w = 0.

Thus, f ∘ g is differentiable at t and (2.17) is shown.
Q.E.D.
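As a numeric aside (not in the original text), the chain rule (2.17) can be verified on an
assumed smooth example by comparing a finite-difference derivative of f ∘ g with
⟨∇f(g(t)), g′(t)⟩.

    import numpy as np

    def f(x):        return x[0] * x[1] + x[1] ** 2
    def grad_f(x):   return np.array([x[1], x[0] + 2 * x[1]])
    def g(t):        return np.array([np.cos(t), np.sin(t)])
    def dg(t):       return np.array([-np.sin(t), np.cos(t)])

    t, eps = 0.7, 1e-6
    lhs = (f(g(t + eps)) - f(g(t - eps))) / (2 * eps)   # numeric (f∘g)'(t)
    rhs = grad_f(g(t)) @ dg(t)                          # <∇f(g(t)), g'(t)>
    print(lhs, rhs)                                     # agree to ~1e-9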
2.2.5 Second derivative
Also, the second derivative will be employed.
Definition 2.20 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). We say f possesses
second partial derivatives at x (or, a second Peano derivative) (cz. má druhé parciální
derivace v x), if f possesses a gradient on a neighborhood of x and all partial derivatives
of the gradient at x exist; i.e. ∂/∂xj (∂f/∂xi)(x) exists for all indexes i, j ∈ {1, 2, . . . , n}.
Then, we denote ∂²f/∂xi∂xj (x) = ∂/∂xj (∂f/∂xi)(x) for all i, j ∈ {1, 2, . . . , n}. The
matrix of second partial derivatives is denoted ∇²f(x) = ( ∂²f/∂xi∂xj (x) )_{i=1,j=1}^{n,n}
and called the Hessian matrix.
Definition 2.21 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). We say
f is twice differentiable at x (cz. dvakrát diferencovatelná v x), if there is a gradient
∇f(x) ∈ Rn and a symmetric matrix Hf(x) ∈ Rn×n such that for all y ∈ D we have

    f(y) = f(x) + ⟨∇f(x), y − x⟩ + ½ ⟨y − x, Hf(x)(y − x)⟩
           + ‖y − x‖² R2(y − x; f, x) ,                                (2.18)

where lim_{h→0} R2(h; f, x) = 0.
If S ⊂ int(D), then we say f is twice differentiable at S (cz. dvakrát diferencovatelná
v množině S), if it is twice differentiable at each x ∈ S.
The matrix Hf(x) can differ from the Hessian matrix. The possible reasons are:
• ∇f does not exist in any neighborhood of x;
• ∇f exists in a neighborhood of x and ∇²f(x) does not exist;
• ∇f exists in a neighborhood of x, ∇²f(x) exists, but is asymmetric.
Lemma 2.22 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). If f is twice differentiable
at x, then the matrix Hf(x) is uniquely determined.
Proof: Since Hf(x) is symmetric, its uniqueness follows from Lemma 1.45.
Q.E.D.
Lemma 2.23 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). If f is differentiable
at a neighborhood of x and ∇f is differentiable at x, then ∇²f(x) exists and f is twice
differentiable at x with

    Hf(x) = ½ ∇²f(x) + ½ (∇²f(x))ᵀ .

If, moreover, the Hessian matrix is symmetric, i.e. ∂²f/∂xi∂xj (x) = ∂²f/∂xj∂xi (x) for
all i, j ∈ {1, 2, . . . , n}, then

    Hf(x) = ∇²f(x) .
Proof: According to our assumptions, there is δ > 0 such that U(x, δ) ⊂ D and for all
y ∈ U(x, δ), h ∈ Rn, ‖h‖ < δ − ‖y − x‖, we have

    f(y + h) − f(y) = ⟨∇f(y), h⟩ + ‖h‖ R1(h; f, y) ,
    ∇f(y) − ∇f(x) = ⟦(∇²f(x))ᵀ, y − x⟧ + ‖y − x‖ R1(y − x; ∇f, x) .

According to Lemma 2.14,

    f(x + h) − f(x) = ∫_0^1 ⟨∇f(x + th), h⟩ dt.

Plugging in the expansion of the gradient, we receive

    f(x + h) − f(x) − ⟨∇f(x), h⟩
      = ∫_0^1 ⟨∇f(x + th) − ∇f(x), h⟩ dt
      = ∫_0^1 ⟨⟦(∇²f(x))ᵀ, th⟧ + ‖th‖ R1(th; ∇f, x), h⟩ dt
      = ∫_0^1 t ⟨⟦(∇²f(x))ᵀ, h⟧, h⟩ dt + ∫_0^1 |t| ‖h‖ ⟨R1(th; ∇f, x), h⟩ dt
      = ½ ⟨h, ∇²f(x) h⟩ + ‖h‖² ∫_0^1 |t| ⟨R1(th; ∇f, x), h/‖h‖⟩ dt
      = ½ ⟨h, (½ ∇²f(x) + ½ (∇²f(x))ᵀ) h⟩ + ‖h‖² ∫_0^1 |t| ⟨R1(th; ∇f, x), h/‖h‖⟩ dt,

where

    lim_{h→0} ∫_0^1 |t| ⟨R1(th; ∇f, x), h/‖h‖⟩ dt = 0  since  lim_{s→0} R1(s; ∇f, x) = 0.

We have proved that f is twice differentiable at x with Hf(x) = ½ (∇²f(x) + (∇²f(x))ᵀ).
Q.E.D.
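As a computational aside (not in the original text), the symmetry of the Hessian promised
by Lemma 2.28 below for smooth functions can be observed numerically; f here is an assumed
example.

    import numpy as np

    def f(x):
        return x[0] ** 2 * x[1] + np.exp(x[1])

    def num_hessian(f, x, eps=1e-5):
        n = x.size
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei = np.zeros(n); ei[i] = eps
                ej = np.zeros(n); ej[j] = eps
                H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
        return H

    H = num_hessian(f, np.array([1.0, 0.5]))
    print(np.round(H, 4))                       # approx [[2*x2, 2*x1], [2*x1, e^{x2}]]
    print("symmetry defect:", np.abs(H - H.T).max())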
Lemma 2.24 Let D ⊂ Rn, D ≠ ∅, f : D → R, x ∈ int(D) and h ∈ Rn.
(i) If f is twice differentiable at x, then

    lim_{t→0} (f(x + th) − f(x) − t ⟨∇f(x), h⟩)/t² = ½ ⟨h, Hf(x) h⟩ .  (2.19)

(ii) Let us denote Dh = {t ∈ R : x + th ∈ D}. If f is differentiable at a neighborhood
of x and ∇f is differentiable at x, then ∇²f(x) exists and the function
ϕ : Dh → R : t ∈ Dh ↦ f(x + th) possesses the derivatives

    ϕ′(t) = ⟨∇f(x + th), h⟩ for all t small enough,                    (2.20)
    ϕ″(0) = ⟨h, ∇²f(x) h⟩ .                                            (2.21)

Proof:
1. (i) follows from (2.18), since for t ≠ 0

    (f(x + th) − f(x) − t ⟨∇f(x), h⟩)/t² = ½ ⟨h, Hf(x) h⟩ + ‖h‖² R2(th; f, x) .

2. (ii) follows from Lemma 2.23 and (2.12), (2.15), since for s ≠ 0

    (ϕ(t + s) − ϕ(t))/s = (f(x + (t + s)h) − f(x + th))/s
                        = ⟨∇f(x + th), h⟩ + ‖h‖ R1(sh; f, x + th) ,
    (ϕ′(s) − ϕ′(0))/s = (⟨∇f(x + sh), h⟩ − ⟨∇f(x), h⟩)/s
                      = ⟨h, ∇²f(x) h⟩ + ‖h‖ R1(sh; ∇f, x) .
Q.E.D.
2.2.6 Arguments for differentiability
Existence and continuity of the gradient, resp. of the Hessian, are sufficient conditions
for differentiability in the sense of Definitions 2.10 and 2.21.
Lemma 2.25 Let I ⊂ R, int(I) ≠ ∅, D ⊂ Rn, D ≠ ∅, g : I → D, f : D → R and
t ∈ int(I) be such that g(t) ∈ int(D). If the gradient of f exists on a neighborhood of g(t)
and is continuous at g(t), and g is differentiable at t, then f ∘ g is differentiable at t with

    (f ∘ g)′(t) = ∑_{i=1}^n ∂f/∂xi (g(t)) gi′(t) = ⟨∇f(g(t)), g′(t)⟩ . (2.22)

Proof: For s ∈ I, s ≠ t, i ∈ {1, 2, . . . , n}, u ∈ [0, 1], we denote

    ξ(u, s, i) = (g1(t), . . . , gi−1(t), gi(t) + u(gi(s) − gi(t)), gi+1(s), . . . , gn(s)) .

Then,

    f ∘ g(s) − f ∘ g(t) = ∑_{i=1}^n [f(ξ(1, s, i)) − f(ξ(0, s, i))]
                        = ∑_{i=1}^n ∫_0^1 ∂f/∂xi (ξ(u, s, i)) (gi(s) − gi(t)) du.

Divide the formula by s − t and let s → t.
We receive formula (2.22), since the gradient of f is continuous at g(t).
Q.E.D.
Using Lemma 2.25, we derive differentiability of a function.
Lemma 2.26 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). If the gradient of f exists
on a neighborhood of x and is continuous at x, then f is differentiable at x with

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + ‖h‖ R1(h; f, x) ,                   (2.23)
    |R1(h; f, x)| ≤ max {‖∇f(x + uh) − ∇f(x)‖ : 0 ≤ u ≤ 1}

if h is sufficiently small.
Proof: Using Lemma 2.25 for h ∈ Rn sufficiently small, we receive the expansion

    f(x + h) − f(x) = ∫_0^1 ⟨∇f(x + uh), h⟩ du
                    = ⟨∇f(x), h⟩ + ∫_0^1 ⟨∇f(x + uh) − ∇f(x), h⟩ du
                    = ⟨∇f(x), h⟩ + ‖h‖ R1(h; f, x) ,
    |R1(h; f, x)| ≤ max {‖∇f(x + uh) − ∇f(x)‖ : 0 ≤ u ≤ 1} .
Q.E.D.
Lemma 2.27 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). Then,
f is continuously differentiable at a neighborhood of x if and only if there is δ > 0 such
that ∇f exists at U(x, δ) and is continuous at U(x, δ).
Proof: A consequence of Lemma 2.26.
Q.E.D.
Lemma 2.28 Let D ⊂ Rn, D ≠ ∅, f : D → R and x ∈ int(D). If ∇f, ∇²f exist on
a neighborhood of x and ∇²f is continuous at x, then the Hessian ∇²f(x) is a symmetric
matrix and f is twice differentiable at x with

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + ½ ⟨h, ∇²f(x) h⟩ + ½ ‖h‖² R2(h; f, x) ,   (2.24)
    |R2(h; f, x)| ≤ max {‖∇²f(x + uh) − ∇²f(x)‖ : 0 ≤ u ≤ 1}

if h is sufficiently small. Moreover, Hf(x) = ∇²f(x).
Proof: Take x, h ∈ Rn. Using Lemma 2.25 twice, we derive

    f(x + h) − f(x) = ∫_0^1 ⟨∇f(x + uh), h⟩ du
                    = ⟨∇f(x), h⟩ + ∫_0^1 ⟨∇f(x + uh) − ∇f(x), h⟩ du
                    = ⟨∇f(x), h⟩ + ∫_0^1 ∫_0^u ⟨h, ∇²f(x + vh) h⟩ dv du
                    = ⟨∇f(x), h⟩ + ½ ⟨h, ∇²f(x) h⟩
                      + ∫_0^1 ∫_0^u ⟨h, (∇²f(x + vh) − ∇²f(x)) h⟩ dv du
                    = ⟨∇f(x), h⟩ + ½ ⟨h, ∇²f(x) h⟩ + ½ ‖h‖² R2(h; f, x) ,
    |R2(h; f, x)| ≤ max {‖∇²f(x + uh) − ∇²f(x)‖ : 0 ≤ u ≤ 1} .
Q.E.D.
2.3 Convex functions
2.3.1 Definition of a convex function
Definition 2.29 A function f : Rn → R* is convex (cz. konvexní), if epi(f) is a
convex set.
Convexity of a function can be equivalently explained.
Lemma 2.30 If a function f : Rn → R* is convex, then Dom(f) is a convex set.
Proof: Let x, y ∈ Dom(f) and 0 < λ < 1.
Then, there are η, ξ ∈ R such that f(x) ≤ η and f(y) ≤ ξ.
Hence, (x, η), (y, ξ) ∈ epi(f).
Since epi(f) is convex, (λx + (1 − λ)y, λη + (1 − λ)ξ) ∈ epi(f).
Hence, f(λx + (1 − λ)y) ≤ λη + (1 − λ)ξ < +∞.
Therefore, λx + (1 − λ)y ∈ Dom(f), and the convexity of Dom(f) is shown.
Q.E.D.
Theorem 2.31 : A function f : Rn → R* is convex if and only if Dom(f) is a convex
set and for all x, y ∈ Dom(f) and 0 < λ < 1 we have

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) .                           (2.25)

Proof:
1. Let f be convex.
Then, according to Lemma 2.30, Dom(f) is a convex set.
Let x, y ∈ Dom(f) and 0 < λ < 1.
Then, for all η, ξ ∈ R fulfilling f(x) ≤ η and f(y) ≤ ξ,
one has (x, η), (y, ξ) ∈ epi(f).
Since epi(f) is convex, (λx + (1 − λ)y, λη + (1 − λ)ξ) ∈ epi(f).
Hence, f(λx + (1 − λ)y) ≤ λη + (1 − λ)ξ < +∞.
Taking the minimum over all possible η, ξ gives (2.25).
2. Let property (2.25) be fulfilled.
Take (x, η), (y, ξ) ∈ epi(f) and 0 < λ < 1. Then,

    λη + (1 − λ)ξ ≥ λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y) .

Hence, (λx + (1 − λ)y, λη + (1 − λ)ξ) ∈ epi(f).
We have found that epi(f) is a convex set; therefore, f is a convex function.
Q.E.D.
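As a numeric aside (not in the original text), inequality (2.25) can be spot-checked by
random sampling; f(x) = ‖x‖² below is an assumed convex example.

    import numpy as np

    def f(x):
        return x @ x

    rng = np.random.default_rng(2)
    worst = -np.inf
    for _ in range(10000):
        x, y = rng.normal(size=3), rng.normal(size=3)
        lam = rng.random()
        gap = f(lam * x + (1 - lam) * y) - (lam * f(x) + (1 - lam) * f(y))
        worst = max(worst, gap)
    print("max violation of (2.25):", worst)   # ≤ 0 up to rounding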
Theorem 2.31 shows that the new Definition 2.29 coincides with the classical definition of a
convex function, if the function is proper and the restriction f : Dom(f) → R is considered.
Note that a convex function attaining the value −∞ is degenerate.
Lemma 2.32 Let f : Rn → R∗ be a convex function. Then, either f (x) ∈ R for all
x ∈ Dom (f ) or f (x) = −∞ for all x ∈ rint (Dom (f )).
Proof: Let x ∈ Dom (f ) and f (x) = −∞.
If y ∈ rint (Dom (f )), then there is z ∈ Dom (f ) and 0 < λ ≤ 1 such that y = λx+(1−λ)z.
Using property (2.25), we receive
f (y) = f (λx + (1 − λ)z) ≤ λf (x) + (1 − λ)f (z) = −∞.
Q.E.D.
Theorem 2.33 : If a function f : Rn → R* is convex and proper, then it is continuous
on rint(Dom(f)).
Proof: Without any loss of generality we can assume int(Dom(f)) ≠ ∅. Otherwise, we
will consider the problem in the coordinate system of the smallest affine subspace (lineal)
containing Dom(f).
Let x ∈ int (Dom (f )).
Then, there is ∆ > 0 such that x + ∆ei:n , x − ∆ei:n ∈ Dom (f ) for all
i = 1, 2, . . . , n.
Dom (f ) is convex, therefore,
M = conv ({x + ∆ei:n , x − ∆ei:n : i = 1, 2, . . . , n}) ⊂ Dom (f ) .
Each point y ∈ M can be written as

    y = ∑_{i=1}^n λi,+ (x + ∆ei:n) + ∑_{i=1}^n λi,− (x − ∆ei:n),

where ∑_{i=1}^n λi,+ + ∑_{i=1}^n λi,− = 1, λi,+, λi,− ≥ 0.
Hence, for y ∈ M we receive a bound

    f(y) ≤ ∑_{i=1}^n λi,+ f(x + ∆ei:n) + ∑_{i=1}^n λi,− f(x − ∆ei:n) ≤ Ξ < +∞,

where Ξ := max {f(x + ∆ei:n), f(x − ∆ei:n) : i = 1, 2, . . . , n}.
A point y ∈ M can also be represented as y = x + δs, where ∑_{i=1}^n |si| = ∆ and
0 ≤ δ ≤ 1. Then,

    f(y) = f(x + δs) = f((1 − δ)x + δ(x + s)) ≤ (1 − δ)f(x) + δf(x + s)
         ≤ (1 − δ)f(x) + δΞ,

    f(x) = f( (1/(1+δ))(x + δs) + (δ/(1+δ))(x − s) )
         ≤ (1/(1+δ)) f(x + δs) + (δ/(1+δ)) f(x − s)
         ≤ (1/(1+δ)) f(y) + (δ/(1+δ)) Ξ.

Finally, we receive

    (1 + δ)f(x) − δΞ ≤ f(y) ≤ (1 − δ)f(x) + δΞ .

Thus, f is continuous at each x ∈ int(Dom(f)).
Q.E.D.
Continuity of a convex function at the boundary of its domain is not an easy task. A
necessary condition is valid for a general proper function.
Theorem 2.34 : Let a function f : Rn → R* be proper and continuous on Dom(f).
Then,

    epi(f) = clo(epi(f)) ∩ (Dom(f) × R) .                              (2.26)
Proof: Let x ∈ Dom(f) and (x, η) ∈ clo(epi(f)).
Then, there is a sequence (xk, ηk) ∈ epi(f) converging to (x, η).
Hence, we have f(xk) ≤ ηk.
The function is continuous on Dom(f); after a limit passage we receive f(x) ≤ η.
Thus, we have shown (x, η) ∈ epi(f).
Q.E.D.
The theorem possesses a nice consequence.
Consequence: Let f : Rn → R* be a proper function continuous on Dom(f) and let
Dom(f) be a closed set. Then, epi(f) is also a closed set. ♣
Proof: The statement is a direct consequence of Theorem 2.34, since Dom(f) is a closed
set, and hence,

    clo(epi(f)) ∩ (Dom(f) × R) = clo(epi(f)) .
Q.E.D.
Theorem 2.35 : If functions fi : Rn → R* are convex for all i ∈ I, then sup_{i∈I} fi :
Rn → R* is also convex.
Proof: According to Lemma 2.4 we have epi (sup_{i∈I} fi) = ⋂_{i∈I} epi(fi).
An intersection of convex sets is a convex set; see Lemma 1.4.
The statement is proved.
Q.E.D.
Theorem 2.36 : Let f : Rn → R* be a convex function. Then, for all α ∈ R, the sets
{x : f(x) ≤ α} and {x : f(x) < α} are convex.
These sets are called level sets of f (cz. úrovňové množiny funkce f).
Proof: It is sufficient to verify that the set {x : f(x) < α} is convex, since
{x : f(x) ≤ α} = ⋂_{β>α} {x : f(x) < β}.
Take y, z ∈ {x : f(x) < α} and 0 < λ < 1. Then, y, z ∈ Dom(f) and we have

    f(λy + (1 − λ)z) ≤ λf(y) + (1 − λ)f(z) < α.
Q.E.D.
As a consequence of Theorem 2.36, we receive that the set of all feasible solutions
(cz. množina přípustných řešení) of a convex program is convex, i.e.

    {x ∈ Rn : g1(x) ≤ α1, g2(x) ≤ α2, . . . , gk(x) ≤ αk}

is a convex set, provided the functions g1, g2, . . . , gk are convex.
Convexity of the sets {x : f(x) ≤ α} and {x : f(x) < α} does not imply that the function
f is convex.
Example 2.37: The function

    f(x) = log(x)  if x > 0,
         = +∞      otherwise

is not convex, but its level sets {x : f(x) ≤ α} = (0, e^α], {x : f(x) < α} = (0, e^α) are
convex for all α ∈ R.  △
Definition 2.38 We say a function f : Rn → R* is
i) strictly convex (cz. ryze konvexní), if for every couple of points x, y ∈ Dom(f),
x ≠ y, and 0 < λ < 1 we have the inequality

    f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y) .

ii) concave (cz. konkávní), if the function −f is convex.
iii) strictly concave (cz. ryze konkávní), if the function −f is strictly convex.
A concave function can be equivalently defined as a function with a convex hypograph.
Note that a strictly convex function is always proper; the only exception is when Dom(f)
is a one-point set.
Convex functions are very important in optimization theory, since their local minima
are immediately global minima.
Theorem 2.39 : Let f : Rn → R* be a proper convex function. Then, each local
minimum of f on Dom(f) is a global minimum of f on Dom(f).
The set of all global minimizers is convex.
Proof: Let x̂ ∈ Dom(f) be a local minimum of f on Dom(f), and assume it is not a
global minimum.
Then, there is y ∈ Dom(f) such that f(y) < f(x̂).
Then, for all α ∈ (0, 1) we have f(αx̂ + (1 − α)y) ≤ αf(x̂) + (1 − α)f(y) < f(x̂).
That is a contradiction, since x̂ is a local minimum of f on Dom(f).
Hence, x̂ is a global minimum of f on Dom(f).
The set of all global minimizers {x ∈ Rn : f(x) ≤ inf {f(y) : y ∈ Rn}} is convex, according
to Theorem 2.36.
Q.E.D.
Theorem 2.40 : Let f : Rn → R* be a proper strictly convex function which possesses
a local minimum on Dom(f). Then, it possesses a unique global minimum on Dom(f).
This global minimum is the unique local minimum of f on Dom(f).
Proof: According to Theorem 2.39, each local minimum of f on Dom(f) is also its
global minimum. It is sufficient to show the uniqueness of the global minimum of f on
Dom(f).
Take x̂ a global minimum of f on Dom(f) and y ∈ Dom(f), y ≠ x̂. Applying the strict
convexity of f, we receive

    f(½ x̂ + ½ y) < ½ f(x̂) + ½ f(y) .

Hence,

    f(y) > 2 f(½ x̂ + ½ y) − f(x̂) ≥ 2 f(x̂) − f(x̂) = f(x̂) .

We have shown that x̂ is the unique global minimum of f on Dom(f).
Q.E.D.
Let us recapitulate basic properties of convex functions.
2.3.2 Convex functions of one variable
This section sums up basic properties of convex functions of one variable. The presented
results are listed without proofs. Interested readers can consult basic textbooks on
mathematical analysis and probability theory.
We will consider a function f : J → R defined on a convex set J ⊂ R. (Recall the simple
structure of convex sets on the real line: they are either the empty set, a point, or an
interval.)
Consider smoothness of convex functions.
Theorem 2.41 : Let J ⊂ R be an interval and f : J → R be a convex function.
i) The function f is continuous on int(J) and it can jump at the extremal points of J.
Jumps must keep the bounds: f(a) ≥ f(a+) if a is a left extremal point of J, f(a) ≥
f(a−) if a is a right extremal point of J.
ii) The derivatives from the left and from the right exist at each point t ∈ int(J); i.e.

    f′+(t) = lim_{h→0+} (f(t + h) − f(t))/h ∈ R,
    f′−(t) = lim_{h→0−} (f(t + h) − f(t))/h ∈ R.

We have f′−(t) ≤ f′+(t) ≤ f′−(s) ≤ f′+(s), whenever t, s ∈ int(J), t < s.
iii) f′ exists on J except at most countably many points.
iv) f″ exists on J except a set of Lebesgue measure zero.
v) f fulfills the inequality f(∑_{i=1}^k pi xi) ≤ ∑_{i=1}^k pi f(xi) for all x1, x2, . . . , xk ∈ J,
p1 ≥ 0, p2 ≥ 0, . . . , pk ≥ 0, ∑_{i=1}^k pi = 1.
vi) f fulfills the Jensen inequality, i.e. f(E[X]) ≤ E[f(X)] for each real random variable
X with a finite mean and with P(X ∈ J) = 1.
Recall, v) is a particular case of vi). To see that, consider a random variable X
attaining values x1, x2, . . . , xk with probabilities p1, p2, . . . , pk.
Now, we recall some basic criteria indicating convex functions.
Theorem 2.42 : Let J ⊂ R be an open interval and f : J → R be a function. Then we
have:
• The function f is convex ⇔ f′+ exists and is nondecreasing on J ⇔ f′− exists and
is nondecreasing on J.
• If f is differentiable on J, then f is convex ⇔ f′ is nondecreasing on J.
• If f is twice differentiable on J, then f is convex ⇔ f″ ≥ 0 on J.
2.3.3 Convex function of several variables
This section sums up basic properties of convex functions of several variables. The
presented results are listed without proofs. Interested readers can consult basic textbooks
on mathematical analysis, linear algebra and probability theory.
Consider a function f : D → R defined on a convex set D ⊂ Rn .
Lemma 2.43 Let D ⊂ Rn, D ≠ ∅, be a convex set and f1 : D → R, f2 : D → R, . . . ,
fk : D → R be convex functions. Then, ∑_{i=1}^k ai fi : D → R is a convex function for
all a1 ≥ 0, a2 ≥ 0, . . . , ak ≥ 0.
Proof: The proof is straightforward.
Q.E.D.
Theorem 2.44 (Jensen inequality) : Let D ⊂ Rn be a nonempty convex set and f :
D → R be a convex function. If a real random vector X = (X1, X2, . . . , Xn)ᵀ possesses
a finite mean and P(X ∈ D) = 1, then we have E[X] ∈ D and f(E[X]) ≤ E[f(X)].
Proof: A proof can be found, for example, in [5], Theorem 5.9, p. 26.
Q.E.D.
A consequence of Theorem 2.44 is a generalization of inequality (2.25) (a "deterministic
Jensen inequality").
Theorem 2.45 : Let D ⊂ Rn, D ≠ ∅, be a convex set and f : D → R be a convex
function. Then, the inequality

    f( ∑_{i=1}^k pi xi ) ≤ ∑_{i=1}^k pi f(xi)                          (2.27)

holds for all x1, x2, . . . , xk ∈ D, p1 ≥ 0, p2 ≥ 0, . . . , pk ≥ 0, ∑_{i=1}^k pi = 1.
Proof: The statement is a particular case of Theorem 2.44. To see that, consider a
random variable X attaining values x1, x2, . . . , xk with probabilities p1, p2, . . . , pk.
Q.E.D.
Convexity of a function can be verified by means of functions of one variable.
Theorem 2.46 : Let D ⊂ Rn, D ≠ ∅, be a convex set and f : D → R. Then, the function
f is convex if and only if the functions ϕx,s : Dx,s → R are convex for all x ∈ D and all
s ∈ Rn, where ϕx,s(t) = f(x + ts) and Dx,s = {t : x + ts ∈ D, t ∈ R}. (Let us recall the
set Dx,s is always an interval.)
Proof:
1. Take x ∈ D and s ∈ Rn.
For t1, t2 ∈ Dx,s and 0 < λ < 1 we have

    x + (λt1 + (1 − λ)t2)s = λ(x + t1 s) + (1 − λ)(x + t2 s) ∈ D,

since x + t1 s, x + t2 s ∈ D and D is a convex set.
We have proved that Dx,s is a convex subset of R; therefore, it is an interval.
2. Let f be a convex function and x ∈ D, s ∈ Rn.
For t1, t2 ∈ Dx,s and 0 < λ < 1 we have

    ϕx,s(λt1 + (1 − λ)t2) = f(x + (λt1 + (1 − λ)t2)s) = f(λ(x + t1 s) + (1 − λ)(x + t2 s))
                          ≤ λf(x + t1 s) + (1 − λ)f(x + t2 s) = λϕx,s(t1) + (1 − λ)ϕx,s(t2) .

We have verified that ϕx,s is a convex function on the interval Dx,s.
3. Let the function ϕx,s be convex on Dx,s for all x ∈ D and s ∈ Rn.
Take x, y ∈ D, 0 < λ < 1, and set s = x − y. Then, we have

    f(λx + (1 − λ)y) = f(y + λs) = ϕy,s(λ)
                     ≤ λϕy,s(1) + (1 − λ)ϕy,s(0) = λf(x) + (1 − λ)f(y) .

We have verified that f is a convex function.
Q.E.D.
This property enables us to generalize the criteria for identifying convex functions.
We will need the first and second derivatives of one-dimensional projections.
Lemma 2.47 Let D ⊂ Rn , D 6= ∅ be a convex open set and f : D → R.
• If f is differentiable at D and x ∈ D, s ∈ Rn , t ∈ Dx,s , we have
ϕ0x,s
n
X
∂f
(x + ts) si = h ∇f (x + ts) , s i .
(t) =
∂xi
i=1
• If f is twice differentiable at D and x ∈ D, s ∈ Rn , t ∈ Dx,s , we have

lim_{u→0} ( ϕx,s (t + u) − ϕx,s (t) − u ⟨ ∇f (x + ts) , s ⟩ ) / u² = (1/2) ⟨ s, Hf (x + ts) s ⟩ .

• If f is differentiable at D and ∇f is differentiable at D, then ∇²f exists on D and for x ∈ D, s ∈ Rn , t ∈ Dx,s , we have

ϕ′x,s (t) = ⟨ ∇f (x + ts) , s ⟩ ,    (2.28)
ϕ″x,s (t) = ⟨ s, ∇²f (x + ts) s ⟩ .    (2.29)
Proof: Statement is a consequence of Lemmas 2.14, 2.24.
Q.E.D.
Theorem 2.48 : Let D ⊂ Rn , D ≠ ∅ be a convex open set and f : D → R be differentiable at D. Then,

f is convex ⇔ t ∈ Dx,s ↦ ⟨ ∇f (x + ts) , s ⟩ is nondecreasing on Dx,s for all x ∈ D, s ∈ Rn .    (2.30)

Proof: According to Theorem 2.46 we have to verify convexity of all one-dimensional projections of f .
Take x ∈ D, s ∈ Rn and consider the function ϕx,s .
Function f is differentiable at D, therefore, according to Lemma 2.47, we have
ϕ′x,s (t) = ⟨ ∇f (x + ts) , s ⟩ .
Hence, according to Theorem 2.42,
ϕx,s is convex ⇔ t ∈ Dx,s ↦ ⟨ ∇f (x + ts) , s ⟩ is a nondecreasing function.
The statement is proved.
Q.E.D.
Theorem 2.49 : Let D ⊂ Rn , D ≠ ∅ be a convex open set and f : D → R. If f is differentiable at D and ∇f is differentiable at D, then ∇²f exists on D, f is twice differentiable at D with

Hf (x) = (1/2) ∇²f (x) + (1/2) (∇²f (x))⊤

and

f is convex ⇔ Hf (x) is positively semidefinite for all x ∈ D.    (2.31)
Proof: According to Theorem 2.46 we have to verify convexity of all one-dimensional
projections of f .
Take x ∈ D, s ∈ Rn and consider function ϕx,s .
According to Lemma 2.24, we have

ϕ″x,s (t) = ∑_{i=1}^n ∑_{j=1}^n (∂²f/∂xi ∂xj)(x + ts) si sj = s⊤ ∇²f (x + ts) s.

Hence,

ϕx,s is convex ⇐⇒ ∀ t ∈ Dx,s we have s⊤ ∇²f (x + ts) s ≥ 0
⇐⇒ ∀ t ∈ Dx,s we have s⊤ Hf (x + ts) s ≥ 0.

Finally, function f is convex if and only if Hf (x) is positively semidefinite for all x ∈ D.
Q.E.D.
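The criterion of Theorem 2.49 can also be tested numerically. The following is a minimal sketch, not part of the lecture text; it assumes Python with NumPy and checks nonnegativity of the Hessian's eigenvalues at sampled points of D.

    import numpy as np

    def hessian_psd_on_samples(hess, points, tol=1e-9):
        # Positive semidefiniteness <=> all eigenvalues nonnegative (Lemma 2.50).
        return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in points)

    # f(x1, x2) = x1^2 + x1*x2 + x2^2 has the constant Hessian [[2, 1], [1, 2]]
    # with eigenvalues 1 and 3, hence f is convex on R^2.
    hess = lambda x: np.array([[2.0, 1.0], [1.0, 2.0]])
    samples = [np.random.randn(2) for _ in range(100)]
    print(hessian_psd_on_samples(hess, samples))  # True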
Let us recall the notion of a positively semidefinite matrix and its equivalent definitions.
Lemma 2.50 For a symmetric matrix A ∈ Rn×n the following is equivalent:
• A is positively semidefinite.
• For all x ∈ Rn we have x⊤Ax ≥ 0.
• All eigenvalues of matrix A are nonnegative.
• Determinants of all principal minors of matrix A are nonnegative, i.e.
∀ I ⊂ {1, 2, . . . , n}, I ≠ ∅ we have det (Ai,j , i, j ∈ I) ≥ 0.
• There are a regular matrix Q and a diagonal matrix Λ with nonnegative members on its diagonal such that A = Q⊤ΛQ.
Lemma 2.51 For a symmetric matrix A ∈ Rn×n the following is equivalent:
• A is positively definite.
• For all x ∈ Rn , x ≠ 0 we have x⊤Ax > 0.
• All eigenvalues of matrix A are positive.
• Determinants of all leading principal minors of matrix A are positive, i.e.
∀ k ∈ {1, 2, . . . , n} we have det (Ai,j , i, j ∈ {1, 2, . . . , k}) > 0.
• There are a regular matrix Q and a diagonal matrix Λ with positive members on its diagonal such that A = Q⊤ΛQ.
Let us recall that an expression of the form A = Q⊤ΛQ means a transformation of the quadratic form to its polar basis. For that there is an effective algorithm known as Gauss-Jordan elimination. In fact, it is Gauss elimination applied to rows and columns at once, i.e. each elementary transformation applied to rows must also be applied to columns.
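For example, for the symmetric matrix A with rows (2, 1), (1, 2): subtracting one half of the first row from the second, and then one half of the first column from the second, yields Λ = diag(2, 3/2). Indeed, A = Q⊤ΛQ with Q having rows (1, 1/2), (0, 1); both diagonal members of Λ are positive, in agreement with Lemma 2.51.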
Let us recall smoothness of convex functions.
Theorem 2.52 : Let D ⊂ Rn , D 6= ∅ be a convex set and f : D → R be a convex
function. Then, f is continuous on rint (D).
Proof: Theorem is a reformulation of Theorem 2.33.
Q.E.D.
Theorem 2.53 : Let D ⊂ Rn , D ≠ ∅ be an open convex set and f : D → R be a function. If f is differentiable at D, then

f is convex ⇐⇒ ∀ x, y ∈ D we have f (x) − f (y) ≥ ⟨ ∇f (y) , x − y ⟩ .    (2.32)

Proof:
1. Let f be convex.
Take x, y ∈ D and denote h = x − y, ϕ(µ) = f (y + µh).
Then, ϕ is a differentiable convex function on Dy,h and its derivative is
ϕ′(µ) = ⟨ ∇f (y + µh) , h ⟩ .
According to the mean value theorem there is θ ∈ (0, 1) such that
f (x) − f (y) = ϕ(1) − ϕ(0) = ϕ′(θ) ≥ ϕ′(0) = ⟨ ∇f (y) , h ⟩ = ⟨ ∇f (y) , x − y ⟩ ,
since the derivative of a convex differentiable function is nondecreasing.
2. Let f (x) − f (y) ≥ ⟨ ∇f (y) , x − y ⟩ hold for all x, y ∈ D.
Take y, z ∈ D, λ ∈ (0, 1) and denote x = λy + (1 − λ)z.
According to the assumption we have:
f (y) − f (x) ≥ ⟨ ∇f (x) , y − x ⟩ ,
f (z) − f (x) ≥ ⟨ ∇f (x) , z − x ⟩ .
Hence,
λf (y) + (1 − λ)f (z) ≥ f (x) + ⟨ ∇f (x) , λ(y − x) + (1 − λ)(z − x) ⟩ = f (x) = f (λy + (1 − λ)z) .
According to Theorem 2.31, f is convex.
Q.E.D.
This property is generalized by the notions of subgradient and subdifferential.
Definition 2.54 Let D ⊂ Rn , D ≠ ∅ be a set and f : D → R be a function. We say f possesses at x ∈ D the subgradient a ∈ Rn (cz. subgradient), if we have

f (y) − f (x) ≥ ⟨ a, y − x ⟩ for all y ∈ D.    (2.33)

The set of all subgradients at x will be called the subdifferential of f at x (cz. subdiferenciál) and will be denoted by ∂f (x).
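For example, for D = R and f (x) = |x| we have ∂f (0) = [−1, 1], since |y| ≥ a · y holds for all y ∈ R exactly when |a| ≤ 1; at x ≠ 0 the function is differentiable and ∂f (x) = {sign(x)}, in accordance with Lemma 2.56 below.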
Using the subdifferential we can equivalently rewrite the definition of a global minimum.
Theorem 2.55 : Let D ⊂ Rn , x∗ ∈ D and f : D → R be a function. Then, x∗ is a
global minimum of f on D if and only if 0 ∈ ∂f (x∗ ).
Proof: The statement is a trivial consequence of the subgradient definition, since
0 ∈ ∂f (x∗ ) ⇐⇒ ∀ x ∈ D : f (x) ≥ f (x∗ ) .
Q.E.D.
The rewriting can be understood as a generalization of the methodology of determining local minima by seeking a zero derivative. Unfortunately, it is a rewriting of no practical importance; nothing new is obtained by this idea.
Subgradients and subdifferentials are helpful tools for describing local minima of a convex function, as we will see in Chapter 3.
Lemma 2.56 Let G ⊂ Rn be a nonempty open convex set, f : G → R be a convex function and y ∈ G. If f possesses a gradient at y, then ∂f (y) = {∇f (y)}.
Proof: Take η ∈ ∂f (y) and i ∈ {1, 2, . . . , n}.
For sufficiently small λ > 0, we have y + λei:n , y − λei:n ∈ G. Therefore, using the definition of the subgradient, we receive the bounds
f (y + λei:n ) − f (y) ≥ ⟨ η, λei:n ⟩ = ληi ,
f (y − λei:n ) − f (y) ≥ ⟨ η, −λei:n ⟩ = −ληi .
Dividing by λ and letting λ → 0+, we find
(∂f/∂xi)(y) ≥ ηi and −(∂f/∂xi)(y) ≥ −ηi , hence (∂f/∂xi)(y) = ηi .
Consequently, η = ∇f (y) for each η ∈ ∂f (y).
That is, ∂f (y) = {∇f (y)}.
Q.E.D.
Lemma 2.57 Let G ⊂ Rn be a nonempty open convex set, f : G → R be a convex
function and y ∈ G. If ∂f (y) is a one-point set, then f is differentiable at y and
∂f (y) = {∇f (y)}.
Proof: This is a standard theorem of mathematical analysis.
Q.E.D.
Previous observations can be summed up in a lemma.
Lemma 2.58 Let G ⊂ Rn be a nonempty open convex set, f : G → R be a convex
function and y ∈ G. Hence, the following is equivalent:
1. f is differentiable at y and ∂f (y) = {∇f (y)}.
2. ∂f (y) is a one-point set.
3. f possesses a gradient at y.
Results on separation of convex bodies have consequences for convex functions.
Theorem 2.59 : Let D ⊂ Rn be a nonempty convex set and f : D → R be a convex function. Then, ∂f (x) ≠ ∅ for each x ∈ rint (D).
Proof: Without any loss of generality we can assume int (D) ≠ ∅.
Take x ∈ int (D).
Then, (x, f (x)) ∈ ∂ (epi (f )) and according to Theorem 1.33 there are α ∈ Rn and β ∈ R such that (α, β) ≠ 0 and for all (y, η) ∈ epi (f ) we have
⟨ α, y ⟩ + βη ≥ ⟨ α, x ⟩ + βf (x).
The number η can be arbitrarily large, therefore, β ≥ 0.
1) Assume β = 0.
Since x ∈ int (D), there is δ > 0 such that Uδ (x) ⊂ D.
Therefore, for all y ∈ Uδ (x) we have ⟨ α, y ⟩ ≥ ⟨ α, x ⟩ .
According to Lemma 1.43, α = 0.
Hence, (α, β) = 0, which is a contradiction, because the vector must not be the origin.
2) Assume β > 0.
Consequently, for all (y, η) ∈ epi (f ) we have
⟨ (1/β) α, y ⟩ + η ≥ ⟨ (1/β) α, x ⟩ + f (x).
Therefore, for all y ∈ Dom (f ) we have
f (y) − f (x) ≥ ⟨ (1/β) α, x − y ⟩ = ⟨ −(1/β) α, y − x ⟩ .
We have found β > 0 and −(1/β) α ∈ ∂f (x). The theorem is proved.
Q.E.D.
An equivalent description of a convex function using non-emptiness of subdifferentials is valid if the domain of definition of the function is an open set.
Theorem 2.60 : Let D ⊂ Rn be an open convex set and f : D → R. Then, f is a convex function if and only if ∂f (x) ≠ ∅ for each x ∈ D.
Proof:
1. If f is convex then, according to Theorem 2.59, ∂f (x) ≠ ∅ for each x ∈ D.
2. Assume ∂f (x) ≠ ∅ for each x ∈ D.
Take x, y ∈ D and 0 < λ < 1.
Then z = λx + (1 − λ)y ∈ D, since D is a convex set.
Take α ∈ ∂f (z), which exists according to our assumption.
The definition of subgradient gives us
f (x) − f (z) ≥ ⟨ α, x − z ⟩ ,
f (y) − f (z) ≥ ⟨ α, y − z ⟩ .
Therefore,
λ(f (x) − f (z)) + (1 − λ)(f (y) − f (z)) ≥ λ ⟨ α, x − z ⟩ + (1 − λ) ⟨ α, y − z ⟩ .
Hence,
λf (x) + (1 − λ)f (y) − f (z) ≥ ⟨ α, λx + (1 − λ)y − z ⟩ = 0.
We have shown
λf (x) + (1 − λ)f (y) ≥ f (λx + (1 − λ)y).
Thus, f is convex according to Theorem 2.31.
Q.E.D.
For a continuous function, the characterization is also valid.
Theorem 2.61 : Let D ⊂ Rn be a convex set and f : D → R be a continuous function. Then, f is a convex function if and only if ∂f (x) ≠ ∅ for each x ∈ rint (D).
Proof: According to Theorem 2.59, the condition is fulfilled for a convex function. We only have to show the opposite implication.
1. According to Theorem 2.60, f restricted to rint (D) is convex.
2. Take x, y ∈ D and 0 < λ < 1.
Since D is convex, we have D ⊂ clo (rint (D)).
Then, there are sequences xk , yk ∈ rint (D) such that xk → x and yk → y.
For each k ∈ N, we have
λf (xk ) + (1 − λ)f (yk ) ≥ f (λxk + (1 − λ)yk ).
After limit passage k → +∞ and using continuity of f on D, we receive
λf (x) + (1 − λ)f (y) ≥ f (λx + (1 − λ)y).
We have proved f is convex.
Q.E.D.
2.3.4 Vector valued convex functions
In this section we consider functions defined on a finite dimensional Euclidean space with values in a Cartesian product of a finite number of extended real lines, i.e. f : Rn → (R∗)m .
Definition 2.62 For a function f : Rn → (R∗)m , we define its epigraph (cz. epigraf )

epi (f ) = { (x, η) : f (x) ≤ η, x ∈ Rn , η ∈ Rm } ,    (2.34)

domain (cz. doména) and weak domain (cz. slabá doména)

Dom (f ) = {x : f (x) < +∞, x ∈ Rn } ,    (2.35)
WDom (f ) = {x : fi (x) < +∞ for some i ∈ {1, 2, . . . , m}, x ∈ Rn } .    (2.36)
Definition 2.63 A function f : Rn → (R∗ )m is called monotone (cz. monotónnı́), if
f (x) ≤ f (y) whenever x ≤ y.
Definition 2.64 A function f : Rn → (R∗ )m is convex (cz. konvexnı́), if epi (f ) is a
convex set and WDom (f ) = Dom (f ).
Convexity of a function can be equivalently explained.
Lemma 2.65 A function f : Rn → (R∗ )m is convex if and only if Dom (f1 ) = Dom (f2 ) =
· · · = Dom (fm ) = Dom (f ), Dom (f ) is a convex set and fi is a convex function for each
i ∈ {1, 2, . . . , m}.
Theorem 2.66 : Function f : Rn → (R∗ )m is convex if and only if Dom (f1 ) =
Dom (f2 ) = · · · = Dom (fm ) = Dom (f ), Dom (f ) is a convex set and for all x, y ∈
Dom (f ) and 0 < λ < 1 we have
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) .
(2.37)
Proof: Statement is a consequence of Theorem 2.31 and Lemma 2.65.
Q.E.D.
Composition of functions preserves convexity under some circumstances.
Lemma 2.67 If X ⊂ Rn is a convex set, h : Rn → Rm is an affine linear function and g : h (X ) → R is a convex function, then f : X → R : x ↦ g(h(x)) is a convex function.
Proof: The assumptions are correctly formulated, since h (X ) is a convex set, because X is a convex set and h : Rn → Rm is an affine linear function.
For x, y ∈ X and λ ∈ (0, 1) we have
f (λx + (1 − λ)y) = g (h (λx + (1 − λ)y)) = g (λh (x) + (1 − λ)h (y))
≤ λg (h (x)) + (1 − λ)g (h (y)) = λf (x) + (1 − λ)f (y) .
Q.E.D.
Lemma 2.68 If X ⊂ Rn is a convex set, h : X → Rm is a convex function and
g : conv (h (X )) → R is a monotone convex function, then, f : X → R : x 7→ g(h(x)) is
a convex function.
Proof: For x, y ∈ X and λ ∈ (0, 1) we can estimate
f (λx + (1 − λ)y) = g (h (λx + (1 − λ)y)) ≤ g (λh (x) + (1 − λ)h (y))
≤ λg (h (x)) + (1 − λ)g (h (y)) = λf (x) + (1 − λ)f (y) ,
where the first inequality uses convexity of h and monotonicity of g.
Q.E.D.
2.3.5 Generalization of convex functions
Now, we introduce generalizations of convex functions which are useful for optimization.
Definition 2.69 Let S ⊂ D ⊂ Rn , S be a nonempty open set, x ∈ S and f : D → R be differentiable at x.
1. We say f is pseudoconvex at x with respect to S (cz. pseudokonvexní v bodě x vzhledem k S), if
∀ y ∈ S, ⟨ ∇f (x) , y − x ⟩ ≥ 0 we have f (y) ≥ f (x) .
2. We say f is strictly pseudoconvex at x with respect to S (cz. striktně pseudokonvexní v bodě x vzhledem k S), if
∀ y ∈ S, y ≠ x, ⟨ ∇f (x) , y − x ⟩ ≥ 0 we have f (y) > f (x) .
3. We say f is pseudoconcave at x with respect to S (cz. pseudokonkávní v bodě x vzhledem k S), if −f is pseudoconvex at x with respect to S.
4. We say f is strictly pseudoconcave at x with respect to S (cz. striktně pseudokonkávní v bodě x vzhledem k S), if −f is strictly pseudoconvex at x with respect to S.
Definition 2.70 Let S ⊂ D ⊂ Rn , S be a nonempty open set and f : D → R be differentiable at S.
1. We say f is pseudoconvex on S (cz. pseudokonvexní na S), if
∀ x, y ∈ S, ⟨ ∇f (x) , y − x ⟩ ≥ 0 we have f (y) ≥ f (x) .
2. We say f is strictly pseudoconvex on S (cz. striktně pseudokonvexní na S), if
∀ x, y ∈ S, x ≠ y, ⟨ ∇f (x) , y − x ⟩ ≥ 0 we have f (y) > f (x) .
3. We say f is pseudoconcave on S (cz. pseudokonkávní na S), if −f is pseudoconvex on S.
4. We say f is strictly pseudoconcave on S (cz. striktně pseudokonkávní na S), if −f is strictly pseudoconvex on S.
Definition 2.71 Let S ⊂ D ⊂ Rn , S be a nonempty convex set, f : D → R and x ∈ S.
1. We say f is quasiconvex at x with respect to S (cz. quasikonvexní v bodě x vzhledem k S), if
∀ y ∈ S, 0 < λ < 1 we have f (λx + (1 − λ)y) ≤ max{f (x) , f (y)}.
2. We say f is quasiconvex on S (cz. quasikonvexní na S), if
∀ y, z ∈ S, 0 < λ < 1 we have f (λy + (1 − λ)z) ≤ max{f (y) , f (z)}.
Lemma 2.72 Let D ⊂ Rn , D ≠ ∅ be an open convex set and h : D → R be quasiconvex on D. Then, (lower) level sets of h (cz. (dolní) úrovňové množiny) are convex, i.e. ∀ ∆ ∈ R the sets lev≤∆ h = {x ∈ D : h (x) ≤ ∆} and lev<∆ h = {x ∈ D : h (x) < ∆} are convex.
Proof: The statement is a direct consequence of Definition 2.71.
Q.E.D.
Lemma 2.73 Let S ⊂ D ⊂ Rn , S be a nonempty open convex set and h : D → R be
differentiable at S and pseudoconvex on S. Then, h is quasiconvex on S.
Proof: Take x, y ∈ S, 0 < λ < 1 and denote z = λx + (1 − λ)y.
Assume h (z) > h (x).
Function h is pseudoconvex, therefore, ⟨ ∇h (z) , x − z ⟩ < 0.
Consider,
x − z = x − (λx + (1 − λ)y) = (1 − λ)(x − y),
y − z = y − (λx + (1 − λ)y) = λ(y − x) = −λ(x − y).
Hence,
y − z = −(λ/(1 − λ)) (x − z),
⟨ ∇h (z) , y − z ⟩ = −(λ/(1 − λ)) ⟨ ∇h (z) , x − z ⟩ > 0 .
Function h is pseudoconvex, therefore, h (z) ≤ h (y).
Finally, h (z) ≤ max{h (x) , h (y)} and h is quasiconvex on S.
Q.E.D.
Lemma 2.74 Let S ⊂ D ⊂ Rn , S be a nonempty open convex set and h : D → R be differentiable at S and pseudoconvex on S. Then, (lower) level sets of h (cz. (dolní) úrovňové množiny) restricted to S are convex, i.e. ∀ ∆ ∈ R the sets lev≤∆ h = {x ∈ S : h (x) ≤ ∆} and lev<∆ h = {x ∈ S : h (x) < ∆} are convex.
Proof: The statement is a direct consequence of Lemma 2.73 and Lemma 2.72.
Q.E.D.
A function pseudoconvex at a point need not be quasiconvex at that point; see the following example.
Example 2.75: Consider D = (−2, 1) and f (x) = −x². The function f is pseudoconvex at −1, but it is not quasiconvex at −1.
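A quick check of both claims: ∇f (−1) = 2, so ⟨ ∇f (−1) , y − (−1) ⟩ ≥ 0 forces y ≥ −1, and for such y ∈ D we get f (y) = −y² ≥ −1 = f (−1); hence f is pseudoconvex at −1. On the other hand, taking y = 0.9 and λ = 9/19 gives λ(−1) + (1 − λ)y = 0 and f (0) = 0 > −0.81 = max{f (−1) , f (0.9)}; hence f is not quasiconvex at −1 with respect to D.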
△
Chapter 3
Mathematical programming
In this chapter we introduce a proper procedure for seeking local minimizers of a mathematical program. The ideas of the procedure are taken from the book [1].
A mathematical program means the task

min_{x∈M} f (x) ,    (3.1)

where M ⊂ G ⊂ Rn , M ≠ ∅, G is open and f : G → R.
Our aim is to introduce necessary and sufficient conditions for a given point to be a local minimizer. Step by step we will consider mathematical programs with different descriptions.
We divide the conditions into conditions of the first order, i.e. conditions based on the gradient or its generalization, and conditions of the second order, i.e. conditions based on second partial derivatives or their generalization.
We will denote:
• FONC (First Order Necessary Condition) (cz. nutná podmínka optimality prvního řádu),
• FOSC (First Order Sufficient Condition) (cz. postačující podmínka optimality prvního řádu),
• SONC (Second Order Necessary Condition) (cz. nutná podmínka optimality druhého řádu),
• SOSC (Second Order Sufficient Condition) (cz. postačující podmínka optimality druhého řádu).
3.1 Convex objective function
We start with convex programming, i.e. the objective function f is a convex function and M is a convex set. For this case we possess a full description by a first order condition.
Theorem 3.1 (FONC+FOSC for KP) : Let M ⊂ G ⊂ Rn , M be convex, G be an open convex set, x̃ ∈ M and f : G → R be a convex function. Then, x̃ is a global minimum of f on M if and only if there is η ∈ ∂f (x̃) such that ∀ x ∈ M we have ⟨ η, x − x̃ ⟩ ≥ 0.
Proof: We have to show both implications.
1. Let η ∈ ∂f (x̃) and ∀ x ∈ M we have ⟨ η, x − x̃ ⟩ ≥ 0.
According to the definition of subgradient, Definition 2.54, we have
∀ x ∈ G : f (x) ≥ f (x̃) + ⟨ η, x − x̃ ⟩ .
Using the condition for x ∈ M, we receive
f (x) ≥ f (x̃) + ⟨ η, x − x̃ ⟩ ≥ f (x̃) .
We have verified x̃ is a global minimum of f on M.
2. Let x̃ be a global minimum of f on M.
Consider two sets
Ψ = {(x, y) : x + x̃ ∈ G, y > f (x + x̃) − f (x̃)} ,
Γ = {(x, y) : x + x̃ ∈ M, y ≤ 0} .
The sets can also be expressed as
Ψ = (epi (f ) \ graph (f )) − (x̃, f (x̃)) ,
Γ = (M − x̃) × (−∞, 0].
The sets are convex and Ψ ∩ Γ = ∅.
According to Theorem 1.42, Ψ, Γ can be properly separated.
Hence, there are α ∈ Rn , β ∈ R, (α, β) ≠ 0 such that
inf {⟨ α, x ⟩ + βy : (x, y) ∈ Ψ} ≥ sup {⟨ α, x ⟩ + βy : (x, y) ∈ Γ} .
We know (0, 0) ∈ Γ and (0, 2^{−k}) ∈ Ψ for each k ∈ N.
Consequently,
inf {⟨ α, x ⟩ + βy : (x, y) ∈ Ψ} = 0 = sup {⟨ α, x ⟩ + βy : (x, y) ∈ Γ} .
Let us determine the sign of β.
(a) Assume β < 0.
Then, we receive a contradiction
inf {⟨ α, x ⟩ + βy : (x, y) ∈ Ψ} = −∞,
since (0, y) ∈ Ψ for any y > 0.
(b) Assume β = 0.
Hence, ⟨ α, x ⟩ ≥ 0 for each x ∈ G − x̃, since (x, f (x + x̃) − f (x̃) + 1) ∈ Ψ.
We know x̃ ∈ M ⊂ G and G ⊂ Rn is open, therefore, G − x̃ is open and 0 ∈ G − x̃.
According to Lemma 1.43, α = 0.
That is a contradiction with the property (α, β) ≠ 0.
We found β > 0. Then,
⟨ α, x ⟩ + βy ≥ 0 for each x ∈ G − x̃, y > f (x + x̃) − f (x̃) ,
⟨ α, ξ ⟩ + βη ≤ 0 for each ξ ∈ M − x̃, η ≤ 0.
Hence, letting y → f (x + x̃) − f (x̃) and setting η = 0 we receive
⟨ α, x ⟩ + β (f (x + x̃) − f (x̃)) ≥ 0 for each x ∈ G − x̃,
⟨ α, ξ ⟩ ≤ 0 for each ξ ∈ M − x̃.
Set γ = −(1/β) α and shift the points back, i.e. x + x̃ ↦ x:
− ⟨ γ, x − x̃ ⟩ + f (x) − f (x̃) ≥ 0 for each x ∈ G,
− ⟨ γ, x − x̃ ⟩ ≤ 0 for each x ∈ M.
Consequently,
f (x) ≥ f (x̃) + ⟨ γ, x − x̃ ⟩ for each x ∈ G,
⟨ γ, x − x̃ ⟩ ≥ 0 for each x ∈ M.
We have found γ ∈ ∂f (x̃) with the required property.
Q.E.D.
The condition becomes simpler for points from the interior of M.
Lemma 3.2 If M ⊂ G ⊂ Rn , M is convex, G is open convex, x̃ ∈ int (M) and f : G → R is a convex function, then
x̃ is a global minimum of f on M ⇐⇒ 0 ∈ ∂f (x̃) .
Proof: According to Theorem 3.1, x̃ is a global minimum of f on M if and only if there is η ∈ ∂f (x̃) such that for each x ∈ M we have ⟨ η, x − x̃ ⟩ ≥ 0.
Since x̃ ∈ int (M), there is ε > 0 such that U (x̃, ε) ⊂ M.
Then for each x ∈ U (x̃, ε), we have ⟨ η, x − x̃ ⟩ ≥ 0.
Hence, according to Lemma 1.43, η = 0.
Q.E.D.
Lemma 3.3 Let M ⊂ G ⊂ Rn , M be convex, G be an open convex set, x̃ ∈ M and f : G → R be convex. If f possesses partial derivatives at x̃, then
x̃ is a global minimum of f on M ⇐⇒ ∀ x ∈ M we have ⟨ ∇f (x̃) , x − x̃ ⟩ ≥ 0.
Proof: According to Lemma 2.58, ∂f (x̃) = {∇f (x̃)}, since f possesses partial derivatives at x̃.
Consequently, the statement follows directly from Theorem 3.1.
Q.E.D.
The characterization coincides with the classical one for interior points of M.
Lemma 3.4 Let M ⊂ G ⊂ Rn , M be convex, G be an open convex set, x̃ ∈ int (M) and f : G → R be a convex function. If f possesses partial derivatives at x̃, then
x̃ is a global minimum of f on M ⇐⇒ ∇f (x̃) = 0.
Proof: The statement follows from Lemma 3.3.
Q.E.D.
3.2 Concave objective function
In this section we consider local minima of a concave function on a convex set. Let us formulate a necessary condition of the first order.
Theorem 3.5 (FONC for a concave function) : Let M ⊂ G ⊂ Rn , M be a non-empty convex set, G be an open convex set, f : G → R be a concave function and x̃ ∈ M.
If x̃ is a local minimum of f on M, then for each η ∈ Rn with −η ∈ ∂(−f ) (x̃) we have
⟨ η, x − x̃ ⟩ ≥ 0 ∀ x ∈ M.
Proof: Let x̃ be a local minimum of f on M.
Hence, there is δ > 0 such that x̃ is a global minimum of f on M ∩ U (x̃, δ).
If x ∈ M, then there is λ > 0 such that x̃ + λ(x − x̃) ∈ M ∩ U (x̃, δ).
Therefore, f (x̃ + λ(x − x̃)) ≥ f (x̃).
Taking η ∈ Rn with −η ∈ ∂(−f ) (x̃), we estimate
f (x̃) ≤ f (x̃ + λ(x − x̃)) ≤ f (x̃) + λ ⟨ η, x − x̃ ⟩ .
Consequently, for each η ∈ Rn with −η ∈ ∂(−f ) (x̃) and each x ∈ M we have ⟨ η, x − x̃ ⟩ ≥ 0.
Q.E.D.
Unfortunately, the condition is not sufficient, see the following example.
Example 3.6: Consider the function f : R → R : x ↦ −x² and M = [−1, 2].
The condition from Theorem 3.5 is fulfilled at the points −1, 0, 2. The points −1 and 2 are local minima of f on M. The point 0 is a global maximum of f , but not a local minimum.
△
If a concave function possesses a local minimum in an interior point of the set of
feasible solutions, then the function must be flat in its neighborhood.
Lemma 3.7 Let M ⊂ G ⊂ Rn , M be a non-empty convex set, G be an open convex set, f : G → R be a concave function and x̃ ∈ int (M). If x̃ is a local minimum of f on M, then ∂(−f ) (x̃) = {0} and there is δ > 0 such that U (x̃, δ) ⊂ M and for all x ∈ U (x̃, δ) we have f (x) = f (x̃).
Proof: Let x̃ ∈ int (M) be a local minimum of f on M.
1. Take η ∈ Rn with −η ∈ ∂(−f ) (x̃).
Hence, there is δ > 0 such that x̃ − δη ∈ M.
According to Theorem 3.5 we have ⟨ η, −δη ⟩ = −δ ‖η‖² ≥ 0.
Finally, η = 0 and ∂(−f ) (x̃) = {0}.
2. There is δ > 0 such that U (x̃, δ) ⊂ M and for all x ∈ U (x̃, δ) we have f (x) ≥ f (x̃).
f is concave, therefore, for all x ∈ U (x̃, δ) and η ∈ ∂(−f ) (x̃) = {0} we have
f (x̃) ≤ f (x) ≤ f (x̃) + ⟨ η, x − x̃ ⟩ = f (x̃).
Finally, f (x) = f (x̃) for all x ∈ U (x̃, δ).
Q.E.D.
Lemma 3.8 Let M ⊂ G ⊂ Rn , M be a non-empty convex set, G be an open convex set, f : G → R be a concave function, x̃ ∈ M and f be differentiable at x̃. If x̃ is a local minimum of f on M, then
∀ x ∈ M we have ⟨ ∇f (x̃) , x − x̃ ⟩ ≥ 0.
Proof: We know −f is a convex function which is differentiable at x̃. According to Lemma 2.58, the subdifferential of −f at x̃ contains the gradient only. Consequently, the statement follows directly from Theorem 3.5.
Q.E.D.
Lemma 3.9 Let M ⊂ G ⊂ Rn , M be a non-empty convex set, G be an open convex set, f : G → R be a concave function, x̃ ∈ int (M) and f be differentiable at x̃. If x̃ is a local minimum of f on M, then ∇f (x̃) = 0 and there is δ > 0 such that U (x̃, δ) ⊂ M and for all x ∈ U (x̃, δ) we have f (x) = f (x̃).
Proof: We know −f is a convex function differentiable at x̃. According to Lemma 2.58, its subdifferential contains the gradient only. Consequently, the statement follows directly from Theorem 3.5 and Lemma 3.7.
Q.E.D.
3.3 Unconstrained problem
The previous sections considered convex and concave objective functions, which are very well characterized by means of subgradients and subdifferentials.
In this section, we consider a general real-valued function as the objective. For such a function, subdifferentials are typically empty. Therefore, we have to deal with functions differentiable (or twice differentiable) at a given point; for definitions see Section 2.2. Nevertheless, the presented sufficient conditions require some kind of convexity of the objective function; the employed properties are introduced in Section 2.3.5.
Now, we introduce optimality conditions of the first and of the second order for the unconstrained problem (UP) (cz. volný extrém).
Theorem 3.10 (FONC for UP) : Let f : Rn → R and x̃ ∈ Rn . If x̃ is a local minimum of f on Rn and f is differentiable at x̃, then ∇f (x̃) = 0.
Proof: Assume ∇f (x̃) ≠ 0.
Hence, there is an index i ∈ {1, 2, . . . , n} such that (∂f/∂xi)(x̃) ≠ 0.
If (∂f/∂xi)(x̃) > 0, then there is α > 0 such that f (x̃ − αei:n ) < f (x̃).
If (∂f/∂xi)(x̃) < 0, then there is α > 0 such that f (x̃ + αei:n ) < f (x̃).
We found out that x̃ cannot be a local minimum of f on Rn .
Q.E.D.
Theorem 3.11 (SONC for UP) : Let f : Rn → R and x̃ ∈ Rn . If x̃ is a local minimum of f on Rn and f is twice differentiable at x̃, then ∇f (x̃) = 0 and Hf (x̃) is a positively semidefinite matrix.
Proof: Function f is twice differentiable at x̃, hence, according to Definition 2.21, the Taylor expansion of the second order is valid for h ∈ Rn:

f (x̃ + h) = f (x̃) + ⟨ ∇f (x̃) , h ⟩ + (1/2) ⟨ h, Hf (x̃) h ⟩ + ‖h‖² R2 (h; f, x̃) ,

where lim_{h→0} R2 (h; f, x̃) = 0 and Hf (x̃) is symmetric.
According to Theorem 3.10, ∇f (x̃) = 0. The expansion becomes simpler:

f (x̃ + h) = f (x̃) + (1/2) ⟨ h, Hf (x̃) h ⟩ + ‖h‖² R2 (h; f, x̃) .

Multiplying h by α > 0, we receive

f (x̃ + αh) = f (x̃) + (α²/2) ⟨ h, Hf (x̃) h ⟩ + α² ‖h‖² R2 (αh; f, x̃) .

The point x̃ is a local minimum of f on Rn , therefore, for α > 0 small enough

0 ≤ (f (x̃ + αh) − f (x̃)) / α² = (1/2) ⟨ h, Hf (x̃) h ⟩ + ‖h‖² R2 (αh; f, x̃) .

We know lim_{α→0+} R2 (αh; f, x̃) = 0.
Letting α → 0+, we receive

∀ h ∈ Rn , h ≠ 0 : 0 ≤ ⟨ h, Hf (x̃) h ⟩ .

According to Definition 2.21, Hf (x̃) is a symmetric matrix.
We have derived that Hf (x̃) is a positively semidefinite matrix.
Q.E.D.
Theorem 3.12 (SOSC for UP) : Let f : Rn → R and x̃ ∈ Rn . If f is twice differentiable at x̃, ∇f (x̃) = 0 and Hf (x̃) is a positively definite matrix, then x̃ is a strict local minimum of f on Rn .
Proof: Function f is twice differentiable at x̃, hence, according to Definition 2.21, the Taylor expansion of the second order is valid for h ∈ Rn:

f (x̃ + h) = f (x̃) + ⟨ ∇f (x̃) , h ⟩ + (1/2) ⟨ h, Hf (x̃) h ⟩ + ‖h‖² R2 (h; f, x̃) ,

where lim_{h→0} R2 (h; f, x̃) = 0.
The matrix Hf (x̃) is positively definite. Hence, its eigenvalues are positive. Let us denote its smallest eigenvalue by the symbol λ.
The assumption ∇f (x̃) = 0 simplifies the expansion.
For h small enough, we have

f (x̃ + h) − f (x̃) = (1/2) ⟨ h, Hf (x̃) h ⟩ + ‖h‖² R2 (h; f, x̃)
= (1/2) ‖h‖² ( ⟨ h/‖h‖ , Hf (x̃) h/‖h‖ ⟩ + 2 R2 (h; f, x̃) )
≥ (1/2) ‖h‖² (λ + 2 R2 (h; f, x̃))
≥ (1/4) ‖h‖² λ > 0.

We detect that x̃ is a strict local minimum of f on Rn .
Q.E.D.
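The conditions of Theorems 3.10-3.12 are straightforward to verify numerically. The following is a minimal sketch, not part of the lecture text (Python with NumPy assumed; the function is illustrative):

    import numpy as np

    def check_sosc(grad, hess, x, tol=1e-8):
        # SOSC for UP: zero gradient and positively definite Hessian at x.
        eigs = np.linalg.eigvalsh(hess(x))
        return np.linalg.norm(grad(x)) < tol and eigs.min() > tol

    # f(x1, x2) = (x1 - 1)^2 + 2*x2^2 has the strict (global) minimum at (1, 0).
    grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])
    hess = lambda x: np.diag([2.0, 4.0])
    print(check_sosc(grad, hess, np.array([1.0, 0.0])))  # True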
3.4 General objective function
We consider local minimizers of an objective function under constraints expressed via a set of feasible solutions.
We introduce notation for convenient tools.
Definition 3.13 For M ⊂ Rn , x ∈ clo (M), we introduce the Set of Feasible Directions of M at x (or, Cone of Feasible Directions of M at x) (cz. množina přípustných směrů M v bodě x)

DM (x) = {s ∈ Rn : s ≠ 0, ∃ δ > 0 ∀ 0 < λ < δ : x + λs ∈ M} .    (3.2)

Its members are called Feasible Directions of M at x (cz. přípustné směry M v bodě x).
Lemma 3.14 If M ⊂ Rn , x ∈ clo (M), s ∈ DM (x) and α > 0,
then αs ∈ DM (x). In other words DM (x) ∪ {0} is a cone.
Proof: Take s ∈ DM (x) and α > 0.
Then, there is a δ > 0 such that ∀ 0 < λ < δ we have x + λs ∈ M.
Consequently, ∀ 0 < λ < αδ we have x + λαs ∈ M.
Thus, αs ∈ DM (x).
Q.E.D.
There is no simple characterization for DM (x) ∪ {0} to be closed.
Lemma 3.15 For M ⊂ Rn convex, x ∈ clo (M), we have DM (x) is convex.
Proof: Take s, t ∈ DM (x) and 0 < α < 1.
Then, there are δ > 0, η > 0 such that ∀ 0 < λ < δ we have x + λs ∈ M and ∀ 0 < ϕ < η
we have x + ϕt ∈ M.
Because M is convex, ∀ 0 < ρ < δ ∧ η we have
x + ρ (αs + (1 − α)t) = α (x + ρs) + (1 − α) (x + ρt) ∈ M.
Thus, αs + (1 − α)t ∈ DM (x).
Q.E.D.
Lemma 3.16 For M ⊂ Rn convex, x ∈ M, we have
DM (x) = {α(y − x) : α > 0, y ∈ M, y 6= x} .
(3.3)
Proof:
1. Let y ∈ M, y ≠ x, α > 0.
Then, for all 0 < λ < 1/α we have x + λα(y − x) = (1 − αλ)x + αλy ∈ M, since M is convex and 0 < αλ < 1.
We derived α(y − x) ∈ DM (x).
2. Take s ∈ DM (x).
Then, there is a δ > 0 such that ∀ 0 < λ < δ we have x + λs ∈ M.
Set y = x + (δ/2) s.
Then, y ∈ M and s = (2/δ) (y − x).
Q.E.D.
A tangent cone is a generalization of cone of feasible directions.
Definition 3.17 Let M ⊂ Rn , x̃ ∈ clo (M). We define the Tangent Cone to M at x̃ (or, Cone of Tangents) (cz. tečný kužel k množině M v bodě x̃) by

TM (x̃) = { s ∈ Rn : ∃ xk ∈ M, λk > 0, k ∈ N such that xk → x̃, λk (xk − x̃) → s } .    (3.4)
Lemma 3.18 If M ⊂ Rn , x̃ ∈ clo (M), then TM (x̃) is a closed cone.
Proof:
1. Since x̃ ∈ clo (M), there are xk ∈ M, k ∈ N such that xk → x̃.
Set λk = 1 for each k ∈ N.
Hence, λk (xk − x̃) → 0.
Finally, 0 ∈ TM (x̃).
2. Take s ∈ TM (x̃) and α > 0.
There exist xk ∈ M, λk > 0, k ∈ N such that xk → x̃, λk (xk − x̃) → s.
Set ψk = αλk . Then, ψk (xk − x̃) → αs.
We received αs ∈ TM (x̃).
3. Take a sequence sν ∈ TM (x̃) for ν ∈ N with sν → s ∈ Rn .
Hence, there are xk,ν ∈ M, λk,ν > 0, k, ν ∈ N such that
xk,ν → x̃, λk,ν (xk,ν − x̃) → sν as k → +∞.
For each ν ∈ N, we select kν ∈ N such that
‖xkν,ν − x̃‖ < 1/ν and ‖λkν,ν (xkν,ν − x̃) − sν‖ < 1/ν .
Set ξν = xkν,ν , ψν = λkν,ν .
Then, ξν → x̃ and ψν (ξν − x̃) → s.
We have verified s ∈ TM (x̃).
Consequently, TM (x̃) is a closed set.
Q.E.D.
Lemma 3.19 If M ⊂ Rn is a convex set and x̃ ∈ clo (M), then TM (x̃) is a closed convex cone.
Proof: According to Lemma 3.18, TM (x̃) is a closed cone. It remains to show that TM (x̃) is a convex set.
Take s, σ ∈ TM (x̃) and 0 < α < 1.
Hence, there are xk , yk ∈ M, λk > 0, ψk > 0, k ∈ N such that xk → x̃, yk → x̃, λk (xk − x̃) → s, ψk (yk − x̃) → σ.
There is a sequence ξk ∈ M, k ∈ N such that ξk → x̃, (λk + ψk ) (ξk − x̃) → 0, since x̃ ∈ clo (M).
Define new sequences

zk = ξk + (λk / (λk + ψk )) (xk − x̃) ,
wk = ξk + (ψk / (λk + ψk )) (yk − x̃) ,
ρk = λk + ψk .

We have zk , wk , αzk + (1 − α)wk ∈ M, since M is convex.
Moreover, ρk > 0, αzk + (1 − α)wk → x̃, and

ρk (αzk + (1 − α)wk − x̃)
= ρk ( α (λk / (λk + ψk )) (xk − x̃) + (1 − α) (ψk / (λk + ψk )) (yk − x̃) + (ξk − x̃) )
= αλk (xk − x̃) + (1 − α)ψk (yk − x̃) + (λk + ψk ) (ξk − x̃)
→ αs + (1 − α)σ as k → +∞.

We received αs + (1 − α)σ ∈ TM (x̃).
We have proved TM (x̃) is convex.
Q.E.D.
Lemma 3.20 If M ⊂ Rn is a set and x̃ ∈ clo (M), then Tclo(M) (x̃) = TM (x̃).
Proof: Evidently, Tclo(M) (x̃) ⊃ TM (x̃). The opposite inclusion must be shown.
Take s ∈ Tclo(M) (x̃), s ≠ 0.
Then, there are xk ∈ clo (M), λk > 0 for k ∈ N such that xk → x̃, λk (xk − x̃) → s.
Since s ≠ 0, lim_{k→+∞} λk = +∞.
Moreover, there are yk ∈ M for k ∈ N such that ‖xk − yk‖ < 1/(kλk ).
Then, yk → x̃ and
λk (yk − x̃) = λk (xk − x̃) + λk (yk − xk ) → s,
since λk ‖xk − yk‖ < 1/k → 0.
We have shown s ∈ TM (x̃).
Q.E.D.
Lemma 3.21 Let M ⊂ Rn , x ∈ clo (M) and S ⊂ Rn , x ∈ int (S). Then, TM∩S (x) = Tclo(M)∩clo(S) (x) = TM (x) = Tclo(M) (x).
Proof: We have to show TM∩S (x) = TM (x). The rest of the statement follows from Lemma 3.20.
1. Since M ∩ S ⊂ M, the inclusion TM∩S (x) ⊂ TM (x) is evident.
2. Assume s ∈ TM (x).
Then, there are sequences xk ∈ M, λk > 0 for k ∈ N such that xk → x,
λk (xk − x) → s.
Since x ∈ int (S), xk ∈ S for all k large enough.
Therefore, s ∈ TM∩S (x).
Q.E.D.
Lemma 3.22 Let M ⊂ Rn , x̃ ∈ clo (M). Then, DM (x̃) ⊂ TM (x̃).
Proof: Take s ∈ DM (x̃).
It is sufficient to set λk = k and xk = x̃ + (1/k) s for each k ∈ N.
If k ∈ N is sufficiently large, xk ∈ M.
Thus, s ∈ TM (x̃).
Q.E.D.
Now, we will approximate the objective function.
Definition 3.23 Let G ⊂ Rn be open, x ∈ G, f : G → R. We introduce the Set of Improving Directions of f at x (or, Cone of Improving Directions of f at x) (cz. množina zlepšujících směrů f v bodě x)

Ff (x) = {s ∈ Rn : s ≠ 0, ∃ δ > 0 ∀ 0 < λ < δ : f (x + λs) < f (x)} .    (3.5)

Its members are called Improving Directions of f at x (cz. zlepšující směry f v bodě x).
If f is differentiable at x, the set of improving directions is approximated by the following sets:

Ff,0 (x) = {s ∈ Rn : ⟨ ∇f (x) , s ⟩ < 0} ,    (3.6)
F0f,0 (x) = {s ∈ Rn : s ≠ 0, ⟨ ∇f (x) , s ⟩ ≤ 0} .    (3.7)
Lemma 3.24 If G ⊂ Rn is open, x ∈ G, f : G → R, s ∈ Ff (x), α > 0, then
αs ∈ Ff (x).
For f differentiable at x, we have
• If s ∈ Ff,0 (x), α > 0, then αs ∈ Ff,0 (x).
• If s ∈ F0f,0 (x), α > 0, then αs ∈ F0f,0 (x).
In other words Ff (x) ∪ {0}, Ff,0 (x) ∪ {0}, F0f,0 (x) ∪ {0} are cones.
Proof: Take s ∈ Ff (x) and α > 0.
Then, there is a δ > 0 such that ∀ 0 < λ < δ we have f (x + λs) < f (x).
Consequently, ∀ 0 < λ < αδ we have f (x + λαs) < f (x).
Thus, αs ∈ Ff (x).
Property for Ff,0 (x), F0f,0 (x) is evident.
Q.E.D.
There is no simple characterization for Ff (x) ∪ {0} to be closed.
Lemma 3.25 If G ⊂ Rn is convex open, x ∈ G, f : G → R is convex, then Ff (x) is
convex and
Ff (x) = {α(y − x) : α > 0, f (y) < f (x) , y ∈ G} .
(3.8)
Proof:
1. Let y ∈ G with f (y) < f (x) and α > 0.
Then, for all 0 < λ < 1/α we have, by convexity of f ,
f (x + λα(y − x)) = f ((1 − αλ)x + αλy) ≤ (1 − αλ)f (x) + αλf (y) < f (x) ,
since 0 < αλ < 1. We derived α(y − x) ∈ Ff (x).
2. Take s ∈ Ff (x).
Then, there is a δ > 0 such that ∀ 0 < λ < δ we have f (x + λs) < f (x).
Set y = x + (δ/2) s. Then, f (y) < f (x) and s = (2/δ) (y − x).
The representation (3.8) is proved. Convexity of Ff (x) follows, since Ff (x) is the cone generated by the convex set {y − x : f (y) < f (x) , y ∈ G}, a shifted level set of the convex function f .
Q.E.D.
Lemma 3.26 Let G ⊂ Rn , G ≠ ∅ be open, x ∈ G and f : G → R be differentiable at x. Then,
i) Ff,0 (x) ⊂ Ff (x) ⊂ F0f,0 (x).
ii) If f is pseudoconvex at x with respect to some neighborhood of x, then Ff,0 (x) = Ff (x).
iii) If f is strictly pseudoconcave at x with respect to some neighborhood of x, then F0f,0 (x) = Ff (x).
Proof: Define for s ∈ Rn the function ϕx,s : Dx,s → R : λ ↦ f (x + λs), where Dx,s = {λ ∈ R : x + λs ∈ G}.
Function f is differentiable at x, hence ϕx,s is differentiable at 0 and we have ϕ′x,s (0) = ⟨ ∇f (x) , s ⟩.
1. (a) Take s ∈ Ff,0 (x).
Hence, ϕ′x,s (0) = ⟨ ∇f (x) , s ⟩ < 0.
Consequently, ϕx,s is decreasing in a right neighborhood of the origin.
Consequently, there is δ > 0 such that ∀ 0 < λ < δ we have f (x + λs) < f (x).
Consequently, s ∈ Ff (x).
(b) Take s ∈ Ff (x).
Hence, there is δ > 0 such that ∀ 0 < λ < δ we have f (x + λs) < f (x).
Consequently, ϕx,s (λ) < ϕx,s (0) on a right neighborhood of the origin.
Consequently, ϕ′x,s (0) = ⟨ ∇f (x) , s ⟩ ≤ 0.
Finally, s ∈ F0f,0 (x).
The inclusions i) are shown.
2. Let η > 0 and f be pseudoconvex at x with respect to U (x, η).
Let s ∈ Ff (x).
Hence, there is δ > 0 such that ∀ 0 < λ < δ we have f (x + λs) < f (x).
Therefore, ⟨ ∇f (x) , λs ⟩ < 0 for each 0 < λ < δ ∧ (η/‖s‖), since f is pseudoconvex at x with respect to U (x, η).
It means ⟨ ∇f (x) , s ⟩ < 0.
Finally, s ∈ Ff,0 (x).
3. Let η > 0 and f be strictly pseudoconcave at x with respect to U (x, η).
Let s ∈ F0f,0 (x).
Hence, ⟨ ∇f (x) , s ⟩ ≤ 0, s ≠ 0.
Hence, for each 0 < λ < η/‖s‖ we have f (x + λs) < f (x), since f is strictly pseudoconcave at x with respect to U (x, η).
Consequently, s ∈ Ff (x).
Q.E.D.
Theorem 3.27 (FONC for MP) : Let M ⊂ G ⊂ Rn , G open, f : G → R and x̃ ∈ M.
If x̃ is a local minimum of f on M and f is differentiable at x̃, then Ff,0 (x̃) ∩ TM (x̃) = ∅.
Proof: Take s ∈ TM (x̃).
Hence, there is a sequence xk ∈ M, λk > 0 for k ∈ N such that
xk → x̃, λk (xk − x̃) → s.
We know f is differentiable at x̃, therefore,
f (xk ) − f (x̃) = ⟨ ∇f (x̃) , xk − x̃ ⟩ + ‖xk − x̃‖ R1 (xk − x̃; f, x̃) ,
R1 (xk − x̃; f, x̃) → 0.
Since x̃ is a local minimum of f on M, for sufficiently large k ∈ N
f (xk ) − f (x̃) ≥ 0.
Consequently,
0 ≤ lim_{k→+∞} λk ( ⟨ ∇f (x̃) , xk − x̃ ⟩ + ‖xk − x̃‖ R1 (xk − x̃; f, x̃) )
= lim_{k→+∞} ( ⟨ ∇f (x̃) , λk (xk − x̃) ⟩ + ‖λk (xk − x̃)‖ R1 (xk − x̃; f, x̃) )
= ⟨ ∇f (x̃) , s ⟩ .
Finally, we found s ∉ Ff,0 (x̃).
Q.E.D.
Theorem 3.28 (FOSC for MP) : Let M ⊂ G ⊂ Rn , G open, x̃ ∈ M and f : G → R be differentiable at x̃. Let δ > 0 be such that f is pseudoconvex at x̃ with respect to U (x̃, δ) and for each x ∈ M ∩ U (x̃, δ), x ≠ x̃ we have x − x̃ ∈ DM (x̃).
If Ff,0 (x̃) ∩ DM (x̃) = ∅, then x̃ is a local minimum of f on M. Moreover, x̃ is a global minimum of f on M ∩ U (x̃, δ).
Proof: Take x ∈ M ∩ U (x̃, δ), x ≠ x̃.
According to our assumption, x − x̃ ∈ DM (x̃).
Hence, x − x̃ ∉ Ff,0 (x̃).
Consequently, ⟨ ∇f (x̃) , x − x̃ ⟩ ≥ 0.
Function f is pseudoconvex at x̃ with respect to U (x̃, δ), therefore, f (x) ≥ f (x̃).
Consequently, x̃ is a global minimum of f on M ∩ U (x̃, δ).
Finally, we conclude x̃ is a local minimum of f on M.
Q.E.D.
Theorem 3.29 (global FOSC for MP) : Let M ⊂ G ⊂ Rn , G open, f : G → R and x̃ ∈ M. Let the function f be differentiable at x̃ and pseudoconvex at x̃ with respect to G, and for each x ∈ M, x ≠ x̃ let x − x̃ ∈ DM (x̃).
If Ff,0 (x̃) ∩ DM (x̃) = ∅, then x̃ is a global minimum of f on M.
Proof: Take x ∈ M, x ≠ x̃.
According to our assumption, x − x̃ ∈ DM (x̃), and therefore, x − x̃ ∉ Ff,0 (x̃).
Consequently, ⟨ ∇f (x̃) , x − x̃ ⟩ ≥ 0.
Function f is pseudoconvex at x̃ with respect to G, therefore, f (x) ≥ f (x̃).
Consequently, x̃ is a global minimum of f on M.
Q.E.D.
Let us note that a convex set M always fulfills the condition that for each x ∈ M, x ≠ x̃ we have x − x̃ ∈ DM (x̃).
Moreover, for M = Rn the condition Ff,0 (x̃) ∩ DM (x̃) = ∅ simplifies to Ff,0 (x̃) = ∅. This is equivalent to ∇f (x̃) = 0. Therefore, Theorem 3.10 for an unconstrained minimum is a particular case of Theorem 3.27.
Chapter 4
Nonlinear programming - inequalities
The structure of the set of feasible solutions was left unspecified in the previous Chapter 3. Now, we start treating it as determined by a finite number of constraints.
At first, only inequalities are considered as constraints, i.e. in this chapter we consider the set of feasible solutions

Mg = {x ∈ G : ∀ i ∈ I we have gi (x) ≤ 0} ,    (4.1)

where G ⊂ Rn is a non-empty open set, I = {1, 2, . . . , m} and for each i ∈ I a function gi : G → R is given.
Definition 4.1 For x ∈ Mg we introduce the set of active indices

Ig (x) = {i ∈ I : gi (x) = 0} .    (4.2)
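For example, for n = 2, g1 (x) = x1² + x2² − 1 and g2 (x) = −x1 , the point x = (1, 0) ∈ Mg satisfies g1 (x) = 0 and g2 (x) = −1 < 0, hence Ig (x) = {1}.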
We relax the set of feasible solutions.
Definition 4.2 For x ∈ Mg , we introduce the relaxed set of feasible solutions

M̂g (x) = {y ∈ G : ∀ i ∈ Ig (x) we have gi (y) ≤ 0} .    (4.3)
The set of feasible directions will be approximated by the following sets.
Definition 4.3 Let x ∈ Mg and for each i ∈ I let the function gi be differentiable at x. Then, we introduce

Gg,0 (x) = {s ∈ Rn : ∀ i ∈ Ig (x) we have ⟨ ∇gi (x) , s ⟩ < 0} ,    (4.4)
G0g,0 (x) = {s ∈ Rn : s ≠ 0, ∀ i ∈ Ig (x) we have ⟨ ∇gi (x) , s ⟩ ≤ 0} ,    (4.5)
G0g,0,0 (x) = {s ∈ Rn : ∀ i ∈ Ig (x) we have ⟨ ∇gi (x) , s ⟩ ≤ 0} .    (4.6)

Note, Gg,0 (x) ∪ {0} and G0g,0,0 (x) = G0g,0 (x) ∪ {0} are cones.
4.1 Auxiliary results and observations
Lemma 4.4 Let G ⊂ Rn be a non-empty open set, gi : G → R be a given function for each i ∈ I and x ∈ Mg . Let for each i ∈ Ig (x) the function gi be differentiable at x and for each i ∉ Ig (x) the function gi be continuous at x. Then,
i) Gg,0 (x) ⊂ DMg (x) ⊂ G0g,0 (x) ⊂ G0g,0,0 (x).
ii) DMg (x) ⊂ TMg (x) ⊂ G0g,0,0 (x).
iii) DMg (x) = DM̂g(x) (x), TMg (x) = TM̂g(x) (x).
iv) If, moreover, the function gi is strictly pseudoconvex at x for each i ∈ I with respect to a neighborhood of x, then Gg,0 (x) = DMg (x).
v) If, moreover, the function gi is pseudoconcave at x for each i ∈ I with respect to a neighborhood of x, then G0g,0 (x) = DMg (x).
Proof: Define for i ∈ I and s ∈ Rn a function ϕi,x,s : Dx,s → R : λ ↦ gi (x + λs), where Dx,s = {λ ∈ R : x + λs ∈ G}.
If, moreover, gi is differentiable at x, then ϕi,x,s is differentiable at 0 and we have ϕ′i,x,s (0) = ⟨ ∇gi (x) , s ⟩.
1. (a) Take s ∈ Gg,0 (x).
i. The set G is open, therefore, there is δG > 0 such that
∀ 0 < λ < δG we have x + λs ∈ G.
ii. For i ∈ Ig (x) we have gi (x) = 0; then ϕi,x,s is differentiable at 0 and we have ϕ′i,x,s (0) = ⟨ ∇gi (x) , s ⟩ < 0.
Thus, the function ϕi,x,s is decreasing in a neighborhood of the origin.
Hence, there is δi > 0 such that ∀ 0 < λ < δi we have
gi (x + λs) < gi (x) = 0.
iii. For i ∉ Ig (x) the function gi is continuous at x and gi (x) < 0.
Therefore, there is δi > 0 such that ∀ 0 < λ < δi we have gi (x + λs) < 0.
We have shown s ∈ DMg (x).
(b) Take s ∈ DMg (x).
Then, there is δ > 0 such that ∀ 0 < λ < δ we have x + λs ∈ G and
∀ i ∈ I we have gi (x + λs) ≤ 0.
Consequently, ∀ i ∈ Ig (x) and ∀ 0 < λ < δ we have gi (x + λs) ≤ gi (x) = 0.
Therefore, for each i ∈ Ig (x) we have ϕ′i,x,s (0) = ⟨ ∇gi (x) , s ⟩ ≤ 0.
Thus, s ∈ G0g,0 (x).
(c) The last inclusion is evident, because G0g,0,0 (x) = G0g,0 (x) ∪ {0}.
The inclusions i) are proved.
2. (a) The inclusion DMg (x) ⊂ TMg (x) was shown in a general setting in Lemma 3.22.
(b) Take s ∈ TMg (x).
Then, there are sequences xk ∈ Mg , λk > 0, k ∈ N such that xk → x, λk (xk − x) → s.
For i ∈ Ig (x), we have gi (x) = 0 and gi is differentiable at x. Hence, we can write a Taylor expansion
0 ≥ gi (xk ) = ⟨ ∇gi (x) , xk − x ⟩ + ‖xk − x‖ R1 (xk − x; gi , x) ,
R1 (xk − x; gi , x) → 0.
Therefore,
0 ≥ ⟨ ∇gi (x) , λk (xk − x) ⟩ + ‖λk (xk − x)‖ R1 (xk − x; gi , x) .
Letting k → +∞, we receive ⟨ ∇gi (x) , s ⟩ ≤ 0.
Thus, we have checked s ∈ G0g,0,0 (x).
The inclusions ii) are proved.
3. (a) We know Mg ⊂ M̂g (x).
Hence also, DMg (x) ⊂ DM̂g(x) (x) and TMg (x) ⊂ TM̂g(x) (x).
(b) Take s ∈ DM̂g(x) (x).
Then, there is δ > 0 such that ∀ 0 < λ < δ we have x + λs ∈ G and
∀ i ∈ Ig (x) we have gi (x + λs) ≤ 0.
For i ∉ Ig (x) we have gi (x) < 0 and gi is continuous at x.
Therefore, there is η such that 0 < η ≤ δ and ∀ 0 < λ < η we have x + λs ∈ G and ∀ i ∈ I we have gi (x + λs) ≤ 0.
We have found s ∈ DMg (x).
(c) Take s ∈ TM̂g(x) (x).
Then, there are sequences xk ∈ M̂g (x), λk > 0, k ∈ N such that xk → x, λk (xk − x) → s.
Hence, ∀ i ∈ Ig (x) we have gi (xk ) ≤ 0.
For i ∉ Ig (x) we have gi (x) < 0 and gi is continuous at x.
Since the set G is open, xk ∈ G for all k sufficiently large and ∀ i ∈ I we have gi (xk ) ≤ 0.
Therefore, xk ∈ Mg for all k sufficiently large.
We have derived s ∈ TMg (x).
4. Let η > 0 be such that gi is strictly pseudoconvex at x with respect to U (x, η) for each i ∈ I.
Take s ∈ DMg (x).
Then, there is δ > 0 such that ∀ 0 < λ < δ we have x + λs ∈ G and
∀ i ∈ I we have gi (x + λs) ≤ 0.
Consequently, ∀ 0 < λ < δ we have x + λs ∈ G and ∀ i ∈ Ig (x) we have gi (x + λs) ≤ gi (x) = 0.
Since the function gi is strictly pseudoconvex at x with respect to U (x, η) for each i ∈ Ig (x), we have ⟨ ∇gi (x) , λs ⟩ < 0 for each 0 < λ < δ ∧ (η/‖s‖).
Therefore, ⟨ ∇gi (x) , s ⟩ < 0.
It means s ∈ Gg,0 (x).
5. Let η > 0 be such that gi is pseudoconcave at x with respect to U (x, η) for each i ∈ I.
Take s ∈ G0g,0 (x).
Then, s ≠ 0 and we have:
(a) G is an open set, therefore, there is δG > 0 such that ∀ 0 < λ < δG we have x + λs ∈ G.
(b) For i ∈ Ig (x) we have ⟨ ∇gi (x) , s ⟩ ≤ 0.
Thus, ∀ λ > 0 we have ⟨ ∇gi (x) , λs ⟩ ≤ 0.
The function gi is pseudoconcave at x with respect to U (x, η), therefore, for each 0 < λ < δG ∧ (η/‖s‖) we have gi (x + λs) ≤ gi (x) = 0.
Therefore, there is δi > 0 such that ∀ 0 < λ < δi we have gi (x + λs) ≤ 0.
(c) For each i ∉ Ig (x), gi is continuous at x and gi (x) < 0.
Therefore, there is δi > 0 such that ∀ 0 < λ < δi we have gi (x + λs) < 0.
We have shown s ∈ DMg (x).
Q.E.D.
Lemma 4.5 Let G ⊂ Rn be a non-empty open set, f : G → R be a function, for each i ∈ I a function gi : G → R be given and x̃ ∈ Mg .
If for each i ∉ Ig (x̃) the function gi is continuous at x̃, then the following is equivalent:
i) The point x̃ is a local minimum of f on Mg .
ii) The point x̃ is a local minimum of f on M̂g (x̃).
Proof:
1. If x̃ is a local minimum of f on M̂g (x̃), then x̃ is a local minimum of f on Mg , because M̂g (x̃) ⊃ Mg .
2. If x̃ is a local minimum of f on Mg , then there is δ > 0 such that x̃ is a global minimum of f on Mg ∩ U (x̃, δ), U (x̃, δ) ⊂ G and for each i ∉ Ig (x̃), ξ ∈ U (x̃, δ) we have gi (ξ) ≤ 0.
But, Mg ∩ U (x̃, δ) = M̂g (x̃) ∩ U (x̃, δ).
Therefore, x̃ is a global minimum of f on M̂g (x̃) ∩ U (x̃, δ), and hence, x̃ is a local minimum of f on M̂g (x̃).
Q.E.D.
4.2 General results
At first, we consider a general setting.
Theorem 4.6 (FONC for NLP inequalities) : Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, gi be differentiable at x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃).
If x̃ is a local minimum of f on Mg , then Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅.
Proof: According to Theorem 3.27 we have Ff,0 (x̃) ∩ TMg (x̃) = ∅.
According to Lemma 4.4 we know TMg (x̃) ⊃ Gg,0 (x̃).
Therefore, Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅.
Q.E.D.
Theorem 4.7 (FOSC for NLP inequalities) : Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, the function gi be differentiable in a neighborhood of x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃). Let there be δkonv > 0 such that f is pseudoconvex on U (x̃, δkonv ), and gi is pseudoconvex on U (x̃, δkonv ) and strictly pseudoconvex at x̃ with respect to U (x̃, δkonv ) for each i ∈ Ig (x̃).
If Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅, then x̃ is a local minimum of f on Mg , and there is 0 < δ ≤ δkonv such that x̃ is a global minimum of f on M̂g (x̃) ∩ U (x̃, δ).
Proof: We check step by step the assumptions of Theorem 3.28.
1. For each i ∈ Ig (x̃) the function gi is differentiable at x̃ and strictly pseudoconvex at x̃ with respect to U (x̃, δkonv ).
According to Lemma 4.4 we know DM̂g(x̃) (x̃) = Gg,0 (x̃).
Then, Ff,0 (x̃) ∩ DM̂g(x̃) (x̃) = ∅.
2. Take δ > 0 such that δ ≤ δkonv , U (x̃, δ) ⊂ G, for each i ∈ Ig (x̃) the function gi is differentiable and pseudoconvex on U (x̃, δ), and for each i ∉ Ig (x̃), y ∈ U (x̃, δ) we have gi (y) ≤ 0.
For each i ∈ Ig (x̃) the function gi is differentiable and pseudoconvex on U (x̃, δ).
Hence, according to Lemma 2.74, the set {x ∈ U (x̃, δ) : gi (x) ≤ 0} is convex for each i ∈ Ig (x̃).
Consequently, M̂g (x̃) ∩ U (x̃, δ) = ⋂_{i∈Ig(x̃)} {x ∈ U (x̃, δ) : gi (x) ≤ 0} is convex.
Hence, we have x − x̃ ∈ DM̂g(x̃) (x̃) for each x ∈ M̂g (x̃) ∩ U (x̃, δ).
Since f is pseudoconvex at x̃ with respect to U (x̃, δkonv ), the assumptions of Theorem 3.28 are fulfilled.
Hence, according to Theorem 3.28, x̃ is a global minimum of f on M̂g (x̃) ∩ U (x̃, δ).
Consequently, according to Lemma 4.5, x̃ is a local minimum of f on Mg .
Q.E.D.
Theorem 4.8 (global FOSC for NLP inequalities) : Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, the function gi be differentiable in a neighborhood of x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃). Let the set G be open convex, f be pseudoconvex on G, and the function gi be pseudoconvex on G and strictly pseudoconvex on G for each i ∈ Ig (x̃).
If Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅, then x̃ is a global minimum of f on Mg .
Proof: We check step by step the assumptions of Theorem 3.29.
1. For each i ∈ Ig (x̃) the function gi is differentiable at x̃ and strictly pseudoconvex at x̃ with respect to G.
According to Lemma 4.4 we know DM̂g(x̃) (x̃) = Gg,0 (x̃).
Then, Ff,0 (x̃) ∩ DM̂g(x̃) (x̃) = ∅.
2. The set G is open convex and for each i ∈ Ig (x̃) the function gi is pseudoconvex on G, therefore, according to Lemma 2.74, the set {x ∈ G : gi (x) ≤ 0} is convex for each i ∈ Ig (x̃).
Consequently, M̂g (x̃) = ⋂_{i∈Ig(x̃)} {x ∈ G : gi (x) ≤ 0} is convex.
Hence, for each x ∈ M̂g (x̃) we have x − x̃ ∈ DM̂g(x̃) (x̃).
We assume f is pseudoconvex at x̃ with respect to G, and therefore, the assumptions of Theorem 3.29 are fulfilled.
According to Theorem 3.29, x̃ is a global minimum of f on M̂g (x̃), and, according to Lemma 4.5, x̃ is a global minimum of f on Mg .
Q.E.D.
Theorem 4.9 (FOSC for NLP inequalities 2) : Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, the function gi be differentiable in a neighborhood of x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃). Let there be δkonv > 0 such that f is pseudoconvex on U (x̃, δkonv ) and gi is quasiconvex on U (x̃, δkonv ) for each i ∈ Ig (x̃).
If Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅, then x̃ is a local minimum of f on Mg , and there is 0 < δ ≤ δkonv such that x̃ is a global minimum of f on M̂g (x̃) ∩ U (x̃, δ).
Proof: We check step by step the assumptions of Theorem 3.28.
1. For each i ∈ Ig (x̃) the function gi is differentiable at x̃ and strictly pseudoconvex at x̃ with respect to G.
According to Lemma 4.4 we know DM̂g(x̃) (x̃) = Gg,0 (x̃).
Then, Ff,0 (x̃) ∩ DM̂g(x̃) (x̃) = ∅.
2. Take δ > 0 such that δ ≤ δkonv , U (x̃, δ) ⊂ G, for each i ∈ Ig (x̃) the function gi is differentiable and quasiconvex on U (x̃, δ), and for each i ∉ Ig (x̃), y ∈ U (x̃, δ) we have gi (y) ≤ 0.
Hence, according to Lemma 2.72, the set {x ∈ U (x̃, δ) : gi (x) ≤ 0} is convex for each i ∈ Ig (x̃).
Consequently, M̂g (x̃) ∩ U (x̃, δ) = ⋂_{i∈Ig(x̃)} {x ∈ U (x̃, δ) : gi (x) ≤ 0} is convex.
Hence, for each x ∈ M̂g (x̃) ∩ U (x̃, δ) we have x − x̃ ∈ DM̂g(x̃) (x̃).
We assume f to be pseudoconvex at x̃ with respect to U (x̃, δkonv ), and therefore, the assumptions of Theorem 3.28 are fulfilled.
Hence, according to Theorem 3.28, x̃ is a global minimum of f on M̂g (x̃) ∩ U (x̃, δ).
Consequently, according to Lemma 4.5, x̃ is a local minimum of f on Mg .
Q.E.D.
Theorem 4.10 (global FOSC for NLP inequalities 2) : Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, the function gi be differentiable in a neighborhood of x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃). Let the set G be open convex, f be pseudoconvex on G, and gi be quasiconvex on G for each i ∈ Ig (x̃).
If Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅, then x̃ is a global minimum of f on Mg .
Proof: We check step by step the assumptions of Theorem 3.29.
1. For each i ∈ Ig (x̃) the function gi is differentiable at x̃ and strictly pseudoconvex at x̃ with respect to G.
According to Lemma 4.4 we know DM̂g(x̃) (x̃) = Gg,0 (x̃).
Then, Ff,0 (x̃) ∩ DM̂g(x̃) (x̃) = ∅.
2. The set G is open convex and for each i ∈ Ig (x̃) the function gi is quasiconvex on G, therefore, according to Lemma 2.72, the set {x ∈ G : gi (x) ≤ 0} is convex for each i ∈ Ig (x̃).
Consequently, M̂g (x̃) = ⋂_{i∈Ig(x̃)} {x ∈ G : gi (x) ≤ 0} is convex.
Hence, for each x ∈ M̂g (x̃) we have x − x̃ ∈ DM̂g(x̃) (x̃).
We assume f is pseudoconvex at x̃ with respect to G, and therefore, the assumptions of Theorem 3.29 are fulfilled.
Consequently, according to Theorem 3.29, x̃ is a global minimum of f on M̂g (x̃).
Consequently, according to Lemma 4.5, x̃ is a global minimum of f on Mg .
Q.E.D.
The derived optimality condition, i.e. Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅, gives no restriction for constraints expressed as equalities.
Example 4.11: Consider the program of minimizing a function f : Rn → R on a set Mg = {x ∈ Rn : h (x) = 0}, where h : Rn → R and x̃ ∈ Mg .
The set Mg can be rewritten in the form Mg = {x ∈ Rn : h (x) ≤ 0, −h (x) ≤ 0}.
Thus, we have two inequalities with
Gg,0 (x̃) = {s ∈ Rn : ⟨ ∇h (x̃) , s ⟩ < 0, − ⟨ ∇h (x̃) , s ⟩ < 0} = ∅.
Therefore, the condition Ff,0 (x̃) ∩ Gg,0 (x̃) = ∅ is fulfilled for all x̃ ∈ Mg . Hence, it gives no restriction. In Chapter 5, we will derive an optimality condition convenient for constraints expressed as inequalities and equalities.
△
4.3 Karush-Kuhn-Tucker optimality condition
Now, we express the optimality condition by means of Lagrange coefficients.
Let us introduce the Karush-Kuhn-Tucker optimality conditions (cz. Karushovy-Kuhnovy-Tuckerovy podmínky optimality), i.e. (KKTf g − r) and (KKTf g ).
Definition 4.12 Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg . Then, we define the reduced Lagrange function (cz. redukovaná Lagrangeova funkce)

L (x; u|x̃) = f (x) + ∑_{i∈Ig(x̃)} ui gi (x)    ∀ x ∈ G, ui ∈ R, i ∈ Ig (x̃)

and the Lagrange function (cz. Lagrangeova funkce)

L (x; u) = f (x) + ∑_{i∈I} ui gi (x)    ∀ x ∈ G, ui ∈ R, i ∈ I.
Definition 4.13
Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, gi be differentiable at x̃ for each i ∈ Ig (x̃), gi be continuous at x̃ for each i ∉ Ig (x̃), and a coefficient ui ∈ R be given for each i ∈ Ig (x̃).
We say x̃, ui , i ∈ Ig (x̃) fulfill (KKTf g − r), whenever

∇f (x̃) + ∑_{i∈Ig(x̃)} ui ∇gi (x̃) = 0,    (4.7)
∀ i ∈ Ig (x̃) we have ui ≥ 0.    (4.8)
If, moreover, the function gi is differentiable at x̃ for each i ∉ Ig (x̃) and a coefficient ui ∈ R is given for each i ∉ Ig (x̃), then we say x̃, ui , i ∈ I fulfill (KKTf g ), whenever

∇f (x̃) + ∑_{i=1}^m ui ∇gi (x̃) = 0,    (4.9)
ui gi (x̃) = 0 for each i ∈ I,    (4.10)
∀ i ∈ I we have ui ≥ 0.    (4.11)
The conditions are called:
• x̃ ∈ Mg - Primal Feasibility condition (P F ) (cz. přípustnost).
• (4.9) + (4.11) - Dual Feasibility condition (DFf g ) (cz. optimalita).
• (4.10) - Complementary Slackness condition (CS) (cz. komplementarita).
The coefficients ui for i ∈ I (respectively for i ∈ Ig (x̃) only) are called Lagrange coefficients (or, Lagrange multipliers) (cz. Lagrangeovy koeficienty).
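A small worked illustration of (KKTf g ): minimize f (x) = (x1 − 1)² + (x2 − 1)² subject to g1 (x) = x1 + x2 − 1 ≤ 0. At x̃ = (1/2, 1/2) the constraint is active, Ig (x̃) = {1}, and (4.9) reads ∇f (x̃) + u1 ∇g1 (x̃) = (−1, −1) + u1 (1, 1) = 0, so u1 = 1 ≥ 0; conditions (4.10) and (4.11) hold as well. Since f is convex, hence pseudoconvex, and g1 is affine, hence quasiconvex, Theorem 4.19 below yields that x̃ is a global minimum of f on Mg .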
Karush, Kuhn and Tucker derived (KKTf g ) as a FONC. Karush [1939, PhD thesis] employed the condition of linear independence of gradients as a constraint qualification. Kuhn & Tucker [1951] introduced their own constraint qualification (Kuhn − Tuckergh [x̃]).
We will introduce a more general version of the KKT FONC.
We will need a convenient constraint qualification (cz. podmínka regularity). We employ a constraint qualification due to Abadie:

(Abadieg [x̃])    TMg (x̃) ⊃ G0g,0,0 (x̃).

The condition actually means TMg (x̃) = G0g,0,0 (x̃), because the opposite inclusion is always true; see Lemma 4.4.
We start with an auxiliary Lemma.
Lemma 4.14
Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, gi be differentiable at x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃). Then, the following is equivalent:
i) For each i ∈ Ig (x̃) there is a coefficient ui ∈ R such that x̃, ui , i ∈ Ig (x̃) fulfill (KKTf g − r).
ii) Ff,0 (x̃) ∩ G0g,0,0 (x̃) = ∅.
Proof: The property Ff,0 (x̃) ∩ G0g,0,0 (x̃) = ∅ means
there is no s ∈ Rn fulfilling
⟨ ∇f (x̃) , s ⟩ < 0, ∀ i ∈ Ig (x̃) : ⟨ ∇gi (x̃) , s ⟩ ≤ 0.
Equivalently,
∀ s ∈ Rn fulfilling ⟨ ∇gi (x̃) , s ⟩ ≤ 0 ∀ i ∈ Ig (x̃) , we have ⟨ ∇f (x̃) , s ⟩ ≥ 0.
Equivalently,
∀ s ∈ Rn fulfilling ⟨ −∇gi (x̃) , s ⟩ ≥ 0 ∀ i ∈ Ig (x̃) , we have ⟨ ∇f (x̃) , s ⟩ ≥ 0.
According to the Farkas Theorem 1.46 we receive:
For each i ∈ Ig (x̃) there is a coefficient ui ≥ 0 such that
− ∑_{i∈Ig(x̃)} ui ∇gi (x̃) = ∇f (x̃) .
It means x̃, ui , i ∈ Ig (x̃) fulfill (KKTf g − r).
Q.E.D.
Theorem 4.15 (KKT FONC - Abadie) :
Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, gi be differentiable at x̃ for each i ∈ Ig (x̃), gi be continuous at x̃ for each i ∉ Ig (x̃), and let (Abadieg [x̃]) hold.
If x̃ is a local minimum of f on Mg , then for each i ∈ Ig (x̃) there is a coefficient ui ∈ R such that x̃, ui , i ∈ Ig (x̃) fulfill (KKTf g − r).
If, moreover, for each i ∉ Ig (x̃) the function gi is differentiable at x̃, then for each i ∈ I there is a coefficient ui ∈ R such that x̃, ui , i ∈ I fulfill (KKTf g ).
Proof: According to Theorem 3.27, we know Ff,0 (x̃) ∩ TMg (x̃) = ∅.
The condition (Abadieg [x̃]) implies TMg (x̃) = G0g,0,0 (x̃).
Therefore, Ff,0 (x̃) ∩ G0g,0,0 (x̃) = ∅.
According to Lemma 4.14, there are coefficients ui , i ∈ Ig (x̃) such that x̃, ui , i ∈ Ig (x̃) fulfill (KKTf g − r).
If, moreover, for each i ∉ Ig (x̃) the function gi is differentiable at x̃, then, plugging in ui = 0 for each i ∉ Ig (x̃), we receive that x̃, ui , i ∈ I fulfill (KKTf g ).
Q.E.D.
Theorem 4.16 (KKT FOSC) :
Let m ∈ N, I = {1, 2, . . . , m}, G ⊂ Rn be a non-empty open set, f : G → R and gi : G → R for each i ∈ I, x̃ ∈ Mg , f be differentiable at x̃, the function gi be differentiable in a neighborhood of x̃ for each i ∈ Ig (x̃) and gi be continuous at x̃ for each i ∉ Ig (x̃). Let there be δkonv > 0 such that f is pseudoconvex on U (x̃, δkonv ), and gi is pseudoconvex on U (x̃, δkonv ) and strictly pseudoconvex at x̃ with respect to U (x̃, δkonv ) for each i ∈ Ig (x̃).
If there is a coefficient ui ∈ R for each i ∈ Ig (x̃) such that x̃, ui , i ∈ Ig (x̃) fulfill (KKTf g − r), then x̃ is a local minimum of f on Mg .
Moreover, there is 0 < δ ≤ δkonv such that x̃ is a global minimum of f on M̂g (x̃) ∩ U (x̃, δ).
Proof: If x
e, ui , i ∈ Ig (e
x) fulfill (KKTf g − r), then according to Lemma 4.14 we know
x) = ∅.
Ff,0 (e
x) ∩ G0g,0,0 (e
According to Lemma 4.4, we also have Ff,0 (e
x) ∩ Gg,0 (e
x) = ∅.
The assumptions of Theorem 4.7 are fulfilled, therefore, x
e is a local minimum of f on Mg
b g (e
and there is 0 < δ ≤ δkonv such that and x
e is a global minimum of f on M
x) ∩ U (e
x, δ).
Q.E.D.
Theorem 4.17 (KKT global FOSC) :
Let $m \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$ and $g_i : G \to \mathbb{R}$ for each $i \in I$, $\tilde{x} \in M_g$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$ and $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$. Let the set $G$ be open and convex, $f$ be pseudoconvex on $G$, and, for each $i \in I_g(\tilde{x})$, $g_i$ be pseudoconvex on $G$ and strictly pseudoconvex on $G$.
If there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I_g(\tilde{x})$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$ fulfill $(KKT_{fg} - r)$, then $\tilde{x}$ is a global minimum of $f$ on $M_g$.
Moreover, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_g(\tilde{x})$.

Proof: If $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$ fulfill $(KKT_{fg} - r)$, then according to Lemma 4.14 we know $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) = \emptyset$.
According to Lemma 4.4 we obtain $F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) = \emptyset$.
The assumptions of Theorem 4.8 are fulfilled; therefore, $\tilde{x}$ is a global minimum of $f$ on $M_g$ and, also, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_g(\tilde{x})$.
Q.E.D.
Theorem 4.18 (KKT FOSC 2) :
Let $m \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$ and $g_i : G \to \mathbb{R}$ for each $i \in I$, $\tilde{x} \in M_g$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$ and $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$. Let there be $\delta_{konv} > 0$ such that $f$ is pseudoconvex on $U(\tilde{x}, \delta_{konv})$ and $g_i$ is quasiconvex on $U(\tilde{x}, \delta_{konv})$ for each $i \in I_g(\tilde{x})$.
If there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I_g(\tilde{x})$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$ fulfill $(KKT_{fg} - r)$, then $\tilde{x}$ is a local minimum of $f$ on $M_g$.
Moreover, there is $0 < \delta \leq \delta_{konv}$ such that $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_g(\tilde{x}) \cap U(\tilde{x}, \delta)$.

Proof: If $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$ fulfill $(KKT_{fg} - r)$, then according to Lemma 4.14 we know $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) = \emptyset$. According to Lemma 4.4 we obtain $F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) = \emptyset$.
The assumptions of Theorem 4.9 are fulfilled; therefore, $\tilde{x}$ is a local minimum of $f$ on $M_g$ and there is $0 < \delta \leq \delta_{konv}$ such that $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_g(\tilde{x}) \cap U(\tilde{x}, \delta)$.
Q.E.D.
Theorem 4.19 (KKT global FOSC 2) :
Let $m \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$ and $g_i : G \to \mathbb{R}$ for each $i \in I$, $\tilde{x} \in M_g$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$ and $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$. Let the set $G$ be open and convex, $f$ be pseudoconvex on $G$, and $g_i$ be quasiconvex on $G$ for each $i \in I_g(\tilde{x})$.
If there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I_g(\tilde{x})$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$ fulfill $(KKT_{fg} - r)$, then $\tilde{x}$ is a global minimum of $f$ on $M_g$.
Moreover, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_g(\tilde{x})$.

Proof: If $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$ fulfill $(KKT_{fg} - r)$, then according to Lemma 4.14 we know $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) = \emptyset$.
According to Lemma 4.4 we also obtain $F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) = \emptyset$.
The assumptions of Theorem 4.10 are fulfilled; therefore, $\tilde{x}$ is a global minimum of $f$ on $M_g$ and, also, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_g(\tilde{x})$.
Q.E.D.
4.4 Constraint qualifications

The constraint qualification $(Abadie_g[\tilde{x}])$ is of a rather technical character and is not easy to verify. Practical usage requires conditions that are easier to verify. Let us introduce some of the often employed ones.
We begin with the necessary definitions.
Definition 4.20 Let $S \subset \mathbb{R}^n$, $\tilde{x} \in \mathrm{clo}(S)$. We define the cone of attainable directions of $S$ at $\tilde{x}$ (cz. kužel dosažitelných směrů) as
$$A_S(\tilde{x}) = \left\{ s \in \mathbb{R}^n : \begin{array}{l} \text{there is } \varphi : \mathbb{R} \to \mathbb{R}^n \text{ and } \delta > 0 \text{ such that} \\ \forall\, 0 < t < \delta \text{ we have } \varphi(t) \in S, \\ \varphi(0) = \tilde{x},\ \lim_{t \to 0+} \frac{\varphi(t) - \varphi(0)}{t} = s \end{array} \right\}. \qquad (4.12)$$

Definition 4.21 Let $S \subset \mathbb{R}^n$, $\tilde{x} \in \mathrm{clo}(S)$. We say $S$ is geometrically attainable at $\tilde{x}$ (cz. geometricky dosažitelná), whenever $T_S(\tilde{x}) = A_S(\tilde{x})$.
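For instance (our own example), for the square $S = [0,1]^2$ and $\tilde{x} = (0,0)$ one may take $\varphi(t) = \tilde{x} + t s$ for any $s \geq 0$, so $A_S(\tilde{x}) = \{s \in \mathbb{R}^2 : s \geq 0\} = T_S(\tilde{x})$ and $S$ is geometrically attainable at $\tilde{x}$; by Lemma 6.18 below, this happens at every point of every convex set.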
The introduced cones fulfill the following.

Lemma 4.22 Let $G \subset \mathbb{R}^n$ be open, a function $g_i : G \to \mathbb{R}$ be given for each $i \in I$ and $\tilde{x} \in M_g$. Let the function $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$. Then,
$$\mathrm{clo}\, D_{M_g}(\tilde{x}) \subset \mathrm{clo}\, A_{M_g}(\tilde{x}) \subset T_{M_g}(\tilde{x}) \subset G^0_{g,0,0}(\tilde{x}). \qquad (4.13)$$
Proof: See 5.2.1 Lemma in [1], p. 242.
Q.E.D.
Lemma 4.23 Let $G \subset \mathbb{R}^n$ be open, a function $g_i : G \to \mathbb{R}$ be given for each $i \in I$ and $\tilde{x} \in M_g$. Let the function $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$ and, for each $i \notin I_g(\tilde{x})$, the function $g_i$ be continuous at $\tilde{x}$. Then,
$$\mathrm{clo}(G_{g,0}(\tilde{x})) \subset \mathrm{clo}\, D_{M_g}(\tilde{x}) \subset \mathrm{clo}\, A_{M_g}(\tilde{x}) \subset T_{M_g}(\tilde{x}) \subset G^0_{g,0,0}(\tilde{x}). \qquad (4.14)$$
Proof: See 5.2.1 Lemma in [1], p. 242.
Q.E.D.
Now, we give a list of constraint qualifications.

$(LP_g)$
The function $g_i$ is an affine linear function for each $i \in I$.

$(Slater_g[\tilde{x}])$
The function $g_i$ is pseudoconvex at $\tilde{x}$ with respect to $G$ for each $i \in I_g(\tilde{x})$, the function $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, and there is $\xi \in G$ such that $g_i(\xi) < 0$ for each $i \in I_g(\tilde{x})$.

$(LI_g[\tilde{x}])$
The gradients $\nabla g_i(\tilde{x})$, $i \in I_g(\tilde{x})$ are linearly independent and the function $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$.

$(Cottle_g[\tilde{x}])$
The function $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and $\mathrm{clo}(G_{g,0}(\tilde{x})) = G^0_{g,0,0}(\tilde{x})$.

$(Zangwill_g[\tilde{x}])$
$\mathrm{clo}\, D_{M_g}(\tilde{x}) = G^0_{g,0,0}(\tilde{x})$.

$(Kuhn\text{-}Tucker_g[\tilde{x}])$
$\mathrm{clo}\, A_{M_g}(\tilde{x}) = G^0_{g,0,0}(\tilde{x})$.
The introduced constraint qualifications are related.

Theorem 4.24 : Let $G \subset \mathbb{R}^n$ be open, a function $g_i : G \to \mathbb{R}$ be given for each $i \in I$ and $\tilde{x} \in M_g$. Let $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$ and $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$. Then,

$(LI_g[\tilde{x}]) \Rightarrow (Cottle_g[\tilde{x}])$ and $(Slater_g[\tilde{x}]) \Rightarrow (Cottle_g[\tilde{x}])$,
$(LP_g) \Rightarrow (Zangwill_g[\tilde{x}])$,
$(Cottle_g[\tilde{x}]) \Rightarrow (Zangwill_g[\tilde{x}]) \Rightarrow (Kuhn\text{-}Tucker_g[\tilde{x}]) \Rightarrow (Abadie_g[\tilde{x}])$.

Proof: See the end of Chapter 5.2 in [1], pp. 244-245.
Q.E.D.
Chapter 5

Nonlinear programming - equalities and inequalities

In this chapter, we consider a set of feasible solutions determined by a finite number of constraints. The constraints are formulated as inequalities and equalities, i.e. the set of feasible solutions is written in the form
$$M_{g,h} = \{x \in G : \forall\, i \in I \text{ we have } g_i(x) \leq 0,\ \forall\, j \in J \text{ we have } h_j(x) = 0\}, \qquad (5.1)$$
where $G \subset \mathbb{R}^n$ is a non-empty open set, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $g_i : G \to \mathbb{R}$ is a function given for each $i \in I$ and $h_j : G \to \mathbb{R}$ is a function given for each $j \in J$.
The equalities determine a manifold.

Definition 5.1 Let $G \subset \mathbb{R}^n$ be a non-empty open set and $h_j : G \to \mathbb{R}$ be given for each $j \in J$. We introduce the manifold
$$H_h = \{x \in G : \forall\, j \in J \text{ we have } h_j(x) = 0\}. \qquad (5.2)$$
We relax the set of feasible solutions at a given point.

Definition 5.2 For a point $x \in M_{g,h}$, we introduce a relaxed set of feasible solutions at $x$ (cz. uvolněná množina přípustných řešení)
$$\widehat{M}_{g,h}(x) = \{y \in G : \forall\, i \in I_g(x) \text{ we have } g_i(y) \leq 0,\ \forall\, j \in J \text{ we have } h_j(y) = 0\} = \widehat{M}_g(x) \cap H_h. \qquad (5.3)$$

An approximation of inequalities was developed in the previous Chapter 4. To approximate equalities, we will employ tangent hyperplanes.

Definition 5.3 Let $x \in M_{g,h}$ and $h_j$ be differentiable at $x$ for each $j \in J$. We introduce the approximating set
$$H_{h,0}(x) = \{s \in \mathbb{R}^n : \forall\, j \in J \text{ we have } \langle \nabla h_j(x), s \rangle = 0\}. \qquad (5.4)$$
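For example (ours), for the single equality $h_1(x) = x_1^2 + x_2^2 - 1$ and $x = (1, 0) \in H_h$ we get $\nabla h_1(x) = (2, 0)$, hence $H_{h,0}(x) = \{s \in \mathbb{R}^2 : s_1 = 0\}$, the tangent line to the unit circle at $x$.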
Lemma 5.4 Let $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$ be a given function, $g_i : G \to \mathbb{R}$ be a given function for each $i \in I$, $h_j : G \to \mathbb{R}$ be a given function for each $j \in J$ and $\tilde{x} \in M_{g,h}$.
If $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, then the following are equivalent:

i) The point $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$.

ii) The point $\tilde{x}$ is a local minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x})$.

Proof:

1. If $\tilde{x}$ is a local minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x})$, then $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$, since $\widehat{M}_{g,h}(\tilde{x}) \supset M_{g,h}$.

2. If $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$, then there is $\delta > 0$ such that $\tilde{x}$ is a global minimum of $f$ on $M_{g,h} \cap U(\tilde{x}, \delta)$, $U(\tilde{x}, \delta) \subset G$ and $g_i(\xi) \leq 0$ for each $i \notin I_g(\tilde{x})$, $\xi \in U(\tilde{x}, \delta)$.
Hence, $M_{g,h} \cap U(\tilde{x}, \delta) = \widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta)$.
Therefore, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta)$.
It means, $\tilde{x}$ is a local minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x})$.
Q.E.D.
The arguments in this chapter are based on the implicit function theorem (cz. věta o implicitní funkci).

Theorem 5.5 (Existence of an implicit function) : Let $G \subset \mathbb{R}^n$ be an open set, $x \in G$, $h_j : G \to \mathbb{R}$ be continuously differentiable in a neighborhood of $x$ for each $j \in J$ and the gradients $\nabla h_j(x)$, $j \in J$ be linearly independent.
If $x \in H_h$ and $s \in H_{h,0}(x)$, then there is $\delta > 0$ and a curve $\varphi : (-\delta, \delta) \to \mathbb{R}^n$, differentiable on $(-\delta, \delta)$, with $\varphi(0) = x$, $\varphi'(0) = s$, $\varphi(t) \in G$ for each $t \in (-\delta, \delta)$ and $h_j(\varphi(t)) = 0$ for each $t \in (-\delta, \delta)$, $j \in J$.
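In the setting of the previous example (ours), for $h_1(x) = x_1^2 + x_2^2 - 1$, $x = (1, 0)$ and $s = (0, 1) \in H_{h,0}(x)$, the curve $\varphi(t) = (\cos t, \sin t)$ does the job: $\varphi(0) = x$, $\varphi'(0) = (0, 1) = s$ and $h_1(\varphi(t)) = 0$ for all $t$.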
Lemma 5.6 Let $G \subset \mathbb{R}^n$ be a non-empty open set, $h_j : G \to \mathbb{R}$ be a given function for each $j \in J$ and $x \in H_h$. Let $h_j : G \to \mathbb{R}$ be continuously differentiable in a neighborhood of $x$ for each $j \in J$ and the gradients $\nabla h_j(x)$, $j \in J$ be linearly independent.
Then, $H_{h,0}(x) = A_{H_h}(x) = T_{H_h}(x)$.

Proof:

1. Let $s \in H_{h,0}(x)$.
According to Theorem 5.5, there is $\delta > 0$ and a curve $\varphi : (-\delta, \delta) \to \mathbb{R}^n$, differentiable on $(-\delta, \delta)$, with $\varphi(0) = x$, $\varphi'(0) = s$, $\varphi(t) \in G$ for each $t \in (-\delta, \delta)$ and $h_j(\varphi(t)) = 0$ for each $t \in (-\delta, \delta)$, $j \in J$.
Hence, $s \in A_{H_h}(x)$.

2. Always, $A_{H_h}(x) \subset T_{H_h}(x)$.

3. Let $s \in T_{H_h}(x)$.
Then, there are $x_k \in H_h$, $\lambda_k > 0$, $k \in \mathbb{N}$ such that $x_k \to x$, $\lambda_k(x_k - x) \to s$.
For $j \in J$, we have $h_j(x) = 0$ and $h_j$ is differentiable at $x$. Hence, we can write a Taylor expansion
$$0 = h_j(x_k) = \langle \nabla h_j(x), x_k - x \rangle + \|x_k - x\|\, R_1(x_k - x; h_j, x), \quad R_1(x_k - x; h_j, x) \to 0.$$
Therefore,
$$0 = \langle \nabla h_j(x), \lambda_k(x_k - x) \rangle + \|\lambda_k(x_k - x)\|\, R_1(x_k - x; h_j, x), \quad R_1(x_k - x; h_j, x) \to 0.$$
Letting $k \to +\infty$, we receive $\langle \nabla h_j(x), s \rangle = 0$.
Thus, we have checked $s \in H_{h,0}(x)$.
Q.E.D.
5.1 General constraints

Theorem 5.7 (FONC for NLP) : Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$. Let the gradients $\nabla h_j(\tilde{x})$, $j \in J$ be linearly independent.
If $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$, then $F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$.

Proof: The proof is done by deriving a contradiction.
Assume $s \in F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, i.e. $\langle \nabla f(\tilde{x}), s \rangle < 0$, $\forall\, i \in I_g(\tilde{x})$ we have $\langle \nabla g_i(\tilde{x}), s \rangle < 0$, $\forall\, j \in J$ we have $\langle \nabla h_j(\tilde{x}), s \rangle = 0$.
The gradients $\nabla h_j(\tilde{x})$, $j \in J$ are linearly independent; therefore, according to Theorem 5.5, there is $\delta > 0$ and a vector function $\alpha : (-\delta, \delta) \to \mathbb{R}^n$, differentiable in a neighborhood of the origin, such that $\alpha(0) = \tilde{x}$, $\alpha'(0) = s$, and for each $t \in (-\delta, \delta)$, $j \in J$ we have $h_j(\alpha(t)) = 0$.
Hence, there is $0 < \eta \leq \delta$ such that for each $t \in (0, \eta)$ we have $f(\alpha(t)) < f(\tilde{x})$, $g_i(\alpha(t)) < 0$ for each $i \in I$ and $h_j(\alpha(t)) = 0$ for each $j \in J$.
But this contradicts our assumption that $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$.
Q.E.D.
Theorem 5.8 (FOSC for NLP) : Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$. Let there be $\delta_{konv} > 0$ such that $f$ is pseudoconvex at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$, $g_i$ is strictly pseudoconvex on $U(\tilde{x}, \delta_{konv})$ for each $i \in I_g(\tilde{x})$ and $h_j$ is an affine linear function on $U(\tilde{x}, \delta_{konv})$ for each $j \in J$.
If $F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$, then $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$.

Proof: We check step by step the assumptions of Theorem 3.28.

1. The function $g_i$ is differentiable at $\tilde{x}$ and strictly pseudoconvex at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$ for each $i \in I_g(\tilde{x})$.
According to Lemma 4.4 we know $D_{\widehat{M}_g(\tilde{x})}(\tilde{x}) = G_{g,0}(\tilde{x})$.
Then, $F_{f,0}(\tilde{x}) \cap D_{\widehat{M}_g(\tilde{x})}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$.

2. Take $\delta > 0$ such that $\delta \leq \delta_{konv}$, $U(\tilde{x}, \delta) \subset G$, for each $i \in I_g(\tilde{x})$ the function $g_i$ is differentiable and pseudoconvex on $U(\tilde{x}, \delta)$, and for each $i \notin I_g(\tilde{x})$, $y \in U(\tilde{x}, \delta)$ we have $g_i(y) \leq 0$.
For each $i \in I_g(\tilde{x})$ the function $g_i$ is differentiable at $\tilde{x}$ and pseudoconvex on $U(\tilde{x}, \delta)$. Hence, according to Lemma 2.74, the set $\{x \in U(\tilde{x}, \delta) : g_i(x) \leq 0\}$ is convex for each $i \in I_g(\tilde{x})$.
Also, the set $\{x \in U(\tilde{x}, \delta) : h_j(x) = 0\}$ is convex for each $j \in J$, since $h_j$ is affine linear on $U(\tilde{x}, \delta)$.
Consequently,
$$\widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta) = \bigcap_{i \in I_g(\tilde{x})} \{x \in U(\tilde{x}, \delta) : g_i(x) \leq 0\} \ \cap \ \bigcap_{j \in J} \{x \in U(\tilde{x}, \delta) : h_j(x) = 0\}$$
is convex.
Hence, we have $x - \tilde{x} \in D_{\widehat{M}_{g,h}(\tilde{x})}(\tilde{x})$ for each $x \in \widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta)$.

Since we assume $f$ is pseudoconvex at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$, the assumptions of Theorem 3.28 are fulfilled.
Hence, according to Theorem 3.28, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta)$.
Consequently, according to Lemma 5.4, $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$.
Q.E.D.
Theorem 5.9 (FOSC for NLP global) : Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$. Let $f$ be pseudoconvex at $\tilde{x}$ with respect to $G$, $g_i$ be strictly pseudoconvex on $G$ for each $i \in I_g(\tilde{x})$, and $h_j$ be an affine linear function on $G$ for each $j \in J$.
If $F_{f,0}(\tilde{x}) \cap G_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$, then $\tilde{x}$ is a global minimum of $f$ on $M_{g,h}$.

Proof: We check step by step the assumptions of Theorem 3.28.

1. The function $g_i$ is differentiable at $\tilde{x}$ and strictly pseudoconvex at $\tilde{x}$ with respect to $G$ for each $i \in I_g(\tilde{x})$.
According to Lemma 4.4 we know $D_{\widehat{M}_g(\tilde{x})}(\tilde{x}) = G_{g,0}(\tilde{x})$.
Then, $F_{f,0}(\tilde{x}) \cap D_{\widehat{M}_g(\tilde{x})}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$.

2. For each $i \in I_g(\tilde{x})$ the function $g_i$ is differentiable at $\tilde{x}$ and pseudoconvex on $G$. Hence, according to Lemma 2.74, the set $\{x \in G : g_i(x) \leq 0\}$ is convex for each $i \in I_g(\tilde{x})$.
Also, the set $\{x \in G : h_j(x) = 0\}$ is convex for each $j \in J$, since $h_j$ is affine linear on $G$.
Consequently,
$$\widehat{M}_{g,h}(\tilde{x}) = \bigcap_{i \in I_g(\tilde{x})} \{x \in G : g_i(x) \leq 0\} \ \cap \ \bigcap_{j \in J} \{x \in G : h_j(x) = 0\}$$
is convex.
Hence, we have $x - \tilde{x} \in D_{\widehat{M}_{g,h}(\tilde{x})}(\tilde{x})$ for each $x \in \widehat{M}_{g,h}(\tilde{x})$.

Since we assume $f$ is pseudoconvex at $\tilde{x}$ with respect to $G$, the assumptions of Theorem 3.28 are fulfilled.
Hence, according to Theorem 3.28, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x})$.
Consequently, according to Lemma 5.4, $\tilde{x}$ is a global minimum of $f$ on $M_{g,h}$.
Q.E.D.
5.2 Karush-Kuhn-Tucker optimality condition

Now we express the optimality condition by means of Lagrange multipliers.
We introduce the Karush-Kuhn-Tucker optimality conditions (cz. Karushova-Kuhnova-Tuckerova podmínka optimality) $(KKT_{fgh} - r)$ and $(KKT_{fgh})$ for a nonlinear program with inequalities and equalities.

Definition 5.10 Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$. A coefficient $u_i \geq 0$ is given for each $i \in I_g(\tilde{x})$ and a coefficient $v_j \in \mathbb{R}$ is given for each $j \in J$.
We define a reduced Lagrange function (cz. redukovaná Lagrangeova funkce)
$$L(x; u, v \,|\, \tilde{x}) = f(x) + \sum_{i \in I_g(\tilde{x})} u_i g_i(x) + \sum_{j \in J} v_j h_j(x), \quad \forall\, x \in G,\ u_i \in \mathbb{R},\ i \in I_g(\tilde{x}),\ v_j \in \mathbb{R},\ j \in J,$$
and, if a coefficient $u_i \geq 0$ is also given for each $i \notin I_g(\tilde{x})$, a Lagrange function (cz. Lagrangeova funkce)
$$L(x; u, v) = f(x) + \sum_{i \in I} u_i g_i(x) + \sum_{j \in J} v_j h_j(x), \quad \forall\, x \in G,\ u_i \in \mathbb{R},\ i \in I,\ v_j \in \mathbb{R},\ j \in J.$$
Definition 5.11
Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$, a coefficient $u_i \geq 0$ be given for each $i \in I_g(\tilde{x})$ and a coefficient $v_j \in \mathbb{R}$ be given for each $j \in J$.
We say $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$, whenever
$$\nabla f(\tilde{x}) + \sum_{i \in I_g(\tilde{x})} u_i \nabla g_i(\tilde{x}) + \sum_{j=1}^{k} v_j \nabla h_j(\tilde{x}) = 0, \qquad (5.5)$$
$$\forall\, i \in I_g(\tilde{x}) \text{ we have } u_i \geq 0. \qquad (5.6)$$
If also $g_i$ is differentiable at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and a coefficient $u_i \in \mathbb{R}$ is given for each $i \notin I_g(\tilde{x})$, then we say $\tilde{x}$, $u_i$, $i \in I$, $v_j$, $j \in J$ fulfill $(KKT_{fgh})$, whenever
$$\nabla f(\tilde{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\tilde{x}) + \sum_{j=1}^{k} v_j \nabla h_j(\tilde{x}) = 0, \qquad (5.7)$$
$$u_i\, g_i(\tilde{x}) = 0 \text{ for each } i \in I, \qquad (5.8)$$
$$\forall\, i \in I \text{ we have } u_i \geq 0. \qquad (5.9)$$

The conditions possess a standard naming:

• $\tilde{x} \in M_{g,h}$ - Primal Feasibility condition $(PF)$ (cz. přípustnost);
• (5.7) + (5.9) - Dual Feasibility condition $(DF_{fgh})$ (cz. optimalita);
• (5.8) - Complementarity Slackness condition $(CS)$ (cz. komplementarita).

The coefficients $u_i$ for $i \in I$ (respectively for $i \in I_g(\tilde{x})$ only) and $v_j$ for $j \in J$ are called Lagrange coefficients (or Lagrange multipliers) (cz. Lagrangeovy koeficienty).
Let us start with an auxiliary lemma.

Lemma 5.12
Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$. Then, the following are equivalent:

i) For each $i \in I_g(\tilde{x})$ there is a coefficient $u_i \in \mathbb{R}$ and for each $j \in J$ there is a coefficient $v_j \in \mathbb{R}$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$.

ii) $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$.

Proof: The property $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$ means that there is no $s \in \mathbb{R}^n$ such that
$$\langle \nabla f(\tilde{x}), s \rangle < 0, \quad \forall\, i \in I_g(\tilde{x}) : \langle \nabla g_i(\tilde{x}), s \rangle \leq 0, \quad \forall\, j \in J : \langle \nabla h_j(\tilde{x}), s \rangle = 0.$$
That is equivalent to the property:
$$\forall\, s \in \mathbb{R}^n \text{ fulfilling } \langle \nabla g_i(\tilde{x}), s \rangle \geq 0 \text{ for all } i \in I_g(\tilde{x}),\ \langle \nabla h_j(\tilde{x}), s \rangle = 0 \text{ for all } j \in J, \text{ we have } \langle -\nabla f(\tilde{x}), s \rangle \geq 0.$$
According to Farkas Theorem 1.46 we receive an equivalence with:
There are $u_i \geq 0$ for each $i \in I_g(\tilde{x})$ and $v_j \in \mathbb{R}$ for each $j \in J$ such that
$$\sum_{i \in I_g(\tilde{x})} u_i \nabla g_i(\tilde{x}) + \sum_{j \in J} v_j \nabla h_j(\tilde{x}) = -\nabla f(\tilde{x}).$$
It means, $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$.
Q.E.D.
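A worked example (ours): minimize $f(x) = x_1^2 + x_2^2$ subject to $h_1(x) = x_1 + x_2 - 1 = 0$. At $\tilde{x} = (1/2, 1/2)$ we have $\nabla f(\tilde{x}) = (1, 1)$ and $\nabla h_1(\tilde{x}) = (1, 1)$, so $(KKT_{fgh} - r)$ holds with $v_1 = -1$. There are no inequality constraints, so $G^0_{g,0,0}(\tilde{x}) = \mathbb{R}^2$, and indeed $F_{f,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \{s : s_1 + s_2 < 0\} \cap \{s : s_1 + s_2 = 0\} = \emptyset$, in accordance with Lemma 5.12.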
The connection between Lemma 5.12 and local minima needs a convenient constraint qualification (cz. podmínka regularity). We will employ the constraint qualification due to Abadie:

$(Abadie_{gh}[\tilde{x}])$
$$T_{M_{g,h}}(\tilde{x}) \supset G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}).$$

The Abadie constraint qualification actually means $T_{M_{g,h}}(\tilde{x}) = G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, since the opposite inclusion is always true; see Lemmas 4.4 and 5.6.
Theorem 5.13 (KKT FONC - Abadie) :
Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$, and let $(Abadie_{gh}[\tilde{x}])$ hold.
If $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$, then there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I_g(\tilde{x})$ and a coefficient $v_j \in \mathbb{R}$ for each $j \in J$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$.
If also $g_i$ is differentiable at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, then there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I$ and a coefficient $v_j \in \mathbb{R}$ for each $j \in J$ such that $\tilde{x}$, $u_i$, $i \in I$, $v_j$, $j \in J$ fulfill $(KKT_{fgh})$.

Proof: According to Theorem 3.27 we know $F_{f,0}(\tilde{x}) \cap T_{M_{g,h}}(\tilde{x}) = \emptyset$.
Condition $(Abadie_{gh}[\tilde{x}])$ says $T_{M_{g,h}}(\tilde{x}) = G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$.
Therefore, $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$.
According to Lemma 5.12, it means that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$ for suitable coefficients.
If also $g_i$ is differentiable at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, then plugging in $u_i = 0$ for each $i \notin I_g(\tilde{x})$ we receive that $\tilde{x}$, $u_i$, $i \in I$, $v_j$, $j \in J$ fulfill $(KKT_{fgh})$.
Q.E.D.
Theorem 5.14 (KKT FOSC) : Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$. Let there be $\delta_{konv} > 0$ such that the function $f$ is pseudoconvex at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$, $g_i$ is quasiconvex at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$ for each $i \in I_g(\tilde{x})$, $h_j$ is quasiconvex at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$ for each $j \in J$ with $v_j > 0$, and $h_j$ is quasiconcave at $\tilde{x}$ with respect to $U(\tilde{x}, \delta_{konv})$ for each $j \in J$ with $v_j < 0$.
If there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I_g(\tilde{x})$ and a coefficient $v_j \in \mathbb{R}$ for each $j \in J$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$, then $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$.
Moreover, there is $\delta > 0$ with $\delta \leq \delta_{konv}$ such that $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta)$.

Proof: If $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$, then $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$, according to Lemma 5.12.
The assumptions of Theorem 5.8 are fulfilled; therefore, $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$ and there is $\delta > 0$ with $\delta \leq \delta_{konv}$ such that $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x}) \cap U(\tilde{x}, \delta)$.
Q.E.D.
Theorem 5.15 (KKT global FOSC) :
Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be differentiable at $\tilde{x}$, $g_i$ be differentiable in a neighborhood of $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and $h_j$ be continuously differentiable at $\tilde{x}$ for each $j \in J$. Let $f$ be pseudoconvex at $\tilde{x}$ with respect to $G$, $g_i$ be quasiconvex at $\tilde{x}$ with respect to $G$ for each $i \in I_g(\tilde{x})$, $h_j$ be quasiconvex at $\tilde{x}$ with respect to $G$ for each $j \in J$ with $v_j > 0$, and $h_j$ be quasiconcave at $\tilde{x}$ with respect to $G$ for each $j \in J$ with $v_j < 0$. Let the gradients $\nabla h_j(\tilde{x})$, $j \in J$ be linearly independent.
If there is a coefficient $u_i \in \mathbb{R}$ for each $i \in I_g(\tilde{x})$ and a coefficient $v_j \in \mathbb{R}$ for each $j \in J$ such that $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$, then $\tilde{x}$ is a global minimum of $f$ on $M_{g,h}$.
Moreover, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x})$.

Proof: If $\tilde{x}$, $u_i$, $i \in I_g(\tilde{x})$, $v_j$, $j \in J$ fulfill $(KKT_{fgh} - r)$, then $F_{f,0}(\tilde{x}) \cap G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) = \emptyset$, according to Lemma 5.12.
The assumptions of Theorem 5.9 are fulfilled; therefore, $\tilde{x}$ is a global minimum of $f$ on $M_{g,h}$ and, moreover, $\tilde{x}$ is a global minimum of $f$ on $\widehat{M}_{g,h}(\tilde{x})$.
Q.E.D.
5.3 Constraint qualifications

The constraint qualification $(Abadie_{gh}[\tilde{x}])$ is of a rather technical character. Practical applications need simple, easily verifiable conditions. Let us introduce some of the often-used constraint qualifications.
$(LP_{gh})$
The function $g_i$ is affine linear for each $i \in I$ and $h_j$ is affine linear for each $j \in J$.

$(Slater_{gh}[\tilde{x}])$
The function $g_i$ is pseudoconvex at $\tilde{x}$ with respect to $G$ for each $i \in I_g(\tilde{x})$, $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, $h_j$ is pseudoconvex and pseudoconcave at $\tilde{x}$ with respect to $G$ for each $j \in J$, $h_j$ is continuously differentiable at $\tilde{x}$ for each $j \in J$ and the gradients $\nabla h_j(\tilde{x})$, $j \in J$ are linearly independent.
There is $\xi \in G$ such that $g_i(\xi) < 0$ for each $i \in I_g(\tilde{x})$ and $h_j(\xi) = 0$ for each $j \in J$.

$(LI_{gh}[\tilde{x}])$
The gradients $\nabla g_i(\tilde{x})$, $i \in I_g(\tilde{x})$, $\nabla h_j(\tilde{x})$, $j \in J$ are linearly independent and the function $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$.

$(Cottle_{gh}[\tilde{x}])$
The function $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, the function $h_j$ is continuously differentiable at $\tilde{x}$ for each $j \in J$, the gradients $\nabla h_j(\tilde{x})$, $j \in J$ are linearly independent and $\mathrm{clo}(G_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})) = G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$.

$(Mangasarian\text{-}Fromovitz_{gh}[\tilde{x}])$
The function $g_i$ is continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$, the function $h_j$ is continuously differentiable at $\tilde{x}$ for each $j \in J$, the gradients $\nabla h_j(\tilde{x})$, $j \in J$ are linearly independent and $G_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x}) \neq \emptyset$.

$(Kuhn\text{-}Tucker_{gh}[\tilde{x}])$
$\mathrm{clo}\, A_{M_{g,h}}(\tilde{x}) = G^0_{g,0,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$.
The introduced conditions are related as the next theorem expresses.

Theorem 5.16 : Let $G \subset \mathbb{R}^n$ be open, $g_i : \mathbb{R}^n \to \mathbb{R}$ be given for each $i \in I$, $h_j : \mathbb{R}^n \to \mathbb{R}$ be given for each $j \in J$ and $\tilde{x} \in M_{g,h}$. Let $g_i$ be differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$, $g_i$ be continuous at $\tilde{x}$ for each $i \notin I_g(\tilde{x})$ and $h_j$ be differentiable at $\tilde{x}$ for each $j \in J$.
Then, the introduced constraint qualifications are related as follows:

$(LI_{gh}[\tilde{x}]) \Rightarrow (Cottle_{gh}[\tilde{x}])$ and $(Slater_{gh}[\tilde{x}]) \Rightarrow (Mangasarian\text{-}Fromovitz_{gh}[\tilde{x}])$,
$(Cottle_{gh}[\tilde{x}]) \Longleftrightarrow (Mangasarian\text{-}Fromovitz_{gh}[\tilde{x}])$,
$(Cottle_{gh}[\tilde{x}]) \Rightarrow (Kuhn\text{-}Tucker_{gh}[\tilde{x}]) \Rightarrow (Abadie_{gh}[\tilde{x}])$ and $(LP_{gh}) \Rightarrow (Abadie_{gh}[\tilde{x}])$.

Proof: See the end of Chapter 5.3 in [1], pp. 248-249.
Q.E.D.
Compared with the case of inequalities only, a constraint qualification like $(Zangwill_g[\tilde{x}])$ is missing. The reason is that if equalities are among the constraints, then the set of feasible directions $D_{M_{g,h}}(\tilde{x})$ is typically empty. Hence, such a condition would be fulfilled trivially; it would give no restriction and, therefore, would be unusable.
5.4 Second order optimality conditions

In this section, we introduce second order conditions generalizing the conditions for the unconstrained optimization program; see Theorems 3.11 and 3.12. The conditions $(KKT_{fgh})$ are the starting point.
At first, we introduce the necessary notation.

Assumption $(SecDif_{fgh}[\tilde{x}])$
$f$ is twice differentiable at $\tilde{x}$, $g_i$ is twice differentiable at $\tilde{x}$ for each $i \in I$ and $h_j$ is twice differentiable at $\tilde{x}$ for each $j \in J$.
Lemma 5.17 Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $x \in M_{g,h}$, $u_i \geq 0$ be given for each $i \in I$ and $v_j \in \mathbb{R}$ be given for each $j \in J$. We consider the Lagrange function
$$L(x; u, v) = f(x) + \sum_{i \in I} u_i g_i(x) + \sum_{j \in J} v_j h_j(x).$$
If all the functions $f$, $g_i$, $h_j$ are differentiable at $x$, then $L(\bullet; u, v)$ is differentiable at $x$ and
$$\nabla_x L(x; u, v) = \nabla f(x) + \sum_{i \in I} u_i \nabla g_i(x) + \sum_{j \in J} v_j \nabla h_j(x). \qquad (5.10)$$
If all the functions $f$, $g_i$, $h_j$ are twice differentiable at $x$, then $L(\bullet; u, v)$ is twice differentiable at $x$ and
$$HL(x; u, v) = Hf(x) + \sum_{i \in I} u_i Hg_i(x) + \sum_{j \in J} v_j Hh_j(x). \qquad (5.11)$$
Definition 5.18 For $\tilde{x} \in M_{g,h}$ and $u \in \mathbb{R}^m$, $u \geq 0$, the set of active indexes is divided into two sets: the set of strongly active indexes (cz. silně aktivní)
$$I^+_g(\tilde{x}, u) = \{i \in I_g(\tilde{x}) : u_i > 0\} \qquad (5.12)$$
and the set of weakly active indexes (cz. slabě aktivní)
$$I^0_g(\tilde{x}, u) = \{i \in I_g(\tilde{x}) : u_i = 0\}. \qquad (5.13)$$

Definition 5.19 For $\tilde{x} \in M_{g,h}$ and $u \in \mathbb{R}^m$, $u \geq 0$, we define the cone of strongly influential directions (cz. kužel silně vlivných směrů)
$$G^+_{g,0}(\tilde{x}) = \{s \in \mathbb{R}^n : \forall\, i \in I^+_g(\tilde{x}, u) \text{ we have } \langle \nabla g_i(\tilde{x}), s \rangle = 0\} \qquad (5.14)$$
and the cone of weakly influential directions (cz. kužel slabě vlivných směrů)
$$G^0_{g,0}(\tilde{x}) = \{s \in \mathbb{R}^n : \forall\, i \in I^0_g(\tilde{x}, u) \text{ we have } \langle \nabla g_i(\tilde{x}), s \rangle \leq 0\}. \qquad (5.15)$$
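For instance (our own data), if $I_g(\tilde{x}) = \{1, 2\}$ with $u_1 = 3$ and $u_2 = 0$, then $I^+_g(\tilde{x}, u) = \{1\}$ and $I^0_g(\tilde{x}, u) = \{2\}$. The cone $G^+_{g,0}(\tilde{x})$ keeps the strongly active constraint tight to first order ($\langle \nabla g_1(\tilde{x}), s \rangle = 0$), while $G^0_{g,0}(\tilde{x})$ only prevents the weakly active one from being violated to first order ($\langle \nabla g_2(\tilde{x}), s \rangle \leq 0$).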
Theorem 5.20 (KKT SONC) : Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be twice differentiable at $\tilde{x}$, $g_i$ be twice differentiable at $\tilde{x}$ for each $i \in I$, $h_j$ be twice differentiable at $\tilde{x}$ for each $j \in J$, and let $(LI_{gh}[\tilde{x}])$ hold.
If $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$, then there are Lagrange multipliers $u \in \mathbb{R}^m$, $v \in \mathbb{R}^k$ such that the triplet $(\tilde{x}, u, v)$ fulfills $(KKT_{fgh})$ and for each $s \in G^+_{g,0}(\tilde{x}) \cap G^0_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, $s \neq 0$, we have $\langle s, HL(\tilde{x}; u, v)\, s \rangle \geq 0$.

Proof: $(LI_{gh}[\tilde{x}])$ implies $(Abadie_{gh}[\tilde{x}])$; see Theorem 5.16. The assumptions of Theorem 5.13 are fulfilled; therefore, there are Lagrange multipliers $u \in \mathbb{R}^m$, $v \in \mathbb{R}^k$ such that the triplet $(\tilde{x}, u, v)$ fulfills $(KKT_{fgh})$.
All the functions are twice differentiable at $\tilde{x}$; hence, the Lagrange function is also twice differentiable (see Lemma 5.17), and for each $y \in M_{g,h}$ we have
$$L(y; u, v) - L(\tilde{x}; u, v) = \langle \nabla_x L(\tilde{x}; u, v), y - \tilde{x} \rangle + \frac{1}{2} \langle y - \tilde{x}, HL(\tilde{x}; u, v)(y - \tilde{x}) \rangle + \|y - \tilde{x}\|^2 R_2(y - \tilde{x}; L, \tilde{x}),$$
where $\lim_{y \to \tilde{x},\, y \in M_{g,h}} R_2(y - \tilde{x}; L, \tilde{x}) = 0$.
The triplet $(\tilde{x}, u, v)$ fulfills $(KKT_{fgh})$; therefore, $\nabla_x L(\tilde{x}; u, v) = 0$. Then the expression becomes shorter:
$$L(y; u, v) - L(\tilde{x}; u, v) = \frac{1}{2} \langle y - \tilde{x}, HL(\tilde{x}; u, v)(y - \tilde{x}) \rangle + \|y - \tilde{x}\|^2 R_2(y - \tilde{x}; L, \tilde{x}).$$
Take $s \in G^+_{g,0}(\tilde{x}) \cap G^0_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, $s \neq 0$.
According to Theorem 5.5, there is $\delta > 0$ and a function $\zeta : [0, \delta) \to \mathbb{R}^n$ with the properties:

• $\zeta$ is differentiable on $(0, \delta)$;
• $\zeta(0) = \tilde{x}$, $\zeta'(0+) = s$;
• for all $0 < t < \delta$ we have $\zeta(t) \in M_{g,h}$.

Since $\tilde{x}$ is a local minimum of $f$ on $M_{g,h}$, we have
$$0 \leq L(\zeta(t); u, v) - L(\tilde{x}; u, v) = \frac{1}{2} \langle \zeta(t) - \tilde{x}, HL(\tilde{x}; u, v)(\zeta(t) - \tilde{x}) \rangle + \|\zeta(t) - \tilde{x}\|^2 R_2(\zeta(t) - \tilde{x}; L, \tilde{x})$$
for all $t > 0$ sufficiently small. Then,
$$0 \leq \frac{1}{2} \left\langle \frac{\zeta(t) - \zeta(0)}{t},\ HL(\tilde{x}; u, v)\, \frac{\zeta(t) - \zeta(0)}{t} \right\rangle + \left\| \frac{\zeta(t) - \zeta(0)}{t} \right\|^2 R_2(\zeta(t) - \tilde{x}; L, \tilde{x}).$$
Letting $t \to 0+$, we receive
$$0 \leq \frac{1}{2} \langle s, HL(\tilde{x}; u, v)\, s \rangle.$$
Q.E.D.
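A quick check of the statement on a toy problem (ours): minimize $f(x) = x_1^2 - x_2^2$ subject to $h_1(x) = x_2 = 0$. The point $\tilde{x} = (0, 0)$ is a local minimum on $M_{g,h} = \{x : x_2 = 0\}$; $(KKT_{fgh})$ holds with $v_1 = 0$, and $HL(\tilde{x}; u, v) = \mathrm{diag}(2, -2)$. On the cone $H_{h,0}(\tilde{x}) = \{s : s_2 = 0\}$ we indeed get $\langle s, HL(\tilde{x}; u, v)\, s \rangle = 2 s_1^2 \geq 0$, although $HL$ itself is indefinite.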
Theorem 5.21 (KKT SOSC) : Let $m, k \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, k\}$, $G \subset \mathbb{R}^n$ be a non-empty open set, $f : G \to \mathbb{R}$, $g_i : G \to \mathbb{R}$ for each $i \in I$ and $h_j : G \to \mathbb{R}$ for each $j \in J$, $\tilde{x} \in M_{g,h}$, $f$ be twice differentiable at $\tilde{x}$, $g_i$ be twice differentiable at $\tilde{x}$ for each $i \in I$ and $h_j$ be twice differentiable at $\tilde{x}$ for each $j \in J$.
If a triplet $(\tilde{x}, u, v)$ fulfills $(KKT_{fgh})$ and for each $s \in G^+_{g,0}(\tilde{x}) \cap G^0_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, $s \neq 0$, we have $\langle s, HL(\tilde{x}; u, v)\, s \rangle > 0$, then $\tilde{x}$ is a strict local minimum of $f$ on $M_{g,h}$.
Proof: Assume $\tilde{x}$ is not a strict local minimum of $f$ on $M_{g,h}$. Then there is a sequence $x_k \in M_{g,h}$, $k \in \mathbb{N}$, $x_k \to \tilde{x}$, $x_k \neq \tilde{x}$, $f(x_k) \leq f(\tilde{x})$.
Define $d_k = \frac{1}{\|x_k - \tilde{x}\|}(x_k - \tilde{x})$, $\lambda_k = \|x_k - \tilde{x}\|$. Then $x_k = \tilde{x} + \lambda_k d_k$.
Since $\|d_k\| = 1$, there is a convergent subsequence $d_{k_\nu}$, $\nu \in \mathbb{N}$, with $d_{k_\nu} \to \eta$.
Hence,
$$0 \geq f(\tilde{x} + \lambda_{k_\nu} d_{k_\nu}) - f(\tilde{x}) = \langle \nabla f(\tilde{x}), \lambda_{k_\nu} d_{k_\nu} \rangle + \|\lambda_{k_\nu} d_{k_\nu}\|\, R_1(\lambda_{k_\nu} d_{k_\nu}; f, \tilde{x}),$$
$$\forall\, i \in I_g(\tilde{x}): \quad 0 \geq g_i(\tilde{x} + \lambda_{k_\nu} d_{k_\nu}) = \langle \nabla g_i(\tilde{x}), \lambda_{k_\nu} d_{k_\nu} \rangle + \|\lambda_{k_\nu} d_{k_\nu}\|\, R_1(\lambda_{k_\nu} d_{k_\nu}; g_i, \tilde{x}),$$
$$\forall\, j \in J: \quad 0 = h_j(\tilde{x} + \lambda_{k_\nu} d_{k_\nu}) = \langle \nabla h_j(\tilde{x}), \lambda_{k_\nu} d_{k_\nu} \rangle + \|\lambda_{k_\nu} d_{k_\nu}\|\, R_1(\lambda_{k_\nu} d_{k_\nu}; h_j, \tilde{x}).$$
Dividing by $\lambda_{k_\nu}$, we receive
$$0 \geq \langle \nabla f(\tilde{x}), d_{k_\nu} \rangle + \|d_{k_\nu}\|\, R_1(\lambda_{k_\nu} d_{k_\nu}; f, \tilde{x}),$$
$$\forall\, i \in I_g(\tilde{x}): \quad 0 \geq \langle \nabla g_i(\tilde{x}), d_{k_\nu} \rangle + \|d_{k_\nu}\|\, R_1(\lambda_{k_\nu} d_{k_\nu}; g_i, \tilde{x}),$$
$$\forall\, j \in J: \quad 0 = \langle \nabla h_j(\tilde{x}), d_{k_\nu} \rangle + \|d_{k_\nu}\|\, R_1(\lambda_{k_\nu} d_{k_\nu}; h_j, \tilde{x}).$$
Letting $\nu \to +\infty$, we derive
$$0 \geq \langle \nabla f(\tilde{x}), \eta \rangle, \qquad 0 \geq \langle \nabla g_i(\tilde{x}), \eta \rangle \ \forall\, i \in I_g(\tilde{x}), \qquad 0 = \langle \nabla h_j(\tilde{x}), \eta \rangle \ \forall\, j \in J.$$
The triplet $(\tilde{x}, u, v)$ fulfills $(KKT_{fgh})$; therefore,
$$\nabla f(\tilde{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\tilde{x}) + \sum_{j=1}^{k} v_j \nabla h_j(\tilde{x}) = 0, \quad u_i g_i(\tilde{x}) = 0 \text{ for each } i \in I, \quad \forall\, i \in I \text{ we have } u_i \geq 0.$$
Since $u_i = 0$ for all $i \in I \setminus I^+_g(\tilde{x}, u)$,
$$\nabla f(\tilde{x}) + \sum_{i \in I^+_g(\tilde{x}, u)} u_i \nabla g_i(\tilde{x}) + \sum_{j=1}^{k} v_j \nabla h_j(\tilde{x}) = 0.$$
Then, the scalar product with $\eta$ gives
$$\langle \nabla f(\tilde{x}), \eta \rangle + \sum_{i \in I^+_g(\tilde{x}, u)} u_i \langle \nabla g_i(\tilde{x}), \eta \rangle + \sum_{j=1}^{k} v_j \langle \nabla h_j(\tilde{x}), \eta \rangle = 0.$$
Consequently, $\langle \nabla f(\tilde{x}), \eta \rangle = 0$ and $\langle \nabla g_i(\tilde{x}), \eta \rangle = 0$ for all $i \in I^+_g(\tilde{x}, u)$.
We have found $\eta \in G^+_{g,0}(\tilde{x}) \cap G^0_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, $\eta \neq 0$.
Moreover, the Lagrange function fulfills for each $\nu \in \mathbb{N}$
$$L(x_{k_\nu}; u, v) - L(\tilde{x}; u, v) = f(x_{k_\nu}) + \sum_{i \in I} u_i g_i(x_{k_\nu}) + \sum_{j \in J} v_j h_j(x_{k_\nu}) - \Big( f(\tilde{x}) + \sum_{i \in I} u_i g_i(\tilde{x}) + \sum_{j \in J} v_j h_j(\tilde{x}) \Big)$$
$$= f(x_{k_\nu}) + \sum_{i \in I^+_g(\tilde{x}, u)} u_i g_i(x_{k_\nu}) - \Big( f(\tilde{x}) + \sum_{i \in I^+_g(\tilde{x}, u)} u_i g_i(\tilde{x}) \Big) = f(x_{k_\nu}) - f(\tilde{x}) + \sum_{i \in I^+_g(\tilde{x}, u)} u_i g_i(x_{k_\nu}) \leq 0.$$
All the functions are twice differentiable at $\tilde{x}$; hence, the Lagrange function is also twice differentiable at $\tilde{x}$ (see Lemma 5.17), and for each $\nu \in \mathbb{N}$
$$0 \geq L(x_{k_\nu}; u, v) - L(\tilde{x}; u, v) = \langle \nabla_x L(\tilde{x}; u, v), \lambda_{k_\nu} d_{k_\nu} \rangle + \frac{1}{2} \langle \lambda_{k_\nu} d_{k_\nu}, HL(\tilde{x}; u, v)\, \lambda_{k_\nu} d_{k_\nu} \rangle + \|\lambda_{k_\nu} d_{k_\nu}\|^2 R_2(\lambda_{k_\nu} d_{k_\nu}; L, \tilde{x}),$$
where $\lim_{\nu \to +\infty} R_2(\lambda_{k_\nu} d_{k_\nu}; L, \tilde{x}) = 0$.
The triplet $(\tilde{x}, u, v)$ fulfills $(KKT_{fgh})$; therefore, $\nabla_x L(\tilde{x}; u, v) = 0$ and the expression becomes shorter:
$$0 \geq L(x_{k_\nu}; u, v) - L(\tilde{x}; u, v) = \frac{1}{2} \langle \lambda_{k_\nu} d_{k_\nu}, HL(\tilde{x}; u, v)\, \lambda_{k_\nu} d_{k_\nu} \rangle + \|\lambda_{k_\nu} d_{k_\nu}\|^2 R_2(\lambda_{k_\nu} d_{k_\nu}; L, \tilde{x}).$$
Dividing by $\frac{1}{2} \lambda_{k_\nu}^2$, we receive
$$0 \geq \langle d_{k_\nu}, HL(\tilde{x}; u, v)\, d_{k_\nu} \rangle + 2 \|d_{k_\nu}\|^2 R_2(\lambda_{k_\nu} d_{k_\nu}; L, \tilde{x}).$$
Letting $\nu \to +\infty$, we derive a contradiction:
$$0 \geq \langle \eta, HL(\tilde{x}; u, v)\, \eta \rangle,$$
since $\eta \in G^+_{g,0}(\tilde{x}) \cap G^0_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$, $\eta \neq 0$.
Q.E.D.
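The quadratic-form condition of Theorems 5.20 and 5.21 can be screened numerically by restricting $HL(\tilde{x}; u, v)$ to the subspace on which all active constraint gradients vanish. A minimal NumPy sketch follows (our own toy problem, the one used after Theorem 5.20); note that in this equality-only example the cone $G^+_{g,0}(\tilde{x}) \cap G^0_{g,0}(\tilde{x}) \cap H_{h,0}(\tilde{x})$ is exactly the null space of $\nabla h$, so the reduced-Hessian test is exact, while with weakly active inequalities the cone is larger than the tested subspace.

```python
# Reduced-Hessian check for Theorem 5.21 on the toy problem
#   min x1^2 - x2^2   s.t.   h(x) = x2 = 0   (our own example data).
import numpy as np

x = np.array([0.0, 0.0])                   # KKT point, multiplier v = 0
HL = np.array([[2.0, 0.0], [0.0, -2.0]])   # Hessian of the Lagrangian at x
grad_h = np.array([[0.0, 1.0]])            # gradient(s) of the equality constraint(s)

# Orthonormal basis Z of the null space {s : <grad h_j, s> = 0} via SVD.
_, _, Vt = np.linalg.svd(grad_h)
Z = Vt[np.linalg.matrix_rank(grad_h):].T   # here a single column, (1, 0)^T

reduced = Z.T @ HL @ Z                     # Hessian restricted to the critical cone
print(np.linalg.eigvalsh(reduced))         # [2.] > 0  =>  strict local minimum
```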
Chapter 6

Polar of a cone and normal cone

This last chapter of the text presents a recent view of modern optimization. The description is based on the tangent cone, the regular tangent cone, the normal cone, the regular normal cone and the polar of a cone. It is a generalization of the views presented in the previous chapters.

6.1 Polar of a cone

Definition 6.1 Let $K \subset \mathbb{R}^n$ be a cone. We define the polar of $K$ (cz. polára $K$)
$$K^o = \{v \in \mathbb{R}^n : \forall\, x \in K \text{ we have } \langle v, x \rangle \leq 0\} \qquad (6.1)$$
and the bipolar of $K$ (cz. bipolára $K$)
$$K^{oo} = (K^o)^o = \{w \in \mathbb{R}^n : \forall\, v \in K^o \text{ we have } \langle w, v \rangle \leq 0\}. \qquad (6.2)$$

Basic properties of the polar follow.

Lemma 6.2 If $K \subset \mathbb{R}^n$ is a cone, then $K^o$ is a closed convex cone and $K^{oo} = \mathrm{clo}(\mathrm{conv}(K))$.

Proof:

1. For $v, s \in K^o$ and $\alpha \geq 0$, $\beta \geq 0$ we have, for each $x \in K$,
$$\langle \alpha v + \beta s, x \rangle = \alpha \langle v, x \rangle + \beta \langle s, x \rangle \leq 0.$$
We have shown $K^o$ is a convex cone.

2. $K^o$ is a closed set, since the scalar product is a continuous function of each argument.

3. We know that $K^{oo}$ is a closed convex cone containing $K$. Hence, $K^{oo} \supset \mathrm{clo}(\mathrm{conv}(K))$.

4. Take $s \notin \mathrm{clo}(\mathrm{conv}(K))$.
According to Theorem 1.32, a point and a closed convex set can be strictly separated. Thus, there is $\gamma \in \mathbb{R}^n$, $\gamma \neq 0$, and $\Delta \in \mathbb{R}$ such that for each $x \in K$ we have
$$\langle \gamma, x \rangle < \Delta < \langle \gamma, s \rangle.$$
We know that $K$ is a cone. Hence, for each $x \in K$ it must be fulfilled that
$$\langle \gamma, x \rangle \leq 0 < \Delta < \langle \gamma, s \rangle.$$
Therefore, $\gamma \in K^o$, but $\langle \gamma, s \rangle > 0$. Thus, $s \notin K^{oo}$.
Q.E.D.
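For example (ours), for $K = \mathbb{R}^2_+$ we get $K^o = \mathbb{R}^2_-$ and $K^{oo} = \mathbb{R}^2_+$; for the non-convex cone $K = \{x \in \mathbb{R}^2 : x_1 x_2 = 0\}$ (the union of the two coordinate axes), $K^o = \{0\}$ and $K^{oo} = \mathbb{R}^2 = \mathrm{clo}(\mathrm{conv}(K))$, illustrating that the bipolar convexifies.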
6.2 Clarke's regularity

We are familiar with the tangent cone, but we need to introduce more: we will employ the normal cone and the regular normal cone, and for the full picture the regular tangent cone will be described.

Definition 6.3 Let $S \subset \mathbb{R}^n$, $\tilde{x} \in \mathrm{clo}(S)$. We say that $s \in \mathbb{R}^n$ is a Regular Normal to $S$ at $\tilde{x}$ (or, a Normal to $S$ at $\tilde{x}$ in the Regular Sense) (cz. regulární normála k $S$ v $\tilde{x}$) if
$$\forall\, x \in S \text{ we have } \langle s, x - \tilde{x} \rangle \leq \|x - \tilde{x}\|\, R(x - \tilde{x}; s, \tilde{x}), \qquad (6.3)$$
where $R(x - \tilde{x}; s, \tilde{x}) \to 0$ provided $x \to \tilde{x}$ and $x \in S$.
The Regular Normal cone to $S$ at $\tilde{x}$ (or, the Cone of Regular Normals to $S$ at $\tilde{x}$) (cz. regulární normálový kužel) $\widehat{N}_S(\tilde{x})$ is the set of all regular normals to $S$ at $\tilde{x}$.

Definition 6.4 Let $S \subset \mathbb{R}^n$, $\tilde{x} \in \mathrm{clo}(S)$. We say that $s \in \mathbb{R}^n$ is a Normal to $S$ at $\tilde{x}$ (or, a Normal to $S$ at $\tilde{x}$ in the General Sense; a Normal Vector to $S$ at $\tilde{x}$) (cz. normála k $S$ v $\tilde{x}$) if there are sequences $x_k \in S$, $s_k \in \widehat{N}_S(x_k)$ for $k \in \mathbb{N}$ such that $x_k \to \tilde{x}$, $s_k \to s$.
The Normal cone to $S$ at $\tilde{x}$ (or, the Cone of Normals to $S$ at $\tilde{x}$) (cz. normálový kužel k $S$ v bodě $\tilde{x}$) $N_S(\tilde{x})$ is the set of all normals to $S$ at $\tilde{x}$.

The defined objects are really cones, and the normal cone always contains the regular normal cone.
Lemma 6.5 If $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$, then $\widehat{N}_S(\tilde{x})$ and $N_S(\tilde{x})$ are cones and $\widehat{N}_S(\tilde{x}) \subset N_S(\tilde{x})$.

Proof:

1. From Definition 6.3, one can derive that any nonnegative multiple of a regular normal is again a regular normal. Thus, $\widehat{N}_S(\tilde{x})$ is a cone.

2. Let $s \in N_S(\tilde{x})$ and $\alpha \geq 0$.
Then, there is a sequence $x_k \in S$, $s_k \in \widehat{N}_S(x_k)$ for $k \in \mathbb{N}$ such that $x_k \to \tilde{x}$, $s_k \to s$.
Hence, $\alpha s_k \in \widehat{N}_S(x_k)$ for $k \in \mathbb{N}$ and $\alpha s_k \to \alpha s$.
Thus, $\alpha s \in N_S(\tilde{x})$.
We have shown $N_S(\tilde{x})$ is a cone.

3. The inclusion follows immediately from Definition 6.4, since the constant choice $x_k = \tilde{x}$, $s_k = s \in \widehat{N}_S(\tilde{x})$ for all $k \in \mathbb{N}$ is allowed.
Q.E.D.
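An example (ours) where the inclusion is strict: let $S = \{x \in \mathbb{R}^2 : x_2 = |x_1|\}$ and $\tilde{x} = (0,0)$. A regular normal at $\tilde{x}$ must satisfy (6.3) along both branches of the graph, which forces $\widehat{N}_S(\tilde{x}) = \{s : s_2 \leq -|s_1|\}$. On the other hand, at the smooth points $(t, t)$, $t > 0$, the regular normals are the multiples of $(1, -1)$, so passing to the limit yields $\pm(1, -1) \in N_S(\tilde{x})$, and similarly $\pm(1, 1) \in N_S(\tilde{x})$ from the left branch; hence $\widehat{N}_S(\tilde{x}) \subsetneq N_S(\tilde{x})$.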
Lemma 6.6 If $M \subset \mathbb{R}^n$ is a set and $\tilde{x} \in \mathrm{clo}(M)$, then $\widehat{N}_{\mathrm{clo}(M)}(\tilde{x}) = \widehat{N}_M(\tilde{x})$ and $N_{\mathrm{clo}(M)}(\tilde{x}) = N_M(\tilde{x})$.

Proof: It is sufficient to show $\widehat{N}_{\mathrm{clo}(M)}(\tilde{x}) = \widehat{N}_M(\tilde{x})$.
Evidently, $\widehat{N}_{\mathrm{clo}(M)}(\tilde{x}) \subset \widehat{N}_M(\tilde{x})$.
Thus, the opposite inclusion must be shown.
Take, for that, $s \in \widehat{N}_M(\tilde{x})$ and define the function
$$Q(x - \tilde{x}) = \left( \frac{\langle s, x - \tilde{x} \rangle}{\|x - \tilde{x}\|} \right)^+ \text{ for all } x \in \mathrm{clo}(M),\ x \neq \tilde{x}, \qquad Q(0) = 0.$$
We need to show that $Q$ vanishes as $x \to \tilde{x}$, $x \in \mathrm{clo}(M)$.
Take a sequence $x_k \in \mathrm{clo}(M)$, $k \in \mathbb{N}$, such that $x_k \to \tilde{x}$ and $x_k \neq \tilde{x}$.
Then, there is a sequence $y_k \in M$ with
$$0 < \|y_k - \tilde{x}\| < \|x_k - \tilde{x}\| + \tfrac{1}{k+1}, \qquad \|y_k - x_k\| < \tfrac{1}{k+1} \|y_k - \tilde{x}\|.$$
Hence, $y_k \to \tilde{x}$ and
$$Q(x_k - \tilde{x}) = \left( \frac{\langle s, x_k - \tilde{x} \rangle}{\|x_k - \tilde{x}\|} \right)^+ \leq \left( \frac{\langle s, y_k - \tilde{x} \rangle + \langle s, x_k - y_k \rangle}{\|y_k - \tilde{x}\| - \|y_k - x_k\|} \right)^+ \leq \frac{1}{1 - \frac{1}{k+1}} \left( \frac{\langle s, y_k - \tilde{x} \rangle + \langle s, x_k - y_k \rangle}{\|y_k - \tilde{x}\|} \right)^+$$
$$\leq \frac{1}{1 - \frac{1}{k+1}} \left[ \left( \frac{\langle s, y_k - \tilde{x} \rangle}{\|y_k - \tilde{x}\|} \right)^+ + \left( \frac{\langle s, x_k - y_k \rangle}{\|y_k - \tilde{x}\|} \right)^+ \right] \leq \frac{1}{1 - \frac{1}{k+1}} \left[ (R(y_k - \tilde{x}; s, \tilde{x}))^+ + \frac{\|s\|\, \|x_k - y_k\|}{\|y_k - \tilde{x}\|} \right]$$
$$\leq \frac{1}{1 - \frac{1}{k+1}} \left[ (R(y_k - \tilde{x}; s, \tilde{x}))^+ + \frac{\|s\|}{k+1} \right] \xrightarrow[k \to +\infty]{} 0.$$
Thus, (6.3) is verified and $s \in \widehat{N}_{\mathrm{clo}(M)}(\tilde{x})$.
Q.E.D.
Lemma 6.7 Let $M \subset \mathbb{R}^n$, $x \in \mathrm{clo}(M)$ and $S \subset \mathbb{R}^n$, $x \in \mathrm{int}(S)$. Then, $\widehat{N}_{M \cap S}(x) = \widehat{N}_{\mathrm{clo}(M) \cap \mathrm{clo}(S)}(x) = \widehat{N}_M(x) = \widehat{N}_{\mathrm{clo}(M)}(x)$ and $N_{M \cap S}(x) = N_{\mathrm{clo}(M) \cap \mathrm{clo}(S)}(x) = N_M(x) = N_{\mathrm{clo}(M)}(x)$.

Proof: We have to show $\widehat{N}_{M \cap S}(x) = \widehat{N}_M(x)$. The rest of the statement follows from it, by Definition 6.4 and Lemma 6.6.

1. Since $M \cap S \subset M$, the inclusion $\widehat{N}_{M \cap S}(x) \supset \widehat{N}_M(x)$ is evident.

2. Assume $s \in \widehat{N}_{M \cap S}(x)$. Define the function
$$Q(y - x) = \left( \frac{\langle s, y - x \rangle}{\|y - x\|} \right)^+ \text{ for all } y \in M,\ y \neq x, \qquad Q(0) = 0.$$
We need to show that $Q$ vanishes as $y \to x$, $y \in M$.
Take a sequence $x_k \in M$, $k \in \mathbb{N}$, such that $x_k \to x$ and $x_k \neq x$.
Then, $x_k \in M \cap S$ for all $k$ large enough.
Therefore, for $k$ large enough,
$$Q(x_k - x) = \left( \frac{\langle s, x_k - x \rangle}{\|x_k - x\|} \right)^+ \leq (R(x_k - x; s, x))^+ \xrightarrow[k \to +\infty]{} 0.$$
Thus, (6.3) is verified and $s \in \widehat{N}_M(x)$.
Q.E.D.
Theorem 6.8 : If $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$, then $T_S(\tilde{x})^o = \widehat{N}_S(\tilde{x})$ and $\widehat{N}_S(\tilde{x})^o \supset T_S(\tilde{x})$.

Proof:

1. Let $s \in T_S(\tilde{x})$, $s \neq 0$, and $v \in \widehat{N}_S(\tilde{x})$.
Then, there is a sequence $x_k \in S$, $\lambda_k > 0$, $k \in \mathbb{N}$ such that $x_k \to \tilde{x}$, $\lambda_k(x_k - \tilde{x}) \to s$.
Since $v \in \widehat{N}_S(\tilde{x})$,
$$\forall\, k \in \mathbb{N} \text{ we have } \langle v, x_k - \tilde{x} \rangle \leq \|x_k - \tilde{x}\|\, R(x_k - \tilde{x}; v, \tilde{x}).$$
Hence, for each $k \in \mathbb{N}$ we have
$$\langle v, \lambda_k(x_k - \tilde{x}) \rangle \leq \|\lambda_k(x_k - \tilde{x})\|\, R(x_k - \tilde{x}; v, \tilde{x}).$$
Letting $k \to +\infty$ we receive $\langle v, s \rangle \leq 0$.
We have found $T_S(\tilde{x})^o \supset \widehat{N}_S(\tilde{x})$ and $\widehat{N}_S(\tilde{x})^o \supset T_S(\tilde{x})$.

2. Let $v \in T_S(\tilde{x})^o$. For $k \in \mathbb{N}$ we consider
$$M_k = \left\{ x \in S : \tfrac{1}{k+1} < \|x - \tilde{x}\| \leq \tfrac{1}{k} \right\}, \qquad N_0 = \{k \in \mathbb{N} : M_k \neq \emptyset\}.$$
If $k \in N_0$, we denote
$$\Delta_k = \sup \{ \langle v, x - \tilde{x} \rangle : x \in M_k \}.$$

(a) If $N_0$ is a finite set, there is some $k_0 \in \mathbb{N}$ such that there is no $x \in S$ fulfilling $0 < \|x - \tilde{x}\| \leq \tfrac{1}{k_0}$.
Hence, $\tilde{x}$ is an isolated point of $S$ and therefore $T_S(\tilde{x}) = \{0\}$, $T_S(\tilde{x})^o = \widehat{N}_S(\tilde{x}) = \mathbb{R}^n$.

(b) Let $N_0$ be an infinite set.
By the definition of supremum, for each $k \in N_0$ we find $x_k \in M_k$ such that
$$\langle v, x_k - \tilde{x} \rangle + \frac{1}{k^2} > \Delta_k.$$
Write $d_k = \frac{x_k - \tilde{x}}{\|x_k - \tilde{x}\|}$ and let $s \in \mathbb{R}^n$ be a cluster point of the sequence $d_k$, $k \in N_0$; i.e. there is an infinite subset $N_1 \subset N_0$ with
$$\lim_{k \to +\infty,\ k \in N_1} d_k = s.$$
Then, $s \in T_S(\tilde{x})$, $s \neq 0$.
The scalar product is continuous and $v \in T_S(\tilde{x})^o$; therefore,
$$\lim_{k \to +\infty,\ k \in N_1} \langle v, d_k \rangle = \langle v, s \rangle \leq 0.$$
The result holds for each cluster point; hence, for the original sequence we have
$$\limsup_{k \to +\infty,\ k \in N_0} \langle v, d_k \rangle \leq 0.$$
Take, finally, $x \in S$, $0 < \|x - \tilde{x}\| \leq 1$.
Find $k \in N_0$ such that $x \in M_k$ and estimate, using $\|x_k - \tilde{x}\| \leq \tfrac{1}{k}$ and $\tfrac{1}{k} < \tfrac{k+1}{k} \|x - \tilde{x}\|$,
$$\langle v, x - \tilde{x} \rangle \leq \Delta_k < \langle v, x_k - \tilde{x} \rangle + \frac{1}{k^2} = \|x_k - \tilde{x}\| \langle v, d_k \rangle + \frac{1}{k^2} \leq \frac{1}{k} \left[ (\langle v, d_k \rangle)^+ + \frac{1}{k} \right] \leq \|x - \tilde{x}\| \, \frac{k+1}{k} \left[ (\langle v, d_k \rangle)^+ + \frac{1}{k} \right].$$
If $x \to \tilde{x}$, then $k \to +\infty$ and
$$\lim_{k \to +\infty} \frac{k+1}{k} \left[ (\langle v, d_k \rangle)^+ + \frac{1}{k} \right] = 0.$$
We have verified (6.3); therefore, $v \in \widehat{N}_S(\tilde{x})$.
Q.E.D.
The polar of a normal cone also has a certain importance.

Definition 6.9 For $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$, we define the Regular Tangent Cone to $S$ at $\tilde{x}$ (or, the Cone of Regular Tangent Vectors of $S$ at $\tilde{x}$) (cz. regulární tečný kužel k $S$ v $\tilde{x}$) by
$$\widehat{T}_S(\tilde{x}) = \left\{ s \in \mathbb{R}^n : \begin{array}{l} \text{for each } x_k \in S,\ k \in \mathbb{N},\ x_k \to \tilde{x}, \\ \text{and each } \lambda_k > 0,\ k \in \mathbb{N},\ \lambda_k \nearrow +\infty, \\ \text{there are } \xi_k \in S,\ k \in \mathbb{N}, \\ \text{such that } \xi_k \to \tilde{x},\ \lambda_k(\xi_k - x_k) \to s \end{array} \right\}. \qquad (6.4)$$
At first, consider the basic properties of a regular tangent cone.

Theorem 6.10 : If $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$, then $\widehat{T}_S(\tilde{x})$ is a closed convex cone.

Proof:

1. According to the definition, $0 \in \widehat{T}_S(\tilde{x})$, since one can take $\xi_k = x_k$.

2. Take $s \in \widehat{T}_S(\tilde{x})$ and $\alpha > 0$.
For sequences $x_k \in S$, $k \in \mathbb{N}$, $x_k \to \tilde{x}$, $\lambda_k > 0$, $\lambda_k \nearrow +\infty$, we have a sequence $\xi_k \in S$, $k \in \mathbb{N}$, such that $\xi_k \to \tilde{x}$, $\frac{\lambda_k}{\alpha}(\xi_k - x_k) \to s$.
Hence, $\lambda_k(\xi_k - x_k) \to \alpha s$ and $\alpha s \in \widehat{T}_S(\tilde{x})$.
We have shown $\widehat{T}_S(\tilde{x})$ is a cone.

3. Let $s_j \in \widehat{T}_S(\tilde{x})$ for each $j \in \mathbb{N}$ and $s_j \to s$.
Then, for sequences $x_k \in S$, $k \in \mathbb{N}$, $x_k \to \tilde{x}$, $\lambda_k > 0$, $\lambda_k \nearrow +\infty$, we have sequences $\xi_{j,k} \in S$, $j, k \in \mathbb{N}$, such that for fixed $j$ we have $\xi_{j,k} \to \tilde{x}$, $\lambda_k(\xi_{j,k} - x_k) \to s_j$ letting $k \to +\infty$.
Select indexes $k_j \in \mathbb{N}$, $j \in \mathbb{N}$, such that
$$1 < k_1 < k_2 < k_3 < \ldots, \qquad \|\xi_{j,k} - \tilde{x}\| < \frac{1}{j},\ \|\lambda_k(\xi_{j,k} - x_k) - s_j\| < \frac{1}{j} \text{ for all } k \geq k_j.$$
Set
$$\eta_k = \xi_{1,k} \text{ for all } 1 \leq k < k_1, \qquad \eta_k = \xi_{j,k} \text{ for all } k_j \leq k < k_{j+1},\ j \in \mathbb{N}.$$
Then,
$$\eta_k \to \tilde{x}, \qquad \lambda_k(\eta_k - x_k) \to s.$$
Hence, $s \in \widehat{T}_S(\tilde{x})$ and, therefore, $\widehat{T}_S(\tilde{x})$ is a closed set.

4. If $s, v \in \widehat{T}_S(\tilde{x})$, then for sequences $x_k \in S$, $k \in \mathbb{N}$, $x_k \to \tilde{x}$, $\lambda_k > 0$, $\lambda_k \nearrow +\infty$, we have sequences $\xi_k, \eta_k \in S$, $k \in \mathbb{N}$, such that $\xi_k \to \tilde{x}$, $\eta_k \to \tilde{x}$, $\lambda_k(\xi_k - x_k) \to s$, $\lambda_k(\eta_k - \xi_k) \to v$.
Then, $\lambda_k(\eta_k - x_k) \to s + v$.
We have verified that $\widehat{T}_S(\tilde{x})$ is a closed convex cone.
Q.E.D.
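Continuing the example given after Lemma 6.5 (ours), for $S = \{x \in \mathbb{R}^2 : x_2 = |x_1|\}$ and $\tilde{x} = (0,0)$ we have $T_S(\tilde{x}) = \{r(1,1) : r \geq 0\} \cup \{r(-1,1) : r \geq 0\}$, while $\widehat{T}_S(\tilde{x}) = \{0\}$: the defining sequences $x_k$ may approach $\tilde{x}$ along either branch, and only $s = 0$ can be matched along both. This shows that $\widehat{T}_S(\tilde{x})$ can be much smaller than $T_S(\tilde{x})$, and that it is convex even when $T_S(\tilde{x})$ is not.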
Lemma 6.11 If $M \subset \mathbb{R}^n$ is a set and $\tilde{x} \in \mathrm{clo}(M)$, then $\widehat{T}_{\mathrm{clo}(M)}(\tilde{x}) = \widehat{T}_M(\tilde{x})$.

Proof:

1. Let $s \in \widehat{T}_{\mathrm{clo}(M)}(\tilde{x})$, $s \neq 0$.
Take sequences $x_k \in M$, $\lambda_k > 0$ for $k \in \mathbb{N}$ such that $x_k \to \tilde{x}$, $\lambda_k \nearrow +\infty$.
There is a sequence $\xi_k \in \mathrm{clo}(M)$, $k \in \mathbb{N}$, such that $\xi_k \to \tilde{x}$, $\lambda_k(\xi_k - x_k) \to s$.
Hence, there is a sequence $\eta_k \in M$, $k \in \mathbb{N}$, such that $\|\eta_k - \xi_k\| < \frac{1}{\lambda_k^2}$.
Then, $\eta_k \to \tilde{x}$ and
$$\lambda_k(\eta_k - x_k) = \lambda_k(\xi_k - x_k) + \lambda_k(\eta_k - \xi_k) \to s,$$
since $\lambda_k \|\eta_k - \xi_k\| < \frac{1}{\lambda_k} \to 0$.
We have shown $s \in \widehat{T}_M(\tilde{x})$.

2. Let $s \in \widehat{T}_M(\tilde{x})$, $s \neq 0$.
Take sequences $x_k \in \mathrm{clo}(M)$, $\lambda_k > 0$ for $k \in \mathbb{N}$ such that $x_k \to \tilde{x}$, $\lambda_k \nearrow +\infty$.
Hence, there is a sequence $y_k \in M$, $k \in \mathbb{N}$, such that $\|y_k - x_k\| < \frac{1}{\lambda_k^2}$.
Then, there is a sequence $\xi_k \in M$, $k \in \mathbb{N}$, such that $\xi_k \to \tilde{x}$, $\lambda_k(\xi_k - y_k) \to s$.
Then,
$$\lambda_k(\xi_k - x_k) = \lambda_k(\xi_k - y_k) + \lambda_k(y_k - x_k) \to s,$$
since $\lambda_k \|y_k - x_k\| < \frac{1}{\lambda_k} \to 0$.
We have shown $s \in \widehat{T}_{\mathrm{clo}(M)}(\tilde{x})$.
Q.E.D.
Lemma 6.12 Let $M \subset \mathbb{R}^n$, $x \in \mathrm{clo}(M)$ and $S \subset \mathbb{R}^n$, $x \in \mathrm{int}(S)$. Then, $\widehat{T}_{M \cap S}(x) = \widehat{T}_{\mathrm{clo}(M) \cap \mathrm{clo}(S)}(x) = \widehat{T}_M(x) = \widehat{T}_{\mathrm{clo}(M)}(x)$.

Proof: We have to show $\widehat{T}_{M \cap S}(x) = \widehat{T}_M(x)$. The rest of the statement follows from Lemma 6.11.

1. Let $s \in \widehat{T}_{M \cap S}(x)$.
Take sequences $x_k \in M$, $\lambda_k > 0$ for $k \in \mathbb{N}$ such that $x_k \to x$, $\lambda_k \nearrow +\infty$.
There is $k_0 \in \mathbb{N}$ such that $x_k \in M \cap S$ for all $k \geq k_0$, $k \in \mathbb{N}$.
Form a new sequence
$$y_k = x_k \text{ for } k \geq k_0, \qquad y_k = x_{k_0} \text{ for } k < k_0.$$
Then, $y_k \in M \cap S$ for all $k \in \mathbb{N}$ and $y_k \to x$.
Hence, there is a sequence $\xi_k \in M \cap S$, $k \in \mathbb{N}$, such that $\xi_k \to x$, $\lambda_k(\xi_k - y_k) \to s$.
Finally, $\lambda_k(\xi_k - x_k) \to s$ and, therefore, $s \in \widehat{T}_M(x)$.

2. Let $s \in \widehat{T}_M(x)$.
Take sequences $x_k \in M \cap S$, $\lambda_k > 0$ for $k \in \mathbb{N}$ such that $x_k \to x$, $\lambda_k \nearrow +\infty$.
Then, there is a sequence $\xi_k \in M$, $k \in \mathbb{N}$, such that $\xi_k \to x$, $\lambda_k(\xi_k - x_k) \to s$.
There is $k_0 \in \mathbb{N}$ such that $\xi_k \in M \cap S$ for all $k \geq k_0$, $k \in \mathbb{N}$.
Form a new sequence
$$\eta_k = \xi_k \text{ for } k \geq k_0, \qquad \eta_k = \xi_{k_0} \text{ for } k < k_0.$$
Then, $\eta_k \in M \cap S$ for all $k \in \mathbb{N}$ and $\eta_k \to x$.
Moreover, $\lambda_k(\eta_k - x_k) \to s$.
We have shown $s \in \widehat{T}_{M \cap S}(x)$.
Q.E.D.
Theorem 6.13 : If $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$, then $\widehat{T}_S(\tilde{x}) \subset T_S(\tilde{x})$.

Proof: Take $w \in \widehat{T}_S(\tilde{x})$.
Since $\tilde{x} \in \mathrm{clo}(S)$, one can find a sequence $x_k \in S$, $k \in \mathbb{N}$, such that $\|x_k - \tilde{x}\| < \frac{1}{k^2}$.
Set $\lambda_k = k$ for $k \in \mathbb{N}$.
Since $w \in \widehat{T}_S(\tilde{x})$, there is a sequence $\xi_k \in S$, $k \in \mathbb{N}$, such that $\xi_k \to \tilde{x}$, $\lambda_k(\xi_k - x_k) \to w$.
Moreover, $\lambda_k(\xi_k - \tilde{x}) = \lambda_k(\xi_k - x_k) + \lambda_k(x_k - \tilde{x}) \to w$.
Hence, $w \in T_S(\tilde{x})$.
Q.E.D.
Definition 6.14 Let $S \subset \mathbb{R}^n$, $\tilde{x} \in \mathrm{clo}(S)$. We say that the set $S$ is locally closed at $\tilde{x}$ (cz. lokálně uzavřená v $\tilde{x}$) if there is $\delta > 0$ such that $V(\tilde{x}, \delta) \cap S$ is a closed set.

Theorem 6.15 : Let $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$. If $S$ is locally closed at $\tilde{x}$, then
$$\widehat{T}_S(\tilde{x}) = \left\{ s \in \mathbb{R}^n : \begin{array}{l} \text{for each } x_k \in S,\ k \in \mathbb{N},\ x_k \to \tilde{x}, \\ \text{there are } s_k \in T_S(x_k),\ k \in \mathbb{N}, \text{ such that } s_k \to s \end{array} \right\}. \qquad (6.5)$$
Proof: See 6.26 Theorem in [9], p. 217.
Q.E.D.

Theorem 6.16 : Let $S \subset \mathbb{R}^n$ and $\tilde{x} \in \mathrm{clo}(S)$. If $S$ is locally closed at $\tilde{x}$, then $\widehat{T}_S(\tilde{x}) = N_S(\tilde{x})^o$.
Proof: See 6.28 Theorem in [9], p. 220.
Q.E.D.
Definition 6.17 Let $S \subset \mathbb{R}^n$, $\tilde{x} \in S$. We say $S$ is regular at $\tilde{x}$ in the Sense of Clarke (cz. regulární ve smyslu Clarka), whenever $S$ is locally closed at $\tilde{x}$ and $N_S(\tilde{x}) = \widehat{N}_S(\tilde{x})$.

Lemma 6.18 Let $S \subset \mathbb{R}^n$ be convex, $\tilde{x} \in S$. Then, $S$ is geometrically attainable at $\tilde{x}$ and we have
$$T_S(\tilde{x}) = \mathrm{clo}(\{s \in \mathbb{R}^n : \exists\, \lambda > 0 \text{ such that } \tilde{x} + \lambda s \in S\}), \qquad (6.6)$$
$$\mathrm{int}(T_S(\tilde{x})) = \{s \in \mathbb{R}^n : \exists\, \lambda > 0 \text{ such that } \tilde{x} + \lambda s \in \mathrm{int}(S)\}, \qquad (6.7)$$
$$N_S(\tilde{x}) = \widehat{N}_S(\tilde{x}) = \{s \in \mathbb{R}^n : \forall\, x \in S \text{ we have } \langle s, x - \tilde{x} \rangle \leq 0\}. \qquad (6.8)$$
Therefore, a convex set $S$ is regular at $\tilde{x}$ in the sense of Clarke if and only if $S$ is locally closed at $\tilde{x}$.
Proof: See 6.9 Theorem in [9], p. 203.
Q.E.D.
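For a convex example (ours), take $S = \mathbb{R}^2_+$ and $\tilde{x} = (0,0)$: formulas (6.6)-(6.8) give $T_S(\tilde{x}) = \mathbb{R}^2_+$ and $N_S(\tilde{x}) = \widehat{N}_S(\tilde{x}) = \mathbb{R}^2_-$, which is also the polar $T_S(\tilde{x})^o$, as Theorem 6.8 predicts; being closed and convex, $S$ is regular at $\tilde{x}$ in the sense of Clarke.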
6.3 Mathematical program

Theorem 6.19 (MP-normal FONC) : Let $M \subset G \subset \mathbb{R}^n$, $G$ be an open set, $\tilde{x} \in M$ and $f : G \to \mathbb{R}$ be differentiable at $\tilde{x}$.
If $\tilde{x}$ is a local minimum of $f$ on $M$, then $-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x})$.

Proof: If $\tilde{x}$ is a local minimum of $f$ on $M$, then there is $\delta > 0$ such that $\tilde{x}$ is a global minimum of $f$ on $M \cap V(\tilde{x}, \delta)$. Therefore, for each $x \in M \cap V(\tilde{x}, \delta)$ we have
$$0 \leq f(x) - f(\tilde{x}) = \langle \nabla_x f(\tilde{x}), x - \tilde{x} \rangle + \|x - \tilde{x}\|\, R_1(x - \tilde{x}; f, \tilde{x}).$$
Consequently, for each $x \in M \cap V(\tilde{x}, \delta)$ we have
$$\langle -\nabla_x f(\tilde{x}), x - \tilde{x} \rangle \leq \|x - \tilde{x}\|\, R_1(x - \tilde{x}; f, \tilde{x}).$$
The validity of the formula can be extended to all $x \in M$.
Finally, $-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x})$.
Q.E.D.
The condition can be equivalently reformulated, or replaced by a condition which is not so severe.

Lemma 6.20 Let $M \subset G \subset \mathbb{R}^n$, $G$ be an open set, $\tilde{x} \in M$ and $f : G \to \mathbb{R}$ be differentiable at $\tilde{x}$. Then, we have
$$-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x}) \iff \forall\, s \in T_M(\tilde{x}) \text{ we have } \langle \nabla_x f(\tilde{x}), s \rangle \geq 0, \qquad (6.9)$$
$$-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x}) \implies -\nabla_x f(\tilde{x}) \in N_M(\tilde{x}). \qquad (6.10)$$
If $M$ is convex, then
$$-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x}) \iff \forall\, x \in M \text{ we have } \langle \nabla_x f(\tilde{x}), x - \tilde{x} \rangle \geq 0. \qquad (6.11)$$
Proof: The equivalence (6.9) follows from Theorem 6.8. The implication (6.10) follows from Lemma 6.5. The equivalence (6.11) follows from Lemma 6.18.
Q.E.D.
Theorem 6.21 (MP-normal FOSC) : Let $M \subset G \subset \mathbb{R}^n$, $M$ be a convex set, $G$ be an open convex set, $\tilde{x} \in M$, $f : G \to \mathbb{R}$ be convex and differentiable at $\tilde{x}$.
If $-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x})$, then $\tilde{x}$ is a global minimum of $f$ on $M$.

Proof: Since the function $f : G \to \mathbb{R}$ is convex and differentiable at $\tilde{x}$, and according to (6.11), we have
$$\forall\, x \in M: \quad f(x) - f(\tilde{x}) \geq \langle \nabla_x f(\tilde{x}), x - \tilde{x} \rangle \geq 0.$$
It means, $\tilde{x}$ is a global minimum of $f$ on $M$.
Q.E.D.
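For a convex $M$, criterion (6.11) is a finite check when $M$ is a polytope, because the linear function $x \mapsto \langle \nabla_x f(\tilde{x}), x - \tilde{x} \rangle$ attains its minimum at a vertex. A minimal sketch with NumPy (our own toy data, not from the text):

```python
# Optimality test of Lemma 6.20 / Theorem 6.21 on the box M = [0,1]^2
# for f(x) = (x1 - 2)^2 + x2^2 (illustrative data of our own).
import numpy as np

x_t = np.array([1.0, 0.0])                        # candidate minimizer on the box
grad = np.array([2 * (x_t[0] - 2), 2 * x_t[1]])   # grad f(x~) = (-2, 0)

# For a box it suffices to test the vertices (extreme points of M).
vertices = [np.array(v, dtype=float) for v in
            [(0, 0), (1, 0), (0, 1), (1, 1)]]
ok = all(grad @ (v - x_t) >= 0 for v in vertices)
print(ok)   # True => x~ is a global minimum of f on M (Theorem 6.21)
```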
The condition used in Theorems 6.19 and 6.21 coincides with the condition used in Theorems 3.27 and 3.28.

Lemma 6.22 Let $M \subset G \subset \mathbb{R}^n$, $G$ be an open set, $\tilde{x} \in M$ and $f : G \to \mathbb{R}$ be differentiable at $\tilde{x}$. Then, $F_{f,0}(\tilde{x}) \cap T_M(\tilde{x}) = \emptyset$ iff $-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x})$.

Proof:

1. Assume $F_{f,0}(\tilde{x}) \cap T_M(\tilde{x}) = \emptyset$ and take $s \in T_M(\tilde{x})$.
Then, $s \notin F_{f,0}(\tilde{x})$.
Therefore, $\langle \nabla_x f(\tilde{x}), s \rangle \geq 0$ or, equivalently, $\langle -\nabla_x f(\tilde{x}), s \rangle \leq 0$.
We have verified $-\nabla_x f(\tilde{x}) \in T_M(\tilde{x})^o = \widehat{N}_M(\tilde{x})$.

2. Assume $-\nabla_x f(\tilde{x}) \in \widehat{N}_M(\tilde{x})$ and take $s \in T_M(\tilde{x})$.
Then, $\langle -\nabla_x f(\tilde{x}), s \rangle \leq 0$ or, equivalently, $\langle \nabla_x f(\tilde{x}), s \rangle \geq 0$.
Hence, $s \notin F_{f,0}(\tilde{x})$.
We have verified $F_{f,0}(\tilde{x}) \cap T_M(\tilde{x}) = \emptyset$.
Q.E.D.

If a set of feasible solutions is described by constraints, we possess a characterization of the normal cone.
6.4 Lagrange coefficients

In this section, we assume the following setting.

Assumption $(Base_{GD})$
Let $m \in \mathbb{N}$, $I = \{1, 2, \ldots, m\}$, $W \subset G \subset \mathbb{R}^n$, $W$ be a non-empty closed set, $G$ be an open set, $D \subset \mathbb{R}^m$ be a non-empty closed set, $f : G \to \mathbb{R}$ and $g_i : G \to \mathbb{R}$ be given for each $i \in I$.

Assumption $(Diff_{GD}[\tilde{x}])$
The function $g_i$ is differentiable at $\tilde{x}$ for each $i \in I_g(\tilde{x})$.

Assumption $(Diff_f[\tilde{x}])$
The function $f$ is differentiable at $\tilde{x}$.

Definition 6.23 Assume $(Base_{GD})$ and consider the vector function $G = (g_1, g_2, \ldots, g_m) : G \to \mathbb{R}^m$. We define the set of feasible solutions
$$M_{G,D} = \{x \in W : G(x) \in D\}. \qquad (6.12)$$

We will need a condition of conditional linear independence.

$(LI_{GD}[\tilde{x}])$
Let $(Base_{GD})$, $\tilde{x} \in M_{G,D}$ and $(Diff_{GD}[\tilde{x}])$ hold. We say $M_{G,D}$ fulfills at $\tilde{x}$ the condition of conditional linear independence (cz. podmínka podmíněné lineární nezávislosti), whenever the only solution of
$$-\sum_{i=1}^{m} y_i \nabla_x g_i(\tilde{x}) \in N_W(\tilde{x}), \quad y \in N_D(G(\tilde{x})) \qquad (6.13)$$
is the trivial one, i.e. $y = 0$.
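The classical nonlinear program of Chapter 4 fits this scheme (a remark of ours): take $W = G = \mathbb{R}^n$ and $D = (-\infty, 0]^m$, so that $M_{G,D} = M_g$. For $\tilde{x} \in M_g$ we get $N_W(\tilde{x}) = \{0\}$ and $N_D(G(\tilde{x})) = \{y \in \mathbb{R}^m : y \geq 0,\ y_i = 0 \text{ for } i \notin I_g(\tilde{x})\}$, so (6.13) and the normal-cone formulas of the next theorems reproduce the nonnegative Lagrange multipliers and the complementarity slackness of $(KKT_{fg})$.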
Theorem 6.24 (Lagrange FONC) : Let $(Base_{GD})$, $\tilde{x} \in M_{G,D}$, $(Diff_{GD}[\tilde{x}])$ and $(LI_{GD}[\tilde{x}])$ hold.
Then,
$$N_{M_{G,D}}(\tilde{x}) \subset \left\{ \sum_{i=1}^{m} y_i \nabla_x g_i(\tilde{x}) + z : y \in N_D(G(\tilde{x})),\ z \in N_W(\tilde{x}) \right\}. \qquad (6.14)$$
Proof: See 6.14 Theorem in [9], p. 208.
Q.E.D.

Theorem 6.25 (Lagrange FONC - regular) : Let $(Base_{GD})$, $\tilde{x} \in M_{G,D}$, $(Diff_{GD}[\tilde{x}])$ and $(LI_{GD}[\tilde{x}])$ hold. If $W$ is regular at $\tilde{x}$ in the sense of Clarke and $D$ is regular at $G(\tilde{x})$ in the sense of Clarke, then $M_{G,D}$ is regular at $\tilde{x}$ in the sense of Clarke and
$$N_{M_{G,D}}(\tilde{x}) = \left\{ \sum_{i=1}^{m} y_i \nabla_x g_i(\tilde{x}) + z : y \in N_D(G(\tilde{x})),\ z \in N_W(\tilde{x}) \right\}. \qquad (6.15)$$
Proof: See 6.14 Theorem in [9], p. 208.
Q.E.D.

Theorem 6.26 (Lagrange FOSC) : Let $(Base_{GD})$, $\tilde{x} \in M_{G,D}$ and $(Diff_{GD}[\tilde{x}])$ hold.
Then,
$$\widehat{N}_{M_{G,D}}(\tilde{x}) \supset \left\{ \sum_{i=1}^{m} y_i \nabla_x g_i(\tilde{x}) + z : y \in \widehat{N}_D(G(\tilde{x})),\ z \in \widehat{N}_W(\tilde{x}) \right\}. \qquad (6.16)$$
Proof: See 6.14 Theorem in [9], p. 208.
Q.E.D.
Theorem 6.27 (MP-GD FONC) : Let $(Base_{GD})$, $\tilde{x} \in M_{G,D}$, $(Diff_{GD}[\tilde{x}])$, $(LI_{GD}[\tilde{x}])$ and $(Diff_f[\tilde{x}])$ hold. Let $W$ be regular at $\tilde{x}$ in the sense of Clarke and $D$ be regular at $G(\tilde{x})$ in the sense of Clarke.
If $\tilde{x}$ is a local minimum of $f$ on $M_{G,D}$, then
$$-\nabla_x f(\tilde{x}) \in \left\{ \sum_{i=1}^{m} y_i \nabla_x g_i(\tilde{x}) + z : y \in \widehat{N}_D(G(\tilde{x})),\ z \in \widehat{N}_W(\tilde{x}) \right\}. \qquad (6.17)$$
Proof: A consequence of Theorems 6.19 and 6.25.
Q.E.D.

Theorem 6.28 (MP-GD FOSC) : Let $(Base_{GD})$, $\tilde{x} \in M_{G,D}$, $(Diff_{GD}[\tilde{x}])$ and $(Diff_f[\tilde{x}])$ hold.
If $M_{G,D}$ is convex, $f$ is convex and
$$-\nabla_x f(\tilde{x}) \in \left\{ \sum_{i=1}^{m} y_i \nabla_x g_i(\tilde{x}) + z : y \in \widehat{N}_D(G(\tilde{x})),\ z \in \widehat{N}_W(\tilde{x}) \right\}, \qquad (6.18)$$
then $\tilde{x}$ is a global minimum of $f$ on $M_{G,D}$.
Proof: A consequence of Theorems 6.21 and 6.26.
Q.E.D.
Chapter 7

Appendix

7.1 Basics of Cone Calculus

The introduced cones describe local properties of sets and functions.
Lemma 7.1 Let $M \subset \mathbb{R}^n$, $\tilde{x} \in \mathrm{clo}(M)$. Then,
$$T_M(\tilde{x}) = \{0\} \cup \left\{ rv : \begin{array}{l} r > 0,\ \exists\, x_k \in M,\ x_k \neq \tilde{x} \text{ for } k \in \mathbb{N} \\ \text{such that } x_k \to \tilde{x},\ \frac{x_k - \tilde{x}}{\|x_k - \tilde{x}\|} \to v \in \mathbb{R}^n \end{array} \right\}. \qquad (7.1)$$

Proof:

1. Evidently,
$$T_M(\tilde{x}) \supset \{0\} \cup \left\{ rv : \begin{array}{l} r > 0,\ \exists\, x_k \in M,\ x_k \neq \tilde{x} \text{ for } k \in \mathbb{N} \\ \text{such that } x_k \to \tilde{x},\ \frac{x_k - \tilde{x}}{\|x_k - \tilde{x}\|} \to v \in \mathbb{R}^n \end{array} \right\}.$$

2. Take $s \in T_M(\tilde{x})$, $s \neq 0$.
Then, there are sequences $x_k \in M$, $\lambda_k > 0$ for $k \in \mathbb{N}$ such that $x_k \to \tilde{x}$, $\lambda_k(x_k - \tilde{x}) \to s$.
Consequently,
$$\lambda_k \|x_k - \tilde{x}\| = \|\lambda_k(x_k - \tilde{x})\| \to \|s\|.$$
Since $s \neq 0$, we have $x_k \neq \tilde{x}$ eventually. Hence,
$$\frac{x_k - \tilde{x}}{\|x_k - \tilde{x}\|} = \frac{\lambda_k(x_k - \tilde{x})}{\lambda_k \|x_k - \tilde{x}\|} \to \frac{s}{\|s\|}.$$
Therefore,
$$s \in \left\{ rv : \begin{array}{l} r > 0,\ \exists\, x_k \in M,\ x_k \neq \tilde{x} \text{ for } k \in \mathbb{N} \\ \text{such that } x_k \to \tilde{x},\ \frac{x_k - \tilde{x}}{\|x_k - \tilde{x}\|} \to v \in \mathbb{R}^n \end{array} \right\}.$$
Q.E.D.
Lemma 7.2 Let $T \subset \mathbb{R}$, $\tilde{t} \in \mathrm{int}(T)$ and $f : T \to \mathbb{R}^n$ be injective (cz. prostá). Consider $M = \{f(t) : t \in T\}$ and let $\varphi : M \to T$ be the inverse function to $f$, i.e. $\varphi \circ f(t) = t$ for all $t \in T$.
If $f$ is differentiable at $\tilde{t}$, $f'(\tilde{t}) \neq 0$ and $\varphi$ is continuous at $f(\tilde{t})$, then $T_M(f(\tilde{t})) = \{r f'(\tilde{t}) : r \in \mathbb{R}\}$.

Proof: Take sequences $t_k \in T$, $\lambda_k > 0$ for $k \in \mathbb{N}$ such that $f(t_k) \to f(\tilde{t})$, $\lambda_k (f(t_k) - f(\tilde{t})) \to \xi \in \mathbb{R}^n$, $\xi \neq 0$.
Then $t_k \neq \tilde{t}$ and $t_k \to \tilde{t}$, since $\varphi$ is continuous at $f(\tilde{t})$.
Using the differentiability of $f$ at $\tilde{t}$, we have
$$f(t_k) - f(\tilde{t}) = (t_k - \tilde{t}) f'(\tilde{t}) + |t_k - \tilde{t}|\, R_1(t_k - \tilde{t}; f, \tilde{t}),$$
where $\lim_{h \to 0} R_1(h; f, \tilde{t}) = 0$.
Then,
$$\lim_{k \to +\infty} \lambda_k (t_k - \tilde{t}) = c \in \mathbb{R},\ c \neq 0,$$
and
$$\xi = \lim_{k \to +\infty} \lambda_k (f(t_k) - f(\tilde{t})) = \lim_{k \to +\infty} \lambda_k (t_k - \tilde{t}) \cdot \lim_{k \to +\infty} \frac{f(t_k) - f(\tilde{t})}{t_k - \tilde{t}} = c\, f'(\tilde{t}).$$
Q.E.D.
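Lemma 7.2 immediately gives the tangent cone of a parametrized curve (our illustration): for $f(t) = (\cos t, \sin t)$ on $T = (-\pi/2, \pi/2)$ and $\tilde{t} = 0$, the set $M$ is an arc of the unit circle and $T_M(f(0)) = T_M((1,0)) = \{r(0,1) : r \in \mathbb{R}\}$, matching the set $H_{h,0}((1,0))$ computed for the circle in Chapter 5.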
Bibliography

[1] Bazaraa, M.S.; Sherali, H.D.; Shetty, C.M.: Nonlinear Programming. Theory and Algorithms. 2nd edition, Wiley, New York, 1993.
[2] Dupačová, J.: Lineární programování. SPNP, Praha, 1982.
[3] Plesník, J.; Dupačová, J.; Vlach, M.: Lineárne programovanie. Alfa, Bratislava, 1990.
[4] Habala, P.; Hájek, P.; Zizler, V.: Introduction to Banach Spaces I-II. MatfyzPress, Praha, 1996.
[5] Lachout, P.: Teorie pravděpodobnosti. Skripta MFF UK, Praha, 1998.
[6] Lachout, P.: Konvexita v konečné dimenzi. Text on the author's web page, version of 16.10.2011.
[7] Lachout, P.: Matematické programování. Text on the author's web page, version of 16.10.2011.
[8] Rockafellar, T.: Convex Analysis. Springer-Verlag, Berlin, 1975.
[9] Rockafellar, T.; Wets, R. J.-B.: Variational Analysis. Springer-Verlag, Berlin, 1998.
Index

bipolar, 109
condition
  complementarity slackness, 86, 98
  dual feasibility, 86, 98
  optimality, Karush-Kuhn-Tucker, 85
  primal feasibility, 86, 98
conditional linear independence, 122
cone, 9
  attainable directions, 90
  convex, 10
  convex polyhedral, 10
  influential directions, 104
  regular tangent, 115
  tangent, 68
  with vertex at a point, 10
constraint qualification, 86, 91, 99
  Abadie, 86, 99
direction
  feasible, 67
  improving, 71
function
  affine linear, 7
  chain rule, 31
  concave, 41
  continuously differentiable, 28, 36
  convex, 37, 54
  differentiable, 27, 28, 30
  differentiable in direction, 27
  domain, 25, 53
  epigraph, 25, 53
  gradient, 28
  Hessian, 32
  hypograph, 25, 41
  level set, 40, 56, 57
  linear, 7
  monotone, 53
  partial derivative, 28
  proper, 25
  pseudoconcave, 55
  pseudoconvex, 55
  quasiconvex, 56
  second partial derivatives, 32
  strictly concave, 41
  strictly convex, 41
  strictly pseudoconcave, 55
  strictly pseudoconvex, 55
  subdifferential, 49
  subgradient, 49
  twice differentiable, 32
  weak domain, 53
halfspace, supporting, 15
hyperplane, supporting, 15
inequality, Jensen, 43, 44
Lagrange coefficients, 86, 98
Lagrange function, 86, 97
  reduced, 85, 97
normal, 110
  regular, 110
normal cone, 110
  regular, 110
optimality condition
  FONC, 59
  FOSC, 59
  Karush-Kuhn-Tucker, 97
  SONC, 59
  SOSC, 59
polar, 109
projection point, 12
regular set, Clarke, 119
scalar product, 7
separation
  proper, 17, 19
  strict, 17, 18
set
  active indexes, 77, 103
  active indexes, strongly, 103
  convex, 8
  convex hull, 8
  convex polyhedral, 9
  convex polyhedron, 9
  direction, 10
  extremal directions, 11
  extremal point, 11
  feasible directions, 67, 77, 102
  feasible solutions, 41, 63, 67, 77, 93, 121
  feasible solutions, relaxed, 77, 93
  geometrically attainable, 90
  improving directions, 71
  locally closed, 118
  nonnegative hull, 10
theorem, about implicit function, 94