4 Optimality conditions for unconstrained problems
The definitions of global and local solutions of optimization problems are intuitive, but usually
impossible to check directly. Hence, we will derive easily verifiable conditions that are either
necessary for a point to be a local minimizer (thus helping us to identify candidates for minimizers),
or sufficient (thus allowing us to confirm that the point being considered is a local minimizer), or,
sometimes, both.
We consider the problem
$$\text{(P)}\qquad \min\ f(x) \quad \text{s.t.}\ x \in X,$$
where $x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}$, and $X$ is an open set.
4.1 Optimality conditions: the necessary and the sufficient
Necessary condition for local optimality: "if $\bar{x}$ is a local minimizer of (P), then $\bar{x}$ must satisfy..." Such conditions help us identify all candidates for local optimizers.
Theorem 4.1 Suppose that $f$ is differentiable at $\bar{x} \in X$. If there is a vector $d$ such that $\nabla f(\bar{x})^T d < 0$, then for all $\lambda > 0$ sufficiently small, $f(\bar{x} + \lambda d) < f(\bar{x})$ ($d$ is called a descent direction if it satisfies the latter condition).

Proof: Note that since $X$ is an open set, $\bar{x} + \lambda d \in X$ for sufficiently small $\lambda > 0$. We have:
$$f(\bar{x} + \lambda d) = f(\bar{x}) + \lambda \nabla f(\bar{x})^T d + \lambda \|d\| \alpha_{\bar{x}}(\lambda d),$$
where $\alpha_{\bar{x}}(\lambda d) \to 0$ as $\lambda \to 0$. Rearranging,
$$\frac{f(\bar{x} + \lambda d) - f(\bar{x})}{\lambda} = \nabla f(\bar{x})^T d + \|d\| \alpha_{\bar{x}}(\lambda d).$$
Since $\nabla f(\bar{x})^T d < 0$ and $\alpha_{\bar{x}}(\lambda d) \to 0$ as $\lambda \to 0$, $f(\bar{x} + \lambda d) - f(\bar{x}) < 0$ for all $\lambda > 0$ sufficiently small.
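As an illustration (an addition, not part of the original notes), the following minimal Python sketch checks the conclusion of Theorem 4.1 numerically for a simple quadratic, using the direction $d = -\nabla f(\bar{x})$, which satisfies $\nabla f(\bar{x})^T d < 0$ whenever $\nabla f(\bar{x}) \neq 0$:

```python
import numpy as np

# Illustrative function (chosen for this sketch): f(x) = x1^2 + 2*x2^2.
def f(x):
    return x[0]**2 + 2.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])

x_bar = np.array([1.0, 1.0])
d = -grad_f(x_bar)                  # negative gradient: grad_f(x_bar)^T d < 0
print(grad_f(x_bar) @ d)            # -20.0, so d is a descent direction

for lam in (1e-1, 1e-2, 1e-3):      # f(x_bar + lam*d) < f(x_bar) for small lam > 0
    print(lam, f(x_bar + lam * d) < f(x_bar))
```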
Corollary 4.2 Suppose $f$ is differentiable at $\bar{x}$. If $\bar{x}$ is a local minimizer, then $\nabla f(\bar{x}) = 0$ (such a point is called a stationary point).

Proof: If $\nabla f(\bar{x}) \neq 0$, then $d = -\nabla f(\bar{x})$ is a descent direction, whereby $\bar{x}$ cannot be a local minimizer.
The above corollary is a first-order necessary optimality condition for an unconstrained minimization problem. However, a stationary point can be a local minimizer, a local maximizer, or neither. The following theorem provides a second-order necessary optimality condition. First, a definition:
Definition 4.3 An $n \times n$ matrix $M$ is called symmetric if $M_{ij} = M_{ji}\ \forall i, j$. A symmetric $n \times n$ matrix $M$ is called

• positive definite if $d^T M d > 0\ \forall d \in \mathbb{R}^n$, $d \neq 0$

• positive semidefinite if $d^T M d \ge 0\ \forall d \in \mathbb{R}^n$

• negative definite if $d^T M d < 0\ \forall d \in \mathbb{R}^n$, $d \neq 0$

• negative semidefinite if $d^T M d \le 0\ \forall d \in \mathbb{R}^n$
• indefinite if $\exists d, p \in \mathbb{R}^n$: $d^T M d > 0$, $p^T M p < 0$.
Example 1
$$M = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$$
is positive definite.
Example 2
$$M = \begin{pmatrix} 8 & -1 \\ -1 & 1 \end{pmatrix}$$
is positive definite. To see this, note that for $d \neq 0$,
$$d^T M d = 8d_1^2 - 2d_1 d_2 + d_2^2 = 7d_1^2 + (d_1 - d_2)^2 > 0.$$
Since $M$ is a symmetric matrix, all its eigenvalues are real numbers. It can be shown that $M$ is positive semidefinite if and only if all of its eigenvalues are nonnegative, positive definite if and only if all of its eigenvalues are positive, etc.
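The eigenvalue characterization lends itself to a quick computational test. Below is a minimal sketch (an addition to these notes, using NumPy) that classifies the matrices of Examples 1 and 2 by the signs of their eigenvalues:

```python
import numpy as np

def classify(M, tol=1e-12):
    """Classify a symmetric matrix by the signs of its (necessarily real) eigenvalues."""
    w = np.linalg.eigvalsh(M)        # eigenvalues of a symmetric matrix, ascending order
    if np.all(w > tol):
        return "positive definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w >= -tol):
        return "positive semidefinite"
    if np.all(w <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # Example 1: positive definite
print(classify(np.array([[8.0, -1.0], [-1.0, 1.0]])))  # Example 2: positive definite
```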
Theorem 4.4 Suppose that $f$ is twice continuously differentiable at $\bar{x} \in X$. If $\bar{x}$ is a local minimizer, then $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ (the Hessian at $\bar{x}$) is positive semidefinite.
Proof: From the first-order necessary condition, $\nabla f(\bar{x}) = 0$. Suppose $H(\bar{x})$ is not positive semidefinite. Then $\exists d$ such that $d^T H(\bar{x}) d < 0$. We have:
$$f(\bar{x} + \lambda d) = f(\bar{x}) + \lambda \nabla f(\bar{x})^T d + \frac{1}{2}\lambda^2 d^T H(\bar{x}) d + \lambda^2 \|d\|^2 \alpha_{\bar{x}}(\lambda d) = f(\bar{x}) + \frac{1}{2}\lambda^2 d^T H(\bar{x}) d + \lambda^2 \|d\|^2 \alpha_{\bar{x}}(\lambda d),$$
where $\alpha_{\bar{x}}(\lambda d) \to 0$ as $\lambda \to 0$. Rearranging,
$$\frac{f(\bar{x} + \lambda d) - f(\bar{x})}{\lambda^2} = \frac{1}{2} d^T H(\bar{x}) d + \|d\|^2 \alpha_{\bar{x}}(\lambda d).$$
Since $d^T H(\bar{x}) d < 0$ and $\alpha_{\bar{x}}(\lambda d) \to 0$ as $\lambda \to 0$, $f(\bar{x} + \lambda d) - f(\bar{x}) < 0$ for all $\lambda > 0$ sufficiently small, a contradiction.

Example 3 Let
$$f(x) = \frac{1}{2}x_1^2 + x_1 x_2 + 2x_2^2 - 4x_1 - 4x_2 - x_2^3.$$
Then
$$\nabla f(x) = \left(x_1 + x_2 - 4,\ x_1 + 4x_2 - 4 - 3x_2^2\right)^T,$$
and
$$H(x) = \begin{pmatrix} 1 & 1 \\ 1 & 4 - 6x_2 \end{pmatrix}.$$
$\nabla f(x) = 0$ has exactly two solutions: $\bar{x} = (4, 0)$ and $\tilde{x} = (3, 1)$. But
$$H(\tilde{x}) = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}$$
is indefinite; therefore, the only possible candidate for a local minimizer is $\bar{x} = (4, 0)$.
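The same computation can be scripted. The sketch below (an addition, using SymPy) finds the stationary points of the function from Example 3 and inspects the eigenvalues of the Hessian at each, reproducing the classification above:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = sp.Rational(1, 2)*x1**2 + x1*x2 + 2*x2**2 - 4*x1 - 4*x2 - x2**3

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

for sol in sp.solve(grad, [x1, x2], dict=True):
    eigs = H.subs(sol).eigenvals()        # maps eigenvalue -> multiplicity
    print(sol, list(eigs))
# (4, 0): both eigenvalues positive, the candidate local minimizer;
# (3, 1): eigenvalues of opposite signs, so the Hessian is indefinite.
```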
Necessary conditions only allow us to come up with a list of candidate points for minimizers.

Sufficient condition for local optimality: "if $\bar{x}$ satisfies ..., then $\bar{x}$ is a local minimizer of (P)."
Theorem 4.5 Suppose that $f$ is twice differentiable at $\bar{x}$. If $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is positive definite, then $\bar{x}$ is a (strict) local minimizer.

Proof:
$$f(x) = f(\bar{x}) + \frac{1}{2}(x - \bar{x})^T H(\bar{x})(x - \bar{x}) + \|x - \bar{x}\|^2 \alpha_{\bar{x}}(x - \bar{x}).$$
Suppose that $\bar{x}$ is not a strict local minimizer. Then there exists a sequence $x_k \to \bar{x}$ such that $x_k \neq \bar{x}$ and $f(x_k) \le f(\bar{x})$ for all $k$. Define $d_k = \frac{x_k - \bar{x}}{\|x_k - \bar{x}\|}$. Then
$$f(x_k) = f(\bar{x}) + \|x_k - \bar{x}\|^2 \left( \frac{1}{2} d_k^T H(\bar{x}) d_k + \alpha_{\bar{x}}(x_k - \bar{x}) \right), \text{ so}$$
$$\frac{1}{2} d_k^T H(\bar{x}) d_k + \alpha_{\bar{x}}(x_k - \bar{x}) = \frac{f(x_k) - f(\bar{x})}{\|x_k - \bar{x}\|^2} \le 0.$$
Since $\|d_k\| = 1$ for any $k$, there exists a subsequence of $\{d_k\}$ converging to some point $d$ such that $\|d\| = 1$ (by Theorem 3.9). Assume without loss of generality that $d_k \to d$. Then
$$0 \ge \lim_{k \to \infty} \left( \frac{1}{2} d_k^T H(\bar{x}) d_k + \alpha_{\bar{x}}(x_k - \bar{x}) \right) = \frac{1}{2} d^T H(\bar{x}) d,$$
which is a contradiction with positive definiteness of $H(\bar{x})$.
Note:

• If $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is negative definite, then $\bar{x}$ is a local maximizer.

• If $\nabla f(\bar{x}) = 0$ and $H(\bar{x})$ is positive semidefinite, we cannot be sure if $\bar{x}$ is a local minimizer (for example, in one dimension, $f(x) = x^4$, $f(x) = -x^4$, and $f(x) = x^3$ all have $f'(0) = 0$ and $f''(0) = 0$, yet $0$ is a minimizer of the first, a maximizer of the second, and neither for the third).
Example 4 Consider the function
$$f(x) = \frac{1}{3}x_1^3 + \frac{1}{2}x_1^2 + 2x_1 x_2 + \frac{1}{2}x_2^2 - x_2 + 9.$$
Stationary points are candidates for optimality; to find them we solve
$$\nabla f(x) = \begin{pmatrix} x_1^2 + x_1 + 2x_2 \\ 2x_1 + x_2 - 1 \end{pmatrix} = 0.$$
Solving the above system of equations results in two stationary points: $x^a = (1, -1)^T$ and $x^b = (2, -3)^T$. The Hessian is
$$H(x) = \begin{pmatrix} 2x_1 + 1 & 2 \\ 2 & 1 \end{pmatrix}.$$
In particular,
$$H(x^a) = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}, \text{ and } H(x^b) = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.$$
Here, $H(x^a)$ is indefinite, hence $x^a$ is neither a local minimizer nor a local maximizer. $H(x^b)$ is positive definite, hence $x^b$ is a local minimizer. Therefore, the function has only one local minimizer. Does this mean that it is also a global minimizer?
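The notes leave this question open; a quick numerical check (an addition, not part of the original notes) shows that the answer here is no: because of the cubic term, $f$ is unbounded below as $x_1 \to -\infty$ along $x_2 = 0$, so the unique local minimizer $x^b$ is not a global minimizer.

```python
# Evaluate f from Example 4 along the ray (-t, 0): the cubic term dominates,
# so f decreases without bound and the local minimizer x^b cannot be global.
def f(x1, x2):
    return x1**3 / 3 + x1**2 / 2 + 2 * x1 * x2 + x2**2 / 2 - x2 + 9

print(f(2, -3))                    # value at x^b = (2, -3): about 9.17
for t in (10.0, 100.0, 1000.0):
    print(f(-t, 0.0))              # increasingly large negative values
```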
4.2 Convexity and minimization
Definitions:

• Let $x, y \in \mathbb{R}^n$. Points of the form $\lambda x + (1 - \lambda)y$ are called convex combinations of $x$ and $y$ for $\lambda \in [0, 1]$. More generally, a point $y$ is a convex combination of points $x_1, \ldots, x_k$ if $y = \sum_{i=1}^{k} \alpha_i x_i$, where $\alpha_i \ge 0\ \forall i$ and $\sum_{i=1}^{k} \alpha_i = 1$.

• A set $S \subset \mathbb{R}^n$ is called convex if $\forall x, y \in S$ and $\forall \lambda \in [0, 1]$, $\lambda x + (1 - \lambda)y \in S$.

• A function $f : S \to \mathbb{R}$, where $S$ is a nonempty convex set, is a convex function if
$$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda)f(y) \quad \forall x, y \in S,\ \forall \lambda \in [0, 1]$$
(this defining inequality is spot-checked numerically in the sketch following these definitions).

• A function $f$ as above is called a strictly convex function if the inequality above is strict for all $x \neq y$ and $\lambda \in (0, 1)$.

• A function $f : S \to \mathbb{R}$ is called concave (strictly concave) if $(-f)$ is convex (strictly convex).
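As an aside not in the original notes, the defining inequality can be tested numerically: sampling many pairs $(x, y)$ and values of $\lambda$ can refute convexity (a single violated sample suffices), though passing every sample is only supporting evidence, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_convex(f, dim, trials=10000, tol=1e-9):
    """Spot-check f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y) on random samples.
    Returns False on any violated sample (a certificate of non-convexity);
    True only means no counterexample was found."""
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        l = rng.uniform()
        if f(l*x + (1 - l)*y) > l*f(x) + (1 - l)*f(y) + tol:
            return False
    return True

print(looks_convex(lambda x: x @ x, 3))      # ||x||^2 is convex: True
print(looks_convex(lambda x: -(x @ x), 3))   # its negative is concave: False
```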
Consider the problem:
$$\text{(CP)}\qquad \min_x\ f(x) \quad \text{s.t.}\ x \in S.$$
Theorem 4.6 Suppose $S$ is a nonempty convex set, $f : S \to \mathbb{R}$ is a convex function, and $\bar{x}$ is a local minimizer of (CP). Then $\bar{x}$ is a global minimizer of $f$ over $S$.

Proof: Suppose $\bar{x}$ is not a global minimizer, i.e., $\exists y \in S$ such that $f(y) < f(\bar{x})$. Let $y(\lambda) = \lambda \bar{x} + (1 - \lambda)y$, which is a convex combination of $\bar{x}$ and $y$ for $\lambda \in [0, 1]$ (and therefore, $y(\lambda) \in S$ for $\lambda \in [0, 1]$). Note that $y(\lambda) \to \bar{x}$ as $\lambda \to 1$.

From the convexity of $f$,
$$f(y(\lambda)) = f(\lambda \bar{x} + (1 - \lambda)y) \le \lambda f(\bar{x}) + (1 - \lambda)f(y) < \lambda f(\bar{x}) + (1 - \lambda)f(\bar{x}) = f(\bar{x})$$
for all $\lambda \in (0, 1)$. Therefore, $f(y(\lambda)) < f(\bar{x})$ for all $\lambda \in (0, 1)$, and so $\bar{x}$ is not a local minimizer, resulting in a contradiction.
Note:

• A problem of minimizing a convex function over a convex feasible region (such as the one we considered in the theorem) is a convex programming problem.

• If $f$ is strictly convex, a local minimizer is the unique global minimizer.

• If $f$ is (strictly) concave, a local maximizer is a (unique) global maximizer.
The following results help us to determine when a function is convex.
Theorem 4.7 Suppose $X \subseteq \mathbb{R}^n$ is a non-empty open convex set, and $f : X \to \mathbb{R}$ is differentiable. Then $f$ is convex iff ("if and only if") it satisfies the gradient inequality:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \forall x, y \in X.$$

Proof: Suppose $f$ is convex. Then, for any $\lambda \in (0, 1]$,
$$f(\lambda y + (1 - \lambda)x) \le \lambda f(y) + (1 - \lambda)f(x) \;\Longrightarrow\; \frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y) - f(x).$$
Letting $\lambda \to 0$, we obtain $\nabla f(x)^T (y - x) \le f(y) - f(x)$, establishing the "only if" part.

Now, suppose that the gradient inequality holds $\forall x, y \in X$. Let $w$ and $z$ be any two points in $X$. Let $\lambda \in [0, 1]$, and set $x = \lambda w + (1 - \lambda)z$. Then
$$f(w) \ge f(x) + \nabla f(x)^T (w - x) \text{ and } f(z) \ge f(x) + \nabla f(x)^T (z - x).$$
Taking a convex combination of the above inequalities,
$$\lambda f(w) + (1 - \lambda)f(z) \ge f(x) + \nabla f(x)^T \left( \lambda(w - x) + (1 - \lambda)(z - x) \right) = f(x) + \nabla f(x)^T 0 = f(\lambda w + (1 - \lambda)z),$$
so that $f$ is convex.

In one dimension, the gradient inequality has the form $f(y) \ge f(x) + f'(x)(y - x)$ $\forall x, y \in X$.
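A brief numerical illustration of the gradient inequality (an addition to the notes), using the convex function $f(x) = e^x$, for which $f'(x) = e^x$:

```python
import numpy as np

# Gradient inequality for the convex function f(x) = exp(x) in one dimension:
# f(y) >= f(x) + f'(x)*(y - x) for all x, y (here f' = exp as well).
rng = np.random.default_rng(1)
x, y = rng.normal(size=1000), rng.normal(size=1000)
print(np.all(np.exp(y) >= np.exp(x) + np.exp(x) * (y - x)))   # True
```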
The following theorem provides another necessary and sufficient condition for the case when $f$ is twice continuously differentiable.
Theorem 4.8 Suppose $X$ is a non-empty open convex set, and $f : X \to \mathbb{R}$ is twice continuously differentiable. Then $f$ is convex iff the Hessian of $f$, $H(x)$, is positive semidefinite $\forall x \in X$.

Proof: Suppose $f$ is convex. Let $\bar{x} \in X$ and let $d$ be any direction. Then for $\lambda > 0$ sufficiently small, $\bar{x} + \lambda d \in X$. We have:
$$f(\bar{x} + \lambda d) = f(\bar{x}) + \nabla f(\bar{x})^T (\lambda d) + \frac{1}{2}(\lambda d)^T H(\bar{x})(\lambda d) + \|\lambda d\|^2 \alpha_{\bar{x}}(\lambda d),$$
where $\alpha_{\bar{x}}(y) \to 0$ as $y \to 0$. Using the gradient inequality, we obtain
$$\lambda^2 \left( \frac{1}{2} d^T H(\bar{x}) d + \|d\|^2 \alpha_{\bar{x}}(\lambda d) \right) \ge 0.$$
Dividing by $\lambda^2 > 0$ and letting $\lambda \to 0$, we obtain $d^T H(\bar{x}) d \ge 0$, proving the "only if" part.

Conversely, suppose that $H(z)$ is positive semidefinite for all $z \in X$. Let $x, y \in X$ be arbitrary. Invoking the second-order version of Taylor's theorem, we have:
$$f(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2}(y - x)^T H(z)(y - x)$$
for some $z$ which is a convex combination of $x$ and $y$ (and hence $z \in X$). Since $H(z)$ is positive semidefinite, the gradient inequality holds, and hence $f$ is convex.

In one dimension, the Hessian is the second derivative of the function, and the positive semidefiniteness condition can be stated as $f''(x) \ge 0$ $\forall x \in X$.
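For instance (an addition, using SymPy), one can verify the one-dimensional condition symbolically for the softplus function $f(x) = \ln(1 + e^x)$, a standard convex function chosen here purely for illustration:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.log(1 + sp.exp(x))             # softplus
fpp = sp.simplify(sp.diff(f, x, 2))   # second derivative
print(fpp)
# exp(x)/(exp(x) + 1)**2 (or an equivalent form): a ratio of positive
# quantities, so f''(x) >= 0 on all of R and f is convex by Theorem 4.8.
```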
One can also show the following sufficient (but not necessary!) condition:

Theorem 4.9 Suppose $X$ is a non-empty open convex set, and $f : X \to \mathbb{R}$ is twice continuously differentiable. Then $f$ is strictly convex if the Hessian of $f$, $H(x)$, is positive definite $\forall x \in X$.

For convex (unconstrained) optimization problems, the optimality conditions of the previous subsection can be simplified significantly, providing a single necessary and sufficient condition for global optimality:
Theorem 4.10 Suppose $f : X \to \mathbb{R}$ is convex and differentiable on $X$. Then $\bar{x} \in X$ is a global minimizer iff $\nabla f(\bar{x}) = 0$.

Proof: The necessity of the condition $\nabla f(\bar{x}) = 0$ was established regardless of convexity of the function. Suppose $\nabla f(\bar{x}) = 0$. Then, by the gradient inequality, $f(y) \ge f(\bar{x}) + \nabla f(\bar{x})^T (y - \bar{x}) = f(\bar{x})$ for all $y \in X$, and so $\bar{x}$ is a global minimizer.
Example 5 Let
$$f(x) = -\ln(1 - x_1 - x_2) - \ln x_1 - \ln x_2.$$
Then
$$\nabla f(x) = \begin{pmatrix} \frac{1}{1 - x_1 - x_2} - \frac{1}{x_1} \\[4pt] \frac{1}{1 - x_1 - x_2} - \frac{1}{x_2} \end{pmatrix},$$
and
$$H(x) = \begin{pmatrix} \left(\frac{1}{1 - x_1 - x_2}\right)^2 + \left(\frac{1}{x_1}\right)^2 & \left(\frac{1}{1 - x_1 - x_2}\right)^2 \\[4pt] \left(\frac{1}{1 - x_1 - x_2}\right)^2 & \left(\frac{1}{1 - x_1 - x_2}\right)^2 + \left(\frac{1}{x_2}\right)^2 \end{pmatrix}.$$
It is actually easy to verify that $H(x)$ is positive definite on the domain $X = \{(x_1, x_2) : x_1 > 0,\ x_2 > 0,\ x_1 + x_2 < 1\}$, and hence that $f(x)$ is a strictly convex function. At $\bar{x} = \left(\frac{1}{3}, \frac{1}{3}\right)$ we have $\nabla f(\bar{x}) = 0$, and so $\bar{x}$ is the unique global minimizer of $f(x)$.
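A numerical spot-check of this example (an addition, not part of the original notes) confirms that the gradient vanishes at $\bar{x} = (1/3, 1/3)$ and that the Hessian there is positive definite:

```python
import numpy as np

def grad(x):
    a = 1.0 / (1.0 - x[0] - x[1])
    return np.array([a - 1.0 / x[0], a - 1.0 / x[1]])

def hess(x):
    a2 = (1.0 / (1.0 - x[0] - x[1]))**2
    return np.array([[a2 + 1.0 / x[0]**2, a2],
                     [a2, a2 + 1.0 / x[1]**2]])

x_bar = np.array([1.0 / 3.0, 1.0 / 3.0])
print(grad(x_bar))                      # ~[0, 0]: x_bar is a stationary point
print(np.linalg.eigvalsh(hess(x_bar)))  # [9, 27]: both positive, so H(x_bar) is positive definite
```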