Chapter 2. Differentiation in Euclidean Spaces
§1. Euclidean Spaces
Euclidean k-space IRk is the Cartesian product of k copies of IR. It consists of all
k-tuples x = (x1 , . . . , xk ), where x1 , . . . , xk ∈ IR. An element of IRk may be viewed as a
vector.
We may define two algebraic operations on IRk : addition and scalar multiplication.
The addition of two vectors x = (x1 , . . . , xk ) and y = (y1 , . . . , yk ) in IRk is defined by
x + y := (x1 + y1 , . . . , xk + yk ).
The addition has the following properties:
1. x + y = y + x for all x, y ∈ IRk.
2. (x + y) + z = x + (y + z) for all x, y, z ∈ IRk.
3. The zero vector 0 = (0, . . . , 0) satisfies x + 0 = x for all x ∈ IRk.
4. For each x ∈ IRk, its negative −x := (−x1, . . . , −xk) satisfies x + (−x) = 0.
The scalar multiplication of a real number λ and a vector x = (x1 , . . . , xk ) ∈ IRk
is defined by
λx := (λx1 , . . . , λxk ).
The scalar multiplication has the following properties:
5. λ(x + y) = λx + λy for all x, y ∈ IRk and all λ ∈ IR.
6. (λ + µ)x = λx + µx for all x ∈ IRk and all λ, µ ∈ IR.
7. λ(µx) = (λµ)x for all x ∈ IRk and all λ, µ ∈ IR.
8. For each x ∈ IRk, 1x = x.
Thus, IRk together with addition and scalar multiplication becomes a vector space
(linear space). For i = 1, . . . , k, let ei be the vector (0, . . . , 0, 1, 0, . . . , 0), where 1 is in the
ith place and 0 elsewhere. Then each x = (x1 , . . . , xk ) ∈ IRk can be uniquely represented
as x = x1 e1 + · · · + xk ek . Thus, {e1 , . . . , ek } is a basis for IRk . We call ei the ith coordinate
unit vector.
The inner product of two vectors x = (x1 , . . . , xk ) and y = (y1 , . . . , yk ) is defined by
⟨x, y⟩ := x1 y1 + · · · + xk yk.
It has the following properties:
(a) ⟨x, x⟩ ≥ 0 for each x ∈ IRk, and ⟨x, x⟩ = 0 if and only if x = 0.
(b) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ IRk.
(c) ⟨λx + µy, z⟩ = λ⟨x, z⟩ + µ⟨y, z⟩ for all x, y, z ∈ IRk and λ, µ ∈ IR.
The norm of a vector x in IRk is defined to be
‖x‖ := √⟨x, x⟩.
Evidently, ‖x‖ ≥ 0 for each x ∈ IRk, and ‖x‖ = 0 if and only if x = 0. Moreover,
‖λx‖ = |λ| ‖x‖ for all x ∈ IRk and λ ∈ IR.
Two vectors x and y are said to be orthogonal if ⟨x, y⟩ = 0. If x and y are orthogonal,
then we have Pythagoras's theorem:
‖x + y‖² = ‖x‖² + ‖y‖².
Suppose that x and y are two vectors in IRk and x ≠ 0. Then there exists a real
number t such that
⟨y − tx, x⟩ = 0.
Indeed, the above relation is valid if and only if t = ⟨x, y⟩/⟨x, x⟩. The vector tx is called
the projection of y onto x. Since y = tx + (y − tx), and since the vectors y − tx and tx
are orthogonal, we have ‖y‖² = ‖tx‖² + ‖y − tx‖². It follows that ‖tx‖ ≤ ‖y‖. Since
‖tx‖ = |⟨x, y⟩|/‖x‖, this gives |⟨x, y⟩| ≤ ‖x‖ ‖y‖, and for x = 0 the same inequality holds
trivially. Thus we have proved the Schwarz inequality:
|⟨x, y⟩| ≤ ‖x‖ ‖y‖.
Suppose that θ is the angle between two nonzero vectors x and y. Then the preceding
discussion tells us that
$$\cos\theta = \frac{t\,\|x\|}{\|y\|} = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}.$$
It follows from the Schwarz inequality that
‖x + y‖² = ‖x‖² + 2⟨x, y⟩ + ‖y‖² ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)².
This establishes the triangle inequality:
‖x + y‖ ≤ ‖x‖ + ‖y‖.
For x, y ∈ IRk, define ρ(x, y) := ‖x − y‖. It is easily seen that ρ is a metric on IRk.
Thus, all the results in Chapter 1 on metric spaces are valid for Euclidean spaces.
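The inner product, the norm, and the projection of y onto x can be tried out numerically. The following Python sketch is only an illustration: the helper names inner, norm, and projection, as well as the sample vectors, are our own choices and do not come from the text.

```python
import math

def inner(x, y):
    """Inner product <x, y> = x1*y1 + ... + xk*yk."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean norm ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def projection(y, x):
    """Projection t*x of y onto a nonzero vector x, where t = <x, y>/<x, x>."""
    t = inner(x, y) / inner(x, x)
    return [t * xi for xi in x]

# Sample vectors (hypothetical data chosen for illustration).
x = [3.0, 4.0, 0.0]
y = [1.0, 2.0, 2.0]

tx = projection(y, x)
residual = [yi - ti for yi, ti in zip(y, tx)]
x_plus_y = [xi + yi for xi, yi in zip(x, y)]

print(inner(residual, x))                       # close to 0: y - tx is orthogonal to x
print(abs(inner(x, y)) <= norm(x) * norm(y))    # Schwarz inequality
print(norm(x_plus_y) <= norm(x) + norm(y))      # triangle inequality
```

For these vectors ⟨x, y⟩ = 11, ‖x‖ = 5, and ‖y‖ = 3, so the Schwarz inequality reads 11 ≤ 15.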
§2. Differentiable Functions
Suppose that f is a function from a nonempty open set U in IRk to IR. For a point
a ∈ U and a nonzero vector u ∈ IRk , the directional derivative of f at a with respect
to u is
$$D_u f(a) := \lim_{h \to 0} \frac{f(a + hu) - f(a)}{h},$$
provided this limit exists. In particular, if u = ei is the ith coordinate unit vector, then
we write Di f for D_{ei} f and call it the ith partial derivative of f. The gradient of f at
a, denoted ∇f(a), is the vector
∇f(a) = (D1 f(a), . . . , Dk f(a)).
Example 1. Let f be the function on IR2 given by f(0, 0) := 0 and
$$f(x_1, x_2) := \frac{x_1^2 x_2^3}{x_1^4 + x_2^6}, \qquad (x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}.$$
For (x1, x2) ∈ IR2 \ {(0, 0)} we have
$$D_1 f(x_1, x_2) = \frac{2x_1 x_2^3 (x_1^4 + x_2^6) - x_1^2 x_2^3 (4x_1^3)}{(x_1^4 + x_2^6)^2} = \frac{2x_1 x_2^3 (x_2^6 - x_1^4)}{(x_1^4 + x_2^6)^2}$$
and
$$D_2 f(x_1, x_2) = \frac{3x_1^2 x_2^2 (x_1^4 + x_2^6) - x_1^2 x_2^3 (6x_2^5)}{(x_1^4 + x_2^6)^2} = \frac{3x_1^2 x_2^2 (x_1^4 - x_2^6)}{(x_1^4 + x_2^6)^2}.$$
Let us consider directional derivatives of f at (0, 0). Suppose that u = (u1, u2) is a vector
in IR2 with u1 ≠ 0. Then
$$D_u f(0, 0) = \lim_{h \to 0} \frac{f(hu_1, hu_2) - f(0, 0)}{h} = \lim_{h \to 0} \frac{h^5 u_1^2 u_2^3}{h(h^4 u_1^4 + h^6 u_2^6)} = \lim_{h \to 0} \frac{u_1^2 u_2^3}{u_1^4 + h^2 u_2^6} = \frac{u_2^3}{u_1^2}.$$
In particular, D1 f(0, 0) = 0. If u = (0, u2) with u2 ≠ 0, then f(0, hu2) = 0 for all h ≠ 0, so
Du f(0, 0) = 0 and hence D2 f(0, 0) = 0.
However, the function f is not continuous at (0, 0). Indeed, we have
$$f(x_1, 0) = 0 \quad\text{and}\quad f(x_1, x_1^{2/3}) = \frac{x_1^2 x_1^2}{x_1^4 + x_1^4} = \frac{1}{2}, \qquad x_1 \ne 0.$$
Consequently, f(x1, 0) → 0 while f(x1, x1^{2/3}) → 1/2 as x1 → 0. This shows that f is
discontinuous at (0, 0).
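The behaviour described in Example 1 can be observed numerically. The sketch below is a hedged illustration only: the function name directional_quotient and the sample direction u = (2, 3) are our own choices, and only positive values of x1 are used so that x1 ** (2/3) is a real number in Python.

```python
def f(x1, x2):
    """The function of Example 1, with f(0, 0) = 0."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1**2 * x2**3 / (x1**4 + x2**6)

def directional_quotient(u1, u2, h):
    """Difference quotient (f(h*u) - f(0, 0)) / h at the origin."""
    return (f(h * u1, h * u2) - f(0.0, 0.0)) / h

# Along u = (2, 3) the quotients approach u2**3 / u1**2 = 27 / 4.
u1, u2 = 2.0, 3.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, directional_quotient(u1, u2, h))

# Along the curve x2 = x1**(2/3) the values stay near 1/2 instead of tending to f(0, 0) = 0.
for x1 in (1e-1, 1e-2, 1e-3):
    print(x1, f(x1, x1 ** (2.0 / 3.0)))
```

So every directional derivative at the origin exists, yet the values of f along the curve do not approach f(0, 0).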
Suppose that f is a real-valued function on an open set U in IRk and a is a point in
U . The function f is said to be differentiable at a if there is a vector b ∈ IRk such that
$$\lim_{x \to a} \frac{f(x) - f(a) - \langle b, x - a\rangle}{\|x - a\|} = 0.$$
Theorem 2.1. Let f be a real-valued function defined on an open set U in IRk and let a
be a point in U . If f is differentiable at a, then all the partial derivatives of f at a exist
and ∇f(a) = b. Moreover, for any nonzero vector u ∈ IRk, Du f(a) = ⟨∇f(a), u⟩. Furthermore, f
is continuous at a.
Proof. Let b = (b1, . . . , bk) be the vector in IRk given by the definition of differentiability
of f at a. For a nonzero vector u ∈ IRk, taking x = a + hu in that definition gives
$$\lim_{h \to 0} \frac{f(a + hu) - f(a) - \langle b, hu\rangle}{h\,\|u\|} = 0.$$
It follows that
$$\lim_{h \to 0} \frac{f(a + hu) - f(a)}{h} = \langle b, u\rangle.$$
Hence, Du f(a) = ⟨b, u⟩. In particular, if u = ei is the ith coordinate unit vector, we obtain
Di f(a) = ⟨b, ei⟩ = bi. Thus, all the partial derivatives of f at a exist and
b = (b1, . . . , bk) = (D1 f(a), . . . , Dk f(a)) = ∇f(a).
Consequently, Du f(a) = ⟨∇f(a), u⟩. Furthermore, since f is differentiable at a, for given
ε > 0 we can find δ > 0 such that Bδ(a) ⊂ U and
|f(x) − f(a) − ⟨b, x − a⟩| ≤ ε‖x − a‖ whenever ‖x − a‖ < δ.
Therefore, by the Schwarz inequality, ‖x − a‖ < δ implies
|f(x) − f(a)| ≤ |f(x) − f(a) − ⟨b, x − a⟩| + |⟨b, x − a⟩| ≤ ε‖x − a‖ + ‖b‖ ‖x − a‖ = (ε + ‖b‖)‖x − a‖.
This shows that |f(x) − f(a)| → 0 as x → a. In other words, f is continuous at a.
Theorem 2.2. Let f be a real-valued function defined on an open set U in IRk and let
a be a point in U . Suppose that all the partial derivatives D1 f, . . . , Dk f exist at every
point of U . Moreover, suppose that each Dj f is continuous at a for j = 1, . . . , k. Then f
is differentiable at a.
Proof. Since U is open, there exists some r > 0 such that Br(a) ⊂ U. For a = (a1, . . . , ak)
and x = (x1, . . . , xk) in Br(a) we have
$$x - a = \sum_{i=1}^{k} (x_i - a_i) e_i,$$
where ei is the ith coordinate unit vector. Let u0 be the zero vector in IRk and, for
j = 1, . . . , k, let
$$u_j := \sum_{i=1}^{j} (x_i - a_i) e_i.$$
Then we have
$$f(x) - f(a) = f(a + u_k) - f(a + u_0) = \sum_{j=1}^{k} \bigl[ f(a + u_j) - f(a + u_{j-1}) \bigr].$$
Applying the mean value theorem to the term f(a + u_j) − f(a + u_{j−1}), we obtain
$$f(a + u_j) - f(a + u_{j-1}) = f\bigl(a + u_{j-1} + (x_j - a_j) e_j\bigr) - f(a + u_{j-1}) = D_j f(a + u_{j-1} + \xi_j e_j)(x_j - a_j),$$
where ξj is a real number between 0 and xj − aj. Thus we deduce that
$$f(x) - f(a) - \langle \nabla f(a), x - a\rangle = \sum_{j=1}^{k} \bigl[ D_j f(a + u_{j-1} + \xi_j e_j) - D_j f(a) \bigr](x_j - a_j).$$
By our assumption, for each j = 1, . . . , k, Dj f is continuous at a. Hence, for given ε > 0,
there exists δ ∈ (0, r) such that ‖y − a‖ < δ implies y ∈ U and |Dj f(y) − Dj f(a)| < ε
for j = 1, . . . , k. Suppose that ‖x − a‖ < δ. Then ‖u_{j−1} + ξj ej‖ ≤ ‖x − a‖ < δ for j = 1, . . . , k. Hence,
|Dj f(a + u_{j−1} + ξj ej) − Dj f(a)| < ε.
Consequently,
$$|f(x) - f(a) - \langle \nabla f(a), x - a\rangle| \le \sum_{j=1}^{k} \bigl| D_j f(a + u_{j-1} + \xi_j e_j) - D_j f(a) \bigr|\,|x_j - a_j| \le k\varepsilon \|x - a\|.$$
This shows that f is differentiable at a.
Example 2. Let f be the function on IR2 given by
$$f(x_1, x_2) := e^{-x_1^2 - x_2^2}, \qquad (x_1, x_2) \in \mathbb{R}^2.$$
We have
$$D_1 f(x_1, x_2) = -2x_1 e^{-x_1^2 - x_2^2} \quad\text{and}\quad D_2 f(x_1, x_2) = -2x_2 e^{-x_1^2 - x_2^2}, \qquad (x_1, x_2) \in \mathbb{R}^2.$$
Evidently, D1 f and D2 f are continuous on IR2 . By Theorem 2.2 we conclude that f is
differentiable at every point of IR2 .
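Since f in Example 2 is differentiable, Theorem 2.1 predicts that Du f(a) = ⟨∇f(a), u⟩ for every nonzero u. The Python sketch below compares difference quotients with this prediction. It is an illustration under our own assumptions: the point a, the direction u, and the helper name directional_quotient are hypothetical choices.

```python
import math

def f(x1, x2):
    """The function of Example 2."""
    return math.exp(-x1**2 - x2**2)

def grad_f(x1, x2):
    """Analytic gradient (D1 f, D2 f) computed in Example 2."""
    e = math.exp(-x1**2 - x2**2)
    return (-2.0 * x1 * e, -2.0 * x2 * e)

def directional_quotient(a, u, h):
    """Difference quotient (f(a + h*u) - f(a)) / h."""
    return (f(a[0] + h * u[0], a[1] + h * u[1]) - f(a[0], a[1])) / h

a = (0.5, -0.25)   # sample point (our choice)
u = (1.0, 2.0)     # sample direction (our choice)

g = grad_f(*a)
predicted = g[0] * u[0] + g[1] * u[1]   # <grad f(a), u>

for h in (1e-2, 1e-4, 1e-6):
    print(h, directional_quotient(a, u, h), predicted)
```

The quotients approach the predicted value as h shrinks, with an error roughly proportional to h.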
Let f be a real-valued function defined on an open set U in IRk . If all the partial
derivatives Dj f (j = 1, . . . , k) are continuous on U , then f is said to be continuously
differentiable on U. The collection of all continuously differentiable, real-valued functions
on U will be denoted C¹(U).
In the following theorem, we discuss the sum, product, and quotient of differentiable
functions and give formulas for their gradients.
Theorem 2.3. Let f and g be real-valued functions on an open set U in IRk and let a
be a point in U . Suppose that f and g are differentiable at a. Then f + g and f g are
differentiable at a and
∇(f + g)(a) = ∇f (a) + ∇g(a) and ∇(f g)(a) = (∇f )(a)g(a) + f (a)(∇g)(a).
If, in addition, g(a) ≠ 0, then
$$\nabla\Bigl(\frac{f}{g}\Bigr)(a) = \frac{(\nabla f)(a)\,g(a) - f(a)\,(\nabla g)(a)}{[g(a)]^2}.$$
Proof. Since f and g are differentiable at a, for given ε > 0, there exists δ > 0 such that
‖x − a‖ < δ implies x ∈ U,
|f(x) − f(a) − ⟨∇f(a), x − a⟩| ≤ ε‖x − a‖ and |g(x) − g(a) − ⟨∇g(a), x − a⟩| ≤ ε‖x − a‖.
It follows that
|(f + g)(x) − (f + g)(a) − ⟨∇f(a) + ∇g(a), x − a⟩| ≤ 2ε‖x − a‖.
This shows that f + g is differentiable at a and ∇(f + g)(a) = ∇f(a) + ∇g(a).
Let us consider the product f g. We have
(f g)(x) − (f g)(a) = [f (x) − f (a)]g(a) + f (a)[g(x) − g(a)] + [f (x) − f (a)][g(x) − g(a)].
As in the proof of Theorem 2.1, there exists a constant M > 0 such that |f(x) − f(a)| ≤ M‖x − a‖
and |g(x) − g(a)| ≤ M‖x − a‖ whenever ‖x − a‖ < δ. Shrinking δ if necessary, we may assume
that 0 < δ < ε/M². Hence, ‖x − a‖ < δ implies
|[f(x) − f(a)][g(x) − g(a)]| ≤ M²‖x − a‖² ≤ ε‖x − a‖.
Consequently,
|(f g)(x) − (f g)(a) − ⟨(∇f)(a)g(a) + f(a)(∇g)(a), x − a⟩| ≤ [|f(a)| + |g(a)| + 1]ε‖x − a‖.
This shows that f g is differentiable at a and ∇(f g)(a) = (∇f)(a)g(a) + f(a)(∇g)(a).
Finally, let us deal with the quotient f/g. By our assumption, g(a) ≠ 0. Since g is
continuous at a, we may choose δ > 0 sufficiently small such that ‖x − a‖ < δ implies
|g(x)| ≥ |g(a)|/2. We have
$$\frac{f}{g}(x) - \frac{f}{g}(a) = \frac{[f(x) - f(a)]\,g(a) - f(a)\,[g(x) - g(a)]}{g(x)\,g(a)}.$$
An argument similar to the above proof for the product f g shows that f/g is differentiable
at a and the desired formula for ∇(f/g) is valid.
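The product and quotient formulas of Theorem 2.3 can be checked against central-difference approximations. The sketch below is only a sanity check under our own assumptions: the sample functions f and g, the point a, and the helper num_grad are hypothetical and chosen for illustration.

```python
def num_grad(func, a, h=1e-6):
    """Central-difference approximation of the gradient of func at the point a."""
    grad = []
    for j in range(len(a)):
        plus = list(a)
        minus = list(a)
        plus[j] += h
        minus[j] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad

# Sample differentiable functions on IR2 with their analytic gradients (our choice).
def f(x):
    return x[0] ** 2 + x[1]

def g(x):
    return 1.0 + x[0] * x[1]

def grad_f(x):
    return [2.0 * x[0], 1.0]

def grad_g(x):
    return [x[1], x[0]]

a = [2.0, 1.0]   # here f(a) = 5 and g(a) = 3, so g(a) is nonzero

# Formulas from Theorem 2.3.
product_rule = [grad_f(a)[j] * g(a) + f(a) * grad_g(a)[j] for j in range(2)]
quotient_rule = [(grad_f(a)[j] * g(a) - f(a) * grad_g(a)[j]) / g(a) ** 2 for j in range(2)]

print(product_rule, num_grad(lambda x: f(x) * g(x), a))
print(quotient_rule, num_grad(lambda x: f(x) / g(x), a))
```

With this choice the product formula gives (17, 13) and the quotient formula gives (7/9, −7/9).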
§3. Higher-Order Partial Derivatives
Let f be a real-valued function on an open set U in IR2 . Suppose that the first-order
partial derivatives D1 f and D2 f exist on U . As functions, D1 f and D2 f may have partial
derivatives in their own right. These second-order partial derivatives are denoted by
D11 f := D1 (D1 f ),
D12 f := D1 (D2 f ),
D21 f := D2 (D1 f ),
D22 f := D2 (D2 f ).
Example 1. Let f be the function on IR2 given by
$$f(x_1, x_2) := e^{x_1} \sin(x_1 - x_2), \qquad (x_1, x_2) \in \mathbb{R}^2.$$
We have
$$D_1 f(x_1, x_2) = e^{x_1} \sin(x_1 - x_2) + e^{x_1} \cos(x_1 - x_2) \quad\text{and}\quad D_2 f(x_1, x_2) = -e^{x_1} \cos(x_1 - x_2).$$
It follows that
$$\begin{aligned}
D_{11} f &= D_1(D_1 f) = 2e^{x_1} \cos(x_1 - x_2), \\
D_{12} f &= D_1(D_2 f) = -e^{x_1} \cos(x_1 - x_2) + e^{x_1} \sin(x_1 - x_2), \\
D_{21} f &= D_2(D_1 f) = -e^{x_1} \cos(x_1 - x_2) + e^{x_1} \sin(x_1 - x_2), \\
D_{22} f &= D_2(D_2 f) = -e^{x_1} \sin(x_1 - x_2).
\end{aligned}$$
In this example we have D12 f = D21 f .
Example 2. Let f be the function on IR2 given by f(0, 0) := 0 and
$$f(x_1, x_2) := x_1 x_2\, \frac{x_1^2 - x_2^2}{x_1^2 + x_2^2}, \qquad (x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}.$$
It is easily seen that D1 f(0, 0) = D2 f(0, 0) = 0. For (x1, x2) ∈ IR2 \ {(0, 0)}, we have
$$D_1 f(x_1, x_2) = x_2 \left[ \frac{x_1^2 - x_2^2}{x_1^2 + x_2^2} + \frac{4x_1^2 x_2^2}{(x_1^2 + x_2^2)^2} \right],$$
$$D_2 f(x_1, x_2) = x_1 \left[ \frac{x_1^2 - x_2^2}{x_1^2 + x_2^2} - \frac{4x_1^2 x_2^2}{(x_1^2 + x_2^2)^2} \right].$$
It follows that
$$D_{12} f(0, 0) = \lim_{x_1 \to 0} \frac{D_2 f(x_1, 0) - D_2 f(0, 0)}{x_1} = 1,$$
$$D_{21} f(0, 0) = \lim_{x_2 \to 0} \frac{D_1 f(0, x_2) - D_1 f(0, 0)}{x_2} = -1.$$
Consequently, D12 f(0, 0) ≠ D21 f(0, 0).
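The two limits in Example 2 can be reproduced from the formulas for D1 f and D2 f. The following Python sketch is a small illustration: the function names d1f and d2f are our own, and the step sizes are arbitrary.

```python
def d1f(x1, x2):
    """D1 f from Example 2, with D1 f(0, 0) = 0."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    s = x1**2 + x2**2
    return x2 * ((x1**2 - x2**2) / s + 4.0 * x1**2 * x2**2 / s**2)

def d2f(x1, x2):
    """D2 f from Example 2, with D2 f(0, 0) = 0."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    s = x1**2 + x2**2
    return x1 * ((x1**2 - x2**2) / s - 4.0 * x1**2 * x2**2 / s**2)

for h in (1e-1, 1e-3, 1e-6):
    d12 = (d2f(h, 0.0) - d2f(0.0, 0.0)) / h   # quotient defining D12 f(0, 0) = D1(D2 f)(0, 0)
    d21 = (d1f(0.0, h) - d1f(0.0, 0.0)) / h   # quotient defining D21 f(0, 0) = D2(D1 f)(0, 0)
    print(h, d12, d21)
```

Because D2 f(x1, 0) = x1 and D1 f(0, x2) = −x2, the two quotients are constant in h and differ in sign.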
Theorem 3.1. Let f be a real-valued function on an open set U in IR2 . Suppose that D1 f ,
D2 f , D12 f and D21 f exist in U . If D12 f and D21 f are continuous at a point a = (a1 , a2 )
in U , then D12 f (a) = D21 f (a).
Proof. Since a = (a1 , a2 ) ∈ U , there exists some δ > 0 such that Bδ (a) ⊂ U . Suppose
that 0 < |h1| < δ/2 and 0 < |h2| < δ/2. Consider the expression
$$w := \frac{f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) - f(a_1, a_2 + h_2) + f(a_1, a_2)}{h_1 h_2}.$$
It can be rewritten as
$$w = \frac{\varphi(a_1 + h_1) - \varphi(a_1)}{h_1},$$
where φ(x1) := [f(x1, a2 + h2) − f(x1, a2)]/h2 for |x1 − a1| < δ/2. Clearly,
$$\varphi'(x_1) = \frac{D_1 f(x_1, a_2 + h_2) - D_1 f(x_1, a_2)}{h_2}, \qquad a_1 - \delta/2 < x_1 < a_1 + \delta/2.$$
By the mean value theorem we have
$$w = \frac{\varphi(a_1 + h_1) - \varphi(a_1)}{h_1} = \varphi'(a_1 + \xi_1 h_1),$$
where 0 < ξ1 < 1. It follows that
$$w = \varphi'(a_1 + \xi_1 h_1) = \frac{D_1 f(a_1 + \xi_1 h_1, a_2 + h_2) - D_1 f(a_1 + \xi_1 h_1, a_2)}{h_2}.$$
Applying the mean value theorem again to the above quotient, we get
$$w = D_2(D_1 f)(a_1 + \xi_1 h_1, a_2 + \xi_2 h_2),$$
where 0 < ξ2 < 1. For the same reason, there exist η1, η2 ∈ (0, 1) such that
$$w = D_1(D_2 f)(a_1 + \eta_1 h_1, a_2 + \eta_2 h_2).$$
Consequently,
$$D_{21} f(a_1 + \xi_1 h_1, a_2 + \xi_2 h_2) = D_{12} f(a_1 + \eta_1 h_1, a_2 + \eta_2 h_2).$$
Letting h1 → 0 and h2 → 0 in the above equality and using the continuity of D12 f and D21 f
at a, we obtain D21 f(a) = D12 f(a).
More generally, suppose that f is a real-valued function defined on an open set U in
IRk. For j1, j2, . . . , jm ∈ {1, . . . , k}, we define
$$D_{j_1 j_2 \cdots j_m} f := D_{j_1}(D_{j_2 \cdots j_m} f).$$
Each of the partial derivatives D_{j1 j2 ··· jm} f is called an mth-order partial derivative of f. If
all the mth-order partial derivatives of f exist and are continuous on U, then f is said to be
m-times continuously differentiable on U. The collection of all m-times continuously
differentiable functions on U is denoted by C^m(U).
The following extension of Theorem 3.1 can be proved in an analogous way.
Theorem 3.2. Let f be an m-times continuously differentiable function on an open set
U in IRk. If j1, j2, . . . , jm ∈ {1, . . . , k}, and if σ is a permutation of {1, 2, . . . , m}, then
$$D_{j_1 j_2 \cdots j_m} f(a) = D_{j_{\sigma(1)} j_{\sigma(2)} \cdots j_{\sigma(m)}} f(a) \quad \text{for all } a \in U.$$
Note that a permutation of a set X is a one-to-one mapping from X onto X.
§4. Taylor’s Theorem
In this section we extend the mean value theorem and Taylor's theorem for functions
of a single variable to functions of several variables.
Let x and y be two distinct points in IRk . The closed line segment joining x and y is
described as the set
[x, y] := {(1 − t)x + ty : 0 ≤ t ≤ 1}.
The open line segment joining x and y is described as the set
(x, y) := {(1 − t)x + ty : 0 < t < 1}.
Theorem 4.1. Let f be a real-valued, differentiable function on an open set U in IRk .
Let x and y be two distinct points in U such that the closed line segment [x, y] is contained
in U . Then there exists a point z in the open line segment (x, y) such that
f(y) − f(x) = ⟨∇f(z), y − x⟩.
Proof. Let g(t) := f((1 − t)x + ty), 0 ≤ t ≤ 1. Then g is a continuous function on [0, 1].
We claim that g is differentiable on (0, 1). Suppose that 0 < t0 < 1. For 0 < t < 1 we have
g(t) − g(t0) = f((1 − t)x + ty) − f((1 − t0)x + t0 y).
Note that [(1 − t)x + ty] − [(1 − t0)x + t0 y] = (t − t0)(y − x). Since f is differentiable on
U, we have
$$\lim_{t \to t_0} \frac{f((1 - t)x + ty) - f((1 - t_0)x + t_0 y) - \langle \nabla f(z_0), (t - t_0)(y - x)\rangle}{t - t_0} = 0,$$
where z0 := (1 − t0)x + t0 y. Consequently,
$$g'(t_0) = \lim_{t \to t_0} \frac{g(t) - g(t_0)}{t - t_0} = \langle \nabla f(z_0), y - x\rangle.$$
By the mean value theorem, there exists some c ∈ (0, 1) such that
g(1) − g(0) = g′(c)(1 − 0).
It follows that
f(y) − f(x) = ⟨∇f(z), y − x⟩,
where z := (1 − c)x + cy lies in the open line segment (x, y).
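Theorem 4.1 only asserts that the point z exists, but for a concrete function it can be located numerically. The sketch below uses bisection on t ↦ ⟨∇f((1 − t)x + ty), y − x⟩ − (f(y) − f(x)). Everything in it is an assumption made for illustration: the sample function f(x1, x2) = x1³ + x2², the endpoints, and the helper name mean_value_point. Bisection is our own choice of method and relies on this expression changing sign on [0, 1], which holds for the sample data.

```python
def f(p):
    """Sample differentiable function on IR2 (our choice)."""
    return p[0] ** 3 + p[1] ** 2

def grad_f(p):
    """Gradient of the sample function."""
    return (3.0 * p[0] ** 2, 2.0 * p[1])

def mean_value_point(x, y, tol=1e-12):
    """Find c in (0, 1) with <grad f(z), y - x> = f(y) - f(x), where z = (1 - c)x + cy.

    Uses bisection and assumes phi below has opposite signs at t = 0 and t = 1.
    """
    d = (y[0] - x[0], y[1] - x[1])
    target = f(y) - f(x)

    def phi(t):
        z = (x[0] + t * d[0], x[1] + t * d[1])
        g = grad_f(z)
        return g[0] * d[0] + g[1] * d[1] - target

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(lo) * phi(mid) <= 0.0:
            hi = mid      # a sign change lies in [lo, mid]
        else:
            lo = mid      # otherwise it lies in [mid, hi]
    c = 0.5 * (lo + hi)
    return c, (x[0] + c * d[0], x[1] + c * d[1])

c, z = mean_value_point((0.0, 0.0), (1.0, 1.0))
print(c, z)
```

For these endpoints g(t) = t³ + t², so c solves 3c² + 2c = 2, that is, c = (−2 + √28)/6.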
Theorem 4.2. Let f be a real-valued, (n + 1)-times continuously differentiable function
on an open set U in IR2 . Let a = (a1 , a2 ) and x = (x1 , x2 ) be two distinct points in U
such that the closed line segment [a, x] is contained in U . Then there exists a point z in
the open line segment (a, x) such that f (x) = Tn (x) + Rn (x), where
$$T_n(x) = \sum_{j=0}^{n} \frac{\bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2\bigr]^j f(a)}{j!}$$
and
$$R_n(x) = \frac{\bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2\bigr]^{n+1} f(z)}{(n + 1)!}.$$
Proof. Since the closed line segment [a, x] is contained in U , there exists some δ > 0 such
that a + t(x − a) ∈ U for all t ∈ (−δ, 1 + δ). Let
g(t) := f(a + t(x − a)), −δ < t < 1 + δ.
In light of the proof of Theorem 4.1, we see that g is differentiable on (−δ, 1 + δ) and
$$g'(t) = \langle \nabla f(a + t(x - a)), x - a\rangle, \qquad -\delta < t < 1 + \delta.$$
It follows that
$$g'(t) = \bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2\bigr] f(a + t(x - a)).$$
Since f is (n + 1)-times continuously differentiable on U, by using an induction argument
we derive that
$$g^{(j)}(t) = \bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2\bigr]^j f(a + t(x - a)), \qquad j = 0, 1, \ldots, n + 1.$$
Applying Taylor's theorem to the function g, we obtain
$$g(1) = \sum_{j=0}^{n} \frac{g^{(j)}(0)}{j!} + \frac{g^{(n+1)}(c)}{(n + 1)!},$$
where 0 < c < 1. But g(1) = f(x),
$$g^{(j)}(0) = \bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2\bigr]^j f(a), \quad\text{and}\quad g^{(n+1)}(c) = \bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2\bigr]^{n+1} f(z),$$
where z := a + c(x − a) ∈ (a, x). This completes the proof.
In the above theorem, Tn is a polynomial of (total) degree at most n. It is called the
Taylor polynomial of f of order n at a. For n = 2, the Taylor polynomial T2 has the
following form:
$$\begin{aligned}
T_2(x_1, x_2) ={}& f(a_1, a_2) + D_1 f(a_1, a_2)(x_1 - a_1) + D_2 f(a_1, a_2)(x_2 - a_2) \\
&+ \frac{D_{11} f(a_1, a_2)}{2}(x_1 - a_1)^2 + D_{12} f(a_1, a_2)(x_1 - a_1)(x_2 - a_2) + \frac{D_{22} f(a_1, a_2)}{2}(x_2 - a_2)^2.
\end{aligned}$$
Example. Let f be the function on IR2 given by
$$f(x_1, x_2) := e^{x_1} \sin(x_1 - x_2), \qquad (x_1, x_2) \in \mathbb{R}^2.$$
Let us find the Taylor polynomial of f of order 2 at (0, 0). We have f(0, 0) = 0. Moreover,
using D1 f, D2 f, D11 f, D12 f, and D22 f calculated in Example 1 of Section 3, we obtain
D1 f(0, 0) = 1, D2 f(0, 0) = −1, D11 f(0, 0) = 2, D12 f(0, 0) = −1, D22 f(0, 0) = 0.
Hence,
$$T_2(x_1, x_2) = x_1 - x_2 + x_1^2 - x_1 x_2.$$
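To see how well T2 approximates f near the origin, one can compare the two along a ray. The Python sketch below is an illustration only: the direction (1, −2) and the scales s are our own choices. By Theorem 4.2 the remainder R2 involves third-order partial derivatives, so the error should shrink roughly like s³.

```python
import math

def f(x1, x2):
    """The function of the example."""
    return math.exp(x1) * math.sin(x1 - x2)

def t2(x1, x2):
    """Taylor polynomial of order 2 at (0, 0) computed in the example."""
    return x1 - x2 + x1**2 - x1 * x2

errors = {}
for s in (1e-1, 1e-2, 1e-3):
    x1, x2 = s, -2.0 * s                # points on the ray through (1, -2)
    errors[s] = abs(f(x1, x2) - t2(x1, x2))
    print(s, errors[s], errors[s] / s**3)   # the last column should stay bounded
```

Dividing the step by 10 should reduce the error by a factor of about 1000.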
The argument in Theorem 4.2 can be extended to give a proof of the following Taylor
theorem for functions on IRk .
Theorem 4.3. Let f be a real-valued, (n + 1)-times continuously differentiable function
on an open set U in IRk . Let a = (a1 , a2 , . . . , ak ) and x = (x1 , x2 , . . . , xk ) be two distinct
points in U such that the closed line segment [a, x] is contained in U . Then there exists a
point z in the open line segment (a, x) such that f (x) = Tn (x) + Rn (x), where
$$T_n(x) = \sum_{j=0}^{n} \frac{1}{j!} \bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2 + \cdots + (x_k - a_k)D_k\bigr]^j f(a)$$
and
$$R_n(x) = \frac{1}{(n + 1)!} \bigl[(x_1 - a_1)D_1 + (x_2 - a_2)D_2 + \cdots + (x_k - a_k)D_k\bigr]^{n+1} f(z).$$
§5. Maxima and Minima of Differentiable Functions
In this section we employ the theory of differentiation developed so far to study maxima and minima of differentiable functions.
Let f be a real-valued function defined on a subset U of IRk and let a be a point in U .
We say that f has a global minimum (maximum) at a if f (x) ≥ f (a) (f (x) ≤ f (a)) for
all x ∈ U. Suppose that a is an interior point of U. We say that f has a local minimum
(maximum) at a if there exists some r > 0 such that Br(a) ⊂ U and f(x) ≥ f(a)
(f(x) ≤ f(a)) for all x ∈ Br(a). We say that f has a strict local minimum (maximum)
at a if there exists some r > 0 such that Br(a) ⊂ U and f(x) > f(a) (f(x) < f(a)) for all
x ∈ Br(a) \ {a}.
We first give a necessary condition for local maxima or minima.
Theorem 5.1. Let f be a real-valued function defined on an open set U in IRk and let
a be a point in U . If f has a local minimum or maximum at a, and if all the first-order
partial derivatives D1 f, . . . , Dk f exist at a, then Dj f (a) = 0 for j = 1, . . . , k.
Proof. For j = 1, . . . , k, recall that ej is the jth coordinate unit vector. There exists some
δ > 0 such that a + tej ∈ U for all t ∈ (−δ, δ). Let gj (t) := f (a + tej ), t ∈ (−δ, δ). Since
Dj f exists at a, we have
$$g_j'(0) = \lim_{t \to 0} \frac{g_j(t) - g_j(0)}{t} = \lim_{t \to 0} \frac{f(a + te_j) - f(a)}{t} = D_j f(a).$$
By our assumption, f has a local minimum or maximum at a. Consequently, gj has a
local minimum or maximum at 0. Hence gj′(0) = 0. This shows that Dj f(a) = 0 for
j = 1, . . . , k.
Let f be a real-valued function defined on an open set U in IRk . A point a ∈ U is
called a critical point of f if D1 f, . . . , Dk f exist at a and ∇f (a) = 0.
Example 5.1. Let f be the quadratic polynomial given by
$$f(x_1, x_2) = p x_1^2 + 2q x_1 x_2 + r x_2^2, \qquad (x_1, x_2) \in \mathbb{R}^2,$$
where p, q, r are real numbers such that
$$\Delta := \begin{vmatrix} p & q \\ q & r \end{vmatrix} = pr - q^2 \ne 0.$$
We have
D1 f (x1 , x2 ) = 2px1 + 2qx2
and D2 f (x1 , x2 ) = 2qx1 + 2rx2 .
Since pr − q² ≠ 0, D1 f(x1, x2) = 0 and D2 f(x1, x2) = 0 imply x1 = 0 and x2 = 0. Thus,
(0, 0) is the only critical point of f. Suppose that ∆ = pr − q² > 0. Then we have p ≠ 0
and
$$f(x_1, x_2) = p x_1^2 + 2q x_1 x_2 + r x_2^2 = \frac{1}{p} \bigl[ (p x_1 + q x_2)^2 + (pr - q^2) x_2^2 \bigr].$$
Consequently, if ∆ > 0 and p > 0, then f(x1, x2) > 0 for all (x1, x2) ∈ IR2 \ {(0, 0)}; hence
f has a strict global minimum at (0, 0). If ∆ > 0 and p < 0, then f(x1, x2) < 0 for all
(x1, x2) ∈ IR2 \ {(0, 0)}; hence f has a strict global maximum at (0, 0).
Now suppose that ∆ = pr − q² < 0. For v ∈ IR, we have
f(vx2, x2) = (pv² + 2qv + r)x2², x2 ∈ IR.
If p ≠ 0, then, since q² − pr > 0, the quadratic polynomial pv² + 2qv + r has two distinct real
roots and therefore changes sign. If p = 0, then q ≠ 0 and 2qv + r is a nonconstant linear
function of v. In either case, there exist s, t ∈ IR such that ps² + 2qs + r > 0 and pt² + 2qt + r < 0. It follows
that f(sx2, x2) > 0 and f(tx2, x2) < 0 for any x2 ∈ IR \ {0}. Thus, f has neither a local
maximum nor a local minimum at (0, 0).
Let f be a real-valued function defined on an open set U in IRk . A point a in U is
called a saddle point of f if it is a critical point of f and if for any r > 0, there exist
y, z ∈ Br(a) ∩ U such that f(y) < f(a) < f(z). Thus, in the above example, (0, 0) is a
saddle point of f, provided ∆ = pr − q² < 0.
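The case analysis of Example 5.1 amounts to a short decision procedure on the coefficients p, q, r. The following Python sketch encodes it. The function name classify_quadratic and the returned strings are our own, and the case ∆ = 0, which Example 5.1 excludes, is reported as not covered.

```python
def classify_quadratic(p, q, r):
    """Classify (0, 0) for f(x1, x2) = p*x1**2 + 2*q*x1*x2 + r*x2**2 as in Example 5.1."""
    delta = p * r - q * q
    if delta > 0:
        # delta > 0 forces p != 0, so the sign of p decides between the two cases.
        return "strict global minimum" if p > 0 else "strict global maximum"
    if delta < 0:
        return "saddle point"
    return "not covered (delta = 0)"

print(classify_quadratic(1.0, 0.0, 1.0))     # delta = 1, p > 0
print(classify_quadratic(-2.0, 1.0, -1.0))   # delta = 1, p < 0
print(classify_quadratic(0.0, 1.0, 0.0))     # delta = -1, f(x1, x2) = 2*x1*x2
```

The third call covers the case p = 0, where f(x1, x2) = 2x1x2 takes both signs near the origin.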
The following theorem gives some sufficient conditions for a function defined on an
open subset of IR2 to have a local maximum or local minimum.
Theorem 5.2. Let f be a real-valued function defined on an open set U in IR2 with
continuous first and second partial derivatives. Suppose that a point a = (a1 , a2 ) in U is
a critical point of f. For y ∈ U, let
$$\Delta f(y) := \begin{vmatrix} D_{11} f(y) & D_{12} f(y) \\ D_{21} f(y) & D_{22} f(y) \end{vmatrix}.$$
Then the following statements are true.
(a) If ∆f (a) > 0 and D11 f (a) > 0, then f has a strict local minimum at a.
(b) If ∆f (a) > 0 and D11 f (a) < 0, then f has a strict local maximum at a.
(c) If ∆f (a) < 0, then a is a saddle point of f .
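Proof. Since a is a critical point of f, we have D1 f(a) = 0 and D2 f(a) = 0. Let
x = (x1, x2) be a point of U with x ≠ a such that [a, x] ⊂ U. By Taylor's theorem
(Theorem 4.2 with n = 1) and the equality D12 f = D21 f (Theorem 3.1), there exists a point
z ∈ (a, x) such that
$$f(x) - f(a) = \frac{1}{2} \bigl[ D_{11} f(z)(x_1 - a_1)^2 + 2D_{12} f(z)(x_1 - a_1)(x_2 - a_2) + D_{22} f(z)(x_2 - a_2)^2 \bigr].$$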
Since D11 f, D12 f, and D22 f are continuous, ∆f(y) is a continuous function of y.
Suppose that ∆f(a) > 0 and D11 f(a) > 0. Then there exists some r > 0 such that
Br(a) ⊂ U and that ∆f(y) > 0 and D11 f(y) > 0 for all y ∈ Br(a). For x ∈ Br(a) \ {a}, we
have z ∈ (a, x) ⊂ Br(a). Hence, ∆f(z) > 0 and D11 f(z) > 0. Applying Example 5.1 to the
quadratic form with coefficients D11 f(z), D12 f(z), and D22 f(z), we obtain
$$f(x) - f(a) = \frac{1}{2} \bigl[ D_{11} f(z)(x_1 - a_1)^2 + 2D_{12} f(z)(x_1 - a_1)(x_2 - a_2) + D_{22} f(z)(x_2 - a_2)^2 \bigr] > 0$$
for all x = (x1, x2) ∈ Br(a) \ {a}. This shows that f has a strict local minimum at a.
Suppose that ∆f (a) > 0 and D11 f (a) < 0. A similar argument shows that there exists
some r > 0 such that Br (a) ⊂ U and f (x) − f (a) < 0 for all x = (x1 , x2 ) ∈ Br (a) \ {a}.
In other words, f has a strict local maximum at a.
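Suppose that ∆f(a) < 0. For y ∈ U and v = (v1, v2) ∈ IR2, write
Q_y(v) := D11 f(y)v1² + 2D12 f(y)v1 v2 + D22 f(y)v2². By Example 5.1, there exist vectors
v and w in IR2 such that Q_a(v) > 0 > Q_a(w). Since the second-order partial derivatives are
continuous, there exists some r > 0 such that Br(a) ⊂ U and Q_y(v) > 0 > Q_y(w) for all
y ∈ Br(a). For t ≠ 0 sufficiently small, the points a + tv and a + tw lie in Br(a), and the
Taylor formula above gives f(a + tv) − f(a) = (t²/2)Q_z′(v) > 0 and
f(a + tw) − f(a) = (t²/2)Q_y′(w) < 0 for suitable points z′ ∈ (a, a + tv) and y′ ∈ (a, a + tw).
Consequently, for any δ ∈ (0, r), there exist y, z ∈ Bδ(a) \ {a}
such that f(y) − f(a) < 0 < f(z) − f(a). Therefore, a is a saddle point of f.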
Example 5.2. Let us look for the global and local minimizers and maximizers (if any) of
the function
$$f(x_1, x_2) := x_1^3 - 12x_1 x_2 + 8x_2^3, \qquad (x_1, x_2) \in \mathbb{R}^2.$$
In this case, we have f(x1, 0) = x1³. Consequently,
$$\lim_{x_1 \to +\infty} f(x_1, 0) = +\infty \quad\text{and}\quad \lim_{x_1 \to -\infty} f(x_1, 0) = -\infty.$$
Hence, f has neither global minimizers nor global maximizers.
In order to find local minimizers and maximizers, we calculate D1 f and D2 f :
$$D_1 f(x_1, x_2) = 3x_1^2 - 12x_2 \quad\text{and}\quad D_2 f(x_1, x_2) = -12x_1 + 24x_2^2, \qquad (x_1, x_2) \in \mathbb{R}^2.$$
Solving the system of equations D1 f (x1 , x2 ) = 0 and D2 f (x1 , x2 ) = 0, we find the critical
points (2, 1) and (0, 0) of f . Moreover,
$$D_{11} f(x_1, x_2) = 6x_1, \qquad D_{12} f(x_1, x_2) = -12, \qquad D_{22} f(x_1, x_2) = 48x_2.$$
It follows that
$$\Delta f(x_1, x_2) = \begin{vmatrix} 6x_1 & -12 \\ -12 & 48x_2 \end{vmatrix} = 288x_1 x_2 - 144.$$
At the point (2, 1) we have ∆f(2, 1) > 0 and D11 f(2, 1) > 0. Hence, f has a local minimum
at (2, 1) and f(2, 1) = −8. At the point (0, 0) we have ∆f(0, 0) < 0. Therefore, (0, 0) is a
saddle point of f.
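The computations of Example 5.2 can be double-checked mechanically. The sketch below is an illustration: the function names are our own, and second_derivative_test simply encodes the three cases of Theorem 5.2, returning "inconclusive" when none of them applies.

```python
def f(x1, x2):
    """The function of Example 5.2."""
    return x1**3 - 12.0 * x1 * x2 + 8.0 * x2**3

def grad_f(x1, x2):
    """First-order partial derivatives (D1 f, D2 f)."""
    return (3.0 * x1**2 - 12.0 * x2, -12.0 * x1 + 24.0 * x2**2)

def second_partials(x1, x2):
    """Second-order partial derivatives (D11 f, D12 f, D22 f)."""
    return (6.0 * x1, -12.0, 48.0 * x2)

def second_derivative_test(d11, d12, d22):
    """Conclusion of Theorem 5.2 at a critical point with the given second partials."""
    delta = d11 * d22 - d12 * d12
    if delta > 0 and d11 > 0:
        return "strict local minimum"
    if delta > 0 and d11 < 0:
        return "strict local maximum"
    if delta < 0:
        return "saddle point"
    return "inconclusive"

for point in [(2.0, 1.0), (0.0, 0.0)]:
    print(point, grad_f(*point), f(*point), second_derivative_test(*second_partials(*point)))
```

At (2, 1) the determinant is 288 · 2 · 1 − 144 = 432, and at (0, 0) it is −144.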
In the rest of this section we consider coercive functions and global minimizers. A
continuous function f defined on IRk is said to be coercive if
$$\lim_{\|x\| \to \infty} f(x) = +\infty.$$
This means that for any real number M > 0 there exists some R > 0 such that
‖x‖ ≥ R implies f(x) ≥ M.
Theorem 5.3. Let f be a continuous function defined on IRk . If f is coercive, then f
has at least one global minimizer. If, in addition, all the first-order partial derivatives of
f exist on IRk , then these global minimizers can be found among the critical points of f .
Proof. Since f is coercive, there exists some r > 0 such that ‖x‖ > r implies f(x) > f(0).
Let E be the set {x ∈ IRk : ‖x‖ ≤ r}. Then E is a bounded closed set, hence compact. Since f is
continuous on E, there exists some point a ∈ E such that f(a) ≤ f(x) for all x ∈ E. In
particular, f(a) ≤ f(0). But f(x) > f(0) for x ∈ IRk \ E. Hence, f(a) ≤ f(x) for all
x ∈ IRk. This completes the proof of the first statement.
The second statement holds because, by Theorem 5.1, every global minimizer is a critical point of f.
Example 5.3. Consider the function
$$f(x_1, x_2) := x_1^4 - 4x_1 x_2 + x_2^4, \qquad (x_1, x_2) \in \mathbb{R}^2.$$
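We first show that f is coercive. Indeed, for (x1, x2) ∈ IR2 \ {(0, 0)} we have
$$f(x_1, x_2) = (x_1^4 + x_2^4) \left( 1 - \frac{4x_1 x_2}{x_1^4 + x_2^4} \right).$$
We observe that
$$\frac{|4x_1 x_2|}{x_1^4 + x_2^4} \le \frac{2(x_1^2 + x_2^2)}{x_1^4 + x_2^4} = \frac{2(x_1^2 + x_2^2)^2}{(x_1^4 + x_2^4)(x_1^2 + x_2^2)} \le \frac{4(x_1^4 + x_2^4)}{(x_1^4 + x_2^4)(x_1^2 + x_2^2)} = \frac{4}{x_1^2 + x_2^2}.$$
Hence the factor in parentheses tends to 1 as ‖x‖ → ∞, while x1⁴ + x2⁴ ≥ (x1² + x2²)²/2 = ‖x‖⁴/2 → +∞.
This shows that f(x) → +∞ as ‖x‖ → ∞.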
In order to find critical points of f , we calculate D1 f and D2 f :
$$D_1 f(x_1, x_2) = 4x_1^3 - 4x_2 \quad\text{and}\quad D_2 f(x_1, x_2) = -4x_1 + 4x_2^3.$$
Solving the system of equations D1 f(x1, x2) = 0 and D2 f(x1, x2) = 0, we find the critical
points (1, 1), (−1, −1) and (0, 0) of f. But f(1, 1) = −2, f(−1, −1) = −2 and f(0, 0) = 0.
Therefore, f has a global minimum at (1, 1) and (−1, −1).
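As a sanity check on Example 5.3, one can evaluate the gradient and f at the three critical points and sample f on a grid. The sketch below is illustrative only: the grid [−3, 3]² with step 0.05 is our own choice, and a grid search is of course not a proof. It merely agrees with the conclusion that no value of f falls below −2.

```python
def f(x1, x2):
    """The function of Example 5.3."""
    return x1**4 - 4.0 * x1 * x2 + x2**4

def grad_f(x1, x2):
    """First-order partial derivatives (D1 f, D2 f)."""
    return (4.0 * x1**3 - 4.0 * x2, -4.0 * x1 + 4.0 * x2**3)

critical_points = [(1.0, 1.0), (-1.0, -1.0), (0.0, 0.0)]
for p in critical_points:
    print(p, grad_f(*p), f(*p))

# Coarse grid on [-3, 3]^2 with step 0.05 (an arbitrary choice for illustration).
grid = [i * 0.05 for i in range(-60, 61)]
lowest = min(f(u, v) for u in grid for v in grid)
print(lowest)   # should not be smaller than -2, up to rounding
```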