Chapter 2. Differentiation in Euclidean Spaces

§1. Euclidean Spaces

Euclidean $k$-space $\mathbb{R}^k$ is the Cartesian product of $k$ copies of $\mathbb{R}$. It consists of all $k$-tuples $x = (x_1, \dots, x_k)$, where $x_1, \dots, x_k \in \mathbb{R}$. An element of $\mathbb{R}^k$ may be viewed as a vector. We may define two algebraic operations on $\mathbb{R}^k$: addition and scalar multiplication.

The addition of two vectors $x = (x_1, \dots, x_k)$ and $y = (y_1, \dots, y_k)$ in $\mathbb{R}^k$ is defined by
\[ x + y := (x_1 + y_1, \dots, x_k + y_k). \]
The addition has the following properties:
1. $x + y = y + x$ for all $x, y \in \mathbb{R}^k$.
2. $(x + y) + z = x + (y + z)$ for all $x, y, z \in \mathbb{R}^k$.
3. The zero vector $0 = (0, \dots, 0)$ satisfies $x + 0 = x$ for all $x \in \mathbb{R}^k$.
4. For each $x \in \mathbb{R}^k$, its negative $-x := (-x_1, \dots, -x_k)$ satisfies $x + (-x) = 0$.

The scalar multiplication of a real number $\lambda$ and a vector $x = (x_1, \dots, x_k) \in \mathbb{R}^k$ is defined by
\[ \lambda x := (\lambda x_1, \dots, \lambda x_k). \]
The scalar multiplication has the following properties:
5. $\lambda(x + y) = \lambda x + \lambda y$ for all $x, y \in \mathbb{R}^k$ and all $\lambda \in \mathbb{R}$.
6. $(\lambda + \mu)x = \lambda x + \mu x$ for all $x \in \mathbb{R}^k$ and all $\lambda, \mu \in \mathbb{R}$.
7. $\lambda(\mu x) = (\lambda\mu)x$ for all $x \in \mathbb{R}^k$ and all $\lambda, \mu \in \mathbb{R}$.
8. For each $x \in \mathbb{R}^k$, $1x = x$.

Thus, $\mathbb{R}^k$ together with addition and scalar multiplication becomes a vector space (linear space). For $i = 1, \dots, k$, let $e_i$ be the vector $(0, \dots, 0, 1, 0, \dots, 0)$, where $1$ is in the $i$th place and $0$ elsewhere. Then each $x = (x_1, \dots, x_k) \in \mathbb{R}^k$ can be uniquely represented as $x = x_1 e_1 + \cdots + x_k e_k$. Thus, $\{e_1, \dots, e_k\}$ is a basis for $\mathbb{R}^k$. We call $e_i$ the $i$th coordinate unit vector.

The inner product of two vectors $x = (x_1, \dots, x_k)$ and $y = (y_1, \dots, y_k)$ is defined by
\[ \langle x, y \rangle := x_1 y_1 + \cdots + x_k y_k. \]
It has the following properties:
(a) $\langle x, x \rangle \ge 0$ for each $x \in \mathbb{R}^k$, and $\langle x, x \rangle = 0$ if and only if $x = 0$.
(b) $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^k$.
(c) $\langle \lambda x + \mu y, z \rangle = \lambda \langle x, z \rangle + \mu \langle y, z \rangle$ for all $x, y, z \in \mathbb{R}^k$ and $\lambda, \mu \in \mathbb{R}$.

The norm of a vector $x$ in $\mathbb{R}^k$ is defined to be
\[ \|x\| := \sqrt{\langle x, x \rangle}. \]
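As a small numerical illustration of the definitions above, the following Python sketch implements addition, scalar multiplication, the inner product, and the norm for vectors stored as tuples. The function names (`add`, `scale`, `inner`, `norm`) and the sample vectors are illustrative assumptions, not notation from the text.

```python
import math

def add(x, y):
    """Componentwise sum x + y of two vectors of equal length."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(lam, x):
    """Scalar multiple lam * x."""
    return tuple(lam * xi for xi in x)

def inner(x, y):
    """Inner product <x, y> = x_1*y_1 + ... + x_k*y_k."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Norm ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

# Hypothetical sample vectors in R^2.
x = (3.0, 4.0)
y = (1.0, -2.0)
z = (0.5, 2.0)

# Property (b): symmetry of the inner product.
symmetric = inner(x, y) == inner(y, x)

# Property (c): linearity in the first argument, checked for lambda = 2, mu = 3.
lhs = inner(add(scale(2.0, x), scale(3.0, y)), z)
rhs = 2.0 * inner(x, z) + 3.0 * inner(y, z)

print(norm(x), symmetric, math.isclose(lhs, rhs))
```

Such checks only test the properties on particular vectors; they do not replace the general statements (a), (b), and (c).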
Evidently, $\|x\| \ge 0$ for each $x \in \mathbb{R}^k$, and $\|x\| = 0$ if and only if $x = 0$. Moreover, $\|\lambda x\| = |\lambda|\,\|x\|$ for all $x \in \mathbb{R}^k$ and $\lambda \in \mathbb{R}$.

Two vectors $x$ and $y$ are said to be orthogonal if $\langle x, y \rangle = 0$. If $x$ and $y$ are orthogonal, then we have Pythagoras's theorem:
\[ \|x + y\|^2 = \|x\|^2 + \|y\|^2. \]

Suppose that $x$ and $y$ are two vectors in $\mathbb{R}^k$ and $x \ne 0$. Then there exists a real number $t$ such that $\langle y - tx, x \rangle = 0$. Indeed, this relation is valid if and only if $t = \langle x, y \rangle / \langle x, x \rangle$. The vector $tx$ is called the projection of $y$ onto $x$. Since $y = tx + (y - tx)$, and since the vectors $y - tx$ and $tx$ are orthogonal, we have
\[ \|y\|^2 = \|tx\|^2 + \|y - tx\|^2. \]
It follows that $\|tx\| \le \|y\|$. Since $\|tx\| = |\langle x, y \rangle| / \|x\|$, and since the inequality below is trivial when $x = 0$, we have proved the Schwarz inequality:
\[ |\langle x, y \rangle| \le \|x\|\,\|y\|. \]

Suppose that $\theta$ is the angle between two nonzero vectors $x$ and $y$. Then the preceding discussion tells us that
\[ \cos\theta = \frac{t\|x\|}{\|y\|} = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}. \]

It follows from the Schwarz inequality that
\[ \|x + y\|^2 = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2. \]
This establishes the triangle inequality:
\[ \|x + y\| \le \|x\| + \|y\|. \]
For $x, y \in \mathbb{R}^k$, define $\rho(x, y) := \|x - y\|$. It is easily seen that $\rho$ is a metric on $\mathbb{R}^k$. Thus, all the results in Chapter 1 on metric spaces are valid for Euclidean spaces.

§2. Differentiable Functions

Suppose that $f$ is a function from a nonempty open set $U$ in $\mathbb{R}^k$ to $\mathbb{R}$. For a point $a \in U$ and a nonzero vector $u \in \mathbb{R}^k$, the directional derivative of $f$ at $a$ with respect to $u$ is
\[ D_u f(a) := \lim_{h \to 0} \frac{f(a + hu) - f(a)}{h}, \]
provided this limit exists. In particular, if $u = e_i$ is the $i$th coordinate unit vector, then we write $D_i f$ for $D_{e_i} f$ and call it the $i$th partial derivative of $f$. The gradient of $f$ at $a$, denoted $\nabla f(a)$, is the vector
\[ \nabla f(a) = \bigl(D_1 f(a), \dots, D_k f(a)\bigr). \]

Example 1. Let $f$ be the function on $\mathbb{R}^2$ given by $f(0, 0) := 0$ and
\[ f(x_1, x_2) := \frac{x_1^2 x_2^3}{x_1^4 + x_2^6}, \quad (x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}. \]
For $(x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}$ we have
\[ D_1 f(x_1, x_2) = \frac{2x_1 x_2^3 (x_1^4 + x_2^6) - x_1^2 x_2^3 (4x_1^3)}{(x_1^4 + x_2^6)^2} = \frac{2x_1 x_2^3 (x_2^6 - x_1^4)}{(x_1^4 + x_2^6)^2} \]
and
\[ D_2 f(x_1, x_2) = \frac{3x_1^2 x_2^2 (x_1^4 + x_2^6) - x_1^2 x_2^3 (6x_2^5)}{(x_1^4 + x_2^6)^2} = \frac{3x_1^2 x_2^2 (x_1^4 - x_2^6)}{(x_1^4 + x_2^6)^2}. \]
Let us consider directional derivatives of $f$ at $(0, 0)$. Suppose that $u = (u_1, u_2)$ is a vector in $\mathbb{R}^2$ with $u_1 \ne 0$. Then
\[ D_u f(0, 0) = \lim_{h \to 0} \frac{f(hu_1, hu_2) - f(0, 0)}{h} = \lim_{h \to 0} \frac{h^5 u_1^2 u_2^3}{h (h^4 u_1^4 + h^6 u_2^6)} = \lim_{h \to 0} \frac{u_1^2 u_2^3}{u_1^4 + h^2 u_2^6} = \frac{u_2^3}{u_1^2}. \]
In particular, $D_1 f(0, 0) = 0$. If $u = (0, u_2)$ with $u_2 \ne 0$, then $f(0, hu_2) = 0$ for all $h \ne 0$, so $D_u f(0, 0) = 0$ and hence $D_2 f(0, 0) = 0$. However, the function $f$ is not continuous at $(0, 0)$. Indeed, we have
\[ f(x_1, 0) = 0 \quad \text{and} \quad f(x_1, x_1^{2/3}) = \frac{x_1^2 x_1^2}{x_1^4 + x_1^4} = \frac{1}{2}, \quad x_1 \ne 0. \]
Consequently, $\lim_{x_1 \to 0} f(x_1, 0) = 0$ and $\lim_{x_1 \to 0} f(x_1, x_1^{2/3}) = 1/2$. This shows that $f$ is discontinuous at $(0, 0)$.

Suppose that $f$ is a real-valued function on an open set $U$ in $\mathbb{R}^k$ and $a$ is a point in $U$. The function $f$ is said to be differentiable at $a$ if there is a vector $b \in \mathbb{R}^k$ such that
\[ \lim_{x \to a} \frac{f(x) - f(a) - \langle b, x - a \rangle}{\|x - a\|} = 0. \]

Theorem 2.1. Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^k$ and let $a$ be a point in $U$. If $f$ is differentiable at $a$, then all the partial derivatives of $f$ at $a$ exist and $\nabla f(a) = b$. Moreover, for any nonzero vector $u \in \mathbb{R}^k$, $D_u f(a) = \langle \nabla f(a), u \rangle$. Furthermore, $f$ is continuous at $a$.

Proof. Let $b = (b_1, \dots, b_k)$ be the vector in $\mathbb{R}^k$ from the definition of differentiability. Since $f$ is differentiable at $a$, for a nonzero vector $u \in \mathbb{R}^k$ we have
\[ \lim_{h \to 0} \frac{f(a + hu) - f(a) - \langle b, hu \rangle}{h\|u\|} = 0. \]
It follows that
\[ \lim_{h \to 0} \frac{f(a + hu) - f(a)}{h} = \langle b, u \rangle. \]
Hence, $D_u f(a) = \langle b, u \rangle$. In particular, if $u = e_i$ is the $i$th coordinate unit vector, we obtain $D_i f(a) = \langle b, e_i \rangle = b_i$. Thus, all the partial derivatives of $f$ at $a$ exist and
\[ b = (b_1, \dots, b_k) = \bigl(D_1 f(a), \dots, D_k f(a)\bigr) = \nabla f(a). \]
Consequently, $D_u f(a) = \langle \nabla f(a), u \rangle$.
Furthermore, since $f$ is differentiable at $a$, for given $\varepsilon > 0$ we can find $\delta > 0$ such that $B_\delta(a) \subset U$ and
\[ |f(x) - f(a) - \langle b, x - a \rangle| \le \varepsilon \|x - a\| \quad \text{whenever } \|x - a\| < \delta. \]
Therefore, $\|x - a\| < \delta$ implies
\[ |f(x) - f(a)| \le |f(x) - f(a) - \langle b, x - a \rangle| + |\langle b, x - a \rangle| \le \varepsilon \|x - a\| + \|b\|\,\|x - a\| = (\varepsilon + \|b\|)\|x - a\|. \]
This shows that $\lim_{x \to a} |f(x) - f(a)| = 0$. In other words, $f$ is continuous at $a$.

Theorem 2.2. Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^k$ and let $a$ be a point in $U$. Suppose that all the partial derivatives $D_1 f, \dots, D_k f$ exist at every point of $U$. Moreover, suppose that each $D_j f$ is continuous at $a$ for $j = 1, \dots, k$. Then $f$ is differentiable at $a$.

Proof. Since $U$ is open, there exists some $r > 0$ such that $B_r(a) \subset U$. For $a = (a_1, \dots, a_k)$ and $x = (x_1, \dots, x_k)$ in $B_r(a)$ we have
\[ x - a = \sum_{i=1}^{k} (x_i - a_i) e_i, \]
where $e_i$ is the $i$th coordinate unit vector. Let $u_0$ be the zero vector in $\mathbb{R}^k$ and, for $j = 1, \dots, k$, let $u_j := \sum_{i=1}^{j} (x_i - a_i) e_i$. Then we have
\[ f(x) - f(a) = f(a + u_k) - f(a + u_0) = \sum_{j=1}^{k} \bigl[ f(a + u_j) - f(a + u_{j-1}) \bigr]. \]
Applying the mean value theorem to the term $f(a + u_j) - f(a + u_{j-1})$, we obtain
\[ f(a + u_j) - f(a + u_{j-1}) = f\bigl(a + u_{j-1} + (x_j - a_j) e_j\bigr) - f(a + u_{j-1}) = D_j f(a + u_{j-1} + \xi_j e_j)(x_j - a_j), \]
where $\xi_j$ is a real number between $0$ and $x_j - a_j$. Thus we deduce that
\[ f(x) - f(a) - \langle \nabla f(a), x - a \rangle = \sum_{j=1}^{k} \bigl[ D_j f(a + u_{j-1} + \xi_j e_j) - D_j f(a) \bigr](x_j - a_j). \]
By our assumption, for each $j = 1, \dots, k$, $D_j f$ is continuous at $a$. Hence, for given $\varepsilon > 0$, there exists $\delta \in (0, r)$ such that $\|y - a\| < \delta$ implies $y \in U$ and $|D_j f(y) - D_j f(a)| < \varepsilon$. Suppose that $\|x - a\| < \delta$. Then $\|u_{j-1} + \xi_j e_j\| < \delta$ for $j = 1, \dots, k$. Hence,
\[ |D_j f(a + u_{j-1} + \xi_j e_j) - D_j f(a)| < \varepsilon. \]
Consequently,
\[ |f(x) - f(a) - \langle \nabla f(a), x - a \rangle| \le \sum_{j=1}^{k} |D_j f(a + u_{j-1} + \xi_j e_j) - D_j f(a)|\,|x_j - a_j| \le k \varepsilon \|x - a\|. \]
This shows that $f$ is differentiable at $a$.

Example 2. Let $f$ be the function on $\mathbb{R}^2$ given by
\[ f(x_1, x_2) := e^{-x_1^2 - x_2^2}, \quad (x_1, x_2) \in \mathbb{R}^2. \]
We have
\[ D_1 f(x_1, x_2) = -2x_1 e^{-x_1^2 - x_2^2} \quad \text{and} \quad D_2 f(x_1, x_2) = -2x_2 e^{-x_1^2 - x_2^2}, \quad (x_1, x_2) \in \mathbb{R}^2. \]
Evidently, $D_1 f$ and $D_2 f$ are continuous on $\mathbb{R}^2$. By Theorem 2.2 we conclude that $f$ is differentiable at every point of $\mathbb{R}^2$.

Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^k$. If all the partial derivatives $D_j f$ ($j = 1, \dots, k$) are continuous on $U$, then $f$ is said to be continuously differentiable on $U$. The collection of all continuously differentiable, real-valued functions on $U$ will be denoted $C^1(U)$.

In the following theorem, we discuss the sum, product, and quotient of differentiable functions and give formulas for their gradients.

Theorem 2.3. Let $f$ and $g$ be real-valued functions on an open set $U$ in $\mathbb{R}^k$ and let $a$ be a point in $U$. Suppose that $f$ and $g$ are differentiable at $a$. Then $f + g$ and $fg$ are differentiable at $a$ and
\[ \nabla(f + g)(a) = \nabla f(a) + \nabla g(a) \quad \text{and} \quad \nabla(fg)(a) = (\nabla f)(a) g(a) + f(a) (\nabla g)(a). \]
If, in addition, $g(a) \ne 0$, then
\[ \nabla\Bigl(\frac{f}{g}\Bigr)(a) = \frac{(\nabla f)(a) g(a) - f(a) \nabla g(a)}{[g(a)]^2}. \]

Proof. Since $f$ and $g$ are differentiable at $a$, for given $\varepsilon > 0$, there exists $\delta > 0$ such that $\|x - a\| < \delta$ implies $x \in U$,
\[ |f(x) - f(a) - \langle \nabla f(a), x - a \rangle| \le \varepsilon \|x - a\| \quad \text{and} \quad |g(x) - g(a) - \langle \nabla g(a), x - a \rangle| \le \varepsilon \|x - a\|. \]
It follows that
\[ |(f + g)(x) - (f + g)(a) - \langle \nabla f(a) + \nabla g(a), x - a \rangle| \le 2\varepsilon \|x - a\|. \]
This shows that $f + g$ is differentiable at $a$ and $\nabla(f + g)(a) = \nabla f(a) + \nabla g(a)$.

Let us consider the product $fg$. We have
\[ (fg)(x) - (fg)(a) = [f(x) - f(a)] g(a) + f(a) [g(x) - g(a)] + [f(x) - f(a)][g(x) - g(a)]. \]
There exists a constant $M$ such that $|f(x) - f(a)| \le M\|x - a\|$ and $|g(x) - g(a)| \le M\|x - a\|$ whenever $\|x - a\| < \delta$. We may choose $\delta$ such that $0 < \delta < \varepsilon / M^2$. Hence, $\|x - a\| < \delta$ implies
\[ \bigl| [f(x) - f(a)][g(x) - g(a)] \bigr| \le M^2 \|x - a\|^2 \le \varepsilon \|x - a\|. \]
Consequently,
\[ \bigl| (fg)(x) - (fg)(a) - \langle (\nabla f)(a) g(a) + f(a) (\nabla g)(a), x - a \rangle \bigr| \le [\,|f(a)| + |g(a)| + 1\,]\, \varepsilon \|x - a\|. \]
This shows that $fg$ is differentiable at $a$ and $\nabla(fg)(a) = (\nabla f)(a) g(a) + f(a) (\nabla g)(a)$.
Finally, let us deal with the quotient $f/g$. By our assumption, $g(a) \ne 0$. Since $g$ is continuous at $a$, we may choose $\delta > 0$ sufficiently small such that $\|x - a\| < \delta$ implies $|g(x)| \ge |g(a)|/2$. We have
\[ \frac{f}{g}(x) - \frac{f}{g}(a) = \frac{[f(x) - f(a)] g(a) - f(a) [g(x) - g(a)]}{g(x) g(a)}. \]
An argument similar to the above proof for the product $fg$ shows that $f/g$ is differentiable at $a$ and the desired formula for $\nabla(f/g)$ is valid.

§3. Higher-Order Partial Derivatives

Let $f$ be a real-valued function on an open set $U$ in $\mathbb{R}^2$. Suppose that the first-order partial derivatives $D_1 f$ and $D_2 f$ exist on $U$. As functions, $D_1 f$ and $D_2 f$ may have partial derivatives in their own right. These second-order partial derivatives are denoted by
\[ D_{11} f := D_1(D_1 f), \quad D_{12} f := D_1(D_2 f), \quad D_{21} f := D_2(D_1 f), \quad D_{22} f := D_2(D_2 f). \]

Example 1. Let $f$ be the function on $\mathbb{R}^2$ given by
\[ f(x_1, x_2) := e^{x_1} \sin(x_1 - x_2), \quad (x_1, x_2) \in \mathbb{R}^2. \]
We have
\[ D_1 f(x_1, x_2) = e^{x_1} \sin(x_1 - x_2) + e^{x_1} \cos(x_1 - x_2) \quad \text{and} \quad D_2 f(x_1, x_2) = -e^{x_1} \cos(x_1 - x_2). \]
It follows that
\[ D_{11} f = D_1(D_1 f) = 2e^{x_1} \cos(x_1 - x_2), \]
\[ D_{12} f = D_1(D_2 f) = -e^{x_1} \cos(x_1 - x_2) + e^{x_1} \sin(x_1 - x_2), \]
\[ D_{21} f = D_2(D_1 f) = -e^{x_1} \cos(x_1 - x_2) + e^{x_1} \sin(x_1 - x_2), \]
\[ D_{22} f = D_2(D_2 f) = -e^{x_1} \sin(x_1 - x_2). \]
In this example we have $D_{12} f = D_{21} f$.

Example 2. Let $f$ be the function on $\mathbb{R}^2$ given by $f(0, 0) := 0$ and
\[ f(x_1, x_2) := x_1 x_2 \, \frac{x_1^2 - x_2^2}{x_1^2 + x_2^2}, \quad (x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}. \]
It is easily seen that $D_1 f(0, 0) = D_2 f(0, 0) = 0$. For $(x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}$, we have
\[ D_1 f(x_1, x_2) = x_2 \Bigl[ \frac{x_1^2 - x_2^2}{x_1^2 + x_2^2} + \frac{4x_1^2 x_2^2}{(x_1^2 + x_2^2)^2} \Bigr], \]
\[ D_2 f(x_1, x_2) = x_1 \Bigl[ \frac{x_1^2 - x_2^2}{x_1^2 + x_2^2} - \frac{4x_1^2 x_2^2}{(x_1^2 + x_2^2)^2} \Bigr]. \]
It follows that
\[ D_{12} f(0, 0) = \lim_{x_1 \to 0} \frac{D_2 f(x_1, 0) - D_2 f(0, 0)}{x_1} = 1, \]
\[ D_{21} f(0, 0) = \lim_{x_2 \to 0} \frac{D_1 f(0, x_2) - D_1 f(0, 0)}{x_2} = -1. \]
Consequently, $D_{12} f(0, 0) \ne D_{21} f(0, 0)$.

Theorem 3.1. Let $f$ be a real-valued function on an open set $U$ in $\mathbb{R}^2$. Suppose that $D_1 f$, $D_2 f$, $D_{12} f$ and $D_{21} f$ exist in $U$.
If $D_{12} f$ and $D_{21} f$ are continuous at a point $a = (a_1, a_2)$ in $U$, then $D_{12} f(a) = D_{21} f(a)$.

Proof. Since $a = (a_1, a_2) \in U$, there exists some $\delta > 0$ such that $B_\delta(a) \subset U$. Suppose that $0 < |h_1| < \delta/2$ and $0 < |h_2| < \delta/2$. Consider the expression
\[ w := \frac{f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) - f(a_1, a_2 + h_2) + f(a_1, a_2)}{h_1 h_2}. \]
It can be rewritten as
\[ w = \frac{\varphi(a_1 + h_1) - \varphi(a_1)}{h_1}, \]
where $\varphi(x_1) := [f(x_1, a_2 + h_2) - f(x_1, a_2)]/h_2$ for $|x_1 - a_1| < \delta/2$. Clearly,
\[ \varphi'(x_1) = \frac{D_1 f(x_1, a_2 + h_2) - D_1 f(x_1, a_2)}{h_2}, \quad a_1 - \delta/2 < x_1 < a_1 + \delta/2. \]
By the mean value theorem we have
\[ w = \frac{\varphi(a_1 + h_1) - \varphi(a_1)}{h_1} = \varphi'(a_1 + \xi_1 h_1), \]
where $0 < \xi_1 < 1$. It follows that
\[ w = \varphi'(a_1 + \xi_1 h_1) = \frac{D_1 f(a_1 + \xi_1 h_1, a_2 + h_2) - D_1 f(a_1 + \xi_1 h_1, a_2)}{h_2}. \]
Applying the mean value theorem again to the above quotient, we get
\[ w = D_2(D_1 f)(a_1 + \xi_1 h_1, a_2 + \xi_2 h_2), \]
where $0 < \xi_2 < 1$. For the same reason, there exist $\eta_1, \eta_2 \in (0, 1)$ such that
\[ w = D_1(D_2 f)(a_1 + \eta_1 h_1, a_2 + \eta_2 h_2). \]
Consequently,
\[ D_{21} f(a_1 + \xi_1 h_1, a_2 + \xi_2 h_2) = D_{12} f(a_1 + \eta_1 h_1, a_2 + \eta_2 h_2). \]
Letting $h_1 \to 0$ and $h_2 \to 0$ in the above equality and using the continuity of $D_{12} f$ and $D_{21} f$ at $a$, we obtain $D_{21} f(a) = D_{12} f(a)$.

More generally, suppose that $f$ is a real-valued function defined on an open set $U$ in $\mathbb{R}^k$. For $j_1, j_2, \dots, j_m \in \{1, \dots, k\}$, we define
\[ D_{j_1 j_2 \cdots j_m} f := D_{j_1}(D_{j_2 \cdots j_m} f). \]
Each of the partial derivatives $D_{j_1 j_2 \cdots j_m} f$ is called an $m$th-order partial derivative of $f$. If all the $m$th-order partial derivatives of $f$ exist and are continuous on $U$, then $f$ is said to be $m$-times continuously differentiable on $U$. The collection of all $m$-times continuously differentiable functions on $U$ is denoted by $C^m(U)$. The following extension of Theorem 3.1 can be proved in an analogous way.

Theorem 3.2. Let $f$ be an $m$-times continuously differentiable function on an open set $U$ in $\mathbb{R}^k$. If $j_1, j_2, \dots, j_m \in \{1, \dots, k\}$, and if $\sigma$ is a permutation of $\{1, 2, \dots, m\}$, then
\[ D_{j_1 j_2 \cdots j_m} f(a) = D_{j_{\sigma(1)} j_{\sigma(2)} \cdots j_{\sigma(m)}} f(a) \quad \text{for all } a \in U. \]
Note that a permutation of a set $X$ is a one-to-one mapping from $X$ onto $X$.

§4. Taylor's Theorem

In this section we extend the mean value theorem and Taylor's theorem for functions of a single variable to functions of several variables.

Let $x$ and $y$ be two distinct points in $\mathbb{R}^k$. The closed line segment joining $x$ and $y$ is the set
\[ [x, y] := \{(1 - t)x + ty : 0 \le t \le 1\}. \]
The open line segment joining $x$ and $y$ is the set
\[ (x, y) := \{(1 - t)x + ty : 0 < t < 1\}. \]

Theorem 4.1. Let $f$ be a real-valued, differentiable function on an open set $U$ in $\mathbb{R}^k$. Let $x$ and $y$ be two distinct points in $U$ such that the closed line segment $[x, y]$ is contained in $U$. Then there exists a point $z$ in the open line segment $(x, y)$ such that
\[ f(y) - f(x) = \langle \nabla f(z), y - x \rangle. \]

Proof. Let $g(t) := f((1 - t)x + ty)$, $0 \le t \le 1$. Then $g$ is a continuous function on $[0, 1]$. We claim that $g$ is differentiable on $(0, 1)$. Suppose that $0 < t_0 < 1$. For $0 < t < 1$ we have
\[ g(t) - g(t_0) = f((1 - t)x + ty) - f((1 - t_0)x + t_0 y). \]
Note that $[(1 - t)x + ty] - [(1 - t_0)x + t_0 y] = (t - t_0)(y - x)$. Since $f$ is differentiable on $U$, we have
\[ \lim_{t \to t_0} \frac{f((1 - t)x + ty) - f((1 - t_0)x + t_0 y) - \langle \nabla f(z_0), (t - t_0)(y - x) \rangle}{t - t_0} = 0, \]
where $z_0 := (1 - t_0)x + t_0 y$. Consequently,
\[ g'(t_0) = \lim_{t \to t_0} \frac{g(t) - g(t_0)}{t - t_0} = \langle \nabla f(z_0), y - x \rangle. \]
By the mean value theorem, there exists some $c \in (0, 1)$ such that $g(1) - g(0) = g'(c)(1 - 0)$. It follows that
\[ f(y) - f(x) = \langle \nabla f(z), y - x \rangle, \]
where $z := (1 - c)x + cy$ lies in the open line segment $(x, y)$.

Theorem 4.2. Let $f$ be a real-valued, $(n + 1)$-times continuously differentiable function on an open set $U$ in $\mathbb{R}^2$. Let $a = (a_1, a_2)$ and $x = (x_1, x_2)$ be two distinct points in $U$ such that the closed line segment $[a, x]$ is contained in $U$.
Then there exists a point $z$ in the open line segment $(a, x)$ such that
\[ f(x) = T_n(x) + R_n(x), \]
where
\[ T_n(x) = \sum_{j=0}^{n} \frac{\bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2\bigr]^j f(a)}{j!} \]
and
\[ R_n(x) = \frac{\bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2\bigr]^{n+1} f(z)}{(n + 1)!}. \]

Proof. Since the closed line segment $[a, x]$ is contained in $U$, there exists some $\delta > 0$ such that $a + t(x - a) \in U$ for all $t \in (-\delta, 1 + \delta)$. Let
\[ g(t) := f(a + t(x - a)), \quad -\delta < t < 1 + \delta. \]
In light of the proof of Theorem 4.1, we see that $g$ is differentiable on $(-\delta, 1 + \delta)$ and
\[ g'(t) = \langle \nabla f(a + t(x - a)), x - a \rangle, \quad -\delta < t < 1 + \delta. \]
It follows that
\[ g'(t) = \bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2\bigr] f(a + t(x - a)). \]
Since $f$ is $(n + 1)$-times continuously differentiable on $U$, by using an induction argument we derive that
\[ g^{(j)}(t) = \bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2\bigr]^j f(a + t(x - a)), \quad j = 0, 1, \dots, n + 1. \]
Applying Taylor's theorem to the function $g$, we obtain
\[ g(1) = \sum_{j=0}^{n} \frac{g^{(j)}(0)}{j!} + \frac{g^{(n+1)}(c)}{(n + 1)!}, \]
where $0 < c < 1$. But $g(1) = f(x)$,
\[ g^{(j)}(0) = \bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2\bigr]^j f(a), \]
and
\[ g^{(n+1)}(c) = \bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2\bigr]^{n+1} f(z), \]
where $z := a + c(x - a) \in (a, x)$. This completes the proof.

In the above theorem, $T_n$ is a polynomial of (total) degree at most $n$. It is called the Taylor polynomial of $f$ of order $n$ at $a$. For $n = 2$, the Taylor polynomial $T_2$ has the following form:
\[ T_2(x_1, x_2) = f(a_1, a_2) + D_1 f(a_1, a_2)(x_1 - a_1) + D_2 f(a_1, a_2)(x_2 - a_2) \]
\[ \qquad + \frac{D_{11} f(a_1, a_2)}{2}(x_1 - a_1)^2 + D_{12} f(a_1, a_2)(x_1 - a_1)(x_2 - a_2) + \frac{D_{22} f(a_1, a_2)}{2}(x_2 - a_2)^2. \]

Example. Let $f$ be the function on $\mathbb{R}^2$ given by
\[ f(x_1, x_2) := e^{x_1} \sin(x_1 - x_2), \quad (x_1, x_2) \in \mathbb{R}^2. \]
Let us find the Taylor polynomial of $f$ of order 2 at $(0, 0)$. We have $f(0, 0) = 0$. Moreover, using $D_1 f$, $D_2 f$, $D_{11} f$, $D_{12} f$, and $D_{22} f$ calculated in Example 1 of Section 3, we obtain
\[ D_1 f(0, 0) = 1, \quad D_2 f(0, 0) = -1, \quad D_{11} f(0, 0) = 2, \quad D_{12} f(0, 0) = -1, \quad D_{22} f(0, 0) = 0. \]
Hence,
\[ T_2(x_1, x_2) = x_1 - x_2 + x_1^2 - x_1 x_2. \]
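The Taylor polynomial found in this example can be checked numerically. The sketch below compares $f$ with $T_2$ at a sample point and at half that point; since the remainder $R_2$ involves third-order derivatives, halving the point should reduce the error by roughly a factor of 8. The function names and the sample points are illustrative assumptions.

```python
import math

def f(x1, x2):
    """The function from the example: f(x1, x2) = exp(x1) * sin(x1 - x2)."""
    return math.exp(x1) * math.sin(x1 - x2)

def taylor2(x1, x2):
    """Taylor polynomial of order 2 of f at (0, 0), as computed in the text."""
    return x1 - x2 + x1**2 - x1 * x2

def error(x1, x2):
    """Absolute difference |f - T_2|, that is, |R_2| at the given point."""
    return abs(f(x1, x2) - taylor2(x1, x2))

# Hypothetical sample point and the point halfway to the origin.
err_full = error(0.1, 0.05)
err_half = error(0.05, 0.025)

# If the remainder is of third order, this ratio should be close to 8.
print(err_full, err_half, err_full / err_half)
```

A ratio near 8 is consistent with a third-order remainder, but it is only a sanity check at two points, not a proof of the error bound.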
The argument in Theorem 4.2 can be extended to give a proof of the following Taylor theorem for functions on $\mathbb{R}^k$.

Theorem 4.3. Let $f$ be a real-valued, $(n + 1)$-times continuously differentiable function on an open set $U$ in $\mathbb{R}^k$. Let $a = (a_1, a_2, \dots, a_k)$ and $x = (x_1, x_2, \dots, x_k)$ be two distinct points in $U$ such that the closed line segment $[a, x]$ is contained in $U$. Then there exists a point $z$ in the open line segment $(a, x)$ such that
\[ f(x) = T_n(x) + R_n(x), \]
where
\[ T_n(x) = \sum_{j=0}^{n} \frac{1}{j!} \bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2 + \cdots + (x_k - a_k) D_k\bigr]^j f(a) \]
and
\[ R_n(x) = \frac{1}{(n + 1)!} \bigl[(x_1 - a_1) D_1 + (x_2 - a_2) D_2 + \cdots + (x_k - a_k) D_k\bigr]^{n+1} f(z). \]

§5. Maxima and Minima of Differentiable Functions

In this section we employ the theory of differentiation developed so far to study maxima and minima of differentiable functions.

Let $f$ be a real-valued function defined on a subset $U$ of $\mathbb{R}^k$ and let $a$ be a point in $U$. We say that $f$ has a global minimum (maximum) at $a$ if $f(x) \ge f(a)$ ($f(x) \le f(a)$) for all $x \in U$. Suppose that $a$ is an interior point of $U$. We say that $f$ has a local minimum (maximum) at $a$ if there exists some $r > 0$ such that $B_r(a) \subset U$ and $f(x) \ge f(a)$ ($f(x) \le f(a)$) for all $x \in B_r(a)$. We say that $f$ has a strict local minimum (maximum) at $a$ if there exists some $r > 0$ such that $B_r(a) \subset U$ and $f(x) > f(a)$ ($f(x) < f(a)$) for all $x \in B_r(a) \setminus \{a\}$.

We first give a necessary condition for local maxima or minima.

Theorem 5.1. Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^k$ and let $a$ be a point in $U$. If $f$ has a local minimum or maximum at $a$, and if all the first-order partial derivatives $D_1 f, \dots, D_k f$ exist at $a$, then $D_j f(a) = 0$ for $j = 1, \dots, k$.

Proof. For $j = 1, \dots, k$, recall that $e_j$ is the $j$th coordinate unit vector. There exists some $\delta > 0$ such that $a + te_j \in U$ for all $t \in (-\delta, \delta)$. Let $g_j(t) := f(a + te_j)$, $t \in (-\delta, \delta)$. Since $D_j f$ exists at $a$, we have
\[ g_j'(0) = \lim_{t \to 0} \frac{g_j(t) - g_j(0)}{t} = \lim_{t \to 0} \frac{f(a + te_j) - f(a)}{t} = D_j f(a). \]
By our assumption, $f$ has a local minimum or maximum at $a$. Consequently, $g_j$ has a local minimum or maximum at $0$. Hence $g_j'(0) = 0$. This shows that $D_j f(a) = 0$ for $j = 1, \dots, k$.

Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^k$. A point $a \in U$ is called a critical point of $f$ if $D_1 f, \dots, D_k f$ exist at $a$ and $\nabla f(a) = 0$.

Example 5.1. Let $f$ be the quadratic polynomial given by
\[ f(x_1, x_2) = p x_1^2 + 2q x_1 x_2 + r x_2^2, \quad (x_1, x_2) \in \mathbb{R}^2, \]
where $p, q, r$ are real numbers such that
\[ \Delta := \begin{vmatrix} p & q \\ q & r \end{vmatrix} = pr - q^2 \ne 0. \]
We have
\[ D_1 f(x_1, x_2) = 2p x_1 + 2q x_2 \quad \text{and} \quad D_2 f(x_1, x_2) = 2q x_1 + 2r x_2. \]
Since $pr - q^2 \ne 0$, $D_1 f(x_1, x_2) = 0$ and $D_2 f(x_1, x_2) = 0$ imply $x_1 = 0$ and $x_2 = 0$. Thus, $(0, 0)$ is the only critical point of $f$.

Suppose that $\Delta = pr - q^2 > 0$. Then we have $p \ne 0$ and
\[ f(x_1, x_2) = p x_1^2 + 2q x_1 x_2 + r x_2^2 = \frac{1}{p} \bigl[ (p x_1 + q x_2)^2 + (pr - q^2) x_2^2 \bigr]. \]
Consequently, if $\Delta > 0$ and $p > 0$, then $f(x_1, x_2) > 0$ for all $(x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}$; hence $f$ has a strict global minimum at $(0, 0)$. If $\Delta > 0$ and $p < 0$, then $f(x_1, x_2) < 0$ for all $(x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}$; hence $f$ has a strict global maximum at $(0, 0)$.

Now suppose that $\Delta = pr - q^2 < 0$. For $v \in \mathbb{R}$, we have
\[ f(v x_2, x_2) = (p v^2 + 2q v + r) x_2^2, \quad x_2 \in \mathbb{R}. \]
Since $q^2 - pr > 0$, the polynomial $p v^2 + 2q v + r$ has two distinct real roots if $p \ne 0$; if $p = 0$, then $q \ne 0$ and the polynomial is a nonconstant linear function of $v$. In either case it takes both positive and negative values. Hence, there exist $s, t \in \mathbb{R}$ such that $p s^2 + 2q s + r > 0$ and $p t^2 + 2q t + r < 0$. It follows that $f(s x_2, x_2) > 0$ and $f(t x_2, x_2) < 0$ for any $x_2 \in \mathbb{R} \setminus \{0\}$. Thus, $f$ has neither a local maximum nor a local minimum at $(0, 0)$.

Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^k$. A point $a$ in $U$ is called a saddle point of $f$ if it is a critical point of $f$ and if for any $r > 0$, there exist $y, z \in B_r(a) \cap U$ such that $f(y) < f(a) < f(z)$. Thus, in the above example, $(0, 0)$ is a saddle point of $f$, provided $\Delta = pr - q^2 < 0$.
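The case analysis of Example 5.1 can be sketched in code. The function `classify_quadratic` below applies the sign tests on $\Delta = pr - q^2$ and $p$ exactly as in the example, and `range_on_unit_circle` samples the quadratic form on the unit circle as a rough numerical cross-check. Both function names and the sample coefficients are illustrative assumptions.

```python
import math

def classify_quadratic(p, q, r):
    """Classify the critical point (0, 0) of p*x1^2 + 2*q*x1*x2 + r*x2^2."""
    delta = p * r - q * q
    if delta == 0:
        # Example 5.1 assumes Delta != 0, so this case is not covered.
        raise ValueError("Delta = 0 is excluded in Example 5.1")
    if delta > 0:
        # Delta > 0 forces p != 0, so the sign of p decides the case.
        return "strict global minimum" if p > 0 else "strict global maximum"
    return "saddle point"

def quadratic(p, q, r, x1, x2):
    """Value of the quadratic polynomial at (x1, x2)."""
    return p * x1**2 + 2 * q * x1 * x2 + r * x2**2

def range_on_unit_circle(p, q, r, samples=360):
    """Smallest and largest sampled values of the form on the unit circle."""
    values = [
        quadratic(p, q, r,
                  math.cos(2 * math.pi * i / samples),
                  math.sin(2 * math.pi * i / samples))
        for i in range(samples)
    ]
    return min(values), max(values)

print(classify_quadratic(2, 1, 1), range_on_unit_circle(2, 1, 1))
print(classify_quadratic(1, 2, 1), range_on_unit_circle(1, 2, 1))
```

Because the polynomial is homogeneous of degree 2, its sign along any ray from the origin equals its sign on the unit circle, which is why sampling the circle is a reasonable cross-check here.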
The following theorem gives some sufficient conditions for a function defined on an open subset of $\mathbb{R}^2$ to have a local maximum or local minimum.

Theorem 5.2. Let $f$ be a real-valued function defined on an open set $U$ in $\mathbb{R}^2$ with continuous first and second partial derivatives. Suppose that a point $a = (a_1, a_2)$ in $U$ is a critical point of $f$. For $y \in U$, let
\[ \Delta f(y) := \begin{vmatrix} D_{11} f(y) & D_{12} f(y) \\ D_{21} f(y) & D_{22} f(y) \end{vmatrix}. \]
Then the following statements are true.
(a) If $\Delta f(a) > 0$ and $D_{11} f(a) > 0$, then $f$ has a strict local minimum at $a$.
(b) If $\Delta f(a) > 0$ and $D_{11} f(a) < 0$, then $f$ has a strict local maximum at $a$.
(c) If $\Delta f(a) < 0$, then $a$ is a saddle point of $f$.

Proof. Since $a$ is a critical point of $f$, we have $D_1 f(a) = 0$ and $D_2 f(a) = 0$. Let $x = (x_1, x_2)$ be a point of $U$ distinct from $a$ such that $[a, x] \subset U$. By Taylor's theorem, there exists a point $z \in (a, x)$ such that
\[ f(x) - f(a) = \frac{1}{2} \bigl[ D_{11} f(z)(x_1 - a_1)^2 + 2 D_{12} f(z)(x_1 - a_1)(x_2 - a_2) + D_{22} f(z)(x_2 - a_2)^2 \bigr]. \]
Since $D_{11} f$, $D_{12} f$, and $D_{22} f$ are continuous, $\Delta f(y)$ is a continuous function of $y$.

Suppose that $\Delta f(a) > 0$ and $D_{11} f(a) > 0$. Then there exists some $r > 0$ such that $B_r(a) \subset U$ and that $\Delta f(y) > 0$ and $D_{11} f(y) > 0$ for all $y \in B_r(a)$. For $x \in B_r(a) \setminus \{a\}$, we have $z \in (a, x) \subset B_r(a)$. Hence, $\Delta f(z) > 0$ and $D_{11} f(z) > 0$. Consequently, by Example 5.1 applied with $p = D_{11} f(z)$, $q = D_{12} f(z)$, and $r = D_{22} f(z)$,
\[ f(x) - f(a) = \frac{1}{2} \bigl[ D_{11} f(z)(x_1 - a_1)^2 + 2 D_{12} f(z)(x_1 - a_1)(x_2 - a_2) + D_{22} f(z)(x_2 - a_2)^2 \bigr] > 0 \]
for all $x = (x_1, x_2) \in B_r(a) \setminus \{a\}$. This shows that $f$ has a strict local minimum at $a$.

Suppose that $\Delta f(a) > 0$ and $D_{11} f(a) < 0$. A similar argument shows that there exists some $r > 0$ such that $B_r(a) \subset U$ and $f(x) - f(a) < 0$ for all $x = (x_1, x_2) \in B_r(a) \setminus \{a\}$. In other words, $f$ has a strict local maximum at $a$.

Suppose that $\Delta f(a) < 0$. Then there exists some $r > 0$ such that $B_r(a) \subset U$ and that $\Delta f(y) < 0$ for all $y \in B_r(a)$. Consequently, for any $\delta \in (0, r)$, there exist $y, z \in B_\delta(a) \setminus \{a\}$ such that $f(y) - f(a) < 0 < f(z) - f(a)$. Therefore, $a$ is a saddle point of $f$.

Example 5.2.
Let us look for the global and local minimizers and maximizers (if any) of the function
\[ f(x_1, x_2) := x_1^3 - 12 x_1 x_2 + 8 x_2^3, \quad (x_1, x_2) \in \mathbb{R}^2. \]
In this case, we have $f(x_1, 0) = x_1^3$. Consequently,
\[ \lim_{x_1 \to +\infty} f(x_1, 0) = +\infty \quad \text{and} \quad \lim_{x_1 \to -\infty} f(x_1, 0) = -\infty. \]
Hence, $f$ has neither global minimizers nor global maximizers. In order to find local minimizers and maximizers, we calculate $D_1 f$ and $D_2 f$:
\[ D_1 f(x_1, x_2) = 3 x_1^2 - 12 x_2 \quad \text{and} \quad D_2 f(x_1, x_2) = -12 x_1 + 24 x_2^2, \quad (x_1, x_2) \in \mathbb{R}^2. \]
Solving the system of equations $D_1 f(x_1, x_2) = 0$ and $D_2 f(x_1, x_2) = 0$, we find the critical points $(2, 1)$ and $(0, 0)$ of $f$. Moreover,
\[ D_{11} f(x_1, x_2) = 6 x_1, \quad D_{12} f(x_1, x_2) = -12, \quad D_{22} f(x_1, x_2) = 48 x_2. \]
It follows that
\[ \Delta f(x_1, x_2) = \begin{vmatrix} 6 x_1 & -12 \\ -12 & 48 x_2 \end{vmatrix} = 288 x_1 x_2 - 144. \]
At the point $(2, 1)$ we have $\Delta f(2, 1) > 0$ and $D_{11} f(2, 1) > 0$. Hence, $f$ has a local minimum at $(2, 1)$ and $f(2, 1) = -8$. At the point $(0, 0)$ we have $\Delta f(0, 0) < 0$. Therefore, $(0, 0)$ is a saddle point of $f$.

In the rest of this section we consider coercive functions and global minimizers. A continuous function $f$ defined on $\mathbb{R}^k$ is said to be coercive if
\[ \lim_{\|x\| \to \infty} f(x) = +\infty. \]
This means that for any real number $M > 0$ there exists some $R$ such that $\|x\| \ge R$ implies $f(x) \ge M$.

Theorem 5.3. Let $f$ be a continuous function defined on $\mathbb{R}^k$. If $f$ is coercive, then $f$ has at least one global minimizer. If, in addition, all the first-order partial derivatives of $f$ exist on $\mathbb{R}^k$, then these global minimizers can be found among the critical points of $f$.

Proof. Since $f$ is coercive, there exists some $r > 0$ such that $\|x\| > r$ implies $f(x) > f(0)$. Let $E$ be the set $\{x \in \mathbb{R}^k : \|x\| \le r\}$. Then $E$ is a bounded closed set. Since $f$ is continuous on $E$, there exists some point $a \in E$ such that $f(a) \le f(x)$ for all $x \in E$. In particular, $f(a) \le f(0)$. But $f(x) > f(0)$ for $x \in \mathbb{R}^k \setminus E$. Hence, $f(a) \le f(x)$ for all $x \in \mathbb{R}^k$. This completes the proof of the first statement.
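The second statement holds because, by Theorem 5.1, every global minimizer of $f$ is a critical point of $f$.

Example 5.3. Consider the function
\[ f(x_1, x_2) := x_1^4 - 4 x_1 x_2 + x_2^4, \quad (x_1, x_2) \in \mathbb{R}^2. \]
We first show that $f$ is coercive. Indeed, for $(x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}$ we have
\[ f(x_1, x_2) = (x_1^4 + x_2^4) \Bigl[ 1 - \frac{4 x_1 x_2}{x_1^4 + x_2^4} \Bigr]. \]
We observe that
\[ \frac{|4 x_1 x_2|}{x_1^4 + x_2^4} \le \frac{2(x_1^2 + x_2^2)}{x_1^4 + x_2^4} = \frac{2(x_1^2 + x_2^2)^2}{(x_1^4 + x_2^4)(x_1^2 + x_2^2)} \le \frac{4(x_1^4 + x_2^4)}{(x_1^4 + x_2^4)(x_1^2 + x_2^2)} = \frac{4}{x_1^2 + x_2^2}. \]
The right-hand side tends to $0$ as $\|x\| \to +\infty$, while $x_1^4 + x_2^4 \to +\infty$. This shows that $\lim_{\|x\| \to +\infty} f(x) = +\infty$.

In order to find critical points of $f$, we calculate $D_1 f$ and $D_2 f$:
\[ D_1 f(x_1, x_2) = 4 x_1^3 - 4 x_2 \quad \text{and} \quad D_2 f(x_1, x_2) = -4 x_1 + 4 x_2^3. \]
Solving the system of equations $D_1 f(x_1, x_2) = 0$ and $D_2 f(x_1, x_2) = 0$, we find the critical points $(1, 1)$, $(-1, -1)$ and $(0, 0)$ of $f$. But $f(1, 1) = -2$, $f(-1, -1) = -2$ and $f(0, 0) = 0$. Since $f$ is coercive, Theorem 5.3 guarantees that a global minimizer exists among these critical points. Therefore, $f$ has a global minimum at $(1, 1)$ and $(-1, -1)$.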
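The conclusion of Example 5.3 can be sanity-checked numerically. The sketch below verifies that the gradient vanishes at the three critical points and compares the value $-2$ with the smallest value of $f$ on a finite grid. The grid bounds, the step size, and the function names are illustrative assumptions; a grid search is only a plausibility check and does not replace the coercivity argument.

```python
def f(x1, x2):
    """The function from Example 5.3."""
    return x1**4 - 4 * x1 * x2 + x2**4

def grad(x1, x2):
    """Gradient (D_1 f, D_2 f) as computed in the text."""
    return (4 * x1**3 - 4 * x2, -4 * x1 + 4 * x2**3)

critical_points = [(1.0, 1.0), (-1.0, -1.0), (0.0, 0.0)]

# The gradient should vanish at each critical point.
gradients = [grad(*point) for point in critical_points]

# Values of f at the critical points.
values = [f(*point) for point in critical_points]

# Smallest value of f on a grid over [-3, 3] x [-3, 3] with step 1/20.
# The grid contains (1, 1) and (-1, -1) exactly.
grid_min = min(
    f(i / 20, j / 20)
    for i in range(-60, 61)
    for j in range(-60, 61)
)

print(gradients, values, grid_min)
```

On this grid no sampled value falls below $f(1, 1) = f(-1, -1) = -2$, which agrees with the conclusion of the example.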