2. Inverse function and implicit function theorem
The Inverse Function Theorem gives a condition under which a function can be locally
inverted. This theorem and its corollary the Implicit Function Theorem are fundamental
results in multivariable calculus. First we state the Inverse Function Theorem. Here, we
assume k ≥ 1.
Theorem 2.1. Let F be a C k map from an open neighborhood Ω of p0 ∈ Rn to Rn , with
q0 = F (p0 ). Suppose the derivative DF (p0 ) is invertible. Then there is a neighborhood
U of p0 and a neighborhood V of q0 such that F : U → V is one-to-one and onto, and
F −1 : V → U is a C k map. (One says F : U → V is a diffeomorphism.)
First we show that F is one-to-one on a neighborhood of p0 , under these hypotheses.
In fact, we establish the following result, of interest in its own right.
Proposition 2.2. Assume Ω ⊂ Rn is open and convex, and let f : Ω → Rn be C 1 .
Assume that the symmetric part of Df (u) is positive-definite, for each u ∈ Ω. Then f is
one-to-one on Ω.
Proof. Take distinct points u1 , u2 ∈ Ω, and set u2 − u1 = w. Consider ϕ : [0, 1] → R,
given by
ϕ(t) = w · f (u1 + tw).
Then ϕ′ (t) = w · Df (u1 + tw)w > 0 for t ∈ [0, 1], so ϕ(0) ≠ ϕ(1). But ϕ(0) = w · f (u1 )
and ϕ(1) = w · f (u2 ), so f (u1 ) ≠ f (u2 ).
To continue the proof of Theorem 2.1, let us set
(2.1) f (u) = A(F (p0 + u) − q0 ),  A = DF (p0 )−1 .
Then f (0) = 0 and Df (0) = I, the identity matrix. We show that f maps a neighborhood
of 0 one-to-one and onto some neighborhood of 0. Proposition 2.2 applies, so we know f
is one-to-one on some neighborhood O of 0. We next show that the image of O under f
contains a neighborhood of 0.
We can write
(2.2)
f (u) = u + R(u),
R(0) = 0, DR(0) = 0.
For v small, we want to solve
(2.3)
f (u) = v.
This is equivalent to u + R(u) = v, so let
(2.4)
Tv (u) = v − R(u).
Thus solving (2.3) is equivalent to solving
(2.5)
Tv (u) = u.
We look for a fixed point u = K(v) = f −1 (v). Also, we want to prove that DK(0) = I,
i.e., that K(v) = v + r(v) with r(v) = o(‖v‖). (The “little oh” notation is defined in
Exercise 8 of §1.) If we succeed in doing this, it follows easily that, for general x close to
q0 , G(x) = F −1 (x) is defined, and DG(q0 ) = DF (p0 )−1 . A parallel argument, with p0
replaced by nearby u and x = F (u), gives
(2.6) DG(x) = (DF (G(x)))−1 .
Then a simple inductive argument shows that G is C k if F is C k . See Exercise 6 at the
end of this section, for an approach to this last argument.
A tool we will use to solve (2.5) is the following general result, known as the Contraction
Mapping Principle.
Theorem 2.3. Let X be a complete metric space, and let T : X → X satisfy
(2.7) dist(T x, T y) ≤ r dist(x, y),
for some r < 1. (We say T is a contraction.) Then T has a unique fixed point x. For any
y0 ∈ X, T^k y0 → x as k → ∞.
Proof. Pick y0 ∈ X and let yk = T^k y0 . Then dist(yk , yk+1 ) ≤ r^k dist(y0 , y1 ), so
(2.8) dist(yk , yk+m ) ≤ dist(yk , yk+1 ) + · · · + dist(yk+m−1 , yk+m )
             ≤ (r^k + · · · + r^{k+m−1} ) dist(y0 , y1 )
             ≤ r^k (1 − r)−1 dist(y0 , y1 ).
It follows that (yk ) is a Cauchy sequence, so it converges; yk → x. Since T yk = yk+1 and
T is continuous, it follows that T x = x, i.e., x is a fixed point. Uniqueness of the fixed
point is clear from the estimate dist(T x, T x′ ) ≤ r dist(x, x′ ), which implies dist(x, x′ ) = 0
if x and x′ are fixed points. This proves Theorem 2.3.
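The proof is constructive: iterating T from any starting point converges to the fixed point. Here is a minimal sketch in Python; the choice T(x) = cos x, a contraction on [0, 1] since |sin x| ≤ sin 1 < 1 there, is an illustrative assumption, not from the text.

```python
import math

def fixed_point(T, y0, tol=1e-12, max_iter=1000):
    """Iterate y_{k+1} = T(y_k) until successive iterates agree to tol."""
    y = y0
    for _ in range(max_iter):
        y_next = T(y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("no convergence")

# T(x) = cos x is a contraction on [0, 1] (|T'| <= sin 1 < 1),
# so the iteration converges to the unique fixed point x = cos x.
x = fixed_point(math.cos, 0.5)
```

The estimate (2.8) with r = sin 1 ≈ 0.84 predicts geometric convergence, which matches the observed iteration count.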
Returning to the problem of solving (2.5), we pick b > 0 such that
(2.9) B2b (0) ⊂ O and ‖w‖ ≤ 2b ⇒ ‖DR(w)‖ ≤ 1/2.
We claim that
(2.10) ‖v‖ ≤ b =⇒ Tv : Xv → Xv ,
where
(2.11) Xv = {u ∈ O : ‖u − v‖ ≤ Av },  Av = sup{‖R(w)‖ : ‖w‖ ≤ 2‖v‖}.
See Fig. 2.1. To prove that (2.10) holds, note that
(2.12) Tv (u) − v = −R(u),
so we need to show that
(2.13) ‖v‖ ≤ b, u ∈ Xv =⇒ ‖R(u)‖ ≤ Av .
Indeed,
(2.14) u ∈ Xv =⇒ ‖u‖ ≤ ‖v‖ + Av ,
and, by (2.9) and (2.11),
(2.15) ‖v‖ ≤ b =⇒ Av ≤ ‖v‖,
and we have
(2.16) u ∈ Xv =⇒ ‖u‖ ≤ 2‖v‖ (hence, parenthetically, Xv ⊂ B2b (0))
             =⇒ ‖R(u)‖ ≤ sup{‖R(w)‖ : ‖w‖ ≤ 2‖v‖} = Av .
This establishes (2.10).
Now, given u1 , u2 ∈ Xv and ‖v‖ ≤ b,
(2.17) ‖Tv (u1 ) − Tv (u2 )‖ = ‖R(u2 ) − R(u1 )‖ ≤ (1/2)‖u1 − u2 ‖,
the last inequality by (2.9), so the map (2.10) is a contraction, if ‖v‖ ≤ b. Hence there
exists a unique fixed point u = K(v) ∈ Xv . Also, since u ∈ Xv ,
(2.18) ‖K(v) − v‖ ≤ Av = o(‖v‖),
so DK(0) = I, and the Inverse Function Theorem is proved.
Thus if DF is invertible on the domain of F, F is a local diffeomorphism. Stronger
hypotheses are needed to guarantee that F is a global diffeomorphism onto its range.
Proposition 2.2 provides one tool for doing this. Here is a slight strengthening.
Corollary 2.4. Assume Ω ⊂ Rn is open and convex, and that F : Ω → Rn is C 1 . Assume
there exist n × n matrices A and B such that the symmetric part of A DF (u) B is positive
definite for each u ∈ Ω. Then F maps Ω diffeomorphically onto its image, an open set in
Rn .
Proof. Exercise.
We make a comment about solving the equation F (x) = y, under the hypotheses of
Theorem 2.1, when y is close to q0 . The fact that the fixed point u = K(v) of Tv in (2.5)
is obtained as the limit of the iterates of Tv applied to v implies that, when y is
sufficiently close to q0 , the sequence (xk ), defined by
(2.19) x0 = p0 ,  xk+1 = xk + DF (p0 )−1 (y − F (xk )),
converges to the solution x. An analysis of the rate at which xk → x, and F (xk ) → y, can
be made by applying F to (2.19), yielding
F (xk+1 ) = F (xk + DF (p0 )−1 (y − F (xk )))
         = F (xk ) + DF (xk )DF (p0 )−1 (y − F (xk )) + R(xk , DF (p0 )−1 (y − F (xk ))),
and hence
(2.20) y − F (xk+1 ) = (I − DF (xk )DF (p0 )−1 )(y − F (xk )) + R̃(xk , y − F (xk )),
with ‖R̃(xk , y − F (xk ))‖ = o(‖y − F (xk )‖).
It turns out that replacing p0 by xk in (2.19) yields a faster approximation. This method,
known as Newton’s method, is described in the exercises.
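As a concrete illustration of the iteration (2.19), here is a hypothetical scalar example, not from the text: take F (x) = x2 and p0 = 1, so the derivative DF (p0 ) = 2 stays frozen throughout the iteration.

```python
def solve_frozen_newton(F, dF_p0, p0, y, tol=1e-12, max_iter=200):
    """Iteration (2.19): x_{k+1} = x_k + DF(p0)^{-1} (y - F(x_k)),
    with the derivative frozen at the initial point p0."""
    x = p0
    for _ in range(max_iter):
        step = (y - F(x)) / dF_p0
        x += step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence")

# F(x) = x^2, p0 = 1, DF(p0) = 2; solve x^2 = 1.2 near x = 1.
root = solve_frozen_newton(lambda x: x * x, 2.0, 1.0, 1.2)
```

Consistent with (2.20), the error contracts linearly by roughly the factor |1 − F′(x)/F′(p0 )| per step, whereas Newton's method (updating the derivative) converges quadratically.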
We consider some examples of maps to which Theorem 2.1 applies. First, we look at
(2.21) F : (0, ∞) × R −→ R2 ,  F (r, θ) = (x(r, θ), y(r, θ))t = (r cos θ, r sin θ)t .
Then
(2.22) DF (r, θ) = [ ∂r x  ∂θ x ; ∂r y  ∂θ y ] = [ cos θ  −r sin θ ; sin θ  r cos θ ]
(rows separated by “;”), so
(2.23) det DF (r, θ) = r cos2 θ + r sin2 θ = r.
Hence DF (r, θ) is invertible for all (r, θ) ∈ (0, ∞) × R. Theorem 2.1 implies that each
(r0 , θ0 ) ∈ (0, ∞) × R has a neighborhood U and (x0 , y0 ) = (r0 cos θ0 , r0 sin θ0 ) has a neighborhood V such that F is a smooth diffeomorphism of U onto V . In this simple situation,
it can be verified directly that
(2.24)
F : (0, ∞) × (−π, π) −→ R2 \ {(x, 0) : x ≤ 0}
is a smooth diffeomorphism.
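This diffeomorphism is easy to check numerically; the sketch below uses math.atan2 to play the role of the explicit inverse angle on the slit plane R2 \ {(x, 0) : x ≤ 0}.

```python
import math

def F(r, theta):
    """Polar-coordinate map (2.21)."""
    return (r * math.cos(theta), r * math.sin(theta))

def F_inv(x, y):
    """Explicit inverse on R^2 minus the ray {(x, 0): x <= 0},
    taking values in (0, inf) x (-pi, pi)."""
    return (math.hypot(x, y), math.atan2(y, x))

# Round trip through F and its inverse recovers (r, theta).
r, theta = 2.0, 0.75
x, y = F(r, theta)
r2, theta2 = F_inv(x, y)
```

The restriction θ ∈ (−π, π) matches the range of atan2; crossing the removed ray is exactly where this explicit inverse jumps.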
Note that DF (1, 0) = I in (2.22). Let us check the domain of applicability of Proposition
2.2. The symmetric part of DF (r, θ) in (2.22) is
(2.25) S(r, θ) = [ cos θ  (1/2)(1 − r) sin θ ; (1/2)(1 − r) sin θ  r cos θ ].
By Proposition 1.7, this is positive definite if and only if
(2.26) cos θ > 0,
and
(2.27) det S(r, θ) = r cos2 θ − (1/4)(1 − r)2 sin2 θ > 0.
Now (2.26) holds for θ ∈ (−π/2, π/2), but not on all of (−π, π). Furthermore, (2.27) holds
for (r, θ) in a neighborhood of (r0 , θ0 ) = (1, 0), but it does not hold on all of (0, ∞) ×
(−π/2, π/2). We see that Proposition 2.2 does not capture the full force of (2.24).
We move on to another example. As in §1, we can extend Theorem 2.1, replacing Rn by a
finite dimensional real vector space, isometric to a Euclidean space, such as M (n, R) ≈ R^{n²}.
As an example, consider
(2.28) Exp : M (n, R) −→ M (n, R),  Exp(X) = e^X = Σ_{k=0}^{∞} (1/k!) X^k .
Since
(2.29) Exp(Y ) = I + Y + (1/2)Y 2 + · · · ,
we have
(2.30)
D Exp(0)Y = Y,
∀ Y ∈ M (n, R),
so D Exp(0) is invertible. Then Theorem 2.1 implies that there exist a neighborhood U of
0 ∈ M (n, R) and a neighborhood V of I ∈ M (n, R) such that Exp : U → V is a smooth
diffeomorphism.
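A quick numerical sanity check of the series (2.28), sketched for 2 × 2 matrices; the rotation-generator example is an assumption chosen because Exp of [ 0  −t ; t  0 ] is the rotation by angle t.

```python
import math

def mat_mul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(X, terms=30):
    """Partial sum of Exp(X) = sum_k X^k / k!  (2.28), for 2x2 X."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity, the k = 0 term
    power = [[1.0, 0.0], [0.0, 1.0]]
    fact = 1.0
    for k in range(1, terms):
        power = mat_mul(power, X)
        fact *= k
        result = [[result[i][j] + power[i][j] / fact for j in range(2)]
                  for i in range(2)]
    return result

# Exp of the rotation generator [[0, -t], [t, 0]] is rotation by t.
t = 0.3
E = mat_exp([[0.0, -t], [t, 0.0]])
```

For moderate t the truncated series converges rapidly; 30 terms already match cos t and sin t to machine precision.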
To motivate the next result, we consider the following example. Take a > 0 and consider
the equation
(2.31)
x2 + y 2 = a2 ,
F (x, y) = x2 + y 2 .
Note that
(2.32)
DF (x, y) = (2x 2y),
Dx F (x, y) = 2x,
Dy F (x, y) = 2y.
The equation (2.31) defines y “implicitly” as a smooth function of x if |x| < a. Explicitly,
(2.33) |x| < a =⇒ y = √(a2 − x2 ).
Similarly, (2.31) defines x implicitly as a smooth function of y if |y| < a; explicitly,
(2.34) |y| < a =⇒ x = √(a2 − y 2 ).
Now, given x0 ∈ R, a > 0, there exists y0 ∈ R such that F (x0 , y0 ) = a2 if and only if
|x0 | ≤ a. Furthermore,
(2.35) given F (x0 , y0 ) = a2 ,  Dy F (x0 , y0 ) ≠ 0 ⇔ |x0 | < a.
Similarly, given y0 ∈ R, there exists x0 such that F (x0 , y0 ) = a2 if and only if |y0 | ≤ a,
and
(2.36) given F (x0 , y0 ) = a2 ,  Dx F (x0 , y0 ) ≠ 0 ⇔ |y0 | < a.
Note also that, whenever (x, y) ∈ R2 and F (x, y) = a2 > 0,
(2.37) DF (x, y) ≠ 0,
so either Dx F (x, y) ≠ 0 or Dy F (x, y) ≠ 0. As seen above, whenever (x0 , y0 ) ∈ R2 and
F (x0 , y0 ) = a2 > 0, we can solve F (x, y) = a2 either for y as a smooth function of x, for
x near x0 , or for x as a smooth function of y, for y near y0 .
We move from these observations to the next result, the Implicit Function Theorem.
Theorem 2.5. Suppose U is a neighborhood of x0 ∈ Rm , V a neighborhood of y0 ∈ Rℓ ,
and we have a C k map
(2.38) F : U × V −→ Rℓ ,  F (x0 , y0 ) = u0 .
Assume Dy F (x0 , y0 ) is invertible. Then the equation F (x, y) = u0 defines y = g(x, u0 ) for
x near x0 (satisfying g(x0 , u0 ) = y0 ) with g a C k map.
To prove this, consider H : U × V → Rm × Rℓ defined by
(2.39) H(x, y) = (x, F (x, y)).
(Actually, regard (x, y) and (x, F (x, y)) as column vectors.) We have
(2.40) DH = [ I  0 ; Dx F  Dy F ].
Thus DH(x0 , y0 ) is invertible, so J = H −1 exists, on a neighborhood of (x0 , u0 ), and is
C k , by the Inverse Function Theorem. It is clear that J(x, u0 ) has the form
(2.41) J(x, u0 ) = (x, g(x, u0 )),
and g is the desired map.
Here is an example where Theorem 2.5 applies. Set
(2.42) F : R4 −→ R2 ,  F (u, v, x, y) = (x(u2 + v 2 ), xu + yv)t .
We have
(2.43) F (2, 0, 1, 1) = (4, 2)t .
Note that
(2.44) Du,v F (u, v, x, y) = [ 2xu  2xv ; x  y ],
hence
(2.45) Du,v F (2, 0, 1, 1) = [ 4  0 ; 1  1 ]
is invertible, so Theorem 2.5 (with (u, v) in place of y and (x, y) in place of x) implies
that the equation
(2.46) F (u, v, x, y) = (4, 2)t
defines smooth functions
(2.47) u = u(x, y),  v = v(x, y),
for (x, y) near (x0 , y0 ) = (1, 1), satisfying (2.46), with (u(1, 1), v(1, 1)) = (2, 0).
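The implicitly defined functions (2.47) can be computed numerically, e.g. by Newton's method in the variables (u, v) with (x, y) held fixed. A sketch follows; the starting point (2.1, 0.1) is an assumption, chosen near the known solution (2, 0).

```python
def newton_uv(x, y, target, u, v, tol=1e-12, max_iter=50):
    """Solve F(u, v, x, y) = target for (u, v) by Newton's method,
    where F = (x(u^2 + v^2), xu + yv) as in (2.42)."""
    for _ in range(max_iter):
        f1 = x * (u * u + v * v) - target[0]
        f2 = x * u + y * v - target[1]
        # Jacobian (2.44): [[2xu, 2xv], [x, y]], inverted by hand.
        a, b, c, d = 2 * x * u, 2 * x * v, x, y
        det = a * d - b * c
        du = (d * f1 - b * f2) / det
        dv = (-c * f1 + a * f2) / det
        u, v = u - du, v - dv
        if abs(du) + abs(dv) < tol:
            return u, v
    raise RuntimeError("no convergence")

# Near (x, y) = (1, 1), Theorem 2.5 gives (u, v) = (u(x,y), v(x,y)).
u, v = newton_uv(1.0, 1.0, (4.0, 2.0), 2.1, 0.1)
```

Invertibility of (2.45) is exactly what keeps det in the loop bounded away from zero near the solution.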
Let us next focus on the case ℓ = 1 of Theorem 2.5, so
(2.48) z = (x, y) ∈ Rn ,  x ∈ Rn−1 ,  y ∈ R,  F (z) ∈ R.
Then Dy F = ∂y F . If F (x0 , y0 ) = u0 , Theorem 2.5 says that if
(2.49) ∂y F (x0 , y0 ) ≠ 0,
then one can solve
(2.50) F (x, y) = u0 for y = g(x, u0 ),
for x near x0 (satisfying g(x0 , u0 ) = y0 ), with g a C k function. This phenomenon was
illustrated in (2.31)–(2.35). To generalize the observations involving (2.36)–(2.37), we
note the following. Set (x, y) = z = (z1 , . . . , zn ), z0 = (x0 , y0 ). The condition (2.49) is
that ∂zn F (z0 ) ≠ 0. Now a simple permutation of variables allows us to assume
(2.51) ∂zj F (z0 ) ≠ 0,  F (z0 ) = u0 ,
and deduce that one can solve
(2.52) F (z) = u0 ,  for zj = g(z1 , . . . , zj−1 , zj+1 , . . . , zn ).
Let us record this result, changing notation and replacing z by x.
Proposition 2.6. Let Ω be a neighborhood of x0 ∈ Rn . Assume we have a C k function
(2.53) F : Ω −→ R,  F (x0 ) = u0 ,
and assume
(2.54) DF (x0 ) ≠ 0,  i.e., (∂1 F (x0 ), . . . , ∂n F (x0 )) ≠ 0.
Then there exists j ∈ {1, . . . , n} such that one can solve F (x) = u0 for
(2.55) xj = g(x1 , . . . , xj−1 , xj+1 , . . . , xn ),
with (x10 , . . . , xj0 , . . . , xn0 ) = x0 , for a C k function g.
Remark. For F : Ω → R, it is common to denote DF (x) by ∇F (x),
(2.56)
∇F (x) = (∂1 F (x), . . . , ∂n F (x)).
Here is an example to which Proposition 2.6 applies. Using the notation (x, y) =
(x1 , x2 ), set
(2.57)
F : R2 −→ R,
F (x, y) = x2 + y 2 − x.
Then
(2.58)
∇F (x, y) = (2x − 1, 2y),
which vanishes if and only if x = 1/2, y = 0. Hence Proposition 2.6 applies if and only if
(x0 , y0 ) ≠ (1/2, 0).
Let us give an example involving a real valued function on M (n, R), namely
(2.59)
det : M (n, R) −→ R.
As indicated in Exercise 11 of §1 (the first exercise set), if det X ≠ 0,
(2.60) D det(X)Y = (det X) Tr(X −1 Y ),
so
(2.61) det X ≠ 0 =⇒ D det(X) ≠ 0.
We deduce that, if
(2.62) X0 ∈ M (n, R),  det X0 = a ≠ 0,
then, writing
(2.63) X = (xjk )1≤j,k≤n ,
there exist µ, ν ∈ {1, . . . , n} such that the equation
(2.64) det X = a
has a smooth solution of the form
(2.65) xµν = g(xαβ : (α, β) ≠ (µ, ν)),
such that, if the argument of g consists of the matrix entries of X0 other than the µ, ν
entry, then the left side of (2.65) is the µ, ν entry of X0 .
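Formula (2.60) is easy to test numerically for 2 × 2 matrices, by comparing a finite-difference directional derivative of det with (det X) Tr(X −1 Y ); the matrices X and Y below are arbitrary test choices, not from the text.

```python
def det2(X):
    """Determinant of a 2x2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2x2 matrix."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def trace2(X):
    return X[0][0] + X[1][1]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Finite-difference directional derivative of det at X in direction Y,
# compared with the formula (2.60): (det X) Tr(X^{-1} Y).
X = [[2.0, 1.0], [0.5, 3.0]]
Y = [[0.3, -0.2], [1.1, 0.7]]
h = 1e-6
Xh = [[X[i][j] + h * Y[i][j] for j in range(2)] for i in range(2)]
fd = (det2(Xh) - det2(X)) / h
formula = det2(X) * trace2(mul2(inv2(X), Y))
```

The finite difference agrees with the formula to about O(h), as expected for a one-sided difference.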
Let us return to the setting of Theorem 2.5, with ℓ not necessarily equal to 1. In notation
parallel to that of (2.51), we assume F is a C k map,
(2.66) F : Ω −→ Rℓ ,  F (z0 ) = u0 ,
where Ω is a neighborhood of z0 in Rn . We assume
(2.67) DF (z0 ) : Rn −→ Rℓ is surjective.
Then, upon reordering the variables z = (z1 , . . . , zn ), we can write z = (x, y), x =
(x1 , . . . , xn−ℓ ), y = (y1 , . . . , yℓ ), such that Dy F (z0 ) is invertible, and Theorem 2.5 applies.
Thus (for this reordering of variables), we have a C k solution to
(2.68) F (x, y) = u0 ,  y = g(x, u0 ),
satisfying y0 = g(x0 , u0 ), z0 = (x0 , y0 ).
To give one example to which this result applies, we take another look at F : R4 → R2
in (2.42). We have
(2.69) DF (u, v, x, y) = [ 2xu  2xv  u2 + v 2  0 ; x  y  u  v ].
The reader is invited to determine for which (u, v, x, y) ∈ R4 the matrix on the right side
of (2.69) has rank 2.
Here is another example, involving a map defined on M (n, R). Set
(2.70) F : M (n, R) −→ R2 ,  F (X) = (det X, Tr X)t .
Parallel to (2.60), if det X ≠ 0, Y ∈ M (n, R),
(2.71) DF (X)Y = ((det X) Tr(X −1 Y ), Tr Y )t .
Hence, given det X ≠ 0, DF (X) : M (n, R) → R2 is surjective if and only if
(2.72) L : M (n, R) → R2 ,  LY = (Tr(X −1 Y ), Tr Y )t
is surjective. This is seen to be the case if and only if X is not a scalar multiple of the
identity I ∈ M (n, R).
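One way to see the last claim concretely is a sketch restricted to diagonal 2 × 2 matrices, a simplifying assumption: for X = diag(d1 , d2 ) and Y = diag(y1 , y2 ), LY is a linear system in (y1 , y2 ) whose determinant is 1/d1 − 1/d2 , nonzero exactly when d1 ≠ d2 .

```python
def diag_L_det(d1, d2):
    """For X = diag(d1, d2) and Y = diag(y1, y2), the map
    Y -> (Tr(X^{-1} Y), Tr Y) = (y1/d1 + y2/d2, y1 + y2)
    has matrix [[1/d1, 1/d2], [1, 1]]; return its determinant."""
    return 1.0 / d1 - 1.0 / d2

# Nonzero determinant (surjectivity on this diagonal slice) iff d1 != d2,
# i.e. iff X is not a scalar multiple of I among diagonal matrices.
```

This is only the diagonal slice of the full statement, but it already shows why scalar multiples of I are precisely the degenerate case.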
Exercises
1. Suppose F : U → Rn is a C 2 map, p ∈ U , U open in Rn , and DF (p) is invertible. With
q = F (p), define a map N on a neighborhood of p by
(2.73) N (x) = x + DF (x)−1 (q − F (x)).
Show that there exists ε > 0 and C < ∞ such that, for 0 ≤ r < ε,
‖x − p‖ ≤ r =⇒ ‖N (x) − p‖ ≤ C r2 .
Conclude that, if kx1 − pk ≤ r with r < min(ε, 1/2C), then xj+1 = N (xj ) defines a
sequence converging very rapidly to p. This is the basis of Newton’s method, for solving
F (p) = q for p.
Hint. Apply F to both sides of (2.73).
2. Applying Newton’s method to f (x) = 1/x, show that you get a fast approximation to
division using only addition and multiplication.
Hint. Carry out the calculation of N (x) in this case and notice a “miracle.”
3. Identify R2n with Cn via z = x + iy, as in Exercise 4 of §1. Let U ⊂ R2n be open,
F : U → R2n be C 1 . Assume p ∈ U, DF (p) invertible. If F −1 : V → U is given as in
Theorem 2.1, show that F −1 is holomorphic provided F is.
4. Let O ⊂ Rn be open. We say a function f ∈ C ∞ (O) is real analytic provided that, for
each x0 ∈ O, we have a convergent power series expansion
(2.74) f (x) = Σ_{α≥0} (1/α!) f (α) (x0 )(x − x0 )α ,
valid in a neighborhood of x0 . Show that we can let x be complex in (2.74), and obtain
an extension of f to a neighborhood of O in Cn . Show that the extended function is
holomorphic, i.e., satisfies the Cauchy-Riemann equations.
Remark. It can be shown that, conversely, any holomorphic function has a power series
expansion. See §10. For the next exercise, assume this as known.
5. Let O ⊂ Rn be open, p ∈ O, f : O → Rn be real analytic, with Df (p) invertible. Take
f −1 : V → U as in Theorem 2.1. Show f −1 is real analytic.
Hint. Consider a holomorphic extension F : Ω → Cn of f and apply Exercise 3.
6. Use (2.6) to show that if a C 1 diffeomorphism has a C 1 inverse G, and if actually F is
C k , then also G is C k .
Hint. Use induction on k. Write (2.6) as
G(x) = Φ ◦ F ◦ G(x),
with Φ(X) = X −1 , as on Exercises 3 and 13 of §1, G(x) = DG(x), F(x) = DF (x). Apply
Exercise 12 of §1 to show that, in general
G, F, Φ ∈ C ` =⇒ G ∈ C ` .
Deduce that if one is given F ∈ C k and one knows that G ∈ C k−1 , then this result applies
to give G = DG ∈ C k−1 , hence G ∈ C k .
7. Show that there is a neighborhood O of (1, 0) ∈ R2 and there are functions u, v, w ∈
C 1 (O) (u = u(x, y), etc.) satisfying the equations
u3 + v 3 − xw3 = 0,
u2 + yw2 + v = 1,
xu + yvw = 1,
for (x, y) ∈ O, and satisfying
u(1, 0) = 1,
v(1, 0) = 0,
w(1, 0) = 1.
Hint. Define F : R5 → R3 by
F (u, v, w, x, y) = (u3 + v 3 − xw3 , u2 + yw2 + v, xu + yvw)t .
Then F (1, 0, 1, 1, 0) = (0, 1, 1)t . Evaluate the 3 × 3 matrix Du,v,w F (1, 0, 1, 1, 0). Compare
(2.42)–(2.47).
8. Consider F : M (n, R) → M (n, R), given by F (X) = X 2 . Show that F is a diffeomorphism
of a neighborhood of the identity matrix I onto a neighborhood of I. Show that F is not
a diffeomorphism of a neighborhood of
[ 1  0 ; 0  −1 ]
onto a neighborhood of I (in case n = 2).
9. Prove Corollary 2.4.