Zeroing in on the Implicit Function Theorem

The Implicit Function Theorem for several equations in several unknowns.
So where do we stand?

Solving a system of m equations in n unknowns is equivalent to finding the “zeros” of a vector-valued function from ℝⁿ → ℝᵐ. When n > m, such a system will “typically” have infinitely many solutions. In “nice” cases, the solution will be a function from ℝⁿ⁻ᵐ → ℝᵐ.

So where do we stand?

Solving linear systems is easy; we are
interested in the non-linear case.

We will ordinarily not be able to solve a
system “globally.” But under reasonable
conditions, there will be a solution function in
a (possibly small!) region around a single
“known” solution.
For a Single Equation in Two Unknowns
In a small box around (a,b), we hope to find g(x). And we can, provided that the y-partials at (a,b) are continuous and non-zero.

[Figure: the contour line f(x,y) = 0 through (a,b), locally the graph of y = g(x).]

Make the box around (a,b) small enough so that all of the y-partials in this box are “close” to D.
Start with a point (a,b) on the contour line y = g(x), where the partial with respect to y is not 0:

    D = ∂f/∂y (a,b) ≠ 0
Fix x. For this x, construct the function

    x(y) = y − f(x,y)/D

Iterate x(y). What happens?
A Whole Family of Quasi-Newton Functions

Remember that (a,b) is a “known” solution, and

    D = ∂f/∂y (a,b)

There are a whole bunch of these functions:

    x(y) = y − f(x,y)/D

There is one for each x value.
A Whole Family of Quasi-Newton Functions

Remember that (a,b) is a “known” solution to f(x,y) = 0, and

    D = ∂f/∂y (a,b)

If x = a, then we have

    a(y) = y − f(a,y)/D

the “best of all possible worlds” Leibniz method!

If x ≠ a, then we have

    x(y) = y − f(x,y)/D

the “pretty good” Quasi-Newton method!
What are the issues?

We have to make sure the iterated maps converge. How do we do this?
“Pretty good” quasi-Newton’s method:

    Q(x) = x − f(x)/D

If we choose D near enough to f′(p) so that |Q′(p)| < ½, iterated maps will converge in a neighborhood of p.

How does that work in our case?

    x(y) = y − f(x,y)/D

If we make sure that the partials of f with respect to y are all near enough to D to guarantee that |x′(y)| < ½ for all (x,y) in the square, then the iterated maps will converge.
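The iteration above is easy to try numerically. Here is a minimal sketch, assuming the hypothetical example f(x,y) = x² + y² − 1 (the unit circle) with known solution (a,b) = (0,1), so that D = ∂f/∂y at (a,b) is 2:

```python
# "Pretty good" quasi-Newton iteration for one equation in two unknowns,
# using the assumed example f(x, y) = x^2 + y^2 - 1 with known solution
# (a, b) = (0, 1).  D is df/dy frozen at the known solution.

def f(x, y):
    return x**2 + y**2 - 1.0

D = 2.0  # df/dy evaluated at (0, 1)

def solve_for_y(x, y0=1.0, steps=50):
    """Iterate y <- y - f(x, y)/D with x held fixed."""
    y = y0
    for _ in range(steps):
        y = y - f(x, y) / D
    return y

# For x near a = 0 the iterates converge to g(x) = sqrt(1 - x^2).
x = 0.3
y = solve_for_y(x)
print(y)             # close to sqrt(0.91) ≈ 0.95394
print(abs(f(x, y)))  # residual is essentially zero
```

Note that D stays fixed at the “known” solution; only f is re-evaluated, which is exactly what makes this “quasi”-Newton rather than Newton.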
The Role of the Derivative

If we have a function f: [a,b] → ℝ which is differentiable on (a,b) and continuous on [a,b], the Mean Value Theorem says that there is some c in [a,b] such that

    f′(c) = (f(b) − f(a)) / (b − a).

If |f′(x)| < k < 1 for all x in [a,b], then

    k > |f′(c)| = |f(b) − f(a)| / |b − a|   ⟹   |f(b) − f(a)| < k |b − a|.

Distances contract by a factor of k!

Likewise, if |f′(x)| > k > 1 for all x in [a,b], then

    k < |f′(c)| = |f(b) − f(a)| / |b − a|   ⟹   |f(b) − f(a)| > k |b − a|.

Distances expand by a factor of k!
The Role of the Derivative

If p is a fixed point of f in [a,b], and |f′(x)| < k < 1 for all x in [a,b], then

    |f(x) − f(p)| < k |x − p|.

But f(p) = p, so

    |f(x) − p| < k |x − p|.

f moves other points closer to p by a factor of k! Each time we re-apply f, the next iterate is even closer to p (an attracting fixed point!):

    |f²(x) − p| = |f(f(x)) − f(p)| < k |f(x) − p| < k² |x − p|.

Likewise, if |f′(x)| > k > 1 for all x in [a,b], f moves other points farther and farther away from p. (Repelling fixed point!)
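The attracting case is easy to see in action. A standard illustrative choice (not from the slides) is f(x) = cos(x), which has a fixed point p ≈ 0.739085 and satisfies |f′(x)| = |sin x| < 1 on an interval around it:

```python
# Attracting fixed point demo: f(x) = cos(x) has cos(p) = p near 0.739,
# and |f'(x)| = |sin x| <= sin(1) < 1 on [0.5, 1.0], so f contracts
# distances to p on that interval.
import math

def f(x):
    return math.cos(x)

# Locate the fixed point by straightforward iteration.
p = 1.0
for _ in range(200):
    p = f(p)
print(p)  # ≈ 0.739085

# Check the contraction estimate |f(x) - p| <= k |x - p| along an orbit.
k = math.sin(1.0)  # bound on |f'| over [0.5, 1.0]
x = 0.5
for _ in range(5):
    new_x = f(x)
    assert abs(new_x - p) <= k * abs(x - p)  # each step moves closer to p
    x = new_x
```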
What are the issues?
Not
so obvious . . . we have to work a bit to
make sure we get the right fixed point. (We
don’t leave the box!)
Systems of Equations and Differentiability

A vector-valued function f of several variables is differentiable at a vector p if in some small neighborhood of p, the graph of f “looks a lot like” an affine function. That is, there is a linear transformation Df(p) so that for all z “close” to p,

    f(z) ≈ Df(p)(z − p) + f(p)

where Df(p) is the Jacobian matrix made up of all the partial derivatives of f.

Suppose that f(p) = 0. When can we solve f(z) = 0 in some neighborhood of p?
Systems of Equations and Differentiability

For z “close” to p,

    f(z) ≈ Df(p)(z − p) + f(p)

Since f(p) = 0,

    f(z) ≈ Df(p)(z − p)

When can we solve f(z) = 0 in some neighborhood of p?

Answer: Whenever we can solve

    Df(p)(z − p) = 0

(Because the existence of a solution depends on the geometry of the function.)
When can we do it?

    Df(p)(z − p) = 0

is a linear system. We understand linear systems extremely well. For instance, with coefficient matrix

    ⎡ a11  a12  a13  a14  a15 ⎤
    ⎢ a21  a22  a23  a24  a25 ⎥
    ⎣ a31  a32  a33  a34  a35 ⎦

the system is

    a11 y1 + a12 y2 + a13 y3 + a14 x1 + a15 x2 = 0
    a21 y1 + a22 y2 + a23 y3 + a24 x1 + a25 x2 = 0
    a31 y1 + a32 y2 + a33 y3 + a34 x1 + a35 x2 = 0

We can solve for the variables y1, y2, and y3 in terms of x1 and x2 if and only if the sub-matrix

    ⎡ a11  a12  a13 ⎤
    ⎢ a21  a22  a23 ⎥
    ⎣ a31  a32  a33 ⎦

is invertible.
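This observation is one line of linear algebra in code. Below is a sketch with arbitrary example coefficients: writing the 3×5 matrix as A = [B | C], with B the 3×3 block of y-coefficients and C the 3×2 block of x-coefficients, the system B·y + C·x = 0 gives y = −B⁻¹C·x whenever B is invertible:

```python
# Solving the y-variables of a linear system in terms of the x-variables.
# The entries of A are made-up example values with an invertible y-block.
import numpy as np

A = np.array([[2.0, 0.0, 1.0, 1.0, -1.0],
              [0.0, 3.0, 0.0, 2.0,  0.0],
              [1.0, 1.0, 4.0, 0.0,  1.0]])
B, C = A[:, :3], A[:, 3:]   # y-block (3x3) and x-block (3x2)

def y_of_x(x):
    """Solve B y + C x = 0 for y; valid because det(B) != 0."""
    return np.linalg.solve(B, -C @ x)

x = np.array([1.0, 2.0])
y = y_of_x(x)
print(np.allclose(A @ np.concatenate([y, x]), 0))  # True: (y, x) solves the system
```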
A bit of notation

To simplify things, we will write our vector-valued function as F : ℝⁿ⁺ᵐ → ℝⁿ.

• We will write our “input” variables as concatenations of n-vectors and m-vectors, e.g. (y,x) = (y1, y2, . . ., yn, x1, x2, . . ., xm).

• So when we solve F(y,x) = 0, we will be solving for the y-variables in terms of the x-variables.
The System Df(p)(z − p) = 0

Written out in full, the system is

    ⎡ ∂F1/∂y1(p) ⋯ ∂F1/∂yn(p)   ∂F1/∂x1(p) ⋯ ∂F1/∂xm(p) ⎤ ⎡ y1 − p1   ⎤
    ⎢ ∂F2/∂y1(p) ⋯ ∂F2/∂yn(p)   ∂F2/∂x1(p) ⋯ ∂F2/∂xm(p) ⎥ ⎢     ⋮     ⎥
    ⎢      ⋮             ⋮              ⋮             ⋮     ⎥ ⎢ yn − pn   ⎥ = 0
    ⎣ ∂Fn/∂y1(p) ⋯ ∂Fn/∂yn(p)   ∂Fn/∂x1(p) ⋯ ∂Fn/∂xm(p) ⎦ ⎢ x1 − pn+1 ⎥
                                                             ⎢     ⋮     ⎥
                                                             ⎣ xm − pn+m ⎦

We can solve for the variables y1, y2, . . ., yn in terms of x1, x2, . . ., xm if and only if the sub-matrix of y-partials is invertible.
So the Condition We Need is. . .

. . . the invertibility of the matrix

        ⎡ ∂F1/∂y1(p)  ∂F1/∂y2(p)  ⋯  ∂F1/∂yn(p) ⎤
    D = ⎢ ∂F2/∂y1(p)  ∂F2/∂y2(p)  ⋯  ∂F2/∂yn(p) ⎥
        ⎢      ⋮            ⋮               ⋮      ⎥
        ⎣ ∂Fn/∂y1(p)  ∂Fn/∂y2(p)  ⋯  ∂Fn/∂yn(p) ⎦

We will refer to the inverse of the matrix D as D⁻¹.
The Implicit Function Theorem

Suppose that
• F : ℝⁿ⁺ᵐ → ℝⁿ has continuous partials,
• b ∈ ℝⁿ and a ∈ ℝᵐ with F(b,a) = 0, and
• the n × n matrix of y-partials of F at (b,a) (denoted by D) is invertible.

Then “near” a there exists a unique function g(x) such that F(g(x),x) = 0; moreover, g(x) is continuous.
What Function Do We Iterate?

The 2-variable case. Fix x. For this x, construct the function

    x(y) = y − f(x,y)/D,   where D = ∂f/∂y (a,b).

Iterate x(y).
What Function Do We Iterate?

The 2-variable case. Fix x. For this x, construct the function

    x(y) = y − f(x,y)/D

Iterate x(y).

The n-variable case. Fix x. For this x, construct the function

    x(y) = y − D⁻¹ F(y,x)

Iterate x(y).
Multi-variable Parallel?

[Figure: a ball of radius r around (b,a) in ℝⁿ⁺ᵐ, chosen so that the partials of F in the ball are all close to the partials at (b,a). Inside it, a radius t around a in ℝᵐ picks out the x-values for which we construct g(x) with F(g(x),x) = 0; we want g continuous and unique.]

Notation: Let dF(y,x) denote the n × n submatrix made up of all the y-partial derivatives of F at (y,x).
Step I: Since D is invertible, ‖D⁻¹‖ ≠ 0. We choose r as follows: use continuity of the partials of F to choose r small enough so that for all (y,x) ∈ B_r(b,a),

    ‖D − dF(y,x)‖ < 1 / (2‖D⁻¹‖).

Then x′(y) = Iₙ − D⁻¹ dF(y,x) = D⁻¹ (D − dF(y,x)). So

    ‖x′(y)‖ ≤ ‖D⁻¹‖ ‖D − dF(y,x)‖ < ‖D⁻¹‖ · 1/(2‖D⁻¹‖) = 1/2.

By the multivariable Mean Value Theorem,

    ‖x(y1) − x(y2)‖ ≤ ½ ‖y1 − y2‖.
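The Step I bound can be spot-checked numerically. Reusing the invented example F(y,x) = (y1² + y2 − x, y1 − y2) with b = (1,1), sample points in a small ball around b and confirm that ‖Iₙ − D⁻¹ dF(y,x)‖ < 1/2 there (for this F, dF happens not to depend on x):

```python
# Spot-check of the Step I contraction bound on an assumed example:
# sample y near b = (1, 1) and verify ||I - D^{-1} dF(y, x)|| < 1/2.
import numpy as np

def dF(y):
    # y-Jacobian of F(y, x) = (y1^2 + y2 - x, y1 - y2);
    # it does not depend on x for this example.
    return np.array([[2 * y[0],  1.0],
                     [1.0,      -1.0]])

D = dF(np.array([1.0, 1.0]))   # Jacobian frozen at the known solution b
D_inv = np.linalg.inv(D)

rng = np.random.default_rng(0)
r = 0.2                        # radius of the sampled ball around b
ok = True
for _ in range(1000):
    y = np.array([1.0, 1.0]) + rng.uniform(-r, r, size=2)
    bound = np.linalg.norm(np.eye(2) - D_inv @ dF(y), ord=2)
    ok = ok and (bound < 0.5)
print(ok)  # True: the iteration map contracts on this ball
```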
There are two (highly non-trivial) steps in the proof:

• In Step 1 we choose the radius r of the ball in ℝⁿ⁺ᵐ so that the partials in the ball are all “close” to the (known) partials at (b,a). This makes x contract distances on the ball, forcing convergence to a fixed point. (Uses continuity of the partial derivatives.)

• In Step 2 we choose the radius t around a “small” so as to guarantee that our iterates stay in the ball. In other words, our “initial guess” for the quasi-Newton’s method is good enough to guarantee convergence to the “correct” root.

The two together guarantee that we can find a value g(x) which solves the equation. We then map x to g(x).
Since this is the central idea of the proof, I will reiterate it:

• In Step 1 we make sure that iterated maps on our “pretty good” quasi-Newton function actually converge.

• In Step 2, by making sure that they don’t “leave the box,” we make sure the iterated maps converge to the fixed point we were aiming for; that is, that they don’t march off to some other fixed point outside the region.
Final thoughts

The Implicit Function Theorem is a DEEP theorem of Mathematics.

Fixed point methods show up in many other contexts. For instance:

• They underlie many numerical approximation techniques.

• They are used in other theoretical contexts, such as in Picard’s Iteration proof of the fundamental theorem of differential equations.

The iteration functions frequently take the form of a “quasi-Newton’s method.”
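Picard’s Iteration is itself a fixed-point iteration, now on a space of functions. A minimal numerical sketch, for the assumed toy problem y′ = y, y(0) = 1 on [0,1] (exact solution eᵗ), iterates the Picard map y ↦ 1 + ∫₀ᵗ y(s) ds on a grid:

```python
# Picard iteration as a fixed-point method, discretized with the
# trapezoid rule.  Toy problem: y' = y, y(0) = 1, exact solution e^t.
import numpy as np

t = np.linspace(0.0, 1.0, 1001)

def cumtrapz(values, grid):
    """Cumulative trapezoid-rule integral of values over grid."""
    steps = np.diff(grid) * (values[1:] + values[:-1]) / 2.0
    return np.concatenate([[0.0], np.cumsum(steps)])

y = np.ones_like(t)             # initial guess y_0(t) = 1
for _ in range(30):
    y = 1.0 + cumtrapz(y, t)    # Picard map: a contraction on C[0, 1]

print(np.abs(y - np.exp(t)).max())  # small: the iterates approach e^t
```

Each Picard step adds roughly the next Taylor term of eᵗ, mirroring how the quasi-Newton iterates of the main proof home in on g(x).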