LECTURE 17 - LINEARIZATION
CHRIS JOHNSON
Abstract. In this lecture we’ll apply tangent planes, the topic of
the previous lecture, to show how to obtain a linear approximation
of a multivariable function.
1. Motivation
Suppose that we’re given a function of a single variable, f (x). If
this function is differentiable, then we can use the tangent line of the
graph of the function to get an approximation of the function. That is,
we know the equation of the tangent line to the graph at the point
(x0, f(x0)) is
y − f(x0) = f′(x0)(x − x0).
Moving the f(x0) to the right-hand side, we have a line
y = f′(x0)(x − x0) + f(x0)
which is the graph of the function
L(x) = f′(x0)(x − x0) + f(x0).
What’s nice about this function L(x) is that it’s something we can
actually evaluate. If you think about the mathematical, numerical
operations you can actually perform – the things you can in principle sit
down and work out with a pencil and paper – you realize that there are
basically only four operations: addition, subtraction, multiplication,
and division. (These are the four arithmetic operations.) There are of
course some more complicated things we know how to do (for example,
cubing a number), but these are really built out of combinations of
addition, subtraction, multiplication, and division (e.g., x³ = x · x · x).
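For instance, a tangent-line approximation really can be evaluated with nothing but these operations. Here is a sketch in Python (the function √x and the center x0 = 4 are my own choice of example, not from the lecture):

```python
import math

def f(x):
    return math.sqrt(x)

def fprime(x):
    # derivative of sqrt(x) is 1 / (2 sqrt(x))
    return 1 / (2 * math.sqrt(x))

x0 = 4.0

def L(x):
    # tangent-line approximation of f centered at x0; evaluating it
    # takes only subtraction, multiplication, and addition
    return fprime(x0) * (x - x0) + f(x0)

# L(4.1) = 2 + 0.25 * 0.1 = 2.025, close to sqrt(4.1) = 2.0248...
print(L(4.1), f(4.1))
```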
Using a computer, by the way, doesn’t really let you do any more
operations than what you can do with pencil and paper. Ultimately,
computers also can only do arithmetic: they aren’t magically able to
perform things that people in principle cannot. In fact, in some ways
computers are worse at these arithmetic operations than people. A
computer has to represent numbers using a finite number of bits: values
that can only be 1 or 0. Since any computer only has a finite number
Date: February 26, 2014.
of these bits – even if it’s a very large number! – “most” numbers
can’t be represented exactly on a computer. It turns out that “most”
numbers would require an infinite number of bits to represent exactly,
so the computer has to use approximations. You might be surprised
to learn that a number as simple as 1/10 can’t be represented exactly
with a finite number of bits (at least not the way computers usually
represent numbers)! A really simple way to demonstrate this is to ask
the computer to add 0.1 + 0.2. If you try this in Python, for example,
you'll get
>>> 0.1 + 0.2
0.30000000000000004.
Usually the computer truncates – cuts off – the values before printing
them, but internally represents the number to more decimal places.
When you do lots of calculations with these values, these little errors
start to add up!
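These accumulating errors are easy to see for yourself. The sketch below adds 0.1 ten times; because 0.1 has no exact finite binary representation, the total drifts slightly away from 1.0:

```python
# 0.1 cannot be stored exactly in binary floating point, so each
# addition introduces a tiny representation error
total = 0.0
for _ in range(10):
    total += 0.1

print(total)         # slightly off from 1.0
print(total == 1.0)  # False
```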
Anyway, the four arithmetic operations are basically the only tools
we have to do numerical computations. However, certain types of
functions are not defined in terms of these arithmetic operations. The
trig functions, cos θ and sin θ, for example, are defined geometrically:
they're the x- and y-coordinates of points on the unit circle. Yet
somehow your computer is able to spit out a value if you enter
cos(0.3245). If the computer can only do arithmetic, how is it able to
determine this value? The answer is that the computer uses calculus
(or, rather, someone who knew calculus programmed the computer) to
determine approximations of cos θ that can be evaluated using only
arithmetic operations. These are of course the Taylor polynomials you
learned about in your second-semester calculus class.
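For example, here is a sketch of how a Taylor polynomial turns cos θ into pure arithmetic (the choice of five terms, i.e. degree 8, is my own; real math libraries use more refined schemes):

```python
import math

def cos_taylor(theta, terms=5):
    # cos(theta) ≈ 1 - theta^2/2! + theta^4/4! - ... ; the partial sums
    # use only addition, subtraction, multiplication, and division
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * theta ** (2 * k) / math.factorial(2 * k)
    return total

print(cos_taylor(0.3245))  # very close to math.cos(0.3245)
```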
The techniques of Taylor polynomials you learned before are very
powerful, and are the basis for all of the fancy technology we have
today: breakthroughs in science and medicine are possible today because people are able to use computers to do calculations and analyze
very large amounts of data, and they’re able to do this because we
can use Taylor polynomials (and related ideas) to convert complicated
calculations into arithmetic.
However, the material you’ve learned before is only applicable to
functions of a single variable. Our goal in this lecture is to start studying the comparable ideas for functions of several variables. To do this,
we’ll use tangent planes to get a linear approximation of a multivariable
function.
2. Linearization
Recall from the last lecture that we said the tangent plane of the surface
z = f (x, y) at the point (x0 , y0 ) is given by
fx (x0 , y0 )(x − x0 ) + fy (x0 , y0 )(y − y0 ) − (z − f (x0 , y0 )) = 0.
Solving this for z we have
z = fx (x0 , y0 )(x − x0 ) + fy (x0 , y0 )(y − y0 ) + f (x0 , y0 ).
Notice that the right-hand side is a function of x and y, and this plane
is the graph of the function
L(x, y) = fx (x0 , y0 )(x − x0 ) + fy (x0 , y0 )(y − y0 ) + f (x0 , y0 ),
which we call the linearization of the function f (x, y) at the point
(x0 , y0 ).
The reason we care about linearizations is that they’re functions we
(or a computer) can actually compute: we can get numerical answers
using a linearization. We can thus use linearizations to approximate
multivariable functions.
Example 2.1. Calculate the linearization of f(x, y) = xe^y at the point
(2, 0), and use the linearization to approximate f(2.01, −0.1).
To calculate the linearization, we first need to calculate the partial
derivatives.
fx(x, y) = e^y
fy(x, y) = xe^y
Now we must evaluate these partial derivatives, and our original
function, at the point (x0 , y0 ) = (2, 0).
f (2, 0) = 2
fx (2, 0) = 1
fy (2, 0) = 2
We use these values to now determine our linearization,
L(x, y) = (x − 2) + 2y + 2 = x + 2y
We can now use this approximation to estimate the value f (2.01, −0.1):
f (2.01, −0.1) ≈ L(2.01, −0.1)
= 2.01 + 2 · (−0.1)
= 2.01 − 0.2
= 1.81
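We can double-check this numerically with a quick sketch (the function names here are my own):

```python
import math

def f(x, y):
    return x * math.exp(y)

def L(x, y):
    # linearization of f at (2, 0): f(2,0) = 2, fx(2,0) = 1, fy(2,0) = 2
    return (x - 2) + 2 * y + 2  # simplifies to x + 2y

approx = L(2.01, -0.1)  # 1.81 (up to float roundoff)
actual = f(2.01, -0.1)  # 2.01 * e^(-0.1) = 1.8187...
print(approx, actual)
```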
Let’s take a moment to think about what we’ve just done in the
example above. We used linearization, which is essentially just the
equation of a tangent plane, to estimate the value 2.01 · e−0.1 and got
an actual numerical value. This is an extremely important idea: we
can take complicated functions and estimate them with things we actually
know how to calculate – things we can sit down and really do with a
pencil and paper. That is, we were able to say
2.01 · (2.718281828...)^−0.1 ≈ 1.81.
By the way, if you plug 2.01 · e^−0.1 into a calculator or computer, it will
probably spit back the answer
2.01 · e^−0.1 ≈ 1.81872321025.
This means two things: 1) our approximation above, which was super
easy to actually calculate by hand, is a decent approximation; and
2) the computer uses a different type of approximation than what we
used. The computer is using a Taylor polynomial, probably to some
high degree, to get its approximation. The idea of Taylor polynomials
in one variable, you may recall, is really just an extension of the idea
of linearization (using tangent lines as the approximation). We can
also do Taylor polynomials in several variables, but won’t work on that
right now. For this lecture we’ll focus on linearization, and may come
back to multivariable Taylor polynomials at the end of the semester if
we have extra time.
Example 2.2. Calculate the linearization of f(x, y) = x³y + y²x at the
point (−1, 3), and use the linearization to approximate f(−0.93, 2.976).
First we calculate the partial derivatives,
fx(x, y) = 3x²y + y²
fy(x, y) = x³ + 2xy
Evaluating these partials, and the original function, at (−1, 3) we
have
f(−1, 3) = (−1)³ · 3 + 3² · (−1) = −3 − 9 = −12
fx(−1, 3) = 3 · (−1)² · 3 + 3² = 9 + 9 = 18
fy(−1, 3) = (−1)³ + 2 · (−1) · 3 = −1 − 6 = −7
Hence the linearization is
L(x, y) = 18(x + 1) − 7(y − 3) − 12.
We now use this to get an approximation,
f(−0.93, 2.976) ≈ L(−0.93, 2.976)
= 18 · (−0.93 + 1) − 7 · (2.976 − 3) − 12
= 18 · (0.07) − 7 · (−0.024) − 12
= 1.26 + 0.168 − 12
= 1.428 − 12
= −10.572
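As a quick sanity check of this example (a sketch; the names are my own):

```python
def f(x, y):
    return x**3 * y + y**2 * x

def L(x, y):
    # linearization of f at (-1, 3): f = -12, fx = 18, fy = -7 there
    return 18 * (x + 1) - 7 * (y - 3) - 12

approx = L(-0.93, 2.976)  # -10.572 (up to float roundoff)
actual = f(-0.93, 2.976)
print(approx, actual)
```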
3. Differentials
Notice in both of the examples above we had to pick values (x0 , y0 ),
the “center” of our approximation, where we could actually calculate
the true value of the function. This is always the case for these linearizations: we have to find a place to “anchor” our approximation; we
need somewhere where we know definitively what the function equals.
Let’s say we know the true value z0 = f (x0 , y0 ). Calling L(x, y) = z,
our linearization has the form:
z = fx (x0 , y0 ) · (x − x0 ) + fy (x0 , y0 ) · (y − y0 ) + z0 .
Moving the z0 to the other side we have
z − z0 = fx (x0 , y0 ) · (x − x0 ) + fy (x0 , y0 ) · (y − y0 ).
Notice that z − z0 , x − x0 , and y − y0 are just the change in the
values of z, x, and y when we change our inputs from x0 to x; from y0 to
y; and then the output changes from z0 to z. That is, each of x − x0 ,
y − y0 , and z − z0 represents the change in x, y, and z. Let’s write
these changes as dx, dy, and dz. (We use the letter ’d’ for “difference.”)
Our equation above then becomes
dz = fx (x0 , y0 )dx + fy (x0 , y0 )dy.
Thinking of dx and dy as variables (just saying how much we vary the
original inputs x and y), we have that dz is a function of two variables.
This function is called the differential of z = f (x, y).
The idea here is that differentials measure the change in our approximation. For example, in our example above, we have
dz = 18dx − 7dy.
This means we can determine the change in approximation, dz, by just
plugging in the changes dx and dy in our variables x and y. In the
example above we changed x from −1 to −0.93. This is a change of
dx = −0.93 − (−1) = 0.07.
We changed y from 3 to 2.976. This is a change of
dy = 2.976 − 3 = −0.024
So the change in z from f(−1, 3) = −12 to our approximation L(−0.93, 2.976)
is
dz = 18 · (0.07) − 7 · (−0.024) = 1.428.
This means our function f(x, y) changes by approximately 1.428 when
we move the inputs of the function from (x0, y0) = (−1, 3) to (−0.93, 2.976);
so the approximation is −12 + dz = −12 + 1.428 = −10.572, as we saw
above.
Differentials and linearizations are two sides of the same coin: they're
basically the same thing, just represented in different ways. More
precisely, a differential is just the change in the linearization. This
means that
f (x, y) ≈ L(x, y) = f (x0 , y0 ) + dz.
(Since we’ll usually write z = f (x, y), we may sometimes write df for
dz, and call this value the differential of f instead of the differential of
z. These are the same thing, just different words.)
Example 3.1. Calculate the differential dz of z = sin(x + 3y²):
dz = (∂f/∂x) dx + (∂f/∂y) dy
= (∂/∂x) sin(x + 3y²) dx + (∂/∂y) sin(x + 3y²) dy
= cos(x + 3y²) dx + 6y cos(x + 3y²) dy
Example 3.2. Calculate the differential dz of z = f(x, y) = (x³ −
2) · tan⁻¹(y), then use the differential to approximate f(−2.1, 0.22) and
f(−1.99, 0.18).
Taking the partial derivatives, the differential is
dz = 3x² tan⁻¹(y) dx + ((x³ − 2)/(1 + y²)) dy.
To use differentials (or linearizations), we need to find a point (x0, y0)
to use as the "center" of our approximation; some value near the values
we're trying to approximate, where we can exactly calculate the true
value of the function. Let's use (x0, y0) = (−2, 0). Since tan⁻¹(0) = 0,
our differential there becomes
dz = 0 · dx − 10 dy = −10 dy
and the true value of the function is f(−2, 0) = ((−2)³ − 2) · tan⁻¹(0) = 0.
For (−2.1, 0.22), we have dx = −0.1 and dy = 0.22, so
dz = −10 · (0.22) = −2.2
so our approximation for the function is
f(−2.1, 0.22) ≈ f(−2, 0) + dz = 0 − 2.2 = −2.2.
For (−1.99, 0.18), dx = 0.01 and dy = 0.18, so
dz = −10 · 0.18 = −1.8
so
f(−1.99, 0.18) ≈ 0 − 1.8 = −1.8.
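A numerical check of this differential approximation (a sketch; tan⁻¹ is the inverse tangent, math.atan in Python):

```python
import math

def f(x, y):
    return (x**3 - 2) * math.atan(y)

def dz(x, y, dx, dy):
    # differential: dz = 3x^2 atan(y) dx + (x^3 - 2)/(1 + y^2) dy
    return 3 * x**2 * math.atan(y) * dx + (x**3 - 2) / (1 + y**2) * dy

x0, y0 = -2.0, 0.0
approx = f(x0, y0) + dz(x0, y0, -0.1, 0.22)  # approximates f(-2.1, 0.22)
actual = f(-2.1, 0.22)
print(approx, actual)
```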
We can describe differentials in any number of variables, by the way.
If z = f (x1 , x2 , ..., xn ) is a function of n variables, the differential of z
is defined to be
dz = (∂f/∂x1) dx1 + (∂f/∂x2) dx2 + · · · + (∂f/∂xn) dxn.
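This formula is easy to sketch in code for any n, estimating each partial derivative with a central difference (the helper and the example function below are my own, not from the lecture):

```python
def differential(f, point, deltas, h=1e-6):
    # dz = sum over i of (df/dx_i) * dx_i, with each partial
    # derivative estimated by a central difference
    dz = 0.0
    for i, dx_i in enumerate(deltas):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        dz += (f(up) - f(down)) / (2 * h) * dx_i
    return dz

# example: f(x1, x2, x3) = x1*x2 + x3^2 at (1, 2, 3);
# the exact differential there is 2*dx1 + 1*dx2 + 6*dx3
g = lambda p: p[0] * p[1] + p[2] ** 2
print(differential(g, [1.0, 2.0, 3.0], [0.01, -0.02, 0.005]))  # ≈ 0.03
```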
4. Differentiability
In the case of functions of a single variable, saying a function is
differentiable at x0 basically means that the tangent line is a "good"
approximation of the function near x0. Differentiability
in two dimensions is basically the same thing.
Intuitively, a function f (x, y) is differentiable at (x0 , y0 ) if the tangent plane of f (x, y) at the point (x0 , y0 ) is a good approximation of
the function near (x0 , y0 ). To be precise, a function z = f (x, y) is
differentiable at (x0 , y0 ) if we can write the function as
f(x, y) = f(x0, y0) + fx(x0, y0)∆x + fy(x0, y0)∆y + ε1(∆x) + ε2(∆y)
where ε1 and ε2 are functions of a single variable with
lim_{∆x→0} ε1(∆x) = lim_{∆y→0} ε2(∆y) = 0.
Note that we can rewrite the above as
f(x, y) = L(x, y) + ε1(∆x) + ε2(∆y).
Note we used = and not ≈! We’re saying that f (x, y) is the linearization
“plus a little bit more” where that “little bit more” gets very very small
as (x, y) gets close to (x0 , y0 ). Again, differentiability means the linear
approximation is good.
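We can watch this happen numerically. The sketch below uses f(x, y) = x·e^y from Example 2.1 and shrinks the step away from (2, 0); the gap between f and L shrinks faster than the step itself, which is exactly what differentiability at (2, 0) promises:

```python
import math

def f(x, y):
    return x * math.exp(y)

def L(x, y):
    # linearization of f at (2, 0)
    return (x - 2) + 2 * y + 2

# ratio of |f - L| to the step size should go to 0 as the step shrinks
ratios = []
for step in [0.1, 0.01, 0.001]:
    err = abs(f(2 + step, step) - L(2 + step, step))
    ratios.append(err / step)

print(ratios)  # each ratio smaller than the last
```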
Because of the following theorem, most of the functions we deal with
in this class will be differentiable.
Theorem 4.1. If the partial derivatives fx (x, y) and fy (x, y) exist and
are continuous near (x0 , y0 ), then f (x, y) is differentiable at (x0 , y0 ).