notes - UTSA

MAT1193 – 5e The chain rule

One of the most important rules for taking derivatives is the chain rule, which tells us how to take derivatives of compositions of functions. Here is the rule, first in prime notation and then in differential notation:

$$h(x) = (f \circ g)(x) = f(g(x)) \;\rightarrow\; h'(x) = f'(g(x)) \cdot g'(x)$$

$$h(x) = (f \circ g)(x) = f(g(x)) \;\rightarrow\; \frac{dh}{dx} = \frac{dh}{dg} \cdot \frac{dg}{dx}$$
To figure out what that even means, it's useful to go back to the function machine picture of the composition of functions:

[Figure: function machine picture, x → g → g(x) → f → f(g(x)) = h(x)]
The input value x is first fed into the function g, resulting in an output g(x). That output of g is now treated as the input to the function f, resulting in a value f(g(x)). That is what we define as h(x).

Now we change the input by a little bit by adding Δx to x. The first thing we do is apply the function g and ask how much the change in input has affected the output of g. If Δx is small, then the relationship between Δx (the change in the input) and Δg (the change in the output) is determined by the derivative g'(x) = dg/dx:

$$\Delta g = g'(x)\,\Delta x \quad\text{or}\quad \Delta g = \frac{dg}{dx}\,\Delta x$$

That means that the new output of the function g is approximately equal to g(x) + Δg = g(x) + g'(x)Δx.
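To see how good this approximation is as Δx shrinks, here is a minimal sketch. It assumes g(x) = x^2 and the base point x = 3 purely as illustrative choices; the function name `linear_estimate` is hypothetical.

```python
def g(x):
    # Illustrative inner function (an assumption made for this sketch)
    return x ** 2


def g_prime(x):
    # Derivative of x**2 from the power rule
    return 2 * x


def linear_estimate(x, dx):
    """Estimate g(x + dx) as g(x) + g'(x) * dx."""
    return g(x) + g_prime(x) * dx


x = 3.0  # hypothetical base point
for dx in (0.5, 0.1, 0.01):
    exact = g(x + dx)
    estimate = linear_estimate(x, dx)
    # The gap between the exact and estimated values shrinks as dx gets smaller
    print(f"dx={dx}: exact={exact:.6f}, estimate={estimate:.6f}, gap={exact - estimate:.6f}")
```

The gap shrinks much faster than Δx itself, which is why the estimate is trusted only for small changes in the input.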
But that change in the output of the function g is viewed as a change in the input to the function f. So how much does the output of f change? Remembering that the base input to the function f was equal to g(x), the logic of the derivative tells us

$$\Delta f = f'(g(x))\,\Delta g \quad\text{or}\quad \Delta f = \frac{df}{dg}\,\Delta g$$
Notice that in the differential notation we have written the derivative of f evaluated at the output of g as df/dg. But now we go back and plug in the fact that Δg = g'(x)Δx (or, in differential notation, Δg = (dg/dx)Δx). So we find that the change in the output of the composition of the two functions is just

$$\Delta f = f'(g(x))\,g'(x)\,\Delta x \quad\text{or}\quad \Delta f = \frac{df}{dg}\,\frac{dg}{dx}\,\Delta x$$
This says that the relationship between a small change in the input (Δx) and the change in the output of the composition of functions (Δf) is determined by multiplying by f'(g(x)) * g'(x), or by (df/dg)(dg/dx). But this is just the chain rule:

$$h'(x) = f'(g(x)) \cdot g'(x) \quad\text{or}\quad \frac{dh}{dx} = \frac{dh}{dg}\,\frac{dg}{dx}$$
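The "multiply the two rates" idea can be checked numerically. The sketch below assumes f(y) = sin(y), g(x) = x^3 and the base point x = 1.2, none of which come from these notes; they are stand-ins chosen only so that both derivatives are easy to write down. The helper names are hypothetical.

```python
import math


def g(x):
    return x ** 3          # inner function (illustrative assumption)


def f(y):
    return math.sin(y)     # outer function (illustrative assumption)


def rate_product(x, dx):
    """Product of the two measured rates, (df/dg) * (dg/dx), for a change dx."""
    dg = g(x + dx) - g(x)            # actual change in the output of g
    df = f(g(x) + dg) - f(g(x))      # actual change in the output of f
    return (df / dg) * (dg / dx)


def chain_rule_value(x):
    """f'(g(x)) * g'(x), using cos as the derivative of sin and 3x^2 for x^3."""
    return math.cos(g(x)) * 3 * x ** 2


x0 = 1.2  # hypothetical base point
for dx in (0.1, 0.01, 0.001):
    print(f"dx={dx}: rate product={rate_product(x0, dx):.5f}, "
          f"chain rule={chain_rule_value(x0):.5f}")
```

As Δx gets smaller, the product of the two measured rates should settle toward the value the chain rule predicts.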
Let's work through an example.

Suppose we are given the problem of finding the derivative of the function h(x) = ln(x^2). In particular, we want to know h'(e): the derivative of the function h evaluated at the base point x = e. Before applying the rule, let's just evaluate the function at the input value e: h(e) = ln(e^2) = 2, since ln(x) and e^x are inverse functions of each other.

But let's slow down a bit and recognize that h(x) is the composition of two functions. First we take an input value x and square it, that is, we apply the function g(x) = x^2. With x = e we have g(e) = e^2. Then we take that output and use it as input to the function f(y) = ln(y). Notice that I used y as the input variable to the function f. I could have used anything, but it would be kind of confusing to use the variable x, since for this problem we're thinking of x as the input to the function g, not the function f. To find h(e) we take the output of g, which is equal to e^2, and use that as the input to f, so that h(e) = f(e^2) = ln(e^2) = 2.

To look at the chain rule, suppose we change the input away from the value of e, decreasing it by 0.1. That is to say, we want to examine the input x + Δx = e - 0.1 = 2.71828 - 0.1 = 2.61828. Our base point is x = e, and the change is Δx = -0.1. How much does that change affect the output of the function g? Well, g(x + Δx) ≈ g(x) + g'(x)Δx. From the power rule, g'(x) = 2x. If we evaluate this at the base point x = e, we find that g'(e) = 2e. So by reducing the input value x by 0.1 we have changed the output of the function g by g'(x)Δx = 2e * (-0.1) = -0.2e.

Looking at the function machine picture, we see that a change in the output of the function g will act like a change in the input to the function f. At the base point x = e the input to the function f was equal to e^2. Now we are going to change this input by Δy = -0.2e. How much will the output of f change? The derivative tells us that f(y + Δy) ≈ f(y) + f'(y)Δy. For f(y) = ln(y) the log rule for derivatives says that f'(y) = 1/y.
In our problem, the base point input to the function f is equal to e^2. That means that the derivative of f at that base point is given by f'(e^2) = 1/e^2. So the change in the output of f is equal to

$$\frac{1}{e^2}\,\Delta y = \frac{1}{e^2}\,(2e\,\Delta x) = \frac{1}{e^2}\,(-0.2e) = \frac{-0.2e}{e^2} = \frac{-0.2}{e}$$
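The two steps of this example can be replayed in a few lines and compared with the change you get by evaluating h directly. This is only a sketch of the arithmetic above; the variable names are hypothetical.

```python
import math

x = math.e      # base point
dx = -0.1       # change in the input

# Step 1: inner function g(x) = x**2, with g'(x) = 2x evaluated at x = e
dg = 2 * x * dx                 # estimated change in g, equal to -0.2e

# Step 2: outer function f(y) = ln(y), with f'(y) = 1/y evaluated at y = e**2
df = (1 / x ** 2) * dg          # estimated change in f, equal to -0.2/e

# Change obtained by evaluating h(x) = ln(x**2) at both inputs, for comparison
exact_change = math.log((x + dx) ** 2) - math.log(x ** 2)

print(f"estimated change in h: {df:.5f}")
print(f"exact change in h:     {exact_change:.5f}")
```

The two numbers are close but not identical, because the derivative only gives an approximation for a change as large as 0.1.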
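To confirm that this is just a long-winded way of applying the chain rule, let's start with the rule and plug in:

$$h'(x) = f'(g(x)) \cdot g'(x) = \frac{1}{g(x)} \cdot 2x = \frac{2x}{x^2} = \frac{2}{x}, \qquad h'(e) = \frac{2}{e}$$

Multiplying h'(e) by Δx = -0.1 gives -0.2/e, the same change in output that we found step by step.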
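As a sanity check on the formula h'(x) = 2/x, one can compare it with a numerical derivative at a few base points. The sketch below uses a symmetric difference quotient, which is a standard numerical estimate and not something these notes introduce; the step size and the extra base points 1 and 5 are arbitrary choices.

```python
import math


def h(x):
    # The composition from the example: h(x) = ln(x**2)
    return math.log(x ** 2)


def h_prime(x):
    # Result of the chain rule: (1 / x**2) * 2x = 2 / x
    return 2 / x


def central_difference(func, x, step=1e-6):
    # Symmetric difference quotient (step size is an arbitrary assumption)
    return (func(x + step) - func(x - step)) / (2 * step)


for x0 in (1.0, math.e, 5.0):
    print(f"x={x0:.5f}: chain rule={h_prime(x0):.6f}, "
          f"numerical={central_difference(h, x0):.6f}")
```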
Let's try another example, but this time let's do an applied example so that we can keep track of the units. (The problem is 2.9.35 from the book.)

“The number of mosquitoes (M) that end up in a room is a function of how far the window is open (W, in square centimeters) according to M(W) = 5W + 2. The number of bites (B) depends on the number of mosquitoes according to B(M) = 0.5M. Find the derivative of the number of bites as a function of how far the window is open.”

First of all, the function composition here can be written as B(W) = B(M(W)). Applying the chain rule, B'(W) = B'(M) * M'(W). B'(M) = 0.5 and M'(W) = 5, so B'(W) = 0.5 * 5 = 2.5. But what does that mean?

Let's take things a bit more slowly. Writing B'(M) as dB/dM, it becomes a bit more obvious that dB/dM is the change in the number of bites per change in the number of mosquitoes. So dB/dM has units of bites/mosquito, and B'(M) = 0.5 bites/mosquito means that for every extra mosquito you let in you get ½ more mosquito bites. So if there were 3 more mosquitoes, there would be 1.5 extra bites:

ΔM = 3 mosquitoes → ΔB = (dB/dM) * ΔM = (0.5 bites/mosquito) * (3 mosquitoes) = 1.5 bites.

Similarly, dM/dW has units of mosquitoes/cm^2, and M'(W) = 5 mosquitoes/cm^2 means that for every extra square centimeter the window is open you let in 5 more mosquitoes. So if you closed the window by 2 square centimeters, there would be 10 fewer mosquitoes:

ΔW = -2 cm^2 → ΔM = (dM/dW) * ΔW = (5 mosquitoes/cm^2) * (-2 cm^2) = -10 mosquitoes.

The chain rule allows you to answer the question of how many extra bites you get by opening or closing the window a little bit. If you close the window by 2 square centimeters, there would be 10 fewer mosquitoes (ΔM = -10 mosquitoes). That would lead to 5 fewer bites:

ΔB = (dB/dM) * ΔM = (0.5 bites/mosquito) * (-10 mosquitoes) = -5 bites.

The chain rule just puts these two steps together:

dB/dW = (dB/dM) * (dM/dW) = (0.5 bites/mosquito) * (5 mosquitoes/cm^2) = 2.5 bites/cm^2.

Notice how the units tell the story.
Notice that one thing that makes this example so simple is that both component functions are linear functions, making the derivative of each function into a constant, and that means that the derivative of the composition is also a constant. In this case you don’t need to worry about evaluating the derivatives at the proper base point.
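The mosquito example is small enough to write out directly. The sketch below uses the two functions from the problem; the function names and the base window openings (10, 50 and 200 cm^2) are hypothetical, chosen only to show that the base point does not matter when both functions are linear.

```python
def mosquitoes(window_cm2):
    # M(W) = 5W + 2, in mosquitoes
    return 5 * window_cm2 + 2


def bites(num_mosquitoes):
    # B(M) = 0.5M, in bites
    return 0.5 * num_mosquitoes


def bites_from_window(window_cm2):
    # The composition B(M(W))
    return bites(mosquitoes(window_cm2))


dB_dM = 0.5              # bites per mosquito
dM_dW = 5                # mosquitoes per cm^2
dB_dW = dB_dM * dM_dW    # chain rule: bites per cm^2

delta_W = -2             # close the window by 2 cm^2
for base in (10, 50, 200):   # hypothetical base openings, in cm^2
    actual = bites_from_window(base + delta_W) - bites_from_window(base)
    predicted = dB_dW * delta_W
    print(f"W={base} cm^2: actual change={actual} bites, predicted={predicted} bites")
```

Because both derivatives are constants, the predicted and actual changes agree at every base point, which would not be true in general for a nonlinear composition.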