Chain rules via multiplication
Bro. David E. Brown, BYU–Idaho Dept. of Mathematics. All rights reserved.
Version 0.44, of June 16, 2014
Answer to Exercise 2.1 corrected, minor edits made, and numbering of exercises mostly corrected on 2014-06-13.
Contents

1 Introduction
2 A fancier Calc I example
3 Chain rules for multivariate functions
  3.1 Additional chain rules?
  3.2 Isn’t there a better way to write down chain rules?
    3.2.1 A more compact way of writing chain rules
    3.2.2 A more flexible way of writing chain rules
4 “The” Chain Rule
5 Additional exercises
6 Answers, etc.
1 Introduction
You learned a “chain rule” for differentiating compositions of functions in Calculus I. It probably looked something like $\frac{d}{dx}\,g(u(x)) = g'(u(x))\,u'(x)$, which is sometimes shortened to $\frac{d}{dx}\,g(u(x)) = \frac{dg}{du}\frac{du}{dx}$. The truth is slightly more complicated: The chain rule is really
\[
\frac{d}{dx}\,g(u(x)) = \left.\frac{dg(u)}{du}\right|_{u=u(x)} \frac{du}{dx},  \tag{1}
\]
but we’re usually too lazy to write all this, so we use one of the abbreviated versions given above.
You know how this goes: For example, if $f(x) = \sin^3 x$, you can think of $g$ as being the cubing function (that is, $g(u) = u^3$), and $u$ as being the sine function (i.e., $u(x) = \sin x$). Then
\[
g(u(x)) = (\sin x)^3 = \sin^3 x = f(x).
\]
The chain rule (Equation (1)) says
\[
\frac{df}{dx} = \left.\frac{dg}{du}\right|_{u=\sin x} \frac{du}{dx}
             = \left.\frac{d}{du}\,u^3\right|_{u=\sin x} \frac{du}{dx}
             = \left. 3u^2 \right|_{u=\sin x} (\cos x) = 3\sin^2 x \cos x.
\]
Back when my father learned calculus, this technique of differentiating “by substitution” was used quite
commonly, much as we integrate by substitution. Differentiation by substitution has gone out of style, but
it’s a valid technique. Feel free to use it, if it helps you with the chain rule. We will make good use of it in
this document.
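Computations like the one above are easy to double-check with a computer algebra system. Here is a minimal sketch, assuming SymPy is available, that verifies the derivative of $\sin^3 x$ we just computed.

    import sympy as sp

    x = sp.symbols('x')
    f = sp.sin(x)**3

    # Differentiate directly and compare with the chain-rule answer 3 sin^2(x) cos(x).
    direct  = sp.diff(f, x)
    by_hand = 3*sp.sin(x)**2*sp.cos(x)
    print(sp.simplify(direct - by_hand))  # prints 0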
The question at hand is what the chain rule looks like when there are more variables than x floating
around. We’ll sneak up on this question, by looking at a slightly more complicated Calc I example, to get
a sense for how additional variables might be handled. Then we’ll look at chain rules for partial derivatives
for functions of more than one variable. I say “rules” instead of “rule” because there are many chain rules.
Fortunately, we will combine them all into one mother-of-all-chain-rules. Along the way, we’ll examine spiffy
uses of symbols, to help keep the tedium down. Spiffy uses of symbols will include introducing matrices and
matrix multiplication at some point. And, as you have no doubt discerned, this little exposition is rather
informal. We can tackle formalities some other time.
2 A fancier Calc I example
Careful examination of a suitable example can provide a stepping stone to chain rules for multivariate functions. Let’s try differentiating
\[
f(x) = (\ln x)^2 + \sin^3 x.
\]
Rather than treating this mindlessly as a Calc I problem, let’s examine it in some detail, deliberately introducing additional variables to increase its value as a bridge to the multivariate case. So: $f$ has two terms: $(\ln x)^2$ and $\sin^3 x$. I want you to think of these two terms as $u^2$ and $v^3$, respectively, so that
\[
f(x) = f(u, v) = u^2 + v^3.  \tag{2}
\]
This requires us to set
\[
u = \ln x \quad\text{and}\quad v = \sin x.
\]
We’re using a substitution—one that happens to have two parts to it. I call $u$ and $v$ intermediate variables because $f$ depends on $u$ and $v$, which in turn depend on $x$—this means they are between $f$ on the one hand and $x$ on the other.
To differentiate $f$ with respect to $x$, we can start with the $(\ln x)^2$ term, think of it as $u^2$, and use the Calc I chain rule, in a knee-jerk reaction sort of way:
\[
\frac{d}{dx}(\ln x)^2 = (2\ln x)\,\frac{1}{x} = \frac{2\ln x}{x}.
\]
Notice that the “$2\ln x$” bit is really $\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}$ in disguise, and $\frac{1}{x}$ is actually $\frac{du}{dx}$. So $(2\ln x)\,\frac{1}{x}$ is really $\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\frac{du}{dx}$. Put this together and find that
\[
\frac{d}{dx}(\ln x)^2 = \left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\frac{du}{dx}.  \tag{3}
\]
The differentiation of the other term is like unto it:
\[
\frac{d}{dx}\sin^3 x = \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\frac{dv}{dx}
                     = \left.3v^2\right|_{v=\sin x}(\cos x) = (3\sin^2 x)(\cos x) = 3\sin^2 x \cos x.
\]
It’s the left-most equality that interests me here:
\[
\frac{d}{dx}\sin^3 x = \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\frac{dv}{dx}.  \tag{4}
\]
(Compare with Equation (3).)
Here’s the punchline: We can get the entire derivative of $f = (\ln x)^2 + \sin^3 x$ by adding the results of (3) and (4) together:
\[
\frac{df}{dx} = \left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\frac{du}{dx}
              + \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\frac{dv}{dx}
              = \frac{2\ln x}{x} + 3\sin^2 x \cos x.
\]
I want you to focus on the leftmost equality in the above, which is
\[
\frac{df}{dx} = \left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\frac{du}{dx}
              + \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\frac{dv}{dx}.
\]
With your kind permission, I’ll abbreviate it as
\[
\frac{df}{dx} = \frac{\partial f}{\partial u}\frac{du}{dx} + \frac{\partial f}{\partial v}\frac{dv}{dx},  \tag{5}
\]
always remembering to substitute in the $u = \ln x$ and $v = \sin x$, as needed. (This abbreviation is customary.)
Hmm. . . Equation (5) looks suspiciously like two instances of the chain rule added together. That’s because that’s exactly what it is: Differentiating the given $f$ actually requires you to use the Calc I chain rule twice, once for the $u^2$ term and once for the $v^3$ term. (Go back and look at Equation (2) to remember why I’m talking about $u^2$ and $v^3$.)
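If you’d like a sanity check on that derivative, here is a minimal SymPy sketch, assuming SymPy is available, that compares the answer we assembled from Equation (5) with a direct computation.

    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = sp.log(x)**2 + sp.sin(x)**3

    direct  = sp.diff(f, x)                                # SymPy's answer
    by_hand = 2*sp.log(x)/x + 3*sp.sin(x)**2*sp.cos(x)     # the answer from Equation (5)
    print(sp.simplify(direct - by_hand))                   # prints 0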
Exercise 2.0.1. Reproduce the logic above, so as to differentiate $f = \sinh^2 x - \ln(x^2)$ with respect to $x$. In the process, re-invent Equation (5). Hint: Think about “$u^2$” and “$\ln v$.”
3 Chain rules for multivariate functions
So Equation (5) is a chain rule; it uses two intermediate variables (u and v) and one independent variable
(x). What if there are two independent variables?
Let’s find out. Let $f(x, y) = \ln(2x + y) + \cos(3x - y)$, and think about calculating $\frac{\partial f}{\partial x}$. Here’s how it goes if we indulge our Calc I knee-jerk reaction: Differentiate the $\ln(2x + y)$ term using the Calc I chain rule, but remember to hold $y$ constant during the differentiation:
\[
\frac{\partial}{\partial x}\ln(2x + y) = \frac{1}{2x + y}\,\frac{\partial}{\partial x}(2x + y) = \frac{2}{2x + y}.
\]
Then do the same for the cosine term:
\[
\frac{\partial}{\partial x}\cos(3x - y) = -\sin(3x - y)\,\frac{\partial}{\partial x}(3x - y) = -3\sin(3x - y),
\]
and combine the results to get
\[
\frac{\partial f}{\partial x} = \frac{2}{2x + y} - 3\sin(3x - y).  \tag{6}
\]
Problem is, the knee-jerk reaction skips the intermediate steps, which (a) keeps you from seeing what’s really
happening and therefore also (b) puts you at risk for making mistakes later.
So what are the missing steps? To see them, try setting $u = 2x + y$ and $v = 3x - y$. Then
\[
f = f(u, v) = \ln u + \cos v,
\]
which makes the “$\frac{1}{2x+y}$” bit equal to $\left.\frac{\partial f}{\partial u}\right|_{u=2x+y}$ and the “$-\sin(3x-y)$” bit equal to $\left.\frac{\partial f}{\partial v}\right|_{v=3x-y}$. Likewise,
\[
\frac{\partial u}{\partial x} = \frac{\partial}{\partial x}(2x + y) = 2
\quad\text{and}\quad
\frac{\partial v}{\partial x} = \frac{\partial}{\partial x}(3x - y) = 3.
\]
Then we can write Equation (6) as
\[
\frac{\partial f}{\partial x} = \frac{2}{2x+y} - 3\sin(3x-y)
 = \frac{1}{2x+y}\,(2) + \bigl(-\sin(3x-y)\bigr)(3)
 = \left.\frac{\partial f}{\partial u}\right|_{u=2x+y}\frac{\partial u}{\partial x}
 + \left.\frac{\partial f}{\partial v}\right|_{v=3x-y}\frac{\partial v}{\partial x}.
\]
Let’s abbreviate this as
\[
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}.  \tag{7}
\]
This is just Equation (5), except we have to say “$\frac{\partial f}{\partial x}$” instead of “$\frac{df}{dx}$” because $x$ isn’t the only independent variable anymore.
If you’re interested, here’s how it would look to differentiate $f(x, y) = \ln(2x + y) + \cos(3x - y)$ using Equation (7) without all the lead-up I’ve written above: You’d start by saying, “Hmm. . . Let’s say $u = 2x + y$ and $v = 3x - y$ and calculate
\[
\frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}
 = \frac{1}{2x+y}\,(2) + \bigl(-\sin(3x-y)\bigr)(3)
 = \frac{2}{2x+y} - 3\sin(3x-y).\text{''}
\]
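Here, too, a quick check is possible. The following is a minimal SymPy sketch, assuming SymPy is available, comparing the answer in Equation (6) with a direct partial derivative.

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.log(2*x + y) + sp.cos(3*x - y)

    direct     = sp.diff(f, x)                       # SymPy's partial derivative with respect to x
    chain_rule = 2/(2*x + y) - 3*sp.sin(3*x - y)     # the answer from Equation (6)
    print(sp.simplify(direct - chain_rule))          # prints 0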
Exercise 3.0.2. Use the ideas of this section to calculate $\frac{\partial f}{\partial y}$ for $f(x, y) = \ln(2x + y) + \cos(3x - y)$. In the process, you should come up with a chain rule similar to (but not the same as) Equation (7). Hint: Use the same $u$ and $v$ that I did.
3.1 Additional chain rules?
You might be wondering whether there is a different chain rule for every function. Fortunately, the answer
is, “nope.”
Exercise 3.1.1. Use Equation (7) to calculate $\frac{\partial f}{\partial x}$ for $f(x, y) = \sqrt{xy} + \sqrt{\frac{x}{y}}$. Hint: Choose $u$ and $v$ to be functions that are “inside” other functions.
Exercise 3.1.2. Use the chain rule you created for Exercise 3.0.2 to calculate $\frac{\partial f}{\partial y}$ for $f(x, y) = \sqrt{xy} + \sqrt{\frac{x}{y}}$.
So: A given chain rule may serve to differentiate more than one function—in fact, lots of functions. Nevertheless, there are lots of chain rules. For example, suppose you have to differentiate
\[
f = \sinh(xz) + \cosh(yz) + \tanh(xyz)
\]
with respect to $z$. If we let $u = xz$, $v = yz$, and $w = xyz$, the chain rule for $\frac{\partial f}{\partial z}$ looks like this:
\[
\frac{\partial f}{\partial z} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial z}.  \tag{8}
\]
Exercise 3.1.3. Explain why the previous sentence is true.
Exercise 3.1.4. Write down chain rules for $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ for the current example.
Exercise 3.1.5. Go ahead and calculate $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ for the current example, using the chain rules.
Exercise 3.1.6. Write down the chain rule for differentiating $f(x, y) = \sin(xy)\cos(xy)$ with respect to $x$ and use it to calculate $\frac{\partial f}{\partial x}$. Then do the same for differentiation with respect to $y$. (Hint: There’s only one intermediate variable in this example.)
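Before moving on: as a sanity check on Equation (8), here is a minimal SymPy sketch, assuming SymPy is available, that compares its two sides numerically at one sample point, so it doesn’t give away the closed-form answers to the exercises above.

    import sympy as sp

    x, y, z, u, v, w = sp.symbols('x y z u v w')
    f_uvw = sp.sinh(u) + sp.cosh(v) + sp.tanh(w)     # f in terms of the intermediate variables
    subs  = {u: x*z, v: y*z, w: x*y*z}

    # Left side of Equation (8): substitute, then differentiate with respect to z.
    lhs = sp.diff(f_uvw.subs(subs), z)

    # Right side of Equation (8): sum over the intermediate variables.
    rhs = sum(sp.diff(f_uvw, s).subs(subs) * sp.diff(expr, z) for s, expr in subs.items())

    point = {x: 0.3, y: -1.2, z: 0.7}                # an arbitrary sample point
    print(lhs.subs(point).evalf(), rhs.subs(point).evalf())  # the two numbers agree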
3.2 Isn’t there a better way to write down chain rules?
Fortunately, there is a more flexible and compact way of writing down chain rules than what we have so far.
To find out what it is, let’s put a chain rule under the microscope. How about the chain rule you should
have discovered while doing Exercise 3.0.2? It was
\[
\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}.
\]
Hmm. . . This chain rule is a sum of products. . . and the first product uses u in both factors; the second uses
v. . . hmm. . . u is “first” and v is “second,” in some sense?. . . uh, two derivatives; two “components”. . . Is
this chain rule a dot product of some vectors or other?
Yes, actually. If we think of $\frac{\partial f}{\partial u}$ and $\frac{\partial f}{\partial v}$ as being the first and second components of a vector, and if we think of $\frac{\partial u}{\partial y}$ and $\frac{\partial v}{\partial y}$ as the first and second components of some other vector, we can write the chain rule as the following dot product:
\[
\frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}
 = \begin{bmatrix} \dfrac{\partial f}{\partial u} \\[2mm] \dfrac{\partial f}{\partial v} \end{bmatrix}
 \cdot
 \begin{bmatrix} \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial y} \end{bmatrix}.  \tag{9}
\]
Clever, eh? (Wish I could take credit for it!)
Exercise 3.2.1. Write Equation (7) as a dot product of suitable vectors.
There is another way to write dot products. Some people (myself included) write Equation (9) like this:
\[
\frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}
 = \begin{bmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \end{bmatrix}
   \begin{bmatrix} \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial y} \end{bmatrix}.  \tag{10}
\]
(Note the absence of the dot.) Means EXACTLY the same thing as Equation (9). Right now, it’s just
another way to write the dot product. Shortly, however, we will see that it’s a more powerful and more
flexible way of writing certain types of multiplication.
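If the row-times-column notation feels mysterious, here is a minimal SymPy sketch, assuming SymPy is available, showing that the product in Equation (10) is the same number as the dot product in Equation (9); the symbols below are just placeholders.

    import sympy as sp

    fu, fv, uy, vy = sp.symbols('f_u f_v u_y v_y')   # placeholder symbols

    row = sp.Matrix([[fu, fv]])              # the gradient, written as a 1x2 row
    col = sp.Matrix([uy, vy])                # the 2x1 column of partials with respect to y

    product = (row * col)[0, 0]              # Equation (10): row times column, a 1x1 matrix entry
    dot     = sp.Matrix([fu, fv]).dot(col)   # Equation (9): the dot product
    print(sp.simplify(product - dot))        # prints 0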
Exercise 3.2.2. Write Equation (7) in the same way as Equation (10).
Equation (10) has some advantages over Equation (9). One is that it has a nice, compact representation.
Another is that it can be extended to more complicated situations than writing a chain rule for a partial
derivative with respect to a single variable. Let’s take these in turn.
3.2.1 A more compact way of writing chain rules
It is customary to use the symbol $\vec{\nabla} f$ to stand for the vector¹ $\begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix}$. (The symbol $\vec{\nabla} f$ is pronounced “del $f$,” “the gradient of $f$,” “grad $f$,” or even “nabla $f$.”)

Also, some people write $\dfrac{\partial}{\partial y}\begin{bmatrix} u \\ v \end{bmatrix}$ for the column $\begin{bmatrix} \frac{\partial u}{\partial y} \\[1mm] \frac{\partial v}{\partial y} \end{bmatrix}$. (Convince yourself that this makes sense. If you can’t, then ask somebody.) I’m too lazy to write $\frac{\partial}{\partial y}\begin{bmatrix} u \\ v \end{bmatrix}$, so I’m just as likely to write
\[
\frac{\partial(u, v)}{\partial y} \quad\text{for}\quad \begin{bmatrix} \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial y} \end{bmatrix}
\]
instead. (The symbol $\frac{\partial(u,v)}{\partial y}$ stands for the partial derivative of $\begin{bmatrix} u \\ v \end{bmatrix}$ with respect to $y$. This is sloppiness again, as it takes the column $\begin{bmatrix} u \\ v \end{bmatrix}$, turns it into the row $\begin{bmatrix} u & v \end{bmatrix}$, puts a comma between the $u$ and the $v$, and changes the square brackets into round parentheses, to get the $(u, v)$ in the “numerator” of the symbol $\frac{\partial(u,v)}{\partial y}$. Shameful, but customary.)

¹ The gradient, being a “row” instead of a “column,” is not a vector, but a “covector.” It is very common to call the gradient a vector, and at this point in your education, it’s even safe. So I will strive to resist the temptation to be picky about this.
With sloppy abbreviations like the above in hand, we can write the multiplication
\[
\begin{bmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \end{bmatrix}
\begin{bmatrix} \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial y} \end{bmatrix}
\quad\text{as}\quad
\vec{\nabla} f\,\frac{\partial(u, v)}{\partial y},
\]
so that the chain rule of Equation (10) is now
\[
\frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u, v)}{\partial y}.  \tag{11}
\]
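Here is Equation (11) in action, as a minimal SymPy sketch, assuming SymPy is available. The particular $f$, $u$, and $v$ below are an illustrative choice of mine, not functions used elsewhere in these notes.

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')
    f_uv = u**2 * sp.exp(v)                  # an illustrative f(u, v)
    subs = {u: x + y**2, v: x*y}             # illustrative choices for u(x, y) and v(x, y)

    grad_f = sp.Matrix([[sp.diff(f_uv, u), sp.diff(f_uv, v)]]).subs(subs)  # del f, as a row
    duv_dy = sp.Matrix([sp.diff(subs[u], y), sp.diff(subs[v], y)])         # d(u, v)/dy, as a column
    chain  = (grad_f * duv_dy)[0, 0]                                       # Equation (11)

    direct = sp.diff(f_uv.subs(subs), y)     # substitute first, then differentiate with respect to y
    print(sp.simplify(chain - direct))       # prints 0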
Exercise 3.2.3. Write the chain rule of Equation (7) in the same format as Equation (11). Then write out
what it means in terms of a matrix multiplication and a dot product.
Exercise 3.2.4. Repeat the previous exercise for the chain rule of Equation (8).
3.2.2 A more flexible way of writing chain rules
The more compact method of writing chain rules is also more flexible. For example, the chain rule of Equation (8) can be written as
\[
\frac{\partial f}{\partial z} = \vec{\nabla} f\,\frac{\partial(u, v, w)}{\partial z}.
\]
Likewise, if we need to differentiate $f = x^2 + y^2 + z^2$ with respect to $y$, the chain rule is
\[
\frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u, v, w)}{\partial y}.
\]
(I’m thinking of $f$ as being $u + v + w$, with $u = x^2$, $v = y^2$, and $w = z^2$.)
Exercise 3.2.5. Write out what “$\frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial y}$” means and calculate it, assuming $f = x^2 + y^2 + z^2$, as above.
Exercise 3.2.6. Write down the chain rule for $\frac{\partial f}{\partial x}$, again assuming $f = x^2 + y^2 + z^2$; write out what this chain rule means, and calculate it.
Example 3.2.7. The density ρ of the water at a point under the surface of the ocean depends on the
temperature T , the depth d, and the salinity s at that point. The temperature and the salinity both depend
on the depth. Write down the chain rule for finding out how the density changes with depth, and interpret
it.
Fine: ρ depends on T , d, and s. We can express this fact as ρ = ρ(d, s, T ) (keeping everything in
alphabetical order, for the sake of good bookkeeping). Likewise, T = T (d) and s = s(d). On the other hand,
d is just d. (You can say d = d(d) if you like, but it seems pretty silly.)
The chain rule for how density changes with depth is (by analogy with Equation (7))
\[
\frac{\partial \rho}{\partial d} = \vec{\nabla} \rho\,\frac{\partial(d, s, T)}{\partial d}.
\]
We can calculate this by realizing that $\vec{\nabla} \rho = \begin{bmatrix} \rho_d & \rho_s & \rho_T \end{bmatrix}$ and $\dfrac{\partial(d, s, T)}{\partial d} = \begin{bmatrix} d_d \\ s_d \\ T_d \end{bmatrix} = \begin{bmatrix} 1 \\ s_d \\ T_d \end{bmatrix}$. Putting all this together, we get that
\[
\frac{\partial \rho}{\partial d} = \vec{\nabla} \rho\,\frac{\partial(d, s, T)}{\partial d}
 = \begin{bmatrix} \rho_d & \rho_s & \rho_T \end{bmatrix}
   \begin{bmatrix} 1 \\ s_d \\ T_d \end{bmatrix}
 = \rho_d + \rho_s s_d + \rho_T T_d.
\]
Notice that the rightmost expression clearly shows that density depends on depth (directly, without regard to salinity or temperature—that’s the “$\rho_d$” part), but that density also depends on salinity and temperature, which in turn depend on depth. Of course, we could have said all that with words (just did!), but the statement
\[
\frac{\partial \rho}{\partial d} = \rho_d + \rho_s s_d + \rho_T T_d
\]
is cleaner, easier to read, and just plain more elegant. Moreover, this statement shows how the dependence of salinity and temperature on depth is incorporated into the dependence of density on depth. Heh! Try doing that in words!
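To see the bookkeeping in a concrete (if made-up) setting, here is a minimal SymPy sketch; the formulas for $\rho$, $s(d)$, and $T(d)$ below are purely hypothetical, chosen only for illustration.

    import sympy as sp

    d, s, T = sp.symbols('d s T', positive=True)

    # All three formulas below are made up for illustration only.
    rho    = 1000 + sp.Rational(4, 5)*s - sp.Rational(1, 5)*T + sp.Rational(1, 20)*d  # rho(d, s, T)
    s_of_d = 35 - d/100                                                               # salinity s(d)
    T_of_d = 20*sp.exp(-d/100)                                                        # temperature T(d)

    # d(rho)/dd = rho_d + rho_s * s_d + rho_T * T_d
    chain = (sp.diff(rho, d)
             + sp.diff(rho, s)*sp.diff(s_of_d, d)
             + sp.diff(rho, T)*sp.diff(T_of_d, d)).subs({s: s_of_d, T: T_of_d})

    # Substitute first, then differentiate: the total dependence of rho on depth.
    direct = sp.diff(rho.subs({s: s_of_d, T: T_of_d}), d)
    print(sp.simplify(chain - direct))   # prints 0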
The foregoing example points out a shortcoming in our notation. We have said elsewhere that $\frac{\partial \rho}{\partial d}$ and $\rho_d$ mean the same thing. Yet, in the example above, they don’t! What’s going on here?
Well, in the “$\rho_s s_d + \rho_T T_d$” part of our answer, we treated $s$ and $T$ like intermediate variables. But in the “$\rho_d$” part, we did not! In the “$\rho_d$” part, we held $s$ and $T$ fixed. It’s as though we were saying “ ‘$\frac{\partial \rho}{\partial d}$’ means ‘The partial derivative with respect to $d$, treating $s$ and $T$ as intermediate variables when it suits us,’ ” while saying “ ‘$\rho_d$’ means ‘The partial derivative of $\rho$ with respect to $d$, holding $s$ and $T$ constant.’ ” The science and engineering community believe they have a cure for this problem, but in my excessively picky way, I don’t believe their “cure” does the job.²

² If you want to know what the “cure” is, take a look at pages 844–846 of the text.
4 “The” Chain Rule
Our sloppy symbols are actually flexible enough to allow us to put all the partial derivatives of a function in one place. For example, Equation (11) and its cousin $\frac{\partial f}{\partial x} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial x}$ from Exercise 3.2.3 give us chain rules for the first partial derivatives of some function $f$ with respect to $y$ and $x$, respectively. We can combine these two chain rules into one happy equation, like so:
\[
\frac{\partial f}{\partial(x, y)} = \begin{bmatrix} f_u & f_v \end{bmatrix}
\begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix},  \tag{12}
\]
that is, if we can agree on what the symbols mean. You’ll recognize the row $\begin{bmatrix} f_u & f_v \end{bmatrix}$ as being the gradient of $f$, though it’s lying down on the job. The boxy thing $\begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}$ is called the “Jacobian matrix.” The symbol for the Jacobian matrix is the hopefully unsurprising $\dfrac{\partial(u, v)}{\partial(x, y)}$. Using this symbol allows us to write our chain rules together as
\[
\frac{\partial f}{\partial(x, y)} = \vec{\nabla} f\,\frac{\partial(u, v)}{\partial(x, y)}.  \tag{13}
\]
Fine, but how does this symbol stand for the two chain rules combined?
Think of the right-hand side of Equation (12) as a multiplication.³ To produce one of the chain rules properly, this multiplication has to include multiplying the gradient by $\frac{\partial(u,v)}{\partial x}$. Since $\frac{\partial(u,v)}{\partial x}$ is the left column of the Jacobian matrix, the multiplication required includes taking the dot product of the gradient with $\frac{\partial(u,v)}{\partial x}$. Likewise, we need the dot product of the gradient with $\frac{\partial(u,v)}{\partial y}$, to get the other chain rule. So the multiplication in Equation (12) is a pair of dot products. Specifically, it’s
\[
\begin{bmatrix} f_u & f_v \end{bmatrix}
\begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}
 = \begin{bmatrix} f_u u_x + f_v v_x & f_u u_y + f_v v_y \end{bmatrix}.
\]
If you like, you can write this as
\[
\begin{bmatrix} f_u & f_v \end{bmatrix}
\begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}
 = \begin{bmatrix} \vec{\nabla} f\,\dfrac{\partial(u,v)}{\partial x} & \vec{\nabla} f\,\dfrac{\partial(u,v)}{\partial y} \end{bmatrix}.
\]
I prefer to write it as
\[
\begin{bmatrix} f_u & f_v \end{bmatrix}
\begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}
 = \vec{\nabla} f\,\frac{\partial(u, v)}{\partial(x, y)}.
\]
Cleaner, yes? So now we can write
\[
\frac{\partial f}{\partial(x, y)} = \vec{\nabla} f\,\frac{\partial(u, v)}{\partial(x, y)},
\]
as in Equation (13). As a bonus, the symbol $\frac{\partial(u,v)}{\partial(x,y)}$ is consistent with symbols like the $\frac{\partial(u,v)}{\partial y}$ we used in Equation (11) and with $\frac{\partial f}{\partial(x,y)}$.
So: Equation (13) is really Equation (12), in disguise. We will call $\frac{\partial f}{\partial(x,y)}$ the total derivative of $f$.

³ There’s no symbol between the gradient and the Jacobian matrix, and writing things next to each other with no symbol between has meant multiplication since 5th or 6th grade, yes? So, it’s a multiplication.
Example 4.0.8. Let’s calculate the total derivative of $f = \sinh(x^2 - y^2)\cos(x^2 + y^2)$. To do so, I suggest letting $u = x^2 - y^2$ and $v = x^2 + y^2$. Then $f$ is $f = \sinh u \cos v$, which shows how $f$ depends on $u$ and $v$; bear in mind that these two intermediate variables depend on $x$ and $y$. Hmm. . . Sounds like a job for Equation (13):
\[
\begin{aligned}
\frac{\partial f}{\partial(x, y)} &= \vec{\nabla} f\,\frac{\partial(u, v)}{\partial(x, y)}
 = \begin{bmatrix} f_u & f_v \end{bmatrix}
   \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} \\
 &= \begin{bmatrix} \cosh(x^2 - y^2)\cos(x^2 + y^2) & -\sinh(x^2 - y^2)\sin(x^2 + y^2) \end{bmatrix}
    \begin{bmatrix} 2x & -2y \\ 2x & 2y \end{bmatrix} \\
 &= \bigl[\; 2x\cosh(x^2 - y^2)\cos(x^2 + y^2) - 2x\sinh(x^2 - y^2)\sin(x^2 + y^2) \\
 &\qquad\quad -2y\cosh(x^2 - y^2)\cos(x^2 + y^2) - 2y\sinh(x^2 - y^2)\sin(x^2 + y^2) \;\bigr]
\end{aligned}
\]
(The last expression is supposed to be a row, but it didn’t fit, so I put the first entry on one line and the second entry on the following line.)
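The same computation can be checked with a minimal SymPy sketch, assuming SymPy is available; SymPy’s jacobian method plays the role of $\frac{\partial(u,v)}{\partial(x,y)}$.

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')
    f_uv = sp.sinh(u)*sp.cos(v)
    subs = {u: x**2 - y**2, v: x**2 + y**2}

    grad_f = sp.Matrix([[sp.diff(f_uv, u), sp.diff(f_uv, v)]]).subs(subs)   # del f, as a row
    J      = sp.Matrix([x**2 - y**2, x**2 + y**2]).jacobian([x, y])         # d(u, v)/d(x, y)
    total  = grad_f * J                                                     # Equation (13), a 1x2 row

    direct = sp.Matrix([[sp.diff(f_uv.subs(subs), x), sp.diff(f_uv.subs(subs), y)]])
    print((total - direct).applyfunc(sp.simplify))   # prints Matrix([[0, 0]])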
Exercise 4.0.9. Calculate the total derivative of f = sin(x + y) cos(x − y).
Exercise 4.0.10. What would Equation (13) look like if $f = \sin(x^2 + y^2 + z^2)$?
We now finally arrive at “the” chain rule. I want a nice way to write it. To create a nice way, note first that all our chain rules include the symbol $\vec{\nabla} f$. But the symbol for the Jacobian is different from one context to another, depending on how many intermediate variables there are, and how many independent variables. I will get around this problem by using the symbol $J$ to stand for the Jacobian. Likewise, the symbol for the total derivative depends on how many independent variables there are. I will get around this problem by using the symbol $Df$ for the total derivative.
Here is the long-awaited chain rule:
\[
Df = \vec{\nabla} f\, J.  \tag{14}
\]
Not very dramatic, perhaps, but this one equation now includes all the chain rules there are in the universe,
from Calc I on up.
You may be interested to know that suitable use of matrix multiplication can extend the chain rule to
situations in which there are variables between the intermediate variables and the independent variables.
I also note in passing that if you want your total derivative to be a genuine vector (as opposed to a row matrix), you can use
\[
\begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial v}{\partial x} \\[2mm]
                \dfrac{\partial u}{\partial y} & \dfrac{\partial v}{\partial y} \end{bmatrix}
\begin{bmatrix} \dfrac{\partial f}{\partial u} \\[2mm] \dfrac{\partial f}{\partial v} \end{bmatrix}
 = J^{T}\,\vec{\nabla} f;
\]
this is what you get when you transpose the matrices in Equation (14) and reverse the order of multiplication.
5 Additional exercises
Exercise 5.0.11. Use Equation (7) and the chain rule you invented in Exercise 3.0.2 to calculate the first
partial derivatives of f (x, y) = xy + x/y.
Until I can get some more exercises written, look in your Calculus text, in the section on “chain rules.”
They’ll talk about “branch diagrams” or “tree diagrams” for helping you with the bookkeeping. That’s fine,
but try working your textbook’s examples using the methods of this document, and see if you get the same
answers as the book does. You’d better!
6 Answers, etc.
Exercise 2.0.1. Let $u = \sinh x$ and $v = x^2$. Then $f = u^2 - \ln v$, so that:
\[
\left.\frac{\partial f}{\partial u}\right|_{u=\sinh x} = \left.2u\right|_{u=\sinh x} = 2\sinh x,
\qquad \frac{du}{dx} = \cosh x,
\]
\[
\left.\frac{\partial f}{\partial v}\right|_{v=x^2} = \left.-\frac{1}{v}\right|_{v=x^2} = -\frac{1}{x^2},
\qquad\text{and}\qquad \frac{dv}{dx} = 2x.
\]
The derivative of the $u^2$ term is therefore
\[
\left.\frac{\partial f}{\partial u}\right|_{u=\sinh x}\frac{du}{dx} = (2\sinh x)(\cosh x) = 2\sinh x\cosh x,
\]
and the derivative of the $-\ln v$ term is
\[
\left.\frac{\partial f}{\partial v}\right|_{v=x^2}\frac{dv}{dx} = -\frac{1}{x^2}\,(2x) = -\frac{2}{x}.
\]
Put the pieces together to get
\[
\frac{df}{dx} = \left.\frac{\partial f}{\partial u}\right|_{u=\sinh x}\frac{du}{dx}
             + \left.\frac{\partial f}{\partial v}\right|_{v=x^2}\frac{dv}{dx}
             = 2\sinh x\cosh x - \frac{2}{x}.
\]
Note: In practice, people usually think in terms of $\frac{\partial f}{\partial u}\frac{du}{dx} + \frac{\partial f}{\partial v}\frac{dv}{dx}$ and substitute in the $\sinh x$ and the $x^2$, as needed. Their work typically looks like this, on paper:
\[
\frac{df}{dx} = (2\sinh x)(\cosh x) + \left(-\frac{1}{x^2}\right)(2x) = 2\sinh x\cosh x - \frac{2}{x}.
\]
Exercise 3.0.2. Knee-jerk reaction: Differentiate the $\ln(2x + y)$ term using the Calc I chain rule, but remember to hold $x$ constant during the differentiation:
\[
\frac{\partial}{\partial y}\ln(2x + y) = \frac{1}{2x + y}\,\frac{\partial}{\partial y}(2x + y) = \frac{1}{2x + y}.
\]
Then do the same for the cosine term:
\[
\frac{\partial}{\partial y}\cos(3x - y) = -\sin(3x - y)\,\frac{\partial}{\partial y}(3x - y) = \sin(3x - y),
\]
and combine the results to get
\[
\frac{\partial f}{\partial y} = \frac{1}{2x + y} + \sin(3x - y).
\]
What we’ve done here is $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}$. This is the chain rule similar to Equation (7) that I hoped you’d invent.
Exercise 3.1.1. Let $u = xy$ and $v = \frac{x}{y}$. Then $f = \sqrt{u} + \sqrt{v}$, so
\[
\frac{\partial f}{\partial u} = \frac{1}{2\sqrt{u}} = \frac{1}{2\sqrt{xy}},
\qquad \frac{\partial u}{\partial x} = y,
\]
and
\[
\frac{\partial f}{\partial v} = \frac{1}{2\sqrt{v}} = \frac{1}{2\sqrt{x/y}} = \sqrt{\frac{y}{4x}},
\qquad \frac{\partial v}{\partial x} = \frac{1}{y}.
\]
Equation (7) now says
\[
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}
 = \frac{y}{2\sqrt{xy}} + \frac{1}{y}\sqrt{\frac{y}{4x}}
 = \sqrt{\frac{y}{4x}} + \frac{1}{\sqrt{4xy}}.
\]
Exercise 3.1.2. The chain rule you created for Exercise 3.0.2 should have been $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}$. We can use
\[
\frac{\partial f}{\partial u} = \frac{1}{2\sqrt{xy}}
\qquad\text{and}\qquad
\frac{\partial f}{\partial v} = \sqrt{\frac{y}{4x}}
\]
from the previous exercise. But instead of $\frac{\partial u}{\partial x}$ and $\frac{\partial v}{\partial x}$, we need
\[
\frac{\partial u}{\partial y} = x
\qquad\text{and}\qquad
\frac{\partial v}{\partial y} = -\frac{x}{y^2}.
\]
Then
\[
\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}
 = \frac{1}{2\sqrt{xy}}\,x + \sqrt{\frac{y}{4x}}\left(-\frac{x}{y^2}\right)
 = \sqrt{\frac{x}{4y}} - \sqrt{\frac{x}{4y^3}}.
\]
Exercise 3.1.3. Well, $f$ depends on $u$, $v$, and $w$, all of which depend on $z$, but in different ways. So the contributions to $\frac{\partial f}{\partial z}$ that $u$, $v$, and $w$ all make have to be accounted for separately. The term $\frac{\partial f}{\partial u}\frac{\partial u}{\partial z}$ describes the dependence of $f$ on $z$, via $u$, and likewise for the terms $\frac{\partial f}{\partial v}\frac{\partial v}{\partial z}$ and $\frac{\partial f}{\partial w}\frac{\partial w}{\partial z}$. Adding the three terms together gives the total dependence of $f$ on $z$.
Exercise 3.1.4. $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial x}$. The term $\frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$ is missing, because $v = yz$ does not depend on $x$. Also, $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial y}$. The term $\frac{\partial f}{\partial u}\frac{\partial u}{\partial y}$ is missing, because $u = xz$ does not depend on $y$. (If you prefer, it’s missing because $\frac{\partial u}{\partial y} = 0$.)
Exercise 3.1.5. $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial x} = z\cosh(xz) + yz\operatorname{sech}^2(xyz)$,
$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial y} = z\sinh(yz) + xz\operatorname{sech}^2(xyz)$, and
$\frac{\partial f}{\partial z} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial z} = x\cosh(xz) + y\sinh(yz) + xy\operatorname{sech}^2(xyz)$.
Exercise 3.1.6. Let $u = xy$. Then $f = \sin u \cos u$, and
\[
\frac{\partial f}{\partial x} = \frac{df}{du}\frac{\partial u}{\partial x} = -y\sin^2 u + y\cos^2 u = y(\cos^2 xy - \sin^2 xy) = y\cos 2xy.
\]
Likewise,
\[
\frac{\partial f}{\partial y} = \frac{df}{du}\frac{\partial u}{\partial y} = -x\sin^2 u + x\cos^2 u = x(\cos^2 xy - \sin^2 xy) = x\cos 2xy.
\]
"
∂f
∂u
∂f
∂v
# "
#
∂u
∂x
∂v
∂x
Exercise 3.2.1.
∂f
∂x
Exercise 3.2.2.
∂f
∂x
=
Exercise 3.2.3.
∂f
∂x
h
#»
∂f
= ∇f ∂(u,v)
=
∂x
∂u
=
h
∂f
∂u
·
∂f
∂v
i
"
∂u
∂x
∂v
∂x
#
∂f
∂v
i
"
∂u
∂x
∂v
∂x
"
#
=
∂f
∂u
∂f
∂v
# "
·
∂u
∂x
∂v
∂x
#
. Oh. This looks a lot like what we
wrote for the previous two exercises! (Sorry about the repetition. I wanted to drive home the point that
we’re just writing the same thing in three different ways.)
Exercise 3.2.4. $\dfrac{\partial f}{\partial z} = \vec{\nabla} f\,\dfrac{\partial(u,v,w)}{\partial z} = \begin{bmatrix} f_u & f_v & f_w \end{bmatrix} \begin{bmatrix} u_z \\ v_z \\ w_z \end{bmatrix} = \begin{bmatrix} f_u \\ f_v \\ f_w \end{bmatrix} \cdot \begin{bmatrix} u_z \\ v_z \\ w_z \end{bmatrix}$.

Exercise 3.2.5. $\dfrac{\partial f}{\partial y} = \vec{\nabla} f\,\dfrac{\partial(u,v,w)}{\partial y}$ means
\[
\frac{\partial f}{\partial y} = \begin{bmatrix} f_u & f_v & f_w \end{bmatrix} \begin{bmatrix} u_y \\ v_y \\ w_y \end{bmatrix}
 = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 2y \\ 0 \end{bmatrix},
\]
which we calculate as $1(0) + 1(2y) + 1(0) = 2y$.
Exercise 4.0.9. Let $u = x + y$ and $v = x - y$, so that $f = \sin u \cos v$. Then
\[
\begin{aligned}
\frac{\partial f}{\partial(x, y)} &= \vec{\nabla} f\,\frac{\partial(u, v)}{\partial(x, y)}
 = \begin{bmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \end{bmatrix}
   \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm]
                   \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{bmatrix} \\
 &= \begin{bmatrix} \cos u \cos v & -\sin u \sin v \end{bmatrix}
    \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
 = \begin{bmatrix} \cos u \cos v - \sin u \sin v & \cos u \cos v + \sin u \sin v \end{bmatrix}.
\end{aligned}
\]
Exercise 4.0.10. Let $u = x^2$, $v = y^2$, and $w = z^2$, so that $f = \sin(u + v + w)$. Then the total derivative of $f$ is
\[
\begin{aligned}
\frac{\partial f}{\partial(x, y, z)} &= \vec{\nabla} f\,\frac{\partial(u, v, w)}{\partial(x, y, z)}
 = \begin{bmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} & \dfrac{\partial f}{\partial w} \end{bmatrix}
   \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} & \dfrac{\partial u}{\partial z} \\[2mm]
                   \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} & \dfrac{\partial v}{\partial z} \\[2mm]
                   \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} & \dfrac{\partial w}{\partial z} \end{bmatrix} \\
 &= \begin{bmatrix} \cos(u+v+w) & \cos(u+v+w) & \cos(u+v+w) \end{bmatrix}
    \begin{bmatrix} 2x & 0 & 0 \\ 0 & 2y & 0 \\ 0 & 0 & 2z \end{bmatrix} \\
 &= \begin{bmatrix} 2x\cos(u+v+w) & 2y\cos(u+v+w) & 2z\cos(u+v+w) \end{bmatrix} \\
 &= \begin{bmatrix} 2x\cos(x^2+y^2+z^2) & 2y\cos(x^2+y^2+z^2) & 2z\cos(x^2+y^2+z^2) \end{bmatrix}.
\end{aligned}
\]
Exercise 5.0.11. Equation (7) says $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$. If we let $u = xy$ and $v = \frac{x}{y}$, then $f = u + v$, and
\[
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}
 = (1)(y) + (1)\frac{1}{y} = y + \frac{1}{y}.
\]
Similarly, in Exercise 3.0.2, you should have found that $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}$. This implies that
\[
\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}
 = (1)(x) + (1)\left(-\frac{x}{y^2}\right) = x - \frac{x}{y^2}.
\]