The chain rule for higher derivatives

Carnegie Mellon University
Research Showcase @ CMU
Department of Mathematical Sciences
Mellon College of Science
1995
The chain rule for higher derivatives
Walter Noll
Carnegie Mellon University
Follow this and additional works at: http://repository.cmu.edu/math
This Technical Report is brought to you for free and open access by the Mellon College of Science at Research Showcase @ CMU. It has been accepted
for inclusion in Department of Mathematical Sciences by an authorized administrator of Research Showcase @ CMU. For more information, please
contact [email protected].
NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS:
The copyright law of the United States (title 17, U.S. Code) governs the making
of photocopies or other reproductions of copyrighted material. Any copying of this
document without permission of its author may be prohibited by law.
The Chain Rule for
Higher Derivatives
by
Walter Noll
Department of Mathematics
Carnegie Mellon University
Pittsburgh, PA 15213
Research Report No. 95-184
November, 1995
510.6
C28R
95-184
The Chain Rule for Higher Derivatives
by Walter Noll, November 1995
1. Introduction.
Perhaps the most important theorem of elementary differential calculus is
the Chain Rule, It states, roughly, that the composite of two differentiable
functions is again differentiable, and it gives a formula for the derivative of this
composite. A Chain Rule of Order n should state, roughly, that the composite
of two functions that are n times differentiable is again n times differentiable,
and it should give a formula for the n'th derivative of this composite. Many
texbooks contain a Chain Rule of order 2 and perhaps 3, but I do not know of a
single one that contains a Chain Rule of arbitrary order n with an explicit and
useful formula. The main purpose of this paper is to derive such a formula.
In the last section, I will show how the Chain Rule of higher order can be
used to prove that the composite of two real-analytic functions is again realanalytic. The usual proof uses complex extensions of the real-analytic functions
and basic theorems of Complex Analysis. However, this usual proof cannot
easily be extended to the case when the real-analytic functions are replaced by
mappings between higher-dimensional spaces. The proof presented here can. In
fact, it will be used in the second volume of my book entitled "Finite-Dimensional
Spaces; Algebra, Geometry, and Analysis", now being written. (The first volume
appeared in 1987.)
In this paper, we need some precise and efficient notation and terminology,
which, unfortunately, has not yet become standard. (It is described in some
detail in Chapter 0 of Vol.1 of my book mentioned above.) We denote the set
of all natural numbers, with zero included, by IN. the set of all real numbers by
R , and the set of all positive real numbers, with zero included, by P . We use
IN x : = ]N\{0} for the set of all non-zero natural numbers and F x := P \{0} for
the set of all strictly positive real numbers. Given n £ IN we use the notation
n' := {k £ IN | 1 < k < n} for the set of all natural numbers from 1 to n. We
have 0' = 0. A subset J of R is called an interval if for all s,t £ J we have
[s,t] C J. The empty set and singletons are intervals; all other intervals are said
to be genuine. The cardinality of a finite set 5 is denoted by # 5 .
A family a := (a?- | / £ J) is specified by a procedure which assigns to each
i in a given set / , called the index set of a, an object a t , which is then called
a term of a. The family a should not be confused with its range {a, | i £ / } ,
which is the set of all terms of a. The set of all families indexed on I with terms
in a given set 5 is denoted by S1. If 5 contains a. zero-element 0 of some sort,
for example if 5 is IN, R , or P , we define the support of a given a £ S1 by
Supta := {i £ / | ax ^ 0 } .
(11)
The set of all families in S1 that have finite support will be denoted by
5 ( / ) := {a £ S1 | Supta is finite} .
(12)
Families indexed on IN or IN* are called sequences.
Assume now that an additive monoid P is given. This means that an
addition and a zero are defined for P such that the usual rules hold. Examples
are P := IN, R , and P . If / is a finite set and a G P1 is given, one can
unambiguously define the sum
]T?
Y
(13)
by using the associative and commatative laws of addition. If / is the empty set
we have E a = 0. The third notation of (13) is preferred in displayed formulas,
but the first and second are very useful in involved formulas. If the set I is not
finite but a G P^ 7) one can still define E a and use the notations (13) by putting
E a := ] T ( a , | i G Supta) .
(14)
Assume now that the given monoid P is a commutative multiplicative
monoid instead of an additive monoid. Examples are P := IN, R , P , IN X ,
and P x . Then the considerations of the preceding two paragraphs apply when
^2 i s replaced by f| in (13) and (14) and if 0 is replaced by 1 in the definition
(11) of support.
2. Patterns of partitions.
Recall that a partition of a given set 5 is a pair-wise disjoint collection of
non-empty subsets of 5 whose union is all of 5. The member-sets of a partitions
are called pieces.
Definition 1. The pattern n(V) G IV
finite set S is defined by
of a given partition V of a given
(n(V))i := #{/v G V | #/v = i} for all i G JVX
(21)
Roughly, the pattern of V tells us how many pieces of V have a given
cardinality.
Let a finite set 5 be given. It is clear that
(22)
for every partition V of 5. Given n G K , we use the notation
n)i:={p€lN(1N;X)|
Y, ' > ' = »}
ielN
9
TX
(23)
We now put n := # S . Then Un is the set of all possible patterns of partitions
of 5. Denote the set of all partitions of S by Part 5. Given p G II n , we use the
notation
\p : = #{V G Part S \p = ir(V)} .
(24)
for the number of all partitions of 5 that have the pattern p. This number
depends only on p G IN* \ which determines ?? = # S via (22); it does not
depend on the particular nature of 5 itself.
\ not zero, be given and choose k 6 W x such that
Lemma 1. Let p G J!V
pk ^ 0. Define q G JV(iNfX) by
Putting n : = £)(?>,- I * € ^r><)< u ' e ^ a v e £ ( * 9* I* £ ^ r>< ) =
A
77
~
(26)
«-
Proof : We choose a set 5 with n = # 5 . We then construct a partition of S as
follows. First we choose a subset K of 5 with k = #K and then a partition Q.
of S\K with pattern q as defined by (25). Then QU {A'} is a partition of 5 with
pattern p. Every partition of S with patterns p can be obtained in this manner.
Since there are (£) subsets A of 5 with #A~ = k and, given such a A', there
are Xq partitions of S\K with pattern q. we obtain partitions of 5 with pattern
p in (JJ)Ag different ways. However, since every partition of S with pattern p
contains pu pieces A' with #A' = k, the procedure described above produces
one and the same partitition of S in pu different ways. We conclude that (26) is
valid. |
Theorem 1. For all p G -EV
* we have
>Pi •
Proof: We use induction over /?. If ?? := 0 we must have p = ( 0 , . . . ) , the
sequence all of whose terms are zero. Then (27) is valid with A(o } = 1 because
the empty set has only one partition, namely the empty one, and because 0' = 0,
making the product in the denominator on the right side of (27) equal to 1.
Now let p G IN(
\ not zero, be given and put n := ]T(z px \ i G IN X ).
Assume that (27) becomes valid after p is replaced by a q G IN(
* G 1NX) < n, so that
g
= n(<iMW
\iem\)
•
]
with m : =
(2S)
We now choose k G IN x such that pk ^ 0 and we define q G JN(ES ) by (25).
Then in := n - A: < ?? and hence, by (26) of the Lemma and by (28), we have
_
/??\
1
m!
P
~ W
1
(n-fc)!
Ui!)1" | t € m l \ { f c } ) '
which easily yields (27), completing the induction. |
3. The Chain Rule of order n.
In this section, we assume that genuine intervals J and K and functions
/ : J —• K and g : K —> R are given, so that the composite g o / : J —• R
makes sense.
The most common form of the ordinary Chain Rule is this: if / and g are
differentiate, so is g o f and we have (g o / ) • = [g* o / ) /*. (We follow Newton
and denote derivatives by superscript dots rather than primes.) The following
is a preliminary form of the Chain Rule of order n.
L e m m a 2. Let n G IS be given. If both f and g are n times differentiable, so
is g o f and
{(Jo
/)<">=
Y, {9'*V) ° f) U f{*K) •
(3D
Proof: We proceed by induction. If ?? := 0 we have ??' = 0 and hence 0 is the
only member of Part n). Therefore, the product on the right side of (31) reduces
to 1 and (31) becomes(<y o / ) ( 0 ) = (g o f) = (#<0) o / ) .
Assume now that ?? G IN is given, that / and g are n + 1 times differentiate,
and that (31) is valid for the given /?. Then / . g{^p) and / ( * / v] are different iable
for all V G Part ??J and all 7v G V. Hence we can apply the ordinary Chain Rule
and the Product Rule to differentiate (31) and obtain
°nr
PGPart /?!
LG^
n
h'eT\{L}
Now, the partitions of (?? + 1)' can be classified as follows.
(1) The first class consists of partitions that contain {n + 1} as a piece. They
are of the form
V9 := V U {{?? + 1}} with V e Part ??] .
(2) The second class consists of partitions that do not contain {n + 1} as a piece.
They are of the form
V':=(V\{L})
U { L U { n +l } } with
4
VePartn)
a n dL e V .
The summands of the first sum on the right side of (32) are of the form
(</ (#n °/) n fi#K>)
when V belongs to the first class. The second term on the right side of (32)
can be written as a sum with summands that are also of the form (33), except
that V now belongs to the second class. Therefore, (32) reduces to (31) with n
replaced by n + 1, and the induction is complete. |
Theorem 2 (Chain Rule of order n). Let n G IV be given. If both f and g
are n times different iable, so is g o / and
(gof){n)=
where II n is defined by (23) and Xp is given explicitly by (27).
Proof: We observe that each summand on the right side of (31) depends on
the partition V only through its pattern TT(V). NOW, in view of (24). Xp is the
number of partitions of w having a given pattern p G II n . Therefore, (31) of
Lemma 2 reduces to (34) when equal terms in the summation are put together. |
4. Summations.
We assume that an index set / is given and we denote the collection of all
finite subsets of / by F i n / . Let a G P 7 be given. As was pointed out in Sect.l,
the sum E a is meaningful if / is finite or if a has finite support. Even if / is
infinite and a fails to have finite support, one can still assign a meaning to S a ,
possibly oc, by putting
So := sup{ Y^ «* IA ' € Fin/} G F := P U {oo} .
(41)
We say that a is summable if E a < oc. Most of the rules for summations
remain valid for sums in the extended sense of (41). For example, if V is a
partition of / , we have
Kev
If the index set is finite, the following proposition can be proved fairly easily
by induction over # / using the Binomial Theorem. The proposition is also valid
if / is infinite and if sums are defined by (41).
Proposition 1. Let n G IV and a G P1 be given and put
r , , : = { p e I V ( / ) | Zp = n} .
(43)
Then
The following results are variations of standard theorems involving "power
series".
TAfX
Proposition 2. Let a sequence a G P
be given. Theii the following conditions are equivalent:
(a) The set { %/oZ \ n G W x } is bounded.
(b) For some h e Px, the set {cinhn \ n G W x } is bounded.
(c) For some ft G P x , we have lim^^oo anhn = 0.
(d) For some ft G P x , the sequence (anhn \ n G W x ) is suinmabie.
Proposition 3. Assume that a givej^ sequence a G P ' satisfies one, and
hence all, of the conditions of Prop.2. Then we can choose ft G P x such that
the sequence {antn \ n G Nx) is summable for all t G [0,ft]and
lim Y
ant« = 0 .
(45)
4. Composition of analytic functions.
We assume now that non-empty open intervals J and A' and a C°°-function
/ : J
> K and x G J are given. For every /? G IN * and d G P x with
[x — d,x + d] C J, we use the notation
a n (/;aNd):=sup{|/ ( l l ) (.r + 6)| | ^ G [ - ^ r i ] } e P .
(51)
The supremum on the right side of (51) is actually a maximum and hence belongs
to P because all the derivatives of / are continuous.
Definition 2. We say that f is analytic near x G J if there is a d G P x with
[x — d,x + d] C J such that the sequence a := {-^an(f:x.d)
\ n G 1NX) satisfies
one. and hence all. of the conditions (a) - (d) of Prop.2 of Sect.4.
Assume that / is analytic near x and choose d satisfying the conditions of
Def.2. By (c) of Prop.2 we may choose /? G P x such that
lim an(f;x,d)—
?i-^oo
hn
77,!
=0.
(52)
In view of the notation (51), (52) ensures that the remainder term in the Taylorexpansion of / near x goes to zero and hence that
f(x + t) = f(x)+ lim Y f{k)(x)*—
(53)
n — oc
for all t G P% satisfying |<| < min{/?.c/}. One can prove, conversely, that / is
analytic in the sense of Def.2 if (53) holds for all t with |/| less than some strictly
positive number. Therefore, our Def.2 of analyticity is equivalent to the one
usually given in the textbooks.
We assume now that non-empty open intervals J and K and C^-functions
/ : J —> A" and g :—• R are given. By Thm.2, g o f :—> R is then also a
C°°-function.
6
Theorem 3 (Composition-Theorem for Analyticity). Let x G J be given
and put y := f(x). Assume that f is analytic near x and that g is analytic near
y. Then g o / is analytic near x.
Proof: Let d! G P x with [y — d1, y + d1] C A" be given. Since / is continuous at
x and since y = / ( # ) , we can choose d G P x with [x — d, x + d\ C J such that
|/(.r + s) — y | < d' for all s G [—d, d]. This means that for every s G [—d, </] there
is ^' G [—d',d'] such that f(x + s) = y + s*. Therefore, using the notation (51)
with / , x, d, and n replaced by g, y, d1, and m, respectively, we have
\(g(m) o / ) ( . : + 5)| = \g{m)(f(x
+ s))\ < *m(g\yJ)
(54)
for all 6 G [—d, c?] and rn G 1NX. We note that this statement remains valid if d
is replaced by any smaller number in P x .
Now let ?? G IN* be given. Using Thin.2, the inequality (54), and the
notation (51) we see that
\(gof){n)(.r
+ s)\<
is valid for all 6 G [—d,d]. Using the notation (51) again, this time with /
replaced by g o / , we conclude that
] T Ap a(Ep)(<j: y, d') J\ (a,-(/; ,r, c/))Pt .
an(gof;x.d)<
(55)
Now let t G P be given. Using the formula (27) of Thm.l aaid the fact that
t = n((*f')Pl> I * € nJ) for all p G II n , we conclude from (55) that
n
<
Using the notation (43) with / := IN x , it is evident that both {Tn \ n G IN x }
and {II n | 7? G N x } are partitions of 1N(INX ) \{(0,.. .)}• Summing (56) over
n G ]NX and using the formula (42) with
first with V replaced by {II n | n 6
{ r n | n e IN*}, we find that
<
IN TX }
and then again with V replaced by
Using Prop.l of Sect.4 with / := INx . it follows that
tn
a
^^y,df)^(
°n{gof;x,d)-< £
n€IN
X
^
tk\m
1/
]T <Tk(f;x,d)-) . (57)
™"
This inequality is valid for all d! € P x with [y - d\ y + d'] C A', all d G P x with
[x — d, x + d] C J that are small enough, and all / £ P . Since # is analytic near
x, we can, by Def.2, fix df such that the condition (d) of Prop.2 of Sect.4 is valid
for the sequence
| 777. €1NX) .
a:=(^(Jm(g^d')
Hence we may choose h' G P
x
such that
_ , „>'
oo .
(58)
Since / is analytic near ,r. we can. by Def.2. fix d £ P x such that not only (57)
holds but also Prop.3 of Sect.4 can be applied to the sequence
| k e INx ) .
a:=(-ak.(f:.v.d)
Hence we may choose /?. G P
x
such that
hk
Ck(f',x.cl)—
<hf .
(59)
Using (57) with the choice t := h it follows from (59) and (58) that
<rn(gof;x,d)—<
77!
V <Tm{g\y,d9)—r < oc .
m
'
x
Hence condition (cl) of Prop.2 of Sect.4 is satisfied for the sequence
a :=((jn(gof:x.d)^
\ n G INx ) .
By Def.2 it follows that g o / is analytic near x. |
APR 1 5 2004