MA10192: Mathematics I
2011-2012 Semester 1
(final version)

Contents

1 Functions and equations
  1.1 Polynomials
  1.2 Exponentials and logarithms
  1.3 Trigonometric functions
    1.3.1 Definitions
    1.3.2 Solving simple trigonometric equations
    1.3.3 Inverse trigonometric functions
    1.3.4 Polar coordinates
    1.3.5 Harmonic form
    1.3.6 Solving more complicated trigonometric equations
  1.4 The bisection method

2 Differentiation
  2.1 Definition
  2.2 Notation
  2.3 Derivatives of standard functions
  2.4 Rules for differentiation
    2.4.1 Product rule
    2.4.2 Quotient rule
    2.4.3 Chain rule
  2.5 Derivatives of inverse functions
  2.6 Parametric differentiation
  2.7 Implicit differentiation
  2.8 Maxima and minima
  2.9 Asymptotes
    2.9.1 L'Hopital's rule
  2.10 Newton's method
  2.11 Taylor polynomials
  2.12 Numerical differentiation

3 Integration
  3.1 Antiderivatives
  3.2 Estimating area
  3.3 Definition (optional)
  3.4 The fundamental theorem of calculus
  3.5 Rules for integration
    3.5.1 The substitution rule
    3.5.2 Integration by parts
  3.6 Anti-derivatives of inverse functions
  3.7 Anti-derivatives of rational functions by partial fractions
  3.8 Anti-derivatives of trigonometric rational functions
  3.9 Trigonometric substitution
  3.10 Hyperbolic substitution
  3.11 Area between curves
  3.12 Numerical integration
  3.13 Improper integrals

4 Ordinary differential equations
  4.1 Separable equations
  4.2 Linear equations
    4.2.1 Integrating factors
    4.2.2 Variation of parameters
    4.2.3 Method of undetermined coefficients
  4.3 Numerical methods

5 Functions of several variables
  5.1 Partial derivatives and extrema
  5.2 Lagrange multipliers and boundary extrema
  5.3 The chain rule
  5.4 Exact differential equations
Chapter 1

Functions and equations

1.1 Polynomials
The simplest kind of functions are the linear ones. The formula is¹
y = ax + b,
and geometrically this is a straight line (with slope a and y-intercept b). Finding
the point of intersection of the straight lines y = ax + b and y = cx + d leads to
the linear equation
ax + b = cx + d.
We rewrite this as
(a − c)x = d − b

and note that if a ≠ c this can be solved as

x = (d − b)/(a − c),

whereas if a = c and d ≠ b there is no solution (geometrically: the lines are
parallel and not equal) and if a = c and d = b every x is a solution (geometrically:
the lines are equal). We find the y-coordinate of the point of intersection by
substituting the found x in either formula:

y = a(d − b)/(a − c) + b.
Given a line L and a point P we can define a parabola as the set of points
(x, y) such that the distance of (x, y) to P is the same as the distance of (x, y)
to L. If we for example take P = (0, 1), then the distance of (x, y) to (0, 1)
¹ Actually, this formula gives what is more properly called an affine function; the function is
linear only if b = 0. In the one-variable case this misuse of terminology is very widespread. But
do remember the distinction when studying linear functions of several variables in Mathematics
II.
is (by Pythagoras' theorem) √(x² + (y − 1)²). And if we take L to be the line
y = −1, then the distance of (x, y) to this line is |y + 1| (i.e. the absolute value
of y + 1). So for the parabola defined by P = (0, 1) and the line L given by
y = −1 we obtain the formula

√(x² + (y − 1)²) = |y + 1|.

Squaring both sides and simplifying gives the formula y = (1/4)x². We will not go
further into the geometry of parabolas in this course. Suffice it to say that the
formula y = ax2 + bx + c always defines a parabola and that any parabola may
be written in this way by choosing the coordinate axis in the right way (the
crucial thing for the latter is to take the x-axis parallel to the given line).
Finding the point of intersection of parabolas or a parabola and a straight
line gives rise to a quadratic equation
ax2 + bx + c = 0.
We will tacitly assume that a ≠ 0 since otherwise this really is a linear equation,
which we already saw how to solve. We first look at the special case where b = 0,
a = 1. We rename −c = z. We then get the equation
x2 = z.
Geometrically it is clear that if z > 0, then there are two solutions, one positive
and one negative; it is also clear that the negative solution is minus the positive
solution. The positive one is called √z. Note that we haven't really solved this
equation; we simply noted that there is a solution and have given that solution
a name. For certain values of z we know what √z is; for example, if z = 4, then
its square root is 2. But, for example, √2 cannot be simplified; it is not a whole
number or a quotient of whole numbers.
We now reduce the general quadratic equation to the special quadratic equation x² = z. We have

ax² + bx + c = a((x + b/(2a))² + (4ac − b²)/(4a²)),

so that the solutions of ax² + bx + c = 0 are

x = (−b ± √(b² − 4ac))/(2a).
This is the well-known quadratic formula. Note that this expresses the solution
of the general quadratic equation ax2 + bx + c = 0 in terms of those of a special
quadratic equation, namely x2 = z (which has square roots as its solutions).
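As an aside, the quadratic formula is completely mechanical and so easy to turn into a computer routine. Here is a minimal Python sketch (Python is not part of this course, and the function name is mine):

```python
import math

def solve_quadratic(a, b, c):
    """Return the real solutions of a*x**2 + b*x + c = 0 (a != 0) as a tuple."""
    disc = b * b - 4 * a * c           # the discriminant b^2 - 4ac
    if disc < 0:
        return ()                      # no real solutions
    if disc == 0:
        return (-b / (2 * a),)         # one repeated solution
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

# x^2 - 3x + 2 = 0 has solutions 2 and 1
print(solve_quadratic(1, -3, 2))       # -> (2.0, 1.0)
```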
From the graph of the function y = x³ it is obvious that the equation
x³ = z has exactly one solution for any real number z. Similarly as before
with quadratic equations we may ask the question whether the general cubic
equation ax³ + bx² + cx + d = 0 can be solved in terms of cube roots (solutions
of x³ = z) and maybe square roots as well. This can indeed be done, but the
formulas are not very nice. Quartic equations (ax4 + bx3 + cx2 + dx + e = 0) can
be solved as well using only roots. Surprisingly, it stops there: quintic equations
cannot be solved in terms of roots (a specific example is x5 − x + 1 = 0). This
is the content of the Abel-Ruffini theorem.
We return to the equation x² = z. As said, for z > 0 this equation has two
solutions. For z = 0 obviously only zero is a solution. For z < 0 there are again
two solutions, but these are not real numbers but complex numbers. You might
know them from A-level; they will appear in MA10193: Mathematics II.
1.2 Exponentials and logarithms
You are undoubtedly familiar with the exponential function ax with a > 0.
There are various ways of formally defining these functions. The most elementary and most useful at this point is the following. An exponential function is
a continuous function f such that f (0) = 1 and such that for all x and y the
following holds
f (x + y) = f (x)f (y),
(1.1)
i.e. exponential functions are those that translate addition into multiplication.
The continuity assumption is made to rule out pathological examples and we
will not worry too much about it (intuitively a continuous function is a function
whose graph you can draw without taking your pen from the paper²).
It can be shown that an exponential function is completely characterized by
its value at some point different from zero. This point is usually taken to be
one and the exponential function is then written as ax with a the value at one.
The formula (1.1) then becomes
ax+y = ax ay ,
(1.2)
which should be familiar. We can’t go into the details of this (existence and
uniqueness of exponential functions), but note that a^x certainly makes sense
for x a nonnegative integer, and since a^(z/2) · a^(z/2) = a^z by (1.2) with x = y = z/2
we must have a^(1/2) = √a, and we already encountered square roots. Similarly,
for all rational numbers x (i.e. fractions, i.e. quotients of integers) a^x can be
defined in terms of roots. It can then be shown that a^x is continuous and it can
subsequently be extended by continuity to all irrational numbers (such as √2
and π) as well. We will really only be interested in solving equations involving
exponential functions (and later in differentiating them and integrating them).
So don't worry if this was a bit above your head; the formula (1.2) is something
that you should definitely keep in mind, though. Another one to remember is
(a^x)^y = a^(xy),    (1.3)
² The precise definition is as follows: f is continuous at the point p if for all ε > 0 there
exists a δ > 0 such that if p − δ < x < p + δ, then f(p) − ε < f(x) < f(p) + ε. A function is
continuous if it is continuous at all points where it is defined.
i.e. powers are turned into products. Note that if y is a positive integer, then
this easily follows from (1.2) (it follows in the general case as well, but not
easily).
Assuming that a > 0 and a ≠ 1, the equation
ay = x
can be solved for y if x > 0 and such a solution y is unique (remember the
graph of the exponential function). This solution y is called the a-logarithm of
x: log_a x. So by definition of the logarithm we have for all x > 0

a^(log_a x) = x.    (1.4)

We also have (now for all x):

log_a a^x = x.    (1.5)
The equations (1.4) and (1.5) mean that the exponential and logarithm are
inverses of each other. We can use the identities (1.2) and (1.3) to obtain the
following identities for logarithms:
log_a(xy) = log_a x + log_a y,
log_a(x^y) = y log_a x,    (1.6)
so the logarithm turns products into sums and powers into products.
If you’ve seen one exponential function then you’ve seen them all because of
the equation
b^x = c^(x log_c b),
which we obtained by using b = clogc b from (1.4), then taking both sides of
that equation to the power x and using (1.3). So if we only knew how
to compute c-logarithms and c-exponentials, then we could still compute b-exponentials
for any b. You’ve probably used the corresponding formula for logarithms
log_b x = (log_c x)/(log_c b)
when using your calculator (which probably only computes 10-logarithms and e-logarithms). So we really only need to consider the exponential and logarithmic
function for one fixed base. For several reasons the base e (≈ 2.71828) is the
‘best’ base. The reason you will probably find most convincing is that the
exponential function ax has a derivative equal to a constant times itself and
that only for a = e this constant equals one (so that ex is the only exponential
function that is its own derivative). We will come back to this when we discuss
differentiation. The natural logarithm log_e is at A-level often denoted by ln,
and I will use this notation as well. But be warned that many people write log
when they mean the natural logarithm (though your calculator probably uses
the symbol log for log_10).
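The change-of-base formula above is exactly what you need on a computer as well, since most maths libraries only provide the natural logarithm directly. A quick Python illustration (not course material):

```python
import math

# Change of base: log_b(x) = log_c(x) / log_c(b), here with c = e.
x, b = 100.0, 10.0
via_change_of_base = math.log(x) / math.log(b)
print(via_change_of_base)          # log_10(100) = 2

# Python's math.log also accepts a base argument directly:
direct = math.log(x, b)
```

Both computations agree up to rounding error.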
Now, let us solve equations.
Problem 1. The following expression is an integer; which integer?

(9^(5/2) · 81^(3/4)) / 27^(2/3).

Solution. We use the exponential rules (1.2) and (1.3) in the following computation:

(9^(5/2) · 81^(3/4)) / 27^(2/3) = ((3²)^(5/2) · (3⁴)^(3/4)) / (3³)^(2/3) = (3⁵ · 3³)/3² = 3⁸/3² = 3⁶ = 729.
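Such a computation is easy to sanity-check numerically; a one-line Python check (my own illustration, and note the floating-point result is only approximately an integer):

```python
# 9^(5/2) * 81^(3/4) / 27^(2/3), evaluated in floating point
value = 9**(5/2) * 81**(3/4) / 27**(2/3)
print(round(value))    # -> 729
```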
Problem 2. Solve for x:
53x = 2.
Solution. There are at least three ways of solving this problem. They give
solutions that look different, but are actually the same.

The first method is the following. Take log_5 of both sides to obtain, using
(1.5),

3x = log_5 2

and divide by 3 to isolate x:

x = (log_5 2)/3.

The second method is as follows. Note that 5^(3x) = (5³)^x = 125^x by (1.3), so that
the equation is equivalent to 125^x = 2, which by the definition of the logarithm
has solution x = log_125 2.

The third method, which I prefer since it gives the answer in terms of natural
logarithms, goes as follows. Take the natural logarithm of both sides of the
equation to give

ln 5^(3x) = ln 2.

Now use the logarithmic rule (1.6) to rewrite this as

3x ln 5 = ln 2.

From this we solve for x:

x = (ln 2)/(3 ln 5).

Using any of these three methods of solving the equation is fine.
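That the three answers are really the same number is easy to confirm numerically; a small Python check (my own illustration):

```python
import math

sol1 = math.log(2, 5) / 3            # (log_5 2) / 3
sol2 = math.log(2, 125)              # log_125 2
sol3 = math.log(2) / (3 * math.log(5))  # ln 2 / (3 ln 5)

# all three agree, and each satisfies 5^(3x) = 2
print(sol1, 5 ** (3 * sol1))         # second value is 2.0 up to rounding
```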
Problem 3. Solve for x:
ln 5x = 2.
Solution. Take e to the power of both sides to obtain, using the definition of the
logarithm (1.4),

5x = e²

and divide by 5 to isolate x:

x = e²/5.
Problem 4. Solve for x:
ln e3x = 7.
Solution. The left-hand side is 3x (using (1.5)), so x = 7/3.
Problem 5. Solve for x:
ln (x + 2) + ln (x + 1) = 5.
Solution. Take e to the power of both sides to obtain

e^(ln(x+2) + ln(x+1)) = e⁵.

Now use the exponential rule (1.2) to obtain

e^(ln(x+2)) · e^(ln(x+1)) = e⁵.

Use the relation between exponential and logarithm (1.4) to obtain

(x + 2)(x + 1) = e⁵.

Write this second order equation in standard form:

x² + 3x + 2 − e⁵ = 0.

Use the quadratic formula

x = (−3 ± √(9 − 4(2 − e⁵)))/2,

and simplify a bit:

x = (−3 ± √(1 + 4e⁵))/2.

If we choose the minus sign, then x is not in the domain of the function
ln(x + 2) + ln(x + 1) that we started with (this domain consists of all numbers larger than −1). So only the plus sign gives a valid solution:

x = (−3 + √(1 + 4e⁵))/2.
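It is worth checking that the retained root really does satisfy the original equation, and that the discarded one really falls outside the domain. A quick Python verification (my own illustration):

```python
import math

root = math.sqrt(1 + 4 * math.e**5)
x_plus = (-3 + root) / 2
x_minus = (-3 - root) / 2

# the plus root satisfies ln(x+2) + ln(x+1) = 5
print(math.log(x_plus + 2) + math.log(x_plus + 1))   # -> 5.0 up to rounding

# the minus root is below -1, outside the domain of ln(x+1)
print(x_minus < -1)                                  # -> True
```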
Problem 6. Solve for x:
4x + 2x − 2 = 0.
Solution. Rewrite as
(2x )2 + 2x − 2 = 0.
Substitute u = 2x :
u2 + u − 2 = 0.
Rewrite
(u + 2)(u − 1) = 0.
So u = −2 or u = 1. So we have to solve 2x = −2 and 2x = 1. The first has no
solution, the second the unique solution x = 0.
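The substitution u = 2^x can be mirrored on a computer; a small Python sketch of the same reasoning (my own illustration):

```python
import math

# 4^x + 2^x - 2 = 0 with u = 2^x becomes u^2 + u - 2 = (u + 2)(u - 1) = 0,
# so u = -2 (impossible, since 2^x > 0) or u = 1, i.e. x = log_2(1) = 0.
x = math.log(1, 2)
print(x)                                 # -> 0.0
assert abs(4**x + 2**x - 2) < 1e-12      # x = 0 solves the original equation
```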
Problem 7. Suppose the function f (x) = cax satisfies f (1) = 3, f (4) = 5. What
are c and a?
Solution. We get the equations

ca = 3,    ca⁴ = 5.

We eliminate c by taking the quotient:

ca⁴/(ca) = a³ = 5/3.

So a = (5/3)^(1/3). Substitute back to obtain

c(5/3)^(1/3) = 3,

so

c = 3(5/3)^(−1/3).
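A numerical check that these values of c and a really reproduce f(1) = 3 and f(4) = 5 (my own illustration):

```python
a = (5 / 3) ** (1 / 3)
c = 3 * (5 / 3) ** (-1 / 3)

def f(x):
    """The fitted exponential f(x) = c * a**x."""
    return c * a**x

print(f(1), f(4))    # -> 3.0 5.0 up to rounding
```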
Problem 8. Prove, using the exponential laws, that for all x, y > 0

ln(x/y) = ln x − ln y.

Solution. Taking e to the power of both sides gives the equivalent statement

x/y = e^(ln x − ln y) = e^(ln x) · e^(−ln y).

First, we have

1 = e⁰ = e^(ln y) · e^(−ln y) = y · e^(−ln y),

so that e^(−ln y) = 1/y. So we indeed have

e^(ln x) · e^(−ln y) = x · (1/y) = x/y.
1.3 Trigonometric functions

1.3.1 Definitions
We now define the trigonometric functions sine and cosine in a maybe at first
sight slightly odd way. Consider the unit circle (i.e. all pairs (x, y) such that
x2 + y 2 = 1). Let t be some number greater than or equal to zero. Starting
from the point (1, 0) walk along the unit circle counter clock-wise for length t.
Define cos t as the x-coordinate of the point that you arrive at and sin t as the
y-coordinate of the point that you arrive at. If t < 0, do a similar thing but now
walk clockwise for length −t. The functions so defined obviously have certain
properties:
Periodicity sin(t + 2π) = sin t, cos(t + 2π) = cos t,
Sine is odd sin(−t) = − sin t,
Cosine is even cos(−t) = cos t,
Pythagoras cos2 t + sin2 t = 1.
There are also the following, not so obvious but important, addition formulas:
sin(s + t) = sin s cos t + cos s sin t,
cos(s + t) = cos s cos t − sin s sin t.
Note that the addition formulas imply the following properties (which are geometrically reasonably obvious):
• sin(π − t) = sin t,
• sin(π/2 + t) = cos t,
• cos(3π/2 + t) = sin t.
We define the third trigonometric function, the tangent, as

tan t = sin t / cos t.
It is possible to define the tangent geometrically, but the above definition will
do for our purposes. The other trigonometric functions secant, cosecant and
cotangent can also be defined geometrically, but I will ignore these functions in
this course.
1.3.2 Solving simple trigonometric equations
It seems that at A-level trigonometric equations only have to be solved on a given
finite interval. I will virtually always ask you to find all solutions. For example,
if I ask you to solve sin x = 0 for x, then I expect the answer x = kπ with k
an integer. I might occasionally ask you to solve sin x = 0 for −3π ≤ x ≤ 2π
and then the correct answer would be x = −3π, −2π, −π, 0, π, 2π. This can be
deduced from the general case by noting that x = kπ with k an integer is in the
desired interval if and only if k = −3, −2, −1, 0, 1, 2.
Problem 9. Solve cos x = 1/2 for x.
Solution. One solution is x = π/3. Since the cosine is even, another solution
is x = −π/3. Geometrically it is obvious that these are the only two solutions
that satisfy −π ≤ x ≤ π (one full period). Since the cosine is 2π-periodic, all
solutions are given by x = π/3 + 2kπ, x = −π/3 + 2nπ, where k and n are arbitrary
integers.
Problem 10. Solve cos 2x = 1/2 for x.
Solution. We substitute u = 2x to reduce this problem to the previous one:
cos u = 1/2. From problem 9 we know that u = π/3 + 2kπ, u = −π/3 + 2nπ. Now
substitute back to obtain 2x = π/3 + 2kπ, 2x = −π/3 + 2nπ. Solving for x gives
x = π/6 + kπ, x = −π/6 + nπ, where k and n are arbitrary integers.
Note that we reduced the problem involving cos 2x to a 'standard' problem
involving cos u with u = 2x and, after finding u, solved for x. This is usually
the best way to proceed: it is very easy to make mistakes with trigonometric
functions of periods different from the standard 2π.
Problem 11. Solve sin x = 1/2 for x.
Solution. One solution is x = π/6. Using the identity sin(π − x) = sin x we see
that π − π/6 = 5π/6 is another solution. Geometrically it is obvious that these are
the only two solutions that satisfy 0 ≤ x ≤ 2π (one full period). Since the sine
is 2π-periodic, all solutions are given by x = π/6 + 2kπ, x = 5π/6 + 2nπ, where k
and n are arbitrary integers.
An often-made mistake in problems such as problem 11 (involving the sine)
is that a wrong second solution is found. Please look at the geometry of the
problem to avoid this.
1.3.3 Inverse trigonometric functions
The unique solution of cos x = 1/2 in the interval 0 ≤ x ≤ π can be expressed
in terms of known numbers (namely π/3). However, for the unique solution of
cos x = 1/3 no such simple expression exists. Since 'the unique solution of
cos x = 1/3 in the interval 0 ≤ x ≤ π' is a bit long, there is a shorthand for this. For all y
with −1 ≤ y ≤ 1, the equation

cos x = y

has a unique solution x in the interval 0 ≤ x ≤ π. This solution is called the
arccosine of y: arccos y. Often the notation cos⁻¹ y is used, but I will never do
this since it is easily confused with 1/(cos y), which is a different thing altogether.
I will actually later in these notes use the notation cos⁻¹ y for 1/(cos y), since it is
consistent with e.g. cos² y meaning (cos y)², so be advised! Note that instead
of the interval 0 ≤ x ≤ π we could have picked any other interval of length π
such that cos x = y has a unique solution on that interval; the interval [0, π] is
an arbitrary but customary choice.
Problem 12. Solve cos x = 1/3 for x.
Solution. By definition x = arccos(1/3) is a solution. Since the cosine is even,
x = −arccos(1/3) is also a solution. These are the only solutions on the interval
(−π, π], which is one period. So all solutions are given by x = arccos(1/3) + 2kπ,
x = −arccos(1/3) + 2nπ, where k and n are arbitrary integers.
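Computer maths libraries provide the arccosine directly; a small Python check of problem 12's family of solutions (my own illustration):

```python
import math

base = math.acos(1 / 3)      # arccos(1/3), the unique solution in [0, pi]
print(base)                  # approximately 1.2310

# every member of both solution families satisfies cos x = 1/3
for k in range(-2, 3):
    for x in (base + 2 * k * math.pi, -base + 2 * k * math.pi):
        assert abs(math.cos(x) - 1 / 3) < 1e-9
```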
[Figure 1.1: The graph of the arccos]
Similarly we define the analogue for the sine. For all y with −1 ≤ y ≤ 1, the
equation
sin x = y
has a unique solution x in the interval − π2 ≤ x ≤ π2 . This solution is called
the arcsine of y: arcsin y. Note that the interval on which the arcsine takes its
values is different from that on which the arccosine takes its values.
[Figure 1.2: The graph of the arcsin]
For applications in integration the inverse of the tangent will turn out to be
extremely important. For all y, the equation

tan x = y

has a unique solution x in the interval −π/2 < x < π/2. This solution is called the
arctangent of y: arctan y. Note that whereas the arccosine and arcsine are only
defined on the interval [−1, 1], the arctangent is defined on the whole real line.
1.3.4 Polar coordinates
Instead of using Cartesian coordinates it is sometimes convenient to use other
coordinates. Polar coordinates are the most common example of such other
coordinates. Polar coordinates describe a point in the plane by an angle and a
[Figure 1.3: The graph of the arctan]
radius. This is how it works. Consider the circle centered at the origin through
the point P that you want the coordinates of. Denote the radius of this circle
by r, that is the first polar coordinate (the radius). For the second coordinate
measure the length of the circle arc you traverse when going from the point
(r, 0) to P and divide this by r. This gives you a number θ in between zero and
2π, that is the second polar coordinate (the angle).
[Figure 1.4: The Cartesian grid]
Every pair (r, θ) with r > 0 and 0 ≤ θ < 2π (note the strict and the nonstrict inequality) determines a unique point in the plane and conversely each
point in the plane except the origin determines such a pair. The origin is special
since its angle is not uniquely determined.
It is easy to compute the Cartesian coordinates of a point given its polar
coordinates: (x, y) = (r cos θ, r sin θ). This easily follows from the definition
of the sine and cosine. Going the opposite way is slightly more complicated.
The radius is easy enough, r = √(x² + y²) by Pythagoras, but the angle is a bit
more complicated. First remember that |z|, the absolute value of z, is the distance of
the point z on the line to zero (so |z| = z if z ≥ 0 and |z| = −z if z < 0). The
[Figure 1.5: The polar grid]
angle of (x, y) equals arctan(|y|/|x|) if the point lies in the first quadrant, π − arctan(|y|/|x|)
if the point lies in the second quadrant, π + arctan(|y|/|x|) if the point lies in the
third quadrant, and 2π − arctan(|y|/|x|) if the point lies in the fourth quadrant. It is
better to draw a picture each time than to try to memorize these formulas for
the angle.
Problem 13. Write the point with radius 4 and angle 3π/2 in Cartesian coordinates.

Solution. The x-coordinate is 4 cos(3π/2) = 0, the y-coordinate is 4 sin(3π/2) = −4.
Problem 14. Write (5, −3) in polar coordinates.

Solution. The radius is √(5² + (−3)²) = √34. To find the angle: draw a picture!
The angle θ that we are after satisfies θ = 2π − ϕ, where ϕ satisfies

tan ϕ = 3/5.

So the angle that we are after is 2π − arctan(3/5).
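On a computer, the quadrant bookkeeping above is usually handled by a two-argument arctangent such as `atan2`, which returns an angle in (−π, π]; shifting negative results by 2π gives the convention [0, 2π) used in these notes. A Python sketch for problem 14 (my own illustration):

```python
import math

x, y = 5.0, -3.0
r = math.hypot(x, y)            # sqrt(x**2 + y**2) = sqrt(34)
theta = math.atan2(y, x)        # angle in (-pi, pi], correct quadrant chosen for us
if theta < 0:
    theta += 2 * math.pi        # shift into [0, 2*pi)

print(r, theta)   # r = sqrt(34), theta = 2*pi - arctan(3/5)
```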
1.3.5 Harmonic form
Sometimes we will want to solve equations like 3 sin t + 5 cos t = 2. For this it
helps to write a sum of a sine and a cosine of the same period as a phase-shifted cosine, i.e. given A and B to find r and θ such that

−A sin t + B cos t = r cos(t + θ).
We use the addition formula for the cosine to expand cos(t + θ). This gives
r cos(t + θ) = r cos t cos θ − r sin t sin θ.
Comparing this to −A sin t + B cos t we see that we will need to choose r and θ
such that
A = r sin θ,
B = r cos θ.
Note that this is nothing else than writing (B, A) in polar coordinates. So we
have −A sin t + B cos t = r cos(t + θ) if we choose r the radius and θ the angle
of the point (B, A). Please note the minus sign in front of A!
Problem 15. Find r and θ such that 3 sin t + 5 cos t = r cos (t + θ) for all t.
Solution. We have seen that we should take r, θ as the polar coordinates of
(5, −3). These we computed in problem 14: r = √34, θ = 2π − arctan(3/5).
Problem 16. Solve 3 sin t + 5 cos t = 2 for t.
Solution. We use that 3 sin t + 5 cos t = r cos(t + θ) with r = √34 and θ = 2π − arctan(3/5),
as we showed in problem 15. So we must solve √34 cos(t + 2π − arctan(3/5)) = 2.
This gives

cos(t + 2π − arctan(3/5)) = 2/√34.

So

t + 2π − arctan(3/5) = arccos(2/√34) + 2kπ,
t + 2π − arctan(3/5) = −arccos(2/√34) + 2nπ.

So

t = −2π + arctan(3/5) + arccos(2/√34) + 2kπ,
t = −2π + arctan(3/5) − arccos(2/√34) + 2nπ.

Here k and n are arbitrary integers.
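Both the harmonic-form identity and the resulting solutions can be checked numerically; a Python sketch (my own illustration):

```python
import math

r = math.sqrt(34)
theta = 2 * math.pi - math.atan(3 / 5)

# the harmonic form identity 3 sin t + 5 cos t = r cos(t + theta) holds for all t
for t in (0.0, 0.7, 2.5, -1.3):
    assert abs(3 * math.sin(t) + 5 * math.cos(t) - r * math.cos(t + theta)) < 1e-9

# one solution of 3 sin t + 5 cos t = 2 from the first family (k = 0)
t = -2 * math.pi + math.atan(3 / 5) + math.acos(2 / r)
print(3 * math.sin(t) + 5 * math.cos(t))    # -> 2.0 up to rounding
```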
1.3.6 Solving more complicated trigonometric equations
The following problems are slightly more difficult since they involve solving
basic trigonometric equations, trigonometric identities and solving polynomial
equations.
This first one is relatively straightforward since it does not involve using
trigonometric identities.
Problem 17. Solve cos2 x + cos x − 2 = 0 for x.
Solution. Substitute u = cos x. Then the equation becomes
u2 + u − 2 = 0.
This can be factored as
(u + 2)(u − 1) = 0.
So the solutions are
u = −2, u = 1.
So we have to solve
cos x = −2, cos x = 1.
The first of these equations has no solutions. The second has as solutions
x = 2nπ with n any integer.
The following problem is somewhat more complicated (and can be done in
several ways).
Problem 18. Solve cos2 2x + 2 sin2 x − 1 = 0 for x.
Solution. The first objective is to write cos² 2x in terms of sin x, so as to obtain
a polynomial equation in sin x. Using the addition formula for the cosine we
have cos 2x = cos² x − sin² x, and subsequently using Pythagoras gives cos 2x =
1 − 2 sin² x. It follows that cos² 2x = 1 − 4 sin² x + 4 sin⁴ x. So the equation to
be solved is equivalent to

4 sin⁴ x − 2 sin² x = 0.

Substituting u = sin² x this becomes 4u² − 2u = 0, which factors as 4u(u − 1/2) = 0.
So we obtain u = 0 or u = 1/2. This gives sin x = 0, sin x = 1/√2 or sin x = −1/√2.
Solving this for x gives x = kπ, x = π/4 + 2nπ, x = 3π/4 + 2mπ, x = −π/4 + 2pπ,
x = −3π/4 + 2qπ, where k, n, m, p, q are arbitrary integers.
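A quick numerical check that representatives of every solution family really satisfy the original equation (my own illustration):

```python
import math

def f(x):
    """Left-hand side of the equation cos^2(2x) + 2 sin^2(x) - 1 = 0."""
    return math.cos(2 * x) ** 2 + 2 * math.sin(x) ** 2 - 1

# one representative from each family: k = n = m = p = q = 0, plus x = pi
for x in (0.0, math.pi, math.pi / 4, 3 * math.pi / 4,
          -math.pi / 4, -3 * math.pi / 4):
    assert abs(f(x)) < 1e-9
print("all representatives check out")
```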
1.4 The bisection method
You may have gotten a different impression at A-level (and in this course up to
now...), but it is a rare occasion when equations can actually be solved explicitly.
Usually one has to be satisfied with showing that there are solutions (and how
many there are) and with only approximating the solutions. The following
theorem (the intermediate value theorem) is useful in showing that there are
solutions: a continuous function f on the interval [a, b] takes on all values in
between f (a) and f (b), i.e. for all y with f (a) ≤ y ≤ f (b) (or f (b) ≤ y ≤ f (a)
depending on whether f (a) or f (b) is larger) there exists a c with a ≤ c ≤ b
such that f (c) = y.
Problem 19. Show that x3 − 12x + 2 = 0 has a solution with 0 ≤ x ≤ 1.
Solution. With f (x) = x3 − 12x + 2 we have f (0) = 2 and f (1) = −9. By the
intermediate value theorem, f takes on all values in between, in particular the
value zero. So there indeed exists a solution x with 0 ≤ x ≤ 1.
We can use the intermediate value theorem to approximate the unique solution of x³ − 12x + 2 = 0 with 0 ≤ x ≤ 1. This is called the bisection method.
We evaluate x³ − 12x + 2 at x = 1/2 (the middle of the interval [0, 1]). This
gives −3.875; the only important thing is that this value is negative, so the intermediate value theorem tells us that there exists a solution in [0, 1/2]. Next we
evaluate the function at 1/4 (the middle of the new interval), look whether this
value is positive or negative, and determine from that whether our solution is in
the interval [0, 1/4] or in the interval [1/4, 1/2]. In this way we continue halving the
size of the interval, and so get a better and better approximation of the solution.
It is a bit much work to do this by hand, but computers are good at this sort of
mindless number crunching.
Problem 20. Approximate √2 using the bisection method. Start with an interval
whose endpoints are integers and continue until you can be sure that you have
the first 3 digits correct.
Solution. We use the bisection method to numerically approximate the solution
of x² = 2. A reasonable interval to start with seems the interval [1, 2] since
1² = 1 < 2 and 2² = 4 > 2. We obtain the following table using the bisection
method:

x            x²       interval
1            1        [1,2]
2            4
1.5          2.25     [1,1.5]
1.25         1.5625   [1.25,1.5]
1.375        1.8906   [1.375,1.5]
1.4375       2.0664   [1.375,1.4375]
1.40625      1.9775   [1.40625,1.4375]
1.421875     2.0217   [1.40625,1.421875]
1.4140625    1.9996   [1.4140625,1.421875]
1.41796875   2.0106   [1.4140625,1.41796875]

So after bisecting 8 times we know for sure that the first three digits of √2
are 1.41.
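The halving procedure is exactly the kind of mindless number crunching a computer is good at. A minimal Python sketch of the bisection method (my own illustration; the function name is mine):

```python
def bisect(f, a, b, n):
    """Perform n bisection steps on [a, b], assuming f(a) and f(b) differ in sign."""
    for _ in range(n):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:    # sign change in the left half
            b = m
        else:                   # otherwise the sign change is in the right half
            a = m
    return a, b

# reproduce problem 20: 20 steps shrink [1, 2] to width 2**-20
lo, hi = bisect(lambda x: x * x - 2, 1.0, 2.0, 20)
print(lo, hi)    # both endpoints approximate sqrt(2) = 1.41421356...
```

Each step halves the interval, so after n steps the solution is pinned down to within (b − a)/2ⁿ.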
Chapter 2

Differentiation

2.1 Definition
The idea behind the derivative is that f′(x), the derivative of f at the point
x, is the slope of the tangent line to the graph of f at x. A tangent line is a bit
of a tricky geometrical object, so we will instead initially consider chords to the
graph, i.e. lines that connect two points on the graph. In figure 2.1 a function
and two chords which both originate from the same point x are drawn.
[Figure 2.1: A graph and two chords]
When the point to which the chord is drawn comes closer to x, the chord
approaches the tangent line to f at x. So it seems reasonable to look at the
slope of such chords and look what happens if we take the point to which the
chord is drawn closer and closer to x. The line that connects (x, f (x)) and
(x + h, f(x + h)) has slope

(f(x + h) − f(x))/h.    (2.1)
We take for f′(x) the 'natural value to complement this expression' for h = 0
(note that (2.1) is meaningless for h = 0). This may sound complicated and
confusing, but when applied to an example it turns out to not be too difficult.
We consider f(x) = x² and we want to know the derivative. Now (2.1) equals

((x + h)² − x²)/h = (x² + 2xh + h² − x²)/h = (2xh + h²)/h = 2x + h.
Note that the left-hand side is only defined for h ≠ 0, but the right-hand side
is perfectly well-defined for h = 0. So once we have made these simplifications
it is obvious that the ‘natural value to complement this expression’ for h = 0 is
2x. So the derivative of x² is 2x (and you probably recall that this is what your
A-level teacher said it was). The notation that we use for the 'natural value to
complement (2.1)' for h = 0 is:

lim_(h→0) (f(x + h) − f(x))/h.
In general the notation

lim_(z→w) g(z) = v

(read as: the limit of g(z) as z tends to w is v) means that v is the natural value
for the expression g(z) at w.¹
The derivative may not always exist. Consider the example of the absolute
value |x| (remember that this equals −x if x < 0 and x if x ≥ 0). See figure 2.2
for its graph. At x = 0 it is obvious that (2.1) equals 1 if h > 0 and −1 if h < 0. This
means that there is no altogether obvious value for the limit.² So we leave the
derivative undefined. Note that this makes geometric sense: there is no tangent
line to this curve at x = 0.
[Figure 2.2: The absolute value function]
¹ Of course this is still very imprecise. The precise definition is as follows: for all ε > 0
there exists a δ > 0 such that if |z − w| ≤ δ then |g(z) − v| ≤ ε.
² For the ones who read the previous footnote. Suppose that there does exist a limit v with
the desired properties. Then, choosing ε = 1/2, there should exist a δ > 0 such that |z| ≤ δ
implies |g(z) − v| ≤ 1/2, where g(z) = −1 if z < 0 and g(z) = 1 if z > 0. Now use the triangle
inequality |g(δ) − g(−δ)| ≤ |g(δ) − v| + |g(−δ) − v| to conclude that |g(δ) − g(−δ)| ≤ 1. But
|g(δ) − g(−δ)| = 2. This contradiction shows that our assumption that the limit v exists was
wrong. So the limit does not exist.
Problem 21. Find the tangent line of f(x) = x² + 4 at the point (1, 5).
Solution. We saw that f′(x) = 2x, which means that f′(1) = 2 so that the slope
of the desired tangent line is 2. The tangent line has to pass through the given
point. The unique line with these properties is y = 2(x − 1) + 5.
2.2 Notation
There are various notations in use for derivatives. The derivative of f is denoted
by
\[ f', \quad \dot f, \quad \frac{df}{dx}, \quad f^{(1)}, \quad Df. \]
The first of these is due to Lagrange, the second one is due to Newton and is
mainly used when the independent variable represents time, the third one is due
to Leibniz and is very useful in suggesting that the derivative is some kind of
quotient, the fourth and fifth ones are mainly useful when dealing with higher
order derivatives. The second derivative (i.e. the derivative of the derivative) is
denoted by
\[ f'', \quad \ddot f, \quad \frac{d^2 f}{dx^2}, \quad f^{(2)}, \quad D^2 f. \]
Here you see an advantage of the last three notations over the first two; this is
even clearer when dealing with, for example, the tenth derivative.
2.3 Derivatives of standard functions
In the first section we already calculated the derivative of x². Using the binomial
theorem it similarly follows that the derivative of xⁿ is nxⁿ⁻¹ for n a nonnegative
integer. Actually that formula is true for all numbers n. We will now just show
it for n = 1/2, i.e. we will show that the function f(x) = √x has derivative
f′(x) = 1/(2√x). In this case we have
\[ \frac{f(x+h) - f(x)}{h} = \frac{\sqrt{x+h} - \sqrt{x}}{h}. \]
The trick is to multiply this fraction by 1 written in somewhat strange form:
\[ \frac{\sqrt{x+h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x+h} + \sqrt{x}}{\sqrt{x+h} + \sqrt{x}}, \]
carry out the multiplication of the numerators and note that some terms cancel:
\[ \frac{x+h-x}{h(\sqrt{x+h} + \sqrt{x})} = \frac{h}{h(\sqrt{x+h} + \sqrt{x})} = \frac{1}{\sqrt{x+h} + \sqrt{x}}. \]
This last expression makes perfect sense for h = 0 and equals 1/(2√x). So this is
indeed the derivative.
We now turn to the exponential function. For f(x) = aˣ we have
\[ \frac{f(x+h) - f(x)}{h} = \frac{a^x a^h - a^x}{h} = a^x\,\frac{a^h - 1}{h}. \]
So we see that the derivative of aˣ equals a constant depending on a but not
on x (namely lim_{h→0} (aʰ − 1)/h) times the function aˣ itself. The number e is the
unique number for which this constant equals one. So lim_{h→0} (eʰ − 1)/h = 1. We
will see later on what this constant lim_{h→0} (aʰ − 1)/h is for general a.
We now determine the derivative of the sine function f (x) = sin x. We first
determine the derivative at zero. Note that the sine is zero at zero, so that we
are interested in
\[ \lim_{h\to 0} \frac{\sin h}{h}. \]
We use some geometry to show that this limit is 1.
For the moment assume that h satisfies 0 < h ≤ π/2. Consider the point
P = (cos h, sin h) and the right-angled triangle with vertices P, (cos h, 0) and
(1, 0). By Pythagoras we have: distance of P to (cos h, 0) ≤ distance of P to
(1, 0). Since a straight line is the shortest path between two points we also have:
distance of P to (1, 0) ≤ h, the length of the circle arc centered at the origin
through P and (1, 0). So distance of P to (cos h, 0) ≤ h, i.e.
\[ \sin h \leq h. \tag{2.2} \]
Now consider the triangle with vertices (1, tan h), (1, 0) and the origin.
Also consider the slice of the unit disc with vertices (cos h, sin h), (1, 0) and the
origin. Since the disc slice is contained in the triangle we have: area disc slice
≤ area triangle. The disc slice is h/(2π) of the whole unit disc (which has area π),
so its area is h/2. The area of the triangle is ½ tan h. So we have
\[ \frac{h}{2} \leq \frac{\tan h}{2}. \]
Multiplying this by the positive number (2 cos h)/h gives
\[ \cos h \leq \frac{\sin h}{h}. \tag{2.3} \]
Combining (2.2) and (2.3) gives
\[ \cos h \leq \frac{\sin h}{h} \leq 1. \tag{2.4} \]
We have only shown this for 0 < h ≤ π/2. However, since the sine function is
odd and the cosine is even, the same inequality must hold for −π/2 ≤ h < 0. See
figure 2.3.
As h → 0 the left-hand side of (2.4) converges to 1. So we must have
(sin h)/h → 1 as h → 0. So the derivative of the sine in zero is equal to 1.
Figure 2.3: The functions 1, (sin x)/x and cos x.
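The squeeze cos h ≤ (sin h)/h ≤ 1 is easy to confirm numerically. The Python sketch below (my own check, not part of the original notes) verifies the inequality for a few shrinking values of h and shows the ratio creeping up to 1.

```python
import math

# Check cos h <= sin(h)/h <= 1 and watch the ratio approach 1.
for h in [0.5, 0.1, 0.01, 0.001]:
    ratio = math.sin(h) / h
    assert math.cos(h) <= ratio <= 1.0
    print(h, ratio)
```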
Next we obtain the derivative of the cosine in zero. So we want to know
what
\[ \lim_{h\to 0} \frac{\cos h - 1}{h} \]
is. We have by the double angle formula cos h = 1 − 2 sin²(h/2). So
\[ \frac{\cos h - 1}{h} = -\sin(h/2)\,\frac{\sin(h/2)}{h/2}. \]
Using that lim_{z→0} (sin z)/z = 1 and sin 0 = 0 we have
\[ \lim_{h\to 0} \frac{\cos h - 1}{h} = 0. \]
So the derivative of the cosine in zero is zero.
We now use the addition formulas to obtain the derivative of the sine at an
arbitrary point. We have
\[ \frac{\sin(x+h) - \sin x}{h} = \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x\,\frac{\cos h - 1}{h} + \cos x\,\frac{\sin h}{h}. \]
Letting h → 0 and using the calculated values for the derivatives of the sine and
the cosine in zero we see that the derivative of sin x is cos x.
Similarly, for the cosine we have
\[ \frac{\cos(x+h) - \cos x}{h} = \frac{\cos x \cos h - \sin x \sin h - \cos x}{h} = \cos x\,\frac{\cos h - 1}{h} - \sin x\,\frac{\sin h}{h}, \]
so that the derivative of cos x is − sin x.
Problem 22. Find the second, third and fourth derivatives of cos x.
Solution. f′(x) = −sin x, f″(x) = −cos x, f⁽³⁾(x) = sin x, f⁽⁴⁾(x) = cos x.
Problem 23. Find the derivative of sin 2x from first principles.
Solution. You probably know what this derivative is from the chain rule that
you learned at A-level. Since I asked for “first principles”, however, you’re not
allowed to use the chain rule.
We write down the limit that we need to compute
\[ \lim_{h\to 0} \frac{\sin 2(x+h) - \sin 2x}{h} = \lim_{h\to 0} \frac{\sin(2x+2h) - \sin 2x}{h}, \]
use the addition formula
\[ = \lim_{h\to 0} \frac{\sin 2x \cos 2h + \cos 2x \sin 2h - \sin 2x}{h}, \]
split the fraction in two
\[ = \sin 2x \lim_{h\to 0} \frac{\cos 2h - 1}{h} + \cos 2x \lim_{h\to 0} \frac{\sin 2h}{h}, \]
rewrite to get 2h everywhere
\[ = 2\sin 2x \lim_{h\to 0} \frac{\cos 2h - 1}{2h} + 2\cos 2x \lim_{h\to 0} \frac{\sin 2h}{2h}, \]
and substitute t = 2h to get limits that we know from the results in this section
\[ = 2\sin 2x \lim_{t\to 0} \frac{\cos t - 1}{t} + 2\cos 2x \lim_{t\to 0} \frac{\sin t}{t}. \]
These limits we know, the first one is 0 and the second one is 1. So the derivative
of sin 2x is 2 cos 2x.
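The first-principles answer can be cross-checked numerically; the Python sketch below (mine, not part of the original notes) compares the chord slope of sin 2x with 2 cos 2x at a sample point.

```python
import math

def g(x):
    return math.sin(2 * x)

x = 0.7
for h in [0.1, 0.001, 1e-5]:
    # First-principles chord slope; should tend to 2 cos 2x.
    print(h, (g(x + h) - g(x)) / h)
print("2 cos 2x =", 2 * math.cos(2 * x))
```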
2.4 Rules for differentiation
We usually don’t use the definition of differentiation to compute derivatives,
but use certain rules. The first rule is rather easy and obvious: the derivative
of a sum is the sum of the derivatives:
\[ (f+g)'(x) = f'(x) + g'(x). \]
Problem 24. Compute the derivative of x² + x.
Solution. With f(x) = x² and g(x) = x we have f′(x) = 2x and g′(x) = 1 by
the results in the previous section. So the derivative of x² + x is 2x + 1.
2.4.1 Product rule
The situation for a product is not so straightforward. It turns out that
\[ (fg)'(x) = f'(x)g(x) + f(x)g'(x). \tag{2.5} \]
Since it is not so obvious that this should be the case, we will prove it. We are
interested in
\[ \frac{f(x+h)g(x+h) - f(x)g(x)}{h}. \tag{2.6} \]
We rewrite this as
\[ \frac{f(x+h)g(x+h) - f(x)g(x+h) + f(x)g(x+h) - f(x)g(x)}{h} \]
and split this fraction in two
\[ \frac{f(x+h) - f(x)}{h}\,g(x+h) + f(x)\,\frac{g(x+h) - g(x)}{h}. \tag{2.7} \]
From the equality of (2.6) and (2.7), the product rule (2.5) follows by letting
h → 0.
Problem 25. Calculate the derivative of x² sin x.
Solution. Use the product rule to obtain
\[ 2x\sin x + x^2\cos x. \]
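The product rule can be sanity-checked numerically. In the Python sketch below (my own check, not part of the notes; the helper name `numeric_derivative` is made up) a symmetric difference quotient of x² sin x is compared with the product-rule answer.

```python
import math

def numeric_derivative(f, x, h=1e-6):
    # Symmetric difference quotient; accurate enough for a sanity check.
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda t: t**2 * math.sin(t)
fprime = lambda t: 2 * t * math.sin(t) + t**2 * math.cos(t)  # product rule
for x in [0.3, 1.0, 2.5]:
    assert abs(numeric_derivative(f, x) - fprime(x)) < 1e-5
print("product rule confirmed numerically")
```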
Problem 26. Calculate the derivative of eˣ cos x.
Solution. Use the product rule to obtain
\[ -e^x\sin x + e^x\cos x. \]
2.4.2 Quotient rule
The quotient rule is
\[ \left(\frac{f}{g}\right)'(x) = \frac{g(x)f'(x) - g'(x)f(x)}{g(x)^2}. \]
The proof is again based on rewriting.
Problem 27. Calculate the derivative of tan x.
Solution. We have
\[ \tan x = \frac{\sin x}{\cos x}, \]
and we already know the derivatives of the numerator and denominator. So we
can use the quotient rule:
\[ \frac{d}{dx}\tan x = \frac{\cos x\cos x + \sin x\sin x}{\cos^2 x}. \]
Note that this may be rewritten as either 1/cos²x or 1 + tan²x, depending on how
we simplify the fraction.
Problem 28. Calculate the derivative of (sin x)/x.
Solution. Use the quotient rule to obtain
\[ \frac{x\cos x - \sin x}{x^2}. \]
Problem 29. Calculate the derivative of eˣ/(x sin x).
Solution. First use the product rule to calculate the derivative of the denominator:
\[ (x\sin x)' = x\cos x + \sin x. \]
Use this and the quotient rule to calculate the derivative of the given function:
\[ \frac{d}{dx}\,\frac{e^x}{x\sin x} = \frac{e^x x\sin x - e^x(x\cos x + \sin x)}{x^2\sin^2 x}. \]
2.4.3 Chain rule
The chain rule is
\[ \frac{d}{dx}\,y(u(x)) = y'(u(x))\,u'(x). \]
The chain rule is not so easy to rigorously prove. In the notation of Leibniz the
chain rule is
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}, \]
and seems obvious, but this just proves that the notation of Leibniz is good for
this purpose; it doesn’t show that the chain rule is true.
The chain rule can be extended in the obvious way to chains of more than
two functions. For example in the case of three functions we have
\[ \frac{dy}{dx} = \frac{dy}{dv}\,\frac{dv}{du}\,\frac{du}{dx}. \]
Note that I left out where these functions have to be evaluated. This is shorter,
but can cause confusion. The appropriate way to write it is
\[ \frac{dy}{dx}(v(u(x))) = \frac{dy}{dv}(v(u(x)))\,\frac{dv}{du}(u(x))\,\frac{du}{dx}(x). \]
Problem 30. Calculate the derivative of e⁻ˣ.
Solution. Apply the chain rule with y(u) = eᵘ, u(x) = −x; then dy/du = eᵘ,
du/dx = −1 so
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = -e^{-x}. \]
Problem 31. Calculate the derivative of e^{cx}.
Solution. Apply the chain rule with y(u) = eᵘ, u(x) = cx; then dy/du = eᵘ,
du/dx = c so
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = c\,e^{cx}. \]
Problem 32. Calculate the derivative of aˣ.
Solution. We have aˣ = (e^{ln a})ˣ = e^{x ln a}; so with c = ln a in the previous
problem we have
\[ \frac{d}{dx}\,a^x = \ln a \times a^x. \]
Problem 33. Calculate the derivative of sin 2x.
Solution. Apply the chain rule with y(u) = sin u, u(x) = 2x; then dy/du = cos u,
du/dx = 2 so
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = 2\cos 2x. \]
Problem 34. Calculate the derivative of (x³ + 1)⁷.
Solution. Apply the chain rule with y(u) = u⁷, u(x) = x³ + 1; then dy/du = 7u⁶,
du/dx = 3x² so
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = 21x^2(x^3+1)^6. \]
Problem 35. Calculate the derivative of cos² x.
Solution. Apply the chain rule with y(u) = u², u(x) = cos x; then dy/du = 2u,
du/dx = −sin x so
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = -2\cos x\sin x. \]
Problem 36. Calculate the derivative of cos(x²).
Solution. Apply the chain rule with y(u) = cos u, u(x) = x²; then dy/du = −sin u,
du/dx = 2x so
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = -2x\sin(x^2). \]
Problem 37. Calculate the derivative of e^{sin(x²)}.
Solution. Apply the extended chain rule with y(v) = eᵛ, v(u) = sin u, u(x) = x²;
then dy/dv = eᵛ, dv/du = cos u, du/dx = 2x so
\[ \frac{dy}{dx} = \frac{dy}{dv}\,\frac{dv}{du}\,\frac{du}{dx} = e^{\sin(x^2)}\cos(x^2)\,2x. \]
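The extended chain rule result of Problem 37 can be verified with a difference quotient; here is a small Python check (my own sketch, not part of the notes).

```python
import math

f = lambda x: math.exp(math.sin(x**2))
# Derivative from the extended chain rule worked out above.
fprime = lambda x: math.exp(math.sin(x**2)) * math.cos(x**2) * 2 * x
h = 1e-6
for x in [0.2, 0.9, 1.5]:
    numeric = (f(x + h) - f(x - h)) / (2 * h)
    assert abs(numeric - fprime(x)) < 1e-4
```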
2.5 Derivatives of inverse functions
If y(x) has an inverse function, i.e. y = y(x) can be solved as x = x(y), and is
differentiable, then the inverse function is differentiable and for the derivative
of the inverse we have
\[ \frac{dx}{dy} = \frac{1}{dy/dx}. \]
Note that again the Leibniz notation is very suggestive. We now use this formula
to compute the derivatives of the inverse functions of functions of which we
previously computed the derivatives. Note that we want dx/dy as a function of y.
Problem 38. Compute the derivative of x = y^{1/3}.
Solution. Of course you may use here that the derivative of xⁿ is nxⁿ⁻¹, but
here we will use the formula for the derivative of inverse functions.
We have y = x³, so dy/dx = 3x². So
\[ \frac{dx}{dy} = \frac{1}{3x^2} = \frac{1}{3y^{2/3}}. \]
Problem 39. Compute the derivative of x = ln y.
Solution. We have y = eˣ, so dy/dx = eˣ. So
\[ \frac{dx}{dy} = \frac{1}{e^x} = \frac{1}{y}. \]
Problem 40. Compute the derivative of x = arcsin y.
Solution. We have y = sin x, so dy/dx = cos x. So
\[ \frac{dx}{dy} = \frac{1}{\cos x}. \]
We want to express this in terms of y. We have cos x = ±√(1 − sin²x) =
±√(1 − y²). The question is whether we should have the plus or the minus
sign. For this we have to go back to our definition of arcsin. Remember that
x = arcsin y is the unique solution of sin x = y with −π/2 ≤ x ≤ π/2. For
these values of x we have cos x ≥ 0, so we need the plus sign. So we have
cos x = √(1 − y²). That gives
\[ \frac{dx}{dy} = \frac{1}{\sqrt{1-y^2}}. \]
Similarly,
\[ \frac{d}{dy}\arccos y = \frac{-1}{\sqrt{1-y^2}}. \]
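The arcsin formula can be confirmed numerically (a Python sketch of my own, not part of the notes), comparing a difference quotient of `math.asin` with 1/√(1 − y²).

```python
import math

h = 1e-6
for y in [-0.5, 0.0, 0.5]:
    numeric = (math.asin(y + h) - math.asin(y - h)) / (2 * h)
    exact = 1.0 / math.sqrt(1.0 - y * y)
    assert abs(numeric - exact) < 1e-6
```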
Problem 41. Compute the derivative of x = arctan y.
Solution. We have y = tan x, so dy/dx = tan²x + 1. So
\[ \frac{dx}{dy} = \frac{1}{\tan^2 x + 1} = \frac{1}{y^2 + 1}. \]
2.6 Parametric differentiation
Sometimes curves are not given as a function y(x), but as pairs (x(t), y(t)), i.e.
both coordinates are functions of a parameter t. An example is the unit circle
which is given by x(t) = cos t, y(t) = sin t, but cannot be written as the graph
of a function y(x). The following formula gives the slope dy/dx in terms of the
derivatives ẋ and ẏ with respect to the parameter t:
\[ \frac{dy}{dx} = \frac{\dot y}{\dot x}. \]
Writing ẏ and ẋ in Leibniz notation and presuming that one may operate with
these expressions as if they are fractions gives an indication (but no proof) of
the fact that this formula is indeed valid.
Problem 42. Find the slope at the point (1, 0) of the curve parameterized by
x(t) = cos t + t, y(t) = t⁴ + t.
Solution. We have
\[ \frac{dy}{dx} = \frac{\dot y}{\dot x} = \frac{4t^3+1}{-\sin t + 1}. \]
We investigate to which parameter value the given point corresponds. The
y-coordinate gives t(t³ + 1) = 0, so t = 0 or t = −1. For t = 0 we get the correct
x-coordinate, for t = −1 we don’t. So t = 0 is the parameter value. So the slope
at the given point equals
\[ \frac{4\times 0^3 + 1}{-\sin 0 + 1} = 1. \]
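The formula dy/dx = ẏ/ẋ can be checked numerically for this curve; in the Python sketch below (mine, not from the notes) ẋ and ẏ are approximated by difference quotients at t = 0.

```python
import math

# Parametric curve from Problem 42: x(t) = cos t + t, y(t) = t^4 + t.
fx = lambda t: math.cos(t) + t
fy = lambda t: t**4 + t

h = 1e-6
t = 0.0  # parameter value of the point (1, 0)
xdot = (fx(t + h) - fx(t - h)) / (2 * h)
ydot = (fy(t + h) - fy(t - h)) / (2 * h)
print(ydot / xdot)  # close to the slope 1 found above
```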
2.7 Implicit differentiation
Curves sometimes come in the form of an equation relating x and y from which
y cannot (easily) be solved. It is however still possible to compute the slope dy/dx
at a given point on the curve through what is called implicit differentiation. A
formal formulation of this requires differentiation of functions of several variables
(which we will do only later), but on examples it is not too difficult to figure
out what to do. We illustrate the method on the simple example of the unit
circle. So we have the equation x² + y² = 1 and we want to know what dy/dx is.
What we do is differentiate both sides of the equation with respect to x (that
of course gives another identity) and keep in mind that y is a function of x. For
the unit circle that gives:
\[ \frac{d}{dx}(x^2 + y^2) = \frac{d}{dx}\,1, \]
and taking the derivatives gives (using \( \frac{dy^2}{dx} = \frac{dy^2}{dy}\frac{dy}{dx} \)):
\[ 2x + 2y\frac{dy}{dx} = 0. \]
This equation can be solved for dy/dx as
\[ \frac{dy}{dx} = -\frac{x}{y}. \]
Using that for the upper half of the unit circle we have y = √(1 − x²) and for the
lower half y = −√(1 − x²) it can be checked that this is in both cases the right
formula for the slope.
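That check is easy to carry out numerically; this Python sketch (my own, not part of the notes) compares the ordinary derivative of y = √(1 − x²) on the upper half of the circle with the implicit formula −x/y.

```python
import math

upper = lambda x: math.sqrt(1 - x * x)  # upper half of the unit circle
h = 1e-6
for x in [-0.6, 0.1, 0.7]:
    numeric = (upper(x + h) - upper(x - h)) / (2 * h)
    implicit = -x / upper(x)  # dy/dx = -x/y from implicit differentiation
    assert abs(numeric - implicit) < 1e-6
```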
Problem 43. Calculate the slope at the point p = (∛(2/π), π/2) of the curve
x³y + sin y = 2.
Solution. Implicit differentiation gives
\[ 3x^2 y + x^3\frac{dy}{dx} + \frac{dy}{dx}\cos y = 0. \]
Substituting the point gives
\[ 3\left(\frac{2}{\pi}\right)^{2/3}\frac{\pi}{2} + \frac{2}{\pi}\,\frac{dy}{dx}(p) + \frac{dy}{dx}(p)\times 0 = 0. \]
So the slope is
\[ -3\left(\frac{2}{\pi}\right)^{2/3}\left(\frac{\pi}{2}\right)^2 = -3\left(\frac{\pi}{2}\right)^{4/3}. \]
2.8 Maxima and minima
An important application of derivatives is that they allow us to find maxima and
minima of functions. But before we go into computing these things let’s first
define them properly and say something about when they exist.
The function f has a local maximum at x = c if f(c) ≥ f(x) for all points
x close to c. It is a global maximum if this holds for all x for which f is
defined. The function f has a local minimum at x = c if f(c) ≤ f(x) for all
points x close to c. It is a global minimum if this holds for all x for which f is
defined. Extremum is another word for something that is either a maximum or
a minimum.
The following existence result is very difficult to prove: a continuous function defined on a finite interval that includes the endpoints has a global maximum and a global minimum. Note that the extremum may be reached at
the endpoints, a simple example is f (x) = x when considered on the interval
0 ≤ x ≤ 1; the minimum is zero and is reached at zero and the maximum is 1
and is reached at 1. If we were to exclude the endpoints, i.e. consider f (x) = x
on the interval 0 < x < 1, then the function has no maximum and no minimum.
The following result is of great use in finding extrema: if f has a local extremum
at x = c and f is differentiable at c, then f′(c) = 0. Of course a global extremum
is also a local extremum, so this procedure helps us to find global extrema as
well.
To find local extrema of f we do the following:
1. calculate the derivative,
2. solve f′(x) = 0,
3. find points where f′ is not defined,
4. consider the endpoints of the interval on which f is defined.
We already showed that step 4 should not be forgotten when we considered the
example f (x) = x on the interval [0, 1]. We now show that step 3 should not be
forgotten. Consider the absolute value function (see figure 2.2). This function
obviously has its minimum at x = 0, but the derivative is −1 for x < 0, it is 1
for x > 0 and undefined at x = 0. So solving f′(x) = 0 doesn’t provide you
with the minimum. We further note that the points found in steps 2, 3 and 4 are
only candidates for local extrema, they may not be local extrema. An example
here is f(x) = x³ on the interval [−1, 1] for which f′(0) = 0, but there is no
local extremum at x = 0 (see figure 2.4). We will say a bit more about this
later.
Figure 2.4: The graph of x³
Problem 44. Find the global maximum and global minimum of f(x) = x² on
the interval [−1, 2].
Solution. Of course for this example the maximum and minimum are easily
seen, but we do the procedure anyway. We have f′(x) = 2x, so f′(x) = 0 if and
only if x = 0. The value at this point is 0. There are no points where f is not
differentiable. At the boundary point x = −1 the value is 1, at the boundary
point x = 2 the value is 4. So the global minimum value is 0 and is reached at
x = 0, the global maximum is 4 and is reached at x = 2.
Determining whether a point is a local minimum, a local maximum or neither
is a bit, but not much, more difficult than determining whether it is a global
minimum, a global maximum or neither. Now we do not compare all values at
the points found in steps 2, 3 and 4 with each other, but only neighboring ones.
We illustrate this on the above example.
Problem 45. Determine the local extrema and their nature of f(x) = x² on the
interval [−1, 2] using the evaluation method.
Solution. As shown in problem 44 the points to be considered are x = −1, x = 0
and x = 2. The values can be given conveniently in the following chart:
x:    −1  0  2
f(x):  1  0  4
The value at x = −1 is higher than that at neighboring special points (x = 0),
so at x = −1 there is a local maximum with value 1. The value at x = 0 is
lower than that at the neighboring special points (x = −1 and x = 2), so at
x = 0 there is a local minimum with value 0. The value at x = 2 is higher than
that at neighboring special points (x = 0), so at x = 2 there is a local maximum with
value 4.
In the above problem we checked whether the points we found in steps
2, 3 and 4 of the procedure were maxima or minima by simply substituting
them into the function and comparing values. There are two other ways of
checking whether a point is a local maximum, a local minimum or neither.
These two other ways have the advantage that they generalize to functions of
several variables, whereas the above used method does not (the problem is that
the concept of neighboring point doesn’t generalize).
The first is by looking at the sign of the derivative. The sign of the derivative
tells us whether the function is increasing (positive derivative) or decreasing
(negative derivative). There is a theorem that says that the derivative can only
change sign at points where it goes through zero or points where it doesn’t exist
(these points we found in steps 2 and 3). We can find the sign of the derivative
in between such points by e.g. substituting a particular point in that interval
into the derivative and looking at the sign of the outcome. The change of sign
of the derivative gives us the following information.
Suppose that f is continuous at c and that c is an interior point:
• if the derivative changes sign from positive to negative at x = c, then f
has a local maximum in c,
• if the derivative changes sign from negative to positive at x = c, then f
has a local minimum in c,
• if the derivative does not change sign, then f does not have a local extremum at x = c.
If x = c is an endpoint, then the derivative can obviously not change sign since
it is not defined on one side of x = c. We can nonetheless apply the above result
pretending that the derivative changes sign at the endpoint. We will illustrate
this on the already discussed example.
Problem 46. Determine the local extrema and their nature of f(x) = x² on the
interval [−1, 2] using the change of sign of the derivative method.
Solution. As shown in problem 44 the points to be considered are x = −1, x = 0
and x = 2. The derivative is f′(x) = 2x and its sign chart is
−1 ———(−)——— 0 ———(+)——— 2
At the endpoint x = −1 the sign changes from positive to negative (using our
pretense that the sign changes at endpoints), so at x = −1 there is a local
maximum with value 1. At x = 0 the sign changes from negative to positive,
so at x = 0 there is a local minimum with value 0. At the endpoint x = 2 the
sign changes from positive to negative (again using our pretense that the sign
changes at endpoints), so at x = 2 there is a local maximum with value 4.
The third method of determining if we are dealing with a local maximum, a
local minimum or neither is by considering higher order derivatives.
Suppose that f′(c) = 0:
• if f″(c) > 0, then f has a local minimum at x = c,
• if f″(c) < 0, then f has a local maximum at x = c,
• if f″(c) = 0, then f might have a local maximum, a local minimum or
neither at x = c.
An example where f′(0) = f″(0) = 0 and f has a local minimum at x = 0
is f(x) = x⁴, an example where it has a local maximum is f(x) = −x⁴ and
an example where it has neither is f(x) = x³. We will look deeper into this
issue when we consider Taylor polynomials. Note that this method of looking
at the second derivative says nothing in the case of points obtained in steps
3 or 4 (points where f′ does not exist and endpoints); it really is crucial that
f′(c) = 0.
Problem 47. Determine the local extrema and their nature of f (x) = x2 on the
interval [−1, 2] using the method of higher order derivatives.
Solution. As shown in problem 44 the points to be considered are x = −1,
x = 0 and x = 2. The second derivative is the constant 2. So there is a local
minimum at x = 0 with value 0. Since x = 0 is a local minimum, it follows that
the endpoints x = −1 and x = 2 are local maxima.
We now illustrate that the second derivative really does not say anything
about the nature of local extrema if f′(c) ≠ 0. We first consider the case where
the derivative does not exist. The examples are f(x) = |x| + x² and
g(x) = −|x| + x². The second derivatives of f and g are the same; they are 2
everywhere, except at x = 0 where they are undefined. However, f has a
minimum at x = 0, whereas g has a maximum at x = 0. So even though the
second derivatives are identical, the nature of the extrema is different.
We now consider the case of endpoints. Consider the functions f(x) = x + x²/2
and g(x) = −x + x²/2 on the interval [0, 1]. The second derivative is identical to
1 in both cases. However, f has a minimum at x = 0 and a maximum at x = 1
and for g it is exactly the other way around. So again, the second derivatives
are identical, but the nature of the extrema is different. So f′(c) = 0 is really
crucial for the method of higher order derivatives to work!
Problem 48. Find the location of the local extrema of f(x) = x²eˣ on the
interval [−3, 1] and classify them as either local maximum or local minimum.
Solution. The derivative is f′(x) = x²eˣ + 2xeˣ = x(x + 2)eˣ. So we have
f′(x) = 0 if and only if x = 0 or x = −2 and there are no points where f is
not differentiable. Now we consider the nature of these candidate extrema. We
look at the sign of f′. If x > 0, then all three terms in x(x + 2)eˣ are positive,
so f′(x) > 0. If x ∈ (−2, 0), then one of these terms is negative and the other
two are positive, so f′(x) < 0. If x < −2, then two terms are negative and one
is positive, so f′(x) > 0.
−3 ———(+)——— −2 ———(−)——— 0 ———(+)——— 1
It follows that x = −3 is a local minimum, x = −2 a local maximum, x = 0 a
local minimum and x = 1 a local maximum.
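The sign chart of Problem 48 can be reproduced mechanically; the Python sketch below (my own, not part of the notes) samples the sign of f′ at the midpoint of each interval between consecutive candidate points.

```python
import math

def fprime(x):
    # f'(x) for f(x) = x^2 e^x, factored as in the solution above.
    return x * (x + 2) * math.exp(x)

# Candidate points: the endpoints and the zeros of f'.
candidates = [-3, -2, 0, 1]
signs = []
for a, b in zip(candidates, candidates[1:]):
    mid = (a + b) / 2
    signs.append('+' if fprime(mid) > 0 else '-')
print(signs)  # ['+', '-', '+'], matching the sign chart
```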
Problem 49. Find the location of the local extrema of f(x) = 3x − x³ on [−5, 2]
and classify them as local maxima or local minima.
Solution. We have f′(x) = 3 − 3x² = −3(x − 1)(x + 1). So f′(x) = 0 if and only
if x = ±1. The sign chart of f′ is
−5 ———(−)——— −1 ———(+)——— 1 ———(−)——— 2
So x = −5 gives a local maximum, x = −1 a local minimum, x = 1 a local
maximum and x = 2 a local minimum.
Problem 50. Find the local extrema of f(x) = 3x⁴ − 4x³ − 6x² + 12x on [−2, 2]
and classify them as local maxima or local minima.
Solution. We have f′(x) = 12x³ − 12x² − 12x + 12 = 12(x − 1)²(x + 1). So
f′(x) = 0 if and only if x = ±1. We have
−2 ———(−)——— −1 ———(+)——— 1 ———(+)——— 2
So x = −2 gives a local maximum, x = −1 a local minimum, x = 1 neither a
local maximum nor a local minimum and x = 2 is a local maximum.
Problem 51. Find the maximum and the minimum value of f(x) = x³ sin x on
the interval [−π/2, π/2].
Solution. f′(x) = x³cos x + 3x²sin x = x²(x cos x + 3 sin x). So f′(x) = 0 if
and only if x = 0 or x cos x + 3 sin x = 0. The latter equation is equivalent to
x + 3 tan x = 0. This obviously has x = 0 as solution, but are there more? To
figure this out we define g(x) = x + 3 tan x, which is well-defined for x ∈ (−π/2, π/2).
We have g′(x) = 1 + 3/cos²x, so that g′ is always positive. So g is always increasing.
So g(x) = 0 has only one solution.
We now consider the sign chart of f′. The term x cos x + 3 sin x is positive
if x ∈ (0, π/2) and negative if x ∈ (−π/2, 0). So the sign chart of f′ is
−π/2 ———(−)——— 0 ———(+)——— π/2
It follows that x = 0 gives a local minimum and the endpoints local maxima.
So the minimum is f(0) = 0 and the maximum is f(π/2) = f(−π/2) = π³/8.
2.9 Asymptotes
An important piece of information about a function is what happens if x becomes
large in magnitude (either positive or negative). Also important is near which
finite values of x the function values become arbitrarily large. This leads to the
notion of asymptotes.
The line y = L is called a horizontal asymptote of the function f if
\[ \lim_{x\to\infty} f(x) = L \quad\text{or}\quad \lim_{x\to-\infty} f(x) = L. \]
The notation lim_{x→∞} f(x) = L means that f(x) stays arbitrarily close to L
for x large enough³. For example the function f(x) = 1/x has zero as its only
horizontal asymptote. The function eˣ also has zero as its horizontal asymptote
since lim_{x→−∞} eˣ = 0. From these two basic examples we will be able to
compute many other cases relatively easily.
Problem 52. Find the horizontal asymptotes of (2x² − 1)/(x² − 4).
Solution. Divide numerator and denominator by x² to obtain
\[ \frac{2 - \frac{1}{x^2}}{1 - \frac{4}{x^2}}. \]
Now the numerator goes to 2 and the denominator goes to 1 as x → ∞ or
x → −∞. So y = 2 is the only horizontal asymptote.
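A quick numerical look (a Python sketch of my own, not part of the notes) at the function from Problem 52 for increasingly large x:

```python
f = lambda x: (2 * x**2 - 1) / (x**2 - 4)
for x in [10.0, 100.0, 1000.0]:
    print(x, f(x))
# The values approach 2, matching the horizontal asymptote y = 2.
```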
The line x = c is called a vertical asymptote of the function f if
\[ \lim_{x\to c} f(x) = \pm\infty. \]
The notation lim_{x→c} f(x) = ∞ means that f(x) is arbitrarily large for x close
enough to c.⁴ The typical situation in which vertical asymptotes arise is in
fractions where the denominator is zero at x = c. If the numerator is nonzero
at x = c, then x = c is indeed a vertical asymptote. If the numerator is also
zero at x = c, then some more work is needed to figure out whether x = c is a
vertical asymptote or not.
3 The precise definition is: for all ε > 0 there exists a number M such that if x ≥ M then
|f (x) − L| ≤ ε.
4 The precise definition is: for all numbers M there exists a δ > 0 such that if |x − c| ≤ δ
then f (x) ≥ M .
Problem 53. Find the vertical asymptotes of (2x² − 1)/(x² − 4).
Solution. The denominator is zero exactly at x = ±2. The numerator is nonzero
at both of these points. So x = −2 and x = 2 are vertical asymptotes.
Problem 54. Find the vertical asymptotes of (sin x)/x.
Solution. The denominator is zero at x = 0. But so is the numerator. In the
chapter on differentiation we saw that
\[ \lim_{x\to 0} \frac{\sin x}{x} = 1, \]
so x = 0 is not a vertical asymptote of this function. In fact, this function does
not have any vertical asymptotes.
There are some general rules for dealing with limits of quotients:
\[ \frac{\text{constant}}{\infty} = 0, \qquad \frac{\infty}{\text{constant}} = \pm\infty, \qquad \frac{\text{nonzero constant}}{0} = \pm\infty. \]
Examples of this are
\[ \lim_{x\to\infty} \frac{\arctan x}{x} = 0, \qquad \lim_{x\to\infty} \frac{x}{\arctan x} = \infty, \qquad \lim_{x\to 0} \frac{\cos x}{x^2} = \infty. \]
The more difficult cases are ∞/∞ and 0/0, for which a closer examination is needed
and the answer depends on the specific functions involved. This is the topic of
L’Hopital’s rule.
2.9.1 L’Hopital’s rule
In the chapter on differentiation we defined the derivative in terms of limits.
But we hardly ever computed these limits, instead we used certain rules for
differentiation. It turns out that differentiation allows us to compute certain
limits. The following rule (L’Hopital’s rule) applies to finding limits of fractions.
Let c be a number, ∞ or −∞ and assume that either
• lim_{x→c} f(x) = lim_{x→c} g(x) = 0,
• or lim_{x→c} f(x) = ±∞ and lim_{x→c} g(x) = ±∞,
then
\[ \lim_{x\to c} \frac{f(x)}{g(x)} = \lim_{x\to c} \frac{f'(x)}{g'(x)}. \]
Problem 55. Find the horizontal asymptotes of (2x² − 1)/(x² − 4) using
L’Hopital’s rule.
Solution. Both the numerator and the denominator are infinite at infinity, so
L’Hopital’s rule applies. We obtain 4x for the derivative of the numerator
and 2x for the derivative of the denominator. This is still infinity divided by
infinity, so we apply L’Hopital’s rule again. Then we obtain 4 divided by 2,
which gives us the horizontal asymptote y = 4/2 = 2. At −∞ the situation is
entirely similar.
Problem 56. Compute
\[ \lim_{x\to 0} \frac{\tan(\sin x)}{x}. \]
Solution. Both numerator and denominator are zero at zero, so we apply
L’Hopital’s rule. The derivative of the numerator is, by the chain rule,
(tan²(sin x) + 1) cos x, which equals one at zero, and the derivative of the
denominator is 1. So the answer is 1.
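The limit in Problem 56 can also be probed numerically (a Python sketch of my own, not from the notes):

```python
import math

for x in [0.1, 0.01, 0.001]:
    print(x, math.tan(math.sin(x)) / x)
# The quotient approaches 1, in line with L'Hopital's rule.
```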
2.10 Newton’s method
There is a faster algorithm than the bisection method: Newton’s method. The
idea is to start with an initial guess (in effect we did the same with the bisection
method in determining the initial interval) and instead of computing the point
where f intersects the x-axis (which is what we actually want to know), compute
the point where the tangent line to f at the initial guess intersects the x-axis.
This tangent line has formula
\[ y = f'(c)(x - c) + f(c), \]
so it is zero if
\[ x = c - \frac{f(c)}{f'(c)}. \]
So from Newton’s method we obtain a sequence of approximations defined by
the iterative relation
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]
Problem 57. Use Newton’s method to approximate the solution of
x³ − 12x + 2 = 0 starting from x₀ = 1.
Solution. In this case the iterative relation becomes
\[ x_{n+1} = x_n - \frac{x_n^3 - 12x_n + 2}{3x_n^2 - 12}. \]
With x₀ = 1 we obtain x₁ = 0, x₂ = 1/6, x₃ ≈ 0.1671.
Problem 58. Approximate √2 using Newton’s method. Start with an integer
and continue until you have six digits correct (using that we somehow already
know that √2 = 1.41421356…).
Solution. We want to solve x² − 2 = 0. So the iteration is
\[ x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n}. \]
This can be rewritten as
\[ x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n}. \]
We obtain the following table:
x₀ = 1
x₁ = 1.5
x₂ = 1.416667
x₃ = 1.414216
So after three iterations we have the first six digits correct. Note that this
convergence is much faster than using the bisection method.
Starting with x₀ = 2 also gives x₁ = 1.5 and therefore leads to the same
outcomes.
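The iteration of Problem 58 is easy to run; the Python sketch below (mine, not part of the notes; the function name `newton_sqrt2` is made up) reproduces the table above.

```python
def newton_sqrt2(x0, steps):
    # Newton iteration for x^2 - 2 = 0: x_{n+1} = x_n/2 + 1/x_n.
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x / 2 + 1 / x)
    return xs

print(newton_sqrt2(1.0, 3))
# x0 = 1, x1 = 1.5, x2 = 1.4166..., x3 = 1.41421..., as in the table.
```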
2.11 Taylor polynomials
The Taylor polynomial of degree n at the point a of a function f is the unique
n-th degree polynomial that has the same derivatives (up to order n) as f at
the point a. The general formula is:
\[ p_{n,a}(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x-a)^k. \]
Most often a = 0 and then the Taylor polynomial is sometimes called the
MacLaurin polynomial (I will never do this).
To compute the second degree Taylor polynomial of f(x) = e²ˣ at a = 0
we compute the value of the function, its derivative and its second derivative at
zero: f(0) = 1, f′(x) = 2e²ˣ so f′(0) = 2, f″(x) = 4e²ˣ, so f″(0) = 4. We then
substitute this in the above general formula for a Taylor polynomial:
\[ p(x) = 1 + \frac{2}{1!}x + \frac{4}{2!}x^2 = 1 + 2x + 2x^2. \]
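How well 1 + 2x + 2x² tracks e²ˣ near zero can be seen numerically; here is a small Python check (my own sketch, not part of the notes).

```python
import math

p = lambda x: 1 + 2 * x + 2 * x**2  # degree-2 Taylor polynomial at a = 0
for x in [0.5, 0.1, 0.01]:
    print(x, math.exp(2 * x) - p(x))
# The error shrinks rapidly as x approaches 0.
```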
Note that the tangent line to the graph of f at the point a is just the first degree Taylor polynomial of f at the point a. The use of Taylor polynomials is similar to that of tangent lines: close to a the function ‘looks like’ its Taylor polynomial. Tangent lines tell us something about the possible occurrence of extrema (at an extremum the tangent line, if it exists, is horizontal); Taylor polynomials provide more information: they also tell us something about the nature of extrema (whether an extremum is a maximum or a minimum). For example,
the second derivative test is a consequence of studying second order Taylor polynomials. If we suppose that $f'(a) = 0$, then f around a looks like
$$f(a) + \frac{f''(a)}{2}(x - a)^2.$$
As we all know, this parabola has a maximum at a if $f''(a) < 0$ and a minimum at a if $f''(a) > 0$. Since f looks like its Taylor polynomial around a, the same is true for f and this is simply the second derivative test. By using Taylor polynomials we can also determine what happens in case $f''(a) = 0$: we simply look at higher order Taylor polynomials. If $f'(a) = f''(a) = 0$, then f around a looks like
$$f(a) + \frac{f^{(3)}(a)}{6}(x - a)^3,$$
so if $f^{(3)}(a) \neq 0$, then f has neither a maximum nor a minimum at a (compare what we know about the function $x^3$). If $f'(a) = f''(a) = f^{(3)}(a) = 0$, then f around a looks like
$$f(a) + \frac{f^{(4)}(a)}{24}(x - a)^4,$$
so f has a maximum at a if $f^{(4)}(a) < 0$ and a minimum at a if $f^{(4)}(a) > 0$.
The vague expression ‘f looks like its Taylor polynomial close to a’ that we used several times above can be made precise as follows. There exists some point z in between x and a such that
$$f(x) - p_{n,a}(x) = \frac{f^{(n+1)}(z)}{(n+1)!}(x - a)^{n+1},$$
and if x is close to a, then this right-hand side is small. The case $n = 0$ in the above formula can be re-formulated as follows: there exists a point z in between x and a such that
$$\frac{f(x) - f(a)}{x - a} = f'(z).$$
This result is called the mean value theorem and in words it states that the
mean value of the slope over the interval from a to x (the left-hand side of the
formula) equals the slope at some point on that interval (the right-hand side).
The above form of the difference $f(x) - p_{n,a}(x)$ is called the Lagrange form of the remainder; there is also the following integral form
$$f(x) - p_{n,a}(x) = \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n\,dt,$$
which for $n = 0$ becomes
$$f(x) - f(a) = \int_a^x f'(t)\,dt,$$
which is the fundamental theorem of calculus. In many cases the error-term converges to zero as $n \to \infty$ and the function equals the limit of its Taylor polynomials. This limit is called the Taylor series. We now give these for some common functions:
$$e^x = \sum_{k=0}^{\infty} \frac{1}{k!}x^k = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots \quad \text{for all } x,$$
$$\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}x^{2k+1} = x - \frac{x^3}{6} + \frac{x^5}{5!} - \dots \quad \text{for all } x,$$
$$\cos x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k)!}x^{2k} = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \dots \quad \text{for all } x,$$
$$\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k = 1 + x + x^2 + x^3 + \dots \quad \text{for } -1 < x < 1.$$
The last of these you may recognize (the geometric series).
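The partial sums of these series can be evaluated directly, which gives a quick numerical check of the convergence claims. A sketch (the helper names are ours, not from the notes):

```python
from math import exp, sin, factorial

def exp_series(x, n):
    """Partial sum sum_{k=0}^{n} x^k / k! of the Taylor series of e^x."""
    return sum(x**k / factorial(k) for k in range(n + 1))

def sin_series(x, n):
    """Partial sum of sum_k (-1)^k x^{2k+1} / (2k+1)! up to k = n."""
    return sum((-1)**k * x**(2*k + 1) / factorial(2*k + 1) for k in range(n + 1))

# A handful of terms already matches the library values to many digits at x = 1.
approx_e = exp_series(1.0, 15)
approx_sin1 = sin_series(1.0, 10)
```

For the geometric series the same experiment only works for $-1 < x < 1$, matching the stated interval of convergence.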
Calculating with Taylor polynomials is easy: to obtain Taylor polynomials of functions, you may differentiate and integrate known Taylor polynomials term by term, you may multiply Taylor polynomials (and omit terms of too high degree), and you may substitute polynomials for the variable. I will now illustrate this with some examples.
Problem 59. Calculate the Taylor polynomial of degree 6 of $x \sin x$ at the point $x = 0$.
Solution. We have
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \dots$$
so
$$x \sin x = x^2 - \frac{1}{3!}x^4 + \frac{1}{5!}x^6 - \dots.$$
So the desired polynomial is:
$$x^2 - \frac{1}{3!}x^4 + \frac{1}{5!}x^6.$$
Problem 60. Calculate the Taylor polynomial of degree 3 of $e^x \sin x$ at the point $x = 0$.
Solution. We have
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots,$$
$$\sin x = x - \frac{1}{3!}x^3 + \dots$$
so
$$e^x \sin x = \left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots\right)\left(x - \frac{1}{3!}x^3 + \dots\right)$$
$$= \left(x - \frac{1}{3!}x^3 + \dots\right) + x\left(x - \frac{1}{3!}x^3 + \dots\right) + \frac{x^2}{2!}\left(x - \frac{1}{3!}x^3 + \dots\right) + \dots$$
$$= x - \frac{1}{3!}x^3 + x^2 + \frac{x^3}{2!} + \dots$$
So the desired polynomial is:
$$x + x^2 + \left(\frac{1}{2!} - \frac{1}{3!}\right)x^3.$$
Note that calculating the derivatives up to third order of the given function is a lot more work than the above manipulation with Taylor series.
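The manipulation in problem 60 is just multiplication of truncated polynomials, which is easy to automate. A sketch, representing a Taylor polynomial as a list of coefficients indexed by power (the function name `mul_trunc` is our own):

```python
def mul_trunc(p, q, deg):
    """Multiply two coefficient lists (index = power) and drop terms above deg."""
    out = [0.0] * (deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= deg:
                out[i + j] += a * b
    return out

# e^x and sin x truncated at degree 3: coefficients of 1, x, x^2, x^3.
exp3 = [1.0, 1.0, 1/2, 1/6]
sin3 = [0.0, 1.0, 0.0, -1/6]

# Taylor coefficients of e^x sin x up to x^3: expect [0, 1, 1, 1/3].
prod = mul_trunc(exp3, sin3, 3)
```

The coefficient of $x^3$ comes out as $\frac{1}{2!} - \frac{1}{3!} = \frac{1}{3}$, as in the hand calculation.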
Problem 61. Calculate the Taylor polynomial of degree 7 of $\sin x^2$ at the point $x = 0$.
Solution. We have
$$\sin u = u - \frac{1}{3!}u^3 + \frac{1}{5!}u^5 - \dots$$
so
$$\sin x^2 = x^2 - \frac{1}{3!}x^6 + \dots.$$
So the desired polynomial is
$$x^2 - \frac{1}{3!}x^6.$$
Note that this is much easier than computing the derivatives up to order seven of $\sin x^2$.
Problem 62. Calculate the Taylor polynomial of degree 5 of $\ln(1 + x)$ at the point $x = 0$.
Solution. We have
$$\frac{1}{1-u} = 1 + u + u^2 + u^3 + u^4 + \dots,$$
so with $u = -x$
$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + x^4 - \dots,$$
and then by integrating
$$\ln(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \frac{1}{5}x^5 - \dots,$$
where we have also used $\ln(1) = 0$. So the desired polynomial is
$$x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \frac{1}{5}x^5.$$
Problem 63. Calculate the Taylor polynomial of degree 7 of $\frac{1}{1+x^2}$ at the point $x = 0$.
Solution. We have
$$\frac{1}{1+u} = 1 - u + u^2 - u^3 + u^4 - \dots$$
so
$$\frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + \dots.$$
So the desired polynomial is
$$1 - x^2 + x^4 - x^6.$$
Problem 64. Calculate the Taylor polynomial of degree 7 of $\arctan x$ at the point $x = 0$.
Solution. We have
$$\frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + \dots$$
so (integrating term by term)
$$\arctan x = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \dots,$$
where we have also used that $\arctan 0 = 0$. So the desired polynomial is
$$x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7.$$
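The degree-7 polynomial from problem 64 can be checked numerically against the library arctangent; the agreement is very good for x close to 0 and degrades as $|x|$ approaches 1, as expected for a truncated Taylor series:

```python
from math import atan

def arctan_poly7(x):
    """Degree-7 Taylor polynomial of arctan at 0: x - x^3/3 + x^5/5 - x^7/7."""
    return x - x**3/3 + x**5/5 - x**7/7

# The error near 0 behaves like the first omitted term, x^9/9.
err_small = abs(arctan_poly7(0.1) - atan(0.1))
err_large = abs(arctan_poly7(0.5) - atan(0.5))
```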
2.12 Numerical differentiation
The definition of differentiation gives a direct way to approximate $f'(x)$ at a given point x: for h small we have
$$\frac{f(x+h) - f(x)}{h} \approx f'(x).$$
If we apply this to $f(x) = x^3$ at the point $x_0 = 1$ then we obtain the second column of the table below.
In the table we also gave the alternative approximation
$$\frac{f(x+h) - f(x-h)}{2h},$$

h       (f(x+h) − f(x))/h    (f(x+h) − f(x−h))/(2h)
1       7                    4
0.1     3.31                 3.01
0.01    3.0301               3.0001
and it is clear that this one is better. For this example we can calculate
$$\frac{f(1+h) - f(1)}{h} = 3 + 3h + h^2,$$
and
$$\frac{f(1+h) - f(1-h)}{2h} = 3 + h^2,$$
and this proves that the alternative approximation is better: the error is $h^2$ rather than $3h + h^2$.
This can be generalized to arbitrary functions using Taylor polynomials. We have
$$\frac{f(x+h) - f(x)}{h} = \frac{f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + \dots - f(x)}{h} = f'(x) + \frac{h}{2}f''(x) + \frac{h^2}{6}f'''(x) + \dots$$
It follows that in this case the error behaves like h.
For the alternative approximation we have
$$\frac{f(x+h) - f(x-h)}{2h} = \frac{\left(f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + \dots\right) - \left(f(x) - hf'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{6}f'''(x) + \dots\right)}{2h} = f'(x) + \frac{h^2}{6}f'''(x) + \dots$$
It follows that in this case the error behaves like $h^2$.
It is this kind of error analysis that is one of the uses of Taylor polynomials.
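The table for $f(x) = x^3$ at $x = 1$ is easy to reproduce, and doing so makes the $h$ versus $h^2$ error behaviour visible directly (helper names are ours, not from the notes):

```python
def forward_diff(f, x, h):
    """One-sided difference quotient; the error behaves like h."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Symmetric difference quotient; the error behaves like h^2."""
    return (f(x + h) - f(x - h)) / (2 * h)

cube = lambda x: x**3

# Reproduces the table above: the true derivative at x = 1 is 3.
rows = [(h, forward_diff(cube, 1.0, h), central_diff(cube, 1.0, h))
        for h in (1.0, 0.1, 0.01)]
```

Shrinking h by a factor of 10 shrinks the forward-difference error by roughly 10 and the central-difference error by roughly 100.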
Chapter 3
Integration
3.1 Antiderivatives
A function F such that its derivative is f is called an antiderivative of f (sometimes the term primitive is used). Note that an antiderivative is not unique: both $x^2$ and $x^2 + 1$ have derivative $2x$, so both are antiderivatives of this function. In fact, for any constant c the function $x^2 + c$ is an antiderivative of $2x$. An important question is whether these are all antiderivatives. The answer is yes: if F is an antiderivative of f, then all antiderivatives of f are of the form $F + c$ with c a constant. This is not at all easy to prove. The set of antiderivatives of
f is denoted by
$$\int f(x)\,dx.$$
We now say something about the proof of the fact that two antiderivatives of the same function differ by a constant. We first reduce it to a special case. Suppose that $F_1$ and $F_2$ are anti-derivatives of f. Then the derivative of $F_1 - F_2$ is $f - f = 0$. So $F_1 - F_2$ is an antiderivative of the zero function. So if we can prove that the only antiderivatives of the zero function are the constant functions, then $F_1 - F_2$ must be constant and we are done. So we are left with the seemingly obvious statement that the only functions that have the zero function as their derivative are the constant functions. The usual way to prove this is by employing the mean value theorem already encountered in the section on Taylor polynomials. Recall that this theorem states that if g is a
differentiable function, then there exists a $c \in [a, b]$ such that
$$\frac{g(b) - g(a)}{b - a} = g'(c).$$
It follows that if $g'$ is the zero function, then $g(b) = g(a)$. Since a and b are
arbitrary, the function g must be constant. It is the mean value theorem that
is not so easy to prove. The first step in its proof is to reduce it to the case
where g(b) = g(a) = 0. This case is called Rolle’s theorem and states that if
f is differentiable and $f(a) = f(b) = 0$, then there exists a point c such that $f'(c) = 0$. The mean value theorem is proven by applying Rolle’s theorem to the function
$$f(x) = g(x) - g(a) - \frac{g(b) - g(a)}{b - a}(x - a).$$
The proof of Rolle’s theorem is based on the criterion for extrema that we encountered in the section on minima and maxima: if f has a maximum or minimum at $x = c$ and $f'(c)$ exists, then $f'(c) = 0$. So Rolle’s theorem follows from the fact that a continuous function defined on [a, b] has a global maximum and a global minimum (also already mentioned in the section on maxima and minima). The condition $f(a) = f(b) = 0$ in Rolle’s theorem ensures that the maximum and minimum are not both attained only at the boundary. There does not seem to be an easier proof of the innocent-looking statement that the only functions that have the zero function as their derivative are the constant functions.
3.2 Estimating area
In this section we will calculate the area between the graph of the function $f(x) = 1 - x^2$ and the coordinate axes (see figure 3.1) using only our knowledge of the area of a rectangle (of course later on we will use integrals to calculate areas like this).
Figure 3.1: The area to be computed
We will do something similar to what we did when we numerically approximated the solution of an equation by the bisection method: we are going to find upper and lower estimates for the area to be determined. We first divide the interval [0, 1] into n equal pieces. We index these pieces by the variable k, which takes on the integer values 0 to $n - 1$: the k-th interval is $[\frac{k}{n}, \frac{k+1}{n}]$. We then take each of these small intervals as the base of two rectangles: one rectangle that is completely below the curve and one rectangle that has the curve completely below it: see figures 3.2 and 3.3.
If we denote the area below the curve by A, the sum of the areas of the
rectangles that are below the curve by L(n) and the sum of the areas of the
Figure 3.2: Estimate from below with n = 4
Figure 3.3: Estimate from above with n = 4
rectangles above the curve by $U(n)$, then we have
$$L(n) \le A \le U(n). \qquad (3.1)$$
We will now produce formulas for $L(n)$ and $U(n)$. Because the function f is decreasing, the height of the rectangle below the curve is always the value of f at the right end-point of its base and the height of the rectangle above the curve is always the value of f at the left end-point of its base. So if we consider the rectangles with base $[\frac{k}{n}, \frac{k+1}{n}]$, then the smaller rectangle has area
$$\frac{1}{n}\,f\left(\frac{k+1}{n}\right)$$
and the larger rectangle has area
$$\frac{1}{n}\,f\left(\frac{k}{n}\right).$$
We now sum all of these areas to obtain $L(n)$ and $U(n)$:
$$L(n) = \sum_{k=0}^{n-1} \frac{1}{n}\,f\left(\frac{k+1}{n}\right), \qquad U(n) = \sum_{k=0}^{n-1} \frac{1}{n}\,f\left(\frac{k}{n}\right).$$
Substituting what f is we obtain
$$L(n) = \frac{1}{n}\sum_{k=0}^{n-1}\left(1 - \frac{(k+1)^2}{n^2}\right), \qquad U(n) = \frac{1}{n}\sum_{k=0}^{n-1}\left(1 - \frac{k^2}{n^2}\right). \qquad (3.2)$$
For example for $n = 4$ (the situation depicted in the figures) we have
$$L(4) = \frac{1}{4}\left(\frac{15}{16} + \frac{12}{16} + \frac{7}{16}\right) = \frac{34}{64} = 0.53125$$
and
$$U(4) = \frac{1}{4}\left(1 + \frac{15}{16} + \frac{12}{16} + \frac{7}{16}\right) = \frac{50}{64} = 0.78125.$$
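The sums $L(n)$ and $U(n)$ are straightforward to compute for any n, which lets us watch both converge to the same value. A sketch (the function name `lower_upper` is our own):

```python
def lower_upper(n):
    """L(n) and U(n) for f(x) = 1 - x^2 on [0, 1] (f is decreasing there)."""
    f = lambda x: 1 - x * x
    L = sum(f((k + 1) / n) for k in range(n)) / n
    U = sum(f(k / n) for k in range(n)) / n
    return L, U

# n = 4 reproduces the values computed by hand above.
L4, U4 = lower_upper(4)          # 0.53125 and 0.78125
Ln, Un = lower_upper(10000)      # both close to 2/3
```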
What we want to show is that if $n \to \infty$, then $L(n)$ and $U(n)$ converge to the same number. By (3.1) this number then must be the area A below the curve. To calculate these limits we must first rewrite (3.2) in a more convenient form. For this the following formula for the sum of squares is useful:
$$\sum_{j=1}^{m} j^2 = \frac{m(m+1)(2m+1)}{6};$$
there are similar formulas for other powers, e.g.
$$\sum_{j=1}^{m} j = \frac{m(m+1)}{2}, \qquad \sum_{j=1}^{m} j^3 = \frac{m^2(m+1)^2}{4};$$
these formulas can be proven using mathematical induction. Using the formula for the sum of squares we obtain (with $j = k + 1$)
$$L(n) = 1 - \frac{1}{n^3}\sum_{k=0}^{n-1}(k+1)^2 = 1 - \frac{1}{n^3}\sum_{j=1}^{n} j^2 = 1 - \frac{n(n+1)(2n+1)}{6n^3}$$
and (with $m = n - 1$)
$$U(n) = 1 - \frac{1}{n^3}\sum_{k=0}^{n-1} k^2 = 1 - \frac{(n-1)n(2n-1)}{6n^3}.$$
As $n \to \infty$, both $L(n)$ and $U(n)$ converge to $1 - \frac{2}{6} = \frac{2}{3}$. So the grey area in figure 3.1 equals $\frac{2}{3}$.

3.3 Definition (optional)
The idea behind the definition of the integral is to mimic what we did in the example of the previous section. We first need the following concepts, which generalize the maximum and minimum of a set. The greatest lower bound of a set S is the number g such that $g \le s$ for all $s \in S$ (i.e. g is a lower bound for S) and such that for all other lower bounds h (i.e. numbers such that $h \le s$ for all $s \in S$) we have $h \le g$. If the set S has a minimum, then this minimum is the greatest lower bound. An extremely important property of the real numbers is that every subset of the real numbers that has a lower bound has a unique greatest lower bound (geometrically this means that there are no holes in a line). Consider the set of all x with $0 < x < 1$: this set has no minimum, but it does have a greatest lower bound, namely zero. The greatest lower bound of a set S will be denoted by glb S. Similarly we define the least upper bound (lub) of a set S as the unique number l such that $s \le l$ for all $s \in S$ and such that if k has the property $s \le k$ for all $s \in S$, then $l \le k$. If S has a maximum, then this maximum is the least upper bound.
We are given a function f defined on an interval [a, b] and we want to know
the area A of the region bounded by the graph of f and the lines y = 0, x = a
and x = b (area above the line y = 0 counting positive and area below the
line y = 0 counting negative, but for simplicity you may think of the graph of
f being entirely above the line y = 0). The first thing we do is partition the
interval [a, b], i.e. we pick points $x_k$ with k taking on the integer values between 0 and n such that
$$a = x_0 < x_1 < x_2 < \dots < x_{n-1} < x_n = b.$$
In the example in the previous section we chose $x_k = \frac{k}{n}$, but there is good reason not to insist on the points in the partition being equally spaced. For a particular partition P we define the lower sum $L(P)$ and upper sum $U(P)$ as follows:
$$L(P) = \sum_{k=0}^{n-1} \mathrm{glb}\{f(x) : x_k \le x \le x_{k+1}\}\,(x_{k+1} - x_k),$$
$$U(P) = \sum_{k=0}^{n-1} \mathrm{lub}\{f(x) : x_k \le x \le x_{k+1}\}\,(x_{k+1} - x_k).$$
In the example in the previous section we had $x_{k+1} - x_k = \frac{1}{n}$, $\mathrm{glb}\{f(x) : x_k \le x \le x_{k+1}\} = f(x_{k+1})$ and $\mathrm{lub}\{f(x) : x_k \le x \le x_{k+1}\} = f(x_k)$. For any partition P we have
$$L(P) \le U(P),$$
since the glb is always less than or equal to the lub. The idea again is that the
desired area A is always in between L(P ) and U (P ) and that as the partition
P becomes finer, L(P ) and U (P ) converge to the same number, which then
must be the desired area A. We now define what may be seen as the best
approximation from below to the area A. The lower integral of f over the
interval [a, b] is defined as
$$\underline{\int_a^b} f(x)\,dx := \mathrm{lub}\{L(P) : P \text{ a partition of } [a, b]\}.$$
In the example of the previous section we could simply take the limit as $n \to \infty$ of $L(n)$ to obtain this lower integral. The upper integral of f over [a, b] (which may be seen as the best approximation from above to the area A) is defined as
$$\overline{\int_a^b} f(x)\,dx := \mathrm{glb}\{U(P) : P \text{ a partition of } [a, b]\};$$
in the example of the previous section this was simply the limit as $n \to \infty$ of $U(n)$. The function f is called Darboux integrable over the interval [a, b] if
$$\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx;$$
in this case the Darboux integral of f over the interval [a, b], denoted by
$$\int_a^b f(x)\,dx,$$
is defined as being this number. There is an older and slightly more complicated
approach to the integral due to Riemann. The Riemann integral and Darboux
integral are equivalent. There are more general integrals: the Lebesgue integral
which is the integral used by professional mathematicians and in the UK is
usually taught to mathematics students in their third year (and it takes an
entire module to do this properly) and the Kurzweil-Henstock integral which is
a topic of current mathematical research. The integral was used by e.g. Newton
long before it was properly defined.
There are functions that are not Riemann integrable. The following classical example is due to Dirichlet. Define the function f to be zero on the irrationals and 1 on the rationals (remember that rationals are quotients of integers and irrationals are those numbers that are not rational, like e and π) and consider it on the interval [0, 2]. For any partition P the interval $[x_k, x_{k+1}]$ contains both a rational and an irrational number. So the glb of f on each such interval is 0 and the lub is 1. It follows that $L(P) = 0$ and $U(P) = 2$ for any partition P. So the lower integral is zero and the upper integral is 2. Since these numbers are not the same, f is not Darboux (or Riemann) integrable. This function is Lebesgue integrable (and its integral is zero). The basic problem is that the function in the Dirichlet example has too many points of discontinuity. The difference between the Lebesgue integral and the Kurzweil-Henstock integral is in functions like $\frac{1}{x}\sin\frac{1}{x^3}$ which oscillate wildly. This function is not Lebesgue integrable, but it is Kurzweil-Henstock integrable.
So which functions are Darboux integrable? Every continuous function is Darboux integrable, every increasing function is Darboux integrable and every decreasing function is Darboux integrable. This accounts for most functions that you know. Note that it is crucial that the interval [a, b] is closed (the endpoints are included). Functions like $\frac{1}{x^2}$ on the interval (0, 1), endpoints not included, which blow up at the boundary, or functions on infinite intervals like $[1, \infty)$ will be considered in the section on improper integrals.
3.4 The fundamental theorem of calculus
The fundamental theorem of calculus relates integrals to derivatives, i.e. tangent
lines to area. This result was discovered in the second half of the 17th century
independently by several people, Newton and Leibniz being the most prominent
ones. The fundamental theorem consists of two parts.
Theorem 3.4.1 (Fundamental theorem of calculus part I). Assume that f is continuous on [a, b] and define
$$F(x) = \int_a^x f(t)\,dt.$$
Then F is differentiable and $F'(x) = f(x)$ for all $x \in [a, b]$.
Theorem 3.4.2 (Fundamental theorem of calculus part II). Assume that f is continuous on [a, b] and assume that F satisfies $F'(x) = f(x)$ for all $x \in [a, b]$. Then
$$\int_a^b f(t)\,dt = F(b) - F(a).$$
This second part of the fundamental theorem of calculus allows us to compute integrals relatively easily: we simply have to find an antiderivative F of the function f that is to be integrated.
The assumption that the function f to be integrated has to be continuous is slightly inconvenient. If one considers more general integrals than the Darboux-Riemann integral, then this assumption can be relaxed.
Since the fundamental theorem of calculus is so fundamental, I will sketch its proof. The starting point is the following result about continuous functions (the intermediate value theorem): a continuous function takes on every value in between its minimum and maximum, i.e. if we call the function f, then for any y with $\min f \le y \le \max f$ there exists a c such that $y = f(c)$. From this we derive what is called the mean value theorem for integrals: if f is continuous on [a, b] then there exists a $c \in [a, b]$ such that
$$f(c) = \frac{1}{b-a}\int_a^b f(x)\,dx.$$
The right-hand side can be seen as the mean value (average) of the function f on the interval [a, b] (compare with the definition of the average of finitely many numbers in terms of a sum) and the theorem states that there is a point where the value of the function is exactly equal to the mean value of the function over the interval. The proof is easy: take the partition with only two points $x_0 = a$ and $x_1 = b$. Then the lower sum is $(b-a)\min f$ and the upper sum is $(b-a)\max f$. So we have
$$\min f \le \frac{1}{b-a}\int_a^b f(x)\,dx \le \max f.$$
Now use the intermediate value theorem to obtain the point c. Now we are ready to prove part I of the fundamental theorem of calculus. We write down the definition of the derivative of F:
$$F'(x) = \lim_{h\to 0} \frac{\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt}{h}$$
and simplify this to
$$F'(x) = \lim_{h\to 0} \frac{1}{h}\int_x^{x+h} f(t)\,dt.$$
By the mean value theorem for integrals there exists a point c in between x and $x + h$ such that
$$f(c) = \frac{1}{h}\int_x^{x+h} f(t)\,dt.$$
Note that c depends on h. As $h \to 0$ we must have $c \to x$, since $x \le c \le x + h$ if h is positive and $x + h \le c \le x$ if h is negative. So
$$F'(x) = \lim_{h\to 0}\frac{1}{h}\int_x^{x+h} f(t)\,dt = \lim_{h\to 0} f(c) = f(x),$$
where we have used that f is continuous to ensure that $c \to x$ implies $f(c) \to f(x)$. This proves part I of the fundamental theorem of calculus. Part II now follows easily. By part I, the function
$$G(x) = \int_a^x f(t)\,dt$$
is an antiderivative of f. By the result on uniqueness of antiderivatives in the section on antiderivatives we have that $F = G + c$ for some constant c. It follows that
$$F(b) - F(a) = G(b) - G(a) = \int_a^b f(t)\,dt - \int_a^a f(t)\,dt = \int_a^b f(t)\,dt,$$
as desired.
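Part II can be checked numerically on any concrete example: approximate the integral by a Riemann-type sum and compare it to $F(b) - F(a)$. A sketch using $f = \cos$ with antiderivative $F = \sin$ on [0, 1] (the choice of example and the helper name `riemann_sum` are ours):

```python
from math import cos, sin

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# Part II with f = cos and F = sin: the integral should equal sin(1) - sin(0).
approx = riemann_sum(cos, 0.0, 1.0, 10000)
exact = sin(1.0) - sin(0.0)
```

The midpoint sum lies between the lower and upper sums used in the definition, so its agreement with $F(b) - F(a)$ illustrates the theorem without proving it.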
3.5 Rules for integration
Using the fundamental theorem of calculus and the rules for differentiation,
rules for integration can be derived. The two crucial ones are the substitution
rule (the analogue of the chain rule) and integration by parts (the analogue of
the product rule).
51
3.5.1 The substitution rule
The chain rule tells us that
$$\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x).$$
If we integrate both sides of this equality then we obtain
$$\int_a^b \frac{d}{dx} f(g(x))\,dx = \int_a^b f'(g(x))\,g'(x)\,dx.$$
The left-hand side is easily evaluated by the fundamental theorem of calculus. So we obtain
$$f(g(b)) - f(g(a)) = \int_a^b f'(g(x))\,g'(x)\,dx. \qquad (3.3)$$
This formula is known as the substitution rule. We can use it e.g. to compute the following integral:
$$\int_0^3 2x e^{x^2}\,dx.$$
With $f(u) = e^u$, $g(x) = x^2$, $a = 0$ and $b = 3$ this is exactly the right-hand side of (3.3). Since $f(g(0)) = f(0) = 1$ and $f(g(3)) = f(9) = e^9$ the integral equals $e^9 - 1$. Using Leibniz notation the substitution rule reads
$$\int_{u(a)}^{u(b)} f(u)\,du = \int_a^b f(u)\,\frac{du}{dx}\,dx,$$
which is very suggestive.
Problem 65. Compute
$$\int 2x\sqrt{1 + x^2}\,dx.$$
Solution. Use the substitution $u = 1 + x^2$, then $du = 2x\,dx$, so in terms of the u-variable we have the integral
$$\int \sqrt{u}\,du = \frac{2}{3}u^{3/2} + C = \frac{2}{3}(1 + x^2)^{3/2} + C.$$
Problem 66. Compute
$$\int_0^{\pi/2} \cos 7x\,dx.$$
Solution. Use the substitution $u = 7x$, then $du = 7\,dx$, so in terms of the u-variable we have the integral
$$\frac{1}{7}\int_0^{7\pi/2} \cos u\,du = \frac{1}{7}\Big[\sin u\Big]_0^{7\pi/2} = \frac{1}{7}\sin\frac{7\pi}{2} = -\frac{1}{7}.$$
Problem 67. Compute
$$\int \frac{1}{x^2}\cos\frac{1}{x}\,dx.$$
Solution. Use the substitution $u = \frac{1}{x}$, then $du = \frac{-1}{x^2}\,dx$, so in terms of the u-variable we have the integral
$$\int -\cos u\,du = -\sin u + C = -\sin\frac{1}{x} + C.$$
Problem 68. Compute
$$\int \frac{1}{(2-x)^2}\,dx.$$
Solution. Use the substitution $u = 2 - x$, then $du = -dx$, so in terms of the u-variable we have the integral
$$\int \frac{-1}{u^2}\,du = \frac{1}{u} + C = \frac{1}{2-x} + C.$$
Problem 69. Compute
$$\int \sqrt{\frac{x-1}{x^5}}\,dx.$$
Solution. First re-write this as
$$\int \frac{1}{x^2}\sqrt{1 - \frac{1}{x}}\,dx.$$
Use the substitution $u = 1 - \frac{1}{x}$, then $du = \frac{1}{x^2}\,dx$, so in terms of the u-variable we have the integral
$$\int \sqrt{u}\,du = \frac{2}{3}u^{3/2} + C = \frac{2}{3}\left(1 - \frac{1}{x}\right)^{3/2} + C.$$
Note that it may happen that $u(b) < u(a)$. We then get a funny looking integral where the upper limit is smaller than the lower limit. This is not a problem; the convention is that
$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$$
(i.e. reversing the limits gives a minus sign) and with this convention everything still works out fine. If we want to compute
$$\int_1^2 \frac{1}{x^2}\cos\frac{1}{x}\,dx,$$
then we can use the anti-derivative $-\sin\frac{1}{x}$ obtained in problem 67 to obtain $\sin 1 - \sin\frac{1}{2}$, or we can directly use the substitution $u = \frac{1}{x}$ and change the limits to the u-variable to obtain
$$\int_1^{1/2} -\cos u\,du = \Big[-\sin u\Big]_1^{1/2} = -\sin\frac{1}{2} + \sin 1,$$
or use the above mentioned convention to obtain
$$\int_1^{1/2} -\cos u\,du = \int_{1/2}^1 \cos u\,du = \Big[\sin u\Big]_{1/2}^1 = \sin 1 - \sin\frac{1}{2}.$$
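The antiderivative from problem 67 can be sanity-checked numerically: differencing $F(x) = -\sin\frac{1}{x}$ should reproduce the integrand, and $F(2) - F(1)$ should give the definite integral just computed. A sketch (variable names are our own):

```python
from math import sin, cos

# The antiderivative found in problem 67 and the original integrand.
F = lambda x: -sin(1.0 / x)
f = lambda x: cos(1.0 / x) / x**2

# A central difference of F at an interior point should be close to f there.
h = 1e-6
deriv_at_1_5 = (F(1.5 + h) - F(1.5 - h)) / (2 * h)

# The definite integral over [1, 2] via the fundamental theorem of calculus.
value = F(2.0) - F(1.0)   # equals sin 1 - sin(1/2)
```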
3.5.2 Integration by parts
The product rule tells us:
$$(fg)'(x) = f'(x)g(x) + f(x)g'(x).$$
Integrating and using the fundamental theorem of calculus gives
$$f(b)g(b) - f(a)g(a) = \int_a^b f'(x)g(x) + f(x)g'(x)\,dx;$$
rearranging this gives
$$\int_a^b f(x)g'(x)\,dx = \Big[fg\Big]_a^b - \int_a^b f'(x)g(x)\,dx.$$
This last formula is called the integration by parts formula.
Problem 70. Compute
$$\int x e^x\,dx.$$
Solution. Use integration by parts with $f(x) = x$ and $g'(x) = e^x$ so that $f'(x) = 1$ and $g(x) = e^x$. This gives
$$\int x e^x\,dx = x e^x - \int e^x\,dx = (x - 1)e^x + C.$$
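Integration by parts predicts $\int_0^1 x e^x\,dx = \big[(x-1)e^x\big]_0^1 = 0 - (-1) = 1$, which we can confirm with a numerical quadrature. A sketch (the midpoint-rule helper is our own construction, not from the notes):

```python
from math import exp

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# Integration by parts gives the exact value 1 for this integral.
approx = midpoint(lambda x: x * exp(x), 0.0, 1.0, 10000)
```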
Problem 71. Compute
$$\int x^2 e^x\,dx.$$
Solution. Use integration by parts with $f(x) = x^2$ and $g'(x) = e^x$ so that $f'(x) = 2x$ and $g(x) = e^x$. This gives
$$\int x^2 e^x\,dx = x^2 e^x - 2\int x e^x\,dx = (x^2 - 2x + 2)e^x + C,$$
where we have used problem 70.
Problem 72. Compute
$$\int e^x \cos x\,dx.$$
Solution. Use integration by parts with $f(x) = \cos x$ and $g'(x) = e^x$ so that $f'(x) = -\sin x$ and $g(x) = e^x$. This gives
$$\int e^x \cos x\,dx = e^x \cos x + \int e^x \sin x\,dx.$$
Use integration by parts again, now with $f(x) = \sin x$ and $g'(x) = e^x$ so that $f'(x) = \cos x$ and $g(x) = e^x$. This gives
$$\int e^x \cos x\,dx = e^x \cos x + e^x \sin x - \int e^x \cos x\,dx.$$
Rearranging gives
$$2\int e^x \cos x\,dx = e^x \cos x + e^x \sin x,$$
so that
$$\int e^x \cos x\,dx = \frac{e^x \cos x + e^x \sin x}{2} + C.$$
Integration by parts is also useful to find antiderivatives of inverse functions.
We will see that in a later section.
3.6 Anti-derivatives of inverse functions
There is a general method for finding the anti-derivative of an inverse function; it uses integration by parts. We illustrate the method on the arcsine. To compute the anti-derivative of the arcsine we use integration by parts with $f(x) = \arcsin x$ and $g'(x) = 1$. Then $f'(x) = \frac{1}{\sqrt{1-x^2}}$ and $g(x) = x$ so that
$$\int \arcsin x\,dx = x \arcsin x - \int \frac{x}{\sqrt{1-x^2}}\,dx.$$
This last integral we calculate by substitution: $u = 1 - x^2$ so that $du = -2x\,dx$:
$$\int \frac{x}{\sqrt{1-x^2}}\,dx = -\frac{1}{2}\int \frac{1}{\sqrt{u}}\,du = -\sqrt{u} + C = -\sqrt{1-x^2} + C.$$
So that
$$\int \arcsin x\,dx = x \arcsin x + \sqrt{1-x^2} + C.$$
Problem 73. Compute
$$\int \ln x\,dx.$$
Solution. Integrate by parts with $f(x) = \ln x$ and $g'(x) = 1$. Then $f'(x) = \frac{1}{x}$ and $g(x) = x$ so that
$$\int \ln x\,dx = x \ln x - \int x \cdot \frac{1}{x}\,dx = x \ln x - x + C.$$
In practice one just carries out the computations on examples as above instead of memorizing a general formula, but there is a general formula which shows why the above method works. Using integration by parts and substitution we obtain
$$\int y\,dx = xy - \int x\,\frac{dy}{dx}\,dx = xy - \int x\,dy,$$
so if we can integrate a function $x = x(y)$ then we can integrate its inverse function $y = y(x)$.
3.7 Anti-derivatives of rational functions by partial fractions
The anti-derivative of a rational function (i.e. a quotient of polynomials) can be explicitly computed provided that we can explicitly factor its denominator. This process makes use of the partial fraction expansion, which we will discuss first.
The first step in finding the partial fraction expansion of a rational function $\frac{n(x)}{d(x)}$ is to use polynomial long division to write
$$\frac{n(x)}{d(x)} = \rho(x) + \frac{n_1(x)}{d_1(x)},$$
where $\rho$, $n_1$, $d_1$ are polynomials and the degree of $n_1$ is strictly smaller than the degree of $d_1$. In most examples that we discuss the degree of n will already be strictly smaller than the degree of d, so that we can usually skip this step.
Problem 74. For $n(x) = 2x^3 - 4x^2 - x - 3$, $d(x) = x^2 - 2x - 3$ find $\rho$, $n_1$, $d_1$ as above.
Solution. Polynomial long division gives
$$\frac{2x^3 - 4x^2 - x - 3}{x^2 - 2x - 3} = 2x + \frac{5x - 3}{x^2 - 2x - 3},$$
so $\rho(x) = 2x$, $n_1(x) = 5x - 3$, $d_1(x) = x^2 - 2x - 3$.
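Polynomial long division itself is a short algorithm. A sketch, representing a polynomial as a list of coefficients from highest power down (the function name `polydiv` is our own):

```python
def polydiv(num, den):
    """Divide coefficient lists (highest power first); return (quotient, remainder)."""
    num = list(num)
    quot = []
    while len(num) >= len(den):
        # Leading coefficient of the next quotient term.
        c = num[0] / den[0]
        quot.append(c)
        # Subtract c * den aligned at the leading term, then drop that term.
        for i in range(len(den)):
            num[i] -= c * den[i]
        num.pop(0)
    return quot, num

# Problem 74: (2x^3 - 4x^2 - x - 3) / (x^2 - 2x - 3).
quot, rem = polydiv([2.0, -4.0, -1.0, -3.0], [1.0, -2.0, -3.0])
# quot = [2, 0], i.e. 2x; rem = [5, -3], i.e. 5x - 3.
```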
The second step in finding the partial fraction expansion of a rational function $\frac{n(x)}{d(x)}$ (where we now assume that the degree of n is strictly smaller than the degree of d) is factorization of the denominator d into irreducible factors. This means that we write d as a product of linear terms $x - r$ and irreducible quadratic terms $(x - p)^2 + q$ where $q > 0$. This can in principle always be done by the fundamental theorem of algebra, but explicitly finding these factors is not always possible.
Problem 75. Write $d(x) = x^2 - 2x - 3$ as a product of irreducible factors.
Solution. We have $x^2 - 2x - 3 = (x - 3)(x + 1)$, so this polynomial splits into two single linear factors.
Problem 76. Write $d(x) = x^2 + 4x + 4$ as a product of irreducible factors.
Solution. We have $x^2 + 4x + 4 = (x + 2)^2$, so this polynomial has a repeated linear factor.
Problem 77. Write $d(x) = x^2 - 2x + 5$ as a product of irreducible factors.
Solution. We have $x^2 - 2x + 5 = (x - 1)^2 + 4$, so this polynomial has a single irreducible quadratic factor.
An example of a denominator that has a repeated irreducible quadratic factor is $d(x) = (x^2 + 1)^2$.
The third step in finding the partial fraction expansion of a rational function $\frac{n(x)}{d(x)}$ (where we now assume that the degree of n is strictly smaller than the degree of d) is to write down the form of the partial fraction expansion based on the factorization of the denominator. For each linear factor $(x - r)^k$ we have the terms
$$\frac{A_1}{x - r} + \dots + \frac{A_k}{(x - r)^k};$$
in particular if the linear factor is single then we get only the term
$$\frac{A_1}{x - r}$$
and if the linear factor is double then we get the terms
$$\frac{A_1}{x - r} + \frac{A_2}{(x - r)^2}.$$
For each irreducible quadratic factor $[(x - p)^2 + q]^k$ we have the terms
$$\frac{2B_1(x - p) + C_1}{(x - p)^2 + q} + \dots + \frac{2B_k(x - p) + C_k}{[(x - p)^2 + q]^k};$$
in particular if the irreducible quadratic factor is single then we get only the term
$$\frac{2B_1(x - p) + C_1}{(x - p)^2 + q}$$
and if the irreducible quadratic factor is double then we get the terms
$$\frac{2B_1(x - p) + C_1}{(x - p)^2 + q} + \frac{2B_2(x - p) + C_2}{[(x - p)^2 + q]^2}.$$
Problem 78. Find the form of the partial fraction expansion of $\frac{5x-3}{x^2-2x-3}$.
Solution. We saw before that $x^2 - 2x - 3 = (x - 3)(x + 1)$, so this polynomial splits into two single linear factors. So the form of the partial fraction expansion is
$$\frac{A}{x - 3} + \frac{B}{x + 1}.$$
Problem 79. Find the form of the partial fraction expansion of $\frac{6x+7}{x^2+4x+4}$.
Solution. We saw before that $x^2 + 4x + 4 = (x + 2)^2$, so this polynomial has a repeated linear factor. So the form of the partial fraction expansion is
$$\frac{A}{x + 2} + \frac{B}{(x + 2)^2}.$$
If we tried to be in keeping with the notation above, then we should have written
$$\frac{A_1}{x + 2} + \frac{A_2}{(x + 2)^2},$$
but I try to avoid subscripts when it comes to actually computing things (which we will do in a moment), so I use A for $A_1$ and B for $A_2$.
Problem 80. Find the form of the partial fraction expansion of $\frac{6x+7}{x^2-2x+5}$.
Solution. We saw before that $x^2 - 2x + 5 = (x - 1)^2 + 4$, so this polynomial has a single irreducible quadratic factor. So the form of the partial fraction expansion is
$$\frac{2A(x - 1) + B}{(x - 1)^2 + 4}.$$
Problem 81. Find the form of the partial fraction expansion of $\frac{2x^3+1}{(x^2+1)^2}$.
Solution. This denominator polynomial has a double irreducible quadratic factor. So the form of the partial fraction expansion is
$$\frac{2Ax + B}{x^2 + 1} + \frac{2Cx + D}{(x^2 + 1)^2}.$$
The fourth step in finding the partial fraction expansion of a rational function $\frac{n(x)}{d(x)}$ (where we now assume that the degree of n is strictly smaller than the degree of d) is to find the values for the parameters introduced in the form of the partial fraction expansion. There are a couple of ways of doing this; I prefer solving linear systems to obtain these values.
Problem 82. Find the partial fraction expansion of
5x−3
x2 −2x−3 .
Solution. We saw before that the form of the partial fraction expansion is

A/(x − 3) + B/(x + 1).

So we want to find A and B such that

A/(x − 3) + B/(x + 1) = (5x − 3)/(x^2 − 2x − 3).

Multiplying both sides by x^2 − 2x − 3 gives

A(x + 1) + B(x − 3) = 5x − 3.    (3.4)

This can be re-written as

(A + B)x + (A − 3B) = 5x − 3.

So we should make sure that

A + B = 5
A − 3B = −3.

This system of equations can be solved by subtracting the second equation from
the first to obtain 4B = 8. It follows that B = 2 and substituting this back
gives A = 3. So the partial fraction expansion is

3/(x − 3) + 2/(x + 1).

Alternatively, in (3.4) we could substitute x = −1 to obtain −4B = −8, which
gives B = 2, and we could substitute x = 3 to obtain 4A = 12, which gives
A = 3. In the case of single linear factors substitution is easier than solving the
system of equations.
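As a quick sanity check (not part of the original notes), the computed expansion can be compared numerically with the original rational function at a few sample points:

```python
# Numerically verify the Problem 82 expansion:
# (5x - 3)/(x^2 - 2x - 3) = 3/(x - 3) + 2/(x + 1).
def original(x):
    return (5 * x - 3) / (x**2 - 2 * x - 3)

def expansion(x):
    return 3 / (x - 3) + 2 / (x + 1)

# Sample points chosen away from the poles x = 3 and x = -1.
for x in [-3.0, 0.0, 1.5, 2.0, 10.0]:
    assert abs(original(x) - expansion(x)) < 1e-12
```

Agreement at more points than there are unknowns makes an algebra slip very unlikely.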
Problem 83. Find the partial fraction expansion of (6x + 7)/(x^2 + 4x + 4).

Solution. We saw before that the form of the partial fraction expansion is

A/(x + 2) + B/(x + 2)^2.
So we want to find A and B such that

A/(x + 2) + B/(x + 2)^2 = (6x + 7)/(x^2 + 4x + 4).

As before we multiply by the original denominator to obtain

A(x + 2) + B = 6x + 7.    (3.5)

This can be re-written as

Ax + (2A + B) = 6x + 7.

So we should make sure that

A = 6
2A + B = 7.

This system of equations can be solved easily. We have A = 6 and from the
second equation it then follows that B = −5. So the partial fraction expansion
is

6/(x + 2) + (−5)/(x + 2)^2.

Alternatively, in (3.5) we could substitute x = −2 to obtain B = −5 directly,
and then substitute any other value to obtain A. In the case of repeated
linear factors substitution is as easy as solving the system of equations.
Problem 84. Find the partial fraction expansion of (6x + 7)/(x^2 − 2x + 5).

Solution. This already is the partial fraction expansion. We can re-write it
in the standard form as

(6(x − 1) + 13)/((x − 1)^2 + 4).
Problem 85. Find the partial fraction expansion of (2x^3 + 1)/((x^2 + 1)^2).

Solution. We saw before that the form of the partial fraction expansion is

(2Ax + B)/(x^2 + 1) + (2Cx + D)/((x^2 + 1)^2).

So we want to find A, B, C and D such that

(2Ax + B)/(x^2 + 1) + (2Cx + D)/((x^2 + 1)^2) = (2x^3 + 1)/((x^2 + 1)^2).

As before we multiply by the original denominator to obtain

(2Ax + B)(x^2 + 1) + (2Cx + D) = 2x^3 + 1.    (3.6)
This can be re-written as

2Ax^3 + Bx^2 + (2A + 2C)x + (B + D) = 2x^3 + 1.

It follows that we should choose A = 1, B = 0, C = −1, D = 1. So the partial
fraction expansion is

2x/(x^2 + 1) + (−2x + 1)/((x^2 + 1)^2).

Substitution in (3.6) does not seem appropriate; solving a linear system as was
done above seems the better method in this case.
Problem 86. Find the partial fraction expansion of (10x^2 + 12x + 20)/(x^3 − 8).

Solution. We first note that x = 2 is a zero of the denominator, so x^3 − 8 has
a factor x − 2. By polynomial long division of x^3 − 8 by x − 2 we obtain that
x^3 − 8 = (x − 2)(x^2 + 2x + 4). Since x^2 + 2x + 4 = (x + 1)^2 + 3, this quadratic
term is irreducible. We deduce the form of the partial fraction expansion

(10x^2 + 12x + 20)/(x^3 − 8) = A/(x − 2) + (Bx + C)/(x^2 + 2x + 4).

To find A, B and C we multiply by the original denominator:

10x^2 + 12x + 20 = A(x^2 + 2x + 4) + (Bx + C)(x − 2).    (3.7)

Substituting x = 2 in (3.7) leaves only the unknown A:

84 = 12A,

so A = 7. Substituting x = 0 in (3.7) only leaves the unknown C (since A is
now known):

20 = 28 − 2C,

so C = 4. We now substitute x = 1 in (3.7) (since this makes everything easy
to evaluate) to find B:

42 = 49 − (B + 4),

so B = 3. So the partial fraction expansion is

7/(x − 2) + (3x + 4)/(x^2 + 2x + 4).
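The substitution steps above can be mirrored in a few lines of Python (a sketch, not part of the notes); each chosen x isolates one unknown:

```python
# Reproduce the substitution steps of Problem 86: each chosen x isolates one
# unknown in 10x^2 + 12x + 20 = A(x^2 + 2x + 4) + (Bx + C)(x - 2).
def lhs(x):
    return 10 * x**2 + 12 * x + 20

A = lhs(2) / (2**2 + 2 * 2 + 4)   # x = 2: 84 = 12A
C = (4 * A - lhs(0)) / 2          # x = 0: 20 = 4A - 2C
B = 7 * A - lhs(1) - C            # x = 1: 42 = 7A - (B + C)
print(A, B, C)  # 7.0 3.0 4.0
```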
Once we have the partial fraction expansion, we can reduce the calculation of
the antiderivative of a rational function to the calculation of the antiderivatives
of some very special rational functions. In the next problems we show how to
calculate such antiderivatives for all the cases that can occur. We start with the
easiest case of single linear factors.

Problem 87. Find ∫ (5x − 3)/(x^2 − 2x − 3) dx.
Solution. We saw before that the partial fraction expansion is

3/(x − 3) + 2/(x + 1),

so we want to calculate

∫ 3/(x − 3) + 2/(x + 1) dx,

which equals

3 ln |x − 3| + 2 ln |x + 1| + C.

Do not forget the absolute value signs!
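An antiderivative can always be tested by differentiating it again; this Python sketch (an addition, not from the notes) does so numerically with a central difference:

```python
import math

# Differentiate the claimed antiderivative F numerically and compare with the
# integrand of Problem 87; F'(x) should equal (5x - 3)/(x^2 - 2x - 3).
def F(x):
    return 3 * math.log(abs(x - 3)) + 2 * math.log(abs(x + 1))

def f(x):
    return (5 * x - 3) / (x**2 - 2 * x - 3)

h = 1e-6
for x in [-2.0, 0.0, 1.0, 5.0]:
    assert abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) < 1e-6
```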
The next problem considers the case of a repeating linear factor (it repeats
twice in the specific problem, but a factor that repeats more often can be dealt
with in a similar way).

Problem 88. Find ∫ (6x + 7)/(x^2 + 4x + 4) dx.
Solution. We saw before that the partial fraction expansion is

6/(x + 2) + (−5)/(x + 2)^2,

so we want to calculate

∫ 6/(x + 2) + (−5)/(x + 2)^2 dx = ∫ 6/(x + 2) dx − 5 ∫ (x + 2)^{−2} dx,

which equals

6 ln |x + 2| + 5(x + 2)^{−1} + C = 6 ln |x + 2| + 5/(x + 2) + C.
The next problem deals with the case of a single irreducible quadratic factor.

Problem 89. Find ∫ (6x + 7)/(x^2 − 2x + 5) dx.
Solution. We saw before that the partial fraction expansion is

(6(x − 1) + 13)/((x − 1)^2 + 4),

so we want to calculate

∫ (6(x − 1) + 13)/((x − 1)^2 + 4) dx = 3 ∫ 2(x − 1)/((x − 1)^2 + 4) dx + ∫ 13/((x − 1)^2 + 4) dx.

We do the first of these integrals by the substitution u = (x − 1)^2, du = 2(x − 1) dx,
which gives

3 ∫ 2(x − 1)/((x − 1)^2 + 4) dx = 3 ∫ 1/(u + 4) du = 3 ln |u + 4| + C = 3 ln ((x − 1)^2 + 4) + C.
The second integral we would want to transform to a constant times ∫ 1/(t^2 + 1) dt
for a suitable variable t. To achieve this we first divide the numerator and
denominator by 4:

∫ 13/((x − 1)^2 + 4) dx = ∫ (13/4)/(((x − 1)/2)^2 + 1) dx.

We then define what is squared in the denominator as the new variable:
t = (x − 1)/2, dt = (1/2) dx, which gives

∫ (13/2)/(t^2 + 1) dt = (13/2) arctan t + C = (13/2) arctan((x − 1)/2) + C.

Collecting things we have

∫ (6x + 7)/(x^2 − 2x + 5) dx = 3 ln ((x − 1)^2 + 4) + (13/2) arctan((x − 1)/2) + C.
The final problem deals with the case of a repeating irreducible quadratic
factor (it repeats twice in the specific example; if it repeats more often then
more integrations by parts are needed).

Problem 90. Find ∫ (2x^3 + 1)/((x^2 + 1)^2) dx.
Solution. We saw before that the partial fraction expansion is

2x/(x^2 + 1) + (−2x + 1)/((x^2 + 1)^2),

so we want to calculate

∫ 2x/(x^2 + 1) + (−2x + 1)/((x^2 + 1)^2) dx.

We split this integral into three parts:

∫ 2x/(x^2 + 1) dx + ∫ −2x/((x^2 + 1)^2) dx + ∫ 1/((x^2 + 1)^2) dx.

The first two integrals can be computed using the substitution u = x^2 + 1,
du = 2x dx:

∫ 2x/(x^2 + 1) dx = ∫ 1/u du = ln |u| + C = ln (x^2 + 1) + C,

∫ −2x/((x^2 + 1)^2) dx = ∫ −1/u^2 du = 1/u + C = 1/(x^2 + 1) + C.
The third integral is considerably harder. We rewrite the integral as follows:

∫ 1/((x^2 + 1)^2) dx = ∫ (x^2 + 1 − x^2)/((x^2 + 1)^2) dx = ∫ 1/(x^2 + 1) dx − ∫ x^2/((x^2 + 1)^2) dx.

For the second integral we use integration by parts with f(x) = x/2,
g'(x) = 2x/((x^2 + 1)^2), so that f'(x) = 1/2, g(x) = −1/(x^2 + 1), which gives

∫ 1/((x^2 + 1)^2) dx = arctan x + x/(2(x^2 + 1)) − (1/2) ∫ 1/(x^2 + 1) dx,

which equals

(arctan x)/2 + x/(2(x^2 + 1)) + C.

Collecting things we have

∫ (2x^3 + 1)/((x^2 + 1)^2) dx = ln (x^2 + 1) + 1/(x^2 + 1) + (arctan x)/2 + x/(2(x^2 + 1)) + C.
The following gives an alternative method for dealing with repeated quadratic
factors.

Problem 91. Find ∫ (x + 1)/((x^2 + 1)^2) dx.

Solution. Some thought tells us that an anti-derivative must be of the form

(Ax + B)/(x^2 + 1) + C arctan x.

Computing the derivative of this educated guess we obtain, using the quotient
rule,

(A(x^2 + 1) − 2x(Ax + B))/((x^2 + 1)^2) + C/(x^2 + 1).

This can be re-written as

(A(x^2 + 1) − 2x(Ax + B))/((x^2 + 1)^2) + (Cx^2 + C)/((x^2 + 1)^2),

and further re-written as

((−A + C)x^2 − 2Bx + A + C)/((x^2 + 1)^2).

For this to be equal to (x + 1)/((x^2 + 1)^2) for all x we need

−A + C = 0,
−2B = 1,
A + C = 1.

Solving this system of equations gives A = C = 1/2 and B = −1/2. So the required
anti-derivative is

(1/2)(x − 1)/(x^2 + 1) + (1/2) arctan x + K.
3.8 Anti-derivatives of trigonometric rational functions

It is always possible to reduce an integral involving so-called trigonometric rational functions like

∫ (cos x + sin^2 x)/(cos^3 x + sin^2 x cos x + sin x cos x) dx

to an integral of an ordinary rational function by the substitution

z = tan(x/2),

since then

cos x = (1 − z^2)/(1 + z^2),    sin x = 2z/(1 + z^2),    dx = 2/(1 + z^2) dz.

However, this particular substitution usually gives rather difficult ordinary rational functions to integrate since the degrees become rather high and repeated
irreducible quadratic factors are introduced. In special important cases there
are much easier ways. We will look at the special case of the product of a power
of a sine and a power of a cosine:

∫ sin^m x cos^n x dx.

There are four cases:

• m odd
• n odd
• both m and n even and positive
• both m and n even, not both positive.
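The tan(x/2) identities above are easy to spot-check numerically; this small Python sketch (an addition to the notes) does so at a few points:

```python
import math

# Spot-check the z = tan(x/2) substitution identities:
# cos x = (1 - z^2)/(1 + z^2) and sin x = 2z/(1 + z^2).
for x in [-1.4, 0.3, 1.0, 2.0]:
    z = math.tan(x / 2)
    assert abs(math.cos(x) - (1 - z**2) / (1 + z**2)) < 1e-12
    assert abs(math.sin(x) - 2 * z / (1 + z**2)) < 1e-12
```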
An example of the first case is

∫ sin^3 x cos^2 x dx.

In this case we use the substitution u = cos x, then du = − sin x dx and (using
sin^2 x = 1 − cos^2 x):

∫ sin^3 x cos^2 x dx = − ∫ (1 − u^2)u^2 du = −(1/3)u^3 + (1/5)u^5 + C = −(1/3) cos^3 x + (1/5) cos^5 x + C.
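As a cross-check (not in the original notes), the antiderivative just found gives ∫_0^{π/2} sin^3 x cos^2 x dx = F(π/2) − F(0) = 2/15, which a crude numerical quadrature should reproduce:

```python
import math

# By the antiderivative F(x) = -cos^3(x)/3 + cos^5(x)/5,
# ∫_0^{π/2} sin^3 x cos^2 x dx = F(π/2) - F(0) = 0 - (-1/3 + 1/5) = 2/15.
# Compare with a midpoint-rule approximation of the integral.
def f(x):
    return math.sin(x)**3 * math.cos(x)**2

n = 100000
h = (math.pi / 2) / n
midpoint_sum = h * sum(f((k + 0.5) * h) for k in range(n))
assert abs(midpoint_sum - 2 / 15) < 1e-8
```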
An example of the second case is

∫ sin^5 x cos x dx.

In this case we use the substitution u = sin x, then du = cos x dx and (using
cos^2 x = 1 − sin^2 x):

∫ sin^5 x cos x dx = ∫ u^5 du = (1/6)u^6 + C = (1/6) sin^6 x + C.

Note that in this example also m is odd, so we could alternatively have used
the substitution u = cos x; that would however have resulted in a slightly more
complicated integral in terms of the new variable.
Both of the examples considered had m and n positive, but if either or both
are negative nothing much changes, as the following example indicates:

∫ sin x/cos^2 x dx = {u = cos x, du = − sin x dx} = ∫ −1/u^2 du = 1/u + C = 1/cos x + C.
These easy substitutions no longer work if m and n are both even (in that case
a term √(1 − u^2) appears that we cannot easily deal with). In that case we can
however use the double angle formulas

sin^2 x = (1 − cos 2x)/2,    cos^2 x = (1 + cos 2x)/2.
We illustrate this with the following example:

∫ sin^2 x cos^2 x dx = ∫ (1 − cos 2x)/2 · (1 + cos 2x)/2 dx = ∫ 1/4 − (cos^2 2x)/4 dx
= ∫ 1/4 − (1/4)(1 + cos 4x)/2 dx = ∫ 1/8 − (cos 4x)/8 dx = x/8 − (sin 4x)/32 + C.
If either m or n is negative (and both are even), then the double angle formulas
don’t lead to integrals that are easily evaluated. We however do not have to fall
back on the last resort z = tan(x/2) substitution. With the substitution t = tan x
we have

cos^2 x = 1/(1 + t^2),    sin^2 x = t^2/(1 + t^2),    dx = 1/(1 + t^2) dt,

and this leads to integrals that are somewhat simpler to evaluate than with the
z substitution. We illustrate this on the example

∫ 1/cos^4 x dx = ∫ (1 + t^2)^2 · 1/(1 + t^2) dt = ∫ 1 + t^2 dt = t + t^3/3 + C = tan x + (tan^3 x)/3 + C.
3.9 Trigonometric substitution

For integrals involving √(1 − x^2) the substitution x = sin u is useful to get rid of
the square root. In that case √(1 − x^2) = √(1 − sin^2 u) = cos u if −π/2 ≤ u ≤ π/2,
and dx = cos u du, so e.g.

∫ √(1 − x^2) dx = ∫ cos^2 u du.

We compute this integral using the double angle formula:

∫ cos^2 u du = ∫ (1 + cos 2u)/2 du = u/2 + (sin 2u)/4 + C.

To return to the x-variable we use the double angle formula for the sine:

u/2 + (sin 2u)/4 + C = u/2 + (sin u cos u)/2 + C = (arcsin x)/2 + x√(1 − x^2)/2 + C.
For integrals involving √(1 + x^2) the substitution x = tan u is useful to get rid
of the square root. With this substitution we have

√(1 + x^2) = √(1 + tan^2 u) = √(cos^2 u/cos^2 u + sin^2 u/cos^2 u) = √(1/cos^2 u) = 1/cos u,

since we may assume u ∈ (−π/2, π/2), for which cos u > 0. We further have

dx = 1/cos^2 u du.

We use the above to compute

∫ 1/√(1 + x^2) dx = ∫ cos u · 1/cos^2 u du = ∫ 1/cos u du.

We use the substitution w = sin u to compute this integral. We then have
dw = cos u du, so

∫ 1/cos u du = ∫ cos u/cos^2 u du = ∫ 1/(1 − w^2) dw.

This integral in the w-variable we compute by partial fractions:

∫ 1/(1 − w^2) dw = −(1/2) ln |1 − w| + (1/2) ln |1 + w| + C.

Since w = sin u and x = tan u we have w = x/√(1 + x^2). Substituting that and
simplifying using the logarithmic rules gives

∫ 1/√(1 + x^2) dx = ln |x + √(1 + x^2)| + C.
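The result can be confirmed numerically (a sketch added here, not part of the notes) by comparing a midpoint-rule approximation of the integral with the formula:

```python
import math

# Check ∫_0^2 1/sqrt(1 + x^2) dx against ln(x + sqrt(1 + x^2)) evaluated
# between the limits (the antiderivative is 0 at x = 0).
def f(x):
    return 1 / math.sqrt(1 + x**2)

n = 100000
h = 2.0 / n
approx = h * sum(f((k + 0.5) * h) for k in range(n))
exact = math.log(2 + math.sqrt(5))
assert abs(approx - exact) < 1e-8
```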
3.10 Hyperbolic substitution

The following functions can be used for integrals involving √(1 + x^2) (and they
can be used for other purposes as well):

sinh x := (e^x − e^{−x})/2,    cosh x := (e^x + e^{−x})/2.
Figure 3.4: The graph of sinh

Figure 3.5: The graph of cosh
These functions are called the hyperbolic sine and hyperbolic cosine respectively
and play a similar role for hyperbolas as the trigonometric sine and cosine play
for circles. Their graphs are given in Figures 3.4 and 3.5.
The following properties are easily verified:

d/dx sinh x = cosh x,    d/dx cosh x = sinh x,    cosh^2 x − sinh^2 x = 1;

note the difference in signs with the usual trigonometric case.
The function sinh is increasing and therefore invertible. We find an explicit
formula for its inverse. This means that we want to solve

y = (e^x − e^{−x})/2

for x. We first substitute u = e^x and solve for u. The equation becomes

y = (u − 1/u)/2.

Multiply this by 2u to obtain

2yu = u^2 − 1,

write this quadratic equation in standard form

u^2 − 2yu − 1 = 0,

and then use the quadratic formula to obtain

u = y ± √(y^2 + 1).

We then substitute back u = e^x and realize that we must take the plus sign to
be able to solve the equation for x (since y − √(y^2 + 1) is always negative and e^x
is always positive). So we obtain

e^x = y + √(y^2 + 1),

so that

x = ln(y + √(y^2 + 1)).

So the inverse function of sinh(x) is ln(x + √(x^2 + 1)).
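The Python standard library happens to expose this inverse as math.asinh, so the formula just derived can be checked directly (an added aside, not from the notes):

```python
import math

# The derivation says the inverse of sinh is ln(x + sqrt(x^2 + 1)); compare
# with the library implementation math.asinh at a few points.
for x in [-3.0, -0.5, 0.0, 1.0, 10.0]:
    assert abs(math.asinh(x) - math.log(x + math.sqrt(x**2 + 1))) < 1e-12
```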
Now we return to integrals. We will compute

∫ 1/√(1 + x^2) dx,

which we already computed in Section 3.9 with another substitution. Now
we use x = sinh u. Then dx = cosh u du and √(1 + x^2) = √(1 + sinh^2 u) =
√(cosh^2 u) = cosh u, noting that the hyperbolic cosine is always positive. So
we have

∫ 1/√(1 + x^2) dx = ∫ (1/cosh u) cosh u du = ∫ 1 du = u + C = ln(x + √(x^2 + 1)) + C,

where in the last step we used the formula for the inverse of the hyperbolic sine.
3.11 Area between curves

In addition to calculating the area between a curve and the lines x = a, x = b
and y = 0 it is also possible to use integrals to compute the area bounded from
above and below by a curve and by the lines x = a and x = b. As an example
we will compute the area between the curves y = 2x, y = x^2 and the lines
x = 0, x = 1. Note that the curve y = 2x is always above the curve y = x^2
for x ∈ [0, 1], so that the area between the curves is simply the area between
the curve y = 2x and the lines x = 0, x = 1, y = 0 minus the area between the
curve y = x^2 and the lines x = 0, x = 1, y = 0. So the desired area between the
curves is

∫_0^1 2x dx − ∫_0^1 x^2 dx = [x^2 − (1/3)x^3]_0^1 = 2/3.

If we consider the same functions on the interval [0, 3], then it is no longer true
that the curve y = 2x is always above the curve y = x^2: on the interval [0, 2]
this is true, but on the interval [2, 3] the situation is reversed. To obtain the
area between the curves we therefore have to compute the following integral:

∫_0^2 2x dx − ∫_0^2 x^2 dx + ∫_2^3 x^2 dx − ∫_2^3 2x dx
= [x^2]_0^2 − [x^3/3]_0^2 + [x^3/3]_2^3 − [x^2]_2^3
= 4 − 8/3 + 9 − 8/3 − 9 + 4 = 8/3.

In general, the area between the curves y = f(x), y = g(x) and the lines x = a
and x = b is

∫_a^b |f(x) − g(x)| dx.
To compute this integral it is often essential to work out where on the interval
[a, b] the function f (x) − g(x) is positive and where it is negative and split the
integral accordingly into the sum of several integrals (to get rid of the absolute
value signs).
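Working numerically, the absolute value needs no special treatment: a quadrature of |f(x) − g(x)| should reproduce the value 8/3 found for y = 2x and y = x^2 on [0, 3]. A quick Python sketch (added here, not from the notes):

```python
# Approximate ∫_0^3 |2x - x^2| dx with a midpoint rule; the exact value
# obtained above by splitting the integral at x = 2 is 8/3.
def f(x):
    return abs(2 * x - x**2)

n = 100000
h = 3.0 / n
total = h * sum(f((k + 0.5) * h) for k in range(n))
assert abs(total - 8 / 3) < 1e-6
```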
Problem 92. Calculate the area bounded by the curves y = 1 + √x, y = (1/4)x^2,
y = 3 − x and x = 0.

Solution. It helps to draw a picture.

Figure 3.6: The graphs of the functions
First of all we calculate where the two curves 1 + √x and 3 − x intersect:
1 + √x = 3 − x gives √x = 2 − x, which in turn gives x = (2 − x)^2, which leads
to x^2 − 5x + 4 = 0, which factors as (x − 1)(x − 4) = 0. The only solution to
the original equation is x = 1 (x = 4 was introduced by squaring). We then
calculate where the two curves x^2/4 and 3 − x intersect: x^2 = 4(3 − x) gives
x^2 + 4x − 12 = 0, which factors as (x − 2)(x + 6) = 0; the solution that we are
interested in is x = 2. Now we can write down the formula that gives the area:

∫_0^1 (1 + √x − (1/4)x^2) dx + ∫_1^2 (3 − x − (1/4)x^2) dx
= ∫_0^1 (1 + √x) dx + ∫_1^2 (3 − x) dx − ∫_0^2 (1/4)x^2 dx
= [x + (2/3)x^{3/2}]_0^1 + [3x − (1/2)x^2]_1^2 − [(1/12)x^3]_0^2
= 5/3 + 3/2 − 2/3 = 5/2.
3.12 Numerical integration

Up to now we have computed integrals symbolically; however, often that cannot
be done and one has to resort to numerical approximation. Section 3.2 points
the way to how to do this. What we did there was approximate the integral

∫_A^B f(x) dx

by the sum of areas of n rectangles and then let n go to infinity. We used there
that the function was decreasing so that we could easily find upper and lower
sums, and we used that the function was simple enough so that we could rewrite
the ‘sum of areas formula’ in such a way as to easily take the limit. We will not
be able to do these things in general, but approximating the integral by the sum
of areas of n rectangles for n large will still give a good approximation. There
are three obvious choices for the height of the rectangles: the left endpoint,
the midpoint and the right endpoint. These lead to the following rules for
approximating the integral: with x_k := A + k(B − A)/n and ∆ := (B − A)/n we have

• (left point rule) ∆ ∑_{k=0}^{n−1} f(x_k),
• (midpoint rule) ∆ ∑_{k=0}^{n−1} f((x_k + x_{k+1})/2),
• (right point rule) ∆ ∑_{k=1}^{n} f(x_k).

There are two other rules that are used for numerical integration. There is the
trapezium rule, which approximates the integral by a sum of areas of trapeziums,
and there is Simpson’s rule, which isn’t geometrically obvious (but we will see
another justification for it later):

• (trapezium rule) ∆ ∑_{k=0}^{n−1} (f(x_k) + f(x_{k+1}))/2,
• (Simpson’s rule) ∆ ∑_{k=0, k even}^{n−2} (f(x_k) + 4f(x_{k+1}) + f(x_{k+2}))/3,

where in Simpson’s rule n has to be even.
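The five rules translate directly into Python; the following sketch (an addition to the notes) assumes f is a function of one variable and n a positive integer, even for Simpson’s rule:

```python
def left_point(f, A, B, n):
    d = (B - A) / n
    return d * sum(f(A + k * d) for k in range(n))

def midpoint(f, A, B, n):
    d = (B - A) / n
    return d * sum(f(A + (k + 0.5) * d) for k in range(n))

def right_point(f, A, B, n):
    d = (B - A) / n
    return d * sum(f(A + k * d) for k in range(1, n + 1))

def trapezium(f, A, B, n):
    d = (B - A) / n
    return d * sum((f(A + k * d) + f(A + (k + 1) * d)) / 2 for k in range(n))

def simpson(f, A, B, n):  # n must be even
    d = (B - A) / n
    return d * sum((f(A + k * d) + 4 * f(A + (k + 1) * d) + f(A + (k + 2) * d)) / 3
                   for k in range(0, n, 2))
```

For f(x) = x^5 on [0, 1] with n = 2 these reproduce the values worked out below.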
We will first give a simple example and then analyze the error made using
these numerical integration rules. As an example we consider the integral

∫_0^1 x^5 dx,

i.e. A = 0, B = 1 and f(x) = x^5. Of course we know that the exact answer is
1/6. If we choose n = 2, then we have x_0 = 0, x_1 = 1/2, x_2 = 1, ∆ = 1/2 and the
rules give the following approximations to the integral:

• (left point rule) (1/2)(0 + 1/32) = 1/64 ≈ 0.0156,
• (midpoint rule) (1/2)(1/1024 + 243/1024) = 61/512 ≈ 0.1191,
• (right point rule) (1/2)(1/32 + 1) = 33/64 ≈ 0.5156,
• (trapezium rule) (1/2)((0 + 1/32)/2 + (1/32 + 1)/2) = 17/64 ≈ 0.2656,
• (Simpson’s rule) (1/2)(0 + 4/32 + 1)/3 = 3/16 ≈ 0.1875.
The accompanying table contains the results of the rules for various values of n.

                    n = 2     n = 4     n = 8     n = 16
left point rule     0.0156    0.0674    0.1107    0.1370
midpoint rule       0.1191    0.1539    0.1634    0.1659
right point rule    0.5156    0.3174    0.2357    0.1995
trapezium rule      0.2656    0.1924    0.1732    0.1683
Simpson’s rule      0.1875    0.1680    0.1667    0.1667

It is clear from this table that Simpson’s rule is best, second best are the
midpoint rule and the trapezium rule, and worst are the left point rule and the
right point rule. This is true for most functions: it can be shown that the
errors behave like 1/n^4 for Simpson’s rule, like 1/n^2 for the midpoint rule and the
trapezium rule, and like 1/n for the left point rule and the right point rule.
We give an indication based on Taylor series for why the errors behave like
this. We consider the error made on the interval [c − h, c + h] with 2h = ∆ = (B − A)/n
and note that on the whole interval [A, B] the error in the integral is roughly n
times this error (since there are n intervals of length 2h). First we consider the
integral

I(h) := ∫_{c−h}^{c+h} f(x) dx

as a function of h and we expand it into a Taylor series around h = 0. For this
we note that

I(h) = F(c + h) − F(c − h),

where F is an anti-derivative of f. The Taylor series of F around x = c is

F(x) = F(c) + f(c)(x − c) + (f'(c)/2)(x − c)^2 + (f^(2)(c)/3!)(x − c)^3 + (f^(3)(c)/4!)(x − c)^4 + (f^(4)(c)/5!)(x − c)^5 + ...

With x = c + h this gives

F(c + h) = F(c) + f(c)h + (f'(c)/2)h^2 + (f^(2)(c)/3!)h^3 + (f^(3)(c)/4!)h^4 + (f^(4)(c)/5!)h^5 + ...,

and with x = c − h this gives

F(c − h) = F(c) − f(c)h + (f'(c)/2)h^2 − (f^(2)(c)/3!)h^3 + (f^(3)(c)/4!)h^4 − (f^(4)(c)/5!)h^5 + ....

Subtracting gives

I(h) = 2f(c)h + 2(f^(2)(c)/3!)h^3 + 2(f^(4)(c)/5!)h^5 + ....

The left point rule gives as approximation L(h) = 2h f(c − h). The Taylor series
of f around x = c evaluated at c − h gives

L(h) = 2h (f(c) − f'(c)h + (f^(2)(c)/2)h^2 − (f^(3)(c)/3!)h^3 + (f^(4)(c)/4!)h^4 + ...).

We see that for the error we have

I(h) − L(h) = 2f'(c)h^2 + ...,

so this error is of the order h^2. Since 2h = (B − A)/n, the error is of the order 1/n^2.
Since we have n such intervals, the total error is of the order n · (1/n^2) = 1/n.
The argument for the other numerical integration rules is similar. Here we
only give the one for Simpson’s rule. For Simpson’s rule we have

S(h) = h (f(c − h) + 4f(c) + f(c + h))/3.

The Taylor series for f gives

f(c − h) = f(c) − f'(c)h + (f^(2)(c)/2)h^2 − (f^(3)(c)/3!)h^3 + (f^(4)(c)/4!)h^4 + ...

and

f(c + h) = f(c) + f'(c)h + (f^(2)(c)/2)h^2 + (f^(3)(c)/3!)h^3 + (f^(4)(c)/4!)h^4 + ....

Substituting gives

S(h) = 2h f(c) + (f^(2)(c)/3)h^3 + (2/3)(f^(4)(c)/4!)h^5 + ....

It follows that for the error we have

I(h) − S(h) = (2/5! − 2/(3 · 4!)) f^(4)(c)h^5 + ....

It follows that for Simpson’s rule the error over a small interval is of the order
h^5, which equals 1/n^5, so that since we have n of these intervals, the error over the
whole interval is of the order 1/n^4.
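The 1/n^4 claim can be observed directly on the worked example ∫_0^1 x^5 dx: doubling n should divide the error by roughly 2^4 = 16. A small self-contained Python check (added here, not from the notes):

```python
# Empirically check the 1/n^4 error order for Simpson's rule on ∫_0^1 x^5 dx
# (exact value 1/6): doubling n shrinks the error by about 2^4 = 16.
def simpson(f, A, B, n):  # n must be even
    d = (B - A) / n
    return d * sum((f(A + k * d) + 4 * f(A + (k + 1) * d) + f(A + (k + 2) * d)) / 3
                   for k in range(0, n, 2))

f = lambda x: x**5
err4 = abs(simpson(f, 0, 1, 4) - 1 / 6)
err8 = abs(simpson(f, 0, 1, 8) - 1 / 6)
ratio = err4 / err8
assert 15 < ratio < 17  # close to the predicted factor 16
```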
3.13 Improper integrals

Integrals are called improper if either the interval over which we integrate
is infinite or the function that we integrate becomes infinite at the boundary of
the finite integration interval.

We first consider the case of an infinite interval. We start with the example

∫_1^∞ e^{−x} dx.

Note that we did not define integrals of this type yet (intervals over which we
integrated had to be finite). It seems reasonable to define the above integral as

∫_1^∞ e^{−x} dx = lim_{b→∞} ∫_1^b e^{−x} dx.

The integral on the right-hand side is well-defined for all b and if the limit as
b → ∞ exists, then that is the reasonable value for the integral over the infinite
interval. We can actually easily compute this value in this case:

lim_{b→∞} ∫_1^b e^{−x} dx = lim_{b→∞} [−e^{−x}]_1^b = lim_{b→∞} (−e^{−b} + e^{−1}) = e^{−1}.

So we have

∫_1^∞ e^{−x} dx = e^{−1}.

Often we would just write

∫_1^∞ e^{−x} dx = [−e^{−x}]_1^∞ = 0 + e^{−1} = e^{−1},

thus only implicitly taking the limit.
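The limit definition can also be watched numerically (a sketch, not part of the notes): a quadrature of ∫_1^b e^{−x} dx approaches e^{−1} ≈ 0.3679 as b grows.

```python
import math

# Midpoint-rule approximation of ∫_1^b e^{-x} dx; for large b this should be
# close to the limit value e^{-1}.
def integral(b, n=100000):
    h = (b - 1) / n
    return h * sum(math.exp(-(1 + (k + 0.5) * h)) for k in range(n))

assert abs(integral(30) - math.exp(-1)) < 1e-6
```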
Of course the limit can be infinite, as the following example shows:

∫_1^∞ e^x dx = lim_{b→∞} ∫_1^b e^x dx = lim_{b→∞} [e^x]_1^b = lim_{b→∞} (e^b − e^1) = ∞.

If the interval that we integrate over is the whole axis, then some problems arise.
For example

lim_{c→∞} ∫_{−c}^{c} x dx = 0,

but

lim_{a→−∞} ∫_a^d x dx + lim_{b→∞} ∫_d^b x dx

does not exist for any d (the first limit is −∞ and the second is ∞ and there
is no way to add these). This second definition (letting the lower bound and
the upper bound converge independently to −∞ and ∞ respectively) is chosen
as the definition of the integral over the whole line, and the first definition (taking
the lower bound equal to minus the upper bound) leads to what is called the
principal value integral. Both have their merits.
The second type of improper integral is where the function becomes infinite
at a boundary point of the interval over which we integrate. This occurs e.g. in

∫_0^1 1/√x dx,

where the function that we integrate becomes infinite in zero. Note that we also
did not really define this type of integral yet (the upper-integral in this case will
always be infinite). Again we define this integral in terms of a limit of integrals
that are properly defined:

∫_0^1 1/√x dx = lim_{a→0^+} ∫_a^1 1/√x dx.

We can again rather easily calculate this limit:

lim_{a→0^+} ∫_a^1 1/√x dx = lim_{a→0^+} [2√x]_{x=a}^{1} = lim_{a→0^+} (2 − 2√a) = 2,

so

∫_0^1 1/√x dx = 2.

It is quite customary to write the calculation in the abbreviated form

∫_0^1 1/√x dx = [2√x]_{x=0}^{1} = 2 − 0 = 2.
The following is an example where the limit is infinite, in abbreviated notation:

∫_0^1 1/x^2 dx = [−1/x]_0^1 = −1 − (−∞) = ∞.

In the slightly longer notation where we explicitly use limits this becomes somewhat
clearer:

∫_0^1 1/x^2 dx = lim_{a→0^+} ∫_a^1 1/x^2 dx = lim_{a→0^+} [−1/x]_a^1 = lim_{a→0^+} (−1 + 1/a) = ∞.

If the function we integrate becomes infinite in an interior point of the interval
over which we integrate, then there is again the distinction between the ordinary
improper integral and the principal value integral. We will not go into this
further.
Chapter 4

Ordinary differential equations

In algebraic equations we are searching for a number that satisfies a certain
equation; in a differential equation we search for a function that satisfies a
certain equation involving that function and its derivatives. If we are searching
for a function of one variable then such a differential equation is called ‘ordinary’;
if we are searching for a function of several variables then we speak about partial
differential equations. We will only consider the former. Many physical laws are
given in terms of differential equations, for example Newton’s law for mechanics,
Maxwell’s equations for electromagnetism, the Einstein field equations of general
relativity and equations describing population growth and chemical reactions.
As mentioned, we will only consider ordinary differential equations. Moreover,
we will only consider first order equations (equations involving only the first
derivative); second order equations will be treated in MA10193. There are two
classes of first order ordinary differential equations that we will consider:

• separable: dy/dx = f(x)g(y),
• linear: q(x) dy/dx + p(x)y = r(x),

here f, g, q, p, r are given functions and y is the function to be found.
4.1 Separable equations

A first order ordinary differential equation is called separable if it can be written
in the form

dy/dx = f(x)g(y).

There is a simple trick for obtaining the solutions: treat dy/dx as a fraction, pull
everything involving y to one side and everything involving x to the other side,
and integrate. In formulas:

dy/dx = f(x)g(y)

gives

1/g(y) dy = f(x) dx,

which gives

∫ 1/g(y) dy = ∫ f(x) dx.

Once we compute the anti-derivatives, this is an algebraic equation for y which
can hopefully be solved.
Problem 93. Solve dy/dx = x.

Solution. Of course we know that the general solution is y(x) = (1/2)x^2 + C, but
we will apply the above procedure anyway to make sure that it indeed gives this
answer. From dy/dx = x we obtain

∫ dy = ∫ x dx

and computing anti-derivatives gives

y = (1/2)x^2 + C,

as it should.
Problem 94. Solve dy/dx = y.

Solution. From dy/dx = y we obtain

∫ 1/y dy = ∫ dx

and computing anti-derivatives gives

ln |y| = x + C.

Solving for |y| gives |y| = e^{x+C} = e^C e^x. It follows that y = e^C e^x or y = −e^C e^x.
We can put this in one formula: y = Ke^x where K is either e^C or −e^C (or even
K = 0, which gives y = 0, which is a solution of the original equation).
Problem 95. Solve dy/dx = 1/y.

Solution. From dy/dx = 1/y we obtain

∫ y dy = ∫ dx

and computing anti-derivatives gives

(1/2)y^2 = x + C.

Solving for y gives y = ±√(2x + 2C).
As can be seen from the previous problems, a differential equation usually
has infinitely many solutions. An initial value problem like

dy/dx = y,    y(0) = 1

has exactly one solution (under very general conditions on the coefficient functions
f and g). The general solution to the above differential equation (ignoring
the initial value) we calculated in Problem 94: y = Ke^x. The initial condition
y(0) = 1 gives K = 1, so y = e^x.
Problem 96. Solve dy/dx = 1/y, y(1) = 5.

Solution. From Problem 95 we know that the general solution of the differential
equation is y = ±√(2x + 2C). The initial condition y(1) = 5 gives ±√(2 + 2C) = 5.
First of all, we clearly must have ‘+’ and not ‘−’. So we want to solve
√(2 + 2C) = 5 for C. This gives C = 23/2. So y = √(2x + 23).
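A solved initial value problem can always be checked against both the equation and the initial condition; this Python sketch (an addition, not from the notes) does so for Problem 96, approximating the derivative numerically:

```python
import math

# Check the Problem 96 solution y(x) = sqrt(2x + 23): it satisfies the initial
# condition y(1) = 5, and a numerical derivative matches dy/dx = 1/y.
def y(x):
    return math.sqrt(2 * x + 23)

assert abs(y(1) - 5) < 1e-12
h = 1e-6
for x in [0.0, 1.0, 4.0]:
    assert abs((y(x + h) - y(x - h)) / (2 * h) - 1 / y(x)) < 1e-8
```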
4.2 Linear equations

We start with the special case

q(x) dy/dx + p(x)y = 0,

i.e. the case r = 0. This is called a homogeneous linear equation (when r ≠ 0 the
equation is called inhomogeneous). Homogeneous linear equations are separable
with f(x) = −p(x)/q(x) and g(y) = y, so they can be solved using the method of the
previous section. The inhomogeneous case is much more difficult. There are
three approaches to this:

• Integrating factors,
• Variation of parameters,
• Method of undetermined coefficients.
4.2.1 Integrating factors

The main idea behind integrating factors is that equations where p = q', i.e. the
equations

q(x) dy/dx + q'(x)y = r(x),

can be easily solved. Using the product rule the above equation is none other
than

(qy)' = r.

So

qy = ∫ r(x) dx,

and thus

y = (1/q) ∫ r(x) dx.

The other idea behind integrating factors is that if we multiply the original
equation by a function m/q, obtaining

m(x) dy/dx + m(x) (p(x)/q(x)) y = m(x) (r(x)/q(x)),

then this equation is equivalent to the original equation as long as m, q ≠ 0.
Combining these two ideas, we want to find a function m (called an integrating
factor) such that

dm/dx = m p(x)/q(x).

This is again a separable equation, so this is solvable. Once we have the integrating
factor m, we can multiply the equation by it and obtain the solution y.
Problem 97. Solve dy/dx + y/x = x^2 by integrating factors.

Solution. We first find the integrating factor. So we multiply by m:

m dy/dx + m y/x = m x^2,

and try to write the left-hand side as (my)'. That leads to the equation

dm/dx = m/x.

Separating variables gives

∫ dm/m = ∫ dx/x

and integrating gives ln |m| = ln |x| (we can omit the constant since we only
need one integrating factor). Solving gives m = x. With that m we obtain the
following equation for y:

x dy/dx + y = x^3,

or equivalently

(xy)' = x^3.

It follows that

xy = (1/4)x^4 + C,

(now don’t omit the constant!) so

y = (1/4)x^3 + C/x.

Problem 98. Solve dy/dx + 2y = x by integrating factors.
Solution. We first find the integrating factor. So we multiply by m:

m dy/dx + 2my = mx,

and try to write the left-hand side as (my)'. That leads to the equation

dm/dx = 2m.

Separating variables gives

∫ dm/m = ∫ 2 dx

and integrating gives ln |m| = 2x (we can omit the constant since we only need
one integrating factor). Solving gives m = e^{2x}. With that m we obtain the
following equation for y:

e^{2x} dy/dx + 2e^{2x} y = e^{2x} x,

or equivalently

(e^{2x} y)' = xe^{2x}.

It follows that (integrate the right-hand side by parts)

e^{2x} y = (1/2)xe^{2x} − (1/4)e^{2x} + C,

(now don’t omit the constant!) so

y = (1/2)x − 1/4 + Ce^{−2x}.
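Substituting the answer back into the equation is a cheap safeguard against algebra slips; this Python sketch (added here, not from the notes) checks the Problem 98 solution for one sample value of the constant:

```python
import math

# Check the Problem 98 solution y = x/2 - 1/4 + C e^{-2x} against the equation
# dy/dx + 2y = x, using a numerical derivative and a sample constant C.
C = 3.0

def y(x):
    return x / 2 - 0.25 + C * math.exp(-2 * x)

h = 1e-6
for x in [-1.0, 0.0, 2.0]:
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    assert abs(dydx + 2 * y(x) - x) < 1e-6
```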
4.2.2 Variation of parameters

A second method to solve inhomogeneous linear differential equations is variation
of parameters. The computations carried out are basically the same as for
integrating factors, but the reasoning behind it is different. It has the advantage
that it also works for higher order equations with little change.

The first step in variation of parameters is to find a solution z of the corresponding
homogeneous equation

q(x) dz/dx + p(x)z = 0

(this is a separable equation, so that is relatively easy). The main idea is to try
a solution of the form y(x) = c(x)z(x) for the original equation. Substituting
this form gives

qc'z + qz'c + pcz = r.

Using that qz' + pz = 0 (that’s how we chose z) we obtain

c' = r/(qz).

This can be integrated to find c and then we have the solution y = cz of the
original equation.
Problem 99. Solve dy/dx + 2y = x by variation of parameters.

Solution. We first find the solution of the homogeneous equation

dz/dx + 2z = 0.

This gives z = e^{−2x} (we can omit the constant since we only need one solution).
Substituting y = ce^{−2x} in the to be solved equation gives

c'e^{−2x} − 2ce^{−2x} + 2ce^{−2x} = x,

or equivalently

c'e^{−2x} = x,    c' = xe^{2x}.

Integrating by parts gives (now don’t omit the constant!)

c = (1/2)xe^{2x} − (1/4)e^{2x} + K,

so

y = (1/2)x − 1/4 + Ke^{−2x}.

Problem 100. Solve dy/dx + y/x = x^2 by variation of parameters.
Solution. We first find the solution of the homogeneous equation

dz/dx + z/x = 0.

Separating variables gives

∫ dz/z = − ∫ dx/x

and integrating gives ln |z| = − ln |x| = ln (1/|x|) (we can omit the constant since
we only need one solution). Solving gives z = 1/x. Substituting y = c/x in the to
be solved equation gives

c'/x − c/x^2 + (c/x)(1/x) = x^2,

or equivalently

c'/x = x^2,    c' = x^3.

It follows that

c = (1/4)x^4 + K

(now don’t omit the constant!) so

y = (1/4)x^3 + K/x.

4.2.3
Method of undetermined coefficients

This method doesn’t always work, but when it works it is often rather easy.
Assume that we have a solution y of q(x) dy/dx + p(x)y = r(x) and a solution z
of the corresponding homogeneous equation q(x) dz/dx + p(x)z = 0. Then y + z
is also a solution of the inhomogeneous equation. It follows that the general
solution of the inhomogeneous equation is of the form y = z + y_p, where z is
the general solution of the homogeneous equation and y_p is a solution of the
inhomogeneous equation. So we really only have to find one solution of the
inhomogeneous equation. In some cases the form of such a solution can be
easily guessed. This is e.g. the case if p and q are constant and r is of the
form f(x)e^{ax} with f a polynomial. Then the solution is of the form h(x)e^{ax}
with h a polynomial of the same degree as f if e^{ax} is not a solution of the
homogeneous equation, and of one degree higher if it is. Similarly, if r is of the form
f(x) cos (ax) + g(x) sin (ax) with f and g polynomials, then the solution is of
the form h(x) cos (ax) + k(x) sin (ax) with h and k polynomials of degree the
maximum of the degrees of f and g.
Problem 101. Solve dy/dx + 2y = x.
Solution. Since r is a first degree polynomial we try a first degree polynomial as solution (this is the case a = 0): y = Ax + B. Substituting in the equation gives
A + 2Ax + 2B = x.
It follows that 2A = 1, A + 2B = 0 so that A = 1/2 and B = −1/4. So the particular solution is y_p = (1/2)x − 1/4. Since the general solution to the homogeneous equation is z = Ke^{−2x}, the general solution of the inhomogeneous equation is y = Ke^{−2x} + (1/2)x − 1/4.
Problem 102. Solve dy/dx + 3y = x^2.
Solution. Since r is a second degree polynomial we try a second degree polynomial as solution (this is the case a = 0): y = Ax^2 + Bx + C. Substituting in the equation gives
2Ax + B + 3Ax^2 + 3Bx + 3C = x^2.
It follows that 3A = 1, 2A + 3B = 0, B + 3C = 0 so that A = 1/3, B = −2/9 and C = 2/27. So the particular solution is y_p = (1/3)x^2 − (2/9)x + 2/27. Since the general solution to the homogeneous equation is z = Ke^{−3x}, the general solution of the inhomogeneous equation is y = Ke^{−3x} + (1/3)x^2 − (2/9)x + 2/27.
Problem 103. Solve dy/dx + 2y = xe^{3x}.
Solution. Since r is a first degree polynomial times e^{3x}, we try a first degree polynomial times e^{3x} as solution: y = (Ax + B)e^{3x}. Substituting in the equation gives
(3Ax + 3B + A)e^{3x} + (2Ax + 2B)e^{3x} = xe^{3x}.
It follows that 5A = 1, A + 5B = 0 so that A = 1/5 and B = −1/25. So the particular solution is y_p = ((1/5)x − 1/25)e^{3x}. Since the general solution to the homogeneous equation is z = Ke^{−2x}, the general solution of the inhomogeneous equation is y = Ke^{−2x} + ((1/5)x − 1/25)e^{3x}.
Problem 104. Solve dy/dx + 2y = cos x.
Solution. Since r is a zeroth degree polynomial times the cosine we try a zeroth degree polynomial times the cosine plus a zeroth degree polynomial times the sine as solution: y = A cos x + B sin x. Substituting in the equation gives
−A sin x + B cos x + 2A cos x + 2B sin x = cos x.
It follows that 2A + B = 1, −A + 2B = 0 so that A = 2/5 and B = 1/5. So the particular solution is y_p = (2/5)cos x + (1/5)sin x. Since the general solution to the homogeneous equation is z = Ke^{−2x}, the general solution of the inhomogeneous equation is y = Ke^{−2x} + (2/5)cos x + (1/5)sin x.
Problem 105. Solve dy/dx + 2y = e^{−2x}.
Solution. Now r is a solution of the homogeneous equation, meaning that this is the exceptional case. So we try y_p = (Ax + B)e^{−2x} as a solution (a first degree polynomial times the exponential instead of a zeroth degree polynomial times the exponential). Substituting in the equation gives
(−2Ax − 2B + A)e^{−2x} + (2Ax + 2B)e^{−2x} = e^{−2x}.
It follows that A = 1 and B is arbitrary. So the general solution is y = (x + B)e^{−2x}.
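Again we can check this numerically (a Python sketch of our own; the helper names are not from the notes). The residual of dy/dx + 2y − e^{−2x} should vanish for the general solution y = (x + B)e^{−2x}:

```python
import math

def y(x, B=3.0):
    # general solution of Problem 105: y = (x + B) e^{-2x}
    return (x + B) * math.exp(-2 * x)

def residual(x, B=3.0, h=1e-6):
    # dy/dx + 2y - e^{-2x}, with dy/dx taken as a central difference
    dydx = (y(x + h, B) - y(x - h, B)) / (2 * h)
    return dydx + 2 * y(x, B) - math.exp(-2 * x)

print(max(abs(residual(x)) for x in (0.0, 0.5, 1.0, 2.0)))  # tiny
```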
4.3 Numerical methods
Up to now we have looked for exact solutions to differential equations. However, these can often not be found. In those cases, we can use a computer to numerically approximate the solution. We will consider the general first order initial value problem
y′(x) = f(x, y(x)),   y(0) = y_0,
where the function (of two variables) f and the number y_0 are given and an approximation of the function y is sought.
We can replace the derivative y′(x) by the difference quotient (y(x + h) − y(x))/h for some small h to obtain an approximate solution. We will call the approximation z. This gives
(z(x + h) − z(x))/h = f(x, z(x)),   z(0) = y_0.
This can be re-written as
z(x + h) = z(x) + hf(x, z(x)),   z(0) = y_0.
Since we know z(0), we can substitute x = 0 in the equation and obtain z(h):
z(h) = y_0 + hf(0, y_0).
Now that we know z(h) we can substitute x = h into the equation and obtain z(2h):
z(2h) = z(h) + hf(h, z(h)).
In general we have
z(nh + h) = z(nh) + hf(nh, z(nh)).
If h is small enough, then z(nh) ≈ y(nh). This method for approximating the solution of a differential equation is called forward Euler.
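The iteration above is straightforward to implement. The following Python sketch is our own illustration (the function name and structure are not from the notes); it applies the rule z(x + h) = z(x) + hf(x, z(x)) repeatedly:

```python
def forward_euler(f, y0, h, x_end):
    """Approximate the solution of y' = f(x, y), y(0) = y0 at x = x_end
    using forward (explicit) Euler with step-size h."""
    x, z = 0.0, y0
    while x < x_end - 1e-12:          # small guard against rounding in x
        z = z + h * f(x, z)           # z(x + h) = z(x) + h f(x, z(x))
        x = x + h
    return z

# example: y' = -y, y(0) = 1, whose exact value is y(2) = e^{-2} = 0.1353...
print(forward_euler(lambda x, y: -y, 1.0, 0.25, 2.0))  # 0.1001...
```

With h = 1/4 this gives (1 − 1/4)^8 ≈ 0.1001, as one can also compute by hand.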
Problem 106. Use forward Euler to approximate y(2), where y is the unique solution of the initial value problem y′ = −y, y(0) = 1. Investigate the dependence on the step-size h used and compare to the exact value.
Solution. This problem fits into the general framework with f(x, y) = −y and y_0 = 1. The iteration formula then becomes
z(nh + h) = z(nh) − hz(nh) = (1 − h)z(nh),   z(0) = 1.
This can in fact be solved exactly:
z(nh) = (1 − h)^n.
It follows that
z(2) = (1 − h)^{2/h}.
The exact solution of the initial value problem obviously is y(x) = e^{−x} so that for the exact value we have y(2) = e^{−2} = 0.1353... The following table shows the values obtained by forward Euler with various step-sizes h.

h      z(2)
1      0
1/2    0.0625
1/4    0.1001
1/8    0.1181
1/16   0.1268
1/32   0.1311

We note that for more complicated functions f, the equation for z cannot be solved exactly and to obtain z(2) all the previous values z(nh) with nh < 2 would have to be obtained first.
Problem 107. Use forward Euler to approximate y(2), where y is the unique solution of the initial value problem y′ = 1/y, y(0) = 1. Investigate the dependence on the step-size h used and compare to the exact value.
Solution. This problem fits into the general framework with f(x, y) = 1/y and y_0 = 1. The iteration formula then becomes
z(nh + h) = z(nh) + h/z(nh),   z(0) = 1.
The exact solution of the initial value problem can be found by separating variables and is y(x) = √(2x + 1) so that for the exact value we have y(2) = √5 = 2.23607... The following table shows the values obtained by forward Euler with various step-sizes h. The calculation now has to be done by iteration. For example when h = 1 we calculate z(1) = 1 + 1/1 = 2 and then z(2) = 2 + 1/2 = 2.5.

h      z(2)
1      2.5
1/2    2.3435
1/4    2.2894
There are other methods of approximating solutions of differential equations besides forward Euler. To discuss the motivation for those, we'll first look at a different way of interpreting forward Euler. By the fundamental theorem of calculus we have
y(h) = y(0) + ∫_0^h y′(x) dx,
which by using the differential equation and the initial condition gives
y(h) = y_0 + ∫_0^h f(x, y(x)) dx.
We approximate this integral using one rectangle whose height is the value of the function at the left end-point. We then obtain as approximation of y(h):
y_0 + hf(0, y_0).
We see that this is exactly the same approximation that we got using forward Euler. Viewing it in terms of integrals however motivates other methods: we can use other numerical methods for approximating the integral. If we use as height of the rectangle the value of the function at the right end-point, then we obtain as approximation of y(h):
y_0 + hf(h, y(h)).
Iterating this leads to the following rule
z(nh + h) = z(nh) + hf(nh + h, z(nh + h)).
This method is called backward Euler. Note that we now obtain an equation for z(nh + h) and not an explicit formula. This equation can be solved using for example Newton's method (though in this particular case often some other method is used). Methods that lead to equations for z(nh + h) are called implicit and methods that lead to a formula for z(nh + h) are called explicit. Forward Euler is therefore also often called explicit Euler and backward Euler is often called implicit Euler.
We could also have approximated the integral by the trapezium method. Then we obtain as approximation of y(h):
y_0 + (h/2)(f(0, y_0) + f(h, y(h))).
Iterating this leads to the following rule
z(nh + h) = z(nh) + (h/2)(f(nh, z(nh)) + f(nh + h, z(nh + h))).
This method is also called the trapezium method (but no confusion should arise). This is an implicit method.
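The trapezium rule can be implemented along the same lines as backward Euler (again our own sketch, with the implicit equation solved by fixed-point iteration rather than Newton's method):

```python
def trapezium(f, y0, h, x_end, n_iter=50):
    """Implicit trapezium rule for y' = f(x, y), y(0) = y0, evaluated at
    x = x_end; the equation for z_new is solved by fixed-point iteration."""
    x, z = 0.0, y0
    while x < x_end - 1e-12:
        z_new = z
        for _ in range(n_iter):
            # z_new = z + (h/2)(f(x, z) + f(x + h, z_new))
            z_new = z + 0.5 * h * (f(x, z) + f(x + h, z_new))
        x, z = x + h, z_new
    return z

# example: y' = -y, y(0) = 1; exact value y(2) = e^{-2} = 0.1353...
print(trapezium(lambda x, y: -y, 1.0, 0.25, 2.0))  # closer to 0.1353 than Euler
```

For y′ = −y each step reduces to z·(1 − h/2)/(1 + h/2), and the result is noticeably more accurate than either Euler variant at the same step-size.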
We could also have approximated the integral by using Simpson’s rule. With
a slight twist to make sure that the method becomes explicit, this gives rise to
what is known as the Runge–Kutta method. This is the most common method
used to approximate the solutions of differential equations.
Problem 108. Use backward Euler to approximate y(2), where y is the unique solution of the initial value problem y′ = −y, y(0) = 1. Investigate the dependence on the step-size h used and compare to the exact value.
Solution. This problem fits into the general framework with f(x, y) = −y and y_0 = 1. The iteration formula then becomes
z(nh + h) = z(nh) − hz(nh + h),   z(0) = 1.
This can in fact be solved exactly:
z(nh) = 1/(1 + h)^n.
It follows that
z(2) = 1/(1 + h)^{2/h}.
The exact solution of the initial value problem obviously is y(x) = e^{−x} so that for the exact value we have y(2) = e^{−2} = 0.1353... The following table shows the values obtained by backward Euler with various step-sizes h.

h      z(2)
1      0.25
1/2    0.1975
1/4    0.1677
1/8    0.1519
1/16   0.1437
1/32   0.1395

We note that for more complicated functions f, the equation for z cannot be solved exactly and to obtain z(2) all the previous values z(nh) with nh < 2 would have to be obtained first.
Problem 109. Use backward Euler to approximate y(2), where y is the unique solution of the initial value problem y′ = 1/y, y(0) = 1. Investigate the dependence on the step-size h used and compare to the exact value.
Solution. This problem fits into the general framework with f(x, y) = 1/y and y_0 = 1. The iteration formula then becomes
z(nh + h) = z(nh) + h/z(nh + h),   z(0) = 1.
Note that this is a quadratic equation for z(nh + h) and can be solved as
z(nh + h) = (z(nh) ± √(z(nh)^2 + 4h))/2.
We want to take the plus sign since we want a positive solution (note that this is not completely obvious...), so the iteration is
z(nh + h) = (z(nh) + √(z(nh)^2 + 4h))/2.
Note that for more complicated functions f, we might not be able to obtain such an explicit formula and may have to resort to numerically solving an equation at each step.
The exact solution of the initial value problem can be found by separating variables and is y(x) = √(2x + 1) so that for the exact value we have y(2) = √5 = 2.23607... The following table shows the values obtained by backward Euler with various step-sizes h. The calculation now has to be done by iteration. For example when h = 1 we calculate z(1) = (1 + √(1 + 4))/2 = (1 + √5)/2 and then z(2) ≈ 2.0953.

h      z(2)
1      2.0953
1/2    2.1575
1/4    2.1942
1/8    2.2144
1/16   2.2250
1/32   2.2305
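The closed-form step above is easy to iterate on a computer; the following Python sketch (our own, with a made-up function name) reproduces the first entries of the table:

```python
import math

def backward_euler_sqrt(h, x_end=2.0):
    """Backward Euler for y' = 1/y, y(0) = 1, where the implicit step
    z_new = z + h / z_new is a quadratic solved in closed form
    (taking the positive root, as in the notes)."""
    x, z = 0.0, 1.0
    while x < x_end - 1e-12:
        z = (z + math.sqrt(z * z + 4 * h)) / 2
        x += h
    return z

print(round(backward_euler_sqrt(1.0), 4))   # 2.0953, matching the table
print(round(backward_euler_sqrt(0.5), 4))   # 2.1575
```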
Chapter 5
Functions of several variables
Up to now we have considered functions of one variable. However, in applications functions of several variables often occur. Luckily most problems for
functions of several variables can be reduced to problems about functions of one
variable. In this module we will consider one of those problems: finding extrema
of functions of several variables.
5.1 Partial derivatives and extrema
Just as in the one variable case, we can define local and global extrema. For
example the function f has a global minimum at the point (c, d) if for all (x, y)
we have f (c, d) ≤ f (x, y).
Now comes an important observation. If the function f has a global minimum at (c, d), then the function f (x, d) (x is the only variable since the second
argument is the constant number d) has a global minimum at c. Similarly, the
function f (c, y) has a global minimum at d. For a global maximum and local
extrema the same reasoning applies. But we know a condition that has to be
satisfied for the functions f (x, d) and f (c, y) of one variable to have local extrema: their first derivative has to be zero. So if a function of two variables has
a local extremum at (c, d), then
d/dx f(x, d)|_{x=c} = 0,   d/dy f(c, y)|_{y=d} = 0.
We generalize the above slightly in the following way. We define the partial derivative of f with respect to x as the derivative of f considered as a function of x when y is assumed constant. For example the partial derivative with respect to x of f(x, y) = x^2 + yx^3 + y^2 is 2x + 3yx^2. Note that this partial derivative is again a function of two variables (x and y). Similarly, the partial derivative of f with respect to y is the derivative of f considered as a function of y when x is assumed constant. For the example f(x, y) = x^2 + yx^3 + y^2, the partial derivative with respect to y is x^3 + 2y. The notation for the partial derivatives with respect to x and y is respectively
∂f/∂x,   ∂f/∂y.
Note the symbol ∂ is not quite a d.
The above reasoning about derivatives and extrema implies that at an extremum both partial derivatives have to be zero (unless of course the extremum is at a boundary point or a point where the partial derivatives are not defined). So the candidates for an extremum for the function f(x, y) = x^2 + yx^3 + y^2 are the points where
2x + 3yx^2 = 0,   x^3 + 2y = 0.
From the second equation we obtain y = −x^3/2 and substitute that in the first equation to obtain 2x − (3/2)x^5 = 0. This factors as (x/2)(4 − 3x^4) = 0, so that either x = 0 or x^4 = 4/3. This last equation gives x = ±(4/3)^{1/4}. Substituting this in the equation for y that we obtained, namely y = −x^3/2, we obtain the stationary points
(0, 0),   ((4/3)^{1/4}, −(1/2)(4/3)^{3/4}),   (−(4/3)^{1/4}, (1/2)(4/3)^{3/4}).
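We can check in Python that both partial derivatives really vanish at these three points (a quick sketch of our own, not part of the notes):

```python
# stationary points of f(x, y) = x^2 + y x^3 + y^2 found above
def fx(x, y):
    return 2 * x + 3 * y * x ** 2   # partial derivative with respect to x

def fy(x, y):
    return x ** 3 + 2 * y           # partial derivative with respect to y

a = (4 / 3) ** 0.25                 # the value (4/3)^{1/4}
points = [(0.0, 0.0), (a, -a ** 3 / 2), (-a, a ** 3 / 2)]
print(all(abs(fx(x, y)) < 1e-12 and abs(fy(x, y)) < 1e-12 for x, y in points))
```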
For a function f (x, y, z) of three variables the situation is very similar: the
partial derivative of f with respect to x is obtained by differentiating with
respect to x assuming that the other variables (y and z) are constant. Similarly
for the partial derivatives with respect to y and to z.
Problem 110. Determine the candidates for extrema given by the first derivative test for the function
f(x, y, z) = x^2 z + z^2 y − z − y.
Solution. We first compute the partial derivatives
∂f/∂x = 2xz,   ∂f/∂y = z^2 − 1,   ∂f/∂z = x^2 + 2zy − 1,
and then try to find all the points (x, y, z) where all three of these expressions are zero. The equation z^2 − 1 = 0 shows that either z = −1 or z = 1. It then follows from the equation 2xz = 0 that we must have x = 0. If z = −1 then we obtain from the equation x^2 + 2zy − 1 = 0 that y = −1/2 and if z = 1 then we obtain from that same equation that y = 1/2. So the first derivative test gives as candidates for extrema (0, −1/2, −1) and (0, 1/2, 1).
Since the partial derivatives are again functions of two variables, these can be partially differentiated again. In that way we obtain four second order derivatives:
∂^2f/∂x^2,   ∂^2f/∂y∂x,   ∂^2f/∂x∂y,   ∂^2f/∂y^2.
Luckily the order in which we differentiate does not matter: the partial derivative with respect to y of the partial derivative with respect to x equals the partial derivative with respect to x of the partial derivative with respect to y; in formulas
∂^2f/∂y∂x = ∂^2f/∂x∂y.
With the use of these second order partial derivatives we can write down the second order Taylor approximation:
f(x, y) = f(a, b) + (x − a) ∂f/∂x(a, b) + (y − b) ∂f/∂y(a, b)
   + (1/2)(x − a)^2 ∂^2f/∂x^2(a, b) + (x − a)(y − b) ∂^2f/∂y∂x(a, b) + (1/2)(y − b)^2 ∂^2f/∂y^2(a, b) + ... .
As in the one variable case, the quadratic part in the Taylor formula determines whether the candidate extrema found using the first derivative test are maxima or minima (unless the quadratic term is zero, in which case we know nothing). Due to the presence of the cross-term (x − a)(y − b) this is slightly complicated; the result is as follows:
• The point (a, b) is a minimum if the following three conditions all hold
  – ∂f/∂x(a, b) = ∂f/∂y(a, b) = 0,
  – ∂^2f/∂x^2(a, b) > 0,
  – ∂^2f/∂x^2(a, b) ∂^2f/∂y^2(a, b) − (∂^2f/∂y∂x(a, b))^2 > 0.
• The point (a, b) is a maximum if the following three conditions all hold
  – ∂f/∂x(a, b) = ∂f/∂y(a, b) = 0,
  – ∂^2f/∂x^2(a, b) < 0,
  – ∂^2f/∂x^2(a, b) ∂^2f/∂y^2(a, b) − (∂^2f/∂y∂x(a, b))^2 > 0.
• The point (a, b) is a saddle point (a minimum in one direction and a maximum in another direction) if the following two conditions both hold
  – ∂f/∂x(a, b) = ∂f/∂y(a, b) = 0,
  – ∂^2f/∂x^2(a, b) ∂^2f/∂y^2(a, b) − (∂^2f/∂y∂x(a, b))^2 < 0.
The easiest examples are as follows:
• f(x, y) = x^2 + y^2 has a minimum at (0, 0),
• f(x, y) = −x^2 − y^2 has a maximum at (0, 0),
• f(x, y) = x^2 − y^2 has a saddle point at (0, 0) (a minimum in the x direction and a maximum in the y direction).
Problem 111. Find the maxima and minima of f(x, y) = x^2 − 3xy + 5x − 2y + 6y^2 + 8.
Solution. We first compute the partial derivatives
∂f/∂x = 2x − 3y + 5,   ∂f/∂y = −3x − 2 + 12y.
From this we get the system of equations
2x − 3y + 5 = 0,   −3x − 2 + 12y = 0,
which has the unique solution x = −18/5, y = −11/15. We then compute the second partial derivatives:
∂^2f/∂x^2 = 2,   ∂^2f/∂y∂x = −3,   ∂^2f/∂y^2 = 12.
We have ∂^2f/∂x^2 ∂^2f/∂y^2 − (∂^2f/∂y∂x)^2 = 24 − 9 = 15 > 0 and ∂^2f/∂x^2 = 2 > 0, so that the point (−18/5, −11/15) is a minimum.
There is a similar second derivative test for functions of more than two
variables. Without the language of matrices (to be treated in MA10193: mathematics 2) the formulas become too messy to easily write down though, so we
will not bother at this point in time.
5.2 Lagrange multipliers and boundary extrema
Just as in the one variable case, a maximum or minimum may be located on the boundary of the domain. Consider the function f(x, y) = x^2 y. We will find its maximum and minimum value on the domain x^2 + y^2 ≤ 3. As in the one variable case, there is a general result (Weierstrass theorem) that states that a continuous function on a domain including its boundary must have a global maximum and a global minimum. If an extremum is reached at a point with x^2 + y^2 < 3 (i.e. the point is not on the boundary) then as mentioned in the previous section, the partial derivatives must be zero. It is easy to see that the partial derivatives are both zero precisely when x = 0. So there are infinitely many stationary points: (0, y) for −√3 ≤ y ≤ √3. Whether these points are maxima, minima or saddle points is not determined by the second derivative test since ∂^2f/∂x^2 ∂^2f/∂y^2 − (∂^2f/∂y∂x)^2 = 0 for all these points. If y > 0, then f(x, y) ≥ 0 so that (0, y) is a local minimum if y > 0 (all points that are close have a function value greater than or equal to the function value at (0, y)). Similarly if y < 0 then f(x, y) ≤ 0 so that (0, y) for y < 0 is a local maximum. Arbitrarily close to (0, 0) there are both points with negative and with positive function values so that (0, 0) is neither a maximum nor a minimum.
As in the one variable case, when an extremum is reached at the boundary the partial derivatives need not be zero. So we have to investigate the case of points for which x^2 + y^2 = 3 separately. We will do this in three different ways.
• Solve and substitute,
• Parametrization,
• Lagrange multipliers.
We first solve and substitute. Since x^2 + y^2 = 3 we can solve y in terms of x as y = ±√(3 − x^2). We can then substitute this in the formula for f and obtain a formula for f containing x only. Since there are two possibilities for y (the plus sign and the minus sign), we have to consider two cases. In this case it is easier to solve x as a function of y and substitute: x^2 = 3 − y^2 so that on the boundary f = x^2 y = 3y − y^3. We find the extrema of this function of one variable. We have df/dy = 3 − 3y^2, so that y = ±1 are the stationary points. We also have to consider the boundary points y = ±√3 (note that x is only solvable in terms of y for −√3 ≤ y ≤ √3). We can make the sign chart

        −√3        −1         1        √3
df/dy:        −          +         −

It follows that at (0, −√3) we have a maximum, at (√2, −1) and (−√2, −1) we have minima, at (√2, 1) and (−√2, 1) we have maxima and at (0, √3) we have a minimum. Of course, these points are only local maxima or minima compared to points that are close and are on the boundary. When comparing also with points in the interior, it may turn out that they are actually saddle points. From substituting these points in the function we see that the global maximum of 2 is reached at (√2, 1) and (−√2, 1) and the global minimum of −2 is reached at (√2, −1) and (−√2, −1).
We now use a parametrization. The circle x^2 + y^2 = 3 can be parametrized as x(t) = √3 cos t, y(t) = √3 sin t with t ∈ [0, 2π). Substituting this in the formula for f gives f(t) = 3√3 cos^2 t sin t. The derivative is by the product rule and chain rule f′(t) = 3√3 (cos^3 t − 2 cos t sin^2 t). Using Pythagoras this equals 3√3 cos t (3 cos^2 t − 2). This equals zero if and only if cos t = 0 or cos t = ±√(2/3). This corresponds to the points (0, √3), (0, −√3), (√2, 1), (√2, −1), (−√2, 1), (−√2, −1). We don't have to consider boundary values of t because of the periodic nature of the parametrization. Normally we would also have to look at points on the boundary of the parametrization interval. From the sign chart of f′(t) we obtain the same information as we obtained in the solve and substitute case.
Lastly we use Lagrange multipliers. The general result is that if f has an extremum (c, d) on the curve given implicitly by g(x, y) = 0, then there exists a λ such that
∂f/∂x = λ ∂g/∂x,   ∂f/∂y = λ ∂g/∂y.
In our case g(x, y) = x^2 + y^2 − 3. We obtain the equations
2xy = 2λx,   x^2 = 2λy.
We also of course have the equation x^2 + y^2 = 3. This gives us three equations in three variables (x, y and λ). From the first equation we obtain x = 0 or y = λ. If x = 0, then the third equation gives y = ±√3. If y = λ, then the second equation becomes x^2 = 2y^2. Substituting this into the third equation gives 3y^2 = 3, which in turn gives y = ±1. For each of these choices the third equation then gives x = ±√2. So we again obtain the same 6 candidate extrema as with the other two methods. There is a second derivative test to determine whether these points are local maxima or minima (it is based on what is called the bordered Hessian), but again without the language of matrices these conditions become too messy to write down.
5.3 The chain rule
Assume that x = x(t) and y = y(t); then f(x, y) is actually only a function of one variable, namely t. The derivative of f with respect to this variable can be calculated by the following chain rule
df/dt = ∂f/∂x dx/dt + ∂f/∂y dy/dt.
That this is true follows by letting h → 0 in
(f(x(t + h), y(t + h)) − f(x(t), y(t)))/h
   = [(f(x(t + h), y(t + h)) − f(x(t), y(t + h)))/(x(t + h) − x(t))] · [(x(t + h) − x(t))/h]
   + [(f(x(t), y(t + h)) − f(x(t), y(t)))/(y(t + h) − y(t))] · [(y(t + h) − y(t))/h].
We now return to implicit differentiation. There we were given an algebraic equation relating x and y (e.g. for the unit circle x^2 + y^2 = 1) and we wanted to find the slope dy/dx. With the use of the chain rule we can make this more formal. The curve is given by the zero set of a function of two variables: the set of all points (x, y) such that g(x, y) = 0 (e.g. in the case of the unit circle g(x, y) = x^2 + y^2 − 1). We take x(t) = t and differentiate g with respect to t using the chain rule:
dg/dt = ∂g/∂x + ∂g/∂y dy/dt.
Since g(x(t), y(t)) = 0 for all t we must have dg/dt = 0 and realizing that dy/dt = dy/dx we obtain the equation
∂g/∂x + ∂g/∂y dy/dx = 0.     (5.1)
When we did implicit differentiation we already used this formula (but without writing it down in the general case because we didn't know partial derivatives at the time).
We return to Lagrange multipliers. We assume that the curve g(x, y) = 0 can be parametrized by x = x(t), y = y(t); but we will not need an explicit parametrization. Since we have the constraint g(x, y) = 0 we obtain similarly as above using the chain rule
dg/dt = ∂g/∂x dx/dt + ∂g/∂y dy/dt = 0,     (5.2)
which holds for all t. For the function f that we want to find the extrema of we have
df/dt = ∂f/∂x dx/dt + ∂f/∂y dy/dt.
In points where f has an extremum we must have df/dt = 0 so that in those points
∂f/∂x dx/dt + ∂f/∂y dy/dt = 0.     (5.3)
Geometrically, the equation (5.2) means that the vector (∂g/∂x, ∂g/∂y) is always orthogonal to the vector (dx/dt, dy/dt) (since the scalar product of the two vectors is zero). The equation (5.3) means geometrically that at extrema the vector (∂f/∂x, ∂f/∂y) is orthogonal to the vector (dx/dt, dy/dt). Since both (∂f/∂x, ∂f/∂y) and (∂g/∂x, ∂g/∂y) at an extremum are orthogonal to the same vector, they must be parallel. It follows that there must exist a λ such that
(∂f/∂x, ∂f/∂y) = λ (∂g/∂x, ∂g/∂y).
These are exactly the Lagrange multiplier equations given earlier.
5.4 Exact differential equations
We will use functions of several variables and the chain rule to solve certain differential equations called exact differential equations. So assume that we are given a differential equation
M(x, y) + N(x, y) dy/dx = 0.     (5.4)
What we will look for is an algebraic expression h(x, y) = 0, i.e. an implicit expression of the solution y(x). Using implicit differentiation on h(x, y) = 0, remembering that y is a function of x, we obtain (see (5.1)):
∂h/∂x + ∂h/∂y dy/dx = 0.
So if we want h(x, y) = 0 to be an implicit solution of the given differential equation (5.4), then we should have
∂h/∂x = M(x, y),   ∂h/∂y = N(x, y).     (5.5)
If we suppose that there exists a function h that satisfies both of these partial differential equations, then we must have
∂M/∂y = ∂N/∂x,     (5.6)
since both are equal to the mixed second order partial derivative of h (differentiate once with respect to x and once with respect to y; remember that the order in which we differentiate is immaterial). The differential equation (5.4) is called exact if (5.6) holds; in that case the equations (5.5) can be solved and we obtain a solution of the differential equation (5.4) in implicit form.
Problem 112. Find all solutions of
2x + y^2 + 2xy dy/dx = 0.
Solution. We have M(x, y) = 2x + y^2 and N(x, y) = 2xy, which implies that
∂M/∂y = 2y = ∂N/∂x,
so that the given equation is exact. The equation
∂h/∂x = 2x + y^2
gives h(x, y) = x^2 + y^2 x + g(y), where g is an arbitrary function of y. We calculate ∂h/∂y = 2yx + g′(y) and noting that we should have ∂h/∂y = N we see that g′(y) = 0. It follows that g must be constant. So h(x, y) = x^2 + y^2 x + C. So the solutions are given in implicit form by the equation
x^2 + y^2 x + C = 0.
This can easily be made explicit:
y = ±√(−C/x − x).
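For a concrete choice of the constant (we take C = −4, our own choice for illustration), the explicit branch y = √(4/x − x) can be checked numerically against the original equation 2x + y^2 + 2xy dy/dx = 0:

```python
import math

def y(x, C=-4.0):
    # explicit branch of the implicit solution x^2 + x y^2 + C = 0,
    # valid where -C/x - x > 0 (here 0 < x < 2 for C = -4)
    return math.sqrt(-C / x - x)

def residual(x, h=1e-6):
    # plug y(x) into 2x + y^2 + 2xy dy/dx; it should vanish
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    return 2 * x + y(x) ** 2 + 2 * x * y(x) * dydx

print(abs(residual(1.0)))  # tiny
```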
Problem 113. Find all solutions of
y cos x + 2xe^y + (sin x + x^2 e^y − 1) dy/dx = 0.
Solution. We have M(x, y) = y cos x + 2xe^y and N(x, y) = sin x + x^2 e^y − 1, which implies that
∂M/∂y = cos x + 2xe^y = ∂N/∂x,
so that the given equation is exact. The equation
∂h/∂x = y cos x + 2xe^y
gives h(x, y) = y sin x + x^2 e^y + g(y), where g is an arbitrary function of y. We calculate ∂h/∂y = sin x + x^2 e^y + g′(y) and noting that we should have ∂h/∂y = N we see that g′(y) = −1. It follows that g = −y + C. So h(x, y) = y sin x + x^2 e^y − y + C. So the solutions are given in implicit form by the equation
y sin x + x^2 e^y − y + C = 0.
This cannot easily be solved for y.