Calculus I document

1
Mathematics of Change
Modeling the change of physical systems is central to the applications of mathematics: the speedometer in your car, which measures the change in position of your car over time; the stock market, which models the change in the value of stocks over time; and the motion of a satellite, to mention a few.
Modeling of change was first addressed by the Italian mathematician Galileo (1564 - 1642)
in the early part of the 17th century in the context of describing the motion of a falling
ball. Later in the century, Isaac Newton (1643 - 1727) developed a comprehensive theory in his famous work Philosophiae Naturalis Principia Mathematica (1687), in which he applied his ideas to many areas, including a study of celestial mechanics: the motions of the earth, the moon, and the planets.
The foundation of all mathematics lies with the work of the early Greek mathematicians,
in particular Euclid (325 - 265 B.C.) and Archimedes (287 - 212 B.C.). But surprisingly
enough, a mathematical theory of motion waited for nearly 2,000 years. One reason perhaps is that the ideas of the famous philosopher Aristotle (384 - 322 B.C.) carried such
weight that progress beyond his analysis of motion, which contained major errors, was
difficult.
1.1
Driving to Montreal
When I drive from Toronto to Montreal, a distance of roughly 500 km, and the trip takes 5 hours, I calculate my average speed as 100 km/hr; that is, average speed = distance/time.
Midway along I notice a sign saying that the distance to the nearest rest station is 40 km;
I’m interested because I need gas. Twenty minutes later I roll up to the gas station, and
I am able to calculate the average speed for this portion of the trip to be 120 km/hr. So
we all know how to calculate average speed over some specific time interval, but as I sit in
the car and look at the speedometer, I have a notion of the instantaneous speed of the car.
But what is this? How is instantaneous speed measured from one moment to the next?
The speedometer as we know measures only an approximation; it measures the distance
traveled over a small interval of time - say 1 second. It does this by measuring the number
of revolutions of the axle in one second. This translates to a certain number of revolutions
of the wheels in one second, which in turn translates to a certain distance traveled in one
second. So our impression of instantaneous speed is gained by simply taking the time
interval over which we measure distance to be small enough. But what we find then is still
an average speed.
I have just measured the diameter of the wheels on my car to be 60 cm. Thus with one revolution of the wheels, my car travels $0.6\pi$ meters. So if the wheels make 50 revolutions in a second, the car travels $30\pi$ meters in that second, which is a speed of
$$\frac{60^2}{10^3}(30\pi) = 108\pi \approx 339 \text{ km/hr}.$$
The question now is, what is instantaneous speed? We certainly have the concept within us. One way to define it would be to take the readings of increasingly accurate speedometers, ones that measure revolutions over every tenth or every thousandth of a second. This involves looking at some sort of limit: the limit of the readings of ever more accurate speedometers.
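The arithmetic in this section can be checked with a short Python sketch. The helper names `average_speed` and `wheel_speed_kmh` are ours, and the sketch assumes, as the text does, that a speedometer multiplies the wheel's circumference by its revolutions per second.

```python
import math


def average_speed(distance: float, time: float) -> float:
    """Average speed = distance / time (units follow the inputs)."""
    return distance / time


def wheel_speed_kmh(diameter_m: float, revs_per_second: float) -> float:
    """Speed in km/hr implied by a wheel of the given diameter turning at the given rate."""
    metres_per_rev = math.pi * diameter_m              # one revolution covers the circumference
    metres_per_second = metres_per_rev * revs_per_second
    return metres_per_second * 3600 / 1000             # 3600 s per hour, 1000 m per km


print(average_speed(500, 5))         # Toronto to Montreal: 500 km in 5 hours
print(average_speed(40, 20 / 60))    # 40 km in 20 minutes, with time in hours
print(wheel_speed_kmh(0.6, 50))      # 60 cm wheels at 50 revolutions per second: 108*pi km/hr
```

With 60 cm wheels turning 50 times a second, the last line prints $108\pi$, about 339 km/hr.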
Figure 1: average speed of falling object
1.2
Falling objects
It has been shown experimentally that an object that has been let fall travels a distance of $f(t) = 4.9t^2$ meters in the first $t$ seconds. Is there a way to analytically determine the instantaneous speed, otherwise known as velocity, at any time after the object has been let fall? Yes, there is. In essence we simply measure the average speed over increasingly small intervals of time; that is, we look at increasingly accurate speedometers.
Suppose the time at which we wish to determine the velocity is $t_0$ seconds after the object has been let fall, and let's consider a small time interval over which we can calculate average speed, say the interval from $t_0$ to $t_0 + 1/n$, where $n$ is some large number. So we need to calculate the distance traveled over this small amount of time divided by the time elapsed, which is $t_0 + 1/n - t_0 = 1/n$. In Figure 1 the distance fallen between the two times $t = 0.5$ and $t = 1.5$ is represented by the vertical distance between the points where the chord (the magenta line) meets the graph. The time elapsed is the horizontal distance between the points. The slope of the magenta line is then the distance divided by the time. It is the average speed over the time interval $[0.5, 1.5]$.
Figure 2: Average speed lines approach tangent
In Figure 2 a second line, in green, is drawn which corresponds to the smaller time interval $[1, 1.15]$, and as the interval shrinks so that the difference of the times is close to zero, the corresponding line approaches the blue line tangent to the graph.
In more generality, let's compute the average speed of the object over the time interval from $t_0$ to $t_0 + \frac{1}{n}$; in other words, the distance traveled divided by the length of the time interval $[t_0, t_0 + \frac{1}{n}]$. From a geometric point of view this becomes the task of computing the slope of the line joining the points $(t_0, f(t_0))$ and $\left(t_0 + \frac{1}{n}, f(t_0 + \frac{1}{n})\right)$. We get
$$\frac{f(t_0 + \frac{1}{n}) - f(t_0)}{t_0 + \frac{1}{n} - t_0} = \frac{4.9\left(t_0 + \frac{1}{n}\right)^2 - 4.9t_0^2}{\frac{1}{n}} = \frac{4.9t_0^2 + 9.8\frac{t_0}{n} + 4.9\frac{1}{n^2} - 4.9t_0^2}{\frac{1}{n}} = 9.8t_0 + \frac{4.9}{n}.$$
As $n$ gets very large this number gets closer and closer to $9.8t_0$. We express this by saying that the limit of this quotient as $n$ goes to infinity is $9.8t_0$. We write
$$\lim_{n\to\infty} \frac{f(t_0 + \frac{1}{n}) - f(t_0)}{t_0 + \frac{1}{n} - t_0} = 9.8t_0.$$
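Here is a minimal Python sketch of that sequence of shrinking intervals, assuming $t_0 = 1$ (our choice). Expanding $4.9(t_0 + 1/n)^2$ shows the chord slope is exactly $9.8t_0 + 4.9/n$, so the two printed columns agree up to floating-point rounding, and both approach $9.8$.

```python
def fall_distance(t: float) -> float:
    """Distance fallen (metres) after t seconds: f(t) = 4.9 t^2."""
    return 4.9 * t ** 2


def average_speed_over(t0: float, n: int) -> float:
    """Average speed over [t0, t0 + 1/n], i.e. the slope of the chord."""
    dt = 1 / n
    return (fall_distance(t0 + dt) - fall_distance(t0)) / dt


t0 = 1.0
for n in (1, 10, 100, 1000, 10_000):
    # Algebraically the quotient equals 9.8*t0 + 4.9/n, which tends to 9.8*t0.
    print(n, average_speed_over(t0, n), 9.8 * t0 + 4.9 / n)
```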
1.3
The derivative
What we have done above carries over to any function, provided the graph has a unique tangent at the point of interest. So instead of a function describing falling objects, let's consider an arbitrary function $f : A \to \mathbb{R}$ defined on some open interval $A$, and suppose that $f$ is such that at each point $(x, f(x))$ on the graph of $f$ there is a unique tangent. Just as we modeled the rate of change in distance fallen as a function of time, let's do the same thing for $f$. Given a point $a \in A$ we will model the instantaneous rate of change of $f$ at the point $a$, and as before we do this by first setting up an expression for an average rate of change as the slope of a line joining two points of the graph.
In Figure 3 we compute the average rate of change from a point $a$ to a point $a + h$ as the slope of the red line, where $h$ is some small real number, negative or positive, such that $a + h$ remains an element of the domain $A$. The instantaneous rate of change at $a$ is then the slope of the green line, which is tangent to the graph at the point $(a, f(a))$. As the quantity $h$ gets smaller and smaller, the red line approaches the green one. That is, the slope of the green tangent line is the limit of the slopes of the red lines as $h$ approaches 0. The instantaneous rate of change of $f$ at $a$ is also called the derivative of $f$ at $a$, and is written $f'(a)$. We summarize as follows.
Definition 1 Given a function $f : A \to \mathbb{R}$ defined on some open interval $A$, let $a \in A$ and suppose that the graph of $f$ possesses a unique tangent line at the point $(a, f(a))$. The derivative of $f$ at $a \in A$ is defined to be the slope of the tangent to the graph of $f$ at the point $(a, f(a))$. It is calculated as the limit
$$f'(a) = \lim_{h\to 0} \frac{f(a + h) - f(a)}{h}.$$
Figure 3: derivative at a is slope of green tangent
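The difference quotient of Definition 1 translates directly into code. The sketch below (the helper names are hypothetical) evaluates it for $h = 10^{-1}, \dots, 10^{-8}$. It illustrates the limit rather than computing it: in floating-point arithmetic $h$ cannot be shrunk indefinitely, because rounding error in $f(a + h) - f(a)$ grows as $h$ gets tiny.

```python
from typing import Callable


def difference_quotient(f: Callable[[float], float], a: float, h: float) -> float:
    """Slope of the line through (a, f(a)) and (a + h, f(a + h)); h may be negative but not 0."""
    return (f(a + h) - f(a)) / h


def quotients_toward_zero(f: Callable[[float], float], a: float, steps: int = 8) -> list[float]:
    """Difference quotients for h = 10**-1, 10**-2, ..., 10**-steps."""
    return [difference_quotient(f, a, 10.0 ** -k) for k in range(1, steps + 1)]


def cube(x: float) -> float:
    return x ** 3


# For f(x) = x^3 at a = 2 the tangent slope is 12; the quotients close in on it.
for q in quotients_toward_zero(cube, 2.0):
    print(q)
```

For $f(x) = x^3$ at $a = 2$ the quotient is exactly $12 + 6h + h^2$, so the printed values decrease toward 12.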
Given a function $f$ as above, if the derivative exists at each point in the domain $A$ of $f$, we can in effect describe a new function, which at every point $x \in A$ gives the derivative of $f$ at the point $x$. This function is referred to simply as $f'$; that is, $f'(x)$ is the value of the derivative of $f$ at $x$.
There are a number of different notational devices used for expressing derivatives. In particular, if a function is described by an equation such as $y = 2x^2 + 3x + 1$, then the derivative is sometimes written as $\frac{dy}{dx}$ or $\frac{d}{dx}(2x^2 + 3x + 1)$. Other times we write $Df(x)$ to mean the same as $f'(x)$.
The problem with the definition of a derivative is that we do not really know what is meant by such a limit, nor what it would mean for the limit not to exist; that is, under what conditions would it not be possible to form a line tangent to the graph?
The latter question is not too hard to answer. First we must be clear as to what we mean by the word tangent.
Definition 2 Given any curve C in the plane and a point P on C, a line is tangent to the
curve C if
• in some small region about P the line does not intersect the curve except at the point
P or
• the line corresponds precisely to a portion of the curve.
What then does it mean for a function not to have a unique (i.e. one and only one) tangent to the graph at a point $(a, f(a))$? There are three cases. The first doesn't really count, since we have prefaced all this by saying that the function is defined at the point $a$.
• The function is not defined at the point $a$. For instance, the function $f(x) = \frac{1}{x-1}$ is not defined at $x = 1$.
• There is a break in the graph of $f$ at the point $a$; if this should happen, the function is said to be discontinuous at $a$. The step function is an example of a function that is discontinuous at each integer; see Figure 4.
• The graph of $f$ has a corner at $(a, f(a))$, so that it is possible to define more than one line tangent at $(a, f(a))$. An example is the absolute value function, which has a corner at $x = 0$; see Figure 5.
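The corner and the break can both be seen by comparing difference quotients taken from the two sides. In this Python sketch we take the step function to be `math.floor`; that is our assumption, since the text does not say which step function Figure 4 shows.

```python
import math


def difference_quotient(f, a: float, h: float) -> float:
    """Slope of the chord from (a, f(a)) to (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


for h in (0.1, 0.01, 0.001):
    # Corner of |x| at 0: chords to the right have slope 1, chords to the left slope -1.
    abs_left = difference_quotient(abs, 0.0, -h)
    abs_right = difference_quotient(abs, 0.0, h)
    # Break of floor(x) at the integer 1: right chords are flat, left chords grow like 1/h.
    step_left = difference_quotient(math.floor, 1.0, -h)
    step_right = difference_quotient(math.floor, 1.0, h)
    print(h, abs_left, abs_right, step_left, step_right)
```

At the corner the one-sided slopes settle on two different values, while at the break the left-hand slopes do not settle at all.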
Figure 4: multiple tangents at point of discontinuity
Figure 5: multiple tangents at a corner
The notion of a tangent line is intuitive and easy to grasp, but how does one calculate the slope of such a line? The only way is by the sort of approximation we have been talking about. Given the function $f$ and the point $a \in A$, the expression
$$\frac{f(a + h) - f(a)}{h}$$
is referred to as the difference quotient. It is in fact just another function, whose variable however is $h$; that is, we have a new function, defined for $h \neq 0$,
$$g(h) = \frac{f(a + h) - f(a)}{h},$$
and what we are asking is then the same as asking what is meant by the limit
$$\lim_{h\to 0} g(h).$$
This question is just a special case of a more general question: what precisely do we mean when we say, for some function $f$, that the functional values $f(x)$ get closer and closer to some number $L$ as the points $x$ get closer and closer to a fixed value $a$?
1.4
Limits
One way of thinking about limits of a function is to construct a sequence, as we did in the case of falling objects, by looking at terms of the form $f(a + \frac{1}{n})$ for $n$ an integer. If $n$ is always taken to be positive, the domain values $a + \frac{1}{n}$ approach $a$ from the right as $n$ gets large. In this case we are looking at a right-hand limit, which we write as $\lim_{n\to\infty} f(a + \frac{1}{n})$. In the same way, if $n$ is always negative, then as $n \to -\infty$ the terms $a + \frac{1}{n}$ bit by bit approach $a$ from the left, and the corresponding limit is said to be a left-hand limit.
Another way of thinking of this is to replace $a + \frac{1}{n}$ by some arbitrary number $x$ which gets closer and closer to $a$. In this case we write $\lim_{x\to a^+} f(x)$ to denote the right-hand limit, where $x$ is always chosen to be greater than $a$, and $\lim_{x\to a^-} f(x)$ to denote the left-hand limit, where $x$ is always chosen to be less than $a$.
Example 3 Let the function $f$ be defined by $f(x) = x$ in the case that $x \le 1$, and by $f(x) = x + 1$ if $x > 1$. Then the left-hand limit is $\lim_{x\to 1^-} f(x) = 1$, whereas the right-hand limit is $\lim_{x\to 1^+} f(x) = 2$.
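Following the sequence approach described above, this Python sketch samples the function of Example 3 at $1 - 1/n$ and $1 + 1/n$. Finitely many samples only suggest the one-sided limits; they do not prove them.

```python
def f(x: float) -> float:
    """Example 3: f(x) = x for x <= 1 and f(x) = x + 1 for x > 1."""
    return x if x <= 1 else x + 1


a = 1.0
for n in (10, 100, 1000, 10_000):
    left = f(a - 1 / n)    # points approaching 1 from the left
    right = f(a + 1 / n)   # points approaching 1 from the right
    print(n, left, right)  # left values head toward 1, right values toward 2
```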
Now it happens rather frequently that a right-hand limit does not equal a left-hand limit; when this occurs, the graph has a break (a jump) at $a$. See Figure 6. If the left-hand limit equals the right-hand limit, then we say that the limit exists, and we write it simply as $\lim_{x\to a} f(x)$.
Figure 6: limit from left = 1 whereas limit from right = 2
Figure 7: For a − δ < x < a + δ and x ≠ a, (x, f(x)) in green stripe
Example 4 Let the function $f$ be defined so that at a point $a$ there is an isolated discontinuity, which simply means that the function is nicely continuous, in the sense that the graph may be drawn without lifting pen from paper, except at the point $x = a$, where its value abruptly jumps to some distant point. For a particular example, suppose $f$ is defined so that
$$f(x) = \begin{cases} 2, & x \neq 1 \\ 4, & x = 1. \end{cases}$$
Note that $\lim_{x\to 1^-} f(x) = \lim_{x\to 1^+} f(x) = 2$, whereas $f(1) = 4$.
The precise definition of the limit of a sequence has already been mulled over, and this gives a firm footing for a limit of the form $\lim_{n\to\infty} f(a + \frac{1}{n})$. The more general definition, in which we substitute $x$ for $a + \frac{1}{n}$, involves some minor changes, which we show below in Definition 5. There are situations in which the sequence version and this general version do not coincide. We will postpone the details until later.
Definition 5 Limit
A function $f$ has a limit $L$ as $x$ approaches $a$ means that: for every small number $\epsilon > 0$, there exists a number $\delta > 0$ so that for every $x \neq a$, if $x$ is within $\delta$ of $a$, then $f(x)$ is within $\epsilon$ of $L$. In other words,
$$\forall \epsilon > 0\ \exists \delta > 0 \text{ so that } \forall x,\ 0 < |x - a| < \delta \Rightarrow |f(x) - L| < \epsilon.$$
Phrased another way, $L$ is a limit of the function $f$ at the point $a$ if: given an arbitrary measure of closeness $\epsilon$, which determines an open interval $(L - \epsilon, L + \epsilon)$ centered at $L$ on the $y$ axis, there exists another measure of closeness $\delta$ and a corresponding interval $(a - \delta, a + \delta)$ on the $x$ axis so that if $x$ is between $a - \delta$ and $a + \delta$ but not equal to $a$, namely $0 < |x - a| < \delta$, then $f(x)$ is on the $y$ axis between $L - \epsilon$ and $L + \epsilon$. See Figure 7.
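To see the quantifiers of Definition 5 in action, here is a Python sketch for one concrete function that does not appear in the text: $f(x) = 3x + 1$ at $a = 2$, where $L = 7$. Since $|f(x) - 7| = 3|x - 2|$, the choice $\delta = \epsilon/3$ works. Sampling finitely many $x$ can only spot-check the condition, not prove it; the proof is the one-line inequality in the comment.

```python
def f(x: float) -> float:
    return 3 * x + 1


a, L = 2.0, 7.0  # the point a and the claimed limit L (our illustrative choice)


def delta_for(eps: float) -> float:
    # |f(x) - L| = |3x - 6| = 3|x - a|, so 0 < |x - a| < eps/3 gives |f(x) - L| < eps.
    return eps / 3


def spot_check(eps: float, samples: int = 1000) -> bool:
    """Test |f(x) - L| < eps at sample points with 0 < |x - a| < delta."""
    d = delta_for(eps)
    for i in range(1, samples):
        t = i / samples                      # 0 < t < 1 keeps x strictly inside the interval
        for x in (a - t * d, a + t * d):
            if not abs(f(x) - L) < eps:
                return False
    return True


print([spot_check(eps) for eps in (1.0, 0.1, 1e-3, 1e-6)])
```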
We can also make precise the notions of left limit and right limit. All we need to do is to
modify the above definition of the limit in such a way that for the left limit the values for
x are always less than a and for the right limit, the values for x are always greater than a.
The definitions are as follows.
Definition 6 Left limit
A function $f$ has a left limit $L$ as $x$ approaches $a$ means that: for every small number $\epsilon > 0$, there exists a number $\delta > 0$ so that for every $x$, if $x$ is within $\delta$ of $a$ and less than $a$, then $f(x)$ is within $\epsilon$ of $L$. In other words,
$$\forall \epsilon > 0\ \exists \delta > 0 \text{ so that } \forall x,\ 0 < a - x < \delta \Rightarrow |f(x) - L| < \epsilon.$$
Definition 7 Right limit
A function $f$ has a right limit $L$ as $x$ approaches $a$ means that: for every small number $\epsilon > 0$, there exists a number $\delta > 0$ so that for every $x$, if $x$ is within $\delta$ of $a$ and greater than $a$, then $f(x)$ is within $\epsilon$ of $L$. In other words,
$$\forall \epsilon > 0\ \exists \delta > 0 \text{ so that } \forall x,\ 0 < x - a < \delta \Rightarrow |f(x) - L| < \epsilon.$$
There are situations in which a limit of the function exists, say $\lim_{x\to a} f(x) = L$, but the value $L$ of the limit does not coincide with the value $f(a)$. Consider the following example.
Example 8 Consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1$ for $x \neq 0$ and $f(0) = 2$. Then $\lim_{x\to 0} f(x) = 1$, whereas $f(0) = 2$.
1.4.1
Arithmetic combinations of functions
We need to talk about limits because we need to prove and state some things about
derivatives, but first we need some more machinery. The techniques we will develop for
computing derivatives require that we first break the definition of a given function into
basic parts and then apply differentiation rules to the parts. In order to do this we need
to know how to add, multiply, and divide functions. Given two functions $f : A \to \mathbb{R}$ and $g : A \to \mathbb{R}$ defined on an interval $A$, we define $f + g$, $f \cdot g$, $f/g$, and a fourth operation, called scalar multiplication, as follows:
• the sum $f + g$ is defined by $f + g : x \mapsto f(x) + g(x)$; that is, $(f + g)(x) = f(x) + g(x)$
• the product $f \cdot g$ is defined by $f \cdot g : x \mapsto f(x)g(x)$; that is, $(f \cdot g)(x) = f(x)g(x)$
• the quotient $f/g$ is defined by $f/g : x \mapsto \frac{f(x)}{g(x)}$, provided that $g(x) \neq 0$; that is, $(f/g)(x) = \frac{f(x)}{g(x)}$
• for a constant $k \in \mathbb{R}$, the scalar multiple $kf$ of $f$ is defined by $kf : x \mapsto k(f(x))$; that is, $(kf)(x) = k(f(x))$
Example 9 Let $f$ be defined by $f(x) = x^3$ and let $g$ be defined by $g(x) = \sqrt{x}$ for $x \ge 0$.
1. $f + g$ is then defined by $(f + g)(x) = x^3 + \sqrt{x}$.
2. $f \cdot g$ is defined by $(f \cdot g)(x) = x^3\sqrt{x}$.
3. $f/g$ is defined for $x > 0$ by $(f/g)(x) = \frac{x^3}{\sqrt{x}}$.
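In a language with first-class functions these four operations are ordinary higher-order functions. The Python sketch below is illustrative only: `add`, `mul`, `div`, and `scale` are hypothetical helper names, and the quotient raises an error wherever $g(x) = 0$, mirroring the proviso in the definition.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def add(f: Fn, g: Fn) -> Fn:
    return lambda x: f(x) + g(x)        # (f + g)(x) = f(x) + g(x)


def mul(f: Fn, g: Fn) -> Fn:
    return lambda x: f(x) * g(x)        # (f . g)(x) = f(x) g(x)


def div(f: Fn, g: Fn) -> Fn:
    def quotient(x: float) -> float:
        gx = g(x)
        if gx == 0:
            raise ZeroDivisionError(f"f/g is not defined at x = {x}, since g(x) = 0")
        return f(x) / gx                # (f/g)(x) = f(x) / g(x)
    return quotient


def scale(k: float, f: Fn) -> Fn:
    return lambda x: k * f(x)           # (kf)(x) = k f(x)


def cube(x: float) -> float:
    return x ** 3


# Example 9 with f(x) = x^3 and g(x) = sqrt(x), evaluated at x = 4.
print(add(cube, math.sqrt)(4.0), mul(cube, math.sqrt)(4.0), div(cube, math.sqrt)(4.0))
```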
1.5
Limit facts
Calculating limits and calculating derivatives, which of course are also limits, can sometimes
be difficult. Luckily there are some easy results that allow us to simplify the task by
breaking the limit up into parts which then can be more easily evaluated. These results
can be simply stated - the limit of a sum is the sum of the limits, the limit of a product is
the product of the limits, a limit of a quotient is the quotient of the limits, provided the
denominator is not zero, and the limit of a composition is the composition of the limits. The words, however, can be misleading. The proper statements are as follows:
Limit Theorems
Given functions $f : A \to \mathbb{R}$ and $g : A \to \mathbb{R}$, where $A$ is an open interval, suppose that $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} g(x) = M$. Then
• Sum of limits
$$\lim_{x\to a} \big(f(x) + g(x)\big) = L + M$$
• Product of limits
$$\lim_{x\to a} f(x)g(x) = LM$$
• Quotient of limits
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{L}{M},$$
provided $M \neq 0$.
• Composition of limits: provided $\lim_{y\to M} f(y)$ exists and $g(x) \neq M$ for every $x \neq a$ in some small interval containing $a$, then
$$\lim_{x\to a} (f \circ g)(x) = \lim_{y\to M} f(y).$$
Using the definition of a limit one can prove each of the limit rules. To show you how it
goes, I’ll prove the first. You may notice that the proof is very similar to the proof that
the limit of a sum of sequences equals the sum of the limits. The other properties listed
can be similarly proved, although the proofs are more involved.
Theorem 10 In the context of the above, $\lim_{x\to a} \big(f(x) + g(x)\big) = L + M$.
Proof. According to Definition 5, we need to show that for arbitrary $\epsilon > 0$ there exists a $\delta > 0$ such that for arbitrary $x$,
$$0 < |x - a| < \delta \Rightarrow |f(x) + g(x) - (L + M)| < \epsilon.$$
Let's begin by choosing an arbitrary but fixed $\epsilon > 0$. Now, since $\lim_{x\to a} f(x) = L$, applying Definition 5 with $\epsilon/2$ in place of $\epsilon$, we see that there is some number $\delta_1 > 0$ such that for arbitrary $x$,
$$0 < |x - a| < \delta_1 \Rightarrow |f(x) - L| < \epsilon/2. \qquad (1)$$
Similarly, since $\lim_{x\to a} g(x) = M$, there exists, according to Definition 5 again, some number $\delta_2 > 0$ such that for arbitrary $x$,
$$0 < |x - a| < \delta_2 \Rightarrow |g(x) - M| < \epsilon/2. \qquad (2)$$
From lines (1) and (2) above, we see that if we set $\delta$ to be the minimum of $\delta_1$ and $\delta_2$, it follows that for $0 < |x - a| < \delta$ both $|f(x) - L| < \epsilon/2$ and $|g(x) - M| < \epsilon/2$ are true. Adding these two inequalities, we have
$$0 < |x - a| < \delta \Rightarrow |f(x) - L| + |g(x) - M| < \epsilon/2 + \epsilon/2 = \epsilon.$$
However, by the triangle inequality, which says that for any two numbers the absolute value of the sum is less than or equal to the sum of the absolute values, i.e. $|u + v| \le |u| + |v|$, we conclude that
$$|f(x) + g(x) - (L + M)| = |f(x) - L + g(x) - M| \le |f(x) - L| + |g(x) - M| < \epsilon.$$
Hence, we have shown that for arbitrary $\epsilon > 0$ there exists $\delta > 0$ so that for all $x$,
$$0 < |x - a| < \delta \Rightarrow |f(x) + g(x) - (L + M)| < \epsilon.$$
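The key move in this proof, taking $\delta = \min(\delta_1, \delta_2)$, can be made concrete. In this Python sketch, which is our own illustration and not from the text, $f(x) = 2x$ and $g(x) = 5x$ near $a = 1$, so $L = 2$ and $M = 5$. Tolerance $\epsilon/2$ for $f$ needs $\delta_1 = \epsilon/4$ (since $|2x - 2| = 2|x - 1|$), and tolerance $\epsilon/2$ for $g$ needs $\delta_2 = \epsilon/10$. As before, sampling only spot-checks the inequality.

```python
a, L, M = 1.0, 2.0, 5.0


def f(x: float) -> float:
    return 2 * x


def g(x: float) -> float:
    return 5 * x


def delta_for_sum(eps: float) -> float:
    delta1 = (eps / 2) / 2   # |f(x) - L| = 2|x - a| < eps/2 when |x - a| < eps/4
    delta2 = (eps / 2) / 5   # |g(x) - M| = 5|x - a| < eps/2 when |x - a| < eps/10
    return min(delta1, delta2)


def spot_check_sum(eps: float, samples: int = 1000) -> bool:
    """Check |f(x) + g(x) - (L + M)| < eps at sample points with 0 < |x - a| < delta."""
    d = delta_for_sum(eps)
    for i in range(1, samples):
        t = i / samples
        for x in (a - t * d, a + t * d):
            if not abs(f(x) + g(x) - (L + M)) < eps:
                return False
    return True


print(delta_for_sum(1.0), spot_check_sum(1.0), spot_check_sum(1e-6))
```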
1.6
Differentiation Formulas
Being able to find the derivative of a function at a point is an important analytic tool, but if
one had always to go through the elaborate process of looking at successive approximations,
it would be a rather clumsy tool. Fortunately, there are four general results that allow,
in most cases, a quick calculation. These results are proved using the definition of the
derivative in terms of a limit. The results are the sum rule, the product rule, the quotient
rule, and the chain rule. But before we get to them, it will be useful if we calculate a few
derivatives the long way. The first is the so-called Power Rule. The proof is short, so we will include it.
Theorem 11 Power Rule
If $f(x) = x^n$ for some positive integer $n$, then $f'(x) = nx^{n-1}$ for all real numbers $x$.
Proof. Let $a$ be an arbitrary fixed number. I want to show that $f'(a) = na^{n-1}$. The trick here is to realize that
$$x^n - a^n = (x - a)(x^{n-1} + x^{n-2}a + x^{n-3}a^2 + \cdots + xa^{n-2} + a^{n-1}).$$
To verify this, start multiplying out the right-hand side. You will see that all the terms cancel except for $x^n$ and $-a^n$. Using this result, we go to the definition (with $x = a + h$, so that $x \to a$ exactly when $h \to 0$) and calculate, cancelling the factor $x - a$, which is nonzero since $x \neq a$:
$$\begin{aligned}
f'(a) &= \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{x\to a} \frac{x^n - a^n}{x - a} \\
&= \lim_{x\to a} \frac{(x - a)(x^{n-1} + x^{n-2}a + \cdots + xa^{n-2} + a^{n-1})}{x - a} \\
&= \lim_{x\to a} \left(x^{n-1} + x^{n-2}a + \cdots + xa^{n-2} + a^{n-1}\right) \\
&= a^{n-1} + a^{n-2}a + \cdots + a \cdot a^{n-2} + a^{n-1} \\
&= na^{n-1}.
\end{aligned}$$
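The factorization at the heart of this proof is easy to spot-check with exact integer arithmetic. This Python sketch (the name `bracket_sum` is ours) tests it on a small grid of integers, which is evidence rather than proof. Note that Python evaluates `0 ** 0` as 1, matching the convention $a^0 = 1$ used in the sum.

```python
def bracket_sum(x: int, a: int, n: int) -> int:
    """x^(n-1) + x^(n-2) a + ... + x a^(n-2) + a^(n-1): the second factor in the proof."""
    return sum(x ** (n - 1 - k) * a ** k for k in range(n))


# Exact check of x^n - a^n = (x - a) * bracket_sum(x, a, n) on a small grid.
for n in range(1, 8):
    for x in range(-5, 6):
        for a in range(-5, 6):
            assert x ** n - a ** n == (x - a) * bracket_sum(x, a, n)

# Setting x = a makes each of the n terms equal a^(n-1), so the sum is n a^(n-1).
print(all(bracket_sum(a, a, n) == n * a ** (n - 1) for n in range(1, 8) for a in range(-5, 6)))
```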
The power rule, expressed above for exponents that are positive integers, may be extended
to the case where the exponent is any real number. Later we shall prove this for the case
in which the exponent is an arbitrary rational number.
Theorem 12 Extended Power Rule
If $r$ is an arbitrary real number, then for all $x > 0$,
$$\frac{d}{dx}(x^r) = rx^{r-1}.$$
A constant function has a graph which is a horizontal straight line, and it is not hard to see by examining the difference quotient that the derivative of any such function is zero. The proof is left as an exercise.
Theorem 13 If $c$ is some constant and $f(x) = c$ for all $x$, then $f'(x) = 0$ for all $x$.
Example 14
1. Let $f(x) = \sqrt{x}$. Then $f'(x) = \frac{1}{2}x^{-\frac{1}{2}} = \frac{1}{2\sqrt{x}}$.
2. Let $g(x) = \frac{1}{\sqrt{x}}$. Then $g'(x) = -\frac{1}{2}x^{-\frac{3}{2}} = -\frac{1}{2x^{3/2}}$.
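A quick numerical sanity check of Example 14, sketched in Python. It uses the symmetric difference quotient $(f(x+h) - f(x-h))/(2h)$, a standard numerical approximation that we swap in here; it is not the one-sided quotient of Definition 1, but it has the same limit wherever $f$ is differentiable and is more accurate for a fixed $h$. The step size $h = 10^{-5}$ is an arbitrary choice.

```python
import math


def symmetric_quotient(func, x: float, h: float = 1e-5) -> float:
    """(func(x + h) - func(x - h)) / (2h), a numerical approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x: float) -> float:
    return math.sqrt(x)                 # f(x) = x^(1/2)


def f_prime(x: float) -> float:
    return 1 / (2 * math.sqrt(x))       # (1/2) x^(-1/2)


def g(x: float) -> float:
    return 1 / math.sqrt(x)             # g(x) = x^(-1/2)


def g_prime(x: float) -> float:
    return -1 / (2 * x ** 1.5)          # -(1/2) x^(-3/2)


for x in (0.5, 1.0, 4.0, 9.0):
    print(x, symmetric_quotient(f, x), f_prime(x), symmetric_quotient(g, x), g_prime(x))
```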
1.7
Sum, Product, Quotient, and Chain Rules
In what follows, let $f$ and $g$ be two functions, both of which have derivatives at a point $a$, denoted $f'(a)$ and $g'(a)$.
• Sum Rule
The derivative of $f + g$ at a point $a$ is the derivative of $f$ at $a$ plus the derivative of $g$ at $a$; that is,
$$(f + g)'(a) = f'(a) + g'(a).$$
In simple words, the derivative of a sum is the sum of the derivatives.
Example 15 Let $h(x) = x^3 + \sqrt{x}$. Then letting $f(x) = x^3$ and $g(x) = \sqrt{x}$, we have
$$h'(x) = 3x^2 + \frac{1}{2\sqrt{x}}.$$
• Product Rule
The derivative of $f \cdot g$ at $a$ is the derivative of $f$ at $a$ times the value of $g$ at $a$, plus the value of $f$ at $a$ times the derivative of $g$ at $a$; that is,
$$(f \cdot g)'(a) = f'(a)g(a) + f(a)g'(a).$$
• Quotient Rule
Provided $g(a) \neq 0$, the derivative of $f/g$ at $a$ is the derivative of $f$ at $a$ times $g(a)$, minus the derivative of $g$ at $a$ times $f(a)$, all divided by $g(a)$ squared; that is,
$$(f/g)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{(g(a))^2}.$$
• Chain Rule
Let $g : A \to \mathbb{R}$ and $f : B \to \mathbb{R}$ be two functions such that the composition $f \circ g$ makes sense, in that the range of $g$, namely $g(A)$, is contained in $B$. Further suppose that the derivative of $f$ at $b = g(a)$ exists. Under these conditions the derivative of $f \circ g$ at $a$ is the derivative of $f$ at $b = g(a)$ times the derivative of $g$ at $a$; that is,
$$(f \circ g)'(a) = f'(g(a))\,g'(a).$$
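The three rules can be spot-checked numerically. This Python sketch is our own: it takes $f(x) = x^3$ and $g(x) = \sqrt{x}$ from Example 15, evaluates each rule's formula at the sample point $a = 2$, and compares it with a symmetric difference quotient used as a stand-in for the exact derivative. Here $f \cdot g$, $f/g$, and $f \circ g$ are $x^{7/2}$, $x^{5/2}$, and $x^{3/2}$, so the rules can also be checked against the extended power rule.

```python
import math


def symmetric_quotient(func, x: float, h: float = 1e-5) -> float:
    """Numerical approximation (func(x + h) - func(x - h)) / (2h) to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x: float) -> float:
    return x ** 3


def df(x: float) -> float:
    return 3 * x ** 2                   # power rule


def g(x: float) -> float:
    return math.sqrt(x)


def dg(x: float) -> float:
    return 1 / (2 * math.sqrt(x))       # Example 14


a = 2.0
checks = {
    "product": (lambda x: f(x) * g(x), df(a) * g(a) + f(a) * dg(a)),
    "quotient": (lambda x: f(x) / g(x), (df(a) * g(a) - f(a) * dg(a)) / g(a) ** 2),
    "chain": (lambda x: f(g(x)), df(g(a)) * dg(a)),
}
for name, (combined, rule_value) in checks.items():
    print(name, symmetric_quotient(combined, a), rule_value)
```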
1.7.1
Examples
1. Let $h(x) = kg(x)$ for a constant $k$. Setting $f(x) = k$, knowing that $f'(x) = 0$, and using the product rule, we see that $h'(x) = kg'(x)$.
2. Let $h(x) = (x^2 + 4x + 1)(x^5 + 3x^4 + x^2 + 3)$. To find the derivative one could multiply the two polynomials and then use the power rule, the sum rule, and the result shown just above. But to avoid the initial multiplication, we can use the product rule and simplify the calculation. We get
$$h'(x) = (2x + 4)(x^5 + 3x^4 + x^2 + 3) + (x^2 + 4x + 1)(5x^4 + 12x^3 + 2x),$$
which could be simplified, but for exercises involving only differentiation, simplification is not necessary.
3. Let $h(x) = \frac{x^3 + 2x^2 + x + 2}{x^2 + 1}$. Using the quotient rule we get
$$h'(x) = \frac{(3x^2 + 4x + 1)(x^2 + 1) - (x^3 + 2x^2 + x + 2)(2x)}{(x^2 + 1)^2}.$$
4. Let $h(x) = (x^2 + 2x + 1)^{45}$. So $h$ is the composition of two functions, $g(x) = x^2 + 2x + 1$ and $f(y) = y^{45}$. Using the chain rule, we then calculate the derivative of $h$ as
$$h'(x) = f'(g(x))\,g'(x) = 45(x^2 + 2x + 1)^{44}(2x + 2).$$
Observe that the only effective way of calculating this derivative is to use the chain rule. Multiplying out $(x^2 + 2x + 1)^{45}$ would be a nightmare.
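The hand-computed derivatives in Examples 2 to 4 can be compared against a numerical derivative in the same way. In this Python sketch (ours; the sample points and step size are arbitrary choices), each formula for $h'$ is checked against a symmetric difference quotient. Incidentally, in Example 3 the numerator factors as $(x + 2)(x^2 + 1)$, so $h(x) = x + 2$ and the quotient-rule expression simplifies to 1, which the check reflects.

```python
import math


def symmetric_quotient(func, x: float, h: float = 1e-6) -> float:
    """Numerical approximation (func(x + h) - func(x - h)) / (2h) to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def h2(x: float) -> float:              # Example 2
    return (x**2 + 4*x + 1) * (x**5 + 3*x**4 + x**2 + 3)


def dh2(x: float) -> float:             # product rule result
    return (2*x + 4) * (x**5 + 3*x**4 + x**2 + 3) + (x**2 + 4*x + 1) * (5*x**4 + 12*x**3 + 2*x)


def h3(x: float) -> float:              # Example 3
    return (x**3 + 2*x**2 + x + 2) / (x**2 + 1)


def dh3(x: float) -> float:             # quotient rule result
    return ((3*x**2 + 4*x + 1) * (x**2 + 1) - (x**3 + 2*x**2 + x + 2) * (2*x)) / (x**2 + 1) ** 2


def h4(x: float) -> float:              # Example 4
    return (x**2 + 2*x + 1) ** 45


def dh4(x: float) -> float:             # chain rule result
    return 45 * (x**2 + 2*x + 1) ** 44 * (2*x + 2)


for x in (-0.5, 0.0, 0.5):
    for func, deriv in ((h2, dh2), (h3, dh3), (h4, dh4)):
        approx, exact = symmetric_quotient(func, x), deriv(x)
        print(x, func.__name__, approx, exact, math.isclose(approx, exact, rel_tol=1e-6))
```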