MATH 641: Mathematical Analysis and Modeling
Spring 2012
Jürgen Gerlach∗
May 29, 2012
Contents

1 Fibonacci Sequences
  1.1 The Fibonacci Sequence
    1.1.1 Logarithmic Scales
  1.2 The Golden Ratio
  1.3 An Explicit Formula
  1.4 Sums and Sums of Squares of Fibonacci Numbers
  1.5 Power Series with Fibonacci Coefficients
  1.6 Matrix Form of Fibonacci Sequences

2 Fixed Point Iterations
  2.1 Fixed Points
  2.2 Graphical Methods and Cobwebs
  2.3 The Fixed Point Theorem
    2.3.1 The Mean Value Theorem
  2.4 Inverse Iterations and Double Steps
  2.5 The Logistic Map

3 Discrete Population Models
  3.1 Exponential Growth
    3.1.1 Models From Finance
    3.1.2 Inhomogeneous Terms
  3.2 Logistic Growth
  3.3 Logistic Growth with Harvesting
∗ Department of Mathematics and Statistics, Radford University, Box 6942, Radford, VA 24142
4 Continuous Population Models
  4.1 Differential Equations
  4.2 Exponential Growth
  4.3 Logistic Growth
  4.4 Harvesting
  4.5 Euler’s Method

5 A Little Linear Algebra
  5.1 Solutions of Linear Systems
  5.2 Singular Matrices and Determinants
  5.3 Review of Eigenvalues and Eigenvectors

6 Matrix Iterative Models
  6.1 Populations with Age Structures
  6.2 Inhomogeneous Systems
  6.3 Markov Chains
  6.4 Romeo and Juliet
    6.4.1 Complex Eigenvalues
1 Fibonacci Sequences
Suggested Reading:
Wikipedia Article on Fibonacci Numbers
http://en.wikipedia.org/wiki/Fibonacci_number
We begin this modeling course with the study of Fibonacci sequences. There are several reasons for this.
For one, Fibonacci sequences are widely known, and their mathematics can be studied on many different levels, from simple addition of numbers in elementary school to convergence analysis in college.
Secondly, Fibonacci sequences can easily be computed with EXCEL.
Thirdly, the Fibonacci sequence represents a very simple (and very naive) population model.
And finally, the study of Fibonacci sequences leads us to questions of convergence of iterative sequences and eigenvalue analysis, both of which are useful in the study of more sophisticated population models.
1.1 The Fibonacci Sequence
Fibonacci sequences can be traced back to Leonardo of Pisa, the author of the Liber Abaci. They are defined iteratively by

Fn+1 = Fn + Fn−1

with seed values F0 = 0 and F1 = 1. This results in the sequence of numbers

0, 1, 1, 2, 3, 5, 8, 13, 21, . . .
Motivation behind this definition: (Wikipedia)
In the West, the Fibonacci sequence first appears in the book
Liber Abaci (1202) by Leonardo of Pisa, known as Fibonacci.
Fibonacci considers the growth of an idealized (biologically unrealistic) rabbit population, assuming that: a newly born pair of
rabbits, one male, one female, are put in a field; rabbits are able
to mate at the age of one month so that at the end of its second month a female can produce another pair of rabbits; rabbits
never die and a mating pair always produces one new pair (one
male, one female) every month from the second month on. The
puzzle that Fibonacci posed was: how many pairs will there be
in one year?
• At the end of the first month, they mate, but there is still
only 1 pair.
• At the end of the second month the female produces a new
pair, so now there are 2 pairs of rabbits in the field.
• At the end of the third month, the original female produces
a second pair, making 3 pairs in all in the field.
• At the end of the fourth month, the original female has
produced yet another new pair, the female born two months
ago produces her first pair also, making 5 pairs.
Figure 1: A represents adult pairs, B stands for newborn pairs, generations
are aligned in rows
There is a wealth of work done on Fibonacci sequences; in fact, there is a Fibonacci Association (http://www.mscs.dal.ca/fibonacci/) which holds international conferences on Fibonacci numbers, and there are numerous applications.
In our work we will look at some of the mathematics of Fibonacci numbers, and we will allow arbitrary seeds, not just F0 = 0 and F1 = 1. When talking about the traditional Fibonacci numbers (F0 = 0 and F1 = 1) we will use uppercase Fn; otherwise we will use the notation fn when a sequence is defined as

fn+1 = fn + fn−1    with given seeds f0 and f1    (1)

Both fn and Fn will be called Fibonacci sequences.
In the sequel we will look at the ratio of consecutive Fibonacci numbers, sums of Fibonacci numbers, as well as a matrix representation of the
Fibonacci process.
Exercises
1. Design an EXCEL worksheet to compute Fibonacci numbers for general starting values f0 and f1 , and graph them.
2. Can you find seeds f0 and f1 so that the Fibonacci numbers oscillate
between positive and negative terms?
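For readers who want to check their EXCEL experiment another way, the iteration can be sketched in a few lines of Python; the helper `fib` is our own naming, not part of the course materials:

```python
def fib(n, f0=0, f1=1):
    """Return f_0, ..., f_n for the recurrence f_{k+1} = f_k + f_{k-1}."""
    seq = [f0, f1]
    for _ in range(n - 1):
        seq.append(seq[-1] + seq[-2])
    return seq[:n + 1]

print(fib(8))          # traditional seeds: [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(fib(6, 2, 1))    # seeds 2, 1 give the Lucas numbers: [2, 1, 3, 4, 7, 11, 18]
```

Changing the seeds changes every later term, but, as we will see, not the limiting ratio of consecutive terms.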
1.1.1 Logarithmic Scales
When functions grow rapidly, or when you have data which cover several orders of magnitude, a linear scale is not very useful for graphing, and it is customary to use log plots. There the tick marks for 0.01, 0.1, 1, 10, 100, 1000 and so on are evenly spaced on the y-axis, and there is no room for negative data. In essence, you are graphing the logarithms of your data, rather than the values themselves. Just take the common logarithm: log 0.01 = −2, log 0.1 = −1, log 1 = 0, log 100 = 2, et cetera. The logarithms are equally spaced. The same argument works for the natural logarithm; there the distance between consecutive tick marks is ln 10.
Now, if you graph a function with nominal values y = f (x) on a logarithmic scale, you are effectively plotting the function v = ln f (x), and for the
special case of an exponential function the graph will appear as a straight
line: If f (x) = Cax , we obtain
v = ln f (x) = ln C + (ln a)x ,
which is a line in the xv-plane with slope m = ln a and v-intercept b = ln C.
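This is easy to confirm numerically: the logarithms of exponential data have constant consecutive differences, equal to the slope ln a. A Python sketch, where C = 2 and a = 3 are arbitrary choices:

```python
import math

C, a = 2.0, 3.0
v = [math.log(C * a ** x) for x in range(6)]      # v = ln C + (ln a) x

# consecutive differences of v are constant: the slope m = ln a
diffs = [v[i + 1] - v[i] for i in range(len(v) - 1)]
print(diffs)    # each difference is ln 3 = 1.0986...
```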
Figure 2: Linear versus Logarithmic Scale
1.2 The Golden Ratio
Here we are interested in the ratio of two consecutive Fibonacci numbers.
For the traditional Fibonacci numbers we obtain

1/1, 2/1, 3/2, 5/3, 8/5, 13/8, 21/13, . . .

Numerically these values are

1, 2, 1.5, 1.6667, 1.6, 1.625, 1.6154
and it appears that this sequence approaches a limit. Let us take a closer look and define

rn = fn / fn−1    (2)

Since fn+1 = fn + fn−1 it follows that

rn+1 = fn+1 / fn = (fn + fn−1) / fn = 1 + fn−1/fn = 1 + 1/rn    (3)
We see that the sequence rn is defined recursively, and we do not need the
explicit reference to the Fibonacci numbers. Moreover, since we agreed to
admit any starting values for f0 and f1, we may use any seed r1 ≠ 0 in (3).
Example: Let r1 = 3 (that is, f1 is three times f0). Then

r2 = 1 + 1/r1 = 1 + 1/3 = 4/3 = 1.3333
r3 = 1 + 1/r2 = 1 + 3/4 = 7/4 = 1.75
r4 = 1 + 1/r3 = 1 + 4/7 = 11/7 = 1.5714
r5 = 1 + 1/r4 = 1 + 7/11 = 18/11 = 1.6364
It appears that this sequence converges.
At this point we postpone a more detailed discussion of convergence of iterative sequences. Nonetheless, we will make an attempt to find the limit for our special case. Suppose that rn → L. Then the shifted sequence rn+1 converges to the same limit, and taking limits on both sides of (3) results in

L = lim_{n→∞} rn+1 = lim_{n→∞} (1 + 1/rn) = 1 + 1/L
It now follows that

L^2 = L + 1

and that

L^2 − L − 1 = 0    (4)

Hence we obtain a quadratic equation with solutions

L = (1 ± √5)/2

We denote the positive solution by ϕ, the golden ratio:

ϕ = (1 + √5)/2 = 1.618033989 . . .
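The convergence of the ratio iteration to ϕ is easy to observe numerically; a Python sketch of iteration (3):

```python
r = 3.0                  # seed r_1 = 3, as in the example above
for _ in range(40):      # iterate r_{n+1} = 1 + 1/r_n
    r = 1.0 + 1.0 / r

phi = (1.0 + 5.0 ** 0.5) / 2.0
print(r, phi)            # both approximately 1.6180339887
```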
Wikipedia on the golden ratio:
In mathematics and the arts, two quantities are in the golden ratio if the ratio of the sum of the quantities to the larger quantity
is equal to the ratio of the larger quantity to the smaller one. ...
At least since the Renaissance, many artists and architects have
proportioned their works to approximate the golden ratio, especially in the form of the golden rectangle, in which the ratio of the longer side to the shorter is the golden ratio, believing
this proportion to be aesthetically pleasing ... Mathematicians
have studied the golden ratio because of its unique and interesting properties. The golden ratio is also used in the analysis of
financial markets, in strategies such as Fibonacci retracement.
Figure 3: The Golden Ratio
There are numerous applications of the golden ratio, most prominently
in architecture, the arts, and biology. The youtube video at
http://www.youtube.com/watch?v=fmaVqkR0ZXg
contains some applications of the golden ratio. Our approach shows that
the ratio of consecutive Fibonacci numbers approaches the golden ratio.
Exercises
1. Design an EXCEL worksheet to compute the first 20 values of rn for a
given seed r1. Experiment with different seeds and convince yourself that the sequence will converge to the golden ratio for (almost) any starting value r1.
2. Verify that
   (a) ϕ^2 = 1 + ϕ
   (b) 1/ϕ = ϕ − 1
   (c) The quadratic equation L^2 − L − 1 = 0 has solutions L = ϕ and L = 1 − ϕ.
3. Prove that ϕ^n = Fn ϕ + Fn−1 by induction, where Fn are the traditional Fibonacci numbers F0 = 0, F1 = 1, F2 = 1, F3 = 2, F4 = 3, F5 = 5, and so on.
1.3 An Explicit Formula
In this section we derive a formula which lets us compute the k-th Fibonacci
number directly, without going through the iterative process. We assume
that fn has the form
fn = C a^n

Then the recurrence relation fn+1 = fn + fn−1 implies that

C a^{n+1} = C a^n + C a^{n−1}

and thus, after canceling C and a^{n−1}, we find that

a^2 = a + 1

But this is the quadratic equation (4) for L from the last section, with solutions

a = ϕ    and    a = 1 − ϕ = −1/ϕ

We can now construct fn by superposition, namely as

fn = C1 ϕ^n + C2 (1 − ϕ)^n = C1 ϕ^n + (−1)^n C2/ϕ^n    (5)
where C1 and C2 have to be determined to match f0 and f1. For example, let us compute the traditional Fibonacci numbers with F0 = 0 and F1 = 1. By substitution we find that

0 = F0 = C1 + C2
1 = F1 = C1 ϕ + C2 (1 − ϕ)

A short computation yields C1 = 1/(2ϕ − 1) = 1/√5 and C2 = −C1 = −1/√5, and thus

Fn = (1/√5) ϕ^n − (1/√5) (−1)^n/ϕ^n = (1/√5) (ϕ^n − (−1)^n/ϕ^n)    (6)

This result is known as Binet’s formula.
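Binet’s formula can be tested directly against the recursion; a Python sketch, where floating-point ϕ is accurate enough for moderate n, so we round to the nearest integer:

```python
import math

phi = (1 + math.sqrt(5)) / 2

def binet(n):
    # F_n = (phi^n - (-1)^n / phi^n) / sqrt(5), formula (6)
    return round((phi ** n - (-1) ** n / phi ** n) / math.sqrt(5))

print([binet(n) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```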
We return to general Fibonacci sequences with

fn = C1 ϕ^n + (−1)^n C2/ϕ^n

Since ϕ > 1, it follows that 1/ϕ^n → 0, and in the long run the sequence fn behaves like fn ≈ C1 ϕ^n (provided C1 ≠ 0), with small oscillations about this leading term. It is now easy to see what will happen to the ratio of consecutive terms. Dividing numerator and denominator by C1 ϕ^{n−1}, we get

rn = fn / fn−1 = (C1 ϕ^n + (−1)^n C2/ϕ^n) / (C1 ϕ^{n−1} + (−1)^{n−1} C2/ϕ^{n−1})
   = (ϕ + (−1)^n (C2/C1)/ϕ^{2n−1}) / (1 + (−1)^{n−1} (C2/C1)/ϕ^{2n−2})
   → ϕ

as n → ∞, since ϕ > 1.
Every sequence fn given by the iterative formula (1) depends on the
seeds f0 and f1 , and we can define f (a, b) as the sequence given by
fn+1 = fn + fn−1
f0 = a , f1 = b
This function f assigns to each pair (a, b) the entire sequence defined by
the iteration formula. For instance, f (0, 1) represents the usual Fibonacci
sequence 0, 1, 1, 2, 3, 5, 8, etc., while f (1, −1) represents 1, −1, 0, −1, −1,
−2. It is left as an exercise to show that f is a linear function from the two
dimensional space into the space of infinite sequences.
Exercises
1. Compute F8 from Binet’s formula (6) without a calculator.
2. Find an explicit formula for fn when f0 = 1 and f1 = ϕ.
3. Can you find seeds f0 and f1 so that the Fibonacci numbers oscillate
between positive and negative? This question was asked before, but
we are now in a better position to answer it.
4. Show that f (a+c, b+d) = f (a, b)+f (c, d) and that f (ca, cb) = cf (a, b).
1.4 Sums and Sums of Squares of Fibonacci Numbers
This section contains proofs for the sums and sums of squares of Fibonacci
numbers, and we begin with two exercises.
1. Experiment with EXCEL to find a formula for Σ_{k=0}^{n} fk.
2. Experiment with EXCEL to find a formula for Σ_{k=0}^{n} fk^2.

This experimentation should give you an idea of what results to expect. Clearly, the values of the sums should depend on the initial data f0 and f1.
We start with a formula for Σ_{k=0}^{n} fk. The next steps reveal the general pattern.
f0 = f2 − f1
f1 + f0 = f1 + f2 − f1 = f3 − f1
f2 + f1 + f0 = f2 + f3 − f1 = f4 − f1
f3 + f2 + f1 + f0 = f3 + f4 − f1 = f5 − f1
Theorem. Suppose that the numbers fn are defined by the relationship

fn+1 = fn + fn−1

with arbitrary seeds f0 and f1. Then

Σ_{k=0}^{n} fk = fn+2 − f1    (7)

Proof. Induction. The formula holds for n = 0, 1, 2, 3 as seen above. Suppose that it is valid for all values up to n − 1. Then

Σ_{k=0}^{n} fk = fn + Σ_{k=0}^{n−1} fk = fn + (fn+1 − f1) = fn+2 − f1
For the special case where f0 = 0 and f1 = 1 we have the standard Fibonacci numbers, and the formula becomes

Σ_{k=0}^{n} Fk = Σ_{k=1}^{n} Fk = Fn+2 − 1    (8)

This formula tells us that in order to find the sum of the first n Fibonacci numbers, we just calculate the next two Fibonacci numbers and subtract 1 from the result.

Example:

1 + 1 + 2 + 3 + 5 + 8 + · · · + 55 + 89 = 233 − 1 = 232

since the next two numbers are 55 + 89 = 144 and 144 + 89 = 233.
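Formula (7) can be spot-checked for arbitrary seeds; a Python sketch, with seeds 4 and 0 as an arbitrary choice:

```python
def fib_seq(n, f0, f1):
    # f_0, ..., f_n for f_{k+1} = f_k + f_{k-1}
    seq = [f0, f1]
    while len(seq) <= n:
        seq.append(seq[-1] + seq[-2])
    return seq

n, f0, f1 = 10, 4, 0
seq = fib_seq(n + 2, f0, f1)      # we need f_{n+2} for the right-hand side
lhs = sum(seq[: n + 1])           # f_0 + f_1 + ... + f_n
rhs = seq[n + 2] - f1             # f_{n+2} - f_1, formula (7)
print(lhs, rhs)                   # 356 356
```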
Now let us look at the sum of squares. A simple manipulation reveals the pattern:

f0^2 = f0^2 + f1 f0 − f1 f0 = f1 f0 − (f1 − f0) f0
f1^2 + f0^2 = f1^2 + f1 f0 − (f1 − f0) f0 = (f1 + f0) f1 − (f1 − f0) f0 = f2 f1 − (f1 − f0) f0
f2^2 + f1^2 + f0^2 = f2^2 + f2 f1 − (f1 − f0) f0 = (f2 + f1) f2 − (f1 − f0) f0 = f3 f2 − (f1 − f0) f0
Theorem. Suppose that the numbers fn are defined by the relationship

fn+1 = fn + fn−1

with arbitrary seeds f0 and f1. Then

Σ_{k=0}^{n} fk^2 = fn+1 fn − (f1 − f0) f0    (9)

Proof. Induction. The formula holds for n = 0, 1, 2 as seen above. Suppose that it is valid for all values up to n − 1. Then

Σ_{k=0}^{n} fk^2 = fn^2 + Σ_{k=0}^{n−1} fk^2 = fn^2 + fn fn−1 − (f1 − f0) f0 = (fn + fn−1) fn − (f1 − f0) f0 = fn+1 fn − (f1 − f0) f0
For the special case where f0 = 0 and f1 = 1 we have the traditional Fibonacci numbers, and the formula becomes

Σ_{k=0}^{n} Fk^2 = Σ_{k=1}^{n} Fk^2 = Fn+1 Fn    (10)
The graph below provides a graphical proof of this result.
Figure 4: Sum of Squares
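As with formula (7), formula (9) can be spot-checked numerically; a Python sketch, with seeds 2 and 3 as an arbitrary choice:

```python
def fib_seq(n, f0, f1):
    # f_0, ..., f_n for f_{k+1} = f_k + f_{k-1}
    seq = [f0, f1]
    while len(seq) <= n:
        seq.append(seq[-1] + seq[-2])
    return seq

n, f0, f1 = 10, 2, 3
seq = fib_seq(n + 1, f0, f1)
lhs = sum(f * f for f in seq[: n + 1])        # f_0^2 + ... + f_n^2
rhs = seq[n + 1] * seq[n] - (f1 - f0) * f0    # f_{n+1} f_n - (f_1 - f_0) f_0, formula (9)
print(lhs, rhs)                               # 87839 87839
```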
Exercises

1. Let f0 = 4, f1 = 0, and fn+1 = fn + fn−1. Compute Σ_{k=0}^{10} fk and Σ_{k=0}^{10} fk^2.
2. Let f0 = 1, f1 = −1, and fn+1 = fn + fn−1. Compute Σ_{k=0}^{10} fk and Σ_{k=0}^{10} fk^2.
3. Compute Σ_{k=1}^{10} Fk and Σ_{k=1}^{10} Fk^2.

1.5 Power Series with Fibonacci Coefficients
This section takes us to power series. Question: What can we say about the power series whose coefficients are Fibonacci numbers? Let

S(x) = Σ_{n=0}^{∞} Fn x^n = Σ_{n=1}^{∞} Fn x^n = x + x^2 + 2x^3 + 3x^4 + 5x^5 + 8x^6 + 13x^7 + 21x^8 + · · ·
The function S(x) is defined as a power series, and as such it will have an interval and a radius of convergence. It clearly converges at x = 0 with S(0) = 0, and it diverges at x = 1, since S(1) represents an infinite sum of Fibonacci numbers; therefore the radius of convergence will be less than one. We will investigate three questions:

Figure 5: S(x) along with its 3rd and 8th degree approximations
1. Can we find an explicit formula for S(x)?
2. What is the radius of convergence of the power series?
3. Can we generalize our findings to arbitrary Fibonacci sequences?
Calculation of S(x):

S(x) = Σ_{n=1}^{∞} Fn x^n = x + Σ_{n=2}^{∞} Fn x^n
     = x + Σ_{n=1}^{∞} Fn+1 x^{n+1}
     = x + x Σ_{n=1}^{∞} (Fn + Fn−1) x^n
     = x + x S(x) + x Σ_{n=0}^{∞} Fn x^{n+1}
     = x + x S(x) + x^2 S(x)

Hence

(1 − x − x^2) S(x) = x

and

S(x) = x / (1 − x − x^2)
We apply the ratio test to find the radius of convergence:

|Fn+1 x^{n+1}| / |Fn x^n| = (Fn+1/Fn) |x| → ϕ |x|

Thus the series will converge for |x| < 1/ϕ, and 1/ϕ = 0.618033989 . . . is our radius of convergence.
In order to investigate the third question, we study the function

s(x) = Σ_{n=0}^{∞} fn x^n

where fn+1 = fn + fn−1 with arbitrary starting values f0 and f1. Applying similar methods, it can be shown that

s(x) = (f0 + (f1 − f0) x) / (1 − x − x^2)

The details are left as an exercise. Note that when f0 = 0 and f1 = 1, we obtain s(x) = S(x), as we should.
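Inside the radius of convergence, partial sums of the series can be compared with the closed form; a Python sketch, where the seeds and the evaluation point x = 0.3 are arbitrary choices with |x| < 1/ϕ:

```python
def s_closed(x, f0, f1):
    # closed form s(x) = (f0 + (f1 - f0) x) / (1 - x - x^2)
    return (f0 + (f1 - f0) * x) / (1.0 - x - x * x)

def s_partial(x, f0, f1, terms):
    # partial sum f0 + f1 x + f2 x^2 + ... with f_{n+1} = f_n + f_{n-1}
    f_prev, f_cur, total = f0, f1, f0
    for n in range(1, terms):
        total += f_cur * x ** n
        f_prev, f_cur = f_cur, f_prev + f_cur
    return total

x, f0, f1 = 0.3, 2.0, 1.0     # |x| = 0.3 < 1/phi, so the series converges
print(s_partial(x, f0, f1, 60), s_closed(x, f0, f1))
```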
Exercises:

1. Does the series S(x) = Σ_{n=1}^{∞} Fn x^n converge at the endpoints x = ±1/ϕ?
2. Evaluate the series Σ_{n=1}^{∞} Fn/x^n. For which x does the series converge?
3. Verify that s(x) = (f0 + (f1 − f0) x) / (1 − x − x^2).

1.6 Matrix Form of Fibonacci Sequences
In this section we express the Fibonacci sequence using matrix multiplication. To this end we define a vector xn which contains the n-th Fibonacci number fn in its first component and fn−1 in its second component, i.e.

xn = [fn; fn−1]

where [u; v] denotes the column vector with first entry u and second entry v.
In order to calculate the next vector, we have to apply the iteration formula to obtain the next Fibonacci number, and to bump fn into the second entry. This process can be implemented using matrix multiplication (writing 2×2 matrices as [a b; c d], with the semicolon separating rows):

[fn+1; fn] = [fn + fn−1; fn] = [1 1; 1 0] [fn; fn−1]

or, in short form,

xn+1 = A xn    (11)

where A = [1 1; 1 0].
"
Now we are ready to build our Fibonacci sequence. Set x1 =
f1
f0
#
,
and the rest is matrix multiplication (apply (11) repeatedly):
x2 = Ax1
x3 = Ax2 = A2 x1
x4 = Ax3 = A3 x1
..
.
xn = Axn−1 = An−1 x1
(12)
This result shows that we can compute xn, and thus fn, immediately for any set of seeds once A^{n−1} is known. Rather than multiplying the matrix by itself (n − 1) times, we will develop a formula for A^n for any n. We have

A = [1 1; 1 0]    A^2 = [2 1; 1 1]    A^3 = [3 2; 2 1]
A^4 = [5 3; 3 2]    A^5 = [8 5; 5 3]

and we notice that the Fibonacci numbers begin to emerge. In the exercises you will be asked to confirm (prove) that

A^n = [Fn+1 Fn; Fn Fn−1]
We can use this result to compute Fibonacci numbers for any set of initial data. For instance, let us find f10 when f0 = 1 and f1 = −1. We have, according to (12),

[f10; f9] = [1 1; 1 0]^9 [f1; f0] = [55 34; 34 21] [−1; 1] = [−21; −13]

Hence f10 = −21, and as a byproduct we find that f9 = −13.
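This computation can be replayed numerically; a Python sketch with a hand-rolled 2×2 product, so no external libraries are assumed:

```python
def matmul2(A, B):
    # product of two 2x2 matrices stored as [[a, b], [c, d]]
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

A = [[1, 1], [1, 0]]
P = [[1, 0], [0, 1]]          # identity matrix
for _ in range(9):            # P = A^9
    P = matmul2(P, A)

f1, f0 = -1, 1
f10 = P[0][0] * f1 + P[0][1] * f0     # first component of A^9 [f1; f0]
f9 = P[1][0] * f1 + P[1][1] * f0
print(P, f10, f9)             # A^9 = [[55, 34], [34, 21]], f10 = -21, f9 = -13
```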
In practice, multiplying a matrix by itself n times is very time-consuming; computing the Fibonacci numbers Fn first and substituting them into A^n is much more efficient. Moreover, the matrix product always produces a vector with the desired Fibonacci number plus an additional entry, which is more computational effort than necessary.
Let us take a look at xn = A^{n−1} x1 again. When we spell it out, we obtain

[fn; fn−1] = [1 1; 1 0]^{n−1} [f1; f0] = [Fn Fn−1; Fn−1 Fn−2] [f1; f0]

This implies (just focus on the first component) that

fn = Fn f1 + Fn−1 f0    (13)
and we are now able to compute any fn , provided that we know the seeds
f0 and f1 and that we are given a complete listing of the common Fibonacci
numbers Fn .
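Formula (13) can be sketched in a few lines of Python, reusing a table of the traditional Fibonacci numbers:

```python
# table of traditional Fibonacci numbers F_0 ... F_12
F = [0, 1]
while len(F) < 13:
    F.append(F[-1] + F[-2])

def f(n, f0, f1):
    # formula (13): f_n = F_n f_1 + F_{n-1} f_0
    return F[n] * f1 + F[n - 1] * f0

print([f(n, 1, -1) for n in range(1, 11)])
# [-1, 0, -1, -1, -2, -3, -5, -8, -13, -21], so f_10 = -21 as before
```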
In Numerical Analysis, the process of repeated multiplication by a matrix
is known as the power method and it is used to compute the dominant
eigenvalue of a matrix along with an associated eigenvector. This method
is efficient for sparse matrices, that is, for large square matrices with lots of
zero entries. We will discuss the power method later. At this point let us
just recall the basics of eigenvalues and eigenvectors. λ is an eigenvalue of
the matrix A if there exists a non-zero vector x such that
Ax = λx
holds. Then x is called an eigenvector of A. The above equation implies
that
(λI − A)x = λx − Ax = 0
and since x ≠ 0, it follows that the matrix λI − A must be singular, and
thus its determinant must be zero. Hence, we can find the eigenvalues of A
from
p(λ) = det(λI − A) = 0
The function p(λ) = det(λI − A) is called the characteristic polynomial
of A.
Returning to Fibonacci sequences, let us find the eigenvalues and eigenvectors of A = [1 1; 1 0]. The characteristic polynomial is

p(λ) = det([λ−1 −1; −1 λ]) = (λ − 1)λ − 1 = λ^2 − λ − 1

and the problem p(λ) = 0 takes us back to the familiar quadratic equation (4), with solutions λ = ϕ and λ = −1/ϕ. These are the eigenvalues of A.
Now let us look at the eigenvectors. Ax = ϕx leads to x = [ϕ; 1] as a possible eigenvector (any multiple of this vector is an eigenvector as well), and x = [1; −ϕ] is an eigenvector for λ = −1/ϕ. The details are left as an exercise.
Notice that if we start with f0 = 1 and f1 = ϕ (these are the components of the first eigenvector), we get

f2 = f0 + f1 = 1 + ϕ = ϕ^2
f3 = f1 + f2 = ϕ + ϕ^2 = (1 + ϕ)ϕ = ϕ^2 · ϕ = ϕ^3
f4 = f2 + f3 = ϕ^2 + ϕ^3 = (1 + ϕ)ϕ^2 = ϕ^2 · ϕ^2 = ϕ^4

and so on. For these initial data, the iteration reduces to multiplication by ϕ in each step, which in essence is equivalent to Ax = ϕx.
We will review eigenvalues and eigenvectors more thoroughly in a later
chapter.
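Both eigenpairs can be verified numerically; a Python sketch, where `apply_A` is our own helper:

```python
phi = (1 + 5 ** 0.5) / 2

def apply_A(v):
    # multiply A = [1 1; 1 0] with the vector v = (v0, v1)
    return (v[0] + v[1], v[0])

v = (phi, 1.0)                      # eigenvector for lambda = phi
Av = apply_A(v)
print(Av, (phi * v[0], phi * v[1]))

w = (1.0, -phi)                     # eigenvector for lambda = -1/phi
Aw = apply_A(w)
print(Aw, (-w[0] / phi, -w[1] / phi))
```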
Exercises:

1. Show that A^n = [Fn+1 Fn; Fn Fn−1].
2. Find f10 when f0 = 4 and f1 = 1.
3. Investigate the sequence generated by f0 = 1 and f1 = −1/ϕ and show that fn = (−1)^n/ϕ^n.
4. Given that Σ_{k=0}^{n} Fk = Σ_{k=1}^{n} Fk = Fn+2 − 1 holds, use formula (13) to show that Σ_{k=0}^{n} fk = fn+2 − f1 is valid.
5. Derive that x = [ϕ; 1] and x = [1; −ϕ] are the eigenvectors of A = [1 1; 1 0].

2 Fixed Point Iterations
Suggested Reading:
Mooney/Swift: Chapter 1, Discrete Dynamical Systems.
Fixed Point Article on Wikipedia
http://en.wikipedia.org/wiki/Fixed_point_(mathematics)
Discrete population models often take the form pn+1 = g(pn) for some function g, where pn represents the population in the n-th time period. For instance, exponential population growth is induced by pn+1 = pn + r pn = (1 + r) pn, and g becomes g(x) = (1 + r)x; for logistic growth we have the relationship pn+1 = pn + r(1 − pn/K) pn, and we obtain g(x) = x + r(1 − x/K)x. Iterations of the form xn+1 = g(xn) are broadly used in many areas of mathematics, and we devote this chapter to a general study of fixed point iterations.
2.1 Fixed Points
The main focus of this chapter is iterative sequences defined by

xn+1 = g(xn)    with seed x0    (14)

The function g is called the iteration function.
Examples:

1. The sequence (3) leading to the golden ratio in the last section was defined by

rn+1 = 1 + 1/rn

Here the iteration function is g(x) = 1 + 1/x.
2. Pick a number of your choice, and take the cosine of it. Then take the cosine of the result, and repeat the process. If your calculator is set to radians, you will eventually get closer and closer to the number 0.739085. What happens here? The number of your choice is the seed x0, and then you apply the iterations xn+1 = cos xn.
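This experiment is easy to reproduce; a Python sketch:

```python
import math

x = 2.0                  # any seed will do
for _ in range(100):     # iterate x_{n+1} = cos(x_n)
    x = math.cos(x)

print(x)                 # close to 0.739085...
```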
3. Newton’s Method in Calculus is defined as

xn+1 = xn − f(xn)/f′(xn)

This yields an iterative sequence with iteration function

g(x) = x − f(x)/f′(x)    (15)
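As a concrete illustration (the choice f(x) = x^2 − 2 is ours, not from the text), Newton’s Method drives the iterates to √2; a Python sketch:

```python
def newton_step(x):
    # g(x) = x - f(x)/f'(x) for f(x) = x^2 - 2
    return x - (x * x - 2.0) / (2.0 * x)

x = 1.0
for _ in range(6):
    x = newton_step(x)

print(x)    # approximately sqrt(2) = 1.414213...
```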
For the first example we saw that the sequence converged to the golden ratio ϕ; in the second example we had a limit of about 0.739085. What is the significance of these numbers? They have the property that g(x) = x: once we get there, the values do not change anymore.
Definition: The number p is a fixed point of the iteration function g if

p = g(p)    (16)

holds.
A given function g may have just one, or many, or no fixed points. It all
depends on g.
Exercises:

1. Find the fixed points of the function g(x) = √x.
2. What are the fixed points of g(x) = x^3?
3. Let a > 0 and define g(x) = (1/2)(x + a/x). Find the fixed point of g.
4. Let g(x) = 1/x. Select an initial value x0, and compute a few iterates. What are the fixed points of g?
5. Give an example of a function with (a) infinitely many fixed points and (b) without any fixed points.
6. Show that p is a fixed point of g in Newton’s Method (15) exactly when f(p) = 0. After all, Newton’s Method is designed to find roots of a function.
2.2 Graphical Methods and Cobwebs
Here we turn to geometrical solutions of our fixed point problem. Since p
satisfies p = g(p), a fixed point can be found as the intersection of the graph
of y = g(x) with the line y = x.
Cobwebs can be used to obtain a graphical representation of fixed point
sequences. How do they work? Suppose that you have graphed the function
y = g(x) and the diagonal y = x. If you have decided on a starting point
x0 , you can easily find the point x1 = g(x0 ) on the y-axis. But in order to
sketch the next point you need to move the point back to the x-axis. This
is done by going horizontally along the line y = x1 to the intersection with
the diagonal y = x and then proceeding vertically to the x-axis.
Once you get the hang of it, the intersections with the coordinate axes
can be omitted. The steps are: move vertically to the intersection with the
graph of g, then horizontally to the intersection with the diagonal, again
vertically to the graph of g, and repeat as needed.
Example (Golden Ratio Iteration): Let g(x) = 1 + 1/x. Let’s say that we take x0 = 2/3 as our seed. Then x1 = g(x0) = 5/2, and the next point is x2 = g(x1) = 7/5. The cobweb contains the vertical and horizontal line segments connecting the points (2/3, 0), (2/3, 5/2), (5/2, 5/2), (5/2, 7/5), (7/5, 7/5), and so on. This process is easier carried out than described verbally, and the graph is given below. The web spirals in toward the fixed point (ϕ, ϕ).
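The vertices of this cobweb can be generated mechanically; a Python sketch using exact fractions, where `cobweb_points` is our own helper:

```python
from fractions import Fraction

def g(x):
    # the golden ratio iteration function g(x) = 1 + 1/x
    return 1 + 1 / x

def cobweb_points(x0, steps):
    # vertices of the cobweb: (x0, 0), then alternately a point on the
    # graph of g and a point on the diagonal y = x
    pts = [(x0, Fraction(0))]
    x = x0
    for _ in range(steps):
        y = g(x)
        pts.append((x, y))    # vertical segment to the graph of g
        pts.append((y, y))    # horizontal segment to the diagonal
        x = y
    return pts

print(cobweb_points(Fraction(2, 3), 2))
# the points (2/3, 0), (2/3, 5/2), (5/2, 5/2), (5/2, 7/5), (7/5, 7/5)
```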
Fixed point iterations do not always converge, even if the function possesses fixed points. And if convergence occurs, sometimes the iterations stay
on the same side as the fixed point, sometimes they oscillate about the fixed
point.
There are numerous illustrations for cobwebs on the internet, see for
instance the one on Wikipedia:
http://en.wikipedia.org/wiki/Cobweb_plot
Exercises:
You are encouraged to use EXCEL to find the numerical values in the calculations.
Figure 6: Cobweb for g(x) = 1 + 1/x

1. Consider the iteration function g(x) = √x. Study the iterations and sketch the cobwebs for the seeds x0 = 0.1 and x0 = 10.
2. Construct the sequence with starting value x0 = 1 and iteration formula xn+1 = (1/2)xn + 2/xn, and sketch the cobweb.
3. Sketch the cobweb when g(x) = (1/3)x^2 + (1/3)x − 1. What are the fixed points? Do the sequences converge to the fixed points?
4. Perform a cobweb analysis for the iteration function g(x) = (1/2)x^2 − 3/2.
2.3 The Fixed Point Theorem
In this section we will investigate convergence of the fixed point iterations.
In the last set of exercises you encountered a variety of possible scenarios:
Convergence to a fixed point from just one direction, oscillations about a
fixed point, divergence away from a fixed point, very fast and very slow
convergence, among other things. Here we will lay the mathematical foundations which help us to predict the behavior of the sequence.
Our analysis will be local, that is, we will assume that somehow we are
already near a fixed point, and we want to know whether we are closing
in or whether we will move away from it. Global analysis asks a different
question, namely, what is the set of possible initial values x0 , such that the
iterations will converge? Generally global convergence conditions are much
harder to come by, although in some of our examples we can make global
convergence statements.
The fixed point theorem is based on the following observation: Let g be a differentiable function with fixed point p, and suppose that an iterative sequence is defined by xn+1 = g(xn). Then the Mean Value Theorem (see the following subsection for a review) implies

xn+1 − p = g(xn) − g(p) = g′(ξn)(xn − p)    (17)

where the number ξn lies between the iterate xn and the fixed point p. This formula is at the heart of our analysis: the derivative governs whether the iterates get closer to the fixed point or whether they move away from it.
Fixed Point Theorem: Suppose that g is a continuously differentiable function with a fixed point p. If |g′(p)| < 1, then the sequence defined by xn+1 = g(xn) converges to p if the seed x0 is sufficiently close to p.

Proof: Since |g′(p)| < 1, and since g′ is continuous, there exists an interval of the form I = [p − a, p + a], where a > 0, such that |g′(x)| ≤ L for all x ∈ I for some constant L < 1. The actual value of L is not important. Here is the rationale for this assertion: Since |g′(p)| < 1, it is always possible to squeeze a number L between |g′(p)| and 1; for instance, the average L = (1 + |g′(p)|)/2 will do the trick. Furthermore, g′ is continuous and cannot jump past L immediately; hence there must be an interval extending on both sides of p such that |g′(x)| ≤ L holds on that interval. Finally, there is no reason to assume that the largest such interval is symmetric about p, so we narrow it down to make it symmetric and of the form I = [p − a, p + a] for some positive number a.
Figure 7: Graph of |g′(x)|
Now assume that x0 ∈ [p − a, p + a]. Then, using formula (17), we get

|x1 − p| = |g′(ξ0)| |x0 − p| ≤ L |x0 − p| < |x0 − p|    (18)

The inequality holds since the point ξ0 lies between p and x0, and therefore it belongs to the interval I where |g′(x)| ≤ L. The inequality also shows that x1 is closer to the fixed point than x0. Now we repeat the process:

|x2 − p| = |g′(ξ1)| |x1 − p| ≤ L |x1 − p| ≤ L^2 |x0 − p|
|x3 − p| = |g′(ξ2)| |x2 − p| ≤ L |x2 − p| ≤ L^3 |x0 − p|

and inductively we find that

|xn − p| ≤ L^n |x0 − p|

But L < 1, hence L^n → 0, which implies that

lim_{n→∞} xn = p

and our proof is complete.
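The geometric error bound |xn − p| ≤ L^n |x0 − p| can be watched in action for g(x) = cos x, whose fixed point appeared in the examples of Section 2.1; the error ratios settle near |g′(p)| = sin p ≈ 0.674. A Python sketch:

```python
import math

p = 0.7390851332151607   # fixed point of cos x, to double precision
x = 1.0
errors = []
for _ in range(10):
    x = math.cos(x)
    errors.append(abs(x - p))

ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(ratios)   # the ratios hover near |g'(p)| = sin(p), about 0.674
```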
Comments: Some remarks are in order before we look at some examples.

1. The constant L used in the proof is usually called a "Lipschitz" constant, named for the mathematician Rudolf Lipschitz (1832 - 1903). A function f is said to satisfy a Lipschitz condition if there is a constant L such that |f(x) − f(y)| ≤ L |x − y| holds for all x and y under consideration. Such a function will always be continuous, but it may not be differentiable. On the other hand, if we know that the function is continuously differentiable, and if we know that the derivative is bounded by a constant M, i.e. that |f′(x)| ≤ M, it follows from the Mean Value Theorem that

|f(x) − f(y)| = |f′(ξ)| |x − y| ≤ M |x − y|

and the function satisfies a Lipschitz condition with L = M. Functions with Lipschitz constant L < 1 are called contractions, and the Fixed Point Theorem holds for such functions.
2. The theorem is a local result. There must be some interval about
the fixed point p, so that the iterates get sucked in toward p. If the
iterations do not converge to p, you just weren’t close enough with
your choice of x0 . It can be interesting to determine the set of all
possible x0 so that the resulting sequence converges to a given fixed
point p. This type of analysis, especially when g has several fixed
points, is used for fractals.
3. The proof can be modified for the case where |g′(p)| > 1. Again, there will be an interval I centered at p so that |g′(x)| ≥ M > 1 for all x ∈ I, and the inequality (18) is reversed:

|xn+1 − p| = |g(xn) − g(p)| = |g′(ξ)| |xn − p| ≥ M |xn − p| > |xn − p|

if xn ∈ I. This means that xn+1 is further away from p than xn was. We will call such a fixed point repelling.
4. The size of |g′(p)| matters. The inequality (18) shows that the distance between x1 and p is reduced by a factor of L when compared to the distance between x0 and p. Thus the smaller we can choose L, the faster the convergence will be.
Graphically, g′(p) is the slope of g at the point of intersection with the diagonal y = x. If the slope is close to zero, the curve g is flat, and the cobwebs converge rapidly. If the slope g′(p) = 1, the curve is tangent to the diagonal, and the convergence is slow, if it occurs at all. When |g′(p)| > 1, the cobwebs move away from the intersection.
Definition: A fixed point p of the function g is called attracting if |g′(p)| < 1. If |g′(p)| > 1, the fixed point is called repelling.
Examples: We review some exercises of this section.
1. The function g(x) = √x has fixed points p = 0 and p = 1. Since
g'(x) = 1/(2√x) we get g'(1) = 1/2. The fixed point p = 1 is attracting,
and the iterates converge toward this point. The other fixed point lies
at the endpoint of the domain of g, and since g has a vertical tangent
at x = 0, the fixed point p = 0 is repelling.
2. The function g(x) = x^3 has fixed points at p = ±1 and at p = 0. The
point p = 0 is attracting, since g'(0) = 0, and the points p = ±1 are
repelling, because g'(±1) = 3.
3. The fixed points of the function g(x) = (1/3)x^2 + (1/3)x − 1 are p = 3 and
p = −1. The fixed point p = 3 is repelling, because g'(3) = 7/3; p = −1
is attracting because g'(−1) = −1/3.
4. For g(x) = (1/2)x^2 − 3/2 the fixed points are again at p = 3 and p = −1.
g'(3) = 3, and the fixed point is repelling, but at p = −1 we have
g'(−1) = −1, and the point falls through the cracks: it is neither
attracting nor repelling. Further inspection shows that sequences with
initial point close to −1 converge to −1 very slowly.
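The classification in these examples can also be checked numerically. The notes use EXCEL for such experiments; the short Python sketch below is an alternative. The helper name `classify` and the central-difference approximation of g' are our own choices, not from the text.

```python
# Classify a fixed point p of g as attracting or repelling via |g'(p)|.
# The derivative is approximated with a central difference; the text
# computes g' exactly, so this is only a numerical stand-in.

def classify(g, p, h=1e-6):
    slope = (g(p + h) - g(p - h)) / (2 * h)  # central-difference derivative
    if abs(slope) < 1 - 1e-9:
        return "attracting"
    if abs(slope) > 1 + 1e-9:
        return "repelling"
    return "neither"

print(classify(lambda x: x**3, 0.0))              # g'(0) = 0   -> attracting
print(classify(lambda x: x**3, 1.0))              # g'(1) = 3   -> repelling
print(classify(lambda x: x**2/3 + x/3 - 1, 3.0))  # g'(3) = 7/3 -> repelling
print(classify(lambda x: x**2/2 - 1.5, -1.0))     # g'(-1) = -1 -> neither
```

The last call reproduces the borderline case of Example 4, where |g'(p)| = 1 and the theory gives no verdict.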
We will address two more questions in this section. For one, how does
the sign of g'(p) affect the behavior of the sequence? And secondly, what
happens when g'(p) = 0?
The first question is easily answered. If g'(p) is positive, then there is an
entire interval near p where g'(x) > 0. Recall equation (17) and inspect the
signs of the quantities involved
xn+1 − p = g(xn) − g(p) = g'(ξn)(xn − p)
If xn > p, we get xn − p > 0, and with g'(ξn) > 0 it follows that xn+1 > p
as well. Conversely, xn < p leads to xn+1 < p. In either case the iterates
remain on the same side of p. If on the other hand g'(p) < 0, the derivative
will be negative on an interval containing p. In this region (xn − p) and
(xn+1 − p) will have opposite signs, and consecutive terms of the sequence
will be on alternate sides of p.
Now let us consider the case where g(p) = p and g'(p) = 0. A second
order Taylor expansion of g results in
g(x) = g(p) + g'(p)(x − p) + (g''(ξ)/2)(x − p)^2 = p + (g''(ξ)/2)(x − p)^2
where ξ lies between x and p. The proof of Taylor's formula requires the
Mean Value Theorem of Integration, and you are encouraged to check your
favorite Calculus text for details. Using xn+1 = g(xn) as usual, we arrive at
xn+1 − p = g(xn) − g(p) = (g''(ξn)/2)(xn − p)^2
If the second derivative of g is bounded by some constant M we find that
|xn+1 − p| ≤ (M/2)|xn − p|^2
This feature is called quadratic convergence. Suppose, for instance, that xn
is within 0.02 units of p already, i.e. |xn − p| = 0.02; then |xn+1 − p| ≤
(M/2) × 0.0004, and if M is not too big, xn+1 is a lot closer to p than xn was.
Example: Let g(x) = 2x − x^2 with fixed points p = 0 and p = 1. Moreover,
g'(0) = 2 and g'(1) = 0. The fixed point of interest is p = 1. The table
below summarizes the iterations for x0 = 1.5.
Figure 8: g(x) = 2x − x2
Notice that in the last column (difference from the fixed point) each
value is the square of the number above. This is a very special case caused
by the fact that we are dealing with a quadratic function with g''(x) = −2.
Here we get
xn+1 − p = (g''(ξn)/2)(xn − p)^2 = −(xn − p)^2
In more general examples the quadratic decay is not quite as evident.
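The exact squaring of the error can be watched directly. This Python sketch (an alternative to the EXCEL table the text refers to) iterates g from x0 = 1.5 and records the errors e_n = |x_n − 1|:

```python
# Iterate g(x) = 2x - x^2 from x0 = 1.5 and record the error |x_n - 1|.
# Since g''(x) = -2, the analysis above gives e_{n+1} = e_n^2 exactly:
# each error is the square of the previous one (quadratic convergence).

def g(x):
    return 2 * x - x * x

x = 1.5
errors = []
for _ in range(5):
    x = g(x)
    errors.append(abs(x - 1.0))

print(errors)  # 0.25, 0.0625, 0.00390625, ...
for e_prev, e_next in zip(errors, errors[1:]):
    assert abs(e_next - e_prev**2) < 1e-12
```

After only five steps the error has dropped to about 2 × 10^−10, which is the quadratic decay the text describes.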
The appeal of Newton's method lies in its rapid, quadratic convergence.
When you apply Newton's method to the root finding problem f(x) = 0,
you use the iterative formula
xn+1 = xn − f(xn)/f'(xn)
But these are just fixed point iterations with iteration function
g(x) = x − f(x)/f'(x)
The fixed point condition g(p) = p leads directly to f(p) = 0 and thus a
fixed point of g is a root of f. For g'(x) you obtain
g'(x) = 1 − (f'(x)^2 − f(x)f''(x))/f'(x)^2 = f(x)f''(x)/f'(x)^2
and for x = p you have
g'(p) = f(p)f''(p)/f'(p)^2 = 0
since f(p) = 0. Hence the quadratic convergence of Newton's method.
Problems occur if f'(p) = 0. Then the function is rather flat at its root, and
the efficiency of Newton's method breaks down.
Of course we could also encounter the case where in addition to g'(p) = 0
we have g''(p) = 0. Then the convergence is even faster, but we shall not go
into further detail at this point.
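The Newton iteration above is easy to try out. The following Python sketch applies it to f(x) = x^3 − 8, whose root p = 2 also appears in the exercises below; the function name `newton_step` is our own.

```python
# Newton's method for f(x) = x^3 - 8, written as a fixed point iteration
# with g(x) = x - f(x)/f'(x) = x - (x^3 - 8)/(3x^2). Since g'(2) = 0,
# the iterates converge to the root p = 2 quadratically.

def newton_step(x):
    return x - (x**3 - 8) / (3 * x**2)

x = 3.0
for _ in range(6):
    x = newton_step(x)

print(x)  # essentially 2 after six steps
```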
Exercises:
1. Sketch a function g with a fixed point p and such that
(a) 0 < g'(p) < 1
(b) −1 < g'(p) < 0
(c) g'(p) = 0
(d) g'(p) = 1
(e) g'(p) > 1
(f) g'(p) < −1
and then sketch the associated cobwebs.
2. The function g(x) = cos x has one fixed point. Is it attracting or
repelling? Will the iterates oscillate about the fixed point?
3. Classify the fixed points of g(x) = x/2 + x^2.
4. Apply Newton’s method to the problem f (x) = x3 − 8.
(a) Find the iteration function g(x) and confirm that p = 2 is the
only fixed point of g.
(b) Compute g'(2).
(c) Use EXCEL to compute a few iterations, starting with x0 = 1
and keep track of the quantity xn − 2.
5. Apply Newton's method to f(x) = sin x with p = 0. Identify the
associated iteration function g(x), and compute a few iterates starting
with x0 = 0.2. Compute g'(0), g''(0) and g'''(0).
6. Suppose that a continuous function maps an interval into itself. Show
that it has at least one fixed point.
7. (a) Show that the function f(x) = sin x satisfies a Lipschitz condition (with L = 1).
(b) Find a constant L such that |x^2 − y^2| ≤ L |x − y| holds for all
0 ≤ x, y ≤ 5.
2.3.1 The Mean Value Theorem
We close this section with a review of the Mean Value Theorem from Calculus, and we look at some of its consequences.
Theorem: Suppose that a function f is continuous on the closed interval
[a, b] and continuously differentiable on the open interval (a, b). Then there
exists a number a < ξ < b such that
f'(ξ) = (f(b) − f(a))/(b − a)     (19)
holds.
Comments:
1. The Mean Value Theorem is graphically intuitive. If the function is
differentiable, then there should be a point in the interval such that
the tangent line at that point runs parallel to the line connecting the
endpoints.
2. Usually nobody cares to determine the exact location of the point ξ.
It must exist somewhere in the interval; on occasion there may even
be several points which meet the requirements.
Figure 9: Mean Value Theorem
3. The endpoints a and b in formula (19) can be interchanged; there is
no requirement that a < b.
4. The stipulation that the function be continuous on the closed interval
and differentiable on the open interval is a technicality. It is true that
differentiable functions are automatically continuous, but we waive the
differentiability requirement for the endpoints. With these conditions
functions like f(x) = √(1 − x^2) on the interval [−1, 1] are covered by
the Mean Value Theorem.
5. Proof of the Mean Value Theorem.
It is difficult to present a proof in a review section, as it is not clear
what facts from Calculus have already been established, and what
needs to be proven. Therefore we outline the proof and present the
main ideas.
(a) A continuous function on a closed interval assumes its maximum
and its minimum. This is a very powerful theorem, and its proof
uses the completeness of the real number system. A detailed
proof can be found in every Advanced Calculus text.
(b) The derivative vanishes at a relative extremum. This follows more
or less directly from the limit definition of the derivative.
(c) Rolle’s Theorem. This covers the Mean Value Theorem for the
case where f(a) = f(b), and you have to verify the existence of
a value ξ with f'(ξ) = 0. If an extremum occurs in the interior of
the interval, we have the point ξ, due to part (b). If the absolute
maximum and the absolute minimum occur at the endpoints,
the function must be constant, because f(a) = f(b), and the
derivative of a constant function is zero everywhere (you can pick
any number in (a, b) to serve as your ξ).
(d) Finally, observe that the auxiliary function
g(x) = f(x) − ((f(b) − f(a))/(b − a)) (x − a)
satisfies g(a) = g(b) (= f(a)), and that g'(ξ) = 0 is equivalent
to f'(ξ) = (f(b) − f(a))/(b − a). Now Rolle's Theorem implies the
validity of the Mean Value Theorem.
In many applications the value a in the Mean Value Theorem is held
fixed, b is replaced by the variable x, and the formula (19) is manipulated
into
f(x) − f(a) = f'(ξ)(x − a)
or
f(x) = f(a) + f'(ξ)(x − a) ,
where ξ lies between a and x. Remember that we can have either a < x, in
which case a < ξ < x, or we have x < a with x < ξ < a. The wording that
ξ lies between a and x covers both cases.
Applications:
1. (Mathematics) If the derivative of a function is positive on an interval
I, then the function is increasing on I.
Proof: Select any two points a and b in the interval such that a < b
(these points do not have to be the endpoints of the interval I). Then
f(b) = f(a) + f'(ξ)(b − a) > f(a)
The inequality follows from the fact that ξ belongs to the interval
I, and thus f'(ξ) > 0. Moreover, b − a > 0 by assumption, and
since the product of positive numbers must be positive we find that
f'(ξ)(b − a) > 0. Hence, f(b) > f(a) whenever b > a, which makes f
an increasing function.
2. (Motion) A vehicle is known to travel between 50 mph and 60 mph.
How far will it travel in 4 hours? By inspection it is easy to see that
the distance covered is somewhere between 200 and 240 miles, but
what does this have to do with the Mean Value Theorem?
Let s(t) be the position of the vehicle at time t, with s(0) as starting
position and s(4) as final position. Then the distance traveled is the
quantity s(4) − s(0). From the Mean Value Theorem we find that
s(4) − s(0) = s'(ξ)(4 − 0) = 4s'(ξ)
The velocity of the vehicle is given by v(t) = ds/dt, and since the speed
is known to be between 50 and 60 mph, we have 50 ≤ s'(ξ) ≤ 60 for any
0 < ξ < 4. Thus, 200 ≤ 4s'(ξ) ≤ 240, and our claim is confirmed.
Exercises:
1. Although the exact location of ξ is usually not relevant, determine it
for the following cases. This should help with the understanding of
the Mean Value Theorem.
(a) f(x) = x^2 − 2x on [0, 4],
(b) f(x) = √x on [0, 1],
(c) f(x) = x + sin x on [0, 2π].
2. (Speeding ticket). A truck drives past a state police car in a 70 mph
zone of highway. 10 minutes later and 13 miles down the road it
passes another patrol car and gets pulled over for speeding. Although
equipped with good vision and a radar detector, the driver observed the
speed limit as he drove by the patrol cars, but he gets a ticket anyway.
Use the Mean Value Theorem to convict the trucker.
Assume that the cops were in radio contact and timed the trucker.
2.4 Inverse Iterations and Double Steps
On occasion it is necessary to look at backward iterations.
Example: Consider the iteration function g(x) = 1 + 1/x and find all initial
points x0 for which the sequence defined by xn+1 = g(xn) will terminate after
a finite number of steps.
The first such point is x0 = 0, since g(0) is undefined. A second such
point would be a value x0 such that g(x0) = 0, which results in x0 = −1.
The next question is to find x0 such that g(x0) = −1, and so on.
If we proceed in this fashion, we generate a sequence of points yn such
that yn = g(yn+1). In particular, y0 = 0, y1 = −1, and y2 is found from
g(y2) = −1, which leads to
−1 = g(y2) = 1 + 1/y2
which implies that
y2 = −1/2
This is an example where we look at iterations in reverse direction.
Generally, a backward iteration satisfies the relationship
xn = g(xn+1)     (20)
which leads us to inverse functions.
Recall that the function h is the inverse of the function g if
y = h(x)
is equivalent to
x = g(y)
Applying this condition twice we find that g(h(x)) = g(y) = x for all x in
the domain of g and that h(g(y)) = h(x) = y holds for all y in the range of
g. Also recall that a function has an inverse if and only if it is one-to-one,
and that the graph of a function and its inverse are mirror images about the
diagonal y = x.
In returning to our iteration analysis, let h be the inverse function of g.
Then (20) implies that
xn+1 = h(g(xn+1 )) = h(xn )
and the backward iteration is driven by the inverse function of g, which was
to be expected.
Now let p be a fixed point of g. Then
h(p) = h(g(p)) = p
and we see that g and its inverse have the same fixed points. This can also
be explained by the symmetry of their graphs. The graphs of a function and
its inverse cross the diagonal y = x at the same locations.
Now we take the derivative on both sides of the identity
x = h(g(x))
and we obtain
1 = h'(g(x)) g'(x)
At a fixed point p this leads to
1 = h'(g(p)) g'(p) = h'(p) g'(p)
and thus
h'(p) = 1/g'(p)
It now follows that the fixed points have reversed characteristics. If |g'(p)| <
1 then |h'(p)| > 1, and p is attracting for g but repelling for h, and vice versa.
Again, this result is intuitive: If the original iterations move toward p, the
backward iterations should move away from it. The case g'(p) = 0 needs to
be treated separately.
Sometimes it is useful to take double steps.
Example: Let g(x) = 3.2x(1 − x) and take x0 = 0.3. Then, by looking at the
data, we notice that the iterates eventually oscillate between 0.80 and 0.51,
approximately. Neither one of these values is a fixed point of g, but if we
gather the data with even and odd subscripts separately, we might have two
convergent sequences.
Figure 10: g(x) = 3.2x(1 − x)
Taking a double step means that each time we apply the function g
twice, and then record the values. This is tantamount to working with the
iteration function
h(x) = g(g(x)) = (g ◦ g)(x)
Here h is the composition of g with itself. The new function h inherits all
the fixed points of g, because if p is a fixed point of g then
h(p) = g(g(p)) = g(p) = p
But the function h may have additional fixed points. If p is a fixed point of
h, then q = g(p) is a fixed point of h as well (exercise), and we call q the
companion fixed point of p. If p is also a fixed point of g, then p is its own
companion (exercise).
In order to check the nature of the fixed points of h, we find from the
chain rule that
h'(x) = g'(g(x)) g'(x)
For a fixed point p of h with companion fixed point q = g(p) it follows that
h'(p) = g'(g(p)) g'(p) = g'(q) g'(p) = h'(q)
By symmetry it follows that p and its companion q have the same slopes. If
p is also a fixed point of g, we have that p = q, and h'(p) = g'(p)^2. Hence
an attracting fixed point of g is also attracting for h, and conversely, if it is
repelling for g then it is repelling for h as well.
Example: We return to g(x) = 3.2x(1 − x). This function has two fixed
points, namely p = 0 and p = 11/16 = 0.6875, both of which are repelling, since
g'(0) = 3.2 and g'(0.6875) = −1.2.
When we form the function h we find that
h(x) = g(g(x)) = 3.2 g(x)(1 − g(x))
     = 10.24 x(1 − x)(1 − 3.2x(1 − x))
     = 10.24 (x − 4.2x^2 + 6.4x^3 − 3.2x^4)
As expected, h(0) = 0 and h(0.6875) = 0.6875, and h inherits the fixed
points of g. But (using numerical calculations) we can find two more fixed
points of h, namely p = 0.513045 and q = 0.799455. These are companion
fixed points, since g(p) = q and g(q) = p.
Figure 11: g(x) and h(x)
Upon differentiation we obtain
h'(x) = 10.24 (1 − 8.4x + 19.2x^2 − 12.8x^3)
h'(0) = 10.24 = 3.2^2 and h'(0.6875) = 1.44 = (−1.2)^2, as predicted by our
analysis. At the additional fixed points we have h'(0.513) = h'(0.799) =
0.159, and both new points are attracting for h.
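These numerical claims can be verified with a few lines of Python (the notes use EXCEL for such checks; the variable names below are ours). The two companion fixed points are computed from the quadratic factor of h(x) − x, and the orbit from x0 = 0.3 is iterated to see that it settles into the 2-cycle:

```python
# Check the double-step analysis for g(x) = 3.2 x (1 - x): the companion
# fixed points of h = g(g(x)) are the roots of
#   r^2 x^2 - r(r+1) x + (r+1) = 0   with r = 3.2,
# g swaps them, and the iterates from x0 = 0.3 approach the 2-cycle.

import math

r = 3.2
disc = math.sqrt((r - 1)**2 - 4)
p1 = (r + 1 - disc) / (2 * r)   # approx. 0.513045
p2 = (r + 1 + disc) / (2 * r)   # approx. 0.799455

def g(x):
    return r * x * (1 - x)

assert abs(g(p1) - p2) < 1e-12 and abs(g(p2) - p1) < 1e-12  # companions

x = 0.3
for _ in range(200):
    x = g(x)
print(min(abs(x - p1), abs(x - p2)))  # tiny: the orbit sits on the cycle
```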
Exercises:
1. Let g(x) = 1 + 1/x.
(a) Compute the inverse function of g.
(b) Compute the fixed points of g and its inverse.
(c) Analyze the nature of the fixed points of g and its inverse.
2. Perform a fixed point analysis for the linear function g(x) = mx + b.
(a) Under which conditions does g have exactly one fixed point, and
what is its value?
(b) Under which conditions is the fixed point of g attracting?
(c) Under which conditions does g have an inverse function?
(d) Under which conditions does the inverse of g have exactly one
fixed point?
(e) Under which conditions is the fixed point of the inverse attracting?
(f) Summarize your findings.
3. Let g(x) = 4/x.
(a) Find the fixed points of g.
(b) Find the fixed points of h = g ◦ g.
4. (a) Let g be an iteration function, and let h = g ◦ g be the double step
function. Show that if p is a fixed point of h, then its companion
q = g(p) is a fixed point of h as well.
(b) Show that the companion of the companion is the original point.
(c) Show that p is a fixed point of g if and only if p is its own companion.
2.5 The Logistic Map
Further Reading: Wikipedia Article on the Logistic Map
http://en.wikipedia.org/wiki/Logistic_map
In the literature the function
g(x) = r x (1 − x) ,   0 < r ≤ 4     (21)
is called the logistic map, and its fixed points have been studied extensively.
The web page http://en.wikipedia.org/wiki/Cobweb_plot contains
an animation on this subject. In the previous section we considered the
special case r = 3.2 in our example.
The graph of the function g is a parabola which opens downward, with
x-intercepts at x = 0 and x = 1. Its maximum value is assumed at
the vertex with g(1/2) = r/4 ≤ 1, since by assumption r ≤ 4. If we start our
iterations in the unit interval I = [0, 1], then all subsequent terms will remain
in I, since 0 ≤ g(x) ≤ 1 for all x ∈ I.
Clearly, p = 0 is a fixed point of g, and the second, non-trivial fixed
point is located at
p = 1 − 1/r ,
the details are left as an exercise. This fixed point lies to the left of [0, 1]
when r < 1, and as r approaches 1 it approaches p = 0. For 1 < r ≤ 4, it
belongs to the interval [0, 1]; in fact p ≤ 3/4. The derivative of g is
g'(x) = r − 2rx
and we have
g'(0) = r   and   g'(1 − 1/r) = 2 − r     (22)
We are now ready for a case study. If 0 < r < 1, the non-trivial fixed
point does not belong to [0, 1], and the fixed point p = 0 is attracting. Things
change significantly when r > 1. Now p = 0 is repelling, and p = 1 − 1/r is
attracting as long as r < 3. The points r = 1 and r = 3 are called bifurcation
points for the problem.
Now let us consider what happens when r > 3. Both fixed points are
repelling, and iterations cannot converge to either one of them! On the other
hand, the sequence {xn } remains bounded with 0 ≤ xn ≤ 1. Therefore it
must have accumulation points somewhere in the interval [0, 1]. How many
are there, and where are they located?
Since we observed oscillations in the example of the previous section, we
consider double steps with the iteration function
h(x) = g(g(x)) = r g(x)(1 − g(x)) = r^2 x(1 − x)(1 − rx(1 − x))
h inherits the fixed points of g, and it is trivial to confirm that h(0) = 0.
The verification that h(1 − 1/r) = 1 − 1/r is left as an exercise. To find the
remaining fixed points, we must solve h(x) = x, which leads to
r^2 x(1 − x)(1 − rx(1 − x)) − x = 0     (23)
We will solve this problem by factoring. The factor x is obvious, and since
1 − 1/r is a fixed point, there must be a factor of the form (x − 1 + 1/r). The
rest is long division and factoring, and we find that (23) is equivalent to
r x (x − 1 + 1/r) (r^2 x^2 − r(r + 1)x + r + 1) = 0
We find the remaining fixed points by solving the quadratic equation
r^2 x^2 − r(r + 1)x + r + 1 = 0 with the result
p1,2 = (1/(2r)) ( r + 1 ± √((r − 1)^2 − 4) )     (24)
The preceding computations are elementary, yet time consuming. Since
r > 3, the radicand is positive, and we don't have to deal with complex
solutions. In the limiting case where r = 3 we have p1,2 = 2/3 = 1 − 1/3, and
we see that at this bifurcation point the fixed point p = 2/3 branches into the
two fixed points p1,2. The two points are companion fixed points of each
other, since g(p1) = p2 (exercise).
The derivative of h will clarify the nature of the fixed points. The
computation of h' is straightforward:
h'(x) = g'(g(x)) g'(x) = r^2 (1 − 2x)(1 − 2rx(1 − x))
h'(0) = r^2 is obvious, and h'(1 − 1/r) = (2 − r)^2 after some work. For p1
and p2 we find
h'(p1) = g'(g(p1)) g'(p1) = g'(p2) g'(p1)
       = r^2 (1 − 2p1)(1 − 2p2)
       = (−1 + √((r − 1)^2 − 4)) (−1 − √((r − 1)^2 − 4))
       = 1 − ((r − 1)^2 − 4)
       = 4 + 2r − r^2
and by symmetry we know that h'(p1) = h'(p2). The two fixed points are
attracting if −1 < h'(p1,2) < 1, which is equivalent to 3 < r < 1 + √6 ≈
3.449. In this interval we observe the two-cycles, and 1 + √6 is the next
bifurcation point.
Figure 12: Bifurcation Map
We stop our case study at this point. For r > 1 + √6 four-cycles appear
for a while, and after that it becomes even more complicated. The bifurcation
map shows the attracting fixed points and cycles as a function of r.
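The case study can be reproduced numerically. The Python sketch below (an alternative to the EXCEL experiments in these notes; the helper `attractor_size` is our own) iterates the logistic map past its transients and counts how many distinct values the orbit keeps visiting:

```python
# For a given r, iterate the logistic map long enough to pass transients,
# then count the distinct values (rounded) the orbit still visits. This
# reproduces the attracting fixed point for 1 < r < 3, the two-cycle for
# 3 < r < 1 + sqrt(6), and a four-cycle just beyond that bifurcation.

def attractor_size(r, x0=0.3, transients=2000, samples=64):
    x = x0
    for _ in range(transients):
        x = r * x * (1 - x)
    seen = set()
    for _ in range(samples):
        x = r * x * (1 - x)
        seen.add(round(x, 6))   # converged orbits collapse to few values
    return len(seen)

print(attractor_size(2.5))   # fixed point 1 - 1/r
print(attractor_size(3.2))   # two-cycle
print(attractor_size(3.5))   # four-cycle, past r = 1 + sqrt(6) = 3.449...
```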
Exercises:
1. Compute the fixed points of g.
2. Verify the assertions in (22).
3. Let r = 1. Since g'(0) = r = 1 our theory for attracting or repelling
points does not apply. Do iterates approach p = 0 when 0 < x0 < 1?
4. Sketch a cobweb for the function g(x) = 3.2x(1 − x).
5. Confirm that h(1 − 1/r) = 1 − 1/r.
6. Confirm the calculations leading to (24).
7. Let p1 = (1/(2r))(r + 1 − √((r − 1)^2 − 4)) and p2 = (1/(2r))(r + 1 + √((r − 1)^2 − 4));
confirm that g(p1) = p2 and that g(p2) = p1.
3 Discrete Population Models
Suggested Reading: Mooney/Swift: Chapter 1, Discrete Dynamical Systems.
In this chapter we look at discrete population models which have the
form xn+1 = g(xn) for some iteration function g. Here xn represents
the population in the n-th time interval. We will also look at simple models
from finance, some of which are mathematically quite similar to population
growth in biology.
3.1 Exponential Growth
Population models in biology usually consider the following equation
Change of Population = Births − Deaths + Immigration − Emigration
For simplicity let us assume that we look at annual changes of the population.
In our models we assume that births and deaths are proportional to
the current population x. Then the number of births in a year is bx, where
b is the birth rate (number of births per capita per year), and the deaths
are given as mx, with m being the mortality rate. The net growth rate is
the difference r = b − m. This quantity can be negative; in this case the
deaths outnumber the births and the population is declining. If the current
population is denoted by xn, then in the following year it will be
xn+1 = xn + rxn + fn = (1 + r)xn + fn     (25)
where the term fn accounts for the difference of immigration and emigration.
In these calculations xn may not always come out as an integer, and we may
interpret it as an expected value or as biomass. Equation (25) is an iterative
model with iteration function
g(x) = (1 + r)x + f(x)
The term f is complex. If emigration and immigration are proportional
to the population, we can include it as part of the net rate r. Otherwise
unforeseen events, such as war, famine, or natural disasters in the case of human
populations, result in emigration or immigration. For other populations,
such as fish or trees, f might represent fishing or logging, or it could represent
animals released into the wild. For our purposes we will first assume that
f = 0, and consider constant values of f later on.
Without emigration or immigration the iteration function becomes
g(x) = (1 + r)x. It is easy to see that p = 0 is the only fixed point of
this function. The fixed point is attracting, if r < 0 (and r > −2), and
the population will die out in this case. If r > 0 the population will grow
exponentially.
It is straightforward to find an explicit formula for xn when it is defined
by
xn+1 = (1 + r)xn .
We have x1 = (1 + r)x0, x2 = (1 + r)x1 = (1 + r)^2 x0, and inductively we
find that
xn = x0 (1 + r)^n .
The population grows exponentially, since we are taking powers of (1 + r).
Example: The world population in 1950 was 2.557 billion, and by 2000 it
had increased to 6.069 billion, according to the U.S. Census Bureau. What
is the annual growth rate for that period?
If we take n = 0 as 1950, we have x0 = 2.557 and x50 = 6.069. This
leads to the equation
6.069 = 2.557 (1 + r)^50
and solving for r yields
r = (6.069/2.557)^(1/50) − 1 = 0.01744 = 1.744%
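The solve-for-r step is a one-liner in Python (an alternative to the EXCEL computations used in these notes):

```python
# Solve 6.069 = 2.557 (1 + r)^50 for the annual growth rate r.
r = (6.069 / 2.557) ** (1 / 50) - 1
print(round(100 * r, 3))  # about 1.744 percent per year
```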
Exercises:
1. A population grows at 4% annually. How long will it take until it
doubles in size?
2. Use the data from the example to compute the world population for
2010.
3. Suppose that a population declines at 6% per year, and that initially
100 individuals are present.
(a) How long will it take until the population drops below 40?
(b) In order to slow the decline, at the end of each year 3 individuals
are added. How long will it take until the population drops
below 40 for the first time? Design an EXCEL sheet to simulate the
population development.
3.1.1 Models From Finance
Populations in real life will exhibit fluctuations in growth rates, due to numerous biotic and abiotic factors, and most of all, counting or estimating a
population is a challenge in itself. In the world of finance matters are much
easier. You know the interest rate on your loan, and balance sheets are calculated to the penny. No ambiguities there. Compound interest follows the
same principles as population growth, and we can copy our results directly.
Compound Interest. Let P be the original investment, the principal,
and let r be the nominal annual interest rate. If the account has value An,
then in the following year it will be valued at
An+1 = An + rAn = (1 + r)An
By induction we find that the value of the investment after n years (future
value) is given by
An = P (1 + r)^n     (26)
Example: In 1626 Peter Minuit bought Manhattan Island from the natives
for the equivalent of $24. If invested at 3% interest its value in 2012 would
be
A = $24 × 1.03^386 = $2,164,611.58 ,
a bargain at today's real estate prices.
Example: The "Rule of 72" is a simple method to find out how long it
takes for an investment to double in value when invested at a given interest
rate. If T is the doubling time, and r is the interest rate, then rT = 0.72.
For instance, at 3% interest an investment will double its value in 24 years,
since 3 × 24 = 72. Keep in mind that 3% = 0.03, and we get 24r = 0.72.
This rule is just an estimate. An exact calculation begins with
2P = P (1 + r)^T
which results in
T = ln 2 / ln(1 + r)   or   r = 2^(1/T) − 1
The Taylor expansion of 2^x is
2^x = 1 + (ln 2) x + ((ln 2)^2 / 2) x^2 + · · ·
and thus
r = 2^(1/T) − 1 ≈ ln 2 / T ,
or equivalently, rT ≈ ln 2 = 0.693147. This shows that the "Rule of 72" should
actually be called the "Rule of 69". But 69 doesn't have as many nice
divisors, and 72 it is for this rule of thumb.
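A quick comparison of the exact doubling time with the rule of thumb can be done in Python (an alternative to an EXCEL table):

```python
# Compare the exact doubling time T = ln 2 / ln(1 + r) with the
# "Rule of 72" estimate T = 72 / (interest rate in percent).

import math

for pct in (3, 6, 9):
    r = pct / 100
    exact = math.log(2) / math.log(1 + r)
    rule = 72 / pct
    print(pct, round(exact, 2), rule)
```

At 3% the exact answer is about 23.45 years against the rule's 24, and the agreement stays within half a year for typical rates.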
Interest is not always computed annually. Savings accounts, loans and
mortgages are based on monthly compounding, some banks use daily compounding, and occasionally quarterly compounding is applied. The mathematics in this case is similar.
Let m be the number of compounding cycles per year and set i = r/m.
Here i is the interest rate per compound period. For instance, 18% interest
compounded monthly results in i = 1.5%. If t is time measured in years,
and n is the number of compound cycles, we have n = mt. For instance,
n = 60 monthly payments cover t = 5 years. With these stipulations (26)
becomes
An = P (1 + i)^n = P (1 + r/m)^(mt) ,     (27)
just adjust the interest rate and count in compound cycles rather than in
years.
Example: Find the future value of $1,000 in five years when invested at 6%
interest compounded (a) annually, (b) quarterly, (c) monthly and (d) daily.
Formula (27) can be used in all four cases. For part (a) we have m = 1,
and A = $1000 × 1.06^5 = $1,338.23, and in (b) we have m = 4 and
A = $1000 × 1.015^20 = $1,346.86. In (c) we get i = 0.06/12 = 0.005 and
n = 60, which leads to A = $1,348.85, and the final answer for part (d) is
A = $1,349.83.
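All four cases follow from formula (27); a short Python sketch (our helper name `future_value`, as an alternative to EXCEL) reproduces them:

```python
# Future value under m compounding cycles per year, formula (27):
# A = P (1 + r/m)^(m t)

def future_value(P, r, m, t):
    return P * (1 + r / m) ** (m * t)

for m in (1, 4, 12, 365):   # annual, quarterly, monthly, daily
    print(m, round(future_value(1000, 0.06, m, 5), 2))
```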
From the example it is evident that different compound methods lead
to different results. The effective interest re is the annual rate which would
result in the same yield. In one year, for instance, $100 grows to
$100 × 1.015^12 = $119.56
when invested at 18% compounded monthly, which is the equivalent of a
19.56% annual interest rate. The general formula for the effective interest
rate is
re = (1 + i)^m − 1 = (1 + r/m)^m − 1
Annuities. These are regular payments made into an account. The
geometric sum formula
Σ_{k=0}^{n} r^k = (1 − r^(n+1))/(1 − r) = (r^(n+1) − 1)/(r − 1) ,   r ≠ 1     (28)
will play an important role in our calculations.
Let R be the amount of the regular payments, and let Sn be the value
of the annuity after n cycles. Then Sn satisfies the recurrence relation
Sn+1 = (1 + i)Sn + R   with   S1 = R (or S0 = 0)     (29)
Here we assume that the payments are made at the end of each cycle,
including the first payment. With
g(x) = (1 + i)x + R
the iterations for S take the form Sn+1 = g(Sn).
It is feasible to derive an explicit formula for Sn by induction. We have
S1 = R ,   S2 = (1 + i)S1 + R = ((1 + i) + 1) R
The next time around we have
S3 = (1 + i)S2 + R = ((1 + i)^2 + (1 + i) + 1) R ,
and after n cycles this grows to
Sn = (1 + i)Sn−1 + R
   = ((1 + i)^(n−1) + · · · + (1 + i)^2 + (1 + i) + 1) R
   = ( Σ_{k=0}^{n−1} (1 + i)^k ) R = ((1 + i)^n − 1)/i · R     (30)
Example: You make monthly payments of $100 into an account that draws
4% interest. What is the value of this investment after five years?
You paid $6,000 into this account, and since there are no fees, you should
have substantially more than that in your annuity account. With i = 0.04/12 =
1/300 and n = 60 we find from (30) that
S60 = ((301/300)^60 − 1)/(1/300) × 100 = 6,629.90
and you have earned $629.90 in interest.
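Both the recurrence (29) and the closed form (30) can be checked against each other in a few lines of Python (an alternative to the EXCEL sheets used in these notes):

```python
# The annuity example: 60 monthly payments of $100 at a 4% nominal rate.
# The recurrence S_{n+1} = (1+i) S_n + R and the closed form (30)
# S_n = ((1+i)^n - 1)/i * R must agree.

i = 0.04 / 12
R = 100.0

S = 0.0
for _ in range(60):
    S = (1 + i) * S + R              # recurrence (29), S_0 = 0

closed = ((1 + i)**60 - 1) / i * R   # formula (30)
print(round(S, 2), round(closed, 2))  # both about 6629.90
```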
Loans. Suppose that you borrow P dollars with annual interest rate
r, and that you pay R dollars each month. If the balance on your account
after n payments is Bn, then the next month's balance is
Bn+1 = (1 + i)Bn − R     (31)
This is an iterative process with iteration function g(x) = (1 + i)x − R and
Bn+1 = g(Bn). Inductively we obtain the following pattern
B0 = P
B1 = (1 + i)P − R
B2 = (1 + i)B1 − R = (1 + i)^2 P − ((1 + i) + 1)R
B3 = (1 + i)^3 P − ((1 + i)^2 + (1 + i) + 1)R
...
Bn = (1 + i)^n P − ( Σ_{k=0}^{n−1} (1 + i)^k ) R
   = (1 + i)^n P − ((1 + i)^n − 1)/i · R
This result shows nicely what is going on. The lender could have invested
the principal P and earned interest on it, the borrower could have set up an
annuity and gained some equity. Once the value of the annuity catches up
with the lender's investment, the loan is paid off. If this is to be done with
n payments, we set Bn = 0 and solve for R with the result
R = i(1 + i)^n P / ((1 + i)^n − 1) = iP / (1 − (1 + i)^(−n))
Example: What are the monthly payments for a $20,000 car loan over five
years at 6% interest?
We have m = 12, n = 12 × 5 = 60, i = 0.06/12 = 0.005 and P = 20,000,
and R becomes
R = (0.005 × $20,000) / (1 − 1.005^(−60)) = $100 / (1 − 0.741372) = $386.66
The last payment is usually different by a small amount to make up for
rounding. Your total payments add up to $23,199.36, and you pay $3,199.36
interest.
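The payment formula and the balance recurrence (31) can be cross-checked in Python (an alternative to the EXCEL sheet suggested in the exercises, and without the cent-by-cent rounding a bank would apply):

```python
# Car-loan example: $20,000 over 60 months at 6% nominal interest.
# Compute R from the payoff formula, then replay the balance
# recurrence B_{n+1} = (1+i) B_n - R and check it returns to zero.

i = 0.06 / 12
P = 20000.0
n = 60

R = i * P / (1 - (1 + i)**(-n))
print(round(R, 2))               # about 386.66

B = P
for _ in range(n):
    B = (1 + i) * B - R          # recurrence (31)
print(B)                         # essentially 0 after the last payment
```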
Exercises:
1. How much money should you invest today into an account which draws
4% interest so that you will have $50,000 ten years from now? Repeat
the problem with 6% interest.
2. How much money should you invest at the end of each month into an
account with 4% interest so that you will have $50,000 ten years from
now? Repeat the problem with 6% interest.
3. What are the monthly mortgage payments if you borrow $120,000 on
a 30 year loan at 4% interest? How would changing to a 15 year loan
affect the mortgage payment?
4. Set up an EXCEL sheet which shows the balance for each month in
the car payment example. Round all entries to cents (in the bank’s
favor), and find the final payment.
3.1.2 Inhomogeneous Terms
Let us return to general iteration problems. If we use m = 1, and set
xn = Sn and f = R, then the iteration formula (29) becomes
xn+1 = (1 + r)xn + f ,   x0 = 0
with solution
xn = ((1 + r)^n − 1)/r · f
based on (30). If the starting value is changed to an arbitrary x0, the solution
to
xn+1 = (1 + r)xn + f
becomes
xn = (1 + r)^n x0 + ((1 + r)^n − 1)/r · f ,     (32)
and we have found the general solution to an exponential growth model with
an inhomogeneity.
Example: (Mooney/Swift) A population of Florida sandhill cranes is decreasing
at a rate of 3% per year, and wildlife managers release 5 birds
annually to maintain the survival of the population. Suppose that initially
100 birds are present; compute the size of future populations.
Given the data we consider the iteration model

    x_{n+1} = 0.97 x_n + 5 ,   x_0 = 100

Here r = −0.03. According to (32) we find that

    x_n = 100 (0.97)^n + ((0.97)^n − 1)/(−0.03) · 5 = 500/3 − (200/3)(0.97)^n

Eventually the population approaches p = 500/3 ≈ 166.7. At this level, the 3% decline is balanced by the additional 5 birds.
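The iteration and the closed-form solution (32) can be compared directly. A Python sketch for the crane model's parameters (function names are ours):

```python
def iterate(n, x0=100.0, r=-0.03, f=5.0):
    """Run the recurrence x_{k+1} = (1 + r) x_k + f for n steps."""
    x = x0
    for _ in range(n):
        x = (1 + r) * x + f
    return x

def closed_form(n, x0=100.0, r=-0.03, f=5.0):
    """Formula (32): x_n = (1+r)^n x0 + ((1+r)^n - 1)/r * f."""
    return (1 + r) ** n * x0 + ((1 + r) ** n - 1) / r * f
```

Both agree to rounding, and for large n the population settles near 500/3 ≈ 166.7 birds, as the fixed point analysis predicts.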
Exercises:
1. A population grows according to the law

    x_{n+1} = 1.06 x_n − 5 ,   x_0 = 100 ,

that is, we have 6% net growth, and 5 individuals are removed at the end of each cycle. Find a formula for x_n.
2. Find an explicit formula for the population in Exercise 3 from the
Exponential Growth Section.
3. Prove formula (32) by induction.
3.2 Logistic Growth
Growth at a fixed rate always leads to an exponential population explosion,
where the population grows without bound. The basic idea in logistic growth
is to make the growth rate dependent on the population size. We need fast
growth for relatively small populations, slow growth for large populations,
and negative growth if the population exceeds a carrying capacity K. Using a linear model, the growth rate takes the form ρ = ρ(x) = r − (r/K)x = r(1 − x/K), with the rate r representing the growth rate for small populations. If we apply x_{n+1} = x_n + ρ(x_n) x_n, as usual, we are led to the iterative scheme

    x_{n+1} = x_n + r (1 − x_n/K) x_n    (33)
Here we have the quadratic iteration function

    g(x) = x + r (1 − x/K) x = (1 + r) x − (r/K) x^2 = x (1 + r − rx/K)    (34)
Figure 13: Logistic Growth Rate
Figure 14: Logistic Growth Curve
and we use whichever expression is most convenient. This function is reminiscent of the iteration function from Section 2.5. However, the factor r in
the earlier section is now replaced by r + 1, and the crucial bifurcation will
occur when r = 2.
Example: Plot the first 100 terms for the logistic growth model with r =
0.1, K = 1000 and x0 = 20. Here the recurrence relation is
    x_{n+1} = 1.1 x_n − 0.0001 x_n^2 ,   x_0 = 20
The computation and the graph are done in EXCEL. The graph shows the
logistic curve (blue) along with the corresponding exponential curve (red),
where the quadratic term of the recurrence relation is dropped. At the onset, at very small populations, the population grows almost exponentially. But then the growth slows down, and eventually the population approaches the carrying capacity. In our example we have x_100 = 998.0.
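The same computation is easy to reproduce outside of EXCEL. A Python sketch of scheme (33) (the function name is ours):

```python
def logistic(n, x0=20.0, r=0.1, K=1000.0):
    """n steps of the discrete logistic scheme x_{k+1} = x_k + r (1 - x_k/K) x_k."""
    x = x0
    for _ in range(n):
        x = x + r * (1 - x / K) * x
    return x
```

Starting from x_0 = 20 the iterates climb along the S-shaped curve and approach the carrying capacity K = 1000, since the fixed point K is attracting for this value of r.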
Logistic growth always exhibits the characteristic S-shape. The fastest
growth occurs when the population is at about one half of the carrying
capacity. In our model we always assume x ≥ 0, 0 < r ≤ 3 and K > 0. The
condition r ≤ 3 will become evident later, and in order to avoid negative
iterates we also assume that x ≤ (1 + 1/r) K.
We will now perform a fixed point analysis on g. As always, we find the
fixed points from the equation x = g(x), which is equivalent to

    rx (1 − x/K) = 0

Hence the fixed points are p = 0 and p = K. For the derivative of g we find that

    g'(x) = 1 + r − (2r/K) x
The point p = 0 is repelling, since g'(0) = 1 + r > 1. For the second fixed point we find that

    g'(K) = 1 − r ,
and this point is attracting as long as 0 < r < 2, which takes us immediately to the question: what happens when r > 2?
Example: Graph the first twenty terms for the logistic growth model with
r = 2.2, K = 1000 and x0 = 20. The work is done in EXCEL and the graph
is shown below.
The oscillations in the graph are very reminiscent of those seen in the
previous chapter, and we will now find the connection between the two. The
key lies in a change of variable. We define a new variable u via

    x = K (1 + 1/r) u    or    u = rx / ((r + 1) K)    (35)

Then x = 0 is equivalent to u = 0, and x = K results in u = r/(r + 1). Now let us look at the recurrence relation (33). Upon substitution we find that

    K (1 + 1/r) u_{n+1} = x_{n+1} = x_n (1 + r − r x_n/K) = K (1 + 1/r) u_n (1 + r − (r + 1) u_n) ,
Figure 15: Logistic Growth Curve with r = 2.2
and after cancellation we see that

    u_{n+1} = (r + 1) u_n (1 − u_n)

We can now apply the results from Section 2.5 with r replaced by r + 1. If 2 < r < √6, it follows that 3 < r + 1 < 1 + √6, and the iterations will show periodic two-cycles. For r > √6, at first four-cycles will appear, and the behavior will become more chaotic as r increases. If r > 3, the populations will eventually become negative.
Example: Further analysis of r = 2.2 and K = 1000. The change of variable formula implies that u = (2.2/3200) x = 11x/16000. In (24) we also found a formula for the fixed points of the attracting two-cycle. In our case this becomes

    p_{1,2} = (1/6.4)(4.2 ± √0.84) = (21 ± √21)/32

These are the points in the cycle of u. If we convert back to x, we get

    q_{1,2} = (16000/11) · (21 ± √21)/32 = 500 (21 ± √21)/11
Numerically these values are q1 = 746.247 and q2 = 1162.844, which agrees
well with the observations in the EXCEL experiment.
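The two-cycle can also be confirmed numerically: we simply iterate scheme (33) with r = 2.2 until the transients die out (a Python sketch; names are ours):

```python
def step(x, r=2.2, K=1000.0):
    """One step of the discrete logistic scheme (33)."""
    return x + r * (1 - x / K) * x

x = 20.0
for _ in range(2000):           # let transients die out
    x = step(x)
cycle = sorted([x, step(x)])    # the two values the iterates alternate between
```

The two values come out as 746.247 and 1162.844, matching q1 and q2 above.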
Exercises:
Figure 16: Logistic Growth Rate with Harvesting
1. Design an EXCEL sheet for logistic growth. This sheet should have
a block where you enter the parameters r, K and x0 . Include an
automatic graph of the growth curve.
2. Confirm that if x0 ∈ I = [0, (1 + 1/r) K], all subsequent iterations x_n will
remain in I, provided that r ≤ 3.
3. Consider a logistic growth model with K = 200 and r = 2.25. Determine the values in the ensuing two-cycle numerically (EXCEL) and
analytically (change of variable). The starting value x0 is irrelevant.
3.3 Logistic Growth with Harvesting
Here we look at a population which follows a logistic growth pattern, but
during each cycle a fixed number of individuals is being removed. This could
serve as a model for the fish population in a lake or an ocean being subjected
to fishing, or for harvesting of a plant population in the wild.
The model is logistic growth with a slight modification

    x_{n+1} = x_n + r (1 − x_n/K) x_n − h
Here h ≥ 0 represents the harvest constant. The iteration function for our
model is

    g(x) = x + rx − (r/K) x^2 − h ,
which is the parabola from the previous section, lowered by the amount
h. This changes the region where g(x) ≥ 0, and it alters the fixed points.
Over-harvesting may lead to the extinction of the species.
Example: We explore the harvesting model for r = 0.1, K = 1000, x0 =
200 and h = 10. The iteration formula becomes

    x_{n+1} = 1.1 x_n − 0.0001 x_n^2 − 10

and the resulting population growth is displayed on the EXCEL sheet. Notice that the steady state drops considerably from K = 1000 to below 900.
Figure 17: K = 1000, h = 10, r = 0.1
Let us now take a look at the fixed points. The equation x = g(x) results
in the quadratic equation

    x + rx − (r/K) x^2 − h = x

with solutions

    p_{1,2} = K/2 ± √( (K/2)^2 − Kh/r )
Here we chose the sign so that p1 ≤ p2. For h = 0 (no harvesting) the fixed points are p1 = 0 and p2 = K, as in the previous section. When h = rK/4 the two points coincide, and we just have a single fixed point located at p = K/2. In this case the parabola y = g(x) is tangent to the diagonal y = x. We call h* = rK/4 the critical harvest. When h > h* we have lowered the parabola too far; there are no fixed points, and the population is headed for extinction. In the sequel we assume that h < h*, so that the square root term always remains positive.
Let us rewrite the expression for the fixed points some more. It is easy to verify that

    (K/2)^2 = K h*/r ,

and the radicand becomes

    (K/2)^2 − Kh/r = (K/r)(h* − h) = (K^2/4) · (h* − h)/h*
and for the fixed points we obtain

    p_{1,2} = K/2 ± (K/2) √((h* − h)/h*)
The fixed points are symmetric about K/2, and since the square root term lies
between 0 and 1, the fixed points are between 0 and K, as expected from
the graph.
Now let us look at the derivative: g'(x) = 1 + r − (2r/K)x, and a brief calculation shows that

    g'(p_{1,2}) = 1 + r − (2r/K) p_{1,2} = 1 + r − r (1 ± √((h* − h)/h*)) = 1 ∓ r √((h* − h)/h*)
For the smaller fixed point we obtain g'(p1) = 1 + r √((h* − h)/h*) > 1, and this point is always repelling. For initial populations x0 close to p1 we conclude that if x0 < p1, the iterates x_n will decrease and the population will become extinct, and that if x0 > p1, the iterates x_n will increase toward p2.
For the second fixed point we have

    g'(p2) = 1 − r √((h* − h)/h*)
g'(p2) < 1 is built in, and we need to be concerned with g'(p2) > −1. The square root term lies between 0 and 1, and thus g'(p2) > 1 − r, and p2 will always be attracting if r < 2. When r > 2, the stability of p2 depends on
h, and if h is close enough to h∗ , the fixed point p2 will still be attracting.
Otherwise oscillations will occur, and eventually we get into the realm of
chaotic behavior.
Example: We return to the model with r = 0.1, K = 1000 and x0 = 200. Here the critical harvest is h* = rK/4 = 25, and with h = 10 we are well below the limit. The two fixed points are p1 = 112.7 and p2 = 887.3. Since x0 > p1 the iterates converge to p2, which we observed before. If the initial population x0 had been below p1, it would have become extinct.
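These values follow directly from the fixed point formula; note that the smaller one comes out as 112.7. A Python check (the function name `fixed_points` is ours):

```python
import math

def fixed_points(r, K, h):
    """Fixed points K/2 * (1 +/- sqrt((h* - h)/h*)) with h* = rK/4."""
    h_star = r * K / 4
    s = math.sqrt((h_star - h) / h_star)
    return K / 2 * (1 - s), K / 2 * (1 + s)

p1, p2 = fixed_points(0.1, 1000.0, 10.0)   # p1 ~ 112.7, p2 ~ 887.3
```

Plugging either value back into g(x) = x + 0.1 (1 − x/1000) x − 10 returns the value itself, confirming that both are fixed points.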
Example: In this example we use K = 1000, r = 4 and x0 = 800, and we
examine the behavior of the populations for different values of h. Here the
critical harvest is h∗ = 1000, and since r > 2, the stability of p2 depends on
h.
1. Case h = 800. A brief calculation shows that p2 = 500 + 100√5 = 723.6 and g'(p2) = 1 − (4√5)/5 = −0.789. Thus, p2 is attracting, and h is close enough to h* to maintain stability. Since g'(p2) < 0, the iterates oscillate about the fixed point.

2. Case h = 600. Now p2 = 500 + 100√10 and g'(p2) = 1 − (4√10)/5 = −1.530. This shows that the fixed point is repelling, and the graph exhibits a four-cycle.
3. Case h = 450. When h is decreased even more, there is no hope for a
stable equilibrium. A graph supports this finding.
Figures: Case h = 800, Case h = 600, Case h = 450
Results based on the Fixed Point Theorem are local in nature, and we
close this section by looking at locations for the starting point x0 . The
stipulation that h < h∗ was put in place to assure that the function g has two
fixed points. It also implies that the parabola y = g(x) has two distinct,
positive x-intercepts q1,2 , and we assume that 0 < q1 < p1 < p2 < q2 (the
fixed points must lie between the roots of g). If x0 < q1 it follows that
x1 = g(x0 ) < 0, and the population vanishes. We have also seen that the
population decreases to extinction if x0 < p1 , because p1 was repelling.
These are reasonable outcomes for our population model. However, our
model also allows for negative iterates, if x0 > q2 , and this must be ruled
out explicitly, since having a fairly large population disappear in one step
makes no sense. Therefore we are led to the requirement

    x0 < q2 = (1 + r)K/(2r) + (1/(2r)) √((1 + r)^2 K^2 − 4rhK)
on the starting point x0 .
Finally, we need to make sure that we will not be thrown past q2 in the course of the iterations. Therefore, the maximum value of g should be less than or equal to q2. But g is a quadratic function, and the maximum occurs at the vertex. With some manipulations this results in the inequality condition

    (r^2 − 1) K − 4rh ≤ 2 √( K^2 (1 + r)^2 − 4rhK )    (36)

This is strictly a condition on the parameters of the model. For instance, if r ≤ 1, the expression on the left hand side of (36) is always negative, and we are safe. (36) is also satisfied if the expression (r^2 − 1)K − 4rh is non-positive, which can be rewritten as

    (r^2 − 1) K ≤ 4rh    or    ((h* − h)/h*) r^2 ≤ 1

Reworking the entire inequality (36) into something more manageable is cumbersome, and we shall abstain from this endeavor.
Exercises:
1. Discuss the stability of the fixed point p = K/2 when h = h* = rK/4. Is it attracting? Is it repelling?
2. Let K = 1000, r = 4 and h = 700. Compute the fixed point p2 and
determine if it is attracting. Use EXCEL to illustrate your result.
3. (a) Confirm that if h < h∗ the function g has two positive roots.
(b) Give an argument why the fixed points of g must be located
between the roots of g.
4. Verify the condition (36).
4 Continuous Population Models
Suggested Reading: Review your Calculus book for information on differential equations. This is usually found with the applications of integration; some books devote an entire chapter to this topic. Some information can be
found in Mooney/Swift, Chapter 5.
Discrete models are well suited for populations with periodic reproduction patterns, such as animals with set mating seasons, or for plants with
annual cycles. Other species reproduce pretty much year-round, and we
might look for refinements in the model. For human populations, for instance, one might be interested in monthly, or even daily changes. In this
chapter we make the transition to continuous models.
4.1 Differential Equations
In our work so far we expressed the population models in terms of recurrence
relations
xn+1 = g(xn )
with iteration function g(x). The same result can be accomplished using
difference equations of type
xn+1 − xn = f (xn )
These two methods are completely equivalent when we use
g(x) = x + f (x) ,
(37)
there is just a different philosophy behind the approaches. With recurrence
relations and iteration functions you have a rule on how to proceed from one
state to the next, while difference equations emphasize the change from one
step to the next. For instance, in logistic population growth we may write
either

    x_{n+1} = x_n + r (1 − x_n/K) x_n = (1 + r) x_n − (r/K) x_n^2 = g(x_n)

which is the recurrence view, or

    x_{n+1} − x_n = r (1 − x_n/K) x_n = f(x_n)

which is a difference equation. Both models are equivalent, since
    x + f(x) = x + r (1 − x/K) x = g(x)
In order to make the transition to continuous models, we will adjust
our notation. Using subscripts was fine as long as we stepped forward one
time cycle at a time, but it becomes cumbersome when we want to use the
transitional stages in between. Therefore we write x(n) in place of xn . In
this notation, xn+1 = g(xn ) is replaced by x(n + 1) = g(x(n)), if we step
forward in time by one time unit, and with difference equations we have
x(n + 1) − x(n) = f (x(n))
Now suppose that we want to step from time t to time t + ∆t, that is,
we advance time by the amount ∆t. Then we have to multiply f by ∆t as
well, in order to be consistent with the original model, and we obtain
    x(t + ∆t) − x(t) = ∆t f(x(t))    (38)
When ∆t = 1 this is equivalent to x(t + 1) = x(t) + f (x(t)) = g(x(t)), our
usual discrete model. We have seen this modification (the multiplication by
∆t) before in the compound interest section. If a nominal annual interest rate r is applied to monthly compounding, the interest per month is i = r/12. Here ∆t = 1/12, since we advance time by one twelfth of a year. Equation (38) can be rewritten as
    (x(t + ∆t) − x(t)) / ∆t = f(x(t))
and taking the limit ∆t → 0 (we want to take extremely small time steps)
results in
    d/dt [x(t)] = lim_{∆t→0} (x(t + ∆t) − x(t)) / ∆t = f(x(t)) .
It is customary to omit t altogether and write

    dx/dt = f(x)    (39)
This equation is the prototype of a general first order, autonomous (time
independent) differential equation. A solution of a differential equation is a
function x(t) so that the equation (39) becomes an identity, that is,
    d/dt [x(t)] = f(x(t))
is satisfied for all t under consideration.
We will not discuss general solution strategies at this point, but it is
worthwhile to think about the nature of this equation a little: it makes a connection between the slope of the curve, given by dx/dt, and its function values x; the connection is made via the function f. The purpose of the
next two examples is to illustrate this concept further.
Example: Let x(t) = t^2. Then d/dt [t^2] = 2t, and if t ≥ 0, we have 2√x(t) = 2√(t^2) = 2t, and the function x solves the differential equation

    dx/dt = 2√x .

This tells you that in order to find the slope at a point, you can take the square root of the function value and double it. For instance, the slope at the point (4, 16) is m = 2√16 = 8. In the exercises you are asked to confirm that any parabola of the form x = (t + C)^2 is a solution to the differential equation.
Example: Consider the differential equation
    dx/dt = t − x
Here we try to find a function so that the derivative coincides with the
difference of the time t and the function value at t. One such function is
x = t − 1. Why? Solution: The slope of the function is dx/dt = 1, since we have a line with slope 1. On the other hand, t − x = t − (t − 1) = 1, and since the two expressions match up, we confirmed that x = t − 1 is a solution of the problem.
Are there more solutions? The answer is affirmative, and it turns out
that any function of the form x = Ce−t + t − 1, where C is an arbitrary
constant, solves the differential equation. Above we just looked at the simple
case C = 0. Here is how you can confirm the claim:
    dx/dt = −Ce^{−t} + 1 = −(Ce^{−t} − 1 + t) + t = −x + t
The differential equation in this example is not autonomous, since the independent variable t appears on the right hand side. Its form is dx/dt = f(x, t), which is the general form of a first order differential equation.
Don’t be alarmed by the complexity of the calculations! It takes an
entire course to go over the most common solution methods, and still, in
many cases closed form solutions cannot be found. The examples tell us, though, that a differential equation has a family of solutions, indicated by
the constant C, and we can single out a particular solution if we specify the
solution at a single point. We call such a point an initial condition. This
is a familiar process from Calculus: A function has many anti-derivatives,
symbolized by the constant of integration C, and we can determine C, if we
specify the anti-derivative at a single point.
Let us also remember that the derivative dx/dt still means slope. The differential equation dx/dt = f(x, t) therefore defines the slope m of a solution curve at the point (x, t) by m = f(x, t). If we indicate the slopes by little line segments, we can use the differential equation to generate a slope field (or direction field). This is done in the figure for the equation dx/dt = t − x. A solution curve will fit smoothly into the direction field.
Figure 18: Direction Field for dx/dt = t − x
Our population models are autonomous, that is, they have the form dx/dt = f(x), and since f does not depend on t, line segments along horizontal lines will be identical in the direction field. For time-independent equations we can use state diagrams to depict the behavior of the solution. Whenever f(x) > 0 we get dx/dt > 0, and the function x(t) is increasing, and when f(x) < 0, x(t) is decreasing.
Definition: A point c with f(c) = 0 is called a stationary point (or equilibrium) of the differential equation dx/dt = f(x).
Stationary points play an important role in the study of autonomous
equations, since they make up the constant solutions of a differential equation. As was the case for discrete sequences, we can classify stationary points
Figure 19: State Diagram
as stable (attracting) or unstable (repelling). In the state diagram (Figure
19) the first stationary point c1 is unstable, the second point c2 is stable. We
can express stability easily, if f is differentiable. At a stable point f changes
from positive to negative (going left to right), which makes f decreasing.
Definition: A stationary point c is stable if f'(c) < 0, and it is unstable if f'(c) > 0.
Example: We consider the differential equation dx/dt = 1/x − x. We note that the differential equation is undefined when x = 0. Such a point is called
the differential equation is undefined when x = 0. Such a point is called
a singularity. The stationary points are found from f (x) = 0, which has
solutions x = 1 and x = −1. The direction field shows the two constant
solutions, as well as vertical slopes along the t-axis (x = 0).
Figure 20: Direction Field for dx/dt = 1/x − x
Furthermore, f'(x) = −1/x^2 − 1 < 0. Hence, both stationary points are
attracting.
Exercises:
1. Show that any function of the form x = (t + C)^2 is a solution to dx/dt = 2√x for t ≥ −C. Here C is an arbitrary constant.
2. Confirm that the function x = Ct^4 solves the differential equation dx/dt = 4x/t for any choice of C.
3. What is the continuous counterpart for the iteration formula x_{n+1} = (1/2)(x_n + 4/x_n)? Construct the differential equation.
4. Sketch a direction field for the equation dx/dt = 1 − x, identify the stationary point, and determine whether it is stable.
4.2 Exponential Growth
We return to population growth and begin with the exponential model. The
differential equation becomes (recall that f (x) = rx in this case)
    dx/dt = rx ,    (40)
that is, the rate of change is proportional to the present amount. Notice that since r > 0 and x > 0, it follows that dx/dt > 0, and solutions must be increasing functions. This is indicated in the state diagram.
Figure 21: State Diagram for Exponential Growth
The solution to (40) is usually derived in Calculus texts.

    dx/dt = rx  ⇒  (1/x) dx/dt = r
              ⇒  ∫ (1/x)(dx/dt) dt = ∫ r dt
              ⇒  ln |x| = rt + C
              ⇒  |x| = e^{rt+C} = e^C e^{rt}
              ⇒  x = x0 e^{rt}
The rationale for the last step is as follows: Since C is an arbitrary constant, e^C is an arbitrary positive number. If we drop the absolute value, we get x = ±e^C e^{rt}, and the factor K = ±e^C can be an arbitrary positive or negative number. Our solution then takes the form x = K e^{rt}. Finally we notice that K = x(0), and we set K = x0. In summary we have the following result: The general solution to the initial value problem

    dx/dt = rx ,   x(0) = x0

is

    x = x0 e^{rt}    (41)
A word of caution is in order in regard to the growth rate r. When
used in the exponential model, we find that the population after one year is
x(1) = x0 er , and the actual increase is
x(1) − x(0) = (er − 1)x0
Thus, the population grew at the rate r_e = e^r − 1. The nominal rate r used in a continuous model is equivalent to the effective rate r_e used in the discrete annual model. This is an extension of our observations for
compound interest.
Example: Suppose we know that a bacteria population doubles in size
every four hours.
First we use the continuous model. Using (41) we know that the population is given by x = x0 e^{rt}. Since x(4) = 2x0, we conclude that 2x0 = x(4) = x0 e^{4r}, and 2 = e^{4r}. This leads to r = (ln 2)/4 = 0.173.
For the discrete model with x_{n+1} = x_n + r x_n (one step per hour), we saw that x_n = x0 (1 + r)^n. Since x4 = 2x0, it follows that 2x0 = x4 = x0 (1 + r)^4, and thus r = 2^{1/4} − 1 = 0.189.
The values for r obviously do not agree. The exponential rate r = 0.173 is equivalent to the discrete rate r_e = 0.189, and a brief calculation confirms that

    r_e = e^r − 1 = e^{(ln 2)/4} − 1 = 2^{1/4} − 1
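The rate conversion is one line of arithmetic (a Python sketch; variable names are ours):

```python
import math

r = math.log(2) / 4        # continuous rate, from 2 = e^(4r)
re = math.exp(r) - 1       # equivalent discrete rate, re = e^r - 1

print(round(r, 3), round(re, 3))   # 0.173 0.189
```

The check (1 + re)^4 = 2 confirms that the discrete model with rate re doubles over four steps, just as the continuous model does over four time units.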
Exercises:
1. The world population in 1950 was 2.557 billion, and by 2000 it had
increased to 6.069 billion. Use the exponential growth model (41)
to estimate when the 7 billion mark (or the 10 billion mark) will be
reached.
2. An imaginary radioactive substance decays to one tenth of its original
size in five days. What is its half life?
4.3 Logistic Growth
In logistic growth we have f(x) = r(1 − x/K)x, and the differential equation becomes

    dx/dt = r (1 − x/K) x    (42)

where both the growth rate r and the carrying capacity K are positive numbers. This differential equation is also known as the Verhulst¹ equation.
The stationary points for this equation are located at c = 0 and at c = K.
The derivative dx/dt = f(x) is positive for 0 < x < K and thus, the solution
is increasing in this range. In the remaining cases the solution curves are
decreasing. This is depicted in the state diagram. Moreover, since
    f(x) = r (1 − x/K) x = rx − r x^2/K

and

    f'(x) = r − 2rx/K
it follows that f 0 (0) = r > 0 and f 0 (K) = −r < 0. Thus c = 0 is unstable
and c = K is a stable equilibrium. These results are equivalent to our
findings for the discrete case; notice, however, that we do not have to impose
any conditions on r for the stability of x = K.
Figure 22: State Diagram for Logistic Growth
A direction field reveals further similarities to discrete logistic growth.
The graph shows solution curves for r = 0.1 and K = 1000. Solutions
with positive initial values move toward the stable equilibrium K, and we
recognize the characteristic S-shape for logistic growth, if the curve is started
below K.
The differential equation for logistic growth is more sophisticated than the one for exponential growth, and consequently the derivation of an explicit formula for the solution is more difficult. The steps are outlined below.

    dx/dt = r (1 − x/K) x = (r/K)(K − x) x

¹Pierre François Verhulst, 1804–1849, Belgian mathematician, University of Ghent.
Figure 23: Direction Field for dx/dt = 0.1 (1 − x/1000) x
    ⇒  (K/((K − x)x)) dx/dt = (1/x + 1/(K − x)) dx/dt = r
    ⇒  ∫ (1/x + 1/(K − x)) (dx/dt) dt = ∫ r dt
    ⇒  ln |x| − ln |K − x| = ln |x/(K − x)| = rt + c
    ⇒  x/(K − x) = C e^{rt}
Here we removed the absolute value by introducing a new constant in the same fashion as we did for exponential growth. Solving for x results in (exercise)

    x = C K e^{rt} / (C e^{rt} + 1) = C K / (C + e^{−rt})
Finally, we determine C from the condition that x(0) = x0, and after a few manipulations (exercise) we arrive at

    x = x0 K / (x0 + (K − x0) e^{−rt}) = K / (1 + ((K − x0)/x0) e^{−rt}) = K / (1 + α e^{−rt}) ,    (43)

where α = (K − x0)/x0.
Example: A population grows according to the model

    dx/dt = 0.1 (1 − x/1000) x
with initial population x0 = 100, where time t is measured in years. What is
the population after 10 years and how long will it take until the population
reaches 800?
We use formula (43) to obtain the explicit solution to the differential equation. Since x0 = 100, r = 0.1 = 1/10 and K = 1000, it follows that

    x = 1000 / (1 + 9 e^{−t/10})

After 10 years the population will be

    x = 1000 / (1 + 9 e^{−1}) ≈ 232

In order to answer the second question we have to solve x = 800 for t, which leads to t = 20 ln 6 = 35.8 years.
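The closed-form solution (43) makes both answers a one-liner (a Python sketch; names are ours):

```python
import math

def x(t, x0=100.0, r=0.1, K=1000.0):
    """Explicit logistic solution (43): x(t) = K / (1 + alpha e^{-rt})."""
    alpha = (K - x0) / x0
    return K / (1 + alpha * math.exp(-r * t))

print(round(x(10)))                 # population after 10 years: 232
print(round(20 * math.log(6), 1))   # time to reach 800: 35.8 years
```

Substituting t = 20 ln 6 back into x(t) returns exactly 800, confirming the algebra.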
Solutions of logistic growth problems increase the fastest when x = K/2, simply because dx/dt = f(x) = r(1 − x/K)x constitutes a parabola in x. The x-intercepts of y = f(x) are located at x = 0 and at x = K, and the vertex is always at the midpoint between the intercepts, at K/2 in our case. As an alternative we could also take the derivative with respect to t on both sides of (42) to obtain

    d^2x/dt^2 = d/dt (dx/dt) = d/dt [ r (1 − x/K) x ] = d/dx [ r (1 − x/K) x ] × dx/dt = r (1 − 2x/K) dx/dt

Thus, the second derivative vanishes when dx/dt = 0, namely for x = 0 and for x = K, and also when x = K/2.
Logistic models are not limited to population growth. They can be used to describe chemical reactions, the spread of disease, learning processes in psychology, and many other processes.
Example (Spread of Disease): We consider a contagious disease which
has infected a certain percentage of a population, and we want to obtain a
model for the spread of the disease.
We denote the percentage of the infected individuals by x. Then the
percentage of uninfected persons is y = 1 − x (by using percentages, the
total has to add up to one). The disease is only transmitted when infected
people come in contact with healthy persons; interactions between healthy
individuals do not contribute to the spread of the disease, neither do contacts
of infected people.
In the diagram we separate the individuals into infected and healthy
groups, and each point in the square can be thought of as a contact between
two individuals (the point shown in the graph depicts a contact between two uninfected persons). The shaded areas contribute to the propagation of the disease, the white areas don't. If the interactions within the community are more or less random, then the disease increases at a rate proportional to the product xy = x(1 − x), and we obtain the differential equation

    dx/dt = r (1 − x) x
This is a logistic equation with K = 1, and we may apply the results of this
section to analyze the problem.
Exercises:
1. Fill in the missing steps in the derivation of (43).
2. Confirm by differentiation that the function found in (43) solves the
differential equation (42) with x(0) = x0 .
3. Solve the initial value problem dx/dt = x(1 − x), x(0) = 1/2, and show (by considering the second derivative) that the solution has an inflection point at t = 0.
4. At the onset of an epidemic 2% of the population were infected, and four days later this number had increased to 5%. Use the logistic model to determine after how many days from the onset the number of infections will rise to 20%. (Answer: 10.6 days)
4.4 Harvesting
In these models we assume that a fixed number of individuals are removed
from a population in each cycle. You can think of fishing or deer hunting as
examples. In our models we assume that this occurs at a fixed rate h, and thus dx/dt needs to be reduced by this amount.
First we consider exponential growth. When we incorporate harvesting,
the differential equation becomes
    dx/dt = rx − h
It is easy to see that the equilibrium point is x = h/r; if the population is
below this value it will become extinct, otherwise it will grow exponentially.
It can be shown that the general solution of the differential equation has the
form

    x = h/r + C e^{rt} ,
where C is to be chosen to match the initial condition. Notice, that the
equilibrium point x = h/r is unstable; the solutions move away from it.
For logistic growth with harvesting we consider the model

    dx/dt = r (1 − x/K) x − h = f(x)    (44)
We already studied its discrete counterpart in Section 3.3 with iteration
function

    g(x) = x + r (1 − x/K) x − h = x + f(x)

and we may transcribe the results. With h* = rK/4, we note that f(x) < 0 if h > h*, and the population will be driven to extinction in finite time.
Figure 24: State Diagram for h = h∗
If h = h* it follows that x = K/2 is the single equilibrium point, and f is negative everywhere else. Since solutions are decreasing on either side of the equilibrium, we call this point semi-stable.
Figure 25: Direction Field for h = h∗
Finally, we look at the case where 0 < h < h∗ . In order to find the
equilibrium points, we set the right hand side of (44) to zero, which is
equivalent to solving x = g(x), and we obtain the familiar result

    p_{1,2} = K/2 ± (K/2) √((h* − h)/h*)
The state diagram for this case is shown in the figure, and we see that
the smaller equilibrium point p1 is unstable, and that p2 is stable. If the
initial population is large enough (x(0) > p1 ), the population will eventually
approach the stable equilibrium p2 . If x(0) is too small to begin with,
harvesting will lead to extinction.
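The sign pattern of f that underlies this state diagram can be verified directly. A Python sketch for the earlier example's parameters r = 0.1, K = 1000, h = 10 (names are ours):

```python
import math

def f(x, r=0.1, K=1000.0, h=10.0):
    """Right-hand side of (44): r (1 - x/K) x - h."""
    return r * (1 - x / K) * x - h

h_star = 0.1 * 1000 / 4                    # critical harvest, 25
s = math.sqrt((h_star - 10) / h_star)
p1, p2 = 500 * (1 - s), 500 * (1 + s)      # the two equilibria

# f < 0 below p1 (extinction), f > 0 between p1 and p2, f < 0 above p2
```

The equilibria make f vanish, and the signs on the three intervals match the arrows in the state diagram: solutions decrease below p1, increase between p1 and p2, and decrease above p2.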
Figure 26: State Diagram for 0 < h < h∗
We refrain from calculating exact solutions and instead show a few solutions graphically, embedded in the direction field.
Exercises:
1. Show that x = (x0 − h/r) e^{rt} + h/r solves the initial value problem

    dx/dt = rx − h ,   x(0) = x0
Figure 27: Direction Field for 0 < h < h∗
2. (a) Show that the differential equation (44) becomes

    dx/dt = −(r/K) (x − K/2)^2

if h = h* = rK/4.

(b) Show that any function of the form x = K/2 + K/(C + rt) solves the differential equation of part (a). Here C is an arbitrary constant.

(c) Argue that either x → K/2 as t → ∞, or that x → 0 in finite time.
3. Can you go backwards? Find r, K and h so that (44) becomes

    dx/dt = −(1/5)(x − 1)^2 + 1/10
4.5 Euler's Method
In this section we link the continuous models with differential equations to
their discrete counterparts by means of numerical approximations.
Many differential equations are too complex to be solved analytically,
and numerical methods are employed to generate approximate solutions.
Suppose we are given the initial value problem

    dx/dt = f(x, t) ,   x(t0) = x0
In order to find an approximate solution, we decide on a step size ∆t first.
Then, if we evaluate the quantity m = f (x0 , t0 ), we know the slope at the
initial point. Barring further information, we expect that if we increase t by
∆t, the dependent variable x will change by the amount ∆x = m∆t, and
we arrive at a new point (x1 , t1 ), where t1 = t0 + ∆t and
x1 = x0 + ∆x = x0 + m∆t = x0 + f (x0 , t0 )∆t
There is no reason to assume that the solution is a line, therefore we check
Figure 28: Euler’s Method
the slope at the newly found point and compute f (x1 , t1 ). As before, we use
this slope to advance to the next point. If we keep repeating the steps, we
arrive at the iterative scheme
    x_{n+1} = x_n + f(x_n, t_n) ∆t    (45)
where tn+1 = tn + ∆t. This method is called Euler’s method, and it is the
starting point for the development of powerful numerical methods for differential equations. It stands to reason that for very small step sizes ∆t the
approximations will be fairly close to the actual solution. A word of caution is in order: Extremely small step sizes require a lot more calculations,
and the solutions obtained that way may be tainted by an accumulation of
rounding errors.
Let us return to time-independent differential equations, where f (x, t) =
f (x). If we take ∆t = 1 in Euler’s method (45), we obtain
xn+1 = xn + f (xn ) = g(xn )
In other words, the iteration xn+1 = g(xn) can be thought of as an approximation to dx/dt = f(x) by Euler's method with step size ∆t = 1.
Example: We consider logistic growth with r = 0.1, K = 1000 and x0 =
100. The associated initial value problem is
\[ \frac{dx}{dt} = \frac{1}{10} \Big( 1 - \frac{x}{1000} \Big) x , \qquad x(0) = 100 \]
and we know from the last chapter that the solution is
\[ x(t) = \frac{1000}{1 + 9 e^{-t/10}} . \]
Figure 29: Euler’s Method with ∆t = 1
If we approximate this solution by Euler's method with step size ∆t = 1,
we are led to the iterations
\[ x_{n+1} = x_n + \frac{1}{10} \Big( 1 - \frac{x_n}{1000} \Big) x_n , \qquad x_0 = 100 \]
Here xn is an approximation of x(n). The EXCEL output shows both the
exact solution and its approximation. The curves agree at t = 0, but at
t = 1 we have x(1) = 109.367 versus x1 = 109, that is, Euler’s method lags
behind by 0.367. In the next steps this discrepancy grows, for instance we
have x(20) = 450.863 and x20 = 463.035, but since K = 1000 is attracting
for both curves, the difference declines again, and it approaches zero as t
gets large.
Better precision requires smaller step sizes. For comparison, let us take
∆t = 0.05 = 1/20. Then the iterations become
\[ x_{n+1} = x_n + \frac{1}{200} \Big( 1 - \frac{x_n}{1000} \Big) x_n , \qquad x_0 = 100 \]
where xn represents the population at t = n/20. After 20 steps we obtain
x20 = 109.3477, which is a much better approximation to x(1) than the value
we obtained with ∆t = 1, at the expense of twenty times as many calculations.
In the case of the logistic equation with f(x) = r(1 − x/K)x the iterations
from (45) become
\[ x_{n+1} = x_n + \Delta t \, r \Big( 1 - \frac{x_n}{K} \Big) x_n , \]
a scheme which we have studied extensively, if we replace ∆t r with r′. The
derivative at the fixed point K for this iteration is
\[ g'(K) = 1 - r' = 1 - r \Delta t \]
If we choose ∆t small enough, everything is fine. If ∆t > 1/r, oscillations
will first appear in the numerical solution, and Euler's method approaches K
with overshoots and undershoots. If ∆t > 2/r the steady state K is no
longer stable and cycles will appear in the numerical approximations, and
for ∆t > 3/r numerical solutions will eventually result in numerical overflow.
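These step-size thresholds can be checked numerically. Here is an illustrative sketch (our own code, with r = 0.1 and K = 1000 as in the example): one run with ∆t < 1/r approaches K monotonically, while a run with 1/r < ∆t < 2/r overshoots and undershoots the steady state before settling down.

```python
def logistic_euler(r, K, x0, dt, steps):
    """Euler iterations x_{n+1} = x_n + dt*r*(1 - x_n/K)*x_n."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * r * (1 - x / K) * x)
    return xs

K = 1000.0
smooth = logistic_euler(0.1, K, 100.0, 5.0, 200)    # dt < 1/r = 10
wobbly = logistic_euler(0.1, K, 100.0, 15.0, 200)   # 1/r < dt < 2/r = 20

assert all(x <= K for x in smooth)   # monotone approach, never overshoots
assert any(x > K for x in wobbly)    # oscillatory approach, overshoots K
```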
Exercises:
1. Consider the initial value problem
\[ \frac{dx}{dt} = 0.06 x , \qquad x(0) = 2000 \]
(a) Solve the initial value problem and compute x(5).
(b) Approximate x(5) using Euler’s method with ∆t = 1.
(c) Approximate x(5) using Euler's method with ∆t = 1/4.
(d) Relate the results from (a)-(c) to compound interest.
2. Approximate the solution to the initial value problem (logistic growth
with harvesting)
\[ \frac{dx}{dt} = 0.2 \Big( 1 - \frac{x}{1000} \Big) x - 40 , \qquad x(0) = 1200 \]
with Euler’s method. What is your estimate of x(1) when
(a) ∆t = 1,
(b) ∆t = 0.1, and
(c) ∆t = 0.02?
5 A Little Linear Algebra
Suggested Reading: Your favorite Linear Algebra text, Mooney/Swift
address some topics in sections 3.4-3.6.
It is impossible to give a comprehensive review of Linear Algebra in just
a few pages. In this section we will go over just a few basic ideas.
Linear Algebra deals to a large extent with the solution of linear systems
of equations. The equations
x + 2y = 0
4x + 3y = 5
(46)
are a simple example of such a system. With just two variables each equation represents a line, and solving the two equations simultaneously is tantamount to finding the point of intersection of the lines. Linear equations in
three variables represent planes in three-dimensional space, and solving
a system with three equations means looking for a point where the three planes
intersect.
Linear systems can be expressed in matrix form. In our example (46) we
are led to
\[ A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} , \qquad x = \begin{bmatrix} x \\ y \end{bmatrix} , \qquad b = \begin{bmatrix} 0 \\ 5 \end{bmatrix} \]
Then (46) can be simply written as Ax = b.
5.1 Solutions of Linear Systems
For small linear systems one can use the substitution method to find a
solution. In our example, the first equation implies that x = −2y, and
substitution into the second equation yields −8y + 3y = 5, and thus y = −1
and x = 2.
Most linear systems are not that simple to solve. The method of elimination provides a more systematic and algorithmic procedure to find solutions.
It is based on the fact that
1. interchanging (reordering) equations,
2. multiplication of an equation by a non-zero constant and
3. adding a multiple of one equation to another equation
do not change the solution set of the system. If matrices are used, these
three processes translate into the elementary row operations for a matrix,
namely
1. interchange two rows
2. multiply a row by a non-zero constant
3. add a multiple of one row to another row
If a matrix B is obtained from a matrix A by elementary row operations,
the two matrices are said to be row equivalent. The solution process for a
linear system then consists of the following steps:
1. Translate the linear system into the augmented matrix [A b],
2. use elementary row operations to transform the system into a simpler
form, and
3. convert the newly found matrix back into a system of equations.
4. Solve the simplified system.
We illustrate this process for our example. The associated augmented
matrix has the form [A b], which becomes
\[ \begin{bmatrix} 1 & 2 & 0 \\ 4 & 3 & 5 \end{bmatrix} \]
Next we perform a few elementary row operations
\[ \begin{bmatrix} 1 & 2 & 0 \\ 4 & 3 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 0 \\ 0 & -5 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix} \]
In the first step we added the (−4)-fold of the first row to the second row, and
in the second step we multiplied the last row by −1/5. The final augmented
matrix is equivalent to the system
x + 2y = 0
y = −1
with obvious solution y = −1 and x = 2. This was a very simple example
of the Gaussian Elimination Method.
In a different approach the augmented matrix is transformed into the
reduced row echelon form (rref). A matrix is said to be in reduced row
echelon form if
1. All rows consisting entirely of zeros, if any, are at the bottom.
2. The first non-zero entry in each row is 1, the so-called leading one.
3. Each column with a leading one contains zeroes otherwise.
4. The leading one in a given row is to the right of the leading one in the
row directly above it. This staircase pattern gives rise to the term
echelon.
Each matrix is row equivalent to exactly one matrix in reduced row echelon
form, and the rref command on calculators with matrix capabilities performs
exactly this calculation. If in our example we add the (−2)-fold of the last
equation to the first equation, we obtain a matrix in reduced row echelon form
\[ \begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & -1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \end{bmatrix} \]
The resulting matrix represents the system of equations x = 2 and y = −1,
and we are done. This technique is called the Gauss-Jordan Method.
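The Gauss-Jordan procedure above translates directly into code. The following sketch (our own illustrative implementation, using exact rational arithmetic so no rounding occurs) applies the three elementary row operations exactly as described:

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form by Gauss-Jordan elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot = 0
    for col in range(cols):
        # look for a usable pivot in this column
        pr = next((r for r in range(pivot, rows) if A[r][col] != 0), None)
        if pr is None:
            continue
        A[pivot], A[pr] = A[pr], A[pivot]                  # 1. interchange rows
        A[pivot] = [x / A[pivot][col] for x in A[pivot]]   # 2. scale to a leading one
        for r in range(rows):
            if r != pivot and A[r][col] != 0:              # 3. clear the column
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[pivot])]
        pivot += 1
        if pivot == rows:
            break
    return A

# the augmented matrix [A b] of system (46)
R = rref([[1, 2, 0], [4, 3, 5]])
# R equals [[1, 0, 2], [0, 1, -1]], i.e. x = 2 and y = -1
```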
Matrices can be multiplied, provided that the sizes match up. The product of the m × n matrix A and the n × p matrix B is the m × p matrix
C = AB. We omit the formulas for the product at this point (your calculator knows), and note that the matrix A must have as many columns
as B has rows, otherwise the product is undefined. Square matrices of the
same size can always be multiplied. The matrix product is not commutative;
that is, usually AB ≠ BA.
The matrix
\[ I = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \]
is called the identity matrix. It has the property that AI = IA = A holds
for any matrix A of the appropriate size. The identity matrix acts like the
number one: multiplication by I does not change the matrix. The matrix
A−1 is called the inverse of the matrix A, if AA−1 = A−1 A = I. This applies
to square matrices only, and a matrix which has an inverse is called non-singular. Linear systems with non-singular matrices are easy to solve
(in theory): just multiply both sides of the equation by A−1:
\[ Ax = b \;\Rightarrow\; A^{-1}(Ax) = A^{-1} b \;\Rightarrow\; x = A^{-1} b \]
Elementary row operations can be used to compute the inverse of a
matrix. Begin with the augmented matrix [A I] and compute its reduced
row echelon form. If the resulting matrix has the form [I B], then B = A−1
is the inverse of A.
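This [A I] procedure can be sketched in code as follows (our own illustrative implementation; it assumes the matrix is non-singular, so a pivot can always be found):

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing the augmented matrix [A I]."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)]
         + [Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    for col in range(n):
        pr = next(r for r in range(col, n) if M[r][col] != 0)  # fails if A is singular
        M[col], M[pr] = M[pr], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]      # the right half of [I B] is B = A^{-1}

Ainv = inverse([[1, 2], [4, 3]])       # (1/5) * [[-3, 2], [4, -1]]
```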
Here is how this works in our example.
\[ \begin{bmatrix} 1 & 2 & 1 & 0 \\ 4 & 3 & 0 & 1 \end{bmatrix} \stackrel{rref}{\longrightarrow} \begin{bmatrix} 1 & 0 & -\frac{3}{5} & \frac{2}{5} \\ 0 & 1 & \frac{4}{5} & -\frac{1}{5} \end{bmatrix} \]
Thus
\[ A^{-1} = \begin{bmatrix} -\frac{3}{5} & \frac{2}{5} \\ \frac{4}{5} & -\frac{1}{5} \end{bmatrix} = \frac{1}{5} \begin{bmatrix} -3 & 2 \\ 4 & -1 \end{bmatrix} \]
and our system has the solution
\[ \begin{bmatrix} x \\ y \end{bmatrix} = x = A^{-1} b = \frac{1}{5} \begin{bmatrix} -3 & 2 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \]
Exercises:
1. Express the linear system
   \[ \begin{aligned} 6x - y \phantom{{}- z} &= 8 \\ x + 4y - z &= 0 \end{aligned} \]
   in the form Ax = b.
2. Use elementary row operations to find the reduced row echelon form of
   the matrix
   \[ A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} . \]
3. (a) Solve Ax = b, where
       \[ A = \begin{bmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{bmatrix} \qquad \text{and} \qquad b = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} . \]
   (b) Repeat the problem with A changed to
       \[ A = \begin{bmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac14 \end{bmatrix} . \]
       Note that only a3,3 has been modified.
5.2 Singular Matrices and Determinants
Most of the time two equations with two unknowns have exactly one solution,
and so do three equations with three variables. But there are exceptions, and
we take a closer look at these in this section.
Two lines in the plane will intersect at exactly one point, unless they
run parallel, in which case they either have no point in common or they are
identical lines. Example:
\[ \begin{aligned} 2x - y &= 2 \\ -x + \tfrac12 y &= 0 \end{aligned} \qquad \text{and} \qquad \begin{aligned} 2x - y &= 2 \\ -x + \tfrac12 y &= -1 \end{aligned} \]
In the first system we have the lines y = 2x − 2 and y = 2x, which are
parallel without intersection. In the second system both equations become
y = 2x − 2 and they overlap. Any point of the form (x, y) = (t, 2t − 2) is a
solution. In both cases the matrix is
\[ \begin{bmatrix} 2 & -1 \\ -1 & \frac12 \end{bmatrix} , \]
and it should be plain
that the problems are rooted in the matrix A, and not in the vector b.
The matrix in our example is singular, and there are several ways to
characterize singular matrices. The following statements about a square
matrix are equivalent:
1. A is singular, that is, it does not have an inverse.
2. The homogeneous linear system Ax = 0 has a non-trivial solution
x 6= 0.
3. The last row in rref(A) consists of zeroes and therefore A is not row-equivalent to the identity matrix.
4. det A = 0.
5. The column vectors of the matrix are linearly dependent, that is, one
of the columns can be written as a linear combination of the remaining
column vectors.
We will now discuss these statements without going into much detail.
Check your favorite Linear Algebra text for proofs and further explanations.
1. This statement is just the definition of singularity: A−1 does not exist.
If a matrix has an inverse, we call it non-singular.
2. For any matrix, singular or not, it is always true that A0 = 0.
Therefore the zero vector always is a solution of the homogeneous
system. The distinction between singular and non-singular matrices
arises when we look at additional solutions. If the zero vector is the
only solution of Ax = 0, then the matrix A is non-singular, if there are
additional solutions, the matrix is singular. Let us review the example:
\[ \begin{aligned} 2x - y &= 0 \\ -x + \tfrac12 y &= 0 \end{aligned} \]
x = 0 and y = 0 works, but both equations imply that y = 2x, and
for instance x = 1 and y = 2 is a nontrivial solution.
Such non-trivial solutions cannot exist for a non-singular matrix, since
Ax = 0 implies that x = A−1 0 = 0.
3. This claim follows from the construction of inverse matrices. If
rref([A I]) = [I B], then rref(A) = I and A is non-singular with
A−1 = B. This process fails if rref(A) ≠ I. But A is a square matrix,
and if it is not row-equivalent to the identity matrix, the last row must
consist entirely of zeroes due to the staircase pattern of rref(A). Let
us review the example:
\[ \begin{bmatrix} 2 & -1 & 1 & 0 \\ -1 & \frac12 & 0 & 1 \end{bmatrix} \stackrel{rref}{\longrightarrow} \begin{bmatrix} 1 & -\frac12 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \]
The resulting matrix does not have the form [I B], and in particular
we have
\[ rref(A) = \begin{bmatrix} 1 & -\frac12 \\ 0 & 0 \end{bmatrix} . \]
4. The condition det A = 0 is probably the most popular characterization
of singular matrices, and we will elaborate on determinants below.
One rule for determinants states that the determinant of a matrix
product is the product of the determinants,
\[ \det AB = (\det A)(\det B) , \]
and since det I = 1, it follows that
\[ (\det A)(\det A^{-1}) = \det I = 1 \qquad \text{or} \qquad \det A^{-1} = (\det A)^{-1} = \frac{1}{\det A} . \]
Thus, a matrix with vanishing determinant cannot have an inverse.
5. The statement about the linear dependence of the column vectors is
included for the sake of completeness.
The definition of an n × n determinant is a mystery to many. Of course,
the computation of a 2 × 2 determinant is well known, and with expansions
along rows or columns you can calculate larger determinants. But this is
not a definition; it's a recipe, and nothing else.
The actual definition of a determinant appears to be rather complicated
at first glance. In a first step you need to find all possible selections of entries
in the matrix so that each row and each column is represented exactly once.
Then you multiply these numbers, take the negative for some of them, and
finally you add all results to obtain the determinant.
Now let's do this more systematically. Suppose that the element in the
i-th row and j-th column is denoted by ai,j. Then, if we go through the
rows one at a time, i increases from 1 to n by increments of 1. Now,
for each row we have to select a column element without repetition. This
can be accomplished by taking an arbitrary rearrangement of the numbers 1
through n. For the 4×4 matrix in the figure, the rearrangement is (1, 3, 4, 2),
and the associated product becomes a1,1 · a2,3 · a3,4 · a4,2 = 4 · 1 · 1 · (−1) = −4.
The first index increases by one, the second index follows the order of the
permutation. For a 2 × 2 matrix there are only two permutations, namely
(1, 2) and (2, 1) corresponding to the terms a1,1 ·a2,2 = ad and a1,2 ·a2,1 = bc,
respectively. For an n × n matrix we have n! permutations of the numbers 1
through n, and thus an n × n determinant has n! terms, which is the reason why we
rarely see the definition of a 4 × 4 determinant (24 = 4! terms) spelled out
term by term.
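The permutation definition, impractical as it is for large n, is only a few lines of code. The following sketch (our own illustrative implementation) classifies each permutation as even or odd by counting inversions, which gives the same parity as counting the switches described below:

```python
from itertools import permutations

def det(A):
    """Determinant straight from the permutation definition."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        # parity: even permutations count +, odd permutations count -
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        sign = -1 if inversions % 2 else 1
        prod = 1
        for i in range(n):
            prod *= A[i][p[i]]     # one factor from each row and each column
        total += sign * prod
    return total

det([[1, 2], [4, 3]])   # ad - bc = 3 - 8 = -5
```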
Permutations are classified as even or odd. First, we define a permutation
as simple if only two terms are switched. Then we count how many simple
permutations are required to construct the given permutation, starting from
(1, 2, . . . , n). For instance, for the permutation (1, 3, 4, 2) we find
\[ (1, 2, 3, 4) \rightarrow (1, 4, 3, 2) \rightarrow (1, 3, 4, 2) \]
This construction took two switches, and we call the permutation even.
Likewise, an odd permutation requires an odd number of steps. As it turns
out, this classification into even and odd permutations is unique; it will not
happen that one can construct the same permutation once with an even
number of steps, and another time with an odd number of steps. In our
determinant, the terms arising from odd permutations are multiplied by
(−1), terms based on even permutations are taken as is.
For a 2 × 2 determinant we only have the permutations (1, 2) and (2, 1).
The first is even, the second is odd, and the determinant becomes
\[ \det \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} = a_{1,1} a_{2,2} - a_{1,2} a_{2,1} \]
which is usually written as
\[ \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc \]
For 3 × 3 determinants we have a total of six permutations. The permutations (1, 2, 3), (2, 3, 1) and (3, 1, 2) are all even, and all of the permutations
(1, 3, 2), (2, 1, 3) and (3, 2, 1) are odd. Thus the determinant becomes
\[ \det \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix} = a_{1,1} a_{2,2} a_{3,3} + a_{1,2} a_{2,3} a_{3,1} + a_{1,3} a_{2,1} a_{3,2} - a_{1,1} a_{2,3} a_{3,2} - a_{1,2} a_{2,1} a_{3,3} - a_{1,3} a_{2,2} a_{3,1} \]
Theorem: The determinant of a triangular matrix is the product of its
diagonal elements.
Proof: Suppose that A is an upper triangular matrix, that is, all elements
below the diagonal are zero, or more formally, ai,j = 0 when i > j. Then
all terms in the determinant will contain at least one element below the
diagonal, and therefore become zero, with the exception of the diagonal
product, which proves the theorem.
\[ \det \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n-1} & a_{1,n} \\ 0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n-1} & a_{2,n} \\ 0 & 0 & a_{3,3} & \cdots & a_{3,n-1} & a_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\ 0 & 0 & 0 & \cdots & 0 & a_{n,n} \end{bmatrix} = a_{1,1} \cdot a_{2,2} \cdot a_{3,3} \cdots a_{n-1,n-1} \cdot a_{n,n} \]
Clearly, using the definition of a determinant is generally not an efficient
way to evaluate it. Co-factor expansions are a popular way to compute
determinants; they allow us to reduce the size of determinants: for instance a
4 × 4 determinant can be calculated by taking a few 3 × 3 determinants,
which in turn can be expressed by 2 × 2 determinants. We will not go into
more detail at this point.
The most efficient way to calculate determinants is using row reduction.
The application of elementary row operations affects determinants as follows:
1. Interchanging two rows introduces a factor of (−1)
2. Multiplying a row by a factor changes the determinant by the same
factor.
3. Adding a multiple of one row to another row leaves the determinant
invariant.
The third property is most important for computation. We perform these
operations to transform our matrix into triangular form, and then just multiply the diagonal elements.
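The three rules translate into the following sketch (our own illustrative code): reduce to triangular form, flip the sign for every row interchange, and multiply the diagonal. Applied to the matrix of the worked example, it returns 3 (up to rounding).

```python
def det_by_elimination(M):
    """Reduce to upper triangular form, track the sign, multiply the diagonal."""
    A = [[float(x) for x in row] for row in M]
    n = len(A)
    sign = 1.0
    for col in range(n):
        pr = next((r for r in range(col, n) if A[r][col] != 0.0), None)
        if pr is None:
            return 0.0                      # no pivot: the matrix is singular
        if pr != col:
            A[col], A[pr] = A[pr], A[col]   # interchange: factor of (-1)
            sign = -sign
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            # adding a multiple of one row to another leaves det unchanged
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    result = sign
    for i in range(n):
        result *= A[i][i]
    return result
```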
Example:
\[ \det \begin{bmatrix} 4 & 1 & 0 & -1 \\ 3 & 2 & 1 & 0 \\ 0 & -2 & 3 & 1 \\ -2 & -1 & 1 & 1 \end{bmatrix} = \det \begin{bmatrix} 0 & -1 & 2 & 1 \\ 3 & 2 & 1 & 0 \\ 0 & -2 & 3 & 1 \\ -2 & -1 & 1 & 1 \end{bmatrix} = \det \begin{bmatrix} -2 & -1 & 1 & 1 \\ 0 & -1 & 2 & 1 \\ 0 & -2 & 3 & 1 \\ 3 & 2 & 1 & 0 \end{bmatrix} \]
\[ = \det \begin{bmatrix} 2 & 1 & -1 & -1 \\ 0 & 1 & -2 & -1 \\ 0 & -2 & 3 & 1 \\ 3 & 2 & 1 & 0 \end{bmatrix} = \det \begin{bmatrix} 2 & 1 & -1 & -1 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & -1 & -1 \\ 0 & \frac12 & \frac52 & \frac32 \end{bmatrix} = \det \begin{bmatrix} 2 & 1 & -1 & -1 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & -1 & -1 \\ 0 & 0 & 0 & -\frac32 \end{bmatrix} \]
\[ = 2 \cdot 1 \cdot (-1) \cdot \Big( -\frac32 \Big) = 3 \]
Here we first added the 2-fold of the last row to the first row, then reordered the rows (two interchanges, so the sign is unchanged) and multiplied the first two rows by −1 (two sign changes, which cancel), and finally eliminated the entries below the diagonal.
Exercises:
1. Compute the determinants of the matrices below
   \[ \text{(a)} \; A = \begin{bmatrix} 2 & 3 \\ 2 & 2 \end{bmatrix} \qquad \text{(b)} \; B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad \text{(c)} \; C = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix} \qquad \text{(d)} \; D = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 5 & 6 & 7 \\ 0 & 0 & 8 & 9 \\ 0 & 0 & 0 & 10 \end{bmatrix} \]
2. Compute the determinant of the matrix
   \[ A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 0 & 0 \\ 4 & 0 & 0 & 2 \end{bmatrix} \]
   Use all methods: the definition using permutations, expansion along
   a row or a column, elementary row operations and the calculator.
3. How does multiplication by a constant affect the value of the determinant? Express det(cA) in terms of det A.
5.3 Review of Eigenvalues and Eigenvectors
Definition: Let A be an n × n matrix. A vector x ≠ 0 is called an eigenvector
of A, if there exists a scalar λ, such that
\[ Ax = \lambda x \tag{47} \]
holds. λ is called an eigenvalue of A, and (λ, x) is called an eigenpair.
Eigenvectors are not unique. Let x be an eigenvector of A and let y = cx
for some non-zero constant c. Then
\[ Ay = A(cx) = cAx = c\lambda x = \lambda y , \]
which makes y an eigenvector of A as well. This computation shows that any
non-zero multiple of an eigenvector is an eigenvector as well. One speaks of
an eigenspace, if the zero vector is included. Computer algorithms usually
select unit vectors as eigenvectors (||x|| = 1). For computations with
pencil and paper we select the constant multiple for convenience.
The equation (47) can be written as
λx − Ax = (λI − A)x = 0
and a non-trivial solution exists if and only if the matrix λI − A is singular.
Hence its determinant must be zero. The function
p(λ) = det(λI − A)
is called the characteristic polynomial of A, and the eigenvalues are found
by solving p(λ) = 0.
Example: Find the eigenvalues and eigenvectors of the matrix
\[ A = \begin{bmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{bmatrix} . \]
First we set up the characteristic polynomial and find its roots.
\[ p(\lambda) = \det \begin{bmatrix} \lambda - 2 & 3 & -1 \\ -1 & \lambda + 2 & -1 \\ -1 & 3 & \lambda - 2 \end{bmatrix} = (\lambda - 2)^2 (\lambda + 2) + 3 + 3 + 3(\lambda - 2) + 3(\lambda - 2) - (\lambda + 2) = \lambda^3 - 2\lambda^2 + \lambda = \lambda (\lambda - 1)^2 \]
It follows that the eigenvalues are λ = 0 and λ = 1, the latter being a
double eigenvalue.
Now we calculate the associated eigenvectors, one eigenvalue at a time.
For λ = 0 we have to find a non-trivial solution of Ax = 0. With x = [a, b, c]^T
this results in the system
\[ \begin{aligned} 2a - 3b + c &= 0 \\ a - 2b + c &= 0 \\ a - 3b + 2c &= 0 \end{aligned} \]
and after a few elimination steps the system reduces to
\[ \begin{aligned} a - 2b + c &= 0 \\ b - c &= 0 \\ 0 &= 0 \end{aligned} \]
The last equation is redundant, because the system is singular. The last
system has infinitely many solutions, and we select the convenient values
a = b = c = 1, and our eigenvector is x = [1, 1, 1]^T.
When λ = 1 we have to solve Ax = x, or equivalently, (I − A)x = 0.
The latter results in the linear system
\[ \begin{aligned} -a + 3b - c &= 0 \\ -a + 3b - c &= 0 \\ -a + 3b - c &= 0 \end{aligned} \]
and it turns out that all three equations are identical (λ = 1 is a double
eigenvalue). In this example we are able to extract two independent eigenvectors, namely x = [3, 1, 0]^T and x = [1, 0, −1]^T.
In summary, we found three linearly independent eigenvectors: one for the
simple eigenvalue λ = 0, and two for the double eigenvalue λ = 1.
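The eigenpairs can be verified directly against the definition Ax = λx. A small sketch (our own illustrative code):

```python
def matvec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, -3, 1],
     [1, -2, 1],
     [1, -3, 2]]

# (eigenvalue, eigenvector) pairs found in the example
pairs = [(0, [1, 1, 1]), (1, [3, 1, 0]), (1, [1, 0, -1])]
for lam, x in pairs:
    assert matvec(A, x) == [lam * xi for xi in x]   # A x = lambda x
```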
Here are some more important facts about eigenvalues and eigenvectors.
1. When A is an n × n matrix, p(λ) is a polynomial of degree n, and
hence the matrix has n, possibly complex, eigenvalues, where repeated
eigenvalues have to be counted according to their multiplicity.
2. The eigenvalues of upper or lower triangular matrices are very easy to
find: They are the entries in the diagonal, because the determinants
of these matrices are just the product of diagonal elements.
3. It can be shown that eigenvectors associated with distinct eigenvalues
are linearly independent; for repeated eigenvalues linearly independent
eigenvectors may or may not exist.
4. Eigenvalues may be complex numbers, even if the matrix contains real
numbers only.
5. Eigenvalues of a symmetric matrix are real numbers; they cannot be
complex.
6. If an n × n matrix has n linearly independent eigenvectors, then there
exists a non-singular matrix P such that P −1 AP = D is a diagonal
matrix.
Consult your favorite Linear Algebra text for proofs and further details.
Example: The identity matrix
\[ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
is diagonal, and λ = 1 is a double eigenvalue. Any vector x ≠ 0 is an eigenvector of I, and we can select
x = [1, 0]^T and x = [0, 1]^T as linearly independent eigenvectors.
On the other hand, λ = 1 is a double eigenvalue of
\[ B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} , \]
but we can only find one independent eigenvector, namely x = [1, 0]^T, or
multiples thereof.
#
0 −1
. Then the characteristic polynomial of
Example: Let A =
1
0
A is p(λ) = λ2 + 1, and its eigenvalues are λ = ±i. The eigenvectors
will contain complex number, and eigenvectors do not exist, if we restrict
ourselves to the real number system. This should not be surprising as the
matrix represents a 900 counterclockwise rotation of the xy-plane, and there
is no vector, which is being mapped onto a multiple of itself (other than the
zero vector).
Example: Consider the matrix
\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} . \]
It is lower triangular,
and its eigenvalues are λ = 1, λ = 3 and λ = 6. Since the eigenvalues are
all distinct, the eigenvectors must be linearly independent. As it turns out,
they are
\[ \begin{bmatrix} 1 \\ -1 \\ \frac15 \end{bmatrix} , \qquad \begin{bmatrix} 0 \\ 1 \\ -\frac53 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} , \]
respectively.
We use the eigenvectors to form the matrix
\[ P = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ \frac15 & -\frac53 & 1 \end{bmatrix} . \]
Linear independence implies that P is non-singular, and a brief computation
shows that
\[ P^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \frac{22}{15} & \frac53 & 1 \end{bmatrix} . \]
Moreover,
\[ P^{-1} A P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \end{bmatrix} \]
Finally we take a look at the power method. The basic question here
is to see what happens if we multiply a vector by the matrix A repeatedly.
We have encountered this situation in conjunction with the matrix view of
Fibonacci numbers, and we will look at further applications of this concept.
We will explain the main idea for 2 × 2 matrices, and you are encouraged to
consult the literature for more general results.
Let A be a 2 × 2 matrix with real eigenvalues λ and µ such that |λ| > |µ|.
Further, let u and v be the associated eigenvectors. Since λ ≠ µ these
vectors are linearly independent, and any vector x0 can be written as a
linear combination of u and v:
\[ x_0 = au + bv \]
Let us assume that a ≠ 0 and proceed with multiplications by A.
\[ \begin{aligned} A x_0 &= A(au + bv) = aAu + bAv = a\lambda u + b\mu v \\ A^2 x_0 &= A(a\lambda u + b\mu v) = a\lambda^2 u + b\mu^2 v \\ A^3 x_0 &= A(a\lambda^2 u + b\mu^2 v) = a\lambda^3 u + b\mu^3 v \\ &\;\;\vdots \\ A^n x_0 &= a\lambda^n u + b\mu^n v \end{aligned} \]
As n increases the powers of λ will dominate those of µ, since by assumption
|λ| > |µ|. If |λ| > 1, the vectors A^n x_0 will grow to infinite size, and for
|λ| < 1 they approach the zero vector. Neither is desirable. If λ were known
in advance, we could form an iterative sequence of vectors via
\[ x_{n+1} = \frac{1}{\lambda} A x_n \]
Then
\[ x_1 = \frac{1}{\lambda} A x_0 = au + b \, \frac{\mu}{\lambda} \, v , \qquad x_2 = \frac{1}{\lambda} A x_1 = au + b \Big( \frac{\mu}{\lambda} \Big)^2 v , \]
and inductively
\[ x_n = \frac{1}{\lambda^n} A^n x_0 = au + b \Big( \frac{\mu}{\lambda} \Big)^n v \;\rightarrow\; au \]
Thus, the sequence of vectors approaches the eigenvector au.
In this form the method is not suited for practical computations, since
we normally don’t know the eigenvalue λ in advance. However, the only
purpose of division by the eigenvalue was to keep the growth of An x0 under
control, and we can achieve this by other means. On a computer one would
scale the norm of An x0 to unity, but this requires square roots and ugly
calculations. For pencil and paper computations, we can keep the first
(or second) component fixed at 1, or we can make adjustments so that the
absolute values of the vector components add to one, or such that the largest
component is ±1. Unless there is a concern with overflow or underflow, one
can just perform a single normalization after a fixed number of iterations.
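A practical version of these ideas can be sketched as follows (our own illustrative code; it normalizes by the largest-magnitude component, one of the options just mentioned, and assumes a positive dominant eigenvalue):

```python
def power_method(A, x0, steps):
    """Repeatedly apply A, rescaling so the largest component stays at 1."""
    x = x0[:]
    for _ in range(steps):
        x = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        m = max(x, key=abs)          # keep the growth of A^n x0 under control
        x = [xi / m for xi in x]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    lam = max(Ax, key=abs)           # growth factor ~ dominant eigenvalue
    return lam, x

A = [[1, 1], [1, 0]]                 # the Fibonacci matrix of the next example
lam, v = power_method(A, [5.0, 1.0], 30)
# lam approaches the golden ratio (1 + sqrt(5))/2 = 1.618...
```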
Example: Let
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \]
be the matrix which we used in the context
of Fibonacci numbers. We take x_0 = [5, 1]^T as our arbitrary starting vector,
and in each step we normalize so that the second component is always 1.
This results in the following sequence of vectors:
\[ \begin{aligned} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 5 \\ 1 \end{bmatrix} &= \begin{bmatrix} 6 \\ 5 \end{bmatrix} & x_1 &= \begin{bmatrix} 6/5 \\ 1 \end{bmatrix} \\ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 6/5 \\ 1 \end{bmatrix} &= \begin{bmatrix} 11/5 \\ 6/5 \end{bmatrix} & x_2 &= \begin{bmatrix} 11/6 \\ 1 \end{bmatrix} \\ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 11/6 \\ 1 \end{bmatrix} &= \begin{bmatrix} 17/6 \\ 11/6 \end{bmatrix} & x_3 &= \begin{bmatrix} 17/11 \\ 1 \end{bmatrix} \\ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 17/11 \\ 1 \end{bmatrix} &= \begin{bmatrix} 28/11 \\ 17/11 \end{bmatrix} & x_4 &= \begin{bmatrix} 28/17 \\ 1 \end{bmatrix} \end{aligned} \]
In fractional form it becomes evident how the first components relate to
the associated Fibonacci sequence 1, 5, 6, 11, 17, 28, . . ., and we already
know that the ratio of consecutive numbers approaches the golden ratio
ϕ = (1 + √5)/2. Hence, for the sequence of vectors we find that
\[ \lim_{n \to \infty} x_n = \begin{bmatrix} \varphi \\ 1 \end{bmatrix} , \]
which is an eigenvector of A associated with the eigenvalue λ = ϕ. Once the
eigenvector is approximated reasonably well, the growth in each component
is an indicator of the eigenvalue. In this example we observed
\[ x_3 = \begin{bmatrix} 1.5455 \\ 1 \end{bmatrix} \qquad \text{and} \qquad A x_3 = \begin{bmatrix} 2.5455 \\ 1.5455 \end{bmatrix} \]
The first component grew by a factor of 1.6471, the second component grew
by 1.5455, and by averaging the two we expect an eigenvalue λ ≈ 1.6.
In the calculations we normalized in each step in order to get a closer
look at convergence. The vector x_4 can also be obtained by calculating
\[ A^4 x_0 = \begin{bmatrix} 28 \\ 17 \end{bmatrix} , \qquad x_4 = \begin{bmatrix} 28/17 \\ 1 \end{bmatrix} \]
directly and performing a single normalization.
We saw above that the power method converges to an eigenvector associated with the dominant eigenvalue. More work is required in order to
extract the remaining eigenvectors. We studied the method for the 2 × 2 case,
but with similar arguments it can be shown that it works for n × n matrices
with a single dominant eigenvalue. Is it a practical way to find eigenvectors? For small matrices we are better off computing eigenvalues by means
of characteristic polynomials, but for huge sparse matrices (these have only
a few non-zero entries) the power method is very useful.
We will encounter the power method in the form of matrix iterations in
the next chapter.
Exercises:
1. Let
   \[ A = \begin{bmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{bmatrix} \]
   as in the example. Compute Ax for
   \[ x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} , \qquad x = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \qquad \text{and} \qquad x = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} . \]
2. Let
   \[ A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} . \]
(a) Compute the eigenvalues and eigenvectors of A.
(b) Construct a matrix P such that P −1 AP is a diagonal matrix.
3. Verify that a matrix A is singular if and only if λ = 0 is one of its
eigenvalues.
4. Let λ be an eigenvalue of the matrix A. Show that λn is an eigenvalue
of An .
5. Apply the power method to the matrix
   \[ A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} \]
   beginning with x_0 = [1, 0, 0]^T. Normalize the vectors such that in each step the
   sum of the entries is one, and compute the next five vectors.
6 Matrix Iterative Models
Suggested Reading: Mooney/Swift: Sections 3.5 - 3.6.
In this section we will investigate models which lead to iterations of the
form x_{n+1} = Ax_n + b, where A is an n × n square matrix. If b = 0 the
iterations simplify to x_{n+1} = Ax_n, which relates to the power method for
finding eigenvectors.
6.1 Populations with Age Structures
When in the previous sections we argued that the rate of change of a population was proportional to its size, we lumped all of the population into one
single variable, without regard for age or gender. This is reasonable for some
models, but now we want to take a more refined approach. In biology it is
common practice to monitor the size of the female population only, since as
long as there is a sufficient number of males around, it is the female who
carries the fertilized egg and gives birth. Age is a factor as well, because
it usually takes some time to reach puberty, and reproduction rates are age
dependent.
We use a hypothetical population to illustrate the concepts. Let's go
back to the Fibonacci rabbits, and suppose that rabbits 4 months or older
no longer take part in the reproductive process. This leaves us with female
rabbits in the age groups x_0 through x_3, where x_k stands for the number of
k month old rabbits. The age distribution in any given month is described
by the state vector
\[ x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} \]
Suppose that each month each of the rabbits one month or older gives birth
to exactly one female. Then the total number of newborns is
\[ x_0^+ = x_1 + x_2 + x_3 , \]
where the superscript + indicates the population in the following month,
and if none of the rabbits die, we have
\[ x_1^+ = x_0 , \qquad x_2^+ = x_1 , \qquad x_3^+ = x_2 . \]
If we denote the rabbit population in the n-th month by x_n, we find that
\[ x_{n+1} = A x_n \qquad \text{where} \qquad A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \]
A is called the transition matrix.
The EXCEL simulation shows how the population develops if we begin
with one newborn. The table shows the age progression, as each entry in
the newborn column moves down diagonally (highlighted for the number 4
in the table), and it also shows that the number of newborns equals the sum
of the adults in the previous month.
Also included in the table is the total population and its growth rate.
We observe that in the long run the population increases at a rate of approximately 46.6%, and by testing the growth in each age bracket we find
a similar factor. For instance, in the transition from month 19 to month 20
we obtain:
\[ \frac{760}{518} = 1.467 , \qquad \frac{518}{354} = 1.463 , \qquad \frac{354}{241} = 1.469 , \qquad \frac{241}{165} = 1.461 \]
Figure 30: Modified Fibonacci Rabbits
This gives rise to the conjecture that eventually the progress from one
month to the next can be described by multiplication by a common factor
λ in all age groups. Suppose that the current age distribution vector is x;
then the population in the following month should be x⁺ = λx. But this
translates into the condition that
\[ \lambda x = x^+ = A x \]
and we see that λ must be an eigenvalue of A.
The computation of the characteristic polynomial of A in our example
yields
\[ p(\lambda) = \det(\lambda I - A) = \lambda^4 - \lambda^2 - \lambda - 1 \]
and for the eigenvalues we obtain
\[ \lambda_1 = 1.465571 , \qquad \lambda_2 = -1 , \qquad \lambda_{3,4} = -0.2328 \pm 0.7926\,i \]
by numerical computations. Our observed growth rate is associated with the
dominant eigenvalue λ1 = 1.465571. Recall that the power method picks out
the dominant eigenvalue.
We can also calculate percentages for each age bracket. In our example
these are
\[ \frac{760}{1873} = 40.6\% , \qquad \frac{518}{1873} = 27.7\% , \qquad \frac{354}{1873} = 18.9\% , \qquad \frac{241}{1873} = 12.9\% \]
This is called the stable age structure. But these are just the components
of the eigenvector of λ_1, which has been normalized to the condition that
its entries add up to 1. A more refined numerical calculation identifies the
eigenvector as
\[ x = \begin{bmatrix} 0.405586 \\ 0.276742 \\ 0.188829 \\ 0.128843 \end{bmatrix} \]
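The conjectured growth factor and stable age structure are easy to reproduce by iterating the model directly (our own illustrative sketch; the normalization mirrors the percentage computation above):

```python
def step(x):
    """One month of the rabbit model: newborns x1+x2+x3, everyone else ages."""
    x0, x1, x2, x3 = x
    return [x1 + x2 + x3, x0, x1, x2]

x = [1.0, 0.0, 0.0, 0.0]            # start with a single newborn
for _ in range(60):
    x = step(x)
    s = sum(x)
    x = [xi / s for xi in x]        # normalize so the entries add up to 1

growth = sum(step(x))               # ~ dominant eigenvalue 1.465571
```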
Leslie Matrix. Now let us add a little more realism to our model.
Usually fertility is age dependent. Women over 40 are less likely to give birth
than women in their twenties. Therefore we should assign fertility rates f_n
to each age group. Secondly, we cannot assume that the entire population
survives from one cycle to the next. For example, infant mortality has been
a huge issue throughout the ages, and still is a problem in the poorer parts
of the world. We assign survival rates sn in order to model the transition
from one age group to the next. Then the transition matrix A takes the
form

A = [ f1   f2   f3   · · ·  fn
      s1   0    0    · · ·  0
      0    s2   0    · · ·  0
      ⋮               ⋱      ⋮
      0    0    · · ·  sn−1  0 ]
Such a matrix is called a Leslie matrix, named after Patrick H. Leslie.
Example: Compute the stable age distribution for a population with four
age groups, and with fertility rates f1 = 0, f2 = 0.8, f3 = 0.9 and f4 = 0.4
and with survival rates s1 = 0.6, s2 = 0.9 and s3 = 0.8.
The transition matrix for this model is

A = [ 0    0.8  0.9  0.4
      0.6  0    0    0
      0    0.9  0    0
      0    0    0.8  0 ]
A brief calculation reveals that the characteristic polynomial becomes
p(λ) = λ4 − 0.48λ2 − 0.486λ − 0.1728
and the dominant eigenvalue is λ1 = 1.048976. Thus, we expect a 4.9%
growth.
The proportions in each age bracket will eventually approach the eigenvector associated with λ1. Let x = [a, b, c, d]^T be this eigenvector; then its
components satisfy the equations
0.8b + 0.9c + 0.4d = λ1 a
0.6a = λ1 b
0.9b = λ1 c
0.8c = λ1 d ,
and since we are interested in percentages, we also require that
a + b + c + d = 1 .
From the first set of equations we can express a, b and c in terms of d:

c = (λ1/0.8) d = 1.3112 d ,   b = (λ1/0.9) c = 1.5283 d ,   a = (λ1/0.6) b = 2.6719 d
and since the components must add up to one, we adjust d accordingly and
obtain

x = [0.4103, 0.2347, 0.2014, 0.1536]^T
An EXCEL simulation beginning with 100 individuals in the second age
bracket is shown below. The percentages refer to the distribution in the last
row.
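The EXCEL simulation can be mimicked with a few lines of code. The sketch below iterates the Leslie matrix of the example, starting as in the text with 100 individuals in the second age bracket, and reports the age-bracket proportions and the growth factor.

```python
# Python stand-in for the EXCEL simulation of the Leslie model with
# fertilities (0, 0.8, 0.9, 0.4) and survival rates (0.6, 0.9, 0.8).
A = [[0.0, 0.8, 0.9, 0.4],
     [0.6, 0.0, 0.0, 0.0],
     [0.0, 0.9, 0.0, 0.0],
     [0.0, 0.0, 0.8, 0.0]]

def step(v):
    return [sum(A[i][j] * v[j] for j in range(4)) for i in range(4)]

x = [0.0, 100.0, 0.0, 0.0]            # 100 individuals in the second bracket
for _ in range(60):
    x = step(x)

total = sum(x)
frac = [xi / total for xi in x]       # proportions in each bracket
growth = sum(step(x)) / total         # one-cycle growth factor
print([round(f, 4) for f in frac])    # close to [0.4103, 0.2347, 0.2014, 0.1536]
print(round(growth, 4))               # close to 1.049
```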
Some questions for the work with Leslie matrices have been left open.
It remains to be proven that there is exactly one dominant eigenvalue, that
Figure 31: Leslie Model
it is real (rather than complex), and that it is positive. We
have also tacitly assumed that the associated eigenvector has positive entries
only. All these questions require a careful study, but we will not pursue this
matter here.
Lefkovitch Matrix. Mooney/Swift (pp.101) discuss a human population model, where for the sake of simplicity, the women have been grouped
into age classes 0-15, 15-30 and 30-45 years. In such a model each time step
represents fifteen years.
What should we do if we want to monitor annual changes in such a
model? In this case we should allow for some of the members to remain
in their age group, and for others to advance to the next age bracket. The
resulting matrix is sometimes called a Lefkovitch matrix.
Example: Suppose that in a human population we group women into brackets of ten year intervals for ages 0 to 50, and assume further that there are
no deaths, and that all women in their twenties have exactly one baby girl,
and there are no children for women in the other age groups. The Leslie
matrix for this model is

A = [ 0  0  1  0  0
      1  0  0  0  0
      0  1  0  0  0
      0  0  1  0  0
      0  0  0  1  0 ]
Here one time step represents ten years.
If we want to study the annual progress, we need to make some assumptions: For one, lacking any other information, we assume that the ages are
evenly distributed in each bracket. Then from one year to the next, 90%
will remain in the bracket and 10% move on to the next bracket. Secondly,
we assume that 10% of the women in their twenties have their baby girl in
any given year. This results in the transition matrix

A = [ 0.9  0    0.1  0    0
      0.1  0.9  0    0    0
      0    0.1  0.9  0    0
      0    0    0.1  0.9  0
      0    0    0    0.1  0.9 ]
Finally, we look at an example with uneven age brackets, that is, the
age groups may span time intervals of different lengths. Our starting point
is the Leslie model used above with four age groups and with given fertility
rates (0, 0.8, 0.9, 0.4) and survival rates (0.6, 0.9, 0.8). Suppose that we
want to simplify the model using just two groups, one for youth and one for
adult. The first age bracket remains identical (youth), and the remaining
three are combined into one class (adult). The first column of the transition
matrix is evident: The young females do not reproduce (f1 = 0), and the
survival rate remains the same at s1 = 0.6. The fertility and survival rates
for the adult group are more difficult to determine.
A plain averaging of the fertility rates for the three adult groups results in f = (0.8 + 0.9 + 0.4)/3 = 0.7, and the average survival rate for adults is s = (0.9 + 0.8 + 0)/3 = 0.5667. Then the transition matrix becomes
A = [ 0    0.7
      0.6  0.5667 ]
The dominant eigenvalue of this matrix is λ1 = 0.9906, which predicts a slight decline of the population, much different from the 4.898% growth in the original model.
The discrepancy is caused by the fact that the population is not evenly
spread within the three age groups, as we saw in the EXCEL simulation,
and as an alternative we now form weighted averages based on the stable
age distribution. This results in
f = (0.2347 · 0.8 + 0.2014 · 0.9 + 0.1536 · 0.4) / (0.2347 + 0.2014 + 0.1536) = 0.7280

s = (0.2347 · 0.9 + 0.2014 · 0.8 + 0) / (0.2347 + 0.2014 + 0.1536) = 0.6314
with transition matrix

A = [ 0    0.728
      0.6  0.6314 ]
The dominant eigenvalue of this matrix is λ1 = 1.0481, and it can be shown
that the observed deviation from the original λ1 is due to rounding. This
simplified compressed model has the same growth rate as the original Leslie
model, but based on the initial distributions in the starting data, it will not
predict the same total populations. By simplification of the original problem
with four age classes to just two classes, some information has been lost and
it cannot be recovered completely.
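For 2 × 2 matrices the dominant eigenvalue comes directly from the quadratic formula, which makes it easy to compare the two compressed models numerically. A small sketch, using the plain and the weighted averages from the text:

```python
import math

# Dominant eigenvalue of a 2x2 matrix via the quadratic formula,
# used to compare the two compressed two-class models.
def dominant(a, b, c, d):
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)   # real here: discriminant >= 0 in both cases
    return max((tr + disc) / 2, (tr - disc) / 2, key=abs)

plain    = dominant(0.0, 0.7,   0.6, 0.5667)   # plain averages
weighted = dominant(0.0, 0.728, 0.6, 0.6314)   # weighted by the stable age structure
print(round(plain, 4), round(weighted, 4))     # slight decline vs. ~4.8% growth
```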
Summary: In all population models which we discussed in this section, the dynamics was described in terms of a transition matrix A. The population structure at a given time was represented by a state vector x, and the entry aij of the matrix was the proportion of the j-th state which progressed into the i-th state. This process can be nicely depicted by state diagrams. See Mooney/Swift, Sections 3.2 and 3.3 for further details.
In our examples the states were age classes, ordered by increasing age.
The main diagonal contained the proportion of the population which remained in the same bracket, and the first sub-diagonal element contained
the fraction which moved into the next age group. All entries below the subdiagonal remained zero, since it is not permissible to jump age classes. The
entries above the diagonal also had to remain zero, with the exception of
the first row, since more mature individuals cannot pop up out of nowhere.
Since all models were of the type xn+1 = Axn , it was the dominant
eigenvalue of the matrix which revealed the growth rate of the population,
and the associated eigenvector described the stable age structure.
Exercises:
1. Given that λ = 1.465571 is an eigenvalue of the matrix

   A = [ 0  1  1  1
         1  0  0  0
         0  1  0  0
         0  0  1  0 ] ,
compute the associated eigenvector whose components add up to one
directly from the condition Ax = λx. Do not use the iterative results
from the text.
2. Find the growth rate and the stable age distribution for the Leslie model with transition matrix

   A = [ 0  2  1
         1  0  0
         0  1  0 ]
3. (a) Construct a Leslie model with three age classes, so that the intrinsic growth rate is 100%, i.e. the dominant eigenvalue is λ1 = 2.
(b) Construct a Leslie model with three age classes, with dominant
eigenvalue λ1 = 2 and with stable age distribution [0.6 0.3 0.1]T .
4. Investigate the Leslie model

   A = [ 0    2    0    1
         0.6  0    0    0
         0    0.6  0    0
         0    0    0.8  0 ]
5. Consider the Leslie model defined by the matrix

   A = [ 0      3    2    3
         14/15  0    0    0
         0      6/7  0    0
         0      0    2/3  0 ]
Simplify the model by merging the upper three age classes into a single
group. What is your resulting transition matrix? Set up an EXCEL
sheet to compare the two models.
Hint: The dominant eigenvalue is an integer.
6.2  Inhomogeneous Systems
We begin with an example: Suppose that we have a population, broken
down into young and adult, and with transition matrix A. If the dominant
eigenvalue of A is less than 1, the population is headed for extinction. But
let us further assume that this is an important endangered species, and
that a wildlife protection agency releases a fixed number of adults, bred in
captivity, in order to stabilize the population. This can be modeled by a
vector b, and the population is now driven by the rule
xn+1 = Axn + b
(48)
How many animals should be released? Will the population survive? What
is the long term population distribution?
Example: Let

A = [ 0    0.225
      0.4  0.8 ]

and let each cycle represent one year. The eigenvalues of A are λ1 = 0.9 and λ2 = −0.1, and if b = 0, the iterative model (48) predicts roughly a 10% annual decline of the population.
Now assume that initially 20 young and 100 adults are present, and that conservation specialists add 5 adults each year. In this case the vector b becomes b = [0, 5]^T. The EXCEL worksheet contains a simulation of this scenario, and the graph indicates that the population approaches a steady state.
Let us now analyze the behavior of the system (48). To this end we let
u be the steady state of the system, that is, u solves
u = Au + b
(49)
This system has a unique solution for any vector b, unless λ = 1 is an
eigenvalue of A (exercise).
Now we define a new sequence vn by
vn = xn − u
or, equivalently
xn = vn + u
This is our old trick of perturbations about the equilibrium. Then
vn+1 = xn+1 − u = Axn + b − u
= Axn − Au = A(xn − u) = Avn ,
Figure 32: Stabilizing a Population
that is, vn follows the homogeneous iterations. If the dominant eigenvalue is greater than one, then vn, and thus xn, will grow exponentially, but if the dominant eigenvalue is less than one, we get vn → 0, and hence

lim_{n→∞} xn = lim_{n→∞} vn + u = u ,
that is, the iterates approach the steady state in this case.
Example: We return to the example from the beginning of this section.
The steady state is found by solving
a = 0.225b
b = 0.4a + 0.8b + 5
which results in a = 10.23 and b = 45.45. The dominant eigenvalue λ1 = 0.9 < 1, and the iterates converge to the steady state, which agrees well with our Excel observations.
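The same numbers can be reproduced programmatically: iterate xn+1 = Axn + b and compare with the steady state solved from u = Au + b. A minimal sketch of the conservation example:

```python
# Iterating x_{n+1} = A x_n + b for the conservation example and
# comparing with the steady state u solved from u = Au + b.
A = [[0.0, 0.225],
     [0.4, 0.8]]
b = [0.0, 5.0]

x = [20.0, 100.0]                        # 20 young, 100 adults
for _ in range(200):
    x = [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
         A[1][0] * x[0] + A[1][1] * x[1] + b[1]]

# steady state by hand: a = 0.225*bb and bb = 0.4*a + 0.8*bb + 5
# => (0.2 - 0.4*0.225) bb = 5 => bb = 5/0.11
bb = 5 / 0.11
aa = 0.225 * bb
print([round(v, 3) for v in x])          # iterates after 200 years
print(round(aa, 3), round(bb, 3))        # steady state, ~ (10.227, 45.455)
```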
We conclude this section with a discussion of harvesting. We restrict ourselves to two age groups (young and adult), and assume that a fixed number H of adults is taken out each cycle. This results in the matrix model
xn+1 = Axn − h,
where h = [0, H]^T is the harvest vector. This is an inhomogeneous system,
and its steady state u solves the equation
u = Au − h
As before, the iterations defined by vn = xn − u satisfy vn+1 = Avn .
If the dominant eigenvalue λ1 of A is less than one, the population will vanish even without harvesting, so we assume that λ1 > 1. But then vn, and thus xn, will grow without bound. When we dealt with one variable, we saw that if the initial population was above the steady state, the population would grow exponentially, and if it was below the steady state, the population would become extinct. With two population parameters the problem becomes more difficult, since we can deviate from the steady state in many different ways, and simple inequality conditions no longer apply.
Example: Suppose that a fish population is broken down into youth and adults, and that it grows with transition matrix

A = [ 0    4.4
      0.5  0.9 ]
The eigenvalues of A are λ1 = 2 and λ2 = −1.1, and the iterative model
predicts a 100% growth per cycle. Assume further that we want to harvest
H = 20 adult fish in each cycle.
The steady state is u = [880/21, 200/21]^T, and the Excel example shows that the population will eventually grow exponentially if the initial population is x0 = [33, 12]^T. Experimentation in Excel shows that the population will survive if both components of x0 are greater than those of u, and the population will decline if both components are less. But there are mixed cases: What happens if one component is larger, and the other is smaller?
Let u1 and u2 be the eigenvectors of A associated with λ1 and λ2, respectively, with |λ1| > |λ2|. Then the transformed initial vector v0 = x0 − u can be written as a linear combination of the eigenvectors, and we have

v0 = αu1 + βu2
for suited scalars α and β. Now successive iterations yield
v1 = Av0 = A(αu1 + βu2 ) = αλ1 u1 + βλ2 u2
Figure 33: Fish Population with Harvest H = 20
v2 = Av1 = A(αλ1u1 + βλ2u2) = αλ1^2 u1 + βλ2^2 u2

and inductively we find that

vn = αλ1^n u1 + βλ2^n u2
But |λ1 | > |λ2 |, and therefore vn is dominated by the first term. If both
components of u1 are positive and if α > 0, then both components of the
vectors vn will converge to +∞; for α < 0 both components of vn converge
toward −∞ and the population becomes extinct. The line separating the
two regions is found from α = 0. This implies that v0 = βu2 , and the initial
vector belongs to the line
x0 = βu2 + u
(50)
With β as a parameter, this line separates the regions of exponential growth
and extinction.
Example: We return to our example. For eigenvectors we use

u1 = [11, 5]^T   and   u2 = [4, −1]^T

Then the equation (50) becomes

[x, y]^T = β [4, −1]^T + [880/21, 200/21]^T
and after elimination of β and further simplifications we arrive at
x + 4y = 80
If the initial data are such that x0 + 4y0 > 80, the population will survive,
and for x0 + 4y0 < 80 it will vanish. For the initial data (33, 12) from
the Excel table we obtain x0 + 4y0 = 81 > 80 and the population grows
exponentially. If the initial data satisfy x0 + 4y0 = 80 we will remain on
this line, and since λ2 = −1.1 the points will oscillate about the equilibrium
(λ2 < 0) and move away from it (|λ2 | > 1).
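The separating line can be tested by simulation. The sketch below runs the harvesting model from two initial vectors, one just above and one just below the line x + 4y = 80; the second starting point (33, 11) is chosen for illustration only.

```python
# Testing the separating line x + 4y = 80 for the harvesting model
# x_{n+1} = A x_n - h with A = [[0, 4.4], [0.5, 0.9]] and H = 20.
def run(x0, y0, steps=25):
    x, y = x0, y0
    for _ in range(steps):
        x, y = 4.4 * y, 0.5 * x + 0.9 * y - 20.0   # simultaneous update
    return x, y

above = run(33.0, 12.0)    # 33 + 4*12 = 81 > 80: exponential growth
below = run(33.0, 11.0)    # 33 + 4*11 = 77 < 80: collapse (components go negative)
print(above, below)
```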
In closing it should be noted that we assumed that both eigenvalues
of A are real numbers, that we have a dominant eigenvalue, and that the
components of the associated eigenvector are positive. It should also be
pointed out that we did not consider the case where components of the
iteration vectors xn become negative.
Exercises

1. Compute the eigenvalues of

   A = [ 0    0.225
         0.4  0.8 ] .
2. Consider a harvesting model with

   A = [ 0    8
         0.4  0.8 ]     and     H = 15 .

   Find the steady state and the line (50) in simplified form.
3. Let A be a square matrix. Show that the system
u = Au + b
has exactly one solution u for any choice of b, unless λ = 1 is an eigenvalue of A.
6.3  Markov Chains
Suggested Reading: Mooney/Swift: Section 3.8 is entirely devoted to Markov chains.
The transition matrix in a Markov process contains probabilities. To
be more specific, suppose that our system has n different states, which are
collected as the components of the state vector x. Then the entry aij of the
transition matrix A is the probability that the state j will become the state
i after the next time step. The row index i represents the destination, the
column index is the origin, or in short, aij = adestination,origin . An example
will shed more light on this.
Example (G. Strang): Each year 1/10 of the people outside of California move to the Golden State and 1/5 of Californians move out.
Here the state vector has two components: either you live in California
(x), or you don’t (y). Then, using the superscript + to indicate the following
state, the information we are given results in the equations
x+ = (4/5) x + (1/10) y
y+ = (1/5) x + (9/10) y
In matrix form this system becomes x+ = Ax, where

A = [ 4/5  1/10
      1/5  9/10 ]

and x = [x, y]^T. The matrix entry a11 = 4/5 represents the proportion of Californians who stay there, and a21 = 1/5 is the fraction of those moving away. Here i = 2 is the destination (leave California), and j = 1 is the origin (currently in CA).
Inspection of the matrix A in the example shows that the columns add
to one. This is no coincidence. Let’s fix a particular state j, (the origin, the
column index). Then we need to assign the probabilities at which state j
changes to any of the destination states i, and since we have to exhaust all
possibilities, these probabilities have to add to unity.
Definition: A matrix A is called a Markov transition matrix, if all its entries are non-negative, and if all columns add to one, that is, if

0 ≤ aij ≤ 1   and   Σ_{i=1}^{n} aij = 1   for all j        (51)
The iterations xn+1 = Axn are called a Markov process, or a Markov
chain.
As in the past, we are interested in steady state solutions.
Example. We return to the California example. The equations for the steady state are

(4/5) x + (1/10) y = x
(1/5) x + (9/10) y = y ,

and simplification results in (1/5) x = (1/10) y, which means that the number of those leaving California (x/5) must match the newcomers (y/10). If we add the condition that x + y = 1, the steady state is x = 1/3 and y = 2/3.
Example (Shopping): We are tracing 200 customers of Walmart, Kmart and Target stores. Each month 50% of Walmart customers remain loyal to their store, 40% of the Walmart shoppers switch to Kmart and 10% to Target. Only 20% of Kmart shoppers stay with the store; the other customers switch to Walmart and Target in equal amounts (40% each). Target has the highest retention rating at 60%; 10% of its customers switch to Walmart, and the other 30% seek bargains at Kmart. Beginning with 80 shoppers each at Walmart and Kmart, and 40 at Target, what is the distribution of shoppers in the next three months, and what is the steady state distribution?
Figure 34: State Diagram Shopping
Denoting the Walmart customers with x, those of Kmart with y and the Target clientele with z, we obtain the transition matrix

A = [ 0.5  0.4  0.1
      0.4  0.2  0.3
      0.1  0.4  0.6 ]


The initial vector is x0 = [80, 80, 40]^T, and the three next iterates become

x1 = Ax0 = [76, 60, 64]^T ,   x2 = Ax1 = [68.4, 61.6, 70]^T ,   x3 = Ax2 = [65.84, 60.68, 73.48]^T

Since we cannot form fractions of people, we should think of the vectors as expected values. Notice that at each step the entries in the vectors add to 200. In the long run the customer distribution becomes x = [63.4921, 60.3175, 76.1905]^T. This result can be verified by continuing the iteration process, or by solving x = Ax along with x + y + z = 200.
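The iterates and the long-run distribution can be reproduced with a short script; it mirrors the matrix-vector products carried out above.

```python
# Mirroring the shopping example: three monthly iterates and the
# long-run distribution of the 200 customers.
A = [[0.5, 0.4, 0.1],
     [0.4, 0.2, 0.3],
     [0.1, 0.4, 0.6]]

def step(v):
    """One month: v -> A v."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

x = [80.0, 80.0, 40.0]           # Walmart, Kmart, Target
for _ in range(3):
    x = step(x)
x3 = list(x)                     # third month: [65.84, 60.68, 73.48]

for _ in range(200):             # continue toward the steady state
    x = step(x)
print(x3)
print([round(v, 4) for v in x])  # close to [63.4921, 60.3175, 76.1905]
```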
We now confirm our observation that the sum of the components remains invariant when a vector is multiplied by a Markov transition matrix. Let y = Ax; then by definition of matrix-vector multiplication we have

yi = Σ_{j=1}^{n} aij xj

Using (51) we obtain the desired result

Σ_{i=1}^{n} yi = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xj = Σ_{j=1}^{n} ( Σ_{i=1}^{n} aij ) xj = Σ_{j=1}^{n} xj
If x is a probability vector, that is, if the components of x satisfy 0 ≤
xj ≤ 1 and x1 + x2 + · · · + xn = 1, then y = Ax is a probability vector as
well. To see this, we first note that the components of y add to one, and
secondly, with aij ≥ 0 and xj ≥ 0, it is impossible for y to have negative
components.
Let us now take a look at possible eigenvalues and eigenvectors of a Markov matrix, and let Ax = λx. Since the component sum is invariant under matrix multiplication, it follows that

λ Σ_{j=1}^{n} xj = Σ_{j=1}^{n} (Ax)j = Σ_{j=1}^{n} xj
How is this possible? There are only two options: either λ = 1, or the components of the eigenvector add to zero. In other words, if x is an eigenvector associated with a real eigenvalue λ ≠ 1, then some of the components of x
must be negative, while others are positive; they cannot all have the same
sign! If λ is a complex number, the eigenvectors contain complex numbers
as well, yet, their sum will be zero.
Example: California, revisited. The transition matrix is

A = [ 4/5  1/10
      1/5  9/10 ]

with characteristic polynomial p(λ) = λ^2 − (17/10)λ + 7/10. The eigenvalues are λ1 = 1 and λ2 = 7/10, with associated eigenvectors

x1 = [1, 2]^T   and   x2 = [1, −1]^T
Next we discuss the convergence of a Markov chain. To be specific, we
consider convergence of the sequence of vectors defined xn+1 = Axn for
some arbitrary starting vector x0 . This is very closely linked to the power
method, as we saw in the last sections.
Our next example shows that convergence is not automatic.
Example: Let

A = [ 0  1
      1  0 ]

and take x0 = [a, b]^T as starting vector. Then

x1 = [b, a]^T ,   x2 = [a, b]^T = x0 ,   x3 = x1 ,   and so on

The components of x are interchanged at each iteration, and convergence cannot occur (unless a = b). An investigation of eigenvalues yields p(λ) = λ^2 − 1 and λ1 = 1 and λ2 = −1. This matrix does not have a single dominant eigenvalue since |λ1| = |λ2| = 1.
Definition: A Markov matrix is called regular, if there is a power of A
such that all entries in An are positive.
It can be shown that a regular Markov matrix has λ1 = 1 as its dominant eigenvalue, that is, |λk | < 1 for all remaining eigenvalues. This is a
consequence of the Perron-Frobenius Theorem. The proof of this theorem is
not elementary, and it is beyond the scope of these course notes.
In our derivation of the power method we had to normalize to avoid numerical overflow or underflow. This step is not necessary for regular Markov processes, since the dominant eigenvalue is λ1 = 1, and we conclude that the Markov chain defined by xn+1 = Axn converges to an eigenvector u associated with λ1 = 1. Moreover, if the starting vector x0 is a probability vector, then so is u.
Finally, we view Markov chains in terms of matrix powers. As usual, xn = A^n x0, and convergence implies that

xn = A^n x0 → u
The sequence depends linearly on the starting vector x0 , and therefore the
limit u is a linear function of x0 as well. Linear relations in n-space can
be expressed in matrix form, and we have u = Lx0 . Since this argument is
valid for arbitrary starting vectors x0 it follows that
lim_{n→∞} A^n = L
What can we say about this matrix L? If we take any of the basic unit vectors² ej as starting vectors, all of the resulting sequences xn will converge
to the same steady state u. Thus
Lej = u
This can only happen if all column vectors of the matrix L are equal to u.
In summary:

lim_{n→∞} A^n = L = [ u1  u1  · · ·  u1
                      u2  u2  · · ·  u2
                      ⋮    ⋮          ⋮
                      un  un  · · ·  un ]
where the values uj are the components of the eigenvector u associated with
the dominant eigenvalue λ1 = 1.
Example: Let

A = [ 0.7  0.3  0
      0.2  0.4  0.5
      0.1  0.3  0.5 ]
² These are n-vectors with 1 in the j-th component, and all the other components are zero.
A is regular, since all values in

A^2 = [ 0.55  0.33  0.15
        0.27  0.37  0.45
        0.18  0.30  0.40 ]
are non-zero. An elementary, yet lengthy, calculation reveals
p(λ) = λ^3 − (8/5)λ^2 + (31/50)λ − 1/50
as characteristic polynomial. Factoring results in
p(λ) = (λ − 1) ( λ^2 − (3/5)λ + 1/50 )
and the eigenvalues are
λ1 = 1 ,   λ2 = (3 + √7)/10 = 0.5646 ,   λ3 = (3 − √7)/10 = 0.03542
λ1 = 1 is the dominant eigenvalue. For the associated eigenvectors we chose

u1 = [5, 5, 4]^T ,   u2 = [−3, 4 − √7, √7 − 1]^T ,   u3 = [−3, 4 + √7, −√7 − 1]^T

The components of u1 are all positive and their sum is 14, and the components of the remaining eigenvectors add to zero.


Now let us look at the sequence xn, where we start with x0 = [14, 0, 0]^T as initial vector – the factor 14 was included for convenience. Numerical calculations yield

x0 = [14, 0, 0]^T ,   x1 = [9.8, 2.8, 1.4]^T ,   x4 = [5.8604, 4.6116, 3.5280]^T ,   x10 = [5.0279, 4.9874, 3.9847]^T

and the convergence to u1 becomes apparent.
Finally we calculate powers of A. Note that the columns add to one, and that they eventually become identical. A and A^2 have already been given. Here are two more matrix powers

A^5 = [ 0.3918  0.3523  0.3198        A^16 = [ 0.3572  0.3571  0.3571
        0.3415  0.3593  0.3740                 0.3571  0.3571  0.3572
        0.2667  0.2884  0.3062 ]               0.2857  0.2857  0.2858 ]
Exercises

1. Compute the eigenvalues and eigenvectors of the matrix

   A = [ 4/5  1/10
         1/5  9/10 ]

   (California example).
2. Evening News, hypothetical data. After each month 25% of the viewers of ABC News switch to NBC; the others remain loyal to ABC (none turn to CBS). Among CBS viewers 70% remain loyal, 20% turn to ABC and 10% to NBC. NBC retains 60%, and loses 10% to ABC and 30% to CBS. Find the stable viewer distribution (in percent).
3. A serious epidemic befalls a bird population. After each week 50% of
the healthy birds become infected, 25% of the infected birds remain
infected, 50% recover, but 25% die.
(a) Determine the transition matrix A.
(b) Is A a regular Markov matrix?
(c) What are the eigenvalues of A?
(d) Beginning with 10,000 healthy birds, how many are still alive (healthy or infected) after 10 weeks? How many are still alive 20 weeks after the outbreak of the epidemic?
4. Let A be a Markov transition matrix, i.e. A satisfies (51), show that
A2 is a Markov transition matrix as well.
6.4  Romeo and Juliet
We continue with our theme of matrix iterative models. For simplicity we
limit our discussions to 2 × 2 matrices, however, we will have no restrictions
on the coefficients of A.
Our example goes back to Strogatz³. Suppose that R represents
Romeo’s love for Juliet. If R > 0 he loves her, if R < 0 he hates her,
and the magnitude expresses the strength of these feelings. His love changes
daily according to the formula
Rn+1 = a Rn
³ See deVries/Hillen/Lewis/Müller/Schönfisch, A Course in Mathematical Biology, SIAM 2006
If a < 0, love and hate alternate from day to day. If 0 < a < 1 his love
will eventually fade, but if a > 1 it will grow exponentially. The model is
incomplete, if we don’t incorporate Juliet’s feelings J for Romeo. To do so,
we use the following model
Rn+1 = aRn + bJn
Jn+1 = cRn + dJn
(52)
Now Romeo’s feelings are impacted by the way Juliet feels about him, and
vice versa. If b is positive, her love will increase his; but if b is negative, he
will love her less if she loves him back, but he will love her more, the more
she hates him. All told, we have all the ingredients of a soap opera, or a
romantic comedy.
Example: We take a = 0.8, b = −0.5, c = 0.6 and d = 0.9, and we start with R0 = 1 and J0 = 0. The table shows the data for the first week

Day   0   1     2      3       4       5       6       7
Rn    1   0.8   0.34   −0.238  −0.751  −1.035  −0.992  −0.632
Jn    0   0.6   1.02   1.122   0.867   0.329   −0.324  −0.887
At the onset, Romeo is in love with her, and Juliet develops feelings for him.
But as her love grows, Romeo gets cold feet, his love fades and he develops
hateful feelings. Juliet doesn’t put up with this attitude and her love turns
into hatred. Further developments of this drama are shown in the graph.

Figure 35: Romeo and Juliet for a = 0.8, b = −0.5, c = 0.6 and d = 0.9
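The table is easy to regenerate; a minimal sketch of the iteration Rn+1 = aRn + bJn, Jn+1 = cRn + dJn:

```python
# Regenerating the first week of the Romeo-and-Juliet iteration.
a, b, c, d = 0.8, -0.5, 0.6, 0.9
R, J = 1.0, 0.0
table = [(0, R, J)]
for day in range(1, 8):
    R, J = a * R + b * J, c * R + d * J   # simultaneous update
    table.append((day, round(R, 3), round(J, 3)))
for row in table:
    print(row)
```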
This model falls into the class of iterative matrix models with state vector x = [R, J]^T and transition matrix

A = [ a  b
      c  d ]

As we know from past experience, the long term behavior of these systems depends on the
eigenvalues of A. For a 2 × 2 matrix we can easily determine the eigenvalues
in terms of the matrix coefficients. The characteristic polynomial is
p(λ) = det(λI − A) = det [ λ − a   −b
                           −c     λ − d ] = (λ − a)(λ − d) − bc
     = λ^2 − (a + d)λ + ad − bc = λ^2 − (tr A)λ + det A
Here tr A = a + d is the trace of the matrix, i.e. the sum of the diagonal
elements. We write α = tr A and β = det A for simplicity. Then the
eigenvalues of A are solutions of the quadratic equation
λ^2 − (tr A)λ + det A = λ^2 − αλ + β = 0
and the quadratic formula yields

λ = ( α ± √(α^2 − 4β) ) / 2
As is the case for all quadratic equations, we may have two different real
number solutions, one (double) root, or two complex number solutions, as
indicated by the value of the discriminant α2 − 4β.
Let |λ1 | ≥ |λ2 |, then it can be shown that the iterates xn converge to
0 if |λ1 | < 1, and they grow in magnitude, if |λ1 | > 1. We have seen this
behavior for real eigenvalues, but the same argument can be made if the
eigenvalues are complex numbers.
We now go through the different cases and determine the region where
|λ1 | < 1 in terms of α and β. This discussion of the properties of solutions of
a quadratic equation may be found valuable in its own right, aside from being
important for the characterization of eigenvalues, and hence the behavior of
iterative systems.
Real Case: First assume that the eigenvalues are real numbers. Then α^2 − 4β ≥ 0. The structure of the quadratic formula shows that the solutions are located symmetrically about the value α/2. From here it follows that we must have −2 < α < 2 in order to achieve |λ1| < 1. Moreover, if α > 0 we have to satisfy

( α + √(α^2 − 4β) ) / 2 < 1

which can be simplified into α − 1 < β. If α < 0, the condition

( α − √(α^2 − 4β) ) / 2 > −1
results in −α − 1 < β. Both cases (α > 0 and α < 0) can be combined into

β > |α| − 1   for   −2 < α < 2        (53)
Complex Case: Here α^2 − 4β < 0, and the roots have the form

λ = ( α ± i √(4β − α^2) ) / 2
In order to calculate the magnitude (modulus, absolute value) of a complex number we have to square real and imaginary parts, and then take the square root of the sum (note that both roots have the same magnitude). The condition |λ| < 1 now becomes

√( (α/2)^2 + ( √(4β − α^2) / 2 )^2 ) < 1
which simplifies to β < 1.
In combining both the real and the complex case, we find that for |λ1| < 1, the values of α and β must lie in the triangle
|α| − 1 < β < 1
in the αβ-plane. After adding 1 everywhere, we obtain in terms of trace and
determinant
|tr A| < 1 + det A < 2
(54)
These inequalities are called the Jury conditions, named for E.I. Jury. Note that in the αβ-plane complex eigenvalues lie above the parabola β = (1/4)α^2. If the matrix satisfies the Jury conditions, the vectors of the sequence given by xn+1 = Axn will converge to 0. If det A > 1 or if |tr A| > 1 + det A, the vectors become infinite in magnitude.
Example: For the problem with a = 0.8, b = −0.5, c = 0.6 and d = 0.9,
we find that tr A = 1.7 and det A = 1.02. This point is slightly outside the
triangle, since β = det A > 1. Therefore the eigenvalues are greater than one
in magnitude, and the iterates will grow in amplitude, which is supported
by the graph.
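The Jury conditions are simple to automate. The helper below checks |tr A| < 1 + det A < 2 and compares the verdict with the actual spectral radius; the function names are illustrative, not standard library routines.

```python
import cmath

# The Jury conditions |tr A| < 1 + det A < 2 for a 2x2 matrix,
# checked against the actual spectral radius.
def jury_stable(a, b, c, d):
    tr, det = a + d, a * d - b * c
    return abs(tr) < 1 + det < 2

def spectral_radius(a, b, c, d):
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)      # works for either sign of the discriminant
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

print(jury_stable(0.8, -0.5, 0.6, 0.9))           # False: det A = 1.02 > 1
print(spectral_radius(0.8, -0.5, 0.6, 0.9) > 1)   # True: the iterates grow
```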
In iteration problems of the form xn+1 = Axn + b we proceed as in
Section 6.2. First we find the steady state from u = Au + b, and we define
vn = xn − u. Then vn+1 = Avn , and we can determine the behavior
of vn from the analysis above (Jury conditions). Since xn = vn + u, the
Figure 36: The Jury Stability Triangle (lightly shaded)
sequence xn will converge to the steady state, if vn → 0, otherwise it will
diverge.
Example: Suppose that Romeo’s and Juliet’s feelings for each other are
given by
Rn+1 = 0.6Rn − 0.5Jn + 6.3
Jn+1 = 0.4Rn + 0.9Jn − 2.1
Romeo’s love is fueled by Cupid’s magic powers, while Juliet’s evil stepmother plants doubts in Juliet’s mind. Then the steady state is R = 7 and
J = 7. Moreover, since tr A = 1.5 and det A = 0.74, the Jury conditions are
satisfied and in the long run we will have a happy ending of this romance,
with both sequences approaching 7.
Exercises:

1. Let A = [ 0.5  −1
             1   −0.5 ].

   (a) Let x0 = [32, 16]^T, compute x1 through x8, and plot the vectors in the state plane.
(b) Find the eigenvalues of A, and compute their respective absolute
values.
2. Let A = [ 0.8  −0.5
             0.4   1.0 ].
(a) Find the characteristic polynomial of A and determine the eigenvalues as roots of the characteristic polynomial. What are the
absolute values of the eigenvalues?
(b) Confirm the results using the formula for the eigenvalues in terms
of trace (α) and determinant (β).
(c) What can you say about the data in relation to the Jury triangle?
(Inside, outside or boundary?)
3. Let A = [ a  b
             c  d ]

   with eigenvalues λ1 and λ2. Show that
(a) λ1 + λ2 = tr A
(b) λ1 λ2 = det A
4. (a) Confirm the calculations leading to (53).
(b) Confirm that in the complex case |λ| < 1 implies β < 1.
5. Show that the curve β = |α| − 1 is tangent to the parabola β = (1/4)α^2 at the points where α = ±2.

6.4.1  Complex Eigenvalues
Here we take another look at 2×2 matrices with special attention to complex eigenvalues. To fix our notation we set

A = [ a  b
      c  d ] ,
where all coefficients are real numbers, and as before we use α = tr A = a+d
and β = det A = ad − bc for the sake of brevity. Then the eigenvalues are
given by
       λ = ( α ± √(α² − 4β) ) / 2
For a fixed eigenvalue λ, one can construct an associated eigenvector via
       x = [   b   ]        or        x = [ λ − d ]
           [ λ − a ]                      [   c   ]                     (55)
These vectors are multiples of each other, and either formula may be used.
This shortcut works for real and for complex eigenvalues, provided that none
of the vectors is zero. Of course, the formulas are limited to 2 × 2 matrices.
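As a quick illustration (a Python sketch; the matrix A = [2 1; 1 2] is an assumed test case, not from the notes), both shortcut formulas can be checked numerically:

```python
# Check the shortcut eigenvector formulas x = (b, lambda - a) and x = (lambda - d, c)
# for a 2x2 matrix A = [[a, b], [c, d]]; here A = [[2, 1], [1, 2]] with
# eigenvalues 3 and 1 serves as a test case.
a, b, c, d = 2, 1, 1, 2

def matvec(v):
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

for lam in (3, 1):                          # eigenvalues of this particular A
    for x in ((b, lam - a), (lam - d, c)):  # both shortcut formulas
        Ax = matvec(x)
        assert Ax == (lam * x[0], lam * x[1]), (lam, x, Ax)

print("both shortcut formulas yield eigenvectors for lambda = 3 and lambda = 1")
```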
If α2 > 4β we have two distinct real eigenvalues, and in the long run
the iterations xn+1 = Axn approach an eigenvector associated with the
dominant eigenvalue. Moving from one state to the next is eventually more
or less equivalent to multiplication by the dominant eigenvalue. There is
a loophole, however, which occurs when both real eigenvalues are equal in magnitude. This happens when α = tr A = 0, and we saw an example of it above with

       A = [ 0  1 ]
           [ 1  0 ] .
The limiting case of repeated eigenvalues occurs when α² = 4β, or equivalently (a − d)² = −4bc, and we will not address this scenario.
Complex eigenvalues occur if α² < 4β, which is equivalent to (a − d)² < −4bc, and we see that this situation can only occur if b and c have opposite signs. The two eigenvalues are complex conjugates, and they are equal in magnitude (no dominant eigenvalue here). Since we are dealing with matrices consisting of real numbers, the eigenvectors must contain complex numbers.
Example: Let
       A = [ 3  −1 ]
           [ 5  −1 ] .
Then tr A = 3 − 1 = 2 and det A = −3 + 5 = 2. Hence, the eigenvalues are

       λ = ( 2 ± √(4 − 8) ) / 2 = 1 ± i .

An eigenvector associated with λ1 = 1 + i is

       u = [    −1     ]  =  [  −1   ]  =  [ −1 ]  +  i [ 0 ]
           [ 1 + i − 3 ]     [ i − 2 ]     [ −2 ]       [ 1 ]
using the first shortcut formula (55). Eigenvectors are not unique, and the
second formula in (55) results in
       u = [ 2 + i ]  =  [ 2 ]  +  i [ 1 ]
           [   5   ]     [ 5 ]       [ 0 ]
and the combination
       u = [    i    ]  =  [ 0 ]  +  i [ 1 ]
           [ 1 + 2i  ]     [ 1 ]       [ 2 ]
works as well. All vectors differ by a (complex) constant.
If we separate real and imaginary parts, we can write an eigenvector in
the form u = x + iy, where x and y are real-valued. If we use λ = µ + νi,
the condition Au = λu becomes
Ax + iAy = A(x + iy) = (µ + νi)(x + iy)
= µx − νy + i(νx + µy)
Comparing real and imaginary parts, it follows that

       Ax = µx − νy        and        Ay = νx + µy                     (56)
Of course, neither x nor y is an eigenvector itself, but the above equations explain how A acts on these vectors.
In our examples in the previous section we saw that complex eigenvalues
resulted in a cyclic behavior of the state vectors, and we want to investigate
this aspect a little further. To this end we need to study properties of An .
As a first case we consider
A²x = AAx = A(µx − νy)
    = µ(µx − νy) − ν(νx + µy) = (µ² − ν²)x − 2µνy

using (56). We can compute A²y by the same method, but the calculations can be simplified and streamlined if we use that A²u = λ²u and separate real and imaginary parts:

A²x + iA²y = A²(x + iy) = A²u = λ²u
           = (µ + νi)²(x + iy) = (µ² − ν² + 2µνi)(x + iy)
           = (µ² − ν²)x − 2µνy + i(2µνx + (µ² − ν²)y)

which implies that

       A²x = (µ² − ν²)x − 2µνy        and        A²y = 2µνx + (µ² − ν²)y

and we get both results in a two-for-one deal (one calculation with two results), which is commonplace when working with complex numbers.
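Both identities can be confirmed numerically. Here is a small Python check (a sketch, not part of the notes) using the example matrix A = [3 −1; 5 −1] from above, for which µ = ν = 1, x = (−1, −2), and y = (0, 1):

```python
# Verify A^2 x = (mu^2 - nu^2) x - 2 mu nu y  and  A^2 y = 2 mu nu x + (mu^2 - nu^2) y
# for A = [[3, -1], [5, -1]] with lambda = 1 + i (mu = nu = 1), x = (-1, -2), y = (0, 1).
A = [[3, -1], [5, -1]]
mu, nu = 1, 1
x, y = (-1, -2), (0, 1)

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

A2x = matvec(A, matvec(A, x))   # A^2 x by two multiplications
A2y = matvec(A, matvec(A, y))   # A^2 y by two multiplications

rhs_x = ((mu*mu - nu*nu)*x[0] - 2*mu*nu*y[0], (mu*mu - nu*nu)*x[1] - 2*mu*nu*y[1])
rhs_y = (2*mu*nu*x[0] + (mu*mu - nu*nu)*y[0], 2*mu*nu*x[1] + (mu*mu - nu*nu)*y[1])

print(A2x == rhs_x and A2y == rhs_y)   # True
```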
The extensions to higher powers of A are obvious. But we can do better if we use polar coordinates and express the eigenvalues in the form

       λ = µ ± νi = ρ(cos θ ± i sin θ)

where ρ = |λ| is the magnitude of the eigenvalue. Then it follows from DeMoivre’s Theorem⁴ that

       (µ + νi)ⁿ = ρⁿ(cos(nθ) + i sin(nθ))
Then

Aⁿx + iAⁿy = Aⁿ(x + iy) = Aⁿu = λⁿu
           = (µ + νi)ⁿ(x + iy) = ρⁿ(cos(nθ) + i sin(nθ))(x + iy)
           = ρⁿ(cos(nθ)x − sin(nθ)y) + iρⁿ(sin(nθ)x + cos(nθ)y)

Separation into real and imaginary parts results in

       Aⁿx = ρⁿ(cos(nθ)x − sin(nθ)y)
       Aⁿy = ρⁿ(sin(nθ)x + cos(nθ)y)                                   (57)
In this form we see that x and y are multiplied by sines and cosines; their respective contributions never vanish, nor does either vector become dominant. Above all, ρ = |λ| is the decisive quantity, which separates infinite growth from convergence to zero. If ρ = 1, we have true cyclic behavior.
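Formula (57) can also be checked against direct matrix powers. The Python sketch below (not part of the notes) uses the running example A = [3 −1; 5 −1], where ρ = √2 and θ = π/4:

```python
import math

# Compare A^n x computed by repeated multiplication with formula (57):
# A^n x = rho^n (cos(n theta) x - sin(n theta) y), for A = [[3,-1],[5,-1]],
# lambda = 1 + i, rho = sqrt(2), theta = pi/4, x = (-1,-2), y = (0,1).
A = [[3, -1], [5, -1]]
rho, theta = math.sqrt(2), math.pi / 4
x, y = (-1.0, -2.0), (0.0, 1.0)

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

v = x
for n in range(1, 9):
    v = matvec(A, v)                     # direct computation of A^n x
    c, s = math.cos(n*theta), math.sin(n*theta)
    w = (rho**n * (c*x[0] - s*y[0]), rho**n * (c*x[1] - s*y[1]))
    assert abs(v[0] - w[0]) < 1e-9 and abs(v[1] - w[1]) < 1e-9

print("formula (57) matches direct powers for n = 1..8")
```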
Throughout we have tacitly assumed that x and y are linearly independent. Independence is indeed the case, for if x = cy for some real constant c, we find that x + iy = (c + i)y, which leads to

       (c + i)Ay = A(x + iy) = λ(x + iy) = (c + i)λy

and thus

       Ay = λy

The last equation implies that λ must be a real number (because A and y are real), which contradicts the assumption that λ is a truly complex eigenvalue. The remaining case y = 0 can be treated similarly. Thus, x and y are linearly independent, and in particular, neither vector can be zero. Moreover, any
⁴ If you are not familiar with this theorem, you should confirm it for n = 2 and n = 3 using the pertinent trig identities.
starting vector x0 can be expressed as linear combination of these vectors,
and An x0 can then be calculated using (57).
We can push this idea one step further and assume that x = x0 . Here
is the rationale: Let x and y be a pair of vectors such that u = x + iy is
an eigenvector of A. Now let x0 be the starting vector and select a and b
such that x0 = ax − by (x and y are linearly independent, and therefore
any vector in the plane can be written as a linear combination of these two
vectors). Since any complex multiple of u is an eigenvector of A as well, the
vector
(a + bi)(x + iy) = ax − by + i(bx + ay) = x0 + iy0
where y0 = bx + ay, is an eigenvector of A. The real part of the new
eigenvector is the initial vector x0 , and we picked up a companion vector
y0 .
Example: Let us return to
       A = [ 3  −1 ]
           [ 5  −1 ] ,
and suppose that the starting vector is
       x0 = [ 1 ]
            [ 0 ] ,
and that we want to compute A¹⁰x0 directly, without going through 10 matrix multiplications. We already saw that u = x + iy is an eigenvector of A for λ = 1 + i, where

       x = [ −1 ]        and        y = [ 0 ]
           [ −2 ]                       [ 1 ]
Our starting vector x0 can be written as

       x0 = [ 1 ]  =  − [ −1 ]  −  2 [ 0 ]  =  −x − 2y
            [ 0 ]       [ −2 ]       [ 1 ]
Hence a = −1 and b = 2, and we find a new eigenvector by switching to
(−1 + 2i)u = (−1 + 2i)(x + iy) = −x − 2y + i(2x − y) = x0 + iy0
where

       y0 = 2x − y = [ −2 ]
                     [ −5 ]

The real part of the new eigenvector becomes x0 , and we pick up a companion vector y0 .
In polar coordinates we obtain λ = 1 + i = √2 (cos(π/4) + i sin(π/4)). Hence ρ = √2 and θ = π/4, and with n = 10 we find that nθ = 5π/2. Now, according to the first formula in (57), we obtain
       A¹⁰x0 = (√2)¹⁰ ( cos(5π/2) [ 1 ]  −  sin(5π/2) [ −2 ] )
                                  [ 0 ]               [ −5 ]

             = −32 [ −2 ]  =  [  64 ]
                   [ −5 ]     [ 160 ]
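As a check (a Python sketch, not in the original notes), ten direct matrix multiplications give the same result as the single application of (57) with the companion vector y0 = (−2, −5):

```python
import math

# Compute A^10 x0 for A = [[3,-1],[5,-1]], x0 = (1, 0), two ways:
# (1) ten matrix-vector multiplications, (2) formula (57) with the
# companion vector y0 = (-2, -5), rho = sqrt(2), theta = pi/4.
A = [[3, -1], [5, -1]]
x0, y0 = (1.0, 0.0), (-2.0, -5.0)

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

v = x0
for _ in range(10):
    v = matvec(A, v)                      # direct iteration

n, rho, theta = 10, math.sqrt(2), math.pi / 4
c, s = math.cos(n*theta), math.sin(n*theta)
w = (rho**n * (c*x0[0] - s*y0[0]), rho**n * (c*x0[1] - s*y0[1]))

print(v, w)   # both equal (64, 160) up to rounding
```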
Exercises:

1. Find the eigenvalues and eigenvectors of
       A = [  2  1 ]
           [ −1  2 ] .
2. Confirm the shortcut formulas (55).
3. Show that if u = x + iy is an eigenvector of A associated with λ =
µ + νi, then v = x − iy is an eigenvector for λ = µ − νi.