- MATHEMATICS AND COMPUTER EDUCATION -
BOUNDED POPULATION GROWTH:
A CURVE FITTING LESSON
by John H. Mathews
California State University
Fullerton, California
The logistic curve is presented as a mathematical model for population growth in calculus, differential equations and mathematical modeling textbooks. However, there is seldom a development of how to obtain the coefficients for this model. The purpose of this article is to present two methods for fitting the logistic curve to data supplied by the U.S. Census Bureau. The computer algebra software Mathematica is used in this article to carry out the computations and plot graphs. However, it is not essential to use such sophisticated software. It is easy for students to obtain the solution using the "data linearization" method on a pocket calculator such as the Hewlett-Packard 20S or Texas Instruments TI-30 SLR.
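Let p(t) denote the population at time t, and assume that $\lim_{t \to \infty} p(t) = L$. The differential equation relating p(t) and p'(t) is:

$$p'(t) = r\,p(t)\left[1 - \frac{p(t)}{L}\right] \tag{1}$$

The parameter r is the initial rate of increase when p(t) is small, and L is the carrying capacity or limiting population as $t \to \infty$. The form of the solution to (1) is known to be:

$$p = p(t) = \frac{L}{1 + c\,e^{r(t - t_0)}} \tag{2}$$

Because the exponent in (2) is written as $r(t - t_0)$, a population that grows toward L requires r < 0 in (2); the rate of increase in (1) then corresponds to $-r$, and the fitted values of r below are accordingly negative.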
The value $t_0$ is usually chosen to be a convenient starting date such as 1900. We want to fit a curve of the form (2) to a given set of data $(t_0, p_0), (t_1, p_1), \ldots, (t_n, p_n)$. The model (2) involves three unknown parameters r, c and L. However, by judiciously selecting a value for L, we can reduce the problem to finding just two parameters r and c. This will permit us to use the technique known as "data linearization," which reduces the problem to finding a least squares line.
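Before the curve (2) is fitted, it can be checked symbolically against (1). The following Mathematica lines are a minimal sketch, not part of the article's session; the names pSol and t0 are illustrative assumptions. With the exponent written as r(t - t0), the derivative of (2) should equal -r p (1 - p/L), so the expression formed below is expected to simplify to zero.

```mathematica
(* Sketch only: pSol and t0 are illustrative names, not from the article. *)
Clear[pSol, r, c, L, t, t0];
pSol[t_] = L/(1 + c Exp[r (t - t0)]);

(* p'(t) + r p(t) (1 - p(t)/L); expected to simplify to 0 *)
Simplify[D[pSol[t], t] + r pSol[t] (1 - pSol[t]/L)]
```

A zero result indicates that the growth rate in (1) is the negative of the r appearing in (2).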
First, rearrange (2) in the form:

$$\frac{L}{p} - 1 = c\,e^{r(t - t_0)} \tag{3}$$
Vol. 26, No. 2
Spring 1992
Now take the logarithm of both sides of (3) and obtain:

$$\ln\left(\frac{L}{p} - 1\right) = \ln(c) + r(t - t_0) \tag{4}$$

Next, we introduce the change of variables

$$T = t - t_0, \quad P = \ln\left(\frac{L}{p} - 1\right), \quad B = \ln(c) \quad \text{and} \quad A = r. \tag{5}$$

This will transform equation (4) into the linear form:

$$P = F(T) = AT + B. \tag{6}$$

The data points to be used in (6) are the transformed pairs:

$$(T_k, P_k) = \left(t_k - t_0,\ \ln\left(\frac{L}{p_k} - 1\right)\right) \quad \text{for } k = 0, 1, \ldots, n. \tag{7}$$
When the least squares line is fit to the data (7), the coefficients A and B in equation (6) are obtained. This computation is carried out automatically on almost any pocket calculator, by statistical software such as Minitab, by a spreadsheet, or by a computer algebra system such as Mathematica. Finally, the coefficients c and r are calculated:

$$c = e^{B} \quad \text{and} \quad r = A. \tag{8}$$
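Steps (5) through (8) can be collected into a single Mathematica function. The sketch below is only an illustration under stated assumptions: the name logisticLinearFit and its argument names are hypothetical, data is taken to be a list of {t, p} pairs, and the assumed limit must exceed every observed p so that the logarithm in (5) is real.

```mathematica
(* Sketch only: logisticLinearFit is a hypothetical helper name. *)
(* data is a list of {t, p} pairs; limit is the assumed value of L. *)
logisticLinearFit[data_, limit_, t0_] :=
  Module[{Tvals, Pvals, line, T, A, B},
    Tvals = data[[All, 1]] - t0;                       (* T = t - t0, as in (5) *)
    Pvals = Log[limit/data[[All, 2]] - 1];             (* P = ln(L/p - 1), as in (5) *)
    line = Fit[Transpose[{Tvals, Pvals}], {1, T}, T];  (* least squares line (6) *)
    B = Coefficient[line, T, 0];                       (* intercept B *)
    A = Coefficient[line, T, 1];                       (* slope A *)
    {Exp[B], A}                                        (* {c, r}, as in (8) *)
  ]
```

Applied to the census data of Example 1 with limit 1000 and t0 = 1900, this should return values close to the c and r obtained there.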
The 1990 U.S. census has been taken; however, the final figures will not be available until the fall of 1991 or later. We shall show how to fit data to the above model and use formula (2) to estimate the population in the years 1990 through 2050. One must proceed with caution when extrapolating too far into the future because of immigration into the U.S., wars, advances in medical technology, and so forth (see references [1], [2] and [4]). Numerical methods for fitting data to a logistic curve are discussed in [3]. For the "data linearization" method, presented in Example 1, we have assigned the limiting population as $\lim_{t \to \infty} p(t) = L = 1000$. For the "least squares fit" method, presented in Example 2, the computer determines the parameter L = 1481.53. The former requires only a working knowledge of the least squares line, which is programmed into almost every modern pocket calculator. The latter method involves the more sophisticated problem of minimizing the sum of the squares of the residuals and requires a minimization subroutine for functions of several variables. The solutions obtained from the two methods will differ, but their values for the near future ($1990 \le t \le 2000$) are almost the same.
Example 1. Fit the curve $p(t) = \dfrac{L}{1 + c\,e^{r(t - t_0)}}$ to the points (1900, 75.995), (1910, 91.972), (1920, 105.711), (1930, 122.755), (1940, 131.699), (1950, 150.697), (1960, 179.323), (1970, 203.212) and (1980, 226.505), which are the census figures (in millions) for the population of the U.S. Assume that L = 1000 and use the method of "data linearization."
Solution. Enter the nine data points into the two-dimensional array tps:

    tps = {{1900, 75.995}, {1910, 91.972}, {1920, 105.711},
           {1930, 122.755}, {1940, 131.699}, {1950, 150.697},
           {1960, 179.323}, {1970, 203.212}, {1980, 226.505}}
Set L = 1000 and proceed with the change of variables. In Mathematica this is accomplished by first forming the transpose of the data: trans = Transpose[tps]. Then the list of first coordinates trans[[1]] is selected and placed in the variable ts:

    ts = trans[[1]]
    {1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980}

Similarly, the list of second coordinates trans[[2]] is placed in ps:

    ps = trans[[2]]
    {75.995, 91.972, 105.711, 122.755, 131.699, 150.697, 179.323, 203.212, 226.505}
Now the change of variables indicated in (5) is performed. The value $t_0 = 1900$ is used:

    Ts = ts - 1900
    {0, 10, 20, 30, 40, 50, 60, 70, 80}

    Ps = Log[L/ps - 1]
    {2.49805, 2.28979, 2.13532, 1.9666, 1.88602, 1.72914, 1.52094, 1.36634, 1.22815}
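As a hand check of the transformation, the first entry of Ps can be worked out from (5) with L = 1000 and the 1900 census figure. The intermediate value below is rounded, so this is only an illustrative calculation of the kind a student could carry out on a pocket calculator.

```latex
P_0 = \ln\left(\frac{1000}{75.995} - 1\right)
    \approx \ln(12.1588)
    \approx 2.49805
```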
The transformed pairs $(T_k, P_k)$ are then placed in the two-dimensional array TPs:

    TPs = Transpose[{Ts, Ps}]
    {{0, 2.49805}, {10, 2.28979}, {20, 2.13532}, {30, 1.9666}, {40, 1.88602},
     {50, 1.72914}, {60, 1.52094}, {70, 1.36634}, {80, 1.22815}}
Mathematica's subroutine Fit is used to obtain the least squares line in (6) in the transformed (T, P) plane:

    F[T_] = Fit[TPs, {1, T}, T]
    2.46778 - 0.0155269 T
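For readers working without Fit, the same line can be obtained from the usual normal equations for a least squares line. The sketch below is an assumption-laden illustration rather than the article's method: the names n, slope and intercept are hypothetical, and the lists are retyped from the rounded values printed above, so the result should agree with the Fit output only to about four significant figures.

```mathematica
(* Sketch only: n, slope and intercept are illustrative names. *)
Ts = {0, 10, 20, 30, 40, 50, 60, 70, 80};
Ps = {2.49805, 2.28979, 2.13532, 1.9666, 1.88602,
      1.72914, 1.52094, 1.36634, 1.22815};
n = Length[Ts];

(* Normal equations for the line P = A T + B *)
slope = (n Ts.Ps - Total[Ts] Total[Ps])/(n Ts.Ts - Total[Ts]^2);
intercept = (Total[Ps] - slope Total[Ts])/n;

{intercept, slope}   (* compare with B = 2.46778 and A = -0.0155269 *)
```

These are the same sums that a pocket calculator accumulates in its linear regression mode.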
A plot of the transformed data (7) and the line (6) is useful in understanding the process, and is shown in Figure 1. The Mathematica commands used to create the plot are:

    graph1 = Plot[F[T], {T, 0, 80}, PlotRange -> {0, 4.5}];
    dots = ListPlot[TPs, PlotStyle -> {PointSize[0.02]}];
    Show[graph1, dots, AxesLabel -> {"T", "P"}];

Figure 1. The least squares line for the transformed data.
Set c = exp(B) and r = A to get the coefficients of $p(t) = \dfrac{L}{1 + c\,e^{r(t - t_0)}}$ back in the original (t, p) plane.

    c = Exp[2.46778]
    11.7962

    r = -0.0155269
    -0.0155269
Now the function p(t) is defined with the Mathematica command:

    p[t_] = L/(1 + c Exp[r (t - 1900)])
    1000/(1 + 11.7962/E^(0.0155269 (-1900 + t)))
A plot of the logistic curve p(t) is given in Figure 2. This was accomplished by typing:

    graph2 = Plot[p[t], {t, 1890, 2000}, PlotRange -> {0, 300}];
    dots = ListPlot[tps, PlotStyle -> {PointSize[0.02]}];
    Show[graph2, dots, AxesLabel -> {"t", "p"}];

Figure 2. The logistic curve fit p(t) for the population data.
The population estimate p(1990) = 255.335 is obtained with this formula. Furthermore, it is also worthwhile to graph the curve p(t) over a larger range of values so that the inherent "S" shape is visible. This is given in Figure 3.

Figure 3. Extrapolation using the logistic curve p(t).
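The estimate for 1990 can be reproduced by hand from formula (2) with L = 1000, c = 11.7962, r = -0.0155269 and t - t0 = 90. The intermediate values in this illustrative calculation are rounded, so the last digits may differ slightly from the value Mathematica reports.

```latex
p(1990) = \frac{1000}{1 + 11.7962\,e^{-0.0155269 \cdot 90}}
        \approx \frac{1000}{1 + 11.7962 \times 0.24723}
        \approx \frac{1000}{3.9164}
        \approx 255.3
```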
We now investigate how well our guess L = 1000 actually fits the data. The next example shows how to find L by solving the "least-squares" problem when all three parameters r, c and L are determined numerically by the computer.
Example 2. Fit the curve $f(t) = \dfrac{L}{1 + c\,e^{r(t - t_0)}}$ to the data points in Example 1 by finding the parameters c, r and L which minimize the quantity:

$$E(r, c, L) = \sum_{k=0}^{n} \left[\frac{L}{1 + c\,e^{r(t_k - t_0)}} - p_k\right]^2 \tag{9}$$
Solution. Clear the variables r, c, L and f and form f(t):

    Clear[f, r, c, L];
    f[t_] = L/(1 + c Exp[r (t - 1900)])
    L/(1 + E^(r (-1900 + t)) c)
The quantity E(r, c, L) is formed and stored in the variable sum with the command:

    sum := Sum[(f[ts[[i]]] - ps[[i]])^2, {i, 1, 9}]
    (-75.995 + L/(1 + c))^2 + (-91.972 + L/(1 + E^(10 r) c))^2 +
    (-105.711 + L/(1 + E^(20 r) c))^2 + (-122.755 + L/(1 + E^(30 r) c))^2 +
    (-131.699 + L/(1 + E^(40 r) c))^2 + (-150.697 + L/(1 + E^(50 r) c))^2 +
    (-179.323 + L/(1 + E^(60 r) c))^2 + (-203.212 + L/(1 + E^(70 r) c))^2 +
    (-226.505 + L/(1 + E^(80 r) c))^2
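The quantity in (9) can also be written as an ordinary function of the three parameters, which makes it easy to compare candidate parameter sets. The following is a minimal sketch; the name sse and its argument names are hypothetical and do not appear in the article. Evaluated at the rounded parameters reported for Example 2, it should give a value close to the 80.1745 returned by the minimization, and the value for the Example 1 parameters is expected to be somewhat larger.

```mathematica
(* Sketch only: sse is a hypothetical helper name. *)
tps = {{1900, 75.995}, {1910, 91.972}, {1920, 105.711},
       {1930, 122.755}, {1940, 131.699}, {1950, 150.697},
       {1960, 179.323}, {1970, 203.212}, {1980, 226.505}};

(* Sum of squared residuals, as in (9), with t0 = 1900 *)
sse[rate_, coef_, limit_] :=
  Total[(limit/(1 + coef Exp[rate (#[[1]] - 1900)]) - #[[2]])^2 & /@ tps];

sse[-0.0155269, 11.7962, 1000]      (* parameters from Example 1 *)
sse[-0.0146898, 17.8261, 1481.53]   (* parameters reported in Example 2 *)
```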
Then the parameters are found by invoking Mathematica's minimization routine:

    sol = FindMinimum[sum, {r, -0.0146982}, {c, 17.8357},
            {L, 1481.53}, AccuracyGoal -> 19,
            WorkingPrecision -> 19, MaxIterations -> 250]
    {80.1745, {r -> -0.0146898, c -> 17.8261, L -> 1481.53}}
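The call above lists each starting value as a separate argument, which matches the Mathematica release of the time. In more recent versions the starting values are usually grouped in a single list, and FindFit offers a more direct route to the same least squares problem. The lines below are a sketch under those assumptions, reusing the article's starting values; the names model and sumSq are illustrative, and the results should land near the parameters reported above.

```mathematica
(* Sketch only: current-syntax variants of the article's call. *)
tps = {{1900, 75.995}, {1910, 91.972}, {1920, 105.711},
       {1930, 122.755}, {1940, 131.699}, {1950, 150.697},
       {1960, 179.323}, {1970, 203.212}, {1980, 226.505}};
Clear[r, c, L, t];
model = L/(1 + c Exp[r (t - 1900)]);

(* Sum of squared residuals with r, c and L left symbolic *)
sumSq = Total[((model /. t -> #[[1]]) - #[[2]])^2 & /@ tps];

(* FindMinimum with the starting values grouped in one list *)
FindMinimum[sumSq, {{r, -0.0146982}, {c, 17.8357}, {L, 1481.53}}]

(* FindFit minimizes the sum of squared residuals by default *)
FindFit[tps, model, {{r, -0.0146982}, {c, 17.8357}, {L, 1481.53}}, t]
```

Starting values this close to the solution make convergence likely; a search started far from them may stop at a different point or fail to converge.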
Notice that the value L = 1481.53 is involved in the solution to the "least-squares" problem. The population function q(t) corresponding to these parameters is:

    q[t_] = f[t] /. sol[[2]]
    1481.53/(1 + 17.8261/E^(0.0146898 (-1900 + t)))
Graphs of q(t) over the intervals [1900, 2000] and [1900, 2030] are shown in Figures 4 and 5, respectively. For the near future ($1990 \le t \le 2000$) the functions p(t) and q(t) agree within 1.5%. As mentioned in the introduction, it is difficult to extrapolate too far in the future. The values p(2030) = 389.531 and q(2030) = 406.953 differ by about 4.5%.
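The percentages quoted here can be checked directly from the tabulated predictions. The sketch below uses the values of p(t) and q(t) at t = 1990, 2000 and 2030 as they appear in Table 1; the names pVals and qVals are illustrative. The result should be roughly 0.9%, 1.5% and 4.5%.

```mathematica
(* Values of p(t) and q(t) at t = 1990, 2000 and 2030, copied from Table 1 *)
pVals = {255.335, 285.959, 389.531};
qVals = {257.565, 290.335, 406.953};

(* Percent difference of q relative to p *)
100 (qVals - pVals)/pVals
```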
Figure 4. The logistic curve fit q(t) for the population data.

Figure 5. Extrapolation using the logistic curve q(t).
In conclusion, a comparison of the original data and the logistic curves p(t) and q(t) is given in Table 1. These examples illustrate how curve fitting is used in a simple mathematical model. If the limiting value L is known, or can be determined by other means, then the first method should be used, with p(t) fitting the data and providing the extrapolation. If L is not known but can be estimated, then the second method can be used to determine L, and the function q(t) is used to model the data.
    Year    Observed      Prediction    Percent error    Prediction    Percent error
    t_k     population    using         in p(t_k)        using         in q(t_k)
            p_k           p(t_k)                         q(t_k)
    1900     75.995        78.148        -2.83            78.696        -3.55
    1910     91.972        90.092         2.04            90.388         1.72
    1920    105.711       103.656         1.94           103.690         1.91
    1930    122.755       118.996         3.06           118.782         3.24
    1940    131.699       136.260        -3.46           135.854        -3.16
    1950    150.697       155.587        -3.25           155.101        -2.92
    1960    179.323       177.093         1.24           176.716         1.45
    1970    203.212       200.865         1.15           200.887         1.14
    1980    226.505       226.948        -0.20           227.788        -0.57
    1990                  255.335                        257.565
    2000                  285.959                        290.335
    2010                  318.685                        326.162
    2020                  353.303                        365.056
    2030                  389.531                        406.953
    2040                  427.021                        451.714
    2050                  465.369                        499.112

Populations are in millions. The percent error in p(t_k) is $100\,(p_k - p(t_k))/p_k$ and the percent error in q(t_k) is $100\,(p_k - q(t_k))/p_k$.

Table 1. Comparison of the census data p_k and values of p(t_k) and q(t_k).
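The percent error columns of Table 1 follow from the formula 100 (p_k - fitted value)/p_k. The sketch below recomputes them from the rounded table entries for 1900 through 1980; the names pk, pFit, qFit and pctErr are hypothetical. Because the inputs are rounded, the results should agree with the table only up to the last printed digit.

```mathematica
(* Census figures and fitted values for 1900-1980, copied from Table 1 *)
pk   = {75.995, 91.972, 105.711, 122.755, 131.699,
        150.697, 179.323, 203.212, 226.505};
pFit = {78.148, 90.092, 103.656, 118.996, 136.260,
        155.587, 177.093, 200.865, 226.948};
qFit = {78.696, 90.388, 103.690, 118.782, 135.854,
        155.101, 176.716, 200.887, 227.788};

(* Percent error 100 (pk - fitted)/pk for a list of fitted values *)
pctErr[fit_] := 100 (pk - fit)/pk;

Round[pctErr[pFit], 0.01]
Round[pctErr[qFit], 0.01]
```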
REFERENCES
1. M. Braun, Differential Equations and Their Applications: An Introduction to Applied Mathematics, 2nd Ed., Springer-Verlag, New York (1978).
2. F. R. Giordano and M. D. Weir, A First Course in Mathematical Modeling, Brooks/Cole Pub. Co., Monterey, CA (1985).
3. J. H. Mathews, Numerical Methods for Computer Science, Engineering and Mathematics, Prentice-Hall, Inc., Englewood Cliffs, NJ (1987).
4. W. J. Meyer, Concepts of Mathematical Modeling, McGraw-Hill, New York (1984).