FISH 458 Lab 2: Fitting models to data using sums of squares

Aims: to explore fitting models to data using sums of squares; using the log(sum of squares); fitting the exponential growth model to data; fitting the logistic growth model to data; learning to use the Table function in Excel; using Solver; finding sum of squares profiles; parameter confounding (as one parameter increases, another decreases); and uncertainty in parameters.

Files needed for this laboratory: "Part 1 template.xlsx", "Part 2 template.xlsx", "Part 3 template.xlsx"

Part 1:

The following estimates of wildebeest abundance (thousands of individuals) in the Serengeti National Park were obtained in various years:

Year     1961  1963  1965  1967  1971  1972
Census    263   357   439   483   693   773

Begin a new sheet in Excel (good practice), or work from "Part 1 template.xlsx". We are going to fit the exponential growth model to the data:

N_{t+1} = r N_t

assuming that N_{1961} = 263. Leave some lines for the parameter values (r and N1961) and the total log sum of squares (lnSSQ), and create a block of cells with columns for the year, abundance estimate, model prediction, and log sum of squares. Set the 1961 abundance equal to the parameter N1961. The formula for exponential growth will go in column C.

Plot the observed and predicted data. By convention, use solid circles for the observations and a solid line for the predictions. Try various values of r to see which value provides the best (visual) fit to the data.

To find the "best fit" of the model to the data, a common approach is to calculate the sum of squared differences between observed and predicted values, generally known as the sum of squares or SSQ:

SSQ = \sum_t (\mathrm{observed}_t - \mathrm{predicted}_t)^2

which for this model is

SSQ = \sum_t (N_t - \hat{N}_t)^2

where the hat denotes the model prediction of abundance. For this particular problem, since we are dealing with abundances, we will use the natural logs of the observed and predicted values:

\mathrm{lnSSQ} = \sum_t (\ln N_t - \ln \hat{N}_t)^2

We do this so that proportional prediction errors are weighted equally; in other words, observing 2 and predicting 4 is penalized the same as observing 200 and predicting 400.

Calculate the lnSSQ between the observed and predicted values in column D, but don't include a formula for 1961 since we are not actually predicting N1961. In order to have a consistent formula in all years, including years with no data, we can use an IF statement so that lnSSQ is calculated only in years with observations. The formula in D11 would be: =IF(B11>0, (LN(B11)-LN(C11))^2, ""). The order is IF, THEN, ELSE, and the ELSE part returns "", an empty text string that leaves the cell blank. Now add up the lnSSQ terms across all years and put the result in the lnSSQ cell.
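If you want to check your spreadsheet against an independent calculation (and as a preview of advanced exercise 3 at the end of this lab), the same lnSSQ can be computed in a few lines of R. The sketch below is an illustration only, not part of the lab templates: the object names are our own, and the census vector simply restates the table above with NA in years with no survey.

```r
# Minimal sketch (not part of the lab templates): lnSSQ for the exponential
# model N[t+1] = r * N[t], projected yearly from N(1961) = 263.
years  <- 1961:1972
census <- c(263, NA, 357, NA, 439, NA, 483, NA, NA, NA, 693, 773)  # NA = no survey

ln_ssq_exp <- function(r, N1961 = 263) {
  N <- numeric(length(years))
  N[1] <- N1961
  for (i in 2:length(years)) N[i] <- r * N[i - 1]   # exponential growth
  # the 1961 term is zero because N[1] is set equal to the census value
  sum((log(census) - log(N))^2, na.rm = TRUE)       # skip years with no census
}

# Profile of lnSSQ over r, analogous to the Excel Data Table described below
r_values <- seq(1.00, 1.20, by = 0.01)
profile  <- sapply(r_values, ln_ssq_exp)
plot(r_values, profile, type = "b", xlab = "r", ylab = "lnSSQ")
```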
Now we want to see how much support the data provide for different values of r, using the Table function. Fill in cells A25 to A45 with values of r going from 1.00 to 1.20 in steps of 0.01. These are the alternative values of r we will explore; think of them as different model hypotheses. You could plug each of these values into the r cell (B4) one at a time and copy the resulting lnSSQ total to the right of the r value, but the Table function in Excel does this automatically and is a skill well worth learning. To use the (one-dimensional) Table function, put a reference to the output we want ("=B7") one cell above and one cell to the right of the column of r values. Select the entire table (A24:B45) and invoke Data > What-If Analysis > Data Table. Leave the Row Input cell blank, in the Column Input cell enter the cell containing the r value (B4), and press OK. The Table function is very finicky, so practice going through these steps several times until you are satisfied you have mastered it; we will be using this function again and again in Excel.

Now draw the graph of lnSSQ vs. r. What does this tell you about the relative support that the data give to different values of r? Later in the course we will use likelihoods to make statistically rigorous pronouncements about the validity of each of the different hypotheses. Just like sums of squares, likelihoods allow us to fit models to data by minimizing a function of the data and the model predictions.

Part 2:

The wildebeest census data up to the mid-1980s look like this:

Year        1961  1963  1965  1967  1971  1972  1977  1978  1980  1982  1984  1986
Abundance    263   357   439   483   693   773  1444  1248  1337  1208  1337  1146

These data show clear evidence that the period of exponential growth has ended and that the population has reached equilibrium. Fit the logistic growth model to the wildebeest data using these equations:

\hat{N}_{1961} = N_0
\hat{N}_{t+1} = \hat{N}_t + r \hat{N}_t \left(1 - \hat{N}_t / K\right)
\mathrm{lnSSQ} = \sum_t (\ln N_t - \ln \hat{N}_t)^2

Set up your sheet in a similar way to Part 1 (or use "Part 2 template.xlsx"). This time we will estimate three parameters: N0, r, and K. We will treat the 1961 population size as a parameter to be estimated, unlike in Part 1 where we assumed it was exactly 263.

We are going to use the Excel feature Solver to find the values of the parameters that minimize the lnSSQ. Here are some key hints for success with Solver and other non-linear function minimizers:
a. First get a good fit by eye, by trial and error, using combinations of parameter values.
b. Set automatic scaling ON in the Solver options.
c. Although you can constrain parameter values in Solver, convergence is often better when you constrain parameters yourself rather than relying on the built-in Solver features (more on this later).
d. Try solving for one parameter at a time rather than for all of them simultaneously.
e. Do not be satisfied with any single answer; try a variety of starting points to see if you can find a better fit to the data (smaller lnSSQ).

Now use Solver to estimate the parameters N0, r, and K.

Next, calculate the sum of squares profile on r. This is similar to the profile you created in Part 1, but now there is no shortcut available using the Table function since there are three parameters. For a series of fixed values of r we will find the minimum lnSSQ possible by changing only the values of N0 and K. You will have to do this manually, looping through values of r following this pseudo-code (generalized computer code not written in any particular programming language); an R sketch of the same loop is given after the steps.
Step 1: set the value of r to the target value (r = 0.10).
Step 2: use Solver to find the values of N0 and K that minimize lnSSQ.
Step 3: copy the values of r, N0, K, and lnSSQ to the right (starting in column E) using Paste Special > Values Only (shortcut alt-E-S-V). Copying values only ensures they do not change when you go to the next step.
Step 4: go back to step 1, incrementing r by 0.01.
Step 5: continue until you have a series of values of r (0.10 to 0.25).
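The loop above has to be done by hand in Excel, but the same profile can be computed programmatically. Below is a minimal sketch in R that uses optim() in place of Solver to minimize lnSSQ over N0 and K for each fixed r. It is an illustration under our own naming and starting values, not part of the lab templates.

```r
# Minimal sketch (not part of the lab templates): lnSSQ profile on r for the
# logistic model, using optim() in place of Solver.
years <- 1961:1986
obs   <- rep(NA, length(years))
obs[match(c(1961, 1963, 1965, 1967, 1971, 1972,
            1977, 1978, 1980, 1982, 1984, 1986), years)] <-
  c(263, 357, 439, 483, 693, 773, 1444, 1248, 1337, 1208, 1337, 1146)

ln_ssq_logistic <- function(N0, r, K) {
  N <- numeric(length(years))
  N[1] <- N0
  for (i in 2:length(years))
    N[i] <- N[i - 1] + r * N[i - 1] * (1 - N[i - 1] / K)   # logistic growth
  if (any(N <= 0)) return(1e6)          # crude guard against impossible abundances
  sum((log(obs) - log(N))^2, na.rm = TRUE)                  # skip years with no census
}

# For each fixed r, minimize lnSSQ over N0 and K only (steps 1-5 above)
r_values <- seq(0.10, 0.25, by = 0.01)
profile <- t(sapply(r_values, function(r) {
  fit <- optim(c(263, 1300),            # starting guesses for N0 and K
               function(p) ln_ssq_logistic(p[1], r, p[2]))
  c(r = r, N0 = fit$par[1], K = fit$par[2], lnSSQ = fit$value)
}))
```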
Some key points can now be learned about how the values of the other parameters are related to the value of r. In particular, as r becomes larger, the best fits to the data have a smaller K and a smaller initial population size N0. To visualize this, create the following plots:
1. Graph the lnSSQ profile on r.
2. Graph the relation between r and K from the profile on r.
3. Graph the relation between r and N0 from the profile on r.
If time permits, also do a sum of squares profile on K.

Part 3:

Beginning in the late 1970s, Tanzania, where most of the Serengeti ecosystem is found, underwent a serious economic crisis, and no resources were available for wildlife protection such as anti-poaching patrols. It is believed that there was a dramatic increase in the illegal harvest of wildebeest at that time, which continued into the 1980s. Modify your model to allow for harvesting beginning in 1977, using the following equations:

\hat{N}_{1961} = N_0
\hat{N}_{t+1} = \hat{N}_t + r \hat{N}_t \left(1 - \hat{N}_t / K\right) - C_t
C_t = 0 \text{ if } t < 1977, \quad C_t = x \text{ if } t \ge 1977
\mathrm{lnSSQ} = \sum_t (\ln N_t - \ln \hat{N}_t)^2

Now you will have four parameters to estimate: N0, r, K, and x (the annual harvest in numbers from 1977 onward). You will need to add an additional column to your sheet with the annual catch, and modify the prediction equations in column C. Use Solver to find the best estimates of these parameters by minimizing the lnSSQ as before. Find the sum of squares profiles on r, K, and x, comparing the profiles on r and K with those obtained in Part 2. Once we admit that poaching may have occurred, we know much less about the plausible range of K.

Advanced exercises
1. Modify the exponential model in Part 1 to add harvesting as in Part 3. Is it necessary to invoke the logistic model to explain the leveling off of the population, or could it be due entirely to increased poaching?
2. Use the age-structured model for wildebeest you are building in the homework to fit the time series of abundance data with two parameters: the calf survival rate and the adult density-dependence parameter. Add harvesting as above to create a three-parameter model.
3. Implement Part 1 in the programming language R. The function optim can be used for non-linear function minimization in R; a sketch of how the Part 3 model could be fit this way follows below.
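For those attempting the advanced exercises, here is a minimal sketch of how the Part 3 harvest model could be fit outside Excel with R's optim(), in the spirit of advanced exercise 3. Starting values and object names are illustrative assumptions, not prescribed by the lab; in line with hint (c) above, in practice you may want to bound or transform the parameters yourself so the search never produces negative abundances.

```r
# Minimal sketch (not part of the lab): fitting the Part 3 harvest model
# (parameters N0, r, K, x) with R's optim() by minimizing lnSSQ.
years <- 1961:1986
obs   <- rep(NA, length(years))
obs[match(c(1961, 1963, 1965, 1967, 1971, 1972,
            1977, 1978, 1980, 1982, 1984, 1986), years)] <-
  c(263, 357, 439, 483, 693, 773, 1444, 1248, 1337, 1208, 1337, 1146)

ln_ssq_harvest <- function(p) {          # p = (N0, r, K, x)
  N0 <- p[1]; r <- p[2]; K <- p[3]; x <- p[4]
  N <- numeric(length(years))
  N[1] <- N0
  for (i in 2:length(years)) {
    Ct <- if (years[i - 1] >= 1977) x else 0               # annual harvest from 1977 on
    N[i] <- N[i - 1] + r * N[i - 1] * (1 - N[i - 1] / K) - Ct
  }
  if (any(N <= 0)) return(1e6)           # crude guard against impossible abundances
  sum((log(obs) - log(N))^2, na.rm = TRUE)
}

fit <- optim(c(263, 0.15, 1500, 40), ln_ssq_harvest)  # illustrative starting values
fit$par    # best estimates of N0, r, K and x
fit$value  # minimized lnSSQ
```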