The general surplus production model under the non-equilibrium condition
with multiple data sources being combined: application to assessment of
Georges Bank yellowtail flounder
Saang-Yoon Hyun
School of Marine Sciences, University of Massachusetts
School for Marine Science and Technology, University of Massachusetts at Dartmouth
706 S. Rodney French Blvd, New Bedford, MA 02744 USA
Phone: (508) 999-8875; FAX: (508) 910-6374
Email: [email protected]
Surplus production models; ICESms110819
Abstract
Under limited data and information, a surplus production model is useful for stock
assessment because it has far fewer parameters than age-structured models. I propose
the general surplus production (Pella-Tomlinson) model (Pella and Tomlinson 1969)
under the non-equilibrium condition, because its shape is flexible and the
non-equilibrium condition is more realistic than equilibrium. Unlike the logistic surplus
production (Schaefer) model (Schaefer 1954), the general model does not fix the shape
parameter. The shape parameter is defined as the ratio BMSY / K, where BMSY is the
biomass that produces the maximum sustainable yield (MSY), and K is the virgin biomass
or the carrying capacity. Estimating the parameters of the general surplus production
model is not trivial, because the model is non-linear and suffers from an
over-parameterization problem.
On the other hand, the logistic surplus production model is free of the
over-parameterization problem, because its shape parameter is fixed at 0.5. Thanks to the
fixed value, the logistic model's estimates are precise, and the model could be considered
to outperform the other surplus production models (Prager 2002). But the fixed value could
lead to serious bias in estimation. The shape parameter is affected by natural mortality,
the steepness of the stock-recruitment model (steepness = the percentage of virgin
recruitment that is produced when the biomass is at a certain percentage of the virgin
biomass), and K (Maunder 2003). Fixing the shape parameter at 0.5 is too strict to be
realistic.
Another novelty of this study is that it incorporates different survey data
simultaneously. The ASPIC software is available from the US NOAA toolbox
(http://nft.nefsc.noaa.gov/) for applying a surplus production model, but it fails to
accommodate different survey data together. When different surveys take place in the
same year, they are likely to be correlated with each other. Because of this dependence
between survey data, it should be more efficient and less biased to combine the data
systematically than to use them separately.
Finally, I warn that population biomass or abundance could be overestimated by the
common (almost universal) practice in stock assessment models of treating the
logarithm of survey index or catch data as a normal (Gaussian) random variable. Because
the mean of the logarithm of a random variable is not the same as the logarithm of the
mean of the variable, the estimate of population size (abundance or biomass) is likely to
be positively biased (overestimated) under this practice. I suggest a more accurate
expected value of the logarithm. For demonstration purposes, I apply the above model and
idea to data on Georges Bank (GB) yellowtail flounder (Limanda ferruginea).
Data
Annual commercial yield and survey data are available. Four surveys have been
conducted: the US NEFSC spring, fall, and scallop surveys, and the Canada DFO survey.
I use data from two of them, the NEFSC spring and fall surveys, because both have been
conducted for much longer than the NEFSC scallop and DFO surveys: the spring survey
since 1968, and the fall survey since 1963 (Legault et al. 2010).
Difference equation of non-equilibrium surplus production model
Management practice is generally based on annual data. Thus, using a time
increment of one year, I use a difference equation of the general surplus production
model (the Pella-Tomlinson model) under the non-equilibrium condition. I use Gilbert's
(1992) formulation of the Pella-Tomlinson model, in which the shape parameter appears
explicitly.

$$B_{t+1} = B_t + \frac{r}{(1/n - 1)} \left( \frac{B_t^{\,n}}{K^{\,n-1}} - B_t \right) - Y_t \tag{1}$$

where $n$ = the parameter that determines $B_{MSY}/K$ (eq. 2), called the shape parameter;
$K$ = the virgin biomass, or the carrying capacity; $r$ = the production rate at maximum
production; $B_t$ = the biomass in year $t$; and $Y_t$ = the yield in year $t$. Management
references are calculated as follows (Maunder 2001).
$$\frac{B_{MSY}}{K} = \left( \frac{1}{n} \right)^{1/(n-1)} \tag{2}$$
where BMSY = biomass that corresponds to MSY, and the ratio of BMSY to K represents the
position of maximum production. MSY is:
$$MSY = r \, B_{MSY} \tag{3}$$
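As an illustration of eqs. 1-3, the following Python sketch iterates the difference equation and computes the two management references. The function names and the numbers in the usage lines are hypothetical choices made for this illustration; they are not estimates from this study.

```python
def surplus_production(b, n, k, r):
    """Production term of eq. 1: r / (1/n - 1) * (B^n / K^(n-1) - B)."""
    return r / (1.0 / n - 1.0) * (b ** n / k ** (n - 1.0) - b)


def project_biomass(b1, yields, n, k, r):
    """Iterate eq. 1 from the initial biomass b1.

    Returns [B_1, ..., B_{T+1}] for the yields [Y_1, ..., Y_T].
    """
    biomass = [b1]
    for y in yields:
        b = biomass[-1]
        biomass.append(b + surplus_production(b, n, k, r) - y)
    return biomass


def bmsy_over_k(n):
    """Eq. 2: B_MSY / K = (1/n)^(1/(n-1)); undefined at n = 1."""
    return (1.0 / n) ** (1.0 / (n - 1.0))


def msy(n, k, r):
    """Eq. 3: MSY = r * B_MSY, with B_MSY obtained from eq. 2."""
    return r * k * bmsy_over_k(n)


# Hypothetical values, for illustration only.
n, k, r = 2.0, 1000.0, 0.3
print(bmsy_over_k(n))  # 0.5 (the Schaefer case)
print(msy(n, k, r))
print(project_biomass(k, [100.0, 100.0], n, k, r))
```

With n = 2 the production term reduces to the logistic form, which gives a quick sanity check on the signs in eq. 1: production is zero at B = K and equals r times BMSY at B = BMSY.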
Yield in the difference equation (eq. 1) is the product of fishing mortality and
biomass:
$$Y_t = F_t B_t \tag{4}$$
Note that in the difference equation model, $F_t$ is a dimensionless rate that ranges
from 0 to 1. That is, its range differs from that of the instantaneous rate, which derives
from a differential equation model.
Survey data are available, which represent relative total biomass or a scaled catch
(in weight) per unit effort. Denoting the index from survey $s$ in year $t$ as $I_{st}$,

$$I_{st} = q_s (B_t - Y_t/2) \tag{5}$$

where $q_s$ is the scaled catchability coefficient of survey $s$. Because the survey index
data are not raw but scaled (expanded from the survey areas to the total population area),
$q_s$ can theoretically be larger than one; the only constraint is $q_s > 0$. The term
$(B_t - Y_t/2)$ in the above equation represents the average of the biomass at the
beginning and at the end of the year. For example, with recruits (births) in year $t$, fish
that survived from previous years into that year, and removals by the fisheries during the
year, the average is "[(recruits + survivals) + (recruits + survivals – harvest)]/2"
(Schnute and Richards 1995).
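Eqs. 4 and 5 can be sketched in Python as follows. The function names and the numbers are hypothetical and serve only to show that the harvest rate is a proportion and that the mid-year biomass is the mean of the start-of-year and end-of-year values.

```python
def harvest_rate(biomass, yield_):
    """Eq. 4 rearranged: F_t = Y_t / B_t, a proportion between 0 and 1."""
    f = yield_ / biomass
    if not 0.0 <= f <= 1.0:
        raise ValueError("yield must lie between 0 and the biomass")
    return f


def predicted_index(q, biomass, yield_):
    """Eq. 5: I_st = q_s * (B_t - Y_t / 2), based on mid-year biomass."""
    return q * (biomass - yield_ / 2.0)


# Hypothetical numbers, for illustration only.
b_t, y_t = 20000.0, 5000.0
print(harvest_rate(b_t, y_t))          # 0.25
print(predicted_index(0.5, b_t, y_t))  # 8750.0
# Mid-year biomass equals the mean of start-of-year and end-of-year biomass.
print((b_t + (b_t - y_t)) / 2.0 == b_t - y_t / 2.0)  # True
```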
Probability error & Dependence between survey data
As in most surplus production models, I assume that the survey data are observed
with error and that the logarithm of the survey index follows a normal (Gaussian)
distribution; the yield data are treated as constants.

$$\log I_{st} \sim N\big( E(\log I_{st}),\; V(\log I_{st}) \big) \tag{6}$$

where V( ) = the variance operator. The expected value of $\log I_{st}$ is:

$$E(\log I_{st}) = \log q_s + \log(B_t - Y_t/2) \tag{7}$$

where $B_t$ is a function of four free parameters: $n$, $K$, $r$ and the initial biomass
$B_1$ (eq. 1). $V(\log I_{st})$ is treated as an additional parameter.
When data are collected from different surveys in a common year, those survey data
are likely to depend on the same unknown population biomass in that year. For example,
index data from the spring and fall surveys for GB yellowtail flounder are significantly
correlated (Fig. 1). In that case, the dependence between different survey data must not
be ignored. For this purpose, I assume that the different survey data are conditionally
independent given the unknown biomass, so that their joint probability factorizes:

$$\Pr(\log I_{1t}, \log I_{2t}, \ldots, \log I_{it} \mid B_t) = \prod_{s=1}^{i} \Pr(\log I_{st} \mid B_t) \tag{8}$$

Conditioning on the common biomass in a year is what accounts for the dependence
among the survey data in that year.
Likelihood
Denoting the parameters as the vector $\theta$, the likelihood of the parameters is:

$$L(\theta \mid I) = \prod_{t=1}^{y} \prod_{s=1}^{i} \Pr(\log I_{st} \mid \theta) \propto \prod_{t=1}^{y} \prod_{s=1}^{i} \frac{1}{\sqrt{V(\log I_{st})}} \exp\left( -\frac{\big( \log I_{st} - E(\log I_{st}) \big)^2}{2 \, V(\log I_{st})} \right) \tag{9}$$

where $y$ = the number of years, and $i$ = the number of surveys. To save degrees of
freedom, I do not treat $V(\log I_{st})$ as a free parameter, but calculate it
analytically. The maximum likelihood estimate (MLE) of the variance is:

$$\hat{V}(\log I_{st}) = \frac{\sum_{t=1}^{y} \big[ \log I_{st} - E(\log I_{st}) \big]^2}{y} \tag{10}$$

$E(\log I_{st})$ in eqs. 9 and 10 is replaced by the right-hand side of eq. 7.
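The likelihood in eqs. 7, 9 and 10 can be sketched in Python as below. This is a minimal illustration, not the ADMB code used in the study: the function names, the dictionary layout for the surveys, the catchability values and all numbers are hypothetical. The sketch also keeps the constant 0.5 log(2 pi) per observation, which does not move the maximum.

```python
import math


def expected_log_index(q, biomass, yields):
    """Eq. 7: log q_s + log(B_t - Y_t / 2) for each year t."""
    return [math.log(q) + math.log(b - y / 2.0) for b, y in zip(biomass, yields)]


def variance_mle(observed, expected_log):
    """Eq. 10: mean squared residual of one survey on the log scale."""
    resid = [math.log(i) - e for i, e in zip(observed, expected_log)]
    return sum(d * d for d in resid) / len(resid)


def negative_log_likelihood(surveys, biomass, yields):
    """Negative log of eq. 9, summed over surveys (eq. 8) and years.

    surveys maps a survey name to (q_s, [I_s1, ..., I_sy]).
    The variance of each survey is plugged in from eq. 10.
    """
    total = 0.0
    for q, observed in surveys.values():
        expected = expected_log_index(q, biomass, yields)
        v = variance_mle(observed, expected)
        for i, e in zip(observed, expected):
            total += 0.5 * math.log(2.0 * math.pi * v)
            total += (math.log(i) - e) ** 2 / (2.0 * v)
    return total


# Hypothetical biomass, yield and index values, for illustration only.
biomass = [10000.0, 9000.0, 8000.0]
yields = [2000.0, 2000.0, 2000.0]
surveys = {
    "spring": (0.5, [4000.0, 4500.0, 3000.0]),
    "fall": (0.8, [7000.0, 6500.0, 5800.0]),
}
print(negative_log_likelihood(surveys, biomass, yields))
```

Because both surveys are evaluated against the same biomass series, a change in the biomass trajectory moves both predicted indices together, which is how the shared dependence enters the fit.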
Estimating the four parameters ($n$, $K$, $r$, and $B_1$) in the above likelihood
function is not trivial, because the likelihood surface is flat and the global maximum is
hard to find. For this reason, the constraint $B_1 = K$ has been suggested (Polacheck et
al. 1993). I estimate parameters both with and without the constraint. I use ADMB
software (Fournier 2007) to numerically differentiate the likelihood function (eq. 9) with
respect to the parameters.
With the constraint of B1 = K
To deploy the constraint $B_1 = K$, I use data that start in 1969 (Figure 2),
because the average of the spring and fall survey indices was largest in 1969, and
starting from the largest value makes the constraint less arbitrary.
Without the constraint of B1 = K
I estimate parameters without the constraint $B_1 = K$, using different values of $n$
(eq. 1). The possible range of $n$ is from 0.1 to 2.5, which corresponds to a shape
parameter ($B_{MSY}/K$) of 0.08 – 0.54. I use data that start in 1976 (Table 1, Figure 4),
when the survey values were low enough to justify removing the constraint. Another reason
for choosing 1976 as the start year is that only US and Canadian commercial fisheries have
been permitted since then, and the yield data since then have been relatively well
monitored. Fisheries from other foreign countries operated before 1976, and their yield
data may not be reliable.
Questionable practice in using a lognormal density function
Most fish stock assessment models apply the logarithm to catch (Fournier and
Archibald 1982) or survey index data (Polacheck et al. 1993, Maunder 2001). The
logarithm scales down the raw data and also transforms their distribution to a normal
(Gaussian) distribution. The scale-down and the normality increase stability in parameter
estimation. I have no problem with this role of the logarithm. However, I am concerned
that the routine lognormal practice may lead to overestimation of the true population size
(in number or biomass), because the expected value of the logarithm of a random variable,
say $X$, is not the same as the logarithm of the expected value of the variable: i.e.,
$E(\log X) \ne \log E(X)$. Because $\log X$ is a concave function for $X > 0$, the Jensen
inequality indicates that

$$E(\log X) \le \log(E X) \tag{11}$$

That is, where the fish stock assessment literature routinely writes E(log Catch) = log
E(Catch) and E(log Index) = log E(Index), the equality (=) is only an approximate equality
($\approx$), namely the first-order Taylor series approximation. The Jensen inequality
(eq. 11) means that stock assessment models that contain the routine treatment are likely
to overestimate population size (abundance or biomass). Surplus production and
catch-at-age models use catch-per-unit-effort (e.g., survey index) and catch data under
the routine practice. The cumulative effect might increase the positive bias.
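A small numerical check of eq. 11, written in Python, is sketched below. The index values are hypothetical; the function name is introduced only for this illustration.

```python
import math


def jensen_gap(values):
    """Return log(mean(X)) - mean(log(X)), which eq. 11 says is >= 0."""
    mean_of_logs = sum(math.log(x) for x in values) / len(values)
    log_of_mean = math.log(sum(values) / len(values))
    return log_of_mean - mean_of_logs


# Hypothetical index values, for illustration only.
index = [2000.0, 4000.0, 8000.0]
print(jensen_gap(index) > 0.0)  # True
```

The gap is zero only when all values are equal, and it widens as the values become more variable, which is why the size of the discrepancy depends on the variance of the data.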
To remove the overestimation potential, I consider the expected value of the
logarithm up to the second-order Taylor series approximation. Letting
$X \sim N(\mu, \sigma^2)$, then

$$E(\log X) \approx \log \mu - \frac{\sigma^2}{2 \mu^2} \tag{12}$$

Applying the same principle to survey index data,

$$E(I_{st}) = \mu_{st} = q_s (B_t - Y_t/2), \qquad Var(I_{st}) = \sigma_s^2 \tag{13}$$

Like eq. 10, the analytical MLE of the variance is:

$$\hat{\sigma}_s^2 = \frac{\sum_{t=1}^{y} (I_{st} - \mu_{st})^2}{y} \tag{14}$$

$$E(\log I_{st}) \approx \log \mu_{st} - \frac{\sigma_s^2}{2 \mu_{st}^2} \tag{15}$$

$\mu_{st}$ in eqs. 14 and 15 is replaced by $q_s (B_t - Y_t/2)$ in eq. 13.
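The second-order correction can be checked by simulation. The Python sketch below implements eqs. 12, 14 and 15 and compares the corrected expectation with a Monte Carlo average; the function names, the seed, and the values mu = 100 and sigma = 10 are hypothetical choices for illustration.

```python
import math
import random


def variance_mle_natural(observed, mu):
    """Eq. 14: mean squared residual of one survey on the natural scale."""
    return sum((i - m) ** 2 for i, m in zip(observed, mu)) / len(observed)


def expected_log_second_order(mu, sigma2):
    """Eqs. 12 and 15: E(log X) ~ log(mu) - sigma^2 / (2 * mu^2)."""
    return math.log(mu) - sigma2 / (2.0 * mu ** 2)


# Monte Carlo check with hypothetical values (mu = 100, sigma = 10).
rng = random.Random(1)
mu, sigma = 100.0, 10.0
draws = [rng.gauss(mu, sigma) for _ in range(200_000)]
simulated = sum(math.log(x) for x in draws) / len(draws)

print("first order :", math.log(mu))
print("second order:", expected_log_second_order(mu, sigma ** 2))
print("simulated   :", simulated)
```

In this setting the simulated mean of log X should sit below log(mu) and close to the second-order value. The approximation assumes that sigma is small relative to mu, so that X stays positive.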
Key preliminary results are shown below as Figures and Table.
[Figure 1: Index (mt) plotted against Year, 1970-2010; annotation: r = 0.72 (P-value < 0.000).]
Figure 1. A significant correlation between the US NEFSC spring (solid line) and fall
(dotted line) survey indices for Georges Bank yellowtail flounder. The significant
correlation suggests that we should correctly incorporate the dependence between the
different survey data into the stock assessment.
[Figure 2: four panels plotted against Year, 1970-2010. (a) Index (mt); (b) Biomass (mt), with Yield (mt) on the right axis; (c) Ratio; (d) Fishing mortality (%).]
Figure 2. On the basis of survey and commercial yield data from 1969-2009 on Georges
Bank yellowtail flounder, performance of the non-equilibrium surplus production model
where the shape parameter is treated as a free parameter. Panel (a): the predicted survey
indices are overlapped with the observed values. Red solid line indicates the predicted
spring survey index, and red dots on broken red line are the observed spring survey index.
Counterparts in blue represent fall survey index. Panel (b): the solid line is the predicted
biomass whose values are on the left y-axis, and black dots are the yield data whose values
are on the right y-axis. The horizontal broken line indicates MSY (= 9869.6 mt). Panel
(c): Bt / K (solid line) is compared with BMSY / K (broken line), the shape parameter
whose estimate is 0.44. Panel (d): Fishing mortality (%).
[Figure 3: Biomass (mt) plotted against Year, 1970-2010, with one line for each value of n (0.7, 0.8, 0.9, 1.1, 1.2).]
Figure 3. Predicted biomass from data in 1976-2009, given different values of n (eq. 2;
Table 1) where the constraint of B1 = K was removed. n controls the shape parameter.
The green line was biomass predicted by the best model where n =1.2. For comparison,
the red line is added, which is the predicted biomass from data in 1969-2009 where the
constraint of B1 = K was assumed.
[Figure 4: four panels plotted against Year, 1970-2010. (a) Index (mt); (b) Biomass (mt), with Yield (mt) on the right axis; (c) Ratio; (d) Fishing mortality (%).]
Figure 4. On the basis of survey and commercial yield data from 1976-2009 on Georges
Bank yellowtail flounder, performance of the non-equilibrium surplus production model
where the constraint of B1 = K is relieved. Panel (a): the predicted survey indices are
overlapped with the observed values. Red solid line indicates the predicted spring survey
index, and red dots on broken red line are the observed spring survey index. Counterparts
in blue represent fall survey index. Both predicted spring and fall survey indices were
very close to each other. Panel (b): the solid line is the predicted biomass whose values
are on the left y-axis, and black dots are the yield data whose values are on the right y-axis.
The horizontal broken line indicates MSY (= 7599.6 mt). Panel (c): Bt / K (solid line) is
compared with BMSY / K (broken line), the shape parameter whose estimate is 0.40. Panel
(d): Fishing mortality (%).
Table 1. Evaluation of the non-equilibrium surplus production model fitted to data from
1976 - 2009 without the constraint of B1 = K. The model was best fitted when the shape
parameter (= BMSY / K ) was 0.40 (i.e., n = 1.2 in eq. 2), where (i) the residuals between
observed survey index values and the model fitted values were the smallest (60.9% in the
spring survey, and 44.3% in the fall survey) as the mean of the absolute values of the
relative residuals; and (ii) the scaled negative log-likelihood value was the smallest as -4.8.
The other values of n (i.e., outside the range of 0.7 – 1.2) did not lead to stable estimation
of parameters or they resulted in unreasonable estimates (e.g., Bt < Yt ).
                   Mean of the absolute values of
                   relative residual (%)              Scaled negative
n      BMSY / K    Spring survey    Fall survey       log-likelihood
0.7    0.30        63.6             46.2              -1.9
0.8    0.33        63.2             45.8              -2.5
0.9    0.35        62.7             45.4              -3.0
1.1    0.39        61.6             44.7              -4.2
1.2    0.40        60.9             44.3              -4.8
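The residual summary reported in Table 1 can be computed along the following lines. This Python sketch assumes that each residual is divided by the observed index value; the text does not state the denominator, so that choice, the function name and the numbers are all assumptions made for illustration.

```python
def mean_abs_relative_residual(observed, predicted):
    """Mean of |observed - predicted| / observed, expressed in percent.

    Dividing by the observed value is an assumption of this sketch;
    the denominator is not stated in the text.
    """
    ratios = [abs(o - p) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical observed and predicted index values, for illustration only.
observed = [4000.0, 5000.0]
predicted = [3000.0, 6000.0]
print(mean_abs_relative_residual(observed, predicted))
```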
Acknowledgements
Chris Legault at the US NOAA Northeast Fisheries Science Center (NEFSC) and
Heath Stone at the Canada DFO provided me with data and valuable information about
Georges Bank yellowtail flounder ecology and management. Brian Rothschild, Yue (June)
Jiao, and Steve Cadrin in the School for Marine Science & Technology at the Univ. of
Massachusetts Dartmouth were consulted. Mark Maunder at the Inter-American Tropical
Tuna Commission helped me with ADMB and advice. The work is part of the project of
New England Multi-Species Survey, funded by the NOAA NMFS (NA10NMF4720287).
References
Fournier, D.A., Archibald, C.P., 1982. A general theory for analyzing catch at age data.
Can. J. Fish. Aquat. Sci. 39: 1195–1207.
Fournier, D.A. 2007. An introduction to AD Model Builder Version 8.0.2: for use in
nonlinear modeling and statistics. Otter Research Ltd., Sidney, B.C., Canada.
Gilbert, D.J. 1992. A stock production modeling technique for fitting catch histories to
stock index data. New Zealand Fisheries Assessment Res. Doc. 92/15. [Available
from National Institute of Water and Atmospheric Research (NIWA), Greta
Point, P.O. Box 297, Wellington, N.Z.]
Legault, C.M., L. Alade, and H.H. Stone. 2010. Stock assessment of Georges Bank
yellowtail flounder for 2010. TRAC Reference Document - 2010/06.
Maunder, M.N., 2001. A general framework for integrating the standardization of
catch-per-unit-of-effort into stock assessment models. Can. J. Fish. Aquat. Sci. 58:
795–803.
Maunder, M.N., 2003. Letter to the editor. Is it time to discard the Schaefer model from
the stock assessment scientist’s toolbox? Fisheries Research 61: 145-149.
Pella, J.J., Tomlinson, P.K., 1969. A generalized stock production model. IATTC Bull. 13,
421–458.
Polacheck, T., Hilborn, R., and Punt, A.E. 1993. Fitting surplus production models:
comparing methods and measuring uncertainty. Can. J. Fish. Aquat. Sci. 50:
2597–2607.
Prager, M.H., 2002. Comparison of logistic and generalized surplus production models
applied to swordfish, Xiphias gladius, in the north Atlantic Ocean. Fish. Res. 58,
41–57.
Schaefer, M.B., 1954. Some aspects of the dynamics of populations important to the
management of commercial marine fisheries. IATTC Bull. 1, 25–56.
Schnute, J.T., Richards, L.J., 1995. The influence of error on population estimates from
catch-age models. Can. J. Fish. Aquat. Sci. 52: 2063–2077.