Online Appendix S4. Details on estimating the maximum likelihood of a model
Online Appendix S4 describes how the maximum likelihoods of the models used in the main text were estimated. There are two models, one with and one without environmental variance. We first describe the estimation method and then give full details of the evolutionary algorithm that it uses. Both the models and the estimation method were implemented in R (R Development Core Team 2010).
Method for estimating maximum likelihood of a model
Overall, we had four types of parameterization of our full model with environmental variance: BCI with DBH ≥ 10 cm, BCI with DBH ≥ 1 cm, Pasoh with DBH ≥ 10 cm and Pasoh with DBH ≥ 1 cm. To fulfill the aim of this study, under each of these parameterization types, we
needed to calculate the maximum likelihood of the model reproducing four observed patterns of
biodiversity: static species richness values, community abundances and species-abundance
distributions (SADs), on the one hand, and dynamic temporal fluctuations in species abundances,
on the other. These maximum likelihoods were then used to calculate corresponding AIC values
as a measure of overall goodness-of-fit. In a subsequent exercise, we calculated the AIC values
of a neutral model (without environmental variance) and compared these with AIC values for the
full model.
Given the full or neutral model and a particular parameter set Θ, an exact analytical equation for the likelihood of the model could not be derived because of its complexity. Therefore, it was necessary to estimate the likelihood using numerical simulation. Under this approach, the model was simulated using Θ for M time-steps and model species abundances were extracted from N time-steps in the steady state (Geyer 1991). Ideally, these model species abundances would then be used to calculate the probability of the model producing the exact patterns of species abundances observed in the corresponding censuses, which would be the likelihood.
However, for relatively complex stochastic community models such as ours, this probability is
extremely small. Therefore, to make the problem tractable, the observed abundances were used to define a set of statistics summarizing four commonly examined patterns of biodiversity, and the likelihood of the model producing the observed values of these statistics (not the underlying patterns of species abundances) was estimated. Specifically, let N be the set of observed species abundances for all q censuses. These data were summarized using a set of n statistics Z = \{ Z_i(N) \}_{i=1}^{n}:

Z_1 = \frac{\sum_{j=1}^{q} S_j}{q} ,    (S.1)

Z_2 = \frac{\sum_{j=1}^{q} J_j}{q} ,    (S.2)

Z_3 = \frac{\sum_{j=1}^{q-1} \left( \sum |\Delta N| \right)_j}{q-1} ,    (S.3)

Z_k = \frac{\sum_{j=1}^{q} S_{jk}}{q} ,    (S.4)

where S_j is the total number of species in census j; J_j is the total number of individuals (community abundance) in census j; (Σ|ΔN|)_j is the sum of the absolute changes in species abundances from census j to j + 1; S_{jk} is the number of species in census j and log2 abundance class k − 4 (with upper bound 2^{k−4}); and k varies from 4 to 4 + r, with 4 + r being the maximum log2 abundance class out of all censuses. Z_1 and Z_k both measure species richness, but at different levels of organisation – the whole community and one abundance class, respectively. Z_1 is related to Z_k by Z_1 = \sum_{k=4}^{4+r} Z_k. These statistics were chosen because they summarize four
patterns of biodiversity that are commonly measured – species richness, community abundance,
the SAD and temporal fluctuations in species abundances – and thus facilitate comparison with
previous studies (e.g., Volkov et al., 2003, 2007, Kalyuzhny et al. 2015). Other statistics may be
used to characterize other patterns of biodiversity, but these lie outside the scope of our study
and are left for future work.
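As an illustration, the summary statistics (S.1)–(S.4) can be sketched as follows. This is a hypothetical Python translation with our own function names (the original analysis was implemented in R); the assignment of a species with abundance n to log2 class ceil(log2 n) is our reading of "class k − 4 with upper bound 2^{k−4}".

```python
import numpy as np

def summary_statistics(abund):
    """Compute Z1, Z2, Z3 and the vector Zk of (S.1)-(S.4).

    abund: (q, n_species) array of species abundances over q censuses.
    """
    abund = np.asarray(abund, dtype=float)
    q = abund.shape[0]
    # Z1: mean species richness over the q censuses (S.1).
    Z1 = (abund > 0).sum(axis=1).mean()
    # Z2: mean community abundance (S.2).
    Z2 = abund.sum(axis=1).mean()
    # Z3: mean summed absolute abundance change between consecutive censuses (S.3).
    Z3 = np.abs(np.diff(abund, axis=0)).sum(axis=1).mean()
    # Zk: mean number of species per log2 abundance class (S.4); a species
    # with abundance n falls in the class with upper bound 2^ceil(log2 n).
    r = int(np.ceil(np.log2(abund.max())))
    Zk = np.zeros(r + 1)
    for row in abund:
        n = row[row > 0]
        classes = np.ceil(np.log2(n)).astype(int)
        Zk += np.bincount(classes, minlength=r + 1)
    Zk = Zk / q
    return Z1, Z2, Z3, Zk
```

By construction, Zk.sum() recovers Z1, matching the identity Z_1 = Σ_k Z_k stated above.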
The likelihood that we seek to calculate is then P(Z* = Z | Θ), where Z* is a random realization of the n statistics derived from the model with parameter values Θ. Because Z* is a function of the model abundances N, the likelihood P(Z* = Z | Θ) is the probability of particular outcomes of N, which follow a multivariate discrete distribution. Given the high dimensionality of this distribution and the complexity of the statistics considered, each being a function of hundreds of species abundances undergoing stochastic growth and mortality, it would take an impractically large number of model time-steps to achieve a realization Z* equal to Z. To increase the efficiency in estimating the likelihood, one possibility is to smooth the distribution of model values of the statistics using kernel density estimation and then use this smoothed distribution to estimate the probability of the model producing Z. In this case, it is no longer necessary for the model to produce Z exactly – it is sufficient for it to produce values that are close to it. We used a method of kernel density estimation that can be applied to models that produce multivariate data and which has been used in a number of studies on population genetics (Fu and Li 1997, Weiss and von Haeseler 1998, Pritchard et al. 1999; summarized in Beaumont 2010). This consists of estimating P(Z* = Z | Θ) with the probability density in the local neighborhood, i.e. with P(ρ(Z*, Z) ≤ ε | Θ), where ρ(Z*, Z) is some distance measure between Z* and Z and ε is a tolerance threshold. In our study, ρ(Z*, Z) was defined as
\rho(Z^*, Z) = \frac{1}{4} \left( \frac{|Z_1^* - Z_1|}{Z_1} + \frac{|Z_2^* - Z_2|}{Z_2} + \frac{\sum_{m=4}^{4+r} |Z_m^* - Z_m|}{\sum_{m=4}^{4+r} Z_m} + \frac{|Z_3^* - Z_3|}{Z_3} \right) = \frac{\rho_S + \rho_J + \rho_{SAD} + \rho_{\Delta N}}{4} ,    (S.5)

where

\rho_S = \frac{|Z_1^* - Z_1|}{Z_1} ,    (S.6)

\rho_J = \frac{|Z_2^* - Z_2|}{Z_2} ,    (S.7)

\rho_{SAD} = \frac{\sum_{m=4}^{4+r} |Z_m^* - Z_m|}{\sum_{m=4}^{4+r} Z_m} ,    (S.8)

and

\rho_{\Delta N} = \frac{|Z_3^* - Z_3|}{Z_3} .    (S.9)
The statistics pertaining to the number of species in each abundance class were considered together as ρ_SAD, because only together do they constitute a measure of community structure. ρ_S, ρ_J, ρ_SAD and ρ_ΔN are the normalized absolute differences between the mean species richness values, mean community abundances, mean SADs and mean sum of temporal species-abundance fluctuations derived from the censuses and those produced by the model (with a particular parameter set Θ) over a length of time corresponding to the censuses, respectively. These four quantities can be interpreted as the absolute model errors in the four measures of biodiversity, and are used as absolute goodness-of-fit metrics. From (S.5), equal weights of 1/4 are given to ρ_S, ρ_J, ρ_SAD and ρ_ΔN; because the first three of these pertain to static patterns of biodiversity whereas only the last pertains to dynamic patterns, the overall distance measure ρ emphasizes the importance of the model producing static rather than dynamic patterns of biodiversity. Therefore, the use of equal weights provides a conservative test of whether the full model with environmental variance is more likely than a neutral model to reproduce observed static and dynamic patterns of biodiversity. Thus, if the maximum likelihood of the full model were indeed found to be greater than that of the neutral model, then this would also hold if the weight of ρ_ΔN were greater. The choice of weights was therefore motivated by the aims of our study, consistent with how weights were defined in previous studies that have used the same approach (Weiss and von Haeseler 1998, Pritchard et al. 1999).
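The distance (S.5)–(S.9) can be sketched as follows; this is a hypothetical Python version (the original code was in R), with the statistics packaged as tuples for illustration.

```python
import numpy as np

def rho(Z_star, Z):
    """Distance measure (S.5) between model statistics Z_star and observed
    statistics Z, each a tuple (Z1, Z2, Z3, Zk), where Zk is the vector of
    mean per-abundance-class richness values (classes m = 4, ..., 4 + r)."""
    Z1s, Z2s, Z3s, Zks = Z_star
    Z1, Z2, Z3, Zk = Z
    rho_S = abs(Z1s - Z1) / Z1                                  # (S.6)
    rho_J = abs(Z2s - Z2) / Z2                                  # (S.7)
    rho_SAD = np.abs(np.subtract(Zks, Zk)).sum() / np.sum(Zk)   # (S.8)
    rho_dN = abs(Z3s - Z3) / Z3                                 # (S.9)
    # Equal weights of 1/4 on the four component errors (S.5).
    return (rho_S + rho_J + rho_SAD + rho_dN) / 4.0
```

Identical model and observed statistics give ρ = 0, and a model realization that, say, doubles species richness while matching the other three patterns gives ρ = 0.25.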
Taking the last N time-steps of a model simulation, during which the model is at a steady state, consider the N − q + 1 unique time intervals of q consecutive time-steps, corresponding to q censuses. For time interval a, 1 ≤ a ≤ N − q + 1, let the corresponding value of ρ(Z*, Z) be ρ_a. Then P(ρ(Z*, Z) ≤ ε | Θ) was estimated as

\frac{\left| \{ a \mid \rho_a \le \varepsilon \} \right|}{N - q + 1} ,    (S.10)

where |{a | ρ_a ≤ ε}| is the number of entries in the set {a | ρ_a ≤ ε}. However, an issue here is what value of ε to use. To be informative, ε should be chosen to be large enough to ensure that (S.10) is non-zero for at least one parameter set. On the other hand, it is desirable for ε to be as small as possible to obtain a closer correspondence between simulated and empirical data. We found that a value of ε = 0.05 generally produced non-zero maximum likelihood estimates for our model with environmental variance, thus allowing it to be compared with a corresponding model without environmental variance (for which maximum likelihood estimates were lower). Halving ε to 0.025 produced qualitatively similar results (described further below and shown in Appendix S5: Table S1), but produced small maximum likelihood estimates of <0.02 for the model with environmental variance, suggesting that ε = 0.05 is close to a lower limit below which it is no longer informative. Doubling ε to 0.1 also produced qualitatively similar results (described further below and shown in Appendix S5: Table S1), but produced maximum likelihood estimates for the model with environmental variance that could be close to or equal to 1. This suggests that ε = 0.05 is close to an upper limit above which the correspondence between simulated and empirical data is weak. Thus, ε = 0.05 was chosen as a default value reflecting a balance between efficiency and accuracy. Intuitively, using ε = 0.05 means that a model realization was only included in the estimate of the likelihood if the normalized absolute differences between the model and observed mean species richness values, mean community abundances, mean SADs and mean sum of temporal species-abundance fluctuations were on average no greater than 0.05.
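Given the distances ρ_a already computed for the N − q + 1 steady-state windows, the estimator (S.10) reduces to a simple fraction; a minimal Python sketch (function and argument names are our own):

```python
import numpy as np

def estimated_likelihood(rho_values, eps=0.05):
    """Estimate P(rho(Z*, Z) <= eps | Theta) via (S.10): the fraction of
    the N - q + 1 windows of q consecutive steady-state time-steps whose
    distance rho_a from the observed statistics is within the tolerance."""
    rho_values = np.asarray(rho_values)
    return np.count_nonzero(rho_values <= eps) / rho_values.size
```

For example, window distances of 0.01, 0.04, 0.06 and 0.20 with the default ε = 0.05 give an estimated likelihood of 0.5.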
Dynamics of a model with a particular set of parameter values Θ were simulated for M = 10,000 time-steps, with data from the last N = 9,000 time-steps extracted and used to estimate the
likelihood using (S.10). This is because by this time, species richness and community abundance
dynamics had converged to a steady state (see Appendix S5: Figs. S1 and S2). The only
exception was for the neutral model for Pasoh with a DBH threshold of 1 cm, for which M =
12,000 time-steps were used because of slower convergence of dynamics to a steady state. To
test sensitivity of our model results, we also used M = 15,000 and N = 14,000 (N = 12,000 for the
Pasoh neutral model with DBH threshold of 1 cm), and M = 20,000 and N = 19,000 (N = 17,000
for the Pasoh neutral model with DBH threshold of 1 cm). We note that both the full model and the neutral model can theoretically reproduce Z, so that P(Z* = Z | Θ) ≠ 0. Therefore, if the estimate (S.10) was found to be zero, then we do not infer that P(Z* = Z | Θ) = 0 but instead infer that P(Z* = Z | Θ) < 1/(N − q + 1).
We implemented an evolutionary algorithm to find, for each model and parameterization, a set of parameter values that maximized the estimated likelihood given by (S.10). Given the five-dimensional model parameter space to be explored for the full model with environmental
variance (see main text or Online Appendix S3 for the five parameters) with an unknown
‘fitness’ landscape describing the fits of a dynamic community model with different sets of
parameter values to observed data, it was appropriate to use a global evolutionary algorithm to
converge to the best-fit set of parameter values. The algorithm used was the B-Cell Algorithm
(BCA) (Kelsey and Timmis 2003, Kelsey et al. 2003), adapted to the tree community models
used. It has been found to converge to the extrema of complicated functions more quickly than competing algorithms (Kelsey and Timmis 2003, Kelsey et al. 2003). In an ecological context,
this algorithm has previously been applied to test the possibility of alternative stable states in
coral reefs, using dynamic community models with up to 13 parameters (Fung et al. 2011).
Further details of the BCA are provided below. For each parameterization, it was applied for 1,000 generations, which is of the same order as the number of generations that Kelsey and Timmis (2003) found sufficient for convergence to the minima of 12 functions with up to 20 parameters.
The results presented in Table 1 in the main text were those obtained using the default tolerance threshold of ε = 0.05, except for the models for Pasoh and a DBH threshold of 1 cm. In this case, a threshold of ε = 0.05 gave an estimated likelihood of zero for both the full model with environmental variance and the neutral model, so that they could not be distinguished statistically. Therefore, results using the higher threshold of ε = 0.1 were presented. In addition, the results in Table 1 were derived using data from the last N = 9,000 time-steps of a total of M = 10,000 time-steps simulated for each model, which are the default values. The exception is the model for Pasoh with no environmental variance and a DBH threshold of 1 cm, for which the default value of M was 12,000 time-steps, as described above. Online Appendix S5 shows results for all the different values of ε, M and N tested (Appendix S5: Tables S1, S2, S5, S6).
In particular, we found that model results were generally quantitatively similar across the three different pairs of values of M and N, and qualitatively similar across the three different values of ε. We now examine the results for different M and N in greater detail to clarify the underlying reasons for the differences in the results observed.
As M and N increased simultaneously from the default values, the maximum likelihood estimate
typically changed by small amounts – for 15 out of 16 combinations of two sites (BCI and
Pasoh), two DBH thresholds (1 and 10 cm), two model types (neutral and non-neutral) and two
larger pairs of values of M and N (as described above), the absolute percentage change was
<12% (Appendix S5: Table S6). In addition, the absolute percentage change in each of the
parameters was typically small, with a median of 12.6% (n = 64). However, the maximum
likelihood estimate for the non-neutral BCI model with a DBH threshold of 10 cm exhibited a
32.5% increase as M and N increased from 10,000 and 9,000 to 20,000 and 19,000, respectively
(Appendix S5: Table S6). Two sources of error could have caused this relatively large
discrepancy. Firstly, with the same M and N, different runs of the evolutionary algorithm used to
estimate the maximum likelihood could result in different maximum likelihood estimates,
because of imperfect convergence in each run. To test this, we re-ran the algorithm for the non-neutral BCI model with a DBH threshold of 10 cm and with M = 10,000 and N = 9,000 a further
nine times, each time with a different, randomly chosen set of initial parameter values. We found
that the total set of 10 maximum likelihood estimates had a coefficient of variation of 10.2%,
corresponding to a range of 0.137-0.189. The upper values of this range are close to the
maximum likelihood estimate of 0.181 when M = 20,000 and N = 19,000 was used (Appendix
S5: Table S6). Thus, error between runs of the evolutionary algorithm could explain the
differences in maximum likelihood estimates for the different sets of values of M and N tested.
But the errors in the likelihood do not appear to be large enough to affect our main conclusion
that the non-neutral models give better fits as measured by the AIC. To be more certain of this,
we re-ran the evolutionary algorithm a further nine times for each of the seven other models used
to generate the main results in Table 1, each time with a different randomly chosen set of initial
parameter values. For each model, we then considered the 10 maximum likelihood estimates
from the 10 runs of the evolutionary algorithm and retained only the largest one, together with
the corresponding parameter set. This resulted in final maximum likelihood estimates for the four
non-neutral models that were 0-38.7% larger than the corresponding estimates found with only
one run of the evolutionary algorithm, corresponding to small absolute increases of 0-0.053;
maximum likelihood estimates for the four neutral models remained the same at very low values
(Table 1, Appendix S5: Tables S5, S6). Thus, the main conclusions in our study are unaffected.
We also found that the parameter values that maximize the likelihood typically did not change
when the number of runs of the evolutionary algorithm was increased to 10 (median of 0%; n =
32). There were a few cases where a parameter value changed by relatively large amounts, up to
32.6%, and this can be attributed to different regions of parameter space producing likelihoods
close to the maximum.
A second source of error is incomplete convergence of model dynamics given a set of parameter
values. For a particular model with a given set of parameter values, we had simulated model
dynamics for M time steps and used data from the last N time steps to estimate the likelihood. It
is known from the Central Limit Theorem for Markov chain processes that the estimated likelihood converges to the true likelihood as N → ∞ (Geyer 1991). However, a finite value of N produces incomplete convergence and associated error in the estimate. The standard deviation of the estimated likelihood, σ̂_L, can be estimated using formulae from Geyer (1991). For the simulation of the non-neutral BCI model in Appendix S5: Table S6 with a DBH threshold of 10 cm, M = 10,000, N = 9,000 and ε = 0.05, we found that σ̂_L = 0.0115, which is 8.4% of the
estimated likelihood of 0.137. This type of error is thus unlikely to be the main reason why the
estimated likelihood is 0.044 smaller than the estimated likelihood of 0.181 when M = 20,000
and N = 19,000 (Appendix S5: Table S6). By applying the same technique to the non-neutral
BCI model in Appendix S5: Table S6 with a DBH threshold of 1 cm and M = 10,000, N = 9,000
and ε = 0.05, and the non-neutral Pasoh model in Appendix S5: Table S6 with a DBH threshold of 10 cm and the same values of M, N and ε, we found that σ̂_L = 0.00514 and 0.0347 respectively, which are 16.0% and 5.6% of the estimated likelihoods, respectively. The remaining five models in Appendix S5: Table S6 with M = 10,000, N = 9,000 and ε = 0.05 had σ̂_L = 0. For the non-neutral Pasoh model in Appendix S5: Table S5 with a DBH threshold of 1 cm and ε = 0.1, σ̂_L = 0.0318, which is 4.6% of the estimated likelihood; for the neutral Pasoh model in Appendix S5: Table S5 with a DBH threshold of 1 cm and ε = 0.1 (Table 1), σ̂_L = 0.
Together, these error estimates pertain to models that include the main ones presented in the
main text (Table 1). Thus, considering these estimates, it is unlikely that the type of error
specified would affect the main conclusions of our study. We note that the Central Limit
Theorem for Markov chain processes does not depend on samples from a chain being
independent or identically distributed (Geyer 1991, 2011), so that in this sense it was not
necessary to subsample the Markov chains for our simulations at intervals corresponding to
when the autocorrelation decays to small values. Subsampling would have the undesired effect of
reducing the accuracy of an estimated likelihood for a given sample size (Geyer 1992, 2011;
MacEachern and Berliner 1994).
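Geyer (1991) gives exact formulae for σ̂_L; the sketch below instead uses the closely related method of batch means for Markov chain output (Geyer 1992), not necessarily the authors' exact procedure. The likelihood estimate (S.10) is the mean of autocorrelated 0/1 indicators I[ρ_a ≤ ε], and batch means estimate the Monte Carlo standard deviation of such a mean without requiring independent samples.

```python
import numpy as np

def batch_means_sd(indicators, n_batches=30):
    """Approximate the Monte Carlo standard deviation of a mean of
    autocorrelated 0/1 indicators I[rho_a <= eps] by splitting the chain
    into batches and using the spread of the batch means."""
    x = np.asarray(indicators, dtype=float)
    m = len(x) // n_batches                      # batch length
    means = x[:m * n_batches].reshape(n_batches, m).mean(axis=1)
    # Standard deviation of the overall mean, from the batch means.
    return means.std(ddof=1) / np.sqrt(n_batches)
```

A chain of identical indicators gives zero standard deviation, while strong long-range correlation (e.g. all failures followed by all successes) gives a large one, reflecting the extra uncertainty from autocorrelation.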
In summary, we have identified two further sources of error in our likelihood estimates, but by
performing further tests, we have provided evidence that the main conclusions of our study are
robust to these errors. In future studies that use our models to investigate questions for which
more accurate likelihood and parameter estimates are important, it would be ideal to run the
evolutionary algorithm multiple times for each model parameterization and to use more than
20,000 time steps for each simulation of each model.
Lastly, we note that it is also possible to interpret our fitting procedure under a Bayesian
framework. Under this framework, the prior probability distributions of the model parameters are
assumed to be distributed uniformly. This means that maximizing the posterior probability of Θ, p(Θ | Z), is the same as maximizing the likelihood p(Z | Θ). Thus, finding the parameter set that maximizes (S.10) was equivalent to finding the set that maximizes an estimate of p(Θ | Z).
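Explicitly, with a uniform prior p(Θ) = c over the bounded parameter space, Bayes' rule gives

```latex
p(\Theta \mid Z) \;=\; \frac{p(Z \mid \Theta)\, p(\Theta)}{p(Z)}
               \;=\; \frac{c}{p(Z)}\, p(Z \mid \Theta)
               \;\propto\; p(Z \mid \Theta),
```

so the parameter set maximizing the posterior coincides with the one maximizing the likelihood.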
Details of the evolutionary algorithm used
The global evolutionary algorithm used in maximizing the estimated likelihood was the B-Cell
Algorithm (BCA) of Kelsey and Timmis (2003) and Kelsey et al. (2003). It is designed to
minimize a function F (or equivalently, maximize -F ) defined in a bounded k-dimensional
parameter space X. Starting with a ‘population’ of arbitrarily chosen parameter sets, the BCA
uses a sequence of steps that involve cloning and mutating the parameter sets to converge to a
parameter set that minimizes F (Kelsey and Timmis 2003, Kelsey et al. 2003, Fung et al. 2011).
Throughout, each parameter value is represented as a bit string. Mutation of a parameter value
consists of random changes in a subset of bits – this allows the possibility of large changes in the
parameter value, which helps to prevent the algorithm from being stuck at a local extremum.
When applying the BCA to either one of the two tree community models used in this paper (one
with and one without environmental variance), the function F is chosen to be the negative of the
estimated likelihood given by (S.10) when it is non-zero. However, when the estimated likelihood is zero, F is set to min{ρ_a}, where {ρ_a} is the set of all ρ(Z*, Z) values calculated from all simulated time intervals a (each of length equal to the period over which all the observed censuses were conducted) corresponding to steady-state model dynamics produced using a given parameter set. The reason for this is to help the BCA converge to a region of parameter space with a non-zero estimated likelihood, which occurs when min{ρ_a} ≤ ε. Thus,
this specification of F ensures that the BCA maximizes the estimated likelihood in an efficient
way. For the full model with environmental variance, the parameter space X explored has k = 5
dimensions, corresponding to the five parameters γ, Φ_1, Φ_2, I and θ (see Table S1 in Online Appendix S3 for definitions). For the neutral model, X has k = 3 dimensions, corresponding to the three parameters M_r, I and θ (see Online Appendix S3 for the definition of M_r). γ, Φ_1, Φ_2 and M_r were decimal real numbers represented by 64 bits, following Kelsey et al. (2003). Since γ has a range [γ_min, γ_max] that spans negative and positive numbers for BCI and Pasoh with a DBH threshold of 10 cm (see Appendix S3: Table S1), values of γ were increased by γ_min before being represented by bits and mutated. γ_min was then subtracted from the value represented by the mutated bit string. This ensured that all values in [γ_min, γ_max] could potentially be reached. In effect, the parameter γ was replaced by the parameter γ′ = γ + γ_min with a range of [γ_min + γ_min, γ_max + γ_min]. Similarly, values of M_r were increased by M_r,min before being represented by bits and mutated, with M_r,min subtracted from the mutated value. I is an integer and was represented by n_I bits, the minimum number of bits required to represent the largest number in the range of I. θ is the fundamental biodiversity number that determines the shape of the expected log-series metacommunity SAD. For each value of θ, a set of 1,000 metacommunities, each with S_M species, was constructed according to the log-series SAD and then used to calculate the expected metacommunity abundance of each species (Online Appendix S3). S_M was specified by equation (S.2) in Online Appendix S3 and was assumed to be two to four times the local plot richness (Online Appendix S3), corresponding to ranges of [600, 1,200] and [1,600, 3,200] for the BCI and Pasoh plots, respectively. Generation of expected metacommunity abundances was a time-consuming process – for example, it takes 2 h with S_M = 1,000. Therefore, for the BCI parameterizations, only four values of θ within the plausible range were considered in each application of the BCA, corresponding to four values of S_M: 600, 800, 1,000 and 1,200. In effect, θ was replaced by an integer-valued parameter θ′ with a range of [1, 4]. θ′ was represented by three bits, the minimum required to represent the maximum value of four. Similarly, for the Pasoh parameterizations, only nine values of θ within the plausible range were considered in each application of the BCA, corresponding to nine values of S_M: 1,600, 1,800, 2,000, 2,200, 2,400, 2,600, 2,800, 3,000 and 3,200. In effect, θ was replaced by an integer-valued parameter θ′ with a range of [1, 9]. θ′ was represented by four bits, the minimum required to represent the maximum value of nine.
Kelsey and Timmis (2003) found that using a population of W = 3 parameter sets with w = 3
clones for each set in each generation gave efficient convergence for 12 functions with up to 20
parameters. Thus, we used W = 3 and w = 3 for each run of the BCA in our study.
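The core of the BCA described above can be sketched as follows. This is a toy Python illustration with our own function names, not the authors' R implementation; the actual algorithm of Kelsey and Timmis (2003) includes further details (e.g. a clone mutated by a different scheme) that are omitted here, and in our study F would be the negative of the estimated likelihood (S.10).

```python
import random

def mutate(bits):
    """Flip a random contiguous region of a bit string - a sketch of the
    BCA's contiguous somatic hypermutation, which permits large jumps in
    the decoded parameter value and helps escape local extrema."""
    i = random.randrange(len(bits))
    j = random.randrange(i, len(bits))
    return bits[:i] + [1 - b for b in bits[i:j + 1]] + bits[j + 1:]

def bca_minimize(F, n_bits, decode, W=3, w=3, generations=300, seed=0):
    """Toy B-Cell Algorithm: a population of W bit-string encoded parameter
    sets; each generation, every member spawns w mutated clones and is
    replaced by its best clone whenever that clone attains a lower F."""
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(W)]
    for _ in range(generations):
        for idx, member in enumerate(pop):
            best = min((mutate(member) for _ in range(w)),
                       key=lambda b: F(decode(b)))
            if F(decode(best)) < F(decode(member)):
                pop[idx] = best
    return min(pop, key=lambda b: F(decode(b)))

# Toy usage: minimize F(x) = (x - 3)^2 with x encoded as an 8-bit integer
# rescaled to [0, 16], mirroring the bit-string parameter encoding above.
decode = lambda bits: int("".join(map(str, bits)), 2) * 16 / 255
best = bca_minimize(lambda x: (x - 3) ** 2, n_bits=8, decode=decode)
```

Because flipping only the lowest-order bit changes the decoded integer by one step, the contiguous-flip neighborhood always contains fine-grained moves as well as large jumps, which is what drives convergence on this convex toy landscape.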
Literature Cited for Online Appendix S4
Beaumont, M. A. 2010. Approximate Bayesian Computation in Evolution and Ecology. Annual
Review of Ecology, Evolution and Systematics 41:379–406.
Fu, Y.-X., and W.-H. Li. 1997. Estimating the age of the common ancestor of a sample of DNA
sequences. Molecular Biology and Evolution 14:195–199.
Fung, T., R. M. Seymour, and C. R. Johnson. 2011. Alternative stable states and phase shifts in
coral reefs under anthropogenic stress. Ecology 92:967–982.
Geyer, C. J. 1991. Markov Chain Monte Carlo maximum likelihood. Pages 156–163 in
Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (E. M.
Keramidas, Ed.). Interface Foundation, Fairfax Station, USA.
Geyer, C. J. 1992. Practical Markov Chain Monte Carlo. Statistical Science 7:473–483.
Geyer, C. J. 2011. Introduction to Markov Chain Monte Carlo. Pages 3–48 in Handbook of
Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Eds.).
Chapman & Hall/CRC, Boca Raton, USA.
Kalyuzhny, M., R. Kadmon, and N. M. Shnerb. 2015. A neutral theory with environmental
stochasticity explains static and dynamic properties of ecological communities. Ecology Letters
18:572–580.
Kelsey, J., and J. Timmis. 2003. Immune inspired somatic contiguous hypermutation for function
optimisation. Pages 207–218 in Genetic and Evolutionary Computation – Gecco 2003, Pt I,
Proceedings. Springer-Verlag, Berlin, Germany.
Kelsey, J., J. Timmis, and A. Hone. 2003. Chasing Chaos. Pages 413–419 in Proceedings of the
2003 Congress on Evolutionary Computation. IEEE Computer Society, Washington, USA.
MacEachern, S. N., and L. M. Berliner. 1994. Subsampling the Gibbs Sampler. The American
Statistician 48:188–190.
Pritchard, J. K., M. T. Seielstad, A. Perez-Lezaun, and M. W. Feldman. 1999. Population growth
of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and
Evolution 16:1791–1798.
R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria.
Volkov, I., J. R. Banavar, S. P. Hubbell, and A. Maritan. 2003. Neutral theory and relative
species abundance in ecology. Nature 424:1035–1037.
Volkov, I., J. R. Banavar, S. P. Hubbell, and A. Maritan. 2007. Patterns of relative species
abundance in rainforests and coral reefs. Nature 450:45–49.
Weiss, G., and A. von Haeseler. 1998. Inference of population history using a likelihood
approach. Genetics 149:1539–1546.