Diversity: between neutrality and structure

OIKOS 112: 392-405, 2006
Salvador Pueyo
Pueyo, S. 2006. Diversity: between neutrality and structure. - Oikos 112: 392-405.
Here I present an integrated framework for species abundance distributions (SADs)
that goes beyond the neutral theory without relying on complex mechanistic models. I
give some general mathematical results on the relationship between SADs and their
underlying dynamics, and analyse an extensive set of marine phytoplankton data in
order to test the neutral theory against this broader framework.
The main theoretical and empirical results are: (i) the logseries, which is the
SAD produced by simple neutral models without migration, is quite robust in
response to additional factors, including some forms of niche segregation; (ii) when
there is a small but significant deviation from a logseries, the SAD will generally
have the form of a power law, regardless of the specific mechanisms; (iii) when
the deviation is moderate, the SAD will generally have the form of a lognormal,
regardless of the specific mechanisms; (iv) although in a wide range of situations
neutral and non-neutral dynamics cannot be distinguished from the SAD alone, some
empirical SADs do have the fingerprint of non-neutrality: this is the case of marine
dinoflagellates, in contrast to marine diatoms, which adjust to neutral theory
predictions. The results for marine phytoplankton illustrate that both neutral and
non-neutral mechanisms coexist in nature, and seem to have different weights in
different groups of organisms.
In addition to the above findings, I discuss several related contributions and point
out some important pitfalls in the literature.
S. Pueyo, Dept. d’Ecologia, Univ. de Barcelona, Avgda. Diagonal 645, ES-08028
Barcelona, Catalonia, Spain ([email protected]).
Ecologists are currently engaged in a strong controversy
(Whitfield 2002) about the unified neutral theory of
biodiversity and biogeography (Hubbell 2001). This
theory maintains that the patterns of diversity must be
explained without considering differences between species (reviewed by Chave 2004).
At present, the neutral theory is centred on two
aspects of diversity patterns (Hubbell 2001): (i) the
proportions between different species, as captured by
the statistical distribution of n, which is the abundance
of a species chosen at random among those represented
in a sample taken from a natural community, and (ii) the
species-area relationship. Here I address species abundance distributions (SADs). An extension of my approach to species-area relationships (SARs) can be found
in Pueyo (in press).
Many possible SADs have been proposed in the
ecological literature (May 1975, Engen 1978, McGill
2003a). Among these, it is particularly frequent
to choose either the lognormal (Preston 1948) or the
logseries (Fisher 1943) for fitting empirical data. These
two distributions were introduced on empirical
grounds and have proved quite useful, but the reasons
for their success remain unclear. It has generally been
thought that the proportions between the abundances
of different species have a direct relation with the
proportions between the sizes of their ecological niches:
SADs would thus reflect a rather rigid ecosystem
structure. At the other end, some authors have maintained that SADs result from random fluctuations in
species abundances. The neutral theory belongs to this
second category.
Accepted 4 July 2005
Copyright © OIKOS 2006
ISSN 0030-1299
OIKOS 112:2 (2006)
Following Chave (2004), we can distinguish two types
of neutral community models, depending on whether or
not these are spatially structured. Here I call the neutral
community models without space ‘‘simple neutral models’’ (SNMs).
Watterson (1974) introduced SNMs in ecology in a
review in which he imported several models from the
population genetics literature. This approach was later
popularised and expanded by Caswell (1976). SNMs have
three ingredients:
- Ecological drift (in Hubbell’s terms): changes in species abundances caused by stochastic reproduction and death events that have the same rules for all organisms, regardless of species (reproduction is assumed asexual).
- Regulation of total community size, equally affecting all organisms regardless of species.
- Rare speciation events consisting of the introduction of an organism that belongs to a new species.
Watterson (1974) showed that the SNMs that he
reviewed produced a logseries SAD (given a large
number of individuals N and species S).
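The three ingredients listed above can be illustrated with a minimal simulation sketch. It is not Watterson's or Caswell's exact formulation: it assumes a fixed community size enforced by replacing one dead individual per step, and the names `simulate_snm`, `community_size`, `speciation_prob` and `steps` are hypothetical choices made for this illustration.

```python
import random
from collections import Counter

def simulate_snm(community_size=500, speciation_prob=0.01, steps=50_000, seed=0):
    """Neutral drift with point speciation in a community of fixed size.

    All parameter values are illustrative assumptions, not fitted quantities.
    """
    rng = random.Random(seed)
    # Start with every individual in species 0 (an arbitrary initial state).
    community = [0] * community_size
    next_species_id = 1
    for _ in range(steps):
        # Death: the same rule applies to every organism, whatever its species.
        dying = rng.randrange(community_size)
        if rng.random() < speciation_prob:
            # Rare speciation: the vacancy is filled by a brand-new species.
            community[dying] = next_species_id
            next_species_id += 1
        else:
            # Drift: a randomly chosen individual (possibly the dying one)
            # reproduces asexually and its offspring fills the vacancy.
            parent = rng.randrange(community_size)
            community[dying] = community[parent]
    # Replacing exactly one individual per step keeps total size constant,
    # which is the regulation of community size.
    return Counter(community)  # species id -> abundance

abundances = simulate_snm()
```

The returned counter maps each surviving species to its abundance, so the list `sorted(abundances.values())` is one simulated SAD under these assumptions.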
Hubbell (2001) introduced a spatially structured
model with two levels of integration: community and
metacommunity. A metacommunity is a large set of local
communities and has the dynamics of an SNM (Watterson’s ‘‘model 2’’, which is a limiting case of a model
by Karlin and McGregor 1967). Therefore, at a large
spatial scale, Hubbell expects an SAD close to a
logseries. A community has the same dynamics except
that, instead of rare speciation events that give rise to
new species, there is a given rate of immigration m of
organisms that belong to a finite number of species and
have a logseries distribution of abundances (because they
are assumed to come at random from any site in the
metacommunity). Hubbell calls the resulting SAD a
‘‘zero-sum multinomial’’ (ZSM). This distribution was
analytically formalized by Volkov et al. (2003), Vallade
and Houchmandzadeh (2003), McKane et al. (2004) and
Alonso and McKane (2004). The ZSM equals a logseries
for large m, and is more ‘‘humped’’ for small m, like a
lognormal.
The empirical support for Hubbell’s (2001) unified
neutral theory lies in the similarity between empirical
SADs and the ZSM and that between empirical SARs
and those in a more elaborate version of his model, with
explicit space. In particular, the main argument used by
Hubbell (2001) is that the ZSM would fit two large
samples of tropical forest trees better than the lognormal. However, McGill (2003b) carried out several
statistical tests, the results of which did not support
Hubbell’s claim. Volkov et al. (2003) responded with a
new statistical analysis that seemed to support the ZSM
over the lognormal, but Pueyo (unpubl.) shows that their
analysis was incorrect and, as far as we know, the ZSM
and the lognormal fit the data equally well.
Many ecologists are uncomfortable with the neutral
theory because there is much evidence of ecological
niches and other factors that are overlooked by the
theory, and it is difficult to believe that these are
irrelevant for the global features of ecological communities. Hubbell himself (2001) expects a future unification of the neutral and niche theories, although
meanwhile he and his collaborators focus on defending
the former (Hubbell 2003, Volkov et al. 2003, 2004).
It is not clear how this unification can be achieved.
The introduction of an increasing number of parameters
in the models will not produce a fundamental theory. In
addition, even though the distribution predicted by the
neutral theory does not fit tropical forest tree SADs
better than the lognormal, as far as we know it does fit
them, and this must be explained by opponents of the
theory.
An idea gaining momentum is that both neutral and
non-neutral models will give rise to similar SADs. This
was argued by Chave et al. (2002) on the basis of a series
of simulations, and also by Mouquet and Loreau (2003).
McGill (2003a) outlined a theoretical framework for this
point of view. He proposed that SADs would generally
belong to a class of statistical distributions that he called
POLO, which would cover both the lognormal and the
power law (or Pareto, or Zipf-Mandelbrot distribution).
He classified these two distributions together because the
lognormal becomes a power law in the limit of infinite $\sigma$
(Montroll and Shlesinger 1982; $\sigma$ is the standard
deviation of the variable after taking logarithms). He
conjectured that ‘‘any complex theory that invokes
multiplication of complex factors will produce POLO-like distributions’’, so ‘‘almost any theory will match
almost any data as long as we only look at the shape of
the distribution’’.
This line of thought is enlightening but still lacks an
analytical foundation. It also lacks a strong empirical
justification of the need to transcend the neutral theory,
as would be given by a test that clearly rejects neutrality
in favour of a broader alternative such as the one mentioned
above. These are the two main points that I address
in this paper, which includes a theoretical and an
empirical part.
In the theoretical part I tentatively present an
integrated framework for diversity patterns, focusing
on SADs. I analyse the SADs to be expected under a set
of ‘‘minimal assumptions’’. This must not be confused
with the use of ‘‘minimal models’’ as in the neutral
theory. Neutral models can be considered ‘‘minimal’’ in
the sense that they have few parameters. However, the
simplicity of neutral models stems not from minimal but from
major assumptions: each parameter that a more complex model
would include is set to zero.
In my approach, by contrast, neither zero nor any other specific value
is assumed for any such parameter. My main
assumption is that the contribution of these additional
parameters to the SAD is small to moderate.
In the empirical part, I analyse an exceptionally large
set of marine phytoplankton data, in the light of the
integrated framework proposed. In particular, I test
whether or not these data are consistent with the neutral
theory.
The final discussion includes a critical review of
related contributions. I describe some pitfalls in the
explanations of the SADs given by Bell (2000, 2001) and
Pachepsky et al. (2001). I also show how the integrated
approach sheds light on some of the main controversies
in the literature, such as those related to Preston’s (1962)
‘‘canonical’’ distribution or the possible ‘‘left skew’’ of
SADs as compared to a lognormal.
Methods
This paper presents theoretical and empirical results on
species abundance distributions (SADs). There is a
methodological choice crucial for both types of results:
the formalism that we use to capture the information
held in an empirical sample. The relevance of this choice
for the empirical part is clear. In my approach, this
choice is equally relevant for the theoretical part,
because my theoretical research explicitly addresses the
fraction of reality that is empirically observable by
means of a sample.
In most scientific fields, the statistical distribution of a
given variable x is represented either as a set of
probabilities {p(x)} when x is discrete, or as a continuum
of probability densities {f(x)} when x is continuous.
Since species abundances n are discrete, we may consider
estimating the set of discrete probabilities {p(n)} by
means of a common histogram. However, there is a
drawback with this method. As noted by other authors
(McGill 2003a) and clearly supported by the results in
this paper, SADs approach the power law statistical
distribution, also called Pareto or Zipf-Mandelbrot,
$$p(n) \propto n^{-\beta} \qquad (1)$$
($\propto$ means ‘‘proportional to’’), with $\beta$ not far from unity. The
power law is a heavy-tailed distribution that differs
considerably from common ‘‘textbook’’ statistical functions, and makes a common histogram inappropriate in
most cases. In a histogram, most species are concentrated in a few bins at the lower end of the distribution,
followed by a long tail of bins with one or a few sparse
species among many empty bins. Instead of the equal-sized bins of a common histogram, unequal sizes must
be used in order to obtain a more homogeneous number
of data per bin. The probabilities of bins of varying sizes
can be compared if these are previously standardised by
dividing by the size of the bin. The result obtained is the
density of probability for each bin. Therefore, in spite of
the discrete nature of abundances, it is more useful to
represent probability densities {f(n)} than discrete probabilities {p(n)}.
In particular, when dealing with power laws, it is
convenient to use multiplicative intervals for the bins
and to represent the results in a log-log scale. This
produces a plot with an array of equally-spaced spots
arranged in a straight line and with quite homogeneous
error bars. These are completely homogeneous for $\beta = 1$
in Eq. 1. Multiplicative intervals have the form $[\lambda^j, \lambda^{j+1})$
for some constant $\lambda$ and a series of discrete j beginning
with $j = 0$. I use $\lambda = 2$ because this is the minimum $\lambda$ for
which each species unambiguously belongs to a single
bin, considering that abundances are in fact discrete.
For each bin j, the (logarithmically) central value is
$n_j = \sqrt{2^j \, 2^{j+1}} = 2^{j + 1/2}$, and its estimated probability density
is $\hat{f}(n_j) = \frac{1}{2^j} \frac{s_j}{S}$, where $s_j$ is the number of species in bin j,
S is the total number of species and $2^j$ is the width of
the bin.
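The binning procedure just described can be sketched in a few lines. This is only an illustration of the formulas in the text (bins $[2^j, 2^{j+1})$, centre $2^{j+1/2}$, density $s_j / (2^j S)$); the function name `log2_binned_density` and the example abundances are hypothetical, and empty bins are simply omitted from the output.

```python
from collections import Counter

def log2_binned_density(abundances):
    """Return [(n_j, f_hat_j), ...] for the bins [2**j, 2**(j+1)).

    `abundances` is a list of positive integers, one per species.
    """
    total_species = len(abundances)
    # Bin index j = floor(log2(n)); bit_length avoids floating-point rounding.
    counts = Counter(int(n).bit_length() - 1 for n in abundances)
    result = []
    for j in sorted(counts):
        centre = 2 ** (j + 0.5)                          # logarithmic centre of bin j
        density = counts[j] / (2 ** j * total_species)   # s_j / (2^j * S)
        result.append((centre, density))
    return result

# Hypothetical abundances for eight species.
example = log2_binned_density([1, 1, 2, 3, 5, 8, 13, 100])
```

Plotting the logarithm of the second element of each pair against the logarithm of the first gives the log-log representation used throughout the paper.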
In his well known representation, Preston (1948) also
took multiplicative intervals with $\lambda = 2$, but with three
differences: (i) the probability of each bin was calculated
without standardising by size, (ii) the probabilities were
represented in an arithmetic scale, and (iii) overlapping
intervals were taken for successive bins. The reason for
the first two differences is that Preston’s representation
was designed to yield the shape of a
Gaussian bell curve for a lognormal SAD. Therefore, this might
be considered a method specifically designed for working
with the lognormal hypothesis rather than a general-purpose method. The third difference is inconvenient for
statistical tests such as $\chi^2$ because it eliminates the
independence in statistical error between bins.
Appendix A gives details on methods of parameter
estimation and hypothesis testing.
Theory
An integrated framework for species abundance
distributions
The meaning of the logseries
In this subsection I investigate the range of circumstances in which a logseries SAD can be expected, and in
the next subsection I investigate the SADs to be expected
when these circumstances are not met.
Watterson (1974) showed that simple neutral models
(SNMs) with a large number of species S and individuals
N produce a logseries. This is the reason why Hubbell
(2001) expects this SAD at a metacommunity level.
While neutrality leads to the logseries, to what extent is
the logseries evidence of neutrality?
The logseries distribution has the form
$$f(n) = k n^{-1} e^{-\phi n} \qquad (2)$$
Equation 2 has two parts. The first is a power law
$$f(n) = k n^{-\beta} \qquad (3)$$
with $\beta = 1$. The second is an exponential function.
Appendix B shows that these two parts are the independent outcome of two distinct mechanisms: the power law
with $\beta = 1$ is the result of ecological drift, while the
exponential results from the regulation of total community size. These two mechanisms are not equally
important for all species abundances, as shown by
Fig. 1, which is the probability density function (p.d.f.) of a logseries with realistic parameters (those fitted to
Mediterranean diatoms in the ‘‘Case study’’ below). In
this case, the logseries (solid line) overlaps with the
power law (dotted line) for most of its range. This
‘‘power law’’ region becomes larger when sample size
increases. The exponential part of the logseries is
apparent only at the upper end of the distribution.
This implies that, in SNMs, the trajectory of the
abundance of each species is essentially governed by
drift, and only when a species becomes exceptionally
abundant is it clearly affected by the limited size of the
community. The main role of the exponential part is to
set an upper bound, resulting from community size, to
the power law. In the physical literature, a function with
this role is called a bending function. Therefore, a
logseries is a power law distribution with $\beta = 1$ and an
exponential bending function.
Fig. 1. Probability density function of a logseries distribution
(solid line) with realistic parameters, with indication of its two
components: (i) the power law (dotted line) with slope parameter $\beta = 1$, which results from ecological drift in simple neutral
models (SNMs), and (ii) the bending function (arrow), which
results from the regulated community size (short dash line) in
these same models. Axes: $\log_{10}(n)$ (horizontal) and $\log_{10}(f(n))$ (vertical).
In a different context, Mandelbrot (1963) showed that
the power law has a special property that makes it robust
in response to factors other than its generating mechanism. Let us call this property ‘‘invariance under assemblage’’. Take a given range of abundances $[n_0, n_M]$.
Assume that the dynamics of a species i in this range is
governed by drift. Then the density of probability $f(n_i)$ of
finding an abundance $n_i$ at time t is
$$f(n_i) = k_i n_i^{-\beta} \quad \text{for } n_i \in [n_0, n_M] \qquad (4)$$
Now take a set of species that satisfy Eq. 4, with $k_i$ either
equal or different for each species. The probability
density of an abundance n for one such species chosen
at random is
$$f(n) = \sum_i [f(k_i) f(n \mid k_i)] = \sum_i [f(k_i) k_i n^{-\beta}] = \sum_i [f(k_i) k_i] \, n^{-\beta} = k n^{-\beta} \qquad (5)$$
i.e. it reduces to Eq. 3. In SNMs, every species has the
same $k_i$, but Eq. 5 implies that the SAD to be expected
may be quite similar without requiring this assumption.
Three outstanding situations in which we will find a
power law (Eq. 2) with $\beta = 1$ in some range of
abundances are thus (for large N and S):
1) SNMs.
2) When the reproduction rate nearly equals the
mortality rate for each species but these rates differ
between species, while the premises of SNMs are
satisfied in other respects. Then the abundance of
each species will fluctuate at different speed, but
each will satisfy Eq. 4 and the ensemble will satisfy
Eq. 3.
3) When there are several guilds with many species in
each, in such a way that the niche of each guild
does not overlap with the others, but there is full
niche overlap within each guild, and the dynamics
of the set of species in the guild agrees with either
point 1 or point 2 above.
Case 3 above will render an SAD very close but not
necessarily equal to the logseries, because the principle of
invariance under assemblage does not extend to the
bending function.
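The invariance under assemblage expressed by Eq. 5 can be checked numerically with a small sketch. The weights and constants below are hypothetical values chosen only for illustration; `weights[i]` plays the role of $f(k_i)$ and `ks[i]` the role of $k_i$, and the check covers only the power-law range, not the bending function.

```python
def mixture_density(n, weights, ks, beta=1.0):
    """Density at abundance n for a species drawn at random from the assemblage.

    Each species follows k_i * n**(-beta) (Eq. 4); the result is their
    weighted sum, as in the middle terms of Eq. 5.
    """
    return sum(w * k * n ** (-beta) for w, k in zip(weights, ks))

weights = [0.5, 0.3, 0.2]   # hypothetical probabilities f(k_i), summing to 1
ks = [0.1, 0.4, 2.0]        # hypothetical species-specific constants k_i

# Aggregate constant k on the right-hand side of Eq. 5.
k = sum(w * ki for w, ki in zip(weights, ks))

# The mixture coincides with the single power law k * n**(-1) across the range.
max_gap = max(abs(mixture_density(n, weights, ks) - k * n ** -1.0)
              for n in (10, 100, 1000))
```

Here `max_gap` is zero up to rounding error: species with different $k_i$ still assemble into a power law with the same exponent.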
More generally, when there is an appreciable overlap
of niches among different species but this does not have
the simple form in the third of the above cases, the
species will still undergo wide fluctuations driven by
drift. However, this will not be pure drift, but drift
modulated by the influence of niches. Therefore, the
outcome may still be close to a logseries, but will differ
more clearly. This divergence will be larger the stronger
the niche segregation, or any other factor besides niche
segregation and drift. In the next subsection I show what
happens in this case.
Of course, in addition to the extension of SNMs in
this section, we cannot rule out that a completely
different mechanism may lead to the logseries or a
similar distribution (e. g. by producing a logseries niche
structure).
When the logseries is not enough
In the previous subsection, I show that the logseries
distribution produced by SNMs is robust in response to
the presence of factors that are not contemplated in such
models, but the robustness is not unlimited. Here I study
which SADs are to be expected when these additional
factors are strong enough for the distribution to differ
significantly from a logseries.
If we have a given range of abundances fully dominated by ecological drift, the probability density function
(p.d.f.) will have the form
$$f_0(n) = k n^{-1} \qquad (6)$$
which corresponds to a straight line in a log-log plot,
with slope $-1$. If there are some other factors of
importance for the p.d.f. f in a community, there may
be some deviation from this straight line. The only
relevant deviations are those statistically significant.
Therefore, it is convenient to measure these deviations
$\rho$ in terms of the log-log plot,
$$\rho(n) = \log[f(n)] - \log[f_0(n)] = \log[f(n)] - \log[k n^{-1}] \qquad (7)$$
In this way, the significance of a given $\rho$ does not depend
on n. As stated in the Methods section, the error bars for
$f_0$ are homogeneous in this representation.
The deviations $\rho$ may result from a number of factors,
i.e.
$$\rho(n) = \rho(n; \zeta_1, \zeta_2, \ldots, \zeta_v) \qquad (8)$$
where $\{\zeta\}$ are the parameters for niche segregation,
environmental noise, relative fitness, life history traits,
compensatory mortality, Allee effect, migratory exchanges, anthropogenic influences, non-stationarity,
etc., and for the variation in these factors between
different species.
At this point there are several possible strategies.
Hubbell’s (2001) model assumes $\zeta_i = 0$ for all i except
one. The chosen parameter is a measure of the overall
intensity of migratory exchange, which Hubbell tries to
fit from the SAD. Another option is to attempt to model
all of the processes that affect the dynamics of the
community and give a reasonable value to each parameter involved. This would be extremely difficult, and we
might eventually find that the addition of so much
complexity does not produce results that differ qualitatively from those obtained with much simpler models.
This is precisely what opens the door to the alternative
presented here: the development of an approach to SAD
based on those properties of $\rho$ that do not depend on the
particular values taken by the set of parameters $\{\zeta\}$.
If the sample is small, the observable range of n will
also be small and the occurrence of significant deviations
$\rho$ is more unlikely. As the sample increases, the range of
n will increase and the deviations will become progressively larger. If the new shape that emerges is well
matched by a continuous and indefinitely derivable
function, the set of deviations can be decomposed into
a Taylor series.
Fig. 2. Graphical representation of the function $\rho$, which I use
to express the probability density function of abundances (solid
line) as a modification of a power law with $\beta = 1$ (dotted line).
Labels in the plot: $\rho(n_\gamma)$, $\rho(n)$, $\log(n_\gamma)$ and $\log(n)$.
Let us choose an arbitrary $n_\gamma$ in the range of
observable abundances and define (Fig. 2):
$$\Delta\log(n) = \log(n) - \log(n_\gamma) \qquad (9)$$
$$\Delta\rho(n) = \rho(n) - \rho(n_\gamma) \qquad (10)$$
When there is only ecological drift (Eq. 7), $\Delta\rho = 0$.
Otherwise, we can perform the expansion
$$\Delta\rho(n) = \sum_{j \geq 1} c_j [\Delta\log(n)]^j \qquad (11)$$
Taylor series such as Eq. 12 are often used in physics for
simplifying complex functions whose details are not well
known. Since, when $\Delta\log(n)$ is small, $[\Delta\log(n)]^j$ vanishes
for large j, we often find that a few terms with small j
suffice for fitting the observable range of a complex
function. This procedure lies at the basis of results as
important for physics as for example the principle of
minimum entropy production (Nicolis and Prigogine
1977).
For a better understanding of the Taylor series
approach, I illustrate its application with a large sample
of Mediterranean marine phytoplankton (Margalef
1994), which I analyse in detail in the next section.
Fig. 3a compares the empirical SAD with the straight
line that would correspond to Eq. 6. The empirical SAD
is close to what we would expect from neutral fluctuations alone, but there are indeed other factors at work.
If the observable range is small in view of the strength
of the factors other than ecological drift, it is likely that
we require only one term in the expansion in Eq. 11,
which corresponds to $j = 1$. Then, from Eq. 7 and 9-11
we obtain a power law (Eq. 3) with $\beta \neq 1$. More
specifically,
$$\beta = 1 - c_1 \qquad (12)$$
We preserve the straight line in the log-log plot, but
inclined. Fig. 3b shows that this suffices for quite a good
fit to the empirical data (in this case, $\beta \approx 1.2$).
The intuition behind our mathematical result is that
(i) the simplest thing that any additional factor can do is
slightly increase or decrease the number of species of
small abundance as compared to those of large abundance, (ii) the simplest tool for capturing one such
deviation is a straight line (this is the reason why linear
regressions are so widely used in all fields), and (iii) if the
range of n is small enough, there may not be room for a
further involved outcome. Therefore, when a power law
with $\beta \neq 1$ fits our empirical data well, as we see in Fig.
3b, we cannot determine which of the ecological factors
corresponding to different $\zeta_i$ are responsible, because
several will primarily have the same outcome. A measure
of their combined strength is $\beta - 1$. Furthermore, power
laws are subject to the principle of invariance under
assemblage, as explained in the subsection ‘‘The meaning
of the logseries’’ above, which makes this result robust.
If two terms in Eq. 11 are required instead of one
($j = 1, 2$), a lognormal is obtained from Eq. 7 and 9-11
$$f(n) \propto n^{-1} e^{-\frac{1}{2}\left(\frac{\log(n) - \mu}{\sigma}\right)^2} \qquad (13)$$
($\propto$ means ‘‘proportional’’), satisfying
$$\mu = -\frac{c_1(n_\gamma)}{2 c_2} + \log(n_\gamma), \qquad \sigma^2 = -\frac{1}{2 c_2} \qquad (14)$$
In Eq. 14, $c_1$ appears as a function of $n_\gamma$, unlike in
Eq. 12. This occurs because $c_1$ is a measure of slope in
the log-log plot. In Eq. 12 I assume a straight line
(Fig. 3b), so $c_1$ is a constant. In contrast, in this second
approach I assume a curve (Fig. 3c), so the slope
depends on the point where it is measured. However,
the term $\log(n_\gamma)$ cancels this effect and the resulting
parameter $\mu$ does not depend on $n_\gamma$.
Fig. 3c shows that the lognormal fits our data even
better than the power law.
Fig. 3. Incorporation of successive terms of the Taylor
expansion in Eq. 3 for fitting the Mediterranean phytoplankton
species abundance probability densities. a, 0th order approach,
which corresponds to a power law (Eq. 1) with $\beta = 1$. b, 1st
order approach, which corresponds to a power law with
arbitrary $\beta$ (in this case, $\beta \approx 1.2$). c, 2nd order approach, which
corresponds to a lognormal. Axes in each panel: $\log_{10}(n)$ (horizontal) and $\log_{10}(f(n))$ (vertical).
The intuition behind our mathematical result is that
(i) in addition to increasing or decreasing the number of
species of small abundance as compared to those of large
abundance, the simplest thing that any additional factor
can do is slightly increase or decrease the number of
species of intermediate abundance as compared to those
of small or large abundance, (ii) the simplest way for
capturing such deviation is by introducing a degree of
curvature by means of a quadratic term, and (iii) if the
range of n is small enough, there may not be room for
further involved outcome. Again, the simple fact that an
SAD is lognormal will not provide information on the
relative importance of each of the mechanisms involved.
One point must be clarified in relation to the above
developments. In the subsection ‘‘The meaning of the
logseries’’ above, I distinguish two domains in
the logseries distribution produced by SNMs (Fig. 1).
The first domain covers most of the range of n in large
samples and consists of a power law with $\beta = 1$, resulting
from ecological drift. The second domain is a deviation
from the power law at the upper end of the distribution,
produced by the exponential bending function caused by
the finite size of the system. The previous developments
in this section refer to the modification in the first
domain when we add other mechanisms that we assume
to be relevant but with less impact on the SAD than
ecological drift. These results do not directly affect the
bending function. The precise shape of this function will
be modified in an uncertain manner with these added
factors, but we may assume that it will not change in its
essentials, i.e. there will be a decrease in probabilities at
the upper end of the distribution because of the finite
size of the system. The power law with $\beta \neq 1$ that we
found will thus display an upper bound: this could not
be otherwise, because, without such a bound, the
expected abundance would be infinite for any $\beta < 2$
(Mandelbrot 1983). In principle, the lognormal that
we obtain in this manner is also a lognormal with an upper
bound. We can often overlook this bound, because a
lognormal decays by itself faster than a power law and
has a finite expectation, but, on the other hand, decays
more slowly than an exponential bending function.
Whenever the bound is unduly overlooked when fitting
the parameters of the lognormal, there will be a
mismatch between the empirical and the expected SAD, with
an ‘‘excess’’ of rare species in the former, which may be perceived as
a left skew in Preston’s representation.
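The correspondence between the second-order Taylor coefficients and the lognormal parameters (Eq. 13 and 14) can be verified with a short numerical sketch. It assumes natural logarithms and uses hypothetical values for $c_1$, $c_2$ and $n_\gamma$; the function names are introduced here only for illustration. The check ignores the upper bound discussed above.

```python
import math

def lognormal_params(c1, c2, n_gamma):
    """Map second-order Taylor coefficients to (mu, sigma^2), following Eq. 14."""
    if c2 >= 0:
        raise ValueError("c2 must be negative for a lognormal (downward curvature)")
    mu = -c1 / (2 * c2) + math.log(n_gamma)
    sigma2 = -1 / (2 * c2)
    return mu, sigma2

def log_density_taylor(n, c1, c2, n_gamma):
    """log f(n) up to an additive constant: -log n + c1*dlog + c2*dlog**2."""
    dlog = math.log(n) - math.log(n_gamma)
    return -math.log(n) + c1 * dlog + c2 * dlog ** 2

def log_density_lognormal(n, mu, sigma2):
    """log of n**-1 * exp(-(log n - mu)**2 / (2*sigma^2)), as in Eq. 13."""
    return -math.log(n) - (math.log(n) - mu) ** 2 / (2 * sigma2)

c1, c2, n_gamma = -0.2, -0.05, 100.0   # hypothetical coefficients
mu, sigma2 = lognormal_params(c1, c2, n_gamma)

# If Eq. 14 holds, the two expressions differ by a constant independent of n.
offsets = [log_density_taylor(n, c1, c2, n_gamma) - log_density_lognormal(n, mu, sigma2)
           for n in (1, 10, 1000, 1e5)]
```

All entries of `offsets` coincide up to rounding, which confirms that a quadratic deviation in the log-log plot is a lognormal with the parameters of Eq. 14, whatever reference point $n_\gamma$ is used.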
Further generalization
In the previous subsections I show that, when an SAD is
shaped mainly by ecological drift but other mechanisms
also have some relevance, this SAD is likely to be either a
power law or a lognormal. However, the conditions that
lead to lognormality are even broader. We will generally
find a lognormal when there is a combination of:
1) One or several mechanisms that, in isolation, would
produce a power law SAD. This is the case of
ecological drift, but also of simple forms of
environmental noise (Appendix B) or a scale-invariant niche structure (Morse et al. 1985).
2) One or several mechanisms that slightly favour
intermediate abundances. This is, for example, the
case of migration, compensatory mortality or some
possible forms of non-scaling niches.
The sequence in which I introduced the series of
analytical steps in this section is based on the hypothesis
that it is mainly because of ecological drift that SADs
approach a power law, and that the other factors
introduce minor variations on this theme. However, it
must be pointed out that there are other possible sources
for the power law and thus for the lognormal. Some of
these mechanisms produce power laws with $\beta \neq 1$ without the requirement of additional factors. For example,
in the extreme case of population fluctuations driven
entirely by environmental noise with a completely
different effect on each species, a power law with $\beta = 2$
would be obtained (Appendix B).
Case study: a glimpse at marine phytoplankton
In the light of the integrated approach outlined in the
previous sections, here I analyse two large sets of marine
phytoplankton data published by Margalef (1994), one
from the Mediterranean and the other from the Caribbean. Each set results from grouping more than 1000
samples from a number of sites in a large area, so they
are expected to capture the features of their metacommunities, according to Hubbell (2001). There are 162 478
identified cells in the Mediterranean set and 883 352 in
the Caribbean. This is an exceptional amount of data:
both sets exceed the size of the sample taken by Siemann
et al. (1996, 1999), which the authors claimed to be the
most thorough sample of an ecological community to
date, and also the sample sizes in Hubbell (2001).
Details of the data sets are shown in the upper part of
Table 1. All the samples were taken in the photic zone
(down to 110 m). The Mediterranean samples were
obtained at several sites in the Catalan sea, while the
Caribbean samples were obtained along the eastern
coast of Venezuela. I considered only the cells identified
to species. For each sea, I studied three sets: (i) the
complete set including all the groups of phytoplankton,
(ii) diatoms, and (iii) dinoflagellates. These two last
groups are important because most cells belong to one of
these and, in addition, they are the only ones exhaustively identified to species level. It is of interest to analyse
them separately because their differences are not only
taxonomic but also ecological (Margalef 1978). In each
case, I represent the empirical probability density function (p.d.f.) as explained in the Methods section, and
perform the following operations: (i) the logseries is
fitted by maximum likelihood estimation and its adequacy tested, (ii) the power law slope parameter $\beta$ is
estimated by simple regression for the whole range of
abundances, and (iii) $\beta$ is fitted by maximum likelihood
estimation in the interval of abundances [10, 1000),
which displays no significant deviation from a power law,
and the 90% confidence intervals are quantified. I then
examine the goodness of fit to the distributions with
standard chi-square tests (which might however suffer a
slight bias in favour of the null hypothesis when applied
to SADs, according to recent results by Alonso and
McKane 2004). Appendix A gives the procedures for this
set of statistical treatments.
Table 1 shows the results. Fig. 4 compares the empirical SADs with the logseries, and Fig. 5 displays the set of power laws that best fit the whole range of abundances. The logseries can definitely be rejected for the phytoplankton as a whole and for dinoflagellates in particular, in both seas. Only Mediterranean diatoms strictly adhere to the logseries, while those in the Caribbean allow this distribution to be rejected but not as strongly as dinoflagellates and phytoplankton in general. For all the sets, the overall SADs are close to power laws. In the range [10,1000), diatoms have b ≈ 1.0 in both seas, as corresponds to a logseries. Dinoflagellates have b ≈ 1.45 in both seas, while phytoplankton as a whole displays b ≈ 1.2, also in both.

Table 1. Statistics of marine phytoplankton diversity. NT: total sample size, including cells identified and not identified to species level. N: sample size without unidentified cells (these were not used in the analyses). S: number of species identified. b: ‘‘slope’’ parameter of the power law (Eq. 1). c. i.: confidence interval. df: degrees of freedom. o: minimum significance level that allows the rejection of a given distribution. a: parameter of the logseries (Appendix A).

              Mediterranean                              Caribbean
              All          Diatoms      Dinofl.          All          Diatoms      Dinofl.
NT            197 535      116 409      14 055           1 113 581    779 347      102 558
N             162 478      112 352      10 874           883 352      759 794      60 851
S             353          107          209              257          118          122
Power law in the interval [10,1000], by maximum likelihood estimation
b             1.23         1.02         1.46             1.20         1.02         1.44
90% c. i.     (1.19,1.26)  (0.91,1.07)  (1.40,1.51)      (1.15,1.24)  (0.92,1.07)  (1.35,1.50)
χ²            6.12         0.45         18.0             2.23         4.47         6.83
df            5            5            6                5            5            4
o             0.29         0.99         0.21             0.82         0.48         0.15
Power law in the whole range, by regression
b             1.31         1.13         1.40             1.24         1.14         1.34
r²            0.99         0.98         0.98             0.99         0.98         0.99
Logseries
a             42.8         11.7         36.7             24.5         10.6         14.6
χ²            52.5         3.9          24.3             51.7         28.1         31.0
df            11           11           7                14           14           10
o             <10^-6       0.972        0.001            3×10^-5      0.014        6×10^-4
Discussion
Neutral theory, niche theory and the integrated framework
In this paper I present an integrated theoretical framework for diversity patterns. While recognising the great
importance of the mechanisms highlighted by the
neutral theory (random drift, community-level regulation, migration), I consider that there are many other
ecological mechanisms that must not be overlooked, and
show a simple way to incorporate them into the theory
of diversity patterns.
A key argument in this study is that many different
models will produce the same few diversity patterns, as
maintained by other authors, such as Chave et al. (2002),
McGill (2003a) and Mouquet and Loreau (2003). Therefore, a given species abundance distribution (SAD) or
species area relationship (SAR) will rarely suffice for
supporting a narrowly defined model, not even the
neutral theory. On the other hand, the SADs of some
natural communities allow the neutral theory to be
rejected, in principle. In the ‘‘Case study’’ above, I show
that this applies to marine dinoflagellates (in contrast to
diatoms, whose SADs are consistent with neutrality).
The analytical findings that result from my minimal
assumptions support the conjecture by McGill (2003a):
under broad conditions, complex systems involving
multiplicative processes will render POLO-like distributions. ‘‘POLO’’ is the term that he proposes for embracing the power law and the lognormal.
Neutral models are just an instance of this type of
system. At a metacommunity level, Hubbell (2001)
expects a logseries distribution, which is a particular
case of power law distribution (with b = 1 and an
exponential bending function). At a community level,
he expects what he calls a ‘‘zero-sum multinomial’’
(ZSM), which can be assimilated to a lognormal. Indeed,
the equations for the lognormal and for the ZSM differ,
and Hubbell (2001) maintains that the ZSM fits tropical
forest tree data better than the lognormal. However,
according to recent analyses (S. Pueyo, unpubl.), the
ZSM and the lognormal fit the data equally well. Since
we cannot currently distinguish between these two
distributions from empirical data, for practical purposes
a ZSM is a lognormal. When Preston (1948) and other
authors assert that a given sample from a natural
community ‘‘displays a lognormal distribution’’, what
is meant is that it ‘‘displays a statistical distribution that
cannot be distinguished from a lognormal in practice’’.
Therefore, when we search for a mechanism that generates a lognormal, what we are actually looking for (or must look for) is a mechanism that generates a distribution that cannot be distinguished from a lognormal in practice. Hubbell gives an option for one such mechanism: the combination of ecological drift and migration (plus community regulation). However, these two factors can be replaced or complemented by other factors, which are listed in the subsection ‘‘Further generalization’’ above. For example, there is evidence of compensatory mortality in tropical forest trees (Peters 2003), and this mechanism will produce an effect on the SAD that is difficult to distinguish from the effect of migration.

Fig. 4. Species abundance probability densities of marine phytoplankton (empty spots), compared with the best-fit logseries distribution (full spots). Panels: Mediterranean phytoplankton, Mediterranean diatoms, Mediterranean dinoflagellates, Caribbean phytoplankton, Caribbean diatoms, Caribbean dinoflagellates. Vertical axis: log10 (f(n)).

The set of phytoplankton data examined here is especially relevant in this context, for two reasons: the huge amount of data, and the fact that these result from sampling at a metacommunity level. At this level, the distribution expected from the neutral theory has a single parameter to fit, which implies a more specific prediction than at a community level and is thus advantageous for testing the theory. The results obtained for the Mediterranean and Caribbean are very similar, which suggests that these have quite a general validity.

Fig. 5. Species abundance probability densities of marine phytoplankton, with power laws fitted by regression. Panels: Mediterranean phytoplankton, Mediterranean diatoms, Mediterranean dinoflagellates, Caribbean phytoplankton, Caribbean diatoms, Caribbean dinoflagellates. Axes: log10 (n) (horizontal), log10 (f(n)) (vertical).

The analyses indicate that marine phytoplankton is not
neutral, at least with respect to dinoflagellates. On the
other hand, diatoms largely agree with the predictions of
the neutral theory in the Caribbean and completely so in
the Mediterranean. SADs departing from the neutral
theory expectations are reasonably well fitted by the
power law distribution, as the integrated framework
predicts for small departures from the logseries. In both seas, dinoflagellates display a power law with a slope parameter b ≈ 1.45, which differs significantly from the value b = 1.0, which we would expect from neutrality and we do find in diatoms. For all phytoplankton together, we obtain b ≈ 1.2 in both seas, which also differs significantly from b = 1.0. The power law for the full samples results from assembling the power law for each taxonomic group, which illustrates the principle of invariance under assemblage (Eq. 5). In this case, however, we assemble power laws with different b, unlike Eq. 5, but not different enough for the result to clearly deviate from a power law. Other authors have previously used power laws with b ≠ 1 to fit empirical SADs. Siemann et al. (1996) fitted their huge samples of grassland arthropods with an expression equivalent to a power law with b = 1.5.
Marine phytoplankton SADs give evidence of non-neutrality, but this does not imply that ecological drift is
unimportant in these organisms. My theoretical developments begin with the assumption that drift is the
factor with the strongest influence on the type of shape
that SADs display, while making clear that this is not the
sole option. However, this option is plausible for
phytoplankton. According to the well-known ‘‘paradox
of the plankton’’ enunciated by Hutchinson (1961), ‘‘the
problem that is presented by phytoplankton is essentially
how it is possible for a number of species to coexist in a
relatively isotropic or unstructured environment all
competing for the same sort of materials’’. The diversity
of phytoplankton is difficult to explain from niche
segregation alone, which suggests that niche overlap
must be high. If this is the case, drift will be a key factor
for SADs.
The above results also suggest that niche overlap
might be broader in diatoms than in dinoflagellates
(which must not be confused with a total absence of
niche segregation in diatoms). This is a conjecture that
can and must be tested. At least on first inspection, it
appears to be congruent with other ecological differences
between diatoms and dinoflagellates. Diatoms are
mainly associated with mixed waters, well matched by
Hutchinson’s above description, while dinoflagellates are
more often found in stratified waters, and have traditionally been attributed the characteristics of late stages
of succession (Margalef 1978), in which the community
would be more structured (Margalef 1963).
Given the number of models that produce the same
few SADs, little can be said from the SADs alone, but
our case study strongly supports a ‘‘between neutrality
and structure’’ paradigm: SADs seem to result from a
combination of ecological drift and ecosystem organisation, with these two elements having different weight in
different groups of organisms.
Critical review of some related contributions
Here I discuss a few recent contributions that explain
the origin of SADs and that do not entirely coincide
with either Hubbell’s (2001) unified neutral theory or
the integrated framework enunciated in this paper.
I also show how the integrated approach sheds
light on some old controversies in the ecological
literature.
Bell (2000, 2001) developed a neutral model that
differed from that of Hubbell. This must be considered
a simple neutral model (SNM) as defined in the
introduction, because it has the characteristic
ingredients: ecological drift, a form of global regulation
equally affecting all individuals regardless of their
species, and a process analogous to speciation. The
latter process is labelled as ‘‘migration’’ because it
is based on a finite pool of species. However, for
practical purposes it is equivalent to speciation, because
the pool of species is large, the rate of ‘‘migration’’ small,
and ‘‘migration’’ events are equiprobable for all
species instead of following a plausible metacommunity
distribution as in Hubbell’s model. Surprisingly, Bell
finds a distribution that resembles the lognormal and
Hubbell’s ZSM in that it is more humped than the
logseries. This result is due to an artifact in Bell’s
simulations. Take, for example, Fig. 1 in Bell (2000).
This was obtained by prescribing an equal initial
abundance for all species and then running 2000
iterations of his model. However, 2000 iterations is not
enough to reach the final steady-state SAD. After some
tens of thousands of iterations, the SAD that results
from Bell’s model is a logseries.
Pachepsky et al. (2001) presented another dynamic
model to explain plant SADs. They explicitly introduced
many physiological traits that differ between species, but
eventually reduced this setting to a tradeoff between
fecundity and time to reproduction. This implies a
difference between species in the statistical distribution
of reproduction events, but (i) reproduction and mortality events are still random events at an individual level,
(ii) reproduction rate equals mortality rate for all species,
and (iii) there is no regulation mechanism differentiating
between species. If we take Appendix B and the
subsection ‘‘The meaning of the logseries’’ into account,
it is clear that the above model is a neutral model and
must lead to a logseries, like other neutral models.
However, Pachepsky et al. report a lognormal. This is
because they introduced no speciation or immigration,
so all species except one must eventually become extinct,
and what they studied is a transient state, like Bell (2000,
2001). In this case, there is a depression of probabilities
at the lower end of the range because it is there where
species become extinct without being replaced. This
model is not valid for explaining the lognormal-like
shape of SADs in nature, but is an example of how
moderate modifications of the logseries lead to the
lognormal.
Magurran and Henderson (2003) analysed the SAD of
an estuarine fish community and reached some conclusions on its origin. They examined the SAD for the
whole data set and also the SADs for the species that
had been recorded for either less or more than 10 years
out of 21. The set of long-lasting species had a
lognormal distribution. The authors thus proposed
establishing a distinction between the core species in a
community, which would have a lognormal distribution,
and occasional immigrants that usually attain low
numbers and would be responsible for the ‘‘left skew’’
as compared to the lognormal, often claimed for
empirical SADs. I reanalysed the data and found that
the full set approaches a power law with b ≈ 1.3. For
most conceivable models, either neutral or non-neutral,
sporadic species are more likely to be rare. It is not
surprising that their removal makes the shape of the
distribution more humped and that this must then be
fitted by a lognormal instead of a power law, as expected
from my Taylor series approach. The set of occasional
species still approaches a power law, but with a larger
b (b ≈ 1.65), which is not surprising either. These
observations might suggest that the results reported by
Magurran and Henderson (2003) do not contribute to
our theoretical framework, but indeed they do, thanks to
an interesting observation in their paper: their ‘‘occasional’’ species were attributed to non-estuarine habitats
in the literature much more often than their ‘‘core’’
species. This is empirical proof of a specific non-neutral
mechanism affecting the SAD in this community. In
particular, it is the mechanism operating in the model of
‘‘source–sink competitive metacommunity’’ described by
Mouquet and Loreau (2003): community-level SADs
are affected by the presence of species with low local
fitness that reiteratively immigrate from other communities with habitats to which these species are better
adapted. There can be little doubt that this mechanism
contributes to the estuarine fish species studied by
Magurran and Henderson displaying b ≈ 1.3 instead of
the value b = 1, which we would expect from the neutral
theory.
Besides the origin of the lognormal-like SAD, the
main related issues discussed in the literature (Magurran
and Henderson 2003) have been its seeming left skew in
Preston’s representation, and the observations by Preston (1962) himself on the ‘‘canonical’’ lognormal
(Sugihara 1980). In the subsection ‘‘When the logseries
is not enough’’, I give a possible explanation for the left
skew. The integrated approach also gives some clues on
Preston’s canonical. Preston (1948) distributed the
abundances of species by multiplicative intervals similar
to my intervals $[2^j, 2^{j+1})$ (the difference is that Preston’s
intervals overlap, but this does not affect the following
results). Preston’s best known representation of species
abundances consists of j vs the number of species in the
bin. However, in addition to this ‘‘species curve’’, he also
calculated what he called the ‘‘individuals curve’’: j vs the
sum of the abundances of the species that belong to the
bin. In several samples, he found a shape like the lower
half of a Gauss bell. Assuming that abundances are
lognormally distributed, Preston (1962) noted that this
empirical result suggested a constraint in the relationship
between the two parameters of the lognormal. He called
this particular case of lognormal the ‘‘canonical’’
lognormal. According to the integrated approach, the lognormal SAD results from a small deviation from a power law SAD, with a slope parameter b not far from b = 1. In the case of a power law, the number of individuals in bin j of Preston’s ‘‘individuals curve’’ will be

$$ S \int_{2^j}^{2^{j+1}} x\,k\,x^{-b}\,dx = S k\,\frac{2^{2-b}-1}{2-b}\,e^{[(2-b)\log(2)]\,j} $$

in a continuous approximation, where S is the number of species in the sample. This function increases exponentially with increasing j if b < 2, as is the case for b close to 1. This will hold for the whole distribution range except at the upper end, where the bending function will produce a downward inflexion. The result is similar to half a Gauss bell. If lognormal SADs result from small deviations from power laws, these will often display the effect found by Preston (1962).
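
As an illustrative check (not part of the original analysis), the closed form for the ‘‘individuals curve’’ of a pure power law can be compared with a direct numerical integration. The function names, the choice S = k = 1 and the value b = 1.2 are assumptions made only for this sketch.

```python
import math

def individuals_in_bin(j, b, S=1.0, k=1.0):
    """Closed form of S * integral over [2^j, 2^(j+1)) of x * k * x**(-b) dx, for b != 2."""
    prefactor = (2.0 ** (2.0 - b) - 1.0) / (2.0 - b)
    return S * k * prefactor * math.exp((2.0 - b) * math.log(2.0) * j)

def individuals_in_bin_numeric(j, b, S=1.0, k=1.0, steps=20000):
    """Midpoint-rule approximation of the same integral, as an independent check."""
    lo, hi = 2.0 ** j, 2.0 ** (j + 1)
    dx = (hi - lo) / steps
    total = sum((lo + (i + 0.5) * dx) ** (1.0 - b) for i in range(steps)) * dx
    return S * k * total

b = 1.2  # hypothetical slope parameter, close to 1
# Ratio between consecutive bins: constant and equal to 2**(2 - b) for a pure power law.
ratio = individuals_in_bin(6, b) / individuals_in_bin(5, b)
```

For b < 2 the ratio between consecutive bins exceeds 1, which is the exponential increase with j described above; the bending function at the upper end is deliberately left out of this sketch.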
Practical consequences
The analysis of an empirical SAD can be performed as follows:
1) Represent the SAD as explained in Methods.
2) Determine the number of terms in Eq. 11 required for fitting the SAD. As shown in the ‘‘Case study’’ above, a single factor will sometimes give a reasonable fit, which simplifies matters. On other occasions, two factors and, exceptionally, more will be required.
3) Estimate the corresponding parameters. At this step all the information in the SAD will probably have been exhausted.
4) Try to find quantitative relationships between functional parameters (related to migration rate, niche segregation, compensatory mortality, etc.) and the descriptive parameters of the SAD. This is not possible by studying the SAD alone, because each descriptive parameter may well be a function of more than one functional parameter.
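
The first step of the procedure above might be sketched as follows. This is only an assumed reading of the representation: species are binned in the multiplicative intervals [2^j, 2^(j+1)) mentioned earlier, and the density in each bin is taken as the fraction of species divided by the bin width. The function name, the use of the geometric midpoint and the sample abundances are hypothetical; the Methods section remains the authoritative description.

```python
import math
from collections import Counter

def octave_density(abundances):
    """Return (log10 n, log10 f) points, one per non-empty bin [2**j, 2**(j+1)).

    Abundances must be positive integers. The density estimate used here
    (fraction of species in the bin divided by the bin width) is an
    assumption of this sketch.
    """
    S = len(abundances)
    # For a positive integer n, bit_length() - 1 equals floor(log2(n)) exactly.
    counts = Counter(n.bit_length() - 1 for n in abundances)
    points = []
    for j in sorted(counts):
        width = 2 ** j                # length of the interval [2**j, 2**(j+1))
        midpoint = 2 ** (j + 0.5)     # geometric midpoint of the bin
        density = counts[j] / (S * width)
        points.append((math.log10(midpoint), math.log10(density)))
    return points

sample = [1, 1, 1, 2, 3, 5, 8, 13, 40, 100]  # made-up abundances, for illustration only
points = octave_density(sample)
```

Plotting the second coordinate against the first gives a log-log picture in which a power law appears as a straight line with slope -b.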
Supporters of the neutral theory could argue that, when
two functional parameters suffice for explaining the type
of SAD, it is unnecessary to consider other parameters.
This line of reasoning is seriously flawed: while the
inclusion of more parameters will not have a qualitative
effect on the shape of the predicted SAD (or the SAR), it
may substantially alter any quantitative prediction. For
example, the speciation rate that can sustain a given
number of species will change dramatically with just a
small niche segregation or compensatory mortality, and so
will the effects of ecosystem fragmentation.
If we reach some results at the fourth of the
above levels, we will be in a better position to predict
how different forms of anthropogenic interference will
affect diversity patterns, and perhaps also to reach a
deeper understanding of the role of diversity in ecosystems. Neutral theory will have played a pivotal role in
paving the road to this stage, but along the way it
will have to cease to be neutral. Everything seems to
indicate that the differences between species have much
greater ecological importance than neutral theory might
suggest.
Acknowledgements – I thank a number of colleagues for useful
comments and discussions: D. Alonso, E. Clavero, J. Flos,
E. Gutiérrez, D. Jou, J. Martínez-Alier, A. McKane, J. L. Pretus,
M. A. Rodríguez, R. V. Solé, and especially B. McGill and R.
Margalef. I also thank J. Flos for facilitating the continuity of
my research. I dedicate this paper to the late Ramon
Margalef, who was one of my main sources of inspiration
(and data).
References
Alonso, D. and McKane, A. J. 2004. Sampling Hubbell’s neutral theory of biodiversity. – Ecol. Lett. 7: 901–910.
Bell, G. 2000. The distribution of abundance in neutral communities. – Am. Nat. 155: 606–617.
Bell, G. 2001. Neutral macroecology. – Science 293: 2413–2418.
Bulmer, M. G. 1974. Fitting Poisson lognormal distribution to species-abundance data. – Biometrics 30: 101–110.
Caswell, H. 1976. Community structure: a neutral model analysis. – Ecol. Monogr. 46: 327–354.
Chave, J. 2004. Neutral theory and community ecology. – Ecol. Lett. 7: 241–253.
Chave, J., Muller-Landau, H. C. and Levin, S. A. 2002. Comparing classical community models: theoretical consequences for patterns of diversity. – Am. Nat. 159: 1–23.
Engen, S. 1978. Stochastic abundance models. – Chapman and Hall.
Engen, S. and Lande, R. 1996. Population dynamic models generating the lognormal species abundance distribution. – Math. Biosci. 132: 169–183.
Fisher, R. A. 1943. A theoretical distribution for the apparent abundance of different species. – J. Anim. Ecol. 12: 54–57.
Frieden, R. 1985. Estimating occurrence laws with maximum probability, and the transition to entropic estimators. – In: Smith, C. R. and Grandy Jr., W. T. (eds), Maximum-entropy and Bayesian methods in inverse problems. Reidel, Dordrecht, pp. 133–169.
Hubbell, S. P. 2001. The unified neutral theory of biodiversity and biogeography. – Princeton Univ. Press.
Hubbell, S. P. 2003. Modes of speciation and the lifespans of species under neutrality: a response to the comment of Robert E. Ricklefs. – Oikos 100: 193–199.
Hutchinson, G. E. 1961. The paradox of the plankton. – Am. Nat. 95: 137–145.
Jaynes, E. T. 1983. Papers on probability, statistics and statistical physics (Rosenkrantz, R. D., ed.). – Reidel, Dordrecht.
Karlin, S. and McGregor, J. 1967. The number of mutant forms maintained in a population. – In: Proc. 5th Berkeley Symp. Math. Statist. Prob. IV, pp. 415–438.
MacArthur, R. 1960. On the relative abundance of species. – Am. Nat. 94: 25–36.
Magurran, A. E. and Henderson, P. A. 2003. Explaining the excess of rare species in natural species abundance distributions. – Nature 422: 714–716.
Mandelbrot, B. 1963. New methods in statistical economics. – J. Polit. Econ. 71: 421–440.
Mandelbrot, B. B. 1983. The fractal geometry of nature. – W. H. Freeman.
Margalef, R. 1963. On certain unifying principles in ecology. – Am. Nat. 97: 357–373.
Margalef, R. 1978. Life-forms of phytoplankton as survival alternatives in an unstable environment. – Oceanol. Acta 1: 493–509.
Margalef, R. 1994. Through the looking glass: how marine phytoplankton appears through the microscope when graded by size and taxonomically sorted. – Sci. Mar. 58: 87–101.
May, R. M. 1975. Patterns of species abundance and diversity. – In: Cody, M. L. and Diamond, J. M. (eds), Ecology and evolution of communities. The Belknap Press of Harvard Univ. Press, pp. 81–120.
McGill, B. J. 2003a. Strong and weak tests of macroecological theory. – Oikos 102: 679–685.
McGill, B. J. 2003b. A test of the unified neutral theory of biodiversity. – Nature 422: 881–885.
McKane, A. J., Alonso, D. and Solé, R. V. 2004. Analytic solution of Hubbell’s model of local community dynamics. – Theor. Popul. Biol. 65: 67–73.
Montroll, E. W. and Shlesinger, M. F. 1982. On 1/f noise and other distributions with long tails. – Proc. Natl Acad. Sci. USA 79: 3380–3383.
Morse, D. R., Lawton, J. H., Dodson, M. M. et al. 1985. Fractal dimension of vegetation and the distribution of arthropod body lengths. – Nature 314: 731–733.
Mouquet, N. and Loreau, M. 2003. Community patterns in source–sink metacommunities. – Am. Nat. 162: 544–557.
Nicolis, G. and Prigogine, I. 1977. Self-organization in nonequilibrium systems. From dissipative structures to order through fluctuations. – John Wiley & Sons.
Pachepsky, E., Crawford, J. W., Bown, J. L. et al. 2001. Towards a general theory of biodiversity. – Nature 410: 923–926.
Peters, H. A. 2003. Neighbour-regulated mortality: the influence of positive and negative density dependence on tree populations in species-rich tropical forests. – Ecol. Lett. 6: 757–765.
Preston, F. W. 1948. The commonness, and rarity, of species. – Ecology 29: 254–283.
Preston, F. W. 1962. The canonical distribution of commonness and rarity. – Ecology 43: 185–215, 410–432.
Pueyo, S. 2006. Self-similarity in species abundance distribution and in species area relationship. – Oikos 112: 156–162.
Siemann, E., Tilman, D. and Haarstad, J. 1996. Insect species diversity, abundance and body size relationships. – Nature 380: 704–706.
Siemann, E., Tilman, D. and Haarstad, J. 1999. Abundance, diversity and body size: patterns from a grassland arthropod community. – J. Anim. Ecol. 68: 824–835.
Sugihara, G. 1980. Minimal community structure: an explanation of species abundance patterns. – Am. Nat. 116: 770–787.
Vallade, M. and Houchmandzadeh, B. 2003. Analytical solution of a neutral model of biodiversity. – Phys. Rev. E 68: 061902.
Volkov, I., Banavar, J. R., Hubbell, S. P. et al. 2003. Neutral theory and relative species abundance in ecology. – Nature 424: 1035–1037.
Volkov, I., Banavar, J. R., Maritan, A. et al. 2004. The stability of forest biodiversity. – Nature 427: 696–697.
Wagensberg, J., López, D. and Valls, J. 1988. Statistical aspects of biological organization. – J. Phys. Chem. Solids 49: 695–700.
Watterson, G. A. 1974. Models for the logarithmic species abundance distributions. – Theor. Popul. Biol. 6: 217–250.
Whitfield, J. 2002. Neutrality versus the niche. – Nature 417: 480–481.
Subject Editor: Per Lundberg
Appendix A: Data analysis
The main methodological innovation in data analysis in
this paper is the type of SAD representation (Methods).
Here I add some details on parameter estimation and
intervals of confidence.
Whenever there is a truly good fit, I seek the maximum likelihood estimator (m. l. e.) of the parameters (Bulmer 1974). Take the abundances $n_i$ for species i = 1 to S in the sample. For the statistical distribution that we assume, these will have some probabilities $\{p(n_i; w_1, \ldots, w_q)\}$ or densities of probability $\{f(n_i; w_1, \ldots, w_q)\}$, depending on a set of parameters $w_1, \ldots, w_q$. The m. l. e. consists of the values of the parameters that maximise either $\sum_i \log(p(n_i))$ or $\sum_i \log(f(n_i))$, which is equivalent to maximising the ensemble probability of our set of abundances.
For the interval of abundances $n \in [10, 1000)$, the marine phytoplankton data in my case study are well fitted by a continuous power law

$$ f(n) = \frac{b-1}{n_0^{-b+1} - n_M^{-b+1}}\, n^{-b} $$

where $n_0$ and $n_M$ are the lower and upper bounds, $n_0 = 10$ and $n_M = 1000$. In this interval, I obtain the m. l. e. of b by iteratively searching for the value $\hat{b}$ that maximises

$$ \log\left( \frac{\hat{b}-1}{n_0^{-\hat{b}+1} - n_M^{-\hat{b}+1}} \right) - \hat{b}\,\log(g), $$

where g is the geometric mean of the data. It is straightforward to find confidence intervals and to test hypotheses, because the only source of error for this estimator is the variability in log(g), which has a Gaussian distribution when the number of species is large.
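
A minimal sketch of this iterative search is given below. The text above states only that the search is iterative; the golden-section search, the search bounds, the function names and the synthetic test data are assumptions of this example, and the sampled abundances are continuous rather than integer counts.

```python
import math
import random

def mean_log_likelihood(b, log_g, n0=10.0, nM=1000.0):
    """Per-species log-likelihood of a power law truncated to [n0, nM).

    log_g is the logarithm of the geometric mean of the abundances in the interval.
    """
    if abs(b - 1.0) < 1e-9:
        # Limit of the normalising constant as b -> 1.
        log_norm = -math.log(math.log(nM / n0))
    else:
        log_norm = math.log((b - 1.0) / (n0 ** (1.0 - b) - nM ** (1.0 - b)))
    return log_norm - b * log_g

def fit_b(abundances, n0=10.0, nM=1000.0, lo=0.0, hi=5.0, tol=1e-8):
    """Golden-section search for the b in [lo, hi] that maximises the log-likelihood."""
    data = [n for n in abundances if n0 <= n < nM]
    log_g = sum(math.log(n) for n in data) / len(data)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if mean_log_likelihood(x1, log_g, n0, nM) < mean_log_likelihood(x2, log_g, n0, nM):
            lo = x1   # the maximum lies to the right of x1
        else:
            hi = x2   # the maximum lies to the left of x2
    return 0.5 * (lo + hi)

def sample_truncated_power_law(b, size, rng, n0=10.0, nM=1000.0):
    """Inverse-CDF sampling from the truncated power law (b != 1), for testing only."""
    a0, aM = n0 ** (1.0 - b), nM ** (1.0 - b)
    return [(a0 - rng.random() * (a0 - aM)) ** (1.0 / (1.0 - b)) for _ in range(size)]

synthetic = sample_truncated_power_law(1.45, 5000, random.Random(0))
b_hat = fit_b(synthetic)  # should fall close to the generating value 1.45
```

The search relies on the log-likelihood having a single maximum in b, which holds because the truncated power law is an exponential family in log(n).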
When referring to the whole SAD instead of a
particular interval, the power law is a convenient but
inexact approximation, and the m. l. e. is not reliable.
Therefore, I estimate b by simple regression.
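
For illustration, a simple regression of this kind could look like the sketch below. It assumes that the SAD has already been reduced to points (log n, log f) and that b is minus the least-squares slope; the function name and the synthetic points are hypothetical.

```python
def fit_power_law_by_regression(log_n, log_f):
    """Ordinary least squares of log f on log n; returns (b, r2) with b = -slope."""
    m = len(log_n)
    mean_x = sum(log_n) / m
    mean_y = sum(log_f) / m
    sxx = sum((x - mean_x) ** 2 for x in log_n)
    syy = sum((y - mean_y) ** 2 for y in log_f)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_f))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)   # squared correlation coefficient
    return -slope, r2

# Synthetic points lying exactly on a power law with b = 1.3 (made-up values).
log_n = [0.5 * i for i in range(1, 11)]
log_f = [-0.5 - 1.3 * x for x in log_n]
b_reg, r2 = fit_power_law_by_regression(log_n, log_f)
```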
In the case of the logseries, I seek the m. l. e. of Fisher’s parameter a of the logseries in its discrete form,

$$ p(n) = k\,n^{-1} e^{-\phi n}. $$

Fisher (1943) himself gave the method. It consists of iteratively searching for the values of k and $\phi$ (Eq. 1) that satisfy:

$$ \begin{cases} k^{-1} = \log\left(1 + \dfrac{N}{kS}\right) \\ \phi = \log\left(1 + \dfrac{kS}{N}\right) \end{cases} $$

and then calculating

$$ a = kS. $$

a is a parameter independent of sample size, which allows k and $\phi$ to be obtained as a function of N.
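
As a hedged illustration of this calculation, the sketch below solves the equivalent condition S = a log(1 + N/a) for a by bisection and then recovers the two logseries parameters (k and the exponential rate, here called phi). The bisection scheme and the function names are assumptions of this example, not Fisher's original iterative procedure.

```python
import math

def fisher_alpha(N, S, rel_tol=1e-12):
    """Solve S = a * log(1 + N / a) for a by bisection (requires N > S > 0)."""
    def excess(a):
        return a * math.log1p(N / a) - S
    lo, hi = 1e-12, 1.0
    while excess(hi) < 0.0:      # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def logseries_parameters(N, S):
    """Return (a, k, phi) with a = k * S and phi = log(1 + k * S / N)."""
    a = fisher_alpha(N, S)
    return a, a / S, math.log1p(a / N)

# Mediterranean phytoplankton, all groups: N and S as reported in Table 1.
a, k, phi = logseries_parameters(162478, 353)
```

With these inputs, a comes out close to the value 42.8 listed for the same data set in Table 1.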
Appendix B: why simple neutral models produce
a logseries SAD
In the subsection ‘‘The meaning of the logseries’’ above, I state that the logseries distribution which we obtain from simple neutral models (SNMs) has two components, a power law with b = 1 and an exponential bending function, and each has a different origin (Fig. 1).
The origin of the power law is easy to find by means of the diffusion equations by Engen and Lande (1996). If we take a continuous abundance n, the probability density function (p. d. f.) of a set of non-interacting species with the same dynamics will have the form

$$ f_0(n) \propto \frac{1}{v(n)} \exp\left( \int_1^n \frac{2h(u)}{v(u)}\,du \right) \tag{B1} $$

(from Eq. 11 in Engen and Lande 1996), where $\propto$ means ‘‘proportional to’’, h(n) is the expected change in the abundance of a given species in a small interval of time when its initial abundance is n, and v(n) is the variance of this change.
In a community entirely driven by ecological drift,

$$ h(n) = 0 \tag{B2} $$

$$ v(n) \propto n \tag{B3} $$

Equation B3 results from the fact that the variance of the sum of a set of independent variables is the sum of their variances (in this case, each variable is the number of descendants of one of the members of a given species, born and not dead in a short interval of time, minus one in case the parent dies in this same interval). From Eq. B1–B3, we find:

$$ f_0(n) \propto n^{-1} \tag{B4} $$

In contrast, take the extreme situation of population fluctuations driven entirely by environmental noise with a completely different effect on each species. Environmental noise synchronises the organisms of the same species, so $v(n) \propto n^2$. Since we assume independence between species, we will obtain $f_0(n) \propto n^{-2}$.
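
The substitution behind Eq. B4, and its analogue for environmental noise, can be spelled out as a short worked sketch. The proportionality constant c > 0 is introduced here only for illustration, and the sketch assumes that h(n) = 0 also holds in the environmental-noise case, as the stated result implies.

```latex
% With h(u) = 0 (Eq. B2) the integrand in Eq. B1 vanishes, so the exponential equals 1:
f_0(n) \propto \frac{1}{v(n)} \exp\left( \int_1^n \frac{2 \cdot 0}{v(u)}\,du \right) = \frac{1}{v(n)}

% Ecological drift, v(n) = c\,n (Eq. B3):
f_0(n) \propto \frac{1}{c\,n} \propto n^{-1}

% Environmental noise, v(n) = c\,n^2:
f_0(n) \propto \frac{1}{c\,n^2} \propto n^{-2}
```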
In either of the two cases, the power law results from
random reproduction and death events. The only additional factor in SNMs to which we can attribute the
bending function is the regulation of total community size.
Why does the regulation of community size specifically produce an exponential bending function (for large N and S)? This can be explained by applying a method which is well known in statistical physics under the name of ‘‘maximum entropy formalism’’ (MAXENT). Interestingly, shortly after Jaynes introduced this method in statistical physics in 1957 (Jaynes 1983), MacArthur (1960) introduced it in the field of biological diversity, with no specific name. While this became an established method in statistical physics, in the case of ecology I am aware of only some isolated attempts at ‘‘reintroduction’’ following the work of E. T. Jaynes (Wagensberg et al. 1988). Here I use MAXENT, but it would not be appropriate to apply it in its original form. Instead, I apply a generalised version called the Kullback-Leibler norm (Frieden 1985). I do not explain the theoretical foundations of this methodology; however, these can be found in the above references.
In broad conditions, the generalised MAXENT method allows the transformation of the statistical distribution $f_0$ for a set of non-interacting entities into the statistical distribution f to expect when we add a constraint of the form

$$ \sum_{i=1}^{S} h(n_i) = k \tag{B5} $$

for a constant k and a given function h. The p. d. f. that results for large N and S is

$$ f(n) \propto f_0(n)\,e^{-\phi h(n)} \tag{B6} $$

The regulation of community size in neutral models either has the form of a zero-sum rule or is nearly equivalent. The zero-sum rule consists of imposing a fixed community size N, i. e. applying Eq. B5 with h(n) = n and k = N:

$$ \sum_{i=1}^{S} n_i = N $$

From Eq. B6, the p. d. f. that results from this constraint will be:

$$ f(n) \propto f_0(n)\,e^{-\phi n} \tag{B7} $$

Since $f_0$ satisfies Eq. B4 for SNMs, we obtain the logseries (Eq. 2).
MacArthur (1960) developed his own version of
MAXENT in order to find the SAD that results from
the zero-sum rule alone, without ecological drift. He
assumed that all abundances are equally probable a
priori, i. e. a uniform f0. It follows from Eq. B7 that the
resulting SAD is exponential. This SAD became widely
known in the ecological literature under the name
‘‘broken stick distribution’’.
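
The exponential form can be illustrated numerically with the usual stick-breaking construction: a stick of length N is cut at S - 1 points chosen uniformly at random, and the piece lengths are taken as abundances. This construction, the function name and the values of N and S are assumptions of the sketch below, chosen for illustration rather than taken from MacArthur's derivation.

```python
import math
import random

def broken_stick(total, pieces, rng):
    """Cut a stick of length `total` at pieces - 1 uniform random points."""
    cuts = sorted(rng.random() * total for _ in range(pieces - 1))
    edges = [0.0] + cuts + [total]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

rng = random.Random(1)
N, S = 1_000_000.0, 20_000          # hypothetical community size and species number
lengths = broken_stick(N, S, rng)
mean = N / S

# Under an exponential density with this mean, a fraction exp(-t) of the
# pieces is expected to exceed t times the mean.
observed = {t: sum(x > t * mean for x in lengths) / S for t in (1.0, 2.0, 3.0)}
expected = {t: math.exp(-t) for t in observed}
```

For large S the observed fractions should lie close to the exponential expectations, in agreement with a uniform f0 combined with the constraint on total size.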
I conclude that the logseries equation combines the
independent outcome of two distinct mechanisms: the
power law with b = 1 results from ecological drift, and
the exponential bending function results from the
constraint on community size. The first part of the
equation is unaffected by the presence or not of this or
similar constraints, while the second part is not necessarily affected by the rules that govern the abundances of
single species in the absence of constraints.