Ellingsen Commonness PNAS 2014_withSI

Commonness and rarity in the marine biosphere
Sean R. Connollya,1, M. Aaron MacNeilb, M. Julian Caleyb, Nancy Knowltonc,1, Ed Crippsd, Mizue Hisanoa,
Loïc M. Thibauta, Bhaskar D. Bhattacharyae, Lisandro Benedetti-Cecchif, Russell E. Brainardg, Angelika Brandth,
Fabio Bullerif, Kari E. Ellingseni, Stefanie Kaiserj, Ingrid Krönckek, Katrin Linsel, Elena Maggif, Timothy D. O’Haram,
Laetitia Plaisancec, Gary C. B. Poorem, Santosh K. Sarkare, Kamala K. Satpathyn, Ulrike Schückelk, Alan Williamso,
and Robin S. Wilsonm
a
School of Marine and Tropical Biology, and Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD
4811, Australia; bAustralian Institute of Marine Science, Townsville, QLD 4810, Australia; cNational Museum of Natural History, Smithsonian Institution,
Washington, DC 20013; dSchool of Mathematics and Statistics, University of Western Australia, Perth, WA 6009, Australia; eDepartment of Marine Science,
University of Calcutta, Calcutta 700 019, India; fDipartimento di Biologia, University of Pisa, I-56126 Pisa, Italy; gCoral Reef Ecosystem Division, Pacific Islands
Fisheries Science Center, National Oceanic and Atmospheric Administration, Honolulu, HI 96818; hBiocenter Grindel and Zoological Museum, University of
Hamburg, 20146 Hamburg, Germany; iNorwegian Institute for Nature Research, FRAM–High North Research Centre for Climate and the Environment, 9296
Tromsø, Norway; jGerman Centre for Marine Biodiversity Research, Senckenberg am Meer, 26382 Wilhelmshaven, Germany; kMarine Research Department,
Senckenberg am Meer, 26382 Wilhelmshaven, Germany; lBritish Antarctic Survey, Cambridge CB3 0ET, United Kingdom; mMuseum Victoria, Melbourne, VIC
3001, Australia; nEnvironment and Safety Division, Indira Gandhi Centre for Atomic Research, Kalpakkam 603 102, India; and oCommonwealth Scientific
and Industrial Research Organization, Marine and Atmospheric Research, Marine Laboratories, Hobart, TAS 7001, Australia
Explaining patterns of commonness and rarity is fundamental for
understanding and managing biodiversity. Consequently, a key
test of biodiversity theory has been how well ecological models
reproduce empirical distributions of species abundances. However,
ecological models with very different assumptions can predict
similar species abundance distributions, whereas models with
similar assumptions may generate very different predictions. This
complicates inferring processes driving community structure from
model fits to data. Here, we use an approximation that captures
common features of “neutral” biodiversity models—which assume
ecological equivalence of species—to test whether neutrality is
consistent with patterns of commonness and rarity in the marine
biosphere. We do this by analyzing 1,185 species abundance distributions from 14 marine ecosystems ranging from intertidal
habitats to abyssal depths, and from the tropics to polar regions.
Neutrality performs substantially worse than a classical nonneutral alternative: empirical data consistently show greater heterogeneity of species abundances than expected under neutrality.
Poor performance of neutral theory is driven by its consistent inability to capture the dominance of the communities’ most-abundant species. Previous tests showing poor performance of a neutral
model for a particular system often have been followed by controversy about whether an alternative formulation of neutral theory
could explain the data after all. However, our approach focuses on
common features of neutral models, revealing discrepancies with
a broad range of empirical abundance distributions. These findings
highlight the need for biodiversity theory in which ecological differences among species, such as niche differences and demographic
trade-offs, play a central role.
|
metacommunities marine macroecology
Poisson-lognormal distribution
sizes, or other characteristics of an assemblage, and then deriving the aggregate assemblage properties implied by those
assumptions (4–8). One of the most commonly investigated of
these assemblage-level properties is the species abundance distribution (SAD)—the pattern of commonness and rarity among
species (9–11). Ecologists have long sought to identify mechanisms
that can explain common features of, and systematic differences
among, the shapes of such distributions, and have used the ability
to reproduce empirical SADs as a key test of biodiversity theory in
species-rich systems (4, 6, 11–14).
Over the last decade, one of the most prevalent and influential
approaches to explaining the structure of high-diversity assemblages has been neutral theory of biodiversity (12, 15, 16). Neutral
models assume that individuals are demographically and ecologically equivalent, regardless of species. Thus, variation in relative
Significance
Tests of biodiversity theory have been controversial partly
because alternative formulations of the same theory seemingly
yield different conclusions. This has been a particular challenge
for neutral theory, which has dominated tests of biodiversity
theory over the last decade. Neutral theory attributes differences in species abundances to chance variation in individuals’
fates, rather than differences in species traits. By identifying
common features of different neutral models, we conduct a
uniquely robust test of neutral theory across a global dataset
of marine assemblages. Consistently, abundances vary more
among species than neutral theory predicts, challenging the
hypothesis that community dynamics are approximately neutral, and implicating species differences as a key driver of
community structure in nature.
| species coexistence |
D
etermining how biodiversity is maintained in ecological
communities is a long-standing ecological problem. In
species-poor communities, niche and demographic differences
between species can often be estimated directly and used to infer
the importance of alternative mechanisms of species coexistence
(1–3). However, the “curse of dimensionality” prevents the application of such species-by-species approaches to high-diversity
assemblages: the number of parameters in community dynamics
models increases more rapidly than the amount of data, as species
richness increases. Moreover, most species in high-diversity
assemblages are very rare, further complicating the estimation of
strengths of ecological interactions among species, or covariation
in different species’ responses to environmental fluctuations.
Consequently, ecologists have focused instead on making assumptions about the overall distribution of demographic rates, niche
www.pnas.org/cgi/doi/10.1073/pnas.1406664111
Author contributions: S.R.C., M.A.M., and M.J.C. designed research; N.K., B.D.B., L.B.-C.,
R.E.B., A.B., F.B., K.E.E., S.K., I.K., K.L., E.M., T.D.O., L.P., G.C.B.P., S.K.S., K.K.S., U.S.,
A.W., and R.S.W. performed research; S.R.C., E.C., M.H., and L.M.T. analyzed data; S.R.C.
designed analytical approach; M.J.C. initiated the collaboration between Census of
Marine Life and S.R.C.; N.K. was the initiator of Census of Marine Life Project (producer
of dataset) for coral reefs; N.K. designed data collection protocols; B.D.B., L.B.-C.,
R.E.B., A.B., F.B., K.E.E., S.K., I.K., K.L., E.M., T.D.O., L.P., G.C.B.P., S.K.S., K.K.S., U.S.,
A.W., and R.S.W. provided advice on treatment of data for analyses; and S.R.C.,
M.A.M., M.J.C., N.K., and E.C. wrote the paper.
Reviewers: B.M., University of Maine; F.H., University of Alberta.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
1
To whom correspondence may be addressed. E-mail: [email protected] or Sean.
[email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.
1073/pnas.1406664111/-/DCSupplemental.
PNAS Early Edition | 1 of 6
ECOLOGY
Contributed by Nancy Knowlton, April 28, 2014 (sent for review November 25, 2013; reviewed by Brian McGill and Fangliang He)
abundance among species arises purely from demographic stochasticity: chance variation in the fates of individuals (i.e., birth,
death, immigration, and speciation events). Most studies investigating neutral theory aim to determine whether community
structure in nature is consistent with the theory’s core species
equivalence assumption. This is typically done by assessing the fit
of a neutral model to empirical data, sometimes relative to a
putatively nonneutral alternative (17–20). However, although all
neutral models share the species equivalence assumption, they
differ with respect to auxiliary assumptions, such as the mode of
speciation assumed, leading to different predictions for SADs and
other ecological patterns. Indeed, attempts to draw conclusions
from tests of neutral theory are almost invariably disputed, largely
due to arguments about the extent to which alternative auxiliary
assumptions can materially alter neutral theory’s ability to explain
the data (11, 12, 18, 21).
An alternative, potentially more robust approach to evaluating
neutral theory was proposed by Pueyo (22), based on approximating neutral and nonneutral dynamics as successively higherorder perturbations of a model for the idealized case of pure
random drift in abundances. This approach predicts that a gamma
distribution should approximate the distribution of species abundances for small departures for random drift, whereas assemblages
exhibiting greater departures from neutrality should be better
approximated by a lognormal distribution. This raises the possibility that a comparison of gamma and lognormal SADs could offer
a robust test for the signature of nonneutrality in species abundance data, provided that the gamma distribution provides a sufficiently close approximation to SADs produced by neutral models.
Here, we evaluate Pueyo’s framework and apply it to patterns
of commonness and rarity in 1,185 SADs from 14 marine ecosystems ranging from shallow reef platforms to abyssal depths,
and from the tropics to polar regions (Fig. 1 and Tables S1 and S2).
First, we test the gamma neutral approximation against several
models of community dynamics that share the core neutrality
assumption of species equivalence, but make different assumptions about the speciation process, spatial structure of the metacommunity, and the nature of competition between individuals.
Then, we analyze the marine species abundance data, to evaluate
whether they are consistent with the hypothesis that marine
assemblages are neutrally structured. Finally, we ask whether patterns of commonness and rarity deviate from neutral expectation in
Fig. 1. Sampling locations of SADs. Color and symbol combinations correspond to particular ecosystems. These symbols are reproduced in the surrounding
panels, which show observed and fitted SADs for the site-level data (averaged across sites) of the corresponding ecosystem. The bars represent the mean
proportion of species at each site in different octave classes of abundance, across all sites in the corresponding dataset [the first bar represents species with
abundance 1, then abundances 2–3, abundances 4–7, etc. (10)]. The blue and red lines show the mean of fitted values from site-by-site fits of the Poissongamma and Poisson-lognormal distributions to the data, respectively.
2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1406664111
Connolly et al.
Results
A gamma distribution of species abundances closely approximates several alternative neutral models across a broad range of
neutral model parameter values (Fig. S1; see SI Results for further discussion). Moreover, the gamma consistently outperforms
the lognormal when fitted to data simulated from neutral models. Specifically, as the number of distinct species abundance
values in the simulated data increases, the relative support for
the gamma distribution becomes consistently stronger for all of
the neutral models we considered (Fig. 2A). This reflects the fact
that datasets with only a small number of abundance values (e.g.,
a site containing 11 species, 10 of which are only represented by
one individual) provide very little information about the shape of
the SAD, whereas those with more abundance values provide
more information (e.g., a site with 100 species whose abundances
are spread over 10–20 different values).
In contrast to their relative fit to simulated neutral SADs, the
lognormal consistently outperforms the gamma distribution
when fitted to real marine species abundance data. When considered in terms of average support per SAD, relative support for
the lognormal becomes consistently stronger as the number of
observed species abundance values increases, in direct contrast
with the simulated neutral data (Fig. 2B). Moreover, when the
strength of evidence is considered cumulatively across all sites
for each dataset, the lognormal has well over 99% support as the
better model in each case (Table 1). This substantially better fit
of the lognormal is retained in every case when data are pooled
to the mesoscale, and, in all cases save one, when data are pooled
at the regional scale (Table 1, Fig. 2B, and Figs. S2 and S3). The
lognormal also remained strongly favored when we tested the
robustness of our results by classifying species into taxonomic
and ecological guilds, and restricting our analysis to the most
species-rich guild within each dataset (see SI Results and Table S3).
Inspection of the lack of fit of the gamma neutral approximation indicates that it deviates from the data in highly consistent ways: real SADs exhibit substantially more heterogeneity
than the gamma distribution can generate (Fig. 3). Specifically,
the gamma is unable to simultaneously capture the large number
of rare species and the very high abundances of the most common
Independent
Spatial
ANI
ANM
BSI
CPF
EBS
GBR
IPC
NSI
NSM
SEF
SSF
SUZ
TAP
WAF
B
0.4
0.8
A
Site
Meso
Regional
0.0
Mean Akaike weight for
lognormal over gamma
Original
Fission
Protracted
10
20
50
100 200
10
20
50
100 200
Median number of distinct abundance values
Fig. 2. Species abundances are better approximated by (A) a gamma distribution for simulated neutral communities, but (B) a lognormal distribution for the empirical data. Percentage support for the lognormal versus the
gamma is plotted as a function of the number of observed distinct species
abundances. In A, different neutral models are plotted with different colors,
and each point represents a particular neutral model parameter combination from Fig. S1. In B, each combination of symbol and color represents
a different marine ecosystem, whereas increasing symbol size indicates the increasing scale at which abundances were pooled (site, mesoscale, and regional).
Connolly et al.
species. For abundance distributions lacking an internal mode
(i.e., where the leftmost bar in the SAD is the largest one), this is
manifested as an excess of rare species and paucity of species with
intermediate abundance, relative to the best-fit neutral approximation (Fig. 3A, blue lines). Conversely, when an internal mode is
present in the data, the abundances of the most highly abundant
species are consistently higher than the gamma distribution can
produce (Fig. 3B, blue lines). In contrast, discrepancies between
the data and the lognormal are much smaller in magnitude, and
more symmetrically distributed around zero, compared with the
gamma (Fig. 3, red lines).
Detailed analysis of variation in the strength of evidence
against neutrality, within and among datasets, indicates that the
relative performance of the lognormal over the gamma is substantially driven by the fact that the most abundant species is, on
average, too dominant to be captured by the gamma neutral
approximation. After controlling for the effects of the number of
abundance values in the sample on statistical power, the relative
abundance of the most-abundant species explained over one-half
of the variation in the strength of support for the lognormal over
the gamma, for site-level, mesoscale, and regional-scale abundance distributions (Table S4, Fig. S4, and SI Results). Conversely, the prevalence of rarity was a poor predictor of the strength
of evidence against the gamma neutral approximation (Table S4,
Fig. S5, and SI Results).
In addition to outperforming the gamma neutral approximation, tests of the absolute goodness of fit of the lognormal suggest that it approximates the observed species abundance data
well. Statistically significant lack of fit (at α = 0.05) to the lognormal was detected in 4.8% of sites, approximately equal to
what would be expected by chance, under the null hypothesis
that the SADs are in fact lognormal. Moreover, lognormal-based
estimates of the number of unobserved species in the regional
species pool are realistic, and very similar to those produced by
an alternative, nonparametric jackknife method that relies on
presence–absence rather than abundance data (Fig. 4).
Discussion
Recently, the use of SADs to test biodiversity theory has been
criticized because different species abundance models often
generate very similar predictions, which can be difficult to distinguish when fitted to species abundance data (9). Consequently, some researchers have focused on other properties of
assemblages, such as community similarity (12), species–area and
species–time relationships (23, 24), and relationships between
species traits or phylogeny and species abundance (25, 26). Such
approaches are powerful when evaluating the performance of
particular species abundance models. However, because models
combine multiple assumptions, attributing a model’s failure to
one assumption in particular, such as species equivalence, is
problematic. Indeed, in the debate over neutral theory of biodiversity, studies that show failure of a neutral model (12, 25, 27)
are almost invariably followed by responses showing that packaging neutrality with a different set of alternative assumptions
can explain the data after all (11, 28, 29). Although the identification of alternative auxiliary assumptions that preserve a theory’s
core prevents premature abandonment of a promising theory, it
also can hinder progress by inhibiting the reallocation of scientific
effort to more promising research programs (30). Given the proliferation of alternative theories of biodiversity (8, 14, 31, 32),
identifying and testing predictions that are robust to auxiliary
assumptions, and therefore better target a theory’s core assumptions, should be a high priority.
Here, we showed that, as previously hypothesized (22), a
gamma distribution successfully captures features common to
several models that share the core neutrality assumptions of species equivalence, but make very different auxiliary assumptions.
We then found that this approximation cannot simultaneously
PNAS Early Edition | 3 of 6
ECOLOGY
idiosyncratic ways, or whether there are particular features of real
SADs that cannot be captured by neutral models.
Table 1. Model selection for analysis at whole-dataset level
Site
Dataset
Antarctic Isopods (ANI)
Antarctic Molluscs (ANM)
Tuscany Archipelago Fish (TAP)
Indo-Pacific Coral Crustaceans (IPC)
SE Australia: Shelf Fish (SEF)
W Australia: Deep Fish (WAF)
Scotian Shelf Fish (SSF)
Eastern Bass Strait Invertebrates (EBS)
Sunderban Zooplankton (SUZ)
Great Barrier Reef Fish (GBR)
Central Pacific Reef Fish (CPF)
Norwegian Shelf Macrobenthos (NSM)
North Sea Invertebrates (NSI)
Bass Strait Intertidal (BSI)
Overall
Mesoscale
Regional
LN
Gamma
LN
Gamma
LN
Gamma
>0.9999
0.9981
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
<0.0001
0.0019
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
>0.9999
0.9995
0.9999
NA
>0.9999
>0.9999
>0.9999
>0.9999
NA
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
>0.9999
<0.0001
0.0005
0.0001
NA
<0.0001
<0.0001
<0.0001
<0.0001
NA
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
<0.0001
>0.9999
0.9992
0.9983
>0.9999
>0.9999
>0.9999
0.9984
0.0087
0.9892
>0.9999
>0.9999
>0.9999
>0.9999
0.9998
>0.9999
<0.0001
0.0008
0.0017
<0.0001
<0.0001
<0.0001
0.0016
0.9913
0.0108
<0.0001
<0.0001
<0.0001
<0.0001
0.0002
<0.0001
Values indicate percentage support for the gamma and lognormal (LN) models fitted to the species
abundance data at three scales: site level, mesoscale, and regional. Each row represents a different dataset. For
IPC and SUZ, there were too few SADs to create mesoscale groupings. The last row is an overall test, based on
summing the log-likelihoods across all datasets. Where one of the models has at least 95% support, the model’s
weight is shown in bold.
account for the large number of very rare species, and the very
high abundances of the most abundant species, in marine species
abundance data. These discrepancies are highly consistent across
diverse taxa, habitat types, and geographical locations, suggesting that this shortcoming of neutral theory is likely to be widespread in nature.
In contrast to the simulated neutral assemblages, analysis of
the real data reveals that the lognormal distribution captures
much better the observed heterogeneity in species abundances,
and produces realistic estimates of the number of unobserved
species at the regional scale. Lognormal SADs are generated by
models that make a variety of different assumptions about how
variation in abundances is determined, but all those proposed to
date explicitly or implicitly incorporate niche or demographic
differences among species. For instance, Engen and Lande (33)
derived the lognormal from stochastic community dynamics
models that allow for differences in species’ intrinsic growth
0.0
0.2
0.4
0.6
0.8
1.0
6
4
2
0
−6 −4 −2
0.00
0.10
Residuals (log scale)
B
−0.10
Residuals (arithmetic scale)
A
0.0
0.2
0.4
0.6
0.8
1.0
Normalized abundance class
Fig. 3. Residuals (observed minus predicted number of species) of gamma
(blue lines) versus lognormal (red lines) fits, illustrating why the gamma
performs worse than the lognormal. Each line represents one ecosystem at
one spatial scale, with log2 abundance class normalized so that each dataset’s uppermost octave is 1.0. (A) Residuals for abundance distributions
lacking an internal mode, plotted on an arithmetic scale to highlight the lack
of fit for the smaller (leftmost) abundance classes. (B) Residuals for abundance distributions with an internal mode, plotted on a logarithmic scale to
illustrate the lack of fit for the larger (rightmost) abundance classes.
4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1406664111
rates and carrying capacities, and for arbitrary patterns of covariation in species’ responses to environmental fluctuations (4).
More generally, Pueyo (22) derived the lognormal as a generic
approximation for deviations from neutral community dynamics.
Alternatively, niche apportionment models explain variation in
species abundances based on variation in niche size, rather than
stochastic population fluctuations (8). The lognormal can arise
from the hierarchical apportionment of niche space due to
a central limit theorem-like phenomenon (34). It can also be
expected to arise more generally when there are multiple niche
dimensions, and niche overlap may occur. As niche dimensions
increase, the niche hypervolume becomes the product of idiosyncratically varying niche widths along an increasingly large
number of niche axes. Provided abundance is proportional to the
niche hypervolume (potentially along with other factors, such as
body size and the proportion of resources within the niche that
have not been preempted by other species), abundance will tend
toward a lognormal distribution, again by the central limit theorem (13, 18).
The fairly general conditions under which lognormal distributions can arise suggests that a broad range of nonneutral
characteristics, such as high-dimensional niche differences, lowdimensional life history trade-offs, or differential responses to
environmental fluctuations, may give rise to approximately lognormal abundance distributions, such as those found in this study
(13, 18, 22). Thus, although the comparatively poor performance
of the gamma relative to the lognormal provides strong evidence
against neutral dynamics as the principal driver of variation in
abundances among species, the seemingly excellent fit of the
lognormal distributions cannot, by itself, unambiguously identify
which nonneutral processes are most responsible for the heterogeneity in species abundances in a given assemblage. Consequently, methods for apportioning the heterogeneity in species
abundances among its potential causes, and which can yield
inferences about species-rich assemblages, warrant further development. Some such methods, such as those based on relationships between abundance and species’ functional traits (34),
or on spatiotemporal trends in relative abundances (35), explicitly predict lognormal SADs, and therefore are likely to be
particularly promising.
Connolly et al.
100
200
500
1000
Jackknife estimate
Fig. 4. Agreement between lognormal-based and nonparametric estimates
of the total number of species in the community. Points on the horizontal
axis are richness estimates produced by the nonparametric jackknife, based
on presence–absence data across sites. The points on the vertical axis are
estimates produced by the lognormal model, fitted to the pooled regional
abundance distributions. Error bars are 95% confidence intervals. The solid
line is the unity line, where the lognormal and the nonparametric jackknife
produce the same estimate of the number of unobserved species.
Conclusions
Neutral theory explains variation in the abundances and distribution of species entirely as a consequence of demographic stochasticity—chance variation in the fates of individuals (15, 36).
Although proponents of neutral theory have always acknowledged
the existence of ecological differences between species, neutral
theory assumes that those differences are overwhelmed by the
phenomena that are explicitly included in neutral models (14, 36).
The formulation and testing of neutral theory has drawn attention
to the potential importance of demographic stochasticity as a
process that contributes to differences in species abundances that
are unrelated to species’ ecological traits, such as niche size or
competitive ability. Such effects should be particularly important
among rare species (4). Indeed, our finding that there are common
features of different neutral models suggests that it can play a role
as a robust null expectation, at least for some aspects of community structure (16). However, the most abundant few species often
numerically dominate communities and play a disproportionately
large role in community and ecosystem processes (37). We have
shown that neutral theory consistently underestimates amongspecies heterogeneity in abundances across a broad range of
marine systems. The fact that its performance is closely linked to
abundances of the most common species indicates that it is the
ecological dominance of these very highly abundant species that
cannot be explained by neutral processes alone. Commonness
itself is poorly understood, but the identities of the most common
species in ecosystems tend to remain quite consistent over ecological timescales (38). Thus, the key to understanding the distribution of abundances in communities, even species-rich ones, may
lie as much in understanding how the characteristics of common
species allow them to remain so abundant, as in understanding the
dynamics and persistence of rare species.
Materials and Methods
Approximating Neutrality. Pueyo’s framework starts with a stochastic differential equation for random drift in population size (i.e., birth rate equals
death rate, no density dependence, immigration, emigration, or speciation)
and considers approximating departures from this model in terms of successively higher-order perturbations to it. Here, we take as our candidate neutral
approximation the gamma distribution and, as our alternative model, the
lognormal distribution. More specifically, because species abundance data are
discrete, whereas the gamma and lognormal are continuous distributions, we
use the Poisson-gamma (i.e., negative binomial) and Poisson-lognormal mixture distributions, as these distributions are commonly used to approximate
discrete, random samples from underlying gamma or lognormal community
abundance distributions (see SI Materials and Methods for further details).
To assess whether the Poisson-gamma distribution provides a good approximation to the SADs produced under neutrality, we tested it against five
Connolly et al.
Empirical Data. Data were contributed to the Census of Marine Life (CoML)
project and represent a diverse range of taxa, ocean realms, depths, and
geographic locations (Table S1). To be included in our analysis, contributed
data needed to meet several criteria (see SI Materials and Methods for
details). Where datasets included samples over multiple years from the same
sites, only the most recent year of data was used. Finally, we only fitted SADs
if they contained more than five distinct species abundance values, to
minimize convergence problems associated with fitting species abundance
models to very sparse data. However, the data from such sites were still used
in the analyses that pooled abundance distributions at larger scales.
Fitting Models to Species Abundance Data. For both the simulated neutral
data, and the real species abundance data, we fitted our models using
maximum-likelihood methods (see SI Materials and Methods for details). For
the empirical data, in addition to fitting our species abundance models at
the site level, we also fitted pooled species abundances at a mesoscale level,
and at the regional (whole-dataset) level. For datasets that were spatially
hierarchically organized, we used this hierarchy to determine how to pool
sites at the mesoscale [e.g., for the Central Pacific Reef Fish (CPF) data, sites
were nested within islands, so pooling was done to the island level]. For data
that were not explicitly hierarchically organized [Antarctic Isopods (ANI),
Antarctic Molluscs (ANM), Scotian Shelf Fish (SSF), Bass Strait Intertidal (BSI)],
cluster analysis was used to identify mesoscale-level groupings. In two cases
[Sunderban Zooplankton (SUZ), Indo-Pacific Coral Crustaceans (IPC)], there
were only a few sites sampled, and no natural hierarchical structure, so these
data were omitted from the mesoscale analysis.
For both the analysis of the marine species abundance data, and the
analysis of the simulated neutral communities, model selection was based
on Akaike weights, which are calculated from Akaike’s information criterion
values and estimate the probability (expressed on a scale of 0–1) that
a model is actually the best approximating model in the set being considered. Because the Poisson-gamma and the Poisson-lognormal have the same
number of estimated parameters, this is equivalent to calculating model
weights based on the Bayesian information criterion. For the empirical data,
model selection was done at the whole-dataset level by summing the loglikelihoods for all individual sites (for the site-level analysis) or mesoscale (for
the mesoscale analysis) abundance distributions for a dataset, and calculating Akaike weights based on these values (Table 1). However, this approach
does not make sense for the analysis of the simulated neutral SADs, because
an arbitrary degree of confidence can be obtained by simulating a large
number of sites. Therefore, we instead calculated an expected level of model
support on a per-SAD basis, for each neutral model and parameter combination, by calculating the mean difference in log-likelihoods across the 100
simulated datasets, and converting this mean into an Akaike weight. We
examined these Akaike weights as functions of the number of distinct observed species abundance values, because we would expect our ability to
distinguish between alternative models to increase as the number of distinct
observed species abundance values increases. For comparison, we also calculated per-SAD Akaike weights for the marine species abundance data. This
approach is less powerful than the aggregate whole-dataset comparisons
shown in Table 1, but it facilitates visualization of the differences between
the simulated neutral SADs (Fig. 2A) and the real marine SADs (Fig. 2B).
Analysis of Variation in Performance of Neutral Approximation. The discrepancies between the data and the gamma neutral approximation suggest that
real data exhibit too much heterogeneity in species’ abundances to be
captured by the neutral approximation. To better understand this, we examined whether the relative model support varied systematically within or
among datasets as a function of the prevalence of rare species, and the
abundances of the most abundant species. As a measure of relative model
support, we used a per-observation difference in log-likelihoods (see SI Materials and Methods for details). We first confirmed that this standardization
PNAS Early Edition | 5 of 6
ECOLOGY
1000
200
50
Lognormal estimate
50
different neutral models: Hubbell’s original neutral model (39), a protracted
speciation neutral model (21), a fission speciation model (40), an independent
species model (11, 41), and a spatially explicit neutral model (42). We chose
these five models because they encompass models that relax key assumptions
of neutral theory as originally formulated; moreover, each of them meets
a strict definition of neutrality: every individual has the same demographic
rates, and the same per-capita effects on other individuals, regardless of
species. We tested the approximation in two ways. First, we assess how closely
(in absolute terms) the Poisson-gamma can approximate neutral abundance
distributions. Second, we assess whether the Poisson-gamma outperforms the
Poisson-lognormal when fitted to data generated according to neutral model
assumptions (see SI Materials and Methods for details).
controlled for the effect of sample size on statistical power (i.e., the trend illustrated in Fig. 2B). Then, we asked whether the variation in standardized
model support was better explained by the numerical dominance of the most
common species, or by the prevalence of very rare species, using mixed-effects
linear models.
ACKNOWLEDGMENTS. U.S. acknowledges S. Ehrich and A. Sell for providing
ship time. The authors thank all participants in the Census of Marine Life
project, particularly S. Campana, M. Sogin, K. Stocks, and L. A. Zettler. They
also thank R. Etienne for providing advice for obtaining numerical solutions
of the fission speciation neutral model, and J. Rosindell and S. Cornell for
sharing simulated neutral community data from their spatially explicit
neutral model. The authors thank T. Hughes for comments on an early
version of the manuscript. K.E.E. acknowledges The Norwegian Oil and Gas
Association for permitting use of data. A.B. acknowledges the support of the
Ministry for Science and Technology and the German Research Foundation
(Deutsche Forschungsgemeinschaft) for support of the Antarctic benthic
deep-sea biodiversity (ANDEEP) and ANDEEP-System Coupling (SYSTCO)
expeditions, as well as five PhD positions. A.B. also thanks the AlfredWegener-Institute for Polar and Marine Research for logistic help, as well
as the crew of the vessel and all pickers, sorters and identifiers of the extensive deep-sea material. The Census of Marine Life funded the assembly of
the metadataset. Analysis of the data was made possible by funding from
the Australian Research Council (to S.R.C.).
1. Angert AL, Huxman TE, Chesson P, Venable DL (2009) Functional tradeoffs determine
species coexistence via the storage effect. Proc Natl Acad Sci USA 106(28):11641–11645.
2. Levine JM, HilleRisLambers J (2009) The importance of niches for the maintenance of
species diversity. Nature 461(7261):254–257.
3. Adler PB, Ellner SP, Levine JM (2010) Coexistence of perennial plants: An embarrassment of niches. Ecol Lett 13(8):1019–1029.
4. Sæther BE, Engen S, Grøtan V (2013) Species diversity and community similarity in
fluctuating environments: Parametric approaches using species abundance distributions. J Anim Ecol 82(4):721–738.
5. Scheffer M, van Nes EH (2006) Self-organized similarity, the evolutionary emergence
of groups of similar species. Proc Natl Acad Sci USA 103(16):6230–6235.
6. MacArthur JW (1960) On the relative abundance of species. Am Nat 94(874):25–34.
7. Clark JS (2010) Individuals and the variation needed for high species diversity in forest
trees. Science 327(5969):1129–1132.
8. Tokeshi M (1999) Species Coexistence: Ecological and Evolutionary Perspectives
(Blackwell Science, Oxford).
9. McGill BJ, et al. (2007) Species abundance distributions: Moving beyond single prediction theories to integration within an ecological framework. Ecol Lett 10(10):
995–1015.
10. Connolly SR, Hughes TP, Bellwood DR, Karlson RH (2005) Community structure of
corals and reef fishes at multiple scales. Science 309(5739):1363–1365.
11. Volkov I, Banavar JR, Hubbell SP, Maritan A (2007) Patterns of relative species
abundance in rainforests and coral reefs. Nature 450(7166):45–49.
12. Dornelas M, Connolly SR, Hughes TP (2006) Coral reef diversity refutes the neutral
theory of biodiversity. Nature 440(7080):80–82.
13. May RM (1975) Ecology and Evolution of Communities, eds Cody ML, Diamond JM
(Belknap Press of Harvard Univ Press, Cambridge, MA), pp 81–120.
14. Hubbell SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography
(Princeton Univ Press, Princeton).
15. Clark JS (2009) Beyond neutral science. Trends Ecol Evol 24(1):8–15.
16. Rosindell J, Hubbell SP, He FL, Harmon LJ, Etienne RS (2012) The case for ecological
neutral theory. Trends Ecol Evol 27(4):203–208.
17. McGill BJ (2003) A test of the unified neutral theory of biodiversity. Nature 422(6934):
881–885.
18. Connolly SR, Dornelas M, Bellwood DR, Hughes TP (2009) Testing species abundance
models: A new bootstrap approach applied to Indo-Pacific coral reefs. Ecology 90(11):
3138–3149.
19. Etienne RS, Olff H (2005) Confronting different models of community structure to
species-abundance data: A Bayesian model comparison. Ecol Lett 8(5):493–504.
20. Muneepeerakul R, et al. (2008) Neutral metacommunity models predict fish diversity
patterns in Mississippi-Missouri basin. Nature 453(7192):220–222.
21. Rosindell J, Cornell SJ, Hubbell SP, Etienne RS (2010) Protracted speciation revitalizes
the neutral theory of biodiversity. Ecol Lett 13(6):716–727.
22. Pueyo S (2006) Diversity: Between neutrality and structure. Oikos 112(2):392–405.
23. Adler PB (2004) Neutral models fail to reproduce observed species-area and speciestime relationships in Kansas grasslands. Ecology 85(5):1265–1272.
24. McGill BJ, Hadly EA, Maurer BA (2005) Community inertia of Quaternary small
mammal assemblages in North America. Proc Natl Acad Sci USA 102(46):16701–16706.
25. Ricklefs RE, Renner SS (2012) Global correlations in tropical tree species richness and
abundance reject neutrality. Science 335(6067):464–467.
26. Bode M, Connolly SR, Pandolfi JM (2012) Species differences drive nonneutral structure in Pleistocene coral communities. Am Nat 180(5):577–588.
27. Wills C, et al. (2006) Nonrandom processes maintain diversity in tropical forests. Science 311(5760):527–531.
28. Lin K, Zhang DY, He FL (2009) Demographic trade-offs in a neutral model explain
death-rate—abundance-rank relationship. Ecology 90(1):31–38.
29. Chen AP, Wang SP, Pacala SW (2012) Comment on “Global correlations in tropical
tree species richness and abundance reject neutrality” Science 336(6089):1639, author
reply 1639.
30. Lakatos I (1978) The Methodology of Scientific Research Programmes (Cambridge
Univ Press, Cambridge, UK).
31. McGill BJ (2010) Towards a unification of unified theories of biodiversity. Ecol Lett
13(5):627–642.
32. Harte J (2011) Maximum Entropy and Ecology: A Theory of Abundance, Distribution,
and Energetics (Oxford Univ Press, Oxford).
33. Engen S, Lande R (1996) Population dynamic models generating the lognormal species abundance distribution. Math Biosci 132(2):169–183.
34. Sugihara G, Bersier LF, Southwood TRE, Pimm SL, May RM (2003) Predicted correspondence between species abundances and dendrograms of niche similarities. Proc
Natl Acad Sci USA 100(9):5246–5251.
35. Engen S, Lande R, Walla T, DeVries PJ (2002) Analyzing spatial structure of communities using the two-dimensional poisson lognormal species abundance model. Am
Nat 160(1):60–73.
36. Alonso D, Etienne RS, McKane AJ (2006) The merits of neutral theory. Trends Ecol Evol
21(8):451–457.
37. Gaston KJ (2010) Ecology. Valuing common species. Science 327(5962):154–155.
38. Gaston KJ (2011) Common ecology. Bioscience 61(5):354–362.
39. Etienne RS, Alonso D (2005) A dispersal-limited sampling theory for species and alleles. Ecol Lett 8(11):1147–1156.
40. Etienne R, Haegeman B (2011) The neutral theory of biodiversity with random fission
speciation. Theor Ecol 4(1):87–109.
41. He F (2005) Deriving a neutral model of species abundance from fundamental
mechanisms of population dynamics. Funct Ecol 19(1):187–193.
42. Rosindell J, Cornell SJ (2013) Universal scaling of species-abundance distributions
across multiple scales. Oikos 122(7):1101–1111.
Testing the Absolute Fit of the Lognormal Distribution. Goodness of fit of the
lognormal distribution to the empirical data was assessed with parametric
bootstrapping (see SI Materials and Methods for details). Also, for each
dataset’s regional-scale SAD, we compared lognormal-based estimates of
total number of species in the species pool with estimates using the nonparametric jackknife (10). See SI Materials and Methods for further details.
6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1406664111
Connolly et al.
Supporting Information
Connolly et al. 10.1073/pnas.1406664111
SI Text
SI Materials and Methods
Candidate Neutral and Nonneutral Approximations. A set of noninteracting populations undergoing pure random drift in population
size (birth rate equals death rate, no immigration, emigration, or
environmental stochasticity) produces a species abundance distribution in which the probability that a species has a given abundance,
n, varies inversely with abundance (1). On log-log scale, this is
a straight line with a slope of −1:
logðf ðnÞÞ = logðκÞ − logðnÞ;
[S1]
where f(n) is the probability that a species has abundance n, and κ is
a normalizing constant. Neutral models have two characteristics that
cause them to depart from the case of pure random drift. First,
because species are ecologically identical, there is a constraint on
total community size that is independent of species richness. Using
a maximum entropy argument, a modification to this power-law
model can be derived that accounts for this constraint (1):
logðf ðnÞÞ = logðκÞ − logðnÞ − ϕn:
[S2]
Eq. S2 is equivalent to Fisher’s log-series distribution (1). Second, neutral models also may have characteristics that cause individual species’ dynamics to depart from the pure drift assumption,
such as dispersal limitation (2), or unequal birth and death rates (3).
Pueyo (1) conceptualizes small departures from pure drift as perturbations to the value of the slope of −1 in Eq. S1. The combination of these two extensions to Eq. S1 yields the following:
logðf ðnÞÞ = logðκÞ − β logðnÞ − ϕn:
[S3]
Note that, by setting β = 1 − k and ϕ = 1=a, and the normalization
constant κ = ðΓðkÞak Þ−1 , it becomes apparent that f(n) in Eq. S3 is
a gamma distribution with shape k and scale a. Because it is well
known that many neutral models can depart markedly from the
log-series distribution (2, 4, 5), we take the gamma distribution as
our candidate neutral approximation.
Increasingly large departures from neutrality might be poorly
approximated by a perturbation to the slope of a power-law relationship, in which case a second-order perturbation may be
needed, where a quadratic term is added to the first-order model:
logðf ðnÞÞ = logðκÞ − β logðnÞ + c ½logðnÞ2 :
[S4]
μ2
2
2
If we
pffiffiffiffiffiset β = 1 − μ=σ , c = −1=ð2σ Þ, and logðκÞ = − 2σ 2 +
logð 2π σÞ , then f(n) in Eq. S4 is a lognormal distribution
where μ and σ are the mean and SD of log(n), respectively (1).
We therefore take the lognormal as our candidate nonneutral
approximation.
Because the gamma and lognormal distributions are continuous,
whereas abundances are integer-valued, and because many species
abundance data are incomplete samples from an underlying community abundance distribution, in our analyses we assess our neutral
and nonneutral approximations by fitting Poisson-gamma (i.e.,
negative binomial) and Poisson-lognormal mixture distributions:
Z∞
PðrÞ =
λ=0
λr e−λ
f ðλÞ dλ;
r!
Connolly et al. www.pnas.org/cgi/content/short/1406664111
[S5]
where P(r) is the probability that a species has abundance r in the
sample, λ is the mean of the Poisson distribution (and thus integrated out of the likelihood), and f(λ) is either the lognormal
or the gamma distribution. These distributions are commonly
used to represent random samples of individuals from underlying
gamma or lognormal community abundance distributions, respectively (6–8). More specifically, we use the zero-truncated
forms of the Poisson-gamma and Poisson-lognormal distributions, because, by definition, a species is not observed in the
sample if it has zero abundance (6):
pðrÞ =
PðrÞ
:
1 − Pð0Þ
[S6]
Assessing the Neutral Approximation. Our five candidate neutral
models exhibited a broad range of auxiliary assumptions. In
Hubbell’s “original neutral model,” local communities are partially isolated by dispersal from the broader metacommunity, and
new species arise with a fixed probability from individual birth
events (analogous to mutation events in population-genetic
neutral models) (9). The “protracted speciation neutral model”
is similar to the original neutral model, but it incorporates a time
lag between the appearance of an incipient new lineage, and its
recognition as a distinct species (10). In the “fission speciation
model,” speciation occurs by random division of existing species
(e.g., via vicarance); this model can exhibit a more superficially
lognormal-like species abundance pattern than point speciation
models, in that its log-abundance distributions are more symmetric about a single mode than other neutral models (5). In the
“independent species model” (3, 11), population dynamics are
density independent, per-capita birth rate is less than per-capita
death rate, and there is a constant immigration rate. Finally,
in the spatially explicit neutral model (4), speciation follows
a point-mutation process (as in the original neutral model), and
dispersal distances follow a Gaussian kernel. The first four
models have explicit mathematical expressions for the species
abundance distribution at equilibrium, which facilitates formally
evaluating the neutral approximation: see equations below). For
the spatially explicit neutral model, we used the approximate
species abundance distributions generated by simulation in the
original paper and kindly provided by the authors (4).
As noted in the main text, the strict definition of neutrality that
applies to these models contrasts with symmetric models that
implicitly allow for niche or demographic differences among
species, for instance, by having within-species competition be
stronger than between species competition (12), by implicitly
including temporal niche differentiation via different responses
to environmental fluctuations (13), or by allowing species with
different life history types to differ in their speciation rates (14).
To assess how well the Poisson-gamma distribution approximates our alternative neutral models, we considered a broad
range of neutral model parameter space spanning most of the
realistic range for real species abundance data (hundreds to tens
of thousands of individuals, and from less than 10 to many
hundreds of species). For each neutral model parameter combination, we used the Kullback–Leibler (K-L) divergence,
a measure of the information lost when one distribution is used
as an approximation for another (15). Specifically, we found the
Poisson-gamma distribution parameters that minimized the K-L
divergence. For discrete data, such as counts, K-L divergence is
as follows:
1 of 14
D=
X
πðnÞlog
n
πðnÞ
;
pðnÞ
[S7]
where n indexes the possible values of the random variable (in
this case, abundance), π(n) is the distribution being approximated (the relevant neutral model), and p(n) is the approximating model—in this case, the zero-truncated Poisson-gamma
distribution (Eq. S6).
Because our analysis of the empirical data is largely a comparative assessment of the Poisson-gamma and Poisson-lognormal distributions, our conclusions rely on an implicit assumption
that a Poisson-gamma distribution would outperform a Poissonlognormal if data were actually generated by neutral dynamics.
Therefore, in addition to assessing the performance of the Poissongamma as a neutral approximation in absolute terms, we also
simulated 100 species abundance distributions from each of the 126
equilibrium neutral abundance distributions used in the previous
analysis (Fig. S1), and we compared the best-fit Poisson-gamma
and Poisson-lognormal distributions for the 12,600 simulated
abundance distributions, exactly as we did for the empirical species
abundance distributions.
Criteria for Empirical Data Inclusion. Our criteria for data inclusion
were as follows. First, the data needed to record counts of individual organisms for a given level of sampling effort (e.g.,
sample volume, or transect area). Second, data needed to be
collected by experts (i.e., survey programs including data collected by amateurs were excluded), to minimize the risks of
misidentification or miscounting. Third, data needed to be focused on the assemblage level, rather than on specific target
species. Fourth, if sampling effort varied within species abundance samples, it had to be possible to standardize to a common
level of effort. For instance, if fishes were counted on 10-m2 and
50-m2 transects, then 10/50 = 20% of the individuals on the
larger transects were subsampled and pooled with the counts
from the smaller transects (16). Three of the datasets we used
required subsampling [Great Barrier Reef Fish (GBR), National
Oceanic and Atmospheric Administration (NOAA) Central Pacific Reef Fish (CPF), and South East Fishery: Shelf Fish (SEF)].
Model Fitting. To assess the relative performance of the Poissongamma and Poisson-lognormal for both simulated neutral and
real species abundance data, we found the gamma or neutral
model parameters that maximized the log-likelihood for the zerotruncated forms of the Poisson-gamma and Poisson-lognormal
abundance distributions:
X
nr logðpðrÞÞ;
[S8]
L=
r
where nr is the number of species with abundance r in the sample, and p(r) is the zero-truncated probability that a species has
abundance r (Eq. S6). Best-fit models were obtained by finding
the gamma or neutral model parameters that maximized the loglikelihood for each site.
Analysis of Variation in the Shapes of Species Abundance Distributions.
To determine whether there was any systematic variation in the
strength of evidence for gamma-like versus lognormal-like distributions, and whether any such variation was associated with
systematic differences in the patterns of commonness and rarity in
communities, we needed a sample-standardized measure of the
relative strength of support for a candidate model. Specifically, the
maximum log-likelihood for a species abundance model at a given
site is the sum of the contributions of each species’ abundance value
to the log-likelihood. To control for this effect of the number of
observations, we computed, for each site, a per-observation average
Connolly et al. www.pnas.org/cgi/content/short/1406664111
log-likelihood: the site’s maximum log-likelihood divided by the
number of species abundances contributing to that log-likelihood.
This approach is used in time series analysis, when models that have
been fitted to different numbers of observations (e.g., models with
different time lags) must be compared (17). Our standardized
measure of model support was simply the difference between the
standardized gamma and lognormal maximum log-likelihoods.
As our measure of the dominance of common species, we took,
in the first instance, the abundance of the most abundant species,
expressed as a proportion of the total number of individuals in the
species abundance distribution. As our rarity measure, we took
the proportion of species that were singletons (i.e., represented by
a single individual in the abundance distribution). We used linear
mixed-effects models to characterize the extent to which these
two quantities explained variation within and among datasets in
the standardized support for the lognormal over the gamma, at all
scales (site, mesoscale, regional). To confirm that our results were
not sensitive to the particular commonness or rarity metrics we
considered, we repeated our analysis using the combined abundance of the three most abundant species, and using the proportion of species in the bottom two octaves of abundance (i.e.,
with proportion of species with abundance three or less).
Parametric Bootstrap Goodness of Fit. Goodness of fit to the empirical data was assessed with parametric bootstrapping, using
a hypergeometric algorithm described in detail elsewhere (7).
Parametric bootstrapping involves simulating datasets that conform to the assumptions of a particular fitted species abundance
model. For example, to test the goodness of fit of the Poissonlognormal, one simulates Poisson random sampling of individuals
from an underlying lognormal distribution of species abundances.
Then, the model is fitted to each simulated dataset, and a goodness of fit statistic calculated. The frequency distribution of this
statistic across simulated datasets approximates the statistic’s
expected distribution, under the null hypothesis that the data
conform to the model. As a goodness of fit statistic, we use a
normalized measure of model deviance, which, following convention, we term ^c (16). Deviance is a likelihood-based measure of how far away the model is from exhibiting a perfect fit
to the data. ^c is obtained by taking all deviances for the model’s
fits to the observed and simulated data, and dividing each by
the average of the simulated deviances. Thus, ^c has an expected
value of 1.0. We judged the lack of fit as statistically significant
if the ^c of the observed data was greater than 95% of the
corresponding simulated ^c values.
Species Pool Estimation. Using the maximum-likelihood estimates,
the probability that a species is present in the species pool but has
abundance zero in the sample, P(0), is calculated from Eq. S5, by
substituting 0 for r. Then, the number of species in the community
that has been sampled can be estimated from the following:
^=
S
Sobs
;
1 − Pð0Þ
[S9]
^ is the estimated number of species in the community,
where S
and Sobs is the number of species observed in the data. Nonparametric jackknife estimates were calculated using the frequency distribution of species occurrences across sites (i.e.,
presence–absence data: see ref. 16). Jackknife order was calculated separately for each dataset, using the sequential testing
procedure recommended by ref. 18.
SI Results
Performance of the Neutral Approximation. Fig. S1 depicts the fit of
the neutral approximation to our five alternative neutral models.
For the first three models, these plots encompass three order-ofmagnitude variation in local community sizes, J (102 to 104 in2 of 14
dividuals), because most species abundance distributions are on
the order of hundreds to (occasionally) tens of thousands of
individuals. Similarly, we show a broad range of immigration
rates from m = 0.01 (1% of newborns are immigrants) to 1.0 (an
entirely open local community). We plot a range of values of the
biodiversity parameter, θ, so that the expected number of species
in the community spanned a very broad range (typically from
a low of about five species, for small, isolated communities with
low θ, to many hundreds of species for large, high-immigration
communities with large θ). Note that the range of values of θ
needed to span these richness values differs between the fission
speciation model and the first two models, because the parameter is defined somewhat differently in this model. The protracted speciation model includes an additional parameter, τ′,
which is the number of generations required for speciation to
occur, relative to the metacommunity size (the special case τ′ = 0
corresponds to the original neutral model). The fourth (independent species) neutral model differs from the others in
that it does not explicitly characterize dynamics at the metacommunity scale. Rather, it implicitly assumes that species have
equal abundance in the metacommunity (and thus they all have
the same rate of immigration to the local community, γ), and that
species’ local population dynamics are independent of one another, and thus a function of only γ and the ratio of local percapita birth to death rates, x. Because within-species dynamics
are also density independent, this is consistent with the neutrality
assumption (individuals have no effect on one another’s percapita growth rates, regardless of whether they belong to the
same or different species). This density-independent assumption
means that the model is a probability distribution of species
abundances, and not a model of overall species frequencies. Consequently, unlike the previous neutral models, it does not predict
species richness. Similarly, for the spatially explicit model, the form
of the species abundance distribution depends on the speciation
probability (ν), and the ratio of the sampling area A (i.e., the local
community size) to the squared width of the dispersal kernel, L,
rather than either of the latter two variables independently (4).
Thus, a given shape for the species abundance distribution can
correspond to a broad range of different community species
richness values, depending on whether A and L are both small
or both large.
Fig. S1 shows that the Poisson-gamma neutral approximation
performs very well in the overwhelming majority of cases. There
are, however, some cases where the approximation performs less
well. These typically correspond to parameter combinations that
imply very species-poor assemblages. One class of such cases
corresponds to small (∼100 individuals), very low-immigration,
low-diversity assemblages (∼5 species: e.g., Top Left of Fig. S1A).
Here, the neutral model has an elevated probability that one
species is nearly monodominant (the curve bends upward at the
right, for species abundances close to the total community size),
which the Poisson-gamma distribution cannot capture. A second
class of cases, specific to the protracted speciation model, involves a flattening of the species abundance distribution at low
abundances (e.g., m = 0.1, θ = 4, J = 104 in Fig. S1C). This effect
is too small to see clearly for the range of parameter values
shown in Fig. S1, but is somewhat more pronounced in very large
communities with very low values of the biodiversity parameter
(θ ∼ 1), for which the ratio of individuals to species is very high
(e.g., a local community with 10,000 individuals but only about
10 species). The third class of cases are specific to the fission
speciation model and involve an excess of rare species, relative to
the Poisson-gamma distribution (e.g., m = 0.01, J = 104, θ = 40 in
Fig. S1D). As with the second class of cases, this effect is relatively small in Fig. S1, but can be more pronounced for very
large, particularly isolated, communities with few species (e.g.,
10,000 individuals and about 10 species, implying mean abundances of about 1,000). For the data analyzed in this paper,
Connolly et al. www.pnas.org/cgi/content/short/1406664111
however, most sites are very far from these extreme low-diversity
cases. The typical (median) site is a sample of 422 individuals
containing 17 species, and very few sites contain so few species at
such large sample sizes (86% of sites, for instance, have mean
species abundances of 100 or less). Moreover, the individual
datasets vary substantially in community size and observed species richness (e.g., mean site richness varies from 9 to 126 species
across the 14 datasets, and average species abundances at the
site level range from 4 to 123 across all datasets except one).
Thus, the overwhelming majority of our sites could not correspond to those regions of parameter space where the Poissongamma distribution performs less well as an approximation for
neutral dynamics.
Robustness to Ecological and Taxonomic Heterogeneity. Although
neutral models have previously been applied to very heterogeneous communities (19), including benthic marine invertebrates
(2) [and indeed their capacity to characterize such systems has
been invoked as evidence of their robustness (2)], most neutral
model communities are conceptualized as a guild of organisms
competing for a shared set of resources. Some of our datasets are
relatively taxonomically and ecologically homogeneous [e.g.,
Indo-Pacific Coral Crustaceans (IPC), which contains only
crustaceans associated with dead coral heads]. However, others
are more heterogeneous. Therefore, to determine whether our
results were sensitive to this taxonomic and ecological heterogeneity of the assemblages, we classified our species into guilds,
where information was available, and reanalyzed our species
abundance data, limiting the analysis to species from the most
species-rich guild for each dataset (Table S3). Such a classification is necessarily approximate for marine animals, given the
high degree of omnivory in the ocean. Nevertheless, the analysis
allows us to evaluate whether or not our conclusions are sensitive
to the extent of heterogeneity in the data. The resolution of the
groupings for this analysis depended somewhat on both the
taxonomic and ecological heterogeneity of the original data, and
also on the species richness in the samples. Specifically, we used
as a rule of thumb that guilds should have a minimum of 10
species, necessitating use of more coarse groupings for more
species-poor datasets.
By restricting the analysis to a subset of the species, the statistical power to detect differences between Poisson-lognormal
and Poisson-gamma species abundances is reduced—the more
heterogeneous the original dataset, the smaller the subset of
species that could be included in the analysis. Nevertheless,
strong support for the Poisson-lognormal remained: across site,
mesoscale, and regional levels, the Poisson-lognormal was strongly
(>95%) supported in 27 cases, whereas the Poisson-gamma was
strongly supported in only 1 (Table S3).
Analysis of Variation in the Shapes of Species Abundance Distributions.
Standardizing model support by dividing by the number of observed species abundances successfully controlled for the effects
of statistical power shown in Fig. 2, at least at the site level and
mesoscale: mixed-effects linear model analyses using the number of
distinct species abundance values as an explanatory variable indicated that the overall effect did not differ significantly from zero,
and explained about 1–10% of the variation in standardized model
support across datasets (see R2 values in Table S4). In contrast,
the strength of support for the Poisson-lognormal over the Poisson-gamma increased strongly with the relative abundance of the
most abundant species at site, mesoscale, and regional (wholedataset) levels. The positive relationship was highly consistent
between datasets at both the site level and mesoscale (gray lines in
Fig. S4 A and C), and explained about one-half or more of the
variation (Table S4 and Fig. S4 B, D, and E). In contrast, the
proportion of singletons was a poor predictor of relative model
performance: the estimated direction of the effect was not con3 of 14
sistent across datasets at the site level (gray lines in Fig. S5A), and
the estimated overall effect did not differ significantly from zero at
any scale (Table S4 and black lines in Fig. S5 A, C, and E) and
never explained more than 16% of the variation (Table S4).
To further assess the strength of these results, we repeated our
common-species analysis using the combined relative abundance
of the three most abundant species. This, too, was strongly positively related to support for the Poisson-lognormal distribution
(slope: 0.40 ± 0.02, pseudo-R2 = 0.44 at site level; 0.24 ± 0.05,
pseudo-R2 = 0.53 at mesoscale level; 0.15 ± 0.04, R2 = 0.55 at
regional scale). Conversely, expanding our definition of rarity to
encompass the proportion of species in the bottom two octaves
(species with abundance 3 or less) did not improve its effectiveness as a predictor of standardized support for the Poisson-lognormal over the Poisson-gamma: the overall relationship did not
differ significantly from zero at any scale (no slopes significantly
different from zero, pseudo-R2 < 0.02 at site-scale and mesoscale
levels, R2 = 0.21 at regional level).
1. Pueyo S (2006) Diversity: Between neutrality and structure. Oikos 112(2):392–405.
2. Hubbell SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography
(Princeton Univ Press, Princeton).
3. Volkov I, Banavar JR, Hubbell SP, Maritan A (2007) Patterns of relative species
abundance in rainforests and coral reefs. Nature 450(7166):45–49.
4. Rosindell J, Cornell SJ (2013) Universal scaling of species-abundance distributions
across multiple scales. Oikos 122(7):1101–1111.
5. Etienne R, Haegeman B (2011) The neutral theory of biodiversity with random fission
speciation. Theor Ecol 4(1):87–109.
6. Pielou EC (1977) Mathematical Ecology (Wiley, New York).
7. Connolly SR, Dornelas M, Bellwood DR, Hughes TP (2009) Testing species abundance
models: A new bootstrap approach applied to Indo-Pacific coral reefs. Ecology 90(11):
3138–3149.
8. Sæther BE, Engen S, Grøtan V (2013) Species diversity and community similarity in
fluctuating environments: Parametric approaches using species abundance distributions.
J Anim Ecol 82(4):721–738.
9. Etienne RS, Alonso D (2005) A dispersal-limited sampling theory for species and
alleles. Ecol Lett 8(11):1147–1156.
10. Rosindell J, Cornell SJ, Hubbell SP, Etienne RS (2010) Protracted speciation revitalizes
the neutral theory of biodiversity. Ecol Lett 13(6):716–727.
11. He F (2005) Deriving a neutral model of species abundance from fundamental
mechanisms of population dynamics. Funct Ecol 19(1):187–193.
12. Volkov I, Banavar JR, He FL, Hubbell SP, Maritan A (2005) Density dependence
explains tree species abundance and diversity in tropical forests. Nature 438(7068):
658–661.
13. Allen AP, Savage VM (2007) Setting the absolute tempo of biodiversity dynamics. Ecol
Lett 10(7):637–646.
14. Ostling A (2012) Do fitness-equalizing tradeoffs lead to neutral communities? Theor
Ecol 5(2):181–194.
15. Burnham KP, Anderson DR (2002) Model Selection and Multi-Model Inference: A
Practical Information-Theoretic Approach (Springer, Heidelberg).
16. Connolly SR, Hughes TP, Bellwood DR, Karlson RH (2005) Community structure of
corals and reef fishes at multiple scales. Science 309(5739):1363–1365.
17. McQuarrie ADR, Tsai C-L (1998) Regression and Time Series Model Selection (World
Scientific, River Edge, NJ).
18. Burnham KP, Overton WS (1979) Robust estimation of population size when capture
probabilities vary among animals. Ecology 60(5):927–936.
19. de Aguiar MAM, Baranger M, Baptestini EM, Kaufman L, Bar-Yam Y (2009) Global
patterns of speciation and diversity. Nature 460(7253):384–387.
(a) Original neutral model
m = 0.01
1
2
5
1
10−1
10−2
10−3
10−3
2
5
1
2
10−1
10−2
10−2
10−3
2
5
1
2
10−1
10−2
10−2
10−3
1
1
1
1
10−1
10−2
10−3
10−4
1
1
1
1
10−1
10−2
10−3
10−4
5
1
2
5
1
2
10−1
10−1
10−2
10−3
10−3
10−4
E(S )
1
2
50
2
50
1
5
E(S )
1
5
E(S )
1 2
5
50
100
500
100
1
20
100
10
2
5
E(S )
23
2
5
E(S )
17
5
2
1
50
10
20
10
20
500
5
1
127
10 20
50
2
1 10
1 10−5
1
5
5
5
5
10−2
10−3
10−4
10−5
1
1
1
1
10−1
10−2
10−3
10−4
1
E(S )
20
5
50
E(S )
64
5
E(S )
1
5
50
500
5000
500
26
5
50
E(S )
98
1
1
5
50
E(S )
174
20
E(S )
100
500
1 2
5
500
5000
100
5
20
2
5
E(S )
32
5
50
E(S )
500
5
100
1 2
5
1000
4
500
J
1000
20
J
1000
100
J
10000
4
50
10
20
500
5000
J
10000
20
J
10000
100
125
50
E(S )
329
20
1
J
79
240
1
100
100
23
E(S )
1
J
4 5
1 10−1
−3
100
20
10
3
20
E(S )
J
50
70
2
1
100
4
36
5
E(S )
100
10 20
2
1
J
14
5
E(S )
20 50
5
2
1
53
5
2
1
E(S )
20
E(S )
39
E(S )
17
E(S )
24
20
5
1
m =1
8
E(S )
1
11
20
2
1
5
5 10
E(S )
1
5
5 10
E(S )
1
E(S )
4
5 10
E(S )
1
Species frequency (log scale)
2
m = 0.1
500
462
20
100
Species abundance (log scale)
Fig. S1.
(Continued)
Connolly et al. www.pnas.org/cgi/content/short/1406664111
4 of 14
(b) Protracted speciation model with τ ' = τ J m = 10−8
m = 0.01
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
5 × 10−4
E(S ) = 3
1
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
1
Species frequency (log scale)
2
5 10
2
50
1
1
5 10
2
1
50
5 10
1
5
50
20
5
20
5
50
1
5
2
100
500
1
10
20
1
10
20
1
2
100
500
1
100
2
5
10
J = 100
θ = 100
3
4 5
J = 1000
θ=4
5
20
100
500
J = 1000
θ = 20
E(S ) = 79
20 50
5
J = 100
θ = 20
2
1
1
E(S ) = 127
20
50
E(S ) = 23
20
5
10 20
2
1
E(S ) = 53
100
5
E(S ) = 70
5
5
2
E(S ) = 36
E(S ) = 17
E(S ) = 39
1 2
2
1
E(S ) = 24
1
5 10
E(S ) = 23
E(S ) = 10
1 × 10−1
1 × 10−2
1 × 10−3
1 × 10−4
2
J = 100
θ=4
E(S ) = 14
E(S ) = 17
E(S ) = 5
5 × 10−2
5 × 10−3
5 × 10−4
5 × 10−5
m =1
E(S ) = 8
E(S ) = 5
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
5 × 10−1
1 × 10−1
2 × 10−2
5 × 10−3
1 × 10−3
2 × 10−4
m = 0.1
2
5
20
50
J = 1000
θ = 100
E(S ) = 240
10 20
50
1
2
5
10
20
1 × 10−1
−3
1 × 10
E(S ) = 19
1 × 10−5
1
5 × 10−2
5 × 10−3
5 × 10−4
5 × 10−5
5
50
E(S ) = 26
500
5000
1
E(S ) = 64
1
1 × 10−1
1 × 10−2
1 × 10−3
1 × 10−4
5
50
5
20
50
500
5000
1
E(S ) = 97
500
1
E(S ) = 173
1
5
5
500
1 2
50
5
5
50
500
5000
J = 10000
θ = 20
E(S ) = 125
500
1
E(S ) = 328
100
J = 10000
θ=4
E(S ) = 32
5
50
500
J = 10000
θ = 100
E(S ) = 462
20
100
1 2
5
20
100
Species abundance (log scale)
(c) Protracted speciation model with τ ' = τ J m = 10−4
m = 0.1
m = 0.01
1 × 10−1
2 × 10−2
5 × 10−3
1 × 10−3
1
5 10
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
1
2
5 10
2
50
5
1
50
20
5
1 2
5
100
500
1
5 × 10−2
5 × 10−3
5 × 10−4
5 × 10−5
5
1
20
20
50
5
100
1
5
20
50
20
2
10
100
1
2
20
100
500
5000
1
5
1
500
1 2
50
5
10
J = 100
θ = 100
3
4 5
J = 1000
θ=4
5
20
100
500
J = 1000
θ = 20
1
20
2
5
20
50
J = 1000
θ = 100
50
1
2
5
10
20
J = 10000
θ=4
E(S ) = 29
500
5000
1
5
50
500
5000
J = 10000
θ = 20
E(S ) = 111
500
E(S ) = 306
100
5
E(S ) = 232
10 20
50
5
J = 100
θ = 20
2
1
E(S ) = 93
500
2
1
E(S ) = 25
500
50
E(S ) = 77
20 50
5
10 20
E(S ) = 22
20
5
5
E(S ) = 69
5
5
2
1
E(S ) = 125
E(S ) = 170
1
10
E(S ) = 52
E(S ) = 63
1 × 10−1
1 × 10−2
1 × 10−3
1 × 10−4
1
E(S ) = 36
5
2
1
E(S ) = 19
1 × 10−5
50
E(S ) = 17
E(S ) = 38
1 × 10−3
2
1
E(S ) = 24
1
5 10
E(S ) = 23
5 10
1
2
E(S ) = 17
E(S ) = 10
1 × 10−1
1 × 10−2
1 × 10−3
1 × 10−4
1 × 10−1
1
E(S ) = 5
5 × 10−2
5 × 10−3
5 × 10−4
5 × 10−5
5 × 10−1
1 × 10−1
−2
2 × 10−3
5 × 10−3
1 × 10−4
2 × 10
50
E(S ) = 5
1
Species frequency (log scale)
2
J = 100
θ=4
E(S ) = 13
E(S ) = 8
E(S ) = 3
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
m =1
1
5
50
500
J = 10000
θ = 100
E(S ) = 394
100
1 2
5
20
100
Species abundance (log scale)
Fig. S1. (Continued)
Connolly et al. www.pnas.org/cgi/content/short/1406664111
5 of 14
(d) Fission speciation model
1 × 100
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
1
2
5 10
1
50
E(S ) = 5
1
2
1 × 100
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
1
5 10
2
1
50
5 10
1
5
50
1 2
20
5
1 2
100
1
5
1
100
2
500
1
10
20
100
500
1 2
5
1
50
100
1
4 5
J = 100
θ = 1000
1.4
1.8
J = 1000
θ = 40
2
5
20
50
J = 1000
θ = 200
2
5
10
20
J = 1000
θ = 1000
E(S ) = 500
10
20
1
2
3
4
6
J = 10000
θ = 40
E(S ) = 40
20
2
1.2
1
500
1
5
50
500
J = 10000
θ = 200
E(S ) = 196
100
1 2
E(S ) = 719
20
3
E(S ) = 167
50
5
J = 100
θ = 200
2
1.0
E(S ) = 186
20
10
E(S ) = 38
10 20
5
5
E(S ) = 91
20 50
5
2
1
E(S ) = 39
50
5
5
2
1
E(S ) = 318
1 2
20
E(S ) = 204
20
5
10
5
2
1
100
E(S ) = 140
1 × 10−1
1 × 10−2
1 × 10−3
1 × 10−4
1
E(S ) = 112
20
5
20
E(S ) = 67
5
2
1
E(S ) = 37
1 × 10−1
1 × 10−2
1 × 10−3
1 × 10−4
10
E(S ) = 35
E(S ) = 45
2 × 10−2
5 × 10−3
1 × 10−3
2 × 10−4
5 × 10−5
2
1
E(S ) = 38
5 × 10−1
5 × 10−2
5 × 10−3
5 × 10−4
5
E(S ) = 25
E(S ) = 22
1 × 10−1
2 × 10−2
5 × 10−3
1 × 10−3
2 × 10−4
2
J = 100
θ = 40
E(S ) = 29
E(S ) = 23
E(S ) = 5
1 × 10−1
2 × 10−2
5 × 10−3
1 × 10−3
2 × 10−4
m =1
E(S ) = 16
E(S ) = 5
1 × 100
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
Species frequency (log scale)
m = 0.1
m = 0.01
5
20
100
J = 10000
θ = 1000
E(S ) = 909
5 10
50
1
2
5
10 20
50
Species abundance (log scale)
(e) Independent species model
γ = 0.1
γ=1
γ=2
−1
2 × 10
x = 0.5
5 × 10−2
1 × 10−2
5 × 10−1
1
2
3
4
5
1
2
3
4 5
7
1
2
5
1 × 10−1
x = 0.9
Species frequency (log scale)
2 × 10−2
5 × 10−3
1 × 10−3
1
2
5
10
20
1
2
5
10
20
50
1
2
5
10
20
50
1 × 10−1
−2
1 × 10
x = 0.99
1 × 10−3
1 × 10−4
1
2
5 10
50
200
1 2
5
20 50
200
1 2
5
20
100
500
(f) Spatial neutral model
ν = 1 × 10−6
ν = 1 × 10−4
ν = 1 × 10−2
2 × 10−1
5 × 10−2
A L −2 = 4
1 × 10−2
2 × 10−3
1
2
5
10
20
1
2
5
10
1
2
5
2 × 10−1
5 × 10−2
1 × 10−2
2 × 10−3
5 × 10−4
A L −2 = 16
1
2
5
10
20
50
1
2
5
10
20
50
1
2
5
10
20
1 × 10−1
2 × 10−2
5 × 10−3
1 × 10−3
2 × 10−4
A L −2 = 64
1
2
5 10
50
200
1
2
5
10
50
1
2
5
10
20
50
Species abundance (log scale)
Fig. S1. Examples of the fit of the Poisson-gamma neutral approximation (black line) to the five candidate neutral approximations (blue lines). The fits are shown as
“Pueyo plots”: both the vertical and horizontal axes are shown on a log scale. The horizontal axis is truncated at the abundance value where the cumulative expected
number of species equals 99% of the total (i.e., on average, only 1 of 100 species would be expected to have greater abundance). (A–F) The six different neutral models, as
specified at the top of the corresponding group of panels. For models A–D, E(S) in each panel indicates the expected number of species for that parameter combination.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
6 of 14
Fig. S2. Observed and best-fit mesoscale abundance distributions. On the map, different combinations of colors and symbols correspond to different datasets:
these are reproduced in the corresponding figure panels. See Table S1 for metadata, including abbreviations. Each point on the map is located at the centroid
of the individual sites that were pooled to generate each mesoscale abundance distribution. Panels above and below the map compare observed and fitted
species abundance distributions at this scale. The bars represent the mean proportion of species in different octave classes of abundance, across all mesoscale
abundance distributions from the corresponding ecosystem (these are shown as true doubling classes: the first bar represents species with abundance 1; the
second, abundances 2–3; the third, abundances 4–7; etc.). The blue and red lines show the mean of fitted values from fits of the Poisson-gamma and Poissonlognormal distributions, respectively, to each mesoscale abundance distribution.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
7 of 14
Fig. S3. Observed and best-fit regional abundance distributions. On the map, the area over which sites were pooled for each regional abundance distribution
has been outlined. See Table S1 for metadata, including abbreviations. The panels above and below the map compare observed and fitted species abundance
distributions at this scale. The bars represent the proportion of species in different octave classes of abundance (these are shown as true doubling classes: the
first bar represents species with abundance 1; the second, abundances 2–3; the third, abundances 4–7; etc.). The blue and red lines show the fitted values from
fits of the Poisson-gamma and Poisson-lognormal distributions to the data, respectively.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
8 of 14
0.6
0.4
0.2
−0.2
0.0
Observed ΔMLLS −1
0.1 0.2 0.3 0.4
B
−0.1
ΔMLLS −1
A
0.0
0.2
0.4
0.6
0.8
1.0
−0.2
0.0
0.2
0.4
0.6
0.6
0.8
0.6
0.4
0.2
0.0
Observed ΔMLLS −1
0.1 0.2 0.3 0.4
−0.1
ΔMLLS −1
D
0.0
0.4
Predicted ΔMLLS −1
Maximum relative abundance
C
0.2
1.0
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Predicted ΔMLLS −1
Maximum relative abundance
0.1
−0.1
ΔMLLS −1
0.3
E
0.0
0.2
0.4
0.6
0.8
1.0
Maximum relative abundance
Fig. S4. Analysis of the variation in standardized support for the lognormal explained by the relative abundance of the most-abundant species (expressed as
a fraction of the number of individuals sampled), at the (A and B) site scale, (C and D) mesoscale, and (E) regional scale. Positive relationships indicate stronger
evidence against the gamma neutral approximation as the most-abundant species becomes more dominant. In A and C, the thick solid and dashed lines
represent the overall (i.e., fixed effects) relationship, with 95% confidence intervals. The gray lines represent the relationships for the 14 individual datasets,
based on the estimated random effects; individual lines are drawn to span only the range of horizontal axis values observed in the corresponding dataset. B
and D show corresponding plots of observed versus predicted values, as estimated from the full fitted model. Because there is substantial overlap of points, the
points have been color-coded according to the number of nearby observations, grading from red (high density of points) to blue. E is an ordinary least-squares
(OLS) regression: because there is only one (pooled) regional abundance distribution per site, there is no random effect.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
9 of 14
0.6
0.4
0.2
−0.2
0.0
Observed ΔMLLS−1
0.05 0.10 0.15
B
−0.05
ΔMLLS −1
A
0.0
0.2
0.4
0.6
0.8
1.0
−0.2
0.0
0.6
0.6
0.4
0.2
0.0
Observed ΔMLLS−1
0.05 0.10 0.15
D
−0.05
ΔMLLS −1
0.4
Predicted ΔMLLS−1
Proportion of singletons
C
0.2
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.3
0.4
0.5
0.6
0.15
0.05
−0.05
ΔMLLS−1
0.2
Predicted ΔMLLS−1
Proportion of singletons
E
0.1
0.0
0.2
0.4
0.6
0.8
1.0
Proportion of singletons
Fig. S5. Analysis of the variation in standardized support for the lognormal explained by the proportion of species that are singletons, at the (A and B) site
scale, (C and D) mesoscale, and (E) regional scale. Positive relationships indicate stronger evidence against the gamma neutral approximation as proportion of
singletons increases. In A and C, the thick solid and dashed lines represent the overall (i.e., fixed effects) relationship, with 95% confidence intervals. The gray
lines represent the relationships for the 14 individual datasets, based on the estimated random effects; individual lines are drawn to span only the range of
horizontal axis values observed in the corresponding dataset. B and D show corresponding plots of observed versus predicted values, as estimated from the full
fitted model. Because there is substantial overlap of points, the points have been color-coded according to the number of nearby observations, grading from red
(high density of points) to blue. E is an OLS regression: because there is only one (pooled) regional abundance distribution per site, there is no random effect.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
10 of 14
Table S1. Metadata summary of global marine species abundance distribution samples
Dataset name
South East Fishery:
Shelf Fish (SEF)
Western Australia:
Deep Fish (WAF)
Great Barrier Reef Fish
(GBR)
Antarctic Molluscs
(ANM)
Indo-Pacific Coral
Crustaceans (IPC)
Tuscany Archipelago
Fish (TAP)
Eastern Bass Strait
Invertebrates (EBS)
NOAA Central Pacific
Reef Fish (CPF)
Sunderban
Zooplankton (SUZ)
Scotian Shelf Fish (SSF)
Bass Strait Intertidal
Macroinvertebrates
(BSI)
North Sea
Invertebrates (NSI)*
Summary
Latitudinal
limits
Longitudinal
limits
Depth
range, m
Sampling
method
Data contact
Fish from southeastern
Australia; total of 173
species at 189 sites pooled
into 13 mesoscale SADs
Fish from western Australia;
total of 282 species at 65
sites pooled into 23
locations
Fish from underwater visual
census surveys of coral reefs
on the Great Barrier Reef
(1); total of 195 species at
74 sites pooled into 8
mesoscale SADs
Deep-water bivalves from the
Scotia Arc, Antarctica; total
of 96 species at 20 sites
pooled into 4 mesoscale
SADs
Crustacean samples
encompassing a total of
411 species from individual
dead coral heads at 8 sites
in the Indo-Pacific; not
pooled at mesoscale
Fish abundance from the
Tuscany Archipelago; total
of 39 species at 30 sites
pooled into 4 mesoscale
SADs
Invertebrates from the
Eastern Bass Strait,
Australia (2); total of 801
species at 47 sites pooled
into 3 mesoscale SADs
Fish from underwater visual
surveys of coral reefs
throughout the Pacific;
total of 491 species at 49
sites pooled into 5
mesoscale SADs
Zooplankton from Sunderban
mangrove wetland, India;
total of 31 species at 7 sites;
not pooled at mesoscale
Fish from the Scotian Shelf,
Northwestern Atlantic;
total of 98 species at 458
sites pooled into 14
mesoscale SADs
−39.0
−36.4
146.5
150.3
16
254
Fish trawl
A.W.
[email protected]
−35.1
−20.1
111.4
115.2
197
1,580
Fish trawl
A.W.
[email protected]
−23.9
−14.5
145.3
152.7
7
−58.2
−65.5
−60.0
−23.6
774
6,348
−23.4
6.4
−113.7
−149.8
10
Hand
counts
42.2
43.1
9.8
11.1
8
12
UVS
L.B.-C.
[email protected]
−37.9
−37.8
148.2
148.7
17
51
Grab
sample
R.S.W., G.C.B.P.
[email protected]
−14.6
28.5
−154.8
142.8
21.6
22.3
88.0
88.9
42.1
45.6
−67.2
−57.3
16
176
Trawl
Invertebrates from the Bass
Strait, Australia (3); total of
98 species at 53 sites pooled
into 5 mesoscale SADs
Benthic invertebrates from
the North Sea; total of 244
species at 46 sites pooled
into 6 mesoscale SADs
−39.1
−37.6
141.4
149.8
Intertidal
UVS
54.3
60.4
−1.0
8.0
38
115
vanVeen
grab
Connolly et al. www.pnas.org/cgi/content/short/1406664111
UVS
Epibenthic
sledge
8.24
17.11
UVS
2.0
8.9
Plankton
tow
Hugh Sweatman
Australian Institute of Marine
Science, Townsville, Australia
[email protected]
K.L.
[email protected]
L.P.
[email protected]
R.E.B.
[email protected]
S.K.S.
[email protected]
Steven E. Campana
Fisheries and Oceans Canada,
Bedford Institute of
Oceanography, Dartmouth,
Canada
[email protected]
T.D.O.
[email protected]
U.S.
[email protected]
11 of 14
Table S1. Cont.
Dataset name
Norwegian Shelf
Macrobenthos
(NSM)†
Antarctic Isopods
(ANI)‡
Summary
Benthic invertebrates
collected along the
Norwegian Shelf (4); total
of 805 species at 101 sites
pooled into 4 mesoscale
SADs
Isopods from the Southern
Ocean (5); total of 502
species at 38 sites pooled
into 8 mesoscale SADs
Latitudinal
limits
Longitudinal
limits
Depth
range, m
Sampling
method
vanVeen
grab
56.0
71.8
1.7
23.5
65
434
−71.3
−58.2
0.0
−64.7
774
6,348
Epibenthic
sledge
Data contact
K.E.E.
[email protected]
A.B.
[email protected]
SAD, species abundance distribution; UVS, underwater visual survey (belt transects in all cases).
1.
2.
3.
4.
5.
Sweatman H, et al. (2008) Long-Term Monitoring of the Great Barrier Reef (Australian Institute of Marine Science, Townsville, Australia), Status Report no. 8.
Gray JS, et al. (1997) Coastal and deep-sea benthic diversities compared. Mar Ecol Prog Ser 159:97–103.
O’Hara TD, Addison PFE, Gazzard R, Costa TL, Pocklington JB (2010) A rapid biodiversity assessment methodology tested on intertidal rocky shores. Aquat Conserv 20(4):452–463.
Ellingsen KE, Gray JS (2002) Spatial patterns of benthic diversity: Is there a latitudinal gradient along the Norwegian continental shelf? J Anim Ecol 71(3):373–389.
Brandt A, et al. (2007) First insights into the biodiversity and biogeography of the Southern Ocean deep sea. Nature 447(7142):307–311.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
12 of 14
Connolly et al. www.pnas.org/cgi/content/short/1406664111
13 of 14
66.9 (25–127) 0.14 (0.09–0.20) 0.46 (0.38–0.62)
17.0 (1–36)
18.7 (8–32)
9.4 (1–21)
(71–1,018)
(2–10,803)
(18–3,714)
(6–21,436)
44.4 (7–66)
(64–2,535)
0.32 (0.08–0.87) 0.41 (0.14–0.61)
0.45 (0.24–0.70) 0.02 (0.00–0.11)
44.4 (28–77)
30.2 (15–44)
(92–4,608)
(1,200–116,444)
PS§
NA
0.56 (0.33–0.90) 0.18 (0.05–0.37)
0.37 (0.10–0.93) 0.35 (0.16–0.59)
0.37 (0.19–0.71) 0.24 (0.11–0.38)
NA
0.50 (0.48–0.54) 0.08 (0.06–0.09)
0.34 (0.12–0.56) 0.37 (0.21–0.48)
0.13 (0.08–0.24) 0.36 (0.23–0.44)
MRA‡
116.9 (78–144)
NA
NA
0.20 (0.14–0.30) 0.21 (0.14–0.33)
NA
573.7 (500–661) 0.06 (0.05–0.07) 0.16 (0.14–0.18)
37.5 (19–53)
43.8 (12–101)
60.8 (20–93)
NA
32.2 (30–34)
41.0 (27–79)
141.4 (73–193)
S†
6,183 (1,224–12,361)
17,074 (5,201–42,572)
61.6 (54–76)
101.2 (54–147)
S† MRA‡ PS§
39 0.51 0.03
96 0.21 0.33
98 0.32 0.19
31 0.36 0.00
98 0.39 0.11
37,097 244 0.29 0.21
68,298 805 0.08 0.19
474,151 491 0.19 0.09
43,768 195 0.15 0.11
4,689
71,382 801 0.05 0.13
391,484
24,272 282 0.32 0.22
124,777 173 0.26 0.16
644 135 0.06 0.44
17,303
2,175
11,788 502 0.05 0.17
N*
Regional
0.47 (0.34–0.78) 0.07 (0.05–0.09) 1,650,842
0.41 (0.10–0.83) 0.23 (0.09–0.37)
357.5 (176–552) 0.08 (0.04–0.11) 0.24 (0.19–0.29)
94,830 (35,819–223,060) 238.0 (171–293) 0.30 (0.16–0.40) 0.13 (0.08–0.16)
5,471 (686–9,315)
NA
23,794 (16,182–28,145)
27,963 (3,493–78,019)
1,055 (44–5,868)
9,598 (112–22,978)
NA
4,326 (3,276–5,867)
544 (353–816)
1,474 (284–3,531)
N*
0.61 (0.26–0.95) 0.14 (0.00–0.33) 330,168 (96,743–643,500)
0.39 (0.09–0.87) 0.35 (0.08–0.51)
86.7 (35–148) 0.15 (0.07–0.51) 0.38 (0.20–0.63)
(87–2,677)
104.1 (32–183) 0.30 (0.11–0.61) 0.20 (0.08–0.39)
15.3 (7–28)
(548–85,423)
0.65 (0.14–1.00) 0.28 (0.00–0.75)
0.37 (0.07–0.97) 0.37 (0.09–0.67)
0.51 (0.16–1.00) 0.32 (0.00–0.71)
0.49 (0.29–0.74) 0.17 (0.04–0.33)
125.7 (23–384) 0.18 (0.05–0.70) 0.34 (0.00–0.70)
(148–1,190)
(48–8,641)
21.2 (16–26)
(223–1,118)
0.39 (0.17–1.00) 0.45 (0.10–1.00)
14.0 (1–33)
(1–560)
PS§
0.24 (0.05–1.00) 0.49 (0.21–1.00)
MRA‡
40.2 (1–91)
S†
(1–2,651)
N*
Mesoscale
Values reported are means, with ranges in parentheses. Due to the small number of sites, there are no mesoscale abundance distributions for IPC and SUZ (Materials and Methods). Because the regional
abundance distribution pools all samples in that dataset, there is only one abundance distribution, so no range of values is reported.
*N, number of individuals sampled.
†
S, number of species appearing in the sample.
‡
MRA, maximum relative abundance (abundance of the most abundant species, as a proportion of the total number of individuals in the abundance distribution).
§
PS, proportion of species sampled that are singletons (i.e., abundance = 1).
Antarctic
310
Isopods (ANI)
Antarctic
109
Molluscs (ANM)
Tuscany
577
Archipelago
Fish (TAP)
Indo-Pacific Coral
336
Commensals
(IPC)
SE Australia: Shelf
660
Fish (SEF)
W Australia: Deep
373
Fish (WAF)
Scotian Shelf
855
Fish (SSF)
Eastern Bass Strait 1,519
Invertebrates
(EBS)
Sunderban
670
Zooplankton
(SUZ)
Great Barrier
591
Reef Fish (GBR)
Central Pacific
9,677
Reef Fish (CPF)
Norwegian Shelf
676
Macrobenthos
(NSM)
North Sea
806
Invertebrates
(NSI)
Bass Strait
31,148
Intertidal (BSI)
Dataset
Site
Table S2. Summary of the abundance distributions analyzed, at each of the three scales
Table S3. Groupings used and model selection for single-guild analysis
Dataset
Guild name
Antarctic Isopods (ANI)
Antarctic Molluscs (ANM)
Tuscany Archipelago Fish (TAP)
Indo-Pacific Coral Crustaceans (IPC)
SE Australia: Shelf Fish (SEF)
W Australia: Deep Fish (WAF)
Scotian Shelf Fish (SSF)
Eastern Bass Strait Invertebrates (EBS)
Sunderban Zooplankton (SUZ)
Great Barrier Reef Fish (GBR)
Central Pacific Reef Fish (CPF)
Norwegian Shelf Macrobenthos (NSM)
North Sea Invertebrates (NSI)
Bass Strait Intertidal (BSI)
Overall
% support for lognormal
No. of species
in group
% of sites
fitted at site level
Site
Mesoscale
Regional
486
24
38
334
101
121
48
347
27
60
164
76
82
0
100
100
57
38
6
91
100
53
100
4
>0.9999
NA
>0.9999
>0.9999
>0.9999
0.0199
0.0727
>0.9999
>0.9999
>0.9999
>0.9999
0.6079
>0.9999
0.7048
0.9999
NA
0.9999
>0.9999
>0.9999
>0.9999
NA
>0.9999
0.9998
0.9902
>0.9999
0.8639
0.9973
>0.9999
0.8741
0.9958
0.9998
0.2905
0.9772
0.9515
0.8776
0.7062
74
49
74
100
0.6953
>0.9999
>0.9999
0.0902
>0.9999
>0.9999
0.0593
0.9937
>0.9999
Detritus feeders
Suspension feeders
Benthic feeders
Decapods
Invertivores, benthic prey
Invertivores, benthic prey
Invertivores
Deposit feeders
Planktivores
Herbivores
Invertivores
Deposit feeders, Malacostraca
only
Deposit feeders
Grazers
Percentage support values indicate relative support for the Poisson-lognormal over Poisson-gamma model fitted to the species abundance data at three
scales: site level, mesoscale, and regional. Each row represents a different dataset. For ANM, there were no sites with more than five distinct species abundance
values for the most species-rich functional group, so model selection was only done at the mesoscale and regional scale. For IPC and SUZ, there were too few
species abundance distributions to create mesoscale groupings. The last row is an overall test, based on summing the log-likelihoods across all datasets. Where
lognormal model has at least 95% support, the model’s weight is shown in bold. Where Poisson-gamma model has at least 95% support, the model’s weight is
shown underlined.
Table S4. Mixed-effects and OLS regression model results for analysis of standardized relative
support for the Poisson-lognormal
Explanatory variable
Site level
Intercept only
Log(no. of distinct abundances)
Maximum relative abundance
Proportion of singletons
Mesoscale
Intercept only
Log(no. of distinct abundances)
Maximum relative abundance
Proportion of singletons
Regional
Intercept only
Log(no. of distinct abundances)
Maximum relative abundance
Proportion of singletons
Overall slope† ± SE
t‡
NA
−0.009 ± 0.015
0.386 ± 0.020
0.074 ± 0.051
NA
−0.58
19.66***
1.45
NA
−0.009 ± 0.013
0.261 ± 0.051
0.046 ± 0.074
Slope ± SE
NA
−0.039 ± 0.014
0.288 ± 0.043
−0.160 ± 0.108
NA
−0.74
5.08***
0.63
t
NA
−2.83*
6.75***
−1.48
Marginal R2
§
Pseudo-R2
AIC
0.00
<0.01
0.47
0.01
0.08
0.11
0.60
0.14
−1,879.6
−1,883.2
−2,641.5
−1,913.6
0.00
0.01
0.24
<0.01
0.00
0.01
0.71
<0.01
−186.4
−181.0
−270.8
−180.8
AIC
−43.3
−48.5
−63.3
−43.7
R2
0.00
0.40
0.79
0.16
Except for Akaike’s information criterion (AIC), which was calculated from maximum-likelihood fits, all values
reported in the table were obtained using restricted maximum likelihood (REML).
†
“Overall slope” refers to the fixed effects component of the model.
‡
The t statistic for the slope parameter. The number of asterisks indicates the level of statistical significance:
*0.01 < P < 0.05, **0.001 < P < 0.01, and ***P < 0.001.
§
Percentage of variation explained by the fixed effect only.
{
Calculated from the residuals of the full fitted model ð1 − ðσ 2resid =σ 2tot ÞÞ.
Connolly et al. www.pnas.org/cgi/content/short/1406664111
14 of 14