Commonness and rarity in the marine biosphere Sean R. Connollya,1, M. Aaron MacNeilb, M. Julian Caleyb, Nancy Knowltonc,1, Ed Crippsd, Mizue Hisanoa, Loïc M. Thibauta, Bhaskar D. Bhattacharyae, Lisandro Benedetti-Cecchif, Russell E. Brainardg, Angelika Brandth, Fabio Bullerif, Kari E. Ellingseni, Stefanie Kaiserj, Ingrid Krönckek, Katrin Linsel, Elena Maggif, Timothy D. O’Haram, Laetitia Plaisancec, Gary C. B. Poorem, Santosh K. Sarkare, Kamala K. Satpathyn, Ulrike Schückelk, Alan Williamso, and Robin S. Wilsonm a School of Marine and Tropical Biology, and Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD 4811, Australia; bAustralian Institute of Marine Science, Townsville, QLD 4810, Australia; cNational Museum of Natural History, Smithsonian Institution, Washington, DC 20013; dSchool of Mathematics and Statistics, University of Western Australia, Perth, WA 6009, Australia; eDepartment of Marine Science, University of Calcutta, Calcutta 700 019, India; fDipartimento di Biologia, University of Pisa, I-56126 Pisa, Italy; gCoral Reef Ecosystem Division, Pacific Islands Fisheries Science Center, National Oceanic and Atmospheric Administration, Honolulu, HI 96818; hBiocenter Grindel and Zoological Museum, University of Hamburg, 20146 Hamburg, Germany; iNorwegian Institute for Nature Research, FRAM–High North Research Centre for Climate and the Environment, 9296 Tromsø, Norway; jGerman Centre for Marine Biodiversity Research, Senckenberg am Meer, 26382 Wilhelmshaven, Germany; kMarine Research Department, Senckenberg am Meer, 26382 Wilhelmshaven, Germany; lBritish Antarctic Survey, Cambridge CB3 0ET, United Kingdom; mMuseum Victoria, Melbourne, VIC 3001, Australia; nEnvironment and Safety Division, Indira Gandhi Centre for Atomic Research, Kalpakkam 603 102, India; and oCommonwealth Scientific and Industrial Research Organization, Marine and Atmospheric Research, Marine Laboratories, Hobart, TAS 7001, Australia Explaining patterns of commonness and rarity is fundamental for understanding and managing biodiversity. Consequently, a key test of biodiversity theory has been how well ecological models reproduce empirical distributions of species abundances. However, ecological models with very different assumptions can predict similar species abundance distributions, whereas models with similar assumptions may generate very different predictions. This complicates inferring processes driving community structure from model fits to data. Here, we use an approximation that captures common features of “neutral” biodiversity models—which assume ecological equivalence of species—to test whether neutrality is consistent with patterns of commonness and rarity in the marine biosphere. We do this by analyzing 1,185 species abundance distributions from 14 marine ecosystems ranging from intertidal habitats to abyssal depths, and from the tropics to polar regions. Neutrality performs substantially worse than a classical nonneutral alternative: empirical data consistently show greater heterogeneity of species abundances than expected under neutrality. Poor performance of neutral theory is driven by its consistent inability to capture the dominance of the communities’ most-abundant species. Previous tests showing poor performance of a neutral model for a particular system often have been followed by controversy about whether an alternative formulation of neutral theory could explain the data after all. However, our approach focuses on common features of neutral models, revealing discrepancies with a broad range of empirical abundance distributions. These findings highlight the need for biodiversity theory in which ecological differences among species, such as niche differences and demographic trade-offs, play a central role. | metacommunities marine macroecology Poisson-lognormal distribution sizes, or other characteristics of an assemblage, and then deriving the aggregate assemblage properties implied by those assumptions (4–8). One of the most commonly investigated of these assemblage-level properties is the species abundance distribution (SAD)—the pattern of commonness and rarity among species (9–11). Ecologists have long sought to identify mechanisms that can explain common features of, and systematic differences among, the shapes of such distributions, and have used the ability to reproduce empirical SADs as a key test of biodiversity theory in species-rich systems (4, 6, 11–14). Over the last decade, one of the most prevalent and influential approaches to explaining the structure of high-diversity assemblages has been neutral theory of biodiversity (12, 15, 16). Neutral models assume that individuals are demographically and ecologically equivalent, regardless of species. Thus, variation in relative Significance Tests of biodiversity theory have been controversial partly because alternative formulations of the same theory seemingly yield different conclusions. This has been a particular challenge for neutral theory, which has dominated tests of biodiversity theory over the last decade. Neutral theory attributes differences in species abundances to chance variation in individuals’ fates, rather than differences in species traits. By identifying common features of different neutral models, we conduct a uniquely robust test of neutral theory across a global dataset of marine assemblages. Consistently, abundances vary more among species than neutral theory predicts, challenging the hypothesis that community dynamics are approximately neutral, and implicating species differences as a key driver of community structure in nature. | species coexistence | D etermining how biodiversity is maintained in ecological communities is a long-standing ecological problem. In species-poor communities, niche and demographic differences between species can often be estimated directly and used to infer the importance of alternative mechanisms of species coexistence (1–3). However, the “curse of dimensionality” prevents the application of such species-by-species approaches to high-diversity assemblages: the number of parameters in community dynamics models increases more rapidly than the amount of data, as species richness increases. Moreover, most species in high-diversity assemblages are very rare, further complicating the estimation of strengths of ecological interactions among species, or covariation in different species’ responses to environmental fluctuations. Consequently, ecologists have focused instead on making assumptions about the overall distribution of demographic rates, niche www.pnas.org/cgi/doi/10.1073/pnas.1406664111 Author contributions: S.R.C., M.A.M., and M.J.C. designed research; N.K., B.D.B., L.B.-C., R.E.B., A.B., F.B., K.E.E., S.K., I.K., K.L., E.M., T.D.O., L.P., G.C.B.P., S.K.S., K.K.S., U.S., A.W., and R.S.W. performed research; S.R.C., E.C., M.H., and L.M.T. analyzed data; S.R.C. designed analytical approach; M.J.C. initiated the collaboration between Census of Marine Life and S.R.C.; N.K. was the initiator of Census of Marine Life Project (producer of dataset) for coral reefs; N.K. designed data collection protocols; B.D.B., L.B.-C., R.E.B., A.B., F.B., K.E.E., S.K., I.K., K.L., E.M., T.D.O., L.P., G.C.B.P., S.K.S., K.K.S., U.S., A.W., and R.S.W. provided advice on treatment of data for analyses; and S.R.C., M.A.M., M.J.C., N.K., and E.C. wrote the paper. Reviewers: B.M., University of Maine; F.H., University of Alberta. The authors declare no conflict of interest. Freely available online through the PNAS open access option. 1 To whom correspondence may be addressed. E-mail: [email protected] or Sean. [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1406664111/-/DCSupplemental. PNAS Early Edition | 1 of 6 ECOLOGY Contributed by Nancy Knowlton, April 28, 2014 (sent for review November 25, 2013; reviewed by Brian McGill and Fangliang He) abundance among species arises purely from demographic stochasticity: chance variation in the fates of individuals (i.e., birth, death, immigration, and speciation events). Most studies investigating neutral theory aim to determine whether community structure in nature is consistent with the theory’s core species equivalence assumption. This is typically done by assessing the fit of a neutral model to empirical data, sometimes relative to a putatively nonneutral alternative (17–20). However, although all neutral models share the species equivalence assumption, they differ with respect to auxiliary assumptions, such as the mode of speciation assumed, leading to different predictions for SADs and other ecological patterns. Indeed, attempts to draw conclusions from tests of neutral theory are almost invariably disputed, largely due to arguments about the extent to which alternative auxiliary assumptions can materially alter neutral theory’s ability to explain the data (11, 12, 18, 21). An alternative, potentially more robust approach to evaluating neutral theory was proposed by Pueyo (22), based on approximating neutral and nonneutral dynamics as successively higherorder perturbations of a model for the idealized case of pure random drift in abundances. This approach predicts that a gamma distribution should approximate the distribution of species abundances for small departures for random drift, whereas assemblages exhibiting greater departures from neutrality should be better approximated by a lognormal distribution. This raises the possibility that a comparison of gamma and lognormal SADs could offer a robust test for the signature of nonneutrality in species abundance data, provided that the gamma distribution provides a sufficiently close approximation to SADs produced by neutral models. Here, we evaluate Pueyo’s framework and apply it to patterns of commonness and rarity in 1,185 SADs from 14 marine ecosystems ranging from shallow reef platforms to abyssal depths, and from the tropics to polar regions (Fig. 1 and Tables S1 and S2). First, we test the gamma neutral approximation against several models of community dynamics that share the core neutrality assumption of species equivalence, but make different assumptions about the speciation process, spatial structure of the metacommunity, and the nature of competition between individuals. Then, we analyze the marine species abundance data, to evaluate whether they are consistent with the hypothesis that marine assemblages are neutrally structured. Finally, we ask whether patterns of commonness and rarity deviate from neutral expectation in Fig. 1. Sampling locations of SADs. Color and symbol combinations correspond to particular ecosystems. These symbols are reproduced in the surrounding panels, which show observed and fitted SADs for the site-level data (averaged across sites) of the corresponding ecosystem. The bars represent the mean proportion of species at each site in different octave classes of abundance, across all sites in the corresponding dataset [the first bar represents species with abundance 1, then abundances 2–3, abundances 4–7, etc. (10)]. The blue and red lines show the mean of fitted values from site-by-site fits of the Poissongamma and Poisson-lognormal distributions to the data, respectively. 2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1406664111 Connolly et al. Results A gamma distribution of species abundances closely approximates several alternative neutral models across a broad range of neutral model parameter values (Fig. S1; see SI Results for further discussion). Moreover, the gamma consistently outperforms the lognormal when fitted to data simulated from neutral models. Specifically, as the number of distinct species abundance values in the simulated data increases, the relative support for the gamma distribution becomes consistently stronger for all of the neutral models we considered (Fig. 2A). This reflects the fact that datasets with only a small number of abundance values (e.g., a site containing 11 species, 10 of which are only represented by one individual) provide very little information about the shape of the SAD, whereas those with more abundance values provide more information (e.g., a site with 100 species whose abundances are spread over 10–20 different values). In contrast to their relative fit to simulated neutral SADs, the lognormal consistently outperforms the gamma distribution when fitted to real marine species abundance data. When considered in terms of average support per SAD, relative support for the lognormal becomes consistently stronger as the number of observed species abundance values increases, in direct contrast with the simulated neutral data (Fig. 2B). Moreover, when the strength of evidence is considered cumulatively across all sites for each dataset, the lognormal has well over 99% support as the better model in each case (Table 1). This substantially better fit of the lognormal is retained in every case when data are pooled to the mesoscale, and, in all cases save one, when data are pooled at the regional scale (Table 1, Fig. 2B, and Figs. S2 and S3). The lognormal also remained strongly favored when we tested the robustness of our results by classifying species into taxonomic and ecological guilds, and restricting our analysis to the most species-rich guild within each dataset (see SI Results and Table S3). Inspection of the lack of fit of the gamma neutral approximation indicates that it deviates from the data in highly consistent ways: real SADs exhibit substantially more heterogeneity than the gamma distribution can generate (Fig. 3). Specifically, the gamma is unable to simultaneously capture the large number of rare species and the very high abundances of the most common Independent Spatial ANI ANM BSI CPF EBS GBR IPC NSI NSM SEF SSF SUZ TAP WAF B 0.4 0.8 A Site Meso Regional 0.0 Mean Akaike weight for lognormal over gamma Original Fission Protracted 10 20 50 100 200 10 20 50 100 200 Median number of distinct abundance values Fig. 2. Species abundances are better approximated by (A) a gamma distribution for simulated neutral communities, but (B) a lognormal distribution for the empirical data. Percentage support for the lognormal versus the gamma is plotted as a function of the number of observed distinct species abundances. In A, different neutral models are plotted with different colors, and each point represents a particular neutral model parameter combination from Fig. S1. In B, each combination of symbol and color represents a different marine ecosystem, whereas increasing symbol size indicates the increasing scale at which abundances were pooled (site, mesoscale, and regional). Connolly et al. species. For abundance distributions lacking an internal mode (i.e., where the leftmost bar in the SAD is the largest one), this is manifested as an excess of rare species and paucity of species with intermediate abundance, relative to the best-fit neutral approximation (Fig. 3A, blue lines). Conversely, when an internal mode is present in the data, the abundances of the most highly abundant species are consistently higher than the gamma distribution can produce (Fig. 3B, blue lines). In contrast, discrepancies between the data and the lognormal are much smaller in magnitude, and more symmetrically distributed around zero, compared with the gamma (Fig. 3, red lines). Detailed analysis of variation in the strength of evidence against neutrality, within and among datasets, indicates that the relative performance of the lognormal over the gamma is substantially driven by the fact that the most abundant species is, on average, too dominant to be captured by the gamma neutral approximation. After controlling for the effects of the number of abundance values in the sample on statistical power, the relative abundance of the most-abundant species explained over one-half of the variation in the strength of support for the lognormal over the gamma, for site-level, mesoscale, and regional-scale abundance distributions (Table S4, Fig. S4, and SI Results). Conversely, the prevalence of rarity was a poor predictor of the strength of evidence against the gamma neutral approximation (Table S4, Fig. S5, and SI Results). In addition to outperforming the gamma neutral approximation, tests of the absolute goodness of fit of the lognormal suggest that it approximates the observed species abundance data well. Statistically significant lack of fit (at α = 0.05) to the lognormal was detected in 4.8% of sites, approximately equal to what would be expected by chance, under the null hypothesis that the SADs are in fact lognormal. Moreover, lognormal-based estimates of the number of unobserved species in the regional species pool are realistic, and very similar to those produced by an alternative, nonparametric jackknife method that relies on presence–absence rather than abundance data (Fig. 4). Discussion Recently, the use of SADs to test biodiversity theory has been criticized because different species abundance models often generate very similar predictions, which can be difficult to distinguish when fitted to species abundance data (9). Consequently, some researchers have focused on other properties of assemblages, such as community similarity (12), species–area and species–time relationships (23, 24), and relationships between species traits or phylogeny and species abundance (25, 26). Such approaches are powerful when evaluating the performance of particular species abundance models. However, because models combine multiple assumptions, attributing a model’s failure to one assumption in particular, such as species equivalence, is problematic. Indeed, in the debate over neutral theory of biodiversity, studies that show failure of a neutral model (12, 25, 27) are almost invariably followed by responses showing that packaging neutrality with a different set of alternative assumptions can explain the data after all (11, 28, 29). Although the identification of alternative auxiliary assumptions that preserve a theory’s core prevents premature abandonment of a promising theory, it also can hinder progress by inhibiting the reallocation of scientific effort to more promising research programs (30). Given the proliferation of alternative theories of biodiversity (8, 14, 31, 32), identifying and testing predictions that are robust to auxiliary assumptions, and therefore better target a theory’s core assumptions, should be a high priority. Here, we showed that, as previously hypothesized (22), a gamma distribution successfully captures features common to several models that share the core neutrality assumptions of species equivalence, but make very different auxiliary assumptions. We then found that this approximation cannot simultaneously PNAS Early Edition | 3 of 6 ECOLOGY idiosyncratic ways, or whether there are particular features of real SADs that cannot be captured by neutral models. Table 1. Model selection for analysis at whole-dataset level Site Dataset Antarctic Isopods (ANI) Antarctic Molluscs (ANM) Tuscany Archipelago Fish (TAP) Indo-Pacific Coral Crustaceans (IPC) SE Australia: Shelf Fish (SEF) W Australia: Deep Fish (WAF) Scotian Shelf Fish (SSF) Eastern Bass Strait Invertebrates (EBS) Sunderban Zooplankton (SUZ) Great Barrier Reef Fish (GBR) Central Pacific Reef Fish (CPF) Norwegian Shelf Macrobenthos (NSM) North Sea Invertebrates (NSI) Bass Strait Intertidal (BSI) Overall Mesoscale Regional LN Gamma LN Gamma LN Gamma >0.9999 0.9981 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 <0.0001 0.0019 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 >0.9999 0.9995 0.9999 NA >0.9999 >0.9999 >0.9999 >0.9999 NA >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 >0.9999 <0.0001 0.0005 0.0001 NA <0.0001 <0.0001 <0.0001 <0.0001 NA <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 >0.9999 0.9992 0.9983 >0.9999 >0.9999 >0.9999 0.9984 0.0087 0.9892 >0.9999 >0.9999 >0.9999 >0.9999 0.9998 >0.9999 <0.0001 0.0008 0.0017 <0.0001 <0.0001 <0.0001 0.0016 0.9913 0.0108 <0.0001 <0.0001 <0.0001 <0.0001 0.0002 <0.0001 Values indicate percentage support for the gamma and lognormal (LN) models fitted to the species abundance data at three scales: site level, mesoscale, and regional. Each row represents a different dataset. For IPC and SUZ, there were too few SADs to create mesoscale groupings. The last row is an overall test, based on summing the log-likelihoods across all datasets. Where one of the models has at least 95% support, the model’s weight is shown in bold. account for the large number of very rare species, and the very high abundances of the most abundant species, in marine species abundance data. These discrepancies are highly consistent across diverse taxa, habitat types, and geographical locations, suggesting that this shortcoming of neutral theory is likely to be widespread in nature. In contrast to the simulated neutral assemblages, analysis of the real data reveals that the lognormal distribution captures much better the observed heterogeneity in species abundances, and produces realistic estimates of the number of unobserved species at the regional scale. Lognormal SADs are generated by models that make a variety of different assumptions about how variation in abundances is determined, but all those proposed to date explicitly or implicitly incorporate niche or demographic differences among species. For instance, Engen and Lande (33) derived the lognormal from stochastic community dynamics models that allow for differences in species’ intrinsic growth 0.0 0.2 0.4 0.6 0.8 1.0 6 4 2 0 −6 −4 −2 0.00 0.10 Residuals (log scale) B −0.10 Residuals (arithmetic scale) A 0.0 0.2 0.4 0.6 0.8 1.0 Normalized abundance class Fig. 3. Residuals (observed minus predicted number of species) of gamma (blue lines) versus lognormal (red lines) fits, illustrating why the gamma performs worse than the lognormal. Each line represents one ecosystem at one spatial scale, with log2 abundance class normalized so that each dataset’s uppermost octave is 1.0. (A) Residuals for abundance distributions lacking an internal mode, plotted on an arithmetic scale to highlight the lack of fit for the smaller (leftmost) abundance classes. (B) Residuals for abundance distributions with an internal mode, plotted on a logarithmic scale to illustrate the lack of fit for the larger (rightmost) abundance classes. 4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1406664111 rates and carrying capacities, and for arbitrary patterns of covariation in species’ responses to environmental fluctuations (4). More generally, Pueyo (22) derived the lognormal as a generic approximation for deviations from neutral community dynamics. Alternatively, niche apportionment models explain variation in species abundances based on variation in niche size, rather than stochastic population fluctuations (8). The lognormal can arise from the hierarchical apportionment of niche space due to a central limit theorem-like phenomenon (34). It can also be expected to arise more generally when there are multiple niche dimensions, and niche overlap may occur. As niche dimensions increase, the niche hypervolume becomes the product of idiosyncratically varying niche widths along an increasingly large number of niche axes. Provided abundance is proportional to the niche hypervolume (potentially along with other factors, such as body size and the proportion of resources within the niche that have not been preempted by other species), abundance will tend toward a lognormal distribution, again by the central limit theorem (13, 18). The fairly general conditions under which lognormal distributions can arise suggests that a broad range of nonneutral characteristics, such as high-dimensional niche differences, lowdimensional life history trade-offs, or differential responses to environmental fluctuations, may give rise to approximately lognormal abundance distributions, such as those found in this study (13, 18, 22). Thus, although the comparatively poor performance of the gamma relative to the lognormal provides strong evidence against neutral dynamics as the principal driver of variation in abundances among species, the seemingly excellent fit of the lognormal distributions cannot, by itself, unambiguously identify which nonneutral processes are most responsible for the heterogeneity in species abundances in a given assemblage. Consequently, methods for apportioning the heterogeneity in species abundances among its potential causes, and which can yield inferences about species-rich assemblages, warrant further development. Some such methods, such as those based on relationships between abundance and species’ functional traits (34), or on spatiotemporal trends in relative abundances (35), explicitly predict lognormal SADs, and therefore are likely to be particularly promising. Connolly et al. 100 200 500 1000 Jackknife estimate Fig. 4. Agreement between lognormal-based and nonparametric estimates of the total number of species in the community. Points on the horizontal axis are richness estimates produced by the nonparametric jackknife, based on presence–absence data across sites. The points on the vertical axis are estimates produced by the lognormal model, fitted to the pooled regional abundance distributions. Error bars are 95% confidence intervals. The solid line is the unity line, where the lognormal and the nonparametric jackknife produce the same estimate of the number of unobserved species. Conclusions Neutral theory explains variation in the abundances and distribution of species entirely as a consequence of demographic stochasticity—chance variation in the fates of individuals (15, 36). Although proponents of neutral theory have always acknowledged the existence of ecological differences between species, neutral theory assumes that those differences are overwhelmed by the phenomena that are explicitly included in neutral models (14, 36). The formulation and testing of neutral theory has drawn attention to the potential importance of demographic stochasticity as a process that contributes to differences in species abundances that are unrelated to species’ ecological traits, such as niche size or competitive ability. Such effects should be particularly important among rare species (4). Indeed, our finding that there are common features of different neutral models suggests that it can play a role as a robust null expectation, at least for some aspects of community structure (16). However, the most abundant few species often numerically dominate communities and play a disproportionately large role in community and ecosystem processes (37). We have shown that neutral theory consistently underestimates amongspecies heterogeneity in abundances across a broad range of marine systems. The fact that its performance is closely linked to abundances of the most common species indicates that it is the ecological dominance of these very highly abundant species that cannot be explained by neutral processes alone. Commonness itself is poorly understood, but the identities of the most common species in ecosystems tend to remain quite consistent over ecological timescales (38). Thus, the key to understanding the distribution of abundances in communities, even species-rich ones, may lie as much in understanding how the characteristics of common species allow them to remain so abundant, as in understanding the dynamics and persistence of rare species. Materials and Methods Approximating Neutrality. Pueyo’s framework starts with a stochastic differential equation for random drift in population size (i.e., birth rate equals death rate, no density dependence, immigration, emigration, or speciation) and considers approximating departures from this model in terms of successively higher-order perturbations to it. Here, we take as our candidate neutral approximation the gamma distribution and, as our alternative model, the lognormal distribution. More specifically, because species abundance data are discrete, whereas the gamma and lognormal are continuous distributions, we use the Poisson-gamma (i.e., negative binomial) and Poisson-lognormal mixture distributions, as these distributions are commonly used to approximate discrete, random samples from underlying gamma or lognormal community abundance distributions (see SI Materials and Methods for further details). To assess whether the Poisson-gamma distribution provides a good approximation to the SADs produced under neutrality, we tested it against five Connolly et al. Empirical Data. Data were contributed to the Census of Marine Life (CoML) project and represent a diverse range of taxa, ocean realms, depths, and geographic locations (Table S1). To be included in our analysis, contributed data needed to meet several criteria (see SI Materials and Methods for details). Where datasets included samples over multiple years from the same sites, only the most recent year of data was used. Finally, we only fitted SADs if they contained more than five distinct species abundance values, to minimize convergence problems associated with fitting species abundance models to very sparse data. However, the data from such sites were still used in the analyses that pooled abundance distributions at larger scales. Fitting Models to Species Abundance Data. For both the simulated neutral data, and the real species abundance data, we fitted our models using maximum-likelihood methods (see SI Materials and Methods for details). For the empirical data, in addition to fitting our species abundance models at the site level, we also fitted pooled species abundances at a mesoscale level, and at the regional (whole-dataset) level. For datasets that were spatially hierarchically organized, we used this hierarchy to determine how to pool sites at the mesoscale [e.g., for the Central Pacific Reef Fish (CPF) data, sites were nested within islands, so pooling was done to the island level]. For data that were not explicitly hierarchically organized [Antarctic Isopods (ANI), Antarctic Molluscs (ANM), Scotian Shelf Fish (SSF), Bass Strait Intertidal (BSI)], cluster analysis was used to identify mesoscale-level groupings. In two cases [Sunderban Zooplankton (SUZ), Indo-Pacific Coral Crustaceans (IPC)], there were only a few sites sampled, and no natural hierarchical structure, so these data were omitted from the mesoscale analysis. For both the analysis of the marine species abundance data, and the analysis of the simulated neutral communities, model selection was based on Akaike weights, which are calculated from Akaike’s information criterion values and estimate the probability (expressed on a scale of 0–1) that a model is actually the best approximating model in the set being considered. Because the Poisson-gamma and the Poisson-lognormal have the same number of estimated parameters, this is equivalent to calculating model weights based on the Bayesian information criterion. For the empirical data, model selection was done at the whole-dataset level by summing the loglikelihoods for all individual sites (for the site-level analysis) or mesoscale (for the mesoscale analysis) abundance distributions for a dataset, and calculating Akaike weights based on these values (Table 1). However, this approach does not make sense for the analysis of the simulated neutral SADs, because an arbitrary degree of confidence can be obtained by simulating a large number of sites. Therefore, we instead calculated an expected level of model support on a per-SAD basis, for each neutral model and parameter combination, by calculating the mean difference in log-likelihoods across the 100 simulated datasets, and converting this mean into an Akaike weight. We examined these Akaike weights as functions of the number of distinct observed species abundance values, because we would expect our ability to distinguish between alternative models to increase as the number of distinct observed species abundance values increases. For comparison, we also calculated per-SAD Akaike weights for the marine species abundance data. This approach is less powerful than the aggregate whole-dataset comparisons shown in Table 1, but it facilitates visualization of the differences between the simulated neutral SADs (Fig. 2A) and the real marine SADs (Fig. 2B). Analysis of Variation in Performance of Neutral Approximation. The discrepancies between the data and the gamma neutral approximation suggest that real data exhibit too much heterogeneity in species’ abundances to be captured by the neutral approximation. To better understand this, we examined whether the relative model support varied systematically within or among datasets as a function of the prevalence of rare species, and the abundances of the most abundant species. As a measure of relative model support, we used a per-observation difference in log-likelihoods (see SI Materials and Methods for details). We first confirmed that this standardization PNAS Early Edition | 5 of 6 ECOLOGY 1000 200 50 Lognormal estimate 50 different neutral models: Hubbell’s original neutral model (39), a protracted speciation neutral model (21), a fission speciation model (40), an independent species model (11, 41), and a spatially explicit neutral model (42). We chose these five models because they encompass models that relax key assumptions of neutral theory as originally formulated; moreover, each of them meets a strict definition of neutrality: every individual has the same demographic rates, and the same per-capita effects on other individuals, regardless of species. We tested the approximation in two ways. First, we assess how closely (in absolute terms) the Poisson-gamma can approximate neutral abundance distributions. Second, we assess whether the Poisson-gamma outperforms the Poisson-lognormal when fitted to data generated according to neutral model assumptions (see SI Materials and Methods for details). controlled for the effect of sample size on statistical power (i.e., the trend illustrated in Fig. 2B). Then, we asked whether the variation in standardized model support was better explained by the numerical dominance of the most common species, or by the prevalence of very rare species, using mixed-effects linear models. ACKNOWLEDGMENTS. U.S. acknowledges S. Ehrich and A. Sell for providing ship time. The authors thank all participants in the Census of Marine Life project, particularly S. Campana, M. Sogin, K. Stocks, and L. A. Zettler. They also thank R. Etienne for providing advice for obtaining numerical solutions of the fission speciation neutral model, and J. Rosindell and S. Cornell for sharing simulated neutral community data from their spatially explicit neutral model. The authors thank T. Hughes for comments on an early version of the manuscript. K.E.E. acknowledges The Norwegian Oil and Gas Association for permitting use of data. A.B. acknowledges the support of the Ministry for Science and Technology and the German Research Foundation (Deutsche Forschungsgemeinschaft) for support of the Antarctic benthic deep-sea biodiversity (ANDEEP) and ANDEEP-System Coupling (SYSTCO) expeditions, as well as five PhD positions. A.B. also thanks the AlfredWegener-Institute for Polar and Marine Research for logistic help, as well as the crew of the vessel and all pickers, sorters and identifiers of the extensive deep-sea material. The Census of Marine Life funded the assembly of the metadataset. Analysis of the data was made possible by funding from the Australian Research Council (to S.R.C.). 1. Angert AL, Huxman TE, Chesson P, Venable DL (2009) Functional tradeoffs determine species coexistence via the storage effect. Proc Natl Acad Sci USA 106(28):11641–11645. 2. Levine JM, HilleRisLambers J (2009) The importance of niches for the maintenance of species diversity. Nature 461(7261):254–257. 3. Adler PB, Ellner SP, Levine JM (2010) Coexistence of perennial plants: An embarrassment of niches. Ecol Lett 13(8):1019–1029. 4. Sæther BE, Engen S, Grøtan V (2013) Species diversity and community similarity in fluctuating environments: Parametric approaches using species abundance distributions. J Anim Ecol 82(4):721–738. 5. Scheffer M, van Nes EH (2006) Self-organized similarity, the evolutionary emergence of groups of similar species. Proc Natl Acad Sci USA 103(16):6230–6235. 6. MacArthur JW (1960) On the relative abundance of species. Am Nat 94(874):25–34. 7. Clark JS (2010) Individuals and the variation needed for high species diversity in forest trees. Science 327(5969):1129–1132. 8. Tokeshi M (1999) Species Coexistence: Ecological and Evolutionary Perspectives (Blackwell Science, Oxford). 9. McGill BJ, et al. (2007) Species abundance distributions: Moving beyond single prediction theories to integration within an ecological framework. Ecol Lett 10(10): 995–1015. 10. Connolly SR, Hughes TP, Bellwood DR, Karlson RH (2005) Community structure of corals and reef fishes at multiple scales. Science 309(5739):1363–1365. 11. Volkov I, Banavar JR, Hubbell SP, Maritan A (2007) Patterns of relative species abundance in rainforests and coral reefs. Nature 450(7166):45–49. 12. Dornelas M, Connolly SR, Hughes TP (2006) Coral reef diversity refutes the neutral theory of biodiversity. Nature 440(7080):80–82. 13. May RM (1975) Ecology and Evolution of Communities, eds Cody ML, Diamond JM (Belknap Press of Harvard Univ Press, Cambridge, MA), pp 81–120. 14. Hubbell SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography (Princeton Univ Press, Princeton). 15. Clark JS (2009) Beyond neutral science. Trends Ecol Evol 24(1):8–15. 16. Rosindell J, Hubbell SP, He FL, Harmon LJ, Etienne RS (2012) The case for ecological neutral theory. Trends Ecol Evol 27(4):203–208. 17. McGill BJ (2003) A test of the unified neutral theory of biodiversity. Nature 422(6934): 881–885. 18. Connolly SR, Dornelas M, Bellwood DR, Hughes TP (2009) Testing species abundance models: A new bootstrap approach applied to Indo-Pacific coral reefs. Ecology 90(11): 3138–3149. 19. Etienne RS, Olff H (2005) Confronting different models of community structure to species-abundance data: A Bayesian model comparison. Ecol Lett 8(5):493–504. 20. Muneepeerakul R, et al. (2008) Neutral metacommunity models predict fish diversity patterns in Mississippi-Missouri basin. Nature 453(7192):220–222. 21. Rosindell J, Cornell SJ, Hubbell SP, Etienne RS (2010) Protracted speciation revitalizes the neutral theory of biodiversity. Ecol Lett 13(6):716–727. 22. Pueyo S (2006) Diversity: Between neutrality and structure. Oikos 112(2):392–405. 23. Adler PB (2004) Neutral models fail to reproduce observed species-area and speciestime relationships in Kansas grasslands. Ecology 85(5):1265–1272. 24. McGill BJ, Hadly EA, Maurer BA (2005) Community inertia of Quaternary small mammal assemblages in North America. Proc Natl Acad Sci USA 102(46):16701–16706. 25. Ricklefs RE, Renner SS (2012) Global correlations in tropical tree species richness and abundance reject neutrality. Science 335(6067):464–467. 26. Bode M, Connolly SR, Pandolfi JM (2012) Species differences drive nonneutral structure in Pleistocene coral communities. Am Nat 180(5):577–588. 27. Wills C, et al. (2006) Nonrandom processes maintain diversity in tropical forests. Science 311(5760):527–531. 28. Lin K, Zhang DY, He FL (2009) Demographic trade-offs in a neutral model explain death-rate—abundance-rank relationship. Ecology 90(1):31–38. 29. Chen AP, Wang SP, Pacala SW (2012) Comment on “Global correlations in tropical tree species richness and abundance reject neutrality” Science 336(6089):1639, author reply 1639. 30. Lakatos I (1978) The Methodology of Scientific Research Programmes (Cambridge Univ Press, Cambridge, UK). 31. McGill BJ (2010) Towards a unification of unified theories of biodiversity. Ecol Lett 13(5):627–642. 32. Harte J (2011) Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics (Oxford Univ Press, Oxford). 33. Engen S, Lande R (1996) Population dynamic models generating the lognormal species abundance distribution. Math Biosci 132(2):169–183. 34. Sugihara G, Bersier LF, Southwood TRE, Pimm SL, May RM (2003) Predicted correspondence between species abundances and dendrograms of niche similarities. Proc Natl Acad Sci USA 100(9):5246–5251. 35. Engen S, Lande R, Walla T, DeVries PJ (2002) Analyzing spatial structure of communities using the two-dimensional poisson lognormal species abundance model. Am Nat 160(1):60–73. 36. Alonso D, Etienne RS, McKane AJ (2006) The merits of neutral theory. Trends Ecol Evol 21(8):451–457. 37. Gaston KJ (2010) Ecology. Valuing common species. Science 327(5962):154–155. 38. Gaston KJ (2011) Common ecology. Bioscience 61(5):354–362. 39. Etienne RS, Alonso D (2005) A dispersal-limited sampling theory for species and alleles. Ecol Lett 8(11):1147–1156. 40. Etienne R, Haegeman B (2011) The neutral theory of biodiversity with random fission speciation. Theor Ecol 4(1):87–109. 41. He F (2005) Deriving a neutral model of species abundance from fundamental mechanisms of population dynamics. Funct Ecol 19(1):187–193. 42. Rosindell J, Cornell SJ (2013) Universal scaling of species-abundance distributions across multiple scales. Oikos 122(7):1101–1111. Testing the Absolute Fit of the Lognormal Distribution. Goodness of fit of the lognormal distribution to the empirical data was assessed with parametric bootstrapping (see SI Materials and Methods for details). Also, for each dataset’s regional-scale SAD, we compared lognormal-based estimates of total number of species in the species pool with estimates using the nonparametric jackknife (10). See SI Materials and Methods for further details. 6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1406664111 Connolly et al. Supporting Information Connolly et al. 10.1073/pnas.1406664111 SI Text SI Materials and Methods Candidate Neutral and Nonneutral Approximations. A set of noninteracting populations undergoing pure random drift in population size (birth rate equals death rate, no immigration, emigration, or environmental stochasticity) produces a species abundance distribution in which the probability that a species has a given abundance, n, varies inversely with abundance (1). On log-log scale, this is a straight line with a slope of −1: logðf ðnÞÞ = logðκÞ − logðnÞ; [S1] where f(n) is the probability that a species has abundance n, and κ is a normalizing constant. Neutral models have two characteristics that cause them to depart from the case of pure random drift. First, because species are ecologically identical, there is a constraint on total community size that is independent of species richness. Using a maximum entropy argument, a modification to this power-law model can be derived that accounts for this constraint (1): logðf ðnÞÞ = logðκÞ − logðnÞ − ϕn: [S2] Eq. S2 is equivalent to Fisher’s log-series distribution (1). Second, neutral models also may have characteristics that cause individual species’ dynamics to depart from the pure drift assumption, such as dispersal limitation (2), or unequal birth and death rates (3). Pueyo (1) conceptualizes small departures from pure drift as perturbations to the value of the slope of −1 in Eq. S1. The combination of these two extensions to Eq. S1 yields the following: logðf ðnÞÞ = logðκÞ − β logðnÞ − ϕn: [S3] Note that, by setting β = 1 − k and ϕ = 1=a, and the normalization constant κ = ðΓðkÞak Þ−1 , it becomes apparent that f(n) in Eq. S3 is a gamma distribution with shape k and scale a. Because it is well known that many neutral models can depart markedly from the log-series distribution (2, 4, 5), we take the gamma distribution as our candidate neutral approximation. Increasingly large departures from neutrality might be poorly approximated by a perturbation to the slope of a power-law relationship, in which case a second-order perturbation may be needed, where a quadratic term is added to the first-order model: logðf ðnÞÞ = logðκÞ − β logðnÞ + c ½logðnÞ2 : [S4] μ2 2 2 If we pffiffiffiffiffiset β = 1 − μ=σ , c = −1=ð2σ Þ, and logðκÞ = − 2σ 2 + logð 2π σÞ , then f(n) in Eq. S4 is a lognormal distribution where μ and σ are the mean and SD of log(n), respectively (1). We therefore take the lognormal as our candidate nonneutral approximation. Because the gamma and lognormal distributions are continuous, whereas abundances are integer-valued, and because many species abundance data are incomplete samples from an underlying community abundance distribution, in our analyses we assess our neutral and nonneutral approximations by fitting Poisson-gamma (i.e., negative binomial) and Poisson-lognormal mixture distributions: Z∞ PðrÞ = λ=0 λr e−λ f ðλÞ dλ; r! Connolly et al. www.pnas.org/cgi/content/short/1406664111 [S5] where P(r) is the probability that a species has abundance r in the sample, λ is the mean of the Poisson distribution (and thus integrated out of the likelihood), and f(λ) is either the lognormal or the gamma distribution. These distributions are commonly used to represent random samples of individuals from underlying gamma or lognormal community abundance distributions, respectively (6–8). More specifically, we use the zero-truncated forms of the Poisson-gamma and Poisson-lognormal distributions, because, by definition, a species is not observed in the sample if it has zero abundance (6): pðrÞ = PðrÞ : 1 − Pð0Þ [S6] Assessing the Neutral Approximation. Our five candidate neutral models exhibited a broad range of auxiliary assumptions. In Hubbell’s “original neutral model,” local communities are partially isolated by dispersal from the broader metacommunity, and new species arise with a fixed probability from individual birth events (analogous to mutation events in population-genetic neutral models) (9). The “protracted speciation neutral model” is similar to the original neutral model, but it incorporates a time lag between the appearance of an incipient new lineage, and its recognition as a distinct species (10). In the “fission speciation model,” speciation occurs by random division of existing species (e.g., via vicarance); this model can exhibit a more superficially lognormal-like species abundance pattern than point speciation models, in that its log-abundance distributions are more symmetric about a single mode than other neutral models (5). In the “independent species model” (3, 11), population dynamics are density independent, per-capita birth rate is less than per-capita death rate, and there is a constant immigration rate. Finally, in the spatially explicit neutral model (4), speciation follows a point-mutation process (as in the original neutral model), and dispersal distances follow a Gaussian kernel. The first four models have explicit mathematical expressions for the species abundance distribution at equilibrium, which facilitates formally evaluating the neutral approximation: see equations below). For the spatially explicit neutral model, we used the approximate species abundance distributions generated by simulation in the original paper and kindly provided by the authors (4). As noted in the main text, the strict definition of neutrality that applies to these models contrasts with symmetric models that implicitly allow for niche or demographic differences among species, for instance, by having within-species competition be stronger than between species competition (12), by implicitly including temporal niche differentiation via different responses to environmental fluctuations (13), or by allowing species with different life history types to differ in their speciation rates (14). To assess how well the Poisson-gamma distribution approximates our alternative neutral models, we considered a broad range of neutral model parameter space spanning most of the realistic range for real species abundance data (hundreds to tens of thousands of individuals, and from less than 10 to many hundreds of species). For each neutral model parameter combination, we used the Kullback–Leibler (K-L) divergence, a measure of the information lost when one distribution is used as an approximation for another (15). Specifically, we found the Poisson-gamma distribution parameters that minimized the K-L divergence. For discrete data, such as counts, K-L divergence is as follows: 1 of 14 D= X πðnÞlog n πðnÞ ; pðnÞ [S7] where n indexes the possible values of the random variable (in this case, abundance), π(n) is the distribution being approximated (the relevant neutral model), and p(n) is the approximating model—in this case, the zero-truncated Poisson-gamma distribution (Eq. S6). Because our analysis of the empirical data is largely a comparative assessment of the Poisson-gamma and Poisson-lognormal distributions, our conclusions rely on an implicit assumption that a Poisson-gamma distribution would outperform a Poissonlognormal if data were actually generated by neutral dynamics. Therefore, in addition to assessing the performance of the Poissongamma as a neutral approximation in absolute terms, we also simulated 100 species abundance distributions from each of the 126 equilibrium neutral abundance distributions used in the previous analysis (Fig. S1), and we compared the best-fit Poisson-gamma and Poisson-lognormal distributions for the 12,600 simulated abundance distributions, exactly as we did for the empirical species abundance distributions. Criteria for Empirical Data Inclusion. Our criteria for data inclusion were as follows. First, the data needed to record counts of individual organisms for a given level of sampling effort (e.g., sample volume, or transect area). Second, data needed to be collected by experts (i.e., survey programs including data collected by amateurs were excluded), to minimize the risks of misidentification or miscounting. Third, data needed to be focused on the assemblage level, rather than on specific target species. Fourth, if sampling effort varied within species abundance samples, it had to be possible to standardize to a common level of effort. For instance, if fishes were counted on 10-m2 and 50-m2 transects, then 10/50 = 20% of the individuals on the larger transects were subsampled and pooled with the counts from the smaller transects (16). Three of the datasets we used required subsampling [Great Barrier Reef Fish (GBR), National Oceanic and Atmospheric Administration (NOAA) Central Pacific Reef Fish (CPF), and South East Fishery: Shelf Fish (SEF)]. Model Fitting. To assess the relative performance of the Poissongamma and Poisson-lognormal for both simulated neutral and real species abundance data, we found the gamma or neutral model parameters that maximized the log-likelihood for the zerotruncated forms of the Poisson-gamma and Poisson-lognormal abundance distributions: X nr logðpðrÞÞ; [S8] L= r where nr is the number of species with abundance r in the sample, and p(r) is the zero-truncated probability that a species has abundance r (Eq. S6). Best-fit models were obtained by finding the gamma or neutral model parameters that maximized the loglikelihood for each site. Analysis of Variation in the Shapes of Species Abundance Distributions. To determine whether there was any systematic variation in the strength of evidence for gamma-like versus lognormal-like distributions, and whether any such variation was associated with systematic differences in the patterns of commonness and rarity in communities, we needed a sample-standardized measure of the relative strength of support for a candidate model. Specifically, the maximum log-likelihood for a species abundance model at a given site is the sum of the contributions of each species’ abundance value to the log-likelihood. To control for this effect of the number of observations, we computed, for each site, a per-observation average Connolly et al. www.pnas.org/cgi/content/short/1406664111 log-likelihood: the site’s maximum log-likelihood divided by the number of species abundances contributing to that log-likelihood. This approach is used in time series analysis, when models that have been fitted to different numbers of observations (e.g., models with different time lags) must be compared (17). Our standardized measure of model support was simply the difference between the standardized gamma and lognormal maximum log-likelihoods. As our measure of the dominance of common species, we took, in the first instance, the abundance of the most abundant species, expressed as a proportion of the total number of individuals in the species abundance distribution. As our rarity measure, we took the proportion of species that were singletons (i.e., represented by a single individual in the abundance distribution). We used linear mixed-effects models to characterize the extent to which these two quantities explained variation within and among datasets in the standardized support for the lognormal over the gamma, at all scales (site, mesoscale, regional). To confirm that our results were not sensitive to the particular commonness or rarity metrics we considered, we repeated our analysis using the combined abundance of the three most abundant species, and using the proportion of species in the bottom two octaves of abundance (i.e., with proportion of species with abundance three or less). Parametric Bootstrap Goodness of Fit. Goodness of fit to the empirical data was assessed with parametric bootstrapping, using a hypergeometric algorithm described in detail elsewhere (7). Parametric bootstrapping involves simulating datasets that conform to the assumptions of a particular fitted species abundance model. For example, to test the goodness of fit of the Poissonlognormal, one simulates Poisson random sampling of individuals from an underlying lognormal distribution of species abundances. Then, the model is fitted to each simulated dataset, and a goodness of fit statistic calculated. The frequency distribution of this statistic across simulated datasets approximates the statistic’s expected distribution, under the null hypothesis that the data conform to the model. As a goodness of fit statistic, we use a normalized measure of model deviance, which, following convention, we term ^c (16). Deviance is a likelihood-based measure of how far away the model is from exhibiting a perfect fit to the data. ^c is obtained by taking all deviances for the model’s fits to the observed and simulated data, and dividing each by the average of the simulated deviances. Thus, ^c has an expected value of 1.0. We judged the lack of fit as statistically significant if the ^c of the observed data was greater than 95% of the corresponding simulated ^c values. Species Pool Estimation. Using the maximum-likelihood estimates, the probability that a species is present in the species pool but has abundance zero in the sample, P(0), is calculated from Eq. S5, by substituting 0 for r. Then, the number of species in the community that has been sampled can be estimated from the following: ^= S Sobs ; 1 − Pð0Þ [S9] ^ is the estimated number of species in the community, where S and Sobs is the number of species observed in the data. Nonparametric jackknife estimates were calculated using the frequency distribution of species occurrences across sites (i.e., presence–absence data: see ref. 16). Jackknife order was calculated separately for each dataset, using the sequential testing procedure recommended by ref. 18. SI Results Performance of the Neutral Approximation. Fig. S1 depicts the fit of the neutral approximation to our five alternative neutral models. For the first three models, these plots encompass three order-ofmagnitude variation in local community sizes, J (102 to 104 in2 of 14 dividuals), because most species abundance distributions are on the order of hundreds to (occasionally) tens of thousands of individuals. Similarly, we show a broad range of immigration rates from m = 0.01 (1% of newborns are immigrants) to 1.0 (an entirely open local community). We plot a range of values of the biodiversity parameter, θ, so that the expected number of species in the community spanned a very broad range (typically from a low of about five species, for small, isolated communities with low θ, to many hundreds of species for large, high-immigration communities with large θ). Note that the range of values of θ needed to span these richness values differs between the fission speciation model and the first two models, because the parameter is defined somewhat differently in this model. The protracted speciation model includes an additional parameter, τ′, which is the number of generations required for speciation to occur, relative to the metacommunity size (the special case τ′ = 0 corresponds to the original neutral model). The fourth (independent species) neutral model differs from the others in that it does not explicitly characterize dynamics at the metacommunity scale. Rather, it implicitly assumes that species have equal abundance in the metacommunity (and thus they all have the same rate of immigration to the local community, γ), and that species’ local population dynamics are independent of one another, and thus a function of only γ and the ratio of local percapita birth to death rates, x. Because within-species dynamics are also density independent, this is consistent with the neutrality assumption (individuals have no effect on one another’s percapita growth rates, regardless of whether they belong to the same or different species). This density-independent assumption means that the model is a probability distribution of species abundances, and not a model of overall species frequencies. Consequently, unlike the previous neutral models, it does not predict species richness. Similarly, for the spatially explicit model, the form of the species abundance distribution depends on the speciation probability (ν), and the ratio of the sampling area A (i.e., the local community size) to the squared width of the dispersal kernel, L, rather than either of the latter two variables independently (4). Thus, a given shape for the species abundance distribution can correspond to a broad range of different community species richness values, depending on whether A and L are both small or both large. Fig. S1 shows that the Poisson-gamma neutral approximation performs very well in the overwhelming majority of cases. There are, however, some cases where the approximation performs less well. These typically correspond to parameter combinations that imply very species-poor assemblages. One class of such cases corresponds to small (∼100 individuals), very low-immigration, low-diversity assemblages (∼5 species: e.g., Top Left of Fig. S1A). Here, the neutral model has an elevated probability that one species is nearly monodominant (the curve bends upward at the right, for species abundances close to the total community size), which the Poisson-gamma distribution cannot capture. A second class of cases, specific to the protracted speciation model, involves a flattening of the species abundance distribution at low abundances (e.g., m = 0.1, θ = 4, J = 104 in Fig. S1C). This effect is too small to see clearly for the range of parameter values shown in Fig. S1, but is somewhat more pronounced in very large communities with very low values of the biodiversity parameter (θ ∼ 1), for which the ratio of individuals to species is very high (e.g., a local community with 10,000 individuals but only about 10 species). The third class of cases are specific to the fission speciation model and involve an excess of rare species, relative to the Poisson-gamma distribution (e.g., m = 0.01, J = 104, θ = 40 in Fig. S1D). As with the second class of cases, this effect is relatively small in Fig. S1, but can be more pronounced for very large, particularly isolated, communities with few species (e.g., 10,000 individuals and about 10 species, implying mean abundances of about 1,000). For the data analyzed in this paper, Connolly et al. www.pnas.org/cgi/content/short/1406664111 however, most sites are very far from these extreme low-diversity cases. The typical (median) site is a sample of 422 individuals containing 17 species, and very few sites contain so few species at such large sample sizes (86% of sites, for instance, have mean species abundances of 100 or less). Moreover, the individual datasets vary substantially in community size and observed species richness (e.g., mean site richness varies from 9 to 126 species across the 14 datasets, and average species abundances at the site level range from 4 to 123 across all datasets except one). Thus, the overwhelming majority of our sites could not correspond to those regions of parameter space where the Poissongamma distribution performs less well as an approximation for neutral dynamics. Robustness to Ecological and Taxonomic Heterogeneity. Although neutral models have previously been applied to very heterogeneous communities (19), including benthic marine invertebrates (2) [and indeed their capacity to characterize such systems has been invoked as evidence of their robustness (2)], most neutral model communities are conceptualized as a guild of organisms competing for a shared set of resources. Some of our datasets are relatively taxonomically and ecologically homogeneous [e.g., Indo-Pacific Coral Crustaceans (IPC), which contains only crustaceans associated with dead coral heads]. However, others are more heterogeneous. Therefore, to determine whether our results were sensitive to this taxonomic and ecological heterogeneity of the assemblages, we classified our species into guilds, where information was available, and reanalyzed our species abundance data, limiting the analysis to species from the most species-rich guild for each dataset (Table S3). Such a classification is necessarily approximate for marine animals, given the high degree of omnivory in the ocean. Nevertheless, the analysis allows us to evaluate whether or not our conclusions are sensitive to the extent of heterogeneity in the data. The resolution of the groupings for this analysis depended somewhat on both the taxonomic and ecological heterogeneity of the original data, and also on the species richness in the samples. Specifically, we used as a rule of thumb that guilds should have a minimum of 10 species, necessitating use of more coarse groupings for more species-poor datasets. By restricting the analysis to a subset of the species, the statistical power to detect differences between Poisson-lognormal and Poisson-gamma species abundances is reduced—the more heterogeneous the original dataset, the smaller the subset of species that could be included in the analysis. Nevertheless, strong support for the Poisson-lognormal remained: across site, mesoscale, and regional levels, the Poisson-lognormal was strongly (>95%) supported in 27 cases, whereas the Poisson-gamma was strongly supported in only 1 (Table S3). Analysis of Variation in the Shapes of Species Abundance Distributions. Standardizing model support by dividing by the number of observed species abundances successfully controlled for the effects of statistical power shown in Fig. 2, at least at the site level and mesoscale: mixed-effects linear model analyses using the number of distinct species abundance values as an explanatory variable indicated that the overall effect did not differ significantly from zero, and explained about 1–10% of the variation in standardized model support across datasets (see R2 values in Table S4). In contrast, the strength of support for the Poisson-lognormal over the Poisson-gamma increased strongly with the relative abundance of the most abundant species at site, mesoscale, and regional (wholedataset) levels. The positive relationship was highly consistent between datasets at both the site level and mesoscale (gray lines in Fig. S4 A and C), and explained about one-half or more of the variation (Table S4 and Fig. S4 B, D, and E). In contrast, the proportion of singletons was a poor predictor of relative model performance: the estimated direction of the effect was not con3 of 14 sistent across datasets at the site level (gray lines in Fig. S5A), and the estimated overall effect did not differ significantly from zero at any scale (Table S4 and black lines in Fig. S5 A, C, and E) and never explained more than 16% of the variation (Table S4). To further assess the strength of these results, we repeated our common-species analysis using the combined relative abundance of the three most abundant species. This, too, was strongly positively related to support for the Poisson-lognormal distribution (slope: 0.40 ± 0.02, pseudo-R2 = 0.44 at site level; 0.24 ± 0.05, pseudo-R2 = 0.53 at mesoscale level; 0.15 ± 0.04, R2 = 0.55 at regional scale). Conversely, expanding our definition of rarity to encompass the proportion of species in the bottom two octaves (species with abundance 3 or less) did not improve its effectiveness as a predictor of standardized support for the Poisson-lognormal over the Poisson-gamma: the overall relationship did not differ significantly from zero at any scale (no slopes significantly different from zero, pseudo-R2 < 0.02 at site-scale and mesoscale levels, R2 = 0.21 at regional level). 1. Pueyo S (2006) Diversity: Between neutrality and structure. Oikos 112(2):392–405. 2. Hubbell SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography (Princeton Univ Press, Princeton). 3. Volkov I, Banavar JR, Hubbell SP, Maritan A (2007) Patterns of relative species abundance in rainforests and coral reefs. Nature 450(7166):45–49. 4. Rosindell J, Cornell SJ (2013) Universal scaling of species-abundance distributions across multiple scales. Oikos 122(7):1101–1111. 5. Etienne R, Haegeman B (2011) The neutral theory of biodiversity with random fission speciation. Theor Ecol 4(1):87–109. 6. Pielou EC (1977) Mathematical Ecology (Wiley, New York). 7. Connolly SR, Dornelas M, Bellwood DR, Hughes TP (2009) Testing species abundance models: A new bootstrap approach applied to Indo-Pacific coral reefs. Ecology 90(11): 3138–3149. 8. Sæther BE, Engen S, Grøtan V (2013) Species diversity and community similarity in fluctuating environments: Parametric approaches using species abundance distributions. J Anim Ecol 82(4):721–738. 9. Etienne RS, Alonso D (2005) A dispersal-limited sampling theory for species and alleles. Ecol Lett 8(11):1147–1156. 10. Rosindell J, Cornell SJ, Hubbell SP, Etienne RS (2010) Protracted speciation revitalizes the neutral theory of biodiversity. Ecol Lett 13(6):716–727. 11. He F (2005) Deriving a neutral model of species abundance from fundamental mechanisms of population dynamics. Funct Ecol 19(1):187–193. 12. Volkov I, Banavar JR, He FL, Hubbell SP, Maritan A (2005) Density dependence explains tree species abundance and diversity in tropical forests. Nature 438(7068): 658–661. 13. Allen AP, Savage VM (2007) Setting the absolute tempo of biodiversity dynamics. Ecol Lett 10(7):637–646. 14. Ostling A (2012) Do fitness-equalizing tradeoffs lead to neutral communities? Theor Ecol 5(2):181–194. 15. Burnham KP, Anderson DR (2002) Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach (Springer, Heidelberg). 16. Connolly SR, Hughes TP, Bellwood DR, Karlson RH (2005) Community structure of corals and reef fishes at multiple scales. Science 309(5739):1363–1365. 17. McQuarrie ADR, Tsai C-L (1998) Regression and Time Series Model Selection (World Scientific, River Edge, NJ). 18. Burnham KP, Overton WS (1979) Robust estimation of population size when capture probabilities vary among animals. Ecology 60(5):927–936. 19. de Aguiar MAM, Baranger M, Baptestini EM, Kaufman L, Bar-Yam Y (2009) Global patterns of speciation and diversity. Nature 460(7253):384–387. (a) Original neutral model m = 0.01 1 2 5 1 10−1 10−2 10−3 10−3 2 5 1 2 10−1 10−2 10−2 10−3 2 5 1 2 10−1 10−2 10−2 10−3 1 1 1 1 10−1 10−2 10−3 10−4 1 1 1 1 10−1 10−2 10−3 10−4 5 1 2 5 1 2 10−1 10−1 10−2 10−3 10−3 10−4 E(S ) 1 2 50 2 50 1 5 E(S ) 1 5 E(S ) 1 2 5 50 100 500 100 1 20 100 10 2 5 E(S ) 23 2 5 E(S ) 17 5 2 1 50 10 20 10 20 500 5 1 127 10 20 50 2 1 10 1 10−5 1 5 5 5 5 10−2 10−3 10−4 10−5 1 1 1 1 10−1 10−2 10−3 10−4 1 E(S ) 20 5 50 E(S ) 64 5 E(S ) 1 5 50 500 5000 500 26 5 50 E(S ) 98 1 1 5 50 E(S ) 174 20 E(S ) 100 500 1 2 5 500 5000 100 5 20 2 5 E(S ) 32 5 50 E(S ) 500 5 100 1 2 5 1000 4 500 J 1000 20 J 1000 100 J 10000 4 50 10 20 500 5000 J 10000 20 J 10000 100 125 50 E(S ) 329 20 1 J 79 240 1 100 100 23 E(S ) 1 J 4 5 1 10−1 −3 100 20 10 3 20 E(S ) J 50 70 2 1 100 4 36 5 E(S ) 100 10 20 2 1 J 14 5 E(S ) 20 50 5 2 1 53 5 2 1 E(S ) 20 E(S ) 39 E(S ) 17 E(S ) 24 20 5 1 m =1 8 E(S ) 1 11 20 2 1 5 5 10 E(S ) 1 5 5 10 E(S ) 1 E(S ) 4 5 10 E(S ) 1 Species frequency (log scale) 2 m = 0.1 500 462 20 100 Species abundance (log scale) Fig. S1. (Continued) Connolly et al. www.pnas.org/cgi/content/short/1406664111 4 of 14 (b) Protracted speciation model with τ ' = τ J m = 10−8 m = 0.01 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 5 × 10−4 E(S ) = 3 1 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 1 Species frequency (log scale) 2 5 10 2 50 1 1 5 10 2 1 50 5 10 1 5 50 20 5 20 5 50 1 5 2 100 500 1 10 20 1 10 20 1 2 100 500 1 100 2 5 10 J = 100 θ = 100 3 4 5 J = 1000 θ=4 5 20 100 500 J = 1000 θ = 20 E(S ) = 79 20 50 5 J = 100 θ = 20 2 1 1 E(S ) = 127 20 50 E(S ) = 23 20 5 10 20 2 1 E(S ) = 53 100 5 E(S ) = 70 5 5 2 E(S ) = 36 E(S ) = 17 E(S ) = 39 1 2 2 1 E(S ) = 24 1 5 10 E(S ) = 23 E(S ) = 10 1 × 10−1 1 × 10−2 1 × 10−3 1 × 10−4 2 J = 100 θ=4 E(S ) = 14 E(S ) = 17 E(S ) = 5 5 × 10−2 5 × 10−3 5 × 10−4 5 × 10−5 m =1 E(S ) = 8 E(S ) = 5 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 5 × 10−1 1 × 10−1 2 × 10−2 5 × 10−3 1 × 10−3 2 × 10−4 m = 0.1 2 5 20 50 J = 1000 θ = 100 E(S ) = 240 10 20 50 1 2 5 10 20 1 × 10−1 −3 1 × 10 E(S ) = 19 1 × 10−5 1 5 × 10−2 5 × 10−3 5 × 10−4 5 × 10−5 5 50 E(S ) = 26 500 5000 1 E(S ) = 64 1 1 × 10−1 1 × 10−2 1 × 10−3 1 × 10−4 5 50 5 20 50 500 5000 1 E(S ) = 97 500 1 E(S ) = 173 1 5 5 500 1 2 50 5 5 50 500 5000 J = 10000 θ = 20 E(S ) = 125 500 1 E(S ) = 328 100 J = 10000 θ=4 E(S ) = 32 5 50 500 J = 10000 θ = 100 E(S ) = 462 20 100 1 2 5 20 100 Species abundance (log scale) (c) Protracted speciation model with τ ' = τ J m = 10−4 m = 0.1 m = 0.01 1 × 10−1 2 × 10−2 5 × 10−3 1 × 10−3 1 5 10 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 1 2 5 10 2 50 5 1 50 20 5 1 2 5 100 500 1 5 × 10−2 5 × 10−3 5 × 10−4 5 × 10−5 5 1 20 20 50 5 100 1 5 20 50 20 2 10 100 1 2 20 100 500 5000 1 5 1 500 1 2 50 5 10 J = 100 θ = 100 3 4 5 J = 1000 θ=4 5 20 100 500 J = 1000 θ = 20 1 20 2 5 20 50 J = 1000 θ = 100 50 1 2 5 10 20 J = 10000 θ=4 E(S ) = 29 500 5000 1 5 50 500 5000 J = 10000 θ = 20 E(S ) = 111 500 E(S ) = 306 100 5 E(S ) = 232 10 20 50 5 J = 100 θ = 20 2 1 E(S ) = 93 500 2 1 E(S ) = 25 500 50 E(S ) = 77 20 50 5 10 20 E(S ) = 22 20 5 5 E(S ) = 69 5 5 2 1 E(S ) = 125 E(S ) = 170 1 10 E(S ) = 52 E(S ) = 63 1 × 10−1 1 × 10−2 1 × 10−3 1 × 10−4 1 E(S ) = 36 5 2 1 E(S ) = 19 1 × 10−5 50 E(S ) = 17 E(S ) = 38 1 × 10−3 2 1 E(S ) = 24 1 5 10 E(S ) = 23 5 10 1 2 E(S ) = 17 E(S ) = 10 1 × 10−1 1 × 10−2 1 × 10−3 1 × 10−4 1 × 10−1 1 E(S ) = 5 5 × 10−2 5 × 10−3 5 × 10−4 5 × 10−5 5 × 10−1 1 × 10−1 −2 2 × 10−3 5 × 10−3 1 × 10−4 2 × 10 50 E(S ) = 5 1 Species frequency (log scale) 2 J = 100 θ=4 E(S ) = 13 E(S ) = 8 E(S ) = 3 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 m =1 1 5 50 500 J = 10000 θ = 100 E(S ) = 394 100 1 2 5 20 100 Species abundance (log scale) Fig. S1. (Continued) Connolly et al. www.pnas.org/cgi/content/short/1406664111 5 of 14 (d) Fission speciation model 1 × 100 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 1 2 5 10 1 50 E(S ) = 5 1 2 1 × 100 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 1 5 10 2 1 50 5 10 1 5 50 1 2 20 5 1 2 100 1 5 1 100 2 500 1 10 20 100 500 1 2 5 1 50 100 1 4 5 J = 100 θ = 1000 1.4 1.8 J = 1000 θ = 40 2 5 20 50 J = 1000 θ = 200 2 5 10 20 J = 1000 θ = 1000 E(S ) = 500 10 20 1 2 3 4 6 J = 10000 θ = 40 E(S ) = 40 20 2 1.2 1 500 1 5 50 500 J = 10000 θ = 200 E(S ) = 196 100 1 2 E(S ) = 719 20 3 E(S ) = 167 50 5 J = 100 θ = 200 2 1.0 E(S ) = 186 20 10 E(S ) = 38 10 20 5 5 E(S ) = 91 20 50 5 2 1 E(S ) = 39 50 5 5 2 1 E(S ) = 318 1 2 20 E(S ) = 204 20 5 10 5 2 1 100 E(S ) = 140 1 × 10−1 1 × 10−2 1 × 10−3 1 × 10−4 1 E(S ) = 112 20 5 20 E(S ) = 67 5 2 1 E(S ) = 37 1 × 10−1 1 × 10−2 1 × 10−3 1 × 10−4 10 E(S ) = 35 E(S ) = 45 2 × 10−2 5 × 10−3 1 × 10−3 2 × 10−4 5 × 10−5 2 1 E(S ) = 38 5 × 10−1 5 × 10−2 5 × 10−3 5 × 10−4 5 E(S ) = 25 E(S ) = 22 1 × 10−1 2 × 10−2 5 × 10−3 1 × 10−3 2 × 10−4 2 J = 100 θ = 40 E(S ) = 29 E(S ) = 23 E(S ) = 5 1 × 10−1 2 × 10−2 5 × 10−3 1 × 10−3 2 × 10−4 m =1 E(S ) = 16 E(S ) = 5 1 × 100 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 Species frequency (log scale) m = 0.1 m = 0.01 5 20 100 J = 10000 θ = 1000 E(S ) = 909 5 10 50 1 2 5 10 20 50 Species abundance (log scale) (e) Independent species model γ = 0.1 γ=1 γ=2 −1 2 × 10 x = 0.5 5 × 10−2 1 × 10−2 5 × 10−1 1 2 3 4 5 1 2 3 4 5 7 1 2 5 1 × 10−1 x = 0.9 Species frequency (log scale) 2 × 10−2 5 × 10−3 1 × 10−3 1 2 5 10 20 1 2 5 10 20 50 1 2 5 10 20 50 1 × 10−1 −2 1 × 10 x = 0.99 1 × 10−3 1 × 10−4 1 2 5 10 50 200 1 2 5 20 50 200 1 2 5 20 100 500 (f) Spatial neutral model ν = 1 × 10−6 ν = 1 × 10−4 ν = 1 × 10−2 2 × 10−1 5 × 10−2 A L −2 = 4 1 × 10−2 2 × 10−3 1 2 5 10 20 1 2 5 10 1 2 5 2 × 10−1 5 × 10−2 1 × 10−2 2 × 10−3 5 × 10−4 A L −2 = 16 1 2 5 10 20 50 1 2 5 10 20 50 1 2 5 10 20 1 × 10−1 2 × 10−2 5 × 10−3 1 × 10−3 2 × 10−4 A L −2 = 64 1 2 5 10 50 200 1 2 5 10 50 1 2 5 10 20 50 Species abundance (log scale) Fig. S1. Examples of the fit of the Poisson-gamma neutral approximation (black line) to the five candidate neutral approximations (blue lines). The fits are shown as “Pueyo plots”: both the vertical and horizontal axes are shown on a log scale. The horizontal axis is truncated at the abundance value where the cumulative expected number of species equals 99% of the total (i.e., on average, only 1 of 100 species would be expected to have greater abundance). (A–F) The six different neutral models, as specified at the top of the corresponding group of panels. For models A–D, E(S) in each panel indicates the expected number of species for that parameter combination. Connolly et al. www.pnas.org/cgi/content/short/1406664111 6 of 14 Fig. S2. Observed and best-fit mesoscale abundance distributions. On the map, different combinations of colors and symbols correspond to different datasets: these are reproduced in the corresponding figure panels. See Table S1 for metadata, including abbreviations. Each point on the map is located at the centroid of the individual sites that were pooled to generate each mesoscale abundance distribution. Panels above and below the map compare observed and fitted species abundance distributions at this scale. The bars represent the mean proportion of species in different octave classes of abundance, across all mesoscale abundance distributions from the corresponding ecosystem (these are shown as true doubling classes: the first bar represents species with abundance 1; the second, abundances 2–3; the third, abundances 4–7; etc.). The blue and red lines show the mean of fitted values from fits of the Poisson-gamma and Poissonlognormal distributions, respectively, to each mesoscale abundance distribution. Connolly et al. www.pnas.org/cgi/content/short/1406664111 7 of 14 Fig. S3. Observed and best-fit regional abundance distributions. On the map, the area over which sites were pooled for each regional abundance distribution has been outlined. See Table S1 for metadata, including abbreviations. The panels above and below the map compare observed and fitted species abundance distributions at this scale. The bars represent the proportion of species in different octave classes of abundance (these are shown as true doubling classes: the first bar represents species with abundance 1; the second, abundances 2–3; the third, abundances 4–7; etc.). The blue and red lines show the fitted values from fits of the Poisson-gamma and Poisson-lognormal distributions to the data, respectively. Connolly et al. www.pnas.org/cgi/content/short/1406664111 8 of 14 0.6 0.4 0.2 −0.2 0.0 Observed ΔMLLS −1 0.1 0.2 0.3 0.4 B −0.1 ΔMLLS −1 A 0.0 0.2 0.4 0.6 0.8 1.0 −0.2 0.0 0.2 0.4 0.6 0.6 0.8 0.6 0.4 0.2 0.0 Observed ΔMLLS −1 0.1 0.2 0.3 0.4 −0.1 ΔMLLS −1 D 0.0 0.4 Predicted ΔMLLS −1 Maximum relative abundance C 0.2 1.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Predicted ΔMLLS −1 Maximum relative abundance 0.1 −0.1 ΔMLLS −1 0.3 E 0.0 0.2 0.4 0.6 0.8 1.0 Maximum relative abundance Fig. S4. Analysis of the variation in standardized support for the lognormal explained by the relative abundance of the most-abundant species (expressed as a fraction of the number of individuals sampled), at the (A and B) site scale, (C and D) mesoscale, and (E) regional scale. Positive relationships indicate stronger evidence against the gamma neutral approximation as the most-abundant species becomes more dominant. In A and C, the thick solid and dashed lines represent the overall (i.e., fixed effects) relationship, with 95% confidence intervals. The gray lines represent the relationships for the 14 individual datasets, based on the estimated random effects; individual lines are drawn to span only the range of horizontal axis values observed in the corresponding dataset. B and D show corresponding plots of observed versus predicted values, as estimated from the full fitted model. Because there is substantial overlap of points, the points have been color-coded according to the number of nearby observations, grading from red (high density of points) to blue. E is an ordinary least-squares (OLS) regression: because there is only one (pooled) regional abundance distribution per site, there is no random effect. Connolly et al. www.pnas.org/cgi/content/short/1406664111 9 of 14 0.6 0.4 0.2 −0.2 0.0 Observed ΔMLLS−1 0.05 0.10 0.15 B −0.05 ΔMLLS −1 A 0.0 0.2 0.4 0.6 0.8 1.0 −0.2 0.0 0.6 0.6 0.4 0.2 0.0 Observed ΔMLLS−1 0.05 0.10 0.15 D −0.05 ΔMLLS −1 0.4 Predicted ΔMLLS−1 Proportion of singletons C 0.2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.3 0.4 0.5 0.6 0.15 0.05 −0.05 ΔMLLS−1 0.2 Predicted ΔMLLS−1 Proportion of singletons E 0.1 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of singletons Fig. S5. Analysis of the variation in standardized support for the lognormal explained by the proportion of species that are singletons, at the (A and B) site scale, (C and D) mesoscale, and (E) regional scale. Positive relationships indicate stronger evidence against the gamma neutral approximation as proportion of singletons increases. In A and C, the thick solid and dashed lines represent the overall (i.e., fixed effects) relationship, with 95% confidence intervals. The gray lines represent the relationships for the 14 individual datasets, based on the estimated random effects; individual lines are drawn to span only the range of horizontal axis values observed in the corresponding dataset. B and D show corresponding plots of observed versus predicted values, as estimated from the full fitted model. Because there is substantial overlap of points, the points have been color-coded according to the number of nearby observations, grading from red (high density of points) to blue. E is an OLS regression: because there is only one (pooled) regional abundance distribution per site, there is no random effect. Connolly et al. www.pnas.org/cgi/content/short/1406664111 10 of 14 Table S1. Metadata summary of global marine species abundance distribution samples Dataset name South East Fishery: Shelf Fish (SEF) Western Australia: Deep Fish (WAF) Great Barrier Reef Fish (GBR) Antarctic Molluscs (ANM) Indo-Pacific Coral Crustaceans (IPC) Tuscany Archipelago Fish (TAP) Eastern Bass Strait Invertebrates (EBS) NOAA Central Pacific Reef Fish (CPF) Sunderban Zooplankton (SUZ) Scotian Shelf Fish (SSF) Bass Strait Intertidal Macroinvertebrates (BSI) North Sea Invertebrates (NSI)* Summary Latitudinal limits Longitudinal limits Depth range, m Sampling method Data contact Fish from southeastern Australia; total of 173 species at 189 sites pooled into 13 mesoscale SADs Fish from western Australia; total of 282 species at 65 sites pooled into 23 locations Fish from underwater visual census surveys of coral reefs on the Great Barrier Reef (1); total of 195 species at 74 sites pooled into 8 mesoscale SADs Deep-water bivalves from the Scotia Arc, Antarctica; total of 96 species at 20 sites pooled into 4 mesoscale SADs Crustacean samples encompassing a total of 411 species from individual dead coral heads at 8 sites in the Indo-Pacific; not pooled at mesoscale Fish abundance from the Tuscany Archipelago; total of 39 species at 30 sites pooled into 4 mesoscale SADs Invertebrates from the Eastern Bass Strait, Australia (2); total of 801 species at 47 sites pooled into 3 mesoscale SADs Fish from underwater visual surveys of coral reefs throughout the Pacific; total of 491 species at 49 sites pooled into 5 mesoscale SADs Zooplankton from Sunderban mangrove wetland, India; total of 31 species at 7 sites; not pooled at mesoscale Fish from the Scotian Shelf, Northwestern Atlantic; total of 98 species at 458 sites pooled into 14 mesoscale SADs −39.0 −36.4 146.5 150.3 16 254 Fish trawl A.W. [email protected] −35.1 −20.1 111.4 115.2 197 1,580 Fish trawl A.W. [email protected] −23.9 −14.5 145.3 152.7 7 −58.2 −65.5 −60.0 −23.6 774 6,348 −23.4 6.4 −113.7 −149.8 10 Hand counts 42.2 43.1 9.8 11.1 8 12 UVS L.B.-C. [email protected] −37.9 −37.8 148.2 148.7 17 51 Grab sample R.S.W., G.C.B.P. [email protected] −14.6 28.5 −154.8 142.8 21.6 22.3 88.0 88.9 42.1 45.6 −67.2 −57.3 16 176 Trawl Invertebrates from the Bass Strait, Australia (3); total of 98 species at 53 sites pooled into 5 mesoscale SADs Benthic invertebrates from the North Sea; total of 244 species at 46 sites pooled into 6 mesoscale SADs −39.1 −37.6 141.4 149.8 Intertidal UVS 54.3 60.4 −1.0 8.0 38 115 vanVeen grab Connolly et al. www.pnas.org/cgi/content/short/1406664111 UVS Epibenthic sledge 8.24 17.11 UVS 2.0 8.9 Plankton tow Hugh Sweatman Australian Institute of Marine Science, Townsville, Australia [email protected] K.L. [email protected] L.P. [email protected] R.E.B. [email protected] S.K.S. [email protected] Steven E. Campana Fisheries and Oceans Canada, Bedford Institute of Oceanography, Dartmouth, Canada [email protected] T.D.O. [email protected] U.S. [email protected] 11 of 14 Table S1. Cont. Dataset name Norwegian Shelf Macrobenthos (NSM)† Antarctic Isopods (ANI)‡ Summary Benthic invertebrates collected along the Norwegian Shelf (4); total of 805 species at 101 sites pooled into 4 mesoscale SADs Isopods from the Southern Ocean (5); total of 502 species at 38 sites pooled into 8 mesoscale SADs Latitudinal limits Longitudinal limits Depth range, m Sampling method vanVeen grab 56.0 71.8 1.7 23.5 65 434 −71.3 −58.2 0.0 −64.7 774 6,348 Epibenthic sledge Data contact K.E.E. [email protected] A.B. [email protected] SAD, species abundance distribution; UVS, underwater visual survey (belt transects in all cases). 1. 2. 3. 4. 5. Sweatman H, et al. (2008) Long-Term Monitoring of the Great Barrier Reef (Australian Institute of Marine Science, Townsville, Australia), Status Report no. 8. Gray JS, et al. (1997) Coastal and deep-sea benthic diversities compared. Mar Ecol Prog Ser 159:97–103. O’Hara TD, Addison PFE, Gazzard R, Costa TL, Pocklington JB (2010) A rapid biodiversity assessment methodology tested on intertidal rocky shores. Aquat Conserv 20(4):452–463. Ellingsen KE, Gray JS (2002) Spatial patterns of benthic diversity: Is there a latitudinal gradient along the Norwegian continental shelf? J Anim Ecol 71(3):373–389. Brandt A, et al. (2007) First insights into the biodiversity and biogeography of the Southern Ocean deep sea. Nature 447(7142):307–311. Connolly et al. www.pnas.org/cgi/content/short/1406664111 12 of 14 Connolly et al. www.pnas.org/cgi/content/short/1406664111 13 of 14 66.9 (25–127) 0.14 (0.09–0.20) 0.46 (0.38–0.62) 17.0 (1–36) 18.7 (8–32) 9.4 (1–21) (71–1,018) (2–10,803) (18–3,714) (6–21,436) 44.4 (7–66) (64–2,535) 0.32 (0.08–0.87) 0.41 (0.14–0.61) 0.45 (0.24–0.70) 0.02 (0.00–0.11) 44.4 (28–77) 30.2 (15–44) (92–4,608) (1,200–116,444) PS§ NA 0.56 (0.33–0.90) 0.18 (0.05–0.37) 0.37 (0.10–0.93) 0.35 (0.16–0.59) 0.37 (0.19–0.71) 0.24 (0.11–0.38) NA 0.50 (0.48–0.54) 0.08 (0.06–0.09) 0.34 (0.12–0.56) 0.37 (0.21–0.48) 0.13 (0.08–0.24) 0.36 (0.23–0.44) MRA‡ 116.9 (78–144) NA NA 0.20 (0.14–0.30) 0.21 (0.14–0.33) NA 573.7 (500–661) 0.06 (0.05–0.07) 0.16 (0.14–0.18) 37.5 (19–53) 43.8 (12–101) 60.8 (20–93) NA 32.2 (30–34) 41.0 (27–79) 141.4 (73–193) S† 6,183 (1,224–12,361) 17,074 (5,201–42,572) 61.6 (54–76) 101.2 (54–147) S† MRA‡ PS§ 39 0.51 0.03 96 0.21 0.33 98 0.32 0.19 31 0.36 0.00 98 0.39 0.11 37,097 244 0.29 0.21 68,298 805 0.08 0.19 474,151 491 0.19 0.09 43,768 195 0.15 0.11 4,689 71,382 801 0.05 0.13 391,484 24,272 282 0.32 0.22 124,777 173 0.26 0.16 644 135 0.06 0.44 17,303 2,175 11,788 502 0.05 0.17 N* Regional 0.47 (0.34–0.78) 0.07 (0.05–0.09) 1,650,842 0.41 (0.10–0.83) 0.23 (0.09–0.37) 357.5 (176–552) 0.08 (0.04–0.11) 0.24 (0.19–0.29) 94,830 (35,819–223,060) 238.0 (171–293) 0.30 (0.16–0.40) 0.13 (0.08–0.16) 5,471 (686–9,315) NA 23,794 (16,182–28,145) 27,963 (3,493–78,019) 1,055 (44–5,868) 9,598 (112–22,978) NA 4,326 (3,276–5,867) 544 (353–816) 1,474 (284–3,531) N* 0.61 (0.26–0.95) 0.14 (0.00–0.33) 330,168 (96,743–643,500) 0.39 (0.09–0.87) 0.35 (0.08–0.51) 86.7 (35–148) 0.15 (0.07–0.51) 0.38 (0.20–0.63) (87–2,677) 104.1 (32–183) 0.30 (0.11–0.61) 0.20 (0.08–0.39) 15.3 (7–28) (548–85,423) 0.65 (0.14–1.00) 0.28 (0.00–0.75) 0.37 (0.07–0.97) 0.37 (0.09–0.67) 0.51 (0.16–1.00) 0.32 (0.00–0.71) 0.49 (0.29–0.74) 0.17 (0.04–0.33) 125.7 (23–384) 0.18 (0.05–0.70) 0.34 (0.00–0.70) (148–1,190) (48–8,641) 21.2 (16–26) (223–1,118) 0.39 (0.17–1.00) 0.45 (0.10–1.00) 14.0 (1–33) (1–560) PS§ 0.24 (0.05–1.00) 0.49 (0.21–1.00) MRA‡ 40.2 (1–91) S† (1–2,651) N* Mesoscale Values reported are means, with ranges in parentheses. Due to the small number of sites, there are no mesoscale abundance distributions for IPC and SUZ (Materials and Methods). Because the regional abundance distribution pools all samples in that dataset, there is only one abundance distribution, so no range of values is reported. *N, number of individuals sampled. † S, number of species appearing in the sample. ‡ MRA, maximum relative abundance (abundance of the most abundant species, as a proportion of the total number of individuals in the abundance distribution). § PS, proportion of species sampled that are singletons (i.e., abundance = 1). Antarctic 310 Isopods (ANI) Antarctic 109 Molluscs (ANM) Tuscany 577 Archipelago Fish (TAP) Indo-Pacific Coral 336 Commensals (IPC) SE Australia: Shelf 660 Fish (SEF) W Australia: Deep 373 Fish (WAF) Scotian Shelf 855 Fish (SSF) Eastern Bass Strait 1,519 Invertebrates (EBS) Sunderban 670 Zooplankton (SUZ) Great Barrier 591 Reef Fish (GBR) Central Pacific 9,677 Reef Fish (CPF) Norwegian Shelf 676 Macrobenthos (NSM) North Sea 806 Invertebrates (NSI) Bass Strait 31,148 Intertidal (BSI) Dataset Site Table S2. Summary of the abundance distributions analyzed, at each of the three scales Table S3. Groupings used and model selection for single-guild analysis Dataset Guild name Antarctic Isopods (ANI) Antarctic Molluscs (ANM) Tuscany Archipelago Fish (TAP) Indo-Pacific Coral Crustaceans (IPC) SE Australia: Shelf Fish (SEF) W Australia: Deep Fish (WAF) Scotian Shelf Fish (SSF) Eastern Bass Strait Invertebrates (EBS) Sunderban Zooplankton (SUZ) Great Barrier Reef Fish (GBR) Central Pacific Reef Fish (CPF) Norwegian Shelf Macrobenthos (NSM) North Sea Invertebrates (NSI) Bass Strait Intertidal (BSI) Overall % support for lognormal No. of species in group % of sites fitted at site level Site Mesoscale Regional 486 24 38 334 101 121 48 347 27 60 164 76 82 0 100 100 57 38 6 91 100 53 100 4 >0.9999 NA >0.9999 >0.9999 >0.9999 0.0199 0.0727 >0.9999 >0.9999 >0.9999 >0.9999 0.6079 >0.9999 0.7048 0.9999 NA 0.9999 >0.9999 >0.9999 >0.9999 NA >0.9999 0.9998 0.9902 >0.9999 0.8639 0.9973 >0.9999 0.8741 0.9958 0.9998 0.2905 0.9772 0.9515 0.8776 0.7062 74 49 74 100 0.6953 >0.9999 >0.9999 0.0902 >0.9999 >0.9999 0.0593 0.9937 >0.9999 Detritus feeders Suspension feeders Benthic feeders Decapods Invertivores, benthic prey Invertivores, benthic prey Invertivores Deposit feeders Planktivores Herbivores Invertivores Deposit feeders, Malacostraca only Deposit feeders Grazers Percentage support values indicate relative support for the Poisson-lognormal over Poisson-gamma model fitted to the species abundance data at three scales: site level, mesoscale, and regional. Each row represents a different dataset. For ANM, there were no sites with more than five distinct species abundance values for the most species-rich functional group, so model selection was only done at the mesoscale and regional scale. For IPC and SUZ, there were too few species abundance distributions to create mesoscale groupings. The last row is an overall test, based on summing the log-likelihoods across all datasets. Where lognormal model has at least 95% support, the model’s weight is shown in bold. Where Poisson-gamma model has at least 95% support, the model’s weight is shown underlined. Table S4. Mixed-effects and OLS regression model results for analysis of standardized relative support for the Poisson-lognormal Explanatory variable Site level Intercept only Log(no. of distinct abundances) Maximum relative abundance Proportion of singletons Mesoscale Intercept only Log(no. of distinct abundances) Maximum relative abundance Proportion of singletons Regional Intercept only Log(no. of distinct abundances) Maximum relative abundance Proportion of singletons Overall slope† ± SE t‡ NA −0.009 ± 0.015 0.386 ± 0.020 0.074 ± 0.051 NA −0.58 19.66*** 1.45 NA −0.009 ± 0.013 0.261 ± 0.051 0.046 ± 0.074 Slope ± SE NA −0.039 ± 0.014 0.288 ± 0.043 −0.160 ± 0.108 NA −0.74 5.08*** 0.63 t NA −2.83* 6.75*** −1.48 Marginal R2 § Pseudo-R2 AIC 0.00 <0.01 0.47 0.01 0.08 0.11 0.60 0.14 −1,879.6 −1,883.2 −2,641.5 −1,913.6 0.00 0.01 0.24 <0.01 0.00 0.01 0.71 <0.01 −186.4 −181.0 −270.8 −180.8 AIC −43.3 −48.5 −63.3 −43.7 R2 0.00 0.40 0.79 0.16 Except for Akaike’s information criterion (AIC), which was calculated from maximum-likelihood fits, all values reported in the table were obtained using restricted maximum likelihood (REML). † “Overall slope” refers to the fixed effects component of the model. ‡ The t statistic for the slope parameter. The number of asterisks indicates the level of statistical significance: *0.01 < P < 0.05, **0.001 < P < 0.01, and ***P < 0.001. § Percentage of variation explained by the fixed effect only. { Calculated from the residuals of the full fitted model ð1 − ðσ 2resid =σ 2tot ÞÞ. Connolly et al. www.pnas.org/cgi/content/short/1406664111 14 of 14
© Copyright 2026 Paperzz