Ecology Letters, (2011) 14: 1043–1051
doi: 10.1111/j.1461-0248.2011.01675.x

LETTER

Combining α- and β-diversity models to fill gaps in our knowledge of biodiversity

Karel Mokany,1* Thomas D. Harwood,1 Jacob McC. Overton,2 Gary M. Barker2 and Simon Ferrier1

1 CSIRO Ecosystem Sciences, Climate Adaptation Flagship, PO Box 1700, Canberra ACT 2601, Australia
2 Landcare Research, Private Bag 3127, Hamilton, New Zealand
*Correspondence: E-mail: [email protected]

Abstract

For many taxonomic groups, sparse information on the spatial distribution of biodiversity limits our capacity to answer a variety of theoretical and applied ecological questions. Modelling community-level attributes (α- and β-diversity) over space can help overcome this shortfall in our knowledge, yet individually, predictions of α- or β-diversity have their limitations. In this study, we present a novel approach to combining models of α- and β-diversity, with sparse survey data, to predict the community composition for all sites in a region. We applied our new approach to predict land snail community composition across New Zealand. As we demonstrate, these predictions of metacommunity composition have diverse potential applications, including predicting γ-diversity for any set of sites, identifying target areas for conservation reserves, locating priority areas for future ecological surveys, generating realistic compositional data for metacommunity models and simultaneously predicting the distribution of all species in a taxon consistent with known community diversity patterns.

Keywords

Alpha, beta, community, dissimilarity, diversity, gamma, metacommunity, macroecology, richness, snail.

INTRODUCTION

Reliable knowledge of how biodiversity varies over space is vital in our quest to answer a range of important ecological questions, both applied and theoretical.
For example, predicting how climate change will affect biodiversity in the future first requires good knowledge of current spatial patterns in biodiversity (Botkin et al. 2007). Testing theoretical models for how communities are structured also requires reliable information on spatial patterns in biodiversity (e.g. Driscoll & Lindenmayer 2009). Despite the importance of information on how biodiversity varies across space, reliable data on community composition are generally available for only a small number of ecological survey sites. This limitation in our knowledge of the spatial distribution of species has been termed the ‘Wallacean shortfall’ (Lomolino 2004) and severely restricts our capacity to address a range of important questions in ecology.

One common approach to dealing with the Wallacean shortfall is to predict the spatial distribution of individual species over a region using statistical or mechanistic models (Elith & Leathwick 2009; Kearney & Porter 2009). Species-distribution modelling can generate useful information on likely patterns in the occurrence of species across an area, and is particularly applicable to taxonomic groups that are well studied across the region of interest (e.g. vertebrates) (Araújo et al. 2006; Thuiller et al. 2006; Hole et al. 2009). In such cases, species-distribution modelling can play an important role in filling gaps in our knowledge of biodiversity as a whole (i.e. all species in a taxon). However, many taxonomic groups are highly speciose, with little information on the distribution or other attributes of every component species and with new species still being discovered (e.g. tropical plants and insects, marine invertebrates). In these situations, species-distribution modelling is limited in its ability to predict occurrences for all species in a taxon and hence predict patterns for biodiversity as a whole (Mokany & Ferrier 2011).

Community-level modelling has the capacity to complement species-level approaches by predicting spatial patterns in biodiversity for highly diverse, poorly studied taxa. Although there is a broad range of community-level modelling approaches (Ferrier & Guisan 2006), two of the most commonly employed approaches predict either the species richness of communities (α-diversity) occurring at individual sites within a region, or the dissimilarity in community composition between pairs of sites (β-diversity), as a function of underlying environmental gradients (Mokany & Ferrier 2011). While a variety of alternative methods can be applied to quantify β-diversity (Tuomisto 2010), in this study we focus on pair-wise dissimilarity, due to its utility in describing diversity patterns over large regions at a fine spatial resolution.

Community-level modelling approaches can generate useful predictions of how community properties, such as richness or compositional dissimilarity, vary across a region for poorly studied taxa (e.g. Marsh et al. 2010). However, these approaches do not provide information on the spatial distribution of component species within each community, which is important for a variety of applied and theoretical objectives. In addition, models of species richness and compositional dissimilarity each predict how one property of biodiversity (either richness or dissimilarity) varies across a region and it is not immediately evident how these separate predictions can best be combined to produce a more holistic assessment of spatial patterns in community diversity.

To date, there have been few attempts to synthesise information from community-level modelling of α- and β-diversity.
Approaches focusing on the partitioning of diversity into its components (α-, β-, γ-diversity) (Loreau 2000; Crist & Veech 2006) are less relevant here, as they view β-diversity in terms of a single value for whole regions rather than as a property relating to pairs of communities at specific sites, and therefore exhibiting variation within a given region (Baselga 2010). One approach synthesising community richness and dissimilarity information is the maximisation of complementary richness (MCR: Arponen et al. 2008). This approach is focused on conservation planning and selects a given number of sites that best represent the diversity of a region based on predicted patterns in α- and β-diversity. Arponen et al. (2008) raise the possibility of combining modelled estimates of α- and β-diversity in more powerful ways, but alternative approaches are yet to emerge.

© 2011 Blackwell Publishing Ltd/CNRS

In this study, we present and apply a novel approach for synthesising information from community-level models of α- and β-diversity (the Dynamic Framework for Occurrence Allocation in Metacommunities: DynamicFOAM). Our approach applies an optimisation algorithm, which constructs species lists for each community in a region (i.e. every cell in a regular grid) under the constraints of modelled estimates of the number of species present (α-diversity), the dissimilarity in species composition between each pair of communities (β-diversity) and any available data on the occurrences of specific species at specific sites (Fig. 1). Our approach essentially predicts the composition of all communities across a region (or the distribution of every species in a taxon), based on the specified models of α- and β-diversity. Depending on the level of knowledge for a taxon, predicted communities can comprise real species, purely hypothetical species (e.g. undescribed species), or a mixture of both.
In this study, we describe our approach and demonstrate its application by extending existing models of α- and β-diversity for land snails in New Zealand (Overton et al. 2009). We show how community composition predicted with this approach can be used for a wide range of ecological applications, including predicting γ-diversity for any set of sites; identifying target areas for new conservation reserves; locating priority areas for future ecological surveys; generating realistic compositional data for metacommunity models; testing macroecological theory; and predicting the spatial distribution of all species in a taxon simultaneously. Finally, we highlight the emergence of species-level phenomena, such as aggregated distributions and realised environmental niches, from the community-level diversity models. These analyses illustrate the utility in combining community-level models of α- and β-diversity, especially for highly diverse taxonomic groups where there are substantial gaps in our knowledge of biodiversity.

[Figure 1 panels: Limited survey data (site × species matrix); Environmental variables; α- and β-diversity models; Predicted richness (α); Predicted dissimilarity (β); DynamicFOAM; Predicted composition (site × species matrix)]

Figure 1 Graphical depiction of the procedure for predicting the composition of all communities using limited community survey data, relevant environmental variables, models of α- and β-diversity and the Dynamic Framework for Occurrence Allocation in Metacommunities (DynamicFOAM).

METHODS

Predicting the composition of communities from α- and β-diversity

The approach described in this study predicts the presence/absence of all species in all communities (i.e.
sites, or cells in a regular grid) by applying output from community-level modelling of species richness and compositional dissimilarity (e.g. Overton et al. 2009; Marsh et al. 2010). Our approach can apply predictions generated from either correlative or mechanistic models of α- and β-diversity. Most commonly, correlative modelling techniques would be applied to model how the number of species in a community (α-diversity), or the difference in species composition between two communities (β-diversity), changes with important environmental variables. These models then predict α- or β-diversity over all sites, based on their environmental conditions, and this is the starting point for our analysis (Fig. 1).

Our approach requires pair-wise compositional dissimilarity ($\beta_{ij}$) to be modelled as the complement of Sørensen's similarity coefficient (i.e. Sørensen's dissimilarity), $\beta_{ij} = 1 - 2c_{ij}/(\alpha_i + \alpha_j)$, where $\alpha_i$ and $\alpha_j$ are the number of species in sites $i$ and $j$ respectively, and $c_{ij}$ is the number of species in common between the two sites. Current applications of community-level diversity modelling can predict the richness of each site ($\alpha_i$ and $\alpha_j$ above) and the dissimilarity between each pair of communities ($\beta_{ij}$) (Ferrier et al. 2007), from which the expected number of species in common between site pairs ($c_{ij}$) can be derived (Appendix S1).

The objective of the DynamicFOAM optimisation algorithm is to allocate occurrences (presence/absence) for each species in each site such that the total number of species in each site equals the predicted richness ($\alpha_i$, $\alpha_j$) and the number of species shared between all pairs of sites ($c_{ij}$) equals that derived from the predicted dissimilarity in composition ($\beta_{ij}$). Although this optimisation problem is conceptually reasonably straightforward, the search space for an optimal solution is very large, even for simple cases (e.g. for 20 species over 10 sites, there are $1.6 \times 10^{60}$ permutations of species presences/absences).
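
The relationship between richness, dissimilarity and shared species can be illustrated with a short sketch. It simply rearranges the Sørensen dissimilarity given above to obtain $c_{ij} = (1 - \beta_{ij})(\alpha_i + \alpha_j)/2$; Appendix S1 is not reproduced here and may derive $c_{ij}$ with additional refinements, so the function names and the rearrangement itself should be read as assumptions. The last lines reproduce the size of the search space for 20 species over 10 sites, on the assumption that each of the 200 cells of the matrix is free to be a presence or an absence.

```python
def sorensen_dissimilarity(alpha_i, alpha_j, c_ij):
    """Sorensen dissimilarity: beta_ij = 1 - 2*c_ij / (alpha_i + alpha_j)."""
    return 1.0 - (2.0 * c_ij) / (alpha_i + alpha_j)


def expected_shared_species(alpha_i, alpha_j, beta_ij):
    """Rearrangement of the formula above (an assumption, not Appendix S1):
    c_ij = (1 - beta_ij) * (alpha_i + alpha_j) / 2."""
    return (1.0 - beta_ij) * (alpha_i + alpha_j) / 2.0


# Two sites with 38 species each and a modelled dissimilarity of 0.5
shared = expected_shared_species(38, 38, 0.5)

# Unconstrained search space: every cell of a 20 species x 10 sites
# matrix can independently be a presence (1) or an absence (0).
n_species, n_sites = 20, 10
n_states = 2 ** (n_species * n_sites)
print(f"{n_states:.1e} possible presence/absence matrices")
```

Fixing the richness of each site, as described next, removes every matrix whose row totals differ from the modelled α-diversity, which is why it shrinks the search space so sharply.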
We reduce the search space by setting the number of species present in each site to the modelled species richness (α-diversity). The optimisation process then seeks to organise the designated number of presences and absences in the species × site matrix so as to match as closely as possible the modelled number of species in common between each pair of sites (Fig. 2).

[Figure 2 panels: Initial solution (species × site matrix); New permuted solution; Best solution so far; Number of species in common between sites (modelled, best solution so far, new permuted solution); Check; Final solution; steps labelled A–G]

Figure 2 A graphical representation of the DynamicFOAM procedure. An initial solution is generated randomly (A), given the predicted richness of each site. The best solution so far is permuted by selecting a site at random (B), then selecting a species presence (= 1) and absence (= 0) at random (C) and permuting the selected presence and absence (D). The objective value is obtained as the absolute difference between the modelled number of species in common between each pair of sites and that generated by the permuted species list (E). The solution with the lowest objective value (best solution so far vs. new permuted solution) is retained as the best solution so far for the next iteration (F). The final solution is obtained when either the objective value reaches zero (perfect solution), a stipulated time elapses, or a stipulated number of permutations are made without improvement in the objective value (G).

A further input requirement is an estimate of the total number of species across all communities in the region (total γ-diversity), which dictates the size of the species list to be solved. The γ-diversity for a metacommunity (all the communities in a region) can be specified using prior knowledge, estimated using rarefaction techniques (Gotelli & Colwell 2001), or for modest-sized metacommunities, γ-diversity can be estimated using the DynamicFOAM algorithm itself. In the latter case, the algorithm is applied to a range of feasible γ-diversities. The estimated γ-diversity is identified as described in Appendix S2, as the γ-diversity whose solution is better (smaller error) than the next in the ascending sequence of tested γ-diversities (i.e. the first local minimum in solution error).
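
The 'first local minimum' rule can be sketched as follows. This is a literal reading of the sentence above rather than the procedure of Appendix S2, which is not reproduced here; the function name and the example error values are hypothetical.

```python
def first_local_minimum(gammas, errors):
    """Return the first tested gamma-diversity whose solution error is
    smaller than that of the next gamma in ascending order, or None if
    the error never increases over the tested range."""
    pairs = sorted(zip(gammas, errors))          # ascending gamma
    for (gamma, err), (_, next_err) in zip(pairs, pairs[1:]):
        if err < next_err:
            return gamma
    return None


# Hypothetical solution errors for five candidate gamma-diversities
candidate_gammas = [10, 20, 30, 40, 50]
solution_errors = [9.0, 5.0, 2.0, 3.0, 1.0]
estimate = first_local_minimum(candidate_gammas, solution_errors)
```

In this made-up example the error falls again at the largest candidate, yet the rule still stops at the first minimum.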
This approach is robust in estimating the lower bounds on γ-diversity (Appendix S2).

The sequence of steps in the DynamicFOAM algorithm is shown diagrammatically in Fig. 2A–G. First, an initial solution is generated by allocating the predicted number of presences (modelled richness) to species in each site (A). This initial state may be purely random, or can incorporate some prior knowledge (e.g. weighted-random allocation; see Appendix S1, Fig. S6). The objective value for this initial solution is then determined, as the absolute difference between the modelled and generated number of species in common ($c_{ij}$), summed over all site pairs. An iterative procedure is then applied, where a randomly selected presence and absence (C) from a randomly selected site (B) are swapped (D), the objective value for the altered species × site list is assessed (E) and the permuted species list retained for the next iteration if the objective value is lower than the previous best (F). This iterative procedure continues until (1) the global minimum is achieved, (2) a stipulated time period elapses (maximum processing time), or (3) no further improvement is achieved after a given number of iterations (maximum unimproved iterations) (G). This optimisation process will often terminate at local minima. Consequently, the full algorithm (Fig. 2) is repeated multiple times, generating a population of solutions, with the optimal solution identified as that with the lowest objective value.

Where there are no data available on the presence/absence of specific species in specific sites, the algorithm generates a species × site matrix of presences/absences using purely hypothetical species. However, where data on the presence/absence of specific species in specific sites are available, they can be added as further constraints in the optimisation process. In this case, DynamicFOAM predicts the occurrence of these specific species over all sites.
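
Steps A–G can be sketched as a simple swap-based hill climb. This is an illustrative reconstruction from the description above, not the DynamicFOAM software: the function names, the NumPy implementation, the purely random initial state and the single stopping rule (maximum unimproved iterations) are all assumptions, and known occurrences are not included as constraints.

```python
import numpy as np


def shared_counts(occ):
    """Number of species shared by every pair of sites (sites x sites)."""
    m = occ.astype(np.int64)
    return m @ m.T


def objective(occ, target_c):
    """Absolute difference between modelled and generated numbers of
    shared species, summed over all site pairs (upper triangle only)."""
    diff = np.abs(shared_counts(occ) - target_c)
    return float(np.triu(diff, k=1).sum())


def foam_sketch(alpha, target_c, gamma, max_unimproved=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_sites = len(alpha)
    occ = np.zeros((n_sites, gamma), dtype=bool)
    for i, a in enumerate(alpha):                        # (A) random start
        occ[i, rng.choice(gamma, size=int(a), replace=False)] = True
    best = objective(occ, target_c)
    unimproved = 0
    while best > 0 and unimproved < max_unimproved:      # (G) stopping rules
        i = rng.integers(n_sites)                        # (B) random site
        present = np.flatnonzero(occ[i])
        absent = np.flatnonzero(~occ[i])
        if present.size == 0 or absent.size == 0:
            unimproved += 1
            continue
        p, q = rng.choice(present), rng.choice(absent)   # (C) pick 1 and 0
        occ[i, p], occ[i, q] = False, True               # (D) swap them
        value = objective(occ, target_c)                 # (E) re-evaluate
        if value < best:                                 # (F) keep if better
            best, unimproved = value, 0
        else:
            occ[i, p], occ[i, q] = True, False           # undo the swap
            unimproved += 1
    return occ, best


# Toy usage: recover a 6-site, 12-species matrix from its own alpha and c_ij
truth = np.random.default_rng(1).random((6, 12)) < 0.4
alpha, target = truth.sum(axis=1), shared_counts(truth)
solution, error = foam_sketch(alpha, target, gamma=12, max_unimproved=500)
```

Because every move swaps one presence for one absence within a site, the richness of each site never changes. The sketch recomputes the whole objective on every iteration for clarity; only the pairs involving the permuted site actually change. Repeating the call with different seeds and keeping the lowest-error result mirrors the population of solutions described above.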
Including known occurrences of specific species in the algorithm can markedly reduce the error in the predicted occurrences of species, with perfect solutions obtainable where there is no error in the input α- and β-diversity models (Fig. S1). A more detailed description of the procedure is given in Appendix S1, and software to apply DynamicFOAM is available from the author on request.

Applying DynamicFOAM to New Zealand land snails

We assessed the practical utility of the DynamicFOAM procedure by applying it to the land snail fauna of New Zealand, which is both highly diverse (998 recognised species) and highly endemic (99.5% of species) (Barker 2005; Overton et al. 2009). Overton et al. (2009) recently modelled both richness (α-diversity) and compositional dissimilarity (β-diversity) of land snails in New Zealand, based on 2330 snail community surveys and a range of environmental and vegetation variables. The richness model of Overton et al. (2009) was developed using generalised additive modelling (within the GRASP software package: Lehmann et al. 2002) with 14 environmental variables, explaining 27% of variation in α-diversity. Compositional dissimilarity was modelled using generalised dissimilarity modelling (Ferrier et al. 2007) with the same 14 environmental variables, with the final model explaining 57% of variation in β-diversity (Overton et al. 2009).

We applied these predictions of richness and compositional dissimilarity, along with the known community composition of all 2330 survey sites, to predict land snail community composition over all of New Zealand. The approach was applied using a 200 m resolution grid for all New Zealand, such that communities were defined as square 4 ha areas. Although the 2330 community surveys detected 845 of the 998 recognised species, rarefaction analyses suggest that the total number of land snail species in New Zealand is likely to be 1350–1400 (Appendix S3).
We therefore applied a γ-diversity of 1350 species in the DynamicFOAM algorithm, 845 species of known identity and 505 ‘unknown’ species. The original predictions of α-diversity (Overton et al. 2009) were scaled by a factor of 2.5 to extend them from a richness value per 20 × 20 m sample plot to a richness value for each 200 × 200 m grid cell (Appendix S4).

The algorithm could not be applied to all sites in New Zealand simultaneously (c. 6.7 million sites) due to computational constraints. We therefore applied DynamicFOAM sequentially to contiguous square blocks of 2500 sites (100 km2). All sites within a block were solved simultaneously, with 200 randomly selected sites from any adjacent blocks already solved included as ‘known’ compositional data constraints, along with the 2330 community survey sites. This allowed practical application of the algorithm in a sequential process, gradually solving the entire land area of New Zealand, whilst ensuring consistency in solutions between each block solved. A weighted-random procedure was used in generating the initial state (Fig. 2A) for each 2500 site block, to minimise error in the final solution (Appendix S1, Fig. S6).

Error quantification

Error in the community compositions predicted by DynamicFOAM was quantified in two ways. First, we calculated the mean absolute error in the predicted compositional dissimilarity (MAEβ) of each site-pair relative to the modelled (target) dissimilarity between each site-pair. This measure indicates how closely the procedure generated communities with the target level of compositional dissimilarity, and is directly related to the objective value of the optimisation algorithm. Second, we used known compositional data from the 2330 survey sites to quantify the proportion of correctly predicted species occurrences in those sites (the number of observed species correctly predicted divided by the total number of observed species).
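
The two error measures can be written out as a short sketch. The function names and the NumPy representation (boolean site × species matrices) are assumptions made for illustration, and every site is assumed to hold at least one species so that the Sørensen dissimilarity is defined.

```python
import numpy as np


def sorensen_matrix(occ):
    """Pair-wise Sorensen dissimilarity for a sites x species matrix.
    Assumes every site contains at least one species."""
    m = occ.astype(float)
    shared = m @ m.T                       # species in common, c_ij
    richness = m.sum(axis=1)               # alpha_i
    return 1.0 - 2.0 * shared / (richness[:, None] + richness[None, :])


def mae_beta(occ, target_beta):
    """Mean absolute error between generated and modelled (target)
    dissimilarities, taken over each site pair once."""
    iu = np.triu_indices(occ.shape[0], k=1)
    return float(np.mean(np.abs(sorensen_matrix(occ)[iu] - target_beta[iu])))


def proportion_correct(predicted_row, observed_row):
    """Observed species that were also predicted, divided by the total
    number of observed species at that site."""
    observed = np.asarray(observed_row, dtype=bool)
    predicted = np.asarray(predicted_row, dtype=bool)
    return float((predicted & observed).sum() / observed.sum())


# Two sites sharing one of their two species: beta = 1 - 2*1/(2+2) = 0.5
predicted = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 0]], dtype=bool)
target = np.array([[0.0, 0.4],
                   [0.4, 0.0]])
error_beta = mae_beta(predicted, target)
hit_rate = proportion_correct(predicted[0], [1, 0, 1, 0])
```

Note that the second measure counts only observed presences that were recovered; predicted presences with no matching observation do not lower it.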
We determined the significance of this proportion by comparing it with 1000 random selections of the specified number of species from a pool of 1350 species.

Error in the prediction of species occurrences can come from three sources: (1) error in the specified species richness of each site (the α-diversity model), (2) error in the specified compositional dissimilarity between each pair of sites (the β-diversity model) and (3) the DynamicFOAM procedure failing to identify the optimal solution, given the specified constraints. We quantified the relative importance of these three sources of error in our predictions of land snail community composition using the 2330 survey sites, where both known and predicted composition, richness and pair-wise dissimilarities were available. To do this, we used the algorithm to predict the composition of 50 randomly selected survey sites, with the remaining 2280 survey sites included as ‘known data’, as with the analysis for all New Zealand. We applied the algorithm using all four combinations of known and predicted richness (α) and dissimilarities (β) for the 50 randomly selected survey sites (i.e. (i) known-α known-β; (ii) known-α predicted-β; (iii) predicted-α known-β; (iv) predicted-α predicted-β). The error in predicted occurrences from each of these combinations enabled us to quantify the relative contribution of the richness predictions [(iii − i)/iv], the dissimilarity predictions [(ii − i)/iv] and the DynamicFOAM procedure itself (i/iv) to the total error in predicted occurrences (iv). This was repeated 10 times.

RESULTS

Predicting the composition of all land snail communities

The DynamicFOAM procedure successfully generated predictions of the land snail community composition for each of the 6.7 million grid cells across New Zealand.
As the algorithm fixes the species richness of each site to the richness specified by the α-diversity model, the fit of the predicted community compositions is assessed by how well the resulting pair-wise compositional dissimilarities match those specified by the β-diversity model. For the New Zealand land snails, the mean absolute error in the pair-wise dissimilarities (MAEβ) of the predicted community compositions was a Sørensen dissimilarity value of 0.075 (± 0.0004 SE). For a pair of sites, each with average richness (= 38) and with half their species in common (= 19), this level of error is equivalent to community compositions with either three too few, or three too many species in common.

The capacity of the procedure to correctly predict the occurrences of specific species in specific sites can be quantified by comparing the predicted composition with the known composition for all the community survey sites. The mean proportion of correctly predicted occurrences for all survey sites was 0.498 (± 0.004 SE). This represents correct prediction for approximately half the species known to occur in a specific site. For 72 survey sites, no species occurrences were correctly predicted and for all but two of the remaining survey sites, the proportion of correctly predicted occurrences was significantly higher than that for a random selection of species (mean P-value = 0.0005). Our error partitioning analysis revealed that the greatest sources of error in the prediction of the occurrences of specific species in specific sites were the underlying β-diversity model (responsible for 49.8% of total error) and the DynamicFOAM optimisation procedure itself (40.0% of error), with a relatively small proportion of error coming from the underlying α-diversity model (10.2% of error). The proportion of correctly predicted occurrences was robust to decreases in the amount of known compositional information included in the algorithm (Fig.
S5).

Extending the predictions of community composition

To demonstrate how the predictions of community composition can be used to estimate γ-diversity in any sub-region, we extracted the total number of species within circular regions of radius 200 m and 3200 m centred on every grid cell in New Zealand (Fig. 3a,b). We also identified the most suitable regions in which to discover undescribed species, by determining the total number of ‘unknown’ species predicted to occur within a circular region of arbitrary radius 400 m centred on every grid cell (Fig. 3c).

[Figure 3 map legends, number of species: (a) 0–50, 50–100, 100–150, 150–200, 200–220; (b) 0–300, 300–600, 600–900, 900–1200, 1200–1350; (c) 0–30, 30–60, 60–90, 90–120, 120–150]

Figure 3 The predicted number of land snail species within a circular region of radius (a) 200 m (= total area of five grid cells) and (b) 3200 m, centred around each grid cell in New Zealand and (c) the predicted number of unsurveyed land snail species within a circular region of radius 400 m centred around each grid cell in New Zealand.

The predictions of community composition for all sites are equivalent to predicting the spatial distributions of all 845 ‘known’ and 505 ‘unknown’ land snail species (e.g. Fig. 4a–c). For the 845 ‘known’ species for which occurrence data were available, the spatial pattern of predicted occurrences closely matched predictions obtained from previously applied species-distribution modelling for individual species (see Fig. S2 for examples). When the predicted occurrences of individual species were related to a key environmental variable (mean annual temperature), clear species–environment correlations (realised environmental niches) emerged (e.g. Fig. 4d–f). The predicted area of occupancy for the 845 species surveyed was highly correlated with the number of survey sites in which those species were recorded (Pearson's r = 0.902, P < 0.001; Fig. S3). This indicates a high level of consistency between the predicted occurrences and the survey data in terms of which species were common and rare.

We assessed the reliability of the γ-diversity predictions that emerge from the results by comparing the observed γ-diversity across randomly selected subsets of community survey sites with the predicted γ-diversity over the same sites. Note that here we are examining γ-diversity for a selected number of sites, rather than the global γ-diversity (1350 species) used as a model input. We assessed 500 random combinations of sites, with the number of sites in a combination ranging from 3 to 400, according to a stratified random sample. The predicted γ-diversities explained 99% of the variation in observed γ-diversities of the survey sites (Fig. 5). Note that the predicted γ-diversities are greater in magnitude than the observed values due to the scaling of α-diversity by a factor of 2.5 from the plot level (where the observations were made) to the grid cell level (where the predictions were made) (Appendix S4).

Applying our predictions of land snail γ-diversity for regions of varying size (e.g. Fig. 3a,b), we explored more theoretical aspects of how γ-diversity changes with area. For 100 randomly selected sites in New Zealand, we extracted our predictions of γ-diversity for concentric circular regions of different radii (200 m, 400 m, 800 m, 1600 m and 3200 m) centred around those sites. We then examined the correlation between the γ-diversity for each sized region and the mean α- and β-diversity within those regions. As the size of the region increased, the correlation of γ-diversity with mean α-diversity decreased, while the correlation of γ-diversity with mean β-diversity increased (Fig. 6). The threshold at which β-diversity became more strongly correlated with γ-diversity was an area with radius of c. 600 m (Fig. 6).
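
Extracting the γ-diversity of an arbitrary set of sites from a predicted site × species matrix amounts to counting the species present in at least one of those sites. The sketch below illustrates this, together with the selection of grid cells inside a circular region; the function names, the boolean matrix and the use of cell-centre coordinates in metres are assumptions for illustration only.

```python
import numpy as np


def gamma_diversity(occ, site_indices):
    """Number of species present in at least one of the selected sites."""
    return int(occ[list(site_indices)].any(axis=0).sum())


def sites_within_radius(coords, centre, radius):
    """Indices of sites whose cell centre lies within `radius` of `centre`."""
    dist = np.hypot(coords[:, 0] - centre[0], coords[:, 1] - centre[1])
    return np.flatnonzero(dist <= radius)


# Toy composition: three sites, three species
occ = np.array([[1, 0, 0],
                [1, 1, 0],
                [0, 0, 0]], dtype=bool)
gamma_two_sites = gamma_diversity(occ, [0, 1])

# Cell centres of a 5 x 5 grid at 200 m resolution
xs, ys = np.meshgrid(np.arange(5) * 200.0, np.arange(5) * 200.0)
coords = np.column_stack([xs.ravel(), ys.ravel()])
neighbourhood = sites_within_radius(coords, centre=(400.0, 400.0), radius=200.0)
```

On a 200 m grid, a 200 m radius captures the focal cell and its four edge-sharing neighbours, which agrees with the five grid cells noted in the caption of Fig. 3a.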

DISCUSSION

Filling gaps in our knowledge of biodiversity

Major shortfalls in current knowledge restrict our capacity to best conserve, manage and understand biodiversity (Lomolino 2004). In this study, we have presented a novel approach to filling gaps in our knowledge of biodiversity, by combining models of α- and β-diversity. As we have demonstrated, our procedure predicts the composition of all communities and this information can be used to help answer a variety of theoretical and applied ecological questions. Our approach is particularly well suited to speciose taxonomic groups for which there is sparse information on the distribution of all species across the region of interest, where there are likely to be a substantial number of undescribed species, and hence, where species-level modelling for all species in the taxon is infeasible.

For the land snail fauna in New Zealand, our procedure generated predictions of the composition of all 6.7 million communities, which exactly matched the underlying model of α-diversity and approached the modelled β-diversities to within an average Sørensen dissimilarity of 0.075 between site pairs. The spatial patterns in biodiversity of the predicted communities therefore closely match the community-level diversity models.

A unique feature of our procedure is that properties of individual species emerge from the constraints of the community-level models of α- and β-diversity, combined with the community survey data. For example, the predicted occurrences of individual species over space (e.g. Fig. 4a–c) show clearly aggregated spatial patterns, and closely match the distributions predicted through conventional species-distribution modelling approaches (e.g. Fig. S2). The procedure also performed moderately well at predicting the occurrence of specific species in specific sites, with almost half (49.8%) of the known species occurrences correctly predicted.
We believe that this level of predictive accuracy is quite high, given the inherent errors in the α- and β-diversity models, and the fact that occurrences for all 1350 species (including many rare species) are being predicted simultaneously over millions of locations consistent with observed patterns in α- and β-diversity. Through a data thinning test, we found that the accuracy of occurrence predictions was relatively insensitive to reductions in the amount of known compositional data included in the algorithm, suggesting that this approach will be useful for much sparser data sets than that analysed here (Fig. S5). Of further interest is the emergence of clear and distinct relationships between the occurrence frequency of individual species and key environmental variables such as mean annual temperature (Fig. 4d–f), akin to realised environmental niches (Hutchinson 1957).

The emergence of species-level attributes, such as aggregated distributions and realised niches, from the higher level community diversity patterns contrasts starkly with the common approach of examining emergent community-level properties from the combination of many species-level models (e.g. Hole et al. 2009). One advantage of our approach is that while sensible predictions emerge at the species level (Fig. S2), predictions at the community level tightly adhere to the observed patterns of biodiversity (i.e. the α- and β-diversity models). In contrast, summing many species-level predictions may generate predictions of community diversity that conflict with observed community-level diversity patterns (Pineda & Lobo 2009), such as the α- and β-diversity models.

Figure 4 Emergence of species-level attributes from community-level models. The predicted occurrences across New Zealand for three of the 1350 land snail species applied in the DynamicFOAM procedure (a = Chaureopa planulatu, b = Allodiscus austrodimorphus, c = Cavellia irregularis). Black indicates predicted presence, grey indicates predicted absence, with known occurrences shown in white. Also shown are the relative frequency of occurrence (f) as a function of mean annual temperature (MAT) for each species (black bars) against the background environmental frequency for all New Zealand (grey bars) for the same three snail species (d–f).

Figure 5 Observed vs. predicted γ-diversity for 500 random combinations of the community survey sites, with the number of sites in each combination ranging from 3 to 400. Predicted γ-diversity is extracted from the predicted land snail community compositions across New Zealand. The grey line is a linear model fit to the data (y = 0.524x − 5.096, R² = 0.99, P < 0.001). [Axes: predicted γ-diversity (x) against observed γ-diversity (y).]

Figure 6 The correlation between the γ-diversity within a circular area of a given radius and either the mean α-diversity (black circles) or mean β-diversity (grey triangles) within that area. Data are for circular areas of different radii centred around 100 randomly selected sites across New Zealand. [Axes: area radius in km (x) against correlation coefficient, Pearson's R (y).]

Applying complete metacommunity predictions

The predictions made by our procedure have a broad range of possible applications, including conservation planning, ecological survey design, exploration of macroecological theory and application in mechanistic metacommunity models. One of the most obvious benefits of predicting the composition of all communities across a region is that we can assess the total number of species (γ-diversity) predicted to occur in any set of sites within that region, regardless of their spatial arrangement.
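Once a species list has been predicted for every site, the number of species in any chosen set of sites is simply the size of the union of those lists. The sketch below illustrates this in Python under stated assumptions: the dictionary-of-sets layout, the planar coordinates, the helper names and the toy values are hypothetical and are not taken from the DynamicFOAM implementation.

```python
import math

def gamma_diversity(composition, sites):
    """Number of distinct species predicted across the given sites."""
    species = set()
    for site in sites:
        species |= composition[site]
    return len(species)

def sites_within_radius(coords, centre, radius):
    """Sites whose planar distance to `centre` is at most `radius`.

    `coords` maps site -> (x, y) in the same units as `radius`; a projected
    coordinate system is assumed so that Euclidean distance is meaningful.
    """
    cx, cy = coords[centre]
    return [s for s, (x, y) in coords.items() if math.hypot(x - cx, y - cy) <= radius]

# Toy data, invented purely for illustration (coordinates in metres).
composition = {"s1": {"A", "B"}, "s2": {"B", "C"}, "s3": {"D"}}
coords = {"s1": (0.0, 0.0), "s2": (200.0, 0.0), "s3": (3000.0, 0.0)}

small_area = sites_within_radius(coords, "s1", 200.0)
large_area = sites_within_radius(coords, "s1", 3200.0)
print(gamma_diversity(composition, small_area))
print(gamma_diversity(composition, large_area))
```

Because the sites passed to `gamma_diversity` need not be contiguous, the same function could serve to score a candidate set of scattered reserves as well as a single circular neighbourhood.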
Estimates of γ-diversity extracted from our predictions of community composition were strongly related to actual γ-diversity for land snails in New Zealand (R² = 0.99, Fig. 5), indicating a high level of reliability. Note that the capacity of our algorithm to predict γ-diversity for any set of sites within a region is very different from diversity partitioning approaches (e.g. Loreau 2000), where a single region-wide β-diversity value (Baselga 2010) can be combined with a single mean α-diversity value to predict a single γ-diversity value for a whole region. We have illustrated the potential of DynamicFOAM predictions to assess γ-diversity of any set of sites with a simple example, where we determined the number of land snail species predicted to occur in a circular area of radius 200 m or 3200 m centred around each grid cell in New Zealand (Fig. 3a,b). Although the region around some sites possessed high γ-diversity regardless of the area examined, other sites possessed relatively low γ-diversity for the 200 m radius, but relatively high γ-diversity for the 3200 m radius. The most notable example of a rapid increase in γ-diversity with increasing area was in the mountainous parts of the South Island of New Zealand (Fig. 3a,b), where richness tended to be low, but pair-wise β-diversity was relatively high. The capacity to interrogate the predictions of community composition and extract a prediction of the γ-diversity for any combination of sites has obvious practical applications for conservation planning, where we may want to identify the best area or set of areas to situate new conservation reserves, so as to maximise the number of species represented (Margules & Pressey 2000).

Our procedure also incorporates species that are yet to be discovered in its predictions of community composition.
In the case of New Zealand land snails, it was necessary to include ‘unknown’ or ‘undescribed’ species in the optimisation procedure, together with the 845 species encountered in the community surveys. Including unknown species in the procedure enables the identification of areas that are predicted to have the largest numbers of undescribed species (e.g. Fig. 3c), based on the community-level diversity models and the community surveys. This information could be used in planning the best sites for future ecological surveys to both identify new species and improve the community-level diversity models. These predictions could save valuable ecological research funding and effort, whilst maximising our capacity for documenting and describing the Earth's biota, bridging the so-called ‘Linnaean Shortfall’ in our knowledge of biodiversity (Brown & Lomolino 1998).

The development and testing of macroecological theory is another endeavour which can be restricted by shortfalls in our knowledge of biodiversity (Gaston & Blackburn 2000). For example, the way in which α- and β-diversity interact to influence the γ-diversity for a region continues to be an important topic of debate among ecologists (e.g. Loreau 2000; Crist & Veech 2006). In this study, we demonstrate how the DynamicFOAM predictions can contribute to our understanding of basic macroecological forces, such as the relative importance of α- and β-diversity in influencing γ-diversity. In a simple analysis, we assessed the correlation between predicted land snail γ-diversity and both mean α- and mean β-diversity, for areas of different size centred on 100 randomly selected sites in New Zealand. This analysis suggests that as the size of a region increases, mean α-diversity becomes less important in influencing γ-diversity while mean β-diversity becomes more important (Fig. 6).
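The correlation analysis described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the analysis actually run for Fig. 6: it assumes that mean β-diversity is the mean pairwise Sørensen dissimilarity among communities in an area, that every community holds at least one species, and that each area contains at least two communities. The function names and toy areas are hypothetical.

```python
from itertools import combinations
import math

def sorensen(a, b):
    """Sørensen dissimilarity between two non-empty species sets."""
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

def area_summary(communities):
    """Return (gamma, mean alpha, mean pairwise beta) for a list of species sets."""
    gamma = len(set().union(*communities))
    mean_alpha = sum(len(c) for c in communities) / len(communities)
    pairs = list(combinations(communities, 2))
    mean_beta = sum(sorensen(a, b) for a, b in pairs) / len(pairs)
    return gamma, mean_alpha, mean_beta

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Three toy areas of equal size, invented purely for illustration.
areas = [
    [{"A", "B"}, {"A", "B"}],            # identical communities
    [{"A", "B"}, {"C", "D"}],            # complete turnover
    [{"A", "B", "C"}, {"A", "B", "C"}],  # richer but identical communities
]
gammas, alphas, betas = zip(*(area_summary(a) for a in areas))
print(pearson_r(gammas, alphas), pearson_r(gammas, betas))
```

Repeating the calculation for sets of areas of increasing radius would give one pair of correlation coefficients per radius, which is the kind of curve summarised in Fig. 6.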
A wide variety of alternative macroecological explorations are conceivable using the DynamicFOAM predictions of metacommunity composition.

As we have demonstrated, our procedure can generate community compositions for all sites across large regions that approach known patterns in biodiversity (α- and β-diversity) and in which component species possess realistic distributions (Figs 4 and S2). Such predictions could facilitate the first application of mechanistic metacommunity models to predict biodiversity change over large areas under scenarios of global change (Kerr et al. 2007). Utilising complete predictions of metacommunity composition from the DynamicFOAM procedure as an initial state, metacommunity models could incorporate key processes (e.g. dispersal, interspecific interactions) in predicting likely impacts of global change on biodiversity as a whole (Mokany & Ferrier 2011). This exciting new opportunity for metacommunity modelling may require the adaptation of existing model frameworks (Holyoak et al. 2005), or the derivation of new metacommunity modelling approaches, to enable their application to the large regions for which biodiversity management decisions are often made.

Looking forward

The reliability of community composition predictions made with the DynamicFOAM procedure depends on a number of factors, including the amount of ecological survey data available, the robustness of the underpinning community-level models (the α- and β-diversity models) and the capacity of the algorithm to generate a solution that best meets the constraints. Our error partitioning analysis for the land snail example suggested that improvements in the β-diversity model would yield the greatest improvement in the accuracy of predicting specific species in specific sites (responsible for 49.8% of the total error).
Despite the modest amount of variation in species richness explained by the α-diversity model (27%), an improved richness model would yield at most a 10.2% reduction in the error of predicted occurrences. Our analyses also found a large component of error (40.0%) stemming from the DynamicFOAM procedure failing to reach the globally optimal solution. This error can be reduced directly by increasing the processing time; however, there are decreasing marginal returns in error reduction as processing time increases (data not shown).

It is highly likely that further improvements in the predictive accuracy and computation time of our approach are possible through further methodological development. These improvements could be achieved through programming efficiencies, parallelisation or alternative optimisation algorithms (though alternative approaches, such as binary linear programming and binary genetic algorithms, were found to be inferior in preliminary testing of our approach). There are also numerous possible approaches for sequentially solving community compositions for regions with large numbers of communities and for initialising species lists prior to the iterative permutation procedure (Fig. 2a). The solution presented in this study for land snails in New Zealand required c. 6 days of computing time on a standard desktop. Following substantial gains in computational efficiency, multiple solutions could be generated for a given taxon and region, allowing for quantification of confidence intervals associated with the predictions of species occurrences and γ-diversities. Testing our generic approach with different taxa in different regions is another important step in further improving the DynamicFOAM procedure.
In conclusion, our approach to combining community-level models of α- and β-diversity represents a powerful way to help fill the gaps in our knowledge of biodiversity, especially for highly diverse, poorly studied taxa. Our results are consistent with theory at both the species and community levels. As we have demonstrated, the predictions made from our procedure have a wide range of potential applications, including conservation planning, survey design, testing macroecological theory and facilitating the application of metacommunity models to predict the impacts of global change on biodiversity.

ACKNOWLEDGEMENTS

We thank S. H. Roxburgh for advice on optimisation algorithms, as well as D. R. Paini, J. B. Pichancourt and four anonymous reviewers for comments on earlier versions of this manuscript.

AUTHORSHIP

KM and SF identified the problem; KM designed the study, derived the method, wrote the manuscript; KM and TDH analysed the data; JMO and GMB provided data and expert advice; TDH, JMO, GMB and SF contributed to the manuscript.

REFERENCES

Araújo, M.B., Thuiller, W. & Pearson, R.G. (2006). Climate warming and the decline of amphibians and reptiles in Europe. J. Biogeogr., 33, 1712–1728.
Arponen, A., Moilanen, A. & Ferrier, S. (2008). A successful community-level strategy for conservation prioritization. J. Appl. Ecol., 45, 1436–1445.
Barker, G.M. (2005). The character of the New Zealand land snail fauna and communities: some evolutionary and ecological perspectives. Rec. West. Aust. Mus. Supp., 68, 53–102.
Baselga, A. (2010). Partitioning the turnover and nestedness components of beta diversity. Glob. Ecol. Biogeogr., 19, 134–143.
Botkin, D.B., Saxe, H., Araújo, M.B., Betts, R., Bradshaw, R.H.W., Cedhagen, T. et al. (2007). Forecasting the effects of global warming on biodiversity. Bioscience, 57, 227–236.
Brown, J. & Lomolino, M. (1998). Biogeography, 2nd edn. Sinauer Press, Sunderland, Massachusetts.
Crist, T.O. & Veech, J.A. (2006). Additive partitioning of rarefaction curves and species-area relationships: unifying alpha-, beta- and gamma-diversity with sample size and habitat area. Ecol. Lett., 9, 923–932.
Driscoll, D.A. & Lindenmayer, D.B. (2009). Empirical tests of metacommunity theory using an isolation gradient. Ecol. Monogr., 79, 485–501.
Elith, J. & Leathwick, J.R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annu. Rev. Ecol. Evol. Syst., 40, 677–697.
Ferrier, S. & Guisan, A. (2006). Spatial modelling of biodiversity at the community level. J. Appl. Ecol., 43, 393–404.
Ferrier, S., Manion, G., Elith, J. & Richardson, K. (2007). Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Divers. Distrib., 13, 252–264.
Gaston, K.J. & Blackburn, T.M. (2000). Pattern and Process in Macroecology. Blackwell Publishing, Oxford.
Gotelli, N.J. & Colwell, R.K. (2001). Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecol. Lett., 4, 379–391.
Hole, D.G., Willis, S.G., Pain, D.J., Fishpool, L.D., Butchart, S.H.M., Collingham, Y.C. et al. (2009). Projected impacts of climate change on a continent-wide protected area network. Ecol. Lett., 12, 420–431.
Holyoak, M., Leibold, M. & Holt, R. (2005). Metacommunities: Spatial Dynamics and Ecological Communities. The University of Chicago Press, Chicago.
Hutchinson, G. (1957). Concluding remarks. Cold Spring Harb. Symp. on Quant. Biol., 22, 415–427.
Kearney, M. & Porter, W. (2009). Mechanistic niche modelling: combining physiological and spatial data to predict species' ranges. Ecol. Lett., 12, 334–350.
Kerr, J.T., Kharouba, H.M. & Currie, D.J. (2007). The macroecological contribution to global change solutions. Science, 316, 1581–1584.
Lehmann, A., Overton, J.M. & Leathwick, J.R. (2002). GRASP: generalized regression analysis and spatial prediction. Ecol. Modell., 160, 165–183.
Lomolino, M. (2004). Conservation biogeography. In: Frontiers of Biogeography: New directions in the Geography of Nature (eds Lomolino, M. & Heaney, L.). Sinauer Associates, Sunderland, Massachusetts, pp. 293–296.
Loreau, M. (2000). Are communities saturated? On the relationship between α, β and γ diversity. Ecol. Lett., 3, 73–76.
Margules, C.R. & Pressey, R.L. (2000). Systematic conservation planning. Nature, 405, 243–253.
Marsh, C.J., Lewis, O.T., Said, I. & Ewers, R.M. (2010). Community-level diversity modelling of birds and butterflies on Anjouan, Comoro Islands. Biol. Conserv., 143, 1364–1374.
Mokany, K. & Ferrier, S. (2011). Predicting impacts of climate change on biodiversity: a role for semi-mechanistic community-level modelling. Divers. Distrib., 17, 374–380.
Overton, J., Barker, G. & Price, R. (2009). Estimating and conserving patterns of invertebrate diversity: a test case of New Zealand land snails. Divers. Distrib., 15, 731–741.
Pineda, E. & Lobo, J.M. (2009). Assessing the accuracy of species distribution models to predict amphibian species richness patterns. J. Anim. Ecol., 78, 182–190.
Thuiller, W., Broennimann, O., Hughes, G., Alkemade, J.R.M., Midgley, G.F. & Corsi, F. (2006). Vulnerability of African mammals to anthropogenic climate change under conservative land transformation assumptions. Glob. Change Biol., 12, 424–440.
Tuomisto, H. (2010). A diversity of beta diversities: straightening up a concept gone awry. Part 2. Quantifying beta diversity and related phenomena. Ecography, 33, 23–45.

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article:

Figure S1 Increasing the accuracy of occurrence predictions by including known data as a constraint (a test with synthetic metacommunities).
Figure S2 Examples of species distributions predicted using DynamicFOAM compared with those using individual species-distribution modelling.
Figure S3 Predicted vs. surveyed extent of occurrence for the 845 surveyed species.
Figure S4 Rank-occurrence relationships for both predicted occurrence and surveyed occurrence.
Figure S5 The effect of data scarcity on the accuracy of occurrence predictions.
Figure S6 The effect of a weighted-random procedure for generating an initial solution on the predictive accuracy of the algorithm.
Appendix S1 Detailed description of the DynamicFOAM algorithm.
Appendix S2 Applying DynamicFOAM directly to predict gamma-diversity.
Appendix S3 Estimation of the total number of land snail species.
Appendix S4 Scaling richness predictions from 20 m sample plot to 200 m grid cell.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organised for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Editor, Howard Cornell
Manuscript received 28 March 2011
First decision made 3 May 2011
Second decision made 1 July 2011
Manuscript accepted 15 July 2011

2011 Blackwell Publishing Ltd/CNRS