A Species-Specific Approach to Modeling Biological Communities and Its Potential for Conservation JULIAN D. OLDEN Department of Biology and Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO 80523–1878, U.S.A., email [email protected] Abstract: Community-level approaches to biological conservation are now recognized as a major advance in most current single-species conservation and management practices. Existing approaches for modeling biological communities, have limited utility, however, because they commonly examine community metrics (e.g., species richness, assemblage “types”) and consequently do not consider the identity of the species that compose the community. This is a critical shortcoming because the functional differences among species ultimately translate into the differential vulnerability of biological communities to natural and human-related environmental change. To address this concern I present a novel, species-specific approach to modeling communities using a multiresponse, artificial neural network. This provides an analytical approach that facilitates the development of a single, integrative model that predicts the entire species membership of a community while still respecting differences in the functional relationship between each species and its environment. I used temperate-lake fish communities to illustrate the utility of this modeling approach and found that predictions of community composition by the neural network were highly concordant with observed compositions of the 286 study lakes. Average similarity between observed and predicted community composition was 80% (22 out of the 27 species correctly classified), and the model predicted a significant portion of the community composition in 91% of the lakes. I discuss the importance of the lake habitat variables for predicting community composition and explore the spatial distribution of model predictions in light of recent species invasions. The proposed modeling approach provides a powerful, quantitative tool for developing community predictive models that explicitly consider species membership, and thus each species’ functional role in the community. Such models will contribute significantly to the study and conservation of biological communities. Key Words: artificial neural network, classification, ecology, fish, prediction, presence and absence Un Enfoque Especie–Específico al Modelaje de Comunidades Biológicas y Su Potencial para la Conservación Resumen: La conservación biológica a nivel de comunidad se reconoce actualmente como un importante avance en la mayoría de las prácticas actuales de conservación y manejo de una especie. Sin embargo, los enfoques actuales al modelaje tienen una utilidad limitada ya que, por lo general, consideran medidas de las comunidades (por ejemplo, la riqueza de especies, “tipos” de ensamblaje”) sin considerar así la identidad de las especies que integran la comunidad. Esta es una limitación crucial dado que las diferencias funcionales entre especies se traducen, en última instancia, en una vulnerabilidad diferencial de las comunidades biológicas a cambios ambientales naturales y antropogénicos. Para atender esta preocupación, presento un enfoque novedoso especie – específica al modelaje de comunidades utilizando una red neural artificial y de respuestas múltiples. Este enfoque proporciona una aproximación analítica que facilita el desarrollo de un solo modelo integrativo que predice el total de especies que integran una comunidad, respetando a su vez las diferencias en la relación funcional entre cada especie y su ambiente. Utilicé comunidades de peces en lagos templados para ejemplificar la utilidad de este modelo y pude determinar que las predicciones en cuanto a la composición de la comunidad usando la red neural eran altamente concordantes con las composiciones observadas de los 286 lagos estudiados. La similitud promedio entre la composición observada y la predicha fue del 80% (22 de 27 especies correctamente clasificadas), y el modelo predijo una porción significativa de Paper submitted June 18, 2001; revised manuscript accepted August 5, 2001. 854 Conservation Biology, Pages 854–863 Volume 17, No. 3, June 2003 Olden Modeling Biological Communities 855 la comunidad en 91% de los lagos. Se discute la importancia de las variables del hábitat lacustre para predecir la composición de la comunidad y la distribución espacial de las predicciones del modelo es explorada a la luz de recientes invasiones de especies. La aproximación propuesta proporciona una poderosa herramienta cuantitativa para desarrollar modelos predictivos que consideran explícitamente las especies integrantes de una comunidad (y por lo tanto, la función de cada especie en la comunidad). Tales modelos contribuirán significativamente al estudio y a la conservación de comunidades biológicas. Introduction Conservation biology is deeply rooted in population biology, both historically with its origins in wildlife management and contemporarily with the emergence of conservation biology as an academic discipline. As a result, researchers have traditionally focused their efforts on the conservation and management of individual populations because species are generally easier to monitor, they are more amenable to explicit modeling, and extinction occurs one population at a time. Single-species approaches to conservation continue to be the major paradigm in conservation biology ( Caughley 1994 ), although researchers continue to recognize their limitations ( Simberloff 1997 ). Given the limited time and resources available to conserve and manage biodiversity on a species-by-species basis and the agony of choosing among the increasing number of species that are imperiled ( Vane-Wright et al. 1991 ), our ability to meet the challenge of preserving biodiversity hinges on the successful conservation of entire biological communities. Approaching the conservation of aquatic and terrestrial biodiversity at a community level is now perceived as being a major advance in conservation biology (Franklin 1993), in restoration ecology (Young 2000), and in the interface of the two disciplines (Urbanska 2000). Critical components of any effective community-level conservation, management, or restoration strategy are a working knowledge of the relationship between the target community and the environment and formulation of this understanding into a predictive model. This is an essential process because without predictive community models—for example, to forecast the effects of habitat modification on community structure and function—a community-level approach will likely play no constructive role in conservation biology. Unfortunately, the development of such models is a difficult task because it is necessary to model all the species of the community ( i.e., multiple response variables ), as opposed to the conventional task of modeling a single species. Consequently, researchers have employed an array of analytical approaches to model communities, with the common goal of gaining both explanatory insight into the factors shaping community structure and predictive insight into future states of the community. Of the nu- merous techniques used to model biological communities, three general approaches have emerged. First, the most common approach involves modeling a single community metric that summarizes the general structure of the biological community (e.g., species richness, redundancy, evenness, total biomass; Washington 1984), an indicator species that serves as a surrogate for the status of other species or environmental conditions, or an umbrella species whose conservation serves to protect sympatric species ( Noss 1990 ). Although the use of community metrics or single species eases the quantitative analysis of community-environment relationships, these approaches have a number of drawbacks. For example, community metrics such as species richness poorly represent the characteristics of a community because they provide no information regarding actual species membership. Consequently, species richness can be a particularly misleading metric because species are differentially prone to extinction ( McKinney 1997 ), species-rich assemblages often do not include those species most in need of conservation ( Prendergrast et al. 1993), and local or regional richness may be similar in space or remain constant in time even though substantial variation in species composition may exist ( Brown et al. 2001 and citations therein). Similarly, the indicator and umbrella species concepts for conserving entire communities continue to receive criticism both theoretically (Niemi et al. 1997; Simberloff 1997; Caro & O’Doherty 1999 ) and empirically ( Chase et al. 2000; Fleishman et al. 2001). Second, a community classification approach is used to define assemblage “types” based on best professional judgment, numerically dominant species ( e.g., Magnuson et al. 1998), guild classifications (e.g., Simberloff & Dayan 1991 ) or groups of species based on patterns of co-occurrence elucidated from multivariate ordination or clustering analyses ( e.g., Capone & Kushlan 1991; Mucina 1997). Community classification is also the first step prior to modeling aquatic invertebrate communities in many biomonitoring programs ( Wright et al. 2000 ). Although this approach is used to make the study of complex multispecies assemblages more tractable, it has a number of methodological and ecological shortcomings. Methodologically, the results of cluster analyses are often compromised by the variable nature of cluster rec- Conservation Biology Volume 17, No. 3, June 2003 856 Modeling Biological Communities ognition (Anderson & Clements 2000), by the choice of similarity coefficients ( Jackson et al. 1989), and by the erroneous identification of clusters that do not truly exist (Milligan & Cooper 1985). Furthermore, a subjective decision is often required to determine the degree of similarity necessary to group species together. For instance, Angermeier and Winston (1999) found over 90 statistically distinct fish assemblage types in Virginia, emphasizing that the division of communities into discrete assemblages is arbitrary in that different spatial scales can yield greater or fewer numbers of assemblage types. In addition to methodological pitfalls, probably the greatest disadvantage of aggregating species into assemblage types is the loss of information about individual species (Hay 1994). If the types are not a product of natural processes and are not temporally and spatially stable, any prediction of community state or change based on that assemblage type may have little relevance. This approach also assumes that the species that comprise each assemblage type always coexist, that each species “belongs” to a single assemblage type, and that species in the same assemblage type exhibit exactly the same functional and numeric responses to changes in the environment. Furthermore, after predictive models have been developed they may not be applicable to systems outside the sites used to classify the assemblages because of the existence of different species pools and thus different assemblage types. Finally, nonrandom species assemblages may not always exist ( e.g., Jackson et al. 1992), and discrete assemblage types will not always be definable because assemblages may be composed of species exhibiting individual responses to the environment (e.g., Pusey et al. 2000). Lastly, multivariate statistical approaches, such as canonical analysis and matrix comparison methods, are commonly used to relate the structure of biological communities to the various components of the environment (e.g., Rodriguez & Lewis 1997; Zampella & Bunnell 1998). Multivariate approaches have been advocated because they efficiently summarize community-environment relationships into a simplified and easily illustrated format. Unfortunately, although multivariate approaches help us understand the major factors likely to be structuring biological communities, they do not produce a quantitative framework from which the composition of such communities can be predicted and the importance of the environmental variables can be assessed. In short, current approaches for modeling the composition of biological communities ignore the critical fact that important differences exist among species, thus limiting their suitability for conserving entire communities. Although single species remain the primary foci of conservation efforts, it is now overwhelmingly evident that a habitat-based, multispecies conservation approach is required (Martin 1994). For a community-level approach to conservation to be successful, however, innovative Conservation Biology Volume 17, No. 3, June 2003 Olden approaches are needed to facilitate the modeling of communities while still respecting differences in the identity of the species. Such models would contribute significantly to the effective conservation of the complete spectrum of biodiversity, as opposed to current efforts that focus solely on individual populations or subsets of the community. Driven by this goal, I propose a species-specific approach to modeling the composition of biological communities. The approach uses a multiresponse, artificial neural network to predict the occurrence of each species in a community from a common set of predictor (i.e., environmental) variables, resulting in a single model that predicts the entire composition of a community. Here, I detail the neural network methodology for developing predictive community models, and illustrate its construction and interpretation with northtemperate lake fish communities. Methods Fish Communities of Algonquin Provincial Park, Ontario, Canada I used fish communities of 286 lakes located in Algonquin Provincial Park, Ontario, Canada (lat 4550N, long 7820 W), to illustrate the neural network approach to modeling biological communities. Aquatic communities in this region are representative of relatively natural ecosystems because these lakes are located in a provincial park and are currently subject to minimal perturbations from human development (except for limited introduction of smallmouth bass during the early 1900s). A community model was constructed by modeling the presence/absence of 27 fish species as a function of nine whole-lake habitat variables that are related to known habitat requirements of fish in this geographic region ( Matuszek & Beggs 1988; Minns 1989). The variables included surface area (ha), maximum depth (m), volume (m3 ), total shoreline perimeter (km), elevation (m), total dissolved solids ( mg/L ), pH, growing-degree days ( basis 5 C ), and occurrence of summer stratification. All data were obtained from the Algonquin Park Fish Inventory database (1989–1991), which involved a combination of extensive sampling of lakes in the park with multiple gear types ( multipanel survey gill nets, plastic traps, seines, Gee minnow traps ) and records obtained from the Ontario Ministry of Natural Resources Lake Inventory and the Royal Ontario Museum collection (Crossman & Mandrak 1992). Modeling Biological Communities with an Artificial Neural Network Artificial neural networks (ANNs) have been advocated in many disciplines for addressing complex pattern- Olden recognition problems. The advantages of ANNs over traditional, linear approaches include their ability to model nonlinear associations with a variety of data types (e.g., continuous, discrete) and to accommodate interactions among predictor variables without any a priori specification (Bishop 1995). Neural networks are considered universal approximators of continuous functions, and as such they exhibit flexibility for modeling nonlinear relationships between variables. For example, ANNs exhibit substantially greater predictive power than traditional, linear approaches when modeling nonlinear data (based on empirical and simulated data; Olden & Jackson 2002b ). Here, I discuss the characteristics of the inner architecture of the neural network and refer the reader to the text of Bishop (1995) and the papers of Lek et al. (1996) and Olden and Jackson (2001) for more comprehensive coverage. I used a feed-forward neural network trained by the back-propagation algorithm ( Rumelhart et al. 1986 ). The architecture of this network consists of a single input, hidden, and output layer, with each layer containing one or more neurons ( Fig. 1 ). The input layer contains p neurons, each of which represents one of the p predictor variables. The number of hidden neurons in the neural network varies and is chosen to minimize the trade-off between network bias and variance ( Bishop 1995 ). The output layer contains one neuron for each response variable being modeled ( e.g., one neuron for the presence/absence of each species ). Additional bias neurons with a constant output ( equal to 1 ) are added to the hidden and output layers, which play a role similar to that of the constant term in regression analysis. A two-layer network—that is, a network without a hidden layer—is computationally identical to a linear regression model. I determined the optimal number of neurons in the hidden layer of the network by comparing the per- Modeling Biological Communities 857 formances of different cross-validated networks, with 1 to 25 hidden neurons, and choosing the number that produced the greatest network performance. This resulted in a network with 9 input neurons ( i.e., nine habitat variables), 7 hidden neurons, and 27 output neurons (i.e., the presence/absence of 27 species) (Fig. 1). Each neuron in the network ( excluding the bias neurons) is connected to all neurons from adjacent layers by axons. Axon connections among neurons are assigned a weight that dictates the intensity of the signal transmitted by the axon. The “state” or “activity level” of each input neuron is defined by the incoming signal ( i.e., values ) of the predictor variables, whereas the state of the other neurons is evaluated locally by calculating the weighted sum of the incoming signals from the neurons of the previous layer. The entire process is written mathematically as y k = φ o β k + ∑ w jk φ h β j + ∑ wij x i , j i (1) where xi are the input signals, yk are the output signals, wij are the weights between input neuron i and hidden neuron j, wjk are the weights between hidden neuron j and output neuron k, j, k are the biases associated with the hidden and output layers, and h and o are activation functions for the hidden and output layers. I used the logistic (or sigmoid) function to transform the state of each output neuron to range between 0 and 1 ( representing the probability of species occurrence ) and a decision threshold of 0.5 to classify a species as present or absent. The neural network is trained with a back-propagation algorithm in which all the connection weights are iteratively adjusting, with the goal of finding a set of connection weights that minimizes the error of the network (in Figure 1. Architecture of the artificial neural network used to model the presence/absence of 27 fish species as a function of nine lake habitat variables. Conservation Biology Volume 17, No. 3, June 2003 858 Modeling Biological Communities this case the cross-entropy error function ). During network training, observations are sequentially presented to the network and weights are adjusted depending on the magnitude and direction of the error. Learning rate ( ) and momentum ( ) parameters ( varying as a function of network error ) were included during network training to ensure a high probability of global network convergence (Bishop 1995), and a maximum of 1000 iterations was set for the back-propagation algorithm to determine the optimal axon weights. Prior to training the neural network, the predictor variables were converted to z scores to standardize the measurement scales of the inputs into the network and thus to ensure that the same percentage change in the weighted sum of the inputs caused a similar percentage change in the unit output. The neural network was validated with n-fold cross-validation. Given that the state of each output neuron is defined by the sum of its incoming signals, the overall contribution of the habitat variables to the predictive output of the neural network can be quantified with the axon connection weights. The explanatory importance of the lake habitat variables for predicting community composition was quantified by calculating the product of the input-hidden and hidden-output connection weights between each input neuron and output neuron and then summing the products across all hidden neurons. This procedure was repeated for each habitat variable, and the relative contributions of the variables were calculated by dividing the absolute value of each variable contribution by the grand mean (sum of all absolute variable contributions ). The relative contributions of each variable were subsequently assessed for their statistical significance with a randomization test, which randomizes the response variable and then constructs a neural network based on the randomized data and records the relative explanatory importance of each habitat variable. This process is repeated 9999 times to generate a null distribution for the relative importance of each variable, which is then compared with the observed values to calculate the significance level. I refer the reader to Olden and Jackson (2002a ) for more details on calculating variable contributions and testing their significance with the randomization approach. I used a randomization test to assess whether the neural network predicted a statistically significant number of species in the communities. For each lake, the test involved randomly permuting the predicted community composition from the neural network 9999 times, each time calculating the percent similarity (derived from the simple matching coefficient; Legendre & Legendre 1998) with observed community composition. Significance levels were then calculated as the proportion of randomized trials that the percent similarity was equal or greater than the percent similarity between the observed and predicted community composition. All pro- Conservation Biology Volume 17, No. 3, June 2003 Olden cedures, including the construction of the neural networks, were conducted with computer macros in the MatLab programing language written by the author. Results Predictions of fish community composition by the neural network were highly concordant with observed species membership in the 286 study lakes. For the majority of the lakes the degree of similarity between predicted and observed fish-community composition ranged between 70% and 90%, indicating that the neural network correctly predicted the presence/absence of 19–24 out of the 27 total species (Fig. 2). Results from the randomization test showed that the neural network correctly predicted a significant portion of the species in the community in 91% of the study lakes ( 260 out of the 286 lakes). The neural network exhibited varying success in predicting individual species in the communities (Table 1). For example, northern redbelly dace, pearl dace, golden shiner, and brown bullhead ( scientific names provided in Table 1) were the most poorly predicted species (61.5– 65.4% ), indicating that their memberships in the communities were the most often misclassified by the neural network relative to other species. In contrast, blackchin shiner, trout-perch, and rock bass were the most successfully predicted species (90.9–94.4%), indicating that their memberships in the communities were consistently correctly classified. Results from the randomization test ( based on network connection weights ) illustrated that the ranked importance of the habitat variables for predicting individual species membership in the communities varied Figure 2. Percent similarity between observed fish community composition and predicted fish community composition by the neural network. Arrow represents mean percent similarity. Olden Table 1. Modeling Biological Communities 859 Predictive performance of the neural network for predicting species composition in the 286 study lakes. Common name Blackchin shiner Blacknose shiner Brook stickleback Brook trout Brown bullhead Burbot Cisco Creek chub Common shiner Fallfish Finescale dace Golden shiner Iowa darter Lake chub Lake trout Lake whitefish Longnose sucker Northern redbelly dace Pearl dace Pumpkinseed Rock bass Round whitefish Smallmouth bass Splake Trout-perch White sucker Yellow perch Scientific name Occurrence % Correctly classified % Significant habitat variables* Notropis heterodon Notropis heterolepis Culaea inconstans Salvelinus fontinalis Ameiurus nebulosus Lota lota Coregonus artedi Semotilus atromaculatus Luxilus cornutus Semotilus corporalis Phoxinus neogaeus Notemigonus crysoleucas Etheostoma exile Couesius plumbeus Salvelinus namaycush Coregonus clupeaformis Catostomus catostomus Phoxinus eos Margariscus margarita Lepomis gibbosus Ambloplites rupestris Prosopium cylindraceum Micropterus dolomieu S.fontinalis S.namaycush Percopsis omiscomayus Catostomus commersoni Perca flavescens 5.6 40.6 27.3 76.9 47.2 30.8 21.3 68.2 54.2 11.2 15.0 39.9 20.3 16.4 52.8 13.6 18.5 54.5 41.3 65.0 12.2 10.5 21.3 7.7 9.4 83.6 71.0 94.4 71.7 76.6 80.1 65.4 81.5 83.9 74.1 72.0 89.2 89.2 64.7 81.2 83.9 76.2 86.4 82.9 61.5 64.3 70.6 90.9 89.2 79.0 92.3 91.3 86.0 72.0 Elev, ShPer., SS ShPer, Elev, SArea MaxDep, SArea, Volume SArea, ShPer ShPer, SArea ShPer, Elev Elev, SArea ShPer, MaxDep, SArea ShPer, Elev, SArea SArea, ShPer, Elev, pH ShPer, Volume, SArea ShPer, Elev, SArea Elev, MaxDep, TDS MaxDep, ShPer MaxDep, ShPer ShPer, SArea SArea, GDD MaxDep, Elev MaxDep, ShPer, SArea ShPer, Elev Elev, SArea, GDD ShPer, SArea, GDD ShPer, Elev, GDD Elev, MaxDep, pH Elev, TDS ShPer ShPer, TDS, SArea *Habitat variables are listed in decreasing importance. Variable abbreviations: lake surface area (SArea); maximum depth (MaxDep); lake volume (Volume); shoreline perimeter (ShPer); elevation (Elev); total dissolved solids (TDS); pH; growing-degree days (GDD); and occurrence of summer stratification (SS). across species (Table 1). For instance, perimeter of lake shoreline, elevation, and growing-degree days were significant predictors of smallmouth bass occurrence, whereas maximum depth of the lake and shoreline perimeter were the most important predictors of lake trout occurrence. Habitat variables quantifying the overall size and shoreline complexity of lakes were the most important predictors of entire communities. These variables included shoreline perimeter ( relative contribution 19.7% ), lake surface area ( 16.3% ), elevation ( 16.2% ), and maximum depth ( 12.5% ), which were significant predictors of presence/absence for 19, 16, 14, and 8 species, respectively (Table 1; Fig. 3). Fish community composition of the study lakes was predicted with varying spatial success across Algonquin Provincial Park (Fig. 4). There was close agreement between the number of lake communities that were predicted with low success (i.e., percent similarity 70% between observed and predicted community composition) and the number of total lakes in each of the five major watersheds. Therefore, no evidence exists for differential model predictive performance across the watersheds, although within watersheds the species memberships of particular communities were predicted with low success. Discussion Modeling Biological Communities with an Artificial Neural Network Herein I propose a novel artificial neural-network approach for modeling biological communities in a species-specific manner. The methodology gives researchers the ability to develop a single, integrative model for predicting complete species membership from a common set of environmental variables, an approach that is not possible with traditional, statistical techniques. Given the utility of the approach, it is important to appreciate how the architecture and dynamic optimization of the neural network enable the modeling of different species-habitat relationships. The hidden layer of neurons in the network plays a vital role by individually mediating the information transfer between the input and output neurons—that is, they mediate the relative influence of the habitat variables on predicting the occurrence of each species. This feature makes it possible to model multiple species-habitat relationships based on the same set of predictor variables within a single integrative model. Following is a hypothetical example to illustrate the function of the hidden neuron layer. Imagine that the uppermost hidden neuron in Fig. 1 becomes Conservation Biology Volume 17, No. 3, June 2003 860 Modeling Biological Communities Olden In summary, the neural network contains a set of input-hidden connection weights that reflect the universal importance of the habitat variables for predicting the presence/absence of all the species, whereas the hidden-output connection weights modify the first set of weights by reducing, enhancing, or even reversing the signals to maximize the probability of correctly classifying each individual species in the community. Therefore, the architecture of the neural network enables the development of predictive models for community composition in which species-specific responses to environmental conditions are accounted for. Community Predictive Models and Their Potential for Fish Conservation Figure 3. Relative contributions of the lake habitat variables for predicting community composition averaged across all 27 fish species. Whiskers represent 1 SD, and the numbers in each bar represent the number of species for which the habitat variable was a statistically significant predictor of presence/absence (Table 1). highly activated when lake surface area is large (i.e., the connection weight between the surface-area neuron and the hidden neuron is much larger than the connection weights of other input neurons ). The activated hidden neuron then mediates the importance of this information ( i.e., the importance of large lake surface area ) by modifying the connection weight between itself and the layer of output neurons to any value between negative and positive infinity. Therefore, if a large lake surface area provides no discriminator power for predicting the occurrence of species 1, then the connection weight linking the activated hidden neuron and the output neuron for species 1 would be set equal to a small value ( i.e., close to zero). This results in lake surface area having no influence on the predicted probability of occurrence for species 1. On the other hand, if large lake surface area is predictive of the presence or absence of species 2, then the connection weight between the activated hidden neuron and the corresponding species 2 output neuron would be set equal to a large positive ( species present ) or negative ( species absent ) value. The dynamic nature of the network is a result of the fact that each hidden neuron has the ability to modify the signals (i.e., information) provided by all input neurons before defining its outgoing signal to each output neuron. Therefore, the hidden neurons dictate the relative importance of each input variable for predicting the occurrence of each species in the network. Conservation Biology Volume 17, No. 3, June 2003 Establishing explanatory and predictive relationships between biological communities and the environment has considerable potential in conservation biology. Although a thorough discussion of the potential applications of community predictive models to conservation is beyond the scope of this paper, I contend that the development of statistical models that provide testable, community-wide predictions of complete species membership will contribute significantly to more-efficient conservation, management, and restoration strategies. I limit my discussion to the potential utility of community predictive models for fish conservation by examining applications of the neural network to the fish communities of Algonquin Provincial Park, Ontario, Canada. The species-specific community model I present provides a unique opportunity to determine how lake habitat conditions contribute to observed patterns in fish community composition. The community neural network was successful in predicting complete species membership in the study lakes and emphasized the importance of overall lake size and elevation in shaping the composition of the fish communities of Algonquin Provincial Park. Interestingly, lake size and elevation are also important factors in Ontario for shaping patterns in fish species richness ( Eadie & Keast 1984; Matuszek & Beggs 1988; Minns 1989 ). Predictability of individual species by the model was also consistent with findings of single-species models for north-temperate fish populations ( e.g., Tonn et al. 1990; Magnuson et al. 1998; Olden & Jackson 2001 ). For example, the presence/ absence of lake trout ( a cold-water species ) was best predicted with maximum depth; a habitat factor that influences the mixing characteristics and the thermal regime of lakes. Lake elevation and the number of growingdegree days were the most important predictors of smallmouth bass occurrence; a finding consistent with the sensitive thermal requirements of this species (Shuter & Post 1990). Furthermore, shoreline perimeter was important for both these species and serves as an indirect measure of the diversity of habitats available in lakes, Olden Modeling Biological Communities 861 Figure 4. Spatial distribution of the neural network’s performance for predicting fish community composition in the 286 study lakes of Algonquin Provincial Park, Ontario, Canada (lat 4550N, long 7820W). Symbols represent the percent similarity between observed and predicted community composition. Increasing symbol size represents increased percent similarity between predicted and observed community composition. which may be important to the small-bodied forage fish upon which these species feed. Interestingly, shoreline perimeter was also a significant predictor for 9 out of 13 small-bodied fish species modeled by the neural network, which suggests the importance of diverse nearshore habitats for providing refuge from predation ( Jackson et al. 2001). Community models can play an important role in basic and applied fish ecology. For example, given the large body of literature relating various fish species to biotic and abiotic factors, community models such as the one presented here could provide a means of gaining a greater understanding of the role of the environment in mediating patterns in community composition. The ability to correctly identify the entire membership of a community based on environmental conditions could be an excellent test of our current state of knowledge regarding the factors governing community dynamics, as well as where potential gaps in this knowledge exist. In an applied sense, fish-community models could provide important predictions of the effects of environmental degradation in aquatic ecosystems. A community approach is particularly attractive because indicator species are often absent for reasons unrelated to the source of degradation, and as such the use of individual species as measures of anthropogenic impact can be misleading (Fausch et al. 1990). Therefore, by modeling communities found at reference sites ( areas that have been minimally affected by human activities ), model predictions could be used as a benchmark for detecting and assessing the impact of habitat alteration or loss, thus offering a means of quantifying the loss of biological diversity. This approach would be similar to those of many biomonitoring programs (e.g., Wright et al. 2000), but it would have the distinct advantage of predicting community composition and not merely the total number of species. For example, Zampella and Bunnell (1998) showed that changes in the composition of fish assemblages in Pinelands streams are tightly associated with watershed disturbance gradients, whereas species richness does not reflect this disturbance gradient. The aquatic communities of Algonquin Provincial Park may represent a good model system for central Ontario because these lakes have been minimally affected by human activities (except perhaps for limited introduction of smallmouth bass during the early 1900s ). Fish com- Conservation Biology Volume 17, No. 3, June 2003 862 Modeling Biological Communities munities that were predicted poorly by the neural network ( i.e., percent similarity between observed and predicted communities was 70% ) indicate lake communities that differ from expectations based on habitat conditions ( Fig. 4 ). These lakes possibly represent systems that are, or have been, affected by human activities in the park. A number of the poorly predicted lakes (i.e., Basin, Kearney, Shall, Tepee, Tim, and Whitefish) are located in “access zones” that provide the only points of entry to the park by public roads. Anglers are a primary vector for fish movement across the landscape (Litvak & Mandrak 1993), so access zones may act as source points of species introductions. For example, the first report of northern pike ( Esox lucius ) in the park dates back to the mid-1980s ( first verified record in 1990 ) in Shall Lake. Currently, northern pike occur in a series of lakes between Shall Lake and Booth Lake ( N. Mandrak, personal communication ). The fish communities of all of these lakes ( i.e., Booth, Farm, Shall, Tip Up ) were predicted poorly, suggesting that northern pike may be an important biotic force in structuring these communities, resulting in the decoupling of the community-environment relationships modeled by the neural network. Indeed, northern pike adversely affect littoral prey-fish communities in north-temperate lakes ( Whittier et al. 1997; Findlay et al. 2000). Community models that incorporate the occurrence of northern pike (or other non-native species) as a predictor variable may provide important insights into how the local establishment of these species shapes the structure of the invaded fish communities and therefore may help indicate proactive conservation priorities for preserving communities vulnerable to future invasions. Ultimately, the nature of the disturbance could be inferred by identifying species that were absent from the target community but predicted to occur in the reference community. For example, because there are major differences in the sensitivity of fish species to habitat loss and predation (Whittier & Hughes 1998; Findlay et al. 2000), one might expect that two fish assemblages with the same number of species may differ substantially in their susceptibility to species invasions and environmental change. My species-specific modeling approach would thus provide the opportunity to understand and predict changes in the vulnerability of communities, whereas this is not possible with community metrics or assemblage types. Conclusion It is now generally recognized that species are complementary in biological communities, in that species redundancy may provide insurance against potential biological collapse ( Walker 1992 ). Differences among species are critical for the maintenance of a diverse re- Conservation Biology Volume 17, No. 3, June 2003 Olden gional species pool because regional diversity will be determined largely by spatial variation in the environment and the differential responses of species with varying requirements. For example, differences among species can influence the response of ecosystems to environmental change (Brown et al. 2001); therefore, the ability to predict the membership of species in communities will be critical to forecasting whether or not ecosystem processes will be preserved. As such, the ability to develop a single, integrative model for predicting complete community membership from a common set of environmental variables is particularly advantageous. Given that biological diversity and integrity can be lost in a variety of ways, including at the genetic, species, assemblage, and ecosystem levels of biological organization, a community-level approach to developing predictive models takes an important step beyond the focus on populations and toward an appreciation of the hierarchy of biological diversity that needs to be conserved and managed. The development and application of community predictive models that explicitly consider species membership—and thus each species’ functional role in the community—should contribute significantly to more-efficient management strategies and should minimize the problems of integrating multiple management strategies for each species of a community. Acknowledgments I am thankful for the helpful comments provided by E. Meir, S. Lek, and B. Neff and for the continuous intellectual contributions made by D. Jackson and P. PeresNeto over the years over morning coffee. I especially thank N. Mandrak for his insights into the fish communities of Algonquin Park and the Poff Field Station for providing a simulating working environment during this project. Funding for this research was provided by a Natural Sciences and Engineering Council of Canada Graduate Scholarship. Literature Cited Anderson, M. J., and A. Clements. 2000. Resolving environmental disputes: a statistical method for choosing among competing cluster models. Ecological Applications 10:1341–1355. Angermeier, P. L., and M. R. Winston. 1999. Characterizing fish community diversity across Virginia landscapes: prerequisite for conservation. Ecological Applications 9:335–349. Bishop, C. M. 1995. Neural networks for pattern recognition. Clarendon Press, Oxford, United Kingdom. Brown, J. H., M. S. K. Ernest, J. M. Parody, and J. P. Haskell. 2001. Regulation of diversity: maintenance of species richness in changing environments. Oecologia 126:321–332. Capone, T. A., and J. A. Kushlan. 1991. Fish community structure in dry-season stream pools. Ecology 72:983–992. Caro, T. M., and G. O’Doherty. 1999. On the use of surrogate species in conservation biology. Conservation Biology 13:805–814. Olden Caughley, G. 1994. Directions in conservation biology. Journal of Animal Ecology 63:215–235. Chase, M. K., W. B. I. Kristan, A. J. Lynam, M. V. Price, and J. T. Rotenberry. 2000. Single species as indicators of species richness and composition in California coastal sage scrub birds and small mammals. Conservation Biology 14:474–487. Crossman, E. J., and N. E. Mandrak. 1992. An analysis of fish distribution and community structure in Algonquin Park: annual report for 1991 and completion report, 1989–1991. Ontario Ministry of Natural Resources, Toronto, Ontario, Canada. Eadie, J. M., and A. Keast. 1984. Resource heterogeneity and fish species diversity in lakes. Canadian Journal of Zoology 62:1689–1695. Fausch, K. D., J. Lyons, J. R. Karr, and P. L. Angermeier. 1990. Fish communities as indicators of environmental degradation. American Fisheries Society Symposium 8:123–144. Findlay, C. S., D. G. Bert, and L. Zheng. 2000. Effect of introduced piscivores on native minnow communities in Adirondack lakes. Canadian Journal of Fisheries and Aquatic Sciences 57:570–580. Fleishman, E., R. B. Blair, and D. D. Murphy. 2001. Empirical validation of a method for umbrella species selection. Ecological Application 11:1489–1501. Franklin, J. F. 1993. Preserving biodiversity: species, ecosystems or landscapes? Ecological Applications 3:202–205. Hay, M. E. 1994. Species as ‘noise’ in community ecology: do seaweeds block our view of the kelp forests? Trends in Ecology & Evolution 9:414–416. Jackson, D. A., K. M. Somers, and H. H. Harvey. 1989. Similarity coefficients: measures of co-occurrence and association or simply measures of occurrence? The American Naturalist 133:436–453. Jackson, D. A., K. M. Somers, and H. H. Harvey. 1992. Null models and fish communities: evidence of nonrandom patterns. The American Naturalist 139:930–951. Jackson, D. A., P. R. Peres-Neto, and J. D. Olden. 2001. What controls who is where in freshwater fish communities: the roles of biotic, abiotic, and spatial factors. Canadian Journal of Fisheries and Aquatic Sciences 58:157–170. Legendre, P., and L. Legendre. 1998. Numerical ecology. Elsevier Scientific, Amsterdam. Lek, S., M. Delacoste, P. Baran, I. Dimopoulos, J. Lauga, and S. Aulagnier. 1996. Application of neural networks to modelling nonlinear relationships in ecology. Ecological Modelling 90:39–52. Litvak, M. K., and N. E. Mandrak. 1993. Ecology of fresh-water baitfish use in Canada and the United States. Fisheries 18:6–13. Magnuson, J. J., W. M. Tonn, A. Banerjee, J. Toivonen, O. Sanchez, and M. Rask. 1998. Isolation vs. extinction in the assembly of fishes in small northern lakes. Ecology 79:2941–2956. Martin, C. M. 1994. Recovering endangered species and restoring ecosystems: conservation planning for the twenty-first century in the United States. Ibis 137:S198–S203. Matuszek, J. E., and G. L. Beggs. 1988. Fish species richness in relation to lake area, pH, and other abiotic factors in Ontario lakes. Canadian Journal of Fisheries and Aquatic Sciences 45:1931–1941. McKinney, M. L. 1997. Extinction vulnerability and selectivity: combining ecological and paleontological views. Annual Review of Ecology and Systematics 28:495–516. Milligan, G. W., and M. C. Cooper. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50:159–179. Minns, C. K. 1989. Factors affecting fish species richness in Ontario lakes. Transactions of the American Fisheries Society 118:533–545. Mucina, L. 1997. Classification of vegetation: past, present and future. Journal of Vegetation Science 8:751–760. Niemi, G. J., J. M. Hanowski, A. R. Lima, T. Nicholls, and N. Weiland. Modeling Biological Communities 863 1997. A critical analysis on the use of indicator species in management. Journal of Wildlife Management 61:1240–1251. Noss, R. F. 1990. Indicators for monitoring biodiversity: a hierarchical approach. Conservation Biology 4:355–364. Olden, J. D., and D. A. Jackson. 2001. Fish-habitat relationships in lakes: gaining predictive and explanatory insight by using artificial neural networks. Transactions of the American Fisheries Society 130:878–897. Olden, J. D., and D. A. Jackson. 2002a. Illuminating the “black box”: understanding variable contributions in artificial neural networks. Ecological Modelling 154:135–150. Olden, J. D., and D. A. Jackson. 2002b. A comparison of statistical approaches for modelling fish species distributions. Freshwater Biology 47:1976–1995. Prendergrast, J. R., R. M. Quinn, J. H. Lawton, B. C. Eversham, and D. W. Gibbons. 1993. Rare species, the coincidence of diversity hotspots and conservation strategies. Nature 365:335–337. Pusey, B. J., M. J. Kennard, and H. Arthington. 2000. Discharge variability and the development of predictive models relating stream fish assemblage structure to habitat in northeastern Australia. Ecology of Freshwater Fishes 9:30–50. Rodriguez, M. A., and W. M. J. Lewis. 1997. Structure of fish assemblages along environmental gradients in floodplain lakes of the Orinoco River. Ecological Monographs 67:109–128. Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-propagation errors. Nature 323:533–536. Shuter, B. J., and J. R. Post. 1990. Climate, population viability, and the zoogeography of temperature fishes. Transactions of the American Fisheries Society 119:314–336. Simberloff, D. 1997. Flagships, umbrellas, and keystones: Is single-species management passé in the landscape era? Biological Conservation 83:247–257. Simberloff, D., and T. Dayan. 1991. The guild concept and the structure of ecological communities. Annual Review of Ecology and Systematics 22:115–143. Tonn, W. M., J. J. Magnuson, M. Rask, and J. Toivonen. 1990. Intercontinental comparison of small-lake fish assemblages: the balance between local and regional processes. The American Naturalist 136: 345–375. Urbanska, K. M. 2000. Environmental conservation and restoration ecology: two facets of the same problem. Web Ecology 1:20–27. Vane-Wright, R. I., C. J. Humphries, and P. H. Williams. 1991. What to protect? Systematics and the agony of choice. Biological Conservation 55:235–254. Walker, B. H. 1992. Biodiversity and ecological redundancy. Conservation Biology 6:18–23. Washington, H. G. 1984. Diversity, biotic and similarity indices. New Zealand Journal of Marine and Freshwater Research 18:653–694. Whittier, T. R., and R. M. Hughes. 1998. Evaluation of fish species tolerances to environmental stressors in lakes in the northeastern United States. North American Journal of Fisheries Management 18:236–252. Whittier, T. R., D. B. Halliwell, and S. G. Paulsen. 1997. Cyprinid distributions in Northeast U.S.A. lakes: evidence of regional-scale minnow biodiversity losses. Canadian Journal of Fisheries and Aquatic Sciences 54:1593–1607. Wright, J. F., D. W. Sutcliffe, and M. T. Furse. 2000. Assessing the biological quality of fresh waters: RIVPACS and other techniques. Freshwater Biological Association, Ambleside, Cumbria, U.K. Young, T. P. 2000. Restoration ecology and conservation biology. Biological Conservation 92:73–83. Zampella, R. A., and J. F. Bunnell. 1998. Use of reference-site fish assemblages to assess aquatic degradation in Pinelands streams. Ecological Applications 8:645–658. Conservation Biology Volume 17, No. 3, June 2003
© Copyright 2026 Paperzz