A Species-Specific Approach to Modeling Biological Communities

A Species-Specific Approach to Modeling Biological
Communities and Its Potential for Conservation
JULIAN D. OLDEN
Department of Biology and Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO
80523–1878, U.S.A., email [email protected]
Abstract: Community-level approaches to biological conservation are now recognized as a major advance
in most current single-species conservation and management practices. Existing approaches for modeling biological communities, have limited utility, however, because they commonly examine community metrics (e.g.,
species richness, assemblage “types”) and consequently do not consider the identity of the species that compose the community. This is a critical shortcoming because the functional differences among species ultimately translate into the differential vulnerability of biological communities to natural and human-related
environmental change. To address this concern I present a novel, species-specific approach to modeling communities using a multiresponse, artificial neural network. This provides an analytical approach that facilitates the development of a single, integrative model that predicts the entire species membership of a community while still respecting differences in the functional relationship between each species and its environment.
I used temperate-lake fish communities to illustrate the utility of this modeling approach and found that predictions of community composition by the neural network were highly concordant with observed compositions of the 286 study lakes. Average similarity between observed and predicted community composition was
80% (22 out of the 27 species correctly classified), and the model predicted a significant portion of the community composition in 91% of the lakes. I discuss the importance of the lake habitat variables for predicting
community composition and explore the spatial distribution of model predictions in light of recent species invasions. The proposed modeling approach provides a powerful, quantitative tool for developing community
predictive models that explicitly consider species membership, and thus each species’ functional role in the
community. Such models will contribute significantly to the study and conservation of biological communities.
Key Words: artificial neural network, classification, ecology, fish, prediction, presence and absence
Un Enfoque Especie–Específico al Modelaje de Comunidades Biológicas y Su Potencial para la Conservación
Resumen: La conservación biológica a nivel de comunidad se reconoce actualmente como un importante
avance en la mayoría de las prácticas actuales de conservación y manejo de una especie. Sin embargo, los
enfoques actuales al modelaje tienen una utilidad limitada ya que, por lo general, consideran medidas de las
comunidades (por ejemplo, la riqueza de especies, “tipos” de ensamblaje”) sin considerar así la identidad de
las especies que integran la comunidad. Esta es una limitación crucial dado que las diferencias funcionales
entre especies se traducen, en última instancia, en una vulnerabilidad diferencial de las comunidades biológicas a cambios ambientales naturales y antropogénicos. Para atender esta preocupación, presento un enfoque novedoso especie – específica al modelaje de comunidades utilizando una red neural artificial y de respuestas múltiples. Este enfoque proporciona una aproximación analítica que facilita el desarrollo de un
solo modelo integrativo que predice el total de especies que integran una comunidad, respetando a su vez las
diferencias en la relación funcional entre cada especie y su ambiente. Utilicé comunidades de peces en lagos
templados para ejemplificar la utilidad de este modelo y pude determinar que las predicciones en cuanto a la
composición de la comunidad usando la red neural eran altamente concordantes con las composiciones observadas de los 286 lagos estudiados. La similitud promedio entre la composición observada y la predicha
fue del 80% (22 de 27 especies correctamente clasificadas), y el modelo predijo una porción significativa de
Paper submitted June 18, 2001; revised manuscript accepted August 5, 2001.
854
Conservation Biology, Pages 854–863
Volume 17, No. 3, June 2003
Olden
Modeling Biological Communities
855
la comunidad en 91% de los lagos. Se discute la importancia de las variables del hábitat lacustre para predecir la composición de la comunidad y la distribución espacial de las predicciones del modelo es explorada
a la luz de recientes invasiones de especies. La aproximación propuesta proporciona una poderosa herramienta cuantitativa para desarrollar modelos predictivos que consideran explícitamente las especies integrantes de una comunidad (y por lo tanto, la función de cada especie en la comunidad). Tales modelos contribuirán significativamente al estudio y a la conservación de comunidades biológicas.
Introduction
Conservation biology is deeply rooted in population biology, both historically with its origins in wildlife management and contemporarily with the emergence of
conservation biology as an academic discipline. As a result, researchers have traditionally focused their efforts
on the conservation and management of individual populations because species are generally easier to monitor,
they are more amenable to explicit modeling, and extinction occurs one population at a time. Single-species
approaches to conservation continue to be the major
paradigm in conservation biology ( Caughley 1994 ), although researchers continue to recognize their limitations ( Simberloff 1997 ). Given the limited time and resources available to conserve and manage biodiversity
on a species-by-species basis and the agony of choosing
among the increasing number of species that are imperiled ( Vane-Wright et al. 1991 ), our ability to meet the
challenge of preserving biodiversity hinges on the successful conservation of entire biological communities.
Approaching the conservation of aquatic and terrestrial
biodiversity at a community level is now perceived as
being a major advance in conservation biology (Franklin
1993), in restoration ecology (Young 2000), and in the
interface of the two disciplines (Urbanska 2000).
Critical components of any effective community-level
conservation, management, or restoration strategy are a
working knowledge of the relationship between the target community and the environment and formulation of
this understanding into a predictive model. This is an essential process because without predictive community
models—for example, to forecast the effects of habitat
modification on community structure and function—a
community-level approach will likely play no constructive role in conservation biology. Unfortunately, the development of such models is a difficult task because it is
necessary to model all the species of the community
( i.e., multiple response variables ), as opposed to the
conventional task of modeling a single species. Consequently, researchers have employed an array of analytical approaches to model communities, with the
common goal of gaining both explanatory insight into
the factors shaping community structure and predictive
insight into future states of the community. Of the nu-
merous techniques used to model biological communities, three general approaches have emerged.
First, the most common approach involves modeling a
single community metric that summarizes the general
structure of the biological community (e.g., species richness, redundancy, evenness, total biomass; Washington
1984), an indicator species that serves as a surrogate for
the status of other species or environmental conditions,
or an umbrella species whose conservation serves to
protect sympatric species ( Noss 1990 ). Although the
use of community metrics or single species eases the
quantitative analysis of community-environment relationships, these approaches have a number of drawbacks. For example, community metrics such as species
richness poorly represent the characteristics of a community because they provide no information regarding
actual species membership. Consequently, species richness can be a particularly misleading metric because species are differentially prone to extinction ( McKinney
1997 ), species-rich assemblages often do not include
those species most in need of conservation ( Prendergrast et al. 1993), and local or regional richness may be
similar in space or remain constant in time even though
substantial variation in species composition may exist
( Brown et al. 2001 and citations therein). Similarly, the
indicator and umbrella species concepts for conserving
entire communities continue to receive criticism both
theoretically (Niemi et al. 1997; Simberloff 1997; Caro &
O’Doherty 1999 ) and empirically ( Chase et al. 2000;
Fleishman et al. 2001).
Second, a community classification approach is used
to define assemblage “types” based on best professional
judgment, numerically dominant species ( e.g., Magnuson et al. 1998), guild classifications (e.g., Simberloff &
Dayan 1991 ) or groups of species based on patterns of
co-occurrence elucidated from multivariate ordination
or clustering analyses ( e.g., Capone & Kushlan 1991;
Mucina 1997). Community classification is also the first
step prior to modeling aquatic invertebrate communities
in many biomonitoring programs ( Wright et al. 2000 ).
Although this approach is used to make the study of
complex multispecies assemblages more tractable, it has
a number of methodological and ecological shortcomings. Methodologically, the results of cluster analyses are
often compromised by the variable nature of cluster rec-
Conservation Biology
Volume 17, No. 3, June 2003
856
Modeling Biological Communities
ognition (Anderson & Clements 2000), by the choice of
similarity coefficients ( Jackson et al. 1989), and by the
erroneous identification of clusters that do not truly exist (Milligan & Cooper 1985). Furthermore, a subjective
decision is often required to determine the degree of
similarity necessary to group species together. For instance, Angermeier and Winston (1999) found over 90
statistically distinct fish assemblage types in Virginia,
emphasizing that the division of communities into discrete assemblages is arbitrary in that different spatial
scales can yield greater or fewer numbers of assemblage
types. In addition to methodological pitfalls, probably
the greatest disadvantage of aggregating species into assemblage types is the loss of information about individual species (Hay 1994). If the types are not a product of
natural processes and are not temporally and spatially
stable, any prediction of community state or change
based on that assemblage type may have little relevance.
This approach also assumes that the species that comprise each assemblage type always coexist, that each
species “belongs” to a single assemblage type, and that
species in the same assemblage type exhibit exactly the
same functional and numeric responses to changes in
the environment. Furthermore, after predictive models
have been developed they may not be applicable to systems outside the sites used to classify the assemblages
because of the existence of different species pools and
thus different assemblage types. Finally, nonrandom species assemblages may not always exist ( e.g., Jackson et
al. 1992), and discrete assemblage types will not always
be definable because assemblages may be composed of
species exhibiting individual responses to the environment (e.g., Pusey et al. 2000).
Lastly, multivariate statistical approaches, such as canonical analysis and matrix comparison methods, are
commonly used to relate the structure of biological communities to the various components of the environment
(e.g., Rodriguez & Lewis 1997; Zampella & Bunnell 1998).
Multivariate approaches have been advocated because
they efficiently summarize community-environment relationships into a simplified and easily illustrated format.
Unfortunately, although multivariate approaches help us
understand the major factors likely to be structuring biological communities, they do not produce a quantitative
framework from which the composition of such communities can be predicted and the importance of the environmental variables can be assessed.
In short, current approaches for modeling the composition of biological communities ignore the critical fact
that important differences exist among species, thus limiting their suitability for conserving entire communities.
Although single species remain the primary foci of conservation efforts, it is now overwhelmingly evident that
a habitat-based, multispecies conservation approach is
required (Martin 1994). For a community-level approach
to conservation to be successful, however, innovative
Conservation Biology
Volume 17, No. 3, June 2003
Olden
approaches are needed to facilitate the modeling of
communities while still respecting differences in the
identity of the species. Such models would contribute
significantly to the effective conservation of the complete spectrum of biodiversity, as opposed to current efforts that focus solely on individual populations or subsets of the community. Driven by this goal, I propose a
species-specific approach to modeling the composition
of biological communities. The approach uses a multiresponse, artificial neural network to predict the occurrence of each species in a community from a common
set of predictor (i.e., environmental) variables, resulting
in a single model that predicts the entire composition of
a community. Here, I detail the neural network methodology for developing predictive community models, and
illustrate its construction and interpretation with northtemperate lake fish communities.
Methods
Fish Communities of Algonquin Provincial Park,
Ontario, Canada
I used fish communities of 286 lakes located in Algonquin Provincial Park, Ontario, Canada (lat 4550N, long
7820 W), to illustrate the neural network approach to
modeling biological communities. Aquatic communities
in this region are representative of relatively natural ecosystems because these lakes are located in a provincial
park and are currently subject to minimal perturbations
from human development (except for limited introduction of smallmouth bass during the early 1900s). A community model was constructed by modeling the presence/absence of 27 fish species as a function of nine
whole-lake habitat variables that are related to known
habitat requirements of fish in this geographic region
( Matuszek & Beggs 1988; Minns 1989). The variables included surface area (ha), maximum depth (m), volume
(m3 ), total shoreline perimeter (km), elevation (m), total dissolved solids ( mg/L ), pH, growing-degree days
( basis 5 C ), and occurrence of summer stratification.
All data were obtained from the Algonquin Park Fish Inventory database (1989–1991), which involved a combination of extensive sampling of lakes in the park with
multiple gear types ( multipanel survey gill nets, plastic
traps, seines, Gee minnow traps ) and records obtained
from the Ontario Ministry of Natural Resources Lake Inventory and the Royal Ontario Museum collection (Crossman & Mandrak 1992).
Modeling Biological Communities with an Artificial
Neural Network
Artificial neural networks (ANNs) have been advocated
in many disciplines for addressing complex pattern-
Olden
recognition problems. The advantages of ANNs over traditional, linear approaches include their ability to model
nonlinear associations with a variety of data types (e.g.,
continuous, discrete) and to accommodate interactions
among predictor variables without any a priori specification (Bishop 1995). Neural networks are considered universal approximators of continuous functions, and as
such they exhibit flexibility for modeling nonlinear relationships between variables. For example, ANNs exhibit
substantially greater predictive power than traditional,
linear approaches when modeling nonlinear data (based
on empirical and simulated data; Olden & Jackson
2002b ). Here, I discuss the characteristics of the inner
architecture of the neural network and refer the reader
to the text of Bishop (1995) and the papers of Lek et al.
(1996) and Olden and Jackson (2001) for more comprehensive coverage.
I used a feed-forward neural network trained by the
back-propagation algorithm ( Rumelhart et al. 1986 ).
The architecture of this network consists of a single input, hidden, and output layer, with each layer containing one or more neurons ( Fig. 1 ). The input layer contains p neurons, each of which represents one of the p
predictor variables. The number of hidden neurons in
the neural network varies and is chosen to minimize the
trade-off between network bias and variance ( Bishop
1995 ). The output layer contains one neuron for each
response variable being modeled ( e.g., one neuron for
the presence/absence of each species ). Additional bias
neurons with a constant output ( equal to 1 ) are added
to the hidden and output layers, which play a role similar to that of the constant term in regression analysis. A
two-layer network—that is, a network without a hidden
layer—is computationally identical to a linear regression
model. I determined the optimal number of neurons in
the hidden layer of the network by comparing the per-
Modeling Biological Communities
857
formances of different cross-validated networks, with
1 to 25 hidden neurons, and choosing the number that
produced the greatest network performance. This resulted
in a network with 9 input neurons ( i.e., nine habitat
variables), 7 hidden neurons, and 27 output neurons (i.e.,
the presence/absence of 27 species) (Fig. 1).
Each neuron in the network ( excluding the bias
neurons) is connected to all neurons from adjacent layers by axons. Axon connections among neurons are assigned a weight that dictates the intensity of the signal
transmitted by the axon. The “state” or “activity level” of
each input neuron is defined by the incoming signal
( i.e., values ) of the predictor variables, whereas the
state of the other neurons is evaluated locally by calculating the weighted sum of the incoming signals from
the neurons of the previous layer. The entire process is
written mathematically as


y k = φ o  β k + ∑ w jk φ h  β j + ∑ wij x i ,


j
i


(1)
where xi are the input signals, yk are the output signals,
wij are the weights between input neuron i and hidden
neuron j, wjk are the weights between hidden neuron j
and output neuron k, j, k are the biases associated
with the hidden and output layers, and h and o are activation functions for the hidden and output layers. I
used the logistic (or sigmoid) function to transform the
state of each output neuron to range between 0 and 1
( representing the probability of species occurrence )
and a decision threshold of 0.5 to classify a species as
present or absent.
The neural network is trained with a back-propagation
algorithm in which all the connection weights are iteratively adjusting, with the goal of finding a set of connection weights that minimizes the error of the network (in
Figure 1. Architecture of the artificial
neural network used to model the presence/absence of 27 fish species as a function of nine lake habitat variables.
Conservation Biology
Volume 17, No. 3, June 2003
858
Modeling Biological Communities
this case the cross-entropy error function ). During network training, observations are sequentially presented
to the network and weights are adjusted depending on
the magnitude and direction of the error. Learning rate
( ) and momentum ( ) parameters ( varying as a function of network error ) were included during network
training to ensure a high probability of global network
convergence (Bishop 1995), and a maximum of 1000 iterations was set for the back-propagation algorithm to
determine the optimal axon weights. Prior to training
the neural network, the predictor variables were converted to z scores to standardize the measurement scales
of the inputs into the network and thus to ensure that
the same percentage change in the weighted sum of the
inputs caused a similar percentage change in the unit
output. The neural network was validated with n-fold
cross-validation.
Given that the state of each output neuron is defined
by the sum of its incoming signals, the overall contribution of the habitat variables to the predictive output of
the neural network can be quantified with the axon connection weights. The explanatory importance of the
lake habitat variables for predicting community composition was quantified by calculating the product of the
input-hidden and hidden-output connection weights between each input neuron and output neuron and then
summing the products across all hidden neurons. This
procedure was repeated for each habitat variable, and
the relative contributions of the variables were calculated by dividing the absolute value of each variable contribution by the grand mean (sum of all absolute variable
contributions ). The relative contributions of each variable were subsequently assessed for their statistical significance with a randomization test, which randomizes
the response variable and then constructs a neural network based on the randomized data and records the relative explanatory importance of each habitat variable.
This process is repeated 9999 times to generate a null
distribution for the relative importance of each variable,
which is then compared with the observed values to calculate the significance level. I refer the reader to Olden
and Jackson (2002a ) for more details on calculating variable contributions and testing their significance with the
randomization approach.
I used a randomization test to assess whether the neural network predicted a statistically significant number
of species in the communities. For each lake, the test involved randomly permuting the predicted community
composition from the neural network 9999 times, each
time calculating the percent similarity (derived from the
simple matching coefficient; Legendre & Legendre 1998)
with observed community composition. Significance
levels were then calculated as the proportion of randomized trials that the percent similarity was equal or
greater than the percent similarity between the observed and predicted community composition. All pro-
Conservation Biology
Volume 17, No. 3, June 2003
Olden
cedures, including the construction of the neural networks, were conducted with computer macros in the
MatLab programing language written by the author.
Results
Predictions of fish community composition by the neural network were highly concordant with observed species membership in the 286 study lakes. For the majority
of the lakes the degree of similarity between predicted
and observed fish-community composition ranged between 70% and 90%, indicating that the neural network
correctly predicted the presence/absence of 19–24 out
of the 27 total species (Fig. 2). Results from the randomization test showed that the neural network correctly
predicted a significant portion of the species in the community in 91% of the study lakes ( 260 out of the 286
lakes).
The neural network exhibited varying success in predicting individual species in the communities (Table 1).
For example, northern redbelly dace, pearl dace, golden
shiner, and brown bullhead ( scientific names provided
in Table 1) were the most poorly predicted species (61.5–
65.4% ), indicating that their memberships in the communities were the most often misclassified by the neural
network relative to other species. In contrast, blackchin
shiner, trout-perch, and rock bass were the most successfully predicted species (90.9–94.4%), indicating that
their memberships in the communities were consistently correctly classified.
Results from the randomization test ( based on network connection weights ) illustrated that the ranked
importance of the habitat variables for predicting individual species membership in the communities varied
Figure 2. Percent similarity between observed fish
community composition and predicted fish community composition by the neural network. Arrow represents mean percent similarity.
Olden
Table 1.
Modeling Biological Communities
859
Predictive performance of the neural network for predicting species composition in the 286 study lakes.
Common name
Blackchin shiner
Blacknose shiner
Brook stickleback
Brook trout
Brown bullhead
Burbot
Cisco
Creek chub
Common shiner
Fallfish
Finescale dace
Golden shiner
Iowa darter
Lake chub
Lake trout
Lake whitefish
Longnose sucker
Northern redbelly dace
Pearl dace
Pumpkinseed
Rock bass
Round whitefish
Smallmouth bass
Splake
Trout-perch
White sucker
Yellow perch
Scientific name
Occurrence %
Correctly classified %
Significant habitat variables*
Notropis heterodon
Notropis heterolepis
Culaea inconstans
Salvelinus fontinalis
Ameiurus nebulosus
Lota lota
Coregonus artedi
Semotilus atromaculatus
Luxilus cornutus
Semotilus corporalis
Phoxinus neogaeus
Notemigonus crysoleucas
Etheostoma exile
Couesius plumbeus
Salvelinus namaycush
Coregonus clupeaformis
Catostomus catostomus
Phoxinus eos
Margariscus margarita
Lepomis gibbosus
Ambloplites rupestris
Prosopium cylindraceum
Micropterus dolomieu
S.fontinalis
S.namaycush
Percopsis omiscomayus
Catostomus commersoni
Perca flavescens
5.6
40.6
27.3
76.9
47.2
30.8
21.3
68.2
54.2
11.2
15.0
39.9
20.3
16.4
52.8
13.6
18.5
54.5
41.3
65.0
12.2
10.5
21.3
7.7
9.4
83.6
71.0
94.4
71.7
76.6
80.1
65.4
81.5
83.9
74.1
72.0
89.2
89.2
64.7
81.2
83.9
76.2
86.4
82.9
61.5
64.3
70.6
90.9
89.2
79.0
92.3
91.3
86.0
72.0
Elev, ShPer., SS
ShPer, Elev, SArea
MaxDep, SArea, Volume
SArea, ShPer
ShPer, SArea
ShPer, Elev
Elev, SArea
ShPer, MaxDep, SArea
ShPer, Elev, SArea
SArea, ShPer, Elev, pH
ShPer, Volume, SArea
ShPer, Elev, SArea
Elev, MaxDep, TDS
MaxDep, ShPer
MaxDep, ShPer
ShPer, SArea
SArea, GDD
MaxDep, Elev
MaxDep, ShPer, SArea
ShPer, Elev
Elev, SArea, GDD
ShPer, SArea, GDD
ShPer, Elev, GDD
Elev, MaxDep, pH
Elev, TDS
ShPer
ShPer, TDS, SArea
*Habitat variables are listed in decreasing importance. Variable abbreviations: lake surface area (SArea); maximum depth (MaxDep); lake
volume (Volume); shoreline perimeter (ShPer); elevation (Elev); total dissolved solids (TDS); pH; growing-degree days (GDD); and occurrence
of summer stratification (SS).
across species (Table 1). For instance, perimeter of lake
shoreline, elevation, and growing-degree days were significant predictors of smallmouth bass occurrence,
whereas maximum depth of the lake and shoreline perimeter were the most important predictors of lake trout
occurrence.
Habitat variables quantifying the overall size and
shoreline complexity of lakes were the most important
predictors of entire communities. These variables included shoreline perimeter ( relative contribution 19.7% ), lake surface area ( 16.3% ), elevation ( 16.2% ),
and maximum depth ( 12.5% ), which were significant
predictors of presence/absence for 19, 16, 14, and 8 species, respectively (Table 1; Fig. 3).
Fish community composition of the study lakes was predicted with varying spatial success across Algonquin Provincial Park (Fig. 4). There was close agreement between
the number of lake communities that were predicted with
low success (i.e., percent similarity 70% between observed and predicted community composition) and the
number of total lakes in each of the five major watersheds.
Therefore, no evidence exists for differential model predictive performance across the watersheds, although within
watersheds the species memberships of particular communities were predicted with low success.
Discussion
Modeling Biological Communities with an Artificial
Neural Network
Herein I propose a novel artificial neural-network approach for modeling biological communities in a species-specific manner. The methodology gives researchers the ability to develop a single, integrative model for
predicting complete species membership from a common set of environmental variables, an approach that is
not possible with traditional, statistical techniques.
Given the utility of the approach, it is important to appreciate how the architecture and dynamic optimization
of the neural network enable the modeling of different
species-habitat relationships. The hidden layer of neurons in the network plays a vital role by individually mediating the information transfer between the input and
output neurons—that is, they mediate the relative influence of the habitat variables on predicting the occurrence of each species. This feature makes it possible to
model multiple species-habitat relationships based on
the same set of predictor variables within a single integrative model. Following is a hypothetical example to illustrate the function of the hidden neuron layer. Imagine that the uppermost hidden neuron in Fig. 1 becomes
Conservation Biology
Volume 17, No. 3, June 2003
860
Modeling Biological Communities
Olden
In summary, the neural network contains a set of input-hidden connection weights that reflect the universal
importance of the habitat variables for predicting the
presence/absence of all the species, whereas the hidden-output connection weights modify the first set of
weights by reducing, enhancing, or even reversing the
signals to maximize the probability of correctly classifying each individual species in the community. Therefore, the architecture of the neural network enables the
development of predictive models for community composition in which species-specific responses to environmental conditions are accounted for.
Community Predictive Models and Their Potential for
Fish Conservation
Figure 3. Relative contributions of the lake habitat
variables for predicting community composition averaged across all 27 fish species. Whiskers represent 1
SD, and the numbers in each bar represent the number of species for which the habitat variable was a statistically significant predictor of presence/absence (Table 1).
highly activated when lake surface area is large (i.e., the
connection weight between the surface-area neuron and
the hidden neuron is much larger than the connection
weights of other input neurons ). The activated hidden
neuron then mediates the importance of this information ( i.e., the importance of large lake surface area ) by
modifying the connection weight between itself and the
layer of output neurons to any value between negative
and positive infinity. Therefore, if a large lake surface
area provides no discriminator power for predicting the
occurrence of species 1, then the connection weight
linking the activated hidden neuron and the output neuron for species 1 would be set equal to a small value
( i.e., close to zero). This results in lake surface area having no influence on the predicted probability of occurrence for species 1. On the other hand, if large lake surface area is predictive of the presence or absence of
species 2, then the connection weight between the activated hidden neuron and the corresponding species 2
output neuron would be set equal to a large positive
( species present ) or negative ( species absent ) value.
The dynamic nature of the network is a result of the fact
that each hidden neuron has the ability to modify the
signals (i.e., information) provided by all input neurons
before defining its outgoing signal to each output neuron. Therefore, the hidden neurons dictate the relative
importance of each input variable for predicting the occurrence of each species in the network.
Conservation Biology
Volume 17, No. 3, June 2003
Establishing explanatory and predictive relationships between biological communities and the environment
has considerable potential in conservation biology. Although a thorough discussion of the potential applications of community predictive models to conservation is
beyond the scope of this paper, I contend that the development of statistical models that provide testable, community-wide predictions of complete species membership will contribute significantly to more-efficient
conservation, management, and restoration strategies. I
limit my discussion to the potential utility of community
predictive models for fish conservation by examining
applications of the neural network to the fish communities of Algonquin Provincial Park, Ontario, Canada.
The species-specific community model I present provides a unique opportunity to determine how lake habitat conditions contribute to observed patterns in fish
community composition. The community neural network was successful in predicting complete species
membership in the study lakes and emphasized the importance of overall lake size and elevation in shaping the
composition of the fish communities of Algonquin Provincial Park. Interestingly, lake size and elevation are
also important factors in Ontario for shaping patterns in
fish species richness ( Eadie & Keast 1984; Matuszek &
Beggs 1988; Minns 1989 ). Predictability of individual
species by the model was also consistent with findings
of single-species models for north-temperate fish populations ( e.g., Tonn et al. 1990; Magnuson et al. 1998;
Olden & Jackson 2001 ). For example, the presence/
absence of lake trout ( a cold-water species ) was best
predicted with maximum depth; a habitat factor that influences the mixing characteristics and the thermal regime of lakes. Lake elevation and the number of growingdegree days were the most important predictors of
smallmouth bass occurrence; a finding consistent with
the sensitive thermal requirements of this species (Shuter
& Post 1990). Furthermore, shoreline perimeter was important for both these species and serves as an indirect
measure of the diversity of habitats available in lakes,
Olden
Modeling Biological Communities
861
Figure 4. Spatial distribution of the neural network’s performance for predicting fish community composition in
the 286 study lakes of Algonquin Provincial Park, Ontario, Canada (lat 4550N, long 7820W). Symbols represent the percent similarity between observed and predicted community composition. Increasing symbol size represents increased percent similarity between predicted and observed community composition.
which may be important to the small-bodied forage fish
upon which these species feed. Interestingly, shoreline
perimeter was also a significant predictor for 9 out of 13
small-bodied fish species modeled by the neural network, which suggests the importance of diverse nearshore habitats for providing refuge from predation ( Jackson et al. 2001).
Community models can play an important role in basic and applied fish ecology. For example, given the large
body of literature relating various fish species to biotic
and abiotic factors, community models such as the one
presented here could provide a means of gaining a
greater understanding of the role of the environment in
mediating patterns in community composition. The ability to correctly identify the entire membership of a community based on environmental conditions could be an
excellent test of our current state of knowledge regarding the factors governing community dynamics, as well
as where potential gaps in this knowledge exist.
In an applied sense, fish-community models could provide important predictions of the effects of environmental degradation in aquatic ecosystems. A community approach is particularly attractive because indicator
species are often absent for reasons unrelated to the
source of degradation, and as such the use of individual
species as measures of anthropogenic impact can be
misleading (Fausch et al. 1990). Therefore, by modeling
communities found at reference sites ( areas that have
been minimally affected by human activities ), model
predictions could be used as a benchmark for detecting
and assessing the impact of habitat alteration or loss,
thus offering a means of quantifying the loss of biological diversity. This approach would be similar to those of
many biomonitoring programs (e.g., Wright et al. 2000),
but it would have the distinct advantage of predicting
community composition and not merely the total number of species. For example, Zampella and Bunnell (1998)
showed that changes in the composition of fish assemblages in Pinelands streams are tightly associated with
watershed disturbance gradients, whereas species richness does not reflect this disturbance gradient.
The aquatic communities of Algonquin Provincial Park
may represent a good model system for central Ontario
because these lakes have been minimally affected by human activities (except perhaps for limited introduction
of smallmouth bass during the early 1900s ). Fish com-
Conservation Biology
Volume 17, No. 3, June 2003
862
Modeling Biological Communities
munities that were predicted poorly by the neural network ( i.e., percent similarity between observed and
predicted communities was 70% ) indicate lake communities that differ from expectations based on habitat
conditions ( Fig. 4 ). These lakes possibly represent systems that are, or have been, affected by human activities
in the park. A number of the poorly predicted lakes (i.e.,
Basin, Kearney, Shall, Tepee, Tim, and Whitefish) are located in “access zones” that provide the only points of
entry to the park by public roads. Anglers are a primary
vector for fish movement across the landscape (Litvak &
Mandrak 1993), so access zones may act as source points
of species introductions. For example, the first report of
northern pike ( Esox lucius ) in the park dates back to
the mid-1980s ( first verified record in 1990 ) in Shall
Lake. Currently, northern pike occur in a series of lakes
between Shall Lake and Booth Lake ( N. Mandrak, personal communication ). The fish communities of all of
these lakes ( i.e., Booth, Farm, Shall, Tip Up ) were predicted poorly, suggesting that northern pike may be an
important biotic force in structuring these communities,
resulting in the decoupling of the community-environment relationships modeled by the neural network. Indeed, northern pike adversely affect littoral prey-fish
communities in north-temperate lakes ( Whittier et al.
1997; Findlay et al. 2000).
Community models that incorporate the occurrence
of northern pike (or other non-native species) as a predictor variable may provide important insights into how
the local establishment of these species shapes the structure of the invaded fish communities and therefore may
help indicate proactive conservation priorities for preserving communities vulnerable to future invasions. Ultimately, the nature of the disturbance could be inferred
by identifying species that were absent from the target
community but predicted to occur in the reference community. For example, because there are major differences in the sensitivity of fish species to habitat loss and
predation (Whittier & Hughes 1998; Findlay et al. 2000),
one might expect that two fish assemblages with the
same number of species may differ substantially in their
susceptibility to species invasions and environmental
change. My species-specific modeling approach would
thus provide the opportunity to understand and predict
changes in the vulnerability of communities, whereas
this is not possible with community metrics or assemblage types.
Conclusion
It is now generally recognized that species are complementary in biological communities, in that species redundancy may provide insurance against potential biological collapse ( Walker 1992 ). Differences among
species are critical for the maintenance of a diverse re-
Conservation Biology
Volume 17, No. 3, June 2003
Olden
gional species pool because regional diversity will be determined largely by spatial variation in the environment
and the differential responses of species with varying requirements. For example, differences among species
can influence the response of ecosystems to environmental change (Brown et al. 2001); therefore, the ability
to predict the membership of species in communities
will be critical to forecasting whether or not ecosystem
processes will be preserved. As such, the ability to develop a single, integrative model for predicting complete community membership from a common set of environmental variables is particularly advantageous.
Given that biological diversity and integrity can be lost
in a variety of ways, including at the genetic, species,
assemblage, and ecosystem levels of biological organization, a community-level approach to developing predictive models takes an important step beyond the focus on
populations and toward an appreciation of the hierarchy
of biological diversity that needs to be conserved and
managed. The development and application of community predictive models that explicitly consider species
membership—and thus each species’ functional role
in the community—should contribute significantly to
more-efficient management strategies and should minimize the problems of integrating multiple management
strategies for each species of a community.
Acknowledgments
I am thankful for the helpful comments provided by
E. Meir, S. Lek, and B. Neff and for the continuous intellectual contributions made by D. Jackson and P. PeresNeto over the years over morning coffee. I especially
thank N. Mandrak for his insights into the fish communities of Algonquin Park and the Poff Field Station for
providing a simulating working environment during this
project. Funding for this research was provided by a Natural Sciences and Engineering Council of Canada Graduate Scholarship.
Literature Cited
Anderson, M. J., and A. Clements. 2000. Resolving environmental disputes: a statistical method for choosing among competing cluster
models. Ecological Applications 10:1341–1355.
Angermeier, P. L., and M. R. Winston. 1999. Characterizing fish community diversity across Virginia landscapes: prerequisite for conservation. Ecological Applications 9:335–349.
Bishop, C. M. 1995. Neural networks for pattern recognition. Clarendon Press, Oxford, United Kingdom.
Brown, J. H., M. S. K. Ernest, J. M. Parody, and J. P. Haskell. 2001. Regulation of diversity: maintenance of species richness in changing
environments. Oecologia 126:321–332.
Capone, T. A., and J. A. Kushlan. 1991. Fish community structure in
dry-season stream pools. Ecology 72:983–992.
Caro, T. M., and G. O’Doherty. 1999. On the use of surrogate species
in conservation biology. Conservation Biology 13:805–814.
Olden
Caughley, G. 1994. Directions in conservation biology. Journal of Animal Ecology 63:215–235.
Chase, M. K., W. B. I. Kristan, A. J. Lynam, M. V. Price, and J. T. Rotenberry. 2000. Single species as indicators of species richness and
composition in California coastal sage scrub birds and small mammals. Conservation Biology 14:474–487.
Crossman, E. J., and N. E. Mandrak. 1992. An analysis of fish distribution and community structure in Algonquin Park: annual report for
1991 and completion report, 1989–1991. Ontario Ministry of Natural Resources, Toronto, Ontario, Canada.
Eadie, J. M., and A. Keast. 1984. Resource heterogeneity and fish species diversity in lakes. Canadian Journal of Zoology 62:1689–1695.
Fausch, K. D., J. Lyons, J. R. Karr, and P. L. Angermeier. 1990. Fish
communities as indicators of environmental degradation. American
Fisheries Society Symposium 8:123–144.
Findlay, C. S., D. G. Bert, and L. Zheng. 2000. Effect of introduced piscivores on native minnow communities in Adirondack lakes. Canadian Journal of Fisheries and Aquatic Sciences 57:570–580.
Fleishman, E., R. B. Blair, and D. D. Murphy. 2001. Empirical validation
of a method for umbrella species selection. Ecological Application
11:1489–1501.
Franklin, J. F. 1993. Preserving biodiversity: species, ecosystems or
landscapes? Ecological Applications 3:202–205.
Hay, M. E. 1994. Species as ‘noise’ in community ecology: do seaweeds block our view of the kelp forests? Trends in Ecology & Evolution 9:414–416.
Jackson, D. A., K. M. Somers, and H. H. Harvey. 1989. Similarity coefficients: measures of co-occurrence and association or simply measures of occurrence? The American Naturalist 133:436–453.
Jackson, D. A., K. M. Somers, and H. H. Harvey. 1992. Null models and
fish communities: evidence of nonrandom patterns. The American
Naturalist 139:930–951.
Jackson, D. A., P. R. Peres-Neto, and J. D. Olden. 2001. What controls
who is where in freshwater fish communities: the roles of biotic,
abiotic, and spatial factors. Canadian Journal of Fisheries and
Aquatic Sciences 58:157–170.
Legendre, P., and L. Legendre. 1998. Numerical ecology. Elsevier Scientific, Amsterdam.
Lek, S., M. Delacoste, P. Baran, I. Dimopoulos, J. Lauga, and S. Aulagnier.
1996. Application of neural networks to modelling nonlinear relationships in ecology. Ecological Modelling 90:39–52.
Litvak, M. K., and N. E. Mandrak. 1993. Ecology of fresh-water baitfish
use in Canada and the United States. Fisheries 18:6–13.
Magnuson, J. J., W. M. Tonn, A. Banerjee, J. Toivonen, O. Sanchez, and
M. Rask. 1998. Isolation vs. extinction in the assembly of fishes in
small northern lakes. Ecology 79:2941–2956.
Martin, C. M. 1994. Recovering endangered species and restoring ecosystems: conservation planning for the twenty-first century in the
United States. Ibis 137:S198–S203.
Matuszek, J. E., and G. L. Beggs. 1988. Fish species richness in relation
to lake area, pH, and other abiotic factors in Ontario lakes. Canadian Journal of Fisheries and Aquatic Sciences 45:1931–1941.
McKinney, M. L. 1997. Extinction vulnerability and selectivity: combining ecological and paleontological views. Annual Review of
Ecology and Systematics 28:495–516.
Milligan, G. W., and M. C. Cooper. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50:159–179.
Minns, C. K. 1989. Factors affecting fish species richness in Ontario
lakes. Transactions of the American Fisheries Society 118:533–545.
Mucina, L. 1997. Classification of vegetation: past, present and future.
Journal of Vegetation Science 8:751–760.
Niemi, G. J., J. M. Hanowski, A. R. Lima, T. Nicholls, and N. Weiland.
Modeling Biological Communities
863
1997. A critical analysis on the use of indicator species in management. Journal of Wildlife Management 61:1240–1251.
Noss, R. F. 1990. Indicators for monitoring biodiversity: a hierarchical
approach. Conservation Biology 4:355–364.
Olden, J. D., and D. A. Jackson. 2001. Fish-habitat relationships in lakes:
gaining predictive and explanatory insight by using artificial neural networks. Transactions of the American Fisheries Society 130:878–897.
Olden, J. D., and D. A. Jackson. 2002a. Illuminating the “black box”:
understanding variable contributions in artificial neural networks.
Ecological Modelling 154:135–150.
Olden, J. D., and D. A. Jackson. 2002b. A comparison of statistical approaches for modelling fish species distributions. Freshwater Biology 47:1976–1995.
Prendergrast, J. R., R. M. Quinn, J. H. Lawton, B. C. Eversham, and D.
W. Gibbons. 1993. Rare species, the coincidence of diversity
hotspots and conservation strategies. Nature 365:335–337.
Pusey, B. J., M. J. Kennard, and H. Arthington. 2000. Discharge variability and the development of predictive models relating stream
fish assemblage structure to habitat in northeastern Australia. Ecology of Freshwater Fishes 9:30–50.
Rodriguez, M. A., and W. M. J. Lewis. 1997. Structure of fish assemblages along environmental gradients in floodplain lakes of the
Orinoco River. Ecological Monographs 67:109–128.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-propagation errors. Nature 323:533–536.
Shuter, B. J., and J. R. Post. 1990. Climate, population viability, and the
zoogeography of temperature fishes. Transactions of the American
Fisheries Society 119:314–336.
Simberloff, D. 1997. Flagships, umbrellas, and keystones: Is single-species management passé in the landscape era? Biological Conservation 83:247–257.
Simberloff, D., and T. Dayan. 1991. The guild concept and the structure of ecological communities. Annual Review of Ecology and Systematics 22:115–143.
Tonn, W. M., J. J. Magnuson, M. Rask, and J. Toivonen. 1990. Intercontinental comparison of small-lake fish assemblages: the balance between local and regional processes. The American Naturalist 136:
345–375.
Urbanska, K. M. 2000. Environmental conservation and restoration
ecology: two facets of the same problem. Web Ecology 1:20–27.
Vane-Wright, R. I., C. J. Humphries, and P. H. Williams. 1991. What to
protect? Systematics and the agony of choice. Biological Conservation 55:235–254.
Walker, B. H. 1992. Biodiversity and ecological redundancy. Conservation Biology 6:18–23.
Washington, H. G. 1984. Diversity, biotic and similarity indices. New
Zealand Journal of Marine and Freshwater Research 18:653–694.
Whittier, T. R., and R. M. Hughes. 1998. Evaluation of fish species tolerances to environmental stressors in lakes in the northeastern
United States. North American Journal of Fisheries Management
18:236–252.
Whittier, T. R., D. B. Halliwell, and S. G. Paulsen. 1997. Cyprinid distributions in Northeast U.S.A. lakes: evidence of regional-scale minnow biodiversity losses. Canadian Journal of Fisheries and Aquatic
Sciences 54:1593–1607.
Wright, J. F., D. W. Sutcliffe, and M. T. Furse. 2000. Assessing the biological quality of fresh waters: RIVPACS and other techniques.
Freshwater Biological Association, Ambleside, Cumbria, U.K.
Young, T. P. 2000. Restoration ecology and conservation biology. Biological Conservation 92:73–83.
Zampella, R. A., and J. F. Bunnell. 1998. Use of reference-site fish assemblages to assess aquatic degradation in Pinelands streams. Ecological Applications 8:645–658.
Conservation Biology
Volume 17, No. 3, June 2003