Ecological Modelling 161 (2003) 67-78
www.elsevier.com/locate/ecolmodel
Modelling Microcystis aeruginosa bloom dynamics in the
Nakdong River by means of evolutionary computation and
statistical approach
Kwang-Seuk Jeong a, Dong-Kyun Kim a, Peter Whigham b, Gea-Jae Joo a,*
a Department of Biology, Pusan National University, Jang-Jeon Dong, Gum-Jeong Gu, Busan 609-735, South Korea
b Department of Information Science, University of Otago, PO Box 56, Dunedin, New Zealand
Received 15 January 2002; received in revised form 22 July 2002; accepted 31 July 2002
Abstract
Dynamics of a bloom-forming cyanobacterium (Microcystis aeruginosa) in a eutrophic river-reservoir hybrid system
were modelled using a genetic programming (GP) algorithm and multivariate linear regression (MLR). The lower
Nakdong River has been influenced by cultural eutrophication since construction of an estuarine barrage in 1987.
During 1994-1998, the average concentrations of nutrients and phytoplankton were: NO3-N, 2.7 mg l⁻¹; NH4-N,
0.6 mg l⁻¹; PO4-P, 34.7 µg l⁻¹; and chlorophyll a, 50.2 µg l⁻¹. Blooms of M. aeruginosa occurred in summers when
there were droughts. Using data from 1995 to 1998, GP and MLR were used to construct equation models for
predicting the occurrence of M. aeruginosa. The models were validated using data from 1994, a year when there
were severe summer blooms. The GP model was very successful in predicting the temporal dynamics and magnitude of
blooms, while MLR showed rather insufficient predictability. The lower Nakdong River exhibits reservoir-like
ecological dynamics rather than riverine, and for this reason a previous river mechanistic model failed to describe
uncertainty and complexity. Results of this study suggest that an inductive-empirical approach is more suitable for
modelling the dynamics of bloom-forming algal species in a river-reservoir transitional system.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Genetic programming; Multivariate linear regression; Microcystis aeruginosa; Algal blooms; Ecological modelling; Nakdong River
1. Introduction
A comprehensive understanding of ecosystem dynamics requires numerous approaches. Ecological modelling is a good solution for this purpose when it contains adequate data and expressions for the system of interest (Straskraba, 1994). To interpret data or forecast ecological conditions, researchers traditionally select deductive mathematical or statistical models. Due to the difficulties involved in solving complex interactions among diverse variables and parameters, sophisticated machine learning techniques [e.g. artificial neural networks (ANNs) and evolutionary computation (EC)] have recently been applied in ecological modelling (Lek et al., 1996; Recknagel, 1997; Fielding, 1999; Whigham, 2000).
* Corresponding author. Tel.: +82-51-510-2258; fax: +82-51-581-2962.
E-mail address: [email protected] (G.-J. Joo).
0304-3800/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-3800(02)00280-6
Genetic programming (GP) is a technique
derived from evolutionary computation, originally
based on evolving variable-length LISP programs
(Koza, 1992). GP is a population-based search
that evolves tree-like structures that represent
functions, equations, or programs. GP performs
generational evolution of the solution candidates
to find the best solution for a certain problem from
the solution space (Banzhaf et al., 1998). This is
achieved by applying various search operators
such as crossover (swapping sub-trees between
parents) and mutation (randomly recreating a
subtree of a parent). The inductive stochastic
approach of GP, combined with few assumptions
regarding the form or limitations of developed
models, shows some advantages over other approaches in modelling freshwater ecosystems.
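The two search operators described above can be sketched on a small tree representation. This is only an illustrative sketch, not the implementation used in the study: the tuple encoding, the helper names (`random_tree`, `paths`, `get`, `put`), the leaf probability, and the operator set are all assumptions made for the example.

```python
import random

# Assumed encoding: a tree is a leaf (variable name or float constant)
# or a tuple (operator, left_subtree, right_subtree).

def random_tree(rng, depth, variables):
    """Grow a random tree with at most `depth` nested operators."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: either a variable or a random constant
        if rng.random() < 0.5:
            return rng.choice(variables)
        return round(rng.uniform(0, 1), 3)
    op = rng.choice(["+", "-", "*"])
    return (op,
            random_tree(rng, depth - 1, variables),
            random_tree(rng, depth - 1, variables))

def paths(tree, prefix=()):
    """List the path (sequence of child indices) to every node."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found += paths(tree[i], prefix + (i,))
    return found

def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Swap one randomly chosen subtree of `a` with one of `b`."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))

def mutate(rng, tree, variables, depth=2):
    """Replace one randomly chosen subtree with a freshly grown one."""
    return put(tree, rng.choice(paths(tree)),
               random_tree(rng, depth, variables))
```

Because crossover only exchanges subtrees, the total number of nodes across the two offspring equals that of the two parents, which is a convenient sanity check for any implementation of this operator.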
Of the various freshwater resources, river systems are particularly impacted by human activities, frequently displaying cultural eutrophication.
This sometimes is exacerbated by regulation of
water flow (Moss, 1998). Extended retention time
as well as excessive nutrient loads can result in
severe blooms of blue-green algae in rivers. Algal
blooms are stimulated by such varied circumstances
that it is difficult to develop a fixed model that
considers all possible situations (Recknagel, 1997;
Jeong, 2000). Genetic programming can search for
suitable variables as well as their interactions by
evaluating the underlying data for significant
patterns. This allows models to be developed that
consider the behavior of algal blooms in each
specific environment.
The lower Nakdong River has exhibited severe
blue-green algal blooms in hot summer months
(Ha, 1999). Accelerated eutrophication following
the construction of the barrage at the river mouth,
coupled with high nutrient loads, caused this
situation (Joo et al., 1997). Some modelling efforts
have attempted to predict the blooms, but they
considered mainly water quality and did not
provide an ecosystem perspective for this river.
In this study, time-series dynamics of Microcystis aeruginosa blooms were modelled using an
extended GP technique, specifically designed for
time-series models. The empirical database from
the lower Nakdong River was used to evolve the
best model to predict the time-series changes of M. aeruginosa. By varying model inputs in a forecasting mode, this model can be used in water quality
and ecosystem management applications. To assess the capability of GP modelling, a multivariate
linear regression (MLR) model was also constructed, and the time-series predictions of the two models
were compared. The present study provides an
example of the application of GP to a river-reservoir
hybrid system.
2. Description of the study site
The Nakdong River basin is situated in the
monsoon climate of South Korea (35-37° N,
127-129° E) (Fig. 1). South Korea experiences four
distinct seasons, and is characterized by heavy
rainfall during the monsoon period and several
typhoon events. The annual mean rainfall across
the Nakdong River basin is about 1200 mm, and
more than 50% of the total amount is concentrated
during the hot summer months (June-August).
The annual mean water temperature at the study
site was 13.7 °C. The mean water temperature was
2.2 °C during the coldest month (January), and
25.9 °C in the warmest (August).
The main channel of the river is 521.5 km long,
and the catchment area occupies about 25% of the
whole country, covering 23,635 km². The Mulgum
station of the Nakdong River, from which data for
the model were collected, is situated 27.4 km
upstream of the estuarine dam at the river mouth.
It has a maximum water depth of about 11 m, a mean
depth of about 4 m, and a river width of 250-300 m.
A degradation of river water quality and a loss
of riparian zone have occurred in the last three
decades due to urban development and high water
demand (Joo et al., 1997). The Nakdong River has
four multi-purpose dams and an estuarine barrage
for preventing salt-water intrusion. Over 10 million people depend on the river for drinking water,
and it also is a source of agricultural and industrial
water supply. Physical alterations combined with
sewage input have accelerated eutrophication of
the lower part of the river (Kim et al., 1998).
Fig. 1. Map of the study site, showing the multi-purpose dams, the estuarine barrage, the rainfall gauging stations, and the river study site (Mulgum, RK 27).
3. Materials and methods
3.1. Limnological data collection
Precipitation data were obtained from five
representative meteorological stations within the
Nakdong River basin (Andong, Daegu, Hapchun,
Jinju, and Miryang) from 1994 to 1998. River flow
data were obtained from the Flood Control
Center. Irradiance, wind velocity, and evaporation
data were collected from the Busan Meteorological
Station, which is the nearest station to the study
site.
Weekly water samples were collected at 0.5 m depth at the river site, and the following limnological parameters were measured: temperature, Secchi transparency, pH, turbidity, concentrations of dissolved oxygen (DO), nitrate (NO3-N), ammonia (NH4-N), phosphate (PO4-P), silica (SiO2) and chlorophyll a, phytoplankton biovolume, and zooplankton abundance. Water temperature and DO were determined with a YSI model 58 meter. Transparencies were determined using a 20 cm Secchi disk. An Orion model 250A meter was used to measure pH, and turbidity (NTU) was measured by a model 11052 turbidimeter. Water samples were filtered using 0.45 µm Whatman GF/C filters to determine nutrient concentrations. Filtrates were frozen and analyzed by a QuikChem Automated Ion Analyzer (NO3-N, no. 10-107-04-1-O; NH4-N, no. 10-107-06-1-B; PO4-P, no. 10-115-01-1-B; SiO2, no. 10-114-27-1-A). Chlorophyll a concentrations were determined spectrophotometrically after extraction according to Wetzel and Likens (1991).
Upon collection, phytoplankton was immediately preserved with Lugol’s solution. Identification of species was conducted with a Nikon light microscope (×1000), using the following keys: Foged (1978), Cassie (1989) and Round et al. (1990). Phytoplankton was enumerated using an inverted microscope (ZEISS, ×400) by the Utermöhl (1958) sedimentation method. Biovolumes of individual species were estimated from mean cell dimensions and the cellular shape of each species as described in Wetzel and Likens (1991). Individual cell volumes of 10-25 cells were measured to calculate mean species biovolume.
Zooplankton was collected from 0.5-m depth using a 3.2-l Van Dorn water sampler until a total of 8 l of water was obtained. This water was filtered through a 35-µm net, and the retained zooplankton was preserved with 10% formalin (4% final concentration). Macrozooplankton (almost exclusively Copepoda and Cladocera) was counted with an inverted microscope at ×25-50 magnification. Microzooplankton (mostly Rotifera) was counted with an inverted microscope at ×100-400 magnification. Identifications of zooplankton taxa were made to genus or species level (except for juvenile Copepoda) using Koste (1978), Smirnov and Timms (1983) and Einsle (1993).
3.2. GP and MLR for the prediction of M.
aeruginosa blooms
The algal bloom model was developed using the
time-series optimization genetic programming
(TSOGP) system (Whigham and Keukelaar,
2001) (Fig. 2), a component of a TimeSeries
Toolbox solution developed at the University of
Otago. TSOGP is a grammar-based extension of
GP (Koza, 1992) that evolves candidate solutions
to a problem by using a population-based search
method (Holland, 1992). Solutions are evolved by
mixing and mutating selected individuals to create
new populations, where selection is driven by the
fitness of each candidate and therefore mimics
aspects of Darwinian Selection. The TSOGP
allows both the constants and the independent
variables within an evolving equation to be tuned
to yield the best prediction for a dependent state
variable, based on a language defined by a
context-free grammar. The grammar expresses
the form of the language (i.e. the functions,
operators, and their structure) that is used to
express the candidate solutions during model
construction (Whigham, 1995). Individuals created
by the grammar can be represented as a derivation
tree that has a certain depth based on the number
of productions used to construct the tree. This
depth can be used to limit the complexity of the
candidate solutions, and therefore gives some
control over the generalization of candidate solutions.
Fig. 2. Basic steps of TSOGP (modified from Whigham and Keukelaar, 2001).
The evolutionary approach allows a large
search-space, defined by the grammar, to be
explored in an efficient manner, to discover near-optimal solutions to the modelling problem defined by the user. An introduction to evolutionary
computation techniques may be found in Goldberg and Holland (1988), Goldberg (1989), Fogel
(1998) and Yao (1999).
The limnological variables investigated in this
study were used to evolve equations predicting
time-series changes of M. aeruginosa. The space of
possible equations was defined by the following
context-free grammar, which allowed combinations of the variables and random constants to
be expressed. Note that this grammar does not
bias the selection of variables or functions that are
used to construct the predictive equation.
Expression → Expression − Expression | Expression + Expression
Expression → Expression / Expression | Expression × Expression
Expression → ln(Expression)
Expression → Variable
Expression → Constant
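A grammar of this kind can be turned into a generator of candidate equations. The following is a minimal sketch under stated assumptions rather than the TSOGP implementation: the variable subset, the uniform choice among productions, the nesting-based depth measure, and the protected division and logarithm are all illustrative choices not taken from the text.

```python
import math
import random

BINARY = ["+", "-", "*", "/"]
VARIABLES = ["Sec", "Ana", "Eva", "Turb"]   # illustrative subset of the inputs

def derive(rng, depth):
    """Expand `Expression` at random, using at most `depth` nested levels."""
    if depth <= 1:
        # Only Expression -> Variable | Constant can still be applied
        if rng.random() < 0.5:
            return rng.choice(VARIABLES)
        return rng.uniform(0.0, 1.0)
    choice = rng.randrange(4)
    if choice == 0:
        return (rng.choice(BINARY),
                derive(rng, depth - 1), derive(rng, depth - 1))
    if choice == 1:
        return ("ln", derive(rng, depth - 1))
    if choice == 2:
        return rng.choice(VARIABLES)
    return rng.uniform(0.0, 1.0)

def depth_of(expr):
    """Nesting depth of a derived expression (a leaf has depth 1)."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + max(depth_of(child) for child in expr[1:])

def evaluate(expr, env):
    """Evaluate an expression; `env` maps variable names to values."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, float):
        return expr
    op = expr[0]
    args = [evaluate(child, env) for child in expr[1:]]
    if op == "+":
        return args[0] + args[1]
    if op == "-":
        return args[0] - args[1]
    if op == "*":
        return args[0] * args[1]
    if op == "/":
        # Protected division (assumption): avoid dividing by zero
        return args[0] / args[1] if args[1] != 0 else 1.0
    # ln, protected against non-positive arguments (assumption)
    return math.log(args[0]) if args[0] > 0 else 0.0
```

Limiting `depth` in `derive` plays the role of the depth limit on the derivation tree: shallower trees give simpler, less specific equations.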
Equation discovery was performed using data
from 1995 to 1998, while 1994 data were used for
model validation. Meteorological (wind velocity),
hydrological (rainfall, evaporation, and discharge), physico-chemical (turbidity, water temperature, Secchi depth, DO, pH, and nutrient
concentrations), and biological (Rotifera, Cladocera, Copepoda, Anabaena flos-aquae, Oscillatoria
limosa, and Stephanodiscus hantzschii) data were
used. Chlorophyll a was excluded from the training data to avoid autocorrelation with M. aeruginosa. Weekly data were smoothly interpolated to a
daily time-step to satisfy both: (1) matching the
scale of daily averaged and weekly sampled data;
and (2) applicability of the developed model to
water management on a daily basis.
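The step from weekly samples to a daily series can be sketched as follows. The text does not name the smoothing method, so plain linear interpolation is used here as a stand-in assumption; the function name `interpolate_daily` is hypothetical.

```python
def interpolate_daily(days, values):
    """Linearly interpolate samples taken on `days` (sorted day numbers)
    to one value per day, from the first to the last sampling day."""
    daily = []
    samples = list(zip(days, values))
    for (d0, v0), (d1, v1) in zip(samples, samples[1:]):
        for d in range(d0, d1):
            w = (d - d0) / (d1 - d0)      # fraction of the way to the next sample
            daily.append(v0 + w * (v1 - v0))
    daily.append(values[-1])              # include the final sampling day
    return daily

# Three weekly samples become a 15-day series
series = interpolate_daily([0, 7, 14], [0.0, 7.0, 0.0])
print(len(series))
```

A smoother scheme (for example a spline) would change the values between sampling days but not the overall structure of this step.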
The GP algorithm evolved a 1-day forecast
model by having the independent input data lag
by 1 day from the prediction. In general, this n-day-ahead input vector can be used to give future
predicting capacity to the developed model (Recknagel, 1997). Current algal biovolume was, thus,
calculated from day-before input values. The
depth of the evolving equation tree in TSOGP
was fixed at 3, to ensure that the constructed
models were not overly specific. The mathematical
operators defined by the grammar were simply
‘plus, minus, multiply, division, and natural logarithm’.
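The lagging of inputs relative to the predicted value can be illustrated with a short helper. This is a sketch only; the function name `make_lagged_pairs` and the data layout (one dictionary of inputs per day) are assumptions for the example.

```python
def make_lagged_pairs(inputs, target, lag=1):
    """Pair the inputs observed on day t with the target on day t + lag."""
    if len(inputs) != len(target):
        raise ValueError("inputs and target must cover the same days")
    return [(inputs[t], target[t + lag]) for t in range(len(target) - lag)]

# Day-before turbidity paired with next-day biovolume (made-up numbers)
inputs = [{"Turb": 1.0}, {"Turb": 2.0}, {"Turb": 3.0}]
target = [10.0, 20.0, 30.0]
pairs = make_lagged_pairs(inputs, target, lag=1)
```

With `lag=1` the last day of inputs has no matching target and is dropped; a larger `lag` gives the n-day-ahead case in the same way.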
To find the best-predicting equation, various
combinations of crossover and mutation rates were considered. Six crossover rates
from 70 to 95% (5% interval) were adopted, and the
mutation rate was varied from 1 to 5% at a 1% interval.
Combining both parameters gave a total of 30 cases,
each of which was run with 10 replicates (300 runs in total). This was
required since evolutionary computation systems
can be sensitive to initial conditions and the search
characteristics defined by these parameters.
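The experimental design described above can be enumerated directly. The dictionary layout is an assumption made for illustration; only the rates and replicate count come from the text.

```python
from itertools import product

crossover_rates = [r / 100 for r in range(70, 100, 5)]   # 0.70, 0.75, ..., 0.95
mutation_rates = [r / 100 for r in range(1, 6)]          # 0.01, 0.02, ..., 0.05
replicates = range(10)

runs = [
    {"crossover": c, "mutation": m, "replicate": k}
    for c, m in product(crossover_rates, mutation_rates)
    for k in replicates
]

print(len(crossover_rates) * len(mutation_rates), len(runs))   # 30 300
```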
Model selection was based on root mean
squared error (RMSE) during the evolution, and
among the 300 equations, the best-predicting equation
was chosen by comparing the predicted and
observed values of M. aeruginosa. With the
best-performing equation, two types of sensitivity
analysis were implemented [‘most influencing
parameter (MIP)’ and ‘sensitivity on wide-ranged
disturbance (SWD)’], as applied by Jeong et al.
(2001a).
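The RMSE criterion and the choice of the best candidate can be written compactly. This is a generic sketch; the helper `select_best` and the way candidates are stored are assumptions, not details from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equally long series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def select_best(candidates, observed):
    """Return the name of the candidate whose predictions have the lowest RMSE.
    `candidates` maps a candidate name to its predicted series."""
    return min(candidates, key=lambda name: rmse(observed, candidates[name]))
```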
The model was disturbed by ±1 standard
deviation (S.D.) for the sensitivity analyses. According to Zar (1999), ±1 S.D. represents common
variation in a population, and ±1.96 S.D. covers
about 95% of total data. The sensitivity analysis
with ±1 S.D. can explain general circumstances of
interactions between algal species and input variables. The results of the sensitivity analyses were
interpreted in comparison with known ecological information. The model was developed by means of
the GP shell time-series toolbox (Whigham et al.,
2001).
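A disturbance of ±1 S.D. can be illustrated with a generic one-at-a-time sensitivity check. This is not the MIP or SWD procedure itself, whose exact definitions are given in Jeong et al. (2001a) rather than here; holding the other inputs at their means is an assumption of this sketch, as is the function name.

```python
import statistics

def one_at_a_time_sensitivity(model, data):
    """For each input, report the change in model output when that input
    moves from (mean - 1 S.D.) to (mean + 1 S.D.), others held at their means.
    `model` takes a dict of inputs; `data` maps input names to observed values."""
    means = {name: statistics.mean(values) for name, values in data.items()}
    effects = {}
    for name, values in data.items():
        sd = statistics.stdev(values)
        low = dict(means, **{name: means[name] - sd})
        high = dict(means, **{name: means[name] + sd})
        effects[name] = model(high) - model(low)
    return effects
```

A positive effect means the output rises with that input and a negative effect means it falls; the magnitudes allow inputs to be ranked by influence.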
An MLR model was built from the same data so that the predictive ability of linear modelling could be compared with that of GP. This algorithm is usually used to analyze the relationship between two or more independent variables and a dependent variable (Zar, 1999). The MLR model equation can be written as:
$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_j X_{ji} + \varepsilon_i \qquad (1)$$
where $Y_i$ is the dependent variable, $X_{1i}, X_{2i}, \ldots$ are the independent variables, and $\varepsilon_i$ is the error term of the equation. The criterion for defining the ‘best fit’ multiple regression equation is most commonly that of least squares, which results in the regression equation with the minimum residual sum of squares, i.e.:
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (2)$$
Regression methods can be utilized in various
ways (see Renshaw, 1991), with both experimental
and field-surveyed data. Comparing TSOGP with MLR
allows the performance of the empirical search algorithm to be evaluated.
The same input variables used in GP modelling
were used for MLR, and time-series prediction
was carried out with the resulting equation. The
results of both models were
compared with observed data on the same time scale.
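A least-squares fit of the kind defined by the MLR model and the residual sum of squares can be sketched with the normal equations. This is an illustrative implementation under assumptions (function names `fit_mlr` and `predict`, Gauss-Jordan elimination as the solver), not the statistical software used in the study; production code would normally use a numerically more stable solver.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = a + b1*x1 + ... + bj*xj.
    X is a list of rows of predictors; returns [a, b1, ..., bj]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # leading 1 for the intercept
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    M = [[sum(r[i] * r[k] for r in rows) for k in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * z for x, z in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(beta, row):
    """Evaluate the fitted linear equation for one row of predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))
```

Nothing in a linear equation of this form constrains the output to be non-negative, so predictions of a quantity such as biovolume can fall below zero for some input combinations.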
4. Results
4.1. Limnological aspects
Most limnological data from the lower Nakdong River exhibited distinct inter-annual variability (Table 1). Most physico-chemical parameters
were related to rainfall amount in a certain year
except Secchi depth and DO. For example, water
temperature, turbidity, conductivity, alkalinity,
and nutrient concentrations varied according to
the fluctuation of total annual rainfall. In the case
of turbidity, a high value was observed in both
1994 and 1998, when there was algal proliferation
and high rainfall runoff.
Plankton communities displayed complex annual variability. Rotifera dominated the zooplankton during the study period, with a maximum in 1997. Blue-greens, including M. aeruginosa, A. flos-aquae, and O. limosa, increased during drought years (1994-1996). Chlorophyll a concentration had its highest annual average in 1994 when the three cyanobacteria species severely proliferated.

Table 1
The limnological characteristics of the lower Nakdong River for five years (1994-1998); values are mean ± S.D.

Division | Parameter | Unit | 5 years | 1994 | 1995 | 1996 | 1997 | 1998
Meteorological | Irradiance | MJ m⁻² day⁻¹ | 12.8±6.5* | 14±7 | 14±6 | 12±6 | 13±6 | 12±6
Meteorological | Wind velocity | m s⁻¹ | 3.9±1.4 | 3.9±1.4 | 4.0±1.3 | 3.8±1.2 | 3.9±1.5 | 3.8±1.4
Hydrological | Precipitation | mm day⁻¹ | 974±306 | 765 | 841 | 1007 | 1352 | 1670
Hydrological | Discharge | CMS | 567±714 | 399±79 | 466±358 | 488±480 | 686±825 | 794±1184
Hydrological | Evaporation | mm day⁻¹ | 3±2 | 4±2 | 3±2 | 3±2 | 3±2 | 3±1
Physical and chemical | Water temperature | °C | 17±9 | 20±10 | 16±10 | 17±10 | 18±9 | 17±8
Physical and chemical | Secchi depth | cm | 74±25 | 72±22 | 75±20 | 74±22 | 74±32 | 74±23
Physical and chemical | Turbidity | NTU | 18±54 | 20±64 | 12±35 | 9±9 | 19±38 | 27±91
Physical and chemical | pH | | 8.4±0.8 | 8.7±0.9 | 8.3±0.6 | 8.4±0.7 | 8.5±0.8 | 8.0±0.8
Physical and chemical | DO | mg l⁻¹ | 10.8±4.0 | 9.9±3.8 | 11.4±3.6 | 11.9±3.9 | 10.2±4.5 | 10.5±3.4
Physical and chemical | Conductivity | µS cm⁻¹ | 349±128 | 312±92 | 405±118 | 396±114 | 374±146 | 250±76
Physical and chemical | Alkalinity | mg CaCO3 l⁻¹ | 57±17 | 55±13 | 66±13 | 67±13 | 58±17 | 41±9
Physical and chemical | Nitrate-N | mg l⁻¹ | 2.7±1.0 | 1.8±0.9 | 2.5±1.0 | 2.3±1.0 | 3.3±0.8 | 3.2±0.5
Physical and chemical | Ammonia-N | mg l⁻¹ | 0.6±0.7 | 0.3±0.3 | 0.8±0.8 | 0.7±0.6 | 0.3±0.3 | 0.8±1.0
Physical and chemical | Phosphate-P | µg l⁻¹ | 34.7±25.2 | 33.1±22.1 | 34.3±25.2 | 20.5±15.2 | 32.7±23.0 | 52.8±27.9
Physical and chemical | Silica | mg l⁻¹ | 4.3±3.8 | 3.6±2.3 | 2.6±2.8 | 3.0±2.3 | 4.6±4.2 | 7.5±4.4
Biological | Rotifera | ind. l⁻¹ | 1644±3250 | 1241±2086 | 1285±1764 | 1021±1274 | 3046±5713 | 1304±1747
Biological | Cladocera | ind. l⁻¹ | 91±311 | 25±58 | 201±588 | 71±176 | 79±140 | 30±61
Biological | Copepoda | ind. l⁻¹ | 60±151 | 23±43 | 65±147 | 43±67 | 109±251 | 36±62
Biological | M. aeruginosa | ×10⁶ µm³ ml⁻¹ | 2.84±12.34 | 5.34±18.81 | 1.42±3.08 | 3.66±11.51 | 3.64±15.82 | 0.15±0.39
Biological | A. flos-aquae | ×10⁶ µm³ ml⁻¹ | 0.56±1.41 | 0.54±0.93 | 0.87±1.78 | 0.80±1.95 | 0.59±1.34 | 0.01±0.03
Biological | O. limosa | ×10⁶ µm³ ml⁻¹ | 0.69±1.66 | 1.08±1.19 | 1.11±2.71 | 0.68±1.29 | 0.53±1.67 | 0.07±0.18
Biological | S. hantzschii | ×10⁶ µm³ ml⁻¹ | 15.10±24.14 | 12.97±26.74 | 17.24±29.42 | 20.89±27.11 | 10.22±22.88 | 9.50±11.48
Biological | Chlorophyll a | µg l⁻¹ | 50.2±91.5 | 84.7±178.5 | 65.5±74.7 | 48.5±49.2 | 37.5±80.6 | 28.0±26.4

* Mean ± S.D., 5 years’ data (n = 263; 52-53 in each year).
4.2. Equation discovery and model performance
Genetic programming successfully achieved
equation discovery for the prediction of time-series changes of M. aeruginosa in the river system,
with 1-day-ahead data input. The RMSE of the
developed model was less than 0.001. Various
input parameters were selected during the evolution of equations (Fig. 3), and A. flos-aquae,
turbidity, and silica concentration were used
frequently. In particular, A. flos-aquae was included in all 300 equations, indicating an underlying relationship with the dependent variable.
Nutrient concentrations were selected more often
than meteorological and hydrological parameters. Zooplankton was used less,
and algal species other than A. flos-aquae were
selected relatively infrequently.
The best-predicting model equation had a 75%
crossover rate and 5% mutation rate. Eq. (3) is the
simple model:
Microcystis aeruginosa(t1)
(0:49293Sec(t)0:50707Ana(t))
(0:83014Eva(t)0:16986Turb(t)) (3)
where Sec is Secchi depth, Ana is the biovolume of
A. flos-aquae, Eva is evaporation, and Turb is
turbidity. Both the variables and their constants
were simultaneously tuned during model evolution.
Compared with the observed time-series
changes, the prediction of M. aeruginosa fit quite
well (Fig. 4A), with the timing and magnitude of
the bloom being well represented. Although a slight
over-estimation occurred during April and June,
the highest peak was effectively modelled by the
equation.
Fig. 3. Variable selection during evolution of GP model. Wind, wind velocity; Rain, rainfall; Dis, discharge; Eva, evaporation; WT, water temperature; Sec, Secchi depth; Turb, turbidity; DO, dissolved oxygen; NO3, nitrate; NH4, ammonia; PO4, phosphate; SiO2, dissolved silica; Rot, Rotifera; Cla, Cladocera; Cop, Copepoda; Ana, A. flos-aquae; Osc, O. limosa; Ste, S. hantzschii.
Among the four input variables, A. flos-aquae
had the most influence on the time-series changes
of M. aeruginosa (Fig. 4B), followed by turbidity
and evaporation. While Secchi depth had almost
no influence, turbidity (NTU) was highly related
to the output calculation. The result of SWD
analysis indicated a linear relationship between
input values and output (Fig. 4C). Apart from
Secchi depth, the other three variables had positive
effects on M. aeruginosa .
MLR produced an equation containing all input
variables (Eq. (4)). Time-series prediction
showed that the accuracy of this model was rather unsatisfactory
(regression coefficient r for test data was 0.08,
versus 0.77 for GP), even
though the timing of peaks was relatively correct
(Fig. 4A). The model also produced negative
predicted values.
Microcystis
47223:339Rain0:009Wind
585821:152Eva3:057105 Dis
0:0111Sec13054:426Turb85020:205
WT1102600:058pH
167271:631DO0:0285NO3 2124:290
NH4 0:001PO4
62798:045SiO84:270Rot8:766Cla
0:0002Cop2:377Ana
1:326Osc1:585Ste
(4)
Fig. 4. Result of prediction and sensitivity analysis with the GP model. (A) Time-series prediction; (B) sensitivity of MIP analysis; (C) result of SWD analysis.
5. Discussion
5.1. Limnology of the lower Nakdong River
The lower Nakdong River displayed reservoir-like characteristics, and its nutrient and chlorophyll a concentrations indicated a eutrophic situation according to Wetzel (1983). River eutrophication is largely influenced by flow regulation and nutrient loadings (Webb and Walling, 1992). Joo et al. (1997) suggested that eutrophication in the lower Nakdong River was mainly due to the regulation of water flow. The construction of an estuarine barrage in 1987 had a synergistic effect with nutrient inputs, due to increased water retention time with the slower slope of the river bed (approximately 1:380) (Song et al., 1993; Heo et al., 1995). Intense regulation of flow at the multi-purpose dams and the estuarine dam at the river mouth has further contributed to the accelerated eutrophication and complex behavior of the river ecosystem (e.g. grazing impacts of zooplankton on phytoplankton causing a ‘clear water phase’) (Jeong, 2000; Kim et al., 2001; Jeong et al., 2001a).
Blue-green algal proliferation is an unusual phenomenon in flowing waters, and this could be the result of complicated ecological interactions. Blue-greens rarely occur in streams and rivers except in pool-like reaches (Reynolds, 1992). Ha (1999) reported the occurrence of Microcystis blooms in the lower Nakdong River, and a distinct vertical distribution of Microcystis was observed (Ha et al., 2000). Recently this genus has been reported in other rivers, including the Great Ouse (UK), Neuse (USA) and Hawkesbury (Australia) (Paerl, 1987; Marker and Collett, 1997; Rose and Balbi, 1997; Mitrovic et al., 1999). Ha et al. (1999) suggested that intensive water flow regulation is one of the most common causes of cyanobacteria blooms.
5.2. Ecological modelling of algal dynamics
The proliferation of particular algal taxa in river
systems is a complicated problem to solve using
deductive approaches. The unique characteristic of
flow, which distinguishes rivers from other freshwater ecosystems, typically governs patterns of
plankton dynamics (Reynolds, 1984, 1992).
Although great efforts have been undertaken in
modelling algal blooms in diverse ways (e.g.
Kamp-Nielsen, 1978; Reynolds, 1984; Sommer et
al., 1986; Kromkamp and Walsby, 1990), they
were mainly from deterministic or heuristic methodologies. Controlled flow may change normal
conditions and situations of river ecology, and this
can result in poor or incorrect model performance
(Jeong et al., 2001a). Data-driven inductive methods are thus feasible for developing predictive and
elucidative models for certain phenomena, because
they can incorporate all specific contributions to
the dependent output variable.
Genetic programming has been shown to be a
technique that produces good models of
time-series plankton dynamics. There are
previous examples of applying GP to freshwater
algal dynamics (e.g. Whigham and Recknagel,
1999, 2000). Their results were mainly from lakes,
where they demonstrated that time-series prediction of blue-green algae could be successfully
achieved. Similar to their findings, the present
results support the use of evolutionary computation
approaches for a river-reservoir ecosystem.
5.3. Model performance
Inductive modelling performed well compared with a general statistical model. In many
cases MLR can be used to approximate solutions
for time-series datasets, but in the lower Nakdong
River MLR failed to produce an accurate prediction. Jeong et al. (2001a,b) and Jeong et al. (2002)
emphasized, during neural network model development, that the limnological phenomena in the
river are too complicated to analyze with traditional methods. Furthermore, the GP model uses
fewer variables than the MLR model,
which may make GP easier to apply to
ecosystems.
Time-series prediction with an n-day-ahead
vector was successfully achieved in this study. All
input data were fed to the algorithm, and its
predictability was good. This type of prediction
was done in Recknagel (1997) and Recknagel and
Wilson (2000) for neural network models. Most
water quality management uses mechanistic models, and this type of generalized model is able to
approximate environmental changes. However,
their capacity is partially limited due to factors
such as geological and geographic differences, nonlinearity and uncertainty, and especially the absence of biological information. Machine learning
techniques, including GP with n-day-ahead prediction, can encourage the development and
utilization of inductive models. These approaches
fulfill the general requirements of a management
strategy, such as cost-benefit efficiency and accurate prediction.
5.4. Evolutionary computation for ecological
modelling
For several decades, interdisciplinary research between mathematics and ecology has encouraged analytical and simulation approaches to ecological dynamics. The considerable efforts made to develop equation models of algal dynamics can be consulted for this purpose (see Dillon and Rigler, 1974; Bierman, 1976). The deductive approach has usually produced reductionistic models based on existing theories and knowledge, which enable users to simulate rather than to predict behavior (Recknagel, 1997; see Dzeroski et al., 1999). Reductionism may also cause the omission of information that could be important for explaining specific features. Costanza and Sklar (1985) emphasized that greater articulation (the number of variables used in a model) has little relationship with model effectiveness; on the contrary, too many variables caused deterioration of model performance. Ecological and environmental models are themselves simplified representations of nature, so choosing the right complexity and adequate components and processes is most important for the problem in focus (Jørgensen, 1997). The empirical model of TSOGP in this study might satisfy this point of view.
The characteristics of TSOGP may make it a reasonable solver for ecological modelling. Genetic programming is itself a derivative of the genetic algorithm (GA), so it is based on a global search for the best solution. This has the advantage of helping to avoid entrapment in local minima (Banzhaf et al., 1998). TSOGP can also select articulations as well as their parameters according to relationships within the data, which is so-called ‘automated model construction’. Equation discovery has been applied to ecological modelling in various directions, and systems of discovery algorithms have been developed continuously. Dzeroski et al. (1999) summarized algorithms of equation discovery and emphasized the goals of these algorithms: model reconstruction and information search. The equation search achieved by TSOGP in this study could satisfy the criteria suggested by Dzeroski et al. (1999), and had the further advantages indicated above. The good performance of the equation model developed by TSOGP in this study might be explained from this perspective.
Compared with a neural network, GP used a
smaller number of articulations (variables) in the developed
model. In the lower Nakdong River, Jeong et al.
(2001a) used 16 variables to predict time-series
changes of algal biomass with a time-delayed
recurrent neural network. In this study, although
19 variables were selected as training data, only
four variables were required to predict cyanobacteria biovolume with a high accuracy. In general,
neural network models are good tools for classification and prediction of ecological data (Chon et
al., 1996; Recknagel, 1997). However, compared
with neural networks, inductive equation models
have advantages in terms of the types of expressions that can be explored (equations, rules, etc),
and the fact that the results can be interpreted as
proper predictive equations.
Ensembles of EC, including TSOGP, and other information systems are now spreading, expanding the capacity of ecological modelling. For instance, ANN architecture and its weights can be determined by EC, in so-called ‘evolving neural networks (ENNs)’. Medsker (1996) also summarized various possibilities of computational work that can be suitably adapted to ecological research. With interdisciplinary efforts and accurate ecological data acquisition, previously unseen information can be found, and feasible modelling techniques for specific and complicated features should become possible as well.
6. Conclusion
Mining dynamic and complicated ecological
data requires a suitable methodology to obtain a
satisfactory result. The increasing amount of data
that has become available today for aquatic
ecosystems can support the basic requirements of
machine learning techniques. Models derived from
such techniques can address the changing environments that occur due to human intervention.
Machine learning approaches such as GP are
good tools for this purpose, and suitable ecological
models derived from these approaches, based on
accumulated datasets, can give valuable insights
into ecosystem function and behavior. The results of
this study suggest that evolutionary computation is suitable for modelling the dynamics of
bloom-forming algal species in a river/reservoir
transitional system.
Acknowledgements
The authors are grateful to Dr. H.W. Kim of
Sunchon National University, and Dr. K. Ha of
Pusan National University (PNU) for providing
plankton community data. We also thank Mr. S.B.
Park, Mr. J.S. Kim, and Mr. J.G. Kim of PNU for
assistance in the field. We are also indebted to Dr.
Friedrich Recknagel of the University of Adelaide
for his kind attention during the preparation
of this article. This study was financially supported
by the Institute of Environmental Technology and
Industry (IETI) (project no. 01-10-99-01-A-1).
This is contribution no. 28 of the Nakdong
River Ecosystem Study in Limnology Lab., PNU.
References
Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D., 1998.
Genetic Programming. On the Automatic Evolution of
Computer Programs and its Applications. Morgan Kaufmann Publishers, California, p. 470.
Bierman, V.J., 1976. Mathematical model of the selective
enhancement of blue-green algae by nutrient enrichment.
In: Canale, R.P. (Ed.), Modeling Biochemical Processes in
Aquatic Ecosystems. Ann Arbor Science Publishers, Ann
Arbor, MI, pp. 1–32.
Cassie, V., 1989. A Contribution to the Study of New Zealand
Diatoms. Cramer, Berlin, p. 266.
Chon, T.S., Park, Y.S., Moon, K.H., Cha, E.Y., 1996.
Patternizing communities by using an artificial neural network. Ecol. Modelling 90, 69–78.
Costanza, R., Sklar, F.H., 1985. Articulation, accuracy and
effectiveness of mathematical models: a review of freshwater
wetlands applications. Ecol. Modelling 27, 45–69.
Dillon, P.J., Rigler, F.H., 1974. The phosphorus-chlorophyll
relationship in lakes. Limnol. Oceanogr. 19, 767–773.
Dzeroski, S., Todorovski, L., Bratko, I., Kompare, B., Krizman, V., 1999. Equation discovery with ecological applications. In: Fielding, A.H. (Ed.), Machine Learning Methods
for Ecological Applications. Kluwer Academic Publishers,
Massachusetts, pp. 185–208.
Einsle, U., 1993. Crustacea, Copepoda, Calanoida and Cyclopoida. Süsswasserfauna von Mitteleuropa, Part 4-1, vol. 8.
Fischer, Stuttgart, p. 208.
Fielding, A.H., 1999. An introduction to machine learning
methods. In: Fielding, A.H. (Ed.), Machine Learning
Methods for Ecological Applications. Kluwer Academic
Publishers, Massachusetts, pp. 1–35.
Foged, E., 1978. Diatoms in Eastern Australia. Cramer, Berlin,
p. 243.
Fogel, D., 1998. Evolutionary Computation: The Fossil Record. IEEE Press, Piscataway, NJ, p. 641.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, New York,
p. 412.
Goldberg, D.E., Holland, J.H., 1988. Genetic algorithms and
machine learning. Machine Learn. 3 (2–3), 95–99.
Ha, K., 1999. Phytoplankton community dynamics and Microcystis bloom development in a hypertrophic river (Nakdong
River, Korea). PhD dissertation. Pusan National University, Busan, p. 140.
Ha, K., Cho, E.A., Kim, H.W., Joo, G.J., 1999. Microcystis
bloom formation in the lower Nakdong River, South
Korea: importance of hydrodynamics and nutrient loading.
Mar. Freshwater Res. 50, 89–94.
Ha, K., Kim, H.W., Jeong, K.S., Joo, G.J., 2000. Vertical
distribution of Microcystis population in the regulated
Nakdong River. Kor. J. Limnol. 1, 225–230.
Heo, W.M., Kim, B.C., Hwang, G.S., Choi, K.S., Park, W.K.,
1995. The distributions of phosphorus, nitrogen, and
chlorophyll a concentration in the Nakdong River. Kor. J.
Limnol. 28, 175–181.
Holland, J.H., 1992. Adaptation in Natural and Artificial
Systems, 2nd ed. MIT Press, New York, p. 211.
Jeong, K.S., 2000. Statistical evaluation and application of
artificial neural networks on water quality of the lower
Nakdong River. MSc thesis, Pusan National University,
Busan, p. 74.
Jeong, K.S., Joo, G.J., Kim, H.W., Ha, K., Recknagel, F.,
2001a. Prediction and elucidation of algal dynamics in the
Nakdong River (Korea) by means of a recurrent artificial
neural network. Ecol. Model. 146, 115–129.
Jeong, K.S., Jang, M.H., Park, S.B., Cho, G.I., Joo, G.J.,
2001b. Neuro-genetic learning to the algal dynamics: a
preliminary experiment for the new technique to the
ecological modeling. Proceedings of the Korean Environmental Science Society, pp. 234–235.
Jeong, K.S., Recknagel, F.S., Joo, G.J., 2002. Prediction and
elucidation of population dynamics of a blue-green alga
(Microcystis aeruginosa) and diatom (Stephanodiscus hantzschii) in the Nakdong River-Reservoir System (South Korea)
by a recurrent artificial neural network. In: Recknagel, F.
(Ed.), Ecological Informatics. Springer-Verlag (in press).
Joo, G.J., Kim, H.W., Ha, K., Kim, J.K., 1997. Long-term
trend of the eutrophication of the lower Nakdong River.
Kor. J. Limnol. (Suppl.) 30, 472–480.
Jørgensen, S.E., 1997. Integration of Ecosystem Theories: a
Pattern, 2nd ed. Kluwer Academic Publishers, Dordrecht, p.
388.
Kamp-Nielsen, L., 1978. Modelling the vertical gradients in
sedimentary phosphorus fractions. Verh. Int. Verein. Limnol. 20, 720–727.
Kim, H.W., Ha, K., Joo, G.J., 1998. Eutrophication of the
lower Nakdong River after the construction of an estuarine
dam in 1987. Int. Rev. Hydrobiol. 83, 65–72.
Kim, H.W., Joo, G.J., Walz, N., 2001. Zooplankton dynamics
in the hyper-eutrophic Nakdong River system (Korea)
regulated by an estuary dam and side channels. Int. Rev.
Hydrobiol. 86, 127–143.
Koste, W., 1978. Rotatoria. Die Rädertiere Mitteleuropas. Ein
Bestimmungswerk, begründet von Max Voigt, 2nd ed.
Borntraeger, Stuttgart, vol. 1, Textband, 673 pp.; vol. 2,
Tafelband, p. 234.
Koza, J.R., 1992. Genetic Programming. On the Programming
of Computers by Means of Natural Selection. The MIT
Press, New York, p. 819.
Kromkamp, J., Walsby, A.E., 1990. A computer model of
buoyancy and vertical migration in cyanobacteria. J.
Plankton Res. 12, 161–183.
Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J.,
Aulagnier, S., 1996. Application of neural networks to
modelling nonlinear relationships in ecology. Ecol. Modelling 90, 39–52.
Marker, A.F.H., Collett, G.D., 1997. Spatial and temporal
characteristics of algae in the River Great Ouse. I.
Phytoplankton. Regulated Rivers: Res. Manag. 13, 219–233.
Medsker, L.R., 1996. Microcomputer applications of hybrid
intelligent systems. J. Netw. Comput. Appl. 19, 213–234.
Mitrovic, S.M., Hawkins, P.R., Bowling, L.C., Buckney, R.T.,
Cheng, D.H.M., 1999. Low nitrate concentrations in a
tidally mixed river allow replacement of green algae by the
cyanobacteria Microcystis. Verh. Int. Verein. Limnol. 27,
924–929.
Moss, B., 1998. Ecology of Fresh Waters. Man and Medium,
Past to Future, 3rd ed. Blackwell Science, Oxford, p. 557.
Paerl, H.W., 1987. Dynamics of blue-green algal (Microcystis
aeruginosa ) blooms in the lower Neuse River, North
Carolina: causative factors and potential controls. Water
Resources Research Institute of the University of North
Carolina, UNC-WRRI-87-229.
Recknagel, F., 1997. ANNA – artificial neural network model
for predicting species abundance and succession of blue-green
algae. Hydrobiologia 349, 47–57.
Recknagel, F., Wilson, H., 2000. Elucidation and prediction of
aquatic ecosystems by artificial neuronal networks. In: Lek,
S., Guégan, J.F. (Eds.), Artificial Neuronal Networks.
Application to Ecology and Evolution. Springer-Verlag,
Berlin, pp. 143–155.
Renshaw, E., 1991. Modelling Biological Populations in Space
and Time. Cambridge University Press, New York, p. 403.
Reynolds, C.S., 1984. The Ecology of Freshwater Phytoplankton. Cambridge University Press, New York, p. 384.
Reynolds, C.S., 1992. Algae. In: Calow, P., Petts, G.E. (Eds.),
The River Handbook. Hydrological and Ecological Principles, vol. I. Blackwell Scientific Publication, Oxford, pp.
195–215.
Rose, M., Balbi, D., 1997. Rivers Nene and Great Ouse
eutrophication studies: final report. Environment Agency,
Peterborough, UK, 77 pp.
Round, F.E., Crawford, R.M., Mann, D.G., 1990. The
Diatoms. Cambridge University Press, New York, p. 747.
Smirnov, N.N., Timms, B.V., 1983. A revision of the Australian
Cladocera (Crustacea). Rec. Aust. Museum Suppl. 1, 1–132.
Sommer, U., Gliwicz, Z.M., Lampert, W., Duncan, A., 1986.
The PEG-model of seasonal succession of planktonic events
in fresh waters. Arch. Hydrobiol. 106, 433–471.
Song, K.O., Park, H.Y., Park, C.G., 1993. Water quality
modeling in the Nakdong River (I) – a study on the
characteristics of nutrients distribution. J. Kor. Soc. Water
Qual. 9, 41–53.
Straskraba, M., 1994. Ecotechnological models for reservoir
water quality management. Ecol. Modelling 74, 1–38.
Utermöhl, H., 1958. Zur Vervollkommnung der quantitativen
Phytoplankton-Methodik. Mitt. Verh. Int. Verein. Limnol.
9, 1–38.
Webb, B.W., Walling, D.E., 1992. Water quality. II. Chemical
characteristics. In: Calow, P., Petts, G.E. (Eds.), The River
Handbook, vol. I. Blackwell Science Publications, Oxford,
pp. 73–100.
Wetzel, R.G., 1983. Limnology, 2nd ed. Saunders College
Publishing, New York, p. 767.
Wetzel, R.G., Likens, G.E., 1991. Limnological Analyses, 2nd
ed. Springer-Verlag, New York, p. 391.
Whigham, P.A., 1995. Inductive Bias and Genetic Programming. Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA’95), pp. 461–467.
Whigham, P.A., 2000. Induction of a marsupial density model
using genetic programming and spatial relationships. Ecol.
Modelling 131, 299–317.
Whigham, P.A., Recknagel, F., 1999. Predictive modelling of
plankton dynamics in freshwater lakes using genetic programming. In: Oxley, L., Scrimgeour, F. (Eds.), International Congress on Modelling and Simulation, vol. 3. The
Modelling and Simulation Society of Australia and New
Zealand, Hamilton, New Zealand, pp. 679–685.
Whigham, P.A., Recknagel, F., 2000. Evolving difference
equations to model freshwater phytoplankton. Proceedings
of the 2000 Congress on Evolutionary Computation, vol. 2,
pp. 967–973.
Whigham, P.A., Keukelaar, J., 2001. Evolving structure-optimizing content. Proceedings of the Congress on Evolutionary Computation 2001, pp. 1228–1235.
Whigham, P., Schallenberg, M., Keukelaar, J., Box, K.,
McKitrick, P., 2001. Time-Series Toolbox (Ver. 1.4.2).
Yao, X., 1999. Evolutionary Computation Theory and Applications. World Scientific Publishing Co, Singapore, p. 360.
Zar, J.H., 1999. Biostatistical Analysis, 4th ed. Prentice-Hall,
New Jersey, p. 663.