Ecological Modelling 161 (2003) 67-78
www.elsevier.com/locate/ecolmodel

Modelling Microcystis aeruginosa bloom dynamics in the Nakdong River by means of evolutionary computation and statistical approach

Kwang-Seuk Jeong a, Dong-Kyun Kim a, Peter Whigham b, Gea-Jae Joo a,*

a Department of Biology, Pusan National University, Jang-Jeon Dong, Gum-Jeong Gu, Busan 609-735, South Korea
b Department of Information Science, University of Otago, PO Box 56, Dunedin, New Zealand

Received 15 January 2002; received in revised form 22 July 2002; accepted 31 July 2002

Abstract

Dynamics of a bloom-forming cyanobacterium (Microcystis aeruginosa) in a eutrophic river-reservoir hybrid system were modelled using a genetic programming (GP) algorithm and multivariate linear regression (MLR). The lower Nakdong River has been influenced by cultural eutrophication since construction of an estuarine barrage in 1987. During 1994-1998, the average concentrations of nutrients and phytoplankton were: NO3-N, 2.7 mg l⁻¹; NH4-N, 0.6 mg l⁻¹; PO4-P, 34.7 µg l⁻¹; and chlorophyll a, 50.2 µg l⁻¹. Blooms of M. aeruginosa occurred in summers when there were droughts. Using data from 1995 to 1998, GP and MLR were used to construct equation models for predicting the occurrence of M. aeruginosa. The models were validated using data from 1994, a year with severe summer blooms. The GP model was very successful in predicting the temporal dynamics and magnitude of blooms, whereas MLR showed insufficient predictability. The lower Nakdong River exhibits reservoir-like rather than riverine ecological dynamics, and for this reason a previous mechanistic river model failed to describe its uncertainty and complexity. Results of this study suggest that an inductive-empirical approach is more suitable for modelling the dynamics of bloom-forming algal species in a river-reservoir transitional system.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Genetic programming; Multivariate linear regression; Microcystis aeruginosa; Algal blooms; Ecological modelling; Nakdong River

* Corresponding author. Tel.: +82-51-510-2258; fax: +82-51-581-2962. E-mail address: [email protected] (G.-J. Joo).
0304-3800/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0304-3800(02)00280-6

1. Introduction

A comprehensive understanding of ecosystem dynamics requires numerous approaches. Ecological modelling is a good solution for this purpose when it contains adequate data and expressions for the system of interest (Straskraba, 1994). To interpret data or forecast ecological conditions, researchers traditionally select deductive mathematical or statistical models. Because of the difficulty of resolving complex interactions among diverse variables and parameters, sophisticated machine learning techniques [e.g. artificial neural networks (ANNs) and evolutionary computation (EC)] have recently been applied in ecological modelling (Lek et al., 1996; Recknagel, 1997; Fielding, 1999; Whigham, 2000).

Genetic programming (GP) is a technique derived from evolutionary computation, originally based on evolving variable-length LISP programs (Koza, 1992). GP is a population-based search that evolves tree-like structures representing functions, equations, or programs. It evolves candidate solutions over successive generations to find the best solution to a given problem within the solution space (Banzhaf et al., 1998). This is achieved by applying search operators such as crossover (swapping sub-trees between parents) and mutation (randomly recreating a sub-tree of a parent). The inductive stochastic approach of GP, combined with the few assumptions it makes about the form or limitations of the developed models, gives it some advantages over other approaches in modelling freshwater ecosystems.
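The two search operators named above can be illustrated with a minimal sketch. It is not the TSOGP implementation used in this study; the tuple-based tree encoding, the variable names (x1, x2, x3), and all function names are assumptions introduced only for illustration. Expression trees are nested tuples, crossover swaps one randomly chosen sub-tree between two parents, and mutation replaces one randomly chosen sub-tree with a newly grown one.

```python
import random

# Illustrative encoding (an assumption, not the authors' system):
# an internal node is (operator, left, right); a leaf is a variable name or a float.
OPERATORS = ["+", "-", "*", "/"]
VARIABLES = ["x1", "x2", "x3"]  # hypothetical input names


def random_tree(depth, rng):
    """Grow a random expression tree with at most `depth` levels."""
    if depth <= 1 or rng.random() < 0.3:
        # Leaf: a variable or a random constant.
        if rng.random() < 0.5:
            return rng.choice(VARIABLES)
        return round(rng.uniform(0.0, 1.0), 3)
    return (rng.choice(OPERATORS),
            random_tree(depth - 1, rng),
            random_tree(depth - 1, rng))


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))


def get_subtree(tree, path):
    """Return the sub-tree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` in which the node at `path` is replaced by `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen sub-tree between two parents."""
    path_a = rng.choice(list(paths(parent_a)))
    path_b = rng.choice(list(paths(parent_b)))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))


def mutate(parent, rng, max_depth=3):
    """Replace one randomly chosen sub-tree with a freshly grown one."""
    path = rng.choice(list(paths(parent)))
    return replace_subtree(parent, path, random_tree(max_depth, rng))


def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))
```

Because crossover only exchanges sub-trees, the total number of nodes across the two children equals that of the two parents. Note that this sketch does not enforce a depth limit on offspring; a grammar-based system would normally constrain tree depth.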
Of the various freshwater resources, river systems are particularly affected by human activities and frequently display cultural eutrophication. This is sometimes exacerbated by regulation of water flow (Moss, 1998). Extended retention time together with excessive nutrient loads can result in severe blooms of blue-green algae in rivers. Because algal blooms are stimulated by many different circumstances, it is difficult to develop a fixed model that considers all possible situations (Recknagel, 1997; Jeong, 2000). Genetic programming can search for suitable variables as well as their interactions by evaluating the underlying data for significant patterns. This allows models to be developed that consider the behavior of algal blooms in each specific environment.

The lower Nakdong River has exhibited severe blue-green algal blooms in hot summer months (Ha, 1999). This situation was caused by accelerated eutrophication following construction of the barrage at the river mouth, coupled with high nutrient loads (Joo et al., 1997). Some modelling efforts have attempted to predict the blooms, but they considered mainly water quality and did not provide an ecosystem perspective on this river.

In this study, time-series dynamics of Microcystis aeruginosa blooms were modelled using an extended GP technique specifically designed for time-series models. The empirical database from the lower Nakdong River was used to evolve the best model for predicting the time-series changes of M. aeruginosa. By varying model inputs in a forecasting mode, this model can be used in water quality and ecosystem management applications. To assess the capability of GP modelling, a multivariate linear regression (MLR) model was also constructed, and the time-series predictions of the two models were compared. The present study provides a good example of the application of GP to a river/reservoir hybrid system.

2. Description of the study site

The Nakdong River basin is situated in the monsoon climate of South Korea (35-37° N, 127-129° E) (Fig. 1). South Korea experiences four distinct seasons and is characterized by heavy rainfall during the monsoon period and several typhoon events. The annual mean rainfall across the Nakdong River basin is about 1200 mm, and more than 50% of the total is concentrated in the hot summer months (June-August). The annual mean water temperature at the study site was 13.7 °C. The mean water temperature was 2.2 °C during the coldest month (January) and 25.9 °C in the warmest (August). The main channel of the river is 521.5 km long, and the catchment area occupies about 25% of the whole country, covering 23,635 km². The Mulgum station of the Nakdong River, from which data for the model were collected, is situated 27.4 km upstream of the estuarine dam at the river mouth. It has a maximum water depth of ~11 m, a mean depth of ~4 m, and a river width of 250-300 m.

River water quality has degraded and riparian zone has been lost over the last three decades due to urban development and high water demand (Joo et al., 1997). The Nakdong River has four multi-purpose dams and an estuarine barrage for preventing salt-water intrusion. Over 10 million people depend on the river for drinking water, and it is also a source of agricultural and industrial water supply. Physical alterations combined with sewage input have accelerated eutrophication of the lower part of the river (Kim et al., 1998).

Fig. 1. Map of the study site, showing the multi-purpose dams, the estuarine barrage, the rainfall gauging stations, and the river study site (Mulgum, RK 27).

3. Materials and methods

3.1. Limnological data collection

Precipitation data were obtained from five representative meteorological stations within the Nakdong River basin (Andong, Daegu, Hapchun, Jinju, and Miryang) from 1994 to 1998.
River flow data were obtained from the Flood Control Center. Irradiance, wind velocity, and evaporation data were collected from the Busan Meteorological Station, which is the nearest station to the study site.

Weekly water samples were collected at 0.5 m depth at the river site, and the following limnological parameters were measured: temperature, Secchi transparency, pH, turbidity, concentrations of dissolved oxygen (DO), nitrate (NO3-N), ammonia (NH4-N), phosphate (PO4-P), silica (SiO2) and chlorophyll a, phytoplankton biovolume, and zooplankton abundance. Water temperature and DO were determined with a YSI model 58 meter. Transparency was determined using a 20 cm Secchi disk. An Orion model 250A meter was used to measure pH, and turbidity (NTU) was measured with a model 11052 turbidimeter. Water samples were filtered using 0.45 µm Whatman GF/C filters to determine nutrient concentrations. Filtrates were frozen and analyzed with a QuikChem Automated Ion Analyzer (NO3-N, no. 10-107-04-1-O; NH4-N, no. 10-107-06-1-B; PO4-P, no. 10-115-01-1-B; SiO2, no. 10-114-27-1-A). Chlorophyll a concentrations were determined spectrophotometrically after extraction according to Wetzel and Likens (1991).

Upon collection, phytoplankton was immediately preserved with Lugol's solution. Species were identified with a Nikon light microscope (×1000), using the following keys: Foged (1978), Cassie (1989) and Round et al. (1990). Phytoplankton was enumerated using an inverted microscope (ZEISS, ×400) by the Utermöhl (1958) sedimentation method. Biovolumes of individual species were estimated from mean cell dimensions and the cellular shape of each species as described in Wetzel and Likens (1991). Individual cell volumes of 10-25 cells were measured to calculate mean species biovolume.

Zooplankton was collected from 0.5-m depth using a 3.2-l Van Dorn water sampler until a total of 8 l of water was obtained.
This water was filtered through a 35-µm net, and the retained zooplankton was preserved with 10% formalin (4% final concentration). Macrozooplankton (almost exclusively Copepoda and Cladocera) was counted with an inverted microscope at ×25-50 magnification. Microzooplankton (mostly Rotifera) was counted with an inverted microscope at ×100-400 magnification. Zooplankton taxa were identified to genus or species level (except for juvenile Copepoda) using Koste (1978), Smirnov and Timms (1983) and Einsle (1993).

3.2. GP and MLR for the prediction of M. aeruginosa blooms

The algal bloom model was developed using the time-series optimization genetic programming (TSOGP) system (Whigham and Keukelaar, 2001) (Fig. 2), a component of a TimeSeries Toolbox solution developed at the University of Otago. TSOGP is a grammar-based extension of GP (Koza, 1992) that evolves candidate solutions to a problem by using a population-based search method (Holland, 1992). Solutions are evolved by mixing and mutating selected individuals to create new populations; selection is driven by the fitness of each candidate and therefore mimics aspects of Darwinian selection.

Fig. 2. Basic steps of TSOGP (modified from Whigham and Keukelaar, 2001).

TSOGP allows both the constants and the independent variables within an evolving equation to be tuned to yield the best prediction for a dependent state variable, based on a language defined by a context-free grammar. The grammar expresses the form of the language (i.e. the functions, operators, and their structure) that is used to express the candidate solutions during model construction (Whigham, 1995). Individuals created by the grammar can be represented as a derivation tree whose depth depends on the number of productions used to construct the tree. This depth can be used to limit the complexity of the
candidate solutions, and therefore gives some control over the generalization of candidate solutions. The evolutionary approach allows a large search space, defined by the grammar, to be explored efficiently to discover near-optimal solutions to the modelling problem defined by the user. An introduction to evolutionary computation techniques may be found in Goldberg and Holland (1988), Goldberg (1989), Fogel (1998) and Yao (1999).

The limnological variables investigated in this study were used to evolve equations predicting time-series changes of M. aeruginosa. The space of possible equations was defined by the following context-free grammar, which allowed linear combinations of the variables and random constants to be expressed. Note that this grammar does not bias the selection of variables or functions that are used to construct the predictive equation.

Expression → Expression − Expression | Expression + Expression
Expression → Expression / Expression | Expression × Expression
Expression → ln(Expression)
Expression → Variable
Expression → Constant

Equation discovery was performed using data from 1995 to 1998, while 1994 data were used for model validation. Meteorological (wind velocity), hydrological (rainfall, evaporation, and discharge), physico-chemical (turbidity, water temperature, Secchi depth, DO, pH, and nutrient concentrations), and biological (Rotifera, Cladocera, Copepoda, Anabaena flos-aquae, Oscillatoria limosa, and Stephanodiscus hantzschii) data were used. Chlorophyll a was excluded from the training data to avoid autocorrelation with M. aeruginosa. Weekly data were smoothly interpolated to a daily time-step in order to (1) match the scales of the daily averaged and weekly sampled data, and (2) make the developed model applicable to water management on a daily basis. The GP algorithm evolved a 1-day forecast model by having the independent input data lag the prediction by 1 day. In general, such an n-day-ahead input vector can be used to give the developed model a forecasting capacity (Recknagel, 1997). Current algal biovolume was thus calculated from the input values of the previous day.

The depth of the evolving equation tree in TSOGP was fixed at 3 to ensure that the constructed models were not overly specific. The mathematical operators defined by the grammar were simply 'plus, minus, multiply, division, and natural logarithm'. To find the best-predicting equation, various combinations of crossover and mutation rates were considered. Six crossover rates from 70 to 95% (at 5% intervals) and five mutation rates from 1 to 5% (at 1% intervals) were used. Combining both parameters gave a total of 30 cases, and each case was run with 10 replicates (300 runs in total). This was required because evolutionary computation systems can be sensitive to initial conditions and to the search characteristics defined by these parameters. Model selection during evolution was based on root mean squared error (RMSE), and the best-predicting equation among the 300 equations was chosen by comparing the predicted and observed values of M. aeruginosa.

With the best-performing equation, two types of sensitivity analysis were implemented ['most influencing parameter (MIP)' and 'sensitivity on wide-ranged disturbance (SWD)'], as applied by Jeong et al. (2001a). The model inputs were disturbed by ±1 standard deviation (S.D.) for the sensitivity analyses. According to Zar (1999), ±1 S.D. represents common variation in a population, and ±1.96 S.D. covers about 95% of the total data. Sensitivity analysis with ±1 S.D. can therefore explain the general circumstances of interactions between the algal species and the input variables. The results of the sensitivity analyses were interpreted in comparison with known ecological information. The model was developed by means of the GP shell time-series toolbox (Whigham et al., 2001).
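The experimental design described above (1-day-ahead pairing of inputs and outputs, a grid of crossover and mutation rates with replicates, and RMSE-based selection) can be sketched as follows. This is only a schematic under stated assumptions: the function names and the dictionary layout of a run result are hypothetical, and the GP run itself is not implemented here.

```python
import itertools
import math

# Rates taken from the text: crossover 70-95% in 5% steps, mutation 1-5% in 1% steps.
CROSSOVER_RATES = [r / 100 for r in range(70, 100, 5)]
MUTATION_RATES = [r / 100 for r in range(1, 6)]
REPLICATES = 10


def one_day_ahead_pairs(inputs, target):
    """Pair the input of day t with the target of day t + 1 (1-day forecast)."""
    if len(inputs) != len(target):
        raise ValueError("inputs and target must have the same length")
    return list(zip(inputs[:-1], target[1:]))


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def experiment_plan():
    """Enumerate every (crossover rate, mutation rate, replicate) combination."""
    return [(cx, mu, rep)
            for cx, mu in itertools.product(CROSSOVER_RATES, MUTATION_RATES)
            for rep in range(REPLICATES)]


def select_best(results):
    """Pick the run with the lowest RMSE.

    `results` is a hypothetical list of dicts such as
    {"crossover": 0.75, "mutation": 0.05, "rmse": 0.12}.
    """
    return min(results, key=lambda run: run["rmse"])
```

Enumerating the plan gives 6 × 5 = 30 parameter settings and 300 runs in total, matching the counts stated in the text.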
An MLR model was built on the same modelling problem to compare the predictability of linear modelling with that of GP. This method is usually used to analyze the relationship between several independent variables and a dependent variable (Zar, 1999). The MLR model equation can be written as follows:

$$ Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_j X_{ji} + \varepsilon_i \qquad (1) $$

where Y_i is the dependent variable, X_{1i}, X_{2i}, ... are the independent variables, and ε_i is the error term of the equation. The criterion most commonly used to define the 'best fit' multiple regression equation is that of least squares, which yields the regression equation with the minimum residual sum of squares, i.e.:

$$ \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (2) $$

Regression methods can be applied in various ways (see Renshaw, 1991), to experimental as well as field-surveyed data. Comparing TSOGP with MLR allows the performance of the empirical search algorithm to be evaluated. The same input variables used in GP modelling were adopted for MLR, and time-series prediction was carried out with the resulting equation. The results of both models were compared with observed data on the same time scale.

4. Results

4.1. Limnological aspects

Most limnological data from the lower Nakdong River exhibited distinct inter-annual variability (Table 1). Most physico-chemical parameters, except Secchi depth and DO, were related to the amount of rainfall in a given year. For example, water temperature, turbidity, conductivity, alkalinity, and nutrient concentrations varied with the fluctuation of total annual rainfall. Turbidity was high in both 1994 and 1998, when there was algal proliferation and high rainfall runoff. Plankton communities displayed complex annual variability. Rotifera dominated the zooplankton during the study period, with a maximum in 1997. Blue-greens, including M. aeruginosa, A. flos-aquae, and O. limosa, increased during drought years (1994-1996). Chlorophyll a concentration had its highest annual average in 1994, when the three cyanobacteria species proliferated severely.

Table 1
The limnological characteristics of the lower Nakdong River for five years (1994-1998)

| Division | Parameter | Unit | Mean ± S.D., 5 years | 1994 | 1995 | 1996 | 1997 | 1998 |
|---|---|---|---|---|---|---|---|---|
| Meteorological | Irradiance | MJ m⁻² day⁻¹ | 12.8 ± 6.5* | 14 ± 7 | 14 ± 6 | 12 ± 6 | 13 ± 6 | 12 ± 6 |
| Meteorological | Wind velocity | m s⁻¹ | 3.9 ± 1.4 | 3.9 ± 1.4 | 4.0 ± 1.3 | 3.8 ± 1.2 | 3.9 ± 1.5 | 3.8 ± 1.4 |
| Hydrological | Precipitation | mm day⁻¹ | 974 ± 306 | 765 | 841 | 1007 | 1352 | 1670 |
| Hydrological | Discharge | CMS | 567 ± 714 | 399 ± 79 | 466 ± 358 | 488 ± 480 | 686 ± 825 | 794 ± 1184 |
| Hydrological | Evaporation | mm day⁻¹ | 3 ± 2 | 4 ± 2 | 3 ± 2 | 3 ± 2 | 3 ± 2 | 3 ± 1 |
| Physical | Water temperature | °C | 17 ± 9 | 20 ± 10 | 16 ± 10 | 17 ± 10 | 18 ± 9 | 17 ± 8 |
| Physical | Secchi depth | cm | 74 ± 25 | 72 ± 22 | 75 ± 20 | 74 ± 22 | 74 ± 32 | 74 ± 23 |
| Physical | Turbidity | NTU | 18 ± 54 | 20 ± 64 | 12 ± 35 | 9 ± 9 | 19 ± 38 | 27 ± 91 |
| Chemical | pH | | 8.4 ± 0.8 | 8.7 ± 0.9 | 8.3 ± 0.6 | 8.4 ± 0.7 | 8.5 ± 0.8 | 8.0 ± 0.8 |
| Chemical | DO | mg l⁻¹ | 10.8 ± 4.0 | 9.9 ± 3.8 | 11.4 ± 3.6 | 11.9 ± 3.9 | 10.2 ± 4.5 | 10.5 ± 3.4 |
| Chemical | Conductivity | µS cm⁻¹ | 349 ± 128 | 312 ± 92 | 405 ± 118 | 396 ± 114 | 374 ± 146 | 250 ± 76 |
| Chemical | Alkalinity | mg CaCO3 l⁻¹ | 57 ± 17 | 55 ± 13 | 66 ± 13 | 67 ± 13 | 58 ± 17 | 41 ± 9 |
| Chemical | Nitrate-N | mg l⁻¹ | 2.7 ± 1.0 | 1.8 ± 0.9 | 2.5 ± 1.0 | 2.3 ± 1.0 | 3.3 ± 0.8 | 3.2 ± 0.5 |
| Chemical | Ammonia-N | mg l⁻¹ | 0.6 ± 0.7 | 0.3 ± 0.3 | 0.8 ± 0.8 | 0.7 ± 0.6 | 0.3 ± 0.3 | 0.8 ± 1.0 |
| Chemical | Phosphate-P | µg l⁻¹ | 34.7 ± 25.2 | 33.1 ± 22.1 | 34.3 ± 25.2 | 20.5 ± 15.2 | 32.7 ± 23.0 | 52.8 ± 27.9 |
| Chemical | Silica | mg l⁻¹ | 4.3 ± 3.8 | 3.6 ± 2.3 | 2.6 ± 2.8 | 3.0 ± 2.3 | 4.6 ± 4.2 | 7.5 ± 4.4 |
| Biological | Rotifera | ind. l⁻¹ | 1644 ± 3250 | 1241 ± 2086 | 1285 ± 1764 | 1021 ± 1274 | 3046 ± 5713 | 1304 ± 1747 |
| Biological | Cladocera | ind. l⁻¹ | 91 ± 311 | 25 ± 58 | 201 ± 588 | 71 ± 176 | 79 ± 140 | 30 ± 61 |
| Biological | Copepoda | ind. l⁻¹ | 60 ± 151 | 23 ± 43 | 65 ± 147 | 43 ± 67 | 109 ± 251 | 36 ± 62 |
| Biological | M. aeruginosa | ×10⁶ µm³ ml⁻¹ | 2.84 ± 12.34 | 5.34 ± 18.81 | 1.42 ± 3.08 | 3.66 ± 11.51 | 3.64 ± 15.82 | 0.15 ± 0.39 |
| Biological | A. flos-aquae | ×10⁶ µm³ ml⁻¹ | 0.56 ± 1.41 | 0.54 ± 0.93 | 0.87 ± 1.78 | 0.80 ± 1.95 | 0.59 ± 1.34 | 0.01 ± 0.03 |
| Biological | O. limosa | ×10⁶ µm³ ml⁻¹ | 0.69 ± 1.66 | 1.08 ± 1.19 | 1.11 ± 2.71 | 0.68 ± 1.29 | 0.53 ± 1.67 | 0.07 ± 0.18 |
| Biological | S. hantzschii | ×10⁶ µm³ ml⁻¹ | 15.10 ± 24.14 | 12.97 ± 26.74 | 17.24 ± 29.42 | 20.89 ± 27.11 | 10.22 ± 22.88 | 9.50 ± 11.48 |
| Biological | Chlorophyll a | µg l⁻¹ | 50.2 ± 91.5 | 84.7 ± 178.5 | 65.5 ± 74.7 | 48.5 ± 49.2 | 37.5 ± 80.6 | 28.0 ± 26.4 |

* Mean ± S.D., 5 years' data (n = 263; 52-53 in each year).

4.2. Equation discovery and model performance

Genetic programming successfully achieved equation discovery for the prediction of time-series changes of M. aeruginosa in the river system, with 1-day-ahead data input. The RMSE of the developed model was less than 0.001. Various input parameters were selected during the evolution of equations (Fig. 3), and A. flos-aquae, turbidity, and silica concentration were used frequently. In particular, A. flos-aquae was included in all 300 equations, indicating an underlying relationship with the dependent variable. Nutrient concentrations were selected more often than meteorological and hydrological parameters. Zooplankton was used less, and algal species other than A. flos-aquae were selected relatively infrequently. The best-predicting model equation was obtained with a 75% crossover rate and a 5% mutation rate. Eq. (3) is the simple model:

Microcystis aeruginosa(t+1) = (0.49293 Sec(t) 0.50707 Ana(t)) (0.83014 Eva(t) 0.16986 Turb(t))   (3)

where Sec is Secchi depth, Ana is the biovolume of A. flos-aquae, Eva is evaporation, and Turb is turbidity. Both the variables and their constants were tuned simultaneously during model evolution.

Fig. 3. Variable selection during evolution of GP model. Wind, wind velocity; Rain, rainfall; Dis, discharge; Eva, evaporation; WT, water temperature; Sec, Secchi depth; Turb, turbidity; DO, dissolved oxygen; NO3, nitrate; NH4, ammonia; PO4, phosphate; SiO2, dissolved silica; Rot, Rotifera; Cla, Cladocera; Cop, Copepoda; Ana, A. flos-aquae; Osc, O. limosa; Ste, S. hantzschii.

Compared with the observed time-series changes, the prediction of M. aeruginosa fit quite well (Fig. 4A), with the timing and magnitude of the bloom being well represented.
Although a slight over-estimation occurred during April and June, the highest peak was effectively modelled by the equation. Among the four input variables, A. flos-aquae had the most influence on the time-series changes of M. aeruginosa (Fig. 4B), followed by turbidity and evaporation. While Secchi depth had almost no influence, turbidity (NTU) was highly related to the output calculation. The result of the SWD analysis indicated a linear relationship between input values and output (Fig. 4C). Apart from Secchi depth, the other three variables had positive effects on M. aeruginosa.

MLR produced an equation that included all of the input variables (Eq. (4)). Time-series prediction showed that this model was rather unsatisfactory in accuracy (the regression coefficient r for the test data was 0.08, versus 0.77 for GP), even though the timing of peaks was relatively correct (Fig. 4A). The model also produced negative predicted values.

Microcystis = 47223.339 Rain 0.009 Wind 585821.152 Eva 3.057105 Dis 0.0111 Sec 13054.426 Turb 85020.205 WT 1102600.058 pH 167271.631 DO 0.0285 NO3 2124.290 NH4 0.001 PO4 62798.045 SiO 84.270 Rot 8.766 Cla 0.0002 Cop 2.377 Ana 1.326 Osc 1.585 Ste   (4)

Fig. 4. Result of prediction and sensitivity analysis with the GP model. (A) Time-series prediction; (B) sensitivity of MIP analysis; (C) result of SWD analysis.

5. Discussion

5.1. Limnology of the lower Nakdong River

The lower Nakdong River displayed reservoir-like characteristics, and its nutrient and chlorophyll a concentrations indicated a eutrophic situation according to Wetzel (1983). River eutrophication is largely influenced by flow regulation and nutrient loadings (Webb and Walling, 1992). Joo et al. (1997) suggested that eutrophication in the lower Nakdong River was mainly due to the regulation of water flow. The construction of an estuarine barrage in 1987 had a synergistic effect with nutrient inputs, because water retention time increased on the gentle slope of the river bed (approximately 1:380) (Song et al., 1993; Heo et al., 1995). Intense regulation of flow at the multi-purpose dams and at the estuarine dam at the river mouth has further contributed to the accelerated eutrophication and complex behavior of the river ecosystem (e.g. grazing impacts of zooplankton on phytoplankton causing a 'clear water phase') (Jeong, 2000; Kim et al., 2001; Jeong et al., 2001a).

Blue-green algal proliferation is an unusual phenomenon in flowing waters, and it could be the result of complicated ecological interactions. Blue-greens rarely occur in streams and rivers except in pool-like reaches (Reynolds, 1992). Ha (1999) reported the occurrence of Microcystis blooms in the lower Nakdong River, and a distinct vertical distribution of Microcystis was observed (Ha et al., 2000). Recently this genus has been reported in other rivers, including the Great Ouse (UK), Neuse (USA) and Hawkesbury (Australia) (Paerl, 1987; Marker and Collett, 1997; Rose and Balbi, 1997; Mitrovic et al., 1999). Ha et al. (1999) suggested that intensive water flow regulation is one of the most common causes of cyanobacteria blooms.

5.2. Ecological modelling of algal dynamics

The proliferation of particular algal taxa in river systems is a complicated problem to solve using deductive approaches. The unique characteristic of flow, which distinguishes rivers from other freshwater ecosystems, typically governs patterns of plankton dynamics (Reynolds, 1984, 1992). Although great efforts have been made to model algal blooms in diverse ways (e.g. Kamp-Nielsen, 1978; Reynolds, 1984; Sommer et al., 1986; Kromkamp and Walsby, 1990), they relied mainly on deterministic or heuristic methodologies. Controlled flow may change the normal conditions of river ecology, and this can result in poor or incorrect model performance (Jeong et al., 2001a). Data-driven inductive methods are thus feasible for developing predictive and elucidative models for particular phenomena, because they can incorporate all specific contributions to the dependent output variable.

Genetic programming has been shown to produce good models of time-series plankton dynamics. There are previous examples of applying GP to freshwater algal dynamics (e.g. Whigham and Recknagel, 1999, 2000). Those results came mainly from lakes, where it was demonstrated that time-series prediction of blue-green algae could be achieved successfully. In line with those findings, the present results support the use of evolutionary computation approaches for a river/reservoir ecosystem.

5.3. Model performance

Inductive modelling performed well compared with a general statistical model. In many cases MLR can be used to approximate solutions on time-series datasets. In the lower Nakdong River, however, MLR failed to produce an accurate prediction. Jeong et al. (2001a,b) and Jeong et al. (2002) emphasized, in the course of neural network model development, that the limnological phenomena in the river are too complicated to analyze with traditional methods. Furthermore, the GP model uses fewer variables than the MLR model, which may make GP easier to apply to ecosystems.

Time-series prediction with an n-day-ahead vector was successfully achieved in this study. All input data were fed to the algorithm, and its predictability was good. This type of prediction was used by Recknagel (1997) and Recknagel and Wilson (2000) for neural network models. Most water quality management uses mechanistic models, and this type of generalized model is able to approximate environmental changes.
However, their capacity is partially limited by factors such as geological and geographic differences, nonlinearity and uncertainty, and especially the absence of biological information. Machine learning techniques, including GP with n-day-ahead prediction, can encourage the development and use of inductive models. These approaches fulfill the general requirements of a management strategy, such as cost-benefit efficiency and accurate prediction.

5.4. Evolutionary computation for ecological modelling

For several decades, interdisciplinary research between mathematics and ecology has encouraged analytical and simulation approaches to ecological dynamics. The considerable efforts made to develop equation models of algal dynamics can be consulted for this purpose (see Dillon and Rigler, 1974; Bierman, 1976). The deductive approach usually produced reductionistic models based on existing theories and knowledge, which enable users to simulate rather than to predict behavior (Recknagel, 1997; see Dzeroski et al., 1999). Reductionism may also cause the omission of information that could be important for explaining specific features. Costanza and Sklar (1985) emphasized that greater articulation (the number of variables used in a model) does not necessarily improve model effectiveness; on the contrary, too many variables caused deterioration of model performance. Ecological and environmental models are themselves simplified representations of real nature, so the right complexity and an adequate selection of components and processes are most important for the problem in focus (Jørgensen, 1997). The empirical TSOGP model in this study might satisfy this point of view.

The characteristics of TSOGP may make it a reasonable solver for ecological modelling. Genetic programming is itself a derivative of the genetic algorithm (GA), so it is based on a global search for the best solution. This has the advantage of helping to avoid entrapment in local minima (Banzhaf et al., 1998).
TSOGP can also select the articulations (variables) as well as their parameters, which is so-called 'automated model construction' according to the intra-relationships in the data. Equation discovery has been applied to ecological modelling in various directions, and systems of discovery algorithms have been developed continuously. Dzeroski et al. (1999) summarized equation discovery algorithms and emphasized their goals: model reconstruction and information search. The equation search by TSOGP in this study could satisfy the criteria suggested by Dzeroski et al. (1999), and had the further advantages indicated above. The good performance of the equation model developed by TSOGP in this study might be explained from this perspective.

Compared with a neural network, GP used a smaller number of articulations in the developed model. In the lower Nakdong River, Jeong et al. (2001a) used 16 variables to predict time-series changes of algal biomass with a time-delayed recurrent neural network. In this study, although 19 variables were selected as training data, only four variables were required to predict cyanobacteria biovolume with high accuracy. In general, neural network models are good tools for classification and prediction of ecological data (Chon et al., 1996; Recknagel, 1997). However, compared with neural networks, inductive equation models have advantages in terms of the types of expressions that can be explored (equations, rules, etc.) and the fact that the results can be interpreted as proper predictive equations.

Combinations of EC (including TSOGP) with other information systems are now spreading encouragingly, and the capacity of ecological modelling is expanding. For instance, ANN architecture and its weights can be determined by EC, giving so-called 'evolving neural networks (ENNs)'.
Also Medsker (1996) summarized various possibilities of computational works and those can be suitably adapted to ecological researches. If interdisciplinary efforts and accurate ecological data acquisition are available, more information we have not seen can be found and, feasible modelling technique for specific and complicated features must be possible as well. 6. Conclusion Mining dynamic and complicated ecological data require a suitable methodology to obtain a satisfactory result. The increasing amount of data that has become available today for aquatic ecosystems can support the basic requirements of machine learning techniques. Models derived from such techniques can address the changing environments that occur due to human intervention. Machine learning approaches such as GP are good tools for this purpose, and suitable ecological models derived from these approaches, based on accumulated datasets, can give valuable insights into ecosystem function and behavior. Results of this study suggest that the evolutionary computation was suitable for modelling the dynamics of bloom-forming algal species in a river/reservoir transitional system. Acknowledgements The authors are grateful to Dr. H.W. Kim of Sunchon National University, and Dr. K. Ha of Pusan National University (PNU) for providing plankton community data. We also thank Mr. S.B. Park, Mr. J.S. Kim, and Mr. J.G. Kim of PNU for assistance in the field. We also are indebted to Dr. Friedrich Recknagel of the University of Adelaide for paying warm attention during the preparation of this article. This study was financially supported by the Institute of Environmental Technology and Industry (IETI) (project no. 01-10-99-01-A-1). This is a contribution no. 28 of the Nakdong River Ecosystem Study in Limnology Lab., PNU. References Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D., 1998. Genetic Programming. On the Automatic Evolution of Computer Programs and its Applications. 
Morgan Kaufmann Publishers, California, p. 470.
Bierman, V.J., 1976. Mathematical model of the selective enhancement of blue-green algae by nutrient enrichment. In: Canale, R.P. (Ed.), Modeling Biochemical Processes in Aquatic Ecosystems. Ann Arbor Science Publishers, Ann Arbor, MI, pp. 1-32.
Cassie, V., 1989. A Contribution to the Study of New Zealand Diatoms. Cramer, Berlin, p. 266.
Chon, T.S., Park, Y.S., Moon, K.H., Cha, E.Y., 1996. Patternizing communities by using an artificial neural network. Ecol. Model. 90, 69-78.
Costanza, R., Sklar, F.H., 1985. Articulation, accuracy and effectiveness of mathematical models: a review of freshwater wetlands applications. Ecol. Model. 27, 45-69.
Dillon, P.J., Rigler, F.H., 1974. The phosphorus-chlorophyll relationship in lakes. Limnol. Oceanogr. 19, 767-773.
Dzeroski, S., Todorovski, L., Bratko, I., Kompare, B., Krizman, V., 1999. Equation discovery with ecological applications. In: Fielding, A.H. (Ed.), Machine Learning Methods for Ecological Applications. Kluwer Academic Publishers, Massachusetts, pp. 185-208.
Einsle, U., 1993. Crustacea, Copepoda, Calanoida and Cyclopoida. Süsswasserfauna von Mitteleuropa, Part 4-1, vol. 8. Fischer, Stuttgart, p. 208.
Fielding, A.H., 1999. An introduction to machine learning methods. In: Fielding, A.H. (Ed.), Machine Learning Methods for Ecological Applications. Kluwer Academic Publishers, Massachusetts, pp. 1-35.
Foged, E., 1978. Diatoms in Eastern Australia. Cramer, Berlin, p. 243.
Fogel, D., 1998. Evolutionary Computation: The Fossil Record. IEEE Press, Piscataway, NJ, p. 641.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, New York, p. 412.
Goldberg, D.E., Holland, J.H., 1988. Genetic algorithms and machine learning. Machine Learn. 3 (2-3), 95-99.
Ha, K., 1999.
Phytoplankton community dynamics and Microcystis bloom development in a hypertrophic river (Nakdong River, Korea). PhD dissertation. Pusan National University, Busan, p. 140.
Ha, K., Cho, E.A., Kim, H.W., Joo, G.J., 1999. Microcystis bloom formation in the lower Nakdong River, South Korea: importance of hydrodynamics and nutrient loading. Mar. Freshwater Res. 50, 89-94.
Ha, K., Kim, H.W., Jeong, K.S., Joo, G.J., 2000. Vertical distribution of Microcystis population in the regulated Nakdong River. Kor. J. Limnol. 1, 225-230.
Heo, W.M., Kim, B.C., Hwang, G.S., Choi, K.S., Park, W.K., 1995. The distributions of phosphorus, nitrogen, and chlorophyll a concentration in the Nakdong River. Kor. J. Limnol. 28, 175-181.
Holland, J.H., 1992. Adaptation in Natural and Artificial Systems, 2nd ed. MIT Press, New York, p. 211.
Jeong, K.S., 2000. Statistical evaluation and application of artificial neural networks on water quality of the lower Nakdong River. MSc thesis, Pusan National University, Busan, p. 74.
Jeong, K.S., Joo, G.J., Kim, H.W., Ha, K., Recknagel, F., 2001a. Prediction and elucidation of algal dynamics in the Nakdong River (Korea) by means of a recurrent artificial neural network. Ecol. Model. 146, 115-129.
Jeong, K.S., Jang, M.H., Park, S.B., Cho, G.I., Joo, G.J., 2001b. Neuro-genetic learning to the algal dynamics: a preliminary experiment for the new technique to the ecological modeling. Proceeding of the Korean Environmental Science Society, pp. 234-235.
Jeong, K.S., Recknagel, F.S., Joo, G.J., 2002. Prediction and elucidation of population dynamics of a blue-green alga (Microcystis aeruginosa) and diatom (Stephanodiscus hantzchii) in the Nakdong River-Reservoir System (South Korea) by a recurrent artificial neural network. In: Recknagel, F. (Ed.), Ecological Informatics. Springer-Verlag (in press).
Joo, G.J., Kim, H.W., Ha, K., Kim, J.K., 1997. Long-term trend of the eutrophication of the lower Nakdong River. Kor. J. Limnol. (Suppl.)
30, 472-480.
Jørgensen, S.E., 1997. Integration of Ecosystem Theories: a Pattern, 2nd ed. Kluwer Academic Publishers, Dordrecht, p. 388.
Kamp-Nielsen, L., 1978. Modelling the vertical gradients in sedimentary phosphorus fractions. Verh. Int. Verein. Limnol. 20, 720-727.
Kim, H.W., Ha, K., Joo, G.J., 1998. Eutrophication of the lower Nakdong River after the construction of an estuarine dam in 1987. Int. Rev. Hydrobiol. 83, 65-72.
Kim, H.W., Joo, G.J., Walz, N., 2001. Zooplankton dynamics in the hyper-eutrophic Nakdong River system (Korea) regulated by an estuary dam and side channels. Int. Rev. Hydrobiol. 86, 127-143.
Koste, W., 1978. Rotatoria. Die Rädertiere Mitteleuropas. Ein Bestimmungswerk, begründet von Max Voigt, 2nd ed. Borntraeger, Stuttgart, vol. 1, Textband, 673 pp.; vol. 2, Tafelband, p. 234.
Koza, J.R., 1992. Genetic Programming. On the Programming of Computers by Means of Natural Selection. The MIT Press, New York, p. 819.
Kromkamp, J., Walsby, A.E., 1990. A computer model of buoyancy and vertical migration in cyanobacteria. J. Plankton Res. 12, 161-183.
Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S., 1996. Application of neural networks to modelling nonlinear relationships in ecology. Ecol. Model. 90, 39-52.
Marker, A.F.H., Collett, G.D., 1997. Spatial and temporal characteristics of algae in the River Great Ouse. I. Phytoplankton. Regulated Rivers: Res. Manag. 13, 219-233.
Medsker, L.R., 1996. Microcomputer applications of hybrid intelligent systems. J. Netw. Comput. Appl. 19, 213-234.
Mitrovic, S.M., Hawkins, P.R., Bowling, L.C., Buckney, R.T., Cheng, D.H.M., 1999. Low nitrate concentrations in a tidally mixed river allow replacement of green algae by the cyanobacteria Microcystis. Verh. Int. Verein. Limnol. 27, 924-929.
Moss, B., 1998. Ecology of Fresh Waters. Man and Medium, Past to Future, 3rd ed. Blackwell Science, Oxford, p. 557.
Paerl, H.W., 1987.
Dynamics of blue-green algal (Microcystis aeruginosa) blooms in the lower Neuse River, North Carolina: causative factors and potential controls. Water Resources Research Institute of the University of North Carolina, UNC-WRRI-87-229.
Recknagel, F., 1997. ANNA - artificial neural network model for predicting species abundance and succession of blue-green algae. Hydrobiologia 349, 47-57.
Recknagel, F., Wilson, H., 2000. Elucidation and prediction of aquatic ecosystems by artificial neuronal networks. In: Lek, S., Guégan, J.F. (Eds.), Artificial Neuronal Networks. Application to Ecology and Evolution. Springer-Verlag, Berlin, pp. 143-155.
Renshaw, E., 1991. Modelling Biological Populations in Space and Time. Cambridge University Press, New York, p. 403.
Reynolds, C.S., 1984. The Ecology of Freshwater Phytoplankton. Cambridge University Press, New York, p. 384.
Reynolds, C.S., 1992. Algae. In: Calow, P., Petts, G.E. (Eds.), The River Handbook. Hydrological and Ecological Principles, vol. I. Blackwell Scientific Publication, Oxford, pp. 195-215.
Rose, M., Balbi, D., 1997. Rivers Nene and Great Ouse eutrophication studies: final report. Environment Agency, Peterborough, UK, 77 pp.
Round, F.E., Crawford, R.M., Mann, D.G., 1990. The Diatoms. Cambridge University Press, New York, p. 747.
Smirnov, N.N., Timms, B.V., 1983. A revision of the Australian Cladocera (Crustacea). Rec. Aust. Museum Suppl. 1, 1-132.
Sommer, U., Gliwicz, Z.M., Lampert, W., Duncan, A., 1986. The PEG-model of seasonal succession of planktonic events in fresh waters. Arch. Hydrobiol. 106, 433-471.
Song, K.O., Park, H.Y., Park, C.G., 1993. Water quality modeling in the Nakdong River (I) - a study on the characteristics of nutrients distribution. J. Kor. Soc. Water Qual. 9, 41-53.
Straskraba, M., 1994. Ecotechnological models for reservoir water quality management. Ecol. Model. 74, 1-38.
Utermöhl, H., 1958.
Zur Vervollkommnung der quantitativen Phytoplankton-Methodik. Mitt. Verh. Int. Verein. Limnol. 9, 1-38.
Webb, B.W., Walling, D.E., 1992. Water quality. II. Chemical characteristics. In: Calow, P., Petts, G.E. (Eds.), The River Handbook, vol. I. Blackwell Science Publications, Oxford, pp. 73-100.
Wetzel, R.G., 1983. Limnology, 2nd ed. Saunders College Publishing, New York, p. 767.
Wetzel, R.G., Likens, G.E., 1991. Limnological Analyses, 2nd ed. Springer-Verlag, New York, p. 391.
Whigham, P.A., 1995. Inductive Bias and Genetic Programming. Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA'95), pp. 461-467.
Whigham, P.A., 2000. Induction of a marsupial density model using genetic programming and spatial relationships. Ecol. Model. 131, 299-317.
Whigham, P.A., Recknagel, F., 1999. Predictive modelling of plankton dynamics in freshwater lakes using genetic programming. In: Oxley, L., Scrimgeour, F. (Eds.), International Congress on Modelling and Simulation, vol. 3. The Modelling and Simulation Society of Australia and New Zealand, Hamilton, New Zealand, pp. 679-685.
Whigham, P.A., Recknagel, F., 2000. Evolving difference equations to model freshwater phytoplankton. Proceeding of the 2000 Congress on Evolutionary Computation, vol. 2, pp. 967-973.
Whigham, P.A., Keukelaar, J., 2001. Evolving structure-optimizing content. Proceedings of the Congress on Evolutionary Computation 2001, pp. 1228-1235.
Whigham, P., Schallenberg, M., Keukelaar, J., Box, K., McKitrick, P., 2001. Time-Series Toolbox (Ver. 1.4.2).
Yao, X., 1999. Evolutionary Computation Theory and Applications. World Scientific Publishing Co, Singapore, p. 360.
Zar, J.H., 1999. Biostatistical Analysis, 4th ed. Prentice-Hall, New Jersey, p. 663.