Article
pubs.acs.org/est
Comparative Evaluation of Statistical and Mechanistic Models of
Escherichia coli at Beaches in Southern Lake Michigan
Ammar Safaie,† Aaron Wendzel,† Zhongfu Ge,‡ Meredith B. Nevers,‡ Richard L. Whitman,‡
Steven R. Corsi,§ and Mantha S. Phanikumar*,†
† Department of Civil and Environmental Engineering, Michigan State University, 1449 Engineering Research Court, East Lansing, Michigan 48824, United States
‡ U.S. Geological Survey, Great Lakes Science Center, Lake Michigan Ecological Research Station, 1574 N. County Road 300 E., Chesterton, Indiana 46304, United States
§ U.S. Geological Survey, Wisconsin Water Science Center, 8505 Research Way, Middleton, Wisconsin 53562, United States
Supporting Information
ABSTRACT: Statistical and mechanistic models are popular
tools for predicting the levels of indicator bacteria at
recreational beaches. Researchers tend to use one class of
model or the other, and it is difficult to generalize statements
about their relative performance due to differences in how the
models are developed, tested, and used. We describe a
cooperative modeling approach for freshwater beaches
impacted by point sources in which insights derived from
mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models, which were further improved through the use of probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and unstructured grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes, including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions, resulting in more accurate estimates of beach closures.
Received: November 1, 2015
Revised: January 26, 2016
Accepted: January 29, 2016
Published: January 29, 2016
© 2016 American Chemical Society
Environmental Science & Technology
DOI: 10.1021/acs.est.5b05378
Environ. Sci. Technol. 2016, 50, 2442−2449
1. INTRODUCTION
Modeling in nearshore coastal waters can provide insights for source tracking,1,2 dispersion and diffusion,3−7 and persistence of environmental contaminants.8 By quantifying the processes responsible for water quality changes in time and space, risks to wildlife or human health can be estimated. Models for nearshore waters may include mechanistic models or empirical statistical models. Mechanistic models, which are based on conservation principles, have been widely used to track contamination, and in recent years more advanced mechanistic models have included biological processes, such as bacterial inactivation, to account for a broader range of variation in water quality. Statistical models, in contrast, are data-based and rely on the relationships between measurements of hydrometeorology and known concentrations of the target contaminants.9 Both types of models are increasingly being used to predict fecal indicator organisms (FIOs, e.g., bacteria and viruses) responsible for degrading beach water quality, resulting in swimming advisories and closed beaches throughout coastal areas. Rarely, however, have the two modeling approaches been subjected to a careful comparison in a given location based on the same data sets.
In the context of bathing water quality and beach closures, significant progress has been made in modeling the levels of FIOs such as E. coli using mechanistic models at both marine10−17 and freshwater beaches.4,5,18−20 Mechanistic models have been used in these environments to estimate the extent to which a river or creek point source of FIO contamination affects nearby coastal beaches. They have also been used to calculate the effects of coastline breakwaters on the containment and concentration of bacterial contaminants.20 While these models provide detailed information on nearshore dynamics, typically resulting in a three-dimensional visualization of the area, they also require a significant investment in model development, testing, and application. Further, their computational demands make them useful for understanding the system but less practical for daily water quality estimation. Statistical models have also been widely used in marine and freshwater systems, increasingly in applications used directly by beach managers to estimate water quality at their swimming beaches.21,22 In these applications, statistical models are used
in place of routine monitoring for culturable FIO. Use of these
models has been encouraged by the US EPA because they
provide results in a fraction of the time associated with
culturing analyses.23,24 While statistical models can be as simple
as a relationship between rainfall and FIO,25 they can also
incorporate high-intensity automated data collection at multiple
beaches.26 Most statistical models in use fall somewhere between these two examples, but their development requires multiple years of data collection, statistical interpretation, and validation and refinement. Statistical
models are a practical solution for beach management, but their
specificity and sensitivity are hard to improve due to the highly
variable nature of FIO in a variety of beach environments and
inadequate predictors.27,28
Few studies have used mechanistic and statistical models together to assess their usefulness and to improve the accuracy of both modeling approaches. Because both types of models use similar performance metrics, such as R² or the root-mean-square error (RMSE), comparing them could help determine the appropriate model for a given situation. Further, a comparison of models at an individual
location would provide insight into the inner workings of each
model. Froelich et al.29 used mechanistic and statistical models
for bacteria of the genus Vibrio in the Neuse River Estuary in
North Carolina. The two models were compared using data
collected at different stations in the estuary, and the
mechanistic model generally outperformed their statistical
model. Feng et al.17 used a two-dimensional, depth-averaged
numerical mass balance model based on advection, dispersion,
and reactions as well as a statistical regression model to predict
enterococci levels at Hobie Beach, a marine beach on the Atlantic coast of South Florida. Both models correctly predicted approximately 70% of advisories based on data collected at knee depth, while 90% of advisories were correctly predicted at waist depth. The authors recommended the mass balance model for more informed management decisions because of its ability to describe the spatiotemporal evolution of enterococci levels.
In this study, we compare mechanistic and statistical models using data collected over a three-month period at three southern Lake Michigan beaches to assess the accuracy and predictive capabilities of the two approaches. Information derived from the mechanistic models was used to improve the statistical models and vice versa. This type of cooperative modeling can improve predictions of water quality, resulting in more accurate estimates of beach conditions in the Great Lakes and along marine coasts. Further, the findings can inform remediation of sources or conditions that lead to high concentrations of FIO at these and similar beaches.30
2. MATERIALS AND METHODS
2.1. Study Site. The study area is located near the Portage-Burns Waterway (Burns Ditch) in southern Lake Michigan (USGS station number 04095090). The three Ogden Dunes beaches OD1, OD2, and OD3, the focus of the current research, are impacted by the Burns Ditch outfall (BD in Figure 1). To measure nearshore currents for testing the mechanistic hydrodynamic model, bottom-mounted, upward-looking Acoustic Doppler Current Profilers (ADCPs) were deployed in the study region31 from early June through late August 2008. Water samples were collected in knee-deep waters at the beaches. The three beaches OD1, OD2, and OD3 are about 1500, 800, and 500 m west of the Burns Ditch outfall, respectively.
Figure 1. Map of the study area showing the three Ogden Dunes beaches (OD1, OD2, and OD3) and the source at Burns Ditch (BD). Locations of the ADCPs deployed during summer 2008 are also marked.
Burns Ditch is the outfall of the Little Calumet River, which drains a mixed land-use watershed that includes industrial, agricultural, and residential areas.9 Periodically, combined sewer overflows (CSOs) are discharged into Burns Ditch from several municipal wastewater treatment plants. During heavy rain events that result in a CSO, concentrations of E. coli as high as 10 000 CFU/100 mL have been recorded.32 E. coli concentrations fluctuate widely, however, with concentrations often below 100 CFU/100 mL during dry years. The geometric means of E. coli at BD, OD1, OD2, and OD3 during the study period were 222, 12, 19, and 21 CFU/100 mL, respectively.
Water samples were usually collected twice daily (semidiurnally) in knee-deep water around 7:00 AM and 2:00 PM from early June to late August 2008 and were analyzed within 4 h of collection for E. coli using the IDEXX Colilert-18 reagent and the IDEXX Quanti-Tray 2000 method (Standard Methods SM9223B, American Public Health Association). Additional water quality variables monitored in this study include turbidity, specific conductance or electrical conductivity (Econ), water temperature, and outflow of Burns Ditch. E. coli concentrations, turbidity, and the additional water quality variables were measured twice daily on all days except weekends. In addition, available meteorological observations were obtained from the National Climatic Data Center (NCDC) and National Data Buoy Center (NDBC) weather stations surrounding the lake. Additional details of the site sampling are available in Thupaki et al.5
2.2. Mechanistic Model. A coupled three-dimensional hydrodynamic and water quality model was used to simulate the temporal and spatial distribution of E. coli in Lake Michigan. The model was based on the unstructured-grid Finite Volume Community Ocean Model (FVCOM);33 details can be found in Chen et al.33,34 The E. coli fate and transport equation is as follows:
$$\frac{\partial C}{\partial t} + u\frac{\partial C}{\partial x} + v\frac{\partial C}{\partial y} + w\frac{\partial C}{\partial z} = \frac{\partial}{\partial x}\left(K_H\frac{\partial C}{\partial x}\right) + \frac{\partial}{\partial y}\left(K_H\frac{\partial C}{\partial y}\right) + \frac{\partial}{\partial z}\left(K_V\frac{\partial C}{\partial z}\right) + S \qquad (1)$$
$$S = -\left(\frac{\partial (f_p v_s C)}{\partial z} + k_I I_0 e^{-k_e z} C + k_d C\right)\theta^{T-20} \qquad (2)$$
where C denotes the concentration of E. coli (CFU/100 mL), and (u, v, w) are the x, y, and z components of velocity (m/d). KH and KV are the horizontal and vertical mixing coefficients (m²/d), respectively. S denotes a sink term for E. coli. fp is the fraction of E. coli attached to particles, vs is the settling velocity (m/d), kI is the inactivation rate of E. coli due to sunlight (m²/(W·d)), I0 denotes short-wave radiation at the water surface (W/m²), ke is the sunlight extinction coefficient (m−1), kd is the base mortality rate (d−1), and the effect of temperature (T) on the loss rate is modeled by the term θ^(T−20).
The inactivation formulation (eq 2) is similar to the one described in Liu et al.,18 who used a vertically integrated two-dimensional model. This formulation was modified to account for 3D geometry in the Princeton Ocean Model,4 and the same formulation was later adapted to the FVCOM unstructured-grid framework.35,36 This inactivation formulation and transport model are used in this work with several major changes to further improve model performance, including (a) the use of observed Econ as a long-term tracer to improve conservative transport simulations, (b) the use of statistical distributions to generate high-resolution time series of E. coli at the source, (c) the use of LIDAR data for nearshore bathymetry, and (d) a more accurate interpolation of meteorological data using a natural neighbor method. All of these improvements were prompted by an initial comparison with results from our statistical modeling, and details are described below.
Econ was used as a conservative tracer37 to evaluate hydrodynamic and mixing parameters by comparing the simulated and observed values at the beaches. The discharge of Burns Ditch, as a point source of E. coli and Econ, was added at a node located at the USGS Burns Ditch station (BD in Figure 1). The background concentrations of Econ and E. coli were set to 286 μS/cm (lake background value) and zero, respectively.
The unstructured mesh used in the mechanistic model had 12 684 nodes and 23 602 triangular elements in the horizontal direction and 20 vertical layers. The horizontal resolution of the unstructured triangular mesh ranged from 40 m near the BD outfall to 2−5 km in the center of the lake, which provided a good representation of complex nearshore geometry and features, especially near the Ogden Dunes beaches (Figure S1 in the Supporting Information, SI). Six arc-second bathymetry data were obtained from the NOAA National Geophysical Data Center (NGDC) and interpolated to the unstructured mesh using the natural neighbor method. Along the Indiana coast, where finer resolution was needed, two-meter resolution bathymetric data from NOAA based on a 2008 LIDAR data set were used. Hourly meteorological observations, including wind speed and direction, air temperature, cloud cover, dew point, and relative humidity, were interpolated over the computational grid using a smoothed natural neighbor algorithm to calculate wind and heat flux fields over the lake surface.38 For the heat flux calculations, long-wave radiation was calculated using a model based on air temperature and cloud cover,39 and short-wave solar radiation was calculated using the clear-sky value and the measured cloud cover.40
2.2.1. Electrical Conductivity Modeling. To identify mixing parameters that best describe transport in the nearshore region, conservative solute transport modeling is an important first step. Dye tracer and drifter studies4,40 provide useful data; however, they are expensive and time-consuming, and the data collected tend to be limited (typically a few days). Long time series (e.g., over the entire summer season) are most helpful; therefore, we explored the possibility of using natural tracers. A requirement for successful modeling is a clear gradient or contrast between the background lake water and the tracer values at the source (river mouth at BD). This requirement was satisfied by Econ, an easily measured water quality variable used here as a tracer with the following caveats. Major ions including Ca, Cl, F, K, Na, NO3, Mg, PO4, and SO4 affect Econ values in lake water, leading to many sources and sinks in a natural environment; therefore, Econ is not conservative in general. However, because the three Ogden Dunes beaches are close to the Burns Waterway outfall (∼1 km), no sources or sinks were considered significant. Econ measurements are also known to depend on water temperature;41 however, the simulated temperature in the nearshore region was relatively constant across the three sample locations, indicating that an Econ-temperature relation was not needed. Despite all the factors known to influence Econ values in lake water, Schimmelpfennig et al.37 concluded that Econ can be used as a suitable tracer in Lake Tegel in Germany.
2.2.2. Hourly E. coli at the BD Outfall. In the mechanistic model, observed river discharge and E. coli concentrations at Burns Ditch were used as inputs to the modeling domain. Because E. coli was measured only twice daily, hourly river discharge information was used to approximate the hourly distribution of E. coli at unsampled times using statistical techniques. To do this, logistic distributions were fitted to the empirical cumulative distributions of river discharge and observed E. coli to determine the parameters of the distribution of each variable. The cumulative distribution function (CDF) of the logistic distribution is given by the following:
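$$F(z) = \frac{1}{1+e^{-z}} = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{x-\mu}{2s}\right) = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{\pi(x-\mu)}{2\sqrt{3}\,\sigma}\right) \qquad (3)$$
$$z = \frac{x-\mu}{s}, \quad -\infty < x < \infty, \quad -\infty < \mu < \infty, \quad s > 0$$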
where μ is the mean and s is a scale parameter related to the standard deviation σ by s = √3σ/π. On the basis of E. coli data from a two-day intensive hourly sampling campaign, we found that the hourly river discharge and hourly E. coli at BD had the same empirical CDFs. In other words, the magnitudes of flow and E. coli concentrations in the river had the same frequency of occurrence at hourly resolution. As described in the SI, this technique produced better results than a model driven by linearly interpolated hourly E. coli data.
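The distribution-matching step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: we read "same empirical CDFs" as quantile matching, so each hourly discharge value is mapped to the E. coli value with the same non-exceedance probability; we fit the logistic parameters by the method of moments (μ = mean, s = √3σ/π) rather than by fitting the empirical CDF directly; and we fit E. coli in log10 space so that back-transformed concentrations stay positive. The helper names and sample values are hypothetical.

```python
import math
import statistics


def fit_logistic(values):
    """Method-of-moments logistic fit: mu = sample mean, s = sqrt(3) * sigma / pi.

    Assumption: a simpler stand-in for fitting the logistic CDF to the empirical CDF.
    """
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return mu, math.sqrt(3.0) * sigma / math.pi


def logistic_cdf(x, mu, s):
    """F = 1 / (1 + exp(-z)) with z = (x - mu) / s, written to avoid overflow."""
    z = (x - mu) / s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_ppf(p, mu, s, eps=1e-12):
    """Inverse CDF (quantile function); p is clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return mu + s * math.log(p / (1.0 - p))


def hourly_ecoli(hourly_q, sampled_ecoli):
    """Hypothetical helper: map each hourly discharge to the E. coli value
    (CFU/100 mL) whose fitted CDF value matches the discharge CDF value."""
    q_mu, q_s = fit_logistic(hourly_q)
    # Fitting in log10 space is our assumption, not stated in the text
    c_mu, c_s = fit_logistic([math.log10(c) for c in sampled_ecoli])
    return [10 ** logistic_ppf(logistic_cdf(q, q_mu, q_s), c_mu, c_s) for q in hourly_q]


# Hypothetical inputs: hourly discharge (m3/s) and twice-daily E. coli samples
q = [8.0, 9.5, 12.0, 20.0, 35.0, 28.0, 15.0, 10.0]
c = [90.0, 150.0, 400.0, 1200.0]
estimates = hourly_ecoli(q, c)
```

Because the mapping is monotone, the largest hourly discharge receives the largest estimated concentration; this mirrors the finding from the two-day intensive sampling that flow and E. coli shared the same empirical CDFs.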
Figure 2. Comparison of observed (symbols) and simulated (color lines) electrical conductivity at the three beaches.
2.3. Statistical Model. In a separate project, we developed multiple linear regression (MLR) models with and without interaction effects based on data collected in the morning. The data collected in the afternoon were excluded because they are correlated with the morning data of the same day and would violate the requirement of independent sample points in MLR models. Several explanatory variables were considered for inclusion in the model (SI). Parsimonious models were identified by backward elimination. The following parsimonious model with only four explanatory variables was finally obtained:
$$E[\log_{10} C] = 1.094 - 0.092\ln(Q) + 0.179\ln(H_s) - 1.595\,I + 0.393\ln(\tau) \qquad (4)$$
with an adjusted R² = 0.468, where E[·] denotes expectation, C is the E. coli concentration, τ is the turbidity at the beaches, Q is the river discharge (15 min average), Hs is the significant wave height, and I is the solar irradiance. Equation 4 was once believed to be the best model we could obtain following a standard procedure for the development of empirical models.
To further improve the model, insights obtained from our mechanistic modeling were used. Mechanistic modeling indicated that E. coli concentrations at the individual beaches are strongly dependent on the dynamics of the plume originating from BD, although the correlation with water quality variables such as turbidity or conductivity at the same location was often weak. This observation prompted us to look for a relation for the source-normalized E. coli concentration at the beaches as a function of normalized turbidity. This resulted in the following regression equation for all three beaches:
$$E\left[\frac{C}{C_0}\right] = 0.2\left(\frac{\tau}{\tau_0}\right)^{0.98}, \quad R^2 = 0.82 \qquad (5)$$
or approximately:
$$E\left[\frac{C}{C_0}\right] = 0.2\,\frac{\tau}{\tau_0} \qquad (6)$$
where C and τ are the E. coli concentration and turbidity, respectively, at the beaches, and C0 and τ0 are their corresponding values at BD. Examining the source characteristics at BD alone, we obtained the following relation:
$$E[\ln C_0] = 1.702 + 1.488\ln(\tau_0), \quad R^2 = 0.521 \qquad (7)$$
Or on the original scale, we have approximately:
$$E[C_0] = 5.485\,\tau_0^{3/2} \qquad (8)$$
The 3/2 power in (eq 8) indicates that E. coli concentrations at the source (BD) are sensitive to changes in turbidity. Similar relations between C and τ for the individual beaches showed that both the exponent of τ and the R² values decrease. Combining the source characteristic relation (eq 7) with (eq 5) for all beaches, we get the following:
$$E[C] = 1.09\,\tau\,\tau_0^{0.488} \approx \tau\sqrt{\tau_0}, \quad R^2 = 0.343 \qquad (9)$$
After the significance of the term τ(τ0)^(1/2) had been recognized, a better model was obtained when turbidity was transformed to natural log space:
$$E[\log_{10} C] = 0.537 + 0.172\ln(\tau)\ln(\tau_0), \quad R^2 = 0.749\ (N = 129) \qquad (10)$$
The above relation indicates that a simple and highly parsimonious model based on an interaction term between turbidity at the source and at the individual beaches has high predictive ability. The model is also easy to apply, as it requires only turbidity measurements from morning samples. Model coefficients were found to be significant based on t tests and an F-test (p < 0.001). The model in (eq 10) was based on morning data, and afternoon samples were used to evaluate model performance using the following metrics:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2}, \qquad R^2 = \left(\frac{\sum_{i=1}^{n}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{n}(O_i-\bar{O})^2\,\sum_{i=1}^{n}(P_i-\bar{P})^2}}\right)^2$$
$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{n}(O_i - P_i)\times 100}{\sum_{i=1}^{n} O_i}, \qquad F_n = \frac{\|O_i, P_i\|}{\|O_i, 0\|}, \quad \text{where } \|O_i, P_i\| = \left(\frac{1}{n}\sum_{i=1}^{n}|O_i - P_i|^2\right)^{1/2} \qquad (11)$$
Here Oi and Pi denote observed and predicted values of a variable, respectively. For E. coli, all metrics are based on the log10-transformed values. The R² and RMSE are well-known metrics, while PBIAS measures the tendency of the simulated data to be higher or lower than the observations;42 with the definition in eq 11, positive values indicate underprediction. The Fourier norm provides an indication of the variance in the observed data that is not captured by the model;5 a Fourier norm of zero indicates perfect agreement between data and model results. Two other metrics (described in the SI), the Nash−Sutcliffe efficiency (NSE) and RSR, a standardized version of the RMSE, are also used to compare the performance of the two models.
Figure 3. Comparison of observed (symbols) and simulated (color lines) E. coli concentrations at the three beaches.
Figure 4. Comparison of statistical (diamonds) and mechanistic (box plots) models with observations for the three beaches. Data and models for the morning samples are shown using blue symbols, while red symbols denote afternoon samples. In the box plots, the median is shown using a symbol (⊙) and outliers are denoted using the plus (+) symbol.
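The metrics in eq 11, together with the two SI metrics, can be computed as in the minimal Python sketch below. Assumptions: NSE and RSR use the standard definitions (NSE = 1 − Σ(Oi − Pi)²/Σ(Oi − Ō)², RSR = RMSE divided by the standard deviation of the observations), which we take to match the SI; ||Oi, 0|| is the root mean square of the observations, following eq 11. Inputs would be log10-transformed E. coli values; the arrays shown are hypothetical.

```python
import math


def metrics(obs, pred):
    """Goodness-of-fit metrics for paired observed/predicted values.

    Intended inputs are log10-transformed E. coli concentrations.
    """
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))   # sum of squared errors
    sst = sum((o - o_bar) ** 2 for o in obs)             # spread of observations
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    spp = sum((p - p_bar) ** 2 for p in pred)            # spread of predictions
    rmse = math.sqrt(sse / n)
    return {
        "RMSE": rmse,
        "R2": cov ** 2 / (sst * spp),                      # squared Pearson correlation
        "PBIAS": 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs),
        "Fn": rmse / math.sqrt(sum(o * o for o in obs) / n),  # ||O,P|| / ||O,0||
        "NSE": 1.0 - sse / sst,
        "RSR": math.sqrt(sse) / math.sqrt(sst),            # RMSE / std. dev. of obs
    }


# Hypothetical log10 E. coli values (log10 CFU/100 mL)
observed = [1.0, 1.3, 2.1, 0.9, 1.7]
predicted = [1.1, 1.2, 1.8, 1.0, 1.6]
m = metrics(observed, predicted)
```

With these definitions NSE = 1 − RSR², which offers a quick consistency check on Table 1 (for example, 1 − 0.610² ≈ 0.628 for the OD3 morning statistical-model row).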
3. RESULTS
The model simulated the hydrodynamics accurately except for a short period during an intense storm around Julian Day (JD) 220 (SI). Figure 2 shows the comparison of simulated and observed Econ at the Ogden Dunes beaches; statistics for the comparison are available in the SI. The results have R² values ranging from 0.54 to 0.62 and RMSE ranging from 41.5 to 55.8 μS/cm. At OD3, the beach nearest to the BD source, Econ varied with depth, with higher values near the surface, while the distribution was nearly vertically well-mixed at OD1. Overall, the model described the observed Econ reasonably well. The long time series allowed us to identify the mixing parameters (KH, KV in eq 1) that best described conservative solute transport over the three-month period. This was useful because mixing and inactivation parameters both influence E. coli peaks in the model, leading to considerable uncertainty in model outcomes. For E. coli, comparisons of observations with simulated results based on the mechanistic model are shown in Figure 3. The following parameters were used:
vS = 1 m/d, ke = 0.55 m−1, kI = 0.003 m²/(W·d), fP = 0.05, kd = 0.777 d−1, and θ = 1.07.
The results indicate that contamination originating from the BD discharge is the key contributor to E. coli levels at all three beaches. Farther from the BD station, E. coli concentrations show less variation with depth, similar to Econ.
Figure 4 shows the observed and predicted E. coli based on the mechanistic and statistical models for morning (blue symbols) and afternoon (red symbols) samples. Because predicted E. coli values from the mechanistic model vary with depth, a box plot with symbols denoting the median (⊙) and outliers (+) was used to show their distribution. For beach closures, the Indiana single-sample maximum standard is 235 CFU/100 mL, and this value is marked with dashed lines in Figure 4 to make false positives and negatives easy to spot. Summary statistics of E. coli concentrations from the mechanistic model were calculated for each beach for all sampling times and compared with the results of the statistical model in Table 1 for the morning and afternoon samples separately. Considering all stations and the morning and afternoon samples separately, we found that the mechanistic model outperformed the statistical model for the afternoon samples, while the opposite was true for the morning samples (Table 1). Because the statistical model was developed using only the morning data, its performance deteriorated slightly for the afternoon data. The mechanistic model is a three-dimensional transport model that generates large data sets describing the spatiotemporal evolution of E. coli plumes. Detailed statistics for the predicted variability within the water column are summarized in Table S4 based on the metrics in eq 11 and compared with the results from the statistical model. Overall, although both models produced comparable results, the simple and parsimonious statistical model was found to generally outperform the particular version of the mechanistic model considered in the present study. The mechanistic model itself benefited from an initial comparison with results from the statistical model, and significant improvements resulted from long-term “tracer” transport modeling using Econ, the use of an unstructured grid model to better resolve nearshore features, the use of statistical methods to generate high-frequency E. coli data at BD, and improved assimilation of meteorological data based on natural neighbor interpolation (earlier versions of the model used the nearest neighbor method). Both models are suitable for real-time and near real-time predictions, as discussed below.
Table 1. Summary Statistics for the Mechanistic and Statistical Models for E. coli Based on the Morning and Afternoon Samples (metrics computed on log10-transformed E. coli; PBIAS in %)
station        sample time  model         NSE      PBIAS     RSR     R²      RMSE    Fn
OD3            morning      statistical    0.628    −4.427   0.610   0.797   0.407   0.291
OD3            morning      mechanistic    0.459    −5.980   0.735   0.744   0.491   0.351
OD3            afternoon    statistical    0.652    −0.965   0.590   0.816   0.337   0.222
OD3            afternoon    mechanistic    0.228     3.411   0.878   0.691   0.502   0.330
OD2            morning      statistical    0.469     1.560   0.729   0.686   0.457   0.324
OD2            morning      mechanistic    0.199    15.021   0.895   0.646   0.561   0.398
OD2            afternoon    statistical    0.517    −9.796   0.695   0.743   0.467   0.323
OD2            afternoon    mechanistic    0.534     6.993   0.683   0.817   0.459   0.318
OD1            morning      statistical    0.544    −9.148   0.675   0.768   0.428   0.334
OD1            morning      mechanistic   −0.319    15.758   1.148   0.426   0.728   0.567
OD1            afternoon    statistical    0.015   −27.829   0.992   0.565   0.560   0.468
OD1            afternoon    mechanistic   −0.100    12.692   1.049   0.563   0.592   0.495
all stations   morning      statistical    0.554    −3.792   0.667   0.749   0.431   0.316
all stations   morning      mechanistic    0.133     8.094   0.931   0.603   0.601   0.440
all stations   afternoon    statistical    0.444   −11.553   0.745   0.710   0.464   0.333
all stations   afternoon    mechanistic    0.299     7.250   0.837   0.722   0.521   0.373
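To give a sense of the magnitudes implied by eq 2 and the parameter values listed in the Results, the minimal Python sketch below evaluates the first-order loss rate (d−1) from sunlight inactivation and base mortality at a given depth. Assumptions: the settling term is ignored (it depends on the vertical gradient of C), θ^(T−20) multiplies both remaining terms as written in eq 2, and z is treated as depth below the surface in meters. The surface irradiance, temperature, and depths are hypothetical inputs.

```python
import math

# Parameter values reported for the mechanistic model
K_E = 0.55      # sunlight extinction coefficient (1/m)
K_I = 0.003     # sunlight inactivation rate (m^2/(W d))
K_D = 0.777     # base mortality rate (1/d)
THETA = 1.07    # temperature correction base


def loss_rate(i0, depth, temp_c):
    """First-order E. coli loss rate (1/d) from the non-settling terms of eq 2.

    i0: short-wave radiation at the surface (W/m^2); depth: m below surface.
    """
    sunlight = K_I * i0 * math.exp(-K_E * depth)  # (m^2/(W d)) * (W/m^2) -> 1/d
    return (sunlight + K_D) * THETA ** (temp_c - 20.0)


def half_life_hours(rate_per_day):
    """Time (h) for a first-order decay at this rate to halve the concentration."""
    return 24.0 * math.log(2.0) / rate_per_day


# Hypothetical midday conditions: 600 W/m^2 at the surface, water at 22 C
for z in (0.0, 0.5, 2.0):
    k = loss_rate(600.0, z, 22.0)
    print(f"depth {z:.1f} m: k = {k:.2f} 1/d, half-life = {half_life_hours(k):.1f} h")
```

Sunlight dominates near the surface and decays with depth through e^(−ke z), so deeper parts of the plume are governed mainly by the base mortality rate kd.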
4. DISCUSSION
Our initial development of statistical22,26,28 and mechanistic4,5,18 models for the Ogden Dunes beaches proceeded as
separate activities. The RMSE values for log-transformed
observed and simulated values of E. coli based on the Princeton
Ocean Model (a structured grid model) reported in Thupaki et
al.5 were around 1.36 for the same beach sites considered in the
present work (although Thupaki et al.5 used a shorter period to
test their model). By explicitly modeling waves, sediment
transport, and bacteria-sediment interactions, Thupaki et al.5
reported a significant improvement in their model performance
as indicated by the reduced values of RMSE (between 0.49 and
0.54 with sediment processes included). Summary statistics provided in Tables 1 and S4 indicate that the current version of the mechanistic model, which does not include sediment-bacteria interactions, performed as well as the model with sediment processes in Thupaki et al.5
From the perspective of health risks, predicting the
occurrence of E. coli levels exceeding 235 CFU/100 mL at
beaches is crucial. The statistical model correctly captured six
out of the eight exceedances while the mechanistic model
captured four (Figures 3 and 4). In the mechanistic model,
accurately capturing meteorological forcing during intense
storm events has been a challenge, and the model suffered both
in terms of hydrodynamics (SI) and transport. Improving the
meteorological forcing is expected to further improve the
mechanistic model. On JD 161.5 (Figure 3), high E. coli levels (2419 CFU/100 mL) were observed at beaches OD1 and OD2 (Figure 4); both models underestimated these levels, producing false negatives. On JD 220, the large storm caused a large fluctuation in E. coli, which was well captured by the mechanistic model; however, the model yielded a false positive during the afternoon at all beaches. Overall, both models predicted the observed trends well, as summarized in Tables 1 and S4.
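Counting exceedances of the 235 CFU/100 mL single-sample maximum, as in the comparison above, amounts to a simple confusion-matrix tally. A minimal Python sketch follows; the function name and sample values are hypothetical, and treating only values strictly above 235 as exceedances is our assumption.

```python
THRESHOLD = 235.0  # Indiana single-sample maximum for E. coli (CFU/100 mL)


def exceedance_counts(observed, predicted, threshold=THRESHOLD):
    """Tally hits, misses (false negatives), false alarms (false positives),
    and correct non-exceedances for paired observed/predicted concentrations."""
    counts = {"hit": 0, "false_negative": 0, "false_positive": 0, "correct_negative": 0}
    for obs, pred in zip(observed, predicted):
        obs_exceeds = obs > threshold
        pred_exceeds = pred > threshold
        if obs_exceeds and pred_exceeds:
            counts["hit"] += 1
        elif obs_exceeds:
            counts["false_negative"] += 1
        elif pred_exceeds:
            counts["false_positive"] += 1
        else:
            counts["correct_negative"] += 1
    return counts


# Hypothetical paired samples (CFU/100 mL)
obs = [2419.0, 40.0, 310.0, 12.0, 90.0]
pred = [180.0, 35.0, 420.0, 300.0, 60.0]
print(exceedance_counts(obs, pred))
```

In this tally, the statement that the statistical model captured six of the eight observed exceedances corresponds to hit = 6 and false_negative = 2.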
Finally, a few comments on the application of the models developed in this paper for beach management are in order. The simple statistical model based on eq 10 requires only two turbidity measurements: one value at the beach of interest (τ) and another at the nearby source (τ0) impacting the beach.
Although not examined in this paper, it would be interesting to understand how turbidity from point sources located on either side of a beach affects water quality at the beach (e.g., the Mt.
Baldy beach impacted by Trail Creek and Kintzele Ditch in
southern Lake Michigan, reported in Liu et al.18 and Nevers et
al.43). The simplicity and ease of application are the major
strengths of statistical models in general and eq 10 in particular.
Combining statistical models with commercially available
sensors, data-logging, telemetry and web scripts opens up
exciting possibilities for automating beach management and
such systems were recently implemented for nine Chicago
beaches.44 However, making mechanistic models operational
involves considerable effort, and only a few institutions or agencies have the infrastructure to make operational forecasts
(e.g., NOAA). The statistical models seem to have an advantage
here, and using insights gained from more complex mechanistic
models to further improve empirical models appears to be a
promising avenue for the near future. Mechanistic models have
the advantage that they can help make more informed decisions
due to their ability to provide detailed information.17 We also
note that model testing as reported in the paper is
retrospective. Evaluating model performance using additional
data (not used for model development) represents another
level of model testing, not reported in this paper. While the
simpler statistical model can be used to make real-time forecasts
using turbidity values from morning samples, the mechanistic
model requires both discharge and E. coli values at the source. A
variety of approaches can be used to make short-term forecasts
of source characteristics including process-based river models,45−47 watershed models,48,49 statistical methods based on
the use of probability distributions as well as approaches based
on wavelet and neural network methods.50,51 The watershed
models48 can also be used to address questions related to
nonpoint source pollution.
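The real-time use of eq 10 discussed above can be sketched as follows. This is a minimal Python sketch, not an operational tool: it assumes turbidity is entered in the same units used to fit the model, and it uses 10 raised to the predicted E[log10 C] as the point estimate of C (a median-type estimate, not the arithmetic mean). For comparison it also evaluates the intermediate power-law relation E[C] = 1.09 τ τ0^0.488. The function names, decision rule, and inputs are hypothetical.

```python
import math

ADVISORY_THRESHOLD = 235.0  # CFU/100 mL, Indiana single-sample maximum


def predict_log10_ecoli(tau_beach, tau_source):
    """Eq 10: E[log10 C] = 0.537 + 0.172 ln(tau) ln(tau0); turbidities must be > 0."""
    if tau_beach <= 0 or tau_source <= 0:
        raise ValueError("turbidity values must be positive")
    return 0.537 + 0.172 * math.log(tau_beach) * math.log(tau_source)


def predict_power_law(tau_beach, tau_source):
    """Intermediate relation E[C] = 1.09 tau tau0^0.488 (original scale)."""
    return 1.09 * tau_beach * tau_source ** 0.488


def advisory(tau_beach, tau_source, threshold=ADVISORY_THRESHOLD):
    """Hypothetical decision rule: flag when 10**E[log10 C] exceeds the threshold."""
    estimate = 10 ** predict_log10_ecoli(tau_beach, tau_source)
    return estimate, estimate > threshold


# Hypothetical morning turbidities (beach, source)
for tau, tau0 in [(5.0, 20.0), (30.0, 80.0)]:
    c_hat, flag = advisory(tau, tau0)
    print(f"tau={tau}, tau0={tau0}: C ~ {c_hat:.0f} CFU/100 mL, advisory={flag}, "
          f"power-law C ~ {predict_power_law(tau, tau0):.0f}")
```

Because eq 10 needs only two morning turbidity readings, a rule like this could run on sensor data as soon as they arrive; the two forms can diverge at high turbidity, which is why the log-space interaction model was preferred.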
In summary, we refined statistical and mechanistic models of
indicator bacteria (E. coli) at beaches in southern Lake
Michigan using a cooperative modeling approach. Using
process-based reasoning derived from observations of simulated
plumes from mechanistic models, we were able to identify
parsimonious empirical models with considerable predictive
power and the ability to generate real-time forecasts. From a
mechanistic modeling point of view, the greatest advantage to
having the improved statistical models is that they provide a
basis for assessment. Without such a basis, it is difficult to know
if the model can be improved further and by how much. This
cooperative modeling approach is expected to lead to gains in
both types of models at other sites impacted by point sources.
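The summary argues that an improved statistical model gives the mechanistic model a basis for assessment. The discussion also notes that testing on data not used for model development is a further level of evaluation. The sketch below shows one way such a benchmark could be scored. It assumes log10-transformed concentrations and a chronological split, so that only later days are held out. It uses two common skill metrics, root-mean-square error and Nash-Sutcliffe efficiency. The metric choice, the function names, and all numbers are illustrative assumptions, not the paper's evaluation procedure or results.

```python
import math
from typing import Dict, List, Sequence, Tuple


def chronological_split(values: Sequence[float], train_fraction: float = 0.7) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series so the held-out part lies strictly later in time."""
    k = int(len(values) * train_fraction)
    return list(values[:k]), list(values[k:])


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nash_sutcliffe(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect; <= 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def score_models(obs: Sequence[float],
                 predictions: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Score every model on the same held-out observations as (RMSE, NSE)."""
    return {name: (rmse(obs, pred), nash_sutcliffe(obs, pred)) for name, pred in predictions.items()}


# Invented daily log10 E. coli values. Both models are assumed to have been
# calibrated only on the first 70% of days; only the later 30% is scored.
obs_log = [1.2, 1.8, 2.4, 1.5, 2.9, 2.1, 1.7, 2.6, 3.0, 1.9]
benchmark_log = [1.3, 1.7, 2.3, 1.6, 2.7, 2.2, 1.8, 2.5, 2.8, 2.0]  # e.g., a statistical model
candidate_log = [1.0, 2.0, 2.1, 1.8, 2.5, 2.4, 1.5, 2.2, 2.6, 2.3]  # e.g., a mechanistic model

_, obs_test = chronological_split(obs_log)
_, bench_test = chronological_split(benchmark_log)
_, cand_test = chronological_split(candidate_log)
scores = score_models(obs_test, {"benchmark": bench_test, "candidate": cand_test})
for name, (err, nse) in scores.items():
    print(f"{name}: RMSE={err:.2f} log10 units, NSE={nse:.2f}")
```

Both models are scored on the same later days, so the gap between their scores suggests how much room the candidate has to improve relative to the benchmark. A random split of daily data would mix past and future and overstate skill for time series.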

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the
ACS Publications website at DOI: 10.1021/acs.est.5b05378.
Additional comparisons, details of methods and discussion are available as noted in the text (PDF)

AUTHOR INFORMATION
Corresponding Author
*Phone: 517-432-0851; e-mail: [email protected] (M.S.P.).
Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
This research was funded in part through the USGS Oceans
Research Priorities Plan and the Great Lakes Restoration
Initiative. We thank Muruleedhara Byappanahalli, Dawn
Shively, Kasia Przybyla-Kelly, Ashley Spoljaric, Pramod
Thupaki, Mark Blouin, and Glen Black for their contributions
to this research. Any use of trade, product, or firm names is for
descriptive purposes only and does not imply endorsement by
the U.S. Government. This article is Contribution 2014 of the
USGS Great Lakes Science Center.

REFERENCES
(1) Sokolova, E.; Åström, J.; Pettersson, T. J. R.; Bergstedt, O.;
Hermansson, M. Estimation of pathogen concentrations in a drinking
water source using hydrodynamic modelling and microbial source
tracking. J. Water Health 2012, 10 (3), 358−370.
(2) Sokolova, E.; Pettersson, T. J. R.; Bergstedt, O.; Hermansson, M.
Hydrodynamic modelling of the microbial water quality in a drinking
water source as input for risk reduction management. J. Hydrol. 2013,
497, 15−23.
(3) Boehm, A. B.; Sanders, B. F.; Winant, C. D. Cross-shelf transport
at Huntington Beach. Implications for the fate of sewage discharged
through an offshore ocean outfall. Environ. Sci. Technol. 2002, 36 (9),
1899−1906.
(4) Thupaki, P.; Phanikumar, M. S.; Beletsky, D.; Schwab, D. J.;
Nevers, M. B.; Whitman, R. L. Budget analysis of Escherichia coli at a
southern Lake Michigan beach. Environ. Sci. Technol. 2010, 44 (3),
1010−1016.
(5) Thupaki, P.; Phanikumar, M. S.; Schwab, D. J.; Nevers, M. B.;
Whitman, R. L. Evaluating the role of sediment-bacteria interactions
on Escherichia coli concentrations at beaches in southern Lake
Michigan. J. Geophys. Res. Oceans 2013, 118 (12), 7049−7065.
(6) Grant, S. B.; Litton-Mueller, R. M.; Ahn, J. H. Measuring and
modeling the flux of fecal bacteria across the sediment-water interface
in a turbulent stream. Water Resour. Res. 2011, 47 (5), W05517.
(7) Rippy, M. A.; Franks, P. J. S.; Feddersen, F.; Guza, R. T.; Moore,
D. F. Physical dynamics controlling variability in nearshore fecal
pollution: Fecal indicator bacteria as passive particles. Mar. Pollut. Bull.
2013, 66 (1−2), 151−157.
(8) Fries, J. S.; Characklis, G. W.; Noble, R. T. Attachment of fecal
indicator bacteria to particles in the Neuse River Estuary, NC. J.
Environ. Eng. 2006, 132 (10), 1338−1345.
(9) Nevers, M. B.; Whitman, R. L. Nowcast modeling of Escherichia
coli concentrations at multiple urban beaches of southern Lake
Michigan. Water Res. 2005, 39 (20), 5250−5260.
(10) McCorquodale, J. A.; Georgiou, I.; Carnelos, S.; Englande, A. J.
Modeling coliforms in storm water plumes. J. Environ. Eng. Sci. 2004, 3
(5), 419−431.
(11) Kim, J. H.; Grant, S. B.; McGee, C. D.; Sanders, B. F.; Largier, J.
L. Locating Sources of Surf Zone Pollution: A Mass Budget Analysis of
Fecal Indicator Bacteria at Huntington Beach, California. Environ. Sci.
Technol. 2004, 38 (9), 2626−2636.
(12) Boehm, A. B.; Keymer, D. P.; Shellenbarger, G. G. An analytical
model of enterococci inactivation, grazing, and transport in the surf
zone of a marine beach. Water Res. 2005, 39 (15), 3565−3578.
(13) Grant, S. B.; Kim, J. H.; Jones, B. H.; Jenkins, S. A.; Wasyl, J.;
Cudaback, C. Surf zone entrainment, along-shore transport, and
human health implications of pollution from tidal outlets. J. Geophys.
Res. 2005, 110 (C10), C10025.
(14) Sanders, B. F.; Arega, F.; Sutula, M. Modeling the dry-weather
tidal cycling of fecal indicator bacteria in surface waters of an intertidal
wetland. Water Res. 2005, 39 (14), 3394−3408.
(15) de Brauwere, A.; de Brye, B.; Servais, P.; Passerat, J.;
Deleersnijder, E. Modelling Escherichia coli concentrations in the
tidal Scheldt river and estuary. Water Res. 2011, 45 (9), 2724−2738.
(16) Feng, Z.; Reniers, A.; Haus, B. K.; Solo-Gabriele, H. M.
Modeling sediment-related enterococci loading, transport, and
inactivation at an embayed nonpoint source beach. Water Resour. Res.
2013, 49 (2), 693−712.
(17) Feng, Z.; Reniers, A.; Haus, B. K.; Solo-Gabriele, H. M.; Wang,
J. D.; Fleming, L. E. A predictive model for microbial counts on
beaches where intertidal sand is the primary source. Mar. Pollut. Bull.
2015, 94 (1−2), 37−47.
(18) Liu, L.; Phanikumar, M. S.; Molloy, S. L.; Whitman, R. L.;
Shively, D. A.; Nevers, M. B.; Schwab, D. J.; Rose, J. B. Modeling the
Transport and Inactivation of E. coli and Enterococci in the Near-Shore Region of Lake Michigan. Environ. Sci. Technol. 2006, 40 (16),
5022−5028.
(19) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S. Wave-Induced Mass Transport Affects Daily Escherichia coli Fluctuations in
Nearshore Water. Environ. Sci. Technol. 2012, 46 (4), 2204−2211.
(20) Ge, Z.; Whitman, R. L.; Nevers, M. B.; Phanikumar, M. S.;
Byappanahalli, M. N. Nearshore hydrodynamics as loading and forcing
factors for Escherichia coli contamination at an embayed beach. Limnol.
Oceanogr. 2012, 57 (1), 362−381.
(21) Francy, D. S.; Stelzer, E. A.; Duris, J. W.; Brady, A. M. G.;
Harrison, J. H.; Johnson, H. E.; Ware, M. W. Predictive Models for
Escherichia coli Concentrations at Inland Lake Beaches and Relationship of Model Variables to Pathogen Detection. Appl. Environ.
Microbiol. 2013, 79 (5), 1676−1688.
(22) Nevers, M. B.; Whitman, R. L. Efficacy of monitoring and
empirical predictive modeling at improving public health protection at
Chicago beaches. Water Res. 2011, 45 (4), 1659−1668.
(23) US EPA. Action plan for beaches and recreational waters; EPA/
600/R-98/079; US Environmental Protection Agency: Washington,
DC, 1999.
(24) US EPA. Recreational Water Quality Criteria; EPA-820-F-12-058;
US Environmental Protection Agency, Office of Water:
Washington, DC, 2012.
(25) Stidson, R. T.; Gray, C. A.; McPhail, C. D. Development and
use of modelling techniques for real-time bathing water quality
predictions. Water Environ. J. 2012, 26 (1), 7−18.
(26) Nevers, M. B.; Byappanahalli, M. N.; Edge, T. A.; Whitman, R.
L. Beach science in the Great Lakes. J. Great Lakes Res. 2014, 40 (1),
1−14.
(27) Boehm, A. B.; Whitman, R. L.; Nevers, M. B.; Hou, D.;
Weisberg, S. B. Nowcasting Recreational Water Quality. In Statistical
Framework for Recreational Water Quality Criteria and Monitoring; John
Wiley & Sons, Ltd: New York, 2007; pp 179−210.
(28) Whitman, R. L.; Nevers, M. B. Escherichia coli Sampling
Reliability at a Frequently Closed Chicago Beach: Monitoring and
Management Implications. Environ. Sci. Technol. 2004, 38 (16), 4241−
4246.
(29) Froelich, B.; Bowen, J.; Gonzalez, R.; Snedeker, A.; Noble, R.
Mechanistic and statistical models of total Vibrio abundance in the
Neuse River Estuary. Water Res. 2013, 47 (15), 5783−5793.
(30) Hampson, D.; Crowther, J.; Bateman, I.; Kay, D.; Posen, P.;
Stapleton, C.; Wyer, M.; Fezzi, C.; Jones, P.; Tzanopoulos, J.
Predicting microbial pollution concentrations in UK rivers in response
to land use change. Water Res. 2010, 44 (16), 4748−4759.
(31) Thupaki, P.; Phanikumar, M. S.; Whitman, R. L. Solute
dispersion in the coastal boundary layer of southern Lake Michigan. J.
Geophys. Res. Oceans 2013, 118 (3), 1606−1617.
(32) Olyphant, G. A.; Thomas, J.; Whitman, R. L.; Harper, D.
Characterization and statistical modeling of bacterial (Escherichia coli)
outflows from watersheds that discharge into southern Lake Michigan.
Environ. Monit. Assess. 2003, 81, 289−300.
(33) Chen, C.; Beardsley, R.; Cowles, G. An Unstructured Grid,
Finite-Volume Coastal Ocean Model (FVCOM) System. Oceanography 2006, 19 (1), 78−89.
(34) Chen, C.; Liu, H.; Beardsley, R. C. An unstructured grid, finite-volume, three-dimensional, primitive equations ocean model:
application to coastal ocean and estuaries. J. Atmospheric Ocean.
Technol. 2003, 20 (1), 159−186.
(35) Wendzel, A. Constraining mechanistic models of indicator bacteria
at recreational beaches in Lake Michigan using easily-measurable
environmental variables. M. S. Dissertation, Michigan State University,
East Lansing, MI, 2014.
(36) Thupaki, P.; Phanikumar, M. S.; Nevers, M. B.; Whitman, R. L.
Modeling the effects of hydrologic separation on the Chicago area waterway
system on water quality in Lake Michigan; Great Lakes and Mississippi
River Interbasin Study (GLMRIS) Report; US Army Corps of
Engineers: Chicago, 2013; Appendix F; pp F639−F743.
(37) Schimmelpfennig, S.; Kirillin, G.; Engelhardt, C.; Nützmann, G.
Effects of wind-driven circulation on river intrusion in Lake Tegel:
modeling study with projection on transport of pollutants. Environ.
Fluid Mech. 2012, 12 (4), 321−339.
(38) Schwab, D. J.; Beletsky, D. Lake Michigan Mass Balance Study:
Hydrodynamic Modeling Project; NOAA Technical Memorandum ERL
GLERL-108; Great Lakes Environmental Research Laboratory: Ann
Arbor, MI, 1998.
(39) Parkinson, C. L.; Washington, W. M. A large-scale numerical
model of sea ice. J. Geophys. Res. 1979, 84 (C1), 311−337.
(40) Nguyen, T. D.; Thupaki, P.; Anderson, E. J.; Phanikumar, M. S.
Summer circulation and exchange in the Saginaw Bay-Lake Huron
system. J. Geophys. Res. Oceans 2014, 119 (4), 2713−2734.
(41) Hayashi, M. Temperature-electrical conductivity relation of
water for environmental monitoring and geophysical data inversion.
Environ. Monit. Assess. 2004, 96 (1−3), 119−128.
(42) Fry, L. M.; Hunter, T. S.; Phanikumar, M. S.; Fortin, V.;
Gronewold, A. D. Identifying streamgage networks for maximizing the
effectiveness of regional water balance modeling: Identifying Gage
Networks. Water Resour. Res. 2013, 49 (5), 2689−2700.
(43) Nevers, M. B.; Whitman, R. L.; Frick, W. E.; Ge, Z. Interaction
and Influence of Two Creeks on Concentrations of Nearby Beaches:
Exploration of Predictability and Mechanisms. J. Environ. Qual. 2007,
36 (5), 1338.
(44) Shively, D. A.; Nevers, M. B.; Breitenbach, C.; Phanikumar, M.
S.; Przybyla-Kelly, K.; Spoljaric, A. M.; Whitman, R. L. Prototypic
Automated Continuous Recreational Water Quality Monitoring of
Nine Chicago Beaches. J. Environ. Manage. 2016, 166, 285−293.
(45) Anderson, E. J.; Phanikumar, M. S. Surface storage dynamics in
large rivers: Comparing three-dimensional particle transport, one-dimensional fractional derivative and multi-rate transient storage
models. Water Resour. Res. 2011, 47 (9), W09511.
(46) Shen, C.; Phanikumar, M. S. An efficient space-fractional
dispersion approximation for stream solute transport modeling. Adv.
Water Resour. 2009, 32 (10), 1482−1494.
(47) Phanikumar, M. S.; Aslam, I.; Shen, C.; Long, D. T.; Voice, T. C.
Separating surface storage from hyporheic retention in natural streams
using wavelet decomposition of acoustic Doppler current profiles.
Water Resour. Res. 2007, 43 (5), W05406.
(48) Niu, J.; Phanikumar, M. S. Modeling watershed-scale solute
transport using an integrated, process-based hydrologic model with
applications to bacterial fate and transport. J. Hydrol. 2015, 529 (1),
35−48.
(49) Niu, J.; Shen, C.; Li, S. G.; Phanikumar, M. S. Quantifying
storage changes in regional Great Lakes watersheds using a coupled
subsurface-land surface process model and GRACE, MODIS
products. Water Resour. Res. 2014, 50 (9), 7359−7377.
(50) Brion, G. M.; Lingireddy, S. A neural network approach to
identifying non-point sources of microbial contamination. Water Res.
1999, 33 (14), 3099−3106.
(51) Chang, F. J.; Chen, Y. C. A counterpropagation fuzzy-neural
network modeling approach to real time streamflow prediction. J.
Hydrol. 2001, 245 (1−4), 153−164.
DOI: 10.1021/acs.est.5b05378
Environ. Sci. Technol. 2016, 50, 2442−2449