Calibration, Sensitivity Analysis and Uncertainty Analysis

Calibration, Sensitivity Analysis and Uncertainty Analysis for Computationally Expensive Models
Prof. Christine Shoemaker
Pradeep Mugunthan, Dr. Rommel Regis, and Dr. Jennifer Benaman
School of Civil and Environmental Engineering and
School of Operations Research and Industrial Engineering
Cornell University
South Florida Water District Morning Meeting
Sept. 24, 2003
Models Help Extract Information from Point Data to Processes Continuous in Space and Time
[Diagram: point data (from monitoring or experiments at a limited number of points in space and time) feeds a model (that describes temporal and spatial connections), which produces:
• Forecasts (with statistical representation)
• Comparison of alternative management options
• Understanding of processes]
Models Help Extract Information from Data for Multiple Outputs
[Same diagram as above, with the model mapping point data to multiple model outputs: forecasts (with statistical representation), comparison of alternative management options, and understanding of processes.]
Steps in Modeling
• Calibration—selecting parameter values within
acceptable limits to fit the data as well as possible
• Validation—applying the model and calibrated
parameters to independent data set
• Sensitivity Analysis—assess the impact of changes in
uncertain parameter values on model output
• Uncertainty Analysis—assessing the range of model
outcomes likely given uncertainty in parameters, model
error, and exogenous factors like weather.
Computationally Expensive Models
• It is difficult to calibrate for many parameters
with existing methods with a limited number of
simulations.
• Most existing uncertainty methods require
thousands of simulations.
• We can only run a limited number of model
simulations if a model takes hours to run.
• Our methods are designed to reduce the
number of simulations required to do good
calibration and sensitivity analysis.
Methods and Applications
• We will discuss a general methodology for
calibration, sensitivity analysis and
uncertainty analysis that can be applied to
many types of computationally expensive
models.
• We will present numerical examples for
two “real life” cases: a watershed and
a groundwater remediation problem.
1. Effective Use of Models and
Observations Through Calibration,
Sensitivity Analysis and Uncertainty
Analysis
A description of the technical approach and “real life”
applications, including:
1. Sensitivity Analysis for large number of parameters with
application to a large watershed.
2. Optimization methods for calibration with application to
ground water remediation based on field data.
3. Uncertainty Analysis based on a groundwater model.
Cannonsville Watershed
• Cannonsville Reservoir Basin – agricultural basin
• Supply of New York City drinking water
• To avoid an $8 billion water filtration plant, model
analysis is needed to help manage phosphorus
• The 1200 km² watershed is subject to economic
constraints if phosphorus (P) loads violate the TMDL.
Monitoring Stations
[Map of the basin (scale bar: 5–10 kilometers) showing monitoring locations at Town Brook, Trout Creek, Little Delaware R., Beerston, W. Br. Delaware @ Delhi, and W. Br. Delaware R. @ Walton. Legend: sediment monitoring stations, climate stations, USGS flow gauges, rivers and streams, subwatershed boundaries.]
There are over 20,000 data points for this watershed.
Questions
• Using all these data, can we develop a
model that is a useful forecasting tool to
assess the impact of weather and
phosphorus management actions on
future loading of the reservoir?
• What phosphorus management
strategies, if any, should be undertaken?
I. Methodology for Sensitivity Analysis
of a Model with Many Parameters:
Application to Cannonsville Basin
• Joint work with Jennifer Benaman (Cornell
Ph.D. in Civil and Environmental
Engineering, 2003)
• Funded by EPA Star Fellowship
Sensitivity Analysis with Many
Parameters
• Sensitivity Analysis measures the change in
model output associated with the change
(perturbation) in model input (e.g. in parameter
values).
• Purposes include:
– To help select which parameters should be adjusted
in a calibration and which can be left at default
values.
– This makes multivariate sensitivity and uncertainty
analysis more feasible for computationally expensive
models.
Sensitivity Analysis with Many
Parameters: Additional Purposes
– To prioritize additional data collection, and
– To estimate potential errors in model forecasts
that could be due to parameter value errors.
• Sensitivity Analysis and calibration are
difficult with a large number of parameters.
Questions
• Can we develop a sensitivity analysis
method that:
– is robust (doesn’t depend strongly on our
assumptions)?
– is computationally efficient for a large
number of parameters (hundreds)?
– allows us to consider many different
model outputs simultaneously?
Choose Parameters → Establish Parameter Ranges → Choose Output Variables of Concern
Application to Cannonsville Watershed
• 160 parameters
– 35 basinwide
– 10 vary by land use (10 x 5 land uses)
– 7 vary by soil (7 x 10 soil types)
– 2 additional for corn and hay
– 1 additional for pasture
• Ranges obtained from literature,
databases, and SWAT User’s Manual
Choose Parameters → Establish Parameter Ranges → Choose Output Variables of Concern
Output Variables of Concern
• Basinwide (average annual from 1994-1998)
– Surface water runoff
– Snowmelt
– Groundwater flow
– Evapotranspiration
– Sediment yield
• Location in-stream (monthly average over entire simulation)
– Flow @ Beerston
– Flow @ Trout Creek
– Flow @ Town Brook
– Flow @ Little Delaware River
– Sediment load @ Beerston
– Sediment load @ Town Brook
Final Results
Percentage of times each parameter appeared in the 'Top 20', under four output-weighting methods:
A = Focus on Basinwide, B = All Equal Weights, C = Focus on Beerston, D = Focus on Calibration

Parameter        A    B    C    D
APMBASIN       100  100  100  100
BIOMIXBASIN    100  100  100  100
CN2CSIL        100  100  100  100
CN2FRSD        100  100  100  100
CN2PAST        100  100  100  100
RSDCOPAST      100  100  100  100
SLSUBBSNBASIN  100  100  100  100
SMFMNBASIN     100  100  100  100
T_BASEPAST     100  100  100  100
T_OPTPAST      100  100  100  100
USLEKNY129     100  100  100  100
ESCONY129      100   75   75  100
SMTMPBASIN     100   75   75  100
LAT_SEDBASIN   100   50  100  100
CN2HAY          75   75   75   75
ESCONY132       75   75   75   50
GWQMNBASIN      75   75   75   75
TIMPBASIN       75   50   75   75
BIO_MINPAST     75   50   50   75
ROCKNY132       75   25   50   50
REVAPMNBASIN    50   50   50   75
ROCKNY129       50   25   50   25
USLEPCSIL       25   25   50   25
HVSTICSIL       25   25   25   50
USLECPAST       25   25   25   25
SMFMXBASIN      25    0    0   50
GSIPAST          0    0   25    0
ROCKNY026        0    0   25    0

Parameters scoring 100 in every column are in the top 20 for ALL cases; those just below are in the top 20 most of the time.
Computational Issues
• We have a robust method for determining
importance and sensitivity of parameters.
• An advantage is that the number of model
simulations is independent of the number of
output variables, sensitivity indices, or
weighting factors considered in the combined
sensitivity analysis. (Almost no extra
computation is required to do many output
variables, indices or weightings.)
• The number of simulations is simply the
number required for a single (non-robust)
univariate sensitivity analysis multiplied by
the number of perturbation methods (two in
this example).
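To make that computation count concrete, here is a minimal one-at-a-time perturbation sketch in Python. It is illustrative only: run_model, the parameter ranges, and the 10% step are hypothetical placeholders, and the actual study combined several perturbation schemes and sensitivity indices.

```python
import numpy as np

def oat_sensitivity(run_model, base, ranges, step=0.1):
    """One-at-a-time sketch: perturb each parameter by a fraction of its
    allowable range and record the relative change in every model output.
    run_model(x) -> vector of outputs is a hypothetical costly simulation."""
    base = np.asarray(base, dtype=float)
    y0 = np.atleast_1d(run_model(base))             # one baseline simulation
    sens = np.empty((base.size, y0.size))
    for k, (lo, hi) in enumerate(ranges):
        x = base.copy()
        x[k] = min(base[k] + step * (hi - lo), hi)  # perturb parameter k only
        yk = np.atleast_1d(run_model(x))            # one simulation per parameter
        sens[k] = (yk - y0) / np.where(y0 != 0.0, y0, 1.0)
    return sens  # rows: parameters, columns: outputs

# Once `sens` is computed, any weighting of outputs is free:
# weighted = sens @ w for a weight vector w, with no further simulations.
```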
Next Steps
• Once the most important parameters have
been identified we can extend the analysis to
more detailed analyses including:
– Multivariate sensitivity analysis (changes in more
than one parameter at a time)
– Uncertainty Analysis (e.g. GLUE)
• Both of these analyses above are highly
computationally demanding and can hence
only be done with a small number of
parameters.
• The (univariate) sensitivity analysis done here
can identify the small number of parameters
on which these analyses should be focused.
Questions
• Can we develop a sensitivity analysis method that is
robust (doesn’t depend strongly on our assumptions),
is computationally efficient for a large number of
parameters (hundreds), and allows us to consider many
different model outputs simultaneously?
– Yes, the results for Cannonsville indicate this
is possible with this methodology.
– Models with longer simulation times require
more total simulation time or fewer
parameters.
II: Use of Response Surface Methods
in Non-Convex Optimization,
Calibration and Uncertainty Analysis
• Joint work with
– Pradeep Mugunthan (PhD Candidate in Civil and
Environmental Engineering)
– Rommel Regis (Postdoctoral Fellow with PhD in
Operations Research)
– Funded by three National Science Foundation (NSF)
projects
Computational Effort for Trial and
Error (Manual) Calibration
• Assume that you have P parameters and you want to
consider N levels of each.
• Then the total number of possible parameter sets is N^P.
• So with 10 parameters, considering only 2 values each
(a very crude evaluation), there are 2^10 = 1024 possible
combinations, too many to evaluate exhaustively for a
computationally expensive function.
• With 8 parameters and a more reasonable 10 values
each, there are 10^8 = 100 million possible combinations
of parameters!
• With so many possibilities it is hard to find good
solutions by trial and error with few (e.g. 100) function
evaluations.
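A two-line check of this growth:

```python
# Exhaustive grid size is N**P: N levels for each of P parameters.
print(2 ** 10)    # 10 parameters, 2 levels each  -> 1024 combinations
print(10 ** 8)    # 8 parameters, 10 levels each -> 100000000 (100 million)
```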
Automatic Calibration
• We would like to find the set of parameter
values (decision variables) such that
– the calibration error (objective function) is
minimized
– subject to constraints on the allowable range
of the parameter values.
This is an Optimization Problem.
It can be a global optimization problem.
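Written out, the automatic calibration problem is the box-constrained global minimization below; the symbols d, ℓ and u are generic notation, not tied to any particular model.

```latex
\min_{x \in D} f(x),
\qquad
D = \{\, x \in \mathbb{R}^{d} : \ell_{k} \le x_{k} \le u_{k},\ k = 1, \dots, d \,\}
```

Here f(x) is the calibration error (e.g. SSE, or 1 − NS as used later) for parameter vector x, and ℓ and u are the allowable lower and upper parameter limits.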
NSF Project 1: Function Approximation
Algorithms for Environment Analysis with
Application to Bioremediation of
Chlorinated Ethenes
• Title: “Improving Calibration, Sensitivity and
Uncertainty Analysis of Data-Based Models of
the Environment”.
• The project is funded by the NSF Environmental
Engineering Program.
• The following slides will discuss the application
of these concepts to uncertainty analysis.
“Real World Problem”: Engineered
Dechlorination by Injection of Hydrogen
Donor and Extraction
We have developed a user-friendly transport model of engineered
anaerobic degradation of chlorinated ethenes that models chemical
and biological species and utilizes MT3D and RT3D.
This model is the application for the function approximation
research.
Optimization
• Because our model is computationally
expensive, we need to find a better way
than trial and error to get a good
calibration set of parameters.
• Optimization can be used to efficiently
search for a “best” solution.
• We have developed optimization methods
that are designed for computationally
expensive functions.
Optimization
• Our goal is to find the minimum of f(x), where x ∈ D.
– f(x) can be a measure of error between model
predictions and observations; x can be the
parameter values.
• We want to do very few evaluations of f(x)
because it is “costly” to evaluate.
Global versus Local Minima
Many optimization methods find only one local minimum.
We want a method that finds the global minimum.
[Plot of F(x) versus x (parameter value), marking a local minimum and the global minimum.]
Experimental Design with Symmetric
Latin Hypercube (SLHD)
• To fit the first function approximation we
need to have evaluated the function at
several points.
• We use a symmetric Latin Hypercube
(SLHD) to pick these initial points.
• The number of points we evaluate in the
SLHD is (d+1)(d+2)/2, where d is the
number of parameters (decision
variables).
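Below is a minimal, illustrative construction of a symmetric Latin hypercube design (my own sketch; the published SLHD algorithm may differ in details). Each level appears exactly once per column, and every point has a mirror partner so the design is symmetric about the center of the box.

```python
import numpy as np

def slhd(d, rng=None):
    """Sketch of a symmetric Latin hypercube design with
    m = (d+1)(d+2)/2 points in [0, 1]^d (illustrative construction)."""
    rng = np.random.default_rng(rng)
    m = (d + 1) * (d + 2) // 2
    half = m // 2
    X = np.empty((m, d))
    for j in range(d):
        levels = rng.permutation(half)                # low levels, one per pair
        flip = rng.random(half) < 0.5                 # randomly mirror each pair
        lo = np.where(flip, m - 1 - levels, levels)
        X[:half, j] = lo                              # first point of each pair
        X[m - 1 - np.arange(half), j] = m - 1 - lo    # its mirror partner
    if m % 2 == 1:
        X[half, :] = (m - 1) / 2                      # center point when m is odd
    return (X + 0.5) / m                              # map integer levels to (0, 1)

design = slhd(10)   # d = 10 parameters -> 66 initial costly evaluations
```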
One Dimensional Example of Experimental Design
to Obtain Initial Function Approximation
[Plot of the objective function f(x) (a measure of error) versus x (parameter value; one-dimensional example), with dots at the initial design points. Each dot is a costly function evaluation (e.g. over 0.5 hour of CPU time for one evaluation).]
Function Approximation with Initial
Points from Experimental Design
[Plot of f(x) versus x (parameters) showing a smooth approximation fitted through the evaluated design points.]
In real applications x is multidimensional, since there are many
parameters (e.g. 10).
Update in Function Approximation with New Evaluation
[Plot of f(x) versus x (parameter value) showing the approximation updated after a new evaluation.]
The update is done in each iteration of each algorithm.
The function approximation is a guess of the value of f(x) for all x.
Use of Derivatives
• We use the gradient-based methods only
on the function approximations R(x) (for
which accurate derivatives are
inexpensive to compute).
• We do not try to compute
gradients/derivatives for the underlying
costly function f(x).
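As an illustration of this division of labor, the sketch below fits a surrogate to a handful of costly evaluations and then runs a cheap derivative-based search on the surrogate alone. scipy's RBFInterpolator stands in for the paper's radial basis function model, expensive_f is a hypothetical stand-in for the costly simulation, and the full algorithm's balancing of refinement near the best point against exploration of uncertain regions is omitted.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def expensive_f(x):         # hypothetical stand-in for the costly simulation
    return float(np.sum((x - 0.3) ** 2) + 0.1 * np.cos(8 * np.sum(x)))

d = 3                                            # number of parameters
rng = np.random.default_rng(0)
X = rng.random(((d + 1) * (d + 2) // 2, d))      # initial design points
y = np.array([expensive_f(x) for x in X])        # the only costly evaluations

surrogate = RBFInterpolator(X, y)                # cheap approximation R(x)

# Derivative-based search on R(x) only: numerical gradients of the surrogate
# are inexpensive, and f itself is never differentiated.
res = minimize(lambda x: float(surrogate(x.reshape(1, -1))[0]),
               x0=X[np.argmin(y)], bounds=[(0.0, 1.0)] * d)
x_next = res.x     # candidate at which to spend the next costly evaluation
```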
Our RBF Algorithm
• Our paper on the RBF optimization algorithm
will appear soon in the Journal of Global
Optimization.
• The following graphs show a related RBF
method, labeled “Our RBF”, as well as an
earlier RBF optimization method suggested by
Gutmann (2000) in the Journal of Global
Optimization, labeled “Gutmann RBF”.
Comparison of RBF Methods on a 14-dimensional
Schoen Test Function (Average of 10 trials)
[Plot: objective function (mean of the best value found) versus number of function evaluations (120–300), y-axis 15–45. Curves: ExpRBF-L, GutmannRBF, GreedyRBF; “Our RBF” attains the lowest objective values.]
Comparison of RBF Methods on a 12-dimensional Groundwater
Aerobic Bioremediation Problem (a PDE system)
(Average of 10 trials)
[Plot: objective function (mean of the best value found) versus number of function evaluations (80–200), y-axis 400–1100. Curves: ExpRBF-L, GutmannRBF, GreedyRBF; “Our RBF” attains the lowest objective values.]
The following results are from:
NSF Project 1: Function Approximation
Algorithms for Environment Analysis with
Application to Bioremediation of Chlorinated
Ethenes
• Title: “Improving Calibration, Sensitivity and
Uncertainty Analysis of Data-Based Models of
the Environment”.
• The project is funded by the NSF Environmental
Engineering Program.
Now a real costly function:
DECHLOR: Transport Model of
Anaerobic Bioremediation of Chlorinated
Ethenes
• This model was originally developed by
Willis and Shoemaker based on kinetics
equations by Fennell and Gossett.
• This model will be our “costly” function in
the optimization.
• The model is based on data from a field site
in California.
Complex model: 18 species at each of thousands
of nodes of the finite difference model
[Reaction network diagram: chlorinated ethenes (PCE → TCE → DCE → VC → ethene) degraded by dechlorinators; donors (lactate, propionate, butyrate, acetate, H2) and methane, linked by the reactions Lac2Prop, Lac2Ace, Prop2Ace, But2Ace, and Hyd2Meth.]
Example of Objective Function for
Optimization of Chlorinated Ethene Model

SSE = \sum_{t=1}^{T} \sum_{i=1}^{I} \sum_{j=1}^{J} \left( Y^{o}_{tij} - Y^{s}_{tij} \right)^{2}

where SSE is the sum of squared errors between
observed and simulated chlorinated ethenes,
Y^o_tij is the observed molar concentration of species j at time t,
location i,
Y^s_tij is the simulated molar concentration of species j at time t,
location i,
t = 1 to T are the time points at which measured data are available,
j = 1 to J indexes PCE, TCE, DCE, VC and ethene, in that order, and
i = 1 to I indexes the set of monitoring locations.
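In code, this objective is a one-liner once the observed and simulated concentrations are arranged in matching arrays (the shapes here are assumptions for illustration):

```python
import numpy as np

def sse(Y_obs, Y_sim):
    """Sum of squared errors between observed and simulated molar
    concentrations; both arrays have shape (T, I, J) = (time points,
    monitoring locations, species PCE/TCE/DCE/VC/ethene)."""
    return float(np.sum((np.asarray(Y_obs) - np.asarray(Y_sim)) ** 2))
```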
Algorithms Used for Comparison of
Optimization Performance on Calibration
• Stochastic Greedy Algorithm
– Neighborhood defined to make the search global
– Neighbors generated from a triangular distribution around the current
solution; moves only to a better solution
• Evolutionary Algorithms
– Derandomized evolution strategy (DES) with lambda = 10, b1 = 1/n
and b2 = 1/n^0.5 (Ostermeier et al. 1992)
– Binary or real genetic algorithm (GA): population size 10, one-point
cross-over, mutation probability 0.1, crossover probability 1
• RBF Function Approximation Algorithms
– Gutmann RBF: radial basis function approach with cycle length
five and SLH space-filling design
– RBF-Cornell: our radial basis function approach
• FMINCON
– Derivative-based optimizer in Matlab with numerical derivatives
• 10 trials of 100 function evaluations were performed for the heuristic
and function approximation algorithms for comparison
Comparison of algorithms for NS as objective
function on a hypothetical problem
[Plot: −(Average NS) versus number of function evaluations (30–90); lower curve is better. Curves: FMINCON, RBF-CORNELL (ours), RBF-GUT, FMINCON+RBF, DES, RealGA, BinaryGA.]
The average is based on 10 trials. The best possible value for −NS is
−1. 28 experimental design evaluations were done.
Boxplot comparing best objective value (CNS)
produced by the algorithms in each trial over 10 trials
[Boxplot with outliers and averages marked; “ours” has the lowest and tightest distribution.]
Conclusions
• Optimizing costly functions is typically done only
once.
• The purpose for our examination of multiple
trials is to examine how well one is likely to do if
you do solve the problem only once.
• Hence we want the method that has both the
smallest Mean objective function value and the
smallest Variance.
• Our RBF has both the smallest Mean and the
smallest Variance.
• The second best method is Gutmann RBF, so
RBF methods seem very good in general.
Alameda Field Data
• The next step was to work with a real field site.
• We obtained data from a DOD field site studied
by a group (including Alleman, Morse, Gossett,
and Fennell).
• Running the simulation model takes about three
hours for one run of the chlorinated ethene
model at this site because of the nonlinearities in
the kinetics equations.
Site Layout
[Figure: layout of the Alameda field site.]
Range of objective values for SSE objective function at
Alameda field site – mean, min and max are shown for each
algorithm
[Chart: SSE (mM)², y-axis 150,000–650,000, with mean, min and max for DES, FA-Gutmann, ours (FA-RS), and FMINCON (gradient-based).]
Conclusions on RBF Optimization
of Calibration
• Radial Basis Function Approximation Methods
can be used effectively to find optimal solutions
of costly functions.
• “Our RBF” performed substantially better than
the previous RBF method by Gutmann on the
difficult chlorinated ethene remediation problem,
especially because our RBF is robust (small
variance).
• Both Genetic algorithms and derivative-based
search did very poorly.
• The two RBF methods did much better on the
Alameda field data problem than the other methods.
However, 300 hours (100 evaluations at 3 hours
each) is a long time to wait!
Solution: Parallel Algorithms
• We would like to be able to speed up
calculations for costly functions by using parallel
computers.
• To get a good speed up on a parallel computer,
you need an algorithm that parallelizes
efficiently.
• We are developing such an algorithm through a
second NSF grant (from Computer and
Information Science Directorate).
III: Uncertainty Analysis
• Modelers have discovered that there is
often more than one set of parameters that
gives an “adequate” fit to the data.
• One approach to assessing the uncertainty
associated with a model output is to look
at the weighted mean and the variability of
the output associated with all the sets of
parameters that give an equally good fit.
More than one parameter value might give acceptable goodness
of fit
[Plot of f(x) versus x (parameters) with a horizontal “acceptable” threshold; evaluated points below the threshold are shown in black.]
If we impose a “filter” and allow only the acceptable points, then
only the black points are incorporated in the analysis.
Uncertainty Analysis: GLUE
Approach
• GLUE is a methodology (by Beven and
co-workers) used largely for watersheds
(where computation times are not long).
Uncertainty Analysis via GLUE: Dots are Model
Simulations of Parameter Combinations Chosen at
Random (Two Parameter Example)
[Scatter plot of parameter 2 versus parameter 1; one symbol marks parameter combinations that give R² greater than 0.75, another those that give R² less than 0.75.]
GLUE Methodology
(used mostly in watershed modeling)
• Step 1: Select combinations of parameter values
at random and simulate the model for each
combination.
• Step 2: Compare the goodness of fit (e.g. R²) of
each model simulation with the data.
• Step 3: Simulate the model at the acceptable points
and weight the output to determine variability
characteristics of the model output (e.g. mean and
variance of the amount of contamination remaining
after N years).
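A minimal sketch of these three steps (run_model, the parameter bounds, and the acceptability threshold are hypothetical placeholders; real GLUE applications also choose a formal likelihood weighting, here simply proportional to the goodness of fit):

```python
import numpy as np

def glue(run_model, bounds, n_samples, threshold, rng=None):
    """GLUE sketch. run_model(theta) -> (goodness_of_fit, output) is a
    hypothetical costly simulation, e.g. returning (R^2, contamination
    remaining after N years); bounds is a list of (low, high) pairs."""
    rng = np.random.default_rng(rng)
    lo, hi = np.asarray(bounds, dtype=float).T
    thetas = lo + rng.random((n_samples, lo.size)) * (hi - lo)  # Step 1
    kept, fits, outs = [], [], []
    for theta in thetas:                     # one costly run per sample
        fit, out = run_model(theta)
        if fit >= threshold:                 # Step 2: acceptability filter
            kept.append(theta); fits.append(fit); outs.append(out)
    if not kept:
        raise RuntimeError("no parameter set passed the filter")
    w = np.asarray(fits); w = w / w.sum()    # Step 3: weight by fit
    outs = np.asarray(outs)
    mean = float(w @ outs)
    var = float(w @ (outs - mean) ** 2)
    return np.asarray(kept), mean, var       # acceptable sets + weighted stats
```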
Problems with GLUE
Methodology
• We applied GLUE to the Cannonsville
Watershed SWAT model predictions for
sediment (a very hard quantity to model).
• We did 20,000 Monte Carlo runs (which
took about three weeks of computer time).
• Of the 20,000 runs, only two were within
the allowable R². (Only two!)
• This does not adequately characterize
uncertainty, and it is not computationally
feasible to make more runs.
• For computationally expensive models like
our groundwater problem or your
Everglades problem, it is not feasible to
run the model 20,000 times!
• Hence GLUE has the problem that it finds
very few samples within an acceptable
level (filter) if the filter is fairly stringent.
Groundwater Example Used for
Numerical Comparison with GLUE
• 2-D confined aquifer contaminated with chlorinated
ethenes
• Same PDE equations as the earlier field case
• 400 m long, 100 m wide
• Modeled using a coarse 10 m x 10 m finite difference grid
– Simulation time for the 6-month calibration period was
approximately ¾ minute on a Pentium 4 3 GHz
computer
– Typical simulation time for long-term forecast
scenarios is on the order of several hours to days
Calibration Problem
• Calibration of 3 parameters was considered – 2
biological parameters and one biokinetic
parameter
• Synthetic observations were generated for a
period of 6 months using a known set of
parameters
• Optimal calibration was attempted using a
response surface (RS) optimization method
(Regis and Shoemaker, 2004)
• GLUE-based calibration/uncertainty assessment
was also performed for comparison
Output Definition
• Output: the total moles of toxic
compounds (chlorinated ethenes)
remaining in the aquifer at the final time
period. (This cannot be measured but can
be estimated through the model.)
• Uncertainty in the output was analyzed
using GLUE and RS-based methods
Goodness-of-fit Measure
• Nash-Sutcliffe efficiency measure (Nash
and Sutcliffe, 1970):

NS = 1 - \frac{\sum_{i,j,t} \left( C^{sim}_{i,j,t} - C^{obs}_{i,j,t} \right)^{2}}{\sum_{i,j,t} \left( C^{obs}_{i,j,t} - C^{av} \right)^{2}}, \qquad -\infty < NS \le 1

where C^sim and C^obs are the simulated and observed
concentrations and C^av is the mean of the observations.
• The optimization algorithm was set up to
minimize CNS = 1 − NS, so that a CNS of
zero is best.
Uncertainty Estimates for Output Total Moles of
Chlorinated Ethenes Remaining
[Chart: total moles of chlorinated ethenes (y-axis 141–147) for RS200, G500, G1000, G2000, RSG20k, and TRUE; bounds obtained using a filter of 0.01 for CNS. Points remaining after the filter: 35 (RS200), 5 (G500), 6 (G1000), 12 (G2000), 126 (RSG20k).]
RS200 is our Method 1 with 200 function evaluations.
Uncertainty Estimates for Output Total Moles of
Chlorinated Ethenes Remaining
[Same chart as above.]
G500 is GLUE with 500 function evaluations (similarly G1000 and G2000).
Uncertainty Estimates for Total Moles of
Chlorinated Ethenes
[Same chart as above; TRUE marks the true answer.]
Each marker is the mean; the range covers 99% of the data.
Uncertainty Estimates for Total Moles of
Chlorinated Ethenes
[Same chart as above, annotated with the number of points remaining after applying the filter.]
RS200 uses 200 function evaluations. G200 found 0 solutions (none)
for this filter. G500 found only 5 solutions.
Uncertainty Estimates for Total Moles of
Chlorinated Ethenes
[Same chart as above.]
The mean estimate is almost perfect for our RS method and is far
off for the GLUE method with 250% as many points evaluated!
Uncertainty Estimates for Total Moles of
Chlorinated Ethenes
[Same chart as above.]
Even with 2000 function evaluations, GLUE has a much worse mean than
our RS method with only 1/10 as many function evaluations.
Our Method 2 (RSG)
• Step 1: Same as in Method 1
• Step 2: Construct a function approximation
surface of the output
• Step 3: Draw a large number of samples
from the function approximation; do further
costly function evaluations where the
function approximation is negative, and refit
the function approximation
• Step 4: Filter out points that are not
acceptable and compute statistics
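A sketch of Method 2, reusing the costly evaluations already made during calibration. RBFInterpolator again stands in for the function approximation, and the re-evaluate-and-refit step for negative approximations (Step 3) is only noted in a comment:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def rsg(thetas, cns, outputs, bounds, n_samples=20_000, cutoff=0.01, rng=None):
    """Method 2 (RSG) sketch: sample a cheap approximation instead of the
    costly model. thetas, cns and outputs come from the ~200 costly runs
    already made for calibration (Method 1)."""
    rng = np.random.default_rng(rng)
    fit_surf = RBFInterpolator(thetas, cns)       # approximates CNS(theta)
    out_surf = RBFInterpolator(thetas, outputs)   # approximates the output
    lo, hi = np.asarray(bounds, dtype=float).T
    cand = lo + rng.random((n_samples, lo.size)) * (hi - lo)  # Step 3 samples
    cns_hat = fit_surf(cand)
    # Full method: where cns_hat < 0 (impossible, since CNS = 1 - NS >= 0),
    # evaluate the real model there and refit; omitted in this sketch.
    keep = cns_hat <= cutoff                      # Step 4: apply the filter
    vals = out_surf(cand[keep])
    return float(vals.mean()), np.percentile(vals, [0.5, 99.5])  # mean, 99% range
```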
Uncertainty Estimates for Total Moles of
Chlorinated Ethenes
[Same chart as above; RSG20k is our Method 2.]
RSG20k uses 200 function evaluations and 20,000 samples from the
response surface.
Difference Between Method 1 and
Method 2
The uncertainty analysis in Method 1 is
based only on actual function evaluations.
The uncertainty analysis in Method 2 is
based on a very large number of samples
from the function approximation.
Comments on Results
• A strict filter produces very few points with GLUE
– even after 2000 function evaluations, only 12 points
remain after filtering
• Our RS method produces the tightest bounds
and also provides more points for uncertainty
assessment with only 200 function evaluations
– Limited with respect to sample independence
• The RSG provides an improvement over GLUE
– Independent samples for uncertainty assessment
– A larger sample size for a tight filter
Effect of Relaxing Filter – CNS of 0.1
[Chart: empirical 98% bounds on total moles of chlorinated ethenes (y-axis 135–165), obtained using a filter of 0.1 for CNS, for RS200, G200, G500, G1000, G2000, RSG20k, and TRUE. Numbers of points after filtering shown on the chart: 12, 44, 84, 167, 1542, 90.]
Percentage of Points for Different
Filters
[Chart: comparison of the percentage of points remaining after filtering for RS200, G200, G500, G1000, G2000, and RSG20k, at CNS filters of 0.01, 0.1, 0.3, 1, and infinity.]
Advantages of Method 2
• The samples are independent
• It reuses information from calibration
• It is computationally cheap:
– Uses only the same number of costly function
evaluations as the regular RS optimization
method (200 in these examples)
– Can obtain goodness-of-fit and output values
for many thousands of points
Summary
• Models can help us use data taken at small
spatial scales and at discrete time points to
understand and manage environmental
processes over large spatial areas and
time frames.
• Development of computationally efficient
methods for automatic calibration,
sensitivity analysis and uncertainty analysis
is very important.
New Project 2: Parallel
Optimization Algorithms
• Funded by the Computer Science (CISE)
Directorate at NSF
• The method is general and can be used
for a wide range of problems including
other engineering systems in addition to
environmental systems.
• This research is underway.
2. How are calibration, sensitivity
analysis and uncertainty
analysis used in environmental
analyses?
3. What are the alternatives to
sensitivity analysis and uncertainty
analysis?
How do we address the
uncertainties that are not directly
related to parameter uncertainty,
such as data uncertainty?
My NSF Projects
• NSF-Environmental Engineering: applications of
methods to watershed and groundwater
• NSF-Advanced Computing: development of
parallel algorithms for function approximation
optimization
• NSF-Statistics: development of an integration of
Bayesian statistical methods with function
approximation optimization for computationally
expensive functions.
• All this previously funded research can be useful
in applications to the Everglades.