
Transactions on Ecology and the Environment vol 17, © 1998 WIT Press, www.witpress.com, ISSN 1743-3541
Monte Carlo techniques for estimating solution quality in stochastic groundwater management models
David W. Watkins, Jr.*, David P. Morton† & Daene C. McKinney*
* Department of Civil Engineering, University of Texas at Austin, Austin, TX 78712
† Department of Mechanical Engineering, University of Texas at Austin, Austin, TX 78712
morton@mail.utexas.edu
daene mckinney@mail.utexas.edu
Abstract
For computational purposes, a stochastic groundwater management model with
minimizing value $z^*$ is often solved approximately by randomly sampling $n$
realizations of the model's stochastic parameters and then solving the resulting
"approximating problem" for $(x_n^*, z_n^*)$. It can be shown that, in expectation, $z_n^*$
is a lower bound on $z^*$ and that this lower bound monotonically improves as $n$
increases. Using this result, an approach based on Monte Carlo sampling and
solution of multiple instances of the $n$-scenario approximating problem is applied
to construct confidence intervals on the optimality gap for any candidate solution
$\hat{x}$, such as $\hat{x} = x_n^*$. A sampling procedure based on common random numbers
ensures non-negative estimates of the optimality gap and provides significant
variance reduction over naive sampling on a test problem involving the location
of pumping wells for contaminant plume containment.
Computer Methods in Water Resources XII
1 Introduction
Scenario-based stochastic optimization models have proven to be useful
ways of incorporating risk and uncertainty into the screening of water
resources decision alternatives. Solved using variational techniques (i.e.,
linear or nonlinear programming), these models avoid the "curse of
dimensionality" which often hinders the application of stochastic
dynamic programming techniques to problems with more than a few state
variables. However, scenario-based approaches come with their own
curse of dimensionality, which is associated with the number of
realizations that can reasonably be considered in the model. Although
decomposition techniques have made it possible to solve some large-scale
models, a relatively small number of scenarios, and thus a very
coarse discretization of the joint distribution of random parameters, is
often required.
Scenario-based optimization models have been applied to groundwater management problems by numerous authors (see the review by
Wagner[3]). Nearly all have recognized that the "true" reliability of a
candidate design (i.e., the probability that the design will satisfy the
model constraints under a realization of the random parameters) is likely
to be lower than that estimated by the model, but the model's estimate
improves as more realizations are considered. To assess the actual
reliability of a candidate design, post-optimality Monte Carlo simulation
has typically been used.
This work, based on that of Mak et al.[1], extends previous work by
constructing confidence intervals on the optimality gap, i.e., the difference
between the objective value of the candidate design and that of the optimal
design. As in previous studies, Monte Carlo sampling is used to assess the
expected performance of a candidate solution $\hat{x}$, which provides (in
expectation) an upper bound on the true minimizing value of the original
problem, $z^*$. However, the minimizing value $z_n^*$ of an $n$-scenario
stochastic program is also (in expectation) a lower bound on the true
minimizing value $z^*$, and this lower bound is monotonically nondecreasing
in $n$. Thus, repeated sampling and solution of "instances" of an $n$-scenario
program provide estimates of a lower bound on the true solution
value. By combining the upper and lower bound estimates, confidence
intervals on the optimality gap for a candidate solution $\hat{x}$ can be
constructed.
2 Groundwater management model
The problem considered is that of hydraulically containing a groundwater
contaminant plume in the presence of hydrogeologic uncertainty. As
shown in Figure 1, a finite-difference representation is used to model
steady flow in a heterogeneous, confined aquifer of constant depth.
Pumping wells are to be installed and operated such that groundwater
flow velocities are directed inwards toward the plume at a number of
check points. A scenario-based stochastic program is formulated with
two main assumptions. First, the spatial distribution of the hydraulic
conductivity is considered a random field with known parameters (mean,
variance, and correlation length) which can be represented by a finite
number of realizations. Second, the decision regarding well locations can
be represented by a two-stage process: the locations must be selected
"here and now," but more information will be available before the
pumping rates are determined ("wait and see").
[Figure 1 image not reproduced: finite-difference grid with a 550 m dimension label and constant-head boundaries of 100 m and 92 m.]
Figure 1. Finite-difference grid, gradient check points (→), and potential well sites (•).
The design objectives are threefold: minimize the expected (mean)
cost of installing and operating the wells; minimize the upper partial
mean of the cost distribution (a measure of the risk of cost overruns); and
minimize an expected penalty for not maintaining inward velocities at the
check points. These three objectives are combined using the weighting
method for multiobjective programming, as in the robust optimization
models of Mulvey et al.[2], and the resulting mixed-integer nonlinear
program is solved using the stochastic extension of generalized Benders
decomposition (see Watkins and McKinney[4]). A matrix decomposition
algorithm (Yang[5]) generates realizations of a heterogeneous, lognormally
distributed hydraulic conductivity field with a mean of $1.14 \times 10^{-4}$ m/s,
a standard deviation of the underlying normal distribution (of ln K) of
1.45, and a correlation length of 100 meters in all directions.
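The matrix decomposition step can be illustrated with a small sketch. It assumes an exponential covariance for $Y = \ln K$, treats the $1.14 \times 10^{-4}$ m/s figure as the arithmetic mean of $K$, and uses a plain Cholesky factorization; none of these choices are taken from Yang[5], and the grid and the function names (`field_factor`, `sample_conductivity`) are hypothetical.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def field_factor(points, sigma_y, corr_len):
    """Factor the covariance of Y = ln K at (x, y) points given in meters.

    Assumption: exponential covariance sigma_y^2 * exp(-h / corr_len).
    """
    cov = [[sigma_y ** 2 * math.exp(-math.dist(p, q) / corr_len) for q in points]
           for p in points]
    return cholesky(cov)


def sample_conductivity(L, mean_k, sigma_y, rng):
    """Draw one lognormal realization K = exp(mu_y + L z), z ~ N(0, I).

    Assumption: mean_k is the arithmetic mean of K, so mu_y = ln(mean_k) - sigma_y^2 / 2.
    """
    mu_y = math.log(mean_k) - 0.5 * sigma_y ** 2
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [math.exp(mu_y + sum(L[i][k] * z[k] for k in range(i + 1)))
            for i in range(len(L))]


# Hypothetical 6 x 4 grid with 50 m spacing (not the grid of Figure 1).
grid = [(50.0 * i, 50.0 * j) for i in range(6) for j in range(4)]
L = field_factor(grid, sigma_y=1.45, corr_len=100.0)
k_field = sample_conductivity(L, mean_k=1.14e-4, sigma_y=1.45, rng=random.Random(1))
```

Factoring the covariance once and reusing `L` keeps the cost of each additional realization to one triangular matrix-vector product, which matters when many scenarios must be generated.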
3 Monte Carlo bounds
In estimating an upper bound on $z^*$, consider a "good" but probably
suboptimal set of well locations $\hat{x} \in X$ obtained by some procedure,
possibly heuristic. The expected cost of operating the system with the
suboptimal decision vector, $E f(\hat{x}, \tilde{\xi})$, can be estimated by the standard
sample mean estimator
$$U(n_u) = \frac{1}{n_u} \sum_{i=1}^{n_u} f(\hat{x}, \tilde{\xi}^i),$$
where $\tilde{\xi}^1, \ldots, \tilde{\xi}^{n_u}$ are independent and identically distributed (i.i.d.) from
the distribution of $\tilde{\xi}$, a vector of random hydraulic conductivity values.
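Two important properties of this estimator are that it is an unbiased
estimator of the true cost of the suboptimal decision $\hat{x}$, i.e.,
$$E\,U(n_u) = E f(\hat{x}, \tilde{\xi}) \ge z^*, \qquad (1)$$
and that it satisfies the following central limit theorem (CLT):
$$\sqrt{n_u}\,\left[U(n_u) - E f(\hat{x}, \tilde{\xi})\right] \Rightarrow N(0, \sigma_u^2) \text{ as } n_u \to \infty, \text{ where } \sigma_u^2 = \operatorname{var} f(\hat{x}, \tilde{\xi}). \qquad (2)$$
This CLT, along with the standard sample variance estimator $s_u^2(n_u)$,
allows construction of confidence intervals for $E f(\hat{x}, \tilde{\xi})$.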
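A minimal sketch of eqns (1) and (2) in use: given costs $f(\hat{x}, \tilde{\xi}^i)$ already evaluated on $n_u$ i.i.d. realizations, it returns $U(n_u)$ and a normal-theory half-width $z_\alpha s_u(n_u)/\sqrt{n_u}$. The function name and the default $z_\alpha = 1.96$ are illustrative assumptions; in the groundwater model each cost would come from a flow simulation, which is not shown.

```python
import math
import statistics


def upper_bound_estimate(costs, z_alpha=1.96):
    """Return (U(n_u), half-width) from costs f(x_hat, xi^i), i = 1..n_u.

    U(n_u) is the sample mean, unbiased for E f(x_hat, xi) (eqn (1));
    the half-width z_alpha * s_u / sqrt(n_u) relies on the CLT of eqn (2).
    """
    n_u = len(costs)
    u = statistics.fmean(costs)
    s_u = statistics.stdev(costs)  # sample standard deviation, n_u - 1 denominator
    return u, z_alpha * s_u / math.sqrt(n_u)


u, half = upper_bound_estimate([1.0, 2.0, 3.0, 4.0])
print(f"U = {u:.2f} +/- {half:.2f}")  # prints: U = 2.50 +/- 1.27
```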
Estimation of a lower bound on $z^*$ requires the following two
theorems (see Mak et al.[1] for proofs):
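Theorem 1. Let $\tilde{\xi}^1, \ldots, \tilde{\xi}^n$ be i.i.d. from the distribution of $\tilde{\xi}$. Then,
$$E\left[\min_{x \in X} \frac{1}{n} \sum_{i=1}^{n} f(x, \tilde{\xi}^i)\right] \le \min_{x \in X} E f(x, \tilde{\xi}) = z^*. \qquad (3)$$
Theorem 2. Let $\tilde{\xi}^1, \ldots, \tilde{\xi}^{n+1}$ be i.i.d. from the distribution of $\tilde{\xi}$. Define
$$z_n^* = \min_{x \in X} \frac{1}{n} \sum_{i=1}^{n} f(x, \tilde{\xi}^i) \quad \text{and} \quad z_{n+1}^* = \min_{x \in X} \frac{1}{n+1} \sum_{i=1}^{n+1} f(x, \tilde{\xi}^i).$$
Then,
$$E z_n^* \le E z_{n+1}^*. \qquad (4)$$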
In solving the original problem for $z^*$, an optimal decision must be found
that hedges against all possible realizations of $\tilde{\xi}$. Theorem 1 states that
optimizing over only $n$ realizations of $\tilde{\xi}$ results, on average, in an
optimistic objective value because of "inside information." Intuitively, one
would expect this optimism to wane (and hence the lower bound estimate
of $z^*$ to increase) as $n$ increases. Theorem 2 makes this precise.
Just as eqns (1) and (2) can be used to construct confidence intervals
on an upper bound on z*, Theorem 1, exploited in a batch-means
approach, can provide confidence intervals on a lower bound for z*.
These results can then be combined to bound the optimality gap.
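Theorems 1 and 2 can be checked exactly on a toy problem small enough to enumerate every possible sample. The sketch below is hypothetical throughout (a five-point decision set, a cost of 1 per unit installed plus 3 per unit of shortfall, and $\tilde{\xi}$ uniform on {0, ..., 4}); it is not the groundwater model. Exact rational arithmetic is used so that eqns (3) and (4) can be compared without Monte Carlo noise.

```python
from fractions import Fraction
from itertools import product

XI_VALUES = range(5)  # hypothetical: xi uniform on {0, 1, 2, 3, 4}
X = range(5)          # hypothetical finite decision set


def f(x, xi):
    """Hypothetical cost: 1 per unit installed plus 3 per unit of shortfall."""
    return x + 3 * max(0, xi - x)


def z_star():
    """True optimal value z* = min_x E f(x, xi), computed exactly."""
    return min(Fraction(sum(f(x, xi) for xi in XI_VALUES), len(XI_VALUES)) for x in X)


def expected_zn(n):
    """E z_n^*, computed exactly by enumerating all equally likely n-samples."""
    samples = list(product(XI_VALUES, repeat=n))
    total = sum(min(Fraction(sum(f(x, xi) for xi in s), n) for x in X) for s in samples)
    return total / len(samples)


for n in range(1, 5):
    print(n, float(expected_zn(n)), "<=", float(z_star()))
```

By Theorems 1 and 2, the printed values of $E z_n^*$ are nondecreasing in $n$ and never exceed $z^* = 3.6$.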
4 Confidence intervals and variance reduction
Two approaches are taken for constructing confidence intervals on the
optimality gap $E f(\hat{x}, \tilde{\xi}) - z^*$ with respect to a candidate solution $\hat{x}$. The
first approach uses independent streams of realizations of $\tilde{\xi}$ to estimate
the upper and lower bounds separately. The second approach uses the
variance reduction technique known as "common random numbers" to
estimate the optimality gap more directly.
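In the first approach, define
$$z_n^{*i} = \min_{x \in X} \frac{1}{n} \sum_{j=1}^{n} f(x, \tilde{\xi}^{ij}) \quad \text{and} \quad L(n_\ell) = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} z_n^{*i},$$
where $\tilde{\xi}^{i1}, \ldots, \tilde{\xi}^{in}$, $i = 1, \ldots, n_\ell$, are i.i.d. batches of random vectors. Then,
$$\sqrt{n_\ell}\,\left[L(n_\ell) - E z_n^*\right] \Rightarrow N(0, \sigma_\ell^2) \text{ as } n_\ell \to \infty, \text{ where } \sigma_\ell^2 = \operatorname{var} z_n^*. \qquad (5)$$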
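Next, let $z_\alpha$ satisfy $P\{N(0,1) \le z_\alpha\} = 1 - \alpha$ and define
$$\epsilon_\ell = \frac{z_\alpha\, s_\ell(n_\ell)}{\sqrt{n_\ell}} \quad \text{and} \quad \epsilon_u = \frac{z_\alpha\, s_u(n_u)}{\sqrt{n_u}},$$
where $s_\ell^2(n_\ell)$ is the sample variance of $z_n^{*1}, \ldots, z_n^{*n_\ell}$.
Then, an approximate $(1-2\alpha)$-level confidence interval for the optimality
gap at $\hat{x}$ is given by (see Mak et al.[1] for details):
$$\left[0,\; [U(n_u) - L(n_\ell)]^+ + \epsilon_\ell + \epsilon_u\right], \quad \text{where } [y]^+ = \max\{y, 0\}. \qquad (6)$$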
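Eqns (5) and (6) combine into a short routine. The sketch below assumes the batch optimal values $z_n^{*i}$ and the upper-bound costs $f(\hat{x}, \tilde{\xi}^j)$ have already been computed on independent streams; the function name `naive_gap_ci` is hypothetical, and $z_\alpha$ is passed in (for example 1.96 for $\alpha = 0.025$, which gives a 95% level).

```python
import math
import statistics


def naive_gap_ci(batch_optima, upper_costs, z_alpha):
    """Approximate (1 - 2*alpha) interval of eqn (6) from independent streams.

    batch_optima: z_n^{*i}, i = 1..n_l (optimal values of n-scenario problems).
    upper_costs:  f(x_hat, xi^j), j = 1..n_u, drawn independently of the batches.
    """
    n_l, n_u = len(batch_optima), len(upper_costs)
    lower = statistics.fmean(batch_optima)                 # L(n_l)
    upper = statistics.fmean(upper_costs)                  # U(n_u)
    eps_l = z_alpha * statistics.stdev(batch_optima) / math.sqrt(n_l)
    eps_u = z_alpha * statistics.stdev(upper_costs) / math.sqrt(n_u)
    return 0.0, max(upper - lower, 0.0) + eps_l + eps_u   # [y]^+ = max{y, 0}


print(naive_gap_ci([1.0, 3.0], [4.0, 6.0], z_alpha=1.0))
```

Because the two error terms simply add, any imbalance between $\epsilon_\ell$ and $\epsilon_u$ widens the interval; this is why $n_u$ is usually chosen so that the two are of similar size.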
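Rather than constructing a confidence interval for the optimality gap
by estimating upper and lower bounds separately, the approach based on
common random numbers (CRN) uses the following result of Theorem 1:
$$E\left[\frac{1}{n}\sum_{i=1}^{n} f(\hat{x}, \tilde{\xi}^i) - \min_{x \in X} \frac{1}{n}\sum_{i=1}^{n} f(x, \tilde{\xi}^i)\right] \ge E f(\hat{x}, \tilde{\xi}) - z^*. \qquad (7)$$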
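This allows use of a batch-means approach to estimate $E G_n$, where $G_n$
denotes the random variable inside the expectation on the left-hand side of
Eq (7); the same set of observations is used in both its upper- and
lower-bound terms. Let $G_n^i$, $i = 1, \ldots, n_\ell$, be i.i.d. observations of
the optimality gap, each computed from its own batch of $n$ realizations, and
define
$$\bar{G}(n_\ell) = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} G_n^i \quad \text{and} \quad \epsilon_g = \frac{z_\alpha\, s_g(n_\ell)}{\sqrt{n_\ell}}, \qquad (8)$$
where $s_g^2(n_\ell)$ is the sample variance of $G_n^1, \ldots, G_n^{n_\ell}$.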
Then, $[0, \bar{G}(n_\ell) + \epsilon_g]$ is an approximate $(1-\alpha)$-level confidence interval for
the optimality gap at $\hat{x}$ (see Mak et al.[1] for details). Since $\hat{x} \in X$, each
$G_n^i \ge 0$, so there is no possibility of negative gap estimates. Furthermore,
significant variance reduction over the previous method may be obtained since the
upper- and lower-bound estimators are likely to be positively correlated.
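The CRN estimator of eqns (7) and (8) differs from the naive one mainly in that both terms of each $G_n^i$ are computed on the same batch. A minimal sketch on a hypothetical toy problem (cost of 1 per unit installed plus 3 per unit of shortfall, $\tilde{\xi}$ uniform on [0, 4], decisions {0, ..., 4}) follows; the names are illustrative and the example does not model groundwater flow. Here $z_\alpha = 1.645$ corresponds to the one-sided 95% interval.

```python
import math
import random
import statistics

DECISIONS = range(5)  # hypothetical finite decision set


def f(x, xi):
    """Hypothetical cost: 1 per unit installed plus 3 per unit of shortfall."""
    return x + 3 * max(0.0, xi - x)


def gap_observation(x_hat, batch):
    """One G_n^i: mean cost of x_hat minus the n-scenario optimum, on the SAME batch."""
    n = len(batch)
    upper = sum(f(x_hat, xi) for xi in batch) / n
    lower = min(sum(f(x, xi) for xi in batch) / n for x in DECISIONS)
    return upper - lower  # >= 0 because x_hat is one of the decisions in the min


def crn_gap_ci(x_hat, n, n_l, z_alpha, rng):
    """Return (G_bar(n_l), (0, G_bar + eps_g)) following eqns (7) and (8)."""
    gaps = []
    for _ in range(n_l):
        batch = [rng.uniform(0.0, 4.0) for _ in range(n)]  # hypothetical xi distribution
        gaps.append(gap_observation(x_hat, batch))
    g_bar = statistics.fmean(gaps)
    eps_g = z_alpha * statistics.stdev(gaps) / math.sqrt(n_l)
    return g_bar, (0.0, g_bar + eps_g)


g_bar, ci = crn_gap_ci(x_hat=3, n=50, n_l=40, z_alpha=1.645, rng=random.Random(0))
print(f"G_bar = {g_bar:.3f}, CI = [0, {ci[1]:.3f}]")
```

Because $\hat{x}$ is one of the candidates in the inner minimization, every observation is nonnegative, which is the property noted above.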
5 Computational results
The proposed techniques were applied to the problem described in
Section 2. Results of the solution strategy based on naive sampling with
independent random number streams are shown in Table 1. In
calculating the lower-bound estimator, $n_\ell = 40$ batches are used, with
each batch consisting of $n = 50$ i.i.d. observations of the hydraulic
conductivity field. The upper bound is computed using i.i.d. observations
that are independent of those used in estimating the lower bound. The
upper bound is estimated with respect to a candidate integer solution
$\hat{x} = \operatorname{mode}(x_n^{*1}, \ldots, x_n^{*n_\ell})$, i.e., the most frequently
occurring solution, where $x_n^{*i}$ is the vector of optimal well locations in
the $i$th of the $n_\ell$ approximating problems. The sample size $n_u$ used for
the upper-bound estimator is selected to yield an error estimate $\epsilon_u$ on
the order of $\epsilon_\ell$.
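The candidate $\hat{x} = \operatorname{mode}(x_n^{*1}, \ldots, x_n^{*n_\ell})$ is simply the most frequently occurring solution vector. A minimal sketch, assuming each solution is a 0/1 vector over the potential well sites and that ties go to the solution seen first (the paper does not state a tie-breaking rule); `mode_candidate` is a hypothetical name.

```python
from collections import Counter


def mode_candidate(batch_solutions):
    """Return the most frequent well-location vector among the batch optima.

    Counter.most_common orders equal counts by first occurrence, so ties
    go to the earliest batch (an assumption, not the paper's rule).
    """
    counts = Counter(tuple(x) for x in batch_solutions)
    best, _ = counts.most_common(1)[0]
    return list(best)


# Hypothetical optima from four batches over three potential well sites.
print(mode_candidate([[1, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # [1, 0, 1]
```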
Table 1. Results from naive sampling (lower-bound batch size $n = 50$;
number of batches $n_\ell = 40$; upper-bound sample size $n_u = 1000$).

                    Lower Bound      Upper Bound      Optimality Gap
Point estimate      L(n_ℓ): 47.3     U(n_u): 68.0     [U(n_u) - L(n_ℓ)]⁺: 20.7
Error estimate      ε_ℓ: 4.4         ε_u: 12.3
Conf. int. (95%)                                      [0, 37.4]
CPU (min.)          102.7            32.6             134.3
Computational results for the CRN strategy are shown in Table 2.
Unlike in the naive strategy, the candidate solution $\hat{x}$ cannot be derived
from the approximating problems used to estimate the optimality gap,
because eqn (7) requires $\hat{x}$ to be fixed independently of the realizations
used in each $G_n^i$. In order to keep the CRN strategy self-contained, $\hat{x}$ is
computed by solving an initial approximating problem with twice as many
scenarios (100) as are used in each batch of the lower-bound estimation. The
CPU times reported for gap estimation include the solution time for this
initial approximating problem, the time to solve the 40 approximating
problems used for lower-bound estimation, and the time for calculating the
upper-bound terms in $G_n^i$ (see Eq (7)). The upper-bound estimates reported
in Table 2 are not necessary to generate the confidence interval on the
optimality gap, but they do allow for comparison of the two methods of
generating a candidate solution $\hat{x}$.
The CPU times relevant to comparison of the two strategies are the
"Optimality Gap" times in Tables 1 and 2.
Table 2. Results from sampling with common random numbers (sample size used
to generate $\hat{x}$: 100; batch size $n = 50$; number of batches $n_\ell = 40$;
upper-bound sample size $n_u = 1000$).

                    Optimality Gap   Upper Bound
Point estimate      Ḡ(n_ℓ): 5.8      U(n_u): 46.2
Error estimate      ε_g: 2.0         ε_u: 6.9
Conf. int. (95%)    [0, 7.8]
CPU (min.)          104.3            32.8
For this test problem, the computational savings of the CRN strategy
over the naive strategy (or, alternatively, the improvement of the
confidence interval with similar computational effort) are substantial. These
savings can be quantified by the "variance reduction" factor, calculated as
$((\epsilon_\ell + \epsilon_u)/\epsilon_g)^2 = ((4.4 + 12.3)/2.0)^2 \approx 69.7$.
When $\epsilon_\ell \approx \epsilon_u$, this value is the approximate
factor by which the sample sizes in the naive strategy would have to be
increased in order to achieve the confidence interval width provided by
the CRN strategy. In other words, the naive strategy would require that
both $n_\ell$ and $n_u$ be increased by a factor of about 70 to achieve results
similar to those from the CRN strategy.
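The factor follows from the $1/\sqrt{n}$ scaling of the error estimates in eqns (2) and (5): multiplying both $n_\ell$ and $n_u$ by $k$ divides $\epsilon_\ell + \epsilon_u$ by roughly $\sqrt{k}$, so matching $\epsilon_g$ requires $k \approx ((\epsilon_\ell + \epsilon_u)/\epsilon_g)^2$. A quick check of the arithmetic with the table values (the function name is illustrative):

```python
import math


def variance_reduction_factor(eps_l, eps_u, eps_g):
    """((eps_l + eps_u) / eps_g)^2: approximate sample-size multiplier for the naive strategy."""
    return ((eps_l + eps_u) / eps_g) ** 2


k = variance_reduction_factor(4.4, 12.3, 2.0)
print(round(k, 1))  # 69.7
# Scaling both naive sample sizes by k shrinks eps_l + eps_u by sqrt(k), down to eps_g.
print(round((4.4 + 12.3) / math.sqrt(k), 6))  # 2.0
```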
6 References
[1] Mak, W.-K., Morton, D.P., and Wood, R.K. (1997). Monte Carlo bounding techniques for determining solution quality in stochastic programs, Dept. Rept., Dept. of Mech. Eng., Univ. of Texas, Austin, Tex.
[2] Mulvey, J.M., Vanderbei, R.J., and Zenios, S.A. (1995). Robust optimization of large-scale systems, Opns. Res., 43(2), 264-281.
[3] Wagner, B.J. (1995). Recent advances in simulation-optimization ground-water management modeling, in U.S. Natl. Rept. to IUGG 1991-1994: Contributions in Hydrology, 1021-1028.
[4] Watkins, D.W. Jr., and McKinney, D.C. (1997). Finding robust solutions to water resources problems, J. Water Resour. Plng. and Mgmt., ASCE, 123(1), 49-58.
[5] Yang, A.P. (1990). Stochastic Heterogeneity and Dispersion, Ph.D. Dissertation, Univ. of Texas, Austin, Tex.