PPT - Institute for Mathematical Sciences

Mechanism-Based Emulation of
Dynamic Simulation Models –
Concept and Application in Hydrology
Peter Reichert
Eawag Dübendorf and ETH Zürich
Switzerland
Eawag: Swiss Federal Institute of Aquatic Science and Technology
Contents
• Motivation
• Concept of Emulators
• General Concept
• Gaussian Process Emulator
• Dynamic Emulator
• Implementation
• Application
• Discussion and Outlook
Data-driven and physically-based models,
IMS, Singapore, Jan. 2008
Motivation
Problem
• Many important systems analytical techniques,
such as optimization, sensitivity analysis, and
statistical inference (e.g. Bayesian inference using
MCMC), require a large number of model
evaluations.
• Many environmental simulation models are
computationally demanding.
• Model-based analysis of environmental systems is
often limited by computational requirements.
Solution Strategies
1. Improve the efficiency of the implementation of
environmental simulation models.
2. Improve the efficiency of the implementation of
systems analytical techniques.
3. Replace the simulation model by a simplified
statistical description, an emulator.
• Obviously, all three strategies must be followed.
• This talk is about recent progress with strategy 3:
the construction and use of emulators of dynamic
environmental simulation models.
Concept
Emulator:
An emulator is a statistical approximation of a
deterministic simulation model.
It can be used for interpolating model results between
simulation results obtained at carefully chosen design
points in model input space.
Replacing the simulation model by the emulator can
tremendously increase the efficiency of analyses
(but it also introduces additional uncertainty).
The emulator provides a deterministic interpolation
result as well as a probability distribution representing
our knowledge of the uncertainty of emulation.
Gaussian Process Emulators:
Emulators have been constructed quite successfully by
setting up a Gaussian process prior with a mean
consisting of a linear combination of basis functions
and then conditioning this prior on the design data.
O'Hagan 2006
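A minimal sketch of such a Gaussian process emulator, under several assumptions that are not from the talk: a single input parameter, a squared-exponential covariance with fixed hyperparameters, a constant-plus-linear basis for the mean, and basis coefficients estimated by generalized least squares and then plugged in (so their uncertainty is ignored). The names `sq_exp`, `basis`, `gp_emulator` and `toy_model` are illustrative only.

```python
import numpy as np

def sq_exp(a, b, length=0.2, var=1.0):
    """Squared-exponential covariance between 1-D input arrays (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def basis(theta):
    """Basis functions of the prior mean: constant and linear term (illustrative)."""
    return np.column_stack([np.ones_like(theta), theta])

def gp_emulator(theta_d, y_d, theta_new, jitter=1e-10):
    """Condition a GP prior with basis-function mean on the design data.

    Returns emulator mean and variance at theta_new (plug-in coefficients).
    """
    K = sq_exp(theta_d, theta_d) + jitter * np.eye(len(theta_d))
    H = basis(theta_d)
    # Generalized least-squares estimate of the basis coefficients.
    beta = np.linalg.solve(H.T @ np.linalg.solve(K, H),
                           H.T @ np.linalg.solve(K, y_d))
    # Condition the zero-mean residual process on the design residuals.
    resid = y_d - H @ beta
    k_star = sq_exp(theta_new, theta_d)
    mean = basis(theta_new) @ beta + k_star @ np.linalg.solve(K, resid)
    cov = sq_exp(theta_new, theta_new) - k_star @ np.linalg.solve(K, k_star.T)
    return mean, np.clip(np.diag(cov), 0.0, None)

def toy_model(theta):
    """Stand-in for an expensive simulation model (hypothetical)."""
    return np.sin(3.0 * theta) + theta

theta_design = np.linspace(0.0, 1.0, 8)       # design points
y_design = toy_model(theta_design)            # design data
theta_test = np.array([0.0, 0.37, 1.0])
mean, var = gp_emulator(theta_design, y_design, theta_test)
```

At the design points the emulator reproduces the simulation results with essentially zero variance; between them it interpolates and reports a non-negative emulation variance.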
Gaussian Process Emulators:
Limitations:
1. Dense output in the time domain leads to numerical
difficulties (large size and poor conditioning of
matrices to be inverted).
2. The knowledge about the mechanisms built into the
simulation program is not used.
It can be expected that we could build a better
emulator when using this knowledge. This is of
particular importance if the design set is small.
This raises the question of how to build an emulator of a
dynamic model that resolves both of these issues.
Emulators for Dynamic Models:
Three Options:
1. Application of Gaussian processes with the time
dimension as an additional input.
This can lead to very large and poorly conditioned
matrices to invert, and thus to numerical problems.
2. For Markovian or state-space models: emulate the
transfer function from one state to the next instead
of the complete dynamic response.
3. Use a simple dynamic model as a prior and model the
innovations as Gaussian processes in the other
input dimensions. These Gaussian processes
correct for the bias in the simple model.
Emulators for Dynamic Models:
All emulators proposed so far (to my knowledge) do not
consider our knowledge about the mechanisms
implemented in the simulation model (with the exception
of a problem-specific choice of basis functions).
Approach proposed in this talk:
• Use a simplified, linear state-space model to
describe the approximate dynamics of the
simulation model.
• Formulate the innovations as Gaussian processes
of parameters (and potentially other input).
• Derive the emulator (posterior) by Kalman
smoothing.
Implementation
Construction of Emulators:
We can distinguish five steps of emulator
development:
1. Choice of Design Data
2. Choice of a Simplified Probabilistic Model
3. Coupling of Replicated Simplified Models
4. Conditioning the Simplified Model on the Design
Data
5. Calculation of Expected Value and Uncertainty
1. Choice of Design Data:
Often parameter values are chosen by Latin hypercube
sampling from reasonable domains of model
parameters. However, adaptive sampling schemes
could be used that increase the density of sampling
points in regions of high variability of results.
The design data set consists of these parameter
values and the corresponding simulation results:
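In generic notation (an assumption, not the notation of the slide), the design data set is a list of pairs of a parameter vector and the corresponding simulation output. A minimal sketch of Latin hypercube sampling and of assembling such a design data set follows; `latin_hypercube`, `toy_model` and the parameter domains are hypothetical and only serve as illustration.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw a Latin hypercube sample.

    bounds: list of (low, high) per parameter. Each parameter range is split
    into n_samples equal strata and every stratum is hit exactly once.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum, then shuffle the strata order.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns.append([low + p * (high - low) for p in points])
    # Transpose: one row per design point.
    return [list(row) for row in zip(*columns)]

def toy_model(theta):
    """Stand-in for the expensive simulation model (hypothetical)."""
    k, h0 = theta
    return [h0 * (1.0 - k) ** t for t in range(5)]

bounds = [(0.05, 0.5), (1.0, 10.0)]           # hypothetical parameter domains
design_parameters = latin_hypercube(10, bounds)
design_data = [(theta, toy_model(theta)) for theta in design_parameters]
```

An adaptive scheme would add further points where the simulation results vary strongly; that refinement is not shown here.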
2. Choice of a Simplified Probabilistic Model:
The emulator is based on a simplified probabilistic
model M' of the simulation model M.
This model expresses our prior beliefs about the behaviour
of the deterministic simulation model.
Its likelihood function is given by:
3. Coupling of Replicated Simplified Models:
The augmented model consists of n replicates of the
simplified model for different parameter values:
These models are stochastically coupled.
Probabilities here represent beliefs in a Bayesian
sense.
We construct a model with n = n_D + 1 replicates of the
simplified model. These correspond to models for the
n_D design parameter sets and for the emulation
parameter set.
4. Conditioning the Simplified Model on the
Design Data:
We calculate the distribution of the last set of
components conditional on results for the first n_D sets
of components:
The emulator is obtained by integrating out additional
parameters:
5. Calculation of Expected Value and Uncertainty:
The expected value provides the deterministic
emulator:
The variance-covariance matrix of the emulator is a
quantification of emulation uncertainty.
Gaussian Process Emulator
1. Choice of Design Data:
Often parameter values are chosen by Latin hypercube
sampling from reasonable domains of model
parameters. However, adaptive sampling schemes
could be used that increase the density of sampling
points in regions of high variability of results.
The design data set consists of these parameter
values and the corresponding simulation results:
2. Choice of a Simplified Probabilistic Model:
The simplified probabilistic model consists of a
deterministic model plus a multivariate normal error
term with mean zero:
The simplified model can contain additional
parameters. Often a linear combination of suitably
chosen basis functions is used:
3. Coupling of Replicated Simplified Models:
The augmented model consists of independent
replications of the deterministic simplified model and
error terms that are stochastically coupled:
A simple stochastic coupling is obtained by:
4. Conditioning the Simplified Model on the
Design Data:
The augmented model is then multivariate normal.
For this reason, we can apply the standard result for
conditioning a multivariate normal distribution on some
of its components:
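For reference, the standard result can be written as follows. The notation is generic and an assumption of this sketch (it need not match the symbols used on the slide): the vector is partitioned into the components conditioned on, y_1, and the remaining components, y_2.

```latex
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\right)
\quad\Longrightarrow\quad
y_2 \mid y_1 \sim N\!\left(
\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(y_1 - \mu_1),\;
\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
\right)
```

In the emulation context, y_1 would hold the design results and y_2 the output at the new parameter values, so that only the block belonging to the design data has to be inverted.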
This leads to the emulator as a multivariate normal
distribution:
with
5. Calculation of Expected Value and Uncertainty:
O'Hagan 2006
Dynamic Emulator
Dynamic models (and their emulators) have a
structured output:
1. Choice of Design Data:
Often parameter values are chosen by Latin hypercube
sampling from reasonable domains of model
parameters. However, adaptive sampling schemes
could be used that increase the density of sampling
points in regions of high variability of results.
The design data set consists of these parameter
values and the corresponding simulation results:
2. Choice of a Simplified Probabilistic Model:
Concept:
Use of a state-space model – emulation of "observed"
output only.
Reasons:
• This accounts for the typical "hidden Markov"
structure of environmental simulation models.
• It allows us to implement an emulator with a simplified
(lower dimensional) state space.
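A generic linear Gaussian state-space model of this kind can be sketched as follows. All symbols are assumptions made for illustration and are not taken from the slides: x_t is the hidden state, y_t the emulated ("observed") output, θ the parameter vector, F_t, g_t and H_t the system matrix, forcing and output matrix, and ε_t the innovations, which are correlated across parameter values through a correlation function c.

```latex
x_t(\theta) = F_t(\theta)\, x_{t-1}(\theta) + g_t(\theta) + \varepsilon_t(\theta),
\qquad
y_t(\theta) = H_t\, x_t(\theta),
\qquad
\mathrm{Cov}\!\left[\varepsilon_t(\theta),\, \varepsilon_t(\theta')\right] = \Sigma\; c(\theta, \theta')
```

The correlation of the innovations over θ is what allows results at the design parameter sets to inform the output at a new parameter set.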
3. Coupling of Replicated Simplified Models:
Augmented Model (1):
Augmented Model (2):
Augmented Model (3): Stochastic coupling
4. Conditioning the Simplified Model on the
Design Data:
Kalman (forward) filtering (Künsch, 2001):
Kalman (backward) smoothing (Künsch, 2001):
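The forward and backward recursions can be sketched for the simplest possible case. This is a textbook scalar Kalman filter with a Rauch-Tung-Striebel smoother, not the augmented multivariate model of the talk; the model x_t = f x_{t-1} + w_t, y_t = x_t + v_t, the function names and all numeric values are assumptions for illustration. Time steps without an observation are passed as `None`.

```python
def kalman_filter(ys, f, q, r, m0, p0):
    """Forward pass for x_t = f*x_{t-1} + w_t, y_t = x_t + v_t (scalar case).

    q and r are the variances of w_t and v_t; ys may contain None.
    Returns predicted and filtered means and variances for every time step.
    """
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    for y in ys:
        # Prediction step.
        mp, pp = f * m, f * f * p + q
        # Update step (skipped when there is no observation).
        if y is None:
            m, p = mp, pp
        else:
            gain = pp / (pp + r)
            m, p = mp + gain * (y - mp), (1.0 - gain) * pp
        m_pred.append(mp)
        p_pred.append(pp)
        m_filt.append(m)
        p_filt.append(p)
    return m_pred, p_pred, m_filt, p_filt

def rts_smoother(f, m_pred, p_pred, m_filt, p_filt):
    """Backward (Rauch-Tung-Striebel) pass using the stored filter results."""
    m_s, p_s = m_filt[:], p_filt[:]
    for t in range(len(m_filt) - 2, -1, -1):
        c = p_filt[t] * f / p_pred[t + 1]
        m_s[t] = m_filt[t] + c * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + c * c * (p_s[t + 1] - p_pred[t + 1])
    return m_s, p_s

ys = [1.0, 0.8, None, 0.55, 0.4]              # hypothetical data, one gap
pred_m, pred_p, filt_m, filt_p = kalman_filter(ys, f=0.8, q=0.01, r=0.04, m0=1.0, p0=1.0)
smooth_m, smooth_p = rts_smoother(0.8, pred_m, pred_p, filt_m, filt_p)
```

The smoothed variance is never larger than the filtered one, and it is strictly smaller at the gap because later observations are used as well.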
5. Calculation of Expected Value and Uncertainty:
Calculation of expected value and variance-covariance
matrix of last set of components:
Implementation
Due to the dependence on
(which depends on the design data as well as on the
new parameter values), the smoothing step is very
inefficient.
By using the general matrix identity
we are able to separate out the inversion of the large
sub-matrix that depends only on the design data. This
makes the procedure much more efficient as we do
not have to perform large matrix inversions when
using the emulator at new parameter values.
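The identity itself is not part of this text. One standard identity with this property is the block-matrix inverse via the Schur complement, and the sketch below assumes that form: for a symmetric matrix with blocks A (large, design data only), B and D (small, involving the new parameter values), the full inverse is assembled from a precomputed inverse of A and the inverse of the small Schur complement. Block sizes and the random test matrix are illustrative.

```python
import numpy as np

def block_inverse(A_inv, B, D):
    """Inverse of the symmetric matrix [[A, B], [B.T, D]] given a precomputed A_inv.

    Only the small Schur complement S = D - B.T @ A_inv @ B is inverted here.
    """
    AiB = A_inv @ B
    S_inv = np.linalg.inv(D - B.T @ AiB)
    top_left = A_inv + AiB @ S_inv @ AiB.T
    top_right = -AiB @ S_inv
    return np.block([[top_left, top_right], [top_right.T, S_inv]])

rng = np.random.default_rng(1)
n_design, n_new = 50, 2                       # illustrative block sizes
n = n_design + n_new
X = rng.normal(size=(n, n))
M = X @ X.T + n * np.eye(n)                   # symmetric positive definite test matrix
A = M[:n_design, :n_design]
B = M[:n_design, n_design:]
D = M[n_design:, n_design:]

A_inv = np.linalg.inv(A)                      # done once, design block only
M_inv = block_inverse(A_inv, B, D)            # cheap for each new (B, D)
max_error = np.abs(M_inv @ M - np.eye(n)).max()
```

For every new parameter set only B and D change, so the expensive inversion of A is not repeated.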
Application
Hydrological Model
Simple Hydrological Watershed Model (1):
[Diagram: soil, groundwater and river storages connected by the fluxes q_rain, q_et, q_runoff, q_lat, q_gw, q_bf, q_dp and q_r]
\[ \frac{dh_s}{dt} = (q_{rain} - q_{runoff}) - q_{et} - q_{lat} - q_{gw} \]
\[ \frac{dh_{gw}}{dt} = q_{gw} - q_{bf} - q_{dp} \]
\[ \frac{dh_r}{dt} = q_{runoff} + q_{lat} + q_{bf} - q_r \]
Kuczera et al. 2006
Simple Hydrological Watershed Model (2):
\[ q_{rain} = \mathrm{rain}(t) \]
\[ q_{runoff} = f_{sat}\, \mathrm{rain}(t) \]
\[ q_{et} = \left(1 - \exp(-k_{et} h_s)\right) f_{pet}\, \mathrm{pet}(t) \]
\[ q_{lat} = f_{sat}\, q_{lat,max} \]
\[ q_{gw} = f_{sat}\, q_{gw,max} \]
\[ q_{bf} = k_{bf} h_{gw} \]
\[ q_{dp} = k_{dp} h_{gw} \]
\[ q_r = k_r h_r \]
\[ f_{sat} = \frac{1}{1 + s_F \exp(-k_s h_s)} - \frac{1}{s_F + 1} \]
\[ Q_r = A_w q_r \]
Kuczera et al. 2006
8 model parameters
3 initial conditions
1 standard dev. of obs. err.
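The three storage equations and the flux laws above can be turned into a small simulation. The sketch below is only an illustration: all parameter values, initial conditions and input series are hypothetical, and the explicit Euler scheme with piecewise-constant daily inputs and daily averaged specific discharge q_r is an assumption of this example, not necessarily the numerical scheme used for the talk.

```python
import math

# Hypothetical parameter values, chosen only to make the sketch run.
PARAMS = dict(k_et=0.02, f_pet=1.0, q_lat_max=2.0, q_gw_max=1.0,
              k_bf=0.05, k_dp=0.01, k_r=0.5, s_F=50.0, k_s=0.03)

def f_sat(h_s, p):
    """Saturated fraction; equals zero for an empty soil store."""
    return 1.0 / (1.0 + p["s_F"] * math.exp(-p["k_s"] * h_s)) - 1.0 / (p["s_F"] + 1.0)

def rates(state, rain, pet, p):
    """Right-hand side of the three storage equations, plus the river flux q_r."""
    h_s, h_gw, h_r = state
    fs = f_sat(h_s, p)
    q_runoff = fs * rain
    q_et = (1.0 - math.exp(-p["k_et"] * h_s)) * p["f_pet"] * pet
    q_lat = fs * p["q_lat_max"]
    q_gw = fs * p["q_gw_max"]
    q_bf = p["k_bf"] * h_gw
    q_dp = p["k_dp"] * h_gw
    q_r = p["k_r"] * h_r
    dh_s = (rain - q_runoff) - q_et - q_lat - q_gw
    dh_gw = q_gw - q_bf - q_dp
    dh_r = q_runoff + q_lat + q_bf - q_r
    return (dh_s, dh_gw, dh_r), q_r

def simulate(rain_series, pet_series, state0, p=PARAMS, substeps=24):
    """Explicit Euler integration with piecewise-constant daily inputs.

    Returns the daily averaged river flux q_r and the final state.
    """
    state = list(state0)
    dt = 1.0 / substeps
    daily_q = []
    for rain, pet in zip(rain_series, pet_series):
        total = 0.0
        for _ in range(substeps):
            derivs, q_r = rates(state, rain, pet, p)
            state = [max(h + dt * dh, 0.0) for h, dh in zip(state, derivs)]
            total += q_r * dt
        daily_q.append(total)
    return daily_q, state

rain = [0.0, 20.0, 5.0, 0.0, 0.0, 0.0]        # daily sums (hypothetical)
pet = [3.0] * 6                               # daily potential evapotranspiration
discharge, final_state = simulate(rain, pet, state0=(50.0, 20.0, 1.0))
```

Multiplying the result by the watershed area A_w would give the discharge Q_r.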
Model Application
• Data set of Abercrombie watershed, New South
Wales, Australia (2770 km²), kindly provided by
George Kuczera (Kuczera et al. 2006).
• Box-Cox transformation applied to model and
data to decrease heteroscedasticity of residuals.
• Step function input to account for input data in
the form of daily sums of precipitation and
potential evapotranspiration.
• Daily averaged output to account for output data
in the form of daily averaged discharge.
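A minimal sketch of the Box-Cox transformation applied to both observed and simulated discharge before computing residuals. The transformation parameter `lam`, the optional `shift` and the data values are hypothetical; the talk does not state which values were used.

```python
import math

def box_cox(y, lam, shift=0.0):
    """Box-Cox transformation of y + shift, which must be positive."""
    if y + shift <= 0.0:
        raise ValueError("Box-Cox requires y + shift > 0")
    if lam == 0.0:
        return math.log(y + shift)
    return ((y + shift) ** lam - 1.0) / lam

def inverse_box_cox(z, lam, shift=0.0):
    """Back-transformation to the original scale."""
    if lam == 0.0:
        return math.exp(z) - shift
    return (lam * z + 1.0) ** (1.0 / lam) - shift

lam = 0.3                                     # hypothetical value
observed = [0.5, 2.0, 40.0]                   # discharge values (hypothetical)
simulated = [0.6, 1.8, 35.0]
residuals = [box_cox(o, lam) - box_cox(s, lam) for o, s in zip(observed, simulated)]
```

For lam below 1 the transformation compresses large discharges, so the residual at a flood peak is weighted less than on the original scale.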
Linearization
Linearization of model nonlinearities:
Derivation of simplified, linear state-space model:
Results
Preliminary results with a simpler model look
promising. They demonstrate that the concept
works.
Unfortunately, the results for the hydrological model
are not yet available.
Discussion
• We developed a general technique of constructing
emulators for dynamic simulation models.
• In addition to solving technical problems of Gaussian
process emulation of dynamic models, this technique
easily allows us to rely on mechanisms incorporated
in the simulation model. It can be expected that this
improves the emulation process. This is of particular
importance if the design set is small.
• There is need for more research:
• Gaining more experience with our approach.
• Extending the approach to the estimation of
additional parameters of the simplified model.
• Learning about advantages and disadvantages of
the different approaches to dynamic emulation.
Acknowledgements
• Collaboration for this paper:
Gentry White, Susie Bayarri, Bruce Pitman, Tom
Santner during my stay at SAMSI, NC, USA
• Hydrological example and data:
George Kuczera.
• More interactions at SAMSI:
Jim Berger, Fei Liu, Rui Paulo, Robert Wolpert, John
Paul Gosling, Tony O'Hagan, and many more.