Lecture Slides

Numerical techniques for PDEs with random
input data
Fabio Nobile¹ and Raúl Tempone²
¹ CSQI-MATHICSE, École Polytechnique Fédérale de Lausanne, Switzerland
² SRI Uncertainty Quantification Center, CEMSE, King Abdullah University of Science and Technology, Saudi Arabia
Current Trends in Computational Methods for PDEs
CIMPA Summer school, IISc Bangalore, India, July 8-19, 2013
1 / 81
Part I
Introduction and motivating examples
2 / 81
Outline
Introduction
Some motivating examples
Abstract framework
Random Fields
3 / 81
Introduction
Mathematical models and computer simulations are widely used in
engineering and science applications.
However, in many cases, the parameters in the model are affected by
uncertainty, either because they are not perfectly known or because they
are intrinsically variable.
Goal: devise effective ways to include and treat uncertainty in a
mathematical model.
Examples
▶ Modeling of living tissues / biological fluids
▶ Subsurface modeling: groundwater flows, contaminant transport, earthquake simulations, ...
▶ Combustion problems / chemical reactions
▶ Weather forecast / climate modeling
▶ Molecular biology / protein dynamics / ...
▶ Finance
4 / 81
Probabilistic approach
Probability theory provides an effective tool to describe and propagate
uncertainty (although it is not the only one: we mention also worst case
scenario analysis, fuzzy logic, etc.).
We will focus on mathematical models based on Partial Differential
Equations (PDEs) whose input data (coefficients, forcing terms,
boundary conditions, domain boundary, ....) are uncertain and described
as random variables or random fields.
Therefore, the solution of the PDE is itself a random function,
u = u(ω, x) (ω here denotes an elementary random event).
The main question we address in the course is how to effectively
approximate the random function u(ω, x) or some (random) output
Quantities of Interest Q(u).
5 / 81
Linear elasticity with random elastic properties
Consider an elastic body, occupying the domain D ⊂ R3 , with restricted
displacement u = 0 on a subset of its boundary, Σ1 .
[Figure: elastic body of dimensions 50 cm × 20 cm × 10 cm (axes x, y, z), clamped on part of its boundary and loaded with a traction of 1 MPa]
The (infinitesimal) displacement of the body u ∈ [H^1_Σ1(D)]³ satisfies the equation
∫_D 2µ ∇^s u : ∇^s v + ∫_D λ div(u) div(v) = ∫_Σ2 P · v,   ∀v ∈ [H^1_Σ1(D)]³
with ∇^s u = (∇u + ∇u^T)/2, λ = Eν/((1+ν)(1−2ν)), µ = E/(2(1+ν)) (where E, ν are the Young modulus and Poisson ratio, resp.)
6 / 81
Assume in the previous problem that the Young modulus E, the Poisson ratio ν and the load vector P = (P1, P2, P3) are uncertain and treated as random variables.
Vector of random parameters: y = (E, ν, P1, P2, P3), with E > 0 and 0 < ν < 1/2.
Uncertainty model: y takes values in Γ ⊂ R₊ × (0, 1/2) × R³ and has joint probability density function ρ(y) : Γ → R₊.
Then, for any outcome of the random vector y ∈ Γ there exists a unique solution u = u(y) ∈ V := [H^1_Σ1(D)]³.
We can therefore define the random map u(y) : Γ → V that depends on
N = 5 random parameters.
7 / 81
Seismic waves in random layered medium
Elastic waves in the ground can be well described by the linear
elastodynamics equations
 2
 ∂ u
ρ 2 − div [2µ∇s u + λ(div u)I] = f,
in D, t ∈ (0, T ]
∂t
+ suitable initial and boundary conditions
Typically, the medium is made of layers of different materials, whose mechanical properties ϱ, λ, µ are not perfectly known.
8 / 81
Vector of random parameters: y = (ϱ1, λ1, µ1, ..., ϱN, λN, µN), where (ϱi, λi, µi) are the density and Lamé constants in the i-th layer.
Uncertainty model: the parameters in two different layers are statistically
independent and always positive. Joint probability density function:
ρ(y) = ∏_{i=1}^N ρi(ϱi, λi, µi),   ρi(ϱi, λi, µi) : R³₊ → R₊
Other parameters could be uncertain as well, such as the position of the
internal interfaces, the location and intensity of the source term, etc.
For any outcome of the random parameters y, the problem admits a
unique solution
u(y) ∈ V := L2 ([0, T ], H1 (D)) ∩ L∞ ([0, T ], L2 (D)).
We can therefore define the random map u(y) : Γ → V that depends on
3N random parameters (N being the number of layers).
9 / 81
Groundwater flow in a random heterogeneous porous
medium
According to Darcy’s law, the pressure gradient ∇p and the fluid velocity
u in a porous medium follow a linear relation, that is
u = −k∇p,   div u = f   in D
+ boundary conds. on ∂D
The second equation (mass conservation) relates sinks and sources of
flow to the velocity field.
In most aquifers, the macroscopic properties (porosity and permeability)
of the ground are highly variable and never perfectly known.
They can be described as random fields, i.e. the permeability k(x) > 0 at each point x ∈ D is a random variable and, for any n points x1, ..., xn, the random variables k(x1), ..., k(xn) are in general correlated.
10 / 81
To guarantee positivity of the permeability field one often writes k = e^Y.
Random parameters: the log-permeability Y(x) = log(k(x)). It is actually a random field (a collection of infinitely many random variables).
Uncertainty model: given by the probability law of the random field.
How can we treat in practice the case of random fields?
We will see that any random field on a compact domain D ⊂ Rd with
bounded variance and continuous covariance can be expanded in series
(e.g. Karhunen-Loève, Fourier, ...)
Y(ω, x) = E[Y](x) + ∑_{i=1}^∞ yi(ω) bi(x)
where y = (y1 , y2 , . . .) is an infinite (countable) sequence of random
variables. By suitable expansion one can make them uncorrelated and,
sometimes, even independent.
For each outcome of the random sequence y ∈ R^ℕ, and hence each realization of the random field Y(x), the problem admits a unique solution (p(y), u(y)) ∈ V := H^1(D) × H(div, D).
We can therefore define the random map (p(y), u(y)) : Γ ⊂ R^ℕ → V that depends on a countably infinite number of random variables.
11 / 81
Uncertainty Quantification in Option Pricing
The Black-Scholes model for the value f : (0, T) × (0, ∞) → R of a European call option is the following linear parabolic partial differential equation
∂f/∂t + r s ∂f/∂s + (σ² s²/2) ∂²f/∂s² = r f,   0 < t < T,
f(s, T) = max(s − K, 0),                                                      (1)
where the constants r and σ denote the riskless interest rate and the
volatility, respectively.
The volatility is typically estimated from history matching and is often
uncertain. Parametric uncertainty, in the context of derivative pricing,
results in mis-pricing of contingent claims.
Goal: Quantify the impact of volatility uncertainty on option pricing
under Black-Scholes framework.
Remark: The same mathematical problem appears in the modeling of
flows in porous media with uncertain permeability.
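As an illustration of the goal above, a minimal sketch (MATLAB) propagating volatility uncertainty through the closed-form solution of (1) for a European call by plain sampling; the uniform distribution assumed for σ and all numerical values are illustrative choices, not part of the slides.

% Sketch: impact of an uncertain volatility sigma on the Black-Scholes price
% of a European call, using the closed-form solution of (1). The uniform
% distribution for sigma and all parameter values are illustrative.
S0 = 100; K = 100; r = 0.02; T = 1;           % spot, strike, riskless rate, maturity
Phi = @(x) 0.5*erfc(-x/sqrt(2));              % standard normal cdf (toolbox-free)
bs_call = @(sig) S0*Phi((log(S0/K) + (r + sig.^2/2)*T)./(sig*sqrt(T))) ...
        - K*exp(-r*T)*Phi((log(S0/K) + (r - sig.^2/2)*T)./(sig*sqrt(T)));
M = 1e5;
sig = 0.15 + 0.10*rand(M,1);                  % uncertain volatility, sigma ~ U(0.15, 0.25)
prices = bs_call(sig);                        % propagated prices f(S0, 0) for each sample
fprintf('E[price] = %.3f, std[price] = %.3f\n', mean(prices), std(prices))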
12 / 81
Time Continuous Markov Chains for Wear degradation
[Figure: measured wear (mm) versus operating time (h), with data and 90% confidence bands for two competing models]
Cylinder liners of diesel engines used for marine propulsion are naturally
subjected to a wear process, and may fail when their wear exceeds a
specified limit.
The liner should be substituted before it accumulates a given wear level,
in order to avoid catastrophic and very expensive failures.
The wear process is modeled here as a pure jump process, since the wear is measured by a caliper with a finite precision of 0.05 mm.
13 / 81
The state vector X (t), representing the wear level at time t, takes values
in a lattice S, and is modeled as a continuous-time Markov chain.
Assume that each possible jump in the system occurs according to one of the pairs {(aj(x; θ), νj)}_{j=1}^J, where aj : S × Θ → R₊ is known as the propensity function associated with the jump νj.
The probability that the system jumps from x ∈ S to x + νj ∈ S during the small interval (t, t+dt) is
P(X(t+dt) = x + νj | X(t) = x) = aj(x; θ) dt + o(dt).
The propensity functions depend on an unknown parameter, θ ∈ Θ,
where Θ is assumed here to be finite dimensional.
Remark: Even if the value of θ is perfectly known, the outcomes of this
model are random.
Goal: Using observed wear data, estimate the parameter θ and then predict the remaining lifetime, i.e. given 0 < s, x0, δ and letting
T(δ) = max{t ≥ 0 : Xt ≤ δ},
compute
P(T(δ) ≤ s | X0 = x0).
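A minimal simulation sketch (MATLAB) of a continuous-time Markov chain of this type, using the standard Gillespie (stochastic simulation) algorithm; the jump sizes νj, the propensity functions aj(x; θ) and the value of θ below are purely hypothetical and are not the wear model identified from the data.

% Sketch: simulate one path of a continuous-time Markov chain X(t) on a lattice
% with jumps nu_j and propensities a_j(x; theta) (Gillespie algorithm).
% The two jump types, their propensities and theta are hypothetical choices.
theta = 1e-3; dx = 0.05;                 % parameter and lattice spacing (caliper precision)
nu = [dx, 2*dx];                         % possible wear increments
prop = @(x) [theta, 0.2*theta];          % propensities a_j(x; theta), constant in x here
T = 6e4; t = 0; x = 0;                   % final time, current time and state
ts = t; xs = x;
while t < T
    a = prop(x); a0 = sum(a);
    t = t - log(rand)/a0;                % exponential waiting time with rate a0
    j = find(cumsum(a) >= rand*a0, 1);   % select jump j with probability a_j/a0
    x = x + nu(j);
    ts(end+1) = t; xs(end+1) = x;        % store the path
end
stairs(ts, xs), xlabel('operating time [h]'), ylabel('wear [mm]')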
14 / 81
Types of uncertainty
▶ Aleatory: irreducible variability inherent in nature.
  Examples:
   ▶ location and intensity of the source of an earthquake;
   ▶ variability between patients in biomedical applications;
   ▶ volatility of a stock;
   ▶ etc.
▶ Epistemic: reducible uncertainties resulting from a lack of knowledge.
  Examples:
   ▶ porosity / permeability in porous media flows.
Both can be effectively treated in a probabilistic framework.
15 / 81
Uncertainty Quantification (UQ) analysis
▶ Forward Uncertainty Quantification: given the probabilistic characterization of the input uncertain parameters, quantify the uncertainty in the output quantities.
▶ Global sensitivity analysis: find out which input random variables (or combinations of them) have the largest influence on the solution or output quantity.
  This is also strictly related to dimension reduction: retain in the analysis only the most important random variables or linear/nonlinear combinations of them.
▶ Inverse Uncertainty Quantification / Data Assimilation: use available measurements on observables of the system to improve the uncertainty characterization of the input variables.
  Strictly related is the Optimal Design of Experiments: figure out which are the best measurements to acquire to reduce at most the uncertainty on the input random variables.
▶ Optimization / Control under uncertainty: assume that we can control the system to minimize a given cost functional. The optimal control should take into account the uncertainty on the input parameters. One often refers to this as Robust Control.
16 / 81
Abstract framework
Consider a deterministic PDE model
find u :   L(y(ω))(u) = F(y(ω))   in D ⊂ R^d                                  (2)
with suitable boundary / initial conditions.
The data of the problem (operator L, forcing term F and, possibly, the domain D, the initial and boundary conditions) depend on a vector of N random variables:
y(ω) = (y1 (ω), . . . , yN (ω)) : Ω → RN
where (Ω, A, P) is a complete probability space (Ω: set of outcomes, A:
sigma-algebra of subsets of Ω, P : A → [0, 1]: probability measure).
(when dealing with distributed random fields, N could go to infinity!).
Then, the solution to (2) is also random. Moreover, u depends on the
random event ω only through the random vector y(ω). Therefore,
u = u(y(ω))
17 / 81
Notation, definitions
Let
▶ D ⊂ R^d be the physical domain where we set our problem, i.e. x ∈ D.
▶ V be a separable Hilbert space of real valued functions from D.
▶ V′ be the dual space of V, composed of linear bounded functionals from V onto R, a Hilbert space too when endowed with the norm ‖ϕ‖_V′ = sup_{v∈V, ‖v‖_V=1} ϕ(v).
▶ Γ = y(Ω) ⊂ R^N be the image of y.
▶ µ_y be the induced measure on Γ: µ_y(B) = P(ω : y(ω) ∈ B). For convenience, we assume that µ_y is absolutely continuous w.r.t. the Lebesgue measure on Γ, so that there exists a density function ρ : Γ → R₊ such that µ_y(dy) = ρ(y)dy.
▶ ‖v‖_{L^p_ρ} = ( ∫_Γ |v(y)|^p ρ(y)dy )^{1/p} and L^p_ρ(Γ) = {v : Γ → R : ‖v‖_{L^p_ρ} < ∞}.
▶ v : Γ → V is strongly measurable if there exists a sequence of V-valued simple functions ϕn s.t. ‖v(y) − ϕn(y)‖_V → 0 a.e. in Γ.
▶ ‖v‖_{L^p_ρ(Γ;V)} = ( ∫_Γ ‖v(y)‖_V^p ρ(y)dy )^{1/p} and L^p_ρ(Γ;V) = {v : Γ → V strongly measurable : ‖v‖_{L^p_ρ(Γ;V)} < ∞}.
18 / 81
Notation, definitions
Multi-indices.
For multi-indices p = (p1, ..., pN) and α = (α1, ..., αN) ∈ N^N we use the following notations:
▶ Let 1 ≤ q < ∞; then denote |p|_q = ( ∑_{n=1}^N p_n^q )^{1/q} and |p|_∞ = max_{1≤n≤N} p_n
▶ p^α = ∏_{n=1}^N p_n^{α_n}
▶ p! = ∏_{n=1}^N p_n!
▶ p ≤ k means p_n ≤ k, ∀n = 1, ..., N, i.e. |p|_∞ ≤ k.
19 / 81
Assumption: ∀y ∈ Γ the problem admits a unique solution u in a Hilbert space V. Moreover,
‖u(y)‖_V ≤ C(y) ‖F(y)‖_{V′}
▶ Then, the PDE (2) induces a map u = u(y) : Γ → V.
▶ If C(y)‖F(y)‖_{V′} ∈ L^p_ρ(Γ), then u ∈ L^p_ρ(Γ; V).
Remark 1. The uniqueness assumption is essential for the forward
uncertainty propagation problem to be well posed.
Remark 2. In most applications, it is reasonable to assume that the
randomness in the coefficients of the PDE and that in the forcing terms
are independent. In this case, we have
‖u‖_{L^p_ρ(Γ,V)} ≤ ‖C‖_{L^p_ρ(Γ)} ‖F‖_{L^p_ρ(Γ,V′)}.
20 / 81
Goals of the computation (forward uncertainty analysis)
Compute statistics of the solution
pointwise expected value: ū(x) = E[u(x, ·)], x ∈ D
pointwise variance: Var[u](x) = E[(u(x, ·) − ū(x))²]
two point correlation: Cov_u(x1, x2) = E[(u(x1, ·) − ū(x1))(u(x2, ·) − ū(x2))]
Compute statistics of Quantities of Interest (functionals of the solution)
ψ(y) = Q(u(y)),
where Q : V → R.
ψ is a real-valued function of the random vector y and we would like to
approximate its law.
Examples of quantities of interest:
Q(u) = u(x̄),    Q(u) = sup_{x∈D} u(x),    Q(u) = ∫_Σ f(u(x), ∇u(x)) dx  with Σ ⊂ D,    etc.
In particular, we may be interested in some failure probability event
entailing the approximation of
P(ψ(y) ≥ ψcr )
for some given critical value ψcr .
21 / 81
Recall: basic definitions of Sobolev spaces
Let D ⊂ R^d be an open bounded Lipschitz domain.
▶ L²(D) ≡ {v : D → R, ∫_D v² < +∞}; inner product (u, v)_{L²(D)} = ∫_D uv
▶ H^1(D) ≡ {v ∈ L²(D) and ∇v ∈ [L²(D)]^d}; inner product (u, v)_{H^1(D)} = ∫_D (uv + ∇u · ∇v)
▶ H^1_0(D): subspace of H^1 of functions that vanish on the boundary, H^1_0(D) ≡ {v ∈ H^1(D), v|_∂D = 0}
▶ multi-index: k = (k1, ..., kd); length |k| = ∑_{j=1}^d kj; multi-derivative D^k v = ∂^{k1+k2+...+kd} v / (∂x1^{k1} ··· ∂xd^{kd})
▶ H^s(D) ≡ {v : D → R, ‖D^k v‖_{L²(D)} < +∞, ∀k s.t. |k| ≤ s};
  inner product (u, v)_{H^s(D)} = ∑_{k: |k|≤s} (D^k u, D^k v)_{L²(D)}
▶ L²(I, H^s(D)): space of functions v(t, x) : I × D → R such that
   ▶ v(t, ·) ∈ H^s(D) for (almost) all t ∈ I (function of t with values in the Hilbert space H^s);
   ▶ t → v(t) ∈ H^s(D) is square integrable: ∫_I ‖v(t)‖²_{H^s(D)} dt < +∞
22 / 81
Tensor product of Hilbert spaces
Let H1, H2 be Hilbert spaces. The tensor product Hilbert space H1 ⊗ H2 is a vector space of sums ∑_n vn ⊗ wn with vn ∈ H1, wn ∈ H2, endowed with an inner product as follows:
▶ Inner product in H = H1 ⊗ H2. Let v1, v2 ∈ H1 and w1, w2 ∈ H2 and define z1, z2 ∈ H, z1 = v1 ⊗ w1, z2 = v2 ⊗ w2. For such simple functions we define the inner product as
  (z1, z2)_{H1⊗H2} ≡ (v1, v2)_{H1} (w1, w2)_{H2}.
  For general sums we extend its definition by linearity.
▶ The Hilbert space H = H1 ⊗ H2 is defined as the completion of finite sums v = ∑_{n=1}^N vn ⊗ wn, with vn ∈ H1, wn ∈ H2, n = 1, ..., N, with respect to the inner product (·, ·)_{H1⊗H2}.
▶ Let {φi ∈ H1, i = 1, 2, ...} be a basis of H1 and {ψj ∈ H2, j = 1, 2, ...} a basis of H2. Then {φi ⊗ ψj, i, j = 1, 2, ...} is a basis of H = H1 ⊗ H2.
▶ If H1 = L²_P(Ω), then the space L²_P(Ω) ⊗ H2 is isomorphic to L²_P(Ω; H2) (Bochner space).
23 / 81
Simple examples of Tensor Product spaces
▶ H1 = R^n, H2 = R^m; H = R^n ⊗ R^m.
  Given v = (v1, ..., vn) ∈ R^n and w = (w1, ..., wm) ∈ R^m, then z = v ⊗ w can be represented as the matrix
  z = v w^T = [ v1 w1  v1 w2  ...  v1 wm
                v2 w1   ...        ...
                 ...
                vn w1   ...        vn wm ]  ∈ R^{n×m}
  R^n ⊗ R^m corresponds to the space of n × m matrices with the Frobenius inner product (a quick numerical check follows below).
▶ How does H1 ⊗ H2 relate to H2 ⊗ H1?
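A minimal sketch (MATLAB) of the identification above: for simple tensors the inner product factorizes, and it coincides with the Frobenius inner product of the corresponding rank-one matrices. The vectors below are arbitrary illustrative data.

% Sketch: R^n (x) R^m as n-by-m matrices. For simple tensors z1 = v1 (x) w1 and
% z2 = v2 (x) w2, (z1, z2)_{H1 (x) H2} = (v1, v2)(w1, w2) equals the Frobenius
% inner product of the matrices v1*w1' and v2*w2'.
n = 4; m = 3;
v1 = randn(n,1); w1 = randn(m,1);
v2 = randn(n,1); w2 = randn(m,1);
Z1 = v1*w1'; Z2 = v2*w2';            % matrix representations of the simple tensors
lhs = sum(sum(Z1.*Z2));              % Frobenius inner product <Z1, Z2>_F
rhs = (v1'*v2)*(w1'*w2);             % (v1, v2)_{R^n} * (w1, w2)_{R^m}
disp([lhs rhs])                      % the two values coincide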
24 / 81
Simple examples of Tensor Product spaces
▶ H1 = H2 = L²(−1, 1), H = L²(−1, 1) ⊗ L²(−1, 1).
  Let {φi}_{i=0}^∞ be the basis of orthonormal Legendre polynomials. Then a basis for H is {φi ⊗ φj, i, j ≥ 0}.
  Now, let v(x, y) = ∑_{0≤i,j≤N} αij φi(x) φj(y); then
  1. v ∈ H (by definition);
  2. ‖v‖²_H = ( ∑_{0≤i,j≤N} αij φi(x) φj(y), ∑_{0≤l,k≤N} αlk φl(x) φk(y) )_H
            = ∑_{ijlk} αij αlk (φi, φl)_{H1} (φj, φk)_{H2}
            = ∑_{ijlk} αij αlk δil δjk = ∑_{ij} αij²
  Observe that the latter is the standard norm in L²([−1, 1]²). What do you conclude?
25 / 81
Random fields
Let D ⊂ R^d be a physical domain. A random field a(x, ω) : D × Ω → R is a collection of infinitely many random variables: a(x, ω) is a random variable for each x ∈ D.
▶ Finite dimensional distribution of order n. Let x1, ..., xn be n points in the physical domain D. Then
  Fn(z1, ..., zn; x1, ..., xn) = P(a(x1, ω) ≤ z1, ..., a(xn, ω) ≤ zn)   (*)
Kolmogorov's extension theorem
Given the collection of all finite dimensional distributions Fn : R^n × D^n → [0, 1], satisfying the consistency and symmetry conditions below, there exists a probability space (Ω, A, P) and a random field a : D × Ω → R satisfying (*).
▶ Consistency condition: for m < n,
  Fm(z1, ..., zm; x1, ..., xm) = Fn(z1, ..., zm, ∞, ..., ∞; x1, ..., xm, ..., xn)
▶ Symmetry condition: for any permutation π1, ..., πn of the indices 1, ..., n,
  Fn(zπ1, ..., zπn; xπ1, ..., xπn) = Fn(z1, ..., zn; x1, ..., xn)
26 / 81
▶ Expected value: ā(x) = E[a(x, ·)] : D → R.
▶ Covariance function: Cov_a(x1, x2) : D × D → R,
  Cov_a(x1, x2) = E[(a(x1, ·) − ā(x1))(a(x2, ·) − ā(x2))].
▶ Variance: Var_a(x) = Cov_a(x, x) : D → R.
  We say that a(x, ω) is a second order random field if Var_a(x) < ∞ for all x ∈ D.
▶ A random field a(x, ω) : R^d × Ω → R is said to be stationary if its law is invariant under translation:
  a(x + h, ω) ∼ a(x, ω),   ∀h ∈ R^d.
▶ A random field is said to be weakly stationary if
   ▶ ā is constant
   ▶ Cov_a(x1, x2) = Cov_a(x1 − x2)
▶ A weakly stationary random field is said to be isotropic if the covariance function depends only on ‖x1 − x2‖.
27 / 81
Properties of the covariance function
For a second order random field the covariance function is
▶ bounded: Cov_a(x1, x2) ≤ √(Var_a(x1)) √(Var_a(x2));
▶ symmetric: Cov_a(x1, x2) = Cov_a(x2, x1);
▶ positive semi-definite: for any ξ1, ..., ξn ∈ D, the matrix Cij = Cov_a(ξi, ξj) is positive semi-definite.
  Indeed, let α = (α1, ..., αn) ∈ R^n and a0(x, ω) = a(x, ω) − E[a](x). Then
  α^T C α = ∑_{i,j} αi αj Cov_a(ξi, ξj) = ∑_{i,j} αi αj E[a0(ξi, ·) a0(ξj, ·)] = E[( ∑_i αi a0(ξi, ·) )²] ≥ 0
If a : R^d × Ω → R is a weakly stationary mean square continuous random field, its covariance r(x1 − x2) = Cov_a(x1, x2) has the representation
r(τ) = ∫_{R^d} e^{2πi s·τ} dµ(s)
for some positive finite measure µ (Bochner's theorem).
If, moreover, µ has a density S(s), µ(ds) = S(s)ds, then S is called the spectral density and corresponds to the Fourier transform of r.
In particular, the Fourier transform of the covariance function of a weakly stationary random field is always non-negative.
28 / 81
Mean square continuity
A random field a : D × Ω → R is said to be mean square continuous at x̄ ∈ D if
lim_{x→x̄} E[(a(x) − a(x̄))²] = 0
A centered (zero mean) random field a : D × Ω → R is mean square continuous at x̄ ∈ D iff its covariance function Cov_a(x1, x2) is continuous at x1 = x2 = x̄.
Proof: 1) Let r(x1, x2) = Cov_a(x1, x2) be continuous at x1 = x2 = x̄. Then
E[(a(x) − a(x̄))²] = E[a(x)² − 2a(x)a(x̄) + a(x̄)²] = r(x, x) − 2r(x, x̄) + r(x̄, x̄) → 0 as x → x̄.
2) Let a(x, ω) be mean square continuous at x̄. Then
|r(x1, x2) − r(x̄, x̄)| = |E[a(x1)a(x2) − a(x̄)² ± a(x̄)a(x2)]|
  ≤ E[(a(x1) − a(x̄))²]^{1/2} E[a(x2)²]^{1/2} + E[(a(x2) − a(x̄))²]^{1/2} E[a(x̄)²]^{1/2} → 0 as (x1, x2) → (x̄, x̄).
29 / 81
Mean square differentiability
A random field a : D × Ω → R is mean square differentiable at x̄ ∈ D if
∂a/∂x (x̄, ·) := lim_{h→0} ( a(x̄ + h, ·) − a(x̄, ·) )/h
exists in the mean square sense.
A centered random field a : D × Ω → R is mean square differentiable at x̄ ∈ D iff ∂²Cov_a(x1, x2)/∂x1∂x2 exists and is finite at x1 = x2 = x̄.
More generally, a random field a : D × Ω → R is k-times mean square differentiable at x̄ if the partial derivatives of the covariance of order 2k exist and are bounded at x1 = x2 = x̄.
30 / 81
Sample path continuity
A random field a : D × Ω → R is said to be sample path (or almost surely) continuous at x̄ ∈ D if
P( ω : lim_{x→x̄} a(x, ω) = a(x̄, ω) ) = 1
Kolmogorov's theorem
Given a random field a : D × Ω → R, with D ⊂ R^d compact, if there exist positive constants p, β, K such that
E[ |a(x1, ·) − a(x2, ·)|^p ] ≤ K |x1 − x2|^β,   ∀x1, x2 ∈ D,
then a ∈ C^{0,α} almost surely, for all 0 ≤ α < (β − d)/p.
31 / 81
Karhunen-Loève expansion
Let D ⊂ R^d be compact and Cov_a : D × D → R continuous. Then there exists a sequence of values λ1 ≥ λ2 ≥ ... ≥ λk ≥ ... ≥ 0, with lim_{k→∞} λk = 0, and a corresponding sequence of functions bi(x) : D → R, i = 1, 2, ..., such that
∫_D Cov_a(x, y) bi(y) dy = λi bi(x),   and   ∫_D bi(x) bj(x) dx = δij
Define now the sequence of random variables yi(ω), i = 1, 2, ...,
yi(ω) = (1/√λi) ∫_D (a(x, ω) − E[a](x)) bi(x) dx
which are uncorrelated with zero mean and unit variance.
Then the random field a(x, ω) can be represented as the infinite series
Karhunen-Loève expansion:   a(x, ω) = E[a](x) + ∑_{i=1}^∞ √λi bi(x) yi(ω)
32 / 81
Karhunen-Loève expansion
The Karhunen-Loève expansion is a consequence of Mercer's theorem. Moreover, under the above assumptions (D compact and the covariance continuous) it holds
lim_{N→∞} sup_{x∈D} E[( a(x, ·) − ∑_{n=1}^N √λn bn(x) yn(·) )²] = 0
The KL expansion is the best N-term approximation in terms of variance:
{yn, bn}_{n=1}^N = argmin_{(ξn,ψn): ∫_D ψn ψm = δnm} E[ ∫_D ( a(x, ·) − E[a](x) − ∑_{n=1}^N ξn(·) ψn(x) )² dx ]
The convergence rate of the N-term truncation aN(x, ω) = E[a](x) + ∑_{n=1}^N √λn bn(x) yn(ω) depends on the decay of the eigenvalues λn, which, in turn, depends on the smoothness of the covariance function.
Estimates on the decay of the KL eigenvalues can be found in [Schwab et
al 05].
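A minimal numerical sketch (MATLAB) of a truncated KL expansion, computing the eigenpairs by a simple Nyström-type discretization of the covariance operator on a uniform grid of D = [0, 1]; the exponential covariance, the grid size and the Gaussian choice for the yi are illustrative assumptions.

% Sketch: truncated Karhunen-Loeve expansion via eigendecomposition of the
% discretized covariance operator (midpoint rule on [0,1]). Exponential
% covariance with sigma^2 = 2, lc = 0.1; the y_i are taken N(0,1) (Gaussian field).
n = 200; h = 1/n; x = ((1:n)' - 0.5)*h;      % quadrature nodes
sigma2 = 2; lc = 0.1;
[X1, X2] = meshgrid(x, x);
C = sigma2*exp(-abs(X1 - X2)/lc);            % Cov_a(x_i, x_j)
[B, L] = eig(h*C);                           % discrete eigenproblem h*C*b = lambda*b
[lambda, idx] = sort(diag(L), 'descend');
B = B(:, idx)/sqrt(h);                       % normalize: sum_i b(x_i)^2 * h = 1
N = 20;                                      % truncation level
y = randn(N, 1);                             % uncorrelated y_i with zero mean, unit variance
aN = B(:, 1:N)*(sqrt(lambda(1:N)).*y);       % one realization of a_N(x,omega) - E[a](x)
plot(x, aN), title('one realization of the truncated KL expansion')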
33 / 81
Mercer’s theorem
Let D ⊂ R^d be a compact domain and K : D × D → R a Mercer kernel:
▶ K is symmetric: K(x, y) = K(y, x)
▶ K is continuous
▶ K is positive semi-definite
Define, moreover, the operator T_K : L²(D) → L²(D),
T_K f(x) = ∫_D K(x, y) f(y) dy,   ∀f ∈ L²(D).
Mercer's theorem
Under the above assumptions on D and K, there is an orthonormal basis {bi}_i of L²(D) consisting of eigenfunctions of T_K such that the corresponding sequence of eigenvalues {λi}_i is non-negative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on D and
K(x, y) = ∑_{i=1}^∞ λi bi(x) bi(y)
where the convergence is absolute and uniform, that is,
lim_{n→∞} sup_{x,y∈D} | K(x, y) − ∑_{i=1}^n λi bi(x) bi(y) | = 0.
34 / 81
Proof of Karhunen-Loève expansion
Let D ⊂ R^d be compact and a : D × Ω → R a random field with continuous covariance Cov_a. Then Cov_a is a Mercer kernel (symmetric, continuous and positive semi-definite) and by Mercer's theorem
Cov_a(x, y) = ∑_{i=1}^∞ λi bi(x) bi(y)   =⇒   Var_a(x) = ∑_{i=1}^∞ λi bi(x)²
Define now the truncated Karhunen-Loève expansion
aN(x, ω) = E[a](x) + ∑_{i=1}^N √λi yi(ω) bi(x),
with yi(ω) = (1/√λi) ∫_D (a(x, ω) − E[a](x)) bi(x) dx and Var_{aN}(x) = ∑_{i=1}^N λi bi(x)².
Now, set a0(x, ω) = a(x, ω) − E[a](x), a0_N(x, ω) = aN(x, ω) − E[a](x), and observe that
E[a0(x, ·) a0_N(x, ·)] = ∑_{i=1}^N √λi E[a0(x, ·) yi] bi(x) = ∑_{i=1}^N bi(x) ∫_D E[a0(x, ·) a0(z, ·)] bi(z) dz
                       = ∑_{i=1}^N bi(x) ∫_D Cov_a(x, z) bi(z) dz = ∑_{i=1}^N λi bi(x)² = Var_{aN}(x)
Therefore
E[(a(x, ·) − aN(x, ·))²] = Var_a(x) − Var_{aN}(x) = ∑_{i=N+1}^∞ λi bi(x)² → 0 as N → ∞,
uniformly in x ∈ D.
35 / 81
Fourier expansion of a stationary random field
Let a : R × Ω → R be a weakly stationary Gaussian random field with covariance Cov_a(x, y) = Cov_a(x − y). We aim at finding a representation by Fourier series of a restricted to the interval [0, L].
Idea: Restrict the covariance to the interval [−Lp , Lp ], with Lp ≥ L and
replicate it periodically.
[Figure: the covariance restricted to [−Lp, Lp] and replicated periodically, plotted as a function of x − y]
Then, the periodicized covariance Cov#_a can be expanded in Fourier series:
Cov#_a(x − y) = ∑_{n=0}^∞ a_n² cos( nπ(x − y)/Lp )
36 / 81
The random field a admits the exact representation in [0, Lp]
a(ω, x) = E[a](x) + ∑_{n=0}^∞ a_n ( yn(ω) cos(nπx/Lp) + zn(ω) sin(nπx/Lp) )
with
▶ E[yn] = E[zn] = 0
▶ Var[yn] = Var[zn] = 1
▶ {yn, zn}_n uncorrelated.
This technique is known as circulant embedding [Dietrich-Newsam '97, Wood-Chan '94]. The Fourier coeffs. can be efficiently computed by FFT.
Warning: replicating the covariance periodically one may introduce discontinuities at x − y = nLp, which reflect into a slow decay of the Fourier coefficients. Advice: take Lp much bigger than the characteristic correlation length, so that Cov_a(Lp) ≈ 0.
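A rough sketch (MATLAB) of the construction above: the Fourier coefficients of the periodicized covariance are computed by FFT and a realization of the zero-mean field is synthesized on [0, L]; the exponential covariance and all numerical parameters are illustrative assumptions.

% Sketch: sample a zero-mean stationary Gaussian field on [0, L] from the
% Fourier representation of the periodicized covariance. Exponential covariance;
% Lp is taken much larger than the correlation length lc.
L = 1; lc = 0.1; sigma2 = 2; Lp = 5*L;
M = 2^10;                                  % grid size / number of retained modes
tau = -Lp + (0:M-1)'*(2*Lp/M);             % grid on [-Lp, Lp)
r = sigma2*exp(-abs(tau)/lc);              % stationary covariance r(x - y)
c = real(fft(ifftshift(r)))/M;             % Fourier coefficients of the periodicized covariance
an2 = [c(1); 2*c(2:M/2)];                  % a_n^2, n = 0, ..., M/2 - 1
an = sqrt(max(an2, 0));                    % clip possible tiny negative values
n = (0:M/2-1)';
x = linspace(0, L, 200);
y = randn(M/2, 1); z = randn(M/2, 1);      % uncorrelated N(0,1) coefficients
a = (an.*y)'*cos(pi*n*x/Lp) + (an.*z)'*sin(pi*n*x/Lp);   % one realization on [0, L]
plot(x, a)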
37 / 81
Gaussian random fields
▶ A random field a : D × Ω → R is Gaussian if all finite dimensional distributions are Gaussian, i.e. for any x1, ..., xn ∈ D, the random vector y(ω) = (a(x1, ω), ..., a(xn, ω)) has a multivariate Gaussian distribution.
▶ For a centered Gaussian random field, mean square integrability implies integrability in L^p, ∀p > 0. Indeed, let z(x, ω) = a(x, ω)/std_a(x) ∼ N(0, 1). Then
  E[|a(x, ·)|^p]^{1/p} = std_a(x) E[|z(x, ω)|^p]^{1/p} = c_p E[a(x, ω)²]^{1/2},
  where c_p = E[|z|^p]^{1/p} < ∞ for any p, since z ∼ N(0, 1).
▶ As a consequence, for Gaussian fields mean square differentiability implies almost sure sample path continuity. More generally, mean square Hölder continuity C^{0,β} implies almost sure sample path Hölder continuity C^{0,α}, ∀ 0 ≤ α < β (for any α < β take p > d/(β − α) and apply Kolmogorov's theorem).
▶ Consider the Karhunen-Loève expansion a(x, ω) = E[a](x) + ∑_{i=1}^∞ √λi bi(x) yi(ω). Then the yi are independent N(0, 1) random variables. (Remember that uncorrelated Gaussian random variables are independent.) The same holds for the Fourier expansion of stationary fields.
38 / 81
Examples of Covariance models
σ²: variance of the field; lc: correlation length.
▶ Exponential
  Cov_a(‖x − y‖) = σ² e^{−‖x−y‖/lc}
  Smoothness of the Gaussian field: almost surely C^{0,α}, α < 1/2 (same regularity as a Brownian motion)
▶ Squared exponential (Gaussian)
  Cov_a(‖x − y‖) = σ² e^{−‖x−y‖²/(2 lc²)}
  Smoothness of the Gaussian field: analytic almost surely
▶ Matérn
  Cov_a(‖x − y‖) = σ² (1/(Γ(ν) 2^{ν−1})) ( √(2ν) ‖x − y‖/lc )^ν K_ν( √(2ν) ‖x − y‖/lc )
  where Γ is the gamma function, K_ν is the modified Bessel function of the second kind, and ν = s + β, with s ∈ N and β ∈ (0, 1], is a smoothness parameter.
  Smoothness of the Gaussian field: almost surely C^{s,α}, for any α < β.
  One recovers the exponential covariance for ν = 1/2 and the squared exponential covariance for ν → ∞.
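A short sketch (MATLAB) evaluating the three covariance models above as functions of r = ‖x − y‖, with the values σ² = 2, lc = 0.1 and ν = 2 used for the realizations on the next slide.

% Sketch: the exponential, squared exponential and Matern covariances as
% functions of r = ||x - y||, with sigma^2 = 2, lc = 0.1 and nu = 2.
sigma2 = 2; lc = 0.1; nu = 2;
r = linspace(1e-6, 0.5, 500);
C_exp    = sigma2*exp(-r/lc);                                % exponential
C_sqexp  = sigma2*exp(-r.^2/(2*lc^2));                       % squared exponential
C_matern = sigma2/(gamma(nu)*2^(nu-1)) ...
           .*(sqrt(2*nu)*r/lc).^nu .* besselk(nu, sqrt(2*nu)*r/lc);   % Matern
plot(r, C_exp, r, C_sqexp, r, C_matern)
legend('exponential', 'squared exp.', 'Matern'), xlabel('||x - y||')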
39 / 81
Examples of Covariance models
[Figure: realizations of a(x, ω) with exponential covariance, σ² = 2, Lc = 0.1]
[Figure: realizations of a(x, ω) with Matérn covariance, σ² = 2, Lc = 0.1, ν = 2]
[Figure: realizations of a(x, ω) with squared exponential covariance, σ² = 2, Lc = 0.1]
[Figure: decay of the Fourier coefficients a_n for the exponential, Matérn and squared exponential covariances]
40 / 81
Part II
Monte Carlo sampling methods
41 / 81
Outline
Monte Carlo method
42 / 81
Monte Carlo Sampling
Let y = (y1, ..., yN) be a random vector with density ρ(y) : Γ → R₊, u(y) : Γ → V a Hilbert-space-valued function, u ∈ L²_ρ(Γ; V), and Q : V → R a continuous functional on V (possibly nonlinear), s.t. E[|Q(u(y))|^p] < ∞ for p sufficiently large.
Goal: compute E[Q(u(y))]
Classical Monte Carlo approach
Approximate expectations by sample averages: let {y(ωm)}_{m=1}^M be iid samples of y and
E[Q(u(y))] ≈ (1/M) ∑_{m=1}^M Q(u(y(ωm)))
Here, for each y(ωm) we have to solve for u(y(ωm)) through the PDE and evaluate the q.o.i. Q(u(y(ωm))).
Pros and cons of Monte Carlo
Pros: reusability of deterministic legacy codes; the convergence rate M^{−1/2} is insensitive to the length of y and to the regularity of u(·).
Cons: slow convergence; does not exploit possibly available regularity.
43 / 81
Monte Carlo error analysis
Assume now that the PDE has been discretized by some means (finite elements, finite volumes, spectral methods, ...), so that in practice we compute discrete solutions uh(y(ωm)), m = 1, ..., M.
Then our estimator is
E[Q(u(y))] ≈ (1/M) ∑_{m=1}^M Q(uh(y(ωm)))
Error splitting:
E[Q(u(y))] − (1/M) ∑_{m=1}^M Q(uh(y(ωm)))
   = E[Q(u(y)) − Q(uh(y))]  +  E[Q(uh(y))] − (1/M) ∑_{m=1}^M Q(uh(y(ωm)))
   = discretization error E^Q(h)  +  statistical error E_h^Q(M)
44 / 81
Monte Carlo error analysis. Discretization Error
Let us consider a functional Q : V → R, with Q(0) = 0, that is globally Lipschitz, i.e.
∃ CQ > 0 s.t.   |Q(u) − Q(u′)| ≤ CQ ‖u − u′‖_V,   ∀u, u′ ∈ V.
Assumption. There exist α > 0 and Cu(y) > 0, with ∫_Γ Cu(y)^p ρ(y)dy < ∞ for some p > 1, such that
‖u(y) − uh(y)‖_V ≤ Cu(y) h^α,   ∀y ∈ Γ and 0 < h < h0.
Then
|E^Q(h)| = |E[Q(u(y)) − Q(uh(y))]| = | ∫_Γ (Q(u(y)) − Q(uh(y))) ρ(y)dy |
         ≤ CQ ∫_Γ ‖u(y) − uh(y)‖_V ρ(y)dy
         ≤ CQ h^α ∫_Γ Cu(y) ρ(y)dy
         ≤ CQ ‖Cu‖_{L^p_ρ} h^α
45 / 81
Example – elliptic PDE with random coefficient
Consider the following elliptic problem with uniformly bounded random
coefficients
− div(a(x, y)∇u(x, y)) = f(x),  x ∈ D,      u(x, y) = 0,  x ∈ ∂D,      ∀y ∈ Γ ⊂ R^N
where D ⊂ R^d is a convex, Lipschitz domain and f ∈ L²(D).
Random coefficient model (a sampling sketch follows below):
▶ a(x, y) = ā + ∑_{i=1}^N √λi yi bi(x)
▶ yi ∼ U(−√3, √3) i.i.d. (so Γ = [−√3, √3]^N)
▶ bi ∈ C^∞(D), with ∑_{i=1}^N √(3λi) ‖bi‖_∞ ≤ δā, 0 < δ < 1
  =⇒ (1 − δ)ā ≤ a(x, y) ≤ (1 + δ)ā,   ∀y ∈ Γ
▶ ‖∇a(·, y)‖_{L^∞(D)} ≤ Ca,   ∀y ∈ Γ
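A minimal sketch (MATLAB) generating one realization of the coefficient a(x, y) above on D = (0, 1); the specific λi and bi, and the rescaling that enforces the uniform bound, are illustrative assumptions.

% Sketch: one realization of a(x,y) = abar + sum_i sqrt(lambda_i)*y_i*b_i(x) on (0,1).
% The choices b_i(x) = cos(i*pi*x) and lambda_i ~ i^(-3) are hypothetical; the
% lambda_i are rescaled so that sum_i sqrt(3*lambda_i)*||b_i||_inf = delta*abar.
N = 10; abar = 1; delta = 0.5;
x = linspace(0, 1, 200);
b = cos(pi*(1:N)'*x);                                   % b_i(x), with ||b_i||_inf = 1
lambda = (1:N)'.^(-3);
lambda = lambda*(delta*abar/sum(sqrt(3*lambda)))^2;     % enforce the uniform bound
y = sqrt(3)*(2*rand(N,1) - 1);                          % y_i ~ U(-sqrt(3), sqrt(3)) i.i.d.
a = abar + (sqrt(lambda).*y)'*b;                        % one realization of a(x, y)
plot(x, a), ylim([0, 2*abar])                           % stays in [(1-delta)*abar, (1+delta)*abar]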
46 / 81
Denote
H^1_0(D) = {v ∈ H^1(D) : ‖v − ϕn‖_{H^1(D)} → 0, for some (ϕn) ⊂ C^∞_0(D)}
endowed with the norm ‖v‖_{H^1_0(D)} = ‖∇v‖_{L²(D)}.
Under the previous assumptions, there exists a unique solution u(y) ∈ H^1_0(D) ⊂ H^1(D), such that
‖u(y)‖_{H^1_0(D)} ≤ CP ‖f‖_{L²(D)} / ((1 − δ)ā),   for all y ∈ Γ,
where CP is the Poincaré constant, i.e. ‖v‖_{L²(D)} ≤ CP ‖v‖_{H^1_0(D)}, for all v ∈ H^1_0(D).
Moreover,
∃ Cu > 0 s.t.   ‖u(y)‖_{H²(D)} ≤ Cu,   ∀y ∈ Γ
(uniform bound in y on the H²-norm of the solution).
47 / 81
We consider now a piecewise linear finite element approximation: for any y ∈ Γ, find uh(y) ∈ Vh s.t.
∫_D a(·, y) ∇uh(y) · ∇vh = ∫_D f vh,   ∀vh ∈ Vh
where Vh ⊂ H^1_0(D) is the space of continuous piecewise linear functions on a triangulation Th, vanishing on ∂D.
The discrete solution uh(y) satisfies the same bound as the continuous one,
‖uh(y)‖_{H^1(D)} ≤ √(1 + CP²) CP ‖f‖_{L²(D)} / ((1 − δ)ā),   for all y ∈ Γ,
and hence
E[|Q(uh(y))|^p]^{1/p} ≤ E[(CQ ‖uh(y)‖_{H^1(D)})^p]^{1/p} ≤ CQ √(1 + CP²) CP ‖f‖_{L²(D)} / ((1 − δ)ā),   ∀p ≥ 1
Then
‖u(y) − uh(y)‖_{L²(D)} + h ‖∇u(y) − ∇uh(y)‖_{L²(D)} ≤ Cu h²                   (3)
Since the bound is uniform in y, all moments are bounded:
E[‖u(y) − uh(y)‖^p_{L²(D)}]^{1/p} + h E[‖∇u(y) − ∇uh(y)‖^p_{L²(D)}]^{1/p} ≤ C h²,   ∀p ≥ 1
48 / 81
Monte Carlo error analysis. Statistical Error
Let us analyze the error assuming that Q is a globally Lipschitz functional. Observe now that we have
E_h^Q(M) = E[Q(uh(y))] − (1/M) ∑_{m=1}^M Q(uh(y(ωm)))
         = (1/M) ∑_{m=1}^M { E[Q(uh(y))] − Q(uh(y(ωm))) }
and E[E_h^Q(M)] = 0 (here the expectation is with respect to the random sample).
Then
Var[E_h^Q(M)] = (1/M) Var[Q(uh(y))]
and we can estimate
Var[Q(uh(y))] ≤ E[Q(uh(y))²] ≤ CQ² ‖uh‖²_{L²_ρ(Γ;V)}
             ≤ 2CQ² ( ‖u‖²_{L²_ρ(Γ;V)} + ‖u − uh‖²_{L²_ρ(Γ;V)} )
             ≤ 2CQ² ( ‖u‖²_{L²_ρ(Γ;V)} + h^{2α} ‖Cu‖²_{L²_ρ(Γ)} ) < ∞.
49 / 81
Monte Carlo error analysis. Statistical Error
Remarks:
▶ We concluded that for Lipschitz functionals Q we have
  Var[E_h^Q(M)] ≤ C/M → 0 as M → ∞
  with a constant C uniformly bounded w.r.t. h.
▶ Since Var[Q(uh(y))] ≤ C we can apply the law of large numbers, the law of the iterated logarithm and the Central Limit Theorem.
▶ In particular, the previous result implies, via the Chebyshev inequality, convergence in probability, that is, for any given ε > 0,
  P( |E_h^Q(M)| > ε ) ≤ Var[E_h^Q(M)]/ε² ≤ C/(M ε²) → 0 as M → ∞
50 / 81
Theorem (The Weak Law of Large Numbers)
Assume Yj, j = 1, 2, 3, ..., are independent, identically distributed (i.i.d.) random variables and E[Yj] = µ < ∞. Then
(1/M) ∑_{j=1}^M Yj →^P µ,   as M → ∞,                                         (4)
where →^P denotes convergence in probability, i.e. the convergence (4) means P( |∑_{j=1}^M Yj/M − E[Y]| > ε ) → 0 for all ε > 0.
51 / 81
Alternative convergence notions:
▶ Convergence in probability, Xn →^P Y, means that P(|Xn − Y| > ε) → 0 as n → ∞ for all ε > 0.
▶ Almost sure convergence, Xn →^{a.s.} Y, means that P({Xn → Y, as n → ∞}) = 1.
▶ Convergence in distribution, Xn ⇀ Y, means that E[g(Xn)] → E[g(Y)], as n → ∞, for all bounded and continuous functions g.
52 / 81
Theorem (The Strong Law of Large Numbers)
Assume Yj, j = 1, 2, 3, ..., are i.i.d. random variables with E[|Yj|] < ∞ and E[Yj] = µ. Then
(1/M) ∑_{j=1}^M Yj →^{a.s.} µ,   as M → ∞,                                    (5)
where →^{a.s.} denotes almost sure convergence; i.e. the convergence (5) means P({ ∑_{j=1}^M Yj/M → µ }) = 1.
What about the rate of a.s. convergence?
Exercise: Under our particular assumptions, we have Var[E_h^Q(M)] ≤ C/M. Consider the sequence Mn = M0 2^n, n = 0, 1, ..., and, for a fixed value of α > 0, the scaled sequence Ẽn(α) = E_h^Q(Mn) Mn^α. Apply the Borel–Cantelli Lemma to find the values of α s.t. Ẽn(α) →^{a.s.} 0. Interpret the result.
53 / 81
Rate of a.s. convergence
We can be even more precise than in the previous exercise. Consider now the scaled deviations ξj = (Yj − µ)/σY.
Theorem (Law of the iterated logarithm)
Assume ξj, j = 1, 2, 3, ..., are independent, identically distributed (i.i.d.) and E[ξj] = 0, E[ξj²] = 1. Then
limsup_{M→∞} | ∑_{m=1}^M ξm | / √(2M log log M) = 1,   a.s.
The original statement (1924) of the law of the iterated logarithm is due to A. Y. Khinchin. Another statement was given by A. N. Kolmogorov (1929). See Theorem 3.52 in Breiman.
54 / 81
Example
Compute the integral I = ∫_{[0,1]^d} f(x)dx by the Monte Carlo method, where we assume f(x) : [0,1]^d → R.
Let Y = f(X), where X is uniformly distributed in [0,1]^d. We have
I = ∫_{[0,1]^d} f(x) dx
  = ∫_{[0,1]^d} f(x) p(x) dx    (where p is the uniform pdf)
  = E[f(x)]                     (where x is uniformly distributed in [0,1]^d)
  ≈ (1/M) ∑_{j=1}^M f(x(ωj)) ≡ IM.
The values {x(ωj)} are sampled uniformly in the cube [0,1]^d, by sampling the components xi(ωj) independently and uniformly on the interval [0,1].
55 / 81
Monte Carlo: Numerical example
Consider the computation of the integral
1 = ∫_{[0,1]^N} e^{∑_{n=1}^N xn} / (e − 1)^N dx1 ... dxN
M = 1e6;
% Max. number of realizations
N = 20;
% Dimension of the problem
u = rand(M,N);
f = exp(sum(u’)’);
run_aver = cumsum(f)./(((1:M)’)*(exp(1)-1)^N);
plot(1:M, run_aver),
figure, plot(1:M, run_aver), xlabel ’M’
figure,plot(1:M,(run_aver-1)), xlabel ’M’
figure,loglog(1:M,abs(run_aver-1)), xlabel ’M’,
56 / 81
57 / 81
Weak Convergence and the Central Limit Theorem
Denote Y = Q(u(y)) and σY² = Var[Y]. Consider the scaled random variable
ZM ≡ (√M/σY) ( (1/M) ∑_{m=1}^M Y(ωm) − E[Y] )
with cumulative distribution function
FZM(x) ≡ P(ZM ≤ x),   ∀x ∈ R.
58 / 81
The Central Limit Theorem is the fundamental result to understand the
statistical error in Monte Carlo.
Theorem (The Central Limit Theorem)
Assume ξj, j = 1, 2, 3, ..., are independent, identically distributed (i.i.d.) and E[ξj] = 0, E[ξj²] = 1. Then
∑_{j=1}^M ξj/√M ⇀ ν,                                                          (6)
where ν is N(0, 1) and ⇀ denotes convergence of the distributions, also called weak convergence, i.e. the convergence (6) means that the following limit holds as M → ∞:
E[g( (1/√M) ∑_{j=1}^M ξj )] → E[g(ν)]
for all bounded and continuous functions g.
59 / 81
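A small sketch (MATLAB) reproducing the kind of experiment shown on the following slides: the histogram of the rescaled MC error for iid exponential samples; here Exp(1/2) is read as rate 1/2 (mean µ = 2, σ = 2), which is an assumption on the slides' convention.

% Sketch: histogram of the rescaled MC error sqrt(M)*(mean(X_m) - mu)/sigma,
% based on 1000 Monte Carlo outcomes, each using M iid exponential samples.
K = 1000;                             % number of Monte Carlo outcomes
M = 10;                               % samples per outcome (use 100 for the second figure)
mu = 2; sigma = 2;                    % mean and std of Exp with rate 1/2 (assumed convention)
X = -mu*log(rand(M, K));              % M-by-K iid exponential samples (inverse cdf)
Z = sqrt(M)*(mean(X, 1) - mu)/sigma;  % rescaled MC errors, approximately N(0,1) for large M
hist(Z, 30)                           % compare with the N(0,1) density / cdf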
[Figure: histogram of the rescaled MC error M^{1/2}(mean(Xm) − µ)/σ, based on 1000 Monte Carlo outcomes, each using 10 independent Exp(1/2) samples]
60 / 81
[Figure: histogram of the rescaled MC error M^{1/2}(mean(Xm) − µ)/σ, based on 1000 Monte Carlo outcomes, each using 100 independent Exp(1/2) samples]
61 / 81
[Figure: rescaled experimental CDF compared with the CDF of a N(0,1) variable, based on 1000 Monte Carlo outcomes, each using 10 independent Exp(1/2) samples]
62 / 81
[Figure: rescaled experimental CDF compared with the CDF of a N(0,1) variable, based on 1000 Monte Carlo outcomes, each using 100 independent Exp(1/2) samples]
63 / 81
What is the error IM − I in the Example?
Let the error εM be defined by
εM = (1/M) ∑_{j=1}^M f(xj) − ∫_{[0,1]^d} f(x)dx = (1/M) ∑_{j=1}^M f(xj) − E[f(x)].
By the Central Limit Theorem, √M εM ⇀ σν, where ν is N(0, 1).
Thus, we approximate
P( √M |εM|/σ ≤ c0 ) ≈ P(|ν| ≤ c0) = 2Φ(c0) − 1
with Φ(x) = ∫_{−∞}^x exp(−t²/2)/√(2π) dt.
64 / 81
The exact variance,
σ² = ∫_{[0,1]^d} f²(x)dx − ( ∫_{[0,1]^d} f(x)dx )² = ∫_{[0,1]^d} ( f(x) − ∫_{[0,1]^d} f(x)dx )² dx,
is in practice approximated by the sample variance
σ̂² = (1/(M−1)) ∑_{j=1}^M ( f(xj) − (1/M) ∑_{m=1}^M f(xm) )².
65 / 81
Approximate error bound: Cα σ̂/√M. Here Cα = 3.
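A short sketch (MATLAB) computing IM, the sample variance σ̂² and the approximate error bound Cα σ̂/√M for the integral of the earlier numerical example (N = 20, exact value 1); Cα = 3 as on the slide, M is illustrative.

% Sketch: MC estimate, sample variance and approximate error bound Calpha*sighat/sqrt(M)
% for the integral of the earlier example, I = int exp(sum_n x_n)/(e-1)^N dx = 1.
M = 1e5; N = 20; Calpha = 3;
f = exp(sum(rand(M, N), 2))/(exp(1) - 1)^N;    % samples of the integrand
IM = mean(f);                                  % MC estimate of I
sighat = std(f);                               % sample std (1/(M-1) normalization)
bound = Calpha*sighat/sqrt(M);                 % approximate error bound
fprintf('I_M = %.4f, true error = %.2e, bound = %.2e\n', IM, abs(IM - 1), bound)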
66 / 81
Theorem (Berry–Esseen)
Assume
λ ≡ ( E[|Y − E[Y]|³] )^{1/3} / σY < +∞;
then we have a uniform estimate in the central limit theorem
|FZM(x) − Φ(x)| ≤ CBE λ³ / ((1 + |x|)³ √M)                                    (7)
Here Φ is the distribution function of N(0, 1),
Φ(x) = (1/√(2π)) ∫_{−∞}^x exp(−s²/2) ds,
and CBE = 30.51175...
67 / 81
By the Berry–Esseen thm., the statistical error
ES(Y; M) ≡ E[Y] − (1/M) ∑_{m=1}^M Y(ωm)
satisfies, ∀c0 > 0,
P( |ES(Y; M)| ≤ c0 σY/√M ) ≥ 2Φ(c0) − 1 − 2 CBE λ³ / ((1 + c0)³ √M).
Compare this last result with the direct application of the CLT, which neglects the last term in the bound.
Further improvement: Edgeworth expansion
FZM(x) − Φ(x) = ( λ3/(6√M) ) (1 − x²) e^{−x²/2}/√(2π) + o(1/√M)
(Compare with the Berry–Esseen bound: |FZM(x) − Φ(x)| ≤ CBE λ³/((1 + |x|)³ √M).)
68 / 81
CLT approximation of cdf
Consider a Binomial r.v. with parameter p = 1/2,
X = ∑_{i=1}^M Yi,
with Yi iid Bernoulli r.v.'s, σ² = p(1 − p). Let
Z = (X − Mp)/(σ√M);
then we compare its cdf (computed exactly) vs. the CLT approximation, Φ(z). We do it for several values of M ...
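A short sketch (MATLAB) of this comparison: the exact cdf of the normalized Binomial variable versus Φ(z), using gammaln and erfc to avoid toolbox functions; the value of M is illustrative.

% Sketch: exact cdf of Z = (X - M*p)/(sigma*sqrt(M)), X ~ Bin(M, 1/2), vs. the
% CLT approximation Phi(z). (p = 1/2, so the pmf is C(M,k)*0.5^M.)
M = 100; p = 0.5; sigma = sqrt(p*(1 - p));
k = (0:M)';
logpmf = gammaln(M+1) - gammaln(k+1) - gammaln(M-k+1) + M*log(p);   % log pmf of Bin(M, 1/2)
F = cumsum(exp(logpmf));                   % exact cdf of X at k = 0, ..., M
z = (k - M*p)/(sigma*sqrt(M));             % normalized evaluation points
Phi = 0.5*erfc(-z/sqrt(2));                % standard normal cdf
semilogy(z, sqrt(M)*abs(Phi - F))          % normalized cdf error, cf. the next figure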
69 / 81
[Figure: normalized cdf error √M (Φ(z) − FZ(z)) for Bin(M, 1/2), M = 10, 100, 1000, together with the Berry–Esseen (BET) bound; numerical results from sampling with Bernoulli random variables]
70 / 81
Observe that in the previous figure:
log( √M |FZM(z) − Φ(z)| ) ≈ c1 − c2 z²
or simply
|FZM(z) − Φ(z)| ≈ C e^{−c2 z²}/√M.
Compare the above with the B.-E. Theorem (assuming only a bounded 3rd moment):
|FZM(x) − Φ(x)| ≤ CBE λ³ / ((1 + |x|)³ √M).
Note that the binomial distribution has finite exponential moments, i.e. E[exp(ΘYi)] < ∞, ∀Θ.
71 / 81
Example – elliptic PDE with random coeff.
Consider again the model problem
− div(a(x, y)∇u(x, y)) = f(x),  x ∈ D,      u(x, y) = 0,  x ∈ ∂D,      ∀y ∈ Γ := [−√3, √3]^N
with a(x, y) = ā + ∑_{i=1}^N √λi yi bi(x), yi ∼ U(−√3, √3) i.i.d., ∑_{i=1}^N √(3λi) ‖bi‖_∞ ≤ δā for some 0 < δ < 1, and its approximation uh(y) by piecewise linear finite elements.
We have seen that ‖uh(y)‖_{H^1(D)} ≤ C, =⇒ Q(uh(y)) ≤ CQ C for any globally Lipschitz functional Q : H^1_0(D) → R with Q(0) = 0 and ∀y ∈ Γ.
We have therefore a control on the third moment of Q(uh(y)) and we can apply the Berry–Esseen estimate to conclude that the statistical error
E_h^Q(M) = E[Q(uh)] − (1/M) ∑_{m=1}^M Q(uh(y(ωm)))
satisfies
P( |E_h^Q(M)| ≤ c0 std[Q(uh)]/√M ) ≥ 2Φ(c0) − 1 − 2 CBE λ³ / ((1 + c0)³ √M),
with λ³ = E[|Q(uh) − E[Q(uh)]|³] / std[Q(uh)]³.
72 / 81
Monte Carlo complexity analysis
Recall the error splitting:
E[Q(u(y))] − (1/M) ∑_{m=1}^M Q(uh(y(ωm))) = E^Q(h) + E_h^Q(M), with
|E^Q(h)| = |E[Q(u(y)) − Q(uh(y))]| ≤ C h^α                                    (discretization error)
|E_h^Q(M)| = |E[Q(uh(y))] − (1/M) ∑_{m=1}^M Q(uh(y(ωm)))| ≲ c0 std[Q(uh)]/√M    (statistical error)
The last approximation is motivated by the CLT, i.e.
P( √M |E_h^Q(M)| ≤ c0 std[Q(uh)] ) → 2Φ(c0) − 1 as M → ∞.
Let us assume now that the computational work to solve for each u(y(ωm)) is O(h^{−dβ}). We have therefore the following estimates:
Total work:    W ∝ M h^{−dβ}
Total error:   |E^Q(h)| + |E_h^Q(M)| ≤ C1 h^α + C2/√M
73 / 81
We want now to choose h and M optimally. Here we minimize the computational work subject to an accuracy constraint, i.e. we solve
min_{h,M} M h^{−dβ}   s.t.   C1 h^α + C2/√M ≤ TOL
The Lagrangian of the above problem is
L(M, h, λ) = M h^{−dβ} + λ ( C1 h^α + C2/√M − TOL ).
Enforcing ∂M L(M, h, λ) = ∂h L(M, h, λ) = 0 yields 1/√M = (2αC1/(dβC2)) h^α and, using the accuracy constraint, h = ( TOL / (C1 (1 + 2α/(dβ))) )^{1/α}.
We can interpret the above as a splitting of the tolerance into statistical and space discretization tolerances, TOL = TOLS + TOLh, such that
TOLh = TOL / (1 + 2α/(dβ))   and   TOLS = TOL ( 1 − 1/(1 + 2α/(dβ)) ).
The resulting complexity (error versus computational work) is then
W ∝ TOL^{−(2 + dβ/α)}
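A tiny sketch (MATLAB) of this tolerance splitting and the resulting choices of h and M; the rates α, β, the dimension d and the constants C1, C2 are illustrative assumptions.

% Sketch: tolerance splitting and choice of h and M for a given TOL, following
% the formulas above. alpha, beta, d, C1, C2 are illustrative values.
alpha = 2; beta = 1; d = 2;                 % e.g. O(h^2) accuracy, O(h^{-d}) work per sample
C1 = 1; C2 = 1; TOL = 1e-2;
TOLh = TOL/(1 + 2*alpha/(d*beta));          % space discretization tolerance
TOLS = TOL - TOLh;                          % statistical tolerance
h = (TOLh/C1)^(1/alpha);                    % mesh size from C1*h^alpha = TOLh
M = ceil((C2/TOLS)^2);                      % sample size from C2/sqrt(M) = TOLS
W = M*h^(-d*beta);                          % predicted work, W ~ TOL^-(2 + d*beta/alpha)
fprintf('h = %.3g, M = %d, W = %.3g\n', h, M, W)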
74 / 81
Monte Carlo, further reading
▶ "A First Course in Monte Carlo", by G. S. Fishman
▶ "Monte Carlo Statistical Methods", by C. P. Robert & G. Casella
75 / 81
Adaptive Monte Carlo
For given 0 < δ, TOLS, the goal is to find M such that
P( |E[Y] − (1/M) ∑_{m=1}^M Y(ωm)| ≤ TOLS ) ≥ 1 − δ.
The following algorithm attempts to adaptively find the number of realizations M(TOLS, δ), using the sample average (1/M) ∑_{m=1}^M Y(ωm) as an approximation to E[Y].
Approximation: Let Z ∼ N(0, 1) and S² ≡ (1/M) ∑_{m=1}^M (Y(ωm) − EY)². Then
P( |E[Y] − (1/M) ∑_{m=1}^M Y(ωm)| ≤ TOLS ) = P( |ZM| ≤ TOLS √M / σY )
   = P( |Z| ≤ TOLS √M / σY ) + ECLT
   = P( |Z| ≤ TOLS √M / S ) + ECLT + ESTD
76 / 81
Sample Variance Based Stopping Rule
routine Monte-Carlo(TOLS, Y, M0; EY)
  Set the batch counter k = 1, M[1] = M0 and ES[1] = 2 TOLS.
  Do while (ES[k] > TOLS)
    Compute M[k] new samples of Y,
      the sample average    EY ≡ (1/M[k]) ∑_{m=1}^{M[k]} Y(ωmk),
      the sample variance   S[k]² ≡ (1/(M[k]−1)) ∑_{m=1}^{M[k]} (Y(ωmk) − EY)²,
      and the deviation     ES[k+1] ≡ c0 S[k]/√(M[k]).
    Compute M[k+1] = M[k] * 2.
    Increase k by 1.
  end-do
end of Monte-Carlo
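A minimal runnable version (MATLAB) of the stopping rule above; Y is assumed to be a function handle returning one iid sample per call, and the doubling of the batch size follows the listing.

% Sketch implementation of the sample-variance based stopping rule above
% (save as adaptiveMC.m). Y: function handle returning one iid sample per call.
% Example call: EY = adaptiveMC(@() exp(sum(rand(1,20)))/(exp(1)-1)^20, 1e-2, 3, 100)
function EY = adaptiveMC(Y, TOLS, c0, M0)
    M = M0; ES = 2*TOLS;
    while ES > TOLS
        samples = zeros(M, 1);
        for m = 1:M
            samples(m) = Y();          % M new samples of Y
        end
        EY = mean(samples);            % sample average
        S  = std(samples);             % sample standard deviation (1/(M-1) normalization)
        ES = c0*S/sqrt(M);             % deviation estimate c0*S/sqrt(M)
        M  = 2*M;                      % double the batch size for the next iteration
    end
end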
77 / 81
Adaptive Monte Carlo
Question: How relevant are the errors ECLT and ESTD in the behavior of
the previous algorithm?
Observe that ESTD is present even if the sampled r.v. is Gaussian.
Indeed, second-moment-based sequential stopping rules, such as the previous algorithm, run the risk of using too few samples in Monte Carlo estimates, especially when sampling heavy-tailed r.v.'s in settings with very stringent confidence requirements, i.e., δ ≪ TOLS.
For a discussion on stopping rules based on higher order moment information, see
▶ F. J. Hickernell, L. Jiang, Y. Liu, and A. Owen. Guaranteed Conservative Fixed Width Confidence Intervals Via Monte Carlo Sampling. ArXiv e-prints, August 2012.
▶ C. Bayer, H. Hoel, E. von Schwerin and R. Tempone. On non-asymptotic optimal stopping criteria in Monte Carlo simulations. Submitted, February 2013.
78 / 81
Acknowledgements
▶ Ivo Babuska
▶ Joakim Beck
▶ Abdul Lateef Haji-Ali
▶ Quan Long
▶ Giovanni Migliorati
▶ Mohammad Motamed
▶ Marco Scavino
▶ Erik von Schwerin
▶ Lorenzo Tamellini
▶ Suojin Wang
▶ Clayton Webster
▶ Georgios Zouraris
▶ Italian project FIRB-IDEAS ('09) Advanced Numerical Techniques for Uncertainty Quantification in Engineering and Life Science Problems
▶ Academic Excellency Alliance KAUST–UT Austin project "Predictability and uncertainty quantification for models of porous media"
▶ KAUST Strategic Research Initiative (SRI) Center for Uncertainty Quantification in Computational Science and Engineering.
79 / 81
References
I. Babuška, F. Nobile and R. Tempone.
A stochastic collocation method for elliptic PDEs with random input data, SIAM Review,
52(2):317–355, 2010
J. Bäck, F. Nobile, L. Tamellini and R. Tempone
Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: a
numerical comparison, LNCSE Springer, 76:43-62, 2011
M. Motamed, F. Nobile, R. Tempone,
Analysis and Computation of the elastic wave equation with random coefficients,
EPFL-MATHICSE Technical Report Nr. 32.2012, August 2012, Submitted
M. Motamed, F. Nobile, R. Tempone,
A stochastic collocation method for the second order wave equation with a discontinuous
random speed, Numerische Mathematik, 123(3):493–536, 2012
G. Migliorati, F. Nobile, E. von Schwerin and R. Tempone,
Approximation of quantities of interest in stochastic PDEs by the random discrete L2
projection on polynomial spaces, SISC, 35(3):A1440-A1460, 2013
G. Migliorati and F. Nobile and E. von Schwerin and R. Tempone
Analysis of the discrete L2 projection on polynomial spaces with random evaluations,
MOX-Report 46/2011. Submitted.
J. Bäck and F. Nobile and L. Tamellini and R. Tempone
On the optimal polynomial approximation of stochastic PDEs by Galerkin and Collocation
methods, Math. Mod. Methods Appli. Sci. (M3AS), 22(9), 2012.
Q. Long, M. Scavino, R. Tempone and S. Wang
Fast estimation of expected information gains for Bayesian experimental designs based on
Laplace approximations, CMAME 259 24–39, 2013.
80 / 81
References
F. Nobile and R. Tempone
Analysis and implementation issues for the numerical approximation of parabolic equations
with random coefficients, IJNME, 80:979–1006, 2009
F. Nobile, R. Tempone and C. Webster
An anisotropic sparse grid stochastic collocation method for PDEs with random input data,
SIAM J. Numer. Anal., 46(5):2411–2442, 2008
L. Tamellini,
Polynomial approximation of PDEs with stochastic coefficients, PhD Thesis, Politecnico di
Milano. March 2012.
81 / 81