Monte Carlo Integration

Lecture 9 - Monte Carlo Integration
Björn Andersson (w/ Xijia Liu)
Department of Statistics, Uppsala University
February 27, 2015
Table of Contents
1 Numerical integration
Basic methods for numerical integration
2 Monte Carlo integration
3 Variance reduction in Monte Carlo integration
Basic methods for numerical integration
Quadrature rules
Midpoint rule:
$$\int_a^b f(x)\,dx \approx (b-a)\,f\!\left(\frac{a+b}{2}\right)$$
Trapezoidal rule:
$$\int_a^b f(x)\,dx \approx (b-a)\,\frac{f(a)+f(b)}{2}$$
Composite trapezoidal rule:
$$\int_a^b f(x)\,dx \approx \frac{b-a}{2n}\left[f(a) + 2\sum_{i=1}^{n-1} f\!\left(a + i\,\frac{b-a}{n}\right) + f(b)\right]$$
Basic methods for numerical integration
Quadrature rules
Consider estimating $\int_1^3 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$.
R> pnorm(3) - pnorm(1)
[1] 0.1573054
R> (3 - 1) * dnorm((1 + 3) / 2)
[1] 0.1079819
R> (3 - 1) * (dnorm(1) + dnorm(3)) / 2
[1] 0.2464026
R> ((3 - 1) / (2*20)) * (dnorm(1) + 2 * (sum(dnorm(1 +
+ (1:19) * (3 - 1) / 20))) + dnorm(3))
[1] 0.157496
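The composite rule is easy to wrap in a reusable function. A minimal sketch (the function comptrap and its interface are my own, not from the lecture):

comptrap <- function(f, a, b, n) {
  # Composite trapezoidal rule with n subintervals of width h.
  h <- (b - a) / n
  inner <- a + (1:(n - 1)) * h  # interior grid points
  (h / 2) * (f(a) + 2 * sum(f(inner)) + f(b))
}
comptrap(dnorm, 1, 3, 20)  # same computation as above, 0.157496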
Basic methods for numerical integration
Simpson’s rule
We want to calculate $\int_a^b f(x)\,dx$. Let $t_i = a + ih$, $h = \frac{b-a}{2n}$, and divide $[a,b]$ into $2n$ intervals. Simpson's rule:
$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left(f(a) + 4\sum_{i=1}^{n} f(t_{2i-1}) + 2\sum_{i=1}^{n-1} f(t_{2i}) + f(b)\right).$$
Exercise: implement Simpson’s rule in R.
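One possible solution sketch (my own implementation, not the lecture's):

simpson <- function(f, a, b, n) {
  # Composite Simpson's rule on 2n subintervals, following the formula above.
  h <- (b - a) / (2 * n)
  t <- a + (0:(2 * n)) * h                                  # grid t_0, ..., t_2n
  odd <- sum(f(t[2 * (1:n)]))                               # f(t_1) + f(t_3) + ... + f(t_{2n-1})
  even <- if (n > 1) sum(f(t[2 * (1:(n - 1)) + 1])) else 0  # f(t_2) + ... + f(t_{2n-2})
  (h / 3) * (f(a) + 4 * odd + 2 * even + f(b))
}
simpson(dnorm, 1, 3, 10)  # should be close to pnorm(3) - pnorm(1)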
Table of Contents
1 Numerical integration
2 Monte Carlo integration
Simple Monte Carlo estimation
General Monte Carlo estimation
Hit or Miss estimation
Variance and efficiency
3 Variance reduction in Monte Carlo integration
Monte Carlo methods
The textbook contains only a brief introduction to Monte Carlo integration methods. This lecture will introduce the topic in a somewhat more formal way.
For a thorough treatment of the topic:
Simulation and the Monte Carlo Method, Rubinstein and
Kroese (2008)
Monte Carlo Statistical Methods, Robert and Casella (2004)
Simple Monte Carlo estimation
Simple Monte Carlo
The object is to estimate the integral $\theta = \int_0^1 g(x)\,dx$.
We can re-write $\theta$ as
$$\theta = \int_0^1 g(x)\,dx = E(g(X)),$$
where $X \sim U(0,1)$.
If $X_1, \ldots, X_m$ is a random sample from $U(0,1)$, then we retrieve the simple Monte Carlo (SMC) estimator
$$\hat{\theta}_{SMC} = \bar{g}_m(X) = \frac{1}{m}\sum_{i=1}^{m} g(X_i).$$
$\hat{\theta}_{SMC}$ converges almost surely to $E(g(X)) = \theta$ by the strong law of large numbers.
Simple Monte Carlo estimation
Simple Monte Carlo example
We want to compute a Monte Carlo estimate of $\theta = \int_0^1 e^{-x}\,dx$ and compare it to the exact value.
R> x <- runif(10000)
R> mean(exp(-x))
[1] 0.6342469
R> 1 - exp(-1)
[1] 0.6321206
Hence, the estimate θ̂ is very close to the exact value.
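The same sample also gives a standard error estimate for free (a small extension of the example, not shown in the slides):

x <- runif(10000)
gx <- exp(-x)
# Monte Carlo estimate and its estimated standard error sd/sqrt(m):
c(estimate = mean(gx), std.error = sd(gx) / sqrt(length(gx)))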
General Monte Carlo estimation
Here, the object is to estimate $\theta = \int_a^b g(x)\,dx$.
We re-write $\theta$ as
$$\theta = (b-a)\int_a^b g(x)\,\frac{1}{b-a}\,dx = (b-a)\,E[g(X)],$$
where $X \sim U(a,b)$.
The crude Monte Carlo (CMC) estimate is then
$$\hat{\theta}_{CMC} = \frac{b-a}{m}\sum_{i=1}^{m} g(X_i),$$
where $X_1, \ldots, X_m$ are i.i.d. from $U(a,b)$.
General Monte Carlo estimation
General Monte Carlo estimation: R
We want to estimate $\int_1^3 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$.
R> x <- runif(10000, 1, 3)
R> ((3 - 1) / 10000) * sum(dnorm(x))
[1] 0.1573236
R> pnorm(3) - pnorm(1)
[1] 0.1573054
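The same computation generalizes to any integrand and interval. A sketch of a reusable crude Monte Carlo function (the name cmc is my own):

cmc <- function(g, a, b, m) {
  # Crude Monte Carlo: (b - a) times the mean of g over U(a, b) draws.
  x <- runif(m, a, b)
  (b - a) * mean(g(x))
}
cmc(dnorm, 1, 3, 10000)  # should be close to pnorm(3) - pnorm(1)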
General Monte Carlo estimation
More generally, suppose we want to estimate $\theta = \int_D g(x)\,dx$, where $g(x)$ can be decomposed as $h(x)f(x)$, with $f(x)$ a probability density on $D$.
Then we have
$$\theta = \int_D h(x)f(x)\,dx = E_f[h(X)],$$
where $X$ is a R.V. with density $f$ on the set $D$.
The general crude Monte Carlo estimate is then
$$\hat{\theta}_{GCMC} = \frac{1}{m}\sum_{i=1}^{m} h(X_i),$$
where $X_1, \ldots, X_m$ are i.i.d. R.V.s with density $f$ on $D$.
General Monte Carlo estimation
General Monte Carlo estimation example
Consider estimating $\theta = \int_{-\infty}^{\infty} x^4\,\frac{1}{\sqrt{2\pi}}\,e^{-(x-2)^2/2}\,dx$, i.e. $h(x) = x^4$ and $f$ is the $N(2,1)$ density.
Our estimator is
$$\hat{\theta} = \frac{1}{m}\sum_{j=1}^{m} x_j^4,$$
where the $x_j$ are generated from $f$.
R> x4 <- rnorm(100000, mean = 2)
R> sum(x4^4) / 100000
[1] 42.80055
For $X \sim N(\mu, \sigma^2)$, $E(X^4) = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$.
So for $\mu = 2$, $\sigma^2 = 1$, $E(X^4) = 16 + 24 + 3 = 43$.
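As a cross-check (my own addition, using R's built-in deterministic quadrature):

# Should report a value of about 43:
integrate(function(x) x^4 * dnorm(x, mean = 2), -Inf, Inf)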
Hit or Miss estimation
We consider estimating $\theta = \int_a^b g(x)\,dx$. We assume that $0 \le g(x) \le c$, and re-write $\theta$ as
$$\theta = \int_a^b \int_0^{g(x)} 1\,dy\,dx = c(b-a)\int_a^b \int_0^c \frac{1\{y \le g(x)\}}{c(b-a)}\,dy\,dx.$$
Hence, $\theta = c(b-a)\,E(1\{Y \le g(X)\})$, where $Y \sim U(0,c)$ and $X \sim U(a,b)$.
Hit or Miss estimation
Hence, by independently generating $X_1, \ldots, X_m$ from $U(a,b)$ and $Y_1, \ldots, Y_m$ from $U(0,c)$ we get the following hit or miss (HM) estimator of $\theta$:
$$\hat{\theta}_{HM} = \frac{c(b-a)}{m}\sum_{i=1}^{m} 1\{Y_i \le g(X_i)\}.$$
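A sketch of the estimator as an R function (the name hitmiss and the argument cmax, an upper bound for g, are my own):

hitmiss <- function(g, a, b, cmax, m) {
  # Hit or miss: proportion of uniform points under the curve, rescaled
  # by the area cmax * (b - a) of the bounding rectangle.
  x <- runif(m, a, b)
  y <- runif(m, 0, cmax)
  cmax * (b - a) * mean(y <= g(x))
}
hitmiss(dnorm, 1, 3, 1 / sqrt(2 * pi), 10000)  # close to pnorm(3) - pnorm(1)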
Hit or Miss estimation
Consider again estimating $\int_1^3 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$, this time with Hit or Miss. We can take $c = g(0) = 1/\sqrt{2\pi}$, since $g(x) \le g(0)$ for all $x$. Hence we generate $x_i \sim U(1, 3)$ and $y_i \sim U(0, 1/\sqrt{2\pi})$ and compare $y_i$ with $g(x_i)$:
R> x <- runif(10000, 1, 3)
R> y <- runif(10000, 0, 1 / sqrt(2 * pi))
R> ((2 / sqrt(2 * pi)) / 10000) * sum(y < dnorm(x))
[1] 0.1588588
R> pnorm(3) - pnorm(1)
[1] 0.1573054
Variance and efficiency
Variances of the Monte Carlo estimators
For $\hat{\theta}_{CMC} = \frac{b-a}{m}\sum_{i=1}^{m} g(X_i)$, we have
$$\mathrm{Var}(\hat{\theta}_{CMC}) = \mathrm{Var}\left[\frac{b-a}{m}\sum_{i=1}^{m} g(X_i)\right] = \frac{(b-a)^2}{m}\,\mathrm{Var}[g(X)],$$
where $X \sim U(a,b)$.
For $\hat{\theta}_{GCMC} = \frac{1}{m}\sum_{i=1}^{m} h(X_i)$, we have
$$\mathrm{Var}(\hat{\theta}_{GCMC}) = \mathrm{Var}\left[\frac{1}{m}\sum_{i=1}^{m} h(X_i)\right] = \frac{1}{m}\,\mathrm{Var}[h(X)],$$
where $X$ is generated from $f$ on the set $D$.
Variance and efficiency
Estimating the variance of the Monte Carlo estimators
We may estimate the variance of $g(X)$ by the sample variance, i.e. by
$$\widehat{\mathrm{Var}}[g(X)] = \sum_{i=1}^{m} \frac{\left[g(x_i) - \overline{g(x)}\right]^2}{m-1}.$$
The standard error estimate of $\hat{\theta}_{CMC}$ is then
$$\widehat{\mathrm{se}}(\hat{\theta}_{CMC}) = \frac{b-a}{\sqrt{m}}\left\{\sum_{i=1}^{m} \frac{\left[g(x_i) - \overline{g(x)}\right]^2}{m-1}\right\}^{1/2}.$$
An alternative is to repeat the estimation of $\theta$ a number of times to retrieve the sampling distribution of the estimator. The sample variance of the repetitions can then be used to estimate the variance of the Monte Carlo estimator.
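A sketch of the first approach for the running $N(0,1)$ example (my own illustration of the formula above):

m <- 10000
x <- runif(m, 1, 3)
gx <- dnorm(x)
estimate <- (3 - 1) * mean(gx)
se <- (3 - 1) * sd(gx) / sqrt(m)  # sd() uses the m - 1 denominator
c(estimate, se)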
Variance and efficiency
Estimating the variance of the Monte Carlo estimators
Consider again $\int_1^3 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$.
R> xMC <- numeric(100)
R> for(i in 1:100){
+   x <- runif(1000, 1, 3)
+   xMC[i] <- ((3 - 1) / 1000) * sum(dnorm(x))
+ }
R> var(xMC)
[1] 1.492597e-05
R> 4 * var(dnorm(x)) / 1000
[1] 2.036098e-05
Variance and efficiency
Efficiency
For two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of $\theta$, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if
$$\frac{\mathrm{Var}(\hat{\theta}_1)}{\mathrm{Var}(\hat{\theta}_2)} < 1.$$
If the variances of the estimators are unknown, they can be
estimated from the Monte Carlo sample.
The variance of any Monte Carlo estimator can be reduced by increasing the number of replications, so the computational speed of the estimators must also be considered when comparing them.
Variance and efficiency
Crude Monte Carlo vs. Hit or Miss Monte Carlo
For a given $m$, the variance of $\hat{\theta}_{HM}$ is always larger than that of $\hat{\theta}_{GCMC}$.
The Hit or Miss method is worse because an extra R.V. Y is
introduced, which leads to extra uncertainty.
R> xHM <- numeric(100)
R> for(i in 1:100){
+   x <- runif(1000, 1, 3)
+   y <- runif(1000, 0, 1 / sqrt(2 * pi))
+   xHM[i] <- ((2 / sqrt(2 * pi)) / 1000) * sum(y < dnorm(x))
+ }
R> var(xHM)
var(xHM)
[1] 0.0001227415
R> var(xMC)
[1] 1.492597e-05
Table of Contents
1 Numerical integration
2 Monte Carlo integration
3 Variance reduction in Monte Carlo integration
Common and antithetic variables
Importance sampling
Introduction to variance reduction
Increasing the number of replications $m$ reduces the variance of the Monte Carlo estimator.
A large increase in $m$ is required to get a small reduction in the standard error: since the standard error decreases as $1/\sqrt{m}$, halving it requires four times as many replications.
Generally, if the standard error is to be at most $e$, then $m \ge \sigma^2/e^2$ is required, where $\sigma^2 = \mathrm{Var}[g(X)]$.
Reducing the standard error thus has a high computational cost.
Considering this, it is useful to attempt other variance reduction methods.
Common and antithetic variables
Common and antithetic random variables
Let $X$ and $Y$ be R.V.s with known cdfs $F$ and $G$. We want to estimate $\theta = E(X - Y)$ by Monte Carlo methods.
We draw $X$ and $Y$ using the inverse transform method:
$$X = F^{-1}(U_1), \quad Y = G^{-1}(U_2),$$
where $U_1$ and $U_2$ are generated from $U(0,1)$.
Note that
$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y).$$
Hence, the variance of $X - Y$ will be reduced if $\mathrm{Cov}(X, Y)$ is positive.
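A tiny demonstration of the effect (my own example, with $X \sim \mathrm{Exp}(1)$ and $Y \sim \mathrm{Exp}(2)$ via the inverse transform):

u <- runif(10000)
# Common R.V.s: the same u drives both quantile transforms (U1 = U2).
diff.common <- qexp(u, rate = 1) - qexp(u, rate = 2)
# Independent draws for comparison.
diff.indep <- qexp(u, rate = 1) - qexp(runif(10000), rate = 2)
# Theoretical variances: 0.25 (common) vs. 1.25 (independent).
c(var(diff.common), var(diff.indep))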
Common and antithetic variables
Common and antithetic random variables
We say that common R.V.s are used if $U_1 = U_2$ and that antithetic R.V.s are used if $U_2 = 1 - U_1$.
Since $F^{-1}$ and $G^{-1}$ are non-decreasing functions, using common R.V.s implies
$$\mathrm{Cov}[F^{-1}(V), G^{-1}(V)] \ge 0,$$
for $V \sim U(0,1)$. Thus we achieve a variance reduction.
Using the same reasoning, $\mathrm{Var}(X + Y)$ is reduced when using antithetic R.V.s.
Common and antithetic variables
Variance reduction strategy
We want to estimate $\theta = E[H(X)]$ for a monotone function $H$, where $X$ can be generated using the inverse transform method, i.e. $X = F_X^{-1}(V)$, where $V \sim U(0,1)$.
Using antithetic R.V.s, $m/2$ replicates
$$Y_j = H[F_X^{-1}(V_j)]$$
are generated along with $m/2$ replicates
$$Y_j' = H[F_X^{-1}(1 - V_j)],$$
where $V_j \sim U(0,1)$, $j \in \{1, \ldots, m/2\}$.
The antithetic estimator is then
$$\hat{\theta}_{AT} = \frac{1}{m}\sum_{j=1}^{m/2} (Y_j + Y_j').$$
Common and antithetic variables
Variance reduction strategy example
Consider a R.V. $X$. We want to estimate $E[h(X)]$ with antithetic variables.
R> ATest <- function(Finv, h, n){
+   if(n %% 2 != 0) return("n must be a positive even integer.")
+   m <- n / 2
+   u <- runif(m)
+   y1 <- h(Finv(1 - u))
+   y2 <- h(Finv(u))
+   sum(y1 + y2) / n
+ }
R> regest <- function(Finv, h, n){
+   u <- runif(n)
+   sum(h(Finv(u))) / n
+ }
Common and antithetic variables
Variance reduction strategy example
Example with h(x) = x 2 (second moment) and X ∼ U(0, 1):
R> uniAT <- numeric(100)
R> for(i in 1:100){
+   uniAT[i] <- ATest(function(x) x, function(x) x^2, 10000)
+ }
R> unireg <- numeric(100)
R> for(i in 1:100){
+   unireg[i] <- regest(function(x) x, function(x) x^2, 10000)
+ }
R> var(uniAT)
[1] 7.796065e-07
R> var(unireg)
[1] 7.996479e-06
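The estimated variance with antithetic variables is roughly a tenth of that of the regular estimator, in line with the theory: $h(x) = x^2$ is monotone on $(0,1)$, so the antithetic pairs are negatively correlated.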
Importance sampling
Importance sampling: idea
We want to estimate $\theta = \int h(x)f(x)\,dx$.
For $\hat{\theta}_{GCMC} = \frac{1}{m}\sum_{j=1}^{m} h(X_j)$, we see that its variance depends on the variance of $h(X)$.
If we, instead of simulating from $f$, choose another distribution $g$ to simulate from, we retrieve
$$\theta = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx.$$
$g$ is called the importance sampling function.
Importance sampling
Let $X$ be a R.V. with pdf $g$, such that $g(x) > 0$ on the set $A = \{x : f(x) > 0\}$. Let $Y$ be the R.V. $Y = \frac{f(X)}{g(X)}h(X)$.
Then
$$\int_{x \in A} h(x)f(x)\,dx = \int_{x \in A} h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx = E_g\left[\frac{f(X)}{g(X)}h(X)\right] = E_g(Y).$$
The importance sampling estimator is
$$\hat{\theta}_{IS} = \frac{1}{m}\sum_{i=1}^{m} \frac{f(X_i)}{g(X_i)}\,h(X_i),$$
where $X_1, \ldots, X_m$ are generated from $g$. The variance of the importance sampling estimator depends on the variance of $Y$. Hence, for the variance to be reduced by this method, $g$ should be chosen so that $Y = \frac{f(X)}{g(X)}h(X)$ is close to constant, i.e. so that $g$ is nearly proportional to $|h|f$.
Importance sampling
Note that while $\hat{\theta}_{IS}$ converges almost surely to $\theta$, the variance is only finite when
$$E_g(Y^2) = E_g\left[\left(h(X)\,\frac{f(X)}{g(X)}\right)^2\right] = E_f\left[h(X)^2\,\frac{f(X)}{g(X)}\right] = \int_{x \in A} [h(x)]^2\,\frac{[f(x)]^2}{g(x)}\,dx < \infty.$$
So the choice of $g$ should be such that $g$ does not have lighter tails than $f$: the ratio $f/g$ should be bounded.
Importance sampling
An alternative to $\hat{\theta}_{IS} = \frac{1}{m}\sum_{j=1}^{m} \frac{f(X_j)}{g(X_j)}h(X_j)$ which avoids the finite variance requirement is to instead use
$$\hat{\theta}_{wIS} = \frac{\sum_{j=1}^{m} h(X_j)f(X_j)/g(X_j)}{\sum_{j=1}^{m} f(X_j)/g(X_j)}.$$
This estimator has a small bias but has lower variance under some circumstances, such as when $h$ is nearly constant.
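A sketch of the weighted estimator as an R function (the name wIS is my own), checked on the fourth-moment example from earlier, here sampling from the wider density $N(0, 3^2)$:

wIS <- function(h, f, g, x) {
  # Self-normalized importance sampling with draws x from density g.
  w <- f(x) / g(x)  # importance weights
  sum(h(x) * w) / sum(w)
}
x <- rnorm(10000, 0, 3)
# Estimates E(X^4) for X ~ N(2, 1); should be roughly 43.
wIS(function(x) x^4, function(x) dnorm(x, 2, 1), function(x) dnorm(x, 0, 3), x)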
Importance sampling
Importance sampling: example
Consider $\theta = \int_4^\infty f(x)\,dx$, where $f$ is the standard normal density function. We could estimate $\theta$ with the empirical average
$$\hat{\theta}_{MC1} = \frac{1}{m}\sum_{j=1}^{m} 1(x_j > 4),$$
where the $x_j$ are generated from $N(0,1)$. However, very few of the generated numbers will be larger than 4. Instead we could use importance sampling with importance sampling function $g(x) = e^{-(x-4)}$, $x \in [4, \infty)$, the density of a shifted $\mathrm{Exp}(1)$ random variable. (Note that $h(x) = 1$.) Our estimator is then
$$\hat{\theta}_{IS} = \frac{1}{m}\sum_{j=1}^{m} \frac{f(x_j)}{g(x_j)} = \frac{1}{m}\sum_{j=1}^{m} \frac{\frac{1}{\sqrt{2\pi}}e^{-x_j^2/2}}{e^{-(x_j-4)}} = \frac{1}{m}\sum_{j=1}^{m} \frac{e^{-x_j^2/2 + x_j - 4}}{\sqrt{2\pi}},$$
where the $x_j$ are generated from $g$. Exercise: compare $\hat{\theta}_{IS}$ and $\hat{\theta}_{MC1}$; one possible solution sketch follows.
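A solution sketch for the exercise (my own code, not the lecture's):

m <- 100000
# Naive Monte Carlo: proportion of N(0, 1) draws exceeding 4.
thetaMC1 <- mean(rnorm(m) > 4)
# Importance sampling: shifted Exp(1) draws on [4, Inf) with the weights above.
x <- rexp(m) + 4
thetaIS <- mean(exp(-x^2 / 2 + x - 4) / sqrt(2 * pi))
# Compare both against the true value:
c(thetaMC1, thetaIS, pnorm(4, lower.tail = FALSE))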