Review Course: Markov Chains and Monte Carlo Methods
Assignment Sheet: Transformation Methods, Rejection Sampling,
and Importance Sampling — Model Answers
The Python code in these model answers requires the following libraries:
from scipy import *
from scipy import optimize
from scipy import random
from scipy import stats
import pylab
1. (a) Steps 1 to 3 are a rejection sampling algorithm sampling from the uniform distribution on the "inside" of the unit circle,
i.e. {(u1, u2) ∈ R² : u1² + u2² < 1}. Thus after step 3, the density of (U1, U2) is

    fU1,U2(u1, u2) = (1/π) · 1{u1² + u2² < 1}.

Let (ϑ, R) be the polar coordinates of (U1, U2), so that

    U1 = cos(ϑ) · √(R²),     U2 = sin(ϑ) · √(R²).

We will now show that ϑ ∼ U(0, 2π) and R² ∼ U(0, 1). For ϑ ∈ (0, 2π) and r² ∈ (0, 1), the change-of-variables formula gives

    fϑ,R²(ϑ, r²) = fU1,U2(cos(ϑ)·√(r²), sin(ϑ)·√(r²)) · |det(∂(u1, u2)/∂(ϑ, r²))|.

The first factor equals 1/π on the unit circle. The Jacobian has entries ∂u1/∂ϑ = −√(r²)·sin(ϑ), ∂u1/∂r² = cos(ϑ)/(2√(r²)),
∂u2/∂ϑ = √(r²)·cos(ϑ) and ∂u2/∂r² = sin(ϑ)/(2√(r²)), so its determinant is −sin²(ϑ)/2 − cos²(ϑ)/2 = −1/2, with absolute value 1/2. Hence

    fϑ,R²(ϑ, r²) = (1/π) · (1/2) = 1/(2π),

which is the product of the U(0, 2π) density (1/(2π)) and the U(0, 1) density (1). Thus ϑ ∼ U(0, 2π) and R² ∼ U(0, 1), and ϑ and R² are independent.

As we can rewrite

    X1 = U1 · √(−2·log(R²)/R²) = (U1/R) · √(−2·log(R²)) = cos(ϑ) · √(−2·log(R²)),
    X2 = U2 · √(−2·log(R²)/R²) = (U2/R) · √(−2·log(R²)) = sin(ϑ) · √(−2·log(R²)),
the Polar Marsaglia method is equivalent to the Box-Muller method and thus also generates two independent samples
from the N(0, 1) distribution.
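To see the equivalence in practice, here is a small sketch (not part of the original answers; it assumes the imports listed at the top of this sheet) that generates a pair via the classical Box-Muller transform, i.e. it uses exactly the representation above but draws ϑ ∼ U(0, 2π) and R² ∼ U(0, 1) directly:

# Box-Muller transform: draw theta ~ U(0,2*pi) and R^2 ~ U(0,1) directly,
# then return (cos(theta), sin(theta)) * sqrt(-2*log(R^2)).
def boxMuller():
    theta = 2*pi*random.random_sample()     # theta ~ U(0, 2*pi)
    r2 = random.random_sample()             # R^2 ~ U(0, 1)
    return array([cos(theta), sin(theta)]) * sqrt(-2*log(r2))

z = array([boxMuller() for i in range(50)]).ravel()   # 100 N(0,1) samples

A histogram of z should look just like the one produced by polarMarsaglia() below; the only difference is that the Polar Marsaglia method avoids evaluating sin and cos.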
(b)
# Question 1b
# Function draws a pair of N(0,1) random numbers using the Polar Marsaglia method
def polarMarsaglia():
    while True:                              # Keep sampling until inside unit circle
        u = -1 + 2*random.random_sample(2)   # Draw U_1,U_2 ~ U[-1,1]
        r2 = sum(u**2)
        if r2<=1:                            # If radius is less than one, we are done
            break
    factor = sqrt(-2*log(r2)/r2)
    return u * factor

x = array([polarMarsaglia() for i in range(50)]).ravel()
In R . . .
# Question 1b
# Function draws a pair of N(0,1) random numbers using the Polar Marsaglia method
polarMarsaglia <- function() {
  repeat {                       # Keep sampling until inside unit circle
    u <- -1 + 2*runif(2)         # Draw U_1,U_2 ~ U[-1,1]
    r2 <- sum(u^2)
    if (r2<=1)                   # If radius is less than one, we are done
      break
  }
  factor <- sqrt(-2*log(r2)/r2)
  u * factor
}

x <- numeric(100)                # Create vector to store result
for (i in 1:50)                  # Fill with pairs of values
  x[2*i+c(-1,0)] <- polarMarsaglia()
2. (a) Using the fact that U1 and U2 have the same distribution and are independent we obtain

    2 · Cov(F⁻(U1), F⁻(1 − U1))
      = 2 · E[F⁻(U1)F⁻(1 − U1)] − 2 · E[F⁻(U1)] · E[F⁻(1 − U1)]
      = 2 · E[F⁻(U1)F⁻(1 − U1)] − 2 · E[F⁻(U1)] · E[F⁻(1 − U2)]          (as U1 ∼ U2)
      = 2 · E[F⁻(U1)F⁻(1 − U1)] − 2 · E[F⁻(U1)F⁻(1 − U2)]                (as U1 ⊥ U2)
      = E[F⁻(U1)F⁻(1 − U1)] − E[F⁻(U1)F⁻(1 − U2)]
          − E[F⁻(U2)F⁻(1 − U1)] + E[F⁻(U2)F⁻(1 − U2)]                    (as U1 ∼ U2)
      = E[(F⁻(U1) − F⁻(U2)) (F⁻(1 − U1) − F⁻(1 − U2))].

(b) Without loss of generality let u1 ≤ u2 (otherwise relabel u1 and u2), so that 1 − u2 ≤ 1 − u1. As F⁻(·) is non-decreasing
we have F⁻(u1) ≤ F⁻(u2) and F⁻(1 − u2) ≤ F⁻(1 − u1). The first factor below is therefore ≤ 0 and the second is ≥ 0, hence

    (F⁻(u1) − F⁻(u2)) · (F⁻(1 − u1) − F⁻(1 − u2)) ≤ 0.

(c) Combining (a) and (b),

    2 · Cov(F⁻(U1), F⁻(1 − U1)) = E[(F⁻(U1) − F⁻(U2)) (F⁻(1 − U1) − F⁻(1 − U2))] ≤ 0.

(d) We have that

    Var( (F⁻(U1) + F⁻(1 − U1)) / 2 )
      = [ Var(F⁻(U1)) + 2 · Cov(F⁻(U1), F⁻(1 − U1)) + Var(F⁻(1 − U1)) ] / 4
      = Var(F⁻(U1))/2 + Cov(F⁻(U1), F⁻(1 − U1))/2          (as F⁻(1 − U1) ∼ F⁻(U1))
      ≤ Var(F⁻(U1))/2                                       (by (c))
      = Var( (F⁻(U1) + F⁻(U2)) / 2 ).
The variance of a Monte Carlo estimate can be reduced by using negatively correlated samples. The method described
above (often referred to as “antithetic variables”) is one way of obtaining a negatively correlated sample.
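As a quick illustration (not part of the original answers; it assumes the imports listed at the top of this sheet and the helper names are arbitrary), the sketch below estimates E(X) for X ∼ Exp(1) using the inverse CDF F⁻(u) = −log(1 − u), once with 2n independent uniforms and once with n antithetic pairs (u, 1 − u); the antithetic estimate typically has a noticeably smaller variance.

# Illustration: antithetic variables for E(X), X ~ Exp(1) (true value 1)
def estimate_plain(n):
    u = random.random_sample(2*n)              # 2n independent uniforms
    return mean(-log(1-u))                     # F^-(u) = -log(1-u)

def estimate_antithetic(n):
    u = random.random_sample(n)                # n uniforms ...
    return mean((-log(1-u) - log(u))/2)        # ... paired with 1-u, since F^-(1-u) = -log(u)

plain = [estimate_plain(50) for i in range(10000)]
anti  = [estimate_antithetic(50) for i in range(10000)]
print(var(plain), var(anti))                   # the antithetic variance should be smaller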
3. (a) In order to be able to carry out rejection sampling we need a finite M > 0 such that φ(0,1)(x) < M·φ(1,σ²)(x) for all x.
This is the case iff φ(0,1)(x)/φ(1,σ²)(x) is bounded. We have that

    φ(0,1)(x)/φ(1,σ²)(x)
      = [ (1/√(2π)) · exp(−x²/2) ] / [ (1/√(2πσ²)) · exp(−(x − 1)²/(2σ²)) ]
      = √(σ²) · exp( (x − 1)²/(2σ²) − x²/2 )
      = √(σ²) · exp( −[(σ² − 1)x² + 2x − 1] / (2σ²) ).

It is easy to see that the right-hand side is bounded above iff σ² > 1 (the overall coefficient of x² in the exponent has to be negative,
otherwise the exponential is unbounded).
Note that if σ² < 1, then the instrumental distribution φ(1,σ²)(·) has thinner tails than the target distribution φ(0,1)(·).
(b) To find M we need to find the maximum (over x) of the ratio computed in part (a). This is equivalent to finding the minimum of

    (σ² − 1)x² + 2x − 1 = (σ² − 1)·(x + 1/(σ² − 1))² − 1/(σ² − 1) − 1,

which is attained at x = −1/(σ² − 1). Plugging this into the above ratio yields

    M ≥ √(σ²) · exp( 1 / (2(σ² − 1)) ).

Thus Mopt = √(σ²) · exp( 1 / (2(σ² − 1)) ).
(c) The probability of accepting a value in rejection sampling is 1/M, thus we need to minimise M. Differentiating the
logarithm of Mopt yields

    ∂ log Mopt / ∂σ² = 1/(2σ²) − 1/(2(σ² − 1)²) = (σ⁴ − 3σ² + 1) / (2σ²(σ² − 1)²) = 0,

which has the solution σ² = (3 + √5)/2 (the other root, (3 − √5)/2, is less than 1 and is thus ruled out by part (a)).
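As a quick numerical cross-check (not part of the original answers; it assumes the imports listed at the top of this sheet), minimising log Mopt numerically recovers σ² = (3 + √5)/2 ≈ 2.618:

# Numerical check of the optimal sigma^2 for the rejection sampler in Question 3
def log_M_opt(sigma2):
    return 0.5*log(sigma2) + 1/(2*(sigma2-1))   # log M_opt from part (b)

print(optimize.fmin(log_M_opt, 2.0))            # approx. 2.618
print((3+sqrt(5))/2)                            # exact value (3+sqrt(5))/2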
4. (a) We have that

    Eg(W(X)²) = ∫ (φ(0,1)(x)/φ(1,σ²)(x))² · φ(1,σ²)(x) dx = ∫ φ(0,1)(x)²/φ(1,σ²)(x) dx
      = ∫ [ (1/(2π)) · exp(−x²) ] / [ (1/√(2πσ²)) · exp(−(x − 1)²/(2σ²)) ] dx
      = √(σ²/(2π)) ∫ exp( −x² + (x − 1)²/(2σ²) ) dx
      = √(σ²/(2π)) ∫ exp( −[(2σ² − 1)x² + 2x − 1] / (2σ²) ) dx
      = √(σ²/(2π)) ∫ exp( −[(2σ² − 1)(x + 1/(2σ² − 1))² − 1 − 1/(2σ² − 1)] / (2σ²) ) dx
      = √(σ²/(2π)) · √(2πσ²/(2σ² − 1)) · exp( 1/(2σ²) + 1/(2σ²(2σ² − 1)) )
            · ∫ √((2σ² − 1)/(2πσ²)) · exp( −(x + 1/(2σ² − 1))² / (2σ²/(2σ² − 1)) ) dx        [the integral equals 1]
      = σ²/√(2σ² − 1) · exp( 1/(2σ²) + 1/(2σ²(2σ² − 1)) )
      = σ²/√(2σ² − 1) · exp( 1/(2σ² − 1) ).                                                  (1)

Using Varg(W(X)) = Eg(W(X)²) − (Eg(W(X)))² and Eg(W(X)) = 1 we obtain that

    Varg(W(X)) = σ²/√(2σ² − 1) · exp( 1/(2σ² − 1) ) − 1.

(b) In (1) the right-hand side is finite iff σ² > 1/2 (the overall coefficient of x² in the exponent has to be negative, otherwise the integral is +∞).
(c) I don’t know how to find the minimum of the variance in closed form, so I had to resort to minimising the variance
numerically:
# Question 4b
def impsamp_var(sigma2):
    return sigma2 / sqrt(2*sigma2-1) * exp(1/(2*sigma2-1)) - 1

optimize.fmin(impsamp_var, 2)
In R . . .
impsamp.var <- function(sigma2) {
  sigma2 / sqrt(2*sigma2-1) * exp(1/(2*sigma2-1)) - 1
}

optimize(impsamp.var, c(0.5,5))
With both programs I obtain σ² ≈ 2.2808 as the optimal solution.
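As a sanity check (not part of the original answers; it assumes the imports listed at the top of this sheet), the closed-form expression (1) can be compared against a crude Monte Carlo estimate of Eg(W(X)²):

# Monte Carlo check of equation (1): E_g(W(X)^2) for f = N(0,1), g = N(1,sigma2)
sigma2 = 2.2808
x = stats.norm(1,sqrt(sigma2)).rvs(100000)                       # Draw X ~ g
w = stats.norm(0,1).pdf(x) / stats.norm(1,sqrt(sigma2)).pdf(x)   # Importance weights
print(mean(w**2))                                                # Monte Carlo estimate
print(sigma2/sqrt(2*sigma2-1) * exp(1/(2*sigma2-1)))             # Closed form (1)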
5. (a)
# Question 5a
def rejection_example(n, sigma2):
    x = zeros(n)                                    # Create vector to store result
    M = sqrt(sigma2) * exp(1/(2*(sigma2-1)))        # Optimal M
    for i in xrange(n):
        while True:                                 # Keep proposing ...
            x_new = stats.norm(1,sqrt(sigma2)).rvs(1)   # Generate x_new ~ N(1,sigma2)
            accept = stats.norm(0,1).pdf(x_new) / (M*stats.norm(1,sqrt(sigma2)).pdf(x_new))
                                                    # Compute probability of acceptance
            if (random.random_sample()<accept):     # ... until a proposed value is accepted
                break
        x[i] = x_new
    return x

n = 1000
x = rejection_example(n, (sqrt(5)+3)/2)             # Generate sample
pylab.hist(x, normed=True)                          # Draw histogram
t = linspace(-3, 3, 1000)
pylab.plot(t, stats.norm(0,1).pdf(t))               # Add density of target dist'n
In R . . .
# Question 5a
rejection.example <- function(n, sigma2) {
  x <- numeric(n)                                # Create vector to store result
  M <- sqrt(sigma2) * exp(1/(2*(sigma2-1)))      # Optimal M
  for (i in 1:n) {
    repeat {                                     # Keep proposing ...
      x.new <- rnorm(1,1,sqrt(sigma2))           # Generate x.new ~ N(1,sigma2)
      accept <- dnorm(x.new,0,1) / (M*dnorm(x.new,1,sqrt(sigma2)))
                                                 # Compute probability of acceptance
      if (runif(1)<accept)                       # ... until a proposed value is accepted
        break
    }
    x[i] <- x.new
  }
  x
}

n <- 1000
x <- rejection.example(n, (sqrt(5)+3)/2)         # Generate sample (optimal variance)
hist(x, freq=FALSE)                              # Draw histogram
t <- seq(-3, 3, length.out=1000)
lines(t, dnorm(t,0,1), lwd=2)                    # Add density of target dist'n
I obtained the histogram shown below:
[Figure: "Histogram of x", a density-scaled histogram of the sample with the N(0,1) density curve overlaid.]
(b)
# Question 5b
def importance_sample(n, sigma2):
    x = stats.norm(1,sqrt(sigma2)).rvs(n)            # Draw from instrumental dist'n
    w = stats.norm(0,1).pdf(x) / stats.norm(1,sqrt(sigma2)).pdf(x)   # Compute weights
    return x, w

n = 1000
[x, w] = importance_sample(n, 2.2808)                # Generate weighted sample
print(sum(x*w)/sum(w))                               # Self-normalised estimate of E(X)
print(sum(x**2*w)/sum(w))                            # Self-normalised estimate of E(X^2)
In R . . .
# Question 5b
importance.sample <- function(n, sigma2) {
  x <- rnorm(n,1,sqrt(sigma2))                   # Draw from instrumental dist'n
  w <- dnorm(x,0,1) / dnorm(x,1,sqrt(sigma2))    # Compute weights
  list(x=x, w=w)
}

n <- 1000
s <- importance.sample(n, 2.2808)                # Generate weighted sample
sum(s$x*s$w)/sum(s$w)                            # Self-normalised estimate of E(X)
sum(s$x^2*s$w)/sum(s$w)                          # Self-normalised estimate of E(X^2)
6. (a) From the lectures we have that

    Eg(w(X)) = ∫ (f(x)/g(x)) · g(x) dx = ∫ f(x) dx = 1.

Further, as f(x) < M·g(x) implies f(x)/g(x) < M,

    Eg(w(X)²) = ∫ (f(x)²/g(x)²) · g(x) dx = ∫ (f(x)/g(x)) · f(x) dx < M ∫ f(x) dx = M.

Thus

    Varg(w(X)) = Eg(w(X)²) − (Eg(w(X)))² < M − 1.

(b) The importance sampling estimate has finite variance iff Eg(h(X)²w(X)²) < +∞. We have that

    Eg(w(X)²h(X)²) = ∫ (f(x)²/g(x)²) · h(x)² · g(x) dx = ∫ (f(x)/g(x)) · h(x)² · f(x) dx < M · Ef(h(X)²) < +∞,

as Varf(h(X)) < +∞.
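For a concrete illustration (not part of the original answers; it reuses the setting of Questions 3 to 5 and the imports listed at the top of this sheet): with f = φ(0,1) and g = φ(1,σ²), σ² > 1, we have f < Mopt·g, and the weight variance from Question 4(a) indeed stays below Mopt − 1.

# Check Var_g(w(X)) < M - 1 for f = N(0,1), g = N(1,sigma2), sigma2 > 1
sigma2 = 2.0
M = sqrt(sigma2) * exp(1/(2*(sigma2-1)))                    # M_opt from Question 3(b)
var_w = sigma2/sqrt(2*sigma2-1) * exp(1/(2*sigma2-1)) - 1   # Var_g(w) from Question 4(a)
print(var_w, M - 1)                                         # Var_g(w) should be the smaller value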
7. (a) We have that M = 1/(1 − Φ(µ,σ²)(τ)), and thus the probability of accepting a proposed value X ∼ N(µ, σ²) is 0 if X ≤ τ,
and 1 otherwise. Thus rejection sampling simplifies to:
1. Draw X ∼ N(µ, σ²).
2. If X > τ, accept X as a sample from the left-truncated normal distribution; otherwise reject it and go back to 1.
# Question 7a
def sample_truncated_normal_a(n, mu, sigma2, tau):
    x = zeros(n)                                     # Create vector to store result
    num_proposed = 0                                 # Set counter of proposed values
    for i in xrange(n):
        while True:                                  # Keep proposing ...
            x_new = stats.norm(mu,sqrt(sigma2)).rvs(1)   # Generate x_new ~ N(mu,sigma2)
            num_proposed = num_proposed + 1          # Increment counter
            if x_new>tau:                            # ... until x_new > tau (accepted)
                break
        x[i] = x_new
    print "Proportion of accepted values"            # Print proportion of accepted values
    print float(n)/float(num_proposed)
    return x
In R . . .
# Question 7a
sample.truncated.normal.a <- function(n, mu, sigma2, tau) {
  x <- numeric(n)                          # Create vector to store result
  num.proposed <- 0                        # Set counter of proposed values
  for (i in 1:n) {
    repeat {                               # Keep proposing ...
      x.new <- rnorm(1,mu,sqrt(sigma2))    # Generate x.new ~ N(mu,sigma2)
      num.proposed <- num.proposed + 1     # Increment counter
      if (x.new>tau)                       # ... until x.new > tau (accepted)
        break
    }
    x[i] <- x.new
  }
  cat("Proportion of accepted values\n")   # Print proportion of accepted values
  print(n/num.proposed)
  x
}
(b)
# Question 7b
x = sample_truncated_normal_a(10, 0, 1, 4)
In R . . .
# Question 7b
x <- sample.truncated.normal.a(10, 0, 1, 4)
We should require something of the order of 300,000 attempts in order to get 10 accepted values: the expected number
of attempts necessary is 10 · M = 10/(1 − Φ(4)) ≈ 315744. This corresponds to a proportion of rejected values of around
1 − 10/315744 ≈ 0.99997.
When I ran the above code, only 0.00227% of the values proposed were accepted.
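The expected number of attempts can be checked directly (a one-liner, not part of the original answers, assuming the imports at the top of this sheet):

# Expected number of proposals needed for 10 accepted values when tau = 4
print(10 / (1 - stats.norm(0,1).cdf(4)))   # approx. 315744, as quoted above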
(c) One approach is to generate Z ∼ N(0, 1) and to set X := 4 + |Z|. The corresponding proposal density is
    g(x) = 0 for x ≤ 4, and g(x) = √(2/π) · exp(−(x − 4)²/2) for x > 4.

For x > 4 the ratio of the densities is

    f(x)/g(x) = 1/(2(1 − Φ(4))) · exp( −x²/2 + (x − 4)²/2 ) = 1/(2(1 − Φ(4))) · exp(−4x + 8),

which is decreasing in x, so we have that f(x) < M·g(x) with
    M = f(4)/g(4) = 1/(2(1 − Φ(4))) · exp(−4·4 + 8) = 1/(2(1 − Φ(4))) · exp(−8) ≈ 5.30.
# Question 7c
def sample_truncated_normal_c(n, tau):
    M = 1 / ( 2 * (1-stats.norm(0,1).cdf(tau)) ) * exp(-tau**2/2)    # Compute M
    x = zeros(n)                                     # Create vector to store result
    num_proposed = 0                                 # Set counter of proposed values
    for i in xrange(n):
        while True:                                  # Keep proposing ...
            x_new = tau + abs(stats.norm(0,1).rvs(1))    # Draw new value from instrumental dist'n
            num_proposed = num_proposed + 1          # Increment counter
            f = stats.norm(0,1).pdf(x_new) / (1-stats.norm(0,1).cdf(tau))   # Evaluate target density
            g = 2 * stats.norm(tau,1).pdf(x_new)     # Evaluate instrumental density
            accept = f / ( M * g )                   # Probability of accepting
            if random.random_sample(1)<accept:       # ... until one value is accepted
                break
        x[i] = x_new
    print("Proportion of accepted values")           # Print proportion of accepted values
    print(float(n)/float(num_proposed))
    return x

x = sample_truncated_normal_c(10, 4)
In R . . .
# Question 7c
sample.truncated.normal.c <- function(n, tau) {
  M <- 1 / ( 2 * (1-pnorm(tau)) ) * exp(-tau^2/2)   # Compute M
  x <- numeric(n)                                   # Create vector to store result
  num.proposed <- 0                                 # Set counter of proposed values
  for (i in 1:n) {
    repeat {                                        # Keep proposing ...
      x.new <- tau + abs(rnorm(1))                  # Draw new value from instrumental dist'n
      num.proposed <- num.proposed + 1              # Increment counter
      f <- dnorm(x.new) / (1-pnorm(tau))            # Evaluate target density
      g <- 2 * dnorm(x.new,mean=tau)                # Evaluate instrumental density
      accept <- f / ( M * g )                       # Probability of accepting
      if (runif(1)<accept)                          # ... until one value is accepted
        break
    }
    x[i] <- x.new
  }
  cat("Proportion of accepted values\n")            # Print proportion of accepted values
  print(n/num.proposed)
  x
}

x <- sample.truncated.normal.c(10, 4)
When I ran the above code, I required 46 attempts, giving a proportion of rejected values of about 79%. The expected
number of attempts required is 10 · M ≈ 53.
8.
# Question 8a and b
def is_beta(n, alpha, beta):
    x = random.random_sample(n)                     # Draw from instrumental dist'n
    w = stats.beta(alpha,beta).pdf(x)               # Compute weights
    return x, w

n = 10
N = 100000
result_selfnorm = ones(N)
result_simple = ones(N)

for i in xrange(N):
    [x, w] = is_beta(n, 2, 3)                       # Draw weighted sample
    result_selfnorm[i] = sum(x*(1-x)*w)/sum(w)      # Compute self-normalised estimate
    result_simple[i] = sum(x*(1-x)*w)/n             # Compute simple estimate

bias_selfnorm = mean(result_selfnorm) - 0.2         # Compute the bias
bias_simple = mean(result_simple) - 0.2

var_selfnorm = var(result_selfnorm)                 # Compute the variance
var_simple = var(result_simple)

mse_selfnorm = bias_selfnorm**2 + var_selfnorm      # Compute m.s.e.
mse_simple = bias_simple**2 + var_simple

pylab.figure()
pylab.boxplot([result_simple, result_selfnorm])
pylab.show()
In R . . .
# Question 8a and b
is.beta <- function (n, alpha, beta) {
  x <- runif(n)                  # Draw from instrumental dist'n
  w <- dbeta(x,alpha,beta)       # Compute weights
  list(x=x, w=w)
}

n <- 10
N <- 100000
result.selfnorm <- numeric(N)
result.simple <- numeric(N)

for (i in 1:N) {
  s <- is.beta(n, 2, 3)                                  # Draw weighted sample
  result.selfnorm[i] <- sum(s$x*(1-s$x)*s$w)/sum(s$w)    # Compute self-normalised estimate
  result.simple[i] <- sum(s$x*(1-s$x)*s$w)/n             # Compute simple estimate
}

bias.selfnorm <- mean(result.selfnorm) - 0.2             # Compute the bias
bias.simple <- mean(result.simple) - 0.2

var.selfnorm <- var(result.selfnorm)                     # Compute the variance
var.simple <- var(result.simple)

mse.selfnorm <- bias.selfnorm^2 + var.selfnorm           # Compute m.s.e.
mse.simple <- bias.simple^2 + var.simple

boxplot(as.data.frame(cbind(result.simple, result.selfnorm)),
        names=c("simple","self-normalised"))
The results I obtained were:

                          'Simple' estimate µ̃    Self-normalised estimate µ̂
    Estimated bias        −4.67 × 10⁻⁵            −1.33 × 10⁻³
    Estimated variance    2.26 × 10⁻³             3.22 × 10⁻⁴
    Estimated MSE         2.26 × 10⁻³             3.24 × 10⁻⁴

[Figure: box plots of the 'simple' and self-normalised estimates across the N = 100000 replications.]
The self-normalised estimator has a significantly smaller mean-square error, so we would prefer it in this case.