2. Simulation of Random Variables

iid uniforms
Discrete RVs
Continuous RVs
Rejection method
Illusions are art, for the feeling person, and it is by art that we live, if we do.
Elizabeth Bowen
Often we are interested in the distribution of a random variable X
which is complicated, but which can nonetheless be built up from
simple components such as independent rv's with known
distributions.
Monte Carlo simulation is an excellent tool for such problems: we
seek to generate a random sample from the distribution of X,
which we can use to estimate its mean, median, mode, percentiles,
etc.
The starting point for any simulation is the generation of r.v.s with
known distributions (binomial, Poisson, exponential, normal, etc.),
which are the building blocks for more complicated distributions. It
turns out that all random variables can be generated by
manipulating U(0, 1) rv's.
Simulating iid uniform samples
We cannot generate truly random numbers on a computer. Instead
we generate pseudo-random numbers, which have the appearance
of random numbers, but are in fact completely deterministic.
Pseudo-random numbers can be generated by chaotic dynamical
systems, which have the characteristic that the future is very hard
to predict given the present.
A very important advantage of using pseudo-random numbers is
that, because they are deterministic, any experiment performed
using pseudo-random numbers can be repeated exactly.
Congruential generators
Congruential generators were the first reasonable class of
pseudo-random number generators. By default, R uses a pseudo-random
number generator called the Mersenne-Twister, which has similar
properties to congruential generators.
Given an initial number $X_0 \in \{0, 1, \ldots, m-1\}$ and two big
numbers $A$ and $B$, we define a sequence of numbers
$X_n \in \{0, 1, \ldots, m-1\}$, $n = 0, 1, \ldots$, by
$$X_{n+1} = (A X_n + B) \bmod m.$$
We get a sequence of numbers $U_n \in [0, 1)$, $n = 0, 1, \ldots$, by putting
$U_n = X_n/m$. If $m$, $A$, and $B$ are well chosen then the sequence
$U_0, U_1, \ldots$ is almost impossible to distinguish from an iid
sequence of $U(0, 1)$ random variables.
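A minimal R sketch of a congruential generator, for illustration only (R's own runif does not work this way). The function name lcg is hypothetical. The arithmetic is done in double precision, which stays exact as long as $A X_n + B < 2^{53}$; this holds for every parameter set mentioned in this section.

```r
# Congruential generator: X_{n+1} = (A * X_n + B) mod m, U_n = X_n / m.
# Doubles represent integers exactly up to 2^53, so A * X_n + B must stay below that.
lcg <- function(n, seed, A, B, m) {
  X <- numeric(n)
  x <- seed
  for (i in seq_len(n)) {
    x <- (A * x + B) %% m   # next state in {0, 1, ..., m - 1}
    X[i] <- x
  }
  list(X = X, U = X / m)    # U_n lies in [0, 1)
}

# Toy parameters from the example: m = 10, A = 103, B = 17, X_0 = 2
toy <- lcg(6, seed = 2, A = 103, B = 17, m = 10)
toy$X   # 3 6 5 2 3 6

# The "good" parameters m = 2^32, A = 1664525, B = 1013904223
good <- lcg(5, seed = 1, A = 1664525, B = 1013904223, m = 2^32)
good$U
```

Note that the toy generator returns to its seed after four steps, so its cycle is much shorter than the maximum possible length $m = 10$.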
In practice it is sensible to discard the value 0 when it occurs, as
we often divide by $U_n$. This is justifiable since for a true uniform,
the probability of taking on the value 0 is zero. The value 1 can
also be a problem, but note that as defined, $U_n < 1$ for all $n$.
Example: If we take $m = 10$, $A = 103$, and $B = 17$, then for
$X_0 = 2$, we have
$$\begin{aligned}
X_1 &= 223 \bmod 10 = 3\\
X_2 &= 326 \bmod 10 = 6\\
X_3 &= 635 \bmod 10 = 5\\
&\;\;\vdots
\end{aligned}$$
Since there are at most $m$ possible values, the sequence produced by a
congruential generator must eventually cycle, and the maximum cycle
length is $m$.
(The Mersenne-Twister has a cycle length of $2^{19937} - 1$.)
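To see the cycling concretely, here is a small R sketch (the helper lcg_period is hypothetical) that iterates a congruential generator until a state repeats and reports the length of the cycle it falls into. It records the first visit of each state in a vector of length $m$, so it is only practical for small $m$. The second parameter set, $m = 16$, $A = 5$, $B = 3$, is our own illustrative choice of a generator that attains the maximum cycle length $m$.

```r
# Length of the cycle an LCG eventually enters, starting from `seed`.
lcg_period <- function(seed, A, B, m) {
  first_visit <- rep(NA_integer_, m)  # first_visit[x + 1] = step at which state x was first seen
  x <- seed
  step <- 0L
  while (is.na(first_visit[x + 1])) {
    first_visit[x + 1] <- step
    x <- (A * x + B) %% m
    step <- step + 1L
  }
  step - first_visit[x + 1]           # steps since the repeated state was first seen
}

lcg_period(2, A = 103, B = 17, m = 10)  # 4: the states cycle 2 -> 3 -> 6 -> 5 -> 2
lcg_period(0, A = 5, B = 3, m = 16)     # 16: every state is visited
```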
Because computers use binary arithmetic, if we have $m = 2^k$ for
some $k$, then taking $x \bmod m$ is very quick: it amounts to keeping
the lowest $k$ bits of $x$.
An example of a good congruential generator is $m = 2^{32}$,
$A = 1{,}664{,}525$, and $B = 1{,}013{,}904{,}223$.
An example of a bad congruential generator is RANDU, which was
shipped with IBM computers in the 1970s. RANDU used
$m = 2^{31}$, $A = 65{,}539$, and $B = 0$.
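One way to see why RANDU is bad: since $65539 = 2^{16} + 3$, we have $A^2 \equiv 6A - 9 \pmod{2^{31}}$, so $X_{n+2} \equiv 6X_{n+1} - 9X_n \pmod{2^{31}}$ and consecutive triples are strongly dependent (they fall on a small number of parallel planes in the unit cube). The R sketch below (the helper name randu is ours) checks this identity numerically.

```r
# RANDU: X_{n+1} = 65539 * X_n mod 2^31 (B = 0)
randu <- function(n, seed) {
  X <- numeric(n)
  x <- seed
  for (i in seq_len(n)) {
    x <- (65539 * x) %% 2^31   # exact in doubles: 65539 * 2^31 < 2^53
    X[i] <- x
  }
  X
}

X <- randu(1000, seed = 1)
n <- length(X)
# X_{n+2} - 6 X_{n+1} + 9 X_n should be 0 mod 2^31 for every n
resid <- (X[3:n] - 6 * X[2:(n - 1)] + 9 * X[1:(n - 2)]) %% 2^31
all(resid == 0)   # TRUE
```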
Seeding
The number X0 is called the seed. If you know the seed (as well as
m, A, and B ), then you can reproduce the whole sequence exactly.
This is a very good idea from a scientific point of view; being able
to repeat an experiment means that your results are verifiable.
To generate n pseudo-random numbers in R, use runif(n). R
does not use a congruential generator, but it still needs a seed to
generate pseudo-random numbers. For a given value of seed
(assumed integer), the command set.seed(seed) always puts
you at the same point on the cycle of pseudo-random numbers.
The current state of the random number generator is kept in the
vector .Random.seed. You can save the value of .Random.seed
and then use it to return to that point in the sequence of
pseudo-random numbers.
If the random number generator is not initialised before you start
generating pseudo-random numbers, then R initialises it using a
value taken from the system clock.
> set.seed(42)
> runif(2)
[1] 0.9148060 0.9370754
> RNG.state <- .Random.seed
> runif(2)
[1] 0.2861395 0.8304476
> set.seed(42)
> runif(4)
[1] 0.9148060 0.9370754 0.2861395 0.8304476
> .Random.seed <- RNG.state
> runif(2)
[1] 0.2861395 0.8304476
Simulating discrete random variables
Let X be a discrete random variable taking values in the set
{0, 1, . . .} with cdf F and pmf p. The following snippet of code
takes a uniform random variable U and returns a discrete random
variable X with cdf F .
# given U ~ U(0,1) and the cdf F of X
X <- 0
while (F(X) < U) {
  X <- X + 1
}
When the algorithm terminates we have $F(X) \ge U$ and
$F(X-1) < U$ (with $F(-1) = 0$), that is $U \in (F(X-1), F(X)]$. That is,
$$P(X = x) = P\big(U \in (F(x-1), F(x)]\big) = F(x) - F(x-1) = p(x).$$
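Wrapping the snippet as a function makes it easy to test. A minimal sketch, assuming we use binom(3, 0.5) as the target (its cdf is available in R as pbinom); the names discrete_inverse and F_binom are hypothetical.

```r
# Discrete inversion: the smallest x in {0, 1, ...} with F(x) >= U
discrete_inverse <- function(U, F) {
  X <- 0
  while (F(X) < U) {
    X <- X + 1
  }
  X
}

# cdf of binom(3, 0.5)
F_binom <- function(x) pbinom(x, size = 3, prob = 0.5)
discrete_inverse(0.1, F_binom)   # 0, since F(0) = 0.125 >= 0.1
discrete_inverse(0.6, F_binom)   # 2, since F(1) = 0.5 < 0.6 <= F(2) = 0.875

set.seed(1)
x <- sapply(runif(10000), discrete_inverse, F = F_binom)
table(x) / length(x)   # compare with dbinom(0:3, 3, 0.5) = 0.125 0.375 0.375 0.125
```

The loop starts at 0 every time, so the expected number of cdf evaluations is $1 + EX$; for distributions with a large mean it can pay to start the search nearer the middle.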
[Figure: simulating from a binom(3, 0.5) cdf. A value U ~ U(0,1) on the vertical axis is mapped to X ~ binom(3, 0.5) on the horizontal axis: (0, 0.125) is mapped to 0, (0.125, 0.5) to 1, (0.5, 0.875) to 2 and (0.875, 1) to 3.]
To simulate binomial, geometric, negative-binomial or Poisson rv’s
in R, use rbinom, rgeom, rnbinom or rpois.
For simulating other (finite) discrete rv’s R provides
sample(x, size, replace = FALSE, prob = NULL).
The inputs are
x A vector giving the possible values the rv can take;
size How many rv’s to simulate;
replace Set this to TRUE to generate an iid sample, otherwise
the rv’s will be conditioned to be different from each
other;
prob A vector giving the probabilities of the values in x. If
omitted then the values in x are assumed to be
equally likely.
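A short illustration of the three ways of calling sample described above (the particular values and probabilities are arbitrary).

```r
set.seed(1)
# 10 iid rolls of a fair die (replace = TRUE gives an iid sample)
rolls <- sample(1:6, size = 10, replace = TRUE)

# A biased coin with P(H) = 0.7, using prob
flips <- sample(c("H", "T"), size = 1000, replace = TRUE, prob = c(0.7, 0.3))
mean(flips == "H")   # close to 0.7

# replace = FALSE (the default): the values are all different, here a random permutation
perm <- sample(1:5, size = 5)
```

One caveat: if x is a single number n >= 1, sample(x) samples from 1:n rather than always returning x, so take care when the vector of possible values might have length 1.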
Simulating continuous random variables
Suppose that we are given $U \sim U(0, 1)$ and want to simulate a
continuous rv $X$ with cdf $F_X$, which we assume is strictly increasing
where $0 < F_X < 1$, so that $F_X^{-1}$ exists.
Put $Y = F_X^{-1}(U)$; then we have
$$F_Y(y) = P(Y \le y) = P(F_X^{-1}(U) \le y) = P(U \le F_X(y)) = F_X(y).$$
That is, $Y$ has the same distribution as $X$.
Thus, if we can simulate a $U(0, 1)$ rv, then we can simulate any
continuous rv $X$ for which we know $F_X^{-1}$. This is called the inverse
transformation method or simply the inversion method.
[Figure: inversion method for U(1, 3); X on the horizontal axis, U on the vertical axis.]
If $X \sim U(1, 3)$ then $F_X(x) = (x-1)/2$ for $x \in (1, 3)$ and thus
$F_X^{-1}(y) = 2y + 1$ for $y \in (0, 1)$.
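A minimal R sketch of this example: apply $F_X^{-1}(y) = 2y + 1$ to uniforms and check that the result lies in $(1, 3)$ with mean close to 2 (the sample size is arbitrary).

```r
set.seed(1)
U <- runif(10000)
X <- 2 * U + 1   # F_X^{-1}(U) for X ~ U(1, 3)
range(X)         # within (1, 3)
mean(X)          # close to 2
```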
[Figure: inversion method for exp(1); X on the horizontal axis, U on the vertical axis.]
If $X \sim \exp(\lambda)$ then $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$ and thus
$$F_X^{-1}(y) = -\frac{1}{\lambda}\log(1 - y).$$
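A minimal R sketch of inversion for $\exp(\lambda)$; the function name rexp_inv is hypothetical. As a cross-check it compares the formula with R's exponential quantile function qexp, which computes the same $F^{-1}$.

```r
# Inversion for exp(lambda): F^{-1}(y) = -log(1 - y) / lambda
rexp_inv <- function(n, lambda) {
  U <- runif(n)
  -log(1 - U) / lambda
}

set.seed(1)
x <- rexp_inv(100000, lambda = 2)
mean(x)   # close to 1 / lambda = 0.5

# qexp is the exponential quantile function, i.e. F^{-1}
u <- c(0.1, 0.5, 0.9)
all.equal(-log(1 - u) / 2, qexp(u, rate = 2))   # TRUE
```

Since $1 - U$ is also $U(0, 1)$, $-\log(U)/\lambda$ works equally well and saves a subtraction.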
Random variable simulators in R
Distribution          R command
binomial              rbinom
Poisson               rpois
geometric             rgeom
negative binomial     rnbinom
uniform               runif
exponential           rexp
normal                rnorm
gamma                 rgamma
beta                  rbeta
Student t             rt
F                     rf
chi-squared           rchisq
Weibull               rweibull
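All of these take the number of values to simulate as their first argument, followed by the distribution's parameters. A short illustration with arbitrarily chosen parameters. Note that rgamma accepts either a rate or a scale; the $\Gamma(\lambda, m)$ density used later in this section corresponds to shape = m, rate = λ.

```r
set.seed(1)
n <- 10000
x_pois  <- rpois(n, lambda = 3)              # mean 3
x_norm  <- rnorm(n, mean = 1, sd = 2)        # mean 1
x_gamma <- rgamma(n, shape = 2, rate = 0.5)  # mean shape / rate = 4
c(mean(x_pois), mean(x_norm), mean(x_gamma))
```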
The rejection method
The inversion method works well if we can find $F^{-1}$ analytically. If
not, we can use root-finding techniques to invert $F$ numerically,
but this can be time-consuming. An alternative method in this
situation, which is often faster, is the rejection method.
We start with an example. Suppose that we have a continuous
random variable $X$ with pdf $f_X$ concentrated on the interval $(0, 4)$.
We imagine 'sprinkling' points $P_1, P_2, \ldots$ uniformly at random
under the density function, and consider the distribution of $X_1$, the
$x$ coordinate of $P_1$.
[Figure: a pdf $f_X$ concentrated on (0, 4), with the region under the curve between $x = a$ and $x = b$ shaded; axes x and pdf.]
Let $R$ be the shaded region under $f_X$ between $a$ and $b$; then
$$P(a < X_1 < b) = P(P_1 \text{ hits } R) = \frac{\text{Area of } R}{\text{Area under density}} = \frac{\int_a^b f_X(x)\,dx}{1} = \int_a^b f_X(x)\,dx.$$
So $X_1$ has the same distribution as $X$.
But how do we generate the points $P_i$ uniformly under $f_X$? The
answer is to generate points at random in the rectangle
$[0, 4] \times [0, 0.5]$, and then reject those that fall above the pdf.
Rejection method (uniform envelope)
Suppose that $f_X$ is nonzero only on $[a, b]$, and $f_X \le k$.
1. Generate $X \sim U(a, b)$ and $Y \sim U(0, k)$ independent of $X$
(so $P = (X, Y)$ is uniformly distributed over the rectangle
$[a, b] \times [0, k]$).
2. If $Y < f_X(X)$ then return $X$, otherwise go back to step 1.
Example: consider the triangular pdf $f_X$ defined as
$$f_X(x) = \begin{cases} x & \text{if } 0 < x < 1;\\ 2 - x & \text{if } 1 \le x < 2;\\ 0 & \text{otherwise.} \end{cases}$$
We apply the rejection method as follows:
source("rejecttriangle.r")
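The contents of rejecttriangle.r are not reproduced here. The following is our own minimal R sketch of the uniform-envelope method for this triangular pdf, with $[a, b] = [0, 2]$ and $k = 1$ (the maximum of $f_X$); the names f_tri and reject_uniform are hypothetical.

```r
# Triangular density on (0, 2)
f_tri <- function(x) ifelse(x > 0 & x < 1, x, ifelse(x >= 1 & x < 2, 2 - x, 0))

# Rejection with a uniform envelope on [a, b] x [0, k], where f <= k on [a, b]
reject_uniform <- function(f, a, b, k) {
  repeat {
    X <- runif(1, a, b)        # step 1: X ~ U(a, b)
    Y <- runif(1, 0, k)        #         Y ~ U(0, k), independent of X
    if (Y < f(X)) return(X)    # step 2: accept if (X, Y) lies below f
  }
}

set.seed(1)
x <- replicate(10000, reject_uniform(f_tri, a = 0, b = 2, k = 1))
mean(x)           # close to 1, since the density is symmetric about 1
mean(x < 0.5)     # close to the integral of x over (0, 0.5), i.e. 0.125
```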
General rejection method
Our rejection method uses a rectangular envelope to cover the
target density $f_X$. What should we do if $f_X$ is unbounded, or is not
concentrated on a bounded interval?
Let $X$ have pdf $h$ and let $Y \sim U(0, kh(X))$; then $(X, Y)$ is
uniformly distributed under the curve $kh$:
$$\begin{aligned}
P\big((X, Y) \in (x, x+dx) \times (y, y+dy)\big)
&= P\big(Y \in (y, y+dy) \mid X \in (x, x+dx)\big)\, P\big(X \in (x, x+dx)\big)\\
&= \frac{dy}{kh(x)}\, h(x)\,dx = \frac{1}{k}\,dx\,dy.
\end{aligned}$$
Suppose we wish to simulate from the density $f_X$. Let $h$ be a
density we can simulate from, and choose $k$ such that
$$k \ge k^* = \sup_x \frac{f_X(x)}{h(x)}.$$
Then $kh$ forms an envelope for $f_X$, and we can generate points
uniformly within this envelope. By accepting points below the
curve $f_X$, we get the general rejection method:
General rejection method
To simulate from the density $f_X$, we assume that we have an envelope
density $h$ from which we can simulate, and that we have some
$k < \infty$ such that $\sup_x f_X(x)/h(x) \le k$.
1. Simulate $X$ from $h$.
2. Generate $Y \sim U(0, kh(X))$.
3. If $Y < f_X(X)$ then return $X$, otherwise go back to step 1.
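A minimal R sketch of the three steps; the function and argument names are ours. As a hypothetical illustration not taken from these slides, we apply it to the half-normal density $f(x) = 2\phi(x)$, $x \ge 0$, where $\phi$ is the standard normal pdf. Its support is unbounded, so no rectangle covers it, but an exp(1) envelope works with $k^* = \sup_x f(x)/e^{-x} = \sqrt{2e/\pi} \approx 1.3155$, attained at $x = 1$.

```r
# General rejection method.
#   f: target density;  h: envelope density;  rh(1): one draw from h
#   k: a constant with f(x) <= k * h(x) for all x
reject_general <- function(f, h, rh, k) {
  repeat {
    X <- rh(1)                   # step 1: X ~ h
    Y <- runif(1, 0, k * h(X))   # step 2: Y ~ U(0, k h(X))
    if (Y < f(X)) return(X)      # step 3: accept if (X, Y) lies below f
  }
}

# Half-normal target with an exp(1) envelope
f_half <- function(x) 2 * dnorm(x)
h_exp  <- function(x) dexp(x, rate = 1)
k      <- sqrt(2 * exp(1) / pi)

set.seed(1)
x <- replicate(10000, reject_general(f_half, h_exp, function(n) rexp(n, rate = 1), k))
mean(x)   # close to sqrt(2 / pi), about 0.798
```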
Efficiency
The efficiency of the rejection method is measured by the expected
number of times you have to generate a candidate point $(X, Y)$.
The area under the curve $kh$ is $k$ and the area under the curve $f_X$
is 1, so the probability of accepting a candidate is $1/k$.
Since candidates are generated independently, the number of times $N$
we have to generate a candidate point has distribution $1 + \text{geom}(1/k)$,
with mean
$$EN = 1 + \frac{1 - 1/k}{1/k} = k.$$
So, the closer $h$ is to $f_X$, the smaller we can choose $k$, and the
more efficient the algorithm.
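A quick simulation check of $EN = k$ (the helper names are ours). We reuse the triangular pdf on $(0, 2)$ with the U(0, 2) density as $h$, for which the smallest valid constant is $k^* = 1/(1/2) = 2$, and also try a looser envelope with $k = 4$.

```r
# Triangular density on (0, 2)
f_tri <- function(x) ifelse(x > 0 & x < 1, x, ifelse(x >= 1 & x < 2, 2 - x, 0))

# Number of candidates generated until the first acceptance
count_tries <- function(k) {
  tries <- 0
  repeat {
    tries <- tries + 1
    X <- runif(1, 0, 2)                    # X ~ h = U(0, 2)
    Y <- runif(1, 0, k * dunif(X, 0, 2))   # Y ~ U(0, k h(X))
    if (Y < f_tri(X)) return(tries)
  }
}

set.seed(1)
N2 <- replicate(10000, count_tries(k = 2))   # tightest envelope
N4 <- replicate(10000, count_tries(k = 4))   # twice as loose
c(mean(N2), mean(N4))                        # close to 2 and 4
```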
Example: gamma
For $m, \lambda > 0$ the $\Gamma(\lambda, m)$ density is
$$f(x) = \lambda^m x^{m-1} e^{-\lambda x}/\Gamma(m), \quad \text{for } x > 0.$$
In general there is no explicit formula for the cdf $F$ or its inverse, so we will
use the rejection method to simulate from $f$.
We will use an exponential envelope $h(x) = \mu e^{-\mu x}$, for $x > 0$.
Using the inversion method we can easily simulate from $h$ using
$-\log(U)/\mu$, where $U \sim U(0, 1)$ (since $1 - U$ is also $U(0, 1)$).
To envelop $f$ we need to find
$$k^* = \sup_{x > 0} \frac{f(x)}{h(x)} = \sup_{x > 0} \frac{\lambda^m x^{m-1} e^{(\mu - \lambda)x}}{\mu\,\Gamma(m)}.$$
Clearly $k^*$ will be infinite if $m < 1$ or $\lambda \le \mu$. For $m = 1$ the
gamma is just an exponential. Thus we will assume $m > 1$ and
choose $\mu < \lambda$.
For $m \in (0, 1)$ the rejection method can still be used, but a
different envelope is required.
To find $k^*$ we take the derivative of the right-hand side above and
set it to zero, to find the point where the maximum occurs. You
can check that this is at the point $x = (m-1)/(\lambda - \mu)$, which
gives
$$k^* = \frac{\lambda^m (m-1)^{m-1} e^{-(m-1)}}{\mu (\lambda - \mu)^{m-1}\,\Gamma(m)}.$$
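The maximisation can be checked numerically with R's optimize; the values $\lambda = 2$, $m = 3$, $\mu = 1$ below are arbitrary choices for illustration. With these values the formula puts the maximum at $x = (m-1)/(\lambda - \mu) = 2$ with $k^* = 16e^{-2} \approx 2.165$.

```r
lambda <- 2; m <- 3; mu <- 1

# f(x) / h(x) for the Gamma(lambda, m) target and exp(mu) envelope
ratio <- function(x) lambda^m * x^(m - 1) * exp((mu - lambda) * x) / (mu * gamma(m))

# Closed-form k* from the derivation
k_formula <- lambda^m * (m - 1)^(m - 1) * exp(-(m - 1)) /
  (mu * (lambda - mu)^(m - 1) * gamma(m))

opt <- optimize(ratio, interval = c(0, 20), maximum = TRUE)
c(argmax = opt$maximum, k_numeric = opt$objective, k_formula = k_formula)
```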
To improve efficiency we would like to choose our envelope to
make $k^*$ as small as possible. Looking at the formula for $k^*$ this
means choosing $\mu$ to make $\mu(\lambda - \mu)^{m-1}$ as large as possible.
Setting the derivative with respect to $\mu$ to zero, we see that the
maximum occurs when $\mu = \lambda/m$. Plugging this back in we get
$$k^* = m^m e^{-(m-1)}/\Gamma(m).$$
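Similarly, a numerical check (again with the arbitrary values $\lambda = 2$, $m = 3$) that $\mu = \lambda/m$ minimises $k^*$ over $\mu \in (0, \lambda)$, and that the minimum equals $m^m e^{-(m-1)}/\Gamma(m) = 27e^{-2}/2 \approx 1.827$.

```r
lambda <- 2; m <- 3

# k* as a function of the envelope rate mu, for 0 < mu < lambda
k_star <- function(mu) {
  lambda^m * (m - 1)^(m - 1) * exp(-(m - 1)) /
    (mu * (lambda - mu)^(m - 1) * gamma(m))
}

opt <- optimize(k_star, interval = c(0, lambda))
c(mu_numeric = opt$minimum, mu_formula = lambda / m,
  k_numeric = opt$objective, k_formula = m^m * exp(-(m - 1)) / gamma(m))
```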
We can now code up our rejection algorithm (gamma_sim.r).
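The file gamma_sim.r is not shown here. The following is our own minimal R sketch under the assumptions of this section ($m > 1$, envelope rate $\mu = \lambda/m$, so $k = m^m e^{-(m-1)}/\Gamma(m)$); the function name gamma_reject is hypothetical. As a sanity check we compare the sample with the $\Gamma(\lambda, m)$ mean $m/\lambda$ and variance $m/\lambda^2$.

```r
# Rejection sampler for Gamma(lambda, m) (shape m > 1, rate lambda),
# using an exp(mu) envelope with the optimal mu = lambda / m
gamma_reject <- function(n, lambda, m) {
  stopifnot(m > 1, lambda > 0)
  mu <- lambda / m
  k  <- m^m * exp(-(m - 1)) / gamma(m)
  f  <- function(x) lambda^m * x^(m - 1) * exp(-lambda * x) / gamma(m)
  h  <- function(x) mu * exp(-mu * x)
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      X <- -log(runif(1)) / mu       # X ~ exp(mu) by inversion
      Y <- runif(1, 0, k * h(X))     # Y ~ U(0, k h(X))
      if (Y < f(X)) break            # accept if (X, Y) lies below f
    }
    out[i] <- X
  }
  out
}

set.seed(1)
x <- gamma_reject(10000, lambda = 2, m = 3)
c(mean(x), var(x))   # close to m / lambda = 1.5 and m / lambda^2 = 0.75
```

With $m = 3$ the constant is $k \approx 1.83$, so on average fewer than two candidates are needed per accepted value.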