iid uniforms · Discrete RVs · Continuous RVs · Rejection method

2. Simulation of Random Variables

"Illusions are art, for the feeling person, and it is by art that we live, if we do."
Elizabeth Bowen

Often we are interested in the distribution of a random variable X which is complicated, but which can nonetheless be built up from simple components, such as independent rv's with known distributions. Monte Carlo simulation is an excellent tool for such problems: we seek to generate a random sample from the distribution of X, which we can use to estimate its mean, median, mode, percentiles, etc.

The starting point for any simulation is the generation of rv's with known distributions (binomial, Poisson, exponential, normal, etc.), which are the building blocks for more complicated distributions. It turns out that all random variables can be generated by manipulating U(0, 1) rv's.

Simulating iid uniform samples

We cannot generate truly random numbers on a computer. Instead we generate pseudo-random numbers, which have the appearance of random numbers but are in fact completely deterministic. Pseudo-random numbers can be generated by chaotic dynamical systems, which have the characteristic that the future is very hard to predict given the present.

A very important advantage of using pseudo-random numbers is that, because they are deterministic, any experiment performed using pseudo-random numbers can be repeated exactly.

Congruential generators

Congruential generators were the first reasonable class of pseudo-random number generators. R uses a pseudo-random number generator called the Mersenne-Twister, which has similar properties to congruential generators.
Given an initial number $X_0 \in \{0, 1, \ldots, m-1\}$ and two big numbers A and B, we define a sequence of numbers $X_n \in \{0, 1, \ldots, m-1\}$, $n = 0, 1, \ldots$, by

$$X_{n+1} = (A X_n + B) \bmod m.$$

We get a sequence of numbers $U_n \in [0, 1)$, $n = 0, 1, \ldots$, by putting $U_n = X_n / m$. If m, A, and B are well chosen then the sequence $U_0, U_1, \ldots$ is almost impossible to distinguish from an iid sequence of U(0, 1) random variables.

In practice it is sensible to discard the value 0 when it occurs, as we often divide by $U_n$. This is justifiable since, for a true uniform, the probability of taking on the value 0 is zero. The value 1 can also be a problem, but note that, as defined, $U_n < 1$ for all n.

Example: If we take m = 10, A = 103, and B = 17, then for $X_0 = 2$ we have

$$X_1 = 223 \bmod 10 = 3, \quad X_2 = 326 \bmod 10 = 6, \quad X_3 = 635 \bmod 10 = 5, \quad \ldots$$

Clearly the sequence produced by a congruential generator will eventually cycle, and since there are at most m possible values, the maximum cycle length is m. (The Mersenne-Twister has a cycle length of $2^{19937} - 1$.)

Because computers use binary arithmetic, if we have $m = 2^k$ for some k, then taking x mod m is very quick. An example of a good congruential generator is $m = 2^{32}$, A = 1,664,525, and B = 1,013,904,223. An example of a bad congruential generator is RANDU, which was shipped with IBM computers in the 1970's. RANDU used $m = 2^{31}$, A = 65,539, and B = 0.

Seeding

The number $X_0$ is called the seed. If you know the seed (as well as m, A, and B), then you can reproduce the whole sequence exactly. This is a very good idea from a scientific point of view; being able to repeat an experiment means that your results are verifiable.
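As a rough illustration, the congruential recursion and the role of the seed can be sketched in a few lines of R. The function name lcg and its argument names are made up for this example; it is not how R's own generator works. It relies on ordinary double-precision arithmetic, which is exact only while A * x + B stays below 2^53, as it does for the parameters used here.

```r
# Minimal linear congruential generator (illustrative sketch only).
# lcg() and its argument names are hypothetical, chosen for this example.
lcg <- function(n, seed, m, A, B) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (A * state + B) %% m   # X_{n+1} = (A X_n + B) mod m
    x[i] <- state
  }
  x
}

# The small example from the text: m = 10, A = 103, B = 17, X0 = 2
x_small <- lcg(3, seed = 2, m = 10, A = 103, B = 17)
print(x_small)   # X1, X2, X3

# The "good" parameters quoted in the text, scaled to [0, 1)
u1 <- lcg(5, seed = 1, m = 2^32, A = 1664525, B = 1013904223) / 2^32
u2 <- lcg(5, seed = 1, m = 2^32, A = 1664525, B = 1013904223) / 2^32
print(identical(u1, u2))   # same seed, same sequence
```

Running the generator twice from the same seed gives identical output, which is the reproducibility property described under Seeding.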
To generate n pseudo-random numbers in R, use runif(n). R does not use a congruential generator, but it still needs a seed to generate pseudo-random numbers. For a given value of seed (assumed integer), the command set.seed(seed) always puts you at the same point on the cycle of pseudo-random numbers.

The current state of the random number generator is kept in the vector .Random.seed. You can save the value of .Random.seed and then use it to return to that point in the sequence of pseudo-random numbers. If the random number generator is not initialised before you start generating pseudo-random numbers, then R initialises it using a value taken from the system clock.

    > set.seed(42)
    > runif(2)
    [1] 0.9148060 0.9370754
    > RNG.state <- .Random.seed
    > runif(2)
    [1] 0.2861395 0.8304476
    > set.seed(42)
    > runif(4)
    [1] 0.9148060 0.9370754 0.2861395 0.8304476
    > .Random.seed <- RNG.state
    > runif(2)
    [1] 0.2861395 0.8304476

Simulating discrete random variables

Let X be a discrete random variable taking values in the set {0, 1, ...} with cdf F and pmf p. The following snippet of code takes a uniform random variable U and returns a discrete random variable X with cdf F.

    # given U ~ U(0,1)
    X <- 0
    while (F(X) < U) {
      X <- X + 1
    }

When the algorithm terminates we have $F(X) \ge U$ and $F(X-1) < U$, that is, $U \in (F(X-1), F(X)]$. Hence

$$P(X = x) = P(U \in (F(x-1), F(x)]) = F(x) - F(x-1) = p(x).$$

[Figure: simulating from a binom(3, 0.5) c.d.f. Vertical axis: U ~ U(0,1); horizontal axis: X ~ binom(3, 0.5). The interval (0, 0.125) is mapped to 0, (0.125, 0.5) to 1, (0.5, 0.875) to 2, and (0.875, 1) to 3.]
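The while-loop for discrete random variables can be wrapped in a function and checked against the binom(3, 0.5) case. In this sketch the function name sim_discrete and the argument name cdf are hypothetical, and R's binomial cdf pbinom stands in for F.

```r
# Discrete inversion sketch; sim_discrete() is a hypothetical helper name.
# cdf must be the cdf of a rv on {0, 1, ...}; extra arguments are passed on to it.
sim_discrete <- function(cdf, ...) {
  U <- runif(1)
  X <- 0
  while (cdf(X, ...) < U) {
    X <- X + 1
  }
  X
}

set.seed(1)
x <- replicate(10000, sim_discrete(pbinom, size = 3, prob = 0.5))

# Empirical frequencies next to the exact pmf 0.125, 0.375, 0.375, 0.125
emp <- as.vector(table(factor(x, levels = 0:3))) / length(x)
print(rbind(empirical = emp, exact = dbinom(0:3, 3, 0.5)))
```

The empirical frequencies should sit close to the lengths of the four intervals of U that are mapped to 0, 1, 2 and 3.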
To simulate binomial, geometric, negative-binomial or Poisson rv's in R, use rbinom, rgeom, rnbinom or rpois. For simulating other (finite) discrete rv's R provides

    sample(x, size, replace = FALSE, prob = NULL)

The inputs are

x        A vector giving the possible values the rv can take;
size     How many rv's to simulate;
replace  Set this to TRUE to generate an iid sample, otherwise the rv's will be conditioned to be different from each other;
prob     A vector giving the probabilities of the values in x. If omitted then the values in x are assumed to be equally likely.

Simulating continuous random variables

Suppose that we are given $U \sim U(0, 1)$ and want to simulate a continuous rv X with cdf $F_X$. Put $Y = F_X^{-1}(U)$; then we have

$$F_Y(y) = P(Y \le y) = P(F_X^{-1}(U) \le y) = P(U \le F_X(y)) = F_X(y).$$

That is, Y has the same distribution as X. Thus, if we can simulate a U(0, 1) rv, then we can simulate any continuous rv X for which we know $F_X^{-1}$. This is called the inverse transformation method or simply the inversion method.

Inversion method for U(1, 3): if $X \sim U(1, 3)$ then $F_X(x) = (x - 1)/2$ for $x \in (1, 3)$ and thus $F_X^{-1}(y) = 2y + 1$ for $y \in (0, 1)$.

[Figure: Inversion method for U(1, 3). Horizontal axis: X; vertical axis: U.]

Inversion method for exp(1): if $X \sim \exp(\lambda)$ then $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$ and thus $F_X^{-1}(y) = -\frac{1}{\lambda} \log(1 - y)$.

[Figure: Inversion method for exp(1). Horizontal axis: X; vertical axis: U.]

Random variable simulators in R

Distribution         R command
binomial             rbinom
Poisson              rpois
geometric            rgeom
negative binomial    rnbinom
uniform              runif
exponential          rexp
normal               rnorm
gamma                rgamma
beta                 rbeta
student t            rt
F                    rf
chi-squared          rchisq
Weibull              rweibull

The rejection method

The inversion method works well if we can find $F^{-1}$ analytically. If not, we can use root-finding techniques to invert F numerically, but this can be time-consuming. An alternative method in this situation, which is often faster, is the rejection method.

We start with an example. Suppose that we have a continuous random variable X with pdf $f_X$ concentrated on the interval (0, 4). We imagine 'sprinkling' points $P_1, P_2, \ldots$ uniformly at random under the density function, and consider the distribution of $X_1$, the x coordinate of $P_1$.

[Figure: the pdf plotted against x, with the points a and b marked on the x axis.]

Let R be the shaded region under $f_X$ between a and b, then

$$P(a < X_1 < b) = P(P_1 \text{ hits } R) = \frac{\text{Area of } R}{\text{Area under density}} = \frac{\int_a^b f_X(x)\,dx}{1} = \int_a^b f_X(x)\,dx.$$

So $X_1$ has the same distribution as X. But how do we generate the points $P_i$ uniformly under $f_X$? The answer is to generate points at random in the rectangle [0, 4] × [0, 0.5], and then reject those that fall above the pdf.

Rejection method (uniform envelope)

Suppose that $f_X$ is nonzero only on [a, b], and $f_X \le k$.
1. Generate $X \sim U(a, b)$ and $Y \sim U(0, k)$ independent of X (so P = (X, Y) is uniformly distributed over the rectangle [a, b] × [0, k]).
2. If $Y < f_X(X)$ then return X, otherwise go back to step 1.

Example: consider the triangular pdf $f_X$ defined as

$$f_X(x) = \begin{cases} x & \text{if } 0 < x < 1; \\ 2 - x & \text{if } 1 \le x < 2; \\ 0 & \text{otherwise.} \end{cases}$$

We apply the rejection method as follows:

    source("rejecttriangle.r")
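The file rejecttriangle.r is not reproduced in these notes. As a stand-in, the following sketch shows one way the uniform-envelope algorithm might be coded for the triangular pdf, taking a = 0, b = 2 and k = 1. The function names f_tri and reject_triangle are assumptions made for illustration, not the contents of that file.

```r
# Triangular pdf on (0, 2), with maximum value 1 at x = 1
f_tri <- function(x) {
  if (x > 0 && x < 1) {
    x
  } else if (x >= 1 && x < 2) {
    2 - x
  } else {
    0
  }
}

# Rejection method with a uniform envelope over [a, b] x [0, k]
# (reject_triangle() is a hypothetical name for this sketch)
reject_triangle <- function(a = 0, b = 2, k = 1) {
  repeat {
    x <- runif(1, a, b)          # step 1: X ~ U(a, b)
    y <- runif(1, 0, k)          #         Y ~ U(0, k), independent of X
    if (y < f_tri(x)) return(x)  # step 2: accept if the point lies under the pdf
  }
}

set.seed(1)
x_tri <- replicate(5000, reject_triangle())
print(summary(x_tri))   # the sample should be centred near 1
```

Since the rectangle has area 2 and the pdf has area 1, roughly half of the candidate points are accepted.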
General rejection method

Our rejection method uses a rectangular envelope to cover the target density $f_X$. What to do if $f_X$ is unbounded?

Let X have pdf h and let $Y \sim U(0, k h(X))$, then (X, Y) is uniformly distributed under the curve kh:

$$P((X, Y) \in (x, x + dx) \times (y, y + dy)) = P(Y \in (y, y + dy) \mid X \in (x, x + dx))\, P(X \in (x, x + dx)) = \frac{dy}{k h(x)}\, h(x)\,dx = \frac{1}{k}\,dx\,dy.$$

Suppose we wish to simulate from the density $f_X$. Let h be a density we can simulate from, and choose k such that

$$k \ge k^* = \sup_x \frac{f_X(x)}{h(x)}.$$

Then kh forms an envelope for $f_X$, and we can generate points uniformly within this envelope. By accepting points below the curve $f_X$, we get the general rejection method:

General rejection method

To simulate from the density $f_X$, we assume that we have an envelope density h from which we can simulate, and that we have some $k < \infty$ such that $\sup_x f_X(x)/h(x) \le k$.
1. Simulate X from h.
2. Generate $Y \sim U(0, k h(X))$.
3. If $Y < f_X(X)$ then return X, otherwise go back to step 1.

Efficiency

The efficiency of the rejection method is measured by the expected number of times you have to generate a candidate point (X, Y). The area under the curve kh is k and the area under the curve $f_X$ is 1, so the probability of accepting a candidate is 1/k. Thus the number of times N we have to generate a candidate point has distribution 1 + geom(1/k), with mean

$$E N = 1 + \frac{1 - 1/k}{1/k} = k.$$

So, the closer h is to $f_X$, the smaller we can choose k, and the more efficient the algorithm.
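To make the three steps and the efficiency claim concrete, here is a sketch of a generic rejection sampler that also counts how many candidates it generates. The names reject_general, f, h, rh and k are hypothetical. The test case, a Beta(2, 2) target with density 6x(1 − x) under a U(0, 1) envelope so that k* = 1.5, is chosen only for illustration and does not come from the slides.

```r
# Generic rejection sampler (sketch). All argument names are hypothetical:
#   f   target density
#   h   envelope density
#   rh  function that draws one value from h
#   k   constant with f(x) <= k * h(x) for all x
# Returns the accepted value and the number of candidates generated.
reject_general <- function(f, h, rh, k) {
  n_tries <- 0
  repeat {
    n_tries <- n_tries + 1
    x <- rh()                     # step 1: simulate X from h
    y <- runif(1, 0, k * h(x))    # step 2: Y ~ U(0, k h(X))
    if (y < f(x)) {               # step 3: accept if the point lies under f
      return(c(value = x, tries = n_tries))
    }
  }
}

f_beta <- function(x) 6 * x * (1 - x)   # Beta(2, 2) density on (0, 1)
h_unif <- function(x) 1                 # U(0, 1) density on (0, 1)
k <- 1.5                                # sup f/h, attained at x = 0.5

set.seed(1)
out <- replicate(5000, reject_general(f_beta, h_unif, function() runif(1), k))
mean_tries <- mean(out["tries", ])
print(mean_tries)   # compare with E N = k
```

The average number of candidates per accepted value should be close to k, in line with the formula for E N.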
Example: gamma

For m, λ > 0 the Γ(λ, m) density is

$$f(x) = \frac{\lambda^m x^{m-1} e^{-\lambda x}}{\Gamma(m)}, \quad x > 0.$$

There is no explicit formula for the cdf F or its inverse, so we will use the rejection method to simulate from f.

We will use an exponential envelope $h(x) = \mu e^{-\mu x}$, for x > 0. Using the inversion method we can easily simulate from h using $-\log(U)/\mu$, where $U \sim U(0, 1)$.

To envelop f we need to find

$$k^* = \sup_{x > 0} \frac{f(x)}{h(x)} = \sup_{x > 0} \frac{\lambda^m x^{m-1} e^{(\mu - \lambda)x}}{\mu\, \Gamma(m)}.$$

Clearly $k^*$ will be infinite if m < 1 or λ ≤ µ. For m = 1 the gamma is just an exponential. Thus we will assume m > 1 and choose µ < λ. For m ∈ (0, 1) the rejection method can still be used, but a different envelope is required.

To find $k^*$ we take the derivative of the right-hand side above and set it to zero, to find the point where the maximum occurs. You can check that this is at the point $x = (m - 1)/(\lambda - \mu)$, which gives

$$k^* = \frac{\lambda^m (m - 1)^{m-1} e^{-(m-1)}}{\mu (\lambda - \mu)^{m-1}\, \Gamma(m)}.$$

To improve efficiency we would like to choose our envelope to make $k^*$ as small as possible. Looking at the formula for $k^*$, this means choosing µ to make $\mu (\lambda - \mu)^{m-1}$ as large as possible. Setting the derivative with respect to µ to zero, we see that the maximum occurs when $\mu = \lambda/m$. Plugging this back in we get

$$k^* = \frac{m^m e^{-(m-1)}}{\Gamma(m)}.$$

We can now code up our rejection algorithm.

    gamma_sim.r
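The file gamma_sim.r is not included in these notes. A possible version, following the derivation above with µ = λ/m and k = k*, might look like the sketch below. The function name gamma_sim and its argument names are assumptions, and the code only covers the case m > 1.

```r
# Rejection sampler for the gamma(lambda, m) density with an exponential
# envelope (sketch; gamma_sim() is a hypothetical name). Requires m > 1.
gamma_sim <- function(lambda, m) {
  stopifnot(m > 1, lambda > 0)
  mu <- lambda / m                          # envelope rate that minimises k*
  k  <- m^m * exp(-(m - 1)) / gamma(m)      # k* evaluated at mu = lambda / m
  f  <- function(x) lambda^m * x^(m - 1) * exp(-lambda * x) / gamma(m)
  h  <- function(x) mu * exp(-mu * x)
  repeat {
    x <- -log(runif(1)) / mu                # inversion method: X ~ exp(mu)
    y <- runif(1, 0, k * h(x))              # Y ~ U(0, k h(X))
    if (y < f(x)) return(x)                 # accept if the point lies under f
  }
}

set.seed(1)
g <- replicate(5000, gamma_sim(lambda = 2, m = 3))
# Theoretical mean m / lambda = 1.5 and variance m / lambda^2 = 0.75
print(c(mean = mean(g), var = var(g)))
```

For comparison, R's built-in rgamma(n, shape = m, rate = lambda) samples from the same distribution and can be used to check the output.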