POLYTECHNIC UNIVERSITY
Department of Computer and Information Science

PROBABILITY AND STATISTICS

K. Ming Leung

Abstract: This chapter provides a brief coverage of the basics in the area of probability and statistics that are relevant to simulation.

© 2000 [email protected]
Last Revision Date: February 8, 2002. Version 1.0

Table of Contents
1. Introduction
2. Discrete Random Variables and Their Properties
   2.1. Cumulative Distribution Function
   2.2. Probability Mass Function
   2.3. Expected Value
   2.4. Variance and Standard Deviation
   2.5. General Properties
3. Continuous Random Variables and Their Properties
   3.1. Uniform Probability Distribution
        • Expected Value • Variance • Mapping to a Different Domain
   3.2. Normal Probability Distribution
        • Expected Value • The Variance • Probability • The Rule of 3σs • The Most Probable Error
4. Sampling Distribution
   4.1. Central Limit Theorem
5. General Scheme of the Monte Carlo Simulation Method
   5.1. General Remarks on Monte Carlo Methods

1. Introduction

In a simulation study, probability and statistics are needed to understand how to model a probabilistic system, validate the simulation model, choose the input probability distributions, generate random samples from these distributions, perform statistical analysis of the simulation output data, and design the simulation experiment. This chapter provides only a minimal coverage of the basics in the area of probability and statistics that are relevant to simulation.

2. Discrete Random Variables and Their Properties

In probability theory, a process whose outcome cannot be predicted with certainty is called an experiment. The set of all possible outcomes of an experiment is called the sample space S. The outcomes themselves are specified as sample points in the sample space. In the discrete case, the number of sample points is either finite or countably infinite (meaning that the sample points can be put in a one-to-one correspondence with the set of positive integers). For example, the process of tossing a die is an experiment whose outcome is 1, 2, 3, 4, 5 or 6; it has the sample space

S = \{1, 2, 3, 4, 5, 6\}.   (1)

If the experiment consists of rolling a pair of dice, then its sample space has 36 elements, given by

S = \{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), \ldots, (6, 6)\},   (2)

where (i, j) means that i and j appeared on the first and second die, respectively.

A random variable X is defined as a function or rule that assigns a real number, x, to each point in the sample space S. A random variable X is discrete if it can take on only a countable (possibly infinite) number of possible values x_1, x_2, .... In the first experiment, if X is the random variable corresponding to the face value of the die, then the possible values that X can take on are 1, 2, 3, 4, 5, 6. In the second example, if X is the random variable corresponding to the sum of the two dice, then X assigns the value x = 7 to the outcome (4, 3). We will adopt the common notation of using capital letters such as X, Y, Z to denote random variables, and lower case letters such as x, y, z for the corresponding values taken on by these variables.
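As a concrete illustration of a random variable as a rule on a sample space, the short Python sketch below (added for illustration; not part of the original notes) enumerates the 36 outcomes of the two-dice experiment and the sum variable X, which assigns 7 to the outcome (4, 3).

```python
# The sample space of rolling a pair of dice, and the random variable X that
# assigns to each outcome (i, j) the sum of the two faces, e.g. X((4, 3)) = 7.
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes (i, j)
X = {outcome: sum(outcome) for outcome in S}

print(len(S))        # 36
print(X[(4, 3)])     # 7
```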
The cumulative distribution function F(x) of the random variable X is defined for any real value of x by

F(x) = P(X \le x), \quad -\infty < x < \infty,   (3)

where P(X ≤ x) is the probability associated with the event {X ≤ x}. Thus F(x) is the probability that in an experiment the random variable X takes on a value no larger than x.

2.1. Cumulative Distribution Function

A cumulative distribution function F(x) has the following properties:

1. 0 ≤ F(x) ≤ 1 for all x.
2. F(x) is non-decreasing [i.e. if x_1 < x_2, then F(x_1) ≤ F(x_2)].
3. F(−∞) = 0 and F(∞) = 1.

They can be easily proved from the definition of F(x).

2.2. Probability Mass Function

The probability that X takes on the value x is given by the probability mass function

p(x) = P(X = x),   (4)

which must satisfy the following two conditions:

1. p(x) ≥ 0 for all x, since probabilities cannot be negative. We usually omit any value of x which has p(x) = 0; then we can assume p(x) > 0 for all x.

2. \sum_x p(x) = 1, where the sum is carried out over all the possible values of x. Each allowed value of x contributes exactly one term to the sum. This relation is referred to as the normalization condition.

Example 1: Consider a discrete random variable X that describes the outcome of tossing a die. The possible values of x and their respective probabilities are given by

x      1    2    3    4    5    6
p(x)  1/6  1/6  1/6  1/6  1/6  1/6

For any x, the cumulative distribution function can be written in terms of the probability mass function:

F(x) = \sum_{x' \le x} p(x'),   (5)

where the sum is carried out over all the values of x' that are less than or equal to x.

Example 2: Consider a discrete random variable X that obeys the following distribution:

x      0    1    2
p(x)  1/4  1/2  1/4

[Figure: the cumulative distribution function F(x) for this example, a step function that rises from 0 to 1/4 at x = 0, to 3/4 at x = 1, and to 1 at x = 2.]

To obtain the cumulative distribution function F(x), notice that F(x) must be 0 for x < 0, since X cannot take on a negative value. At x = 0 we see that F(x) = P(X ≤ 0) = P(X = 0) = 1/4. F(x) keeps this value until x = 1. At x = 1, F(x) = P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/4 + 1/2 = 3/4. F(x) keeps this value until x = 2. At x = 2, F(x) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1/4 + 1/2 + 1/4 = 1. Clearly F(x) = 1 for x ≥ 2. So F(x) remains constant except at each possible value of x, where the function jumps up by an amount p(x).

2.3. Expected Value

The expected value of X is defined as

E(X) = \sum_x x\,p(x) = \mu,   (6)

and represents the mean or average value. For example 2, we have

E(X) = 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} = 1.   (7)

It can be shown [1] that if g(X) is a real-valued function of X, then the expected value of g(X) is given by

E(g(X)) = \sum_x g(x)\,p(x).   (8)

However, unless g is a linear function,

E(g(X)) \ne g(E(X)).   (9)

Example 3: Consider next a discrete random variable X that obeys the following distribution:

x     -1.0   0.5   0.0   1.5   4.0
p(x)   0.1   0.2   0.2   0.4   0.1

E(X) = -1.0(0.1) + 0.5(0.2) + 0(0.2) + 1.5(0.4) + 4.0(0.1) = 1.   (10)

Therefore the expected value is 1, just like in example 2. However, the two distributions are quite different.
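As a quick numerical check of the expected values just computed, the following Python sketch (illustrative only; not part of the original notes) evaluates Eq. (6) directly from the two probability mass functions.

```python
# Expected value E(X) = sum_x x * p(x) for a discrete distribution,
# evaluated for the distributions of examples 2 and 3.

def expected_value(pmf):
    """pmf maps each possible value x to its probability p(x)."""
    return sum(x * p for x, p in pmf.items())

example2 = {0: 0.25, 1: 0.50, 2: 0.25}
example3 = {-1.0: 0.1, 0.5: 0.2, 0.0: 0.2, 1.5: 0.4, 4.0: 0.1}

for name, pmf in [("example 2", example2), ("example 3", example3)]:
    assert abs(sum(pmf.values()) - 1.0) < 1e-12   # normalization condition
    print(name, "E(X) =", expected_value(pmf))    # both print 1.0
```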
The values in example 3 are distributed over a wider range than those in example 2. A measure of the width of a distribution comes from its variance.

2.4. Variance and Standard Deviation

The variance of a random variable is defined by

V(X) = E((X - \mu)^2).   (11)

It measures the expected squared deviation of the possible values of X from the mean value. Notice that the variance is the expected value of the square of a quantity and therefore cannot be negative. It takes on its lowest possible value of zero in the case where X can assume only a single value; such a distribution has no spread at all, and the variable X is really not random.

For example 2, since the mean is 1, we have

V(X) = (0-1)^2 \cdot \tfrac{1}{4} + (1-1)^2 \cdot \tfrac{1}{2} + (2-1)^2 \cdot \tfrac{1}{4} = \tfrac{1}{2}.   (12)

However, for example 3,

V(X) = (-1-1)^2 (0.1) + (0.5-1)^2 (0.2) + (0-1)^2 (0.2) + (1.5-1)^2 (0.4) + (4-1)^2 (0.1) = 1.65,   (13)

which is substantially larger than in example 2.

Another useful quantity is the standard deviation σ, defined by

\sigma = \sqrt{V(X)}.   (14)

2.5. General Properties

The following results involving expected values can be easily proved. For any constant c,

E(c) = \sum_x c\,p(x) = c \sum_x p(x) = c.   (15)

For any constant c and any function g(X) of the random variable X,

E(c\,g(X)) = \sum_x c\,g(x)\,p(x) = c \sum_x g(x)\,p(x) = c\,E(g(X)).   (16)

If g_1(X), g_2(X), ... are functions of X, then

E(g_1(X) + g_2(X) + \cdots) = \sum_x (g_1(x) + g_2(x) + \cdots)\,p(x) = \sum_x g_1(x)\,p(x) + \sum_x g_2(x)\,p(x) + \cdots = E(g_1(X)) + E(g_2(X)) + \cdots.   (17)

Using the above results, we have

V(X) = E((X-\mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2.   (18)

Thus we obtain the following important result for the variance:

V(X) = E(X^2) - \mu^2.   (19)

Next we consider the effect on the variance of shifting the random variable, by computing V(X + c) where c is a constant. First, we have V(X + c) = E((X + c)^2) − (E(X + c))^2. The second term is E(X + c) = E(X) + E(c) = µ + c. The first term is E((X + c)^2) = E(X^2) + 2c E(X) + c^2 = E(X^2) + 2cµ + c^2. Therefore

V(X + c) = E(X^2) + 2c\mu + c^2 - (\mu + c)^2 = E(X^2) - \mu^2 = V(X).

This result is to be expected, because a rigid shift of the distribution should not change its breadth.

Another result is

V(cX) = E((cX)^2) - (E(cX))^2 = c^2\big(E(X^2) - (E(X))^2\big) = c^2\,V(X).   (20)
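The shift and scaling rules V(X + c) = V(X) and V(cX) = c²V(X) are easy to confirm numerically; the sketch below (illustrative only, with helper names of my own choosing) does so for the distribution of example 3.

```python
# Check V(X + c) = V(X) and V(cX) = c^2 V(X) for the pmf of example 3.

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

pmf = {-1.0: 0.1, 0.5: 0.2, 0.0: 0.2, 1.5: 0.4, 4.0: 0.1}
c = 3.0

shifted = {x + c: p for x, p in pmf.items()}   # distribution of X + c
scaled  = {x * c: p for x, p in pmf.items()}   # distribution of cX

print(variance(pmf))       # 1.65
print(variance(shifted))   # same as V(X)
print(variance(scaled))    # c^2 * V(X) = 9 * 1.65
```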
3. Continuous Random Variables and Their Properties

If the cumulative distribution function of a random variable X, defined by

F(x) = P(X \le x), \quad -\infty < x < \infty,   (21)

is a continuous function of x, then X is a continuous random variable. Since the probability of finding the random variable X taking on a value less than or equal to −∞ is zero, we must have F(−∞) = 0. On the other hand, the probability of finding X taking on a value less than or equal to ∞ is unity, and therefore F(∞) = 1.

The probability density function f(x) for X is defined as the derivative of the cumulative distribution function,

f(x) = \frac{dF}{dx},   (22)

wherever the derivative exists. By writing

dF = f(x)\,dx,   (23)

changing the variable from x to t to get

dF = f(t)\,dt,   (24)

and integrating both sides of the equation from t = −∞ to t = x, we have

\int_{-\infty}^{x} dF = \int_{-\infty}^{x} f(t)\,dt.   (25)

The integral on the left-hand side evaluates to F(x) − F(−∞). Since F(−∞) = 0, we now have the cumulative distribution function expressed as an integral over the probability density function:

F(x) = \int_{-\infty}^{x} f(t)\,dt.   (26)

Thus for any x_0, the area underneath the curve of f(x) from −∞ to x_0 is precisely F(x_0). We refer to f(x) as the probability density function because its integral gives a probability. This is analogous to the fact that in physics the integral of the mass density over a certain region gives the mass of that region.

[Figure 1: Probability density function of the Weibull distribution with shape parameter α = 1.5 and scale parameter β = 6.0. The Weibull probability density function is defined by f(x) = α β^{-α} x^{α-1} exp[−(x/β)^α] for x in [0, ∞).]

The probability density function f(x) obeys the following properties:

1. Since F(x) is a monotonic non-decreasing function of x, we must have f(x) = dF/dx ≥ 0 for any value of x.
2. Since F(∞) = 1, it follows that \int_{-\infty}^{\infty} f(x)\,dx = 1.

Since F(x) = P(X ≤ x) gives the probability that X ≤ x, the probability that X has a value in the interval a ≤ x ≤ b is

P(a \le X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a) = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = \int_{a}^{b} f(x)\,dx.   (27)

Graphically, P(a ≤ X ≤ b) is represented by the area of the region below the f(x) curve from a to b. If we set a = b = x_0, we see that the probability P(X = x_0) of finding X having any particular value x_0 is exactly zero. This is in sharp contrast to the case of discrete random variables, where P(X = x_0) is non-zero for certain discrete values of x_0.

[Figure 2: P(x_1 ≤ X ≤ x_2) is given by the area of the shaded region under the density curve, illustrated here for the Weibull distribution with shape parameter α = 1.5 and scale parameter β = 6.0.]

The expected value of a continuous variable X is defined in terms of an integral by

E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \mu,   (28)

provided that the integral exists. Since the integral is a limiting form of a sum, the results that we obtained for the discrete case are also valid for the continuous case. Therefore we have

E(g(X)) = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx,   (29)

E(c) = c, \quad E(c\,g(X)) = c\,E(g(X)), \quad E(g_1(X) + g_2(X) + \cdots) = E(g_1(X)) + E(g_2(X)) + \cdots.

The variance of a continuous variable X is defined by

V(X) = E((X - \mu)^2).   (30)

Using the above results for the expected values, one can easily show that the variance can also be expressed in the form

V(X) = E((X-\mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.   (31)
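As an illustration of Eq. (27), the sketch below integrates the Weibull density of Figure 1 numerically and compares the result with F(x_2) − F(x_1), using the closed-form Weibull cumulative distribution function F(x) = 1 − exp[−(x/β)^α]. The use of scipy.integrate.quad here is an implementation choice for this sketch, not something prescribed by the notes.

```python
# P(x1 <= X <= x2) for the Weibull(alpha=1.5, beta=6.0) density of Fig. 1,
# computed both by numerical integration of f(x) and from the closed-form CDF.
import math
from scipy.integrate import quad

alpha, beta = 1.5, 6.0

def f(x):   # Weibull probability density function
    return alpha * beta**(-alpha) * x**(alpha - 1) * math.exp(-(x / beta)**alpha)

def F(x):   # Weibull cumulative distribution function
    return 1.0 - math.exp(-(x / beta)**alpha)

x1, x2 = 3.0, 9.0
area, _ = quad(f, x1, x2)      # area under f(x) between x1 and x2
print(area, F(x2) - F(x1))     # the two values agree to within the quadrature error
```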
3.1. Uniform Probability Distribution

Consider a random variable X whose values x are distributed uniformly in the interval [a, b]. The probability of finding X having a value in the interval from x_1 to x_1 + ∆x must then be the same as in the interval from x_2 to x_2 + ∆x, for any x_1 and x_2, as long as the two intervals lie within [a, b]. Therefore we have

P(x_1 \le X \le x_1 + \Delta x) = P(x_2 \le X \le x_2 + \Delta x),   (32)

and so

\int_{x_1}^{x_1+\Delta x} f(x)\,dx = \int_{x_2}^{x_2+\Delta x} f(x)\,dx.   (33)

It is clear that the probability density function f(x) must be a constant in [a, b]. If c is that constant, then normalization requires that

\int_a^b f(x)\,dx = c \int_a^b dx = c\,(b - a) = 1.   (34)

Thus we have c = 1/(b − a). Outside the interval [a, b], f(x) must be zero, and so the probability density function for a uniform random variable is given by

f(x) = \frac{dF}{dx} =
\begin{cases}
0, & x < a \\
\dfrac{1}{b-a}, & a < x < b \\
0, & x \ge b.
\end{cases}   (35)

We can then use Eq. (26) to find the cumulative distribution function F(x). Since f(x) is 0 for x < a, we see that F(x) must be 0 for x ≤ a. For x in [a, b], f(x) = 1/(b − a), and so

F(x) = \int \frac{1}{b-a}\,dx + c = \frac{x}{b-a} + c,   (36)

where c is an integration constant. To find c, we require that F(x) be continuous at x = a. But we know that F(a) = 0, and so

\frac{a}{b-a} + c = 0.   (37)

Solving for c gives c = −a/(b − a), and therefore F(x) = (x − a)/(b − a). Notice that at x = b, F(b) = 1, and since f(x) = 0 for x > b, we must have F(x) = 1 for x ≥ b. The cumulative distribution function can therefore be written as

F(x) =
\begin{cases}
0, & x < a \\
\dfrac{x-a}{b-a}, & a \le x \le b \\
1, & x \ge b.
\end{cases}   (38)

Notice that although the cumulative distribution function of a continuous random variable must be continuous everywhere, the probability density function need not be continuous everywhere.

• Expected Value

The expected value is

\mu = E(X) = \int_a^b x\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b = \frac{1}{b-a}\,\frac{b^2 - a^2}{2} = \frac{a+b}{2},   (39)

which is located at the center of the interval, as expected.

• Variance

To calculate the variance, we first calculate

E(X^2) = \int_a^b x^2\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}.   (40)

Then the variance is given by

V(X) = E(X^2) - \mu^2 = \frac{a^2 + ab + b^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}.   (41)

If a = 0 and b = 1, then we are considering the usual interval [0, 1] for uniform variates. We see that µ = 1/2 and V(X) = 1/12.
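A quick empirical check of these two results for standard uniform variates (a sketch added for illustration, not part of the original notes):

```python
# Empirical check of Eqs. (39) and (41) for uniform variates on [0, 1]:
# the sample mean should approach 1/2 and the sample variance 1/12 ~ 0.0833.
import random

N = 1_000_000
u = [random.random() for _ in range(N)]      # pseudo-random variates on [0, 1)

mean = sum(u) / N
var = sum((x - mean) ** 2 for x in u) / N

print(mean, 1 / 2)       # close to 0.5
print(var, 1 / 12)       # close to 0.0833...
```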
• Mapping to a Different Domain

A natural way to generate random numbers x that are distributed uniformly in [a, b] is to map the uniform variates u produced by a pseudo-random number generator on [0, 1] to x. In order to preserve the uniformity of the distribution of these numbers, the mapping (the relationship between u and x) should be linear. We will now derive this relationship, in fact in three different ways, each using a different technique and looking at the problem from a different angle.

One approach is to first scale u by multiplying it by the factor (b − a), so that u' = (b − a)u is distributed uniformly between 0 and b − a. We then shift u' by an amount a, so that x = u' + a = (b − a)u + a is distributed uniformly between a and b. [Figure: the two steps of the transformation, first scaling [0, 1] to [0, b − a] and then shifting to [a, b].] Notice that it is also possible to shift the interval first and then scale it, but that requires a little more algebra to figure out the correct amount of shift.

A systematic but slightly lengthier way to derive the relationship is to first write out the most general linear relationship between u and x:

x = \alpha u + \beta,   (42)

where α and β are two constants that have to be determined. We want the point u = 0 to map to x = a and u = 1 to map to x = b. Setting u = 0 and x = a in the above equation gives β = a. Then setting u = 1 and x = b gives b = α + a, and therefore α = b − a. Thus we obtain the same relationship as before:

x = (b - a)\,u + a.   (43)

The quickest way (that I know of) to derive the above relationship is to consider a general point u and its image x. [Figure: a point u in the interval [0, 1] and its image x in the interval [a, b].] Since the mapping is linear, the relative positions of these two points within their own segments in the two coordinate systems must be identical. Therefore we have

\frac{u - 0}{1 - 0} = \frac{x - a}{b - a}.   (44)

Solving for x, we obtain exactly the same relation as before.
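A minimal sketch of Eq. (43) in Python, using the standard library generator as the source of uniform variates on [0, 1) (an illustration, not part of the original notes):

```python
# Generate uniform variates on [a, b) from standard uniform variates on [0, 1)
# using the linear mapping x = (b - a) * u + a of Eq. (43).
import random

def uniform_ab(a, b):
    u = random.random()          # uniform on [0, 1)
    return (b - a) * u + a       # uniform on [a, b)

a, b = 2.0, 5.0
xs = [uniform_ab(a, b) for _ in range(100_000)]

print(min(xs), max(xs))                   # all samples lie within [2, 5)
print(sum(xs) / len(xs), (a + b) / 2)     # sample mean is close to 3.5
```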
3.2. Normal Probability Distribution

The normal probability distribution is another frequently encountered distribution. It is especially important in understanding how computer simulations based on random sampling work. A random variable X has a normal probability distribution if its probability density function is

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty,   (45)

with real parameters µ and σ, where σ must be positive. As will be shown below, the expected value E(X) is µ and the standard deviation is σ. Notice that f(x) is properly normalized, since

\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left[-\frac{t^2}{2}\right] dt = 1,   (46)

where we have made a change of variable from x to t = (x − µ)/σ.

[Figure 3: Probability density function of a normal distribution. In this example, µ = 3 and σ = 1.]

The probability density function has the form of a bell-shaped curve with a single peak at x = µ. The value at the peak is 1/(√(2π) σ), so the smaller σ is, the higher the peak. Letting x = µ + h be a point at which f(x) equals half its peak value, inserting it into Eq. (45) and setting f to half the peak value yields

\exp\left[-\frac{h^2}{2\sigma^2}\right] = \frac{1}{2}.   (47)

Solving for h, which is referred to as the half-width of the peak, gives h = ±√(2 ln 2) σ ≈ ±1.177 σ. Decreasing σ decreases h and the peak becomes sharper; at the same time the height of the peak increases, so that the total area underneath f(x) remains unity.

[Figure 4: The half-width of a normal distribution. In this example, µ = 3 and σ = 1.]

• Expected Value

The expected value can be calculated easily because

E(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (t+\mu) \exp\left[-\frac{t^2}{2\sigma^2}\right] dt = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} t \exp\left[-\frac{t^2}{2\sigma^2}\right] dt + \frac{\mu}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} \exp\left[-\frac{t^2}{2\sigma^2}\right] dt,   (48)

where t = x − µ. The first integral is zero since the integrand is an odd function. The second integral is essentially the same one encountered above in the normalization. The result shows that E(X) = µ.

• The Variance

The variance is given by

V(X) = E((X-\mu)^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (x-\mu)^2 \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx = \frac{2\sigma^2}{\sqrt{\pi}}\int_{-\infty}^{\infty} t^2 \exp(-t^2)\,dt = \sigma^2,   (49)

where we have made a change of integration variable from x to t = (x − µ)/(√2 σ). Therefore σ is indeed the standard deviation.

• Probability

For any x_2 ≥ x_1, the probability of finding X with a value between x_1 and x_2 is

P(x_1 \le X \le x_2) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{x_1}^{x_2} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx = \frac{1}{\sqrt{\pi}}\int_{t_1}^{t_2} \exp(-t^2)\,dt = \frac{1}{\sqrt{\pi}}\left[\int_{0}^{t_2} \exp(-t^2)\,dt - \int_{0}^{t_1} \exp(-t^2)\,dt\right] = \frac{1}{2}\left[\Phi(t_2) - \Phi(t_1)\right],   (50)

where we made a change of integration variable from x to t = (x − µ)/(√2 σ), so that t_1 = (x_1 − µ)/(√2 σ) and t_2 = (x_2 − µ)/(√2 σ), and Φ(t) is the error function (commonly written erf), defined by

\Phi(t) = \frac{2}{\sqrt{\pi}}\int_{0}^{t} \exp(-x^2)\,dx.   (51)

Since the integrand is a positive function, Φ(t) is a monotonically increasing function of t. At t = 0, clearly Φ(0) = 0. At t = ∞ the integral can be evaluated exactly, and one finds Φ(∞) = 1. Moreover,

\Phi(-t) = \frac{2}{\sqrt{\pi}}\int_{0}^{-t} \exp(-x^2)\,dx = -\frac{2}{\sqrt{\pi}}\int_{0}^{t} \exp(-u^2)\,du = -\Phi(t),   (52)

where we made a change of integration variable from x to u = −x. Therefore Φ(t) is an odd function, which has the value −1 at t = −∞, rises monotonically to 0 at t = 0, and then to 1 at t = ∞.

[Figure 5: For a normal distribution, the probability that the outcome falls within 3σ of the mean is 0.9973 (99.73%). In this example, µ = 3 and σ = 1, so that probability is given by the area under the curve from 0 to 6.]

• The Rule of 3σs

Letting x_1 = µ − 3σ and x_2 = µ + 3σ in Eq. (50), we see that

P(\mu - 3\sigma \le X \le \mu + 3\sigma) = \frac{1}{2}\left[\Phi\!\left(\frac{3}{\sqrt{2}}\right) - \Phi\!\left(-\frac{3}{\sqrt{2}}\right)\right] = \Phi\!\left(\frac{3}{\sqrt{2}}\right) \approx 0.9973.   (53)

What this means is that in a single trial it is highly unlikely (less than a 0.3% chance) to obtain a value of the normal random variable X that differs from its mean value by more than 3σ.

• The Most Probable Error

The most probable error, r, is defined so that

P(\mu - r \le X \le \mu + r) = \frac{1}{2},   (54)

which becomes

\frac{1}{2}\left[\Phi\!\left(\frac{r}{\sqrt{2}\,\sigma}\right) - \Phi\!\left(-\frac{r}{\sqrt{2}\,\sigma}\right)\right] = \Phi\!\left(\frac{r}{\sqrt{2}\,\sigma}\right) = \frac{1}{2},   (55)

by setting t_1 = −r/(√2 σ) and t_2 = r/(√2 σ) in Eq. (50). Thus the most probable error is

r = \sqrt{2}\,\Phi^{-1}(1/2)\,\sigma \approx 0.6745\,\sigma,   (56)

where Φ^{-1} is the inverse function of Φ.

[Figure 6: For a normal distribution, there is an equal chance that the outcome falls inside or outside the shaded region; this determines the most probable error. In this example, µ = 3 and σ = 1.]
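These error-function identities can be checked numerically. The sketch below (illustrative only; the helper name is my own) uses Python's math.erf, which is Φ in the notation used here, to verify the rule of 3σs and to recover the coefficient 0.6745 of Eq. (56) by bisection.

```python
# Verify P(mu - k*sigma <= X <= mu + k*sigma) = erf(k / sqrt(2)) for a normal
# variable, check the rule of 3 sigmas, and solve erf(r / (sqrt(2)*sigma)) = 1/2
# for the most probable error r by bisection.
import math

def prob_within(k):
    """Probability that a normal variate lies within k standard deviations of mu."""
    return math.erf(k / math.sqrt(2.0))

print(prob_within(3.0))          # ~0.99730, the rule of 3 sigmas

# Bisection for r/sigma such that erf(r / (sigma*sqrt(2))) = 1/2.
lo, hi = 0.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if prob_within(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print(0.5 * (lo + hi))           # ~0.6745, the most probable error in units of sigma
```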
4. Sampling Distribution

We are interested in functions of N random variables X_1, X_2, ..., X_N observed in a random sample selected from a population of interest. The variables X_1, X_2, ..., X_N are assumed to be independent of each other and to obey the same common distribution. From a random sample of N observations, x_1, x_2, ..., x_N, we can estimate the population mean µ with the sample mean

\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n.   (57)

The goodness of this estimate depends on the behavior of the random variables X_1, X_2, ..., X_N and the effect that this behavior has on the random variable

\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n.   (58)

Now suppose that the X_n are independent, normally distributed variables with a common mean E(X_n) = µ and a common variance V(X_n) = σ², for n = 1, 2, ..., N. It can be shown that X̄ is then a normally distributed random variable. Its mean is

E(\bar{X}) = E\!\left(\frac{1}{N}\sum_{n=1}^{N} X_n\right) = \frac{1}{N}\sum_{n=1}^{N} E(X_n) = \frac{1}{N}\sum_{n=1}^{N} \mu = \mu,   (59)

and its variance is

V(\bar{X}) = V\!\left(\frac{1}{N}\sum_{n=1}^{N} X_n\right) = \frac{1}{N^2}\sum_{n=1}^{N} V(X_n) = \frac{1}{N^2}\sum_{n=1}^{N} \sigma^2 = \frac{\sigma^2}{N}.   (60)

Therefore X̄ has a mean µ̄ = µ and a variance σ̄² = σ²/N.

4.1. Central Limit Theorem

It turns out that, to a good approximation, X̄ has a normal distribution with mean µ̄ = µ and variance σ̄² = σ²/N even when the X_n are not normally distributed. The requirements are that the sample size be large enough and that no small set of individual variables plays too great a role in determining the sum. This is a consequence of the Central Limit Theorem. In nature, the behavior of a variable often depends on the accumulated effect of a large number of small random factors, and so its behavior is approximately normal. This is why we call it "normal".

The Central Limit Theorem can be stated more formally as follows. Let X_1, X_2, ..., X_N be independent and identically distributed (iid) random variables with mean E(X_n) = µ and variance V(X_n) = σ². Then as N → ∞, the variable

\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n   (61)

has a distribution function that converges to a normal distribution function with mean µ̄ = µ and variance σ̄² = σ²/N.

In practice, the X_n need not have exactly the same distribution function, and N need not be extremely large. Often N = 5 or more is large enough for a reasonably good approximation to normal behavior for X̄.
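The following sketch (illustrative only; the exponential population is just a convenient non-normal choice) draws many samples of size N from an Exponential(1) population, whose mean and standard deviation are both 1, and checks that the sample means scatter about µ with a standard deviation close to σ/√N.

```python
# Sampling distribution of the mean: draw many samples of size N from a
# non-normal (exponential) population and check that the sample means have
# mean ~ mu and standard deviation ~ sigma / sqrt(N), as the CLT suggests.
import math
import random

mu = sigma = 1.0          # an Exponential(1) population has mean 1 and std dev 1
N = 30                    # sample size
trials = 20_000           # number of independent samples

means = []
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(N)]
    means.append(sum(sample) / N)

m = sum(means) / trials
s = math.sqrt(sum((x - m) ** 2 for x in means) / trials)

print(m, mu)                        # close to 1.0
print(s, sigma / math.sqrt(N))      # close to 1/sqrt(30) ~ 0.183
```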
5. General Scheme of the Monte Carlo Simulation Method

In a Monte Carlo simulation, we compute a quantity of interest, say q, by randomly sampling a population. We start by identifying a random variable X whose mean µ is given by q, the quantity we want to compute. Let us denote the variance of X by σ². Suppose we sample the population N times, where N is a large number. We therefore consider N independent random variables X_1, X_2, ..., X_N, each obeying the same distribution as X, with the same mean µ and variance σ². From the Central Limit Theorem, the variable

\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n   (62)

is approximately normal with mean µ̄ = µ and variance σ̄² = σ²/N.

Applying the rule of 3σs to X̄, we see that

P(\mu - 3\bar{\sigma} \le \bar{X} \le \mu + 3\bar{\sigma}) = P\!\left(\mu - \frac{3\sigma}{\sqrt{N}} \le \bar{X} \le \mu + \frac{3\sigma}{\sqrt{N}}\right) = P\!\left(\left|\frac{1}{N}\sum_{n=1}^{N} X_n - \mu\right| \le \frac{3\sigma}{\sqrt{N}}\right) \approx 0.9973.   (63)

Therefore we can approximate the value of q by the sample mean

\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n.   (64)

In all likelihood (with a 99.73% chance of being correct), the error does not exceed 3σ/√N. In practice, the bound 3σ/√N is too loose (overly cautious), and it is commonly replaced by the most probable error, 0.6745σ/√N.

5.1. General Remarks on Monte Carlo Methods

Three remarks are in order:

1. The most probable error decreases with increasing N as N^{-1/2}. That means our estimate of q from the sample mean gets better and better as more and more trials or samples are used in the simulation. However, the dependence of the error on N to the power −1/2 means that the convergence of the Monte Carlo method is in general very slow. For example, if we want to decrease the probable error by a further factor of 10, we need to use 100 times more samples.

2. The most probable error is proportional to σ. Different ways of sampling the population lead to different values of σ. In the next month or so, we will study different sampling schemes that reduce the value of σ and therefore reduce the probable error for the same number of trials N. These techniques are referred to as variance reduction techniques.

3. The strong point of Monte Carlo methods is that they are in general very flexible and can be readily adapted to treat rather complex models. Recall our first example, in which we computed the value of π by performing a simulation to find the ratio of the number of dust particles landing inside a unit circle to the total number of particles. We can use exactly the same simulation to find the area of an arbitrary two-dimensional object; the only change needed is in the program line that determines whether a given point (x, y) lands inside the object. In the case of the unit circle, the point lies within the circle if x² + y² < 1. For an object with a more complicated shape, we need to replace that test by a more complicated function. A short sketch of such a computation is given after these remarks.
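The sketch below (an illustration added here, not part of the original notes) carries out the dust-particle computation of π and reports the most probable error 0.6745σ/√N, with σ estimated from the sample itself.

```python
# Monte Carlo estimate of pi: the fraction of random points in the unit square
# that fall inside the quarter of the unit circle approximates pi/4.
# X_n is 4 if point n lands inside and 0 otherwise, so E(X) = pi, and the
# most probable error of the sample mean is about 0.6745 * sigma / sqrt(N).
import math
import random

N = 100_000
values = []
for _ in range(N):
    x, y = random.random(), random.random()
    values.append(4.0 if x * x + y * y < 1.0 else 0.0)

mean = sum(values) / N                                   # estimate of pi
var = sum((v - mean) ** 2 for v in values) / (N - 1)     # sample variance
probable_error = 0.6745 * math.sqrt(var / N)

print(mean, "+/-", probable_error)      # e.g. 3.14 +/- 0.0035
print(abs(mean - math.pi))              # actual error, usually of comparable size
```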
References

[1] Wackerly, Mendenhall and Scheaffer, "Mathematical Statistics with Applications", 5th edition (Duxbury, 1996).

[2] A proof can be found on page 82 of Wackerly, Mendenhall and Scheaffer, "Mathematical Statistics with Applications", 5th edition (Duxbury, 1996).

ASSIGNMENT 2
Problem 3: Chapter 4 of the textbook, problem 4.1.

ASSIGNMENT 3
Problem 4: Chapter 4 of the textbook, problem 4.2.