
POLYTECHNIC UNIVERSITY
Department of Computer and Information Science
PROBABILITY AND
STATISTICS
K. Ming Leung
Abstract: This chapter provides a brief coverage of the
basics in the area of probability and statistics that are
relevant to simulation.
Copyright © 2000 [email protected]
Last Revision Date: February 8, 2002
Version 1.0
Table of Contents
1. Introduction
2. Discrete Random Variables and Their Properties
2.1. Cumulative Distribution Function
2.2. Probability Mass Function
2.3. Expected Value
2.4. Variance and Standard Deviation
2.5. General Properties
3. Continuous Random Variables and Their Properties
3.1. Uniform Probability Distribution
• Expected Value • Variance • Mapping to Different Domain
3.2. Normal Probability Distribution
• Expected Value • The Variance • Probability • The Rule
of 3σs • The most probable error
4. Sampling Distribution
4.1. Central Limit Theorem
5. General Scheme of Monte Carlo Simulation Method
5.1. General Remarks on Monte Carlo Methods
1. Introduction
In a simulation study, probability and statistics are needed to understand how to model a probabilistic system, validate the simulation
model, choose the input probability distributions, generate random
samples from these distributions, perform statistical analysis of the
simulation output data, and design the simulation experiment. This
chapter provides only a minimal coverage of the basics in the area of
probability and statistics that are relevant to simulation.
2. Discrete Random Variables and Their Properties
In probability theory, a process whose outcome cannot be predicted
with certainty is called an experiment. The set of all possible outcomes of an experiment is called the sample space S. The outcomes
themselves are specified as sample points in the sample space. In the
discrete case, the number of sample points is either finite or countably
infinite (meaning that the number of sample points can be put in a
one-to-one correspondence with the set of positive integers).
For example, the process of tossing a die is an experiment whose outcome is 1, 2, 3, 4, 5, or 6; its sample space is
S = {1, 2, 3, 4, 5, 6}. (1)
If the experiment consists of rolling a pair of dice, then its sample space has 36 elements, given by
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), . . . , (6, 6)}, (2)
where (i, j) means that i and j appeared on the first and second die, respectively.
A random variable X is defined as a function or rule that assigns
a real number, x, to each point in the sample space S. A random
variable X is discrete if it can only take on a countable (possibly
infinite) number of possible values, x1 , x2 , . . . .
In the first experiment, if X is the random variable corresponding
to the face value of the die, then the possible values that X can take on
are 1, 2, 3, 4, 5, 6. In the second example, if X is the random variable
corresponding to the sum of the two dice, then X assigns the value x
of 7 to the outcome (4, 3).
We will adopt the common notation of using capital letters such
as X, Y , Z to denote random variables, and lower case letters such
as x, y, z for the corresponding values taken on by these variables.
The cumulative distribution function F(x) of the random variable X is defined for any real value of x by
F(x) = P(X ≤ x),   −∞ < x < ∞, (3)
where P(X ≤ x) is the probability associated with the event {X ≤ x}. Thus F(x) is the probability that in an experiment the random variable X takes on a value no larger than x.
2.1. Cumulative Distribution Function
A cumulative distribution function F (x) has the following properties:
1. 0 ≤ F (x) ≤ 1 for all x.
2. F (x) is non-decreasing [ i.e. if x1 < x2 , then F (x1 ) ≤ F (x2 ) ].
3. F (−∞) = 0 and F (∞) = 1.
They can be easily proved from the definition of F (x).
2.2. Probability Mass Function
The probability that X takes on the value x is given by the probability mass function
p(x) = P(X = x), (4)
which must satisfy the following two conditions:
1. p(x) ≥ 0 for all x, since probabilities cannot be negative. We usually omit any value of x for which p(x) = 0, and can then assume p(x) > 0 for all x.
2. Σ_x p(x) = 1, where the sum is carried out over all possible values of x. Each allowed value of x contributes exactly one term to the sum. This relation is referred to as the normalization condition.
Example 1: Consider a discrete random variable X that describes the outcome of tossing a die. The possible values of x and their respective probabilities are given by

x      1    2    3    4    5    6
p(x)  1/6  1/6  1/6  1/6  1/6  1/6
For any x, the cumulative distribution function can be written in terms of the probability mass function as
F(x) = Σ_{x′ ≤ x} p(x′), (5)
where the sum is carried out over all values of x′ that are less than or equal to x.
Example 2: Consider a discrete random variable X that obeys the
following distribution:
x      0    1    2
p(x)  1/4  1/2  1/4

[Figure: the cumulative distribution function F(x) for this distribution is a staircase that rises from 0 to 1/4 at x = 0, to 3/4 at x = 1, and to 1 at x = 2.]
To obtain the cumulative distribution function F(x), notice that F(x) must be 0 for x < 0, since X cannot take on a negative value. At x = 0 we see that F(x) = P(X ≤ 0) = P(X = 0) = 1/4. F(x) continues to have this value until x = 1. At x = 1, F(x) = P(X ≤ 1) = P(X = 0) + P(X = 1) = 1/4 + 1/2 = 3/4. F(x) continues to have this value until x = 2.
At x = 2, F(x) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1/4 + 1/2 + 1/4 = 1. Clearly F(x) = 1 for x ≥ 2. So F(x) remains constant except at each possible value of x, where the function jumps up by an amount p(x).
2.3. Expected Value
The expected value of X is defined as
E(X) = Σ_x x p(x) = µ, (6)
and represents the mean or average value. For example 2, we have
E(X) = 0 · (1/4) + 1 · (1/2) + 2 · (1/4) = 1. (7)
It can be shown [1] that if g(X) is a real-valued function of X, then the expected value of g(X) is given by
E(g(X)) = Σ_x g(x) p(x). (8)
However, unless g is a linear function,
E(g(X)) ≠ g(E(X)). (9)
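To see Eq. (9) at work, the following Python sketch (illustrative only, not from the original notes) evaluates E(g(X)) and g(E(X)) for the distribution of Example 2 with the nonlinear choice g(x) = x².

    # Distribution of Example 2: x = 0, 1, 2 with p(x) = 1/4, 1/2, 1/4.
    pmf = {0: 0.25, 1: 0.5, 2: 0.25}

    def expectation(g, pmf):
        """E(g(X)) = sum of g(x) * p(x) over all x, as in Eq. (8)."""
        return sum(g(x) * p for x, p in pmf.items())

    g = lambda x: x ** 2
    E_X = expectation(lambda x: x, pmf)   # E(X) = 1.0
    E_gX = expectation(g, pmf)            # E(X^2) = 1.5
    print(E_gX, g(E_X))                   # 1.5 versus 1.0, so E(g(X)) != g(E(X))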
Example 3: Consider next a discrete random variable X that obeys
the following distribution:
x     −1.0   0.5   0    1.5   4.0
p(x)   0.1   0.2   0.2  0.4   0.1
E(X) = −1.0 · 0.1 + 0.5 · 0.2 + 0 · 0.2 + 1.5 · 0.4 + 4.0 · 0.1 = 1. (10)
Therefore the expected value is 1, just as in example 2. However, the two distributions are quite different: the values in example 3 are spread over a wider range than in example 2. A measure of the width of a distribution comes from its variance.
2.4. Variance and Standard Deviation
The variance of a random variable is defined by
V(X) = E((X − µ)²). (11)
It measures the expected squared deviation of the possible values of X from the mean value. Notice that the variance is the expected value of the square of a quantity and therefore cannot be
negative. It takes on its lowest possible value of zero for the case
where X can assume a single value only. The distribution has no
spread at all, and the variable X is really not random.
For example 2, since the mean is 1, we have
V(X) = (0 − 1)² · (1/4) + (1 − 1)² · (1/2) + (2 − 1)² · (1/4) = 1/2. (12)
However, for example 3,
V(X) = (−1 − 1)² · 0.1 + (0.5 − 1)² · 0.2 + (0 − 1)² · 0.2 + (1.5 − 1)² · 0.4 + (4 − 1)² · 0.1 = 1.65, (13)
which is substantially larger than in example 2.
Another useful quantity is the standard deviation σ, defined by
σ = √V(X). (14)
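The definitions in Eqs. (11) and (14) can be checked directly. The Python sketch below (not part of the original notes) computes the mean, variance, and standard deviation for the distributions of Examples 2 and 3.

    from math import sqrt

    def mean_var_std(pmf):
        """Return E(X), V(X) = E((X - mu)^2), and the standard deviation sigma."""
        mu = sum(x * p for x, p in pmf.items())
        var = sum((x - mu) ** 2 * p for x, p in pmf.items())
        return mu, var, sqrt(var)

    example2 = {0: 0.25, 1: 0.5, 2: 0.25}
    example3 = {-1.0: 0.1, 0.5: 0.2, 0.0: 0.2, 1.5: 0.4, 4.0: 0.1}

    print(mean_var_std(example2))  # (1.0, 0.5, 0.7071...)
    print(mean_var_std(example3))  # (1.0, 1.65, 1.2845...)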
2.5. General Properties
The following results involving expected values can be easily proved. For any constant c,
E(c) = Σ_x c p(x) = c Σ_x p(x) = c. (15)
For any constant c and any function g(X) of the random variable X,
E(c g(X)) = Σ_x c g(x) p(x) = c Σ_x g(x) p(x) = c E(g(X)). (16)
If g1(X), g2(X), . . . are functions of X, then
E(g1(X) + g2(X) + . . . ) = Σ_x (g1(x) + g2(x) + . . . ) p(x) (17)
                          = Σ_x g1(x) p(x) + Σ_x g2(x) p(x) + . . .
                          = E(g1(X)) + E(g2(X)) + . . . .
Using the above results, we have
V(X) = E((X − µ)²) = E(X² − 2µX + µ²)
     = E(X²) + E(−2µX) + E(µ²)
     = E(X²) − 2µE(X) + µ². (18)
Thus we obtain the following important result for the variance:
V(X) = E(X²) − µ². (19)
Next we consider the effect on the variance of shifting the random variable, by computing V(X + c), where c is a constant. First, we have V(X + c) = E((X + c)²) − (E(X + c))². The second term is E(X + c) = E(X) + E(c) = µ + c. The first term is E((X + c)²) = E(X²) + 2cE(X) + c² = E(X²) + 2cµ + c². Therefore V(X + c) = E(X²) + 2cµ + c² − (µ + c)² = E(X²) − µ² = V(X). This result is to be expected, because a rigid shift in the distribution should not change the breadth of the distribution.
Another result is
V(cX) = E((cX)²) − (E(cX))² = c²(E(X²) − (E(X))²) = c²V(X). (20)
3. Continuous Random Variables and Their Properties
If the cumulative distribution function of a random variable X, defined by
F(x) = P(X ≤ x) for −∞ < x < ∞, (21)
is a continuous function of x, then X is a continuous random variable.
Since the probability of finding the random variable X taking on a
value less than or equal to −∞ is zero, we must have F (−∞) = 0.
On the other hand, the probability of finding the random variable X
taking on a value less than or equal to ∞ is unity, therefore we must
have F (∞) = 1.
The probability density function f(x) for X is defined as the derivative of the cumulative distribution function:
f(x) = dF/dx, (22)
wherever the derivative exists. By writing
dF = f(x) dx, (23)
changing the variable from x to t to get
dF = f(t) dt, (24)
and integrating both sides of the equation from t = −∞ to t = x, we have
∫_{−∞}^{x} dF = ∫_{−∞}^{x} f(t) dt. (25)
The integral on the left-hand side evaluates to F(x) − F(−∞). Since F(−∞) = 0, we now have the cumulative distribution function expressed as an integral over the probability density function:
F(x) = ∫_{−∞}^{x} f(t) dt. (26)
Thus for any x0, the area underneath the curve of f(x) from −∞ to x0 is precisely F(x0). We refer to f(x) as the probability density function because its integral gives a probability. This is analogous to the fact that in physics the integral of the mass density over a certain region gives the mass of that region.
Figure 1: Probability density function for the Weibull distribution with shape parameter α = 1.5 and scale parameter β = 6.0. The Weibull probability density function is defined by f(x) = α β^(−α) x^(α−1) exp[−(x/β)^α] for x in [0, ∞).
The probability density function f(x) obeys the following properties:
1. Since F(x) is a monotonic non-decreasing function of x, we must have f(x) = dF/dx ≥ 0 for any value of x.
2. Since F(∞) = 1, we must have ∫_{−∞}^{∞} f(x) dx = 1.
Since F(x) = P(X ≤ x) gives the probability that X ≤ x, the probability that X has a value in the interval a ≤ x ≤ b is
P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a)
             = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = ∫_{a}^{b} f(x) dx. (27)
Graphically, P(a ≤ X ≤ b) is represented by the area of the region below the f(x) curve from a to b, as illustrated in Figure 2. If we set a = b = x0, we see that the probability P(X = x0) of finding X having any particular value x0 is exactly zero. This is in sharp contrast to the case of discrete random variables, where P(X = x0) is non-zero for certain discrete values of x0.
Figure 2: P(x1 ≤ X ≤ x2) is given by the area of the shaded region, as illustrated here for the Weibull distribution with shape parameter α = 1.5 and scale parameter β = 6.0.
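The Weibull density of Figures 1 and 2 can be used to illustrate Eqs. (26) and (27) numerically. The Python sketch below (illustrative only, not from the original notes) approximates the integrals with a simple midpoint rule; the parameter values α = 1.5 and β = 6.0 are the ones used in the figures, and the integration limits are arbitrary choices.

    from math import exp

    alpha, beta = 1.5, 6.0

    def f(x):
        """Weibull density: alpha * beta**(-alpha) * x**(alpha - 1) * exp(-(x/beta)**alpha)."""
        return alpha * beta ** (-alpha) * x ** (alpha - 1) * exp(-((x / beta) ** alpha))

    def integrate(func, a, b, n=100000):
        """Midpoint-rule approximation of the integral of func from a to b."""
        h = (b - a) / n
        return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

    print(integrate(f, 0.0, 10.0))   # F(10), the area under f from 0 to 10 (~0.88)
    print(integrate(f, 3.0, 8.0))    # P(3 <= X <= 8), as in Eq. (27)
    print(integrate(f, 0.0, 60.0))   # normalization check, should be close to 1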
The expected value of a continuous random variable X is defined in terms of an integral by
E(X) = ∫_{−∞}^{∞} x f(x) dx = µ, (28)
provided that the integral exists. Since the integral is a limiting form of a sum, the results that we obtained for the discrete case are also valid for the continuous case. Therefore we have
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx, (29)
E(c) = c,
E(c g(X)) = c E(g(X)),
E(g1(X) + g2(X) + . . . ) = E(g1(X)) + E(g2(X)) + . . . .
The variance of a continuous random variable X is defined by
V(X) = E((X − µ)²). (30)
Using the above results for the expected values, one can easily show
that the variance can also be expressed in the form
V(X) = E((X − µ)²) = E(X² − 2µX + µ²) = E(X²) − 2µE(X) + E(µ²) = E(X²) − µ². (31)
3.1. Uniform Probability Distribution
Consider a random variable X whose values, x, are distributed uniformly in the interval [a, b]. The probability of finding X having a
value in the interval from x1 to x1 + ∆x must then be the same as in
the interval from x2 to x2 + ∆x, for any x1 and x2 as long as the two
intervals lie within [a, b]. Therefore we have
P(x1 ≤ X ≤ x1 + ∆x) = P(x2 ≤ X ≤ x2 + ∆x), (32)
and so
∫_{x1}^{x1+∆x} f(x) dx = ∫_{x2}^{x2+∆x} f(x) dx. (33)
It is clear that the probability density function f(x) must be a constant in [a, b]. If c is that constant, then normalization requires that
∫_{a}^{b} f(x) dx = c ∫_{a}^{b} dx = c(b − a) = 1. (34)
Thus we have c = 1/(b − a). Outside the interval [a, b], f(x) must be zero, and so the probability density function for a uniform random
variable is given by
f(x) =  0,           x < a,
        1/(b − a),   a < x < b,
        0,           x ≥ b.        (35)
We can then use Eq. (26) to find the cumulative distribution function F(x). Since f(x) is 0 for x < a, we see that F(x) must be 0 for x ≤ a. For x in [a, b], f(x) = 1/(b − a) and so
F(x) = ∫ 1/(b − a) dx + c = x/(b − a) + c, (36)
where c is an integration constant. To find c, we require that F(x) be continuous at x = a. But we know that F(a) = 0, and so
a/(b − a) + c = 0. (37)
Solving for c gives c = −a/(b − a), and therefore F(x) = (x − a)/(b − a). Notice that at x = b, F(b) = 1, and since f(x) = 0 for x > b, we must have F(x) = 1 for x ≥ b. The cumulative distribution
function can be written as
F(x) =  0,                 x < a,
        (x − a)/(b − a),   a ≤ x ≤ b,
        1,                 x ≥ b.        (38)
Notice that although the cumulative distribution function of a continuous random variable must be continuous everywhere, the probability density function need not be everywhere continuous.
• Expected Value
The expected value is
µ = E(X) = ∫_{a}^{b} x · 1/(b − a) dx = [1/(b − a)] [x²/2]_{a}^{b} = (b² − a²)/(2(b − a)) = (a + b)/2, (39)
which is located at the center of the interval, as expected.
• Variance
To calculate the variance, we first calculate
E(X²) = ∫_{a}^{b} x² · 1/(b − a) dx = [1/(b − a)] [x³/3]_{a}^{b} = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3. (40)
Then the variance is given by
V(X) = E(X²) − µ² = (a² + ab + b²)/3 − ((a + b)/2)² = (b − a)²/12. (41)
If a = 0 and b = 1, then we are considering the usual interval [0, 1]
for the uniform variates. We see that µ = 1/2 and V (X) is 1/12.
• Mapping to Different Domain
A natural way to generate random numbers x that are distributed
uniformly in [a, b] is to have a mapping from the uniform variates u
generated from a pseudo-random number generator on [0, 1] to x. In
order to preserve the uniformity in the distribution of these numbers,
the mapping (or relationship between u and x) should be a linear
one. We will now derive this relationship. In fact we will do this in
three different ways, each using different techniques and looking at
the problem from different angles.
[Diagram: u in [0, 1] is first scaled to u′ = (b − a)u in [0, b − a], and then shifted to x = u′ + a in [a, b].]
One approach is to first scale u by multiplying it by the factor (b − a), so that u′ = (b − a)u is distributed uniformly between 0 and b − a. We then shift u′ by an amount a, so that x = u′ + a = (b − a)u + a
is distributed uniformly between a and b. We illustrate the two steps of the transformation in the figure above. [Notice that it is possible to shift the interval first and then scale it, but it requires a little more algebra to figure out the correct amount of shift.]
A systematic but slightly lengthier way to derive the relationship is to first write out the most general linear relationship between u and x:
x = αu + β, (42)
where α and β are two constants that have to be determined. We want the point u = 0 to map to x = a, and u = 1 to x = b. Setting u = 0 and x = a in the above equation gives β = a. Then setting u = 1 and x = b gives b = α + a, and therefore α = b − a. Thus we obtain the same relationship as before:
x = (b − a)u + a. (43)
[Diagram: a general point u in [0, 1] and its image x in [a, b] under the linear mapping.]
The quickest way (that I know) to derive the above relationship is to consider a general point u and its image x in the drawing above. Since the mapping is a linear one, the relative positions of these two points within their own segments in the two coordinate systems must be identical. Therefore we have
(u − 0)/(1 − 0) = (x − a)/(b − a). (44)
Solving for x, we obtain exactly the same relation as before.
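A minimal Python sketch of the mapping in Eq. (43) follows (illustrative only; the end points a and b are arbitrary choices). Uniform variates u on [0, 1] from the standard library generator are scaled and shifted to [a, b], and the sample mean and variance are compared with (a + b)/2 and (b − a)²/12 from Eqs. (39) and (41).

    import random

    a, b = 2.0, 5.0
    N = 100000

    # Map each u in [0, 1] to x = (b - a) * u + a in [a, b].
    xs = [(b - a) * random.random() + a for _ in range(N)]

    mean = sum(xs) / N
    var = sum((x - mean) ** 2 for x in xs) / N

    print(mean, (a + b) / 2)       # sample mean versus the exact value 3.5
    print(var, (b - a) ** 2 / 12)  # sample variance versus the exact value 0.75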
3.2. Normal Probability Distribution
The normal probability distribution is another often encountered distribution. It is especially important in understanding how computer
simulations based on random sampling work.
A random variable X has a normal probability distribution if it has a probability density function
f(x) = 1/(√(2π) σ) exp[−(x − µ)²/(2σ²)],   −∞ < x < ∞, (45)
with real parameters µ and σ. In addition, σ must be positive. As will be shown below, the expected value E(X) is µ, and the standard deviation is σ. Notice that f(x) is properly normalized, since
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} 1/(√(2π) σ) exp[−(x − µ)²/(2σ²)] dx
                   = 1/√(2π) ∫_{−∞}^{∞} exp[−t²/2] dt = 1, (46)
where we have made a change of variable from x to t = (x − µ)/σ.
Figure 3: Probability density function of a normal distribution. In this example, µ = 3 and σ = 1.
The probability density function has the form of a bell-shaped curve with a single peak at x = µ. The value at the peak is 1/(√(2π) σ), so the smaller σ is, the higher the peak.
Letting x = µ + h be the point at which f (x) is equal to half its
peak value, inserting it into Eq. (45) and setting f equal to half the peak value yields
exp[−h²/(2σ²)] = 1/2. (47)
Solving for h, which is referred to as the half-width of the peak, gives h = ±√(2 ln 2) σ ≈ ±1.177 σ. Decreasing σ decreases h and the peak becomes sharper. At the same time the height of the peak increases, so that the total area underneath f(x) remains unity.
• Expected Value
The expected value can be easily calculated because
E(X) = 1/(√(2π) σ) ∫_{−∞}^{∞} x exp[−(x − µ)²/(2σ²)] dx
     = 1/(√(2π) σ) ∫_{−∞}^{∞} (t + µ) exp[−t²/(2σ²)] dt
     = 1/(√(2π) σ) ∫_{−∞}^{∞} t exp[−t²/(2σ²)] dt + µ/(√(2π) σ) ∫_{−∞}^{∞} exp[−t²/(2σ²)] dt, (48)
where t = x − µ.
Figure 4: The half-width of a normal distribution. In this example, µ = 3 and σ = 1.
The first integral is zero, since the integrand is an odd function. The second integral is essentially the same one encountered before for the normalization. The result shows that E(X) = µ.
• The Variance
The variance is given by
V(X) = E((X − µ)²)
     = 1/(√(2π) σ) ∫_{−∞}^{∞} (x − µ)² exp[−(x − µ)²/(2σ²)] dx
     = (2σ²/√π) ∫_{−∞}^{∞} t² exp[−t²] dt = σ², (49)
where we have made a change of integration variable from x to t = (x − µ)/(√2 σ). Therefore σ is indeed the standard deviation.
• Probability
For any x2 ≥ x1, the probability of finding X having a value between x1 and x2 is
P(x1 ≤ X ≤ x2) = 1/(√(2π) σ) ∫_{x1}^{x2} exp[−(x − µ)²/(2σ²)] dx
               = (1/√π) ∫_{t1}^{t2} exp[−t²] dt
               = (1/√π) [ ∫_{0}^{t2} exp[−t²] dt − ∫_{0}^{t1} exp[−t²] dt ]
               = (1/2) [Φ(t2) − Φ(t1)], (50)
where we made a change of integration variable from x to t = (x − µ)/(√2 σ), so that t1 = (x1 − µ)/(√2 σ) and t2 = (x2 − µ)/(√2 σ), and Φ(t) is the error function defined by
Φ(t) = (2/√π) ∫_{0}^{t} exp[−x²] dx. (51)
Since the integrand is a positive function, Φ(t) is a monotonically increasing function of t. At t = 0, clearly Φ(0) = 0. At t = ∞, the integral can be evaluated exactly and one finds that Φ(∞) = 1. Moreover,
Φ(−t) = (2/√π) ∫_{0}^{−t} exp[−x²] dx = −(2/√π) ∫_{0}^{t} exp[−u²] du = −Φ(t), (52)
where we made a change of integration variable from x to u = −x. Therefore Φ(t) is an odd function, which has the value of −1 at t = −∞, rises monotonically to 0 at t = 0, and then to 1 at t = ∞.
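Equations (50) and (51) translate directly into code, since the error function Φ is available in Python's math module as math.erf. The sketch below (not part of the original notes) computes P(x1 ≤ X ≤ x2) for a normal random variable and checks it against the familiar one- and two-standard-deviation values.

    from math import erf, sqrt

    def normal_prob(x1, x2, mu, sigma):
        """P(x1 <= X <= x2) = (1/2) * [Phi(t2) - Phi(t1)], with t = (x - mu) / (sqrt(2) * sigma)."""
        t1 = (x1 - mu) / (sqrt(2.0) * sigma)
        t2 = (x2 - mu) / (sqrt(2.0) * sigma)
        return 0.5 * (erf(t2) - erf(t1))

    mu, sigma = 3.0, 1.0  # the values used in Figure 3
    print(normal_prob(mu - sigma, mu + sigma, mu, sigma))          # ~0.6827
    print(normal_prob(mu - 2 * sigma, mu + 2 * sigma, mu, sigma))  # ~0.9545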
Figure 5: The probability that the outcome falls within 3σ of the mean is 0.9973 for a normal distribution. In this example, µ = 3 and σ = 1, so that probability is given by the area under the curve from 0 to 6.
• The Rule of 3σs
Letting x1 = µ − 3σ and x2 = µ + 3σ in Eq. (50), we see that
P(µ − 3σ ≤ X ≤ µ + 3σ) = (1/2) [Φ(3/√2) − Φ(−3/√2)] = Φ(3/√2) ≈ 0.9973. (53)
What this means is that in a single trial, it is highly unlikely (less than a 0.3% chance) to obtain a value for the normal random variable X that differs from its mean value by more than 3σ.
• The most probable error
The most probable error, r, is defined so that
P(µ − r ≤ X ≤ µ + r) = 1/2, (54)
which, by setting t1,2 = [(µ ∓ r) − µ]/(√2 σ) = ∓r/(√2 σ) in Eq. (50), becomes
(1/2) [Φ(r/(√2 σ)) − Φ(−r/(√2 σ))] = Φ(r/(√2 σ)) = 1/2. (55)
Figure 6: There is an equal chance that the outcome falls outside or inside the shaded region in the case of a normal distribution; this determines the most probable error. In this example, µ = 3 and σ = 1.
Thus the most probable error is
r = √2 Φ⁻¹(1/2) σ ≈ 0.6745 σ, (56)
where Φ⁻¹ is the inverse function of Φ.
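The numerical value in Eq. (56) can be reproduced without an explicit inverse error function, for example by solving Φ(r/(√2 σ)) = 1/2 for r with a simple bisection search. The Python sketch below (illustrative only) does exactly that using math.erf.

    from math import erf, sqrt

    def most_probable_error(sigma, tol=1e-12):
        """Solve Phi(r / (sqrt(2) * sigma)) = 1/2 for r by bisection."""
        lo, hi = 0.0, 3.0 * sigma  # the root lies well inside (0, 3 * sigma)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if erf(mid / (sqrt(2.0) * sigma)) < 0.5:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    print(most_probable_error(1.0))  # ~0.6745, in agreement with Eq. (56)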
4. Sampling Distribution
We are interested in functions of N random variables X1, X2, . . . , XN observed in a random sample selected from a population of interest. The variables X1, X2, . . . , XN are assumed to be independent of each other and to obey the same common distribution. From a random sample of N observations, x1, x2, . . . , xN, we can estimate the population mean µ with the sample mean
x̄ = (1/N) Σ_{n=1}^{N} x_n. (57)
The goodness of this estimate depends on the behavior of the random
variables X1 , X2 , . . . , XN and the effect that this behavior has on the
random variable
X̄ = (1/N) Σ_{n=1}^{N} X_n. (58)
Now suppose that the Xn s are independent, normally distributed
variables, with a common mean E(Xn ) = µ and a common variance
V (Xn ) = σ 2 , for n = 1, 2, . . . , N . It can be shown that X̄ is a normally
distributed random variable. Its mean is
E(X̄) = E( (1/N) Σ_{n=1}^{N} X_n ) = (1/N) Σ_{n=1}^{N} E(X_n) = (1/N) Σ_{n=1}^{N} µ = µ, (59)
and its variance is
V(X̄) = V( (1/N) Σ_{n=1}^{N} X_n ) = (1/N²) Σ_{n=1}^{N} V(X_n) = (1/N²) Σ_{n=1}^{N} σ² = σ²/N. (60)
Therefore X̄ has a mean µ̄ = µ and a variance σ̄² = σ²/N.
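The results µ̄ = µ and σ̄² = σ²/N in Eqs. (59) and (60) are easy to check by simulation. The Python sketch below (not from the original notes; the choices µ = 3, σ = 1, and N = 25 are arbitrary) draws many samples of size N from a normal population and examines the mean and variance of the resulting sample means.

    import random

    mu, sigma, N = 3.0, 1.0, 25
    trials = 20000

    # For each trial, draw N normal variates and record their sample mean.
    xbars = []
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(N)]
        xbars.append(sum(sample) / N)

    m = sum(xbars) / trials
    v = sum((x - m) ** 2 for x in xbars) / trials

    print(m)                  # ~ mu = 3.0
    print(v, sigma ** 2 / N)  # ~ sigma^2 / N = 0.04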
4.1. Central Limit Theorem
It turns out that, to a good approximation, X̄ has a normal distribution with a mean µ̄ = µ and a variance σ̄² = σ²/N even when the Xn's are not normally distributed. The requirement is that the sample size has to be large enough and that a few individual variables do not play too great a role in determining the sum. This is a consequence of the Central Limit Theorem.
In Nature, the behavior of a variable often depends on the accumulated effect of a large number of small random factors, and so its behavior is approximately normal. This is why we call it "normal". The Central Limit Theorem can be stated more formally as follows:
Let X1 , X2 , . . . , XN be independent and identically distributed
(iid) random variables with mean E(Xn ) = µ and variance V (Xn ) =
σ 2 , then as N → ∞, the variable
X̄ = (1/N) Σ_{n=1}^{N} X_n (61)
has a distribution function that converges to a normal distribution
function with mean µ̄ = µ and variance σ̄² = σ²/N.
In reality, the Xn's need not have the same identical distribution function, and N need not be extremely large. Often N = 5 or more is large enough for a reasonably good approximate normal behavior for X̄.
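The Central Limit Theorem can be seen at work even for a decidedly non-normal population. In the Python sketch below (illustrative only), the Xn are uniform on [0, 1], for which µ = 1/2 and σ² = 1/12; already for N = 5 the distribution of X̄ has mean close to 1/2, variance close to 1/(12N), and roughly the normal fraction of outcomes within one standard deviation of the mean.

    import random

    N = 5            # sample size
    trials = 50000   # number of sample means to generate

    xbars = [sum(random.random() for _ in range(N)) / N for _ in range(trials)]

    mean = sum(xbars) / trials
    var = sum((x - mean) ** 2 for x in xbars) / trials
    print(mean, 0.5)            # ~ mu = 1/2
    print(var, 1.0 / (12 * N))  # ~ sigma^2 / N = 1/60

    # Fraction of sample means within one standard deviation of mu;
    # for a normal distribution this would be about 0.68.
    sbar = var ** 0.5
    print(sum(1 for x in xbars if abs(x - 0.5) <= sbar) / trials)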
5. General Scheme of Monte Carlo Simulation Method
In a Monte Carlo simulation, we compute a quantity of interest, say
q, by randomly sampling a population. We start by identifying a
random variable X whose mean µ is supposed to be given by q, the
quantity we want to compute. Let us denote the variance of X by σ 2 .
Suppose we sample the population N times where N is supposed
to be a large number. So we consider N independent random variables
X1 , X2 , . . . , XN , each obeying the same distribution as X, with the
same mean µ and variance σ 2 . From the Central Limit Theorem, the
variable
X̄ = (1/N) Σ_{n=1}^{N} X_n (62)
is approximately normal with a mean µ̄ = µ and a variance σ̄² = σ²/N.
Applying the Rule of the 3σs to X̄, we see that
P(µ − 3σ̄ ≤ X̄ ≤ µ + 3σ̄) = P(µ − 3σ/√N ≤ X̄ ≤ µ + 3σ/√N)
                        = P( | (1/N) Σ_{n=1}^{N} X_n − µ | ≤ 3σ/√N )
                        ≈ 0.9973. (63)
Therefore we can approximate the value of q by the sample mean
x̄ = (1/N) Σ_{n=1}^{N} x_n. (64)
In all likelihood (with a 99.73% chance of being correct), the error does not exceed 3σ/√N. In practice, the bound given by 3σ/√N is too loose (overly cautious), and it is commonly replaced by the most probable error of 0.6745σ/√N.
5.1. General Remarks on Monte Carlo Methods
Three remarks are in order:
1. Note that the most probable error decreases with increasing N as N^(−1/2). That means that our estimate of the value of q from the sample mean gets better and better as more and more trials or samples are used in the simulation. However, the dependence of the error on N to the power of −1/2 means that the convergence of the Monte Carlo method is in general very slow. For example, if we want to decrease the probable error by a further factor of 10, we will need to use 100 times more samples.
2. The most probable error is proportional to σ. Different ways of
sampling the population lead to different values of σ. In the next
month or so, we will find different sampling schemes that will
reduce the value of σ and therefore reduce the probable error
for the same number of trials, N . These techniques are referred
to as variance reduction techniques.
3. The strong point of Monte Carlo methods is that they are in general very flexible and can be readily adapted
to treat rather complex models. Recall that in our first example, where we computed the value of π by performing a simulation to find the ratio of dust particles inside a unit circle to the total number of particles, we can use exactly the same simulation to find the area of any arbitrary two-dimensional object. The only change we need to make is to the program line that determines whether a given point (x, y) lands inside the object. In the case of a unit circle, the point lies within the circle if x² + y² < 1. For an object with a more complicated shape, we will need to replace that test by a more complicated function. A minimal version of such a program is sketched below.
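A minimal Python sketch of the scheme just described is given (illustrative only, not the original course program; here the simulation uses the quarter of the unit circle inside the unit square [0, 1] × [0, 1], so the mean of the indicator variable is π/4). The sample standard deviation is used in place of σ to form the most probable error 0.6745 σ/√N.

    import random
    from math import sqrt, pi

    N = 100000

    # X_n = 1 if the point lands inside the quarter circle x^2 + y^2 < 1, else 0.
    hits = 0
    for _ in range(N):
        x, y = random.random(), random.random()
        if x * x + y * y < 1.0:
            hits += 1

    p = hits / N                      # sample mean, an estimate of pi/4
    sigma = sqrt(p * (1.0 - p))       # sample standard deviation of the 0/1 variable
    error = 0.6745 * sigma / sqrt(N)  # most probable error of the sample mean

    print(4.0 * p, "+/-", 4.0 * error)  # estimate of pi with its probable error
    print(pi)                           # exact value, for comparison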
References
[1] Wackerly, Mendenhall, and Scheaffer, "Mathematical Statistics with Applications", 5th edition (Duxbury, 1996).
[2] A proof can be found on page 82 of Wackerly, Mendenhall, and Scheaffer, "Mathematical Statistics with Applications", 5th edition (Duxbury, 1996).
ASSIGNMENT 2
Problem 3: Chapter 4 of the textbook, problem 4.1.
ASSIGNMENT 3
Problem 4: Chapter 4 of the textbook, problem 4.2.