B. Maddah
ENMG 622 Simulation
02/22/10
Probability and Random Variable Primer
• Sample space and Events
¾ Suppose that an experiment with an uncertain outcome is
performed (e.g., rolling a die).
¾ While the outcome of the experiment is not known in
advance, the set of all possible outcomes is known. This set is
the sample space, Ω.
¾ For example, when rolling a die Ω = {1, 2, 3, 4, 5, 6}. When
tossing a coin, Ω = {H, T}. When measuring life time of a
machine (years), Ω = {1, 2, 3, …}.
¾ A subset E ⊂ Ω is known as an event.
¾ E.g., when rolling a die, E = {1} is the event that one appears
and F = {1, 3, 5} is the event that an odd number appears.
• Probability of an event
¾ If an experiment is repeated a large number of times, the fraction
of repetitions in which event E occurs approaches the probability
that event E occurs, P{E}.
¾ E.g., when rolling a fair die, P{1} = 1/6, and P{1, 3, 5} =
3/6 = 1/2. When tossing a fair coin, P{H} = P{T} = 1/2.
¾ In some cases, the experiment cannot be repeated many times.
¾ In such cases, probability can be interpreted as a measure of belief
(subjective probability).
• Axioms of probability
(1) For E ⊂ Ω, 0 ≤ P{E} ≤ 1;
(2) P{ Ω} = 1;
(3) For events E1, E2, …, with Ei ⊂ Ω and Ei ∩ Ej = ∅ for all i ≠ j,
P{∪_{i=1}^{∞} Ei} = ∑_{i=1}^{∞} P{Ei} .
• Implications
¾ The axioms of probability imply the following results:
o For E and F ⊂ Ω,
P{E “or” F} = P{E ∪ F} = P{E} + P{F} − P{E ∩ F}, where
P{E ∩ F} = P{E “and” F};
o If E and F are mutually exclusive (i.e., E ∩ F = ∅), then
P{E ∪ F} = P{E} + P{F};
o For E ⊂ Ω, let Ec be the complement of E (i.e., E ∪ Ec = Ω),
P{Ec} = 1 − P{E};
o P{∅} = 0.
• Conditional probability
¾ The probability that event E occurs given that event F has
already occurred is
P{E | F} = P{E ∩ F} / P{F} .
• Independent events
¾ For E and F ⊂ Ω, P{E ∩ F} = P{E|F}P{F} .
¾ Two events are independent if and only if
P{E ∩ F} = P{E}P{F}. That is, P{E|F} = P{E} .
• Example 1
¾ Suppose that two fair coins are tossed. What is the probability that
either the first or the second coin falls heads?
¾ In this example, Ω = {(H, H), (H, T), (T, H), (T,T)}. Let E (F) be the
event that the first (second) coin falls heads, E ={(H, H), (H, T)} and
F = {(H, H), (T, H)}, and E ∩ F = {(H, H)}. The desired probability is
P{E ∪ F} = P{E} + P{F} − P{E ∩ F} = 1/2 + 1/2 − 1/4 = 3/4 .
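¾ A minimal Python sketch (not part of the original notes; standard
library only, names are illustrative) that checks Example 1 by
enumerating the four equally likely outcomes:

    # Verify Example 1 by enumerating the sample space.
    from itertools import product

    omega = list(product("HT", repeat=2))      # (H,H), (H,T), (T,H), (T,T)
    E = {w for w in omega if w[0] == "H"}      # first coin falls heads
    F = {w for w in omega if w[1] == "H"}      # second coin falls heads

    def prob(A):
        return len(A) / len(omega)             # equally likely outcomes

    print(prob(E | F))                         # 0.75
    print(prob(E) + prob(F) - prob(E & F))     # 0.75, matching 3/4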
• Example 2
¾ When rolling two fair dice, suppose the first die shows 3. What is the
probability that the sum of the two dice is 7?
¾ Let E be the event that the sum of the two dice is 7, E = {(1, 6), (2,
5), (3, 4), (4, 3), (5, 2), (6, 1)}, and F be the event that the first die is
3, F= {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}. Then,
P{E | F} = P{E ∩ F} / P{F}
= P{(3, 4)} / P{(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}
= (1/36) / (6/36) = 1/6 .
• Finding Probability by Conditioning
¾ Suppose that we know the probability of event B once event
A is realized (or not). We also know P{A}. That is, we know
P{B|A}, and P{B|Ac} and P{A}. What is P{B}?
¾ Note that
B = (A ∩ B) ∪ (Ac ∩ B) ⇒ P{B} = P{A ∩ B} + P{Ac ∩ B}.
¾ Therefore,
P{B} = P{B|A}P{A} + P{B|Ac}P{Ac}
= P{B|A}P{A} + P{B|Ac}(1 −P{A}) .
¾ Here we are finding P{B} by “conditioning” on A .
¾ In general, if the realization of B depends on a partition Ai of
Ω, A1 ∪ A2 ∪ … ∪ An = Ω, Ai ∩ Aj = ∅, i ≠ j ,
P{B} = ∑_{i=1}^{n} P{B | Ai}P{Ai} .
• Bayes’ Formula
¾ This follows from conditional probabilities. For two events,
P{A | B} = P{A ∩ B} / P{B}
= P{B | A}P{A} / (P{B | A}P{A} + P{B | Ac}P{Ac}) .
¾ With a partition Ai,
P{Aj | B} = P{Aj ∩ B} / P{B}
= P{B | Aj}P{Aj} / ∑_{i=1}^{n} P{B | Ai}P{Ai} .
• Example 3
¾ Consider two urns. The first urn contains three white and seven black
balls, and the second contains five white and five black balls. We flip
a coin and then draw a ball from the first urn or the second urn
depending on whether the outcome was heads or tails.
¾ What is the probability that a white ball is selected?
P{W} = P{W|H}P{H} + P{W|T}P{T} = (3/10)(1/2) + (5/10)(1/2)
= 2/5 .
¾ What is the probability that a black ball is selected?
P{B} = 1 − P{W} = 3/5 .
¾ What is the probability that the coin has landed heads given that a
white ball is selected?
From Bayes’ formula, P{H | W} = P{W | H}P{H} / P{W}
= (3/10)(1/2) / (2/5) = 3/8 .
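¾ A small Monte Carlo sketch in Python (illustrative; the outputs are
estimates, not exact values) that simulates the coin-and-urn
experiment and checks P{W} = 2/5 and P{H | W} = 3/8:

    # Simulate Example 3: flip a coin, then draw from the matching urn.
    import random

    random.seed(0)
    n = 100_000
    white = heads_and_white = 0
    for _ in range(n):
        heads = random.random() < 0.5            # fair coin
        p_white = 3 / 10 if heads else 5 / 10    # urn 1 if heads, urn 2 if tails
        if random.random() < p_white:            # a white ball is drawn
            white += 1
            heads_and_white += heads
    print(white / n)                             # ≈ P{W} = 0.4
    print(heads_and_white / white)               # ≈ P{H | W} = 0.375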
• Random Variables
¾ Consider a function that assigns a real number to each outcome
in Ω. Such a real-valued function is a random variable.
¾ E.g., when rolling two fair dice, define X as the sum of the
two dice. Then, X is a random variable with P{X = 2} =
P{(1,1)}=1/36, P{X = 3} = P{(1, 2), (2, 1)}=2/36=1/18, etc.
¾ E.g., the salvage value of a machine, S, is $1,500 if the
market goes up (with probability 0.4) and $1,000 if the
market goes down (with probability 0.6). Then, S is a
random variable with P{S = 1500} = 0.4 and P{S = 1000} =
0.6 .
¾ If the random variable can take on a countable number of values,
it is a discrete random variable. E.g., the random variable X
representing the sum of two dice.
¾ If the random variable can take on an uncountable number of values,
it is a continuous random variable. E.g., the random variable H
representing the height of an AUB student.
¾ If X is a discrete random variable, the function
fX(x) = P{X = x} is the probability mass function (pmf) of X .
¾ The function FX(x) = P{X ≤ x} = ∑_{xi ≤ x} fX(xi) is the cumulative
distribution function (cdf) of X.
¾ E.g., for the random variable S representing salvage value of
a machine above,
fS(s) = 0.6 if s = 1000; 0.4 if s = 1500; 0 otherwise,
FS(s) = 0 if s < 1000; 0.6 if 1000 ≤ s < 1500; 1 if s ≥ 1500 .
¾ For a continuous random variable, X, the cdf is defined based
on a function fX(x) called the density function, where
P{X ≤ x} = FX(x) = ∫_{−∞}^{x} fX(t) dt .
Fact. For a discrete random variable, ∑_{xi} fX(xi) = 1. For a
continuous random variable, ∫_{−∞}^{∞} fX(x) dx = 1.
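¾ A short Python sketch (illustrative, not from the notes) of the pmf
and cdf of the salvage-value random variable S defined above:

    # pmf of S: P{S = 1000} = 0.6, P{S = 1500} = 0.4.
    pmf = {1000: 0.6, 1500: 0.4}

    def cdf(x):
        # F_S(x) = sum of f_S(s) over support points s <= x
        return sum(p for s, p in pmf.items() if s <= x)

    print(sum(pmf.values()))                # 1.0, as the Fact requires
    print(cdf(999), cdf(1200), cdf(1500))   # 0, 0.6, 1.0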
• Independent Random variables
¾ Two random variables X and Y are said to be independent if
P{ X ≤ x, Y ≤ y} = P{ X ≤ x}P{Y ≤ y} = FX ( x) FY ( y ) .
• Expectation of a random variable
¾ The expectation of a discrete random variable X is
E[X] = ∑_{xi} xi P{X = xi} = ∑_{xi} xi fX(xi) .
¾ The expectation of a continuous random variable X is
E[X] = ∫_{−∞}^{∞} x fX(x) dx .
¾ The expectation of a random variable is the long-run average value
obtained when the underlying experiment is repeated a large number
of times and the resulting values are averaged (a simulation sketch
illustrating this appears at the end of this section).
¾ The expectation is “linear.” That is, for two random
variables X and Y, E[aX + bY] = aE[X] + bE[Y] .
¾ The expectation of a function of random variable X, g(X), is
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx .
¾ An important measure is the nth moment of X, n =1, 2, …
E[X^n] = ∫_{−∞}^{∞} x^n fX(x) dx .
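¾ A brief Python sketch (illustrative simulation, standard library only)
of the expectation as a long-run average, using the sum of two fair
dice (E[X] = 7):

    # Average many simulated values of X = sum of two fair dice.
    import random

    random.seed(1)
    xs = [random.randint(1, 6) + random.randint(1, 6) for _ in range(200_000)]
    print(sum(xs) / len(xs))                      # ≈ E[X] = 7
    # Linearity check: E[2X + 3] = 2E[X] + 3 = 17.
    print(sum(2 * x + 3 for x in xs) / len(xs))   # ≈ 17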
• Measures of variability
¾ The variance of a random variable X is
Var[X] = E[(X − E[X])²] = E[X²] − (E[X])² .
¾ The standard deviation of a random variable X is
σX = √Var[X] .
¾ The coefficient of variation of X is CV[X] = σX/E[X] .
¾ The variance (standard deviation) measures the spread of the
random variable around the expectation.
¾ The coefficient of variation is useful when comparing
variability of different alternatives.
¾ Note that Var[aX + b] = a²Var[X], for any real numbers a and b and
any random variable X.
• Joint distribution
¾ The joint distribution function of two random variables is
FX,Y(x, y) = P{X ≤ x, Y ≤ y} .
¾ If X and Y are discrete random variables then,
FX,Y(x, y) = ∑_{i ≤ x, j ≤ y} P{X = i, Y = j} = ∑_{i ≤ x, j ≤ y} fX,Y(i, j) ,
where fX,Y (.) is the joint pmf of X and Y.
¾ If X and Y are continuous random variables then,
FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(u, v) dv du ,
where fX,Y (.) is the joint pdf of X and Y.
Fact. FX,Y(x, y) = FX(x)FY(y) if and only if (iff) X and Y
are independent.
• Covariance
¾ The covariance measures the dependence of two random
variables. For two random variables X and Y,
σXY = Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y] ,
where
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX,Y(x, y) dx dy .
¾ If σXY > 0 (< 0), X and Y are said to be positively (negatively)
correlated.
¾ If X and Y are independent, then σXY = 0 (the converse does not hold
in general).
¾ Properties of covariance
Cov[X, X] = Var[X],
Cov[X, Y] = Cov[Y, X],
Cov[aX, Y] = aCov[X, Y],
Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z],
|Cov[X, Y]| = |σXY| ≤ σXσY .
¾ The coefficient of correlation is defined as ρXY = σXY / (σXσY) .
¾ Note that |ρXY| ≤ 1 .
¾ Note that Var[ X + Y ] = Var[ X ] + 2Cov[ X , Y ] + Var[Y ] .
¾ If X and Y are independent, Var[X+Y] = Var[X] + Var[Y].
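¾ A small Python sketch (illustrative; sample estimates only) of the
covariance, the correlation coefficient, and the variance-of-a-sum
identity above:

    # Sample covariance/correlation for two dependent random variables.
    import random

    random.seed(2)
    n = 100_000
    xs = [random.random() for _ in range(n)]
    ys = [x + 0.5 * random.random() for x in xs]   # Y depends on X

    def mean(v):
        return sum(v) / len(v)

    def cov(a, b):
        ma, mb = mean(a), mean(b)
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

    s = [a + b for a, b in zip(xs, ys)]
    print(cov(s, s))                                     # Var[X + Y]
    print(cov(xs, xs) + 2 * cov(xs, ys) + cov(ys, ys))   # same value
    rho = cov(xs, ys) / (cov(xs, xs) ** 0.5 * cov(ys, ys) ** 0.5)
    print(rho)                                           # ρXY, with |ρXY| ≤ 1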
• The Bernoulli Random Variable
¾ Suppose an experiment can result in success with probability
p and failure with probability (w.p.) 1−p. We define a
Bernoulli random variable X as X =1 if the experiment
outcome is a success and X = 0, otherwise.
¾ The pmf of X is
fX(x) = P{X = x} = 1 − p if x = 0, and p if x = 1 .
¾ The expected value of X is E[X] = 0(1−p) + 1(p) = p.
¾ The second moment of X is E[X²] = 0²(1−p) + 1²(p) = p.
¾ The variance of X is Var[X] = E[X²] − (E[X])²
= p − p² = p(1−p).
• The Binomial Random Variable
¾ Consider n independent trials, each of which can result in a
success w.p. p and a failure w.p. 1−p .
¾ We define a Binomial random variable, X, as the number of
successes in the n trials.
¾ The pmf of X is defined as
fX(i) = P{X = i} = C(n, i) p^i (1 − p)^(n−i) , i = 0, 1, …, n,
where C(n, i) = n! / [(n − i)! i!] .
[Figure: plot of the binomial pmf fX(i)]
¾ Note that ∑_{i=0}^{n} fX(i) = ∑_{i=0}^{n} C(n, i) p^i (1 − p)^(n−i) = 1 .
Fact. Let Xi = 1, i =1, …, n, if the ith trial results in success
and Xi = 0, otherwise. Then X = ∑_{i=1}^{n} Xi .
¾ Note that the Xi are independent and identically distributed (iid)
Bernoulli random variables with parameter p.
¾ Therefore,
E[X] = ∑_{i=1}^{n} E[Xi] = np, Var[X] = ∑_{i=1}^{n} Var[Xi] = np(1 − p).
• Example 4
¾ A fair coin is flipped 5 times.
¾ What is the probability that two heads are obtained?
¾ The number of heads, X, is a binomial random variable with
parameters n = 5 and p = 0.5. Then, the desired probability is
P{X = 2} = [5!/(2!×3!)]×(0.5)²(0.5)³ = 0.313 .
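¾ A minimal Python sketch (illustrative) of the binomial pmf used in
Example 4:

    # Binomial pmf: P{X = i} = C(n, i) p^i (1 - p)^(n - i).
    from math import comb

    def binom_pmf(i, n, p):
        return comb(n, i) * p**i * (1 - p)**(n - i)

    print(binom_pmf(2, 5, 0.5))                          # 0.3125 ≈ 0.313
    print(sum(binom_pmf(i, 5, 0.5) for i in range(6)))   # 1.0, pmf sums to 1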
• The Geometric Random Variable
¾ Suppose independent trials, each having a probability p of
being a success, are performed.
¾ We define the geometric random variable (rv) X as the
number of trials until the first success occurs.
¾ The pmf of X is defined as
fX(i) = P{X = i} = (1 − p)^(i−1) p , i = 1, 2, …
[Figure: plot of the geometric pmf fX(i)]
¾ Note that fX(i) defines a pmf since
∑_{i=1}^{∞} fX(i) = p ∑_{i=1}^{∞} (1 − p)^(i−1) = p ∑_{i=0}^{∞} (1 − p)^i
= p / [1 − (1 − p)] = 1 .
¾ Let q = 1−p. The first two moments and variance of X are
E[X] = ∑_{i=1}^{∞} i q^(i−1) p = 1/p ,
E[X²] = p ∑_{i=1}^{∞} i² q^(i−1) = (2 − p)/p² ,
Var[X] = E[X²] − (E[X])² = (1 − p)/p² .
Example 5.
¾ When rolling a die repetitively, what is the probability that the first 6
appears on the sixth roll?
¾ Let X be the number of rolls until a 6 appears. Then, X is a geometric
rv with parameter p = 1/6, and the desired probability is
P{X = 6} = (5/6)⁵(1/6) ≈ 0.067 .
¾ What is the expected number of rolls until a 6 appears?
¾ E[X] = 1/p = 6.
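¾ A short Python sketch (illustrative) of the geometric pmf and mean
for Example 5 (p = 1/6):

    # Geometric pmf: P{X = i} = (1 - p)^(i - 1) p, i = 1, 2, ...
    p = 1 / 6

    def geom_pmf(i):
        return (1 - p) ** (i - 1) * p

    print(geom_pmf(6))                                     # ≈ 0.067
    print(sum(i * geom_pmf(i) for i in range(1, 10_000)))  # ≈ E[X] = 6 (truncated sum)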
• The Poisson Random Variable
¾ A rv, taking on values 0, 1, …, is said to be a Poisson random
variable with parameter λ > 0 if
fX(i) = P{X = i} = e^(−λ) λ^i / i! , i = 0, 1, …
[Figure: plot of the Poisson pmf fX(i)]
¾ fX(i) defines a pmf since ∑_{i=0}^{∞} fX(i) = e^(−λ) ∑_{i=0}^{∞} λ^i / i!
= (e^(−λ))(e^λ) = 1 .
¾ The Poisson rv is a good model for demand, arrivals, and
certain rare events.
¾ The first two moments and the variance of X are
E[X] = λ ,
E[X²] = λ + λ² ,
Var[X] = E[X²] − (E[X])² = λ .
¾ Let X1 and X2 be two independent Poisson rv’s with means λ1
and λ2. Then, Z = X1 + X2 is a Poisson rv with mean λ1 + λ2 .
Example 6.
¾ The monthly demand for a certain airplane spare part of Fly High
Airlines (FHA) fleet at Beirut airport is estimated to be a Poisson
random variable with mean 0.5. Suppose that FHA will stock one
spare part at the beginning of March. Once the part is used, a new part
is ordered. The delivery lead time for a part is 2 months.
¾ What is the probability that the spare part will be used during March?
¾ Let X be the demand for the spare part. The desired probability is
P{X ≥ 1} = 1 − e^(−λ) = 1 − e^(−0.5) = 0.393 .
¾ What is the probability that FHA will face a shortage on this part in
March?
¾ The desired probability is P{X > 1} = 1 − P{X = 0} − P{X = 1}
= 1 − e^(−0.5) − 0.5e^(−0.5) = 0.09 .
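¾ A brief Python sketch (illustrative) that recomputes the Poisson
probabilities of Example 6 (λ = 0.5):

    # Poisson pmf: P{X = i} = e^(-lam) lam^i / i!.
    from math import exp, factorial

    lam = 0.5

    def pois_pmf(i):
        return exp(-lam) * lam**i / factorial(i)

    print(1 - pois_pmf(0))                 # P{X >= 1} ≈ 0.393
    print(1 - pois_pmf(0) - pois_pmf(1))   # P{X > 1}  ≈ 0.090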
• The Uniform Random Variable
¾ A rv X that is equally likely to be “near” any point of an
interval (a, b) is said to have a uniform distribution.
¾ The pdf of X is
fX(x) = 1/(b − a) if a < x < b, and 0 otherwise .
¾ Note that fX(x) defines a pdf since ∫_{a}^{b} fX(x) dx = ∫_{a}^{b} 1/(b − a) dx = 1 .
[Figure: plot of the uniform pdf fX(x)]
¾ The cdf of X is
FX(x) = ∫_{−∞}^{x} fX(t) dt = 0 if x < a; (x − a)/(b − a) if a ≤ x ≤ b; 1 if x > b .
¾ The first two moments of X are
E[X] = ∫_{a}^{b} x fX(x) dx = ∫_{a}^{b} x/(b − a) dx = (b² − a²)/[2(b − a)] = (b + a)/2 ,
E[X²] = ∫_{a}^{b} x² fX(x) dx = ∫_{a}^{b} x²/(b − a) dx = (b³ − a³)/[3(b − a)] = (a² + ab + b²)/3 .
¾ The variance of X is Var[X] = E[X²] − (E[X])² = (b − a)²/12 .
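¾ A quick Python sketch (illustrative simulation; outputs are estimates)
checking the uniform mean (a + b)/2 and variance (b − a)²/12:

    # Simulate a uniform(a, b) random variable and compare moments.
    import random

    random.seed(3)
    a, b = 2.0, 6.0
    xs = [random.uniform(a, b) for _ in range(200_000)]
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    print(m, (a + b) / 2)            # both ≈ 4.0
    print(v, (b - a) ** 2 / 12)      # both ≈ 1.33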
• The Exponential Random Variable
¾ An exponential rv with parameter λ is a rv whose pdf is
fX(x) = λe^(−λx) if x ≥ 0, and 0 otherwise .
[Figure: plot of the exponential pdf fX(x)]
¾ Note that fX(x) defines a pdf since ∫_{0}^{∞} fX(x) dx = ∫_{0}^{∞} λe^(−λx) dx
= [−e^(−λx)]_{0}^{∞} = 1 .
¾ The exponential rv is a good model for the time between arrivals
or the time to failure of certain equipment.
¾ The cdf of X is
FX(x) = ∫_{0}^{x} fX(t) dt = ∫_{0}^{x} λe^(−λt) dt = [−e^(−λt)]_{0}^{x} = 1 − e^(−λx), x ≥ 0 .
¾ A useful property of the exponential distribution is that
P{X > x} = e^(−λx) .
¾ The first two moments and the variance of X are
E[X] = 1/λ ,
E[X²] = 2/λ² ,
Var[X] = E[X²] − (E[X])² = 1/λ² .
Proposition. The exponential distribution has the memoryless
property, i.e., P{X > t + u | X > t} = P{X > u} .
Proof.
P{X > t + u | X > t} = P{X > t + u, X > t} / P{X > t} = P{X > t + u} / P{X > t}
= e^(−λ(t+u)) / e^(−λt) = e^(−λu) = P{X > u} .
¾ The memoryless property allows developing tractable
analytical models with the exponential distribution. It makes
the exponential distribution very popular in modeling.
Proposition. Let X1 and X2 be two independent exponential
random variables with parameters λ1 and λ2. Let X = min(X1, X2).
Then, X is an exponential random variable with parameter λ1 + λ2.
Proof. P{X > x} = P{X1 > x, X2 > x} = P{X1 > x}P{X2 > x}
= e^(−λ1 x) e^(−λ2 x) = e^(−(λ1 + λ2)x) .
Example 7.
¾ The amount of time one spends in the bank is exponentially
distributed with mean 10 minutes.
¾ A customer arrives at 1:00 PM. What is the probability that the
customer will be in the bank at 1:15 PM?
¾ Let X be the time the customer spends in the bank. Then, X is
exponentially distributed with parameter λ = 1/10. The desired
probability is P{X > 15} = e^(−15λ) = e^(−15/10) = 0.223.
¾ It is now 1:20 PM and the customer is still in the bank. What is the
probability that the customer will be in the bank at 1:35 PM?
¾ 0.223 (by the memoryless property).
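¾ A small Python sketch (illustrative simulation; outputs are estimates)
of Example 7 and the memoryless property:

    # Exponential service times with mean 10 minutes (lambda = 1/10).
    import random
    from math import exp

    random.seed(4)
    lam = 1 / 10
    xs = [random.expovariate(lam) for _ in range(200_000)]

    p15 = sum(x > 15 for x in xs) / len(xs)          # ≈ P{X > 15}
    still_in = [x for x in xs if x > 20]             # still in the bank at 1:20
    p_more = sum(x > 35 for x in still_in) / len(still_in)  # 15 more minutes
    print(p15, p_more, exp(-1.5))                    # all ≈ 0.223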
• The Normal Random Variable
¾ We say that a random variable X is a normal rv with
parameters μ and σ > 0 if it has the following pdf:
fX(x) = e^(−(x − μ)²/(2σ²)) / (√(2π) σ) , x ∈ (−∞, ∞) .
[Figure: plot of the normal pdf fX(x)]
¾ Note that fX(x) defines a pdf. With the change of variable
z = (x − μ)/σ and using the fact that ∫_{−∞}^{∞} e^(−z²/2) dz = √(2π) ,
∫_{−∞}^{∞} fX(x) dx = ∫_{−∞}^{∞} e^(−(x − μ)²/(2σ²)) / (√(2π) σ) dx
= (1/√(2π)) ∫_{−∞}^{∞} e^(−z²/2) dz = 1 .
¾ The normal rv is a good model for quantities that can be seen
as sums or averages of a large number of rv’s.
¾ The cdf of X, FX(x) = ∫_{−∞}^{x} fX(t) dt, has no closed form.
¾ The first two moments and the variance of X are
E[X] = μ ,
E[X²] = σ² + μ² ,
Var[X] = σ² .
Fact. If X is a normal rv, then Z = (X − μ)/σ is a “standard
normal r.v.” with parameters 0 and 1.
Proof. Note that
P{Z < z} = P{(X − μ)/σ < z} = P{X < μ + σz} = ∫_{−∞}^{μ+σz} e^(−(t − μ)²/(2σ²)) / (√(2π) σ) dt .
Let u = (t − μ)/σ; then P{Z < z} = ∫_{−∞}^{z} e^(−u²/2) / √(2π) du , which is the cdf of
the standard normal.
¾ This fact implies that X = μ + σ Z .
¾ The cdf of X, FX(x), is evaluated through the cdf of Z, FZ(z),
which is often tabulated:
P{X < x} = P{Z < (x − μ)/σ} ⇒ FX(x) = FZ((x − μ)/σ) .
Proposition. If X1 and X2 are two independent normal rvs with
means μi and variances σi², i = 1, 2, then Y = X1 + X2 is normal
with mean μ1 + μ2 and variance σ1² + σ2² .
Theorem (central limit theorem). If Xi, i = 1, 2, …, n, are iid
rv’s with mean μ and variance σ², then, for n large enough,
∑_{i=1}^{n} Xi is approximately normally distributed with mean nμ and
variance nσ² .
Example 8.
¾ The height of an AUB male student is a normal rv with mean 170 cm and
standard deviation 8 cm.
¾ What is the probability that the height of an AUB student is less than 180
cm?
¾ Let X be the height of the student. Then, the desired probability is
P{X < 180} = P{Z < (180 −170)/8} = P{Z < 1.25} = 0.894.
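¾ A closing Python sketch (illustrative) that evaluates the standard
normal cdf with math.erf for Example 8 and gives a quick central
limit theorem check (the uniform summands are an assumption chosen
only for the demo):

    # Standard normal cdf via the error function, plus a small CLT demo.
    import random
    from math import erf, sqrt

    def phi(z):
        # cdf of the standard normal rv Z
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(phi((180 - 170) / 8))                      # ≈ 0.894, as in Example 8

    # CLT: standardized sum of 30 iid uniform(0, 1) rv's is roughly N(0, 1).
    random.seed(5)
    n, mu, sig = 30, 0.5, sqrt(1 / 12)
    zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sig * sqrt(n))
          for _ in range(50_000)]
    print(sum(z < 1.25 for z in zs) / len(zs))       # ≈ phi(1.25) ≈ 0.894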