Law of Total Probability - Full

Math 435: Handout 1 – Review of Math 335
(C1) Sampling without Replacement
We select a card from a deck of n different cards, then we remove it from the deck.
Next, we select another card from the (n − 1) remaining cards and remove this one
too. Clearly, there are n × (n − 1) ways to do this. In general, if we repeat this process
r times, the possible number of outcomes will be:
\[ n \times (n-1) \times (n-2) \times \cdots \times (n-r+1). \]
This, in fact, is what we call $P_{n,r}$.
It is immediate that $P_{n,n} = n!$.
Note that
\[ P_{n,r} \times (n-r)! = n \times (n-1) \times (n-2) \times \cdots \times (n-r+1) \times (n-r)! = n! \]
Hence,
\[ P_{n,r} = \frac{n!}{(n-r)!} \]
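As a quick numerical check of this formula, here is a minimal sketch (the helper name perm_count is ours, not from the handout; Python 3.8+ also provides math.perm for the same quantity):

```python
from math import factorial

def perm_count(n, r):
    """Number of ordered selections of r items from n, without replacement."""
    return factorial(n) // factorial(n - r)

# Drawing 3 cards in order from a 52-card deck:
print(perm_count(52, 3))   # 52 * 51 * 50 = 132600
```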
(E3) In Maryland’s lottery, players pick six different integers between 1 and 49; the order of
selection is irrelevant. Six numbers among the 49 are randomly selected as winning numbers. A player hits the jackpot if she/he matches all six numbers. The second big prize
is awarded to person(s) matching five numbers, and the third prize goes to person(s)
matching four. Find the probabilities that: (a) Sam’s ticket wins the jackpot. (b) Sam’s
ticket wins the second prize. (c) Sam’s ticket wins the third prize.
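A numerical sketch of the counting behind (a)–(c), assuming the standard hypergeometric argument (match k of the 6 winning numbers and 6 − k of the other 43):

```python
from math import comb

total = comb(49, 6)                            # number of possible tickets
p_jackpot = 1 / total                          # (a) all six numbers match
p_second  = comb(6, 5) * comb(43, 1) / total   # (b) exactly five match
p_third   = comb(6, 4) * comb(43, 2) / total   # (c) exactly four match
print(p_jackpot, p_second, p_third)
```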
(E4) From an ordinary deck of 52 cards, seven are drawn at random and without replacement. What is the probability that at least one of the cards is a king?
(E5*) What is the probability that a poker hand is a full house? In the game of poker, a
hand of 5 randomly selected cards is called a full house if three cards are from one
denomination and the other two cards are from another denomination. For example, a
hand of three kings and two 3s.
(E6) The mathematics department consists of 25 full professors, 15 associate professors, and
35 assistant professors. A committee of 6 is selected at random from the faculty of the
department. (a) Find the probability that all members of the committee are assistant
professors. (b) What is the probability that the committee of 6 is composed of 2 full
professors, 3 associate professors, and 1 assistant professor?
(E7) In a hand of 13 cards chosen from an ordinary deck of 52, find the probability that the
hand is composed of exactly 3 clubs, 4 diamonds, 4 hearts and 2 spades.
Law of Total Probability
Recall: if $B_1, \ldots, B_k$ form a partition of the sample space, then
\[ P(A) = \sum_{j=1}^{k} P(B_j)\,P(A \mid B_j) \tag{1} \]
\[ P(A \mid C) = \sum_{j=1}^{k} P(B_j \mid C)\,P(A \mid B_j \cap C) \tag{2} \]
(I1) Two boxes contain long bolts and short bolts. Suppose that one box contains 60 long
bolts and 40 short bolts, and the other box contains 10 long bolts and 20 short bolts.
Suppose also that one box is selected at random and a bolt is then selected at random
from that box. What is the probability that this bolt is long?
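A minimal sketch of the total-probability computation, conditioning on which box is chosen:

```python
p_box = [1/2, 1/2]                  # each box is selected with probability 1/2
p_long_given_box = [60/100, 10/30]  # P(long | box 1), P(long | box 2)
p_long = sum(b * l for b, l in zip(p_box, p_long_given_box))
print(p_long)                       # 7/15, about 0.4667
```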
(F2) (Mood, Graybill and Boes, 1974) There are five urns, and they are numbered 1 to 5.
Each urn contains 10 balls. Urn i has i defective balls and 10 − i non-defective balls,
for i = 1, ..., 5. For example, urn 3 has 3 defective balls and 7 non-defective balls.
Consider the following random experiment:
First, an urn is selected at random, and then a ball is selected at random from the selected urn. (The experimenter does not know which urn is selected.) We are interested
in two questions:
(1) What is the probability that a defective ball is selected?
(2) If we have already selected a ball and noted that it is defective, what is the probability that it came from urn 5?
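A sketch of both answers: question (1) is the law of total probability over the five urns, and question (2) is Bayes’ rule:

```python
p_urn = 1 / 5                                      # each urn is equally likely
p_def_given_urn = {i: i / 10 for i in range(1, 6)} # urn i has i defective of 10
p_def = sum(p_urn * p for p in p_def_given_urn.values())
p_urn5_given_def = p_urn * p_def_given_urn[5] / p_def
print(p_def)              # (1) -> 0.3
print(p_urn5_given_def)   # (2) -> 1/3
```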
(F3) (DeGroot, Schervish, 2002) Three machines M1 , M2 , and M3 were used for producing
a large batch of similar manufactured items. Suppose that 20 percent of the items
were produced by machine M1 , 30 percent of the items were produced by machine M2 ,
and 50 percent by machine M3 . Suppose further that 1 percent of the items produced
by machine M1 are defective, that 2 percent produced by machine M2 are defective,
and 3 percent produced by machine M3 are defective. Finally, suppose that one item
is selected at random from the entire batch, and it is found to be defective. Find the
probability that this item was produced by machine M2 .
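A sketch of the Bayes computation, with the three machines as the partition:

```python
prior = {"M1": 0.20, "M2": 0.30, "M3": 0.50}  # share of items per machine
p_def = {"M1": 0.01, "M2": 0.02, "M3": 0.03}  # defect rate per machine
p_defective = sum(prior[m] * p_def[m] for m in prior)   # total probability: 0.023
print(prior["M2"] * p_def["M2"] / p_defective)          # -> 6/23, about 0.261
```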
(G2) Let’s assume that the cdf of a random variable X is as follows:
\[ F(x) = \begin{cases} 0 & x < 2 \\ (x - 2)^4/16 & 2 \le x < 4 \\ 1 & x \ge 4 \end{cases} \]
(a) Graph F(x).
(b) Calculate the following probabilities:
$P(X \le 3)$, $P(X < 3)$, $P(X > 2.5)$, $P(1.5 < X \le 3.4)$.
(G3) Suppose that a bus arrives at the station every day between 10:00 A.M. and 10:30
A.M., at random. Let X be the arrival time; find the cdf of X and sketch its graph.
(G4) While walking in a certain park, the time X, in minutes, between seeing two people
smoking has a probability density function (pdf) of the following form:
\[ f(x) = \begin{cases} \lambda x e^{-x} & x > 0 \\ 0 & \text{otherwise} \end{cases} \]
(a) Calculate the value of λ.
(b) Find the cdf of X.
(c) What is the probability that Sam, who has just seen a person smoking, will see
another person smoking in 2 to 5 minutes? In at least 7 minutes?
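A symbolic sketch of all three parts (this assumes sympy is available; it is not part of the original handout):

```python
import sympy as sp

x, t, lam = sp.symbols('x t lambda', positive=True)
pdf = lam * x * sp.exp(-x)

# (a) lambda must make the density integrate to 1 over x > 0:
lam_val = sp.solve(sp.integrate(pdf, (x, 0, sp.oo)) - 1, lam)[0]   # -> 1
f = pdf.subs(lam, lam_val)

# (b) cdf for x > 0:
F = sp.integrate(f.subs(x, t), (t, 0, x))   # -> 1 - (x + 1)*exp(-x)
print(sp.simplify(F))

# (c) P(2 <= X <= 5) and P(X >= 7):
print(sp.integrate(f, (x, 2, 5)))       # 3*exp(-2) - 6*exp(-5)
print(sp.integrate(f, (x, 7, sp.oo)))   # 8*exp(-7)
```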
(I3) Let λ be a positive number, and let
\[ F(x, y) = \begin{cases} 1 - \lambda e^{-\lambda(x+y)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]
Determine whether F could be the joint cdf of the two variables X and Y.
(J1) Let X and Y be two independent random variables. Let g : R → R and h : R → R
be two real-valued functions. Prove that g(X) and h(Y) are also independent. That
is, prove the following statement:
\[ P(g(X) \le a,\ h(Y) \le b) = P(g(X) \le a) \times P(h(Y) \le b) \]
for any two real numbers a and b.
(J2) Stores A and B, which belong to the same owner, are located in two different towns.
If the probability density function of the weekly profit of each store, in thousands of
dollars, is given by:
\[ f(x) = \begin{cases} x/4 & 1 < x < 3 \\ 0 & \text{otherwise} \end{cases} \]
and the profit of one store is independent of the other, what is the probability that
next week one store makes at least $500 more than the other one?
(K1) Let the joint pmf of X and Y be given by:
\[ f(x, y) = \begin{cases} \frac{1}{15}(x + y) & x = 0, 1, 2,\ y = 1, 2 \\ 0 & \text{otherwise} \end{cases} \]
Find f(x|y) and P(X = 0 | Y = 2).
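A quick enumeration check of the second part (a sketch, using exact arithmetic):

```python
from fractions import Fraction

f = {(x, y): Fraction(x + y, 15) for x in (0, 1, 2) for y in (1, 2)}
fY2 = sum(p for (x, y), p in f.items() if y == 2)   # marginal P(Y = 2) = 3/5
print(f[(0, 2)] / fY2)                              # P(X = 0 | Y = 2) = 2/9
```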
(L1) Let’s reconsider the following joint pdf from Handout 6:
\[ f(x, y) = \begin{cases} \frac{21}{4} x^2 y & x^2 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
(a) Find the conditional pdf g(y|x).
(b) Calculate $P(Y \ge \frac{1}{4} \mid X = \frac{1}{2})$.
(c) Calculate $P(Y \ge \frac{3}{4} \mid X = \frac{1}{2})$.
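A symbolic sketch of (a)–(c), again assuming sympy:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = sp.Rational(21, 4) * x**2 * y

fX = sp.integrate(f, (y, x**2, 1))   # marginal of X: (21/8)x^2(1 - x^4)
g = sp.simplify(f / fX)              # conditional pdf g(y|x) = 2y/(1 - x^4)

g_half = g.subs(x, sp.Rational(1, 2))
print(sp.integrate(g_half, (y, sp.Rational(1, 4), 1)))   # (b) -> 1
print(sp.integrate(g_half, (y, sp.Rational(3, 4), 1)))   # (c) -> 7/15
```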
(L3) Let the random variable X follow a uniform distribution on the interval (0, 1). That
is,
\[ f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Also, let g(y|x) follow a Binomial distribution with n (the number of experiments) and
x as the probability of success. That is:
\[ g(y \mid x) = \begin{cases} \binom{n}{y} x^y (1-x)^{n-y} & y = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \]
Find $g_2(x \mid y)$, the pdf of X|y.
Note: This is very close to what Thomas Bayes did in his hallmark paper,
“An Essay towards solving a Problem in the Doctrine of Chances,” which appeared in the
Philosophical Transactions of the Royal Society of London in 1764.
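A sketch of the Bayes step, using the beta integral $\int_0^1 t^y (1-t)^{n-y}\,dt = \frac{y!\,(n-y)!}{(n+1)!}$ for integer y:
\[ g_2(x \mid y) = \frac{f(x)\,g(y \mid x)}{\int_0^1 f(t)\,g(y \mid t)\,dt} = \frac{x^y (1-x)^{n-y}}{\int_0^1 t^y (1-t)^{n-y}\,dt} = \frac{(n+1)!}{y!\,(n-y)!}\,x^y (1-x)^{n-y}, \qquad 0 \le x \le 1, \]
since the binomial coefficients cancel. That is, X | Y = y has a Beta(y + 1, n − y + 1) distribution.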
(M4) Let X have the following pdf:
\[ f(x) = \begin{cases} 4x^3 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]
Find the pdf of $Y = 1 - 3X^2$.
(M5) Let X have the following pdf:
\[ f(x) = \begin{cases} 3(1-x)^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]
Find the pdf of $Y = 10e^{5X}$.
(N3) Let X have the following pmf (Poisson):
\[ f(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!} & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]
(a) Find E(X).
(b)* Find Var(X).
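For part (a), the standard index-shift argument is a one-line computation (a sketch, using $\sum_{k=0}^{\infty} \lambda^k/k! = e^{\lambda}$):
\[ E(X) = \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} = \lambda \sum_{x=1}^{\infty} \frac{e^{-\lambda}\lambda^{x-1}}{(x-1)!} = \lambda. \]
The same shift applied twice gives $E[X(X-1)] = \lambda^2$, so $Var(X) = \lambda^2 + \lambda - \lambda^2 = \lambda$.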
(N4) Let X have the following pdf (Cauchy):
\[ f(x) = \frac{c}{1 + x^2}, \qquad -\infty < x < \infty \]
(a) Find c.
(b) Show that E(X) does not exist.
(N5) Prove that the following relation holds for any continuous random variable X:
\[ E(X) = \int_0^{\infty} [1 - F(t)]\,dt - \int_0^{\infty} F(-t)\,dt. \]
Note that a direct consequence of the above relation is the following:
\[ E(X) = \int_0^{\infty} P(X > t)\,dt - \int_0^{\infty} P(X \le -t)\,dt. \]
(O2) Find the moment generating function for a random variable that follows a Bernoulli
distribution. Find E(X) and Var(X).
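A symbolic sketch of the mgf method for the Bernoulli case (the same pattern works for O3–O5; sympy is an assumption, not part of the handout):

```python
import sympy as sp

t, p = sp.symbols('t p')
M = (1 - p) + p * sp.exp(t)          # E(e^{tX}) with P(X=1) = p, P(X=0) = 1-p

EX  = sp.diff(M, t).subs(t, 0)       # first moment:  p
EX2 = sp.diff(M, t, 2).subs(t, 0)    # second moment: p
print(EX, sp.simplify(EX2 - EX**2))  # mean p, variance p*(1 - p)
```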
(O3) Find the moment generating function for a random variable that follows a Binomial
distribution. Find E(X) and Var(X).
(O4) Find the moment generating function for a random variable that follows an exponential
distribution with parameter λ > 0. Find E(X) and Var(X). (Note that the random
variable X is not bounded from above.)
(O5) Find the moment generating function for a random variable that follows a Poisson distribution with parameter λ > 0.
(P1) Let X and Y have a continuous distribution with joint pdf:
\[ f(x, y) = \begin{cases} x + y & 0 \le x, y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Find Cov(X, Y).
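A symbolic check (a sketch assuming sympy; all three expectations are integrals over the unit square):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x + y                                   # joint pdf on the unit square

def E(g):
    return sp.integrate(g * f, (x, 0, 1), (y, 0, 1))

print(E(x * y) - E(x) * E(y))               # Cov(X, Y) = -1/144
```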
(P2) Prove the Schwarz inequality:
\[ [E(UV)]^2 \le E(U^2)\,E(V^2) \]
Hint: See page 216.
(R3) Let X and Y have joint density:
\[ f(x, y) = \begin{cases} 4x^2 y + 2y^5 & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Find E(X|Y = y).
(R4) Let X and Y be continuous random variables with the following joint pdf:
\[ f(x, y) = \begin{cases} e^{-y} & 0 < x < 1,\ 0 < y < \infty \\ 0 & \text{otherwise} \end{cases} \]
Find E(X|Y = 2).
(Q1) Convergence in Probability. Let $X_1, X_2, \ldots, X_n$ be a sequence of random variables.
This sequence converges to a given number b if the probability distribution of $X_n$
becomes more and more concentrated around b as n → ∞.
Formally, this says: the sequence converges to b in probability if, for every $\varepsilon > 0$:
\[ \lim_{n \to \infty} Pr(|X_n - b| < \varepsilon) = 1 \tag{3} \]
We write this as $X_n \xrightarrow{P} b$.
(Q2) Law of Large Numbers. Let $X_1, \ldots, X_n$ form a random sample from a distribution
for which the mean is µ and the variance exists. Let $\bar{X}_n$ represent the sample mean.
Then:
\[ \bar{X}_n \xrightarrow{P} \mu. \tag{4} \]
(Q3) Consider flipping a coin for which the probability of heads is p. Let $X_i$ denote the
outcome of a single toss (0 or 1). Hence:
\[ E(X_i) = Pr(X_i = 1) = p. \]
Note that the fraction of heads after n tosses is $\bar{X}_n$. According to the law of large
numbers, $\bar{X}_n$ converges in probability to p. This does not mean that $\bar{X}_n$ will numerically
equal p. It means that, when n is large, the distribution of $\bar{X}_n$ is tightly
concentrated around p. Now, suppose $p = \frac{1}{2}$. How large should n be so that
\[ Pr(0.4 \le \bar{X}_n \le 0.6) \ge 0.7? \]
Note that $E(\bar{X}_n) = 0.5$, and $Var(\bar{X}_n) = \frac{Var(X_i)}{n} = \frac{p(1-p)}{n} = \frac{1/2 \times 1/2}{n} = \frac{1}{4n}$. Hence, by
Chebyshev’s inequality:
\[ Pr(0.4 \le \bar{X}_n \le 0.6) = Pr(|\bar{X}_n - 0.5| \le 0.1) = Pr(|\bar{X}_n - \mu| \le 0.1) = 1 - Pr(|\bar{X}_n - \mu| > 0.1) \ge 1 - \frac{1}{4n(0.1)^2} = 1 - \frac{25}{n}. \]
We want this to be larger than 0.7; in other words, $1 - \frac{25}{n} \ge 0.7$. This is accomplished
when n ≥ 84.
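The Chebyshev bound is conservative; a quick simulation suggests the actual probability at n = 84 is much higher (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 84, 100_000
xbar = rng.binomial(n, 0.5, size=reps) / n      # sample proportions of heads
print(np.mean((0.4 <= xbar) & (xbar <= 0.6)))   # ~0.93, above the 0.7 guarantee
```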
(Q4) Convergence with Probability 1. Convergence in probability is different from
convergence with probability 1:
Let $X_1, \ldots, X_n$ be a sequence of random variables. We say the sequence converges to b
with probability 1 if:
\[ Pr(\lim_{n \to \infty} X_n = b) = 1 \tag{5} \]
We are not going to study this type of convergence. However, you should be aware that
convergence w.p.1 is usually referred to as strong convergence, whereas convergence in
probability is often referred to as weak convergence. Similar to Q2, you can write:
\[ Pr(\lim_{n \to \infty} \bar{X}_n = \mu) = 1 \tag{6} \]
The latter is the strong law of large numbers. The statement in Q2 is the weak law of
large numbers.
(Q5) Let $Y \sim Uniform[0, 1]$, and let $X_n = Y^n$. We want to prove that $X_n \xrightarrow{w.p.1} 0$.
If $0 \le y < 1$, then $\lim_{n \to \infty} y^n = 0$.
Hence, $Pr(X_n \to 0) = Pr(Y^n \to 0) \ge P(0 \le Y < 1) = 1$; i.e., $Y^n \xrightarrow{w.p.1} 0$.
(S1) Here are a couple of examples of Poisson’s approximation to the Binomial:
(a) Let X be the number of winning tickets among the California lottery tickets sold
in Bakersfield during one week. Then, calling winning tickets successes, we have
that X is a binomial random variable. Since n, the total number of tickets sold in
Bakersfield, is large, and p, the probability of winning, is small, X is approximately
a Poisson random variable.
(b) Let X be the number of spikes or firing activities of a particular neuronal cell
in a 500 millisecond span of time. Then, calling a spike occurrence a success,
we have that X is a binomial random variable. Since n, the number of one-millisecond
units of time, is very large, and p, the probability of a spike activity in a one-millisecond
unit of time, is relatively small, X, the number of spike occurrences,
approximately follows a Poisson distribution.
(S2) Every week the average number of wrong-number phone calls received by a certain
mail-order house is seven. What is the probability that they will receive:
(a) Two wrong calls tomorrow?
(b) At least one wrong call tomorrow?
(S3) The atoms of a radioactive element are randomly disintegrating. If every gram of this
element, on average, emits 3.9 alpha particles per second, what is the probability that
during the next second the number of alpha particles emitted from 1 gram is:
(a) At most 6.
(b) At least 2.
(c) At least 3 and at most 6.
(S4) Suppose that children are born at a Poisson rate of five per day in a certain hospital.
What is the probability that:
(a) At least two babies are born during the next six hours.
(b) No babies are born during the next two days.
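Numerical checks for S2–S4 (a sketch assuming scipy is available; each part just picks the Poisson mean that matches the stated time window):

```python
from scipy.stats import poisson

# (S2) seven wrong calls per week -> mean 1 per day
print(poisson.pmf(2, 1))                          # (a) ~0.1839
print(1 - poisson.pmf(0, 1))                      # (b) ~0.6321

# (S3) mean 3.9 particles per second
print(poisson.cdf(6, 3.9))                        # (a) at most 6
print(1 - poisson.cdf(1, 3.9))                    # (b) at least 2
print(poisson.cdf(6, 3.9) - poisson.cdf(2, 3.9))  # (c) at least 3 and at most 6

# (S4) five births per day -> mean 5/4 per six hours, 10 per two days
print(1 - poisson.cdf(1, 5 / 4))                  # (a) at least two babies
print(poisson.pmf(0, 10))                         # (b) e^{-10}
```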
(U1) The Central Limit Theorem. Let $X_1, X_2, \ldots$ be a sequence of iid random variables
whose mgfs exist in a neighborhood of 0 (that is, $M_X(t)$ exists for |t| < h, for some
positive h). Let $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 > 0$. (Both µ and σ² are finite since the mgf
exists.) Define $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. Let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar{X} - \mu)/\sigma$. Then, for
any x, −∞ < x < ∞,
\[ \lim_{n \to \infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \]
Proof (Casella and Berger, 1990)
We want to show that, for |t| < h, the mgf of $\sqrt{n}(\bar{X} - \mu)/\sigma$ converges to $e^{t^2/2}$, the
mgf of a N(0, 1) random variable.
Define $Y_i = (X_i - \mu)/\sigma$, and let $\Psi_Y(t)$ denote the moment generating function of the
$Y_i$s, which exists for |t| < σh. Since:
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i \]
we have, from the properties of mgfs, that:
\[ \Psi_{\sqrt{n}(\bar{X}_n - \mu)/\sigma}(t) = \Psi_{\sum_{i=1}^{n} Y_i/\sqrt{n}}(t) = \Psi_{\sum_{i=1}^{n} Y_i}\!\left(\frac{t}{\sqrt{n}}\right) = \left[\Psi_Y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n \]
We can now expand $\Psi_Y(\frac{t}{\sqrt{n}})$ in a Taylor series around 0. We have:
\[ \Psi_Y\!\left(\frac{t}{\sqrt{n}}\right) = \sum_{k=0}^{\infty} \Psi_Y^{(k)}(0)\, \frac{(t/\sqrt{n})^k}{k!} \]
where $\Psi_Y^{(k)}(0) = (d^k/dt^k)\,\Psi_Y(t)|_{t=0}$. Since the mgfs exist for |t| < h, the power series
expansion is valid for $|t| < \sqrt{n}\,\sigma h$.
Using the facts that $\Psi_Y^{(0)}(0) = 1$, $\Psi_Y^{(1)}(0) = 0$, and $\Psi_Y^{(2)}(0) = 1$, we have:
\[ \Psi_Y\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{(t/\sqrt{n})^2}{2!} + \mathrm{Resid}\!\left(\frac{t}{\sqrt{n}}\right), \]
where $\mathrm{Resid}(\frac{t}{\sqrt{n}})$ is the remainder term in the Taylor expansion:
\[ \mathrm{Resid}\!\left(\frac{t}{\sqrt{n}}\right) = \sum_{k=3}^{\infty} \Psi_Y^{(k)}(0)\, \frac{(t/\sqrt{n})^k}{k!}. \]
It is easy to show that, for fixed t ≠ 0, we have:
\[ \lim_{n \to \infty} \frac{\mathrm{Resid}(t/\sqrt{n})}{(t/\sqrt{n})^2} = 0. \]
Since t is fixed, we also have:
\[ \lim_{n \to \infty} \frac{\mathrm{Resid}(t/\sqrt{n})}{(1/\sqrt{n})^2} = \lim_{n \to \infty} n\,\mathrm{Resid}\!\left(\frac{t}{\sqrt{n}}\right) = 0, \]
and the last relation is also true at t = 0 since $\mathrm{Resid}(0/\sqrt{n}) = 0$. Therefore, for any fixed
t, we can write:
\[ \lim_{n \to \infty} \left[\Psi_Y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \lim_{n \to \infty} \left[1 + \frac{(t/\sqrt{n})^2}{2!} + \mathrm{Resid}\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \lim_{n \to \infty} \left[1 + \frac{1}{n}\left(\frac{t^2}{2} + n\,\mathrm{Resid}\!\left(\frac{t}{\sqrt{n}}\right)\right)\right]^n = e^{t^2/2} \]
which is the mgf of a N(0, 1).
An alternative proof utilizes the notion of characteristic functions, which are defined as
$\Phi_X(t) = E(e^{itX})$, where i is the complex number $\sqrt{-1}$. The difficulty here is that to obtain the characteristic function one needs to calculate complex integrals. However,
the beauty of the alternative proof is that characteristic functions always exist and
they completely determine their associated distributions, whereas not all distributions
have mgfs defined for them. The central limit theorem proof based on characteristic
functions relies upon the notion of convergence of characteristic functions, defined as
follows:
Let $X_k$, k = 1, 2, ..., be a sequence of random variables, each with characteristic function
$\Phi_{X_k}(t)$. Furthermore, suppose that:
\[ \lim_{k \to \infty} \Phi_{X_k}(t) = \Phi_X(t), \tag{7} \]
for all t in a neighborhood of 0, where $\Phi_X(t)$ is a characteristic function. Then, for all x
where $F_X(x)$ is continuous,
\[ \lim_{k \to \infty} F_{X_k}(x) = F_X(x) \tag{8} \]
which is the familiar definition of convergence in distribution.
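A simulation illustrating the theorem for a skewed population (a sketch assuming numpy; the exponential with scale 1 has µ = σ = 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 50_000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0   # sqrt(n)(Xbar - mu)/sigma
print(np.mean(z <= 1.0))                              # compare with Phi(1) ~ 0.8413
```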
(I) Negative-Binomial
Let X and Y be two independent random variables such that $X \sim \mathrm{Neg\text{-}Binom}(r_1, p)$
and $Y \sim \mathrm{Neg\text{-}Binom}(r_2, p)$. Let $Z = X + Y$.
(a) Obtain the mean and the variance of X.
(b) Show that $Z \sim \mathrm{Neg\text{-}Binom}(r_1 + r_2, p)$.
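For part (b), one route is through mgfs: in any of the usual parameterizations (a detail we leave hedged, since conventions differ), the Neg-Binom(r, p) mgf has the form $[m(t)]^r$, where the factor m(t) depends on p but not on r. Independence then gives:
\[ \Psi_Z(t) = \Psi_X(t)\,\Psi_Y(t) = [m(t)]^{r_1}\,[m(t)]^{r_2} = [m(t)]^{r_1 + r_2}, \]
which is the mgf of a Neg-Binom($r_1 + r_2$, p) random variable; uniqueness of mgfs completes the argument.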
(II) Normal
Let $X \sim Normal(\mu, \sigma)$. Obtain $\Psi_X(t)$, the moment generating function of X.
(III) Normal
Suppose that a random sample of size n is to be taken from a normal distribution with
mean µ and variance 16. Determine the minimum value for n such that
\[ Pr(|\bar{X} - \mu| \le 1) \ge 0.95. \tag{9} \]
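A sketch of one route: $\bar{X} \sim N(\mu, 16/n)$, so $Pr(|\bar{X} - \mu| \le 1) = 2\Phi(\sqrt{n}/4) - 1$, and the smallest such n can be found numerically (assuming scipy):

```python
from scipy.stats import norm

n = 1
while 2 * norm.cdf(n ** 0.5 / 4) - 1 < 0.95:
    n += 1
print(n)   # -> 62, since sqrt(n)/4 must reach z_{0.975} ~ 1.96
```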
(IV) Lognormal
Let $X \sim Normal(0, \tau)$. Prove that $Y = e^X$ has the following density:
\[ f(y) = \frac{1}{\sqrt{2\pi}\,\tau\,y} \exp\!\left(-\frac{(\ln y)^2}{2\tau^2}\right) \tag{10} \]
for y > 0 and where τ > 0 is unknown. We say that $Y \sim \mathrm{Log\text{-}normal}(\tau)$.
(V) Consider a situation involving a server, e.g., a cashier at a fast food restaurant, an
automatic bank teller machine, a telephone exchange, etc. Units typically arrive for
service in a random fashion and form a queue when the server is busy. It is often the
case that the number of arrivals at the server during a specific period of time t can be
modeled by a Poisson(λt) distribution, such that the numbers of arrivals in non-overlapping
periods are independent. One can show that λt is the average number of arrivals during
a time period of length t, and so λ is the rate of arrivals per unit of time.
Now, suppose telephone calls arrive at a help line at the rate of two per minute. A
Poisson process provides a good model.
(a) What is the probability that five calls arrive in the next two minutes?
(b) What is the probability that five calls arrive in the next two minutes and then five
more calls arrive in the following two minutes?
(c) What is the probability that no calls will arrive during a 10-minute period?
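Numerical sketches for (a)–(c), again assuming scipy (the rate is 2 per minute, so the mean over t minutes is 2t):

```python
from scipy.stats import poisson

p5 = poisson.pmf(5, 2 * 2)      # (a) five arrivals in two minutes (mean 4): ~0.1563
print(p5)
print(p5 ** 2)                  # (b) non-overlapping intervals are independent
print(poisson.pmf(0, 2 * 10))   # (c) no calls in ten minutes: e^{-20}
```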
(VI) Let {N(t); t ≥ 0} be a Poisson process with parameter λ > 0. Let 0 < s < t, and let j
be a positive integer.
(a) Compute Pr(N(s) = j | N(t) = j).
(b) Calculate Pr(N(s) = 1 | N(t) = 1).
(c) Does the answer to part (a) depend on the value of the parameter λ? Why?