2. Conditional Probability
1
Andy Guo
Conditional Probability
•
Pr( A | B ) is defined as the conditional probability of the event A given that
event B has occurred.
• As shown in the figure, Pr( A | B ) is the proportion of the total probability
Pr(B) that is represented by the probability Pr(AB).
• If A and B are two events such that Pr(B) > 0, then Pr( A | B ) =
2
Pr( AB)
Pr( B )
Andy Guo
Example 2.1.1: Rolling Dice
• Suppose that two dice were rolled and it was observed that the sum T of the
two numbers was odd. Determine the probability that T was less than 8.
• Let A be the event that T < 8 and let B be the event that T is odd. Then AB is
the event that T is 3, 5, or 7.
Pr( AB) =
Pr( B ) =
2 4 6 1
+ + =
36 36 36 3
2 4 6 4 2 1
+ + + + =
36 36 36 36 36 2
∴ Pr( A | B ) =
Pr( AB) 2
=
Pr( B ) 3
3
Andy Guo
Example 2.1.2: Rolling Dice Repeatedly
• Suppose that two dice are to be rolled repeatedly and the sum T of the two
numbers is to be observed for each roll. Determine the probability p that the value
T = 7 will be observed before the value T = 8 is observed.
• The problem can be restated as follows: Given that the outcome of the experiment
is either T = 7 or T = 8, determine the probability p that the outcome is actually T =
7.
• Let A be the event that T = 7 and let B be the event that the value of T is either 7 or
8, then AB = A and
6
Pr( AB ) Pr( A)
6
p = Pr( A | B ) =
=
= 36 =
Pr( B ) Pr( B ) 6 + 5 11
36 36
4
Andy Guo
Theorem
• Suppose that A1, A2 ,…, An are events such that Pr(B) > 0 and
Pr( A1 A2 An −1 ) > 0. Then
An ) = Pr( A1 ) Pr( A2 | A1 ) Pr( A3 | A1 A2 )
Pr( A1 A2
Pr( An | A1 A2
An −1 )
• Proof: The right side of the above equation is equal to
Pr( A1 ) ⋅
Pr( A1 A2 ) Pr( A1 A2 A3 )
⋅
Pr( A1 )
Pr( A1 A2 )
Pr( A1 A2
Pr( A1 A2
An )
= Pr( A1 A2
An −1 )
An )
• Based on the theorem, we have
Pr( AB) = Pr( B) Pr( A | B )
Pr( AB) = Pr( A) Pr( B | A)
• Conditional probabilities behave just like probabilities
Pr( A1 A2
An | B ) = Pr( A1 | B ) Pr( A2 | A1 B )
Pr( An | A1 A2
An −1 B )
5
Andy Guo
Example 2.1.5: Selecting Four Balls
• Suppose that four balls are selected one at a time, without replacement, from
a box containing r red balls and b blue balls ( r ≥ 2, b ≥ 2). Determine the
probability of obtaining the sequence of outcomes red, blue, red, blue.
• Let Rj denote the event that a red ball is obtained on the jth draw and Bj
denote the event that a blue ball is obtained on the jth draw, then
Pr( R1B2 R3 B4 ) = Pr( R1 ) Pr( B2 | R1 ) Pr( R3 | R1 B2 ) Pr( B4 | R1 B2 R3 )
=
6
r
b
r −1
b −1
⋅
⋅
⋅
r + b r + b −1 r + b − 2 r + b − 3
Andy Guo
Independent Events
• Two events A and B are independent if Pr( AB) = Pr( A) Pr( B).
• Two events A and B are independent if Pr( A | B) = Pr( A) and Pr( B | A) = Pr( B).
• If event A and B are independent, then events A and B c are independent.
Proof:
Pr( AB c ) = Pr( A) − Pr( AB) = Pr( A) − Pr( A) Pr( B) = Pr( A)[1 − Pr( B)]
= Pr( A) Pr( B c )
• The k events A1,…, Ak are independent if, for every subset Ai1 ,…, Ai j of j of
these events (j = 2, 3,…, k),
Pr( Ai 1 ∩
∩ Ai j ) = Pr( Ai 1 )
Pr( Ai j )
7
Andy Guo
Example 2.2.1: Machine Operation
• Suppose that two machines 1 and 2 in a factory are operated independently
of each other. Let A be the event that machine 1 will become inoperative
during a given 8-hour period; let B be the event that machine 2 will
become inoperative during the same period; and suppose that Pr(A) = 1/3
and Pr(B) = 1/4.
• Determine the probability that at least one of the machines will become
inoperative during the given period.
• Pr( AB) = Pr( A) Pr( B ) = 1 ⋅ 1 = 1
3 4 12
1 1 1 1
Pr( A ∪ B ) = Pr( A) + Pr( B ) − Pr( AB) = + − =
3 4 12 2
8
Andy Guo
Example 2.2.3: Pairwise Independence
• Consider an experiment in which the sample space S contains four outcomes
{s1, s2, s3, s4}, and suppose that the probability of each outcome is 1/4. Let
three events A, B, C be defined as follows:
A = {s1, s2}, B = {s1, s3} and C = {s1, s4}
• Then AB = AC = BC = ABC = {s1}. Hence,
Pr( A) = Pr( B ) = Pr(C ) = 1
2
Pr( AB) = Pr( AC ) = Pr( BC ) = Pr( ABC ) = 1
4
• Since Pr( ABC ) ≠ Pr( A) Pr( B ) Pr(C ) , A, B, C are not independent.
(But they are pairwise independent)
9
Andy Guo
Examples 2.2.4 &2.2.7: Inspecting Items
• Suppose that a machine produces a defective item with probability p and produces a
nondefective item with probability q = 1 – p.
• Suppose six items produced are selected at random. Determine the probability that
exactly two of the six items are defective.
⎛ 6⎞
– The probability is equal to ⎜ ⎟ p 2 q 4
⎝ 2⎠
• Suppose that items produced are selected at random and inspected one at a time.
Determine the probability pn that exactly n items ( n ≥ 5) must be selected to obtain five
defectives.
⎛ n − 1⎞ 4 n − 5
⎟p q
– The probability of obtaining 4 defectives among the first n-1 items is ⎜
⎝ 4 ⎠
⎛ n − 1⎞ 5 n −5
⎟p q
– So, pn = ⎜
⎝ 4 ⎠
10
Andy Guo
Example 2.2.8: People versus Collins
• In a criminal case the witness claimed to see a young woman with blond
hair in a ponytail fleeing in a yellow car driven by a black man with a beard.
• A couple meeting the description was arrested but no physical evidence was
found. A randomly selected couple from n couples would possess the
described characteristics as about p = 8.3 × 10 −8.
• Determine the conditional probability that there was also a second couple
such as the defendants.
• Let A be the event that at least one couple has the characteristics, B be the
event that at least two couples have the characteristics, and C be the event
that exactly one couple has the characteristics.
11
Andy Guo
Example 2.2.8: People versus Collins
• Assuming that the n couples are mutually independent,
Pr( Ac ) = (1 − p) n ,
Pr( A) = 1 − (1 − p ) n
Pr(C ) = np (1 − p ) n−1
• Since A = B ∪ C with B and C disjoint, we have
Pr( B ) = Pr( A) − Pr(C ) = 1 − (1 − p ) n − np (1 − p ) n −1
• Since B ⊂ A, it follows that
Pr( B | A) =
Pr( B ) 1 − (1 − p ) n − np (1 − p) n −1
=
Pr( A)
1 − (1 − p) n
• With p = 8.3 × 10 −8 and n = 8,000,000, the conditional probability is
0.2966. Such a value suggests that there is a reasonable chance that
these was another couple meeting the same characteristics.
12
Andy Guo
The Collector’s Problem
• Suppose that each package of bubble gum contains the picture of a baseball
player; that the pictures of r different players are used. Determine the probability
p that a person who buys n packages of gum ( n ≥ r ) will obtain a complete set of
r different pictures.
• Let Ai denote the event that the picture of player i is missing from all n packages.
⎛ r − 1⎞
Pr( Ai ) = ⎜
⎟
⎝ r ⎠
n
• Now consider any two players i and j. The probability that neither the picture of
player i nor the picture of players j will be obtained in any package.
⎛ r − 2⎞
Pr( Ai A j ) = ⎜
⎟
⎝ r ⎠
n
13
Andy Guo
The Collector’s Problem
• Now consider any three players i, j and k are missing from all n packages.
⎛ r − 3⎞
Pr( Ai A j Ak ) = ⎜
⎟
⎝ r ⎠
n
• The probability that the picture of at least one player is missing:
⎛ r ⎞ ⎛ r − 1 ⎞ ⎛ r ⎞⎛ r − 2 ⎞
Pr ⎜ ∪ Ai ⎟ = r ⎜
⎟ +
⎟ − ⎜ ⎟⎜
⎝ i =1 ⎠ ⎝ r ⎠ ⎝ 2 ⎠⎝ r ⎠
n
r −1
⎛ r ⎞⎛
j⎞
= ∑ (−1) j +1 ⎜ ⎟⎜1 − ⎟
j =1
⎝ j ⎠⎝ r ⎠
n
•
14
n
⎛ r ⎞⎛ 1 ⎞
+ (−1) r ⎜
⎟⎜ ⎟
⎝ r − 1⎠⎝ r ⎠
n
r
r −1
⎛ r ⎞⎛
j⎞
p = 1 − Pr ⎛⎜ ∪ Ai ⎞⎟ = ∑ (−1) j ⎜ ⎟⎜1 − ⎟
⎝ i =1 ⎠ j = 0
⎝ j ⎠⎝ r ⎠
n
Andy Guo
Law of Total Probability
k
• B1,…, Bk form a partition of S if B1,…, Bk are disjoint and ∪ Bi = S .
i =1
• Suppose that the events B1,…, Bk form a partition of the space S and Pr( B j ) > 0
k
for j = 1,…, k. Then, for every event A in S, Pr( A) = ∑ Pr( B j ) Pr( A |B j )
j =1
• Proof: As shown in the figure, we can write
A = ( B1 A) ∪ ( B2 A) ∪
∪ ( Bk A)
k
k
j =1
j =1
Pr( A) = ∑ Pr( B j A) = ∑ Pr( B j ) Pr( A | B j )
15
Andy Guo
Example 2.3.3: Selecting Bolts
• Two boxes contain long bolts and short bolts. Suppose that one box contains 60
long bolts and 40 short bolts, and that the other box contains 10 long bolts and
20 short bolts.
• Suppose that one box is selected at random and a bolt is then selected from the
box. Determine the probability that this bolt is long.
• Let B1 be the event that the first box is selected; Let B2 be the event that the
second box is selected; and let A be the event that a long bolt is selected. Then
Pr( A) = Pr( B1 ) Pr( A | B1 ) + Pr( B2 ) Pr( A | B2 )
1 3 1 1 7
= ⋅ + ⋅ =
2 5 2 3 15
16
Andy Guo
Example 2.3.5: Achieving a High Score
• Suppose that a person plays a game in which his score must be one of the 50
numbers 1, 2,…, 50. The first time he plays the game, his score is X. He then
continues to play the game until he obtains another score Y such that Y ≥ X .
Determine the probability of the event A that Y = 50.
• Let Bi be the event that X = i. Then
1
Pr( A | Bi ) = Pr(Y = 50 | Bi ) =
51 − i
• Since the probability of each of the 50 values of X is 1/50, it follows that
Pr(Bi) = 1/50 for all i and
50 1
1
1
1 1
1
Pr( A) = ∑ ⋅
= (1 + + + + ) = 0.09
50
2 3
50
i =1 50 51 − i
17
Andy Guo
Bayes’ Theorem
• Let the events B1,…, Bk form a partition of the space S such that the
prior probabilities Pr(Bj) > 0 for j = 1,…, k, and let A be an event such
that Pr(A) > 0. Then, for i=1,…, k, the posterior probabilities
Pr( Bi | A) =
Pr( Bi ) Pr( A | Bi )
k
∑ Pr( B j ) Pr( A | B j )
j =1
• Proof: Pr( Bi | A) =
Pr( Bi A)
Pr( Bi ) Pr( A | Bi )
= k
Pr( A)
∑ Pr( B j ) Pr( A | B j )
j =1
• Conditional version of Bayes’ theorem
Pr( Bi | AC ) =
Pr( Bi | C ) Pr( A | Bi C )
k
∑ Pr( B j | C ) Pr( A | B j C )
j =1
18
Andy Guo
Example 2.3.7: Identifying Source of Defective Item
• Three different machines M1, M2, and M3 were used for producing a large batch
of similar manufactured items. Suppose that 20% were produced by M1, 30% by
M2, and 50% by M3. Suppose further that 1% produced by M1 are defective, that
2% produced by M2 are defective, and that 3% produced by M3 are defective.
• Suppose one item is selected from the entire batch and it is defective. Determine
the probability that this item was produced by machine M2.
• Let Bi be the event that the selected item was produced by machine Mi and let A
be the event that the selected item is defective.
Pr( B1 ) = 0.2 Pr( B2 ) = 0.3 Pr( B3 ) = 0.5
Pr( A | B1 ) = 0.01
Pr( B2 | A) =
Pr( A | B2 ) = 0.02 Pr( A | B3 ) = 0.03
Pr( B2 ) Pr( A | B2 )
3
∑ Pr( B j ) Pr( A | B j )
=
(0.3)(0.02)
= 0.26
(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)
j =1
19
Andy Guo
Example 2.3.8: Identifying Genotypes
• Consider a gene that has two alleles A and a. We shall call the phenotype
exhibited by individuals with genotype AA and Aa the dominant trait, and the
other trait called the recessive trait. Let E be the event that the observed
individual has the dominant trait. We would like to revise our opinion of the
possible genotypes of the parents.
• There are six possible genotype combinations, B1,…, B6, for the parents.
Parental genotypes
Name of event
Probability of Bi
Pr(E|Bi)
(AA,AA) (AA,Aa) (AA,aa) (Aa,Aa) (Aa,aa) (aa,aa)
B1
B2
B3
B4
B5
B6
1/16
1/4
1/8
1/4
1/4
1/16
1
1
1
3/4
1/2
0
1
1
1
3
⎛ 1 ⎞⎛ 3 ⎞ ⎛ 1 ⎞⎛ 1 ⎞ 1
(1) + (1) + (1) + ⎜ ⎟⎜ ⎟ + ⎜ ⎟⎜ ⎟ + (0) =
16
4
8
4
i =1
⎝ 4 ⎠⎝ 4 ⎠ ⎝ 4 ⎠⎝ 2 ⎠ 16
(1 / 16) × 1 1
(1 / 4) × (1 / 2) 1
∴ Pr( B1 | E ) =
=
=
Pr( B5 | E ) =
(3 / 4)
12
(3 / 4)
6
5
•
20
Pr( E ) = ∑ Pr( Bi ) Pr( E | Bi ) =
Andy Guo
Computation of Posterior Probability in
More than One Stage
• Suppose that a box contains one fair coin and one coin with a head on each side.
Suppose also that one coin is selected at random and that when it is tossed, a
head is obtained. Determine the probability that the coin is the fair coin.
• Let B1 be the event that coin is fair; let B2 be the event that the coin has two
heads; and let H1 be the event that a head is obtained.
Pr( B1 | H 1 ) =
Pr( B1 ) Pr( H 1 | B1 )
(1 2)(1 2)
1
=
=
Pr( B1 ) Pr( H 1 | B1 ) + Pr( B2 ) Pr( H 1 | B2 ) (1 2 )(1 2 ) + (1 2 )(1) 3
• Now suppose that the same coin is tossed again. Suppose that another head is
obtained. Determine the new value of the posterior probability that the coin is
fair.
21
Andy Guo
Computation of Posterior Probability in
More than One Stage
• The first method (consider two stages simultaneously)
Pr( B1 ) = Pr( B2 ) = 1 2
Pr( B1 | H1 H 2 ) =
Pr( H1H 2 | B2 ) = 1 2 × 1 2 = 1 4
(1 2)(1 4)
Pr( B1 ) Pr( H1 H 2 | B1 )
1
=
=
(
)(
)
(
)(
)
Pr( B1 ) Pr( H1 H 2 | B1 ) + Pr( B2 ) Pr( H1H 2 | B2 ) 1 2 1 4 + 1 2 1 5
• The second method (use 1st stage posterior probabilities as 2nd stage prior probabilities)
The new prior probabilities Pr( B1 | H 1 ) = 1 3 , and Pr( B2 | H 1 ) = 2 3
Also, Pr( H 2 | B1 H 1 ) = Pr( H 2 | B1 ) = 1 2 , and Pr( H 2 | B2 H 1 ) = 1
Pr( B1 | H1 H 2 ) =
=
22
Pr( B1 | H1 ) Pr( H 2 | B1H1 )
Pr( B1 | H1 ) Pr( H 2 | B1H1 ) + Pr( B2 | H1 ) Pr( H 2 | B2 H1 )
(1 3)(1 2)
1
=
(1 3)(1 2) + (2 3)(1) 5
Andy Guo
Example 2.3.9: Learning about a Probability
• Suppose that a machine produces a defective item with probability p and
produces a nondefective item with probability q = 1 - p. Suppose six items
produced are selected and two of the six items are defective.
• Prior to the inspection, the prior probability that p = 0.01 is 0.9, and p = 0.4 is
0.1. What is the posterior probability that p = 0.01 after the inspection?
• Let B1 = {p = 0.01} and B2 = {p = 0.4}. Let A be the event that two defectives
occur in a random sample of size six.
⎛ 6⎞
Pr( A | B1 ) = ⎜ ⎟0.012 0.99 4 = 1.44 × 10 −3
⎝ 2⎠
⎛ 6⎞
Pr( A | B2 ) = ⎜ ⎟0.4 2 0.6 4 = 0.311
⎝ 2⎠
0.9 × 1.44 × 10 −3
Pr( B1 | A) =
= 0.04
0.9 × 1.44 × 10 −3 + 0.1× 0.311
23
Andy Guo
Example 2.3.10: A Clinical Trial
• Consider a clinical trial such as a study of treatment for depression. Let Ei be the event
that the ith patient has success as his outcome; let Bj be the event that p = ( j – 1 )/10
for j = 1,…, 11, where p is the probability of success.
• We also model the patients as conditionally independent given each value of p, and we
set Pr( Ei | B j ) = ( j − 1) 10 for all i, j. We also assume that Pr(Bj) = 1/11 for all j prior to
the start of the trial. Determine p by computing posterior probabilities for the Bj events
after each patient finishes the trial.
• Consider the first patient
11 1 j − 1
1
=
Pr( E1 ) = ∑
2
j =111 10
Pr( E1 | B j ) Pr( B j ) 2( j − 1) j − 1
=
=
Pr( B j | E1 ) =
12
10 × 11
55
• After observing one success, the posterior probabilities of large values of p are higher
than their prior probabilities and those of low values of p are lower than their prior
probabilities.
24
Andy Guo
Example 2.3.10: A Clinical Trial
• Consider now 40 patients are observed and let A stand for the observed event that 22 of
them are successes and 18 are failures.
22
18
j − 1⎞
1 ⎛ 40 ⎞⎛ j − 1 ⎞ ⎛
−
1
⎜
⎟
⎟
⎟ ⎜
⎜
22
18
11 ⎝ 22 ⎠⎝ 10 ⎠ ⎝
10 ⎠
⎛ 40 ⎞⎛ j − 1 ⎞ ⎡ j − 1⎤
B
A
Pr( A | B j ) = ⎜ ⎟⎜
Pr(
|
)
=
⎟ ⎢1 −
j
18
11 1 ⎛ 40 ⎞ i − 1 22
10 ⎥⎦
⎝ 22 ⎠⎝ 10 ⎠ ⎣
⎞ ⎛ i − 1⎞
⎛
∑ ⎜ ⎟⎜ ⎟ ⎜1 −
⎟
10 ⎠
i =1 11 ⎝ 22 ⎠⎝ 10 ⎠ ⎝
• We show the posterior probabilities of the 11 partition elements after observing A. Notice
that B6 and B7 have the highest probabilities because they are close to 22/40 = 0.55.
25
Andy Guo
Example 2.3.10: A Clinical Trial
• We can compute the probability that the next patient will be a success
both before the trial and after the 40 patients.
Before the trial: Pr( E 41 ) = Pr( E1 ) = 1 2
11
After the trial:
Pr( E41 | A) = ∑ Pr( E41 | B j A) Pr( B j | A)
j =1
11
= ∑ Pr( E41 | B j ) Pr( B j | A)
j =1
11
⎛ j − 1⎞
= ∑⎜
⎟ Pr( B j | A) = 0.5476
j =1 ⎝ 10 ⎠
26
Andy Guo
Example 2.3.11: The Effect of Prior Probabilities
• Suppose that a different prior probabilities are assumed.
Event
B1
B2 B3 B4
B5 B6 B7
B8
B9 B10 B11
p
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Prior Pr. 0.00 0.39 0.20 0.13 0.09 0.07 0.05 0.03 0.02 0.02 0.00
• The following posterior probabilities are obtained after 25 patients and compared with
the posterior values (marked by X) calculated based on Pr(Bj) = 1/11. Notice that the
two sets of posterior probabilities are very close.
27
Andy Guo
The Gambler’s Ruin Problem
• Suppose that two gamblers A and B are playing a game against each other. On
each play of the game A will win one dollar from gambler B with probability
p and B will win one dollar from A with probability q = 1-p.
• Suppose the initial fortune of A is i dollars and the initial fortune of B is k-i
dollars. The gamblers continue playing the game until the fortune of one of
them has been reduced to 0 dollars.
• Let ai denote the probability that the fortune of A will reach k dollars before it
reaches 0 dollars. Let A1 denote the event that A wins one dollar on the first
play; let B1 denote the event that A loses one dollar on the first play; and let W
denote the event that the fortune of A ultimately reaches k dollars. Then,
Pr(W ) = Pr( A1 ) Pr(W | A1 ) + Pr( B1 ) Pr(W | B1 )
= p Pr(W | A1 ) + q Pr(W | B1 )
ai = pai +1 + qai −1
28
Andy Guo
The Gambler’s Ruin Problem
• Since a0 = 0 and ak = 1, we obtain the following k – 1 equations:
a1 = pa2
a2 = pa3 + qa1
ak -1 = p + qak − 2
• Let ai on the left side of the ith equation be rewritten as pai + qai,
a2 − a1 =
q
a1
p
a3 − a2 =
q
⎛q⎞
( a2 − a1 ) = ⎜ ⎟ a1
p
⎝ p⎠
1 − ak −1 =
q
⎛q⎞
(ak −1 − ak −2 ) = ⎜ ⎟
p
⎝ p⎠
2
k −1
a1
k −1 ⎛ q ⎞
∴1 − a1 = a1 ∑ ⎜ ⎟
i =1 ⎝ p ⎠
i
29
Andy Guo
Examples 2.5.1 &2.5.2: Gambler’s Ruin Problem
i
• If p = q =1/2, then q / p = 1, and we obtain the complete solution ai =
k
for i = 1,…, k-1.
• Suppose p ≠ q , then
k
⎛q⎞ ⎛q⎞
⎜ ⎟ −⎜ ⎟
p
p
1 − a1 = a1 ⎝ ⎠ ⎝ ⎠
⎛q⎞
⎜ ⎟ −1
⎝ p⎠
⎛q⎞
⎜ ⎟ −1
p
a1 = ⎝ ⎠k
⎛q⎞
⎜ ⎟ −1
⎝ p⎠
i
⎛q⎞
⎜ ⎟ −1
p
∴ ai = ⎝ ⎠k
⎛q⎞
⎜ ⎟ −1
⎝ p⎠
• For example, if p = 0.4, q = 0.6, i = 99 and k = 100
ai =
30
(3 2)99 − 1 2
≈
(3 2)100 − 1 3
Andy Guo
© Copyright 2026 Paperzz