Week 2: Conditional probability.

Conditional Probability
Arthur White∗
3rd October 2016
If we restrict our attention to a subset of possible outcomes relating to an experiment,
and calculate the probabilities of an event relating to this subset, then the probabilities so
calculated are said to be conditional.
Example 1
In a delivery of 1,000 screws, 140 were badly threaded, 3/4 of these also being rusty. Two
hundred and fifty screws in total were rusty.
i). What is the probability of any screw, picked at random, being rusty (R) and badly
threaded (BT )?
ii). What is the probability any rusty screw, picked at random, will be badly threaded?
iii). What is the probability of a properly threaded screw, picked at random, being rusty?
(Here, properly threaded simply means not badly threaded, or $\overline{BT}$.)
[Venn diagram of the 1,000 screws: badly threaded only, 35; badly threaded and rusty, 105; rusty only, 145; neither, 715.]
i). In this part we are referring to the entire set of screws, and we do not need to restrict our attention to a subset: P(R ∩ BT) = 105/1,000 = 0.105.
∗ Based extensively on material previously taught by Eamonn Mullins.
ii). In this part we are only interested in those screws which are rusty; the other 750
are of no interest. Restricting ourselves to this subset, we can proceed as before:
P(BT |R) = 105/250 = 0.42.
iii). $P(R \mid \overline{BT}) = 145/860 = 0.17$.
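The three answers above can be checked numerically. A minimal Python sketch, working directly from the counts in the delivery (the variable names are my own):

```python
# Counts from the delivery of 1,000 screws.
total = 1000
badly_threaded = 140
both = 105    # 3/4 of the badly threaded screws are also rusty
rusty = 250

# i). P(R and BT): fraction of all screws that are rusty AND badly threaded.
p_r_and_bt = both / total                          # 105/1000 = 0.105

# ii). P(BT | R): restrict attention to the 250 rusty screws.
p_bt_given_r = both / rusty                        # 105/250 = 0.42

# iii). P(R | not BT): restrict to the 860 properly threaded screws,
# of which 250 - 105 = 145 are rusty.
properly_threaded = total - badly_threaded         # 860
rusty_only = rusty - both                          # 145
p_r_given_not_bt = rusty_only / properly_threaded  # 145/860, about 0.17

print(p_r_and_bt, p_bt_given_r, round(p_r_given_not_bt, 2))
```

Restricting the denominator to the conditioning subset is all that "conditional" means here: parts ii) and iii) reuse the same counts with a smaller reference set.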
Looking again at parts ii) and iii), we see that they can be written

$$P(BT \mid R) = \frac{105}{250} = \frac{105/1{,}000}{250/1{,}000} = \frac{P(BT \cap R)}{P(R)},$$

and

$$P(R \mid \overline{BT}) = \frac{145}{860} = \frac{145/1{,}000}{860/1{,}000} = \frac{P(R \cap \overline{BT})}{P(\overline{BT})}.$$
These are particular cases of the more general rule, for events A and B,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{or} \quad P(B \mid A) = \frac{P(B \cap A)}{P(A)}.$$
Note that a) the event being conditioned upon appears in the denominator, and b) in general,
P(A|B) ≠ P(B|A). E.g., P(R|BT) = 105/140, while P(BT|R) = 105/250.
Statistical Independence
We have already noted that two events are independent if the outcome of one does not affect
the probable outcome of the other. Symbolically, P(A|B) = P(A). We then have
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = P(A),$$
which gives
P(A ∩ B) = P(A)P(B),
i.e., if two events are statistically independent then the probability of their joint occurrence
equals the product of their probabilities.
More generally, if two events are not statistically independent, then the probability of
their jointly occurring takes the following form:
P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).
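The screw data from Example 1 give a concrete test of independence: compare P(R ∩ BT) with P(R)P(BT). A short Python sketch (variable names are my own):

```python
# Are "rusty" (R) and "badly threaded" (BT) independent in Example 1?
p_r = 250 / 1000      # P(R)
p_bt = 140 / 1000     # P(BT)
p_joint = 105 / 1000  # P(R and BT)

# Independence would require P(R and BT) = P(R) * P(BT).
independent = abs(p_joint - p_r * p_bt) < 1e-12
print(p_r * p_bt, p_joint, independent)  # 0.035 vs 0.105: not independent

# The general multiplication rule holds regardless of independence:
# P(R and BT) = P(BT | R) * P(R).
p_bt_given_r = p_joint / p_r
print(abs(p_bt_given_r * p_r - p_joint) < 1e-12)
```

Since 0.105 is three times 0.035, a randomly chosen screw is much more likely to be rusty once we know it is badly threaded: the events are clearly dependent.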
Example 2
In a factory, two machines I and II produce shoes. 60% of total production comes from I,
while 40% comes from II. 10% of shoes from I are defective (D), compared with 20% from
II. If we select a shoe at random, what is the probability that it is defective? (Alternatively,
we could ask the more practical question, “what proportion of shoes are defective overall?”)
[Diagram: the shoes are partitioned between machines I and II; the event Defective cuts across both regions.]
Letting H1 be the event that a shoe was produced by machine I, and H2 be the event
that a shoe was produced by machine II, we see that
$$P(D) = P(D \cap H_1) + P(D \cap H_2) = P(D \mid H_1)P(H_1) + P(D \mid H_2)P(H_2) = 0.1 \times 0.6 + 0.2 \times 0.4 = 0.14.$$
The proportion of good shoes is then $P(\overline{D}) = 1 - P(D) = 0.86$.
More generally, suppose the factory had K machines, with the event that a shoe was
produced by machine i set to be Hi , i = 1, . . . , K. Then
$$P(D) = P(D \cap H_1) + P(D \cap H_2) + \ldots + P(D \cap H_K) = \sum_{i=1}^{K} P(D \cap H_i) = \sum_{i=1}^{K} P(D \mid H_i)P(H_i).$$
Bayes’s Theorem
Continuing from the last example, suppose that we ask “if a shoe is defective, what is the
probability that it came from Machine I?” (“What proportion of defective shoes come
from the first machine?”)
$$P(H_1 \mid D) = \frac{P(H_1 \cap D)}{P(D)} = \frac{P(H_1 \cap D)}{\sum_{i=1}^{K} P(D \cap H_i)} = \frac{P(D \mid H_1)P(H_1)}{\sum_{i=1}^{K} P(D \mid H_i)P(H_i)} = \frac{0.06}{0.14} \approx 0.43.$$
If we regard $H_1$ as a prior event and D as a posterior event, then what we have done here
is to calculate the probability of the occurrence of one of a number (here, two) of prior events,
given that we know which posterior event has occurred. Many practical problems can be
modeled this way. The result
$$P(H_1 \mid D) = \frac{P(D \mid H_1)P(H_1)}{\sum_{i=1}^{K} P(D \mid H_i)P(H_i)}$$
is known as Bayes’s Theorem.
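The theorem can likewise be sketched as a short function, in the same style as the total-probability sketch (names are my own, not from the source):

```python
def bayes(k, defect_rates, shares):
    """Bayes's Theorem: P(H_k | D) = P(D|H_k) P(H_k) / sum_i P(D|H_i) P(H_i).

    defect_rates[i] is P(D | H_i); shares[i] is the prior P(H_i).
    """
    numerator = defect_rates[k] * shares[k]
    denominator = sum(d * h for d, h in zip(defect_rates, shares))
    return numerator / denominator

# Example: probability that a defective shoe came from Machine I (index 0).
p_h1_given_d = bayes(0, [0.1, 0.2], [0.6, 0.4])
print(round(p_h1_given_d, 2))  # 0.06 / 0.14, about 0.43
```

Note that the denominator is exactly the total probability P(D) = 0.14 computed in Example 2, so Bayes's Theorem simply rescales the joint probability P(D ∩ H_1) = 0.06 by it.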