Probability Theory - Jonathan Livengood

Probability Theory
The Law of Total Probability
Review
Last time, we derived Bayes’ theorem from the
definition of conditional probability.
Pr(h | e) = [Pr(e | h)∙Pr(h)] / Pr(e)
We then saw an example and remarked on the
connection to Hume’s problem of induction.
Total Probability
Bayes’ Theorem is pretty great, but how do we
get the value for Pr(e)?
The answer is the Law of Total Probability.
Pr(e) = Σ_i Pr(e | hi)∙Pr(hi)
Consider the dice example from last time. I have
three dice in a bag: a four-sided die, an eight-sided
die, and a twenty-sided die.
If I choose one die at random and toss it, what is the
probability that I roll a four?
To answer the question, we use the law of total
probability. What are the hi’s and what is e?
h4 = I chose the four-sided die.
h8 = I chose the eight-sided die.
h20 = I chose the twenty-sided die.
e = I rolled a four.
Pr(h4) = Pr(h8) = Pr(h20) = 1/3
Pr(e | h4) = 1/4
Pr(e | h8) = 1/8
Pr(e | h20) = 1/20
Pr(e) = Pr(e | h4)∙Pr(h4) + Pr(e | h8)∙Pr(h8) +
Pr(e | h20)∙Pr(h20)
= (1/4)∙(1/3) + (1/8)∙(1/3) + (1/20)∙(1/3)
= (1/12) + (1/24) + (1/60)
= (10/120) + (5/120) + (2/120)
= 17 / 120
≈ 0.142
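The arithmetic above can be checked with exact fractions. The following is only a sketch: the variable names (`priors`, `likelihoods`, `pr_four`) are illustrative choices, and each die is assumed to be fair.

```python
from fractions import Fraction

# Prior probability of drawing each die, keyed by its number of sides.
priors = {4: Fraction(1, 3), 8: Fraction(1, 3), 20: Fraction(1, 3)}

# Likelihood of rolling a four on a fair die with that many sides (assumed fair).
likelihoods = {sides: Fraction(1, sides) for sides in priors}

# Law of total probability: add up Pr(e | hi) * Pr(hi) over all hypotheses.
pr_four = sum(likelihoods[sides] * priors[sides] for sides in priors)

print(pr_four)                   # 17/120
print(round(float(pr_four), 3))  # 0.142
```

Using `Fraction` rather than floating-point numbers keeps the result in the same form as the hand calculation.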
A partition of a set S is a collection of non-overlapping sets that completely cover (or
exhaust) the set S.
Every element in S appears in exactly
one set in the partition.
[Figure: the set S.]
[Figure: the set S divided into five non-overlapping regions labeled A, B, C, D, and E.]
Formally, a partition of a set S is a collection of
sets A1, …, An satisfying two conditions:
(1) The sets are pairwise disjoint: Ai ∩ Aj = Ø for all i ≠ j.
(2) The union of the Ai over all i is equal to the set S.
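The two conditions can be tested directly for finite sets. Here is a minimal sketch, assuming the sets are given as Python `set` objects. The function name `is_partition` and the example sets are hypothetical and serve only as illustration.

```python
from itertools import combinations


def is_partition(blocks, s):
    """Return True if the sets in `blocks` satisfy conditions (1) and (2) for `s`."""
    # Condition (1): every pair of distinct blocks has an empty intersection.
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    # Condition (2): the union of all the blocks is exactly s.
    covers = set().union(*blocks) == s
    return pairwise_disjoint and covers


S = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 2}, {3}, {4, 5, 6}], S))     # True
print(is_partition([{1, 2}, {2, 3}, {4, 5, 6}], S))  # False: 2 appears twice
print(is_partition([{1, 2}, {4, 5, 6}], S))          # False: 3 is not covered
```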
Let U be the universe of discourse, and suppose
that the sets A1, …, An form a partition of U.
The law of total probability says that for any
event B, the following equation holds:
Pr(B) = Pr(B | A1)∙Pr(A1) + … + Pr(B | An)∙Pr(An)
Let’s see how the law of total probability applies
to our example of the set S and partitioning sets
A, B, C, D, and E.
We want the probability of an
arbitrary set X.
[Figure: the set S partitioned into regions A, B, C, D, and E, with the set X drawn across the regions. The overlaps A ∩ X and B ∩ X are labeled in turn.]
Since X = (A ∩ X) ∪ (B ∩ X) ∪ … ∪ (E ∩ X),
Pr(X) = Pr((A ∩ X) ∪ (B ∩ X) ∪ … ∪ (E ∩ X)).
Sets A, B, …, E are pairwise disjoint, so their
intersections with X are pairwise disjoint as well, and we have
Pr(X) = Pr(A ∩ X) + Pr(B ∩ X) + … + Pr(E ∩ X)
by Finite Additivity.
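By definition of conditional probability, we get
Pr(A ∩ X) = Pr(X | A)·Pr(A). Similarly for B, …, E.
Substituting into the sum gives the law of total probability:
Pr(X) = Pr(X | A)·Pr(A) + Pr(X | B)·Pr(B) + … + Pr(X | E)·Pr(E)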
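The steps of this derivation can be checked numerically on a small example. The sketch below assumes a hypothetical sample space of twelve equally likely outcomes. The particular sets chosen for A through E and for X are invented for illustration and do not come from the figure.

```python
from fractions import Fraction

# Hypothetical sample space: 12 equally likely outcomes.
S = set(range(1, 13))
# A hypothetical partition of S into five regions, plus an arbitrary set X.
partition = {"A": {1, 2, 3}, "B": {4, 5}, "C": {6, 7, 8, 9}, "D": {10}, "E": {11, 12}}
X = {2, 3, 4, 9, 12}


def pr(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


def pr_given(event, condition):
    """Conditional probability Pr(event | condition)."""
    return pr(event & condition) / pr(condition)


# Finite additivity: add the probabilities of the disjoint pieces of X.
by_additivity = sum(pr(block & X) for block in partition.values())
# Total probability: rewrite each piece as Pr(X | block) * Pr(block).
by_total_probability = sum(pr_given(X, block) * pr(block) for block in partition.values())

print(pr(X), by_additivity, by_total_probability)  # 5/12 5/12 5/12
```

Region D does not overlap X at all, so it contributes zero to both sums, which is consistent with the derivation.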
In the special case where we are considering
only an hypothesis and its negation, we have:
Pr(e) = Pr(e | h)∙Pr(h) + Pr(e | ~h)∙Pr(~h)
Review: Total Probability
Suppose that in a certain field, there are two
varieties, A and B, of a grassy plant. Each plant
grows to be tall or short. Plants of type A grow
to be tall with probability 0.8, while plants of
type B grow to be tall with probability 0.4.
What is the probability that a randomly
selected plant grows to be tall if each
variety is equally likely to be chosen?
Pr(tall) = Pr(tall | A)·Pr(A) + Pr(tall | B)·Pr(B)
= 0.8 · 0.5 + 0.4 · 0.5
= 0.4 + 0.2
= 0.6
What if they are not equally likely to be
selected but plants of the A-variety
have probability 0.2 of being selected?
Pr(tall) = Pr(tall | A)·Pr(A) + Pr(tall | B)·Pr(B)
= 0.8 · 0.2 + 0.4 · 0.8
= 0.16 + 0.32
= 0.48
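Both versions of the plant problem use the same formula with a different value for Pr(A). A short sketch makes that dependence explicit. The function name `pr_tall` and its parameter names are illustrative assumptions.

```python
def pr_tall(pr_a, tall_given_a=0.8, tall_given_b=0.4):
    """Pr(tall) when variety A is selected with probability pr_a, otherwise B."""
    pr_b = 1 - pr_a  # A and B are the only two varieties, so they partition the field
    return tall_given_a * pr_a + tall_given_b * pr_b


print(round(pr_tall(0.5), 2))  # 0.6
print(round(pr_tall(0.2), 2))  # 0.48
```

The results are rounded only to hide floating-point noise. The underlying values match the hand calculations.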
Base Rate Neglect
Since this is Wednesday, let’s think about a
common error that people make in reasoning
about probabilities. People neglect base rates.
Suppose you have a 99% accurate drug test,
balanced for false positives and false negatives.
So, 99% of users test positive, and
99% of non-users test negative.
Also, 1% of non-users test positive,
and 1% of users test negative.
Joe tests positive for drug use. What is the
probability that Joe is a drug user?
You cannot answer the question
unless you know the prior
probability that Joe is a drug user.
Pr(h | e) = [Pr(e | h)∙Pr(h)] / Pr(e)
In this formula, Pr(h) is the prior probability.
The prior probability is also
called the base rate.
According to the National Survey on Drug Use
and Health (NSDUH), about 9% of people aged 12 or over
are drug users.
Claim. The posterior probability
that Joe is a drug user, given his positive test, is about 91%.
Let …
h = Joe is a drug user.
~h = Joe is not a drug user.
e = Joe tests positive for drug use.
~e = Joe tests negative for drug use.
Pr(h) = 9/100
Pr(e | h) = 99/100
Pr(~h) = 91/100
Pr(e | ~h) = 1/100
Pr(e | h)∙Pr(h) = 891/(100∙100)
Pr(e) = Pr(e | h)∙Pr(h) + Pr(e | ~h)∙Pr(~h)
= 891/(100∙100) + 91/(100∙100)
So, Pr(h | e) = 891 / (891 + 91)
≈ 0.907
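The same calculation can be written as a small function that combines Bayes' theorem with the law of total probability. This is a sketch only: the name `posterior` and its parameter names are assumptions, and the 50% base rate in the second call is a hypothetical value added purely for contrast.

```python
def posterior(prior, pr_pos_given_h, pr_pos_given_not_h):
    """Pr(h | e) for a positive test e, computed with Bayes' theorem."""
    # Law of total probability over h and ~h gives Pr(e).
    pr_pos = pr_pos_given_h * prior + pr_pos_given_not_h * (1 - prior)
    return pr_pos_given_h * prior / pr_pos


# Base rate of 9%, as in the example above.
print(round(posterior(0.09, 0.99, 0.01), 3))  # 0.907
# Hypothetical base rate of 50%, for comparison only.
print(round(posterior(0.50, 0.99, 0.01), 3))  # 0.99
```

Only when the base rate is 50% does the posterior equal the test's accuracy. With any other base rate the two come apart.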
The probability that Joe is a drug user
given that he tested positive is
approximately 91%, despite the fact that
the test is 99% accurate.
Now, suppose that we only care about whether
Joe uses methamphetamine.
According to the NSDUH, only 0.1% of
people aged 12 or over use meth.
Since the base rate is so low, even with a 99%
reliable test, the posterior probability is low.
The posterior probability that Joe uses
meth given his positive test is only 9%.
These numbers probably still look weird,
so let’s look at them another way.
100 out of every 100,000 people aged 12 or over
use meth. Now, 99 out of every 100 people
who use meth test positive for meth use, and 1
out of every 100 people who do not use meth
test positive for meth use. Everyone else tests
negative. If Joe tests positive, how likely is he to
be a meth user?
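Counting people instead of multiplying probabilities gives the answer directly. The sketch below follows the frequencies just stated for a population of 100,000. The variable names are illustrative.

```python
population = 100_000

# 100 out of every 100,000 people use meth.
users = population * 100 // 100_000
non_users = population - users

# 99 out of every 100 users test positive.
true_positives = users * 99 // 100
# 1 out of every 100 non-users tests positive.
false_positives = non_users * 1 // 100

all_positives = true_positives + false_positives
share_of_users = true_positives / all_positives

print(true_positives, false_positives, all_positives)  # 99 999 1098
print(round(share_of_users, 3))                        # 0.09
```

Of the 1,098 people who test positive, only 99 are users. The false positives outnumber the true positives because the non-users are such a large group.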
Let …
h = Joe is a meth user.
~h = Joe is not a meth user.
e = Joe tests positive for meth use.
~e = Joe tests negative for meth use.
Pr(h) = 1/1000
Pr(e | h) = 99/100
Pr(~h) = 999/1000
Pr(e | ~h) = 1/100
Pr(e | h)∙Pr(h) = 99/(100∙1000)
Pr(e) = Pr(e | h)∙Pr(h) + Pr(e | ~h)∙Pr(~h)
= 99/(100∙1000) + 999/(100∙1000)
So, Pr(h | e) = 99 / (999 + 99)
≈ 0.09
The probability that Joe is a meth user
given that he tested positive is only
approximately 9%, despite the fact that
the test is 99% accurate!
Next Time
We will talk about interpretations of probability.