Probability Theory
The Law of Total Probability

Review

Last time, we derived Bayes’ theorem from the definition of conditional probability:

Pr(h | e) = [Pr(e | h)∙Pr(h)] / Pr(e)

We then saw an example and remarked on the connection to Hume’s problem of induction.

Total Probability

Bayes’ Theorem is pretty great, but how do we get the value for Pr(e)? The answer is the Law of Total Probability:

Pr(e) = Σi Pr(e | hi)∙Pr(hi)

Consider the dice example from last time. I have three dice in a bag. If I choose one die and toss it, what is the probability that I roll a four?

To answer the question, we use the law of total probability. What are the hi’s and what is e?

h4 = I chose the four-sided die.
h8 = I chose the eight-sided die.
h20 = I chose the twenty-sided die.
e = I rolled a four.

Pr(h4) = Pr(h8) = Pr(h20) = 1/3
Pr(e | h4) = 1/4
Pr(e | h8) = 1/8
Pr(e | h20) = 1/20

Pr(e) = Pr(e | h4)∙Pr(h4) + Pr(e | h8)∙Pr(h8) + Pr(e | h20)∙Pr(h20)
      = (1/4)∙(1/3) + (1/8)∙(1/3) + (1/20)∙(1/3)
      = (1/12) + (1/24) + (1/60)
      = (10/120) + (5/120) + (2/120)
      = 17/120 ≈ 0.142

A partition of a set S is a collection of nonoverlapping sets that completely cover (or exhaust) the set S. Every element in S appears in exactly one set in the partition.

[Figure: the set S, divided into five nonoverlapping regions labeled A, B, C, D, and E.]

Formally, a partition of a set S is a collection of sets A1, …, An satisfying two conditions:

(1) The sets are pairwise disjoint: Ai ∩ Aj = Ø for all i and j with i ≠ j.
(2) The union of the Ai over all i is equal to the set S.

Let U be the universe of discourse, and suppose that the sets A1, …, An form a partition of U.
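For small finite sets, the two partition conditions can be checked mechanically. The following is a minimal sketch; the function name `is_partition` and the example sets are made up for illustration and do not come from the lecture.

```python
from itertools import combinations


def is_partition(blocks, whole):
    """Return True if `blocks` are pairwise disjoint and their union is `whole`."""
    # Condition (1): no two distinct blocks share an element.
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    # Condition (2): together the blocks cover exactly the whole set.
    covers = set().union(*blocks) == set(whole)
    return pairwise_disjoint and covers


# Hypothetical ten-element set S and three candidate collections.
S = set(range(1, 11))
good = [{1, 2}, {3, 4, 5}, {6}, {7, 8}, {9, 10}]
overlapping = [{1, 2, 3}, {3, 4, 5}, {6}, {7, 8}, {9, 10}]  # 3 appears twice
incomplete = [{1, 2}, {3, 4, 5}, {6}, {7, 8}]  # 9 and 10 are missing

print(is_partition(good, S))         # True
print(is_partition(overlapping, S))  # False
print(is_partition(incomplete, S))   # False
```
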
The law of total probability says that for any event B, the following equation holds:

Pr(B) = Pr(B | A1)∙Pr(A1) + … + Pr(B | An)∙Pr(An)

Let’s see how the law of total probability applies to our example of the set S and partitioning sets A, B, C, D, and E. We want the probability of an arbitrary set X.

[Figure: the set S partitioned into A, B, C, D, and E, with the set X drawn across the regions; the pieces A ∩ X and B ∩ X are labeled in turn.]

Since X = (A ∩ X) ∪ (B ∩ X) ∪ … ∪ (E ∩ X),

Pr(X) = Pr((A ∩ X) ∪ (B ∩ X) ∪ … ∪ (E ∩ X)).

Sets A, B, …, E are pairwise disjoint, so the pieces A ∩ X, B ∩ X, …, E ∩ X are pairwise disjoint as well, and we have

Pr(X) = Pr(A ∩ X) + Pr(B ∩ X) + … + Pr(E ∩ X)

by Finite Additivity. By definition of conditional probability, we get Pr(A ∩ X) = Pr(X | A)·Pr(A). Similarly for B, …, E. Substituting gives

Pr(X) = Pr(X | A)·Pr(A) + Pr(X | B)·Pr(B) + … + Pr(X | E)·Pr(E).

In the special case where we are considering only a hypothesis and its negation, we have:

Pr(e) = Pr(e | h)∙Pr(h) + Pr(e | ~h)∙Pr(~h)

Review: Total Probability

Suppose that in a certain field, there are two varieties, A and B, of a grassy plant. Each plant grows to be tall or short. Plants of type A grow to be tall with probability 0.8, while plants of type B grow to be tall with probability 0.4.

What is the probability that a randomly selected plant grows to be tall if each variety is equally likely to be chosen?
Pr(tall) = Pr(tall | A)·Pr(A) + Pr(tall | B)·Pr(B)
         = 0.8 · 0.5 + 0.4 · 0.5
         = 0.4 + 0.2
         = 0.6

What if they are not equally likely to be selected but plants of the A-variety have probability 0.2 of being selected?

Pr(tall) = Pr(tall | A)·Pr(A) + Pr(tall | B)·Pr(B)
         = 0.8 · 0.2 + 0.4 · 0.8
         = 0.16 + 0.32
         = 0.48

Base Rate Neglect

Since this is Wednesday, let’s think about a common error that people make in reasoning about probabilities. People neglect base rates.

Suppose you have a 99% accurate drug test, balanced for false positives and false negatives. So, 99% of users test positive, and 99% of non-users test negative. Also, 1% of non-users test positive, and 1% of users test negative.

Joe tests positive for drug use. What is the probability that Joe is a drug user? You cannot answer the question unless you know the prior probability that Joe is a drug user.

Pr(h | e) = [Pr(e | h)∙Pr(h)] / Pr(e)

The term Pr(h) is the prior probability. The prior probability is also called the base rate.

According to the National Survey on Drug Use and Health, about 9% of people aged 12 or over are drug users.

Claim. The posterior probability that Joe is a drug user is about 91%.

Let …
h = Joe is a drug user.
~h = Joe is not a drug user.
e = Joe tests positive for drug use.
~e = Joe tests negative for drug use.
Pr(h) = 9/100
Pr(e | h) = 99/100
Pr(~h) = 91/100
Pr(e | ~h) = 1/100

Pr(e | h)∙Pr(h) = 891/(100∙100)

Pr(e) = Pr(e | h)∙Pr(h) + Pr(e | ~h)∙Pr(~h)
      = 891/(100∙100) + 91/(100∙100)

So, Pr(h | e) = 891 / (891 + 91) ≈ 0.907

The probability that Joe is a drug user given that he tested positive is approximately 91%, despite the fact that the test is 99% accurate.

Now, suppose that we only care about whether Joe uses methamphetamine. According to the NSDUH, only 0.1% of people aged 12 or over use meth.

Since the base rate is so low, even with a 99% reliable test, the posterior probability is low. The posterior probability that Joe uses meth given his positive test is only about 9%.

These numbers probably still look weird, so let’s look at them another way.

100 out of every 100,000 people over the age of 12 use meth. Now, 99 out of every 100 people who use meth test positive for meth use, and 1 out of every 100 people who do not use meth test positive for meth use. Everyone else tests negative. If Joe tests positive, how likely is he to be a meth user?

Let …
h = Joe is a meth user.
~h = Joe is not a meth user.
e = Joe tests positive for meth use.
~e = Joe tests negative for meth use.

Pr(h) = 1/1000
Pr(e | h) = 99/100
Pr(~h) = 999/1000
Pr(e | ~h) = 1/100

Pr(e | h)∙Pr(h) = 99/(100∙1000)

Pr(e) = Pr(e | h)∙Pr(h) + Pr(e | ~h)∙Pr(~h)
      = 99/(100∙1000) + 999/(100∙1000)

So, Pr(h | e) = 99 / (99 + 999) ≈ 0.09

The probability that Joe is a meth user given that he tested positive is only approximately 9%, despite the fact that the test is 99% accurate!

Next Time

We will talk about interpretations of probability.
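Both drug-test calculations above follow the same pattern: the law of total probability supplies Pr(e), and Bayes’ theorem then gives Pr(h | e). The following is a minimal sketch of that pattern using exact fractions. The function name `posterior` and its parameter names are chosen here for illustration; the numbers are the ones used in the two examples.

```python
from fractions import Fraction


def posterior(prior, true_positive_rate, false_positive_rate):
    """Pr(h | e) from Pr(h), Pr(e | h), and Pr(e | ~h)."""
    # Law of total probability: Pr(e) = Pr(e | h)·Pr(h) + Pr(e | ~h)·Pr(~h)
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: Pr(h | e) = Pr(e | h)·Pr(h) / Pr(e)
    return true_positive_rate * prior / evidence


# Base rate 9% (any drug use) versus base rate 0.1% (meth), same 99% test.
any_drug = posterior(Fraction(9, 100), Fraction(99, 100), Fraction(1, 100))
meth = posterior(Fraction(1, 1000), Fraction(99, 100), Fraction(1, 100))

print(any_drug, round(float(any_drug), 3))
print(meth, round(float(meth), 3))
```

Holding the test fixed and changing only the `prior` argument shows how strongly the posterior depends on the base rate.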