1. Bayes Theorem

Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
1. Bayes Theorem
Before proceeding in the course, we have to introduce Bayes Theorem (or Bayes' Theorem).
This is because the field of statistics is in the midst of a gradual transition from the 'Classical
model" to a 'Bayesian model.' Without understanding the difference, i.e., what the Bayesian
model is, some of the concepts in later chapters will seem unclear, nonintuitive, or both.
Rev. Thomas Bayes (c. 1701 – 1761)
Importance of Bayes' Theorem
Bayes Theorem is a new way to conceptualize probabilistic inferences, with the potential to
fundamentally change how probabilistic thinking occurs in human culture.
•
A fundamental change of paradigm
•
A superior way of thinking about and using probabilities
•
Helps us to see that many important inferences are only probabilistic, not certain
•
Requires us to identify and explicitly quantify the components of our probabilistic
inferences.
•
Makes our everyday probabilistic inferences less vague and subjective
•
Potential to transform every discipline that draws inferences from uncertain
evidence (medicine, law, quality analysis…)
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
(Note: the classical model of statistics is also called the 'frequentist' model)
Ways to Interpret Bayes Theorem
•
•
a way of relating conditional probabilities P(A|B) and P(B|A)
an expression of how update beliefs in the face of new evidence
P(A)
P(A|B)
= prior probability (belief in truth of A before evidence B)
= posterior probability (belief in truth of A after evidence B)
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
Motivating Example 1: Ball-and-Urn Experiment
Two urns are filled with Red and Black marbles, in the proportions shown.
•
An experimenter hides the urns behind a curtain.
•
He chooses which urn to draw from based on a (fair) coin flip.
•
He then draws three marbles from the chosen urn, one marble at a time (with
replacement).
•
You don't see the coin flip or know from which urn he draws.
•
He tells you only the number of Red and Black marbles drawn (say, three Reds)
•
Estimate the probability that he drew from Urn 1.
We can solve this problem with use of Bayes Theorem.
Initial Considerations
Multiplication Rule
Conditional Probability
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
but
so
From multiplication rule:
)
giving us Bayes Theorem:
which we can alternatively express as:
or
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
Solution to Ball-and-Urn Problem
From Bayes Theorem:
From binomial distribution:
= .7
.7
.7 = .343
= .3
.3
.3 = .027
Given a fair coin, P(Urn 1) = P(Urn 2) = .5. Therefore:
So we're pretty certain that the experimenter drew the marbles from Urn 1. If he had drawn four
marbles, and all were Red, the probability of Urn 1 would be 0.967. It would take 6 red marbles
drawn in succession (out of 6 draws) before we could say with > 99% confidence that he drew
from Urn 1.
More Complex Uses of Bayes' Theorem
Even more elaborately, Bayes' Theorem could be used here to predict not only from which urn
the marbles were drawn, but the proportion of black and red marbles in each urn, and the prior
probability of the chooser picking urn 1 or urn 2 to draw from (if we didn't already know these
things).
Bayes' Theorem and Scientific Inference
Bayes' Theorem is of fundamental importance to all scientific investigation and statistical
inference:
Let:
H = a scientific hypothesis (or inference about a population parameter)
E = observed evidence (e.g. an experimental result, or a sample statistic)
In general, scientific inferences can occur in either of two directions:
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
•
•
Estimating P(H | E), i.e., drawing an inference about truth of hypothesis based on evidence
Estimating P(E|H), or drawing inference about what evidence to expect if a hypothesis is
true
Similarly, in a statistical study, we can either make an inference about a population parameter
from sample data, or infer what we expect in samples based on a known or expected population
parameter, e.g.:
•
•
P(Population mean | sample mean)
P(sample mean | population mean)
These two directions of inference are related to each other by Bayes' Theorem:
Here P(H) is our prior probability of the hypothesis and P(H|E) is the posterior probability – i.e.,
our revised estimate of the probability of the hypothesis based on new experimental evidence.
Notice how the equation verifies our common-sense ideas about how experimental should affect
ones beliefs about hypotheses:
• P(H|E) increases as P(H) increases, i.e., the more plausible the hypothesis is to
begin with, the more willing we are to conclude it is true after the experiment.
•
P(H|E) increases as P(E|H) increases: i.e., our confidence in the conclusion is increased
when we know that the hypothesis strongly implies the expected experimental result.
•
P(H|E) decreases as P(E) goes up; that is, the more likely E is to occur in general (i.e.,
including for reasons not connected with the truth of our hypothesis), the less strongly we
will conclude that H is true based in evidence E.
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
Application to Medical Tests
Diagnostic tests are becoming increasingly common in medicine. When a disease has two
levels (present/absent) and a test result has two levels (positive/negative), four outcomes are
possible.
Table 1. Definitions of True and False Positives and Negatives
Test Result
Test Negative
Test Positive
True Condition
Disease Absent
Disease Present
True
False
Negative
Negative
False
True
Positive
Positive
Bayesian methods allow us to draw more precise and meaningful conclusions about the results
of one or more diagnostic tests.
Motivating Example 2: Diagnostic Test
You have come down with fever and chills, and see a doctor. The doctor (who happens to be a
specialist in tropical medicine) wants to give you a diagnostic test for a rare disorder called
Mekong Delta plague. You say you've never been to the Mekong Delta, but he nevertheless
wants to administer the test.
A blood sample is taken to the lab and tests positive for Mekong Delta plague. The doctor says
you'll have to undergo a severe course of treatment and stay in the hospital for weeks. You
say, "This is impossible! How accurate is the diagnostic test?" The doctor replies that it is
extremely accurate: it gives a positive result in 99% of all cases where Mekong Delta plague is
present, and a negative result in 95% of all cases where the disease is absent.
Confident of his diagnosis, then, the doctor prepares to prescribe a difficult and long course of
treatment. What's wrong here?
Analysis of Example 2
First we need to see that there are actually several different statistical indices that can be used
to measure the accuracy of a diagnostic test.
The two most common indices of diagnostic test accuracy are Sensitivity (Se) and Specificity
(Sp), defined as follows:
Table 2. Common Indices of Diagnostic Accuracy (Row Condition Given Column
Condition)
Test Result
Test Negative
True Condition
Disease Absent
Disease Present
P(Neg. Test | No Disease)
Specificity (Sp)
P(Neg. Test | Disease)
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
False Negative Rate (FN)
1 – Se
Test Positive
P(Pos. Test | No Disease)
False Positive Rate (FP)
1 – Sp
P(Pos. Test | Disease)
Sensitivity (Se)
[Other indices, which we will not talk about, are called the positive predictive value (Pv+) and
negative predictive value (Pv-) of a test.]
The accuracy index mentioned by the doctor was Se. However, by itself, knowing Se is not
enough to draw probabilistic inferences from a given test result.
Computation of Indices of Diagnostic Accuracy
Table 3. Computation of Indices of Diagnostic Accuracy
Test Result
True Condition
Disease Negative Disease Positive
(–)
(+)
Total
Test Negative
a
b
a+b
Test Positive
c
d
c+d
Total
a+c
b+d
a+b+
c+d=N
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem
What went wrong in our example? The doctor knew that the test had good Se and Sp (.99 and .
95, respectively). However he neglected to take into account that the prior probability of
Mekong Delta plague is extremely low, say, only 1 infection per 1 million persons. Using Bayes
rule we see how this would radically affect interpretation of test results:
Where:
,
,
So
The actual probability you have Mekong Delta plague is only .0000198, and the probability you
have something else (like a common cold), is 1 – .0000198 - .9999802, or better than 99.99.
Diagnosis Based on Multiple Symptoms
Finally, Bayes Theorem can be used to estimate the probability of disease based on a
combination of several diagnostic tests, or presence/absence of any number of symptoms:
Let D denote disease present,
S1, S2, … presence of Symptoms 1, 2, ….
Statistics 312 – Dr. Uebersax
16. Bayes' Theorem