Bayes’ Theorem and Hypergeometric Distribution
RLS
Bayes’ Theorem
Bayes’ Theorem is a theorem of probability theory that can be seen
as a way of understanding how the probability that a theory is true
is affected by a new piece of evidence. It has been used in a wide
variety of contexts, ranging from marine biology to the development
of “Bayesian” spam blockers for email systems. In the philosophy of
science, it has been used to try to clarify the relationship between
theory and evidence. Many insights in the philosophy of science
involving confirmation, falsification, the relation between science
and pseudoscience, and other topics can be made more precise, and
sometimes extended or corrected, by using Bayes’ Theorem.
Definition
Let A1, A2, ..., Ak be a collection of k mutually exclusive and exhaustive events with prior probabilities P(Ai) (i = 1, ..., k). Then for any other event B for which P(B) > 0, the posterior probability of Aj given that B has occurred is

P(Aj | B) = P(B | Aj) P(Aj) / [ Σᵢ₌₁ᵏ P(B | Ai) P(Ai) ]
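The definition above can be sketched in code. The slides use R; as a language-neutral cross-check, here is a short Python sketch (the function name `bayes_posteriors` is ours, not from the slides):

```python
# Hypothetical helper illustrating the definition above: given prior
# probabilities P(A_i) and likelihoods P(B | A_i) for k mutually
# exclusive and exhaustive events, return every posterior P(A_i | B).
def bayes_posteriors(priors, likelihoods):
    # P(B) by the law of total probability: sum of P(B | A_i) P(A_i)
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    # Bayes' Theorem applied to each event A_j
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Two equally likely events with likelihoods 0.8 and 0.4:
# the posteriors are 2/3 and 1/3, and they sum to 1.
print(bayes_posteriors([0.5, 0.5], [0.8, 0.4]))
```

Because the events are exhaustive, the posteriors always sum to 1, which is a quick sanity check on any Bayes calculation.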
Bayes’ Example I
Incidence of a rare disease.
Only 1 in 1000 adults is afflicted with a rare disease for which a
diagnostic test has been developed. The test is such that when an
individual actually has the disease, a positive result will occur 99%
of the time, whereas an individual without the disease will show a
positive test result only 2% of the time. If a randomly selected
individual is tested and the result is positive, what is the probability
that the individual has the disease?
Let A = individual has the disease, and B = positive test result
P(A) = 1/1000 = 0.001, P(A′) = 1 − 0.001 = 0.999, P(B|A) = 0.99, and P(B|A′) = 0.02
Bayes’ Example II
If a randomly selected individual is tested and the result is positive,
what is the probability that the individual has the disease?
P(A|B) = P(A ∩ B) / P(B)
       = P(B|A)P(A) / [ P(B|A)P(A) + P(B|A′)P(A′) ],  since P(B) = P(B|A)P(A) + P(B|A′)P(A′)
       = (0.99)(0.001) / [ (0.99)(0.001) + (0.02)(0.999) ]
       = 0.0472103
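The arithmetic above is easy to verify in code. A minimal Python sketch using the numbers from the slides (variable names are ours):

```python
# Disease-screening example: P(A) = 0.001, P(B|A) = 0.99, P(B|A') = 0.02.
# Compute the posterior P(A|B) directly from Bayes' Theorem.
p_a = 0.001            # prevalence of the disease, P(A)
p_b_given_a = 0.99     # sensitivity: P(positive | disease)
p_b_given_ac = 0.02    # false-positive rate: P(positive | no disease)

# Total probability of a positive test, P(B)
p_b = p_b_given_a * p_a + p_b_given_ac * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 7))  # 0.0472103
```

Note how small the posterior is: even with a 99% sensitive test, most positives come from the much larger disease-free group, because the disease is rare.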
Sampling with replacement vs. without replacement
Suppose we have a bowl of 100 unique numbers from 0 to 99. We
want to select a random sample of numbers from the bowl. After
we pick a number from the bowl, we can put the number aside or
we can put it back into the bowl. If we put the number back in the
bowl, it may be selected more than once; if we put it aside, it can be
selected only once.
When a population element can be selected more than one time, we
are sampling with replacement. When a population element can be
selected only one time, we are sampling without replacement.
Sampling with replacement tends to give more extreme (variable)
samples than without replacement.
Sampling with and without replacement
Suppose we have a population as follows: 0, 1, 2, 3, 4
Take samples of size 2 with and without replacement:
With replacement (15 samples):
(0,0) (0,1) (0,2) (0,3) (0,4)
(1,1) (1,2) (1,3) (1,4)
(2,2) (2,3) (2,4)
(3,3) (3,4)
(4,4)

Without replacement (10 samples):
(0,1) (0,2) (0,3) (0,4)
(1,2) (1,3) (1,4)
(2,3) (2,4)
(3,4)
Models for Discrete Random Variables (rv)
Each random variable can be viewed as the result of probability
sampling from its own sample space. This viewpoint leads to a
classification of sampling experiments, and of the corresponding
random variables, into classes. Random variables within a class share
common traits and common methods of parameter estimation. These
classes of probability distributions are also called probability
models.
Bernoulli trials
A Bernoulli trial or experiment is one whose outcome can
be classified as either a success or failure. The Bernoulli
random variable X takes the value 1 if the outcome is a
success, 0 if it is a failure.
Examples of Bernoulli trials: flipping a coin; taking a product at
random from a production line and classifying it as a success if it is
defective or a failure if it is not.
Bernoulli pmf and CDF
If the probability of success is p and failure is 1 − p, the pmf and
CDF of X are:
x    p(x)     F(x)
0    1 − p    1 − p
1    p        1
The binomial distribution is an extension (in the same family of
probability models) of the Bernoulli distribution. The binomial
random variable is the number of successes in n Bernoulli trials.
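The relationship between the two distributions can be sketched in code. A minimal Python example (not from the slides) building the binomial pmf P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ from the Bernoulli parameter p, using only the standard library:

```python
# Binomial pmf assembled from n Bernoulli(p) trials, using math.comb
# for the binomial coefficient C(n, k).
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With n = 1 this recovers the Bernoulli pmf from the table above:
# p(0) = 1 - p and p(1) = p.
print(binom_pmf(0, 1, 0.3), binom_pmf(1, 1, 0.3))  # 0.7 0.3
```
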
The Hypergeometric Distribution
The assumptions leading to the hypergeometric distribution are:
1. The population or set to be sampled consists of N elements (a
finite population).
2. Each individual element can be characterized as a success (S)
or a failure (F), and there are M successes in the population.
3. A sample of n individuals is selected without replacement in
such a way that each subset of size n is equally likely to be
chosen.
Let X be the number of successes in the sample.
Hypergeometric pmf
P(X = x) = [ (M choose x) · (N − M choose n − x) ] / (N choose n)

EX = Mn/N

VX = ( (N − n)/(N − 1) ) · (Mn/N) · (1 − M/N)

SDX = √VX
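The pmf and moment formulas above translate directly into code. A Python sketch using `math.comb` (the function names are ours; the slides themselves use R's dhyper(), shown later):

```python
# Hypergeometric pmf and moments for a population of N elements
# containing M successes, sampled n at a time without replacement.
from math import comb, sqrt

def hyper_pmf(x, N, M, n):
    # P(X = x) = C(M, x) * C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def hyper_mean(N, M, n):
    return M * n / N

def hyper_var(N, M, n):
    # (N - n)/(N - 1) is the finite population correction factor
    return (N - n) / (N - 1) * (M * n / N) * (1 - M / N)

# Sanity check: the pmf sums to 1 over the support of X.
N, M, n = 25, 5, 10
support = range(max(0, n - (N - M)), min(n, M) + 1)
print(round(sum(hyper_pmf(x, N, M, n) for x in support), 10))  # 1.0
```
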
Hyper Example I
Five individuals from an animal population thought to be near
extinction in a certain region have been caught, tagged, and released
to mix into the population. After they have had an opportunity to
mix, a random sample of 10 of these animals is selected. Let X =
the number of tagged animals in the second sample. If there are
actually 25 animals of this type in the region, find the following:
1. Probability that exactly 2 are tagged? (P (X = 2))
2. Probability that at most 2 are tagged? (P (X ≤ 2))
3. EX, V X, SDX
4. Suppose the population size N is not actually known; the value
of x is observed and can be used to estimate N . Suppose now
M = 100, n = 40, and x = 16.
N̂ = Mn/x
Hyper Example II
1. P(X = 2) = (5 choose 2)(20 choose 8) / (25 choose 10) = 0.3853755

2. P(X ≤ 2) = P(0) + P(1) + P(2)
   = [ (5 choose 0)(20 choose 10) + (5 choose 1)(20 choose 9) + (5 choose 2)(20 choose 8) ] / (25 choose 10)
   = 0.6988142
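These two probabilities can be cross-checked in Python with `math.comb` (the slides do the same computation in R with dhyper(); this sketch just verifies the arithmetic):

```python
# Example II: N = 25 animals, M = 5 tagged, sample of n = 10.
from math import comb

def hyper_pmf(x, N, M, n):
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

p2 = hyper_pmf(2, 25, 5, 10)                             # P(X = 2)
p_at_most_2 = sum(hyper_pmf(x, 25, 5, 10) for x in range(3))  # P(X <= 2)
print(round(p2, 7), round(p_at_most_2, 7))  # 0.3853755 0.6988142
```
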
Hyper Example III
3. EX, V X, SDX
M=5; n=10; N=25
EX=M*n/N; VX=((N-n)/(N-1))*(M*n/N)*(1-M/N)
SDX=sqrt(VX)
EX; VX; SDX
[1] 2
[1] 1
[1] 1
4. N̂ = Mn/x = (100 × 40)/16 = 250
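Parts 3 and 4 can be cross-checked outside R as well. A Python sketch applying the moment formulas and the population-size estimate (variable names are ours):

```python
# Example III cross-check: moments for M = 5, n = 10, N = 25,
# then the capture-recapture estimate N_hat = M*n/x for
# M = 100, n = 40, x = 16.
from math import sqrt

M, n, N = 5, 10, 25
EX = M * n / N
VX = (N - n) / (N - 1) * (M * n / N) * (1 - M / N)
SDX = sqrt(VX)
print(EX, VX, SDX)  # 2.0 1.0 1.0

N_hat = 100 * 40 / 16
print(N_hat)  # 250.0
```
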
R code
Use dhyper() for pmf calculations (single probabilities) or phyper()
for the CDF, P(X ≤ x). The calls are dhyper(x, m, n, k) and
phyper(x, m, n, k, lower.tail = TRUE), where:

- x: the argument of interest, x
- m: M, the number 'tagged' in the population (the number of successes)
- n: N − M, the total number of elements in the population minus the
  number tagged
- k: n, the sample size
- lower.tail = TRUE: used in phyper() only; logical, default TRUE, so
  calculations give P(X ≤ x)

Use sum() with dhyper(x, m, n, k) to calculate probabilities over
intervals.
Previous example with full code
M=5; n=10; N=25 # P(X=2), P(X<=2)
dhyper(2,M,N-M,n)
[1] 0.3853755
sum(dhyper(0:2,M,N-M,n))
[1] 0.6988142