5.4: Hypergeometric Distribution
Here’s the classic setup for a random variable with a
hypergeometric PDF:
Suppose an urn has N balls with k of them being red
and N-k of them being white. Thus, there are k+(N-k)=N total balls. Suppose n≤N balls are drawn from
the urn without replacement. Let X be the number of
red balls drawn out.
The random variable X has a hypergeometric PDF.
Hypergeometric PDF – The PDF of the hypergeometric
variable X, the number of successes in a random sample of
size n selected from N items of which k are labeled success
and N-k are labeled failure, is

$$f(x; N, n, k) = h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$

where max{0, n-(N-k)} ≤ x ≤ min{k, n}.
Example: Let N=10, k=4, N-k=6, n=3, and x=2
2005 Christopher R. Bilder
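$$f(2) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} = \frac{\binom{4}{2}\binom{10-4}{3-2}}{\binom{10}{3}} = \frac{6 \cdot 6}{120} = \frac{3}{10}$$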
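The hypergeometric PDF defined above can be evaluated directly with binomial coefficients. Below is a minimal Python sketch; the function name hypergeom_pdf and its argument order are illustrative choices, not something from these notes. It reproduces f(2) for N=10, n=3, k=4.

```python
from math import comb

def hypergeom_pdf(x, N, n, k):
    """P(X = x): n items drawn without replacement from N items,
    k of which are labeled success (illustrative helper)."""
    lower = max(0, n - (N - k))  # smallest possible number of successes
    upper = min(k, n)            # largest possible number of successes
    if x < lower or x > upper:
        return 0.0               # outside the support
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Worked example from the notes: N=10, k=4, n=3, x=2
print(hypergeom_pdf(2, N=10, n=3, k=4))  # 36/120 = 0.3
```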
Theorem 5.3 – The mean and variance of the
hypergeometric PDF are:

$$\mu = E(X) = \frac{nk}{N}$$

and

$$\sigma^2 = Var(X) = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right) = \frac{nk(N-n)(N-k)}{N^2(N-1)}$$
pf: See Appendix A.25 on p. 712. It is similar to the
proof for the binomial PDF.
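As a numerical sanity check (not a proof), the formulas in Theorem 5.3 can be compared with the mean and variance obtained by summing over the PDF. The sketch below uses exact fractions and assumes N > 1; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def pdf(x, N, n, k):
    # Exact hypergeometric probability as a fraction
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

def moments_by_summation(N, n, k):
    # E(X) and Var(X) computed from the definition of expectation
    support = range(max(0, n - (N - k)), min(k, n) + 1)
    mean = sum(x * pdf(x, N, n, k) for x in support)
    var = sum((x - mean) ** 2 * pdf(x, N, n, k) for x in support)
    return mean, var

def moments_by_formula(N, n, k):
    # Closed forms from Theorem 5.3 (requires N > 1)
    mean = Fraction(n * k, N)
    var = Fraction(n * k * (N - n) * (N - k), N**2 * (N - 1))
    return mean, var

# The two approaches agree for N=10, n=3, k=4
print(moments_by_summation(10, 3, 4) == moments_by_formula(10, 3, 4))  # True
```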
Example: Urns (urns.xls)
Suppose there are N=10 balls in an urn with k=4 of them
red and N-k=6 of them white. Suppose n=7 balls are
drawn from the urn. What are the PDF, mean, and
variance?
I used Excel to find these items. Below is the syntax
for finding the probabilities:
=HYPGEOMDIST(x, n, k, N)
Is it reasonable to observe x 3?
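For readers without the spreadsheet, the same quantities for the urn example (N=10, k=4, n=7) can be computed in a few lines of Python. This is only a cross-check sketch under the definitions above; the variable names are illustrative and not taken from urns.xls.

```python
from math import comb

N, k, n = 10, 4, 7  # urn example: 10 balls, 4 red, 7 drawn

lower = max(0, n - (N - k))  # at least 1 red ball must be drawn
upper = min(k, n)            # at most 4 red balls can be drawn

# PDF over the support of X
urn_pdf = {x: comb(k, x) * comb(N - k, n - x) / comb(N, n)
           for x in range(lower, upper + 1)}

# Mean and variance from the definition of expectation
mean = sum(x * p for x, p in urn_pdf.items())
variance = sum((x - mean) ** 2 * p for x, p in urn_pdf.items())

for x, p in urn_pdf.items():
    print(f"x = {x}: f(x) = {p:.4f}")
print(f"E(X) = {mean:.4f}, Var(X) = {variance:.4f}")
```

The results should agree with Theorem 5.3, which gives E(X) = nk/N = 2.8 and Var(X) = 0.56 for these values.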
Example: Tea taster experiment (teataster.xls)
This is a classic example given in R. A. Fisher’s “The
Design of Experiments” textbook in 1935. Here’s an
adaptation of it presented in Agresti’s (1996) “An
Introduction to Categorical Data Analysis” textbook:
A colleague of Fisher’s at Rothamsted Experiment
Station near London claimed that, when drinking tea,
she could distinguish whether milk or tea was added
to the cup first. To test her claim, Fisher designed
an experiment in which she tasted N=8 cups of tea.
There were k=4 cups which had milk added first, and
the other N-k=4 cups had tea added first. She was
told there were four cups of each type, so that she
should try to select the k=4 that had milk added first.
The cups were presented in random order. Below is
the setup of the experiment:
                      Guess Pour First
                      Milk     Tea      Total
Poured   Milk         x                 k=4
First    Tea                            N-k=4
         Total        n=4      4        N=8
Suppose she really could not differentiate between the
milk-first and tea-first cups. What is the probability that
she guesses all k=4 of the milk-added-first cups correctly?
In other words, what is f(4) = P(X=4)?
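$$f(4) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} = \frac{\binom{4}{4}\binom{8-4}{4-4}}{\binom{8}{4}} = \frac{1}{70} \approx 0.0143$$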
What are the probabilities of observing the other
possible values of X?
x      f(x)
0      0.0143
1      0.2286
2      0.5143
3      0.2286
4      0.0143
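The table of probabilities for the tea taster experiment can be reproduced with a short Python sketch, assuming she is purely guessing so that X is hypergeometric with N=8, n=4, k=4. The name tea_pdf is illustrative and not from teataster.xls.

```python
from math import comb

N, n, k = 8, 4, 4  # 8 cups, she picks 4, 4 truly had milk first

# f(x) for every possible number of correct "milk first" guesses
tea_pdf = {x: comb(k, x) * comb(N - k, n - x) / comb(N, n)
           for x in range(max(0, n - (N - k)), min(k, n) + 1)}

print("x  f(x)")
for x, p in tea_pdf.items():
    print(f"{x}  {p:.4f}")
```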
Suppose she did get all 4 correct. Given this PDF, what
do you think about whether or not she really can
differentiate between which is added first?
Answering questions like this will be VERY important
for Chapters 10 and 11 when we do hypothesis
testing!!!
Notes:
E(X) = 2 and Var(X) = 0.571.
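These values follow from Theorem 5.3 with N=8, n=4, and k=4:

$$E(X) = \frac{nk}{N} = \frac{4 \cdot 4}{8} = 2$$

$$Var(X) = \frac{nk(N-n)(N-k)}{N^2(N-1)} = \frac{4 \cdot 4 \cdot 4 \cdot 4}{8^2 \cdot 7} = \frac{256}{448} \approx 0.571$$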
Below is the actual observed data from the experiment.
                      Guess Pour First
                      Milk     Tea      Total
Poured   Milk         3        1        4
First    Tea          1        3        4
         Total        4        4        8
On your own, read about the hypergeometric
PDF’s relationship to the binomial PDF and about the
multivariate hypergeometric PDF.