Hypergeometric and n..


The binomial distribution gives exact probabilities
for the number of successes in samples from a
dichotomous population (S-F) when sampling with
replacement.

The binomial distribution gives approximate
probabilities when sampling without replacement
provided that the sample size n is small relative to
the population size N.

The hypergeometric distribution gives exact
probabilities when sampling without replacement.

Assumptions:
- The population or set to be sampled consists of N (finite) individuals, objects, etc.
- Each individual can be classified as S or F, and there are M successes in the population.
- A sample of n individuals is selected without replacement in such a way that each subset of size n is equally likely to be chosen.


The random variable X of interest is the number
of successes in the sample. The distribution is
denoted by P(X=x) = h(x;n,M,N).

If X is the number of S’s in a random sample of
size n drawn from a population consisting of M
S’s and (N − M) F’s, then

$$P(X = x) = h(x; n, M, N) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$

for x an integer satisfying
$\max(0,\, n - N + M) \le x \le \min(n, M)$.
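As a quick check of the pmf and its support, here is a minimal Python sketch (Python 3.8+ for `math.comb`). The helper names `hypergeom_support` and `hypergeom_pmf` are hypothetical, chosen for illustration; the code simply evaluates the formula above and confirms that the probabilities sum to 1.

```python
from math import comb  # math.comb requires Python 3.8+


def hypergeom_support(n: int, M: int, N: int) -> range:
    """Integers x with max(0, n - N + M) <= x <= min(n, M)."""
    return range(max(0, n - N + M), min(n, M) + 1)


def hypergeom_pmf(x: int, n: int, M: int, N: int) -> float:
    """h(x; n, M, N) = C(M, x) * C(N - M, n - x) / C(N, n); zero off the support."""
    if x not in hypergeom_support(n, M, N):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# n = 10 draws without replacement from N = 25 items, M = 5 of which are S's.
probs = {x: hypergeom_pmf(x, 10, 5, 25) for x in hypergeom_support(10, 5, 25)}
print(round(sum(probs.values()), 12))  # prints 1.0: the pmf sums to 1 over its support
```

The lower bound of the support matters when S's are plentiful: with n = 10, M = 20, N = 25 only 5 F's exist, so at least 5 S's must be drawn and `hypergeom_support(10, 20, 25)` starts at 5.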

The mean and variance of a
hypergeometric random variable X having
pmf h(x; n, M, N) are

$$E(X) = n \cdot \frac{M}{N}$$

$$V(X) = \left(\frac{N-n}{N-1}\right) \cdot n \cdot \frac{M}{N} \cdot \left(1 - \frac{M}{N}\right)$$
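One way to gain confidence in these formulas is to compare them with moments computed directly from the pmf. The sketch below does that for a few hypothetical parameter choices; the function names are illustrative only, and the variance formula assumes N ≥ 2.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N) for x inside the support."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def moments_by_summation(n, M, N):
    """E(X) and V(X) computed directly from the pmf."""
    support = range(max(0, n - N + M), min(n, M) + 1)
    mean = sum(x * hypergeom_pmf(x, n, M, N) for x in support)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, n, M, N) for x in support)
    return mean, var


def moments_by_formula(n, M, N):
    """E(X) = n M/N and V(X) = ((N-n)/(N-1)) n (M/N)(1 - M/N); assumes N >= 2."""
    p = M / N
    return n * p, (N - n) / (N - 1) * n * p * (1 - p)


for params in [(10, 5, 25), (3, 7, 12), (8, 20, 25)]:
    print(params, moments_by_summation(*params), moments_by_formula(*params))
```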
The ratio M/N is the proportion of S’s in the
population.
- If we replace M/N by p, the hypergeometric mean E(X) = np is the same as the binomial mean.
- The hypergeometric variance is the binomial variance np(1 − p) multiplied by the factor (N − n)/(N − 1), the finite population correction factor.

For n > 1 the correction factor is less than 1 (so the
hypergeometric has a smaller variance than the
binomial), and it is close to 1 when n is small relative
to N.
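To see the correction factor approach 1, the short sketch below holds n = 10 and M/N = 0.2 fixed while N grows. The function `fpc` is a hypothetical helper name; the population sizes are arbitrary illustrations, not data from the text.

```python
def fpc(n: int, N: int) -> float:
    """Finite population correction factor (N - n) / (N - 1); assumes N >= 2."""
    return (N - n) / (N - 1)


n = 10
for N in (25, 100, 1_000, 10_000):
    M = N // 5                               # keeps M/N = 0.2 exactly for these N
    p = M / N
    binom_var = n * p * (1 - p)              # binomial variance np(1 - p)
    hyper_var = fpc(n, N) * binom_var        # hypergeometric variance
    print(N, round(fpc(n, N), 4), round(hyper_var, 4), round(binom_var, 4))
```

Note that `fpc(1, N)` is exactly 1: with a single draw, sampling with and without replacement coincide.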

Five individuals from an animal population of 25 are
caught, tagged, and released. After they have mixed
with the general population, a sample of 10 of these
animals is selected.

Let X be the number of tagged animals in this sample
of size 10.

Compute P(X = 2) and P(X ≤ 2), the expected
number of tagged animals, and the variance of the
number of tagged animals.
$$P(X = 2) = h(2; 10, 5, 25) = \frac{\binom{5}{2}\binom{20}{8}}{\binom{25}{10}} \approx .385$$

$$P(X \le 2) = \sum_{x=0}^{2} h(x; 10, 5, 25) = .057 + .257 + .385 = .699$$

$$E(X) = \frac{nM}{N} = \frac{(10)(5)}{25} = 2$$

$$V(X) = \left(\frac{N-n}{N-1}\right) \cdot n \cdot \frac{M}{N} \cdot \left(1 - \frac{M}{N}\right) = \left(\frac{15}{24}\right)(10)\left(\frac{5}{25}\right)\left(\frac{20}{25}\right) = 1$$
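The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch for checking the numbers, not part of the original slides; `h` is a hypothetical helper mirroring h(x; n, M, N).

```python
from math import comb


def h(x, n, M, N):
    """Hypergeometric pmf h(x; n, M, N) for x in the support."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


n, M, N = 10, 5, 25          # sample size, tagged animals, population size
p_eq_2 = h(2, n, M, N)
p_le_2 = sum(h(x, n, M, N) for x in range(0, 3))
mean = n * M / N
var = (N - n) / (N - 1) * n * (M / N) * (1 - M / N)
print(round(p_eq_2, 3), round(p_le_2, 3), mean, round(var, 6))  # 0.385 0.699 2.0 1.0
```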

If the population size N is actually unknown,
it makes sense to equate the observed
sample proportion of tagged animals x/n with
the population proportion M/N. The estimate
of N if x = 2 would then be

$$\hat{N} = \frac{M \cdot n}{x} = \frac{(5)(10)}{2} = 25$$
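The estimate comes from solving x/n = M/N for N; in ecology this ratio estimator is commonly called the Lincoln-Petersen estimator. A minimal sketch follows, with `estimate_population_size` as a hypothetical name. It refuses x = 0, since no tagged animals in the sample leaves the ratio undefined.

```python
def estimate_population_size(M: int, n: int, x: int) -> float:
    """Solve x/n = M/N for N, giving N_hat = M * n / x."""
    if x == 0:
        raise ValueError("no tagged animals in the sample, so N_hat is undefined")
    return M * n / x


print(estimate_population_size(M=5, n=10, x=2))  # 25.0
```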

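The negative binomial random variable and distribution are based on an
experiment that satisfies the binomial conditions (independent trials, each
resulting in S or F, with the same success probability p on every trial),
except that the number of trials is not fixed in advance. Instead, there is
one additional condition: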

The experiment continues until a total of r successes have
been observed, where r is a positive integer.

The random variable of interest is X = the number of
failures that precede the r-th success.

The distribution is denoted by
nb(x; r, p) in the text.
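The experiment itself is easy to simulate, which can help make the definition of X concrete. The sketch below is an illustration only: `failures_before_rth_success` is a hypothetical name, and it uses Python's `random.Random` with a fixed seed so that runs are reproducible. It performs independent trials with success probability p, stops at the r-th success, and returns the number of failures.

```python
import random


def failures_before_rth_success(r: int, p: float, rng: random.Random) -> int:
    """Run independent S/F trials with P(S) = p until r successes; count the failures."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(2024)  # fixed seed so the run is reproducible
draws = [failures_before_rth_success(2, 0.5, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))          # sample average of X for r = 2, p = 0.5
print(draws.count(2) / len(draws))      # empirical P(X = 2)
```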

The pmf of a negative binomial rv X with
parameters r and p is

$$nb(x; r, p) = \binom{x + r - 1}{r - 1} p^r (1 - p)^x, \qquad x = 0, 1, 2, \ldots$$
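A direct translation of this pmf into Python is sketched below (`nb_pmf` is a hypothetical helper name). Because the support is infinite, normalization is checked with a long truncated sum rather than an exact one.

```python
from math import comb


def nb_pmf(x: int, r: int, p: float) -> float:
    """nb(x; r, p) = C(x + r - 1, r - 1) p^r (1 - p)^x for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return comb(x + r - 1, r - 1) * p**r * (1 - p) ** x


# The support is infinite, so check normalization with a long truncated sum.
total = sum(nb_pmf(x, 3, 0.4) for x in range(500))
print(total)
```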
The mean and variance of X are

$$E(X) = \frac{r(1 - p)}{p}, \qquad V(X) = \frac{r(1 - p)}{p^2}$$
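As with the hypergeometric case, the formulas can be checked numerically. This sketch approximates E(X) and V(X) by summing the pmf over the first few thousand values of x (an assumption that the neglected tail is negligible, which holds for the parameters used here). All function names are hypothetical.

```python
from math import comb


def nb_pmf(x, r, p):
    """nb(x; r, p) for x = 0, 1, 2, ..."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p) ** x


def nb_moments_numeric(r, p, terms=3000):
    """Approximate E(X) and V(X) by summing the pmf over x = 0 .. terms - 1."""
    mean = sum(x * nb_pmf(x, r, p) for x in range(terms))
    second_moment = sum(x * x * nb_pmf(x, r, p) for x in range(terms))
    return mean, second_moment - mean**2


def nb_moments_formula(r, p):
    """E(X) = r(1 - p)/p and V(X) = r(1 - p)/p^2."""
    return r * (1 - p) / p, r * (1 - p) / p**2


print(nb_moments_numeric(5, 0.3), nb_moments_formula(5, 0.3))
```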

Suppose that p=P(female birth)=.5. A couple wishes
to have exactly two female children in their family.
They will have children until this condition is fulfilled.
- What is the probability that the family has x male children?
- What is the probability that the family has four children?
- How many male children would you expect this family to have? How many children would you expect this family to have?



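With X = the number of male children, X has an nb(x; 2, .5) distribution, so

$$P(x \text{ male children}) = nb(x; 2, .5) = \binom{x + 1}{1}(.5)^2(.5)^x = (x + 1)(.5)^{x+2}, \qquad x = 0, 1, 2, \ldots$$

$$P(\text{four children}) = P(X = 2) = (3)(.5)^4 = .1875$$

The expected number of male children before
the second female is

$$E(X) = \frac{2(1 - .5)}{.5} = 2$$

Since the total number of children is X + 2, the expected number of children is then E(X) + 2 = 4.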
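The answers to this example can be reproduced with the pmf and mean formula. A minimal sketch, with `nb_pmf` as a hypothetical helper; it also confirms the closed form (x + 1)(.5)^(x+2) for small x.

```python
from math import comb


def nb_pmf(x, r, p):
    """nb(x; r, p) for x = 0, 1, 2, ..."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p) ** x


r, p = 2, 0.5                          # two daughters wanted; P(female birth) = .5
p_four_children = nb_pmf(2, r, p)      # four children means exactly X = 2 sons
expected_sons = r * (1 - p) / p        # E(X) = r(1 - p)/p
expected_children = expected_sons + r  # total children = X + r
print(p_four_children, expected_sons, expected_children)  # 0.1875 2.0 4.0
```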

When r=1, X is the number of failures before
the first success.

The random variable is called geometric in this
case (though Y = X + 1, the number of trials up to and
including the first success, is also called geometric).

$$P(X = x) = (1 - p)^x p, \qquad x = 0, 1, 2, \ldots$$
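The geometric pmf is the negative binomial pmf with r = 1, since the binomial coefficient C(x, 0) equals 1. The sketch below checks this and also shows the Y = X + 1 ("trial of first success") version; the function names are hypothetical.

```python
from math import comb


def nb_pmf(x, r, p):
    """nb(x; r, p) for x = 0, 1, 2, ..."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p) ** x


def geometric_pmf(x, p):
    """P(X = x) = (1 - p)^x p: x failures before the first success, x = 0, 1, 2, ..."""
    return (1 - p) ** x * p


def geometric_trials_pmf(y, p):
    """P(Y = y) for Y = X + 1, the trial on which the first success occurs, y = 1, 2, ..."""
    return geometric_pmf(y - 1, p)


# With r = 1, C(x, 0) = 1, so nb(x; 1, p) reduces to (1 - p)^x p.
print(all(abs(nb_pmf(x, 1, 0.3) - geometric_pmf(x, 0.3)) < 1e-15 for x in range(50)))  # True
```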