12. Discrete probability distributions
The Practice of Statistics in the Life Sciences
Third Edition
© 2014 W.H. Freeman and Company
Objectives (PSLS Chapter 12)
Discrete probability distributions

The binomial setting and binomial distributions

Binomial probabilities

Binomial mean and standard deviation

The Normal approximation to binomial distributions

The Poisson distributions

Poisson probabilities
Binomial setting and distributions
Binomial distributions are models for some categorical variables,
typically representing the number of successes in a series of n
independent trials.
The observations must meet these requirements:
• the total number of observations n is fixed in advance
• each observation falls into just one of two categories: success and failure
• the outcomes of all n observations are statistically independent
• all n observations have the same probability p of “success”
Applications for binomial distributions
Binomial distributions describe the possible number of times that a
particular event will occur in a sequence of observations.

In a clinical trial, a patient’s condition may improve or not. The binomial
distribution describes the number of patients who improved (not how much
better they feel) among the study participants.

Is a child obese or not (based on their body mass index)? The binomial
distribution describes the number of obese children in a random sample of
school-age children.

In a quality control study, we assess the number of defective items in a lot
of goods, irrespective of the type of defect.
Binomial parameters
We express a binomial distribution for the count X of successes among
n observations as a function of the parameters n and p: B(n,p).

The parameter n is the total number of observations.

The parameter p is the probability of success on each observation.

The count of successes X can be any whole number between 0 and n.
The CDC estimates that a third of adult men are obese. In a random
sample of 10 adult men, each man is either obese or not.
The variable X is the number of obese men among those 10 men
sampled, our count of “successes.”
For each man, the probability of success, “obese,” is 1/3. The number X of
obese men among 10 men has the binomial distribution B(n = 10, p = 1/3).
Binomial probabilities
The number of ways of arranging k successes in a series of n
observations (with constant probability p of success) is the number of
possible combinations (unordered sequences).
This can be calculated with the binomial coefficient:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
where k = 0, 1, 2, ..., n.
The binomial coefficient “n choose k” uses the factorial notation “!”.
The factorial n! for any strictly positive whole number n is:
n! = n × (n − 1) × (n − 2) × … × 3 × 2 × 1
By convention, 0! = 1.
The binomial coefficient counts the number of ways in which k
successes can be arranged among n observations.
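As a quick check of the counting formula, the following Python sketch computes the binomial coefficient directly from the factorial definition and compares it with the standard library's `math.comb` (Python 3.8 or later). The function name `n_choose_k` is an illustrative choice, not something taken from the text.

```python
import math

def n_choose_k(n: int, k: int) -> int:
    """Binomial coefficient from the factorial definition n! / (k! (n - k)!)."""
    if not 0 <= k <= n:
        raise ValueError("k must be a whole number between 0 and n")
    # Integer division is exact here because k!(n - k)! always divides n!.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Number of ways to arrange 3 successes among 10 observations.
print(n_choose_k(10, 3))   # 120
print(math.comb(10, 3))    # same value from the standard library
```

Note that `n_choose_k(n, 0)` returns 1 for any n, which relies on the convention 0! = 1.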
The binomial probability P(X = k) is this count multiplied by the
probability of any specific arrangement of the k successes:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
With q = 1 − p, the full distribution is:

X        P(X)
0        nC0 p^0 q^n = q^n
1        nC1 p^1 q^(n−1)
2        nC2 p^2 q^(n−2)
…        …
k        nCk p^k q^(n−k)
…        …
n        nCn p^n q^0 = p^n
Total    1

The probability that a binomial random variable takes any range of values
is the sum of each probability for getting exactly that many successes in
n observations. For example:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
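The sum rule for a range of values can be sketched in Python. This is a minimal illustration that reuses the B(n = 10, p = 1/3) example of obese men from earlier; the helper names `binom_pmf` and `binom_cdf` are made up for this sketch.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): add up P(X = 0), P(X = 1), ..., P(X = k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 1 / 3
# Probability that at most 2 of the 10 sampled men are obese.
at_most_two = binom_cdf(2, n, p)
print(f"P(X <= 2) = {at_most_two:.4f}")
# All n + 1 probabilities must add up to 1 (up to rounding error).
print(f"Total     = {binom_cdf(n, n, p):.4f}")
```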
The frequency of color blindness (dyschromatopsia) in the
Caucasian American male population is estimated to be
about 8%. In a group of 25 Caucasian American males, what
is the probability that exactly five are color blind?

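$$\begin{aligned}
P(X = 5) &= \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \frac{25!}{5!\,20!}\,(0.08)^5 (0.92)^{20} \\
&= \frac{21 \times 22 \times 23 \times 24 \times 25}{1 \times 2 \times 3 \times 4 \times 5}\,(0.08)^5 (0.92)^{20} \\
&= 53{,}130 \times 0.0000033 \times 0.1887 = 0.03285
\end{aligned}$$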

Use technology
Excel: P(x = 5) = BINOM.DIST(5, 25, 0.08, 0) = 0.03285
TI-83: P(x = 5) = binompdf(25, 0.08, 5) = 0.0328508263
CrunchIt!: P(x = 5) = 0.0329 (Binomial for n = 25, p = 0.08, X = 5)
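The same calculation can also be reproduced with a few lines of Python, using only the standard library. The sketch below prints each factor of the hand computation so it can be compared with the steps above; the variable names are illustrative.

```python
import math

n, k, p = 25, 5, 0.08   # sample size, number of color-blind men, P(color blind)

ways = math.comb(n, k)            # arrangements of 5 successes among 25 men
p_success = p**k                  # probability of the 5 successes
p_failure = (1 - p)**(n - k)      # probability of the 20 failures
prob = ways * p_success * p_failure

print(f"arrangements = {ways}")
print(f"p^k          = {p_success:.7f}")
print(f"(1-p)^(n-k)  = {p_failure:.4f}")
print(f"P(X = 5)     = {prob:.5f}")
```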
The incidence of major depression in adults is about 10%. A random sample of
50 adults will be tested for depression. The variable X is the number of
individuals diagnosed with depression among all 50 and has the binomial
distribution Bin(n = 50, p = 0.1).
The probability that exactly 2 adults in the sample have depression is
A) 0.010
B) 0.020
C) 0.078
D) 0.100
E) 0.112
Binomial mean and standard deviation
The center and spread of the binomial distribution for a count X are
defined by the mean μ and standard deviation σ:
$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$
The incidence of major depression in adults is about 10%. A random sample of
50 adults will be tested for depression. The variable X is the number of
individuals diagnosed with depression among all 50 and has the binomial
distribution Bin(n = 50, p = 0.1). Thus,
$$\mu = np = 50 \times 0.1 = 5$$
$$\sigma = \sqrt{np(1-p)} = \sqrt{50 \times 0.1 \times 0.9} = \sqrt{4.5} \approx 2.12$$
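A short Python sketch of the two formulas, applied to the depression example. The helper name `binom_mean_sd` is hypothetical, and the cross-check at the end sums k × P(X = k) over the whole distribution to confirm that it agrees with np.

```python
import math

def binom_mean_sd(n: int, p: float) -> tuple:
    """Mean np and standard deviation sqrt(np(1 - p)) of B(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))

mu, sigma = binom_mean_sd(50, 0.1)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")

# Cross-check against the definition of a mean: sum of k * P(X = k).
mean_from_pmf = sum(
    k * math.comb(50, k) * 0.1**k * 0.9**(50 - k) for k in range(51)
)
print(f"mean from the full distribution = {mean_from_pmf:.2f}")
```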
Effect of changing p when n is fixed
Binomial distributions are skewed
when p is close to 0 or close to 1
(especially if the sample is small).
[Figure: five bar charts of P(X = x) against X (0 to 5) for B(5, 0.1), B(5, 0.3), B(5, 0.5), B(5, 0.7), and B(5, 0.9).]
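The effect of p on the shape can also be checked numerically. The sketch below tabulates B(5, p) for p = 0.1, 0.3, 0.5, 0.7, and 0.9, then keeps three distributions for comparison: B(5, 0.5), which is symmetric, and B(5, 0.1) and B(5, 0.9), which are mirror images of each other. The helper `binom_dist` is an illustrative name.

```python
import math

def binom_dist(n: int, p: float) -> list:
    """List of P(X = k) for k = 0..n when X ~ B(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    row = "  ".join(f"{prob:.3f}" for prob in binom_dist(5, p))
    print(f"B(5, {p}): {row}")

# p = 0.5 gives a symmetric distribution: P(X = k) equals P(X = 5 - k).
symmetric = binom_dist(5, 0.5)
# Swapping p and 1 - p mirrors the distribution around its middle.
low, high = binom_dist(5, 0.1), binom_dist(5, 0.9)
```

Reading each printed row left to right shows the probability mass piling up near 0 when p is small and near n when p is large.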
Effect of changing n for a fixed value of p
[Figure: four bar charts of P(X = x) against X (0 to 20) for B(5, 0.15), B(10, 0.15), B(15, 0.15), and B(20, 0.15).]
Normal approximation to binomial
If n is large, and p is not too close to 0 or 1, the binomial distribution can
be approximated by a Normal distribution.

$$B(n, p) \approx N\left(\mu = np,\ \sigma = \sqrt{np(1-p)}\right)$$

Practically, the Normal approximation can be used when both np ≥ 10
and n(1 − p) ≥ 10.
The approximation can be improved by using a continuity correction
to take into account the fact that the Normal distribution is continuous.
The incidence of major depression in adults is about 10%.
[Figure: bar chart of probability against count of adults with depression in a sample of 20 adults, Bin(n = 20, p = 0.1). No Normal approximation.]
[Figure: bar chart of probability against count of adults with depression in a sample of 100 adults, Bin(n = 100, p = 0.1). Normal approximation OK.]
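The rule of thumb np ≥ 10 and n(1 − p) ≥ 10 can be written as a small check and applied to the two depression samples (n = 20 and n = 100, both with p = 0.1). This is only a sketch: the function name `normal_approx_ok` and the small floating-point tolerance are assumptions of this example, not part of the rule itself.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10.0) -> bool:
    """Rule of thumb: both np and n(1 - p) must be at least the threshold."""
    # The small tolerance guards against floating-point round-off in n * p.
    eps = 1e-9
    return n * p >= threshold - eps and n * (1 - p) >= threshold - eps

for n in (20, 100):
    expected_successes = n * 0.1
    expected_failures = n * 0.9
    print(n, expected_successes, expected_failures, normal_approx_ok(n, 0.1))
```

For n = 20 the expected number of successes is only 2, so the check fails; for n = 100 both expected counts reach the threshold.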
The frequency of color blindness (dyschromatopsia) in the
Caucasian American male population is about 8%.
We take a random sample of size 125 from this population.
What is the probability that 6 individuals or fewer in the sample are color blind?

Distribution of the count X: B(n = 125, p = 0.08), so np = 10
P(X ≤ 6) = BINOM.DIST(6, 125, .08, 1) = 0.1198 or about 12%

Normal approximation: N(μ = np = 10, σ = √(np(1 − p)) = 3.033)
P(X ≤ 6) = NORM.DIST(6, 10, 3.033, 1) = 0.0936 or about 9%
Or z = (x − µ)/σ = (6 − 10)/3.033 = −1.32, so P(X ≤ 6) = 0.0934 from Table B
The Normal approximation is reasonable, though not perfect. Here p = 0.08 is
not close to 0.5, but np = 10 and n(1 − p) = 115. Using a continuity correction
greatly improves the approximation:

P(X ≤ 6.5) = NORM.DIST(6.5, 10, 3.033, 1) = 0.1243 or about 12%
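The three probabilities in this example can be compared side by side in Python. The sketch below is one possible way to do it with the standard library only: the exact binomial sum, the plain Normal approximation, and the Normal approximation with a continuity correction. The Normal CDF is written with `math.erf`; the helper names are illustrative.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ N(mu, sigma), written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 125, 0.08, 6
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(k, n, p)
plain = normal_cdf(k, mu, sigma)            # no continuity correction
corrected = normal_cdf(k + 0.5, mu, sigma)  # continuity correction: use 6.5

print(f"exact binomial        {exact:.4f}")
print(f"Normal approximation  {plain:.4f}")
print(f"with correction       {corrected:.4f}")
```

The correction replaces 6 with 6.5 because the bar for X = 6 in the binomial histogram extends from 5.5 to 6.5, so the Normal area up to 6.5 covers the whole bar.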
[Figure: binomial distribution and Normal approximation for the color blindness example, P(X = x) against count of successes, for n = 50, n = 125, and n = 1000.]
The larger the sample size, the better the Normal approximation fits the
binomial distribution.
The Poisson distributions
A Poisson distribution describes the count X of occurrences of an
event in fixed, finite intervals of time or space when

occurrences are all independent,

and the probability of an occurrence is the same over all possible
intervals of equal size.
Think of the Poisson distribution as describing the number of items in
containers.

Items                    Containers
Radioactive decays       Second
Weeds                    Acre of farm land
Fleas                    Dog
Cardiovascular deaths    County / year
If we divide a natural lawn into 1 ft² quadrants, we can count how many
dandelions are in each quadrant.
Dandelion seeds are wind-spread. The probabilities of a quadrant containing
0, 1, 2, 3, … dandelions are given by a Poisson distribution, assuming:
(i) independence of dandelions: the presence of one dandelion in a
quadrant does not make the presence of another more or less likely.
(ii) homogeneity of quadrants: each quadrant is equally likely
to contain dandelions.
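A small simulation can make the two conditions concrete. In the sketch below, every number is hypothetical (1000 quadrants, 3000 dandelions, an arbitrary random seed): each dandelion is dropped into a quadrant chosen uniformly at random, independently of the others, and the simulated share of quadrants holding k dandelions is printed next to the Poisson probability with mean 3 per quadrant. Strictly speaking each count here is binomial with a large n and a small p, which the Poisson distribution approximates closely.

```python
import math
import random

random.seed(12)  # arbitrary seed so the run is repeatable

n_quadrants = 1000   # hypothetical lawn of 1000 one-square-foot quadrants
n_seeds = 3000       # hypothetical number of dandelions, so mu = 3 per quadrant

# Each dandelion lands in a quadrant chosen uniformly at random (homogeneity),
# independently of where the others landed (independence).
counts = [0] * n_quadrants
for _ in range(n_seeds):
    counts[random.randrange(n_quadrants)] += 1

mu = n_seeds / n_quadrants
for k in range(7):
    observed = counts.count(k) / n_quadrants
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    print(f"k = {k}: simulated {observed:.3f}   Poisson {poisson:.3f}")
```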
Poisson probabilities
If μ is the population mean number of occurrences for a specified
interval of time or space, then the Poisson probability of observing
k occurrences (k = 0, 1, 2, …) at constant μ > 0 is:
$$P(X = k) = \frac{e^{-\mu}\mu^k}{k!}$$
The Poisson distribution has mean μ and standard deviation σ:
$$\sigma = \sqrt{\mu}$$
Effect of changing μ:
[Figure: four bar charts of probability against X (0 to 25) for Poisson distributions with mean 1.5, 3.5, 7, and 15.]
The Poisson distribution is skewed when μ < 5.
The number of deer crossing a road at night during mating season in a
particular rural area can be modeled with a Poisson distribution. A local survey
conducted over 4 nights found a total of 20 deer crossings. Based on this
information, what is the probability that fewer than three deer would cross on a
given night during mating season in this area?
$$P(X = k) = \frac{e^{-\mu}\mu^k}{k!}, \quad k = 0, 1, 2, \ldots \text{ for some } \mu > 0$$
To compute this probability using the Poisson distribution, we need to know μ.
In this case μ = 20 / 4 = 5 deer crossings per night.
$$\begin{aligned}
P(X < 3) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= \frac{e^{-5}\,5^0}{0!} + \frac{e^{-5}\,5^1}{1!} + \frac{e^{-5}\,5^2}{2!} \\
&= e^{-5}(1 + 5 + 12.5) \approx 0.1247
\end{aligned}$$
Historical records over 20 years in a particular town indicate
an average of 4 severe rainstorms per year.
Modeling the occurrences of severe rainstorms with the
Poisson distribution, the probability that there would be
no severe rainstorm next year is
$P(X = 0) = \frac{4^0 e^{-4}}{0!} = 0.018$
Probability of 5 severe rainstorms next year
$P(X = 5) = \frac{4^5 e^{-4}}{5!} = 0.156$
Probability of 1 or more severe rainstorms next year
$P(X \geq 1) = 1 - P(X = 0) = 1 - 0.018 = 0.982$
Probability of more than 5 severe rainstorms next year
$P(X > 5) = 1 - P(X \leq 5) = 1 - 0.785 = 0.215$
x     P(X = x)    P(X ≤ x)
0      1.832%      1.832%
1      7.326%      9.158%
2     14.653%     23.810%
3     19.537%     43.347%
4     19.537%     62.884%
5     15.629%     78.513%
6     10.420%     88.933%
7      5.954%     94.887%
8      2.977%     97.864%
9      1.323%     99.187%
10     0.529%     99.716%
11     0.192%     99.908%
12     0.064%     99.973%
13     0.020%     99.992%
14     0.006%     99.998%
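A table of Poisson probabilities for μ = 4 like the one above can be generated with a short loop. The sketch below is one way to do it, not necessarily how the original table was produced: it builds each probability from the previous one with the recurrence P(X = k + 1) = P(X = k) × μ / (k + 1), which follows from the Poisson formula and avoids computing large factorials. The function name `poisson_table` is hypothetical.

```python
import math

def poisson_table(mu: float, k_max: int) -> list:
    """Rows (k, P(X = k), P(X <= k)) for k = 0..k_max, built by recurrence."""
    rows = []
    pmf = math.exp(-mu)   # P(X = 0) = e^(-mu)
    cdf = 0.0
    for k in range(k_max + 1):
        cdf += pmf
        rows.append((k, pmf, cdf))
        pmf *= mu / (k + 1)   # P(X = k + 1) = P(X = k) * mu / (k + 1)
    return rows

table = poisson_table(mu=4, k_max=14)
for k, pmf, cdf in table:
    print(f"{k:>2}  {pmf:8.3%}  {cdf:8.3%}")

# Complement rule: P(X > 5) = 1 - P(X <= 5)
p_more_than_5 = 1 - table[5][2]
```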