Confidence Interval for a Single Proportion

6.5 One- and Two-Sample Comparisons of Proportions (Only Confidence Intervals)
Confidence Interval for a Single Proportion
Example 17: Nickel-Cadmium Cells
n  235 cells (independent)
x  9 had shorts
pˆ  sample proportion
pˆ 
9
 0.038
235
or 3.8% had shorts
How do we find a confidence interval for p, the population proportion of shorts?
How variable/precise is p̂ as an estimator of p?
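• The observed proportion can be viewed as an average of a collection of 1's and 0's:
$$X_i = \begin{cases} 1 & \text{if the } i\text{th cell has a short} \\ 0 & \text{if no short} \end{cases} \qquad \text{Count} = \sum_{i=1}^{n} X_i$$
$$\hat{p} = \frac{\text{Count}}{n} = \frac{\sum X_i}{n} = \bar{X} = \frac{9}{235} \approx 0.038$$
$$\mu = E(X_i) = \sum_x x \, P(x) = 0 \cdot P(0) + 1 \cdot P(1) = p$$
$$E(\hat{p}) = \mu = p$$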
p̂ is an unbiased estimator of p: on average it gives the right value.
 2  Var  X i   E  x       x    * P  x    0    P  0   1    P 1
2
  0  p  1  p   1  p  p  p(1  p)
2
SE pˆ 
2
n
2

p 1  p 
n
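Or
$$X \sim \text{Binomial}(n, p), \qquad E(X) = np, \qquad \mathrm{Var}(X) = np(1 - p)$$
$$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{np}{n} = p$$
$$\mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{X}{n}\right) = \left(\frac{1}{n}\right)^2 \mathrm{Var}(X) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$$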
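The two results above, $E(\hat{p}) = p$ and $\mathrm{Var}(\hat{p}) = p(1-p)/n$, can be checked numerically by summing over the exact binomial distribution. The sketch below is only an illustration: the function name `phat_moments` and the values n = 50, p = 0.1 are hypothetical choices, not part of the notes.

```python
from math import comb

def phat_moments(n, p):
    """Mean and variance of p-hat = X/n, computed from the exact Binomial(n, p) pmf."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

# Hypothetical illustration values (not from the notes).
n, p = 50, 0.1
mean, var = phat_moments(n, p)
print(mean, p)                 # the two values should agree
print(var, p * (1 - p) / n)    # the two values should agree
```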
An approximate confidence interval for p:
$$\hat{p} \pm z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
As long as n is big enough and the distribution of 1's and 0's is not too skewed/asymmetric, the
Central Limit Theorem tells us p̂ is approximately normal.
One can do computations (via computer) for the exact binomial distributions of X, but the
normal approximation is still very useful.
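As one illustration of the exact binomial computation mentioned above, the sketch below finds an exact (Clopper-Pearson style) interval by bisection on the binomial tail probabilities. This is a swapped-in technique chosen for illustration, not a method given in the notes; the names `binom_cdf` and `exact_interval` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_interval(x, n, confidence=0.95):
    """Exact interval: each tail probability at the observed x equals alpha/2."""
    alpha = 1 - confidence

    def solve(g):
        # g is increasing in p with a sign change on [0, 1]; bisect for g(p) = 0.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha/2.  Upper limit: P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

low, high = exact_interval(9, 235)   # the nickel-cadmium counts from Example 17
print(f"{low:.3f} to {high:.3f}")
```

The exact interval is not symmetric about p̂, which is one way it differs from the normal-approximation interval.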
One rule for when you can use the normal approximation is
Successes ≥ 5: $np \ge 5$
and
Failures ≥ 5: $n(1 - p) \ge 5$
Nickel-Cadmium cells: 95% Confidence Interval
$n = 235$ cells (independent)
$x = 9$ had shorts
$\hat{p}$ = sample proportion $= 9/235 \approx 0.038$
$$0.038 \pm 1.96 \sqrt{\frac{0.038(1 - 0.038)}{235}}$$
$$0.038 \pm 1.96(0.0125)$$
$$0.038 \pm 0.024$$
0.014 to 0.062
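The same calculation can be sketched in a few lines of Python using only the standard library. The function name `wald_interval` is a hypothetical label for the normal-approximation interval above.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, confidence=0.95):
    """Normal-approximation interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = wald_interval(9, 235)
print(f"{low:.3f} to {high:.3f}")
```

Because the code carries p̂ = 9/235 without rounding it to 0.038 first, the upper limit can differ from the hand calculation in the third decimal place.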
Example 18: $n_1 = n_2 = 100$ pellets
Interest is in the fraction conforming to specs, for small and large shot sizes.
Note: If conforming is defined through some measured variable such as strength, say
• $Y \ge 10 \Rightarrow$ Conform
• $Y \sim \text{normal}(\mu, \sigma)$
• We would be best off using $\bar{Y}$ and $S_Y$ and the normal table to estimate % conforming.
Normal Reliability Estimates of p
Problem 4 in Section 5.3 gives lifetimes of n = 23 bearings. We are interested in approximating
the fraction of bearings that last more than $50 \times 10^6$ revolutions.
• Using the observed fraction ≥50, the approximate binomial-distribution confidence
interval is
$$\hat{p} = \frac{18}{23} \approx 0.783$$
$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \approx 0.086$$
$$P_{Low} = 0.783 - 1.96(0.086) = 0.614$$
$$P_{High} = 0.783 + 1.96(0.086) = 0.951$$
• Using p̂ = observed fraction is a less efficient use of the data; we are ignoring some of the data’s
information, the actual lifetimes.
The data for log(lifetimes) are fairly normal.
[Figure: normal quantile plot of the log lifetimes. Horizontal axis: Log Life (1.0 to 2.4). Vertical axis: Normal Quantiles (-2.50 to 2.50).]
Let X = Log(lifetime)
$$\bar{X} = 1.802, \qquad S = 0.232$$
$$P(\text{Lifetime} \le 100) = P(X \le \log_{10} 100) = P\left(Z \le \frac{2.000 - 1.802}{0.232}\right) = P(Z \le 0.853) = 0.803$$
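Let
$$Z = \frac{x - \bar{X}}{S} = 0.853$$
$$Z_{low} = Z - \frac{Z_{0.025}}{\sqrt{n}} \sqrt{1 + \frac{n Z^2}{2(n - 1)}} = 0.853 - \frac{1.96}{\sqrt{23}} \sqrt{1 + \frac{23(0.853^2)}{2(23 - 1)}} = 0.373$$
$$P_{Low} = P(Z \le Z_{low}) = P(Z \le 0.373) = 0.645$$
$$Z_{high} = Z + \frac{Z_{0.025}}{\sqrt{n}} \sqrt{1 + \frac{n Z^2}{2(n - 1)}} = 0.853 + \frac{1.96}{\sqrt{23}} \sqrt{1 + \frac{23(0.853^2)}{2(23 - 1)}} = 1.333$$
$$P_{High} = P(Z \le Z_{high}) = P(Z \le 1.333) = 0.909$$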
• The approximate 95% normal-distribution confidence interval for the reliability, the fraction
less than or equal to x, is 0.645 to 0.909.
o The width of this interval is 0.909 - 0.645 = 0.264
o The width of the binomial interval is 0.951 - 0.614 = 0.337
o The normal distribution based interval
▪ Is more precise
▪ But has a more stringent assumption of normality
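The normal-distribution calculation for the bearing data can be reproduced with the sketch below. It assumes base-10 logs (so that log 100 = 2.000) and uses the approximation that the standardized cutoff Z has variance about $1/n + Z^2/(2(n-1))$, which is the form that reproduces the limits 0.373 and 1.333 quoted above. Variable names are hypothetical.

```python
from math import log10, sqrt
from statistics import NormalDist

# Summary statistics of log10(lifetime) quoted in the notes.
n, xbar, s = 23, 1.802, 0.232
x = log10(100)                      # cutoff on the log scale: 2.000

std_normal = NormalDist()           # standard normal, mean 0 and sd 1
z = (x - xbar) / s                  # standardized cutoff, about 0.853

# Approximate standard error of z: sqrt(1/n + z^2 / (2 (n - 1))).
half_width = 1.96 * sqrt(1 / n + z**2 / (2 * (n - 1)))
z_low, z_high = z - half_width, z + half_width

p_est = std_normal.cdf(z)
p_low, p_high = std_normal.cdf(z_low), std_normal.cdf(z_high)
print(f"estimate {p_est:.3f}, interval {p_low:.3f} to {p_high:.3f}")
```

The interval is built on the Z scale and then mapped through the normal CDF, so it is not symmetric about the point estimate.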
Confidence Interval for Two Proportions
For comparing 2 independent proportions
Var  pˆ1  pˆ 2   Var  pˆ1   Var  pˆ 2  
p1 1  p1 
n1

Confidence Interval for p1 – p2  pˆ1  pˆ 2  z
p2 1  p2 
n2
pˆ1 1  pˆ1  pˆ 2 1  pˆ 2 

n1
n2
As long as number of successes & failures  5 separately for both groups.
Using the formulas from Section 6.5:
Example 18: 2 shot sizes
$n_1 = 100$, $\hat{p}_1 = 0.38$
$n_2 = 100$, $\hat{p}_2 = 0.29$
90% Confidence Interval
$$0.38 - 0.29 \pm 1.645 \sqrt{\frac{0.38(0.62)}{100} + \frac{0.29(0.71)}{100}}$$
$$= -0.019 \text{ to } 0.199$$
Even a difference of 9 percentage points (38% vs. 29%) with n1 = n2 = 100 is not particularly impressive: the 90% interval includes 0. Using
the more informative quantitative information in $\bar{y}_1, s_1, \bar{y}_2, s_2$ could do better.
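The two-proportion interval for Example 18 can be sketched as follows. The function name `diff_interval` is hypothetical; the inputs are the sample proportions and sample sizes given above.

```python
from math import sqrt
from statistics import NormalDist

def diff_interval(p1, n1, p2, n2, confidence=0.90):
    """Normal-approximation interval for p1 - p2 from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_interval(0.38, 100, 0.29, 100)
print(f"{low:.3f} to {high:.3f}")
```

A lower limit below zero means the data do not rule out equal proportions at this confidence level.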
Required Sample Size n
To be 95% certain that p̂ is within ±Δ of the true population proportion p:
$$\Delta = 1.96 \cdot SE_{\hat{p}} = 1.96 \sqrt{\frac{p(1 - p)}{n}}$$
The “worst case scenario” with largest SE is when p = 0.5
• More generally, the worst case is the potential value of p closest to 0.5
o For example, suppose we figure p is somewhere between 0.7 and 0.9.
o The worst case situation with largest SE is p = 0.7.
• Solve $\Delta = 1.96 \cdot SE_{\hat{p}} = 1.96 \sqrt{\frac{p_{WC}(1 - p_{WC})}{n}}$ for n, where Δ is the desired margin
o $p_{WC}$ = worst case p = potential p closest to 0.5
• Suppose we want to be 95% certain that p̂ is within ±0.05 of the true population
proportion p and 0.3 ≤ p ≤ 0.8. The worst case is p = 0.5.
$$1.96 \sqrt{\frac{0.5(1 - 0.5)}{n}} = 0.05$$
$$n = \frac{1.96^2 (0.5)(1 - 0.5)}{0.05^2} = 384.2$$
Smallest integer n with $n \ge 384.2$: $n = 385$
• If 0.85 ≤ p ≤ 1.0 then the worst case is p = 0.85.
$$1.96 \sqrt{\frac{0.85(1 - 0.85)}{n}} = 0.05$$
$$n = \frac{1.96^2 (0.85)(1 - 0.85)}{0.05^2} = 195.9$$
Smallest integer n with $n \ge 195.9$: $n = 196$
• It’s easier to estimate proportions closer to 0 or 1: fewer observations are needed for the same margin.
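The sample-size calculation can be sketched as a small function that picks the worst-case p in a plausible range and rounds n up to the next integer. The function name `required_n` and its parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n(margin, p_min, p_max, confidence=0.95):
    """Smallest n so that z * sqrt(p (1 - p) / n) <= margin for every p in [p_min, p_max]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    # Worst case: the value in [p_min, p_max] closest to 0.5.
    p_wc = min(max(0.5, p_min), p_max)
    return ceil(z**2 * p_wc * (1 - p_wc) / margin**2)

print(required_n(0.05, 0.30, 0.80))   # worst case p = 0.5
print(required_n(0.05, 0.85, 1.00))   # worst case p = 0.85
print(required_n(0.05, 0.70, 0.90))   # worst case p = 0.7
```

Rounding up rather than to the nearest integer guarantees the margin requirement is met.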