CIs for Proportions

Proportions
How do “polls” work and what do they tell you?
Department of ISM, University of Alabama, 1995-2003
M35 C.I. for proportions

Objectives
• Create confidence intervals for estimating a true population proportion.
• Learn how to use a CI for the “difference of two proportions” to test for independence of two categorical variables.

Statistical Inference for Proportions
Population:
  X = binary variable.
  p = proportion in the population having the trait.
Sample:
  $\hat{p}$ = proportion in the sample having the trait.
  n = sample size.

Binomial Distribution involved “counts.”
X = a count of the number of successes in “n” trials.
Now change the “count” to the “proportion” of successes.
$\hat{p}$ = the proportion of successes = $X / n$ = “batting average”

For the population of all possible sample proportions:
• the mean is $\mu_{\hat{p}} = p$
• the standard deviation is $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
• and the distribution is approximately Normal.
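
The mean and standard deviation of the sample proportion follow from the Binomial results for the count X. A short derivation, assuming X ~ Binomial(n, p) and $\hat{p} = X/n$ (the operators E and Var are standard notation, not symbols used on the slides):

```latex
\begin{align*}
E(X) &= np, & \operatorname{Var}(X) &= np(1-p) \\
\mu_{\hat{p}} &= E\!\left(\frac{X}{n}\right) = \frac{np}{n} = p \\
\sigma_{\hat{p}} &= \sqrt{\operatorname{Var}\!\left(\frac{X}{n}\right)}
                 = \sqrt{\frac{np(1-p)}{n^{2}}}
                 = \sqrt{\frac{p(1-p)}{n}}
\end{align*}
```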

Sampling Distribution of $\hat{p}$
$$\hat{p} \sim N\left[\ \mu_{\hat{p}} = p,\ \ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\ \right]$$
if np > 5 and n(1 – p) > 5; this is a refinement of the n ≥ 30 rule.
The Central Limit Theorem applies because $\hat{p}$ is a sample average of n Bernoulli values!
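
The claim about the sampling distribution can be checked by simulation. The sketch below draws many samples of n Bernoulli values and compares the mean and standard deviation of the resulting sample proportions with p and sqrt(p(1 - p)/n). The values n = 200 and p = 0.4, the seed, and the function names are illustrative assumptions, not taken from the slides.

```python
import math
import random


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb from the slide: np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


def simulate_sample_proportions(n: int, p: float, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of n Bernoulli(p) values; return each sample's p-hat."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


n, p = 200, 0.4  # illustrative values (assumption)
p_hats = simulate_sample_proportions(n, p, reps=5000)

# Mean and standard deviation of the simulated sample proportions
sim_mean = sum(p_hats) / len(p_hats)
sim_sd = math.sqrt(sum((ph - sim_mean) ** 2 for ph in p_hats) / (len(p_hats) - 1))

# Theoretical standard deviation of p-hat
theory_sd = math.sqrt(p * (1 - p) / n)

print("normal approximation reasonable:", normal_approx_ok(n, p))
print("simulated mean:", round(sim_mean, 4), "vs p:", p)
print("simulated sd:", round(sim_sd, 4), "vs theory:", round(theory_sd, 4))
```

The simulated mean and standard deviation should land close to the theoretical values, and more repetitions tighten the agreement.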

Margin of Error in using $\hat{p}$ to estimate p at (1 – α)100% confidence:
$$\text{m.o.e.} = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
( if np > 5 and n(1 – p) > 5 )

(1 – α)100% Confidence Interval for p:
$$\hat{p} \pm \underbrace{Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}_{\text{m.o.e.}}$$
if np > 5 and n(1 – p) > 5.

Estimation of Parameters
A (1 – α)100% confidence interval estimate of a parameter is
point estimate ± m.o.e.

Population Parameter | Point Estimator | Margin of Error at (1 – α)100% confidence
Mean, μ, if σ is known | $\bar{x}$ | $\text{m.o.e.} = Z_{\alpha/2}\, \dfrac{\sigma}{\sqrt{n}}$
Mean, μ, if σ is unknown | $\bar{x}$ | $\text{m.o.e.} = t(\alpha/2,\ n-1)\, \dfrac{s}{\sqrt{n}}$
Proportion, p | $\hat{p} = X / n$ | $\text{m.o.e.} = Z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Example 2:
• The governor will spend more on promotion of a new program he wants passed if fewer than 50% of registered voters support it.
• In a telephone survey of 200 randomly selected registered voters, 82 say they support the proposed program.
• Construct a 95% confidence interval for the true proportion of ALL voters who support the proposed program.

Example 2.
$\hat{p}$ = sample proportion = 82 / 200 = .41
95% confidence interval for p:
$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .41 \pm 1.96 \sqrt{\frac{.41(.59)}{200}} = .41 \pm 0.068 = (.342,\ .478)$$
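
The Example 2 arithmetic can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is a hypothetical helper, and the critical value comes from `statistics.NormalDist` rather than a Z table, so it is 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for the large-sample interval p_hat +/- m.o.e."""
    p_hat = successes / n
    # Z_{alpha/2}: standard normal quantile leaving alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - moe, p_hat + moe


# Example 2: 82 of 200 sampled voters support the program
p_hat, lower, upper = proportion_ci(82, 200)
print(f"p_hat = {p_hat:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, the limits agree with the interval (.342, .478) computed by hand.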

Example 2.
What can be concluded from this telephone survey?
• The value of concern is 50%. Why?
• The CI is .342 to .478.
• .50 is NOT in this CI; therefore, .50 is not a plausible value.
• Less than 50% of the registered voters support the proposed program; therefore, spend more on promotion.

Example 3.
Election night; Birmingham; two candidates for mayor.
Random exit poll results: Sue Ellen: 462 votes of 900.
$\hat{p}$ = 462 / 900 = .5133
Can we declare Sue Ellen the winner at the .05 level of significance?
Hypothesized value is p = .50; no favorite.
$$\text{m.o.e.} = 1.96 \sqrt{\frac{.5133 \times .4867}{900}} = .03266$$

Example 3.
Construct 95% CI:
$$\hat{p} \pm \text{m.o.e.} = .51333 \pm .03266$$
The 95% CI is .48067 to .54599.
Statement in L.O.P.:
“I am 95% confident that the true proportion of votes cast for Sue Ellen in the Birmingham mayoral election falls between .4807 and .5460.”

Example 3.
Decision:
Does the “hypothesized value” fall in the CI? Yes!
Therefore, .50 may be a plausible value; so the election is “too close to call” at the .05 level of significance.
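
The decision rule used in Examples 2 and 3 (a hypothesized value is plausible when it falls inside the CI) can be sketched as a single check. The function name `is_plausible` is a hypothetical helper, and the code assumes the same large-sample interval as the slides.

```python
import math
from statistics import NormalDist


def is_plausible(hypothesized: float, successes: int, n: int, alpha: float = 0.05) -> bool:
    """True if the hypothesized proportion lies inside the (1 - alpha)100% CI for p."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe <= hypothesized <= p_hat + moe


# Example 3: 462 of 900 exit-poll votes, hypothesized p = .50
too_close_to_call = is_plausible(0.50, 462, 900)
print("Example 3, .50 plausible:", too_close_to_call)

# Example 2: 82 of 200 supporters, value of concern p = .50
governor_case = is_plausible(0.50, 82, 200)
print("Example 2, .50 plausible:", governor_case)
```

For Example 3 the check returns True (too close to call); for Example 2 it returns False, matching the conclusion that fewer than 50% support the program.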

Example 4.
In a survey about banking services, responses were categorized by age and “opinion of services.”
Of the 104 respondents who were 30 years or less, 93 stated that the services were “excellent or good.”
Of the 46 who were over 30, 36 stated that the services were “excellent or good.”
Is there a dependence between age and “opinion of services”?

Example 4.
                        Service
Age            Excellent or Good    Acceptable or Poor    Total
30 or less             93                   11             104
Over 30                36                   10              46
Total                 129                   21             150

Conditional probabilities:
• $\hat{p}_1$ = P( “Excel or Good” | 30 or less ) = 93 / 104 = .894
• $\hat{p}_2$ = P( “Excel or Good” | over 30 ) = 36 / 46 = .783
Are these conditional probabilities “far enough apart” to call the true population proportions different?
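
Estimation of Parameters
A (1 – α)100% confidence interval estimate of a parameter is
point estimate ± m.o.e.

Population Parameter | Point Estimator | Margin of Error at (1 – α)100% confidence
Mean, μ, if σ is known | $\bar{x}$ | $\text{m.o.e.} = Z_{\alpha/2}\, \dfrac{\sigma}{\sqrt{n}}$
Mean, μ, if σ is unknown | $\bar{x}$ | $\text{m.o.e.} = t(\alpha/2,\ n-1)\, \dfrac{s}{\sqrt{n}}$
Proportion, p | $\hat{p} = X / n$ | $\text{m.o.e.} = Z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Diff. of two means, μ1 – μ2 (for large sample sizes only) | $\bar{x}_1 - \bar{x}_2$ | $\text{m.o.e.} = Z_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Diff. of two proportions, p1 – p2 | $\hat{p}_1 - \hat{p}_2$ | $\text{m.o.e.} = Z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Slope of regression line, β | b | $\text{m.o.e.} = t(\alpha/2,\ n-2)\, s_b$, where $s_b = \sqrt{\dfrac{MSE}{\text{Equ.2}}}$
Mean from a regression when X = x* | $\hat{y} = a + b x^*$ | $\text{m.o.e.} = t(\alpha/2,\ n-2)\, s \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\text{Equ.2}}}$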

Example 4.
Margin of Error for p1 – p2:
$$\text{m.o.e.} = Z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
For 95% confidence:
$$\text{m.o.e.} = 1.96 \sqrt{\frac{(.894)(.106)}{104} + \frac{(.783)(.217)}{46}} = 1.96\,(.06786) = .1330$$

Example 4.
95% Confidence Interval for the difference of two proportions:
$$(\hat{p}_1 - \hat{p}_2) \pm \text{m.o.e.} = (.894 - .783) \pm .1330 = .111 \pm .1330 = (-.0220,\ +.2440)$$
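
The Example 4 interval for the difference of two proportions can be sketched the same way. The function name `diff_proportions_ci` is a hypothetical helper. Because the code works with unrounded proportions and an unrounded critical value, its limits differ in the third decimal place from the hand calculation (-.0220, +.2440), which rounds the proportions to three decimals first.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95):
    """Return (difference, lower, upper) for p1 - p2 using the large-sample interval."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - moe, diff + moe


# Example 4: 93 of 104 (age 30 or less) and 36 of 46 (over 30) said "excellent or good"
diff, lower, upper = diff_proportions_ci(93, 104, 36, 46)
print(f"difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("zero inside CI:", lower <= 0 <= upper)
```

Zero falls inside the interval, so the sample does not give strong enough evidence that the two population proportions differ.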

Example 4.
( -.0220, +.2440 )
Does “zero” fall inside this confidence interval? Yes!
Then “zero” is a plausible value for the difference of the two proportions.
Therefore, the evidence is not strong enough to say a dependence exists.

Example 4.
Conclusion:
“Age” and “opinion of service” may be independent, at the 95% confidence level, or at the 5% level of significance.

Example 4.
The two SAMPLE proportions,
P( “Excel or Good” | 30 or less ) = .894
P( “Excel or Good” | over 30 ) = .783
are “too close” together to conclude that the corresponding POPULATION proportions are different.

Sample Size for Estimating μ
Problem: What sample size is needed to have a margin of error less than E at (1 – α)100% confidence?
$$\text{m.o.e.} = z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}} < E \quad\Longrightarrow\quad n > \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^{2}$$

What sample size is needed to estimate the mean “actual mpg” with an m.o.e. of 0.2 mpg with 90% confidence for Honda Accords if the pop. std. dev. is 0.88 mpg?
$$\text{m.o.e.} = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad 0.2 = 1.645 \cdot \frac{0.88}{\sqrt{n}}$$
$$n = \frac{1.645^{2} \cdot 0.88^{2}}{0.2^{2}} = 52.39$$
Round up: a sample of n = 53 is needed.

What if σ is unknown?
• Use a conservative guess (high).
• Use s from a pilot study.
• Use a very rough guess of σ, such as $\sigma \approx \dfrac{H - L}{4}$.
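
The sample-size rule for a mean, n > (z σ / E)², can be sketched as below. The function names are hypothetical helpers. `rough_sigma` assumes that H and L denote the highest and lowest plausible values of the variable, which the slide does not spell out.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, max_error: float, confidence: float) -> int:
    """Smallest n with z * sigma / sqrt(n) <= max_error, i.e. n >= (z * sigma / E)^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / max_error) ** 2)


def rough_sigma(high: float, low: float) -> float:
    """Very rough guess of sigma: (H - L) / 4 (H, L read as high and low values)."""
    return (high - low) / 4


# Honda Accord example: sigma = 0.88 mpg, E = 0.2 mpg, 90% confidence
n_honda = sample_size_for_mean(0.88, 0.2, 0.90)
print("sample size needed:", n_honda)
```

Rounding up with `math.ceil` matters here: the formula gives about 52.4, and 52 observations would leave the margin of error slightly above 0.2 mpg.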

Sample Size for Estimating Proportions:
What sample size is needed to have a margin of error for estimating p less than “E” at (1 – α)100% confidence?
$$\text{m.o.e.} = E = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$n = \frac{z_{\alpha/2}^{2}\ \hat{p}(1-\hat{p})}{E^{2}}$$

But we don’t know $\hat{p}$ before we take the sample!
• Use a conservative guess (one that results in a larger n).
• p = .5 is the most conservative.
• Values close to .5 are more conservative than those near 0 or 1.
• If you know that the true p should be between .20 and .30, then use .30.

Example 5: What is the smallest sample size necessary to estimate the proportion of defective parts to within .02 with 95% confidence if p is known not to exceed 4%?
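
One way to work Example 5 is to plug the most conservative value in the allowed range, p = .04, into n = z² p(1 – p) / E². The sketch below does this and, for comparison, also uses the worst-case guess p = .5. The function name `sample_size_for_proportion` is a hypothetical helper, and the slides themselves do not state the answer.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_guess: float, max_error: float, confidence: float = 0.95) -> int:
    """Smallest n with z * sqrt(p(1 - p) / n) <= max_error for the guessed p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / max_error ** 2)


# Example 5: p does not exceed 4%, so .04 is the value in that range closest to .5
n_example5 = sample_size_for_proportion(0.04, 0.02)

# With no prior knowledge of p, .5 is the most conservative guess
n_worst_case = sample_size_for_proportion(0.5, 0.02)

print("n using p = .04:", n_example5)
print("n using p = .5: ", n_worst_case)
```

Knowing that p is at most 4% cuts the required sample from 2401 to 369, which shows how much a good prior bound on p can save.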