Lecture 8.2

Chapter 8: Inference for Proportions
8.2 Comparing Two Proportions
Overview
Confidence intervals and tests designed to
compare two population proportions are based
on the difference in the sample proportions
$$D = \hat{p}_1 - \hat{p}_2$$
where $\hat{p}_i = X_i / n_i$ for $i = 1$ or $2$, and
$X_i$ is the number of successes in sample $i$, which has size $n_i$.
When both sample sizes are sufficiently large,
the sampling distribution of the difference D is
approximately normal. Inference procedures for
comparing proportions are z procedures based
on the normal approximation and on
standardizing the difference D. The first step is
to obtain the mean and standard deviation of D.
By the addition rule for means, the mean of D is
the difference of the means:
$$\mu_D = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$$
That is, the difference $D = \hat{p}_1 - \hat{p}_2$ between the
sample proportions is an unbiased estimator of
the population difference $p_1 - p_2$. Similarly, the
addition rule for variances tells us that the
variance of D is the sum of the variances:
$$\sigma_D^2 = \sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$
When $n_1$ and $n_2$ are large, D is approximately
normal with mean $\mu_D = p_1 - p_2$ and standard
deviation
$$\sigma_D = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
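The mean and standard deviation of D can be evaluated numerically. The following is a minimal sketch in Python; the function name `mean_and_sd_of_difference` and the population values passed to it are hypothetical, chosen only to illustrate the formulas, and the two samples are assumed to be independent.

```python
import math

def mean_and_sd_of_difference(p1, n1, p2, n2):
    """Mean and standard deviation of D = p1_hat - p2_hat.

    Assumes two independent samples of sizes n1 and n2 drawn from
    populations with success proportions p1 and p2.
    """
    mean_d = p1 - p2
    # Variances add for independent samples.
    var_d = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return mean_d, math.sqrt(var_d)

# Hypothetical population values, for illustration only.
mean_d, sd_d = mean_and_sd_of_difference(p1=0.5, n1=100, p2=0.5, n2=100)
print(mean_d, sd_d)
```

In practice $p_1$ and $p_2$ are unknown, so these quantities have to be estimated from the samples.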
Significance tests
Significance tests for the equality of the two
proportions, $H_0: p_1 = p_2$, use a different
standard error for the difference in the sample
proportions, one based on a pooled estimate
of the common (under $H_0$) value of $p_1$ and $p_2$:
$$\hat{p} = \frac{\text{number of successes in both samples}}{\text{number of observations in both samples}} = \frac{X_1 + X_2}{n_1 + n_2}$$
Significance test for Comparing Two
Proportions
To test the hypothesis $H_0: p_1 = p_2$,
compute the z-statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\mathrm{SE}_{D_p}}$$
where the pooled standard error is
$$\mathrm{SE}_{D_p} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
and where
$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
In terms of a standard normal random variable
Z, the P-value for a test of $H_0$ against
$H_a: p_1 > p_2$ is $P(Z \ge z)$
$H_a: p_1 < p_2$ is $P(Z \le z)$
$H_a: p_1 \ne p_2$ is $2P(Z \ge |z|)$
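The procedure above can be sketched in Python as follows. This is an illustration rather than a reference implementation: the function names `normal_upper_tail` and `two_proportion_z_test`, the `alternative` labels, and the counts in the usage lines are all hypothetical. The normal tail probability is computed with the standard library's `math.erfc` instead of Table A.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z test of H0: p1 = p2. Returns (z, p_value)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    if alternative == "greater":        # Ha: p1 > p2
        p_value = normal_upper_tail(z)
    elif alternative == "less":         # Ha: p1 < p2
        p_value = normal_upper_tail(-z)
    elif alternative == "two-sided":    # Ha: p1 != p2
        p_value = 2 * normal_upper_tail(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
z, p = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 3), round(p, 4))
```

The sketch does not guard against a pooled estimate of exactly 0 or 1, where the standard error is zero and the statistic is undefined.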
Example. Are men and women college students
equally likely to be frequent binge drinkers? We
examine the survey data to answer the question.
Here is the data summary:
Population    n        X       $\hat{p} = X/n$
1 (men)       7180     1630    0.227
2 (women)     9916     1684    0.170
Total         17096    3314    0.194
The sample proportions are certainly quite
different, but we will perform a significance test
to see if the difference is large enough to lead us
to believe that the population proportions are not
equal. Formally, we test the hypotheses
$H_0: p_1 = p_2$
$H_a: p_1 \ne p_2$
The pooled estimate of the common value of p
is
$$\hat{p} = \frac{1630 + 1684}{7180 + 9916} = \frac{3314}{17096} = 0.194$$
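The test statistic is calculated as follows:
$$\mathrm{SE}_{D_p} = \sqrt{(0.194)(0.806)\left(\frac{1}{7180} + \frac{1}{9916}\right)} = 0.006126$$
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\mathrm{SE}_{D_p}} = \frac{0.227 - 0.170}{0.006126} = 9.34$$
The value 9.34 comes from carrying the unrounded sample proportions through the calculation; the rounded values shown give about 9.30.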
The P-value is $2P(Z \ge 9.34)$. The largest value
of z in Table A is 3.49, so from this table we can
conclude $P < 2 \times 0.0002 = 0.0004$.
Since the P-value is less than $\alpha = 0.05$, we reject
$H_0: p_1 = p_2$ at the level $\alpha = 0.05$.
We report: among college students in the study,
22.7% of the men and 17% of the women were
frequent binge drinkers; the difference is
statistically significant (z=9.34, P<0.0004).
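The arithmetic in this example can be checked with a short Python sketch. The counts are taken from the data summary above; the variable names are illustrative, and the P-value is computed from the normal distribution with `math.erfc` rather than bounded from Table A, so it comes out far smaller than the 0.0004 bound.

```python
import math

# Counts from the data summary: population 1 is men, population 2 is women.
n1, x1 = 7180, 1630
n2, x2 = 9916, 1684

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled estimate of the common proportion under H0: p1 = p2.
pooled = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# z uses the unrounded sample proportions.
z = (p1_hat - p2_hat) / se_pooled

# Two-sided P-value: 2 * P(Z >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(p1_hat, 3), round(p2_hat, 3), round(pooled, 3))
print(round(se_pooled, 6), round(z, 2))
print(p_value < 0.0004)
```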