252meanx1

252meanx1 10/6/05 (Open this document in 'Outline' view!) Re-edited to replace Δ with D.
D. COMPARISON OF TWO SAMPLES
Examples for comparison of means.
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \ne \mu_2$
or more generally
$H_0: D = D_0$
$H_1: D \ne D_0$,
where $D = \mu_1 - \mu_2$ and $d = \bar{x}_1 - \bar{x}_2$.
General formulas: Degrees of freedom $DF = n_1 + n_2 - 2$
a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d}$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d$.
The difference between the cases comes down to the choice of t and the formula for s d . Let us
now consider the first four cases.
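The three general formulas can be sketched in Python. This is a minimal illustration and not part of the original notes: the function name `two_sided_summary` and the input numbers (d = 5.0, s_d = 2.0, t = 2.0) are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def two_sided_summary(d, s_d, t_crit, d0=0.0):
    """Apply the three general formulas for a 2-sided test of H0: D = D0.

    d      : observed difference between the sample means
    s_d    : standard error of d
    t_crit : table value of t (or z) for alpha/2
    """
    half_width = t_crit * s_d
    conf_int = (d - half_width, d + half_width)      # a. D = d +/- t * s_d
    test_ratio = (d - d0) / s_d                      # b. t = (d - D0) / s_d
    crit_vals = (d0 - half_width, d0 + half_width)   # c. d_cv = D0 +/- t * s_d

    # Each method yields its own reject / do-not-reject answer; they should agree.
    reject_by_ci = not (conf_int[0] <= d0 <= conf_int[1])
    reject_by_ratio = abs(test_ratio) > t_crit
    reject_by_cv = not (crit_vals[0] <= d <= crit_vals[1])
    return conf_int, test_ratio, crit_vals, (reject_by_ci, reject_by_ratio, reject_by_cv)


# Hypothetical inputs, for illustration only.
conf_int, test_ratio, crit_vals, decisions = two_sided_summary(d=5.0, s_d=2.0, t_crit=2.0)
print(conf_int, test_ratio, crit_vals, decisions)
```

The three decisions are returned together to make the point that the confidence interval, the test ratio, and the critical value are three views of the same comparison.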
1. Two Means, Two Independent Samples, Large Samples.
If the total number of degrees of freedom is large (or the two samples come from normally distributed populations with known variances $\sigma_1^2$ and $\sigma_2^2$), then replace $t$ with $z$ and use $s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.
First Example: We wish to test the earnings of retail clerks in New York ($x_1$) and Philadelphia ($x_2$) for equality.
$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ or $H_0: D = 0$
$H_1: \mu_1 \ne \mu_2$ or $H_1: \mu_1 - \mu_2 \ne 0$ or $H_1: D \ne 0$
$\alpha = .05$
Data:
$\bar{x}_1 = 300$, $s_1^2 = 400$, $n_1 = 169$
$\bar{x}_2 = 330$, $s_2^2 = 360$, $n_2 = 144$
Since $DF = n_1 + n_2 - 2 = 169 + 144 - 2 = 311$ are well over 100, we are justified in using a large sample method.
$d = \bar{x}_1 - \bar{x}_2 = 300 - 330 = -30$
$s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{400}{169} + \dfrac{360}{144}} = \sqrt{2.3669 + 2.5000} = \sqrt{4.8669} = 2.2061$
Solutions: Use $z_{\alpha/2} = z_{.025} = 1.960$ in place of $t_{\alpha/2}$.
a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d = -30 \pm 1.960(2.2061) = -30 \pm 4.32$. Make a diagram showing a Normal curve centered at -30 and a confidence interval bounded by -34.32 and -25.68. Since $D_0 = 0$ is not between them, reject $H_0$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d} = \dfrac{-30 - 0}{2.2061} = -13.60$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-z_{.025} = -1.960$ and $z_{.025} = 1.960$. Since -13.60 is not between them, reject $H_0$. Since this is actually a value of $z$, a p-value would be easy: $pval = 2P(d \le -30) = 2P(z \le -13.60) = 2(.5 - .5) = 0$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 1.960(2.2061) = \pm 4.32$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -4.32 and 4.32. Since $d = -30$ is not between them, reject $H_0$.
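The arithmetic of the first example can be checked with a short Python sketch. It assumes only the summary data given above; the variable names are illustrative, and `statistics.NormalDist` from the standard library stands in for the z table.

```python
import math
from statistics import NormalDist

# Summary data from the first example (New York vs. Philadelphia clerks).
x1_bar, var1, n1 = 300.0, 400.0, 169
x2_bar, var2, n2 = 330.0, 360.0, 144
alpha, d0 = 0.05, 0.0

d = x1_bar - x2_bar                              # difference between sample means
s_d = math.sqrt(var1 / n1 + var2 / n2)           # large-sample standard error
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # z for alpha/2, about 1.960

half_width = z_crit * s_d
conf_int = (d - half_width, d + half_width)      # a. confidence interval
z_ratio = (d - d0) / s_d                         # b. test ratio
p_value = 2 * NormalDist().cdf(-abs(z_ratio))    # 2-sided p-value
crit_vals = (d0 - half_width, d0 + half_width)   # c. critical values

print(f"s_d = {s_d:.4f}, z = {z_ratio:.2f}, p-value = {p_value:.4f}")
print(f"confidence interval: {conf_int[0]:.2f} to {conf_int[1]:.2f}")
print(f"critical values:     {crit_vals[0]:.2f} to {crit_vals[1]:.2f}")
```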
Second Example: (Whitmore, Neter, Wasserman) We wish to learn if battery type B ($x_2$) has a longer service life (in months) than battery type A ($x_1$). Note that this statement becomes an alternate hypothesis because it does not contain an equality.
$H_0: \mu_1 \ge \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$ or $H_0: D \ge 0$
$H_1: \mu_1 < \mu_2$ or $H_1: \mu_1 - \mu_2 < 0$ or $H_1: D < 0$
$\alpha = .05$
Data:
$\bar{x}_1 = 18.4$, $s_1 = 3.3$, $n_1 = 121$
$\bar{x}_2 = 21.3$, $s_2 = 4.2$, $n_2 = 36$
Since $DF = n_1 + n_2 - 2 = 121 + 36 - 2 = 155$ are well over 100, we are justified in using a large sample method.
$d = \bar{x}_1 - \bar{x}_2 = 18.4 - 21.3 = -2.9$
$s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{3.3^2}{121} + \dfrac{4.2^2}{36}} = \sqrt{0.09 + 0.49} = \sqrt{0.58} = 0.7616$
Solutions: Use $z_{\alpha} = z_{.05} = 1.645$ in place of $t_{\alpha}$. This is a 1-sided test.
a. Confidence Interval: Given $H_1: D < 0$, use $D \le d + t_{\alpha}\, s_d = -2.9 + 1.645(0.7616) = -2.9 + 1.253 = -1.647$. Make a diagram showing a Normal curve centered at -2.9 and a confidence interval, $D \le -1.647$, represented by shading the area below -1.647. The null hypothesis $H_0: D \ge 0$ is represented by shading the area above zero. Since $D_0 = 0$ is not in the confidence interval, reject $H_0$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d} = \dfrac{-2.9 - 0}{0.7616} = -3.808$. Make a diagram showing a Normal curve centered at zero and an 'accept' region above $-z_{.05} = -1.645$. Since -3.808 is below -1.645, reject $H_0$.
c. Critical Value: Given $H_1: D < 0$, we want a critical value below zero. Use $d_{cv} = D_0 - t_{\alpha}\, s_d = 0 - 1.645(0.7616) = -1.253$. Make a diagram showing a Normal curve centered at zero and a 'reject' region below -1.253. Since $d = -2.9$ is below -1.253, reject $H_0$.
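The one-sided version differs from the two-sided one only in using $z_{\alpha}$ instead of $z_{\alpha/2}$ and in looking at a single tail. A minimal Python sketch of the battery example, using the summary data above (the variable names are illustrative, not from the source):

```python
import math
from statistics import NormalDist

# Summary data from the second example (battery type A vs. type B).
x1_bar, s1, n1 = 18.4, 3.3, 121
x2_bar, s2, n2 = 21.3, 4.2, 36
alpha, d0 = 0.05, 0.0

d = x1_bar - x2_bar
s_d = math.sqrt(s1**2 / n1 + s2**2 / n2)
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided: z for alpha, about 1.645

upper_bound = d + z_crit * s_d   # a. one-sided interval: D <= upper_bound
z_ratio = (d - d0) / s_d         # b. test ratio, compared with -z_crit
d_cv = d0 - z_crit * s_d         # c. single critical value below zero

# H1 is D < 0, so every rejection rule looks only at the lower tail.
reject = (d0 > upper_bound, z_ratio < -z_crit, d < d_cv)
print(f"s_d = {s_d:.4f}, upper bound = {upper_bound:.3f}, "
      f"z = {z_ratio:.3f}, d_cv = {d_cv:.3f}, reject = {reject}")
```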
2. Two Means, Two Independent Samples, Populations Normally
Distributed, Population Variances Assumed Equal.
$t = t^{(n_1 + n_2 - 2)}$ and $s_d = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.
Example: (Whitmore, Neter, Wasserman) Each of two groups of ten men is assigned a razor blade and asked how many shaves they got from a package. We wish to find out if there is a significant difference in the durability of the two blades. Type A will be $x_1$. Type B will be $x_2$.
$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ or $H_0: D = 0$
$H_1: \mu_1 \ne \mu_2$ or $H_1: \mu_1 - \mu_2 \ne 0$ or $H_1: D \ne 0$
$\alpha = .01$
Data:
$\bar{x}_1 = 46.9$, $s_1 = 14.0$, $n_1 = 10$
$\bar{x}_2 = 75.6$, $s_2 = 21.0$, $n_2 = 10$
Since $DF = n_1 + n_2 - 2 = 10 + 10 - 2 = 18$ are well below 100, we need a small sample method. Because of the similarity of the two samples we assume that $\sigma_1^2 = \sigma_2^2$.
$d = \bar{x}_1 - \bar{x}_2 = 46.9 - 75.6 = -28.7$. Because of our assumption about variances, we use a pooled variance
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(10 - 1)14.0^2 + (10 - 1)21.0^2}{10 + 10 - 2} = 318.50$
$s_d = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = \sqrt{318.50\left(\dfrac{1}{10} + \dfrac{1}{10}\right)} = \sqrt{318.50(0.2)} = \sqrt{63.7} = 7.9812$.
Solutions: Use $t^{(n_1 + n_2 - 2)}_{\alpha/2} = t^{(18)}_{.005} = 2.878$.
a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d = -28.7 \pm 2.878(7.9812) = -28.7 \pm 22.97$. Make a diagram showing a Normal curve centered at -28.7 and a confidence interval bounded by -51.67 and -5.73. Since $D_0 = 0$ is not between them, reject $H_0$.
b. Test Ratio: $t = \dfrac{d - D_0}{s_d} = \dfrac{-28.7 - 0}{7.9812} = -3.596$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-t^{(18)}_{.005} = -2.878$ and $t^{(18)}_{.005} = 2.878$. Since -3.596 is not between them, reject $H_0$. If we want a p-value, we need $pval = 2P(d \le -28.70) = 2P(t \le -3.596)$. Since 3.596 lies between $t^{(18)}_{.005} = 2.878$ and $t^{(18)}_{.001} = 3.611$, the one-tail probability lies between .001 and .005; we double it to get $.002 < pval < .01$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 2.878(7.9812) = \pm 22.97$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -22.97 and 22.97. Since $d = -28.7$ is not between them, reject $H_0$.
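The pooled-variance calculation can be sketched in Python as follows. The Python standard library has no t distribution, so the table value $t^{(18)}_{.005} = 2.878$ quoted in the text is entered by hand; that shortcut, like the variable names, is an assumption of this sketch.

```python
import math

# Summary data from the razor blade example.
x1_bar, s1, n1 = 46.9, 14.0, 10
x2_bar, s2, n2 = 75.6, 21.0, 10
d0 = 0.0
t_crit = 2.878   # table value for 18 DF and alpha/2 = .005, taken from the text

d = x1_bar - x2_bar
df = n1 + n2 - 2

# Pooled variance: each sample variance is weighted by its degrees of freedom.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
s_d = math.sqrt(sp2 * (1 / n1 + 1 / n2))

half_width = t_crit * s_d
conf_int = (d - half_width, d + half_width)
t_ratio = (d - d0) / s_d
crit_vals = (d0 - half_width, d0 + half_width)

print(f"sp2 = {sp2:.2f}, s_d = {s_d:.4f}, t = {t_ratio:.3f}")
print(f"confidence interval: {conf_int[0]:.2f} to {conf_int[1]:.2f}")
print(f"reject H0: {abs(t_ratio) > t_crit}")
```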
3. Two Means, Two Independent Samples, Populations Normally
Distributed, Population Variances not Assumed Equal.
This time the degrees of freedom for $t$ must be calculated by the Satterthwaite approximation. The formula is
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$,
but the formula for the standard deviation is the same as in method 1, $s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.
Example: We wish to use a 2-sided 95% confidence interval to test for a significant difference between the time it takes an employee to type a page on word processor A ($x_1$) and word processor B ($x_2$). 16 pages are typed on each processor.
$H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$ or $H_0: D = 0$
$H_1: \mu_1 \ne \mu_2$ or $H_1: \mu_1 - \mu_2 \ne 0$ or $H_1: D \ne 0$
$\alpha = .05$
Data:
$\bar{x}_1 = 8.20$, $s_1^2 = 4.10$, $n_1 = 16$
$\bar{x}_2 = 7.10$, $s_2^2 = 4.20$, $n_2 = 16$
Since $DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30$ are well below 100, we would be on very, very shaky ground if we use a large sample method. Even with a small sample method we probably need an assumption of Normality. If we do not want to assume $\sigma_1^2 = \sigma_2^2$, we need the Satterthwaite method. $d = \bar{x}_1 - \bar{x}_2 = 8.20 - 7.10 = 1.10$. To find the standard error of $d$ and the number of degrees of freedom, do the following calculations:
$\dfrac{s_1^2}{n_1} = \dfrac{4.1}{16} = 0.25625$, $\dfrac{s_2^2}{n_2} = \dfrac{4.2}{16} = 0.26250$, so $\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} = 0.25625 + 0.26250 = 0.51875$,
$DF = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \dfrac{0.51875^2}{\dfrac{0.25625^2}{15} + \dfrac{0.26250^2}{15}} = \dfrac{0.269102}{0.0043776 + 0.0045938} \approx 29.996$.
I take the conservative approach of rounding this down to 29 degrees of freedom. Notice how little difference there is between this and $DF = n_1 + n_2 - 2 = 16 + 16 - 2 = 30$. This is because of the near-equality of the sample variances.
$s_d = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{0.51875} = 0.720$. This is almost the same result as we would have gotten if we had assumed that $\sigma_1^2 = \sigma_2^2$, again because of the near-equality of the sample variances.
Solutions: Use $t^{(29)}_{.025} = 2.045$ for a 2-sided confidence interval. $D = d \pm t_{\alpha/2}\, s_d = 1.10 \pm 2.045(0.720) = 1.10 \pm 1.47$. Make a diagram showing a Normal curve centered at 1.10 and a confidence interval bounded by -0.37 and 2.57. Since $D_0 = 0$ is between them, do not reject $H_0$.
A more complete version of this problem appears in Problem D3. Look at document 252meanx3 for a computer example of this method.
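As a rough check on the Satterthwaite arithmetic, the following Python sketch recomputes the degrees of freedom and the confidence interval from the summary data of the word processor example. The table value $t^{(29)}_{.025} = 2.045$ is entered by hand from the text, since the standard library has no t distribution; the variable names are illustrative.

```python
import math

# Summary data from the word processor example.
x1_bar, var1, n1 = 8.20, 4.10, 16
x2_bar, var2, n2 = 7.10, 4.20, 16
d0 = 0.0

d = x1_bar - x2_bar
v1, v2 = var1 / n1, var2 / n2   # the two pieces s1^2/n1 and s2^2/n2

# Satterthwaite approximation to the degrees of freedom.
df_exact = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df = math.floor(df_exact)       # conservative: round down, as in the text

s_d = math.sqrt(v1 + v2)        # same standard error formula as in method 1

t_crit = 2.045                  # table value for 29 DF and alpha/2 = .025, from the text
conf_int = (d - t_crit * s_d, d + t_crit * s_d)
reject = not (conf_int[0] <= d0 <= conf_int[1])

print(f"df = {df_exact:.3f} -> {df}, s_d = {s_d:.3f}")
print(f"confidence interval: {conf_int[0]:.2f} to {conf_int[1]:.2f}, reject H0: {reject}")
```

Rounding down with `math.floor` mirrors the conservative choice made in the text; other references round to the nearest integer or interpolate in the t table.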
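4. Two Means, Paired Samples (If samples are small, populations should
be normally distributed).
If $n$ is the number of pairs of data, then $t = t^{(n-1)}$ and $s_{\bar{d}} = \sqrt{\dfrac{1}{n} \cdot \dfrac{\sum d^2 - n\bar{d}^2}{n - 1}}$, the standard error of the mean difference $\bar{d}$, which plays the role of $s_d$ in the general formulas. In this case $d_1 = x_{11} - x_{21}$, $d_2 = x_{12} - x_{22}$, etc.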
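To make the paired formula concrete, here is a small Python sketch that starts from raw pairs. The five pairs of numbers are hypothetical, invented only for illustration; the point is that the shortcut formula above gives the same answer as the ordinary sample standard deviation of the differences divided by $\sqrt{n}$.

```python
import math
import statistics

# Hypothetical paired data, for illustration only (not from the notes).
x1 = [200.0, 210.0, 195.0, 205.0, 190.0]   # first measurement of each pair
x2 = [206.0, 212.0, 199.0, 213.0, 195.0]   # second measurement of each pair

diffs = [a - b for a, b in zip(x1, x2)]     # d_i = x_1i - x_2i
n = len(diffs)
d_bar = sum(diffs) / n

# Shortcut formula: sample variance of the differences, then divide by n.
var_d = (sum(v * v for v in diffs) - n * d_bar**2) / (n - 1)
se_d_bar = math.sqrt(var_d / n)

# Cross-check against the standard library's sample standard deviation.
se_check = statistics.stdev(diffs) / math.sqrt(n)

print(f"d_bar = {d_bar}, var_d = {var_d}, standard error = {se_d_bar:.4f}")
```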
Example: We have been told that income in a region has risen by \$6.00/week over the last year. We interviewed 100 families last year and found an average weekly income ($\bar{x}_1$) of \$200. We reinterview the same families and find out their present incomes so that we can compute how much they have risen. We find that the new average income ($\bar{x}_2$) is \$204. From the data we compute a standard deviation of the income change of \$6.00. We wish to test to see if the \$6 rise is believable.
$H_0: \mu_2 - \mu_1 = 6$ or $H_0: \mu_1 - \mu_2 = -6$ or $H_0: D = -6$
$H_1: \mu_2 - \mu_1 \ne 6$ or $H_1: \mu_1 - \mu_2 \ne -6$ or $H_1: D \ne -6$
$\alpha = .05$
Data:
$\bar{x}_1 = 200$, $\bar{x}_2 = 204$, $s_d = 6$, $n = 100$. Though we may have 200 pieces of data, we have 100 pairs, and the actual numbers we use are the 100 differences in income. $DF = n - 1 = 100 - 1 = 99$ and we ought to use $t$. $\bar{d} = \bar{x}_1 - \bar{x}_2 = 200 - 204 = -4$. $s_{\bar{d}} = \dfrac{s_d}{\sqrt{n}} = \dfrac{6}{\sqrt{100}} = 0.6$
Solutions: Use $t^{(n-1)}_{\alpha/2} = t^{(99)}_{.025} = 1.984$.
a. Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -4 \pm 1.984(0.6) = -4 \pm 1.19$. Make a diagram showing a Normal curve centered at -4 and a confidence interval bounded by -5.19 and -2.81. Since $D_0 = -6$ is not between them, reject $H_0$.
b. Test Ratio: $t = \dfrac{\bar{d} - D_0}{s_{\bar{d}}} = \dfrac{-4 - (-6)}{0.6} = 3.333$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-t^{(99)}_{.025} = -1.984$ and $t^{(99)}_{.025} = 1.984$. Since 3.333 is not between them, reject $H_0$. If we want a p-value, we need $pval = 2P(\bar{d} \ge -4) = 2P(t \ge 3.333)$. Since 3.333 lies above $t^{(99)}_{.001} = 3.175$, the one-tail probability is below .001; we double the implied p-value to get $pval < .002$.
c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = -6 \pm 1.984(0.6) = -6 \pm 1.19$. Make a diagram showing a Normal curve centered at -6 and an 'accept' region bounded by -7.19 and -4.81. Since $\bar{d} = -4$ is not between them, reject $H_0$.
Note: We might have been better off in this problem defining $D$ as $\mu_2 - \mu_1$. Then our hypotheses would read $H_0: D = 6$ and $H_1: D \ne 6$. We would say $\bar{d} = \bar{x}_2 - \bar{x}_1 = 204 - 200 = 4$ and our critical value, for example, would be $d_{cv} = 6 \pm 1.984(0.6) = 6 \pm 1.19$. The conclusion would not change.
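The paired example, including the remark that redefining $D$ as $\mu_2 - \mu_1$ leaves the conclusion unchanged, can be sketched in Python. The table value $t^{(99)}_{.025} = 1.984$ is taken from the text and entered by hand; the variable names are illustrative.

```python
import math

# Summary data from the income example.
x1_bar, x2_bar = 200.0, 204.0
s_diff, n = 6.0, 100      # standard deviation of the 100 income changes
d0 = -6.0                 # H0: D = mu1 - mu2 = -6
t_crit = 1.984            # table value for 99 DF and alpha/2 = .025, from the text

d_bar = x1_bar - x2_bar
se = s_diff / math.sqrt(n)                  # standard error of the mean difference
half_width = t_crit * se

conf_int = (d_bar - half_width, d_bar + half_width)
t_ratio = (d_bar - d0) / se
crit_vals = (d0 - half_width, d0 + half_width)

reject = (
    not (conf_int[0] <= d0 <= conf_int[1]),       # a. confidence interval
    abs(t_ratio) > t_crit,                        # b. test ratio
    not (crit_vals[0] <= d_bar <= crit_vals[1]),  # c. critical value
)

# Alternative definition D = mu2 - mu1: every sign flips, the decision does not.
d_alt, d0_alt = -d_bar, -d0
crit_alt = (d0_alt - half_width, d0_alt + half_width)
reject_alt = not (crit_alt[0] <= d_alt <= crit_alt[1])

print(f"se = {se}, t = {t_ratio:.3f}, reject = {reject}, reject_alt = {reject_alt}")
```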
© 2002 Roger Even Bove