Independent Samples: Comparing Means

Lecture 39
Section 11.4
Fri, Apr 1, 2005
Independent Samples




In a paired study, two observations are made on
each subject, producing one sample of bivariate
data.
Or we could think of it as two samples of
paired data.
Often these are “before” and “after”
observations.
By comparing the “before” mean to the “after”
mean, we can determine whether the intervening
treatment had an effect.
Independent Samples




On the other hand, with independent samples,
there is no logical way to “pair” the data.
One sample might be from a population of
males and the other from a population of
females.
Or one might be the treatment group and the
other the control group.
The samples could be of different sizes.
Independent Samples




We wish to compare population means $\mu_1$ and $\mu_2$.
We do so by comparing sample means $\bar{x}_1$ and $\bar{x}_2$.
More specifically, we will use $\bar{x}_1 - \bar{x}_2$ as an estimator of $\mu_1 - \mu_2$.
If we want to know whether $\mu_1 = \mu_2$, we test to see whether $\mu_1 - \mu_2 = 0$ by computing $\bar{x}_1 - \bar{x}_2$.
The Distributions of $\bar{x}_1$ and $\bar{x}_2$



Let n1 and n2 be the sample sizes.
If the samples are large, then $\bar{x}_1$ and $\bar{x}_2$ have (approx.) normal distributions.
However, if either sample is small, then we will
need an additional assumption.

The populations are normal.
Further Assumption




We will also assume that the two populations
have the same standard deviation.
Call it $\sigma$.
If this assumption is not supported by the
evidence, then it should not be made.
This assumption is often not warranted, but
without it, the formulas become much more
complicated. See p. 658.
The t Distribution


Let $s_1$ and $s_2$ be the sample standard deviations.
Whenever we use $s_1$ and $s_2$ instead of $\sigma$, we will have to use the t distribution instead of the standard normal distribution, unless the sample sizes are large.
The Distribution of $\bar{x}_1 - \bar{x}_2$


Suppose that $\bar{x}_1$ and $\bar{x}_2$ have normal distributions with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$ (according to the Central Limit Theorem, p. 500).
Then $\bar{x}_1 - \bar{x}_2$ is a normal random variable with the following properties:
The mean is $\mu_1 - \mu_2$.
The standard deviation is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

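The Distribution of $\bar{x}_1 - \bar{x}_2$

If we assume that $\sigma_1 = \sigma_2 = \sigma$, then the standard deviation may be simplified to
$$\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
That is,
$$\bar{x}_1 - \bar{x}_2 \text{ is } N\left(\mu_1 - \mu_2,\ \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right).$$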
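The two standard deviation formulas above can be checked numerically. The following is a minimal Python sketch; the function name `sd_of_difference` and all input values are hypothetical, chosen only for illustration.

```python
import math

def sd_of_difference(sigma1, n1, sigma2, n2):
    """Standard deviation of xbar1 - xbar2 for two independent samples."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical population standard deviations and sample sizes.
sd_general = sd_of_difference(4.0, 25, 3.0, 36)

# With a common sigma, the general formula should agree with
# the simplified form sigma * sqrt(1/n1 + 1/n2).
sigma, n1, n2 = 5.0, 10, 10
sd_equal = sd_of_difference(sigma, n1, sigma, n2)
sd_simplified = sigma * math.sqrt(1 / n1 + 1 / n2)

print(sd_general, sd_equal, sd_simplified)
```

The last two printed values should agree up to rounding, which is the simplification stated on the slide.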

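The Distribution of $\bar{x}_1 - \bar{x}_2$

[Figure: normal curves for $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_1 - \bar{x}_2$, built up over several slides; horizontal axis marked at 0, $\mu_1 - \mu_2$, $\mu_2$, and $\mu_1$.]
$\bar{x}_1$ is $N(\mu_1,\ \sigma/\sqrt{n_1})$.
$\bar{x}_2$ is $N(\mu_2,\ \sigma/\sqrt{n_2})$.
$\bar{x}_1 - \bar{x}_2$ is $N\left(\mu_1 - \mu_2,\ \sigma\sqrt{1/n_1 + 1/n_2}\right)$.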
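The Distribution of $\bar{x}_1 - \bar{x}_2$

If
$$\bar{x}_1 - \bar{x}_2 \text{ is } N\left(\mu_1 - \mu_2,\ \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right),$$
then it follows that
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$$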
Estimating $\sigma$

Individually, $s_1$ and $s_2$ estimate $\sigma$.
However, we can get a better estimate than either one if we "pool" them together.
The pooled estimate is
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$
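As a rough illustration of the pooled estimate, here is a short Python sketch. The function name `pooled_sd` and the sample values are assumptions made for this example, not figures from the textbook.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation: each sample variance is weighted
    by its degrees of freedom (n - 1)."""
    df = n1 + n2 - 2
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

# Hypothetical sample standard deviations and sample sizes.
sp = pooled_sd(4.0, 12, 6.0, 8)
print(sp)  # lies between 4.0 and 6.0, closer to the larger sample's s
```

Because the weights are the degrees of freedom, the pooled value always falls between $s_1$ and $s_2$ and is pulled toward the estimate from the larger sample.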
$\bar{x}_1 - \bar{x}_2$ and the t Distribution

If we use $s_p$ instead of $\sigma$, and the sample sizes are small, then we should use t instead of Z.
The number of degrees of freedom is
df = df1 + df2 = n1 + n2 – 2.
That is,
$$t_{n_1 + n_2 - 2} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$$
Hypothesis Testing


See Example 11.4, p. 647 – Comparing Two
Headache Treatments.
State the hypotheses.
$H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 > \mu_2$

State the level of significance.
$\alpha = 0.05$.
The t Statistic, a.k.a. Our Second
Bad Formula

Compute the value of the test statistic.

The test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
with df = n1 + n2 – 2.
Computations
$$s_p = \sqrt{\frac{9s_1^2 + 9s_2^2}{18}} = 5.052.$$
$$t = \frac{22.6 - 19.4}{5.052\sqrt{\frac{1}{10} + \frac{1}{10}}} = 1.416.$$
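The arithmetic for the test statistic can be reproduced with a few lines of Python. This is only a sketch: the function name `pooled_t` is hypothetical, and the pooled value 5.052 is taken as given because the individual $s_1$ and $s_2$ are not listed here.

```python
import math

def pooled_t(xbar1, xbar2, sp, n1, n2, null_diff=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = null_diff."""
    se = sp * math.sqrt(1 / n1 + 1 / n2)  # estimated sd of xbar1 - xbar2
    return (xbar1 - xbar2 - null_diff) / se

# Summary values from the headache-treatment example above.
t_stat = pooled_t(22.6, 19.4, 5.052, 10, 10)
print(round(t_stat, 3))  # prints 1.416
```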
Hypothesis Testing

Calculate the p-value.
The number of degrees of freedom is
df = df1 + df2 = 18.
 p-value = P(t > 1.416)
= tcdf(1.416, E99, 18)
= 0.0869.
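The `tcdf` call above is a calculator function. For readers without one, the same upper-tail probability can be approximated in plain Python by integrating the t density numerically. This is a sketch under stated assumptions: the helper names `t_pdf` and `t_upper_tail` are hypothetical, and Simpson's rule is used here as a simple stand-in, not as the calculator's actual method.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t].
    steps must be even."""
    if t < 0 or steps % 2:
        raise ValueError("need t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric about 0, so P(T > 0) = 0.5.
    return 0.5 - total * h / 3

p_value = t_upper_tail(1.416, 18)
print(round(p_value, 4))  # about 0.0869, agreeing with the value above
```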

Hypothesis Testing

State the conclusion.

Since p-value > $\alpha$, we conclude that, at the 5% level of significance, the data do not support the claim that Treatment 1 is more effective than Treatment 2.
Confidence Intervals



Confidence intervals for $\mu_1 - \mu_2$ use the same theory.
The point estimate is $\bar{x}_1 - \bar{x}_2$.
The standard deviation of $\bar{x}_1 - \bar{x}_2$ is approximately $s_p\sqrt{1/n_1 + 1/n_2}$.
Confidence Intervals

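The confidence interval is
$$(\bar{x}_1 - \bar{x}_2) \pm z\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
($\sigma$ known, large samples)
or
$$(\bar{x}_1 - \bar{x}_2) \pm z\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
($\sigma$ unknown, large samples)
or
$$(\bar{x}_1 - \bar{x}_2) \pm t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
($\sigma$ unknown, normal pops., small samples)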
Confidence Intervals

The choice depends on
Whether $\sigma$ is known.
 Whether the populations are normal.
 Whether the sample sizes are large.

Example





Find a 95% confidence interval for $\mu_1 - \mu_2$ in Example 11.4.
$\bar{x}_1 - \bar{x}_2 = 3.2$.
$s_p = 5.052$.
Use t = 2.101.
The confidence interval is
$3.2 \pm (2.101)(2.259) = 3.2 \pm 4.75$.
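The interval can be recomputed as follows. This is a minimal Python sketch; the function name `pooled_t_interval` is hypothetical, and the critical value 2.101 is simply taken from the example and not computed.

```python
import math

def pooled_t_interval(xbar1, xbar2, sp, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled estimate sp."""
    diff = xbar1 - xbar2
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return diff - margin, diff + margin

# Values from Example 11.4 as quoted above; t_crit = 2.101 for df = 18.
lower, upper = pooled_t_interval(22.6, 19.4, 5.052, 10, 10, 2.101)
print(round(lower, 2), round(upper, 2))  # prints -1.55 7.95
```

Written out, the interval runs from about -1.55 to 7.95.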