Ch. 7: t-test for 2 independent sample means

Chapter 7: The t Test for Two Independent Sample Means
To conduct a t test for two independent sample means, we need to know what the sampling distribution of the difference between two means looks like.
• We assume that it is a normal distribution.
• It will be centered on zero if the null hypothesis is that the two population means are the same (i.e., H0: µ1 = µ2).
• The SD of this distribution is called the standard error of the difference.
• For large samples, a z test can be used:

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
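A minimal sketch of the large-sample z test above, assuming the samples are large enough that the sample variances can stand in for the unknown population variances when estimating the standard error of the difference as sqrt(s1²/n1 + s2²/n2). The function names and the summary statistics are hypothetical, chosen only for illustration.

```python
import math


def se_difference_large(var1, n1, var2, n2):
    # Assumption: with large samples, the sample variances stand in for the
    # unknown population variances when estimating sigma of (Xbar1 - Xbar2).
    return math.sqrt(var1 / n1 + var2 / n2)


def z_two_means(mean1, mean2, se_diff, mu_diff=0.0):
    # z = ((Xbar1 - Xbar2) - (mu1 - mu2)) / standard error of the difference
    return ((mean1 - mean2) - mu_diff) / se_diff


def two_tailed_p(z):
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical summary statistics for two large samples (not from the text)
se = se_difference_large(64.0, 100, 36.0, 100)
z = z_two_means(52.5, 50.0, se)
p = two_tailed_p(z)
print(f"SE = {se:.3f}, z = {z:.2f}, two-tailed p = {p:.4f}")
```

The null hypothesis µ1 = µ2 corresponds to the default `mu_diff=0.0`; passing a nonzero value tests a hypothesized difference instead.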
For Explaining Psychological
Statistics, 4th ed. by B. Cohen
Estimating the Standard Error of the Difference When the Sample Sizes Are Not Large
– Pooled-variances estimate: The two sample variances can be pooled together to form a single estimate of the population variance, called $s^2_{pooled}$ or $s^2_p$:

$$s^2_{pooled} = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$$
– $s^2_{pooled}$ can then be used to estimate the standard error of the difference by means of the following formula:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s^2_{pooled}}{N_1} + \frac{s^2_{pooled}}{N_2}} = \sqrt{s^2_{pooled}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$
– The use of $s^2_p$ is based on the assumption that the two populations have the same variance.
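The two formulas above translate directly into code. This is a sketch with hypothetical function names; it also makes visible that $s^2_{pooled}$ is a df-weighted average of the two sample variances, so it always lies between them.

```python
import math


def pooled_variance(var1, n1, var2, n2):
    # s2_pooled = [(N1 - 1)s1^2 + (N2 - 1)s2^2] / (N1 + N2 - 2):
    # a df-weighted average of the two sample variances
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def se_difference_pooled(var1, n1, var2, n2):
    # s_(Xbar1 - Xbar2) = sqrt(s2_pooled * (1/N1 + 1/N2))
    sp2 = pooled_variance(var1, n1, var2, n2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Summary statistics from the Red Bull example in this chapter
print(pooled_variance(4, 18, 5, 16))
print(se_difference_pooled(4, 18, 5, 16))
```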
The Pooled-Variances t Test Formula
• The denominator of this formula is the estimated standard error of the difference (SED):

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
• Inserting the formula for estimating the SED from the pooled variances yields the following formula:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s^2_{pooled}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
• Note that the use of $s^2_{pooled}$ is based on the assumption that the two populations have the same variance, and that the critical t is based on df = N1 + N2 – 2.
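A sketch of the pooled-variances t computed from raw scores rather than summary statistics, assuming each group's scores are given as a plain list. It uses Python's standard `statistics` module, whose `variance()` uses the N – 1 denominator that $s^2$ requires. The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator


def pooled_t_from_data(x1, x2, mu_diff=0.0):
    """Pooled-variances t and its df for two independent samples of raw scores."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    sed = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # estimated standard error of the difference
    t = ((mean(x1) - mean(x2)) - mu_diff) / sed
    return t, n1 + n2 - 2


group1 = [5, 7, 6, 8, 9]  # hypothetical scores
group2 = [4, 5, 3, 6, 2]
t, df = pooled_t_from_data(group1, group2)
print(f"t({df}) = {t:.2f}")
```

Here both groups have a variance of 2.5 and means of 7 and 4, so the SED works out to 1 and t equals the raw mean difference.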
Alternative Versions of the Pooled-Variances t Test Formula
• Here, the formula for $s^2_{pooled}$ is included in the t formula, and it is assumed that the null hypothesis is µ1 – µ2 = 0:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where the critical t is based on df = N1 + N2 – 2
• The formula for equal sample sizes reduces to:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n}}}$$
where the critical t is based on df = 2n – 2.
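The reduction works because, when n1 = n2 = n, the pooled variance becomes (s1² + s2²)/2 and (1/n + 1/n) becomes 2/n, so their product is (s1² + s2²)/n. A quick numerical check of that algebra, with hypothetical function names and made-up summary statistics:

```python
import math


def t_general(mean1, var1, n1, mean2, var2, n2):
    # General pooled-variances t, assuming H0: mu1 - mu2 = 0
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def t_equal_n(mean1, var1, mean2, var2, n):
    # Equal-n shortcut: t = (Xbar1 - Xbar2) / sqrt((s1^2 + s2^2) / n)
    return (mean1 - mean2) / math.sqrt((var1 + var2) / n)


# Hypothetical summary statistics with equal group sizes (n = 20 per group)
a = t_general(12.0, 9.0, 20, 10.0, 7.0, 20)
b = t_equal_n(12.0, 9.0, 10.0, 7.0, 20)
print(a, b, math.isclose(a, b))
```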
Try this example
A researcher is studying the effect of drinking Red Bull on the number of errors a participant makes in a motor skills test. Test the null hypothesis that the population mean for Red Bull drinkers is the same as for placebo drinkers.
            Red Bull (X1)   Placebo (X2)
Mean        19              15
s²          4               5
n           18              16

$$t = \frac{19 - 15}{\sqrt{\left(\frac{17(4) + 15(5)}{32}\right)\left(\frac{1}{18} + \frac{1}{16}\right)}} = \frac{4}{\sqrt{4.46875 \times .118}} = \frac{4}{.726} = 5.51$$
The two-tailed critical value is t.05(32) ≈ 2.04, which is less than 5.51, so the null hypothesis can be rejected.
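The Red Bull arithmetic can be reproduced step by step. This sketch hard-codes the critical value 2.04 from the slide rather than computing it, since the standard library has no t distribution.

```python
import math

# Summary statistics from the Red Bull example
mean_rb, var_rb, n_rb = 19, 4, 18  # Red Bull
mean_pl, var_pl, n_pl = 15, 5, 16  # placebo

df = n_rb + n_pl - 2
sp2 = ((n_rb - 1) * var_rb + (n_pl - 1) * var_pl) / df  # pooled variance
sed = math.sqrt(sp2 * (1 / n_rb + 1 / n_pl))             # estimated SED
t = (mean_rb - mean_pl) / sed

t_crit = 2.04  # two-tailed .05 critical t for df = 32, taken from the slide
reject_h0 = abs(t) > t_crit
print(f"df = {df}, s2_pooled = {sp2:.5f}, SED = {sed:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(19, 2.0, 18, 15, math.sqrt(5), 16, equal_var=True)` should give the same t along with an exact p value; note that it expects standard deviations, not variances.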
Limitations of Statistical Conclusions
– Our significant Red Bull result could be a Type I error. If this is a novel result, replication is recommended.
– Statistical significance is no guarantee that you are dealing with a large or important difference.
– If you have not randomly assigned participants to the two conditions (e.g., you are comparing habitual coffee drinkers with those who do not like coffee), you cannot draw causal conclusions from your significant results.
– More information can be obtained by constructing a confidence interval for the difference between the means of the two populations.
Confidence Intervals for the Difference Between Two Population Means
– The formula for the CI for the difference between two population means is obtained by solving the t test formula for µ1 – µ2:

$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{crit}\, s_{\bar{X}_1 - \bar{X}_2}$$
– For a 95% CI, $t_{crit}$ is the critical t for a .05, two-tailed significance test. For the Red Bull example, the 95% CI is:

$$\mu_1 - \mu_2 = (19 - 15) \pm 2.04(.726) = 4 \pm 1.48$$
– Thus, the CI extends from a difference of 2.52 errors up to a difference of 5.48 errors.
– Because zero is not in the interval, we know that the usual H0 can be rejected at the .05 level with a two-tailed test.
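A sketch of the CI formula applied to the Red Bull numbers. The helper name is hypothetical, and $t_{crit}$ = 2.04 is again taken from the slide rather than computed.

```python
import math


def ci_difference(mean1, mean2, sed, t_crit):
    # mu1 - mu2 = (Xbar1 - Xbar2) +/- t_crit * s_(Xbar1 - Xbar2)
    diff = mean1 - mean2
    margin = t_crit * sed
    return diff - margin, diff + margin


# Red Bull example: SED from the pooled variances, t_crit = 2.04 for df = 32
sp2 = (17 * 4 + 15 * 5) / 32
sed = math.sqrt(sp2 * (1 / 18 + 1 / 16))
low, high = ci_difference(19, 15, sed, 2.04)
print(f"95% CI: {low:.2f} to {high:.2f}; contains 0: {low <= 0 <= high}")
```

Checking whether 0 falls inside the interval is equivalent to the two-tailed test at the same alpha, which is why the interval can stand in for the significance test.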
Assumptions for the Two-Group t Test and CI for the Difference of Population Means
– The dependent variable has been measured on an interval or ratio scale (nonparametric tests are available if an ordinal scale was used).
– Independent random sampling
• Ideal: Both groups should be random samples.
• Each individual selected for one sample should be independent of all the individuals in the other sample.
• However, the two groups are typically formed by random assignment from one sample of convenience.
– Normal distributions
• The DV should follow a normal distribution in both groups.
• However, the Central Limit Theorem implies that, if the samples are not very small, the two-group t test will still be approximately valid even when the DV is not normally distributed.
Assumptions (cont.)
– To use the pooled-variances t test, an additional assumption must be made: homogeneity of variance (HOV).
– However, this assumption is usually ignored if:
• Both samples are quite large, or
• The two samples are the same size, or
• The sample variances are not very different (e.g., one variance is no more than twice as large as the other).
– If it is not reasonable to assume HOV, consider performing the separate-variances t test:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

– The df for the critical t are best found by statistical software.
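As a sketch of what such software typically computes, the block below uses the Welch–Satterthwaite approximation for the separate-variances df. This is a standard technique but is not given in the text. The variance-ratio helper mirrors the "no more than twice as large" rule of thumb above and is a heuristic, not a formal test of HOV. Both function names are hypothetical.

```python
import math


def variance_ratio(var1, var2):
    # Larger sample variance divided by the smaller one (rule-of-thumb HOV check)
    return max(var1, var2) / min(var1, var2)


def separate_variances_t(mean1, var1, n1, mean2, var2, n2, mu_diff=0.0):
    """Separate-variances t with Welch-Satterthwaite approximate df."""
    a, b = var1 / n1, var2 / n2  # squared standard errors of each sample mean
    t = ((mean1 - mean2) - mu_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Red Bull numbers: variance ratio 5/4 = 1.25, well under the "twice as large" rule
print(variance_ratio(4, 5))
t, df = separate_variances_t(19, 4, 18, 15, 5, 16)
print(f"t = {t:.2f}, approximate df = {df:.1f}")
```

The approximate df is usually not a whole number and always lies between min(n1, n2) – 1 and n1 + n2 – 2, so it can only lower the df relative to the pooled test.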