24. One-way ANOVA: Comparing several means

The Practice of Statistics in the Life Sciences
Third Edition
© 2014 W. H. Freeman and Company
Objectives (PSLS Chapter 24)
One-way ANOVA

Comparing several means

The ANOVA F-test

ANOVA assumptions

ANOVA calculations

Using Table F

Interpreting results from an ANOVA
Comparing several means
When comparing >2 populations, the question is not only whether each
population mean µi is different from the others, but also whether they
are significantly different when taken as a group.
If we split a class of 200 into 20 random samples of 10 students and computed
the 20 mean heights, the 2 most extreme samples could very well be
“significantly different”, but that would be a Type I error.
We should ask instead if all 20 samples might have come from a single
population or not.
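The class-of-200 thought experiment can be tried as a small simulation. This is only a sketch: the height distribution (mean 170 cm, SD 10 cm), the number of trials, and the function names are assumptions made for illustration. It splits one simulated class into 20 groups of 10 and runs a pooled two-sample t test on the two groups with the most extreme means.

```python
import random
import statistics

def extreme_pair_t(groups):
    """Pooled two-sample t statistic for the groups with the lowest and highest means."""
    low = min(groups, key=statistics.mean)
    high = max(groups, key=statistics.mean)
    n1, n2 = len(low), len(high)
    pooled_var = ((n1 - 1) * statistics.variance(low)
                  + (n2 - 1) * statistics.variance(high)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.mean(high) - statistics.mean(low)) / se

def false_alarm_rate(trials=500, seed=1):
    """Share of simulated classes whose two extreme groups look 'significantly different'."""
    rng = random.Random(seed)
    t_crit = 2.101  # two-sided 5% critical value of t with 10 + 10 - 2 = 18 df
    hits = 0
    for _ in range(trials):
        # One class of 200 drawn from a SINGLE population (hypothetical parameters).
        heights = [rng.gauss(170, 10) for _ in range(200)]
        rng.shuffle(heights)
        groups = [heights[i:i + 10] for i in range(0, 200, 10)]
        if abs(extreme_pair_t(groups)) > t_crit:
            hits += 1
    return hits / trials

rate = false_alarm_rate()
print(f"Extreme pair 'significant' in {rate:.0%} of simulated classes")
```

Every one of those "significant" results is a Type I error, because all 20 groups come from the same population. The rate should come out far above the nominal 5%.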
Handling multiple comparisons statistically
The first step in examining multiple populations statistically is to test
for an overall statistical significance as evidence of any difference
among the parameters we want to compare.
→ ANOVA F test

If that overall test showed statistical significance, then a detailed
follow-up analysis can examine all pair-wise parameter comparisons to
define which parameters differ from which and by how much.
→ more complex methods (see Chapter 26)
Reminder: variance and standard deviation
Sample variance s²
$s^2 = \frac{1}{df} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad df = n - 1$
Sample standard deviation s
$s = \sqrt{s^2} = \sqrt{\frac{1}{df} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
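As a quick check of the two formulas, here is a minimal sketch that computes s² and s directly and compares them with Python's `statistics` module. The data values are made up for illustration.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by df = n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    df = n - 1
    return sum((x - mean) ** 2 for x in xs) / df

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
s2 = sample_variance(data)       # 32 / 7
s = math.sqrt(s2)
print(f"s^2 = {s2:.3f}, s = {s:.3f}")
```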
Are the observed differences in sample means likely to have
occurred by chance just because of the random sampling?
If we have sampled several times from the same population with mean μ and
standard deviation σ, then
the variance of the sample means (weighted for sample size) ≈ the average sample variance.
If we have sampled from distinct populations with the same standard
deviation σ but different means μ, then
the variance of the sample means (weighted for sample size) > the average sample variance.
[Figure: dot plots of several samples on a common Data axis (20 to 32); the sample means shown are 23.4, 25.2, and 27.8.]
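The comparison between the variance of the sample means and the average sample variance can be explored by simulation. In this sketch, the common σ = 3, the sample size n = 10, and the function names are assumptions. The three means 23.4, 25.2, and 27.8 are the ones shown in the figure. The variance of the means is multiplied by the sample size so that it is on the same scale as the within-sample variance.

```python
import random
import statistics

def mean_squares(groups):
    """Return (size-weighted variance of the means, average within-sample variance)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((len(g) - 1) * statistics.variance(g) for g in groups) / (n_total - k)
    return between, within

def average_ratio(mus, sigma=3.0, n=10, reps=2000, seed=7):
    """Average 'between' divided by average 'within' over many simulated experiments."""
    rng = random.Random(seed)
    between_sum = within_sum = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        between, within = mean_squares(groups)
        between_sum += between
        within_sum += within
    return between_sum / within_sum

same = average_ratio([25.5, 25.5, 25.5])       # one population: ratio should be near 1
different = average_ratio([23.4, 25.2, 27.8])  # distinct means: ratio should be well above 1
print(f"same population: {same:.2f}, distinct populations: {different:.2f}")
```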
Male lab rats were randomly assigned to 3 groups differing in access
to food for 40 days. The rats were weighed (in grams) at the end.
• Group 1: access to chow only.
• Group 2: chow plus access to cafeteria food restricted to 1 hour every day.
• Group 3: chow plus extended access to cafeteria food (all day long every day).
Cafeteria food: bacon, sausage, cheesecake, pound cake, frosting and chocolate
Does the type of food access influence the
body weight of male lab rats significantly?
H0: µchow = µrestricted = µextended
Ha: H0 is not true (at least one mean µ is
different from at least one other mean µ)
Level        N    Mean     StDev
Chow         19   605.63   49.64
Restricted   16   657.31   50.68
Extended     15   691.13   63.41
Reminders
A factor is a variable that can take one of several levels used to
differentiate one group from another.
An experiment has a one-way or completely randomized design if
several levels of one factor are being studied and the individuals are
randomly assigned to its levels. (There is only one way to group the data.)

One-way: 4 levels of nematode quantity in seedling growth experiment

Two-way: 2 seed species and 4 levels of nematodes
One-way ANOVA is used for completely randomized, one-way designs.
The ANOVA F test
The analysis of variance F test compares the variation due to specific
sources (levels of the factor) with the variation among individuals who
should be similar (individuals in the same sample).
H0: All means µi are equal
Ha: Not all means µi are equal (H0 is not true)
The analysis of variance F statistic for comparing several means is:
$F = \frac{\text{MSG}}{\text{MSE}} = \frac{\text{variation among the sample means}}{\text{variation among individuals in the set of samples}}$
[Figure: two panels, (A) and (B), each showing dot plots of samples on a Data axis (20 to 32) with sample means 23.4, 25.2, and 27.8.]
(A) Variability of means smaller than variability within samples → F tends to be small.
(B) Variability of means larger than variability within samples → F tends to be large.
ANOVA assumptions
1. The k samples must be independent SRSs. The individuals in each
sample are completely unrelated.
2. Each population represented by the k samples must be Normally
distributed. However, the test is robust to deviations from Normality
(skew, mild outliers) for large-enough samples, thanks to the central
limit theorem.
3. The ANOVA F-test requires that all k populations have the same
standard deviation σ.
There are inference tests for this, but they tend to be sensitive to deviations from
the Normality assumption or require equal sample sizes.
A simple and conservative approach: The ANOVA F test is
approximately correct when the largest sample standard deviation
is no more than ~ twice as large as the smallest sample standard
deviation.
Equal sample sizes make the ANOVA more robust to deviations from
the equal σ rule.
Male lab rats were randomly assigned to 3 groups differing in access
to food for 40 days. The rats were weighed (in grams) at the end.
Conditions required for ANOVA:
• Independent random samples?
This is a randomized experiment, so the 3 groups are independent samples. → ok
• Normal populations or large enough samples?
We see no evidence of non-Normality (and
total sample size is large). → ok
• Equal population standard deviations?
largest si = 63.41; smallest si = 49.64; ratio 63.41/49.64 ≈ 1.28 < 2 → ok
Level        N    Mean     StDev
Chow         19   605.63   49.64
Restricted   16   657.31   50.68
Extended     15   691.13   63.41
Meconium is a newborn’s first stool, indicative of the fetus’ exposure
to outside contaminants during pregnancy. Do cigarette chemicals
reach the fetus in the womb?
Cotinine is the metabolized form of nicotine. Its presence in the
meconium was assessed for 3 independent random samples of
newborns with different mother smoking status.
Level             N    Mean     StDev
Active smokers    10   367.20   143.67
Non-smokers       10   185.00    24.23
Passive smokers   10   263.40    52.54
[Figure: Newborn meconium cotinine level (ng/uL) vs mother smoking status. y-axis: Meconium cotinine (ng/uL), 100 to 700; x-axis: Smoking status (Active smokers, Non-smokers, Passive smokers).]
Conditions required:
• Equal population standard deviations?
Checking that largest si is no more than twice smallest si.
largest si = 143.67; smallest si = 24.23 → STOP
ANOVA is clearly not appropriate here.
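The "largest si no more than twice the smallest si" rule of thumb is easy to automate. The helper below is a hypothetical illustration (the function name and the `max_ratio` parameter are not from the text), applied to the standard deviations from the rat and meconium examples.

```python
def sd_ratio_check(sds, max_ratio=2.0):
    """Return (largest SD / smallest SD, whether the ratio passes the rule of thumb)."""
    ratio = max(sds) / min(sds)
    return ratio, ratio <= max_ratio

rats = sd_ratio_check([49.64, 50.68, 63.41])       # Chow, Restricted, Extended
meconium = sd_ratio_check([143.67, 24.23, 52.54])  # Active, Non-, Passive smokers

for name, (ratio, ok) in [("rats", rats), ("meconium", meconium)]:
    print(f"{name}: ratio = {ratio:.2f} -> {'ok' if ok else 'STOP'}")
```

The rat data pass comfortably, while the meconium data show a ratio close to 6, which is why ANOVA is ruled out there.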
ANOVA calculations
We have k independent SRSs, from k populations or treatments.
The ith population has a Normal distribution with unknown mean µi.
All k populations have the same standard deviation σ, unknown.
H0: µ1 = µ2 = … = µk
$F = \frac{\text{MSG}}{\text{MSE}} = \frac{\text{SSG}/(k-1)}{\text{SSE}/(N-k)}$
Under H0, all k samples come from the same population N(μ,σ) and sample
averages should be no more variable than points in individual samples.
When H0 is true, F has the
F distribution with k − 1 (numerator) and
N − k (denominator) degrees of freedom.
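The F statistic can be computed from group summary statistics alone. The sketch below assumes the standard definitions of the two sums of squares, which the slides do not spell out: SSG = Σ ni (x̄i − x̄)² and SSE = Σ (ni − 1) si². The function name and the dictionary keys are illustrative. The inputs are the rounded N, Mean, and StDev values from the rat example, so the results are approximate.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA quantities from group sizes, means, and standard deviations."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # among groups
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # within groups
    msg = ssg / (k - 1)
    mse = sse / (n_total - k)
    return {"SSG": ssg, "SSE": sse, "MSG": msg, "MSE": mse,
            "F": msg / mse, "df": (k - 1, n_total - k)}

# Chow, Restricted, Extended (summary values as reported, rounded to 2 decimals)
result = anova_from_summary([19, 16, 15],
                            [605.63, 657.31, 691.13],
                            [49.64, 50.68, 63.41])
print(result["df"], round(result["F"], 2))
```

With these inputs the degrees of freedom are 2 and 47 and F comes out at roughly 10.7. Small differences from software output are expected because the summary values are rounded.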
$F = \frac{\text{MSG}}{\text{MSE}} = \frac{\text{variation among sample means}}{\text{variation among individuals in same sample}}$
MSG, the mean square for groups, is a variance of the means weighted
for sample size. It measures the variability of the sample averages.
→ $\text{MSG} = \text{SSG}/(k-1)$
MSE, the mean square for error or pooled sample variance sp², is the
average sample variance weighted for sample sizes. It measures the
variability within each of the groups.
→ $\text{MSE} = \text{SSE}/(N-k)$
Using Table F
The F distribution is asymmetrical and has two distinct degrees of
freedom. This was discovered by Fisher, hence the label "F".
Once again what we do is calculate the value of F for our sample data,
and then look up the corresponding area under the curve in Table F.
[Figure: F distribution (shown for df: 5,4), with the value F and the corresponding area p marked.]
dfnum = k − 1
dfden = N − k
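Where Table F is not at hand, the upper-tail area can be approximated numerically. The sketch below is one possible approach, not the method the table was built with. It uses the identity P(F ≥ f) = I_x(dfden/2, dfnum/2) with x = dfden / (dfden + dfnum·f), where I_x is the regularized incomplete beta function, and integrates with Simpson's rule. The function name, the `steps` parameter, and the restriction to dfden ≥ 2 are assumptions of this sketch.

```python
import math

def f_upper_tail(f, df_num, df_den, steps=2000):
    """Approximate P(F >= f) for the F(df_num, df_den) distribution (df_den >= 2)."""
    if f <= 0:
        return 1.0
    a, b = df_den / 2, df_num / 2
    x = df_den / (df_den + df_num * f)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)

p_table = f_upper_tail(6.26, 5, 4)   # 6.26 is the usual 5% critical value for df 5,4
p_rats = f_upper_tail(10.71, 2, 47)  # F and df from the rat-feeding example
print(f"{p_table:.4f}  {p_rats:.7f}")
```

The first call should return a tail area of roughly 0.05, and the second a value near 0.00015.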
Male lab rats were randomly assigned to 3 groups differing in access
to food for 40 days. The rats were weighed (in grams) at the end.
Does the type of access to food influence the body weight of lab rats significantly?
Chow   Restricted   Extended
516    546          564
547    599          611
546    612          625
564    627          644
577    629          660
570    638          679
582    645          687
594    635          688
597    660          706
599    676          714
606    695          738
606    687          733
624    693          744
623    713          780
641    726          794
655    736
667
690
703
Level        N    Mean     StDev
Chow         19   605.63   49.64
Restricted   16   657.31   50.68
Extended     15   691.13   63.41

One-way ANOVA: Chow, Restricted, Extended

Source   DF   SS       MS      F       P
Factor    2    63400   31700   10.71   0.000
Error    47   139174    2961
Total    49   202573

S = 54.42   R-Sq = 31.30%   R-Sq(adj) = 28.37%
The test statistic is F = 10.71.
[Figure: F distribution with df1 = 2 and df2 = 47, F axis from 0 to 12; the area to the right of F = 10.71 is 0.0001471.]
The F statistic (10.71) is large and the P-value is very small (<0.001). We reject H0.
→ The population mean weights are not all the same; food access type influences
weight significantly in male lab rats.
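The ANOVA table for the rat data can be reproduced from the raw weights with a few lines of code. This is a sketch using only the standard library, and the variable names are illustrative. The assignment of the weights to the three groups follows the three-column layout of the data listing. The P-value is not computed here, since it needs an F distribution routine.

```python
import statistics

chow = [516, 547, 546, 564, 577, 570, 582, 594, 597, 599,
        606, 606, 624, 623, 641, 655, 667, 690, 703]
restricted = [546, 599, 612, 627, 629, 638, 645, 635,
              660, 676, 695, 687, 693, 713, 726, 736]
extended = [564, 611, 625, 644, 660, 679, 687, 688,
            706, 714, 738, 733, 744, 780, 794]

groups = [chow, restricted, extended]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Sum of squares among groups (Factor) and within groups (Error)
ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
sse = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

msg = ssg / (k - 1)
mse = sse / (n_total - k)
f_stat = msg / mse
r_sq = ssg / (ssg + sse)  # share of total variation explained by the factor

print(f"Factor  DF={k - 1}  SS={ssg:.0f}  MS={msg:.0f}  F={f_stat:.2f}")
print(f"Error   DF={n_total - k}  SS={sse:.0f}  MS={mse:.0f}")
print(f"S = {mse ** 0.5:.2f}   R-Sq = {r_sq:.2%}")
```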
Note: two-sample t test and ANOVA
A two-sample t test assuming equal variance with a two-sided Ha and
an ANOVA comparing only 2 groups will give the same P-value.
One-way ANOVA        t test assuming equal variance
H0: µ1 = µ2          H0: µ1 = µ2
Ha: µ1 ≠ µ2          Ha: µ1 ≠ µ2
F statistic          t statistic
F = t² and both P-values are the same.
The t test is more flexible: You may choose a one-sided alternative or
you may want to run a t test assuming unequal variance if you are not
sure that the 2 populations have the same standard deviation σ.
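The relation F = t² can be checked numerically for any two groups. The sketch below uses the Chow and Extended summary statistics from the rat example as a convenient pair. Choosing these two groups, and the function names, are assumptions made for illustration.

```python
import math

def pooled_t(n1, m1, s1, n2, m2, s2):
    """Two-sample t statistic assuming equal variances (pooled)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def two_group_f(n1, m1, s1, n2, m2, s2):
    """One-way ANOVA F statistic for k = 2 groups."""
    grand = (n1 * m1 + n2 * m2) / (n1 + n2)
    msg = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2  # SSG / (k - 1), with k - 1 = 1
    mse = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return msg / mse

chow_vs_extended = (19, 605.63, 49.64, 15, 691.13, 63.41)
t_stat = pooled_t(*chow_vs_extended)
f_stat_2 = two_group_f(*chow_vs_extended)
print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_stat_2:.3f}")
```

The two statistics agree up to floating-point rounding because MSE is the pooled variance and, for two groups, SSG reduces to n1·n2/(n1 + n2) times the squared difference in means.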
Interpreting results from an ANOVA

H0 states that all µi are equal, while Ha states that H0 is not true.
→ A significant P-value leads you to reject H0, indicating that not all µi
are equal.
What do you do next? Which means are different from which?

You can gain insight by looking at summary graphs (boxplot,
mean ± stdev) or finding the confidence interval for each mean µi.

There are also some tests of statistical significance designed
specifically for multiple tests (Chapter 26).
Male lab rats were randomly assigned to 3 groups differing in access
to food for 40 days. The rats were weighed (in grams) at the end.
• Group 1: access to chow only.
• Group 2: chow plus access to cafeteria food restricted to 1 hour every day.
• Group 3: chow plus extended access to cafeteria food (all day long every day).
The ANOVA found that not all 3 food access
types result in the same mean body weight.
Access type significantly impacts the weight
of male lab rats.
The graph suggests that, on average, rats
given extended access to cafeteria food are
heavier and those with access only to chow
are lighter.
Level        N    Mean     StDev
Chow         19   605.63   49.64
Restricted   16   657.31   50.68
Extended     15   691.13   63.41
Pooled variance and confidence intervals
MSE, the mean square for error or pooled sample variance sp²,
estimates the common variance σ² of the k populations.
Thus, we can easily calculate separate level C confidence intervals for
each population mean µi (often provided by software packages):
$\bar{x}_i \pm t^* \frac{s_p}{\sqrt{n_i}}$ or $\bar{x}_i \pm t^* \sqrt{\text{MSE}/n_i}$
using the critical value t* from the t distribution
with N − k degrees of freedom.
One-way ANOVA: Chow, Restricted, Extended

Source   DF   SS       MS      F       P
Factor    2    63400   31700   10.71   0.000
Error    47   139174    2961
Total    49   202573

S = 54.42   R-Sq = 31.30%   R-Sq(adj) = 28.37%

Level        N    Mean     StDev
Chow         19   605.63   49.64
Restricted   16   657.31   50.68
Extended     15   691.13   63.41

Individual 95% CIs For Mean Based on Pooled StDev
Level        ----+---------+---------+---------+----
Chow         (------*------)
Restricted                 (-------*-------)
Extended                            (-------*--------)
             ----+---------+---------+---------+----
                595       630       665       700

Pooled StDev = 54.42 = √MSE = √2961
With 95% confidence, for df = N − k = 47, t* = 2.012 (or 2.021 for df ≈ 40 in Table C).
Each CI is:
$\bar{x}_i \pm t^* \frac{s_p}{\sqrt{n_i}}$
For “Chow”: $605.63 \pm (2.012)\frac{54.42}{\sqrt{19}} = 605.63 \pm 25.12$
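The same calculation can be repeated for all three groups. The sketch below takes MSE = 2961 and t* = 2.012 from the output and text above. The helper name `pooled_ci` is hypothetical.

```python
import math

def pooled_ci(mean, n, mse, t_star):
    """Confidence interval for one group mean, based on the pooled variance MSE."""
    margin = t_star * math.sqrt(mse / n)
    return mean - margin, mean + margin

mse = 2961      # mean square for error from the ANOVA table
t_star = 2.012  # t critical value for 95% confidence, df = N - k = 47

summary = {"Chow": (605.63, 19), "Restricted": (657.31, 16), "Extended": (691.13, 15)}
intervals = {level: pooled_ci(mean, n, mse, t_star) for level, (mean, n) in summary.items()}

for level, (low, high) in intervals.items():
    print(f"{level:<10} {low:7.2f} to {high:7.2f}")
```

The Chow interval does not overlap the Extended interval, which is consistent with the pattern described for the graph. Formal pairwise comparisons are left to the multiple-comparison methods of Chapter 26.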
The purpose of the one-way analysis of variance F test (ANOVA) is to compare
A) the variances of several populations.
B) the proportions of successes in several populations.
C) the means of several populations.
Three varieties of tropical plant Heliconia
are studied. Here are the flower lengths in
mm for 3 independent random samples.
[Figure: flower length (mm), axis 0 to 50, by variety (bihai, red, yellow).]
H0 versus Ha?
ANOVA assumptions satisfied?
Interpretation?
One-way ANOVA: length versus variety

Source    DF   SS        MS       F        P
variety    2   1082.87   541.44   259.12   0.000
Error     51    106.57     2.09
Total     53   1189.44

S = 1.446   R-Sq = 91.04%   R-Sq(adj) = 90.69%

Level    N    Mean     StDev
bihai    16   47.597   1.213
red      23   39.711   1.799
yellow   15   36.180   0.975

Pooled StDev = 1.446

Individual 95% CIs For Mean Based on Pooled StDev
Level    ---------+---------+---------+---------+
bihai                                     (-*-)
red                 (*-)
yellow   (-*--)
         ---------+---------+---------+---------+
                38.5      42.0      45.5      49.0