One factor ANOVA tutorial

One Factor ANOVA
March 2, 2017
Contents
•
•
•
•
•
•
•
•
•
•
•
One-factor ANOVA
Example 1: Exam scores
The Numerator of the ANOVA - variability between the means
The Denominator of the ANOVA - variability within each group
The F-statistic - the ratio of variability between and within.
Finding the critical value of F with table E
The Summary Table
Example 2: Preferred temperature for weather sensitivity
Effect Size
Questions
Answers
One-factor ANOVA
The ANOVA (”ANalysis Of VAriance”) is a hypothesis test for the difference between two or more means. This tutorial will
describe how to compute the simplest ANOVA - the ’1-way’ or ’1-factor’ ANOVA for independent measures.
Here’s how to get to the 1-factor ANOVA with the flow chart:
START
HERE
Test for
ρ=0
Ch 17.2
1
number of
correlations
correlation (r)
measurement
scale
frequency
number of
variables
2
2
χ test
independence
Ch 19.9
Means
Ch 17.4
Yes
Do you
know σ?
1
number of
means
More than 2
No
number of
factors
2
2-factor
ANOVA
Ch 21
1
one sample
t-test
Ch 13.14
independent measures
t-test
Ch 15.6
χ 2 test
frequency
Ch 19.5
2
Test for
ρ1 = ρ2
z-test
Ch 13.1
1
1-factor
ANOVA
Ch 20
2
Yes
independent
samples?
No
dependent measures
t-test
Ch 16.4
The null hypothesis for the ANOVA is that that all samples are drawn from populations with equal means and equal variances.
The alternative hypothesis is that the populations are not the same, but the variances are still the same.
The logic behind the ANOVA is to compare two estimates of the population variance. One is related to the variability
between the means and the other is related to the variability of scores within each group. If the null hypothesis is true, these
1
two estimates should be on average the same, so their ratio, called the F-ratio, should be around one. If the population
means, differ, then the F-statistic will become larger than one on average. If the measured value of F exceeds a critical value
determined by α, then we reject the null hypothesis that the population means are the same.
We’ll show how it works with an example.
Example 1: Exam scores
Suppose a professor teaches the same small graduate course over three years ( 2014, 2015 and 2016). You want to know if
the course grades have varied significantly over these 3 years. We’ll run an ANOVA using alpha value of α = 0.05. Here are
the scores:
2014
88.4
77.5
83.5
88.2
84.7
86.2
82.1
89.3
85.3
2015
84
81.3
80.2
72.3
87.3
67.5
66.8
78.2
73
2016
76.6
69.6
85.4
84.4
74.3
78
80.7
83.5
73.6
Here is a table of useful statistics from this list:
n
mean
SS
s
sem
2014
9
85.02
108.8156
3.6881
1.2294
2015
9
76.73
417.0001
7.2198
2.4066
2016
9
78.46
236.9624
5.4425
1.8142
Also, the mean of all scores, called the ’grand mean’, is 80.0704.
We’ll use the numbers from this table to visualize the data using a bar graph with error bars representing standard errors of
the mean, just like we did with the independent-measures t-test:
2
88
86
84
score
82
80
78
76
74
72
2014
2015
2016
Exam
We can use the same rule of thumb as we did for the independent measures t-test to estimate the significance of the differences
between pairs of means. Remember, if the error bars overlap, then we will almost always fail to reject the null hypothesis
for a one-tailed test with α = .05. Since this is the most liberal of all tests, we will also fail to reject for any other choice of
tails or alphas. But remember, we can’t make separate t-tests on each pair of means - the probability of at least one type I
error will be greater than α.
Looking at the graph, you might guess that there is a statistically significant difference between the means.
The Numerator of the ANOVA - variability between the means
The numerator of the ANOVA’s F-statistic reflects how different the means are from each other. Under the null hypothesis,
this number, called the ’variance between’, or s2bet is an estimate of the population variance.
We first calculate the sums of squared deviation of each mean from the grand mean, scaled by the sample size. This is called
SSbet :
SSbet =
P
¯ )2 =
n(X̄ − X̄
(9)(85.02 − 80.0704)2 + (9)(76.73 − 80.0704)2 + (9)(78.46 − 80.0704)2 =
220.4869 + 100.4244 + 23.3405 = 344.2518
The degrees of freedom for SSbet is the number of groups minus one: 3 - 1 = 2.
We can now calculate s2bet which is SSbet divided by its degrees of freedom:
SSbet
344.2518 = 172.1259
s2
2
bet = dfbet =
The Denominator of the ANOVA - variability within each group
Another estimate of the population variance is the variance of the scores within each group. This number, called the ’variance
3
within’ or s2w is calculated by first adding up the sum of the squared deviation of each score from it’s own group mean:
SSw =
PP
j i
(Xi − X̄j )2
The table above already gives us the sum of squared deviation for each group, so the total sums of squared within is:
SSw = 108.8156 + 417.0001 + 236.9624 = 762.7781
The degrees of freedom for SS within is the number of scores minus the number of groups: 9 + 9 + 9 - 3 = 24. The variance
within is SS within divided by its degrees of freedom:
SSw
762.7781 = 31.7824
s2
w = dfw =
24
The F-statistic - the ratio of variability between and within.
Finally, we’ll calculate the observed value of F which is the ratio of s2bet and s2w :
s2
F = bet
= 172.1259
31.7824 = 5.42
s2
w
Finding the critical value of F with table E
The distribution of the F-statistic is a family of curves that varies both with dfbet and dfw . Table E provides critical values
of F for α = .05 and α = .01 (bold). The rows of the table correspond to dfw and the columns are for dfbet . For our example,
dfbet is 2 and dfw is 24. Here’s the relevant part of the table:
dfw |df b
23
24
25
1
4.28
7.88
4.26
7.82
4.24
7.77
2
3.42
5.66
3.4
5.61
3.39
5.57
3
3.03
4.76
3.01
4.72
2.99
4.68
Looking at the row for dfbet = 2 and the column for dfw = 24, we see that in bold the critical value for F with α = 0.05 is
3.4.
Remember, both the numerator and the denominator for the ANOVA are measures of the population variance under the
null hypothesis. If the null hypothesis is false, then the means will have greater variability, so the numerator will grow
large compared to the denominator. Therefore, a large value of F is evidence against the null hypothesis. Specifically, if
our observed value of F is greater than the critical value, then we reject the null hypothesis and conclude that there is a
significant difference between the means.
For our example, since our observed value of F (5.42) is greater than the critical value of F (3.4), we reject the null hypothesis.
Using APA format, we state: ”There is a significant difference between the means of the 3 exams, F(2,24) = 5.42, p < 0.05”
We can also use the F-calculator in the Excel spreadsheet to find the actual p-value:
dfb
2
Convert α to F
dfw
α
24
0.05
F
3.4
dfb
2
Convert F to α
dfw
F
24
5.42
α
0.0114
4
Using this p-value we can conclude using APA format: ”There is a significant difference between the means of the 3 exams,
F(2,24) = 5.42, p = 0.0114”
The Summary Table
The statistics used to generate the F statistic for an ANOVA are traditionally reported in a summary table like this:
SS
df
MS
F
Fcrit
p-value
Between
344.2518
2
172.1259
5.42
3.4
0.0114
Within
762.7781
24
31.7824
Total
1107.1563
26
In this next example we’ll use this table to keep track of our values.
Example 2: Preferred temperature for weather sensitivity
At the beginning of the quarter I surveyed you for your preferred outdoor temperature. I also asked you how much weather
affected your mood with the options of Not at all, Just a little, A fair amount and Very much. Let’s see if there is a significant
difference between the preferred temperatures across these 4 options. We’ll us α = 0.05 again. Here’s a table of statistics:
n
mean
SS
s
sem
n
grand mean
SStotal
Not at all
3
71.67
16.6667
2.8868
1.6667
Just a little
30
73.33
1020.667
5.9326
1.0831
A fair amount
42
74.05
3113.905
8.7149
1.3447
Very much
23
75
1486
8.2186
1.7137
Totals:
98
73.9796
5689.9592
Here’s a graph of the means with error bars representing the standard error of the means:
5
78
Preferred Temperature (F)
76
74
72
70
68
Not at all
Just a little
A fair amount
Very much
How much does weather affect your mood?
We can easily fill in some of the entries in the summary table. I’ve told you that SStotal = 5689.9592. The df for SStotal
is the total sample size minus 1: 98 - 1 = 97.
SSw is just the sum of SSw for each cell: 16.6667 + 1020.667 + 3113.905 + 1486 = 5637.2387. The df for SSw is the total
sample size minus the number of groups: 98 - 4 = 94.
Between
Within
Total
SS
df
5637.2387
5689.9592
94
97
MS
F
Fcrit
p-value
SSbet can be hard to calculate, but since we know that SStotal = SSw + SSbet , we know that SSbet = 5689.9592 - 5637.2387 =
52.7205. The degrees of freedom is 4 - 1 = 3.
Just like for the last example, the variance within and between are just the SS divided by their df’s:
SSbet
52.8183 = 17.6061
s2
3
bet = dfbet =
SSw
5637.2387 = 59.9706
s2
w = dfw =
94
Finally, the F statistic is their ratio:
s2
17.6061 = 0.29
= 59.9706
F = bet
s2
w
The table now looks like this:
Between
Within
Total
SS
52.8183
5637.2387
5689.9592
df
3
94
97
MS
17.6061
59.9706
F
0.29
6
Fcrit
p-value
The critical value of F for α = 0.05 and dfbet = 3 and dfw = 94 can be found in table E:
dfw |df b
94
3
2.7
4
If you use the F calculaton the Excel spreadsheet you’ll see that the p-value is 0.8325:
dfb
3
Convert α to F
dfw
α
94
0.05
F
2.7
dfb
3
Convert F to α
dfw
F
94
0.29
α
0.8325
The final summary table is:
Between
Within
Total
SS
52.8183
5637.2387
5689.9592
df
3
94
97
MS
17.6061
59.9706
F
0.29
Fcrit
2.7
p-value
0.8325
The observed value of F (0.29) is not greater than the critical value of F (2.7), we fail to reject the null hypothesis.
Using APA format, we state: ”There is not a significant difference in preferred outdoor temperature across the 4 levels of
the survey about how weather affects mood, F(3,94) = 0.29, p = 0.8325.”
Note that the ANOVA doesn’t tell us anything about exactly how preferred temperature varies with how weather affects
mood. Looking at the bar graph, it appears that there is an upward trend so that the more weather affects mood, the
higher the preferred temperature. But to make statistical conclusions like this requires a post-hoc analysis that allows you
to compare specific means to other means.
Effect Size
There are several measures of effect size for an ANOVA, and most software packages will spit out more than one. Probably
the most common is η 2 , or ’eta squared’, which is the proportion of total variance in your data that is attributed to the effect
driving the difference between the means. It’s simply the ratio of SSbet to SStotal . From our last example:
η 2 = 52.8183
5689.96 = 0.0093
If SSw = 0, then SSbet = SStotal , which then makes η 2 = 1. This is as big as η 2 can get. It happens when all deviations
from the grand mean are caused by the differences between the means.
If SSbet = 0, η 2 = 0. This is small as η 2 can get. It happens when there are no differences between the means, so all of the
variaiblity in your data is attributed to varability within each group.
η 2 is simple, comonly used, but tends t ooverestimate effect size for larger number of treatments. Just so you know, η 2 is
sometimes called ’partial eta-squared’ for some reason.
7
Questions
Your turn again. Here are 7 practice questions based on the class survey, followed by their answers.
1) From our survey we can calculate Caffeiene consumption as a function of how much you said you liked math. Here is a
table of statistics based on our survey:
Not at all
Just a little
A fair amount
Very much
n
11
43
40
8
means
1.18
1.38
2.33
1.13
SSw
11.1364
160.9192
712.776
10.8752
ntotal
102
grand mean
1.7108
SStotal
921.2181
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α =0.05, is there difference in Caffeine consumption across the 4 groups of students who vary their
preference for math?
2) From our survey we can calculate the preferred outdoor temperature as a function of how much weather affects
your mood. Here is a table of statistics based on our survey:
Not at all
Just a little
A fair amount
Very much
n
5
31
42
23
means
56.6
72.26
74.05
75
SSw
1727.2
2095.9356
3113.905
1486
ntotal
101
grand mean
72.8515
SStotal
9920.7723
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α =0.05, is there difference in preferred outdoor temperature across the 4 groups of students who
vary in how much weather affects their mood?
3) From our survey we can calculate how much you drink across voting preferences.
based on our survey:
Democrat
I never (or can’t)
vote
Independent
Republican
n
62
23
6
7
means
2.89
0.52
0.83
1.43
SSw
661.7102
9.7392
6.8334
13.7143
ntotal
98
grand mean
2.102
SStotal
800.4796
Here is a table of statistics
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α =0.01, is there difference in drink per week across the 4 groups of voting preference for students?
4) From our survey we can calculate how well you think you’ll do on Exam 1 as a function of how much you think
8
you’ll like this class. Here is a table of statistics based on our survey:
Not at all
Just a little
A fair amount
Very much
n
11
43
40
8
means
76.82
85.37
86.3
89.13
SSw
1463.6364
3148.0467
3222.4
332.8752
ntotal
102
grand mean
85.1078
SStotal
9111.8137
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α =0.01, is there difference in predicted Exam 1 score across the 4 groups of how much you think
you’ll like Psych 315?
5) Suppose you want to know if how much students drink varies with how much they exercise.
statistics based on our survey:
Not at all
Just a little
A fair amount
Very much
n
8
37
33
24
means
2.13
2.28
2.11
1.79
SSw
116.8752
303.2708
284.8793
105.9584
ntotal
102
grand mean
2.098
SStotal
814.5196
Here is a table of
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α =0.01, is there difference in alcoholic drinks across the 4 groups of students who exercise?
6) From our survey we can compute how many hours per week students play video games as a function of how
much they exercise. Here is a table of statistics based on our survey:
Not at all
Just a little
A fair amount
Very much
n
8
37
33
24
means
0.5
0.55
0.36
0.15
SSw
4
44.8925
17.1293
4.74
ntotal
102
grand mean
0.3897
SStotal
73.3217
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α =0.05, is there difference in hours of video game playing per week across the 4 groups of students
who exercise?
7) From our survey we can see if there is a difference in the heights of students across different levels of exercise.
Here is a table of statistics based on our survey:
9
Just a little
A fair amount
Very much
n
37
33
24
means
65.19
64.76
65.04
SSw
533.6757
336.0608
336.9584
ntotal
94
grand mean
65
SStotal
1210
Calculate the standard errors of the mean for each of the 3 groups.
Make a bar graph of the means for each of the 3 groups with error bars as the standard error of the means.
Using an alpha value of α =0.05, is there difference in height across the 3 groups of students who exercise?
10
Answers
1)
SSbet = (11)(1.18 − 1.7108)2 + (43)(1.38 − 1.7108)2 + (40)(2.33 − 1.7108)2 + (8)(1.13 − 1.7108)2 = 25.8396
25.8396 = 8.6132
s2
3
bet =
SSw = SStotal − SSbet = 921.218 − 25.8396 = 895.707
25.8396 = 8.6132
s2
3
bet =
8.6132 = 0.94
F = 9.1399
We fail to reject H0 .
There is not a significant difference in mean Caffeine consumption across the 4 groups of students who vary their preference
for math, F(3,98) = 0.94, p = 0.4244.
students who vary their preference for math
3.5
Caffeine consumption
3
2.5
2
1.5
1
0.5
0
Not at all
Just a little
A fair amount
Very much
SS
df
MS
F
Fcrit
p-value
Between
25.8396
3
8.6132
0.94
2.7
0.4244
Within
895.7068
98
9.1399
Total
921.2181
101
11
2)
SSbet = (5)(56.6 − 72.8515)2 + (31)(72.26 − 72.8515)2 + (42)(74.05 − 72.8515)2 + (23)(75 − 72.8515)2 = 1497.9
1497.9004 = 499.3001
s2
3
bet =
SSw = SStotal − SSbet = 9920.77 − 1497.9 = 8423.04
1497.9004 = 499.3001
s2
3
bet =
F = 499.3001
86.8355 = 5.75
We reject H0 .
There is a significant difference in mean preferred outdoor temperature across the 4 groups of students who vary in how
much weather affects their mood, F(3,97) = 5.75, p = 0.0012.
students who vary in how much weather affects their mood
preferred outdoor temperature
80
60
40
20
0
Not at all
Just a little
A fair amount
Very much
SS
df
MS
F
Fcrit
p-value
Between
1497.9004
3
499.3001
5.75
2.7
0.0012
Within
8423.0406
97
86.8355
Total
9920.7723
100
12
3)
SSbet = (62)(2.89 − 2.102)2 + (23)(0.52 − 2.102)2 + (6)(0.83 − 2.102)2 + (7)(1.43 − 2.102)2 = 108.93
108.9302 = 36.3101
s2
3
bet =
SSw = SStotal − SSbet = 800.48 − 108.93 = 691.997
108.9302 = 36.3101
s2
3
bet =
F = 36.3101
7.3617 = 4.93
We reject H0 .
There is a significant difference in mean drink per week across the 4 groups of voting preference for students, F(3,94) = 4.93,
p = 0.0032.
voting preference for students
3.5
3
drink per week
2.5
2
1.5
1
0.5
0
Democrat I never (or can't) vote
Independent
Republican
SS
df
MS
F
Fcrit
p-value
Between
108.9302
3
36.3101
4.93
4
0.0032
Within
691.9971
94
7.3617
Total
800.4796
97
13
4)
SSbet = (11)(76.82 − 85.1078)2 + (43)(85.37 − 85.1078)2 + (40)(86.3 − 85.1078)2 + (8)(89.13 − 85.1078)2 = 944.798
944.7985 = 314.9328
s2
3
bet =
SSw = SStotal − SSbet = 9111.81 − 944.798 = 8166.96
944.7985 = 314.9328
s2
3
bet =
F = 314.9328
83.3363 = 3.78
We fail to reject H0 .
There is not a significant difference in mean predicted Exam 1 score across the 4 groups of how much you think you’ll like
Psych 315, F(3,98) = 3.78, p = 0.013.
how much you think you'll like Psych 315
predicted Exam 1 score
100
80
60
40
20
0
Not at all
Just a little
A fair amount
Very much
SS
df
MS
F
Fcrit
p-value
Between
944.7985
3
314.9328
3.78
3.99
0.013
Within
8166.9583
98
83.3363
Total
9111.8137
101
14
5)
SSbet = (8)(2.13 − 2.098)2 + (37)(2.28 − 2.098)2 + (33)(2.11 − 2.098)2 + (24)(1.79 − 2.098)2 = 3.5153
3.5153 = 1.1718
s2
3
bet =
SSw = SStotal − SSbet = 814.52 − 3.5153 = 810.984
3.5153 = 1.1718
s2
3
bet =
F = 1.1718
8.2753 = 0.14
We fail to reject H0 .
There is not a significant difference in mean alcoholic drinks across the 4 groups of students who exercise, F(3,98) = 0.14, p
= 0.9358.
students who exercise
4
alcoholic drinks
3
2
1
0
Not at all
Just a little
A fair amount
Very much
SS
df
MS
F
Fcrit
p-value
Between
3.5153
3
1.1718
0.14
3.99
0.9358
Within
810.9837
98
8.2753
Total
814.5196
101
15
6)
SSbet = (8)(0.5 − 0.3897)2 + (37)(0.55 − 0.3897)2 + (33)(0.36 − 0.3897)2 + (24)(0.15 − 0.3897)2 = 2.4561
2.4561 = 0.8187
s2
3
bet =
SSw = SStotal − SSbet = 73.3217 − 2.4561 = 70.7618
2.4561 = 0.8187
s2
3
bet =
F = 0.8187
0.7221 = 1.13
hours of video game playing per week
We fail to reject H0 .
There is not a significant difference in mean hours of video game playing per week across the 4 groups of students who
exercise, F(3,98) = 1.13, p = 0.3408.
students who exercise
0.8
0.6
0.4
0.2
0
Not at all
Just a little
A fair amount
Very much
SS
df
MS
F
Fcrit
p-value
Between
2.4561
3
0.8187
1.13
2.7
0.3408
Within
70.7618
98
0.7221
Total
73.3217
101
16
7)
SSbet = (37)(65.19 − 65)2 + (33)(64.76 − 65)2 + (24)(65.04 − 65)2 = 3.2749
3.2749 = 1.6375
s2
2
bet =
SSw = SStotal − SSbet = 1210 − 3.2749 = 1206.69
3.2749 = 1.6375
s2
2
bet =
1.6375 = 0.12
F = 13.2604
We fail to reject H0 .
There is not a significant difference in mean height across the 3 groups of students who exercise, F(2,91) = 0.12, p = 0.8871.
students who exercise
70
60
height
50
40
30
20
10
0
Just a little
A fair amount
Very much
SS
df
MS
F
Fcrit
p-value
Between
3.2749
2
1.6375
0.12
3.1
0.8871
Within
1206.6949
91
13.2604
Total
1210
93
17