Factorial ANOVA - krigolson teaching

Factorial ANOVA
•
The ANOVA designs we have dealt with up to this point, known as simple ANOVA or one-way ANOVA, had only one independent grouping variable or factor. However, researchers often have
more than one independent grouping variable, or factor, of interest.
•
Factorial ANOVA is used when we want to consider the effect of more than one factor on
differences in the dependent variable. A factorial design is an experimental design in which
each level of each factor is paired up or crossed with each level of every other factor. In
other words each combination of the levels of the factors is included in the design. This type
of design is often depicted in a table.
•
We typically refer to ANOVA designs by the number of factors and/or by the number of
levels within a factor. A one-way ANOVA refers to a design with one factor, two-way
ANOVA has two factors, three-way ANOVA has three factors, etc. A two-by-three
ANOVA is a two-way ANOVA with two levels of the first factor and three levels of the
second factor. A three-by-four-by-two ANOVA is a three-way ANOVA with three levels of
the first factor, four of the second, and two of the third.
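The naming convention above also tells us how many cells a fully crossed design has: the product of the numbers of levels. The following is a minimal sketch; the function name `design_cells` and the integer labels for levels are illustrative assumptions, not part of the original notes.

```python
from itertools import product
from math import prod


def design_cells(levels_per_factor):
    """Return every combination of factor levels, one tuple per cell."""
    ranges = [range(1, k + 1) for k in levels_per_factor]
    return list(product(*ranges))


# A two-by-three design: 2 levels of the first factor, 3 of the second.
two_by_three = design_cells([2, 3])
print(len(two_by_three), two_by_three)

# A three-by-four-by-two design has 3 * 4 * 2 cells.
three_way = design_cells([3, 4, 2])
print(len(three_way) == prod([3, 4, 2]))
```

Each tuple is one cell of the design table, so every level of each factor is paired with every level of the others.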
•
Factorial designs allow us to determine whether there are interactions between the independent
variables or factors considered. An interaction means that the effect of one factor on the
dependent variable depends on the level of another factor.
Example
Consider a researcher who is interested in determining whether a new mathematics curriculum is
better at helping students develop spatial visualization skills. Furthermore, he wonders whether
there is a difference between boys and girls, because it is known that males tend to be better at
spatial visualization than females. The researcher has the following two-way (two-by-two)
factorial design:
                           Factor B: Curriculum
Factor A: Gender           New Math Curriculum (B1)   Control (B2)      Overall Mean (marginal)
Females (A1)               $\bar{X}_{11}$             $\bar{X}_{12}$    $\bar{X}_{1.}$
Males (A2)                 $\bar{X}_{21}$             $\bar{X}_{22}$    $\bar{X}_{2.}$
Overall Mean (marginal)    $\bar{X}_{.1}$             $\bar{X}_{.2}$    $\bar{X}_{..}$
Suppose the new curriculum was found to improve spatial visualization scores equally well
for both males and females. Then there would be main effect differences only. Main effect
differences reflect differences among the marginal means of one factor, ignoring the other factors.
However, if, for example, the new curriculum worked better for females than for males, then there
would be an interaction effect. Typically we graph each of the cell means to depict differences
obtained in factorial ANOVA.
•
The assumptions underlying the statistical tests associated with factorial ANOVA are the
same as those associated with a simple one-way ANOVA. Specifically, it is assumed the
dependent variable is normally distributed within each cell, that the population variances are
identical within each cell, and that the observations and groups are independent of each
other.
•
Conceptually, the way we calculate the statistics associated with factorial ANOVA designs
is comparable to what we did for simple one-way ANOVA designs. Basically, we
determine the variability associated with different means; there are just more means to deal
with now.
•
The SStotal in a factorial design is exactly the same as it was in simple ANOVA. It
represents the total variability among all observations around the grand mean, or
$SS_{total} = \sum (X - \bar{X}_{..})^2$.
•
In a simple one-way ANOVA the SSwithin = SSerror represented the variability of observations
within a particular group. However, now we are partitioning the groups even further so each
“group” is represented by a cell in our table. In other words, the SSerror represents the
variability of observations within a particular cell of the table. It is the variability that is
expected among individuals and can be thought of as an estimate of variability that is
common to all cells.
•
In a factorial ANOVA the SSbetween still represents the variability of the group means from
the overall mean. However, now we have to determine how much of that variability is due to
main effects and how much is due to interaction effects. For a two-way ANOVA design, as
depicted in the example above, SSbetween is partitioned into SSA, SSB, and SSAB.
•
SSA represents the variability in the marginal means associated with the different levels of
factor A, when compared to the overall mean. In our example, it would represent the
variability in the means obtained for boys and girls, ignoring curriculum. It is computed by
using the row marginal means and the grand mean.
•
SSB represents the variability in the marginal means associated with the different levels of
factor B, when compared to the overall mean. In our example, it would represent the
variability in the different curriculum programs, ignoring gender. It is computed by using
the column marginal means and the grand mean.
•
SSAB represents the variability in the cell means, after controlling for main effect differences,
when compared to the overall mean. It is computed by using the cell means and the overall
means, as well as SSA and SSB. Basically, we compute the variability in the cell means and
then subtract the variability due to the main effects.
Example:
Suppose we obtained the following data for the ANOVA design explained previously:
Females - New    Females - Control    Males - New    Males - Control
6                6                    7              5
8                3                    9              3
5                4                    8              2
6                2                    7              4
7                4                    9              4
8                6                    10             3
4                4                    6              3
9                3                    6              1
6                5                    10             5
5                3                    8              4
Calculating the cell and marginal means we obtain the following:
                           Factor B: Curriculum
Factor A: Gender           New Math Curriculum (B1)   Control (B2)            Overall Mean (marginal)
Females (A1)               $\bar{X}_{11} = 6.4$       $\bar{X}_{12} = 4.0$    $\bar{X}_{1.} = 5.2$
Males (A2)                 $\bar{X}_{21} = 8.0$       $\bar{X}_{22} = 3.4$    $\bar{X}_{2.} = 5.7$
Overall Mean (marginal)    $\bar{X}_{.1} = 7.2$       $\bar{X}_{.2} = 3.7$    $\bar{X}_{..} = 5.45$
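The distinction between main effects and an interaction can be read directly from these cell means. The sketch below is illustrative only (the dictionary layout and function name are assumptions): it recovers the marginal means and compares the curriculum effect for females with the curriculum effect for males. If the two effects were equal there would be no interaction in the sample means.

```python
# Cell means from the table above: keys are (gender, curriculum).
cell_means = {
    ("female", "new"): 6.4, ("female", "control"): 4.0,
    ("male", "new"): 8.0, ("male", "control"): 3.4,
}


def marginal_mean(factor_index, level):
    """Average the cell means sharing one level of one factor (equal cell sizes)."""
    values = [m for cell, m in cell_means.items() if cell[factor_index] == level]
    return sum(values) / len(values)


# Main effects compare marginal means, ignoring the other factor.
print(round(marginal_mean(0, "female"), 2), round(marginal_mean(0, "male"), 2))
print(round(marginal_mean(1, "new"), 2), round(marginal_mean(1, "control"), 2))

# The interaction asks whether the curriculum effect differs by gender.
effect_females = cell_means[("female", "new")] - cell_means[("female", "control")]
effect_males = cell_means[("male", "new")] - cell_means[("male", "control")]
interaction_contrast = effect_females - effect_males
print(round(effect_females, 2), round(effect_males, 2), round(interaction_contrast, 2))
```

Here the new curriculum raises the mean by 2.4 points for females and by 4.6 points for males, so the lines connecting the cell means are not parallel. Whether that difference is statistically significant is what the interaction F test decides.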
$SS_{error} = \sum (X - \bar{X}_{ij})^2 = (6 - 6.4)^2 + (8 - 6.4)^2 + \dots + (6 - 4.0)^2 + (3 - 4.0)^2 + \dots + (7 - 8.0)^2 + (9 - 8.0)^2 + \dots + (5 - 3.4)^2 + (3 - 3.4)^2 = 72.8$
$SS_A = n_{i.} \sum_i (\bar{X}_{i.} - \bar{X}_{..})^2 = 20(5.2 - 5.45)^2 + 20(5.7 - 5.45)^2 = 1.25 + 1.25 = 2.5$
$SS_B = n_{.j} \sum_j (\bar{X}_{.j} - \bar{X}_{..})^2 = 20(7.2 - 5.45)^2 + 20(3.7 - 5.45)^2 = 61.25 + 61.25 = 122.5$
$SS_{AB} = [n_{ij} \sum_i \sum_j (\bar{X}_{ij} - \bar{X}_{..})^2] - SS_A - SS_B$
$= [10(6.4 - 5.45)^2 + 10(4.0 - 5.45)^2 + 10(8.0 - 5.45)^2 + 10(3.4 - 5.45)^2] - 2.5 - 122.5$
$= [9.025 + 21.025 + 65.025 + 42.025] - 125 = 137.1 - 125 = 12.1$
$SS_{total} = SS_{error} + SS_A + SS_B + SS_{AB} = 72.8 + 2.5 + 122.5 + 12.1 = 209.9$
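The same partition can be checked numerically. The sketch below assumes the data listing is read as ten scores per cell; that reading reproduces the cell means reported above. The variable and function names are illustrative choices, not part of the original notes.

```python
# Scores per cell, keyed by (gender, curriculum); 10 observations per cell.
scores = {
    ("female", "new"):     [6, 8, 5, 6, 7, 8, 4, 9, 6, 5],
    ("female", "control"): [6, 3, 4, 2, 4, 6, 4, 3, 5, 3],
    ("male", "new"):       [7, 9, 8, 7, 9, 10, 6, 6, 10, 8],
    ("male", "control"):   [5, 3, 2, 4, 4, 3, 3, 1, 5, 4],
}


def mean(values):
    return sum(values) / len(values)


all_scores = [x for xs in scores.values() for x in xs]
grand_mean = mean(all_scores)


def pooled(factor_index, level):
    """All scores at one level of one factor, ignoring the other factor."""
    return [x for cell, xs in scores.items() if cell[factor_index] == level for x in xs]


def ss_main(factor_index):
    """Sum of squares for one factor, computed from its marginal means."""
    levels = {cell[factor_index] for cell in scores}
    total = 0.0
    for level in levels:
        group = pooled(factor_index, level)
        total += len(group) * (mean(group) - grand_mean) ** 2
    return total


# Within-cell variability: each score around its own cell mean.
ss_error = sum((x - mean(xs)) ** 2 for xs in scores.values() for x in xs)
ss_a = ss_main(0)  # gender
ss_b = ss_main(1)  # curriculum
# Cell means around the grand mean, minus the two main effects.
ss_cells = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in scores.values())
ss_ab = ss_cells - ss_a - ss_b
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

for name, value in [("SSerror", ss_error), ("SSA", ss_a), ("SSB", ss_b),
                    ("SSAB", ss_ab), ("SStotal", ss_total)]:
    print(f"{name} = {value:.3f}")
```

Computing SStotal directly from the raw scores, rather than by adding the pieces, gives an independent check that the four components sum to the total.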
•
To obtain our F-ratios for each test we need to use the df associated with each main effect
and interaction.
dfA = Number of levels of Factor A – 1 = 2 – 1 = 1, for our example
dfB = Number of levels of Factor B – 1 = 2 – 1 = 1, for our example
dfAB = (dfA)(dfB) = 1(1) = 1, for our example
dferror = N – (number of cells) = 40 – 4 = 36, for our example
dftotal = N – 1 = 40 – 1 = 39, for our example (checking this number is a good way to make
sure you’ve entered your data correctly)
Using the appropriate df we can obtain the corresponding MS term needed to calculate our F-statistic:
MSA = SSA / df A = 2.5 / 1 = 2.5, for our example
MSB = SSB / df B = 122.5 / 1 = 122.5 , for our example
MSAB = SSAB / dfAB = 12.1 / 1 = 12.1, for our example
MSerror = SSerror / dferror = 72.8 / 36 = 2.022, for our example
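The null hypothesis for each test is that there is no difference in the corresponding population means.
FA = MSA / MSerror = 2.5 / 2.022 ≈ 1.24, (compare to a critical F with 1 and 36 df ≈ 4.11 at α = .05)
FB = MSB / MSerror = 122.5 / 2.022 ≈ 60.58, (compare to a critical F with 1 and 36 df ≈ 4.11)
FAB = MSAB / MSerror = 12.1 / 2.022 ≈ 5.98, (compare to a critical F with 1 and 36 df ≈ 4.11)
Only FB and FAB exceed the critical value, so the curriculum main effect and the gender-by-curriculum
interaction are significant, whereas the gender main effect is not.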
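The df, MS, and F calculations above follow a fixed recipe, sketched below from the sums of squares in the worked example. The dictionary names are illustrative. Critical values and p-values are left out because they need an F distribution table or a statistics library.

```python
# Sums of squares from the worked example (A = gender, B = curriculum).
ss = {"A": 2.5, "B": 122.5, "AB": 12.1, "error": 72.8}
levels_a, levels_b, n_total = 2, 2, 40

df = {
    "A": levels_a - 1,
    "B": levels_b - 1,
    "AB": (levels_a - 1) * (levels_b - 1),
    "error": n_total - levels_a * levels_b,  # N minus the number of cells
}

# Each mean square is its SS divided by its df.
ms = {source: ss[source] / df[source] for source in ss}

# Every effect is tested against the same error term.
f_ratio = {effect: ms[effect] / ms["error"] for effect in ("A", "B", "AB")}

for effect, f_value in f_ratio.items():
    print(f"F_{effect} = {f_value:.3f} on ({df[effect]}, {df['error']}) df")

# The component df should add up to N - 1.
print(sum(df.values()) == n_total - 1)
```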
SPSS Output: Univariate Analysis of Variance - obtained using defaults under
"Analyze" and “General Linear Model” and "Univariate"

Between-Subjects Factors
                    Value Label    N
sex           1     female         20
              2     male           20
curriculum    1     new program    20
              2     control        20

Tests of Between-Subjects Effects
Dependent Variable: spatial
Source              Type III Sum of Squares    df    Mean Square    F          Sig.
Corrected Model     137.100a                   3     45.700         22.599     .000
Intercept           1188.100                   1     1188.100       587.522    .000
sex                 2.500                      1     2.500          1.236      .274
curriculum          122.500                    1     122.500        60.577     .000
sex * curriculum    12.100                     1     12.100         5.984      .019
Error               72.800                     36    2.022
Total               1398.000                   40
Corrected Total     209.900                    39
a. R Squared = .653 (Adjusted R Squared = .624)
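The footnote values can be reproduced from the table. The sketch below assumes the usual definitions: R squared is the Corrected Model SS over the Corrected Total SS, and the adjustment uses the model df. The function names are illustrative.

```python
def r_squared(ss_model, ss_corrected_total):
    """Proportion of total variability accounted for by the model."""
    return ss_model / ss_corrected_total


def adjusted_r_squared(r2, n_total, df_model):
    """Shrink R squared to account for the number of model terms."""
    return 1 - (1 - r2) * (n_total - 1) / (n_total - df_model - 1)


r2 = r_squared(137.1, 209.9)
adj = adjusted_r_squared(r2, n_total=40, df_model=3)
print(f"R squared = {r2:.3f}, adjusted R squared = {adj:.3f}")
```

Note that the Corrected Model SS (137.1) equals SSA + SSB + SSAB from the hand calculation.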
•
Under the “model” option in SPSS you can choose to use either Type II SS, Type III SS
(default) or Type IV SS. It is recommended that you go with the default, which adjusts the
tests when the cells have unequal numbers of observations and conducts each test
independently of the other tests.
•
Typically when one finds an interaction they graph it to aid in the interpretation. However,
our example wasn’t very “interesting” so let’s consider a more “interesting” example.
Suppose a counseling psychologist conducted a study to determine the best type of therapy
for various levels of depression and obtained the following data:
Tests of Between-Subjects Effects
Dependent Variable: score
Source                  Type III Sum of Squares    df    Mean Square    F          Sig.
Corrected Model         399.111a                   8     49.889         7.782      .000
Intercept               5292.089                   1     5292.089       825.456    .000
treatment               51.511                     2     25.756         4.017      .027
severity                235.244                    2     117.622        18.347     .000
treatment * severity    112.356                    4     28.089         4.381      .005
Error                   230.800                    36    6.411
Total                   5922.000                   45
Corrected Total         629.911                    44
a. R Squared = .634 (Adjusted R Squared = .552)
Estimated Marginal Means - obtained under "options" button
3. treatment * severity
Dependent Variable: score
                                                   95% Confidence Interval
treatment     severity    Mean      Std. Error     Lower Bound    Upper Bound
hypnosis      mild        13.200    1.132          10.903         15.497
              moderate    11.400    1.132          9.103          13.697
              severe      10.400    1.132          8.103          12.697
CBT           mild        16.800    1.132          14.503         19.097
              moderate    12.000    1.132          9.703          14.297
              severe      5.800     1.132          3.503          8.097
behavioral    mild        11.000    1.132          8.703          13.297
              moderate    9.000     1.132          6.703          11.297
              severe      8.000     1.132          5.703          10.297
•
There is a significant interaction in this example and the best way to interpret it is to create
separate line graphs for each level of one factor that depicts the cell means for the other
factor. This can be done in two different ways as the following demonstrates:
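[Figure: line graph of the cell means (vertical axis from 5 to 19) across treatment (Hypnosis, CBT, Behavioral), with separate lines for Mild, Moderate, and Severe depression.]
[Figure: line graph of the cell means (vertical axis from 5 to 19) across severity (Mild, Moderate, Severe), with separate lines for Hypnosis, CBT, and Behavioral therapy.]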
•
If the interaction were not significant, then the lines in the above plots would be
approximately parallel. If, for example, we had only compared hypnosis to behavioral therapy,
then we would not have found a significant interaction.
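One informal way to see which lines are parallel is to look at the gap between two treatments at each severity level: a roughly constant gap means roughly parallel lines. The sketch below uses the cell means from the estimated marginal means table. The helper name `gaps` is an illustrative assumption, and this is a descriptive check rather than a significance test.

```python
severities = ["mild", "moderate", "severe"]
cell_means = {
    "hypnosis":   [13.2, 11.4, 10.4],
    "CBT":        [16.8, 12.0, 5.8],
    "behavioral": [11.0, 9.0, 8.0],
}


def gaps(treatment_1, treatment_2):
    """Difference between two treatment lines at each severity level."""
    pairs = zip(cell_means[treatment_1], cell_means[treatment_2])
    return [round(a - b, 1) for a, b in pairs]


# Hypnosis versus behavioral: the gap barely changes, so the lines are nearly parallel.
print(dict(zip(severities, gaps("hypnosis", "behavioral"))))

# CBT versus hypnosis: the gap shrinks and then reverses sign, so the lines cross.
print(dict(zip(severities, gaps("CBT", "hypnosis"))))
```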
•
Once we find a significant interaction many methodologists would argue that any significant
main effects that are found should not be interpreted. However, this is somewhat dependent
on the type of interaction that is obtained. In the example above a disordinal interaction was
obtained, because the interaction lines intersect (or move in opposite directions). In this case
it is not appropriate to interpret any significant main effects because differences found in
different levels of one factor depend on differences in the second factor.
•
However, it is also possible to obtain an ordinal interaction. In this case, the lines would not
be parallel, but they also would not cross or move in different directions. For example,
suppose the following results had been obtained from 8 patients at each severity level in each
of the 3 therapy groups:
Tests of Between-Subjects Effects
Dependent Variable: score
Source                  Type III Sum of Squares    df    Mean Square    F           Sig.
Corrected Model         187.917a                   5     37.583         6.746       .000
Intercept               7252.083                   1     7252.083       1301.656    .000
treatment               89.542                     2     44.771         8.036       .001
severity                56.333                     1     56.333         10.111      .003
treatment * severity    42.042                     2     21.021         3.773       .031
Error                   234.000                    42    5.571
Total                   7674.000                   48
Corrected Total         421.917                    47
a. R Squared = .445 (Adjusted R Squared = .379)
3. treatment * severity
Dependent Variable: score
                                                   95% Confidence Interval
treatment     severity    Mean      Std. Error     Lower Bound    Upper Bound
hypnosis      mild        13.250    .835           11.566         14.934
              severe      11.875    .835           10.191         13.559
CBT           mild        14.000    .835           12.316         15.684
              severe      13.625    .835           11.941         15.309
behavioral    mild        12.875    .835           11.191         14.559
              severe      8.125     .835           6.441          9.809
•
In this case the interaction is significant, but the following interaction graphs would be
obtained, which makes it clear that all treatments seemed to work better for mildly depressed
patients so the researcher may be justified in interpreting the main effect:
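[Figure: line graph of the cell means (vertical axis from 7 to 15) across severity (Mild, Severe), with separate lines for Hypnosis, CBT, and Behavioral therapy.]
[Figure: line graph of the cell means (vertical axis from 7 to 15) across treatment (Hypnosis, CBT, Behavioral), with separate lines for Mild and Severe depression.]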
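The ordinal versus disordinal distinction can be sketched as a check on whether any two lines change rank order in either way of drawing the plot. This is a rough heuristic applied to sample cell means, under the assumption that a crossing in either orientation counts as disordinal. The function names are illustrative, and the check says nothing about statistical significance.

```python
from itertools import combinations


def lines_cross(lines):
    """True if any pair of lines changes rank order across the x-axis levels."""
    for a, b in combinations(lines.values(), 2):
        # Sign of the gap at each level: +1 if a is above b, -1 if below.
        signs = {(x > y) - (x < y) for x, y in zip(a, b) if x != y}
        if len(signs) > 1:
            return True
    return False


def transpose(lines, x_levels):
    """Redraw the plot with the other factor on the x-axis."""
    return {lvl: [vals[i] for vals in lines.values()] for i, lvl in enumerate(x_levels)}


def classify(lines, x_levels):
    crossed = lines_cross(lines) or lines_cross(transpose(lines, x_levels))
    return "disordinal" if crossed else "ordinal"


three_severity_levels = {
    "hypnosis": [13.2, 11.4, 10.4],
    "CBT": [16.8, 12.0, 5.8],
    "behavioral": [11.0, 9.0, 8.0],
}
two_severity_levels = {
    "hypnosis": [13.25, 11.875],
    "CBT": [14.0, 13.625],
    "behavioral": [12.875, 8.125],
}

print(classify(three_severity_levels, ["mild", "moderate", "severe"]))
print(classify(two_severity_levels, ["mild", "severe"]))
```

In the first data set the CBT line starts above the hypnosis line and ends below it, which is the crossing that makes the main effects hard to interpret. In the second data set no pair of lines changes order in either orientation.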
•
Whenever an interaction is obtained, one might want to do a test of simple main effects.
This test “teases apart” the interaction. A test of simple main effects is different from simply
interpreting the main effects, which ignores different levels of the second factor. Rather a test
of simple main effects is a test for main effect differences at each level of the other factor.
•
For example, one might want to test the main effect of treatment within each of the two
different levels of depression severity. This is accomplished by obtaining the SStherapy for
mildly depressed patients and SStherapy for severely depressed patients. We do this using the
cell means for the different therapy treatments and the following marginal means for severity
of depression.
2. severity
Dependent Variable: score
                                     95% Confidence Interval
severity    Mean      Std. Error     Lower Bound    Upper Bound
mild        13.375    .482           12.403         14.347
severe      11.208    .482           10.236         12.181
SStherapy for mild depression $= n \sum (\bar{X}_{therapy,mild} - \bar{X}_{mild})^2 =$
$8[(13.25 - 13.375)^2 + (14 - 13.375)^2 + (12.875 - 13.375)^2] = 8[0.016 + 0.391 + 0.25] = 5.25$
SStherapy for severe depression $= n \sum (\bar{X}_{therapy,severe} - \bar{X}_{severe})^2 =$
$8[(11.875 - 11.208)^2 + (13.625 - 11.208)^2 + (8.125 - 11.208)^2] = 126.333$
Each of these SS has 2 df because it is based on 3 means. The F ratio for
testing the simple main effect of therapy for patients who are mildly depressed is based on
MStherapy = 5.25/2 = 2.625 and MSwithin = 5.571, so F = 2.625 / 5.571 = 0.471, which needs to
be compared to a critical F with 2 and 42 df, which is approximately 3.23. Because 0.471 is
well below 3.23, there is no evidence that the therapy treatments differ for patients with mild
depression. The F statistic for testing the simple main effect of therapy for patients who are
severely depressed is (126.333/2) / 5.571 = 63.167/5.571 = 11.338, which exceeds 3.23. So there is
a difference in depression scores for the different therapy treatments for patients who are
severely depressed.
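The simple main effect calculation above can be sketched as a small function applied once per severity level. The names are illustrative. The error term is the MSerror from the overall ANOVA table, as in the hand calculation.

```python
n_per_cell = 8
ms_error = 5.571  # MSerror from the overall ANOVA, on 42 df

# Cell means for each therapy, grouped by severity level.
cell_means = {
    "mild":   {"hypnosis": 13.25, "CBT": 14.0, "behavioral": 12.875},
    "severe": {"hypnosis": 11.875, "CBT": 13.625, "behavioral": 8.125},
}


def simple_effect(means):
    """Return (SS, df, F) for therapy at one fixed level of severity."""
    level_mean = sum(means.values()) / len(means)  # marginal mean for this severity
    ss = n_per_cell * sum((m - level_mean) ** 2 for m in means.values())
    df = len(means) - 1
    return ss, df, (ss / df) / ms_error


results = {severity: simple_effect(means) for severity, means in cell_means.items()}
for severity, (ss, df, f_value) in results.items():
    print(f"{severity}: SS = {ss:.3f}, df = {df}, F = {f_value:.3f}")
```

Each F is then compared to the critical F with 2 and 42 df, as described above.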
One could also test the main effect of depression severity within each of the treatment levels.
This is accomplished by obtaining the SSseverity within each treatment, using the cell means
for the different levels of depression severity and the following marginal means for the
different therapy treatments.
1. treatment
Dependent Variable: score
                                       95% Confidence Interval
treatment     Mean      Std. Error     Lower Bound    Upper Bound
hypnosis      12.563    .590           11.372         13.753
CBT           13.813    .590           12.622         15.003
behavioral    10.500    .590           9.309          11.691
SSseverity for hypnosis $= n \sum (\bar{X}_{severity,hypnosis} - \bar{X}_{hypnosis})^2 =$
$8[(13.25 - 12.563)^2 + (11.875 - 12.563)^2] = 8[0.472 + 0.473] = 7.563$
SSseverity for CBT $= n \sum (\bar{X}_{severity,CBT} - \bar{X}_{CBT})^2 =$
$8[(14.0 - 13.813)^2 + (13.625 - 13.813)^2] = 0.563$
SSseverity for behavioral therapy $= n \sum (\bar{X}_{severity,behavioral} - \bar{X}_{behavioral})^2 =$
$8[(12.875 - 10.5)^2 + (8.125 - 10.5)^2] = 90.25$
Each of these SS has 1 df because it is based on 2 means. The F ratio for
testing the simple main effect of severity of depression for patients who are treated using
hypnosis is based on MS = 7.563/1 and MSwithin = 5.571, so F = 7.563 / 5.571 = 1.357, which
needs to be compared to a critical F with 1 and 42 df, which is approximately 4.08. Because
1.357 is below 4.08, there is no evidence of a difference due to severity of depression for
patients who are treated with hypnosis therapy. The F statistic for testing the simple main
effect of severity of depression for patients who are treated with CBT is 0.563 / 5.571 = 0.101.
So there is likewise no evidence of a difference due to severity of depression for patients who
are treated with CBT. The F statistic for testing the simple main effect of severity of
depression for patients treated with behavioral therapy is 90.25 / 5.571 = 16.199, so there is a
difference due to severity of depression for patients who are treated with behavioral therapy.
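A useful arithmetic check on these simple main effects: in a balanced design, the SSseverity values computed within each treatment add up to SSseverity plus SStreatment*severity from the overall ANOVA table. The sketch below, with illustrative names, verifies this for the example.

```python
n_per_cell = 8
ms_error = 5.571  # MSerror from the overall ANOVA, on 42 df

# Cell means for each severity level, grouped by treatment.
cell_means = {
    "hypnosis":   {"mild": 13.25, "severe": 11.875},
    "CBT":        {"mild": 14.0, "severe": 13.625},
    "behavioral": {"mild": 12.875, "severe": 8.125},
}


def ss_simple(means):
    """SS for severity within one treatment, around that treatment's mean."""
    center = sum(means.values()) / len(means)
    return n_per_cell * sum((m - center) ** 2 for m in means.values())


ss_by_treatment = {t: ss_simple(m) for t, m in cell_means.items()}
# Each SS has 1 df, so its mean square equals the SS itself.
f_by_treatment = {t: ss / ms_error for t, ss in ss_by_treatment.items()}

for treatment in cell_means:
    print(f"{treatment}: SS = {ss_by_treatment[treatment]:.3f}, "
          f"F = {f_by_treatment[treatment]:.3f}")

# Values taken from the ANOVA table for this example.
ss_severity, ss_interaction = 56.333, 42.042
print(round(sum(ss_by_treatment.values()), 3), round(ss_severity + ss_interaction, 3))
```

The two printed totals agree, which confirms that the simple main effects split up the severity main effect and the interaction together.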
•
It should be noted that simple main effects cannot be tested in SPSS from the menus alone;
the syntax window must be used. The following SPSS
syntax can be used to obtain a test of simple main effects, as well as an ANOVA. The
majority of the syntax is what is run when one “clicks” General Linear Model under the
“Analyze” menu option and then chooses Univariate. The middle lines beginning with
/EMMEANS are added when one chooses to get an estimate of the means under the
“options” button. All of this syntax can be obtained by choosing the “paste” button when
running an ANOVA from the point and click menu. The last two lines provide a test of the
simple main effects (as well as some extraneous output) and must be typed in by the user.
UNIANOVA
  score BY treatment severity
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /EMMEANS = TABLES(treatment)
  /EMMEANS = TABLES(severity)
  /EMMEANS = TABLES(treatment*severity)
  /CRITERIA = ALPHA(.05)
  /DESIGN = treatment severity treatment*severity
  /EMMEANS = TABLES(treatment*severity) COMPARE(treatment)
  /EMMEANS = TABLES(treatment*severity) COMPARE(severity).
•
All of the multiple comparison procedures discussed in terms of simple one-way ANOVA
can be generalized to higher-way ANOVA designs, and these are easily obtained using the
“point and click” menu options in SPSS. However, it should be noted that these are tests of
the main effects that ignore other factors. Therefore, I would not recommend interpreting
pair-wise comparisons if a significant interaction is obtained.
•
Power analyses for factorial ANOVA designs can also be conducted, similar to how they
were conducted for simple one-way ANOVA designs. For factorial ANOVA designs we
simply conduct separate power analyses for each factor individually, ignoring any additional
factors that may exist.
•
Once again, statistical significance does not imply differences that are important from a
practical perspective. An effect size measure can be estimated by dividing the SSeffect by
SStotal. Although this is conceptually simple, estimates of SSeffect and SStotal are dependent on
knowing how to determine the expected mean squares, which is technically difficult.
However, estimates of effect size can be obtained under the “options” button when running a
factorial ANOVA in SPSS. Effect size measures will be printed out in the ANOVA table,
next to each of the F-statistics for the main effects and the interaction terms.
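The ratio SSeffect / SStotal described above is commonly called eta squared. As a rough sketch, the code below applies it to the sums of squares from the gender-by-curriculum example. It also computes partial eta squared, SSeffect / (SSeffect + SSerror), because to my understanding that is the quantity the SPSS effect size option prints. Treat that as an assumption to verify against your own output.

```python
# Sums of squares from the gender-by-curriculum example.
ss = {"sex": 2.5, "curriculum": 122.5, "sex*curriculum": 12.1}
ss_error, ss_total = 72.8, 209.9

# Eta squared: share of the total variability attributed to each effect.
eta_sq = {effect: value / ss_total for effect, value in ss.items()}

# Partial eta squared: the effect relative to itself plus error only.
partial_eta_sq = {effect: value / (value + ss_error) for effect, value in ss.items()}

for effect in ss:
    print(f"{effect}: eta squared = {eta_sq[effect]:.3f}, "
          f"partial eta squared = {partial_eta_sq[effect]:.3f}")
```

In this example curriculum accounts for more than half of the total variability, while the gender main effect accounts for about one percent.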
•
Having unequal cell sizes in a factorial ANOVA is a complex issue, from a technical
perspective, because it results in a dependency among the main effect and interaction
estimates of variability. Using the Type III SS, which is the default in SPSS, will provide
you with a test of unweighted means, which is usually the appropriate test to conduct with
unequal cell sizes.
•
It should be noted that higher-order factorial designs are typical in Social Science research
and all of the procedures that relate to a two-way ANOVA can easily be applied to higher-order
factorial designs. However, with higher-order designs there are more interaction terms
to deal with and considering anything above a three-way ANOVA makes interpreting the
results extremely difficult.
•
Suppose one had a 3-by-4-by-2 factorial design. In other words, a three-way factorial design
with three levels of factor A, four levels of factor B, and two levels of factor C. The
corresponding ANOVA would be a test of the following: (1) Three tests of the Main Effects
of Factor A, Factor B, and Factor C; (2) Three tests of the Two-way Interaction Effects of
AB, AC, and BC, and (3) One test of the Three-way Interaction Effect of ABC.
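The list of tests in a higher-order design can be generated mechanically: one test for every non-empty subset of the factors. The sketch below does this for the 3-by-4-by-2 design, and also attaches the df for each effect using the same rule as in the two-way case (the product of levels minus one over the factors involved). The function name is an illustrative assumption.

```python
from itertools import combinations
from math import prod

levels = {"A": 3, "B": 4, "C": 2}


def effects_with_df(levels):
    """Map every main effect and interaction to its degrees of freedom."""
    table = {}
    for order in range(1, len(levels) + 1):
        for combo in combinations(levels, order):
            table["".join(combo)] = prod(levels[factor] - 1 for factor in combo)
    return table


table = effects_with_df(levels)
for effect, df in table.items():
    print(f"{effect}: df = {df}")

# The effect df should add up to (number of cells) - 1.
print(sum(table.values()) == prod(levels.values()) - 1)
```

The seven entries correspond to the three main effects, the three two-way interactions, and the single three-way interaction listed above.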