Lecture 33 Multiple Factor ANOVA

STAT 512
Spring 2011
Background Reading
KNNL: Chapter 24
Topic Overview
• ANOVA with multiple factors
3-Way ANOVA Model
• Three factors A, B, and C having a, b, and c
levels, respectively
• Notation is similar to before.
Data for three-way ANOVA
− $Y$, the response variable
− Factor A with levels $i = 1$ to $a$
− Factor B with levels $j = 1$ to $b$
− Factor C with levels $k = 1$ to $c$
− $Y_{ijkl}$ is the $l$th observation in cell $(i,j,k)$, $l = 1$ to $n_{ijk}$
− A balanced design has $n_{ijk} = n$
Cell Means Model
$$Y_{ijkl} = \mu_{ijk} + \varepsilon_{ijkl}$$
− $\mu_{ijk}$ is the theoretical mean or expected value
of all observations in cell $(i,j,k)$.
− $\varepsilon_{ijkl} \overset{iid}{\sim} N(0, \sigma^2)$
− $Y_{ijkl} \sim N(\mu_{ijk}, \sigma^2)$ are independent
Treatment Means
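$$\mu_{ij\cdot} = \frac{1}{c}\sum_{k}\mu_{ijk} \qquad \mu_{i\cdot k} = \frac{1}{b}\sum_{j}\mu_{ijk} \qquad \mu_{\cdot jk} = \frac{1}{a}\sum_{i}\mu_{ijk}$$
$$\mu_{i\cdot\cdot} = \frac{1}{bc}\sum_{j,k}\mu_{ijk} \qquad \mu_{\cdot j\cdot} = \frac{1}{ac}\sum_{i,k}\mu_{ijk} \qquad \mu_{\cdot\cdot k} = \frac{1}{ab}\sum_{i,j}\mu_{ijk}$$
$$\mu_{\cdot\cdot\cdot} = \frac{1}{abc}\sum_{i,j,k}\mu_{ijk}$$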
Estimates
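$$\hat\mu_{ijk} = \frac{1}{n}\sum_{l} Y_{ijkl}$$
$$\hat\mu_{ij\cdot} = \frac{1}{cn}\sum_{k,l} Y_{ijkl} \qquad \hat\mu_{i\cdot k} = \frac{1}{bn}\sum_{j,l} Y_{ijkl} \qquad \hat\mu_{\cdot jk} = \frac{1}{an}\sum_{i,l} Y_{ijkl}$$
$$\hat\mu_{i\cdot\cdot} = \frac{1}{bcn}\sum_{j,k,l} Y_{ijkl} \qquad \hat\mu_{\cdot j\cdot} = \frac{1}{acn}\sum_{i,k,l} Y_{ijkl} \qquad \hat\mu_{\cdot\cdot k} = \frac{1}{abn}\sum_{i,j,l} Y_{ijkl}$$
$$\hat\mu_{\cdot\cdot\cdot} = \frac{1}{abcn}\sum_{i,j,k,l} Y_{ijkl}$$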
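As a minimal sketch of these estimates for a balanced design, the following Python averages observations within cells and then averages cell means over the "dotted" indices. The data values and the names `y`, `cell_mean`, and `mean_over` are hypothetical and chosen only for illustration.

```python
from itertools import product
from statistics import mean

# Hypothetical balanced data: y[i][j][k] is the list of n = 2 observations
# in cell (i, j, k), with a = b = c = 2 levels per factor.
y = [
    [[[10.0, 12.0], [14.0, 16.0]], [[20.0, 22.0], [24.0, 26.0]]],
    [[[11.0, 13.0], [15.0, 17.0]], [[21.0, 23.0], [29.0, 31.0]]],
]
a, b, c = len(y), len(y[0]), len(y[0][0])

# Cell mean estimates: average the n observations within each cell.
cell_mean = {
    (i, j, k): mean(y[i][j][k])
    for i, j, k in product(range(a), range(b), range(c))
}

def mean_over(fixed):
    """Average the cell means over every index not pinned in `fixed`.

    `fixed` maps a position (0 for i, 1 for j, 2 for k) to a level.
    With equal cell sizes this equals the average of the raw observations.
    """
    keep = [m for key, m in cell_mean.items()
            if all(key[pos] == lvl for pos, lvl in fixed.items())]
    return mean(keep)

mu_hat_i = [mean_over({0: i}) for i in range(a)]      # factor A level means
mu_hat_ij = {(i, j): mean_over({0: i, 1: j})          # A x B means
             for i, j in product(range(a), range(b))}
grand_mean = mean_over({})                            # overall mean
```

Because every cell has the same number of observations here, averaging cell means and averaging raw observations give the same answer; with unequal cell sizes the two differ.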
Factor effects model
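$$Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$$
− $\mu$ is the overall (grand) mean
− $\alpha_i$, $\beta_j$, $\gamma_k$ are the main effects of factors A, B,
and C
− $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$ are the two-way (first
order) interactions
− $(\alpha\beta\gamma)_{ijk}$ is the three-way (second order)
interaction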
Factor Effects
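$$\alpha_i = \mu_{i\cdot\cdot} - \mu_{\cdot\cdot\cdot} \qquad \beta_j = \mu_{\cdot j\cdot} - \mu_{\cdot\cdot\cdot} \qquad \gamma_k = \mu_{\cdot\cdot k} - \mu_{\cdot\cdot\cdot}$$
$$(\alpha\beta)_{ij} = \mu_{ij\cdot} - \mu_{i\cdot\cdot} - \mu_{\cdot j\cdot} + \mu_{\cdot\cdot\cdot}$$
$$(\alpha\gamma)_{ik} = \mu_{i\cdot k} - \mu_{i\cdot\cdot} - \mu_{\cdot\cdot k} + \mu_{\cdot\cdot\cdot}$$
$$(\beta\gamma)_{jk} = \mu_{\cdot jk} - \mu_{\cdot j\cdot} - \mu_{\cdot\cdot k} + \mu_{\cdot\cdot\cdot}$$
$$(\alpha\beta\gamma)_{ijk} = \mu_{ijk} - \mu_{ij\cdot} - \mu_{i\cdot k} - \mu_{\cdot jk} + \mu_{i\cdot\cdot} + \mu_{\cdot j\cdot} + \mu_{\cdot\cdot k} - \mu_{\cdot\cdot\cdot}$$
Plug in cell means to estimate.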
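The plug-in step can be sketched in Python as follows. The cell means in `mu` are hypothetical, and the helper `m` (a marginal mean with `None` standing for a dotted index) is a name introduced here for illustration only. The sketch also rebuilds one cell mean from its effects as a check on the decomposition.

```python
from itertools import product
from statistics import mean

# Hypothetical cell means mu[(i, j, k)] for a 2 x 2 x 2 layout.
mu = {(0, 0, 0): 11.0, (0, 0, 1): 15.0, (0, 1, 0): 21.0, (0, 1, 1): 25.0,
      (1, 0, 0): 12.0, (1, 0, 1): 16.0, (1, 1, 0): 22.0, (1, 1, 1): 30.0}
levels = range(2)

def m(i=None, j=None, k=None):
    """Marginal mean: average mu over every index left as None (a 'dot')."""
    return mean(v for (p, q, r), v in mu.items()
                if i in (None, p) and j in (None, q) and k in (None, r))

grand = m()
alpha = {i: m(i=i) - grand for i in levels}
beta = {j: m(j=j) - grand for j in levels}
gamma = {k: m(k=k) - grand for k in levels}
ab = {(i, j): m(i=i, j=j) - m(i=i) - m(j=j) + grand
      for i, j in product(levels, repeat=2)}
ac = {(i, k): m(i=i, k=k) - m(i=i) - m(k=k) + grand
      for i, k in product(levels, repeat=2)}
bc = {(j, k): m(j=j, k=k) - m(j=j) - m(k=k) + grand
      for j, k in product(levels, repeat=2)}
abc = {(i, j, k): mu[i, j, k] - m(i=i, j=j) - m(i=i, k=k) - m(j=j, k=k)
       + m(i=i) + m(j=j) + m(k=k) - grand
       for i, j, k in product(levels, repeat=3)}

# Rebuild one cell mean from the effects as a check on the decomposition.
i, j, k = 1, 1, 1
rebuilt = (grand + alpha[i] + beta[j] + gamma[k]
           + ab[i, j] + ac[i, k] + bc[j, k] + abc[i, j, k])
```

With these definitions each set of effects sums to zero over any of its indices, which is the usual sum-to-zero constraint.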
Constraints
• Usual constraints listed on page 997 – sums
of effects over ANY of the indices are zero.
Under these, $\mu_{\cdot\cdot\cdot}$ will be the grand mean.
• In SAS, constraints are all set up to compare
everything to $\mu_{abc}$. Thus a factor effect is
set to zero if its subscripts include the “last”
level of any factor.
Assumptions
• Constancy of variance applies across cells;
can do residual plots across treatment
combinations
• For violations, transformations can
sometimes be useful; WLS is a standard
remedial measure if the error distribution is
normal but the variances are different.
ANOVA Table
• SSTR/Model is partitioned into:
Main Effects
Two Way Interactions
Three Way Interactions
Etc.
• DF are multiplicative. For example, the three-way
interaction between A, B, and C takes up
$(a-1)(b-1)(c-1)$ DF.
• SS formulas given on page 1008.
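A small sketch of the multiplicative DF rule for a balanced full-factorial design. The function name `anova_df` is hypothetical; the call uses 2 levels per factor and 3 observations per cell, the layout of the alloy example later in this lecture.

```python
from itertools import combinations
from math import prod

def anova_df(levels, n):
    """Degrees of freedom for a balanced full-factorial ANOVA.

    `levels` maps factor name -> number of levels; `n` is the number of
    observations per cell. Each term's DF is the product of (levels - 1)
    over the factors it involves.
    """
    df = {}
    names = list(levels)
    for size in range(1, len(names) + 1):
        for term in combinations(names, size):
            df["*".join(term)] = prod(levels[f] - 1 for f in term)
    cells = prod(levels.values())
    df["Error"] = cells * (n - 1)
    df["Total"] = cells * n - 1
    return df

df = anova_df({"additive": 2, "temp": 2, "time": 2}, n=3)
for term, value in df.items():
    print(f"{term:<20}{value:>4}")
```

The model and error DF add up to the total DF, which is a quick consistency check on any ANOVA table.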
Steps in 3-Factor Analysis
1. Fit full model and check assumptions
2. Start with the 3-way interaction and
determine if it is significant.
3. If not, may consider pooling. To avoid
likelihood of Type I errors, best to pool only
in cases where p-value is not close to
significant.
4. If 3-way interaction (or multiple 2-way
interactions) are significant, then analyze the
three factors jointly in terms of $\mu_{ijk}$.
Steps in 3-Factor Analysis (2)
5. If only a single two-way interaction is
significant, may again consider pooling, and
can analyze via regular interaction plot. Do
NOT pool any term for which higher order
terms are significant.
6. Can analyze main effects if factor not
involved in important interaction. May also
be able to look at main effects if they are
large compared to the interactions.
With More than three factors...
• Hope that higher order interactions are not
significant (this is often the case). If they
are, try to analyze cell means. Assuming
they are not...
• Interactions that overlap (e.g. AB and BC)
and are significant suggest analysis of the
three-factor level means.
• Another potential strategy is to combine
factors (e.g. gender and smoking might be
considered one factor with 4 levels)
Multiple Comparisons
• Tukey, Bonferroni, and Scheffe adjustments
can be made as before (see page 1017 for
appropriate degrees of freedom to use;
generally model and/or error).
• Can utilize contrasts to study specific
questions (should use Scheffe if looking at
any unplanned contrasts; Bonferroni is
appropriate for contrasts that have been
planned in advance)
Unequal Cell Sizes
• Formulas change a bit as not all of the $n_{ijk}$
are the same
• Look at Type III SS as well as Type I (the
closer the sample sizes are to each other,
the less difference there will be).
• MUST use LSMeans to do comparisons
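To see why raw level means can mislead with unequal cell sizes, here is a minimal two-factor sketch (two factors rather than three, for brevity). The data are hypothetical. It assumes the full model with interaction and no empty cells, in which case the least squares mean of a factor level is the unweighted average of the cell means at that level.

```python
from statistics import mean

# Hypothetical unbalanced data for factor A (first key) by factor B (second key).
cells = {
    ("a1", "b1"): [10.0, 10.0, 10.0, 10.0],  # n = 4
    ("a1", "b2"): [20.0],                    # n = 1
    ("a2", "b1"): [12.0],                    # n = 1
    ("a2", "b2"): [22.0, 22.0, 22.0, 22.0],  # n = 4
}

def raw_mean(level):
    """Average of all observations at a level of A (weights cells by size)."""
    return mean(y for (a, _), ys in cells.items() if a == level for y in ys)

def ls_mean(level):
    """Unweighted average of the cell means at a level of A."""
    return mean(mean(ys) for (a, _), ys in cells.items() if a == level)

for level in ("a1", "a2"):
    print(level, raw_mean(level), ls_mean(level))
```

In this made-up layout the effect of A is 2 units within each level of B. The least squares means differ by 2, while the raw means differ by 8 because the cell sizes tie A to B.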
Empty Cells
• Can often be problematic for larger designs
• Create situations where some effects are
confounded; generally interactions can
only be partially studied.
• Usually forced to assume some interactions
are zero.
• See page 964 for more on empty cells
Example
• Problem 24.6 (alloy.sas)
• Studying the effects of three factors on the
hardness of an alloy
• Factor A: Use of a chemical additive (1 =
low amount; 2 = high amount)
• Factor B: Temperature (1 = low, 2 = high)
• Factor C: Time allowed for process (1 =
low, 2 = high)
• Three observations per cell, balanced design
Interactions
• Parallel lines suggest no interactions. If we
look at the ANOVA table, this is seen there
as well.

Source        DF      SS      MS       F    Pr > F
additive       1     789     789     235    <.0001
time           1    2440    2440     727    <.0001
add*time       1    0.20    0.20    0.06    0.8095
temp           1    1539    1539     458    <.0001
add*temp       1    0.24    0.24    0.07    0.7926
time*temp      1    2.94    2.94    0.88    0.3634
ad*tim*tem     1    0.60    0.60    0.18    0.6778
Error         16    53.7    3.36
Total         23    4826
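The MS and F columns follow from the SS and DF columns, which the sketch below recomputes in Python. The inputs are the rounded values printed in the table, so the results agree with the printed F statistics only up to that rounding.

```python
# Sums of squares copied from the ANOVA table (rounded as printed).
ss = {"additive": 789, "time": 2440, "add*time": 0.20, "temp": 1539,
      "add*temp": 0.24, "time*temp": 2.94, "ad*tim*tem": 0.60}
df = {term: 1 for term in ss}   # every term has 1 DF in a 2 x 2 x 2 design
sse, dfe = 53.7, 16             # error sum of squares and error DF

mse = sse / dfe                                   # mean squared error
f_stat = {term: (ss[term] / df[term]) / mse       # F = MS(term) / MSE
          for term in ss}
ss_total = sum(ss.values()) + sse                 # should be close to 4826

for term, f in f_stat.items():
    print(f"{term:<12}{f:8.2f}")
```

The three large F statistics belong to the main effects, and every interaction F is below 1, matching the parallel lines in the interaction plots.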
Analysis
• In this (nice) case we can simply look at the
individual means and draw conclusions

additive   LSMEAN        Pr > |t|
1_low      54.2250000    <.0001
2_high     65.6916667

time       LSMEAN        Pr > |t|
1_low      49.8750000    <.0001
2_high     70.0416667
Analysis (2)
temp       LSMEAN        Pr > |t|
1_low      51.9500000    <.0001
2_high     67.9666667

• High levels for all three variables are
preferred.
• Don’t forget assumptions (in this case not
too bad; something weird in cell #1)
Example (adjusted)
• Data changed a bit (see SAS code)
• Basically, for illustration, interchanged the
cells for A = 1, B = 2 and A = 2, B = 2
• Interaction Plot now suggests interaction
Two-way Interaction Plots
• From the 3-way interaction plot we can
guess that the interaction has to do with
time (but not temp since individually, lines
for same level of temp are parallel)
• This is confirmed by looking at the 2-way
interaction plots
• Interaction between additive and
temperature
• No interaction between additive/time (and
no apparent effect of additive if we ignore
temperature)
• No interaction between time/temp; there is
apparently a main effect of temperature in
addition to the interaction.
ANOVA Output
Source        DF      SS      MS       F    Pr > F
additive       1    0.24    0.24    0.07    0.7926
time           1    2440    2440     727    <.0001
add*time       1    0.60    0.60    0.18    0.6778
temp           1    1539    1539     458    <.0001
add*temp       1     789     789     235    <.0001
time*temp      1    2.94    2.94    0.88    0.3634
ad*tim*tem     1    0.20    0.20    0.06    0.8095
Error         16    53.7    3.36
Total         23    4826
Results
• Additive interacts with Temperature; we will want
to examine that interaction
• Temperature is by itself significant; so
probably can look at main effect for that as
well.
• Would be inappropriate to look at main effect
for Additive; factor is important in how it
interacts with temp and main effect here will
be misleading
• Can look at main effect for time since there is
no interaction there.
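The reasoning in these bullets can be sketched as a simple rule: a factor's main effect is read directly only if the factor appears in no significant interaction. The 0.05 cutoff and the names below are assumptions made for illustration; the p-values are those of the ANOVA output for the adjusted data, with `<.0001` entered as 0.0001.

```python
# p-values from the ANOVA output for the adjusted data.
p_values = {
    ("additive",): 0.7926,
    ("time",): 0.0001,                      # printed as <.0001
    ("additive", "time"): 0.6778,
    ("temp",): 0.0001,                      # printed as <.0001
    ("additive", "temp"): 0.0001,           # printed as <.0001
    ("time", "temp"): 0.3634,
    ("additive", "time", "temp"): 0.8095,
}
ALPHA = 0.05  # assumed significance level; the slides do not state one

significant = {term for term, p in p_values.items() if p < ALPHA}

def involved_in_interaction(factor):
    """True if `factor` appears in any significant interaction term."""
    return any(len(term) > 1 and factor in term for term in significant)

report = {f: ("examine via interaction" if involved_in_interaction(f)
              else "main effect can be read directly")
          for f in ("additive", "time", "temp")}

for factor, advice in report.items():
    print(f"{factor:<10}{advice}")
```

This rule is deliberately conservative: it flags temperature as well as additive. The slides still allow a look at the temperature main effect because it is significant on its own, which is a judgment call the rule does not encode.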
Results (2)
time       LSMEAN        Pr > |t|
1_low      49.8750000    <.0001
2_high     70.0416667

temp       LSMEAN        Pr > |t|
1_low      51.9500000    <.0001
2_high     67.9666667

• Longer time is better
• Apparently higher temperature is better
Results (3)
additive   temp      LSMEAN    Number
1_low      1_low     46.317    1
1_low      2_high    73.800    2
2_high     1_low     57.583    3
2_high     2_high    62.133    4

Pr > |t| for pairwise differences (row i vs. column j)
i/j    1         2         3         4
1                <.0001    <.0001    <.0001
2      <.0001              <.0001    <.0001
3      <.0001    <.0001              0.0028
4      <.0001    <.0001    0.0028
Results (4)
• Can identify a “best” combination of
additive and temperature (low additive,
high temperature)
• As we saw in the interaction plot, the
additive counteracts the effect of raising
the temperature to some degree
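These two conclusions can be checked directly from the four additive-by-temperature LSMEANs reported in Results (3). A minimal sketch follows; the variable names are chosen here for illustration.

```python
# Cell LSMEANs for additive x temp from the adjusted data, keyed (additive, temp).
lsmean = {
    ("low", "low"): 46.317,
    ("low", "high"): 73.800,
    ("high", "low"): 57.583,
    ("high", "high"): 62.133,
}

# Simple effect of raising temperature at each additive level.
temp_effect = {add: lsmean[add, "high"] - lsmean[add, "low"]
               for add in ("low", "high")}

# The design is balanced, so averaging the cell means over additive
# recovers the temperature main-effect LSMEANs.
temp_main = {t: (lsmean["low", t] + lsmean["high", t]) / 2
             for t in ("low", "high")}

# Combination with the largest mean hardness.
best = max(lsmean, key=lsmean.get)

print(temp_effect)
print(temp_main)
print(best)
```

Raising the temperature adds about 27.5 units of hardness at low additive but only about 4.6 at high additive, which is the interaction seen in the plot. The averaged values agree with the temperature LSMEANs of 51.95 and 67.97 in Results (2).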
Upcoming …
• More multiple ANOVA / ANCOVA
examples
• Fixed vs. Random Effects