
Lab 7 – Part B
Multi-factor ANOVA, Testing Assumptions, Multiple Comparisons
Now that we have mastered one-way ANOVA, we’ll move on to comparisons when there is more
than one treatment (independent variable). In this lab we’ll cover standard R and SAS functions to
carry out an analysis of variance with multiple treatments, to test its assumptions, and to follow up
with pairwise comparisons if the treatment effects are significant.
7.2. Multi-factor ANOVA

A multi-factor ANOVA is just like a one-way ANOVA, but with more than one treatment. Each
treatment may still have many levels. In multi-factor ANOVA, we not only look at the effect of each
treatment, but we also look at the interaction between the treatments.

For a multi-factor ANOVA, there are two types of effects: main effects (one for each treatment) and
interaction effects (one for each combination of treatments):
Main effect: the effect of one independent variable (treatment) on the response, averaged across the levels of the other treatments.
Interaction: the extent to which the effect of one independent variable depends on the level of another.

The good news is that you already know how to do this analysis! In R, one-way and multi-factor
ANOVAs use the same code. We will use the lentil data for this lab as well, so start by importing the
data into R (the object is named lentils here because the code further down refers to it by that name):
lentils=read.csv("lentils.csv")
attach(lentils)
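
Before fitting any models, it never hurts to confirm that the columns are named as expected; a quick check (the code in this lab assumes columns called FARM, VARIETY, and YIELD):
str(lentils) # column names and types
head(lentils) # first few rows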

First, you should state your hypotheses for the analysis. Remember that you will need a null and
alternative hypothesis for each main effect and each interaction.

Now, run a multi-factor ANOVA by including both the FARM and VARIETY variables in the model.
If the interaction effect is significant, you can also quickly visualize it:
anova(lm(YIELD~FARM*VARIETY)) # ANOVA table with both main effects and the interaction
interaction.plot(FARM,VARIETY,YIELD) # plot of mean YIELD for each treatment combination
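
As a side note, the * operator in the model formula is shorthand for the two main effects plus their interaction; the long-hand version below gives the identical ANOVA table:
anova(lm(YIELD~FARM+VARIETY+FARM:VARIETY)) # identical to YIELD~FARM*VARIETY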
7.3. Checking the assumption of normality

You can use boxplots for a visual check of normality in multi-factor experiments by combining your
treatments with *. You can use more than two treatments. Note that the order in which you
specify your treatments determines how the groups are arranged in the boxplot. The treatment levels
always come in alphabetical order, so if you want a different order you have to rename the treatments
(or reorder the factor levels; see the sketch after the boxplot commands below).
boxplot(YIELD~FARM*VARIETY)
boxplot(YIELD~VARIETY*FARM)
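
If you would rather not rename your treatments, you can reorder the factor levels instead; a minimal sketch, assuming for illustration that VARIETY has levels "A", "B", and "C" (check with levels(factor(VARIETY)); VARIETY2 is just an illustrative name):
VARIETY2=factor(VARIETY, levels=c("C","A","B")) # put the levels in the order you want
boxplot(YIELD~FARM*VARIETY2)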

You can also use histograms to visually inspect your data. The histograms should show a
symmetrical distribution (although this is hard to judge with small sample sizes). In principle, you can
do these visual inspections, and test for significant departures from normality, within every treatment
combination:
Farm1VarB=lentils[FARM=="Farm1"&VARIETY=="B",] # subset: Variety B on Farm 1
hist(Farm1VarB$YIELD)
shapiro.test(Farm1VarB$YIELD) # Shapiro-Wilk test of normality
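
Rather than subsetting each combination by hand, a compact sketch that applies the Shapiro-Wilk test to every treatment combination at once (assuming the column names used above):
tapply(YIELD, list(FARM, VARIETY), function(x) shapiro.test(x)$p.value) # matrix of p-values; needs at least 3 observations per combination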

Instead of checking normality for each treatment combination, you can also calculate the residuals and
then look at a histogram of the residual values or run a test for deviations from normality on the residuals.
Remember that the test may not be very informative here, but you should know how to do it:
RES=residuals(lm(YIELD~FARM*VARIETY))
hist(RES)
shapiro.test(RES)
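
A normal quantile-quantile plot of the residuals is another quick visual check; points that fall close to the reference line suggest approximate normality:
qqnorm(RES) # normal Q-Q plot of the residuals
qqline(RES) # reference line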

Finally, we can look at a plot of the residuals. The first command below produces several diagnostic
graphs, which you view one after another by clicking in the graph window. The one we care about (i.e.
the one we covered in class) is the first graph you see after clicking into the empty plot window: residuals versus fitted values.
plot(lm(YIELD~FARM*VARIETY)) # several versions
plot(residuals(lm(YIELD~FARM*VARIETY))) # just residuals
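
If you only want the residuals-versus-fitted graph (the first of the diagnostic plots), you can request it directly with the which argument of plot():
plot(lm(YIELD~FARM*VARIETY), which=1) # residuals vs. fitted values only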
7.4. Checking for equal variances

You can use Bartlett’s test in R to check the assumption of equal variances. The test only accepts a
single grouping factor, so for a multi-factor ANOVA you can run it on subsets:
Farm1=lentils[FARM=="Farm1",]
Farm2=lentils[FARM =="Farm2",]
bartlett.test(Farm1$YIELD~Farm1$VARIETY)
bartlett.test(Farm2$YIELD~Farm2$VARIETY)
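
As an alternative to subsetting, you can combine the two treatments into a single grouping factor with interaction() and run one Bartlett test across all treatment combinations; a sketch:
bartlett.test(YIELD~interaction(FARM,VARIETY)) # all farm-by-variety combinations at once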

And, of course, we should also look at the residuals:
plot(lm(YIELD~FARM*VARIETY))
7.5. Data transformations

You can see a “wedge” in your plot of residuals, which indicates unequal variances, even though the
test could not identify a significant deviation from the equal-variance assumption because of the small
sample size. In any case, try transforming your data and revisit the residual plots, or check the result of
the transformations with boxplots. Do you see improvements?
sqrt_YIELD=sqrt(YIELD)
log10_YIELD=log10(YIELD)
inv_YIELD=1/YIELD
boxplot(YIELD~FARM*VARIETY)
boxplot(sqrt_YIELD~FARM*VARIETY)
boxplot(log10_YIELD~FARM*VARIETY)
boxplot(inv_YIELD~FARM*VARIETY)
plot(lm(log10_YIELD~FARM*VARIETY))
...

Remember that you can fine-tune the transformations: they become more powerful if you subtract a
constant, so that the smallest value in the dataset approaches 1. You can also use a different base for
the logarithm (natural log, log2), although this only rescales the transformed values by a constant factor,
so it will not change the shape of the residual plots or the ANOVA results:
sqrt2_YIELD=sqrt(YIELD-160) # square root after subtracting a constant
ln_YIELD=log(YIELD) # natural logarithm
log2_YIELD=log2(YIELD) # base-2 logarithm
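
Rather than hard-coding the constant, a minimal sketch that computes it from the data so the smallest value ends up at exactly 1 (the names shift and logshift_YIELD are just illustrative):
shift=min(YIELD)-1 # constant that brings the minimum to 1 after subtraction
logshift_YIELD=log10(YIELD-shift)
boxplot(logshift_YIELD~FARM*VARIETY)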
7.6. Follow-up with pairwise comparisons

If you find that a treatment effect is significant in the ANOVA, you may follow up your analysis with
pairwise t-tests and an adjustment for multiple inference, as discussed in class. The line
TukeyHSD(aov(…)) carries out pairwise comparisons with Tukey’s adjustment for multiple
inference. First, for the full factorial model:
anova(lm(YIELD~FARM*VARIETY))
TukeyHSD(aov(YIELD~FARM*VARIETY))
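
By default TukeyHSD() also reports comparisons for the main effects; if you only want the treatment-combination comparisons, you can restrict it to the interaction term with its which argument:
TukeyHSD(aov(YIELD~FARM*VARIETY), which="FARM:VARIETY") # only the combination comparisons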

Either way, this carries out, and adjusts for, all possible pairwise comparisons among the treatment
combinations (c=15), but that may not correspond to the question you are asking. You probably do not
care whether Variety A on Farm 1 is different from Variety B on Farm 2. Instead, you likely want to
compare all varieties at Farm 1 (c=3) and at Farm 2 (c=3), for a total of 6 comparisons for which you
need to make adjustments.
So you need to subset your dataset (the first two lines of code below), run the comparisons of interest
without an automatic adjustment like Tukey’s HSD above (lines 3 and 4), then copy and paste your
p-values into a vector and make a manual adjustment. I recommend the sequential Bonferroni method
devised by a statistician by the name of Holm.
Farm1=lentils[FARM=="Farm1",]
Farm2=lentils[FARM =="Farm2",]
pairwise.t.test(Farm1$YIELD, Farm1$VARIETY, p.adj="none")
pairwise.t.test(Farm2$YIELD, Farm2$VARIETY, p.adj="none")
p.adjust(c(7.8e-07,1.0e-06,0.74,1.8e-10,2.1e-10,0.61),
n=6, method="holm")
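
Instead of copying the p-values by hand, you can pull them directly out of the pairwise.t.test() objects; a sketch, using the Farm1 and Farm2 subsets created above:
p1=pairwise.t.test(Farm1$YIELD, Farm1$VARIETY, p.adj="none")$p.value # matrix of unadjusted p-values
p2=pairwise.t.test(Farm2$YIELD, Farm2$VARIETY, p.adj="none")$p.value
p.adjust(c(na.omit(c(p1)), na.omit(c(p2))), method="holm") # Holm adjustment over all 6 comparisons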

There is a good paper written for biologists about Holm’s adjustment method, and it also explains how
you can do the adjustment manually (very easy). You can cite it in your own thesis/publications:
Rice, W.R. 1989. Analyzing tables of statistical tests. Evolution 43: 223–225.
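
For reference, a sketch of the manual sequential Bonferroni (Holm) procedure described in Rice (1989): sort the p-values from smallest to largest and compare the i-th smallest to alpha/(k-i+1), stopping at the first non-significant test (alpha=0.05 is assumed here):
p=sort(c(1.8e-10, 2.1e-10, 7.8e-07, 1.0e-06, 0.61, 0.74)) # the six unadjusted p-values from above
k=length(p)
thresh=0.05/(k-seq_along(p)+1) # alpha/(k-i+1) for the i-th smallest p-value
reject=cumprod(p<=thresh)==1 # stop rejecting at the first non-significant test
data.frame(p, thresh, reject)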
7.7. And now, all the above with SAS (AS USUAL, THIS IS OPTIONAL)

Import:
proc import out=lentils
datafile="C:\Lab7\lentils.csv"
dbms=csv replace;
getnames=yes;
datarow=2;
guessingrows=10000;
run;

Multi-factor ANOVA:
proc glm data=lentils;
class location variety;
model yield=location|variety;
run;

Boxplots for all treatment combinations:
proc sort data=lentils; by location variety; run;
proc boxplot data=lentils;
ID id;
plot yield*variety(location) /boxstyle=schematicid;
run;

Test of normality for each treatment combination:
proc univariate data=lentils normal;
var yield;
class location variety;
run;

Residual Plots:
proc glm data=lentils;
class location variety;
model yield=location|variety;
output out=res residual=res predicted=pred;
run; quit;
proc gplot data=res;
plot res*pred;
run;

Bartlett’s test of equal variances for one-way ANOVA:
data lentils1;
set lentils;
if location='Farm1';
run;
proc glm data=lentils1;
class variety;
model yield=variety;
means variety /hovtest=bartlett;
run;

Bartlett's test for multi-factor ANOVA (it also works for a single factor). The relevant statistics to
check are “Chi-Square” and “Pr > ChiSq” in the Null Model Likelihood Ratio Test:
proc mixed data=lentils;
class variety location;
model yield=variety|location;
repeated /group=variety*location;
run;

Multiple pairwise comparisons:
proc glm data=lentils;
class location variety;
model yield=location|variety;
lsmeans location*variety/pdiff=all adjust=tukey;
run;