The ANalysis Of Variance = ANOVA Family

Comparing ≥ 3 Groups
Analysis of Biological Data/Biometrics
Dr. Ryan McEwan
Department of Biology
University of Dayton
So far in class we have looked at
situations where we have two
groups to compare.
Like LeBron’s assists
Or
House Prices
What do we do when there are
more than 2 groups?
You may be tempted to t-test every pair of groups, one at a time.
This would be statistically invalid, because every extra test adds another chance of a false positive!
Instead you need to first ask: is there an overall effect of the treatment?
This is done using...
ANalysis Of Variance = ANOVA
Let’s grab some data from the book
comp=read.csv("C:/Users/rmcewan1/Documents/R/A_BioData/Data/competition.csv")
First test for an overall treatment effect
summary(aov(comp$biomass~comp$clipping))
or
anova(lm(comp$biomass~comp$clipping))
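As a minimal sketch of the overall test, the example below uses R's built-in PlantGrowth data set as a stand-in, so that it runs without a file path; it is an assumption for illustration and is not the book's competition data. It also uses the `data =` argument rather than `$`, which should give the same answer with tidier output.

```r
# PlantGrowth ships with R: plant weight for a control and two treatments.
# It is a stand-in here, NOT the competition.csv data from the book.
data(PlantGrowth)

# Overall test, route 1: aov() wrapped in summary()
fit_aov <- aov(weight ~ group, data = PlantGrowth)
summary(fit_aov)

# Overall test, route 2: lm() wrapped in anova()
fit_lm <- lm(weight ~ group, data = PlantGrowth)
anova(fit_lm)

# Pull the overall P-value out of each table to confirm the two routes agree
p_aov <- summary(fit_aov)[[1]][["Pr(>F)"]][1]
p_lm  <- anova(fit_lm)[["Pr(>F)"]][1]
all.equal(p_aov, p_lm)
```

Either route tests the same null hypothesis (all group means are equal), so pick one and report its F statistic, degrees of freedom, and P-value.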
Second test for individual differences
PAIRWISE COMPARISONS aka “Compare all Pairs”
Note that a lot of comparisons are being made in this kind of process…that has implications
Bonferroni correction
Basic idea is that as you increase the number of
comparisons you increase the probability of making
at least one Type I error (false positive).
So you need to adjust the P-value threshold, making it
more stringent, for each individual test.
So you cannot just run regular t-tests as a post-hoc
procedure. You have to do a correction.
Post-hoc, pairwise procedures that follow ANOVA
have these corrections built in.
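The arithmetic behind this can be sketched in a few lines. The example assumes the tests are independent (a simplification) and uses made-up P-values purely for illustration; `p.adjust()` is base R's general-purpose correction function.

```r
alpha <- 0.05
m <- 1:10   # number of comparisons being made

# Chance of at least one false positive across m independent tests,
# each run at alpha = 0.05
fwer <- 1 - (1 - alpha)^m
round(fwer, 3)

# Bonferroni, view 1: shrink the threshold to alpha / m
# (for example, 10 comparisons among 5 groups)
alpha / 10

# Bonferroni, view 2: inflate each P-value by m (capped at 1) and keep
# alpha at 0.05; this is what p.adjust() does
p_raw  <- c(0.004, 0.020, 0.300)   # made-up P-values, illustration only
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bonf
```

With ten comparisons the chance of at least one false positive is already about 0.40, which is why uncorrected t-tests are not an acceptable post-hoc procedure.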
Pairwise comparisons that follow a significant overall test are often referred to as a post-hoc procedure
pairwise.t.test(comp$biomass, comp$clipping, p.adjust.method="bonferroni")
Another version is the Tukey test (also called the Tukey-Kramer Multiple
Comparisons Test)
TukeyHSD(aov(comp$biomass~comp$clipping))
Generally, the Tukey HSD test is considered a superior method and we will
adopt this for our class.
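A short sketch of both post-hoc procedures side by side, again using the built-in PlantGrowth data as a stand-in (an assumption; it is not the competition data). The object names `pt` and `tk` are just illustrative.

```r
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# Bonferroni-corrected pairwise t-tests; the argument name is written out
# in full so there is no doubt which correction is applied
pt <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "bonferroni")
pt$p.value   # lower-triangle matrix of adjusted P-values

# Tukey HSD on the fitted aov object
tk <- TukeyHSD(fit)
tk$group     # one row per pair: diff, lwr, upr, p adj
```

The Tukey table is convenient for reporting because each row gives the difference between two group means, its confidence interval, and the adjusted P-value.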
ANalysis Of Variance = ANOVA
Assumptions!
- Independence
- Normality
- Equal Variance
Guidelines
- One-way ANOVA is robust to minor violations of normality and equal variance
- Assessing normality using tests (Shapiro-Wilk, etc.) is losing favor among some statisticians
- Graphical methods (histograms, normal Q-Q plots) remain useful; however, they can also be
somewhat ambiguous.
- General practice we will adopt for this class:
- (a) exploratory data analysis, including outlier scanning
- (b) graphical methods
- (c) tests of both normality and equal variance
- (d) as the analyst, take all of this into account, make a decision, and report it
- Generally speaking, the data should be “a mess” before you resort to the non-parametric approach.
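Steps (a) through (c) might look like the sketch below. It uses the built-in PlantGrowth data as a stand-in (an assumption) and follows the class convention of running the Shapiro-Wilk test on the response variable; step (d), the judgment call, stays with the analyst.

```r
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# (a) exploratory data analysis: boxplots flag potential outliers per group
boxplot(weight ~ group, data = PlantGrowth)
summary(PlantGrowth$weight)

# (b) graphical methods: histogram plus the four diagnostic plots for the model
hist(PlantGrowth$weight)
par(mfrow = c(2, 2))   # 2 x 2 grid so all four plots fit on one page
plot(fit)
par(mfrow = c(1, 1))   # reset the plotting grid

# (c) tests of normality and equal variance
sw <- shapiro.test(PlantGrowth$weight)                   # P <= 0.05 suggests non-normal
bt <- bartlett.test(weight ~ group, data = PlantGrowth)  # P <= 0.05 suggests unequal variance
sw$p.value
bt$p.value
```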
If you decide that you need to run a non-parametric test
because your data do not meet the assumptions of ANOVA,
run the Kruskal-Wallis test:
kruskal.test(dataset$measurement~dataset$group)
It is important to note that this test is less powerful for detecting
differences. It is more conservative, so you run a greater risk of a
Type II error (false negative).
For non-normal data, the pairwise comparisons can be made with the pairwise Wilcoxon test.
pairwise.wilcox.test(data$measures, data$group, p.adj="bonferroni")
Like the Kruskal-Wallis test, this test is less powerful for detecting
differences, so the risk of a Type II error (false negative) is again higher.
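A runnable sketch of the non-parametric route, using the built-in PlantGrowth data as a stand-in (an assumption; these data are not necessarily "a mess"). The `exact = FALSE` argument is passed through to `wilcox.test()` and is an added choice here, to avoid warnings when the data contain ties.

```r
data(PlantGrowth)

# Overall non-parametric test for a difference among groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw$p.value

# Pairwise Wilcoxon tests with a Bonferroni correction
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "bonferroni",
                           exact = FALSE)   # normal approximation, tolerates ties
pw$p.value   # lower-triangle matrix of adjusted P-values
```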
(1) Exploratory Data Analysis (screen for outliers, look at the range of variation, etc.)
(2) Test for a normal distribution and equal variance
- hist(DATA) (use this to take a look at the distribution)
- use plot(aov( )) to create 4 diagnostic plots. FIRST set par(mfrow=c(2,2))
- Shapiro-Wilk test: shapiro.test(DATA) [P ≤ 0.05 = not normal]
- Bartlett test for homogeneity of variance: bartlett.test(Data ~ Site) [P ≤ 0.05 = unequal variance]
Reject the null hypothesis of a normal distribution: “Non-NORMAL”
kruskal.test(data$measures~data$group)
- P ≤ 0.05: Significant treatment effect! What are the individual differences?
pairwise.wilcox.test(data$measures, data$group, p.adj="bonferroni")
This code will show you the differences among each pair of treatments in the data set.
You need to report the result of the overall test and the individual comparisons.
- P > 0.05: Cannot reject the null. Treatments are statistically indistinguishable. Analysis complete.
Cannot reject the null of a normal distribution: “NORMAL”
anova(lm(data$measures~data$group))
- P ≤ 0.05: Significant treatment effect! What are the individual differences?
TukeyHSD(aov(data$measures~data$group))
This code will show you the differences among each pair of treatments in the data set.
You need to report the result of the overall ANOVA and the individual comparisons.
- P > 0.05: Cannot reject the null. Treatments are statistically indistinguishable. Analysis complete.
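The whole decision flow can be sketched as one function. `compare_groups()` is a hypothetical helper written for illustration, not part of R or of the course materials. It also makes two simplifying assumptions: it branches mechanically on P ≤ 0.05, and it requires both the Shapiro-Wilk and Bartlett tests to pass before taking the ANOVA route. In practice the analyst weighs the graphs and tests together rather than following a rigid rule.

```r
# Hypothetical helper: a rigid version of the decision flow, for illustration.
compare_groups <- function(y, g, alpha = 0.05) {
  g <- factor(g)

  # Assumption checks (mechanical cut-offs; real analyses also use graphs)
  normal_ok    <- shapiro.test(y)$p.value > alpha
  equal_var_ok <- bartlett.test(y ~ g)$p.value > alpha

  if (normal_ok && equal_var_ok) {
    # Parametric route: overall ANOVA, then Tukey HSD if significant
    route     <- "ANOVA"
    fit       <- aov(y ~ g)
    p_overall <- summary(fit)[[1]][["Pr(>F)"]][1]
    posthoc   <- if (p_overall <= alpha) TukeyHSD(fit) else NULL
  } else {
    # Non-parametric route: Kruskal-Wallis, then pairwise Wilcoxon if significant
    route     <- "Kruskal-Wallis"
    p_overall <- kruskal.test(y ~ g)$p.value
    posthoc   <- if (p_overall <= alpha) {
      pairwise.wilcox.test(y, g, p.adjust.method = "bonferroni", exact = FALSE)
    } else {
      NULL   # analysis complete: no pairwise comparisons needed
    }
  }
  list(route = route, p_overall = p_overall, posthoc = posthoc)
}

# Usage with the built-in PlantGrowth data (a stand-in data set)
res <- compare_groups(PlantGrowth$weight, PlantGrowth$group)
res$route
res$p_overall
```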
[Figure: boxplot of biomass by competition treatment, built up one group at a time. Overall treatment effect P = 0.0087. Each box is labelled with a letter (A, AB, or B) from the pairwise comparisons.]
boxplot(comp$biomass~comp$clipping, ylab="mean biomass",
        xlab="competition treatment", col="darkgreen")
hist(comp$biomass)
par(mfrow=c(2,2))
plot(aov(comp$biomass~comp$clipping))
shapiro.test(comp$biomass)
bartlett.test(comp$biomass~comp$clipping)
anova(lm(comp$biomass~comp$clipping))
TukeyHSD(aov(comp$biomass~comp$clipping))
The ANalysis Of Variance = ANOVA Family
- Repeated Measures ANOVA: if you measure the same experimental units many times
- Two-way ANOVA: if you have more than one treatment influencing a single set of
experimental units. Interaction term!!
- Randomized Complete Block ANOVA (RCB-ANOVA): if your experiment is a block design (more on this later)
- ANCOVA: if your experiment has a “Co-Variate”
- MANOVA: Multivariate ANOVA, for when you have large, complex data sets with several response variables
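As one illustration from this family, a two-way ANOVA with an interaction term might be sketched as follows. The built-in ToothGrowth data set (tooth length under two supplement types and three doses) is used as a stand-in; the names `tg`, `dose_f`, and `fit2` are illustrative.

```r
data(ToothGrowth)
tg <- ToothGrowth
tg$dose_f <- factor(tg$dose)   # treat dose as a categorical treatment

# supp * dose_f expands to supp + dose_f + supp:dose_f,
# where supp:dose_f is the interaction term
fit2 <- aov(len ~ supp * dose_f, data = tg)
tab  <- summary(fit2)[[1]]
tab

# Rows: the two main effects, their interaction, and the residuals
trimws(rownames(tab))
```

A significant interaction row means the effect of one treatment depends on the level of the other, so the main effects should not be interpreted on their own.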