
ASSUMPTIONS IN THE ANOVA
The assumptions underlying the ANOVA and its mathematical model may not always hold for data from experiments.
1. The error terms, or residual effects, $e_{ij}$, are independent from observation to observation and are randomly and normally distributed with zero mean and the same variance $\sigma^2$. This can be expressed as $e_{ij} \sim \text{iid } N(0, \sigma^2)$.
2. Variances of different samples are homogeneous.
3. Variances and means of different samples are not correlated, i.e., they are independent.
4. The main effects (block and treatment) are additive.
Need to ensure that the data fit the assumptions of the analysis.
Know the assumptions and the tests for violations of the assumptions.
Weights (lb) of vitamin-treated and control animals in an RCBD (from Little and Hills)
                    Block
Treatment           I        II       III      IV       Total    Mean
Mice-control        0.18     0.30     0.28     0.44     1.2      0.3
Mice-vitamin        0.32     0.40     0.42     0.46     1.6      0.4
Subtotal            0.50     0.70     0.70     0.90     2.8      0.35
Chickens-control    2.0      3.0      1.8      2.8      9.6      2.40
Chickens-vitamin    2.5      3.3      2.5      3.3      11.6     2.90
Subtotal            4.5      6.3      4.3      6.1      21.2     2.65
Sheep-control       108.0    140.0    135.0    165.0    548.0    137.0
Sheep-vitamin       127.0    153.0    148.0    176.0    604.0    151.0
Subtotal            235.0    293.0    283.0    341.0    1152.0   144.0
Total               240.0    300.0    288.0    348.0    1176.0
Mean                40.0     50.0     48.0     58.0              49.0
ANOVA
Source               df     SS            MS           F           F.05    F.01
Total                23     111,567.40
Treatment            (5)    108,713.68    21,742.74    174.43**    2.90    4.56
  Species            2      108,321.16    54,160.58    434.51**    3.68    6.36
  Vitamin            1      142.11        142.11       1.14        4.54    8.68
  Species x Vitamin  2      250.41        125.20       1.00        3.68    6.36
Block                3      984.00        328.00       2.63        3.29    5.42
Error                15     1,869.72      124.65
Weights of mice, chickens and sheep are significantly different, which is hardly a
surprise.
No significant effect of the vitamin is seen (F = 1.14, below F.05 = 4.54).
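The sums of squares in the ANOVA above follow the usual RCBD arithmetic: a correction factor, then total, block and treatment SS, with the treatment SS split into species, vitamin and species x vitamin. The sketch below is a minimal pure-Python version written for illustration; the function name `rcbd_factorial_anova` and the dictionary layout of the data are assumptions, not part of the original analysis.

```python
# Weights (lb) from the data table: species -> (control, vitamin), blocks I-IV
DATA = {
    "mice": ([0.18, 0.30, 0.28, 0.44], [0.32, 0.40, 0.42, 0.46]),
    "chickens": ([2.0, 3.0, 1.8, 2.8], [2.5, 3.3, 2.5, 3.3]),
    "sheep": ([108.0, 140.0, 135.0, 165.0], [127.0, 153.0, 148.0, 176.0]),
}


def rcbd_factorial_anova(data):
    """SS, df and F for an RCBD whose treatments are species x vitamin combinations."""
    rows = [row for pair in data.values() for row in pair]  # one row per treatment
    t, r = len(rows), len(rows[0])  # number of treatments, number of blocks
    n_sp, n_vit = len(data), 2
    cf = sum(map(sum, rows)) ** 2 / (t * r)  # correction factor
    ss = {
        "total": sum(x * x for row in rows for x in row) - cf,
        "block": sum(sum(col) ** 2 for col in zip(*rows)) / t - cf,
        "treatment": sum(sum(row) ** 2 for row in rows) / r - cf,
        "species": sum((sum(c) + sum(v)) ** 2 for c, v in data.values()) / (n_vit * r) - cf,
    }
    # Vitamin totals: index 0 = control, 1 = vitamin, summed over species
    vit_totals = [sum(sum(pair[k]) for pair in data.values()) for k in range(n_vit)]
    ss["vitamin"] = sum(v ** 2 for v in vit_totals) / (n_sp * r) - cf
    ss["species x vitamin"] = ss["treatment"] - ss["species"] - ss["vitamin"]
    ss["error"] = ss["total"] - ss["block"] - ss["treatment"]
    df = {
        "block": r - 1,
        "treatment": t - 1,
        "species": n_sp - 1,
        "vitamin": n_vit - 1,
        "species x vitamin": (n_sp - 1) * (n_vit - 1),
        "error": (r - 1) * (t - 1),
    }
    ms_error = ss["error"] / df["error"]
    f_ratio = {k: (ss[k] / df[k]) / ms_error for k in df if k != "error"}
    return ss, df, f_ratio


ss, df, f_ratio = rcbd_factorial_anova(DATA)
for source, d in df.items():
    f_text = f"{f_ratio[source]:8.2f}" if source in f_ratio else ""
    print(f"{source:18s} {d:3d} {ss[source]:12.2f} {f_text}")
```

The same function can be applied to a copy of `DATA` in which every weight is replaced by its log10, which is the kind of re-analysis described later under possible courses of action.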
Test whether assumptions are met.
1. Normally-distributed, Random and Independent Errors
Generally, deviations from the assumption of normality do not seriously affect the validity
of the analysis of variance.
One informal test for normality is to graph the data. This is also very useful for detecting
outliers.
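A better test for normality is to calculate and graph the error components, or residuals,
for each observation. This is equivalent to graphing the data after correcting for
treatment and block effects. The error component for treatment i in block j is calculated as
$e_{ij} = X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot}$
where $\bar{X}_{i\cdot}$ is the treatment mean, $\bar{X}_{\cdot j}$ the block mean and $\bar{X}_{\cdot\cdot}$ the grand mean. For example, for mice-control in block I: 0.18 - 0.30 - 40.0 + 49.0 = 8.88.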
Error components in vitamin experiment
                    Block
Treatment           I         II        III       IV        Total
Mice-control        8.88      -1.00     0.98      -8.86     0
Mice-vitamin        8.92      -1.00     1.02      -8.94     0
Chickens-control    8.60      -0.40     0.40      -8.60     0
Chickens-vitamin    8.60      -0.60     0.60      -8.60     0
Sheep-control       -20.00    2.00      -1.00     19.00     0
Sheep-vitamin       -15.00    1.00      -2.00     16.00     0
Totals              0         0         0         0         0
These errors appear to occur in groups, rather than randomly.
Graphing the errors gives the following figure.
[Figure: Errors in vitamin experiment]
The range of the errors is larger for the larger means. For means over 100, the errors
increase linearly as the means increase, which hardly appears random. This data set fails
the assumption of normally distributed, random and independent errors.
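The error-component table above (observation minus treatment mean minus block mean plus grand mean) can be recomputed directly from the weights. A minimal sketch in plain Python, with variable names chosen for illustration; the sum of the squared components equals the error SS in the ANOVA, which is a handy arithmetic check.

```python
# Rows: Mice-C, Mice-V, Chickens-C, Chickens-V, Sheep-C, Sheep-V; columns: blocks I-IV
WEIGHTS = [
    [0.18, 0.30, 0.28, 0.44],
    [0.32, 0.40, 0.42, 0.46],
    [2.0, 3.0, 1.8, 2.8],
    [2.5, 3.3, 2.5, 3.3],
    [108.0, 140.0, 135.0, 165.0],
    [127.0, 153.0, 148.0, 176.0],
]

n_trt, n_blk = len(WEIGHTS), len(WEIGHTS[0])
grand_mean = sum(map(sum, WEIGHTS)) / (n_trt * n_blk)
trt_means = [sum(row) / n_blk for row in WEIGHTS]
blk_means = [sum(col) / n_trt for col in zip(*WEIGHTS)]

# e_ij = X_ij - (treatment mean) - (block mean) + (grand mean)
errors = [
    [x - trt_means[i] - blk_means[j] + grand_mean for j, x in enumerate(row)]
    for i, row in enumerate(WEIGHTS)
]

for row in errors:
    print("  ".join(f"{e:7.2f}" for e in row))
print("SS error =", round(sum(e * e for row in errors for e in row), 2))
```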
2. Homogeneity of Variances
Calculate the variance for each treatment. For treatment 1 (mice-control):
$SS(t_1) = \sum X^2 - (\sum X)^2 / r = 0.18^2 + \dots + 0.44^2 - (1.2)^2/4 = 0.0344$
$s^2(t_1) = SS/df = 0.0344/3 = 0.0115$
Perform Bartlett’s test for homogeneity of variance
Treatment      df    s²        Coded s² (x1000)    Log coded s²
Mice - C       3     0.0115    11.5                1.06
Mice - V       3     0.0035    3.5                 0.54
Chick - C      3     0.3467    346.7               2.54
Chick - V      3     0.2133    213.3               2.33
Sheep - C      3     546.0     546,000             5.74
Sheep - V      3     404.7     404,700             5.61
Total          18              951,275             17.82
Mean                           158,546
Log of Mean                    5.200
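The table stops before the test statistic itself. A minimal sketch, assuming the standard form of Bartlett's test: with $k$ treatments, $\nu_i$ df each and pooled variance $\bar{s}^2$, the statistic is $[\sum \nu_i \ln \bar{s}^2 - \sum \nu_i \ln s_i^2] / C$ with $C = 1 + [\sum 1/\nu_i - 1/\sum \nu_i] / (3(k-1))$, compared with chi-square on $k - 1$ df. Natural logs are used here, which is the same as 2.3026 times the base-10 form; coding the variances by 1000 cancels out, so raw variances can be used. The function names are illustrative.

```python
import math

# Weights (lb) per treatment, blocks I-IV, from the data table
TREATMENTS = {
    "Mice - C": [0.18, 0.30, 0.28, 0.44],
    "Mice - V": [0.32, 0.40, 0.42, 0.46],
    "Chick - C": [2.0, 3.0, 1.8, 2.8],
    "Chick - V": [2.5, 3.3, 2.5, 3.3],
    "Sheep - C": [108.0, 140.0, 135.0, 165.0],
    "Sheep - V": [127.0, 153.0, 148.0, 176.0],
}


def variance(xs):
    """s^2 = [sum(X^2) - (sum X)^2 / r] / (r - 1), the computational formula."""
    r = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / r) / (r - 1)


def bartlett(groups):
    """Return (corrected chi-square, df, list of variances) for Bartlett's test."""
    dfs = [len(g) - 1 for g in groups]
    s2 = [variance(g) for g in groups]
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, s2)) / total_df
    m = total_df * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, s2))
    c = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (len(groups) - 1))
    return m / c, len(groups) - 1, s2


chi2, chi2_df, s2 = bartlett(list(TREATMENTS.values()))
for name, v in zip(TREATMENTS, s2):
    print(f"{name:10s} s2 = {v:.4f}")
print(f"chi-square = {chi2:.1f} on {chi2_df} df; chi-square.05 with 5 df = 11.07")
```

For these data the statistic comes out far above 11.07, so the variances are clearly heterogeneous, as the spread of the coded variances already suggests.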
3. Independence of Means and Variances
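If the variances are homogeneous and independent of the means, then s² and s should show no trend with the mean. The table compares them, and their ratios to the mean, across treatments: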
Treatment    Mean     s²         s         s²/Mean    s/Mean
Mice - C     0.3      0.01147    0.107     0.04       0.36
Mice - V     0.4      0.00347    0.059     0.01       0.15
Chick - C    2.4      0.3467     0.589     0.14       0.25
Chick - V    2.9      0.2133     0.462     0.07       0.16
Sheep - C    137.0    546.0      23.367    3.99       0.17
Sheep - V    151.0    404.7      20.116    2.68       0.13
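The standard deviation is closely related to the mean: s/Mean stays within a fairly narrow
range across treatments, while s²/Mean rises steeply with the mean. This suggests that a
log transformation should be used.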
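The ratios in the table can be recomputed from the weights with Python's standard-library `statistics` module. This sketch also prints how many times larger the biggest ratio is than the smallest, which makes the comparison concrete: for these data s²/Mean varies by a factor of several hundred, while s/Mean varies by less than a factor of three. Variable names are illustrative.

```python
from statistics import mean, stdev

# Weights (lb) per treatment, blocks I-IV, from the data table
TREATMENTS = {
    "Mice - C": [0.18, 0.30, 0.28, 0.44],
    "Mice - V": [0.32, 0.40, 0.42, 0.46],
    "Chick - C": [2.0, 3.0, 1.8, 2.8],
    "Chick - V": [2.5, 3.3, 2.5, 3.3],
    "Sheep - C": [108.0, 140.0, 135.0, 165.0],
    "Sheep - V": [127.0, 153.0, 148.0, 176.0],
}

summary = []
for name, xs in TREATMENTS.items():
    m, s = mean(xs), stdev(xs)  # stdev uses the r - 1 divisor, like s^2 above
    summary.append((name, m, s * s / m, s / m))
    print(f"{name:10s} mean = {m:7.2f}  s2/mean = {s * s / m:6.3f}  s/mean = {s / m:4.2f}")

var_ratios = [row[2] for row in summary]
sd_ratios = [row[3] for row in summary]
# Largest ratio divided by smallest ratio, across the six treatments
print("s2/mean spread:", round(max(var_ratios) / min(var_ratios)))
print("s/mean spread: ", round(max(sd_ratios) / min(sd_ratios), 1))
```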
4. Additivity
Terms in the mathematical model for a design are additive.
In an RCB, the treatment and block effects are assumed to be additive.
This means the treatment effects are the same in all blocks and the block effects are
the same in all treatments.
Additive effects
              Block I            Block II
Trt 1         10      (+10)      20
              (+20)              (+20)
Trt 2         30      (+10)      40
Multiplicative effects
              Block I            Block II
Trt 1         10      (+100%)    20
              (+200%)            (+200%)
Trt 2         30      (+100%)    60
Log transformation can transform multiplicative effects into additive effects.
Log transformed effects
              Block I            Block II
Trt 1         1.00    (+0.30)    1.30
              (+0.48)            (+0.48)
Trt 2         1.48    (+0.30)    1.78
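The three small tables can be checked with an interaction contrast: the Trt 2 minus Trt 1 difference in Block II, minus the same difference in Block I. It is zero exactly when the effects add. A minimal sketch using base-10 logs as in the table; the helper name is illustrative.

```python
import math

# Rows: Trt 1, Trt 2; columns: Block I, Block II (values from the tables above)
ADDITIVE = [[10, 20], [30, 40]]
MULTIPLICATIVE = [[10, 20], [30, 60]]


def interaction_contrast(table):
    """(Trt 2 - Trt 1) in Block II minus (Trt 2 - Trt 1) in Block I."""
    return (table[1][1] - table[0][1]) - (table[1][0] - table[0][0])


logged = [[math.log10(x) for x in row] for row in MULTIPLICATIVE]

print(interaction_contrast(ADDITIVE))  # 0: treatment effect identical in both blocks
print(interaction_contrast(MULTIPLICATIVE))  # 20: effects do not add
print(abs(interaction_contrast(logged)) < 1e-12)  # True: additive after log10
```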
Tukey’s test for additivity
                    Block
Treatment           I        II       III      IV       Mean     Trt Effect
Mice-control        0.18     0.30     0.28     0.44     0.3      -48.7
Mice-vitamin        0.32     0.40     0.42     0.46     0.4      -48.6
Chickens-control    2.0      3.0      1.8      2.8      2.40     -46.6
Chickens-vitamin    2.5      3.3      2.5      3.3      2.90     -46.1
Sheep-control       108.0    140.0    135.0    165.0    137.0    88.0
Sheep-vitamin       127.0    153.0    148.0    176.0    151.0    102.0
Mean                40.0     50.0     48.0     58.0     49.0
Block Effect        -9.0     1.0      -1.0     9.0
ANOVA
Source             df    SS         MS         F        F.05
Error (BxT)        15    1869.72
  Nonadditivity    1     1822.94    1822.94    545.5    4.60
  Residual Err     14    46.78      3.34
The assumption of additivity is not met: the F for nonadditivity far exceeds F.05 = 4.60.
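Tukey's one-degree-of-freedom test can be sketched in a few lines. Assuming the usual form of the test, the nonadditivity SS is $(\sum_{ij} X_{ij} t_i b_j)^2 / (\sum t_i^2 \sum b_j^2)$, where $t_i$ and $b_j$ are the treatment and block effects in the table; it is removed from the error (B x T) SS, leaving a residual with one fewer df. Variable names are illustrative.

```python
# Rows: Mice-C, Mice-V, Chickens-C, Chickens-V, Sheep-C, Sheep-V; columns: blocks I-IV
WEIGHTS = [
    [0.18, 0.30, 0.28, 0.44],
    [0.32, 0.40, 0.42, 0.46],
    [2.0, 3.0, 1.8, 2.8],
    [2.5, 3.3, 2.5, 3.3],
    [108.0, 140.0, 135.0, 165.0],
    [127.0, 153.0, 148.0, 176.0],
]

n_trt, n_blk = len(WEIGHTS), len(WEIGHTS[0])
grand = sum(map(sum, WEIGHTS)) / (n_trt * n_blk)
t = [sum(row) / n_blk - grand for row in WEIGHTS]  # treatment effects
b = [sum(col) / n_trt - grand for col in zip(*WEIGHTS)]  # block effects

# Error (B x T) SS: sum of squared error components
ss_error = sum(
    (x - t[i] - b[j] - grand) ** 2
    for i, row in enumerate(WEIGHTS)
    for j, x in enumerate(row)
)

# One-df nonadditivity SS: (sum X_ij t_i b_j)^2 / (sum t_i^2 * sum b_j^2)
num = sum(x * t[i] * b[j] for i, row in enumerate(WEIGHTS) for j, x in enumerate(row))
ss_nonadd = num ** 2 / (sum(ti * ti for ti in t) * sum(bj * bj for bj in b))

df_resid = (n_trt - 1) * (n_blk - 1) - 1  # one df moved to nonadditivity
ss_resid = ss_error - ss_nonadd
f_nonadd = ss_nonadd / (ss_resid / df_resid)
print(f"SS nonadditivity = {ss_nonadd:.2f}")
print(f"residual SS = {ss_resid:.2f} on {df_resid} df")
print(f"F = {f_nonadd:.1f} (compare F.05 with 1 and {df_resid} df = 4.60)")
```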
POSSIBLE COURSES OF ACTION
1. Analyze the species separately. This gives valid tests within each species, but no
information on interactions.
Species     Source     df    SS        MS        F
Mice        Block      3     0.0400    0.0133    8.31
            Vitamin    1     0.0200    0.0200    12.50*
            Error      3     0.0048    0.0016
Chickens    Block      3     1.64      0.547     41.00**
            Vitamin    1     0.50      0.500     37.5**
            Error      3     0.04      0.013
Sheep       Block      3     2834.0    944.7     157.4**
            Vitamin    1     392.0     392.0     65.3**
            Error      3     18.0      6.0
A significant effect of the vitamin is seen in each species. There is no test for the effect of species or for the species x vitamin interaction.
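Each per-species analysis is a small RCBD with two treatments (control and vitamin) in four blocks. A sketch of the arithmetic with a hypothetical helper `rcbd_two_treatments`; its F ratios agree with the table to rounding, for example 392.0 / 6.0 = 65.3 for the vitamin effect in sheep.

```python
# Weights (lb): species -> (control, vitamin), blocks I-IV
SPECIES = {
    "Mice": ([0.18, 0.30, 0.28, 0.44], [0.32, 0.40, 0.42, 0.46]),
    "Chickens": ([2.0, 3.0, 1.8, 2.8], [2.5, 3.3, 2.5, 3.3]),
    "Sheep": ([108.0, 140.0, 135.0, 165.0], [127.0, 153.0, 148.0, 176.0]),
}


def rcbd_two_treatments(control, vitamin):
    """F ratios for block and vitamin in an RCBD with 2 treatments and r blocks."""
    t, r = 2, len(control)
    cf = (sum(control) + sum(vitamin)) ** 2 / (t * r)  # correction factor
    ss_total = sum(x * x for x in control + vitamin) - cf
    ss_block = sum((c + v) ** 2 for c, v in zip(control, vitamin)) / t - cf
    ss_vitamin = (sum(control) ** 2 + sum(vitamin) ** 2) / r - cf
    ss_error = ss_total - ss_block - ss_vitamin
    ms_error = ss_error / ((t - 1) * (r - 1))
    return {
        "F block": ss_block / (r - 1) / ms_error,
        "F vitamin": ss_vitamin / (t - 1) / ms_error,
        "MS error": ms_error,
    }


results = {name: rcbd_two_treatments(c, v) for name, (c, v) in SPECIES.items()}
for name, res in results.items():
    print(f"{name:9s} F block = {res['F block']:7.2f}  F vitamin = {res['F vitamin']:6.2f}")
```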
2. Transform the data and re-analyze. Because the standard deviation is proportional to the
mean, use a log transformation.
Source        df    SS          MS          F
Block         3     0.12075     0.04025     13.77**
Treatment     5     28.60738    5.72148     1959.41**
  Vitamin     1     0.04860     0.04860     16.62**
  Species     2     28.54926    14.27463    4883.00**
  VxS         2     0.00952     0.00476     1.63
Error         15    0.04385     0.00292
Effect of species is significant. Effect of vitamin is significant. Interaction is not
significant.
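Implications: because the vitamin x species interaction is not significant on the log scale, the proportional response to the vitamin is similar in all three species, so a mouse model can be used instead of the larger and more expensive sheep.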
After transforming, the tests of the assumptions need to be redone to make sure the
assumptions are now met.
Example: Bartlett’s test on transformed data
Treatment      df    s²        Coded s² (x1000)    Log coded s²
Mice - C       3     0.0243    24.3                1.39
Mice - V       3     0.0040    4.0                 0.60
Chick - C      3     0.0118    11.8                1.07
Chick - V      3     0.0048    4.8                 0.68
Sheep - C      3     0.0062    6.2                 0.79
Sheep - V      3     0.0038    3.8                 0.58
Total          18              54.9                5.11
Mean                           9.15
Log of Mean                    0.9614
The assumption of homogeneity of variance is now met.
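The transformed-data table also stops short of the test statistic. Assuming the standard base-10 form of Bartlett's test, with $k = 6$ treatments of $\nu_i = 3$ df each, the arithmetic from the tabled values runs as follows (the coded variances can be used directly because the coding constant cancels):

```latex
\begin{aligned}
\chi^2 &= 2.3026\left[\Big(\sum \nu_i\Big)\log \bar{s}^2 - \sum \nu_i \log s_i^2\right]
        = 2.3026\,[18(0.9614) - 3(5.11)] = 2.3026(1.9752) = 4.55\\
C &= 1 + \frac{1}{3(k-1)}\left[\sum \frac{1}{\nu_i} - \frac{1}{\sum \nu_i}\right]
   = 1 + \frac{1}{15}\left[2 - \frac{1}{18}\right] = 1.130\\
\chi^2_c &= 4.55 / 1.130 = 4.03 \;<\; \chi^2_{.05,\,5} = 11.07
\end{aligned}
```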
TRANSFORMATIONS
1. Log transformation
Use when
- standard deviation is proportional to the mean
- main effects are multiplicative rather than additive
- data are whole numbers and cover a wide range of values
- (cannot be used if the data have negative values)
Examples
- number of insects per plot
- number of egg masses per plant
Coding
- if the data have values < 10, multiply by a constant (a power of 10)
- if the data have 0's, use X + 1
Meaning of analysis
- original data: does the amount of change in X vary in response to treatment?
- transformed data: does the proportional (percent) change in X vary in response to treatment?
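A sketch of the log transformation with the coding rules above. The notes do not say in which order the two coding steps apply when both are needed; multiplying first and then adding 1 is an assumption here, as is the function name.

```python
import math


def log_transform(data, scale=1, add_one=False):
    """Return log10 of coded data: scale * X, plus 1 if add_one is True.

    Assumed coding order (not specified in the notes): multiply first, then add 1.
    """
    if any(x < 0 for x in data):
        raise ValueError("log transformation cannot be used with negative values")
    coded = [x * scale + (1 if add_one else 0) for x in data]
    if any(c == 0 for c in coded):
        raise ValueError("data contain zeros: use add_one=True (X + 1)")
    return [math.log10(c) for c in coded]


# Mice-control weights (lb), all below 10, coded by multiplying by 100
print([round(y, 3) for y in log_transform([0.18, 0.30, 0.28, 0.44], scale=100)])
# Hypothetical insect counts that include a zero, coded as X + 1
print([round(y, 3) for y in log_transform([0, 9, 99], add_one=True)])
```

On the log scale a difference between two means is a log ratio, for example log10(0.40) - log10(0.30) = log10(0.40/0.30), about 0.125, which is why the transformed analysis is about proportional rather than absolute change.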
2. Square root transformation
Use when
- variance is proportional to the mean (Poisson distribution)
- data are counts of rare events
- data are small whole numbers
- data are percentages where the range is 0 to 30% or 70 to 100%
Examples
- number of infected plants per plot
- number of insects caught in traps
- number of weeds in a plot
Coding
- if the data have values < 10, use X + ½
Meaning of analysis
- detransformed means are weighted, with more weight given to smaller values (smaller variates are measured with less sampling error than larger ones)
- increases the precision with which differences between small means are measured
- deviations from the assumptions, and the corrections made, are generally smaller than for the log transformation
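A matching sketch for the square root transformation. It assumes that when any value is below 10, ½ is added to every observation, so the whole data set is transformed the same way; the function name and the counts in the usage line are hypothetical.

```python
import math


def sqrt_transform(counts):
    """Square root transformation with the +1/2 coding rule.

    Assumption: if any value is below 10, 1/2 is added to every value so the
    whole data set is transformed the same way.
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts cannot be negative")
    shift = 0.5 if any(c < 10 for c in counts) else 0.0
    return [math.sqrt(c + shift) for c in counts]


# Hypothetical weed counts per plot
print([round(y, 3) for y in sqrt_transform([0, 3, 7, 12])])
```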
3. Arcsine or angular transformation
Transform by taking the arcsine of the square root of each data point expressed as a
proportion (percentage/100), i.e., $y = \arcsin\sqrt{p}$. Tables are available to go directly
from percentages to the arcsine transformation.
Use when
- data are based on counts and expressed as percentages or proportions of the sample total
- data follow a binomial distribution
- variances are small at the two ends of the range of values, but larger in the middle
- the percentages span a range of more than 40%
Examples
- percent germination
Coding
- none
Meaning of analysis
- means are weighted with more emphasis given to means at the two ends of the range
- precision of comparisons is increased at the ends of the range
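Instead of looking values up in a table, the angular transformation can be computed directly. This sketch returns degrees, the unit most published arcsine tables use (an assumption here; radians work equally well if used consistently). The function name and the percentages in the usage loop are illustrative.

```python
import math


def arcsine_transform(percent):
    """Angle (degrees) whose sine is sqrt(percent / 100)."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return math.degrees(math.asin(math.sqrt(percent / 100)))


# Hypothetical germination percentages
for pct in (0, 10, 50, 90, 100):
    print(pct, round(arcsine_transform(pct), 2))
```

The transform stretches the ends of the 0-100% scale: a 10-point change near 0% or 100% becomes a much larger angular change than the same 10 points near 50%, which is the weighting described under "Meaning of analysis".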
Tests of Means
Comparison of means of transformed data by multiple range tests:
- use the transformed means
- use the variance of the transformed data
Transformed means are weighted means and are correct for the transformed data.
Presentation of Means
For presentation or publication, detransform the means to the original units to make
them more understandable.
Showing mean differences by letters can be confusing because detransformed means
may not look very different from each other, although they are significantly different
when transformed.
Explain the transformation that was used and why.
Discuss the mean differences in the text.
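A sketch of detransforming means for presentation, assuming the coding conventions above (log10 with an optional scale factor and +1, square root with +½, arcsine in degrees); the function names are illustrative. The detransformed mean of log data is the geometric mean, which is never larger than the arithmetic mean of the raw data, so it will usually differ from the raw mean.

```python
import math


def detransform_log(mean_log, scale=1, add_one=False):
    """Undo log10(scale * X [+ 1]) for a mean on the log scale."""
    return (10 ** mean_log - (1 if add_one else 0)) / scale


def detransform_sqrt(mean_root, shift=0.5):
    """Undo sqrt(X + shift); use shift=0 if no 1/2 was added."""
    return mean_root ** 2 - shift


def detransform_arcsine(mean_degrees):
    """Undo the angular transformation, returning a percentage."""
    return 100 * math.sin(math.radians(mean_degrees)) ** 2


sheep_control = [108.0, 140.0, 135.0, 165.0]
mean_log = sum(math.log10(x) for x in sheep_control) / len(sheep_control)
print(round(detransform_log(mean_log), 1))  # geometric mean of the four weights
print(sum(sheep_control) / len(sheep_control))  # arithmetic mean, 137.0
```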