class_Aug2808_onewayAnova.pdf

ST 524
One way Analysis of variance – Variances not homogeneous
NCSU - Fall 2008
One way Analysis of variance
Example (Yandell, 1997)
A plant scientist measured the concentration of a particular virus in plant sap using ELISA
(enzyme-linked immunosorbent assay) (Novy 1992 [1]). A subset of the study is presented here.
The scientist wants to understand the resistance to the virus among the three selected
potato clones. Plant sap was taken from 5 inoculated plants of each clone, for a total
of 15 (3 clones × 5 replicates) measurements of titer.
Data
Linear Model
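$y_{ij} = \mu + \alpha_i + e_{ij}$, $i = 1, 2, 3$, $j = 1, \ldots, 5$,
where $e_{ij} \sim \text{iid } N(0, \sigma^2)$, so that the $y_{ij}$ are independent with $y_{ij} \sim N(\mu_i, \sigma^2)$,
and $\mu_i = \mu + \alpha_i$ is the population mean for the $i$th clone. For example:
$32 = \mu_1 + e_{11} = \mu + \alpha_1 + e_{11}$
$529 = \mu_2 + e_{21} = \mu + \alpha_2 + e_{21}$
$363 = \mu_3 + e_{32} = \mu + \alpha_3 + e_{32}$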
[1] Novy RG (1992) "Characterization of somatic hybrids between Solanum etuberosum and diploid, tuber-bearing Solanums." PhD Dissertation, Department of Plant Pathology, UW-Madison. 123 pp.
Tuesday August 26, 2008
Matrix representation of data according to linear model
$$
\begin{bmatrix} 32 \\ 12 \\ 25 \\ 61 \\ 93 \\ 529 \\ 396 \\ 629 \\ 261 \\ 325 \\ 1361 \\ 363 \\ 418 \\ 579 \\ 1660 \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\
1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\
1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{14} \\ e_{15} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{24} \\ e_{25} \\ e_{31} \\ e_{32} \\ e_{33} \\ e_{34} \\ e_{35} \end{bmatrix}
$$
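In this matrix form $\alpha_3$ is set to 0, so clone 3 is the reference level and $\mu$ is its mean:
$32 = \mu + 1 \cdot \alpha_1 + e_{11}$
$529 = \mu + 1 \cdot \alpha_2 + e_{21}$
$363 = \mu + 0 \cdot \alpha_1 + 0 \cdot \alpha_2 + e_{32} = \mu + e_{32}$
In compact notation, $Y = X\beta + e$.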
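The reference-level coding above can be checked numerically. The following is a minimal sketch in Python (the notes themselves use SAS and R), assuming the titer values and group order read off the response vector; the variable names are illustrative. It uses the fact that, with $\alpha_3 = 0$, the least squares estimates are the clone 3 mean and the differences of the other clone means from it, and verifies that the residuals satisfy the normal equations $X'e = 0$.

```python
# Titer data read off the response vector Y (five plants per clone).
titer = {
    1: [32, 12, 25, 61, 93],
    2: [529, 396, 629, 261, 325],
    3: [1361, 363, 418, 579, 1660],
}

# Design matrix rows: (intercept, clone-1 indicator, clone-2 indicator).
rows = {1: (1, 1, 0), 2: (1, 0, 1), 3: (1, 0, 0)}
X = [rows[i] for i in titer for _ in titer[i]]
y = [v for i in titer for v in titer[i]]

means = {i: sum(v) / len(v) for i, v in titer.items()}

# With alpha_3 = 0, the estimates are simple functions of the group means.
mu_hat = means[3]
alpha1_hat = means[1] - means[3]
alpha2_hat = means[2] - means[3]
beta_hat = (mu_hat, alpha1_hat, alpha2_hat)

# Residuals e = y - X beta_hat.
resid = [yi - sum(x * b for x, b in zip(xr, beta_hat)) for xr, yi in zip(X, y)]

# Normal equations: X'e is the zero vector at the least squares solution.
xt_e = [sum(xr[c] * e for xr, e in zip(X, resid)) for c in range(3)]

print("beta_hat =", beta_hat)
print("X'e =", xt_e)
```

Each column of $X$ is orthogonal to the residual vector (up to floating-point error), which is what identifies `beta_hat` as the least squares solution.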
Analysis of variance table - Least Squares estimation
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu$
$H_1$: at least one $\mu_i$ is different
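The sums of squares behind this test can be sketched in a few lines of Python. This is an illustration, not the PROC GLM output: the data are those of the response vector, the names are illustrative, and the p-value uses the closed-form upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$.

```python
# One-way ANOVA by hand for the three clones (five plants each).
titer = {
    1: [32, 12, 25, 61, 93],
    2: [529, 396, 629, 261, 325],
    3: [1361, 363, 418, 579, 1660],
}

k = len(titer)
n_total = sum(len(v) for v in titer.values())
grand_mean = sum(sum(v) for v in titer.values()) / n_total
means = {i: sum(v) / len(v) for i, v in titer.items()}

# Between-clone and within-clone (error) sums of squares.
ss_between = sum(len(v) * (means[i] - grand_mean) ** 2 for i, v in titer.items())
ss_error = sum((x - means[i]) ** 2 for i, v in titer.items() for x in v)

df_between = k - 1        # 2
df_error = n_total - k    # 12
ms_between = ss_between / df_between
ms_error = ss_error / df_error
f_stat = ms_between / ms_error

# Closed-form upper tail of F, valid only because df_between == 2.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"Error MS = {ms_error:.1f}, F = {f_stat:.3f}, p = {p_value:.4f}")
```

The Error MS computed this way is the pooled within-clone variance, the quantity the notes quote as 125447.0.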
Brown-Forsythe Homogeneity test
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$
$H_1$: at least one $\sigma_i^2$ is different
p-value = 0.0675. Do not reject $H_0$ at a significance level of 5%.
Conclusion: We can assume that the residual variances of the groups are not significantly different from each other. The Error Mean Square (Error MS = 125447.0) is the least squares estimate of the common residual variance $\sigma^2$.
Bartlett's Test for Homogeneity of variances
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$
$H_1$: at least one $\sigma_i^2$ is different
p-value < .0001. Reject $H_0$ at a significance level of 5%.
Conclusion: At least one group's residual variance is significantly different from the others.
Under the assumption of variance equality, the Error Mean Square (Error MS = 125447.0) is the least squares estimate of the common residual variance $\sigma^2$.
Homogeneity of variance test (SAS Manual)
One of the usual assumptions for the GLM procedure is that the underlying errors are all
uncorrelated with homogeneous variances. You can test this assumption in PROC GLM by using the
HOVTEST option in the MEANS statement, requesting a homogeneity of variance test. This section
discusses the computational details behind these tests. Note that the GLM procedure allows
homogeneity of variance testing for simple one-way models only.
Bartlett (1937) proposes a test for equal variances that is a modification of the normal-theory
likelihood ratio test (the HOVTEST=BARTLETT option). While Bartlett's test has accurate Type I
error rates and optimal power when the underlying distribution of the data is normal, it can be very
inaccurate if that distribution is even slightly nonnormal (Box 1953).
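As an illustration of the computation, here is a minimal Python sketch of Bartlett's statistic in its usual textbook form, applied to the three clones. It is not the PROC GLM code. The statistic is referred to a chi-square distribution with $k - 1 = 2$ degrees of freedom, whose upper tail is $e^{-x/2}$; the variable names are illustrative.

```python
import math
from statistics import variance

groups = [
    [32, 12, 25, 61, 93],
    [529, 396, 629, 261, 325],
    [1361, 363, 418, 579, 1660],
]

k = len(groups)
n = [len(g) for g in groups]
N = sum(n)
s2 = [variance(g) for g in groups]  # sample variances, divisor n_i - 1

# Pooled variance (equal to the ANOVA Error MS).
pooled = sum((ni - 1) * v for ni, v in zip(n, s2)) / (N - k)

# Log-ratio part of the statistic and Bartlett's small-sample correction.
numerator = (N - k) * math.log(pooled) - sum(
    (ni - 1) * math.log(v) for ni, v in zip(n, s2)
)
correction = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (N - k)) / (3 * (k - 1))
bartlett = numerator / correction

# Upper tail of a chi-square with 2 df is exp(-x / 2).
p_value = math.exp(-bartlett / 2)

print(f"Bartlett statistic = {bartlett:.2f}, p = {p_value:.6f}")
```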
An approach that leads to tests that are much more robust to the underlying distribution is to
transform the original values of the dependent variable to derive a dispersion variable and then to
perform analysis of variance on this variable. The significance level for the test of homogeneity of
variance is the p-value for the ANOVA F-test on the dispersion variable.
Brown and Forsythe (1974) suggest using the absolute deviations from the group medians:
$z_{ij}^{BF} = |y_{ij} - m_i|$, where $m_i$ is the median of the $i$th group. You can use the HOVTEST=BF option to
specify this test.
Simulation results show that the Brown-Forsythe test seems best at providing power to detect
variance differences while protecting the probability of a Type I error.
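The Brown-Forsythe procedure described above can be sketched in Python as an ordinary one-way ANOVA on the dispersion variable $z_{ij}^{BF}$. This is an illustration rather than the SAS implementation; the names are illustrative, and the p-value again relies on the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, which applies only with 2 numerator degrees of freedom.

```python
from statistics import mean, median

groups = [
    [32, 12, 25, 61, 93],
    [529, 396, 629, 261, 325],
    [1361, 363, 418, 579, 1660],
]

# Dispersion variable: absolute deviations from each group's median.
z = [[abs(x - median(g)) for x in g] for g in groups]

k = len(z)
N = sum(len(g) for g in z)
z_means = [mean(g) for g in z]
z_grand = sum(sum(g) for g in z) / N

# One-way ANOVA on z.
ss_between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
ss_within = sum((x - m) ** 2 for g, m in zip(z, z_means) for x in g)
df_between, df_within = k - 1, N - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Closed-form upper tail of F, valid only because df_between == 2.
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"Brown-Forsythe F = {f_stat:.3f}, p = {p_value:.4f}")
```

For these data the sketch gives a p-value that agrees with the 0.0675 reported for the Brown-Forsythe test.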
If one of these tests rejects the assumption of homogeneity of variance, you should use Welch's
ANOVA instead of the usual ANOVA to test for differences between group means.
Unless the group variances are extremely different or the number of groups is large, the usual
ANOVA test is relatively robust when the groups are all about the same size.
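For reference, Welch's ANOVA mentioned above can be sketched as follows. This is a minimal Python illustration of the standard Welch (1951) statistic, not the output of the WELCH option in PROC GLM; the names are illustrative. Each group mean is weighted by $w_i = n_i / s_i^2$, and the denominator degrees of freedom are generally not an integer.

```python
from statistics import mean, variance

groups = [
    [32, 12, 25, 61, 93],
    [529, 396, 629, 261, 325],
    [1361, 363, 418, 579, 1660],
]

k = len(groups)
n = [len(g) for g in groups]
ybar = [mean(g) for g in groups]
s2 = [variance(g) for g in groups]

# Weights are inversely proportional to the variance of each group mean.
w = [ni / v for ni, v in zip(n, s2)]
w_total = sum(w)
ybar_w = sum(wi * yi for wi, yi in zip(w, ybar)) / w_total

# Term shared by the denominator correction and the degrees of freedom.
lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))

numerator = sum(wi * (yi - ybar_w) ** 2 for wi, yi in zip(w, ybar)) / (k - 1)
denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
f_welch = numerator / denominator
df1 = k - 1
df2 = (k ** 2 - 1) / (3 * lam)

# Closed-form upper tail of F, valid only because df1 == 2.
p_value = (1 + 2 * f_welch / df2) ** (-df2 / 2)

print(f"Welch F = {f_welch:.2f}, df = ({df1}, {df2:.2f}), p = {p_value:.4f}")
```

Because the weights favor the low-variance clone, the weighted grand mean sits much closer to the mean of the first clone than the unweighted grand mean does.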
Residual plot
The residual plot shows that dispersion is not homogeneous among clones: clone 7 shows larger variability
than clone 2, with clone 3 having moderate variability.
Hartley's Max F test for homogeneity of k variances.
$H_0: \sigma_1^2 = \cdots = \sigma_k^2$
$H_1$: at least one $\sigma_i^2$ is different
Test statistic: $F_{max} = \dfrac{\max s_i^2}{\min s_i^2} \sim F_{Hartley, k, \nu}$, where $k$ is the number of groups and $\nu$ is the number
of degrees of freedom for each sample, $\nu = r - 1$, with $r$ observations per group.
Assumptions
Random sampling within each group
Equal sample size for all k groups
Normality of observations. Highly sensitive to deviations from normality.
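Example
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$
$H_1$: at least one $\sigma_i^2$ is different
Calculated $F_{max} = \dfrac{352755.7}{1054.3} = 334.59$
$F_{Hartley, 3, 4, 0.05} = 14.8$
Conclusion: Reject $H_0$, since the calculated $F_{max}$ exceeds the critical value.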
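The calculation in this example can be reproduced with a short Python sketch. The critical value is not computed here; it is simply copied from the notes, and the variable names are illustrative.

```python
from statistics import variance

groups = [
    [32, 12, 25, 61, 93],
    [529, 396, 629, 261, 325],
    [1361, 363, 418, 579, 1660],
]

# Sample variance of each clone (divisor r - 1 = 4).
s2 = [variance(g) for g in groups]

# Hartley's statistic: largest sample variance over the smallest.
f_max = max(s2) / min(s2)

critical_value = 14.8  # tabulated value quoted in the notes, not computed
reject_h0 = f_max > critical_value

print("sample variances:", [round(v, 1) for v in s2])
print(f"F_max = {f_max:.2f}, reject H0: {reject_h0}")
```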
Mixed Model approach to heterogeneity of variances
1. Fit a linear model for titer with a common residual variance.
2. Fit a linear model for titer allowing a separate residual variance within each group.
3. Compare both fits: likelihood ratio test, AIC, AICC, BIC.
1. Linear model
$y_{ij} = \mu + \alpha_i + e_{ij}$, $i = 1, 2, 3$, $j = 1, \ldots, 5$, where $e_{ij} \sim \text{iid } N(0, \sigma^2)$ and $y_{ij} \sim N(\mu_i, \sigma^2)$
a. $Y = X\beta + e$, with $e \sim N(0, \sigma^2 I)$, as presented above
b. Results from PROC MIXED
$\hat{\sigma}_e^2 = \text{Error MS}_{GLM}$
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu$
$H_1$: at least one $\mu_i$ is different
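2. Linear model
$y_{ij} = \mu + \alpha_i + e_{ij}$, $i = 1, 2, 3$, $j = 1, \ldots, 5$, where the $e_{ij}$ are independent with $e_{ij} \sim N(0, \sigma_i^2)$ and $y_{ij} \sim N(\mu_i, \sigma_i^2)$
a. $Y = X\beta + e$, with $e \sim N(0, R)$, where $R$ is diagonal with $\sigma_i^2$ in the positions of the observations from clone $i$
b. Results from PROC MIXED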
Likelihood ratio test:
LRT is used to test whether a model with common variance should be preferred to a model with three
separate variances.
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2$
$H_1$: at least one $\sigma_i^2$ is different
The null hypothesis fits a common residual variance for all three groups, while
the alternative hypothesis fits a separate residual variance for each group.
-2ResLogL (common residual variance) = 179.8 (fit under the null hypothesis, 1 variance parameter)
-2ResLogL (three residual variances) = 157.9 (fit under the alternative hypothesis, 3 variance parameters)
Difference = [-2ResLogL reduced model] - [-2ResLogL full model] = 179.8 - 157.9 = 21.9
Under the null hypothesis, the difference is distributed as a chi-square random variable with (3 - 1) = 2 degrees of
freedom.
Critical value at a 0.05 significance level: $\chi^2_{2, 0.05} = 5.99$
p-value: $P(\chi^2_2 > 21.85) < 0.0001$
Conclusion: Reject $H_0$. There is enough statistical evidence, at the 5% significance level, to conclude that the
variances are not equal.
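The two -2ResLogL values can be reproduced by hand for this balanced one-way layout. The sketch below is a Python illustration, not PROC MIXED itself. It assumes the general REML criterion $(N - p)\ln(2\pi) + \ln|V| + \ln|X'V^{-1}X| + r'V^{-1}r$ and plugs in the REML variance estimates, which in this design are the pooled variance (common-variance model) and the three sample variances (separate-variance model). The function and variable names are illustrative.

```python
import math
from statistics import variance

groups = [
    [32, 12, 25, 61, 93],
    [529, 396, 629, 261, 325],
    [1361, 363, 418, 579, 1660],
]

k = len(groups)
n = [len(g) for g in groups]
N = sum(n)
s2 = [variance(g) for g in groups]
pooled = sum((ni - 1) * v for ni, v in zip(n, s2)) / (N - k)


def neg2_res_loglik(sigma2):
    """-2 x residual log-likelihood, given one variance per group."""
    const = (N - k) * math.log(2 * math.pi)
    log_det_v = sum(ni * math.log(v) for ni, v in zip(n, sigma2))
    # X'V^{-1}X is diagonal (n_i / sigma_i^2) in the cell-means form.
    log_det_xvx = sum(math.log(ni / v) for ni, v in zip(n, sigma2))
    # Residual quadratic form: within-group sums of squares over variances.
    quad = sum((ni - 1) * s / v for ni, s, v in zip(n, s2, sigma2))
    return const + log_det_v + log_det_xvx + quad


reduced = neg2_res_loglik([pooled] * k)  # common residual variance
full = neg2_res_loglik(s2)               # one residual variance per clone
lrt = reduced - full

# Upper tail of a chi-square with 2 df is exp(-x / 2).
p_value = math.exp(-lrt / 2)

print(f"-2ResLogL reduced = {reduced:.1f}, full = {full:.1f}")
print(f"LRT = {lrt:.2f}, p = {p_value:.6f}")
```

The unrounded difference is about 21.85, which is why the probability statement uses 21.85 while the subtraction of the rounded values gives 21.9.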
Test of hypothesis for fixed effects
H o : μ1 = μ2 = μ3 = μ
H1 : at least one μi is different
Den DF = 3 × ( 5 − 1) = 12
Satterthwaite approximation for the degrees of freedom when variances are not the same
(denominator degrees of freedom in the test of hypothesis for fixed effects):

$$
df_{Satterthwaite} = \frac{\left[ MS_1 + MS_2 + MS_3 \right]^2}{(MS_1)^2 / df_1 + (MS_2)^2 / df_2 + (MS_3)^2 / df_3}
$$

Thus,

$$
df_{Satterthwaite} = \frac{\left[ 1054.3 + 22531 + 352756 \right]^2}{(1054.3)^2 / 4 + (22531)^2 / 4 + (352756)^2 / 4} = 4.53
$$
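The arithmetic of the Satterthwaite formula can be checked with a few lines of Python. This is a sketch of the formula as written in these notes, using the rounded group variances and 4 degrees of freedom per group; the names are illustrative.

```python
# Group mean squares (sample variances) as rounded in the notes.
ms = [1054.3, 22531, 352756]
df = [4, 4, 4]  # r - 1 = 4 degrees of freedom per clone

# Satterthwaite: (sum of MS)^2 divided by the sum of MS_i^2 / df_i.
numerator = sum(ms) ** 2
denominator = sum(m ** 2 / d for m, d in zip(ms, df))
df_satterthwaite = numerator / denominator

print(f"Satterthwaite df = {df_satterthwaite:.2f}")
```

The result is far below the 12 error degrees of freedom of the common-variance model because the sum is dominated by the single largest variance.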
Results from PROC MIXED
Conclusion: There is statistical evidence (p-value = 0.0064) that at least one clone titer mean is
different from the others, at the 0.05 significance level.
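Least squares means
With a separate residual variance per clone, the standard error of each least squares mean is sqrt(s_i^2 / 5), computed in R:
> sqrt(1054.3/5)
[1] 14.52102
> sqrt(22531/5)
[1] 67.12824
> sqrt(352756/5)
[1] 265.6148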
Differences between pairs of least squares means
The standard error of each difference is sqrt(s_i^2 / 5 + s_j^2 / 5), computed in R:
> sqrt(1054.3/5+22531/5)
[1] 68.68086
> sqrt(1054.3/5+352756/5)
[1] 266.0114
> sqrt(22531/5+352756/5)
[1] 273.9661
Residual plot
Important: Analysis of variance F test is robust to small variance heterogeneity when sample sizes are
equal. It is sensitive to deviations from normality.
Question: Is there a relationship between mean and variance? Should a transformation be used?