ST 524 One way Analysis of variance – Variances not homogeneous
NCSU - Fall 2008

One way Analysis of variance

Example (Yandell, 1997)

A plant scientist measured the concentration of a particular virus in plant sap using ELISA (enzyme-linked immunosorbent assay) (Novy 1992) [1]. A subset of the study is presented here. The scientist wants to understand the resistance to the virus among three selected potato clones. Plant sap was taken from 5 inoculated plants of each clone, for a total of 15 (3 clones × 5 replicates) measurements of titer.

Data

    Group (i)   Titer y_ij, j = 1, ..., 5
    1             32    12    25    61    93
    2            529   396   629   261   325
    3           1361   363   418   579  1660

Linear Model

\[ y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, 2, 3, \quad j = 1, \dots, 5 \]

where $e_{ij} \sim \text{iid } N(0, \sigma^2)$, so that the $y_{ij}$ are independent $N(\mu_i, \sigma^2)$, and $\mu_i = \mu + \alpha_i$ is the population mean for the $i$th clone. For example:

\[ 32 = \mu_1 + e_{11} = \mu + \alpha_1 + e_{11} \]
\[ 529 = \mu_2 + e_{21} = \mu + \alpha_2 + e_{21} \]
\[ 363 = \mu_3 + e_{32} = \mu + \alpha_3 + e_{32} \]

Matrix representation of data according to linear model

[1] Novy RG (1992) "Characterization of somatic hybrids between Solanum etuberosum and diploid, tuber-bearing Solanums." PhD Dissertation, Department of Plant Pathology, UW-Madison. 123 pp.
Tuesday August 26, 2008

In matrix form, the 15 observations are written as

\[
\begin{bmatrix}
32 \\ 12 \\ 25 \\ 61 \\ 93 \\ 529 \\ 396 \\ 629 \\ 261 \\ 325 \\ 1361 \\ 363 \\ 418 \\ 579 \\ 1660
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 1 \\
1 & 0 & 0 \\
1 & 0 & 0 \\
1 & 0 & 0 \\
1 & 0 & 0 \\
1 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
\mu \\ \alpha_1 \\ \alpha_2
\end{bmatrix}
+
\begin{bmatrix}
e_{11} \\ e_{12} \\ e_{13} \\ e_{14} \\ e_{15} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{24} \\ e_{25} \\ e_{31} \\ e_{32} \\ e_{33} \\ e_{34} \\ e_{35}
\end{bmatrix}
\]

The design matrix has no column for $\alpha_3$, so the third clone serves as the reference level:

\[ 32 = \mu + 1 \cdot \alpha_1 + e_{11} \]
\[ 529 = \mu + 1 \cdot \alpha_2 + e_{21} \]
\[ 363 = \mu + 0 \cdot \alpha_1 + 0 \cdot \alpha_2 + e_{32} = \mu + e_{32} \]

\[ Y = X\beta + e \]

Analysis of variance table - Least Squares estimation

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu \qquad H_1: \text{at least one } \mu_i \text{ is different} \]

Brown-Forsythe Homogeneity test

\[ H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2 \qquad H_1: \text{at least one } \sigma_i^2 \text{ is different} \]

p-value = 0.0675
Do not reject $H_0$ at a significance level of 5%.

Conclusion: We can assume that the residual variances of the groups are not significantly different from each other. The Error Mean Square (Error MS = 125447.0) is the least squares estimate of the common residual variance $\sigma^2$.

Bartlett's Test for Homogeneity of variances

\[ H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2 \qquad H_1: \text{at least one } \sigma_i^2 \text{ is different} \]

p-value < 0.0001
Reject $H_0$ at a significance level of 5%.

Conclusion: At least one group's residual variance is significantly different from the others.

Assumption of variance equality: the Error Mean Square (Error MS = 125447.0) is the least squares estimate of the common residual variance $\sigma^2$.

Homogeneity of variance test (SAS Manual)

One of the usual assumptions for the GLM procedure is that the underlying errors are all uncorrelated with homogeneous variances.
You can test this assumption in PROC GLM by using the HOVTEST option in the MEANS statement, requesting a homogeneity of variance test. This section discusses the computational details behind these tests. Note that the GLM procedure allows homogeneity of variance testing for simple one-way models only.

Bartlett (1937) proposes a test for equal variances that is a modification of the normal-theory likelihood ratio test (the HOVTEST=BARTLETT option). While Bartlett's test has accurate Type I error rates and optimal power when the underlying distribution of the data is normal, it can be very inaccurate if that distribution is even slightly nonnormal (Box 1953).

An approach that leads to tests that are much more robust to the underlying distribution is to transform the original values of the dependent variable to derive a dispersion variable and then to perform analysis of variance on this variable. The significance level for the test of homogeneity of variance is the p-value for the ANOVA F-test on the dispersion variable.

Brown and Forsythe (1974) suggest using the absolute deviations from the group medians:

\[ z_{ij}^{BF} = \lvert y_{ij} - m_i \rvert \]

where $m_i$ is the median of the $i$th group. You can use the HOVTEST=BF option to specify this test. Simulation results show that the Brown-Forsythe test seems best at providing power to detect variance differences while protecting the probability of a Type I error.

If one of these tests rejects the assumption of homogeneity of variance, you should use Welch's ANOVA instead of the usual ANOVA to test for differences between group means. Unless the group variances are extremely different or the number of groups is large, the usual ANOVA test is relatively robust when the groups are all about the same size.
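The Brown-Forsythe procedure described above can be sketched in a few lines of Python. This is an illustration only, not the SAS implementation: the function names `brown_forsythe` and `f_upper_tail_num_df2` are hypothetical, the data are the 15 titer values of the example, and the p-value uses a closed form for the F distribution that holds only when the numerator degrees of freedom equal 2 (three groups).

```python
from statistics import mean, median

# Titer values for the three groups of the example (5 plants each).
titer = {
    1: [32, 12, 25, 61, 93],
    2: [529, 396, 629, 261, 325],
    3: [1361, 363, 418, 579, 1660],
}


def brown_forsythe(groups):
    """One-way ANOVA F statistic on absolute deviations from group medians."""
    # Dispersion variable z_ij = |y_ij - m_i|, with m_i the median of group i.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(z)                       # number of groups
    n = sum(len(g) for g in z)       # total number of observations
    grand_mean = sum(sum(g) for g in z) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df1, df2 = k - 1, n - k
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, df1, df2


def f_upper_tail_num_df2(f_stat, df2):
    """P(F > f_stat) for an F(2, df2) variable (closed form, numerator df = 2 only)."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


f_stat, df1, df2 = brown_forsythe(list(titer.values()))
assert df1 == 2  # the closed-form tail probability below requires this
p_value = f_upper_tail_num_df2(f_stat, df2)
print(f"Brown-Forsythe F({df1}, {df2}) = {f_stat:.3f}, p-value = {p_value:.4f}")
```

For these data the p-value should agree, up to rounding, with the 0.0675 reported for the Brown-Forsythe test in this example. With more than three groups a general F distribution routine would be needed in place of the closed form.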
Residual plot

The residual plot shows that dispersion is not homogeneous among clones: clone 7 shows larger variability than clone 2, with clone 3 having moderate variability.

Hartley's Max F test for homogeneity of k variances

\[ H_0: \sigma_1^2 = \cdots = \sigma_k^2 \qquad H_1: \text{at least one } \sigma_i^2 \text{ is different} \]

Test statistic:

\[ F_{max} = \frac{\max s_i^2}{\min s_i^2} \sim F_{Hartley,\,k,\,\nu} \]

where $k$ is the number of groups and $\nu$ is the number of degrees of freedom for each sample, $\nu = r - 1$.

Assumptions
    Random sampling within each group.
    Equal sample size for all k groups.
    Normality of observations. The test is highly sensitive to deviations from normality.

Example

\[ H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2 \qquad H_1: \text{at least one } \sigma_i^2 \text{ is different} \]

\[ \text{calculated } F_{max} = \frac{352755.7}{1054.3} = 334.59, \qquad F_{Hartley,\,3,\,4,\,0.05} = 14.8 \]

Conclusion: Reject $H_0$.

Mixed Model approach to heterogeneity of variances

1. Fit a linear model for titer with a common residual variance.
2. Fit a linear model for titer allowing a separate residual variance within each group.
3. Compare both fits: likelihood ratio test, AIC, AICC, BIC.

1. Linear model

\[ y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, 2, 3, \quad j = 1, \dots, 5 \]

where $e_{ij} \sim \text{iid } N(0, \sigma^2)$, the $y_{ij}$ are independent $N(\mu_i, \sigma^2)$, and $e \sim N(0, \sigma^2 I)$, as presented above.

a. $Y = X\beta + e$
b. Results from PROC MIXED: $\hat{\sigma}_e^2 = \text{Error MS}_{GLM}$

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu \qquad H_1: \text{at least one } \mu_i \text{ is different} \]

2. Linear model

\[ y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, 2, 3, \quad j = 1, \dots, 5 \]

where the $e_{ij}$ are independent $N(0, \sigma_i^2)$, the $y_{ij}$ are independent $N(\mu_i, \sigma_i^2)$, and $e \sim N(0, R)$, with $R$ diagonal and holding $\sigma_i^2$ for each observation of group $i$.

a. $Y = X\beta + e$
b. Results from PROC MIXED

Likelihood ratio test: The LRT is used to test whether a model with a common variance should be preferred to a model with three separate variances.
\[ H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma^2, \qquad H_1: \text{at least one } \sigma_i^2 \text{ is different} \]

The null hypothesis fits a common residual variance for all three groups, while the alternative hypothesis fits a separate residual variance for each group.

    -2ResLogL (common residual variance) = 179.8   (fit under the null hypothesis, 1 variance parameter)
    -2ResLogL (three residual variances) = 157.9   (fit under the alternative hypothesis, 3 variance parameters)

    Difference = [-2ResLogL reduced model] - [-2ResLogL full model] = 179.8 - 157.9 = 21.9

Under the null hypothesis, the difference is distributed as a chi-square random variable with (3 - 1) = 2 degrees of freedom. The critical value at a 0.05 significance level is $\chi^2_{2\,df,\,0.05} = 5.99$, and $P(\chi^2_{2\,df} > 21.85) < 0.0001$.

Conclusion: Reject $H_0$. There is enough statistical evidence, at the 5% significance level, to conclude that the variances are not equal.

Test of hypothesis for fixed effects

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu \qquad H_1: \text{at least one } \mu_i \text{ is different} \]

With a common residual variance, Den DF = 3 × (5 − 1) = 12.

Satterthwaite approximation for calculation of degrees of freedom when variances are not the same

Denominator degrees of freedom in the test of hypothesis for fixed effects:

\[ df_{satterthwaite} = \frac{\left[ MS_1 + MS_2 + MS_3 \right]^2}{(MS_1)^2 / df_1 + (MS_2)^2 / df_2 + (MS_3)^2 / df_3} \]

Thus,

\[ df_{satterthwaite} = \frac{\left[ 1054.3 + 22531 + 352756 \right]^2}{(1054.3)^2 / 4 + (22531)^2 / 4 + (352756)^2 / 4} = 4.53 \]

Results from PROC MIXED

Conclusion: There is statistical evidence (p-value = 0.0064) that at least one clone titer mean is different from the others, at the 0.05 significance level.

Least squares means

Standard errors, computed as sqrt(s_i^2 / 5):

    > sqrt(1054.3/5)
    [1] 14.52102
    > sqrt(22531/5)
    [1] 67.12824
    > sqrt(352756/5)
    [1] 265.6148

Differences between pairs of least squares means

Standard errors, computed as sqrt(s_i^2 / 5 + s_j^2 / 5):

    > sqrt(1054.3/5+22531/5)
    [1] 68.68086
    > sqrt(1054.3/5+352756/5)
    [1] 266.0114
    > sqrt(22531/5+352756/5)
    [1] 273.9661

Residual plot

Important: The analysis of variance F test is robust to small variance heterogeneity when sample sizes are equal.
It is sensitive to deviations from normality.

Question: Is there a relationship between the mean and the variance? Should a transformation be used?
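As a starting point for this question, and as a check on the figures quoted in these notes, the following Python sketch recomputes the group variances, the pooled Error MS, Hartley's F-max, the Satterthwaite degrees of freedom and the likelihood ratio p-value, and then prints the mean, standard deviation and log-scale variance of each group. The variable names are illustrative assumptions, the -2 Res Log Likelihood values are copied from the PROC MIXED results above rather than refitted, and looking at the log scale is only one commonly considered option, not a recommendation taken from the notes.

```python
import math
from statistics import mean, stdev, variance

# Titer values by group index i = 1, 2, 3 (5 plants each).
titer = {
    1: [32, 12, 25, 61, 93],
    2: [529, 396, 629, 261, 325],
    3: [1361, 363, 418, 579, 1660],
}

# Sample variance s_i^2 of each group; each has n - 1 = 4 degrees of freedom.
s2 = {i: variance(y) for i, y in titer.items()}
df_each = 4

# Pooled Error MS: a plain average of the variances because group sizes are equal.
error_ms = sum(s2.values()) / len(s2)

# Hartley's F-max: largest sample variance over smallest sample variance.
f_max = max(s2.values()) / min(s2.values())

# Satterthwaite degrees of freedom, following the formula given in the notes.
df_satt = sum(s2.values()) ** 2 / sum(v ** 2 / df_each for v in s2.values())

# Likelihood ratio statistic from the two reported -2 Res Log Likelihood values.
# For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
lrt = 179.8 - 157.9
p_lrt = math.exp(-lrt / 2)

print(f"Error MS = {error_ms:.1f}, Fmax = {f_max:.2f}")
print(f"Satterthwaite df = {df_satt:.2f}, LRT = {lrt:.1f}, p = {p_lrt:.2e}")

# Mean-variance relationship on the raw scale and on the natural-log scale.
for i, y in titer.items():
    log_y = [math.log(v) for v in y]
    print(f"group {i}: mean = {mean(y):.1f}, sd = {stdev(y):.1f}, "
          f"var(log y) = {variance(log_y):.3f}")
```

If the standard deviation grows with the mean on the raw scale while the log-scale variances are closer to one another, that would point toward a transformation; the output is meant to inform the question, not to settle it.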