Chapter 28 Analysis of Variance (ANOVA)
Bin Zou ([email protected]), STAT 141, University of Alberta, Winter 2015

In Ch.23, we studied one-sample mean questions (CI and test).
In Ch.24 and Ch.25, we studied two-sample mean questions.
Review: independent vs paired; pooled vs non-pooled.
What if we want to compare means for more than two groups? E.g., we want to compare the average income across all the provinces and territories of Canada.
Objective: we test whether more than two groups have equal means. Such a test is called Analysis of Variance (ANOVA).
Remark: ANOVA tests means, NOT variances.

Intuition of Comparison
First, we need a “standard” for differences.
In ANOVA, we choose “Within-Groups Variability” as the standard.
Our real purpose is to investigate “Between-Groups Variability”.
In the second graph, the variation within each group is so small that the differences between groups stand out.

Naive Understanding
We compare the differences between the means of the groups with the variation within the groups.
When the differences between means are large compared with the variation within the groups, we conclude that the means are probably not equal.
We look at the ratio
$$\frac{\text{differences among the means}}{\text{variability within groups}}.$$
When this ratio is very large, there is strong evidence to suggest that not all means are equal.
We measure the differences among the means by their variance. The more the means differ, the larger this variance will be.

Variation
1. Variation within groups: Error Mean Square (also called Within Mean Square), denoted by $MS_{Error}$ or $MSE$.
2. Variation between groups: Treatment Mean Square (also called Between Mean Square), denoted by $MS_{Treatment}$ or $MST$.
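
The two mean squares above can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function name `mean_squares` is hypothetical, and the sketch assumes the standard definitions, namely that the treatment sum of squares is the sum over groups of (group size) × (group mean − grand mean)², and the error sum of squares is the sum of squared deviations of each observation from its own group mean.

```python
from statistics import mean


def mean_squares(groups):
    """Return (MST, MSE) for a list of groups, each a list of numbers.

    Hypothetical helper for illustration; assumes the standard
    one-way ANOVA definitions of the two sums of squares.
    """
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # N = n1 + n2 + ... + nk
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-groups (treatment) sum of squares
    sst = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-groups (error) sum of squares
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means)
              for y in g)

    return sst / (k - 1), sse / (n_total - k)


# Made-up data: three groups of three observations each
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
mst, mse = mean_squares(groups)
print("MST =", mst)
print("MSE =", mse)
print("ratio MST / MSE =", mst / mse)
```

In this made-up data set the group means (5, 8, 11) are far apart relative to the spread inside each group, so the ratio of the two mean squares comes out large.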

F-Test
1. $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs. $H_a$: NOT all means are equal.
   Remark: We are comparing the means of k groups. Think about why $H_a$ can NOT be stated as “All means are NOT equal”. The F-test has only this one form of $H_a$.
2. Test statistic:
   $$F \text{ ratio} = F^{k-1}_{N-k} = \frac{MST}{MSE} = \frac{SST/(k-1)}{SSE/(N-k)},$$
   where N is the total sample size, that is, $N = n_1 + n_2 + \cdots + n_k$.
   SST: Treatment sum of squares (between groups)
   SSE: Error sum of squares (within groups)
3. Find the P-value from the F-table. (As with the t-table, we obtain a range for the P-value, rather than a single number as with the Z-table.)
4. Compare the P-value with the significance level $\alpha$ to draw a conclusion.

F-Table
An F-model has two separate degrees of freedom; e.g., $F^{k-1}_{N-k}$ has numerator df k − 1 and denominator df N − k.
F-models are NOT symmetric and take only positive values, so we use only the right tail for F-models.

Assumptions and Conditions
The groups must be independent of each other.
The data within each treatment group must be independent as well.
Randomization Condition
ANOVA requires that the variances of the treatment groups be equal.
   Remark: Look at side-by-side boxplots to check this assumption.
Normal Population Assumption.

Be Careful with Making Conclusions
When P-value < $\alpha$, we reject $H_0$. That is to say, we believe that not all means are equal.
Can we infer anything further? NO! The F-test does NOT tell us which groups have different means. There are many possibilities.
Take k = 3. Upon rejecting $H_0$, we can conclude that not all three means are equal. But it may be the case that $\mu_1 = \mu_2 \neq \mu_3$, or $\mu_1 = \mu_3 \neq \mu_2$, or $\mu_2 = \mu_3 \neq \mu_1$, or all three means differ from one another.
When P-value > $\alpha$, our job is easier, since there is not enough evidence to suggest that the means are different.

ANOVA Table
1. Start with DF (degrees of freedom): the first df = k − 1 (treatment), while the second df = N − k (error).
2. Sum of Squares (SS) = DF × Mean Square (MS). That is, $SST = (k-1) \times MST$ and $SSE = (N-k) \times MSE$.
3. The F ratio is calculated by
   $$F = \frac{MST}{MSE}.$$
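
The three steps for filling in an ANOVA table can be sketched as follows. The numbers (k = 3 groups, N = 9 observations, MST = 27, MSE = 1) are made up for illustration, and the function name `anova_table` is hypothetical. The "Total" row is not part of the steps above; it is included only because ANOVA tables are commonly printed with one, using the standard facts that the total df is N − 1 and the total SS is SST + SSE.

```python
def anova_table(k, n_total, mst, mse):
    """Build ANOVA table rows as (source, DF, SS, MS, F) tuples.

    Hypothetical helper for illustration only.
    """
    df_treatment = k - 1           # step 1: treatment df
    df_error = n_total - k         # step 1: error df
    sst = df_treatment * mst       # step 2: SS = DF x MS
    sse = df_error * mse
    f_ratio = mst / mse            # step 3: F = MST / MSE
    return [
        ("Treatment", df_treatment, sst, mst, f_ratio),
        ("Error", df_error, sse, mse, None),
        # Assumed extra row: total df = N - 1, total SS = SST + SSE
        ("Total", df_treatment + df_error, sst + sse, None, None),
    ]


# Made-up inputs
table = anova_table(k=3, n_total=9, mst=27.0, mse=1.0)

print(f"{'Source':<10}{'DF':>4}{'SS':>10}{'MS':>10}{'F':>10}")
for source, df, ss, ms, f in table:
    ms_txt = "" if ms is None else f"{ms:.2f}"
    f_txt = "" if f is None else f"{f:.2f}"
    print(f"{source:<10}{df:>4}{ss:>10.2f}{ms_txt:>10}{f_txt:>10}")
```

The resulting F ratio would then be compared against an F-model with k − 1 and N − k degrees of freedom to obtain the P-value.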