Newsom, USP 634 Data Analysis I, Spring 2013

Some Comments and Definitions Related to the Assumptions of Within-subjects ANOVA

The Sphericity Assumption

The sphericity assumption states that the variances of the difference scores in a within-subjects design (the s_D² in a paired t-test) are equal across all pairs of conditions. Imagine a within-subjects design with three levels of the independent variable (e.g., pretest, posttest, and follow-up), and that difference scores are calculated for each subject comparing pretest with posttest, posttest with follow-up, and pretest with follow-up.¹ The sphericity assumption (sometimes called the "circularity" assumption) holds that the variances of each of these sets of difference scores are not statistically different from one another. The sphericity assumption is similar to the homogeneity of variance assumption in between-subjects ANOVA. When this assumption is violated, there will be an increase in Type I errors, because the critical values in the F-table are too small. One could say that the F-test is positively biased under these conditions. There are several different approaches to correcting for this bias.

¹ The sphericity assumption does not apply to within-subjects ANOVAs that have only two levels.

Lower bound correction. This is a correction for a violation of the sphericity assumption. The correction works by using a higher F-critical value to reduce the likelihood of a Type I error. In the non-SPSS world, the lower bound correction is usually referred to as the "Geisser-Greenhouse" correction, in which df_A = 1 and df_AxS = s - 1 are used instead of the usual df_A = a - 1 and df_AxS = (a - 1)(s - 1). Under this correction to the dfs, the F-critical values will be larger and it will be harder to detect significance, thus reducing Type I error. Unfortunately, this approach tends to overcorrect, so there are too many Type II errors with this procedure (i.e., low power). This correction assumes the maximum degree of heterogeneity among the cells.

Huynh-Feldt correction. This correction is based on a similar correction by Box (1954). In both cases, an adjustment factor (called epsilon) is computed based on the amount of variance heterogeneity (i.e., how much the variances are unequal). Then both df_A and df_AxS are multiplied by this factor, so that the F-critical value will be somewhat larger. The correction is not as severe as the "lower bound" correction.

Geisser-Greenhouse correction. The Geisser-Greenhouse correction referred to in SPSS is another variant on the procedure described above under Huynh-Feldt. A slightly different correction factor (epsilon) is computed, which corrects the degrees of freedom slightly more than the Huynh-Feldt correction. So significance tests with this correction will be a little more conservative (higher p-values) than those using the Huynh-Feldt correction.

Mauchly's test. Mauchly's test is a test of the sphericity assumption based on a chi-square statistic. Unfortunately, this test is not a very useful one. For small sample sizes, it tends to produce too many Type II errors (i.e., it misses sphericity violations), and for large sample sizes, it tends to be significant even when the violation is small in magnitude (Type I error; e.g., Keselman, Rogan, Mendoza, & Breen, 1980). Thus, Mauchly's test may not be much help and may be misleading when deciding whether there is a violation of the sphericity assumption.
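To make the difference-score logic and the epsilon-based df corrections concrete, here is a minimal Python sketch. It is an illustrative reconstruction, not SPSS's code: the 12 x 3 data matrix Y is made up, and the formulas are the standard Greenhouse-Geisser (Box) and Huynh-Feldt epsilon estimates.

```python
# A minimal sketch, assuming a hypothetical 12-subject, 3-level design.
# Rows of Y are subjects, columns are levels; the numbers are invented.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(0)
Y = rng.normal([40.0, 26.0, 34.5], 5.0, (12, 3))   # made-up scores
n, k = Y.shape

# Sphericity: the variances of all pairwise difference scores should match.
for i, j in combinations(range(k), 2):
    d = Y[:, i] - Y[:, j]
    print(f"var(level {i + 1} - level {j + 1}) = {d.var(ddof=1):.2f}")

# Uncorrected within-subjects F from the usual sums-of-squares partition.
grand = Y.mean()
ss_a = n * ((Y.mean(axis=0) - grand) ** 2).sum()    # treatment (A)
ss_s = k * ((Y.mean(axis=1) - grand) ** 2).sum()    # subjects (S)
ss_axs = ((Y - grand) ** 2).sum() - ss_a - ss_s     # error (S x A)
df_a, df_axs = k - 1, (k - 1) * (n - 1)
F = (ss_a / df_a) / (ss_axs / df_axs)

# Greenhouse-Geisser (Box) epsilon from the sample covariance matrix.
S = np.cov(Y, rowvar=False)
num = k ** 2 * (S.diagonal().mean() - S.mean()) ** 2
den = (k - 1) * ((S ** 2).sum()
                 - 2 * k * (S.mean(axis=1) ** 2).sum()
                 + k ** 2 * S.mean() ** 2)
eps_gg = num / den

# Huynh-Feldt epsilon: a milder adjustment, capped at 1.
eps_hf = min(1.0, (n * df_a * eps_gg - 2)
                  / (df_a * (n - 1 - df_a * eps_gg)))

eps_lb = 1.0 / df_a   # lower bound: maximum possible heterogeneity

# Multiplying both dfs by epsilon raises the critical value (and the p).
for name, eps in [("lower bound", eps_lb),
                  ("Geisser-Greenhouse", eps_gg),
                  ("Huynh-Feldt", eps_hf)]:
    p = stats.f.sf(F, eps * df_a, eps * df_axs)
    print(f"{name}: epsilon = {eps:.3f}, corrected p = {p:.4f}")
```

Note the ordering: because epsilon shrinks the degrees of freedom, each corrected p-value is at least as large as the uncorrected one, with the lower bound always the most conservative.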
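Mauchly's test can be sketched the same way. The function below is hypothetical (not SPSS's implementation) and reuses the same made-up Y; it applies the usual chi-square approximation to Mauchly's W, computed from an orthonormal set of contrasts among the levels.

```python
# A sketch of Mauchly's chi-square test of sphericity (illustrative only).
import numpy as np
from scipy import stats

def mauchly(Y):
    n, k = Y.shape
    p = k - 1
    # Build an orthonormal contrast basis: QR-factor a matrix whose first
    # column is the constant, then drop that constant column.
    Q = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, :p]]))[0]
    C = Q[:, 1:].T                                   # p x k contrasts
    Sc = C @ np.cov(Y, rowvar=False) @ C.T           # contrast covariance
    W = np.linalg.det(Sc) / (np.trace(Sc) / p) ** p
    d = 1 - (2 * p ** 2 + p + 2) / (6 * p * (n - 1))  # small-sample factor
    chi2 = -(n - 1) * d * np.log(W)
    df = p * (p + 1) // 2 - 1
    return W, chi2, df, stats.chi2.sf(chi2, df)

Y = np.random.default_rng(0).normal([40.0, 26.0, 34.5], 5.0, (12, 3))
W, chi2, df, pval = mauchly(Y)
print(f"W = {W:.3f}, chi2({df}) = {chi2:.3f}, p = {pval:.4f}")
```

As the text notes, though, a nonsignificant W with a small n is weak evidence that sphericity actually holds.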
Compound Symmetry

Another assumption of within-subjects ANOVA that you may hear about is the "compound symmetry" assumption. Compound symmetry is a stricter assumption than sphericity. Not only do the variances of the difference scores need to be equal for pairs of conditions, but their correlations (technically, the assumption concerns covariances, the unstandardized version of correlation) must also be equal. Imagine taking differences between scores for each possible pair of cells, and then calculating the correlations among all of those difference scores. Under the compound symmetry assumption, these correlations (covariances, actually) must not differ in the population. A violation of this stricter compound symmetry assumption does not necessarily mean the sphericity assumption will be violated. SPSS does not currently provide a test of this assumption.

Nonadditivity

The error term for within-subjects ANOVA is the interaction term, S x A. In other words, we assume that any variation in differences between levels of the independent variable is due to error variation. It is possible, however, that the effect of the independent variable A is different for different subjects, so that there is truly an interaction between S and A. Thus, some of what we consider to be error when we calculate S x A is really an interaction of subject and treatment rather than error variation. For example, if Factor A represents program groups, then an S x A interaction suggests the program is not equally effective for each subject. This is the so-called "additivity" assumption: that there is no interaction between A and S beyond error (interactions are multiplicative or "nonadditive"). Because nonadditivity (a violation of this assumption) implies heterogeneous variances for the difference scores, the sphericity assumption will be violated if nonadditivity occurs. SPSS provides a test of this assumption, the Tukey test for nonadditivity, under the scale reliability command (a sketch of the test appears below). The output includes a suggested transformation of the data (raising each score to a certain power) that can be employed if the test is significant.
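Here is a hedged sketch of what the Tukey one-degree-of-freedom test is doing. This is not SPSS's code; the function name and the made-up data matrix Y (subjects in rows, conditions in columns) are only for illustration.

```python
# Tukey's one-df test for nonadditivity: does a multiplicative subject-by-
# condition component account for part of the residual (S x A) variation?
import numpy as np
from scipy import stats

def tukey_nonadditivity(Y):
    n, k = Y.shape
    grand = Y.mean()
    a = Y.mean(axis=1) - grand                      # subject effects
    b = Y.mean(axis=0) - grand                      # condition effects
    resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0) + grand
    ss_resid = (resid ** 2).sum()                   # S x A sum of squares
    # One-df sum of squares for the multiplicative (a_i * b_j) component.
    ss_nonadd = (a @ Y @ b) ** 2 / ((a ** 2).sum() * (b ** 2).sum())
    df_bal = (n - 1) * (k - 1) - 1                  # "balance" (remainder) df
    F = ss_nonadd / ((ss_resid - ss_nonadd) / df_bal)
    return F, df_bal, stats.f.sf(F, 1, df_bal)

Y = np.random.default_rng(0).normal([40.0, 26.0, 34.5], 5.0, (12, 3))
F, df_bal, p = tukey_nonadditivity(Y)
print(f"Nonadditivity: F(1, {df_bal}) = {F:.3f}, p = {p:.4f}")
```

A significant F here suggests that a power transformation of the scores may restore additivity, which is the rationale for the transformation SPSS prints in the footnote of its table.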
Comments and Recommendations

The sphericity and compound symmetry assumptions do not apply when there are only two levels (or cells) of the within-subjects factor (e.g., pretest vs. posttest only). The sphericity assumption, for instance, does not stipulate that the variances of the scores at each level are the same; rather, difference scores are calculated for pairs of levels (e.g., Time 2 scores minus Time 1 scores vs. Time 3 scores minus Time 2 scores), because these assumptions involve equality across the differences between levels. With only two levels there is a single set of difference scores, so there is nothing for its variance to be compared with. In SPSS, Mauchly's W is reported as 1 and the sig. as "." when there are only two levels of the within-subjects factor.

I'll make a couple of important points regarding these corrections (based on the recommendations of Greenhouse & Geisser): 1) If the F is nonsignificant, do not worry about the corrections, because the corrections will only increase the p-values. 2) If the F is significant using all three approaches, do not worry about the corrections either; if the most conservative approach is still significant, there is no increased risk of Type I error. If the various correction tests lead to different conclusions, there is no one perfect solution. I generally use the Huynh-Feldt correction, because it addresses the Type I error problem and is more powerful than the "lower bound" correction. It does not perform perfectly, however, and it might be safest to report the results from all correction approaches when they do not show the same result. With large sample sizes, a small departure from sphericity might be significant, but the correction in these situations should be relatively minor, and there is not likely to be a difference in the conclusions drawn from the significance tests using the various corrections.

There are two other possible solutions: 1) transform the dependent variable scores, using a square root transformation or raising the scores to a fractional power (e.g., .33), or 2) use a multivariate analysis of variance (MANOVA) test. For the transformation approach, you may have to try different transformations until the sphericity problem is resolved. The transformation approach may be quite helpful in resolving the problem, but the researcher will have more difficulty interpreting the results. Under some conditions, the MANOVA approach can be more powerful than the within-subjects ANOVA, and the MANOVA test does not assume sphericity. The MANOVA test is printed in the GLM repeated measures output in SPSS. MANOVA assumes that the data have a "multivariate" normal distribution, that is, that the analysis variables are jointly normally distributed when all levels are considered together. Algina and Keselman (1997) suggest that with few levels and large sample sizes, the MANOVA approach may be more powerful. Their guidelines are to use MANOVA if a) the number of levels is 4 or fewer (a ≤ 4) and n is greater than the number of levels plus 15 (n > a + 15), or b) the number of levels is between 5 and 8 (5 ≤ a ≤ 8) and n is greater than the number of levels plus 30 (n > a + 30).

Example

Below is an example of a within-subjects ANOVA with three levels of the independent variable, which shows the sphericity assumption test and the corrections. This hypothetical study compares performance on a vocabulary test after different lecture topics. Each student hears each lecture topic and takes a vocabulary test afterward. Notice that the adjustments are not made to the calculated F-values. Instead, the corrections are made to the critical values (not shown), so the only difference you may see is in the "Sig." values (p-values). This is accomplished by adjusting the degrees of freedom. The biggest correction to the df is the lower bound, followed by the Greenhouse-Geisser, and then the Huynh-Feldt. Thus, the lower bound will have the largest p-value (the most conservative significance test), the Greenhouse-Geisser will have an intermediate p-value, and the Huynh-Feldt will have the smallest p-value (the most liberal significance test of the corrections).

[SPSS within-subjects ANOVA output not reproduced in this text version.]

The following is the MANOVA portion of the output automatically generated by the GLM repeated measures test.

[SPSS multivariate tests output not reproduced in this text version.]

There are a = 3 levels and 12 cases, so there were not more than a + 15 cases in this example, and I would not recommend the use of the MANOVA test. If MANOVA were appropriate, there are four different tests presented by SPSS, and with a large sample size they will all tend to show the same results. Roy's largest root is the most liberal of the tests, and Pillai's trace is said to be the most robust to variance differences with small samples (Olson, 1979). Wilks' lambda seems to be the most commonly reported, for some reason.
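To see why the multivariate test sidesteps sphericity, note that for a single within-subjects factor the MANOVA test is equivalent to a one-sample Hotelling T-squared test on the k - 1 difference scores, which estimates their full covariance matrix rather than assuming anything about its form. The sketch below uses hypothetical function names and the same made-up data, and also encodes the Algina and Keselman (1997) rule of thumb.

```python
# A sketch of the MANOVA alternative and the Algina-Keselman guideline.
import numpy as np
from scipy import stats

def prefer_manova(a, n):
    """Algina & Keselman (1997) rule of thumb for choosing MANOVA."""
    return (a <= 4 and n > a + 15) or (5 <= a <= 8 and n > a + 30)

def hotelling_repeated(Y):
    n, k = Y.shape
    p = k - 1
    D = Y[:, 1:] - Y[:, :1]                # differences from level 1
    dbar = D.mean(axis=0)
    # T-squared uses the full covariance of the differences: no sphericity.
    T2 = n * dbar @ np.linalg.solve(np.cov(D, rowvar=False), dbar)
    F = T2 * (n - p) / (p * (n - 1))       # exact F transformation
    return F, p, n - p, stats.f.sf(F, p, n - p)

Y = np.random.default_rng(0).normal([40.0, 26.0, 34.5], 5.0, (12, 3))
print(prefer_manova(a=3, n=12))            # False: n is not > a + 15
print(hotelling_repeated(Y))
```

With a = 3 and n = 12, the guideline favors the corrected univariate test, which matches the recommendation above.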
Below, I ran a reliability analysis (Analyze → Scale → Reliability Analysis), clicked the Statistics button, and checked Tukey's test of additivity. The reliability analysis is usually used to assess the internal consistency of a scale and to obtain Cronbach's alpha, but we can use it here for the nonadditivity test.

[SPSS reliability analysis output not reproduced in this text version.]

The row of the table labeled "Nonadditivity" gives the significance test of this assumption violation, F(1, 24) = 2.027, ns, and suggests no evidence of an A x S interaction. The footnote to the table contains the recommended transformation of the data that could be used if the nonadditivity test were significant.

Write-up Example

A within-subjects ANOVA was used to compare vocabulary scores in the three lecture conditions. The average vocabulary score was highest for the first topic (M1 = 40.00), followed by the third topic (M3 = 34.50), and then the second topic (M2 = 26.00) [means not shown in the above output excerpts]. The results indicated that there was a significant difference among the three conditions, F(2, 22) = 12.305, p < .05, for all sphericity assumption correction tests.

References

Algina, J., & Keselman, H. J. (1997). Detecting repeated measures effects with univariate and multivariate statistics. Psychological Methods, 2, 208-218.

Keselman, H. J., Rogan, J. C., Mendoza, J. L., & Breen, L. J. (1980). Testing the validity conditions of repeated measures F tests. Psychological Bulletin, 87, 479-481.