Scheffé's More Powerful F-Protected Post Hoc Procedure

Alan J. Klockars
University of Washington

Gregory R. Hancock
University of Maryland, College Park

Journal of Educational and Behavioral Statistics, Spring 2000, Vol. 25, No. 1, pp. 13-19
DOI: 10.3102/10769986025001013
The online version of this article can be found at: http://jeb.sagepub.com/content/25/1/13
Published on behalf of the American Educational Research Association and http://www.sagepublications.com
Downloaded from http://jebs.aera.net at UNIV OF MARYLAND on September 16, 2011

Keywords: experimentwise error rate; multiple comparisons; Scheffé; simultaneous inference

In 1970 Henry Scheffé proposed a more powerful version of his well-known post hoc multiple comparison procedure, only to fail to recommend it by the paper's end. The point of the current paper is to bring this simple modification to a wider audience, complete with an original derivation, in hopes that the method will be embraced by researchers despite its creator's hesitations.
Specifically, whereas Scheffé's original (1953) procedure advocates testing any exploratory post hoc contrast or comparison using a critical value assuming $k - 1$ between-group degrees of freedom, Scheffé's later modification (1970) will be demonstrated here, showing that a more liberal critical value assuming $k - 2$ between-group degrees of freedom may be used if an omnibus null hypothesis across all means has been rejected.

We wish to thank Juliet P. Shaffer for bringing Scheffé's 1970 paper to our attention.

If a researcher has a planned set of theoretically meaningful tests involving $k$ sample means, such as pairwise comparisons or specific complex contrasts, methods exist that are tailored to such testing scenarios so as to optimize statistical power (Hancock & Klockars, 1996, 1997; Shaffer, 1995). These methods, while maintaining strong Type I error control, are made powerful by combinations of sharpened critical values, sequential testing strategies, and integration of the logical structure of the tests in a given family. The last two points include accommodating the logical implications of a rejected preliminary omnibus test across all between-group variability, as proposed, for example, within tests of pairwise comparisons (Hayter, 1986; Shaffer, 1979, 1986) or orthogonal contrasts (Klockars & Hancock, 1992; Shaffer, 1986).

On the other hand, when one decides to conduct comparisons or contrasts after an inspection of results (i.e., exploratorily), a more stringent approach than those alluded to above, most commonly that of Scheffé (1953), is customarily applied. As Scheffé pointed out subsequently (Scheffé, 1970), his original method could be modified by incorporating an initial test of the omnibus null hypothesis across all $k$ means. That is, given that the researcher has already rejected an omnibus null hypothesis, the logical consequences of that rejection may be accommodated by making the post hoc critical value more liberal.

Interestingly, Scheffé ultimately did not advocate this improvement because it would precipitate improper confidence intervals for estimating simple or complex effects; that is, probabilistic values associated with said intervals would be conditional upon rejection of the preliminary omnibus null hypothesis (see Scheffé, 1970, for details). Perhaps for this reason, as well as the original paper's unfamiliarity to social scientists, his modified method has not infiltrated educational and behavioral statistics texts. However, the lack of proper confidence intervals has little hindered the development and dissemination of sequential and F-protected testing strategies within other multiple comparison scenarios (Hancock & Klockars, 1996; Shaffer, 1995), nor do we believe it should keep Scheffé's modification from being embraced by researchers seeking to improve statistical power when testing exploratory post hoc contrasts or comparisons. For this reason we offer the current paper to explain and recommend Scheffé's simple F-protected post hoc procedure.

Scheffé's $S_2$ Procedure

The consonance between an $\alpha$-level omnibus F-test across $k$ sample means and Scheffé's (1953) post hoc S procedure (hereafter referred to as the $S_1$ procedure) is well known. Under conditions of normality and homogeneity of variance ($\sigma_j^2 = \sigma^2$ for $j = 1, \ldots, k$), a rejected omnibus null hypothesis across all $k$ groups (i.e., rejecting $\mu_j = \mu$ for $j = 1, \ldots, k$) implies that $S_1$ will yield the rejection of at least one contrast null hypothesis (although such contrasts may not be of theoretical interest).
Given a retained omnibus null hypothesis, on the other hand, $S_1$ will be unable to lead to the rejection of any contrast null hypothesis involving any of the $k$ means.

Though often recommended, statistical significance in the omnibus F-test is technically not required in order to maintain $\alpha$-level experimentwise (familywise) Type I error control while conducting post hoc contrasts. Scheffé's $S_1$ procedure facilitates this control without the omnibus test prerequisite. However, a rejection of the omnibus null hypothesis does have logical implications about the verity of contrast null hypotheses within the infinite set over which $S_1$ exacts control, implications that Scheffé (1970) noted could be used to modify his $S_1$ procedure, giving it more power with no additional computational complexity. This F-protected modification, referred to as Scheffé's $S_2$ procedure, is as follows.

A sample contrast $\hat{\psi}$ may be formed as the linear combination of sample means $\mathbf{a}'\mathbf{m}$, where $\mathbf{m}$ is the $k \times 1$ sample mean vector and $\mathbf{a}'$ is the $1 \times k$ transpose of a $k \times 1$ weight vector containing elements $a_1, \ldots, a_k$ that sum to 0. The sample contrast $\hat{\psi}$ estimates $\psi = \mathbf{a}'\boldsymbol{\mu}$, where $\boldsymbol{\mu}$ is the $k \times 1$ population mean vector, and $\hat{\psi} \sim N[0, \sigma^2 \sum_{j=1}^{k} (a_j^2/n_j)]$ for $k$ random samples of size $n_j$ ($j = 1, \ldots, k$) drawn from populations with equal means and meeting normality and variance homogeneity assumptions. For testing any member of a set of $m$ ($1 \le m \le \infty$) exploratory post hoc contrasts (i.e., contrasts unplanned prior to inspection of data or of summary statistics such as sample means), Scheffé (1970) offered the following protected method:

1) Conduct an $\alpha$-level omnibus F test over all $k$ sample means. If the observed test statistic does not exceed the critical value ${}_{1-\alpha}F_{k-1,N-k}$, where $N = \sum_{j=1}^{k} n_j$, no contrast tests are conducted.
If (and only if) the critical value is exceeded, proceed to Step 2.
2) Given the rejection of the omnibus null hypothesis in Step 1, any contrast with test statistic of the form $t_{\hat{\psi}} = \hat{\psi} / [MS_W \sum_{j=1}^{k} (a_j^2/n_j)]^{1/2}$ (where $MS_W$ is the within-groups pooled sample variance estimate) should have its associated contrast null hypothesis rejected if $|t_{\hat{\psi}}| > [(k - 2)\,{}_{1-\alpha}F_{k-2,N-k}]^{1/2}$.

Note that the critical value in Step 2 is similar to that used in the traditional $S_1$ procedure, but assumes $k - 2$ between-group degrees of freedom rather than $k - 1$. The logical implications of a rejected omnibus null hypothesis in Step 1 allow for this more liberal critical value to be used in testing contrasts while still preserving the experimentwise Type I error rate at the desired nominal $\alpha$ level. This is derived in the next section.

Development

An infinite number of possible post hoc contrasts over $k$ sample means exist of the form $\hat{\psi} = \mathbf{a}'\mathbf{m}$ (where $\mathbf{a}$ is such that $\mathbf{1}'\mathbf{a} = 0$). Under the omnibus null hypothesis these contrasts constitute a $k - 1$ dimensional space centered at the origin $\mathbf{0}$. One may define any set of $k - 1$ orthogonal contrasts $\phi_1, \ldots, \phi_{k-1}$ where $\phi_i$ has associated weight vector $\mathbf{c}_i = [c_{i1} \ldots c_{ik}]'$; the numerical value of a contrast in the population is thus $\phi_i = \mathbf{c}_i'\boldsymbol{\mu}$. An infinite number of such possible orthogonal contrast sets exist, all of which are simply $k - 1$ dimensional rotations of each other. As proven by Scheffé (1953), such a contrast set serves as an orthogonal basis spanning the $k - 1$ dimensional contrast space; that is, any original research contrast $\psi$ with weight vector $\mathbf{a}$ may have its weights expressed as a linear combination of the basis contrast vectors: $\mathbf{a} = \sum_{i=1}^{k-1} b_i \mathbf{c}_i$, where $b_i$ are the required basis weights. The numerical value of contrast $\psi$ in the population is thus a linear function of the values of the basis contrasts: $\psi = \sum_{i=1}^{k-1} b_i \phi_i$. Under the omnibus null hypothesis and standard conditions, the distribution of vectors $[\hat{\phi}_1 \ldots \hat{\phi}_{k-1}]'$ containing numerical values of contrast estimates is multivariate normal $N(\mathbf{0}, \boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}$ is a $k - 1$ dimensional diagonal matrix with elements $\sigma^2 \sum_{j=1}^{k} (c_{ij}^2/n_j)$ for $i = 1, \ldots, k - 1$. As originally described, Scheffé's $S_1$ method for exacting $\alpha$-level experimentwise Type I error rate control over all possible contrasts $\psi$ in this space is tantamount to constructing a $k - 1$ dimensional spheroid within which $100(1 - \alpha)\%$ of the multivariate distribution is captured.

The rejection of the omnibus null hypothesis is often taken to imply that at least one of the $k$ population means differs from the others, or, more generally, that at least one contrast $\psi$ in the infinite set of possible contrasts has a false null hypothesis. Similarly, a rejection of the omnibus null hypothesis implies that at least one contrast $\phi_i$ in the basis set has a false null hypothesis; the actual number of false basis contrast null hypotheses will, in general, vary with the choice of orthogonal basis set. For our purposes, basis contrasts will be chosen such that the source of all false contrast null hypotheses is localized on a single basis vector. Let the first basis contrast $\phi_1$ have contrast weights equal to the $k$ population means' deviations from the grand population mean $\mu_{..}$ (where $\mu_{..} = \sum_{j=1}^{k} \mu_j / k$); i.e., the associated weight vector for $\phi_1$ is $\mathbf{c}_1 = [(\mu_1 - \mu_{..}) \ldots (\mu_k - \mu_{..})]'$. Thus, this basis contrast's numerical value in the population is $\phi_1 = (\mu_1 - \mu_{..})\mu_1 + \ldots + (\mu_k - \mu_{..})\mu_k$. Now let contrast $\phi_i$ with weight vector $\mathbf{c}_i$ ($i = 2, \ldots, k - 1$) be any of the remaining contrasts in the basis set that are mutually orthogonal as well as orthogonal to $\phi_1$. By virtue of orthogonality with $\phi_1$, $\mathbf{c}_i'\mathbf{c}_1 = 0$; i.e., $c_{i1}(\mu_1 - \mu_{..}) + \ldots + c_{ik}(\mu_k - \mu_{..}) = 0$. Rearranging terms, $(c_{i1}\mu_1 + \ldots + c_{ik}\mu_k) - (c_{i1} + \ldots + c_{ik})\mu_{..} = 0$.
Because $(c_{i1} + \ldots + c_{ik}) = 0$ for any contrast, this implies that $(c_{i1}\mu_1 + \ldots + c_{ik}\mu_k)$ is necessarily 0. That is, $\mathbf{c}_i'\boldsymbol{\mu} = 0$, which means that any contrast $\phi_i$ orthogonal to the choice of $\phi_1$ specified above must necessarily have a true null hypothesis.

Recall that the population value of any contrast $\psi$ of substantive interest to a researcher is $\psi = b_1\phi_1 + b_2\phi_2 + \ldots + b_{k-1}\phi_{k-1}$. Given that under a false omnibus null hypothesis basis vectors may be chosen such that the numerical values of the contrasts in the population are $\phi_1 \neq 0$ and $\phi_i = 0$ for $i = 2, \ldots, k - 1$, the expression for $\psi$ simplifies to $\psi = b_1\phi_1$. This means that any contrast $\psi$ whose weight vector $\mathbf{a}$ requires $\mathbf{c}_1$ as part of the linear combination of basis vectors (i.e., requires $b_1 \neq 0$) must represent a false null hypothesis; as such, it does not require Type I error control. On the other hand, all other contrasts with weight vector $\mathbf{a}$ not requiring $\mathbf{c}_1$ as part of the linear combination of basis vectors (i.e., having $b_1 = 0$) constitute a $k - 2$ dimensional space spanned by the remaining basis vectors $\phi_2, \ldots, \phi_{k-1}$. Because $\phi_i = 0$ for $i = 2, \ldots, k - 1$, all such contrasts must have true null hypotheses. In this situation, exacting $\alpha$-level experimentwise Type I error rate control over all possible contrasts with true null hypotheses is symbolized by the construction of a $k - 2$ dimensional cylinder extending above and below the true contrast null hypothesis hyperplane into the false null hypothesis dimension such that $100(1 - \alpha)\%$ of the multivariate distribution is captured within.
Equivalently, one may imagine the perpendicular projection of the $k - 1$ dimensional multivariate distribution into the $k - 2$ true contrast null hypothesis dimensions $\phi_2, \ldots, \phi_{k-1}$, followed by the construction of a $k - 2$ dimensional spheroid within which $100(1 - \alpha)\%$ of the multivariate distribution is captured. This is Scheffé's (1970) $S_2$ procedure, using a critical value based on $k - 2$ between-group degrees of freedom following a rejected omnibus null hypothesis.

Conclusions

In practice, rejection of the omnibus null hypothesis over all $k$ group means in the first step of testing either represents a Type I error or the detection of a nonnull contrast dimension. Under null conditions, the omnibus F test ensures an error rate of $\alpha$. If a Type I error has been made in the omnibus test, the experiment is already tainted by error and no subsequent contrast null hypothesis rejections can be interpreted meaningfully given the flawed nature of the experiment resulting from the incorrectly rejected omnibus null hypothesis. Conversely, if the omnibus null hypothesis is false, Type I error control in that first step is moot; instead, the second step's error control over all possible contrasts in the $k - 2$ true contrast null hypothesis dimensions is facilitated by Scheffé's $S_2$ critical value for $k - 2$ between-group degrees of freedom. The improvement offered by this F-protected procedure over the traditional $S_1$ method is reflected in critical $t$ values ($S_2$ versus $S_1$, respectively) with, for example, 60 within-groups degrees of freedom: 2.000 versus 2.510 for contrasts involving $k = 3$ groups, 2.510 versus 2.876 for contrasts involving $k = 4$ groups, 2.876 versus 3.178 for contrasts involving $k = 5$ groups, and 3.178 versus 3.441 for contrasts involving $k = 6$ groups. Four points about this more powerful method are worth noting.
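Before turning to those points, the critical values just listed, and the two-step decision rule they belong to, can be checked numerically. What follows is a minimal Python sketch and not part of the original article. The function names (`f_cdf`, `f_quantile`, `scheffe_critical_t`, `s2_contrast_test`) are illustrative assumptions, and the F quantile is obtained by a standard continued-fraction evaluation of the incomplete beta function followed by bisection, so that only the standard library is needed. The default $\alpha = .05$ is also an assumption: the article does not state the level used for its example values, although they appear consistent with .05.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even- and odd-indexed coefficients of the continued fraction.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def f_cdf(f, df1, df2):
    """P(F <= f) for an F(df1, df2) variable via the regularized incomplete beta."""
    if f <= 0.0:
        return 0.0
    a, b = df1 / 2.0, df2 / 2.0
    x = df1 * f / (df1 * f + df2)
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_quantile(p, df1, df2):
    """Quantile of F(df1, df2) by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scheffe_critical_t(k, df_within, alpha=0.05, protected=False):
    """Critical |t| for S1 (k - 1 df) or, when protected, S2 (k - 2 df)."""
    df_between = k - 2 if protected else k - 1
    if df_between < 1:
        raise ValueError("need at least one between-group degree of freedom")
    return math.sqrt(df_between * f_quantile(1.0 - alpha, df_between, df_within))


def s2_contrast_test(means, ns, ms_within, weights, alpha=0.05):
    """Two-step S2 test; returns (omnibus_rejected, t, contrast_rejected)."""
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    k, n_total = len(means), sum(ns)
    df_within = n_total - k
    # Step 1: omnibus F test over all k sample means.
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ms_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / (k - 1)
    omnibus = ms_between / ms_within > f_quantile(1.0 - alpha, k - 1, df_within)
    # Step 2: contrast t statistic against the k - 2 df critical value.
    psi_hat = sum(a * m for a, m in zip(weights, means))
    se = math.sqrt(ms_within * sum(a * a / n for a, n in zip(weights, ns)))
    t = psi_hat / se
    critical = scheffe_critical_t(k, df_within, alpha, protected=True)
    return omnibus, t, omnibus and abs(t) > critical


# Compare S2 and S1 critical t values with 60 within-groups degrees of freedom.
for k in (3, 4, 5, 6):
    s2 = scheffe_critical_t(k, 60, protected=True)
    s1 = scheffe_critical_t(k, 60)
    print(f"k = {k}: S2 = {s2:.3f}, S1 = {s1:.3f}")
```

The printed values should agree with those reported above to about three decimal places. As a usage illustration with purely hypothetical summary data, `s2_contrast_test([10, 12, 20], [21, 21, 21], 25.0, [0.5, 0.5, -1.0])` takes three group means, equal group sizes of 21, a within-groups mean square of 25, and a contrast comparing the third group with the average of the first two; the contrast is declared significant only if the omnibus test in Step 1 has first been rejected.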
First, rather than conducting an omnibus test in the initial testing step, one could use the customary $S_1$ critical value to conduct the contrast $\hat{\phi}_1 = \mathbf{a}'\mathbf{m}$, where $\mathbf{a}' = [(\bar{Y}_1 - \bar{Y}_{..}) \ldots (\bar{Y}_k - \bar{Y}_{..})]$ and $\bar{Y}_{..}$ is the grand mean of scores across all $k$ groups. This provides a direct test of the specially chosen basis contrast $\phi_1$, which contains all between-group variability. The omnibus test was recommended as the first step in the current paper only because it is already computed in the course of a typical analysis of variance. Second, the increased power afforded by the two-step method proposed does not come without a price. While the experimentwise Type I error rate is indeed maintained at the desired $\alpha$ level, the error rate per experiment (Ryan, 1959) necessarily increases. That is, although the probability of one or more contrast Type I errors within an experiment (experimentwise error rate) is controlled, the overall count of the number of contrast Type I errors expected within a given experiment (per experiment error rate) will increase with this more liberal contrast testing strategy. (For a discussion of this issue as pertains to traditional multiple comparison procedures, see Klockars and Hancock, 1994.) However, if a researcher subscribes to the common belief that multiple errors within an experiment are just as detrimental as a single unspecified error, then the F-protected $S_2$ procedure can represent a welcome increase in statistical power for exploratory post hoc contrast testing. Third, the $S_2$ procedure has the potential to be merged with another modification of the $S_1$ procedure (Klockars & Hancock, 1998) in which a more liberal critical value is derived for testing exploratory contrasts or comparisons from a finite and theoretically useful subset of all possible post hoc tests.
This combined method should yield even greater power than either modification alone. And finally, we reiterate our belief that the $S_2$ method's absence of proper confidence intervals need not impede practitioners from implementing the method in order to obtain more powerful tests of exploratory post hoc contrasts or comparisons; computation of the problematic intervals must simply be avoided.

References

Hancock, G. R., & Klockars, A. J. (1996). The quest for $\alpha$: Developments in multiple comparison procedures in the quarter century since Games (1971). Review of Educational Research, 66, 269-306.
Hancock, G. R., & Klockars, A. J. (1997). Finite Intersection Tests: A paradigm for optimizing simultaneous and sequential inference. Journal of Educational and Behavioral Statistics, 22, 291-307.
Hayter, A. J. (1986). The maximum familywise error rate of Fisher's least significant difference test. Journal of the American Statistical Association, 81, 1000-1004.
Klockars, A. J., & Hancock, G. R. (1992). Power of recent multiple comparison procedures as applied to a complete set of planned orthogonal contrasts. Psychological Bulletin, 111, 505-510.
Klockars, A. J., & Hancock, G. R. (1994). Per experiment error rates: The hidden costs of several multiple comparison procedures. Educational and Psychological Measurement, 54, 292-298.
Klockars, A. J., & Hancock, G. R. (1998). A more powerful post hoc multiple comparison procedure in analysis of variance. Journal of Educational and Behavioral Statistics, 23, 279-289.
Ryan, T. A. (1959). Multiple comparisons in psychological research. Psychological Bulletin, 56, 26-47.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika, 40, 87-104.
Scheffé, H. (1970). Multiple testing versus multiple estimation. Improper confidence sets. Estimation of directions and ratios. The Annals of Mathematical Statistics, 41, 1-29.
Shaffer, J. P. (1979). Comparison of means: An F test followed by a multiple range procedure. Journal of Educational Statistics, 4, 14-23.
Shaffer, J. P. (1986). Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.
Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46, 561-584.

Authors

ALAN J. KLOCKARS is Professor, Area of Educational Psychology, College of Education, University of Washington, Seattle, WA 98195-3600; [email protected]. He specializes in experimental design and multiple comparisons.

GREGORY R. HANCOCK is Associate Professor, Department of Measurement, Statistics and Evaluation, University of Maryland, College Park, MD 20742-1115; [email protected]. He specializes in structural equation modeling and multiple comparisons.