Journal of Educational and Behavioral Statistics
http://jebs.aera.net
Scheffé's More Powerful F-Protected Post Hoc Procedure
Alan J. Klockars and Gregory R. Hancock
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS 2000 25: 13
DOI: 10.3102/10769986025001013
The online version of this article can be found at:
http://jeb.sagepub.com/content/25/1/13
Published on behalf of
American Educational Research Association
and
http://www.sagepublications.com
Downloaded from http://jebs.aera.net at UNIV OF MARYLAND on September 16, 2011
Journal of Educational and Behavioral Statistics
Spring 2000, Vol. 25, No. 1, pp. 13-19
Scheffé's More Powerful F-Protected Post Hoc Procedure
Alan J. Klockars
University of Washington
Gregory R. Hancock
University of Maryland, College Park
Keywords: experimentwise error rate; multiple comparisons; Scheffé; simultaneous inference
In 1970 Henry Scheffé proposed a more powerful version of his well-known
post hoc multiple comparison procedure, only to fail to recommend it by the
paper's end. The point of the current paper is to bring this simple modification
to a wider audience, complete with an original derivation, in hopes that the
method will be embraced by researchers despite its creator's hesitations. Specifically, whereas Scheffé's original (1953) procedure advocates testing any
exploratory post hoc contrast or comparison using a critical value assuming
k - 1 between-group degrees of freedom, Scheffé's later modification (1970)
will be demonstrated here, showing that a more liberal critical value assuming
k - 2 between-group degrees of freedom may be used if an omnibus null
hypothesis across all means has been rejected.
If a researcher has a planned set of theoretically meaningful tests involving k
sample means, such as pairwise comparisons or specific complex contrasts,
methods exist that are tailored to such testing scenarios so as to optimize
statistical power (Hancock & Klockars, 1996, 1997; Shaffer, 1995). These
methods, while maintaining strong Type I error control, are made powerful by
combinations of sharpened critical values, sequential testing strategies, and
integration of the logical structure of the tests in a given family. The last two
points include accommodating the logical implications of a rejected preliminary
omnibus test across all between-group variability, as proposed, for example,
within tests of pairwise comparisons (Hayter, 1986; Shaffer, 1979, 1986) or
orthogonal contrasts (Klockars & Hancock, 1992; Shaffer, 1986).
On the other hand, when one decides to conduct comparisons or contrasts
after an inspection of results (i.e., exploratorily), a more stringent approach than
those alluded to above, most commonly that of Scheffé (1953), is customarily
applied. As Scheffé pointed out subsequently (Scheffé, 1970), his original
method could be modified by incorporating an initial omnibus null hypothesis
across all k means. That is, given that the researcher has already rejected an
omnibus null hypothesis, the logical consequences of that rejection may be
accommodated by making the post hoc critical value more liberal. Interestingly,
Scheffé ultimately did not advocate this improvement because it would precipitate improper confidence intervals for estimating simple or complex effects; that
is, probabilistic values associated with said intervals would be conditional upon
rejection of the preliminary omnibus null hypothesis (see Scheffé, 1970, for
details). Perhaps for this reason, as well as the original paper's unfamiliarity to
social scientists, his modified method has not infiltrated educational and behavioral statistics texts. However, the lack of proper confidence intervals has little
hindered the development and dissemination of sequential and F-protected
testing strategies within other multiple comparison scenarios (Hancock &
Klockars, 1996; Shaffer, 1995), nor do we believe it should keep Scheffé's
modification from being embraced by researchers seeking to improve statistical
power when testing exploratory post hoc contrasts or comparisons. For this
reason we offer the current paper to explain and recommend Scheffé's simple
F-protected post hoc procedure.
We wish to thank Juliet P. Shaffer for bringing Scheffé's 1970 paper to our attention.
Scheffé's $S_2$ Procedure
The consonance between an $\alpha$-level omnibus F-test across k sample means
and Scheffé's (1953) post hoc S procedure (hereafter referred to as the $S_1$
procedure) is well known. Under conditions of normality and homogeneity of
variance ($\sigma_j^2 = \sigma^2$ for $j = 1, \ldots, k$), a rejected omnibus null hypothesis across all
k groups (i.e., rejecting $\mu_j = \mu$ for $j = 1, \ldots, k$) implies that $S_1$ will yield the
rejection of at least one contrast null hypothesis (although such contrasts may
not be of theoretical interest). Given a retained omnibus null hypothesis, on the
other hand, $S_1$ will be unable to lead to the rejection of any contrast null
hypothesis involving any of the k means.
Though often recommended, statistical significance in the omnibus F-test is
technically not required in order to maintain $\alpha$-level experimentwise (familywise) Type I error control while conducting post hoc contrasts. Scheffé's $S_1$
procedure facilitates this control without the omnibus test prerequisite. However,
a rejection of the omnibus null hypothesis does have logical implications about
the verity of contrast null hypotheses within the infinite set over which $S_1$ exacts
control, implications that Scheffé (1970) noted could be used to modify his $S_1$
procedure, giving it more power with no additional computational complexity.
This F-protected modification, referred to as Scheffé's $S_2$ procedure, is as
follows.
A sample contrast $\hat{\Psi}$ may be formed as the linear combination of sample
means $a'm$, where m is the k x 1 sample mean vector and a' is the 1 x k
transpose of a k x 1 weight vector containing elements $a_1, \ldots, a_k$ that sum to 0.
The sample contrast $\hat{\Psi}$ estimates $\Psi = a'\mu$, where $\mu$ is the k x 1 population mean
vector, and $\hat{\Psi} \sim N[0, \sigma^2 \sum_{j=1}^{k} (a_j^2/n_j)]$ for k random samples of size $n_j$ ($j = 1, \ldots, k$)
drawn from populations with equal means and meeting normality and variance
homogeneity assumptions.
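The quantities just defined can be illustrated with a minimal Python sketch. The function names and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def sample_contrast(means, weights):
    """Return the sample contrast a'm for group means m and weights a."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to 0")
    return sum(a * m for a, m in zip(weights, means))


def contrast_variance(weights, ns, sigma2):
    """Return sigma^2 * sum(a_j^2 / n_j), the sampling variance of a'm."""
    return sigma2 * sum(a * a / n for a, n in zip(weights, ns))


# Hypothetical illustration: k = 3 groups, comparing group 1 with the
# average of groups 2 and 3, with an assumed common variance of 16.
means = [10.0, 12.0, 17.0]
weights = [1.0, -0.5, -0.5]
ns = [20, 20, 20]

psi_hat = sample_contrast(means, weights)
var_psi = contrast_variance(weights, ns, sigma2=16.0)
print(psi_hat, var_psi)
```

In practice the population variance is unknown, and the pooled within-groups mean square takes its place when a contrast is tested.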
For testing any member of a set of m ($1 \le m \le \infty$) exploratory post hoc
contrasts (i.e., contrasts unplanned prior to inspection of data or of summary
statistics such as sample means), Scheffé (1970) offered the following protected
method:
1) Conduct an $\alpha$-level omnibus F test over all k sample means. If the
observed test statistic does not exceed the critical value ${}_{1-\alpha}F_{k-1,N-k}$,
where $N = \sum_{j=1}^{k} n_j$, no contrast tests are conducted. If (and only if) the
critical value is exceeded, proceed to Step 2.
2) Given the rejection of the omnibus null hypothesis in Step 1, any contrast
with test statistic of the form
$t_{\hat{\Psi}} = \hat{\Psi} / [MS_W \sum_{j=1}^{k} (a_j^2/n_j)]^{1/2}$ (where
$MS_W$ is the within-groups pooled sample variance estimate) should have its
associated contrast null hypothesis rejected if
$|t_{\hat{\Psi}}| > [(k-2)\,{}_{1-\alpha}F_{k-2,N-k}]^{1/2}$.
Note that the critical value in Step 2 is similar to that used in the traditional $S_1$
procedure, but assumes k - 2 between-group degrees of freedom rather than
k - 1. The logical implications of a rejected omnibus null hypothesis in Step 1
allow for this more liberal critical value to be used in testing contrasts while still
preserving the experimentwise Type I error rate at the desired nominal $\alpha$ level.
This is derived in the next section.
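The two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' software: the function name `scheffe_s2` is hypothetical, the group means and within-groups mean square are invented, and the two critical F values are supplied by the caller. Here they are backed out of the critical t values the paper reports for 60 within-groups degrees of freedom.

```python
import math


def scheffe_s2(means, ns, ms_within, weights, f_crit_omnibus, f_crit_contrast):
    """Two-step F-protected test of one post hoc contrast.

    f_crit_omnibus  : (1 - alpha) quantile of F with (k - 1, N - k) df
    f_crit_contrast : (1 - alpha) quantile of F with (k - 2, N - k) df
    Both quantiles are supplied by the caller (an assumption of this sketch).
    Returns (omnibus_F, t_statistic_or_None, reject_contrast).
    """
    k = len(means)
    if k < 3:
        raise ValueError("the k - 2 critical value needs at least 3 groups")
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to 0")
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ms_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / (k - 1)
    omnibus_f = ms_between / ms_within
    # Step 1: stop unless the omnibus null hypothesis is rejected.
    if omnibus_f <= f_crit_omnibus:
        return omnibus_f, None, False
    # Step 2: compare |t| with [(k - 2) * F_crit]^(1/2).
    psi_hat = sum(a * m for a, m in zip(weights, means))
    se = math.sqrt(ms_within * sum(a * a / n for a, n in zip(weights, ns)))
    t_stat = psi_hat / se
    critical_t = math.sqrt((k - 2) * f_crit_contrast)
    return omnibus_f, t_stat, abs(t_stat) > critical_t


# Hypothetical data: k = 4 groups of 16, so N - k = 60.
f_omnibus_crit = 2.876 ** 2 / 3    # implied .95 quantile of F(3, 60)
f_contrast_crit = 2.510 ** 2 / 2   # implied .95 quantile of F(2, 60)
omnibus_f, t_stat, reject = scheffe_s2(
    means=[10.0, 11.0, 12.0, 15.0], ns=[16, 16, 16, 16], ms_within=16.0,
    weights=[0.0, -1.0, 0.0, 1.0],
    f_crit_omnibus=f_omnibus_crit, f_crit_contrast=f_contrast_crit)
print(omnibus_f, t_stat, reject)
```

With these invented numbers the contrast statistic falls between the k - 2 and k - 1 critical values, so the protected procedure rejects a contrast null hypothesis that the traditional critical value would retain.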
Development
An infinite number of possible post hoc contrasts over k sample means exist
of the form $\hat{\Psi} = a'm$ (where a is such that $1'a = 0$). Under the omnibus null
hypothesis these contrasts constitute a k - 1 dimensional space centered at the
origin 0. One may define any set of k - 1 orthogonal contrasts $\phi_1, \ldots, \phi_{k-1}$
where $\phi_i$ has associated weight vector $c_i = [c_{i1} \ldots c_{ik}]'$; the numerical value of a
contrast in the population is thus $\phi_i = c_i'\mu$. An infinite number of such possible
orthogonal contrast sets exist, all of which are simply k - 1 dimensional rotations of each other. As proven by Scheffé (1953), such a contrast set serves as an
orthogonal basis spanning the k - 1 dimensional contrast space; that is, any
original research contrast $\Psi$ with weight vector a may have its weights expressed
as a linear combination of the basis contrast vectors: $a = \sum_{i=1}^{k-1} b_i c_i$, where
$b_i$ are the required basis weights. The numerical value of contrast $\Psi$ in the
population is thus a linear function of the values of the basis contrasts:
$\Psi = \sum_{i=1}^{k-1} b_i \phi_i$. Under the omnibus null hypothesis and standard conditions, the distribution of vectors $[\hat{\phi}_1 \ldots \hat{\phi}_{k-1}]'$ containing numerical values of contrast estimates
is multivariate normal $N(0, \Sigma)$, where $\Sigma$ is a k - 1 dimensional diagonal matrix
with elements $\sigma^2 \sum_{j=1}^{k} (c_{ij}^2/n_j)$ for $i = 1, \ldots, k - 1$. As originally described,
Scheffé's $S_1$ method for exacting $\alpha$-level experimentwise Type I error rate
control over all possible contrasts $\Psi$ in this space is tantamount to constructing a
k - 1 dimensional spheroid within which $100(1 - \alpha)\%$ of the multivariate
distribution is captured.
The rejection of the omnibus null hypothesis is often taken to imply that at
least one of the k population means differs from the others, or, more generally,
that at least one contrast $\Psi$ in the infinite set of possible contrasts has a false null
hypothesis. Similarly, a rejection of the omnibus null hypothesis implies that at
least one contrast $\phi_i$ in the basis set has a false null hypothesis; the actual
number of false basis contrast null hypotheses will, in general, vary with the
choice of orthogonal basis set. For our purposes, basis contrasts will be chosen
such that the source of all false contrast null hypotheses is localized on a single
basis vector. Let the first basis contrast $\phi_1$ have contrast weights equal to the k
population means' deviations from the grand population mean $\mu_{..}$ (where
$\mu_{..} = \sum_{j=1}^{k} \mu_j / k$); i.e., the associated weight vector for $\phi_1$ is
$c_1 = [(\mu_1 - \mu_{..}) \ldots (\mu_k - \mu_{..})]'$. Thus, this basis contrast's numerical value in the population is
$\phi_1 = (\mu_1 - \mu_{..})\mu_1 + \ldots + (\mu_k - \mu_{..})\mu_k$. Now let contrast $\phi_i$ with weight vector
$c_i$ ($i = 2, \ldots, k - 1$) be any of the remaining contrasts in the basis set that are
mutually orthogonal as well as orthogonal to $\phi_1$. By virtue of orthogonality with
$\phi_1$, $c_i'c_1 = 0$; i.e., $c_{i1}(\mu_1 - \mu_{..}) + \ldots + c_{ik}(\mu_k - \mu_{..}) = 0$. Rearranging terms,
$(c_{i1}\mu_1 + \ldots + c_{ik}\mu_k) - (c_{i1} + \ldots + c_{ik})\mu_{..} = 0$. Because $(c_{i1} + \ldots + c_{ik}) = 0$ for any
contrast, this implies that $(c_{i1}\mu_1 + \ldots + c_{ik}\mu_k)$ is necessarily 0. That is, $c_i'\mu = 0$,
which means that any contrast $\phi_i$ orthogonal to the choice of $\phi_1$ specified above
must necessarily have a true null hypothesis.
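The orthogonality argument can be checked numerically. The sketch below is a hedged illustration: the population means are invented, the helper names are hypothetical, and orthogonality is taken as the ordinary dot product of weight vectors, as in the derivation above.

```python
def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def contrast_orthogonal_to(seed, c1):
    """Turn an arbitrary seed vector into a contrast orthogonal to c1."""
    mean_seed = sum(seed) / len(seed)
    centered = [s - mean_seed for s in seed]        # weights now sum to 0
    shrink = dot(centered, c1) / dot(c1, c1)
    # Remove the c1 component; c1 sums to 0, so the result still sums to 0.
    return [c - shrink * w for c, w in zip(centered, c1)]


# Hypothetical population means for k = 4 groups (not from the paper).
mu = [2.0, 5.0, 11.0, 6.0]
grand = sum(mu) / len(mu)
c1 = [m - grand for m in mu]      # deviations from the grand mean
phi_1 = dot(c1, mu)               # population value of the first basis contrast

c2 = contrast_orthogonal_to([1.0, 0.0, 0.0, 0.0], c1)
phi_2 = dot(c2, mu)               # population value of a contrast orthogonal to c1
print(phi_1, phi_2)
```

Any other seed vector should give the same qualitative result: the first basis contrast carries all of the between-group differences, and every contrast orthogonal to it has a population value of zero up to rounding error.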
Recall that the population value of any contrast $\Psi$ of substantive interest to a
researcher is $\Psi = b_1\phi_1 + b_2\phi_2 + \ldots + b_{k-1}\phi_{k-1}$. Given that under a false omnibus null hypothesis basis vectors may be chosen such that the numerical values of
the contrasts in the population are $\phi_1 \ne 0$ and $\phi_i = 0$ for $i = 2, \ldots, k - 1$, then the
expression for $\Psi$ simplifies to $\Psi = b_1\phi_1$. This means that any contrast $\Psi$ whose
weight vector a requires $c_1$ as part of the linear combination of basis vectors
(i.e., requires $b_1 \ne 0$) must represent a false null hypothesis; as such, it does not
require Type I error control. On the other hand, all other contrasts with weight
vector a not requiring $c_1$ as part of the linear combination of basis vectors (i.e.,
having $b_1 = 0$) constitute a k - 2 dimensional space spanned by the remaining
basis vectors $\phi_2, \ldots, \phi_{k-1}$. Because $\phi_i = 0$ for $i = 2, \ldots, k - 1$, all such contrasts
must have true null hypotheses. In this situation, exacting $\alpha$-level experimentwise Type I error rate control over all possible contrasts with true null hypotheses is symbolized by the construction of a k - 2 dimensional cylinder extending above and below the true contrast null hypothesis hyperplane into the false
null hypothesis dimension such that $100(1 - \alpha)\%$ of the multivariate distribution is captured within. Equivalently, one may imagine the perpendicular projection
of the k - 1 dimensional multivariate distribution into the k - 2 true contrast null hypothesis dimensions $\phi_2, \ldots, \phi_{k-1}$, followed by the construction of a
k - 2 dimensional spheroid within which $100(1 - \alpha)\%$ of the multivariate distribution is captured. This is Scheffé's (1970) $S_2$ procedure, using a critical value
based on k - 2 between-group degrees of freedom following a rejected omnibus
null hypothesis.
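One compact way to state the projection argument, offered as a sketch rather than as the authors' own derivation, uses the standard result that the largest squared contrast statistic over a q-dimensional space of true-null contrasts is distributed as q times an F variate. The symbols $L_0$ (the k - 2 dimensional set of weight vectors a with $a'\mu = 0$) and $SS_{L_0}$ (the between-group sum of squares projected onto that set) are notation assumed for this illustration only:

```latex
\max_{a \in L_0} t_{\hat{\Psi}}^{2}
  = \frac{SS_{L_0}}{MS_W}
  \sim (k-2)\, F_{k-2,\,N-k},
\qquad\text{so that}\qquad
P\!\left( \max_{a \in L_0} \lvert t_{\hat{\Psi}} \rvert
  > \bigl[ (k-2)\; {}_{1-\alpha}F_{k-2,\,N-k} \bigr]^{1/2} \right) = \alpha .
```

Because contrasts are tested only after the omnibus null hypothesis has been rejected, the probability of rejecting the omnibus null hypothesis and then falsely rejecting at least one true-null contrast cannot exceed this $\alpha$.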
Conclusions
In practice, rejection of the omnibus null hypothesis over all k group means in
the first step of testing represents either a Type I error or the detection of a
nonnull contrast dimension. Under null conditions, the omnibus F test ensures
an error rate of $\alpha$. If a Type I error has been made in the omnibus test, the
experiment is already tainted by error and no subsequent contrast null hypothesis rejections can be interpreted meaningfully given the flawed nature of the
experiment resulting from the incorrectly rejected omnibus null hypothesis.
Conversely, if the omnibus null hypothesis is false, Type I error control in that
first step is moot; instead, the second step's error control over all possible
contrasts in the k - 2 true contrast null hypothesis dimensions is facilitated by
Scheffé's $S_2$ critical value for k - 2 between-group degrees of freedom. The
improvement offered by this F-protected procedure over the traditional $S_1$
method is reflected in critical t values with, for example, 60 within-groups
degrees of freedom: 2.000 versus 2.510 for contrasts involving k = 3 groups,
2.510 versus 2.876 for contrasts involving k = 4 groups, 2.876 versus 3.178 for
contrasts involving k = 5 groups, and 3.178 versus 3.441 for contrasts involving
k = 6 groups.
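Critical values of this kind can be recomputed with a short Python sketch that uses only the standard library. The F quantile below is obtained by bisection on a textbook continued-fraction evaluation of the regularized incomplete beta function; all function names are hypothetical, and a statistical package's F quantile function could be substituted.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x, dfn, dfd):
    """Cumulative distribution function of F with (dfn, dfd) df."""
    return betainc_reg(dfn / 2.0, dfd / 2.0, dfn * x / (dfn * x + dfd))


def f_quantile(p, dfn, dfd):
    """Quantile of F with (dfn, dfd) df, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, dfn, dfd) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, dfn, dfd) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scheffe_critical_t(k, df_within, alpha=0.05, protected=False):
    """Critical |t|: k - 2 between-group df if protected, else k - 1."""
    df_between = k - 2 if protected else k - 1
    return math.sqrt(df_between * f_quantile(1.0 - alpha, df_between, df_within))


for k in (3, 4, 5, 6):
    s2 = scheffe_critical_t(k, 60, protected=True)
    s1 = scheffe_critical_t(k, 60)
    print(f"k = {k}: S2 = {s2:.3f}, S1 = {s1:.3f}")
```

The printed values can be compared with those quoted above; small differences in the third decimal place may arise from rounding.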
Four points about this more powerful method are worth noting. First, rather
than conducting an omnibus test in the initial testing step, one could use the
customary $S_1$ critical value to conduct the contrast $\hat{\phi}_1 = a'm$, where
$a' = [(\bar{Y}_1 - \bar{Y}_{..}) \ldots (\bar{Y}_k - \bar{Y}_{..})]$ and $\bar{Y}_{..}$ is the grand mean of scores across all k groups. This
provides a direct test of the specially chosen basis contrast $\phi_1$, which contains
all between-group variability. The omnibus test was recommended as the first
step in the current paper only because it is already computed in the course of a
typical analysis of variance. Second, the increased power afforded by the two-step
method proposed does not come without a price. While the experimentwise
Type I error rate is indeed maintained at the desired $\alpha$ level, the error rate per
experiment (Ryan, 1959) necessarily increases. That is, although the probability
of one or more contrast Type I errors within an experiment (experimentwise
error rate) is controlled, the overall count of the number of contrast Type I errors
expected within a given experiment (per experiment error rate) will increase
with this more liberal contrast testing strategy. (For a discussion of this issue as
pertains to traditional multiple comparison procedures, see Klockars and Hancock, 1994.) However, if a researcher subscribes to the common belief that
multiple errors within an experiment are just as detrimental as a single unspecified error, then the F-protected $S_2$ procedure can represent a welcome increase in
statistical power for exploratory post hoc contrast testing. Third, the $S_2$ procedure has the potential to be merged with another modification of the $S_1$ procedure (Klockars & Hancock, 1998) in which a more liberal critical value is
derived for testing exploratory contrasts or comparisons from a finite and
theoretically useful subset of all possible post hoc tests. This combined method
should yield even greater power than either modification alone. And finally, we
reiterate our belief that the $S_2$ method's absence of proper confidence intervals
need not impede practitioners from implementing the method in order to obtain
more powerful tests of exploratory post hoc contrasts or comparisons; computation of the problematic intervals must simply be avoided.
References
Hancock, G. R., & Klockars, A. J. (1996). The quest for α: Developments in multiple
comparison procedures in the quarter century since Games (1971). Review of Educational Research, 66, 269-306.
Hancock, G. R., & Klockars, A. J. (1997). Finite Intersection Tests: A paradigm for
optimizing simultaneous and sequential inference. Journal of Educational and Behavioral Statistics, 22, 291-307.
Hayter, A. J. (1986). The maximum familywise error rate of Fisher's least significant
difference test. Journal of the American Statistical Association, 81, 1000-1004.
Klockars, A. J., & Hancock, G. R. (1992). Power of recent multiple comparison procedures as applied to a complete set of planned orthogonal contrasts. Psychological
Bulletin, 111, 505-510.
Klockars, A. J., & Hancock, G. R. (1994). Per experiment error rates: The hidden costs of
several multiple comparison procedures. Educational and Psychological Measurement,
54, 292-298.
Klockars, A. J., & Hancock, G. R. (1998). A more powerful post hoc multiple comparison
procedure in analysis of variance. Journal of Educational and Behavioral Statistics,
23, 279-289.
Ryan, T. A. (1959). Multiple comparisons in psychological research. Psychological Bulletin, 56, 26-47.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance.
Biometrika, 40, 87-104.
Scheffé, H. (1970). Multiple testing versus multiple estimation. Improper confidence sets.
Estimation of directions and ratios. The Annals of Mathematical Statistics, 41, 1-29.
Shaffer, J. P. (1979). Comparison of means: An F test followed by a multiple range
procedure. Journal of Educational Statistics, 4, 14-23.
Shaffer, J. P. (1986). Modified sequentially rejective multiple test procedures. Journal of
the American Statistical Association, 81, 826-831.
Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46,
561-584.
Authors
ALAN J. KLOCKARS is Professor, Area of Educational Psychology, College of Education, University of Washington, Seattle, WA 98195-3600; [email protected].
He specializes in experimental design and multiple comparisons.
GREGORY R. HANCOCK is Associate Professor, Department of Measurement, Statistics and Evaluation, University of Maryland, College Park, MD 20742-1115;
[email protected]. He specializes in structural equation modeling and multiple
comparisons.