Significantly Insignificant F Tests

Ronald CHRISTENSEN

Ronald Christensen is Professor, Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131 (E-mail: [email protected]). I thank the associate editor and referees for many good suggestions on both content and presentation.

P values near 1 are sometimes viewed as unimportant. In fact, P values near 1 should raise red flags cautioning data analysts that something may be wrong with their model. This article examines reasons why F statistics might get small in general linear models. One-way and two-way analysis of variance models are used to illustrate the general ideas. The article focuses on the intuitive motivation behind F tests based on second moment arguments. In particular, it argues that when the mean structure of the model being tested is correct, small F statistics can be caused by not accounting for negatively correlated data or heteroscedasticity; alternatively, they can be caused by an unsuspected lack of fit. It is also demonstrated that large F statistics can be generated by not accounting for positively correlated data or heteroscedasticity.

KEY WORDS: Analysis of variance; Correlation; Heteroscedasticity; Homoscedasticity; Linear models; Regression; Robustness.

1. INTRODUCTION

This article looks at statistical reasons why F statistics might get small in general linear models. The discussion targets results for general models, using familiar models to illustrate the ideas. It focuses on the intuitive motivation behind F tests using second-moment arguments. In particular, we argue that when the mean structure of the model being tested is correct, small F statistics can be caused by not accounting for negatively correlated data or heteroscedasticity. Alternatively, small F statistics can be caused by an unsuspected lack of fit. We also demonstrate that large F statistics can be generated by not accounting for positively correlated data or heteroscedasticity even when the mean structure of the model being tested is correct.

Example: Schneider and Pruett (1994) discussed control chart data obtained "to monitor the production of injection molded bottles. Experience has shown that the outside diameter of the bottle is an appropriate measure of process performance." A molding machine makes four bottles simultaneously, so rational subgroups of size four were used. Data were presented for 20 groups. The one-way ANOVA test that the group means are all equal gives F = 0.45 with 19 and 60 degrees of freedom, for a P value of 0.970. The F statistic is suspiciously small. Standard residual plots look fine. Although the small F value suggests that something is wrong, it does not provide any specific suggestions as to the problem. In this example, the solution is easy to find. The four observations in each group were being made by different molding heads that were assumed to be performing identically. Cross-classifying the data and running a two-way ANOVA, the test for groups gives F = 1.34 while the test for heads gives F = 40.03. Note that if there had been substantial differences between the groups, we would not have had a significantly small F statistic in the one-way ANOVA to suggest the existence of a problem. Nonetheless, whenever we do see a significantly small F statistic, it should give us pause.
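The bottle data themselves are not reproduced here, but a small simulation can mimic the structure of the example: groups of four observations whose four positions (the molding heads) have different means. All settings below (head effects, error variance, random seed) are hypothetical choices of mine, not values from Schneider and Pruett (1994). Treating the head differences as error in a one-way ANOVA on groups inflates the mean squared error and drives F toward 0; a two-way ANOVA recovers the head effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups, heads, sigma = 20, 4, 1.0
head_effect = np.array([-3.0, -1.0, 1.0, 3.0])                  # hypothetical head means
y = head_effect + sigma * rng.standard_normal((groups, heads))  # no group effect at all

# One-way ANOVA on groups (rows): head differences are absorbed into "error."
grand = y.mean()
ss_groups = heads * ((y.mean(axis=1) - grand) ** 2).sum()
ss_error = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()
F = (ss_groups / (groups - 1)) / (ss_error / (groups * (heads - 1)))
p = stats.f.sf(F, groups - 1, groups * (heads - 1))
print(f"one-way F = {F:.2f}, P value = {p:.3f}")                # F well below 1, P near 1

# Two-way ANOVA (groups x heads, no interaction) separates out the head effect.
ss_heads = groups * ((y.mean(axis=0) - grand) ** 2).sum()
ss_resid = ss_error - ss_heads
df_resid = (groups - 1) * (heads - 1)
print("two-way F(groups) =", round((ss_groups / (groups - 1)) / (ss_resid / df_resid), 2))
print("two-way F(heads)  =", round((ss_heads / (heads - 1)) / (ss_resid / df_resid), 2))
```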
There exists a substantial literature on the robustness of t and F statistics. One noteworthy historical discussion is Scheffé (1959, chap. 10). Recent examples include Kiviet (1980), Benjamini (1983), Westfall (1988), Posten (1992), and Baksalary, Nurhonen, and Puntanen (1992). Robustness studies typically involve examining the effect of nonnormality, correlation, or heteroscedasticity on the Type I error rate of the test. For example, Scariano and Davenport (1987) examined the effect of correlation on the normal theory one-way ANOVA F test by considering a specific family of covariance matrices as an alternative to the usual homoscedastic, independence model. The argument here is similar but distinct. We argue that if the F statistic is unusual, in this case small, it suggests problems with the (null) model, and we examine models that can generate small F statistics. Corresponding Type I error rates do not look at how often F is small, but how infrequently it is large.

Section 2 examines first moment lack-of-fit issues. Section 3 provides simple inequalities involving the covariance matrix that determine the behavior of the F statistic. In particular, for a rank r full model y_i = x_i'β + ε_i and a rank r_0 reduced model y_i = x_{0i}'γ + ε_i, we will see that if the reduced model is true but

Σ_{i=1}^{n} var(x_i'β̂)/r < Σ_{i=1}^{n} var(y_i)/n   (1)

and

Σ_{i=1}^{n} var(x_i'β̂ − x_{0i}'γ̂)/(r − r_0) < Σ_{i=1}^{n} var(y_i)/n,   (2)

the F statistic for testing the models tends to get small. Conversely, if the inequalities are reversed, the F statistic gets large. The entire discussion is based on first and second moments. Normality is not assumed.

The presentation of testing in general linear models follows ideas and notation in Christensen (2002, sec. 3.2). In particular, for any matrix A, let r(A) denote the rank of A, C(A) denote the column space of A, and M_A denote the perpendicular projection operator (ppo) onto C(A), that is, M_A = A(A'A)^- A', where (A'A)^- denotes a generalized inverse. Additionally, for matrices X and X_0, denote their ppos M and M_0, respectively. C(X) ⊥ C(A) indicates that the column spaces are orthogonal. Write C(X)⊥ for the orthogonal complement of C(X) and, if C(X_0) ⊂ C(X), write C(X_0)⊥_C(X) for the orthogonal complement of C(X_0) with respect to C(X).

To derive a test, start by assuming that a full model

Y = Xβ + e,  E(e) = 0,  cov(e) = σ²I,

fits the data. Here Y is an n × 1 vector of observable random variables, X is an n × p matrix of fixed observed numbers, β is a p × 1 vector of unknown fixed parameters, and e is an n × 1 vector of unobservable random errors. We then test the adequacy of a reduced model Y = X_0 γ + e which has the same assumptions about e but in which C(X_0) ⊂ C(X). Based on second-moment arguments, the test statistic is a ratio of variance estimates. We construct an unbiased estimate of σ², namely Y'(I − M)Y/r(I − M), and another statistic Y'(M − M_0)Y/r(M − M_0) that has

E[Y'(M − M_0)Y/r(M − M_0)] = σ² + β'X'(M − M_0)Xβ/r(M − M_0).

Given the covariance structure, this second statistic is an unbiased estimate of σ² if and only if the reduced model is correct. The test statistic

F = [Y'(M − M_0)Y/r(M − M_0)] / [Y'(I − M)Y/r(I − M)]   (3)

is a (biased) estimate of

[σ² + β'X'(M − M_0)Xβ/r(M − M_0)]/σ² = 1 + β'X'(M − M_0)Xβ/[σ² r(M − M_0)].

Under the null hypothesis, F is an estimate of the number 1. Values of F much larger than 1 suggest that F is estimating something larger than 1, which suggests that β'X'(M − M_0)Xβ/[σ² r(M − M_0)] > 0, which occurs if and only if the reduced model is false.
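For readers who want to experiment, here is a minimal sketch of the statistic (3) computed directly from the projection operators. The function and variable names (ppo, f_stat) are mine, not notation from the article, and the generalized inverse is implemented with a pseudoinverse.

```python
import numpy as np

def ppo(X):
    """Perpendicular projection operator M_X = X (X'X)^- X' onto C(X)."""
    return X @ np.linalg.pinv(X.T @ X) @ X.T

def f_stat(y, X, X0):
    """F of (3) for testing the reduced model E(Y) = X0 gamma against E(Y) = X beta."""
    M, M0 = ppo(X), ppo(X0)
    n = len(y)
    r, r0 = np.linalg.matrix_rank(X), np.linalg.matrix_rank(X0)
    numerator = y @ (M - M0) @ y / (r - r0)          # r(M - M0) = r - r0 since C(X0) is in C(X)
    denominator = y @ (np.eye(n) - M) @ y / (n - r)  # r(I - M) = n - r
    return numerator / denominator
```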
The usual normality assumption leads to an exact central F distribution for the test statistic under the null hypothesis, so we are able to quantify how unusual it is to observe any F statistic greater than 1. If, when rejecting a test, you want to reasonably conclude that the alternative is true, you have to have a reasonable belief in the validity of all the assumptions of the model other than the null hypothesis. Thus, in a linear model, you need to check independence, homoscedasticity, and normality. There are standard methods based on residual analysis for checking homoscedasticity and normality. If the data are taken in time sequence, there are standard methods based on residual analysis for checking independence. More generally, Christensen and Bedrick (1997) discussed methods for testing the independence assumption. Ultimately, there is no substitute for thinking hard about the data and how they were collected. If all of the assumptions check out, then one can conclude that a large F statistic indicates that the full model is needed.

This article illustrates two things: (1) that if these assumptions are violated, there are perfectly plausible explanations for getting a large F statistic other than that the full model is more appropriate than the reduced model; and, more importantly, (2) that if you get a significantly small F statistic, something may be wrong with your assumptions, regardless of whether you performed standard diagnostic checks. In other words, large P values should not be ignored; they should be used as a diagnostic that suggests problems with the model.

2. LACK OF FIT AND SMALL F STATISTICS

In testing Y = Xβ + e against Y = X_0 γ + e, C(X_0) ⊂ C(X), large F statistics (3) are generally taken to indicate that the reduced model suffers from a lack of fit, that is, the more general mean structure of the full model is needed. However, small F statistics can occur if the reduced model's lack of fit exists in the error space of Y = Xβ + e. Suppose that the correct model for the data has the form

Y = X_0 γ + Wδ + e,  C(W) ⊥ C(X_0),   (4)

where assuming C(W) ⊥ C(X_0) creates no loss of generality. Rarely will anyone tell us the true matrix W. The behavior of F depends on the vector Wδ.

1. The F statistic estimates 1 if the model Y = X_0 γ + e is correct, that is, Wδ = 0.

2. The F statistic estimates something greater than 1 if the full model Y = Xβ + e is correct and needed, that is, if 0 ≠ Wδ ∈ C(X_0)⊥_C(X).

3. The F statistic estimates something less than 1 if 0 ≠ Wδ ∈ C(X)⊥, that is, if Wδ is in the error space of the full model. Then the numerator estimates σ² but the denominator estimates σ² + δ'W'(I − M)Wδ/r(I − M) = σ² + δ'W'Wδ/r(I − M).

4. If Wδ is in neither C(X_0)⊥_C(X) nor C(X)⊥, it is not clear how F will behave because neither the numerator nor the denominator estimates σ². (Under normality, F is doubly noncentral.)

Christensen (1989, 1991) contains related discussion of these concepts. We now illustrate the ideas.

Example: Simple Linear Regression. The full model is y_i = β_0 + β_1 x_i + ε_i and we test the reduced model y_i = β_0 + ε_i. If the true model is y_i = β_0 + β_1 x_i + δ(x_i − x̄·)² + ε_i, Case 4 applies and neither the numerator nor the denominator estimates σ². However, if the x_i's are equally spaced, say, x_i = a + bi, and β_1 = 0, we are in Case 3, so that unless δ is close to 0, the true model causes the F statistic to be close to 0 and the P value to be close to 1. The term δ(x_i − x̄·)² only determines Wδ indirectly, but Wδ is easily seen to be in the error space of the simple linear regression model because Σ_{i=1}^{n} (x_i − x̄·)(x_i − x̄·)² = 0.
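A quick numeric sketch of this Case 3 behavior, with hypothetical values for n, δ, and the error variance: equally spaced x's, β_1 = 0, and a quadratic term that lands in the error space of the straight-line model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 30, 2.0
x = np.arange(1.0, n + 1)                                        # equally spaced x's
y = 5.0 + delta * (x - x.mean()) ** 2 + rng.standard_normal(n)   # true model with beta1 = 0

def ppo(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

X = np.column_stack([np.ones(n), x])           # full model: straight line
X0 = np.ones((n, 1))                           # reduced model: intercept only
M, M0 = ppo(X), ppo(X0)
F = (y @ (M - M0) @ y / 1) / (y @ (np.eye(n) - M) @ y / (n - 2))
print(f"F = {F:.3f}")   # typically far below 1: the quadratic inflates the error estimate
```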
Lack of fit testing for a general regression model Y = Xβ + e involves testing the model against a constructed full model, say, Y = X*β* + e with C(X) ⊂ C(X*), that generalizes the mean structure of Y = Xβ + e; see Christensen (2002, sec. 6.6). However, if lack of fit exists because of features that are not part of the original model, generalizing Y = Xβ + e may be inappropriate.

Example: Traditional lack of fit test. For simple linear regression this begins with the replication model y_ij = β_0 + β_1 x_i + e_ij, i = 1, ..., a, j = 1, ..., N_i. It then assumes E(y_ij) = µ(x_i) for some function µ(·), which leads to constructing the full model y_ij = µ_i + e_ij, a one-way ANOVA, and thus to the traditional lack of fit test. The test statistic is

F = [(SSTrts − SS(lin))/(a − 2)] / MSE,   (5)

where, from the one-way ANOVA model, SSTrts is the sum of squares for treatments, SS(lin) is the sum of squares for the linear contrast, and MSE is the mean squared error. If there is no lack of fit in the reduced model, F should be near 1 (Case 1 in the list). If lack of fit exists in the replicated simple linear regression model because the more general mean structure of the one-way ANOVA fits the data better, then the F statistic tends to be larger than 1 (Case 2 in the list).

Suppose that the simple linear regression model is balanced, that is, all N_i = N, that for each i the data are taken in time order t_1 < t_2 < · · · < t_N, and that the lack of fit is due to the true model being

y_ij = β_0 + β_1 x_i + δ t_j + e_ij,  δ ≠ 0.   (6)

Thus, depending on the sign of δ, the observations within each group are subject to an increasing or decreasing trend. In this model, for fixed i the E(y_ij)'s are not the same for all j, thus invalidating the assumption of the traditional test. In fact, this form of lack of fit puts us in Case 3 of our list because Wδ becomes a vector with elements δ(t_j − t̄), which is orthogonal to the vector of x_i's because Σ_ij x_i(t_j − t̄) = 0. This causes the traditional lack-of-fit test to have a small F statistic.

Another way to see this is to view the problem in terms of a balanced two-way ANOVA. The true model (6) is a special case of the two-way ANOVA model y_ij = µ + α_i + η_j + e_ij in which the only nonzero contrasts are the linear contrast in the α_i's and the linear contrast in the η_j's. Under model (6), the numerator of the statistic (5) gives an unbiased estimate of σ² because SSTrts in (5) is SS(α) for the two-way model and the only nonzero α effect is being eliminated from the treatments. However, the mean squared error in the denominator of (5) is a weighted average of the error mean square from the two-way model and the mean square for the η_j's in the two-way model. The sum of squares for the significant linear contrast in the η_j's from model (6) is included in the error term of the lack-of-fit test (5), thus biasing the error term to estimate something larger than σ². In particular, the denominator has an expected value of σ² + δ²a Σ_{j=1}^{N} (t_j − t̄·)²/[a(N − 1)]. If the appropriate model is (6), the statistic in (5) estimates σ²/{σ² + δ²a Σ_{j=1}^{N} (t_j − t̄·)²/[a(N − 1)]}, which is a number less than 1. Thus, a lack of fit that exists within the groups of the one-way ANOVA can cause values of F much smaller than 1.
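The following sketch illustrates the traditional lack-of-fit statistic (5) under the within-group time trend of model (6). The design sizes, coefficients, and random seed are hypothetical choices of mine, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(2)
a, N, b0, b1, delta = 5, 6, 1.0, 0.5, 1.5      # hypothetical settings
x = np.repeat(np.arange(1.0, a + 1), N)        # a distinct x-values, N replicates each
t = np.tile(np.arange(1.0, N + 1), a)          # time order within each group
y = b0 + b1 * x + delta * t + rng.standard_normal(a * N)   # model (6)

def ppo(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

n = a * N
J = np.ones((n, 1))
Z = np.equal.outer(x, np.unique(x)).astype(float)   # one-way ANOVA design matrix
X = np.column_stack([np.ones(n), x])                # simple linear regression design
MZ, MX, MJ = ppo(Z), ppo(X), ppo(J)

ss_trts = y @ (MZ - MJ) @ y                     # SSTrts from the one-way ANOVA
ss_lin = y @ (MX - MJ) @ y                      # SS(lin), the linear-contrast sum of squares
mse = y @ (np.eye(n) - MZ) @ y / (a * (N - 1))  # pools pure error and the time trend
F = ((ss_trts - ss_lin) / (a - 2)) / mse
print(f"lack-of-fit F = {F:.3f}")               # typically well below 1
```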
Note that in this balanced case, true models involving interaction terms, for example, models like y_ij = β_0 + β_1 x_i + δ t_j + γ x_i t_j + e_ij, also fall into Case 3 and tend to make the F statistic small if either δ ≠ 0 or γ ≠ 0. Finally, if there exists lack of fit both between the groups of observations and within the groups, we are in Case 4 and it can be very difficult to identify. For example, if β_2 ≠ 0 and either δ ≠ 0 or γ ≠ 0 in the true model y_ij = β_0 + β_1 x_i + β_2 x_i² + δ t_j + γ x_i t_j + e_ij, there is both a traditional lack of fit between the groups (the significant β_2 x_i² term) and lack of fit within the groups (δ t_j + γ x_i t_j). In this case, neither the numerator nor the denominator in (5) is an estimate of σ².

Graphing such data is unlikely to show these kinds of problems. In the one-way ANOVA example, it would be possible to plot the data against time in such a way that the relationship with time is visible, but if you thought there might be a time trend you would not do a one-way ANOVA in the first place. If you thought there might be a time trend, you would fit a two-way ANOVA with time as a factor and you would have seen the important time effect. If you did not think to put time in the ANOVA, it is unlikely that you would think of using time in a graphical procedure. The point is that when you get a small F statistic, you need to try to figure out what is causing it. You need to look for things like possible time trends even though you initially were convinced that they would not occur.

The illustration involving a linear trend within treatment groups is closely related to a more general phenomenon that happens all too frequently: analyzing a randomized complete block design as if it were a completely randomized design (CRD); a numeric sketch appears at the end of this section. If there are no treatment effects, the numerator in the CRD analysis estimates σ², but the denominator estimates σ² plus a positive quantity due to nonzero block effects, so the F statistic tends to be small. More generally, if the treatment effects are nonzero but small relative to the block effects, the F statistic still tends to be small, even though neither the numerator nor the denominator of the statistic is estimating σ². Specifically, in this case the expected value of the numerator of the F statistic is still much smaller than the expected value of the denominator, thus causing F to be small.

Similar things happen in other situations. Ignoring a split plot design and analyzing it as a two-way ANOVA leads to using only one error term obtained by pooling the whole plot and subplot errors. Typically, whole plot effects look more significant than they should while subplot and interaction effects look less significant. With fractional factorials, high order interactions are often assumed to be 0. If they are not 0, F statistics become inappropriately small.
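To close this section, here is the numeric sketch of the blocking scenario promised above: a randomized complete block design with no treatment effects but sizable block effects, analyzed first as a CRD and then with blocks. The block-effect standard deviation and other settings are my own illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
t, b = 4, 8                                       # treatments, blocks (hypothetical)
block_eff = rng.normal(0.0, 3.0, size=b)          # sizable block effects
y = block_eff + rng.standard_normal((t, b))       # rows = treatments; no treatment effects

grand = y.mean()
ss_trt = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_err_crd = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()   # blocks absorbed into error
F_crd = (ss_trt / (t - 1)) / (ss_err_crd / (t * (b - 1)))
print(f"CRD analysis: F = {F_crd:.2f}, P = {stats.f.sf(F_crd, t - 1, t * (b - 1)):.3f}")

ss_blk = t * ((y.mean(axis=0) - grand) ** 2).sum()
ss_err_rcb = ss_err_crd - ss_blk                  # remove the blocks from the error term
F_rcb = (ss_trt / (t - 1)) / (ss_err_rcb / ((t - 1) * (b - 1)))
print(f"RCB analysis: F = {F_rcb:.2f}")
```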
Another possible explanation for a small F statistic is the existence of "negative correlation" in the data.

3. THE EFFECT OF CORRELATION AND HETEROSCEDASTICITY ON F STATISTICS

The test of the reduced model is no better than the assumptions made about the full model. The mean structure of the reduced model may be perfectly valid, but the F statistic can become small or large because the assumed covariance structure is incorrect. We now examine the F statistic (3) when the true model is

Y = X_0 γ + e,  E(e) = 0,  cov(e) = σ²V.

We refer to the condition

tr(MV) < (1/n) tr(V) tr(M) = [r(X)/n] tr(V)   (7)

as having negative correlation in the full model. The condition

tr([M − M_0]V) < (1/n) tr(V) tr(M − M_0) = {[r(X) − r(X_0)]/n} tr(V)   (8)

is referred to as having negative correlation for distinguishing between models. If the inequalities are reversed, we refer to positive correlations. Note that when V = I, equality holds in both conditions. In spite of the terminology, the inequalities can be caused by heteroscedasticity as well as negative correlations.

Together, (7) and (8) cause small F statistics even when the mean structure of the reduced model is true. Reversing the inequalities causes large F statistics. To establish these claims we use a standard result on the expected value of a quadratic form, for example, Christensen (2002, theorem 1.3.1). The numerator of the F statistic (3) estimates

E{Y'[M − M_0]Y/r(M − M_0)} = tr{[M − M_0]V}/r(M − M_0) < tr(V)/n,

and the denominator of the F statistic estimates

E{Y'(I − M)Y/r(I − M)} = tr{[I − M]V}/r(I − M)
  = (tr(V) − tr(MV))/r(I − M)
  > (tr(V) − (1/n) tr(M) tr(V))/r(I − M)
  = {[n − r(M)]/n} tr(V)/r(I − M) = tr(V)/n.

F in (3) estimates

E{Y'[M − M_0]Y/r(M − M_0)} / E{Y'(I − M)Y/r(I − M)}
  = [tr{[M − M_0]V}/r(M − M_0)] / [tr{[I − M]V}/r(I − M)]
  < [tr(V)/n] / [tr(V)/n] = 1,

so having both negative correlation in the full model and negative correlation for evaluating differences between models tends to make F statistics small. Exactly analogous computations show that having both positive correlation in the full model and positive correlation for evaluating differences between models tends to make F statistics greater than 1.

The general condition (7) is equivalent to (1), so, under homoscedasticity, negative correlation in the full model amounts to having an average variance for the fitted values (averaging over the rank of the covariance matrix of the fitted values) that is less than the common variance of the observations. Similarly, having negative correlation for distinguishing the full model from the reduced model leads to (2).

Example: General Mixed Models. For mixed models, results similar to those for lack of fit hold. Suppose that the correct model is

Y = X_0 γ + Wδ + e,  E(e) = 0,  cov(e) = σ²I,

where δ is random with E(δ) = 0 and cov(δ) = D, so cov(Y) = σ²I + WDW'. If C(W) ⊂ C(X)⊥, conditions (7) and (8) hold, so F tends to be small. If C(W) ⊂ C(X_0)⊥_C(X), the reverse of (7) and (8) hold, causing large F statistics.

Example: Scariano and Davenport (1987) mentioned the possibility of generalizing their work on F test robustness beyond one-way ANOVA. To study lack of independence, their class of alternative covariance matrices can be generalized to the family determined by

V = c_1 M_0 + c_2 (M − M_0) + c_3 (I − M),

where the c_i's are positive. Scariano and Davenport (1987) presented normal theory size and power computations in one-way ANOVA for two cases with homoscedastic, equicorrelation covariance structures. These determine specific values of M, M_0, and the c_i's. In particular, their Cases 1 and 2 are specific instances with c_1 = c_2 > c_3 and c_1 = c_3 > c_2, respectively. Additionally, when c_1 = c_2 < c_3, the covariance structure is similar to that of the previous example with C(W) ⊂ C(X)⊥ and, when c_1 = c_3 < c_2, the covariance structure is similar to that with C(W) ⊂ C(X_0)⊥_C(X). It is easy to check that conditions (7) and (8) hold by taking c_1 = c_2 < c_3 or c_1 = c_3 > c_2. Similarly, by taking c_1 = c_3 < c_2 or c_1 = c_2 > c_3, the reverse of conditions (7) and (8) hold. The resulting second moment based qualitative assessments of how F will behave agree with Scariano and Davenport's explicit normal theory computations. Although this alternative family is well suited for studying lack of independence, it does not lend itself to studying independent observations with unequal variances.
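Conditions (7) and (8) are easy to verify numerically for this family. The sketch below uses a small balanced one-way ANOVA design of my own choosing and checks the traces directly for the case c_1 = c_2 < c_3.

```python
import numpy as np

def ppo(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

a, N = 3, 4                                     # small balanced one-way ANOVA (my choice)
n = a * N
Z = np.kron(np.eye(a), np.ones((N, 1)))         # full-model design, C(X) = C(Z)
M, M0 = ppo(Z), ppo(np.ones((n, 1)))
r, r0 = a, 1

c1, c2, c3 = 1.0, 1.0, 3.0                      # c1 = c2 < c3
V = c1 * M0 + c2 * (M - M0) + c3 * (np.eye(n) - M)

print(np.trace(M @ V) < (r / n) * np.trace(V))                 # condition (7): True
print(np.trace((M - M0) @ V) < ((r - r0) / n) * np.trace(V))   # condition (8): True
```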
Example: Homoscedastic balanced one-way ANOVA. Consider both a full model y_ij = µ_i + e_ij, i = 1, ..., a, j = 1, ..., N, with n ≡ aN, which we write Y = Zγ + e, where Z is a matrix of 0's and 1's indicating group inclusion, and a reduced model y_ij = µ + e_ij, which in matrix terms we write Y = Jµ + e, where J is an n × 1 vector of 1's. In matrix terms the usual one-way ANOVA F statistic is

F = {Y'[M_Z − (1/n)JJ']Y/(a − 1)} / {Y'(I − M_Z)Y/[a(N − 1)]},   (9)

so M ≡ M_Z and M_0 ≡ (1/n)JJ'. We now assume that the true model is Y = Jµ + e, E(e) = 0, cov(e) = σ²V, and examine the behavior of the F statistic. From condition (7), negative correlation in the full model is characterized by

(N/σ²) Σ_{i=1}^{a} var(ȳ_i·) = tr(M_Z V) < (1/n) tr(V) tr(M_Z) = (a/n) tr(V).   (10)

This can also be written as Σ_{i=1}^{a} var(ȳ_i·)/a < σ²/N, that is, the average variance of the group means is less than σ²/N. Positive correlation in the full model is characterized by the reverse inequality. From condition (8), negative correlation for evaluating differences between models reduces to

(N/σ²) Σ_{i=1}^{a} var(ȳ_i· − ȳ··) = tr([M_Z − (1/n)JJ']V) < (1/n) tr(V) tr[M_Z − (1/n)JJ'] = [(a − 1)/n] tr(V).   (11)

This condition can also be written

Σ_{i=1}^{a} var(ȳ_i· − ȳ··)/a < [(a − 1)/a] (σ²/N).

Positive correlation for evaluating differences between models is characterized by the reverse inequality. If all the observations in different groups are uncorrelated, they have a block diagonal covariance matrix σ²V, in which case condition (11) holds if and only if condition (10) holds. This follows because tr(M_Z V) = tr[(1/N)Z'VZ] = a tr[(1/n)J'VJ] = a tr[(1/n)JJ'V].

To illustrate these ideas for homoscedastic balanced one-way ANOVA, we examine some simple examples with a = 2, N = 2. The first two observations are a group and the last two are a group. First, consider a covariance structure

V_1 = [ 1    .9   .1   .09
        .9   1    .09  .1
        .1   .09  1    .9
        .09  .1   .9   1  ].

Intuitively, there is high positive correlation between the two observations in each group, and weak positive correlation between the groups. Condition (10) becomes

3.8 = 2(1/2)[3.8] = tr(M_Z V_1) > (a/n) tr(V_1) = (2/4)4 = 2,

so there is positive correlation within groups, and (11) becomes

1.71 = 3.8 − 2.09 = tr([M_Z − (1/n)JJ']V_1) > [(a − 1)/n] tr(V_1) = (1/4)4 = 1,

so there is positive correlation for evaluating differences between groups. Now consider

V_2 = [ 1    .1   .9   .09
        .1   1    .09  .9
        .9   .09  1    .1
        .09  .9   .1   1  ].

Intuitively, this has weak positive correlation between the two observations in each group, with high positive correlation between some observations in different groups. Now

2.2 = 2(1/2)[2.2] = tr(M_Z V_2) > (a/n) tr(V_2) = (2/4)4 = 2,

so there is positive correlation within groups, but

.11 = 2.2 − 2.09 = tr([M_Z − (1/n)JJ']V_2) < [(a − 1)/n] tr(V_2) = (1/4)4 = 1,

so there is negative correlation for evaluating differences between groups.
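Because the group means are equal in these examples and both quadratic forms in (9) annihilate J, E(Y'AY) = σ² tr(AV) for each of them, so the trace calculations above translate directly into expected mean squares. A short numeric check of V_1 versus V_2 (my own script, using the same matrices):

```python
import numpy as np

def ppo(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

n = 4
MZ = ppo(np.kron(np.eye(2), np.ones((2, 1))))   # a = 2 groups of N = 2
MJ = ppo(np.ones((n, 1)))

V1 = np.array([[1, .9, .1, .09], [.9, 1, .09, .1],
               [.1, .09, 1, .9], [.09, .1, .9, 1]])
V2 = np.array([[1, .1, .9, .09], [.1, 1, .09, .9],
               [.9, .09, 1, .1], [.09, .9, .1, 1]])

for name, V in (("V1", V1), ("V2", V2)):
    e_num = np.trace((MZ - MJ) @ V) / (2 - 1)                # expected numerator mean square
    e_den = np.trace((np.eye(n) - MZ) @ V) / (2 * (2 - 1))   # expected denominator mean square
    print(name, round(e_num / e_den, 2))                     # about 17.1 for V1, 0.12 for V2
```

Under V_1 the ratio of expected mean squares is far above 1 (F tends to look "significant" even though the group means are equal), while under V_2 it is well below 1.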
Suppose the observations have the correlation structure of an AR(1) process,

V_3 = [ 1    ρ    ρ²   ρ³
        ρ    1    ρ    ρ²
        ρ²   ρ    1    ρ
        ρ³   ρ²   ρ    1  ].

When −1 < ρ < 0, we have negative correlation within groups because 2(1 + ρ) = tr(M_Z V_3) < 2. If 0 < ρ < 1, the inequalities are reversed. Similarly, for −1 < ρ < 0 we have negative correlation for evaluating differences between groups because

1 + (ρ/2)(1 − 2ρ − ρ²) = tr([M_Z − (1/n)JJ']V_3) < 1.

However, we only get positive correlation for evaluating differences between groups when 0 < ρ < √2 − 1. Thus, for negative ρ we tend to get small F statistics, for 0 < ρ < √2 − 1 we tend to get large F statistics, and for √2 − 1 < ρ < 1 the result is not clear. To illustrate what happens for large ρ, suppose ρ = 1 and the observations all have the same mean; then, with probability one, all the observations are equal and, in particular, ȳ_i· = ȳ·· with probability one. It follows that

0 = Σ_{i=1}^{a} var(ȳ_i· − ȳ··)/a < [(a − 1)/a] (σ²/N)

and there is negative correlation for evaluating differences between groups. More generally, for very strong positive correlations, there will be positive correlation within groups but negative correlation for evaluating differences between groups. Note also that if ρ = −1, it is not difficult to see that F = 0.

Example: Heteroscedastic balanced one-way ANOVA. In the heteroscedastic balanced one-way ANOVA with uncorrelated observations, V is diagonal. This generates equality between the left sides and right sides of (7) and (8), so under heteroscedasticity F still estimates the number 1.

Example: Heteroscedastic unbalanced one-way ANOVA. The covariance conditions (7) and (8) can be caused for uncorrelated observations by heteroscedasticity. For example, consider the unbalanced one-way ANOVA F test. For concreteness, assume that var(y_ij) = σ_i². As mentioned earlier, since the observations are uncorrelated, we need only check condition (10), which amounts to

Σ_{i=1}^{a} σ_i²/a < Σ_{i=1}^{a} (N_i/n) σ_i².

Thus, when the group means are equal, F statistics will get small if many observations are taken in groups with large variances and few observations are taken in groups with small variances. F statistics will get large if the reverse relationship holds.
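A short check of the unbalanced heteroscedastic condition, with hypothetical group variances and sample sizes of my own choosing:

```python
import numpy as np

sigma2 = np.array([1.0, 1.0, 25.0])   # hypothetical per-group variances
Ni = np.array([2, 2, 20])             # most observations fall in the high-variance group
n = Ni.sum()
lhs = sigma2.mean()                   # average of the group variances
rhs = np.sum(Ni / n * sigma2)         # observation-weighted average
print(lhs, rhs, lhs < rhs)            # 9.0 21.0 True, so F tends to be small
```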
4. CONCLUSIONS

Christensen (1995) argued that testing should be viewed as an exercise in examining whether or not the data are consistent with a particular (predictive) model. While possible alternative hypotheses may drive the choice of a test statistic, any unusual values of the test statistic should be considered important. By this standard, perhaps the only general way to decide which values of the test statistic are unusual is to identify as unusual those values that have small densities or mass functions as computed under the model being tested. The argument here is similar. The test statistic is driven by the idea of testing a particular model, in this case the reduced model, against an alternative that is here determined by the full model. However, given the test statistic, any unusual values of that statistic should be recognized as indicating data that are inconsistent with the model being tested. Values of F much larger than 1 are inconsistent with the reduced model. Values of F much larger than 1 are consistent with the full model but, as we have seen, they are consistent with other models as well. Similarly, values of F much smaller than 1 are also inconsistent with the reduced model, and we have examined models that can generate small F statistics. Thus, significantly small F statistics should always cause one to rethink the assumptions of the model. As always, F values near 1 merely mean that the data are consistent with the assumptions; they do not imply that the assumptions are valid.

[Received March 2001. Revised December 2001.]

REFERENCES

Baksalary, J. K., Nurhonen, M., and Puntanen, S. (1992), "Effect of Correlations and Unequal Variances in Testing for Outliers in Linear Regression," Scandinavian Journal of Statistics, 19, 91–95.

Benjamini, Y. (1983), "Is the t Test Really Conservative When the Parent Distribution is Long-Tailed?" Journal of the American Statistical Association, 78, 645–654.

Christensen, R. (1989), "Lack of Fit Tests Based on Near or Exact Replicates," The Annals of Statistics, 17, 673–683.

Christensen, R. (1991), "Small Sample Characterizations of Near Replicate Lack of Fit Tests," Journal of the American Statistical Association, 86, 752–756.

Christensen, R. (1995), Comment on Inman (1994), The American Statistician, 49, 400.

Christensen, R. (2002), Plane Answers to Complex Questions: The Theory of Linear Models (3rd ed.), New York: Springer-Verlag.

Christensen, R., and Bedrick, E. J. (1997), "Testing the Independence Assumption in Linear Models," Journal of the American Statistical Association, 92, 1006–1016.

Kiviet, J. F. (1980), "Effects of ARMA Errors on Tests for Regression Coefficients: Comments on Vinod's Article; Improved and Additional Results," Journal of the American Statistical Association, 75, 353–358.

Posten, H. O. (1992), "Robustness of the Two-Sample t Test Under Violations of the Homogeneity of Variance Assumption, Part II," Communications in Statistics, Part A, 21, 2169–2184.

Scariano, S. M., and Davenport, J. M. (1987), "The Effects of Violations of Independence Assumptions in the One-Way ANOVA," The American Statistician, 41, 123–129.

Scheffé, H. (1959), The Analysis of Variance, New York: Wiley.

Schneider, H., and Pruett, J. M. (1994), "Control Charting Issues in the Process Industries," Quality Engineering, 6, 345–373.

Westfall, P. (1988), "Robustness and Power of Tests for a Null Variance Ratio," Biometrika, 75, 207–214.