Copyright © The British Psychological Society. Reproduction in any form (including the internet) is prohibited without prior permission from the Society.

British Journal of Mathematical and Statistical Psychology (2008), 61, 331–360
© 2008 The British Psychological Society
www.bpsjournals.co.uk

Goodness-of-fit testing using components based on marginal frequencies of multinomial data

Mark Reiser*
Arizona State University, USA

* Correspondence should be addressed to Mark Reiser, School of Social and Family Dynamics, Arizona State University, Box 873701, Tempe, AZ 85287-3701, USA (e-mail: [email protected]). DOI:10.1348/000711007X204215

The goodness-of-fit test based on Pearson’s chi-squared statistic is sometimes considered to be an omnibus test that gives little guidance to the source of poor fit when the null hypothesis is rejected. It has also been recognized that the omnibus test can often be outperformed by focused or directional tests of lower order. In this paper, a test is considered for a model on a data table formed by the cross-classification of q dichotomous variables, and a score statistic on overlapping cells that correspond to the first- through qth-order marginal frequencies is presented. Then orthogonal components of the Pearson–Fisher statistic are defined on marginal frequencies. The orthogonal components may be used to form test statistics, and a log-linear version of an item response model is used to investigate the order and dilution of a test based on these components, as well as the projection of components onto the space of lower-order marginals. The advantage of the components in terms of power and detection of the source of poor fit is demonstrated. Overcoming the adverse effects of sparseness provides another motive for using components based on marginal frequencies because an asymptotic chi-squared distribution will be more reliable for a statistic formed on overlapping cells if expected frequencies in the joint distribution are small.

1. Introduction

The goodness-of-fit test based on Pearson’s chi-squared statistic is sometimes considered to be an omnibus test that gives little guidance to the source of poor fit when the null hypothesis is rejected. It has also been recognized that the omnibus test can often be outperformed by focused or directional tests of lower order. Goodness-of-fit tests are frequently employed in psychological research when logistic regressions and log-linear models are used for applications such as tests of independence, item response models, latent class models and discrete-state transition models.

Psychologists sometimes employ transition models to study child development. Applications of such models often involve large cross-classified frequency tables. Fergusson, Horwood, and Lynskey (1995), for example, used a discrete-state transition model to study disruptive childhood behaviours. Behaviour was classified into one of two categories at five time points, producing a cross-classified table with 32 cells. When the table has so many cells, there is a disadvantage for traditional goodness-of-fit tests because power may be reduced. If the model fits poorly in such an application, lack of fit is often due to misspecification of the two-way associations among behaviours measured across the time points. If so, traditional goodness-of-fit tests, which incorporate degrees of freedom from higher-way associations that are not included in the model, are less sensitive than a lower order or focused test to the misspecification of the two-way associations.
The loss of sensitivity is sometimes known as dilution (Kendall & Stuart, 1973, Chapter 30).

In any area of study, the size of the table grows exponentially with the number of variables. From a study of social life feelings, Schuessler (1982) produced a self-determination scale consisting of 14 items. An item response model for 14 dichotomous items would require a table with 16,384 cells. Unless the sample of observations is very large, a table of this size will have sparse frequencies. Another disadvantage for traditional goodness-of-fit tests is that sparseness may adversely affect the validity of using the chi-squared distribution as an approximation for the distribution of statistics such as the Pearson and likelihood ratio, even if the sample size could be considered large (Koehler & Larntz, 1980; Koehler, 1986). Working with tables that are so extensive highlights the need for overcoming the effects of sparseness, obtaining higher power with a focused test, and finding more guidance on the source of poor fit.

In this paper, methods based on orthogonal components and marginal frequencies are considered for testing a model fit to the $T = 2^q$ frequencies from the cross-classification of $q$ dichotomous variables. The proposed methods are in an area that has recently become a topic of significant practical interest in psychology because there is growing recognition that traditional goodness-of-fit tests based on the likelihood ratio or Pearson statistics have considerable shortcomings, as discussed above, for an application with a large number of variables. Marginal frequencies can be obtained by summation across joint frequencies. In Section 2, the transformation from joint proportions to marginal proportions is presented in terms of linear combinations. In Section 3, a score statistic is defined on a set, or a subset, of the marginal frequencies from the first to the $q$th order.
In Section 4, orthogonal components defined on marginal frequencies are obtained for the Pearson–Fisher statistic, and the score statistic from Section 3 is shown to be a sum of these orthogonal components. Given the shortcomings of traditional tests, there are two primary motivations for considering tests based on orthogonal components defined on marginal frequencies. First, a test based on a subset of marginals is essentially a lower-order cell-focusing test (Rayner & Best, 1989, Chapter 7) that should have higher power than a traditional test to detect a departure from certain null hypotheses in specific directions for any finite sample size. Second, marginal frequencies are overlapping cells, and since these overlapping cells have larger frequencies than the cells from the joint distribution of the full cross-classification, using them to form a test statistic should improve the reliability of the asymptotic chi-squared distribution when expected frequencies in the joint distribution are small due to sparseness (Hall, 1985). While most of the previous work on using marginal frequencies for goodness-of-fit testing has been carried out in the context of model selection when data are sparse, the emphasis of this paper is directed more towards the power of a focused test based on orthogonal components against various false null hypotheses when data are not necessarily sparse. An approach to performing power calculations for the test of a composite null hypothesis using components defined on marginal frequencies is presented in Section 5. 
The log-linear version of the Rasch latent trait model is reviewed in Section 6, and some advantages of using orthogonal components in a focused test for model selection are demonstrated in Section 7, where the log-linear Rasch model is used to investigate power, the effective order of alternative hypotheses, dilution of the test statistics, as well as the projection onto the space of lower-order marginals. The advantage of examining components for the purpose of detecting the source of poor fit is demonstrated in Section 8, which includes an example of fitting a model to aptitude item responses. Results on power comparisons to be presented in Section 7 demonstrate that in addition to overcoming the adverse effects of sparseness, another primary motivation for using a test based on components for lower-order marginals is the attainment of higher power than traditional tests in some circumstances even when data are not sparse. Thus, the results presented below will establish the case that tests based on orthogonal components associated with marginal proportions should be regularly examined for certain types of applications in psychology and social sciences in general. Some results also indicate that a test based on lower-order marginals may be insensitive to a departure from the null hypothesis in the higher-order marginals. To guard against this possibility in practice, a recommendation is given to examine the residual chi-squared statistic, similar to the recommendation given by Durbin and Knott (1972) in their study of components of the Cramér–von Mises statistic.

Christoffersson (1975) originally introduced the idea of using first- and second-order marginals for a test of fit in dichotomous variable factor analysis.
Muthén (1978) improved the statistic given by Christoffersson, although neither of these authors presented their test as attaining higher power or as a remedy for sparseness. The approach presented here is a direct extension of results from Reiser (1996) on using first- and second-order marginals to test the fit of item response models and Reiser and Lin (1999) on testing the fit of latent class models. A goodness-of-fit statistic focused on lower-order marginals can also be viewed as a special case of the score statistic given by Rayner and Best (1989). The idea of decomposing Pearson’s chi-squared statistic into components has a long history which is discussed in Lancaster (1969).

As mentioned earlier, overcoming effects of sparseness is one of the primary motives for using tests based on orthogonal components associated with marginal frequencies. There is a large literature on the topic of sparse frequencies and their effect on traditional goodness-of-fit tests for multinomial models. Frequency tables are said to be sparse when the ratio of the sample size to the number of cells is relatively small (Agresti & Yang, 1987). Cochran (1954), Fisher (1941), Cramér (1946), Kendall (1952) and Tate and Hyer (1973) have given recommendations for minimum expected frequency. Lancaster (1969, p. 175) gives the following summary.

Some general conclusions can be made – it is probably desirable not to have any expectation less than unity; with several degrees of freedom, for class frequencies of 5 or more, the distributions of the Pearson $\chi^2$ approximate satisfactorily to the asymptotic or theoretical $\chi^2$ distributions.
If there are a number of classes, perhaps a third or quarter of them can have expectations in the interval, 1 to 5, without causing serious departures of the distribution of the Pearson $\chi^2$ from the theoretical … the parameter of non-centrality of the $\chi^2$ test may be greatly diminished if too much ‘pooling’ is carried out …

More recently, Haberman (1988) warns that given any minimum expected cell size under the null hypothesis and given any significance level, it is possible to make the power of $X^2$ arbitrarily close to zero by the selection of a large enough number of cells and suitable cell probabilities for the null and alternative hypotheses.

If sparseness is present in a situation when a goodness-of-fit test is desired on a multinomial model, tests based on components formed from lower-order marginals can overcome the adverse effects of sparseness, as shown in Reiser and Lin (1999), because the overlapping cells have larger frequencies than the cells of the full cross-classification. As mentioned previously, the overlapping cells result in a statistic for which the asymptotic chi-squared distribution is more reliable. Overcoming the effects of sparseness to obtain valid tests was the original motivation for using statistics based on marginal frequencies. Of course, sparseness becomes more of a concern when sample size is smaller, which is also a condition under which utilizing a test that possesses higher power becomes more vital. Hence, the two motivations discussed here for using tests based on components formed from lower-order marginals are both frequently pertinent to the choice of a test statistic when fitting models to data.
Other work in the area of tests based on lower-order marginals as a remedy for sparseness includes Knott and Tzamourani (1997), who advise that a chi-squared or likelihood ratio statistic may not be useful in a test of fit when a latent trait model is applied to a large number of possible response patterns. They suggest that it would be informative to instead compare observed and fitted values for first-, second- and third-order marginal frequencies when assessing model fit. Bartholomew and Tzamourani (1999) also examine sparseness issues for goodness-of-fit testing with latent trait models. Bartholomew and Leung (2002) developed an alternative statistic on second-order marginals in the context of testing a two-parameter latent trait model, and Maydeu-Olivares (2001) examined limited-information testing of Thurstonian models for paired comparisons. Tollenaar and Mooijaart (2003) investigated the performance of a modified version of the statistic presented in Reiser (1996) under non-standard conditions. Maydeu-Olivares and Joe (2005) proposed a set or hierarchy of statistics based on marginals with an application to the item response model. Their approach is closely related to the technique proposed here, but their hierarchy does not correspond to the decomposition of the Pearson–Fisher statistic as given in Section 4.

Although overcoming the effects of sparseness is a fundamental reason for using orthogonal components based on marginals, those adverse effects are not investigated in this paper. An important area for future research is the performance of components of different orders when data have various degrees of sparseness. As the order of marginal proportions becomes higher, the associated orthogonal components of $X^2_{PF}$ are estimated less precisely because the higher-dimensional cross-classification produces frequencies that become more sparse.
A significant question to be addressed in additional studies that would employ Monte Carlo simulations is what orders of components can be adequately evaluated using an approximate chi-squared distribution under various conditions of sparse frequencies.

2. Marginal proportions

This section includes a presentation of transformations from joint proportions or frequencies to marginal proportions as a prelude to testing a model based on the fit to marginal frequencies.

2.1. First- and second-order marginals

The relationship between joint proportions and first- and second-order marginals can be shown using zeros and ones to code the levels of dichotomous response variables. Then, a $q$-dimensional vector of zeros and ones, sometimes called a response pattern, will indicate a specific cell from the contingency table formed by the cross-classification of $q$ response variables. A $T$-dimensional set of response patterns can be generated by varying the levels of the $q$th variable most rapidly, the $(q-1)$th variable next, etc. Define $\mathbf{V}$ as the $T$ by $q$ matrix with response patterns as rows. For $q = 3$,

$$
\mathbf{V} = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 1 & 1
\end{bmatrix}.
$$

Let $v_{is}$ represent element $i$ of response pattern $s$, and let $\mathbf{Y}$ be a vector of dichotomous variables. Also, define $\boldsymbol{\theta}$ as a parameter vector for a model of interest and $\pi_s(\boldsymbol{\theta})$ as the expected proportion for cell $s$ as a function of the parameter vector $\boldsymbol{\theta}$.
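The ordering of response patterns described above can be reproduced in a few lines. The following is a minimal sketch and is not part of the original paper; the function name `response_patterns` is an illustrative assumption.

```python
import itertools

import numpy as np


def response_patterns(q):
    """Return the T x q matrix V (T = 2**q) of 0/1 response patterns.

    itertools.product varies its last position fastest, which matches the
    convention of varying the q-th variable most rapidly.
    """
    return np.array(list(itertools.product([0, 1], repeat=q)), dtype=int)


V = response_patterns(3)
print(V)
```

For 14 items the same call returns a matrix with 16,384 rows, which is the table size quoted in the Introduction.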
Then, under the model, the first-order marginal proportion for variable $Y_i$ can be defined as

$$
P_i(\boldsymbol{\theta}) = \mathrm{Prob}(Y_i = 1 \mid \boldsymbol{\theta}) = \sum_s v_{is}\,\pi_s(\boldsymbol{\theta}),
$$

and the true first-order marginal proportion is given by

$$
P_i = \mathrm{Prob}(Y_i = 1) = \sum_s v_{is}\,\pi_s .
$$

The summation across the frequencies associated with the response patterns to obtain the marginal proportions represents a linear transformation of the frequencies in the multinomial vector $\boldsymbol{\pi}$ which can be implemented via multiplication by a certain matrix, denoted here generically by the symbol $\mathbf{H}$. The symbol $\mathbf{H}_{[t]}$ denotes the transformation matrix that would produce marginals of order $t$. The symbol $\mathbf{H}_{[t:u]}$, $t \le u \le q$, denotes the transformation matrix that would produce marginals from order $t$ up to and including order $u$. Furthermore, $\mathbf{H}_{[t]} \equiv \mathbf{H}_{[t:t]}$, and $\mathbf{H} \equiv \mathbf{H}_{[t:u]}$. There will be occasions to delete certain rows from the matrix $\mathbf{H}_{[t:u]}$ due to collinearity, and the symbol $\mathbf{H}_{[t:u],-d}$ denotes the matrix $\mathbf{H}_{[t:u]}$ with $d$ rows deleted. Matrix $\mathbf{H}_{[1]}$ can be defined from matrix $\mathbf{V}$ such that $\mathbf{H}_{[1]} = \mathbf{V}'$.

Under the model, the second-order marginal proportion for variables $Y_i$ and $Y_j$ can be defined as

$$
P_{ij}(\boldsymbol{\theta}) = \mathrm{Prob}(Y_i = 1, Y_j = 1 \mid \boldsymbol{\theta}) = \sum_s v_{is} v_{js}\,\pi_s(\boldsymbol{\theta}),
$$

and the true second-order marginal proportion is given by

$$
P_{ij} = \mathrm{Prob}(Y_i = 1, Y_j = 1) = \sum_s v_{is} v_{js}\,\pi_s .
$$

For second-order marginals, where $j = 1, 2, \ldots, k$; $i = j+1, \ldots, q$; $s = 1, 2, \ldots, T$; and $l = i - j + \sum_{0<r<j}(q - r)$, element $ls$ of $\mathbf{H}_{[2]}$ is given by

$$
[\mathbf{H}_{[2]}]_{ls} = \begin{cases} 1, & \text{if } v_{is} = v_{js} = 1, \\ 0, & \text{otherwise.} \end{cases}
$$

Alternatively, matrix $\mathbf{H}_{[2]}$ can be defined by forming Hadamard products (Magnus & Neudecker, 1999, Section 3.6) among the columns of the matrix $\mathbf{V}$:

$$
\mathbf{H}_{[2]} = \begin{bmatrix}
(\mathbf{v}_1 \circ \mathbf{v}_2)' \\
(\mathbf{v}_1 \circ \mathbf{v}_3)' \\
\vdots \\
(\mathbf{v}_1 \circ \mathbf{v}_q)' \\
(\mathbf{v}_2 \circ \mathbf{v}_3)' \\
\vdots \\
(\mathbf{v}_2 \circ \mathbf{v}_q)' \\
\vdots \\
(\mathbf{v}_{q-1} \circ \mathbf{v}_q)'
\end{bmatrix},
$$

where $\mathbf{v}_f$ represents column $f$ of matrix $\mathbf{V}$ and $\mathbf{v}_f \circ \mathbf{v}_g$ represents the Hadamard product of columns $f$ and $g$.

2.2. Higher-order marginals

The third-order marginal proportions for variables $Y_i$, $Y_j$ and $Y_k$ can be obtained by employing the matrix $\mathbf{H}_{[3]}$, which can also be defined as Hadamard products among the columns of $\mathbf{V}$,

$$
\mathbf{H}_{[3]} = \begin{bmatrix}
(\mathbf{v}_1 \circ \mathbf{v}_2 \circ \mathbf{v}_3)' \\
(\mathbf{v}_1 \circ \mathbf{v}_2 \circ \mathbf{v}_4)' \\
\vdots \\
(\mathbf{v}_1 \circ \mathbf{v}_2 \circ \mathbf{v}_q)' \\
(\mathbf{v}_2 \circ \mathbf{v}_3 \circ \mathbf{v}_4)' \\
\vdots \\
(\mathbf{v}_2 \circ \mathbf{v}_3 \circ \mathbf{v}_q)' \\
\vdots \\
(\mathbf{v}_{q-2} \circ \mathbf{v}_{q-1} \circ \mathbf{v}_q)'
\end{bmatrix},
$$

and then, for example,

$$
\mathbf{H}_{[1:3]} = \begin{bmatrix} \mathbf{H}_{[1]} \\ \cdots \\ \mathbf{H}_{[2]} \\ \cdots \\ \mathbf{H}_{[3]} \end{bmatrix}.
$$

For $q = 3$,

$$
\mathbf{H}_{[1:3]} = \left[\begin{array}{cccccccc}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
\hline
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
\hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]. \qquad (1)
$$

A general matrix $\mathbf{H}_{[t:u]}$ to obtain marginals of any order can be defined in a similar fashion using Hadamard products among the columns of $\mathbf{V}$. $\mathbf{H}_{[1:q]}$ gives a mapping from joint proportions to the entire set of $2^q - 1$ marginal proportions:

$$
\mathbf{P} = \mathbf{H}_{[1:q]}\,\boldsymbol{\pi}, \qquad (2)
$$

where $\mathbf{P} = (P_1\ P_2\ P_3\ \ldots\ P_q\ P_{12}\ P_{13}\ \ldots\ P_{q-1,q}\ P_{123}\ \ldots\ P_{q-2,q-1,q}\ \ldots\ P_{123\ldots q})'$ is the vector of marginal proportions.

2.3. Residuals

Define the unstandardized residual $r_s = \hat{p}_s - \pi_s(\hat{\boldsymbol{\theta}})$, where $\hat{p}_s = n_s/n$ is element $s$ of $\hat{\mathbf{p}}$, the vector of multinomial proportions; $n_s$ = element $s$ of $\mathbf{n}$, the vector of observed frequencies; $n$ = total sample size $= \sum_{s=1}^{T} n_s$; $\hat{\boldsymbol{\theta}}$ = parameter estimator vector; $\pi_s(\hat{\boldsymbol{\theta}})$ = estimated expected proportion for cell $s$; and denote the vector of unstandardized residuals as $\mathbf{r}$ with element $r_s$.
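The construction of the matrix H[t:u] from Hadamard products of the columns of V, and the mapping from joint to marginal proportions in expression (2), might be sketched as follows. The function name `marginal_matrix` and the uniform vector `pi` are illustrative assumptions, not part of the paper.

```python
import itertools

import numpy as np


def marginal_matrix(q, t, u):
    """Stack H[t], ..., H[u] for q dichotomous variables.

    Each row is the Hadamard (elementwise) product of the columns of V
    belonging to one subset of variables, so a row holds 1 exactly for the
    response patterns in which every variable of the subset equals 1.
    """
    V = np.array(list(itertools.product([0, 1], repeat=q)), dtype=int)
    rows = [
        V[:, list(subset)].prod(axis=1)
        for order in range(t, u + 1)
        for subset in itertools.combinations(range(q), order)
    ]
    return np.array(rows)


H = marginal_matrix(3, 1, 3)      # the 7 x 8 matrix of expression (1)
pi = np.full(8, 1 / 8)            # hypothetical joint proportions (uniform)
P = H @ pi                        # marginal proportions, as in expression (2)
print(H)
print(P)
```

Because `itertools.combinations` enumerates subsets in lexicographic order, the rows appear in the same order as in the paper: (1,2), (1,3), ..., (q-1,q) for the second order, and so on.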
A vector of simple residuals for marginals of any order may be defined such that

$$
\mathbf{e} = \mathbf{H}(\hat{\mathbf{p}} - \boldsymbol{\pi}(\hat{\boldsymbol{\theta}})) = \mathbf{H}\mathbf{r},
$$

and a vector, $\boldsymbol{\xi}$, of differences between the marginals specified by the relevant model and the true population marginals may be defined for marginals of any order such that

$$
\boldsymbol{\xi} = \mathbf{H}(\boldsymbol{\pi} - \boldsymbol{\pi}(\boldsymbol{\theta})).
$$

3. A test of fit using marginal distributions

3.1. The logic of employing marginal frequencies

Traditional full-information statistics assess the fit of a multinomial model by testing the null hypothesis $H_0$: $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$, where $\boldsymbol{\pi}(\boldsymbol{\theta})$ is a vector of multinomial probabilities as a function of $\boldsymbol{\theta}$. The null hypothesis that a vector of differences, $\boldsymbol{\xi}$, as defined in the previous section, is equal to zero can be written in the equivalent form $H_0$: $\mathbf{H}\boldsymbol{\pi} = \mathbf{H}\boldsymbol{\pi}(\boldsymbol{\theta})$. Logically, testing $H_0$: $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$ using lower-order marginals is an application of ‘denying the consequent’. The hypothesis $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$ implies $\mathbf{H}\boldsymbol{\pi} = \mathbf{H}\boldsymbol{\pi}(\boldsymbol{\theta})$ must hold. The converse, of course, is not true. But if $\mathbf{H}\boldsymbol{\pi} = \mathbf{H}\boldsymbol{\pi}(\boldsymbol{\theta})$ is not true, then $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$ cannot be true. In logic, this type of argument is known as modus tollens. Thus, for $u < q$, the relationship between these null hypotheses works as follows: ‘reject $H_0$: $\mathbf{H}\boldsymbol{\pi} = \mathbf{H}\boldsymbol{\pi}(\boldsymbol{\theta})$’ is a sufficient but not necessary condition for ‘reject $H_0$: $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$’; ‘do not reject $H_0$: $\mathbf{H}\boldsymbol{\pi} = \mathbf{H}\boldsymbol{\pi}(\boldsymbol{\theta})$’ is a necessary but not sufficient condition for ‘do not reject $H_0$: $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$’. ‘Do not reject $H_0$: $\mathbf{H}\boldsymbol{\pi} = \mathbf{H}\boldsymbol{\pi}(\boldsymbol{\theta})$’ is not a sufficient condition because it is possible that lack of fit may be manifested only in marginals of higher order than $u$, where $u$ is defined as in $\mathbf{H}_{[t:u]}$. Power studies presented below address this possibility. The omnibus null hypothesis, $H_0$: $\boldsymbol{\pi} = \boldsymbol{\pi}(\boldsymbol{\theta})$, can be stated in terms of orthogonal components, as demonstrated in Section 4.

3.2. The test statistic

Let $\boldsymbol{\Sigma}_e$ represent the covariance matrix of the residuals, $\mathbf{e}$.
Using the matrix $\mathbf{H}_{[t:u]}$ as given previously, an extended version of the statistic from Reiser (1996) and Reiser and Lin (1999) can be defined as follows. The statistic

$$
X^2_{[t:u]} = \mathbf{e}'\,\hat{\boldsymbol{\Sigma}}_e^{-1}\,\mathbf{e} \qquad (3)
$$

may be used to test $H_0$: $\boldsymbol{\xi}_{[t:u]} = \mathbf{0}$, where

$$
\hat{\boldsymbol{\Sigma}}_e = n^{-1}\,\hat{\boldsymbol{\Omega}}_e, \qquad (4)
$$

with

$$
\boldsymbol{\Omega}_e = \mathbf{H}\left(\mathbf{D}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}' - \mathbf{G}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{G}'\right)\mathbf{H}',
$$

evaluated at the maximum-likelihood estimates $\hat{\boldsymbol{\pi}}$ and $\hat{\boldsymbol{\theta}}$, where $\mathbf{D}(\boldsymbol{\pi})$ is the diagonal matrix with $(s, s)$th element equal to $\pi_s(\boldsymbol{\theta})$,

$$
\mathbf{A} = \mathbf{D}(\boldsymbol{\pi})^{-1/2}\,\frac{\partial \boldsymbol{\pi}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}, \qquad \text{and} \qquad \mathbf{G} = \frac{\partial \boldsymbol{\pi}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}.
$$

$\boldsymbol{\Omega}_r = \left(\mathbf{D}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}' - \mathbf{G}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{G}'\right)$ is the asymptotic covariance matrix of $\mathbf{r}$ (see Haberman, 1973). Matrices are evaluated with $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$, which may be the maximum-likelihood estimator. If the model satisfies the regularity conditions given by Birch (1964), and if the columns of $\mathbf{H}'$ are linearly independent and are also linearly independent of the columns of $\mathbf{G}$, then $\hat{\boldsymbol{\Sigma}}_e$ will be non-singular, and a conventional inverse can be used to calculate $X^2_{[t:u]}$. If there is a violation of certain regularity conditions, including the presence of linear dependencies, then a generalized inverse would be needed to calculate $X^2_{[t:u]}$.

The limiting distribution of $X^2_{[t:u]}$ as $n \to \infty$ can be shown to be the $\chi^2$ distribution because $\mathbf{e}$ is a linear combination of the elements of $\mathbf{r}$, $\hat{\boldsymbol{\Sigma}}_e \xrightarrow{P} \boldsymbol{\Sigma}_e$, and $\mathbf{e} \xrightarrow{L} \mathrm{MVN}(\boldsymbol{\xi}, \boldsymbol{\Sigma}_e)$; see Moore (1977) for the principles employed in constructing chi-squared tests. The regularity conditions for the asymptotic chi-squared distribution are given by Birch (1964) and are discussed by Bishop, Fienberg, and Holland (1975). The regularity conditions include the assumptions that $\boldsymbol{\varphi}$, the true value of $\boldsymbol{\theta}$, is not on the boundary of the parameter space $\Theta$; $\pi_s(\boldsymbol{\varphi}) > 0$ for $s = 1, \ldots, T$, so $\boldsymbol{\pi}$ does not lie on the boundary of the relevant parameter space; $\boldsymbol{\pi}(\boldsymbol{\theta})$ is totally differentiable at $\boldsymbol{\varphi}$; and the Jacobian matrix $\partial \boldsymbol{\pi}(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ is of full rank, so $\boldsymbol{\pi}(\boldsymbol{\theta})$ maps a small neighbourhood of $\boldsymbol{\varphi}$ into a small neighbourhood of $\boldsymbol{\pi}(\boldsymbol{\varphi})$. Furthermore, the assumption $n \to \infty$ is accompanied by an implicit assumption that the number of cells $T$ remains fixed, so that expected cell frequencies are assured of becoming large with $n$ (see Read & Cressie, 1988, Section 4.3). Since the statistics proposed here use overlapping cells in the form of lower-order marginals, the last assumption is more likely to be met, as compared with traditional statistics, when $T$ is large relative to $n$. If a generalized inverse is needed to calculate $X^2_{[t:u]}$, then additional regularity conditions must be satisfied in order to establish convergence of $X^2_{[t:u]}$ to a chi-squared variate. These additional regularity conditions are given in Section 8.5 of Magnus and Neudecker (1999).

The degrees of freedom are determined by the rank of $\boldsymbol{\Omega}_e$. The focus of this paper is on $X^2_{[2]}$, $X^2_{[2:3]}$ and $X^2_{[2:4]}$, which can be calculated using $\mathbf{H}_{[2]}$, $\mathbf{H}_{[2:3]}$ and $\mathbf{H}_{[2:4]}$ as defined in the previous section. In general, $X^2_{[2]}$ will have degrees of freedom up to $\min(2^q - g,\ 0.5\,q(q-1))$, and $X^2_{[2:3]}$ will have degrees of freedom up to $\min(2^q - g,\ \tfrac{1}{6}\,q(q-1)(q+1))$, where $g$ is the number of estimated parameters. As demonstrated in Section 6, some model parameterizations may reduce the rank of $\boldsymbol{\Omega}_e$ and hence the degrees of freedom for $X^2_{[2]}$ and $X^2_{[t:u]}$ in general. Any linear dependency among the rows of $\mathbf{H}$ and the columns of $\mathbf{G}$ will produce a marginal or sum of marginals that is perfectly fitted when calculating $X^2_{[t:u]}$ and will lead to the loss of a degree of freedom.
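As an illustration of expressions (3) and (4), the sketch below computes the statistic on marginals of order t through u for one simple case: the independence model for q dichotomous variables, whose maximum-likelihood estimates are the observed first-order marginal proportions. The choice of model, the function name `x2_marginals` and the frequencies are assumptions made for the example and are not taken from the paper. A conventional inverse is used, so the sketch assumes t >= 2; under this model the first-order marginals are fitted exactly, and including them would make the covariance matrix singular.

```python
import itertools

import numpy as np


def x2_marginals(counts, t, u):
    """Return (X2_[t:u], df) for the independence model; assumes t >= 2."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    q = int(round(np.log2(counts.size)))
    V = np.array(list(itertools.product([0, 1], repeat=q)), dtype=float)

    p_hat = counts / n                       # observed cell proportions
    theta = V.T @ p_hat                      # ML estimates of Prob(Y_i = 1)
    pi = np.prod(np.where(V == 1, theta, 1 - theta), axis=1)  # pi_s(theta_hat)

    # G = d pi / d theta (T x q) and A = D(pi)^(-1/2) G
    G = pi[:, None] * (V - theta) / (theta * (1 - theta))
    A = G / np.sqrt(pi)[:, None]
    omega_r = np.diag(pi) - np.outer(pi, pi) - G @ np.linalg.inv(A.T @ A) @ G.T

    H = np.array([V[:, list(c)].prod(axis=1)
                  for k in range(t, u + 1)
                  for c in itertools.combinations(range(q), k)])
    e = H @ (p_hat - pi)                     # residuals on the marginals
    sigma_e = H @ omega_r @ H.T / n          # expression (4)
    x2 = float(e @ np.linalg.solve(sigma_e, e))   # expression (3)
    return x2, int(np.linalg.matrix_rank(sigma_e))


counts = [20, 15, 12, 18, 10, 14, 9, 22]     # hypothetical 2 x 2 x 2 table
x2_2, df_2 = x2_marginals(counts, 2, 2)
x2_23, df_23 = x2_marginals(counts, 2, 3)
print(x2_2, df_2, x2_23, df_23)
```

The degrees of freedom are taken from the numerical rank of the estimated covariance matrix, following the statement that they are determined by the rank of the covariance matrix of e. For other models only the lines computing `theta`, `pi` and `G` would change.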
A second-order marginal that is perfectly fitted under the model will reduce the degrees of freedom for $X^2_{[t:u]}$ by 1 if $t \le 2 \le u$, and a third-order marginal that is perfectly fitted under the model will reduce the degrees of freedom for $X^2_{[t:u]}$ by 1 if $t \le 3 \le u$, etc. $X^2_{[2]}$ and $X^2_{[2:3]}$, for $q > 2$ and $q > 3$ respectively, represent full-information estimation followed by a limited-information test of fit. Reiser (1996) introduced $X^2_{[1:2]}$ with an application to the logit latent trait model, and Reiser and Lin (1999) developed $X^2_{[2]}$ for applications to the latent class model. In these applications, $X^2_{[1:2]}$ and $X^2_{[2]}$ were shown to perform well as goodness-of-fit test statistics even when sparseness is present in the joint frequencies because the frequencies in the overlapping cells formed by the lower-order marginals are usually not sparse.

3.3. Relationship to other statistics

In the preceding section, $\mathbf{H}$ was presented as a matrix of constants. If, instead, a random variable $H_l$ is considered that takes on value $h_{ls}$ with probability $\pi_s(\boldsymbol{\theta}_a)$, where $\pi_s(\boldsymbol{\theta}_a)$ represents a probability under a Neyman smooth alternative hypothesis, then the test statistic given above would be written as

$$
\hat{X}^2_{[t:u]} = \mathbf{r}'\,\hat{\mathbf{H}}'\,\hat{\boldsymbol{\Sigma}}_e^{-1}\,\hat{\mathbf{H}}\,\mathbf{r}, \qquad (5)
$$

where $\hat{\mathbf{H}} = \mathbf{H}(\hat{\boldsymbol{\theta}})$. Written in this form, $\hat{X}^2_{[t:u]}$ can be seen as a special case of the score statistic given in Theorem 7.1.1 of Rayner and Best (1989). Rayner and Best present $\mathbf{H}$ as a general matrix, not necessarily specifically for producing marginal probabilities. When $\mathbf{H}$ produces marginal probabilities, as defined previously in expression (2), the use of $\hat{X}^2_{[t:u]}$ becomes both an overlapping cells test and a focused test.
One benefit of an overlapping cells test, first proposed by Hall (1985), is improved reliability of the asymptotic chi-squared distribution when expected cell frequencies are small. In $\hat{X}^2_{[t:u]}$, the marginals constitute the overlapping cells. A benefit of a focused test is increased power to detect a departure from the null hypothesis in a specific direction. $\hat{X}^2_{[t:u]}$ is focused when it is used for a test on lower-order marginals instead of the full set of marginals from the first through $q$th order. Studies are reported below for the purpose of examining the power of tests based on marginal frequencies without regard to the sparseness issue.

For a cross-classified table, the Pearson chi-squared statistic is obtained by comparing observed probabilities with the probabilities specified under the null hypothesis. Since no parameters are estimated, the degrees of freedom are known to be equal to the number of cells minus 1, which in this case is $T - 1$. The Pearson–Fisher statistic is obtained in a similar fashion, except the probabilities under the null hypothesis depend on unknown parameters, and so observed probabilities are compared with estimated expected probabilities that are calculated using parameters estimated under the model of interest:

$$
X^2_{PF} = n\,\mathbf{r}'\,\mathbf{D}(\boldsymbol{\pi}(\hat{\boldsymbol{\theta}}))^{-1}\,\mathbf{r}.
$$

If $g$ parameters are estimated, then the degrees of freedom are $T - g - 1$.

Bartholomew (1987, p. 94) demonstrates that the joint probability function of $\mathbf{Y}$, a $q$-dimensional vector of dichotomous variables, can be uniquely expressed in terms of the $2^q - 1$ marginal probabilities from first to $q$th order (see expression (2)). This implies that the Pearson and Pearson–Fisher statistics can be calculated by comparing observed with expected marginal frequencies, using marginals up to order $q$, and hence $X^2_{[1:q]} = X^2_{PF}$. See the Appendix for a related proof.
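For a simple null hypothesis, where no parameters are estimated, the claim that the statistic on all marginals from order 1 to q reproduces Pearson's statistic can be checked numerically. In the sketch below the covariance matrix of r reduces to D(π) − ππ′ because there is no estimation term; the null probabilities and the frequencies are hypothetical values chosen only for illustration.

```python
import itertools

import numpy as np

q = 3
V = np.array(list(itertools.product([0, 1], repeat=q)), dtype=float)
# H[1:q]: one row per non-empty subset of variables, ordered by subset size
H = np.array([V[:, list(c)].prod(axis=1)
              for k in range(1, q + 1)
              for c in itertools.combinations(range(q), k)])

counts = np.array([20, 15, 12, 18, 10, 14, 9, 22], dtype=float)  # hypothetical
n = counts.sum()
p_hat = counts / n
pi0 = np.full(2 ** q, 1 / 2 ** q)        # hypothetical fully specified null

r = p_hat - pi0
pearson = n * np.sum(r ** 2 / pi0)       # Pearson X^2, df = T - 1

e = H @ r                                # residuals on all 2^q - 1 marginals
sigma_e = H @ (np.diag(pi0) - np.outer(pi0, pi0)) @ H.T / n
x2_1q = float(e @ np.linalg.solve(sigma_e, e))

print(pearson, x2_1q)                    # the two values agree
```

The agreement is exact up to floating-point error, because the 2^q − 1 rows of H[1:q] together with a row of ones span the whole cell space.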
Although $X^2_{[1:q]}$ will reproduce the Pearson–Fisher statistic, in fact, fewer than $2^q - 1$ marginals are required to achieve this purpose for a composite null hypothesis. For a composite null, some residuals on the marginals are degenerate variables identically equal to zero due to linear dependencies among the rows of $\mathbf{H}_{[1:q]}$ and columns of $\mathbf{G}$. Only $2^q - g - 1$ rows of $\mathbf{H}$ are linearly independent of the columns of $\mathbf{G}$. In other words, the equivalence $X^2_{[1:q]} = X^2_{PF}$ will hold if certain rows, $g$ in number, are deleted from $\mathbf{H}_{[1:q]}$. Which rows can or should be deleted will depend on the particular model of interest. With commonly used hierarchical log-linear models, for example, a first-order effect is included in the model for each observed variable. The first-order effect produces first-order marginals that are fitted exactly, which implies that $X^2_{[1:q]} = X^2_{[2:q]}$. Thus, the rows corresponding to $\mathbf{H}_{[1]}$ can be deleted from $\mathbf{H}_{[1:q]}$ for testing such a model. Depending on the other effects included in the model, other rows may also be deleted. Deleting such rows can improve accuracy of numerical calculations.

Recall that $\mathbf{H}_{[t:u],-d}$ denotes the matrix $\mathbf{H}_{[t:u]}$ with $d$ rows deleted. Let $X^2_{T-g-1}$ denote a full-information statistic, where $\mathbf{e}$ and $\hat{\boldsymbol{\Sigma}}_e$ are calculated using $\mathbf{H} = \mathbf{H}_{[1:q],-g}$ and $g$ represents the number of estimated model parameters. Then,

$$
X^2_{T-g-1} = \mathbf{e}'\,\hat{\boldsymbol{\Sigma}}_e^{-1}\,\mathbf{e}
$$

is a special case of expression (3), and $X^2_{T-g-1} = X^2_{PF}$. The proof of this equivalence follows from results in Theorems 7.1.1 and 7.1.2 and Section 7.2 of Rayner and Best (1989), as shown in the Appendix.

Maydeu-Olivares and Joe (2005) developed a statistic, $M_r$, that is closely related to $X^2_{[1:r]}$.
The two statistics are not equivalent, however, because instead of $\hat{\boldsymbol{\Sigma}}_e^{-1}$ in the quadratic form, $M_r$ uses

$$
\hat{\mathbf{C}}_r = (\mathbf{H}\hat{\boldsymbol{\Gamma}}\mathbf{H}')^{-1} - (\mathbf{H}\hat{\boldsymbol{\Gamma}}\mathbf{H}')^{-1}\mathbf{H}\hat{\mathbf{G}}\left(\hat{\mathbf{G}}'\mathbf{H}'(\mathbf{H}\hat{\boldsymbol{\Gamma}}\mathbf{H}')^{-1}\mathbf{H}\hat{\mathbf{G}}\right)^{-1}\hat{\mathbf{G}}'\mathbf{H}'(\mathbf{H}\hat{\boldsymbol{\Gamma}}\mathbf{H}')^{-1},
$$

where $\hat{\boldsymbol{\Gamma}} = \mathbf{D}(\boldsymbol{\pi}(\hat{\boldsymbol{\theta}})) - \boldsymbol{\pi}(\hat{\boldsymbol{\theta}})\boldsymbol{\pi}(\hat{\boldsymbol{\theta}})'$, and $\mathbf{H}$ is always equal to $\mathbf{H}_{[1:r]}$ when applied to the definition of $M_r$. $\hat{\boldsymbol{\Sigma}}_e$ is a generalized inverse of $\hat{\mathbf{C}}_r$. Since $\hat{\mathbf{C}}_r$ appears in the quadratic form, the degrees of freedom for $M_r$ do not match degrees of freedom for $X^2_{[1:r]}$ when $r < q$, and the two statistics are not equivalent under that condition.

Also, $X^2_{[t:u]}$ is more general in two ways. First, the possibility that the statistic will not include some marginals of an entire lower order is allowed because the power of a test may be reduced by including them. Second, Maydeu-Olivares and Joe adopt a condition whereby their statistic does not apply to certain circumstances in which $X^2_{[t:u]}$ would apply. This condition is $r \ge r_0$, where $r_0$ is the smallest integer $r$ such that the model is (locally) identified from the joint moments up to order $r$. Consider, for example, estimating and testing a log-linear model with a non-hierarchical three-way interaction. The third-order marginal tables contain the sufficient statistics for this model (Agresti, 2002), and these tables identify the model. Thus, $r_0 = 3$. However, if the fitted model happens to specify the wrong three-way interaction, the model might be usefully assessed using a test on the second-order marginals, especially for a sparse data table. $X^2_{[1:2]}$ or $X^2_{[2]}$ could be used for this test, but $M_2$ would not be applicable. Another limited-information statistic has also been studied by Maydeu-Olivares (2001).

The original limited-information statistic with features of cell collapsing and cell focusing was given by Christoffersson (1975).
In our notation, his statistic would be written as

$$r'H'_{[1:2]}\left(H_{[1:2]}\left(D(\hat{p}) - \hat{p}\hat{p}'\right)H'_{[1:2]}\right)^{-1}H_{[1:2]}\,r.$$

This statistic is similar to $X^2_{[1:2]}$, but parameter estimation is not incorporated into the covariance matrix, including the use of observed proportions instead of $\pi(\hat{\theta})$. Not surprisingly, simulations show that this statistic performs less well than $X^2_{[1:2]}$ when sample sizes are smaller. Christoffersson's statistic could be easily extended to include higher-order marginals. However, even if marginals from first to $q$th order were included, this statistic would not be equivalent to the Pearson–Fisher statistic. Muthén (1978) developed a modified version of Christoffersson's statistic. A statistic on only second-order marginals was presented by Bartholomew and Leung (2002). Their statistic uses a diagonal covariance matrix for $e$, with a less efficient estimator for variance than the one presented here. As a score statistic, $X^2_{[t:u]}$ possesses certain optimality properties not enjoyed by these other statistics.

4. Orthogonal components

4.1. Overview

In this section, results from Lancaster (1969, Sections 8.4 and 9.12) are used to decompose $X^2_{T-g-1}$, and hence the Pearson–Fisher statistic, into orthogonal components associated with marginals from order 1 to $q$. Explicit rules for obtaining independent components are also given in Agresti (2002, p. 84). The individual components themselves are useful as a guide to how the model of interest may not fit well to the observations. In addition, different versions of $X^2_{[t:u]}$ from the previous section can be shown to be sums of these orthogonal components.

Define the standardized cell residual (Cochran, 1954) as

$$z_s = (\pi_s(\hat{\theta}))^{-1/2}(\hat{p}_s - \pi_s(\hat{\theta})).$$

Of course, $n\sum_s z_s^2 = X^2_{PF}$.
Since the columns of $H'$ form a basis for the $2^q$ cells of the multinomial on which $X^2_{PF}$ is calculated, $X^2_{PF}$ can be decomposed into orthogonal components associated with marginal frequencies by obtaining the sequential sum of squares from a weighted regression of $z$ on the columns of $H'$. Here, $H$ can be either $H_{[1:q]}$ or $H_{[1:q],-g}$. If the decomposition is based on $H_{[1:q]}$, then $g$ of the components will be degenerate random variables identically equal to zero. If the decomposition is based on $H_{[1:q],-g}$, then none of the components will be identically equal to zero.

The sequential sum of squares mentioned earlier can be represented as the elements of an inner product, $\hat{\gamma}'\hat{\gamma}$, to be completely defined below. Let $\hat{\gamma}_j^2$, an element of $\hat{\gamma}'\hat{\gamma}$, denote an orthogonal component of $X^2_{PF}$. For $q = 3$, the maximum number of orthogonal components is seven, and they could be partitioned as follows:

$$\left(\hat{\gamma}_1^2 \;\; \hat{\gamma}_2^2 \;\; \hat{\gamma}_3^2 \;\; \vdots \;\; \hat{\gamma}_4^2 \;\; \hat{\gamma}_5^2 \;\; \hat{\gamma}_6^2 \;\; \vdots \;\; \hat{\gamma}_7^2\right)'.$$

$\hat{\gamma}_1^2$, $\hat{\gamma}_2^2$ and $\hat{\gamma}_3^2$ are components produced from the first three rows of matrix $H_{[1:3]}$ shown earlier in expression (1) and are associated with first-order marginals. $\hat{\gamma}_4^2$, $\hat{\gamma}_5^2$ and $\hat{\gamma}_6^2$ are components produced from rows 4, 5 and 6 of $H_{[1:3]}$ and are associated with second-order marginals. Finally, $\hat{\gamma}_7^2$ is a component produced from row 7 of $H_{[1:3]}$ and is associated with the single third-order marginal.

In an application to a log-linear model, for example, each component is associated with a specific model effect, and some components may be identically equal to zero. To test a model of independence on frequency counts, a log-linear model with only first-order effects might be fitted. In the case of the independence model for $q = 3$, three of the components would be identically equal to zero because the first-order marginals for variables 1, 2 and 3 are fitted exactly. Thus, under this model and with $q = 3$, $\hat{\gamma}_1^2 = \hat{\gamma}_2^2 = \hat{\gamma}_3^2 = 0$.
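The claim that the first-order residuals vanish under the independence model can be checked numerically. The following is a minimal sketch, not the author's code: the cell ordering, the helper names `marginal_matrix` and `independence_fit`, and the counts are all hypothetical, chosen only so that rows 1 to 3 of the indicator matrix give first-order marginals, rows 4 to 6 second-order marginals and row 7 the third-order marginal, as described for $H_{[1:3]}$.

```python
from itertools import combinations, product

def marginal_matrix(q):
    """Return cells, labels and rows of an indicator matrix like H[1:q]."""
    cells = list(product([0, 1], repeat=q))      # all 2^q response patterns
    labels, rows = [], []
    for order in range(1, q + 1):
        for subset in combinations(range(q), order):
            labels.append(tuple(i + 1 for i in subset))
            # 1 if every variable in the subset equals 1 in this cell
            rows.append([int(all(y[i] for i in subset)) for y in cells])
    return cells, labels, rows

def independence_fit(p_hat, cells):
    """Fitted cell probabilities under independence (closed-form MLE)."""
    q = len(cells[0])
    m = [sum(p * y[i] for p, y in zip(p_hat, cells)) for i in range(q)]
    fitted = []
    for y in cells:
        prob = 1.0
        for i in range(q):
            prob *= m[i] if y[i] else 1.0 - m[i]
        fitted.append(prob)
    return fitted

counts = [30, 10, 12, 8, 14, 6, 9, 11]           # hypothetical counts, q = 3
n = sum(counts)
cells, labels, H = marginal_matrix(3)
p_hat = [c / n for c in counts]
pi_hat = independence_fit(p_hat, cells)
r = [a - b for a, b in zip(p_hat, pi_hat)]       # cell residuals
e = [sum(h * x for h, x in zip(row, r)) for row in H]  # residuals on marginals
for label, value in zip(labels, e):
    print(label, round(value, 6))
```

In this sketch the first three entries of `e` are zero up to rounding error, while the residuals on the second- and third-order marginals are free to differ from zero.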
Each component that is identically equal to zero corresponds to an estimated parameter. The other components correspond to association effects not included in the model. $\hat{\gamma}_4^2$ would give an indication of whether there is a lack of fit for the two-way association between variables 1 and 2, $\hat{\gamma}_5^2$ would give a similar indication for variables 1 and 3, and $\hat{\gamma}_6^2$ would refer to the association between variables 2 and 3. $\hat{\gamma}_7^2$ would give an indication of lack of fit in the three-way association among variables 1, 2 and 3. Used in this way, an orthogonal component is similar to a Wald statistic for the absence of a certain interaction in a log-linear model, but it is not equivalent to the Wald statistic because the components are order-dependent. That is, the values taken on by these orthogonal components depend on the order of the columns in $H'$. In other log-linear models, components might indicate lack of fit due to constraints on parameters. For multinomial models that are not log-linear, components would not necessarily be associated with specific model effects.

In general, and assuming that the rows of $H$ are ordered in the same manner as shown in expression (1), and if $b_1$ represents the number of columns in $H'$ that produce first-order marginals, then

$$\hat{\gamma}_1^2 + \hat{\gamma}_2^2 + \cdots + \hat{\gamma}_{b_1}^2 = X^2_{[1]},$$

where $X^2_{[1]}$ is a test statistic on the first-order marginals as developed in the previous section. If $b_2$ is the number of columns from $H'$ that produce second-order marginals then, for $0 < b_1 \le q$ and $0 < b_2 \le \tfrac{1}{2}q(q-1)$,

$$X^2_{[1:2]} = \sum_{j=1}^{b_1+b_2}\hat{\gamma}_j^2.$$

Sums of squares and orthogonal components can be defined in a similar fashion for additional partitions of the $H'$ matrix. For example, if $b_3$ is the number of columns in $H'$ that produce third-order marginals, then for $0 < b_3 \le \binom{q}{3}$ and $b_2 > 0$,

$$X^2_{[1:3]} = \sum_{j=1}^{b_1+b_2+b_3}\hat{\gamma}_j^2.$$

4.2. Weighted regression

To calculate the orthogonal components using a weighted regression, the appropriate weight matrix, $\hat{W}$, is the estimated covariance matrix of the standardized residuals:

$$\hat{W} = I - \pi(\hat{\theta})^{1/2}\left(\pi(\hat{\theta})^{1/2}\right)' - \hat{A}(\hat{A}'\hat{A})^{-1}\hat{A}', \tag{6}$$

where $\hat{A} = A$ evaluated at $\theta = \hat{\theta}$, $\hat{G} = G$ evaluated at $\theta = \hat{\theta}$, and $\pi(\theta) = \pi(\hat{\theta})$. Let the vector $z$, where

$$z = D(\pi(\hat{\theta}))^{-1/2}(\hat{p} - \pi(\hat{\theta})), \tag{7}$$

have elements $z_s$, as defined earlier. The weight matrix $\hat{W}$ can be applied to $z$, but it produces no effect:

$$\hat{W}D(\pi(\hat{\theta}))^{-1/2}(\hat{p} - \pi(\hat{\theta})) = D(\pi(\hat{\theta}))^{-1/2}(\hat{p} - \pi(\hat{\theta})) = z.$$

Finally, $H'_{[1:q]}$ can be premultiplied by $D(\pi(\hat{\theta}))^{-1/2}$ to adjust for the standardization of the residuals, and if the weight matrix is then applied, let the resulting matrix be $\hat{M}$, where

$$\hat{M} = \hat{W}D(\pi(\hat{\theta}))^{-1/2}H'_{[1:q]}. \tag{8}$$

With these adjustments, the orthogonal components, $\hat{\gamma}_j^2$, of $X^2_{T-g-1}$ can be obtained from the sequential sum of squares resulting from an ordinary regression of $z$ on the columns of $\hat{M}$.

Let $h'_l$ represent row $l$ of $H$, and let $[H']_l$ represent column $l$ of matrix $H'$. Then, using results from linear models for the regression that produces the orthogonal components, the sum of squares that constitutes the first component is given by

$$SS([H']_1) = \hat{\gamma}_1^2 = n\,z'\hat{W}D(\pi(\hat{\theta}))^{-1/2}[H']_1\left(h'_1D(\pi(\hat{\theta}))^{-1/2}\hat{W}D(\pi(\hat{\theta}))^{-1/2}[H']_1\right)^{-}h'_1D(\pi(\hat{\theta}))^{-1/2}\hat{W}z,$$

where $B^{-}$ indicates the generalized inverse of the matrix $B$.
The orthogonal complement of $[H']_2$ to $[H']_1$ is given by

$$[H']_2^{\perp} = [H']_2 - [H']_1\left(h'_1\hat{W}[H']_1\right)^{-}h'_1\hat{W}[H']_2,$$

and the sequential sum of squares that constitutes the second orthogonal component is given by

$$SS([H']_2 \mid [H']_1) = \hat{\gamma}_2^2 = n\,z'\hat{W}D(\pi(\hat{\theta}))^{-1/2}[H']_2^{\perp}\left(h_2^{\perp\prime}D(\pi(\hat{\theta}))^{-1/2}\hat{W}D(\pi(\hat{\theta}))^{-1/2}[H']_2^{\perp}\right)^{-}h_2^{\perp\prime}D(\pi(\hat{\theta}))^{-1/2}\hat{W}z.$$

Sequential sums of squares for additional orthogonal components may be obtained in a similar manner.

4.3. Cholesky decomposition

Alternatively, the orthogonal components may be obtained using a Cholesky decomposition. The matrix $H_{[1:q],-g}$ should be used in the Cholesky decomposition because conventional methods for calculating the Cholesky factor will fail if the matrix of interest is not positive definite. Consider the $(T - g - 1)$ by $2^q$ matrix $H^*$ that has full row rank, where $H^* = F'H_{[1:q],-g}$, and where $F$ is the upper triangular matrix such that $F'\Omega_eF = I$. $F = (C')^{-1}$, where $C$ is the Cholesky factor of $\Omega_e$. Premultiplication by $(C')^{-1}$ orthonormalizes the matrix $H_{[1:q],-g}$ with respect to $D(\pi) - \pi\pi' - G(A'A)^{-1}G'$. Then,

$$X^2_{T-g-1} = n\,r'(\hat{H}^*)'\hat{H}^*r, \tag{9}$$

where $\hat{H}^* = H^*(\hat{\theta})$. Define

$$\hat{\gamma} = n^{1/2}\hat{F}'Hr = n^{1/2}\hat{H}^*r, \tag{10}$$

where $\hat{F}$ is the matrix $F$ evaluated at $\theta = \hat{\theta}$. Then,

$$X^2_{T-g-1} = \hat{\gamma}'\hat{\gamma} = \sum_{j=1}^{T-g-1}\hat{\gamma}_j^2.$$

Since $\hat{H}^*r$ has asymptotic covariance matrix $F'\Omega_eF = I_{T-g-1}$, the elements $\hat{\gamma}_j^2$ are asymptotically independent $\chi^2_1$ random variables. The orthogonal components may be examined individually, or they may be pooled into components due to first-order marginals, second-order marginals, etc., as described earlier.

Define $\eta$ as a vector of orthogonal parameters on the cell probabilities, with

$$\eta = F'H(\pi - \pi(\theta)) = F'\xi.$$

The omnibus null hypothesis $H_0\colon \pi = \pi(\theta)$ is equivalent to the null hypothesis

$$H_0\colon \eta_1 = \eta_2 = \eta_3 = \cdots = \eta_{T-g-1} = 0.$$

A test using an individual component of $X^2_{T-g-1}$ is a test of an individual element of $\eta$.
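The sequential orthogonalization described in this section can be illustrated for the independence model with $q = 3$. The sketch below is an assumption-laden illustration rather than the author's implementation: it replaces the explicit Cholesky factor with a Gram–Schmidt pass, which yields the same sequential components when the vectors $D(\pi)^{1/2}h_l$ are orthogonalized after removing the space spanned by $\pi^{1/2}$ and the columns of $A$. The function names and the counts are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(v, basis):
    """Subtract from v its projection on each orthonormal vector in basis."""
    for u in basis:
        c = dot(v, u)
        v = [a - c * b for a, b in zip(v, u)]
    return v

def unit(v):
    norm = sqrt(dot(v, v))
    return [a / norm for a in v]

def orthogonal_components(counts, q=3):
    n = sum(counts)
    cells = list(product([0, 1], repeat=q))
    p_hat = [c / n for c in counts]
    # MLE under independence: fitted first-order marginals equal observed ones
    m = [sum(p * y[i] for p, y in zip(p_hat, cells)) for i in range(q)]
    pi = []
    for y in cells:
        prob = 1.0
        for i in range(q):
            prob *= m[i] if y[i] else 1.0 - m[i]
        pi.append(prob)
    root = [sqrt(p) for p in pi]
    z = [(a - b) / s for a, b, s in zip(p_hat, pi, root)]  # standardized residuals
    # Nuisance space: pi^(1/2) and the columns of A (up to column scaling)
    nuisance = [root] + [[s * (y[i] - m[i]) for s, y in zip(root, cells)]
                         for i in range(q)]
    basis = []
    for v in nuisance:
        basis.append(unit(residualize(v, basis)))
    labels, gammas = [], []
    for order in range(2, q + 1):                # rows of H with g rows deleted
        for subset in combinations(range(q), order):
            h = [float(all(y[i] for i in subset)) for y in cells]
            u = unit(residualize([s * a for s, a in zip(root, h)], basis))
            basis.append(u)
            labels.append(tuple(i + 1 for i in subset))
            gammas.append(sqrt(n) * dot(z, u))   # one orthogonal component
    return labels, gammas, n * dot(z, z)         # last value is X^2_PF

labels, gammas, x2_pf = orthogonal_components([30, 10, 12, 8, 14, 6, 9, 11])
for label, g in zip(labels, gammas):
    print(label, round(g * g, 4))
print("sum of components:", round(sum(g * g for g in gammas), 4))
print("Pearson-Fisher X2:", round(x2_pf, 4))
```

Under these assumptions the four squared components, three for the second-order marginals and one for the third-order marginal, add up to the Pearson–Fisher statistic, and reordering the marginals changes the individual components but not their sum.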
A test based on only the first- and second-order marginals, for example, would apply to the null hypothesis $H_0\colon \eta_1 = \eta_2 = \eta_3 = \cdots = \eta_{(b_1+b_2)} = 0$, which could be considered to be a component of the omnibus null hypothesis.

Other limited-information statistics discussed in the previous section do not permit a decomposition into components of the Pearson–Fisher statistic. The statistics of Christoffersson (1975) and Bartholomew and Leung (2002) would not be equivalent to the Pearson–Fisher statistic even if marginals from the first to the $q$th order were incorporated. Although the statistic $M_r$ from Maydeu-Olivares and Joe (2005) is equivalent to $X^2_{T-g-1} = X^2_{[1:q],-g}$ when $r = q$, the statistic $M_r$, unlike $X^2_{[1:r],-d}$, does not form a component of the Pearson–Fisher statistic when $r < q$ because the degrees of freedom for $M_r$ do not match the degrees of freedom for components based on marginals up to order $r$, as discussed earlier.

5. Power analysis

5.1. Non-centrality parameter

When a model and unknown parameters are estimated and fitted to a set of cell frequencies, the test of fit is an assessment of a composite null hypothesis. In this situation, numerical simulations are sometimes used to investigate power to detect a false null hypothesis. Asymptotic power of the Pearson statistic for the situation of a composite null hypothesis can be considered using a sequence of local alternatives,

$$\pi_n = \pi(\theta) + \delta/\sqrt{n}, \tag{11}$$

where $\pi_n$ is the vector of true probabilities indexed by the sample size $n$. In this approach, the best fit of the model to the population gives $\pi_s(\theta)$ as the probability for cell $s$, but the true probability differs from that value by $\delta_s/\sqrt{n}$. The model lack of fit goes to zero at the rate $n^{-1/2}$ as $n$ approaches infinity.
With this technique, Mitra (1958) shows that $X^2_{PF}$ has a limiting non-central chi-squared distribution with non-centrality parameter $\lambda$, where

$$\lambda = \delta'\,\mathrm{Diag}[\pi(\theta)]^{-1}\delta,$$

and $df = T - g - 1$, where $T = 2^q$ in the present case. Under the condition $H = H_{[1:q],-g}$, and using a strategy similar to that used in the Appendix, it can be shown that

$$\lambda = \delta'\,\mathrm{Diag}[\pi(\theta)]^{-1}\delta = \delta'H'\Sigma_e^{-1}H\delta. \tag{12}$$

Based on the right-hand side of this expression, it is possible to decompose the non-centrality parameter into orthogonal components associated with marginals of order 1 to $q$ in a manner very close to the earlier decomposition of $X^2_{PF}$. Using a Cholesky decomposition of the non-centrality parameter, let

$$\zeta = (F')H\delta = H^*\delta, \tag{13}$$

where $F$ and $H^*$ are as defined in Section 4. Then, $\lambda = \zeta'\zeta$ and the orthogonal components are $\zeta_j^2$, where $\zeta_j$ is an element of $\zeta$. These components may be used to calculate the power for tests based on marginals of differing order. For example, the non-centrality parameter for $X^2_{[1:2]}$ is given by

$$n\sum_{j=1}^{\frac{1}{2}q(q+1)}\zeta_j^2.$$

5.2. An approximation for $\lambda$

Although expression (11) implies that for large samples the non-central chi-squared approximation is valid when the model is barely incorrect, Agresti (2002) suggests that it is often reasonable to adopt expression (11) for fixed, finite $n$ in order to approximate the distribution of $X^2_{PF}$ even though it might not be expected to hold as substantially more data are obtained. For purposes of power calculations under fixed, finite $n$, as presented below in Section 7, cell proportions were generated from a known model, with parameter vector $\theta^a$. These proportions were multiplied by a selected initial sample size such as $n_0 = 1{,}000$.
The model of the null hypothesis was then analysed using maximum likelihood on the resulting cell frequencies without any added random variability as input. Let $\theta_a$ be the vector that maximizes the function

$$F(\pi, \pi(\theta)) = n\sum_s \pi_s\log(\pi_s(\theta))$$

when $\pi(\theta^a)$ is the vector of multinomial proportions. $\delta^*$ was then chosen such that

$$\delta^* = \sqrt{n}\,(\pi^a - \pi(\theta_a)),$$

where $\pi^a = \pi(\theta^a)$ corresponds to the known generated cell proportions. This method uses $\delta^{*\prime}\,\mathrm{Diag}[\pi(\theta_a)]^{-1}\delta^*$ as an approximation to $\lambda$ as given in expression (12), assuming that if $\theta_a$ is close to the value specified by the null hypothesis,

$$n(\pi_n - \pi(\theta_n))'\,\mathrm{Diag}[\pi(\theta_n)]^{-1}(\pi_n - \pi(\theta_n)) = \lambda + o(1),$$

where $\theta_n$ is the vector maximizing $F(\pi_n, \pi(\theta))$. A similar approximation was used by Satorra and Saris (1985) for the likelihood ratio statistic applied to structural equation models.

The chosen value of $\delta^*$ can be used to approximate the non-centrality parameter for the initial sample size $n_0$. Orthogonal components based on the right-hand side of expression (13) can be calculated using $\Sigma_e$ evaluated at $\theta = \theta_a$. The symbol $\hat{\zeta}_j^2$ is used below in Section 7 to represent $\zeta_j^2$ calculated in this way. The non-centrality parameter for any other sample size, say simply $n$, can be approximated using the expression $\lambda \approx (n/n_0)\lambda_0$. Some Monte Carlo simulations were conducted to cross-validate this approach to the calculation of power for fixed, finite $n$, and the Monte Carlo results were very close to those obtained using the method described here. It should be noted that for some of the hypotheses examined in Section 7, asymptotic power could be calculated using a definition for $\delta$ that would be essentially equivalent to the definition given in expression (18) in Maydeu-Olivares and Joe (2005).

6. Application to a log-linear latent structure model

This section presents a psychometric model commonly known as the Rasch model. A log-linear version of the Rasch model will be used in power calculations demonstrated below.

6.1.
The Rasch model

Latent structure models are used to postulate latent variables to account for the association among manifest variables. When a latent structure model includes a continuous latent variable, it is more commonly known as a latent trait or item response model. Suppose that data are available on $q$ manifest variables, or items, for each of $n$ individuals; each variable can take only the values 0 and 1. Denote by $Y_{ij}$ the response of the $j$th individual on the $i$th variable, and by $X_j$ the latent variable associated with individual $j$. $\beta_{0i}$ is an intercept parameter for item $i$ and $\beta_{1i}$ is the slope parameter that represents the association between item $i$ and the latent variable $X$. The model that includes both intercept and slope parameters is sometimes referred to as the two-parameter model. A restriction of equality can be imposed on the $q$ slope parameters, and the common slope parameter can be absorbed into the scale of $X$. Then, the logit version of the one-parameter or Rasch (1980) model is as follows:

$$Y_{ij} \mid \beta_{0i}, X_j \sim \text{independent}, \tag{14}$$

$$P(Y_{ij} = 1 \mid \beta_{0i}, X_j = x_j) = (1 + \exp(-\beta_{0i} - x_j))^{-1}, \tag{15}$$

for $i = 1, \ldots, q$; $j = 1, \ldots, n$, where $X_1, \ldots, X_n$ are independent and identically distributed random variables with common cumulative distribution function $F(\cdot)$.

In practice, the Rasch model is frequently an unrealistic approach for a model of the responses to a set of items because the requirement for equal slopes is too restrictive. It is similar to imposing a requirement on a factor analysis model that all manifest variables must have the same factor loading. Often, an application of the two-parameter model will be more likely to result in a successful fit. The one- and two-parameter models are both models for the cell counts of a multinomial distribution.
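To make the link between expressions (14) and (15) and the $2^q$ table concrete, the following minimal sketch computes cell probabilities by averaging the conditional likelihood over the latent variable. The distribution $F(\cdot)$ is not specified in the text, so the three-point discrete distribution used here, the intercept values and the function name `rasch_cell_probs` are purely hypothetical choices for illustration.

```python
from itertools import product
from math import exp

def rasch_cell_probs(intercepts, points, weights):
    """Cell probabilities for the 2^q table under expressions (14)-(15),
    with F approximated by a discrete distribution (an assumption)."""
    cells = list(product([0, 1], repeat=len(intercepts)))
    probs = []
    for y in cells:
        total = 0.0
        for x, w in zip(points, weights):
            lik = 1.0
            for b0, yi in zip(intercepts, y):
                p1 = 1.0 / (1.0 + exp(-b0 - x))   # expression (15)
                lik *= p1 if yi else 1.0 - p1     # independence given X, (14)
            total += w * lik                      # average over the latent variable
        probs.append(total)
    return cells, probs

cells, probs = rasch_cell_probs(
    intercepts=[-1.0, -0.5, 0.0, 0.5, 1.0],       # hypothetical beta_0i, q = 5
    points=[-1.5, 0.0, 1.5],                      # hypothetical support of X
    weights=[0.25, 0.5, 0.25],                    # hypothetical probabilities
)
print(len(probs), round(sum(probs), 10))
```

The resulting vector has one probability for each of the 32 response patterns, which is the multinomial on which the goodness-of-fit statistics are defined.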
The methods presented in this paper can be used for any model fitted to a $2^q$ table. The Rasch model has been selected to demonstrate power calculations in the next section because a log-linear version of the model is available. Using the log-linear version to demonstrate power calculations has the advantages that it is convenient both to demonstrate the influence of higher-order interactions and to estimate the model with widely available software. A more extensive presentation of the latent structure model as a logistic regression model is given in Reiser (1996).

6.2. The generalized Rasch model

Tjur (1982) and Cressie and Holland (1983) demonstrate the equivalence of the logit version of the Rasch model to a log-linear 'generalized' version. In this generalized version, let $m_s$ be the expected count for cell $s$; then

$$\log(m_s) = \lambda + \lambda^{Y_1}_f + \lambda^{Y_2}_g + \cdots + \lambda^{Y_q}_h + \lambda^T_t, \tag{16}$$

where $\lambda^{Y_i}_f$ is the effect for level $f$ of manifest variable $i$, and $\lambda^T_t$ is an effect for respondents with the same total score, $t = 0, 1, \ldots, k$. A set of individuals who have the same total score, $c = \sum_i y_{ij}$, has been called a score group. Whereas the logit version requires an assumption regarding the distribution of the latent variable, this distribution is left unspecified in the log-linear version. Reiser and Schuessler (1991) demonstrate applications of the various models discussed earlier.

The model given in expression (16) is a special case of the homogeneous association model. For this model, the parameter vector $\theta$ contains the $\lambda$ parameters

$$\theta' = \left(\lambda \;\; \lambda^{Y_1}_0 \;\; \lambda^{Y_1}_1 \;\; \lambda^{Y_2}_0 \;\; \lambda^{Y_2}_1 \;\; \cdots \;\; \lambda^{Y_q}_0 \;\; \lambda^{Y_q}_1 \;\; \lambda^T_0 \;\; \cdots \;\; \lambda^T_q\right).$$

Familiar restrictions are required on these parameters in order to identify the model.
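One way to see how the restrictions reduce the parameter count is to lay out a model matrix for expression (16) and check its rank. The sketch below is illustrative only: the particular identification (item indicators plus score-group indicators for $t = 2, \ldots, q$, with the constant absorbed by normalization) is an assumption made here, not a restriction stated in the text, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def rasch_model_matrix(q):
    """Columns: one indicator per item, then score-group indicators t = 2..q."""
    cells = list(product([0, 1], repeat=q))
    X = []
    for y in cells:
        t = sum(y)                               # total score of this pattern
        X.append(list(y) + [int(t == k) for k in range(2, q + 1)])
    return cells, X

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            if factor != 0:
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

q = 5
cells, X = rasch_model_matrix(q)
g = len(X[0])                                    # q items + (q - 1) score groups
with_constant = [[1] + row for row in X]
full_rank = rank(with_constant) == g + 1         # no redundancy with the constant
components = 2 ** q - g - 1                      # T - g - 1
print(g, full_rank, components)
```

With $q = 5$ this layout has $g = 9$ columns, so $T - g - 1 = 32 - 9 - 1 = 22$, which agrees with the 22 non-degenerate components listed in the tables of Section 7.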
For a log-linear model,

$$\pi_s(\theta) = \frac{e^{u_s}}{\sum_l e^{u_l}}, \tag{17}$$

and for the log-linear Rasch model,

$$\frac{\partial\pi(\theta)}{\partial\theta} = [D(\pi) - \pi\pi']X, \tag{18}$$

where $X$ is a $2^q$ by $g$ full-rank model matrix that contains a column for each manifest item and columns for $q - 1$ score groups, and $g$ is the dimension of $\theta$. The methods given in this paper can be employed with any multinomial model applied to a $2^q$ table, and other models would have a different expression for $\partial\pi(\theta)/\partial\theta$. Using the log-linear Rasch model as the model for $\pi(\theta)$ under the null hypothesis, and $q = 5$ dichotomous manifest variables, power against various alternative hypotheses was investigated for fixed, finite $n$ by employing the methods described in Section 5.

7. Results

7.1. Linear dependencies

Under the log-linear Rasch model, the first-order marginal frequencies are fitted exactly. Since they do not contribute to the test of a null hypothesis, first-order marginals are not shown in the results presented below. Also under the model, there is a linear dependency among the second-order marginal proportions such that they sum to 1. Specifically, there is a linear dependency among the columns of the model matrix $X$ and the columns of $H'_{[2]}$. Therefore, the number of orthogonal components for second-order marginals is one less than the number of second-order marginals. A similar linear dependency exists among each set of marginal proportions from order 3 to $q$. For $q = 5$, the single fifth-order marginal is fitted exactly.

7.2. Two-way effect

Figures 1–3 show power curves for fixed, finite $n$ under the condition that the null hypothesis is false due to the specification of two-way effects among variables in the model.
True frequencies for the alternative hypothesis were generated under the log-linear model

$$\log(m_s) = \lambda + \lambda^{Y_1}_g + \lambda^{Y_2}_h + \lambda^{Y_3}_i + \lambda^{Y_4}_j + \lambda^{Y_5}_k + \lambda^{Y_1,Y_2}_{gh}.$$

For results shown in Figure 1, this log-linear model was used with $\lambda = 0.5$, $\lambda^{Y_1}_1 = \lambda^{Y_2}_1 = \lambda^{Y_3}_1 = \lambda^{Y_4}_1 = \lambda^{Y_5}_1 = 0.10$ and $\lambda^{Y_1,Y_2}_{11} = \lambda^{Y_1,Y_2}_{00} = -\lambda^{Y_1,Y_2}_{01} = -\lambda^{Y_1,Y_2}_{10} = 0.10$. The log-linear Rasch model demands equality for all two-way effects, so in this application it is a false model because the two-way effect for variables $Y_1$ and $Y_2$ is not equal to the other two-way effects. In the figures, the solid line curve shows approximate power for a test based on only the second-order marginals, the curve with a line of constant-length dashes shows approximate power for a test based on second- and third-order marginals, and the curve with a line of alternating long and short dashes shows approximate power for the full-information test based on second-, third- and fourth-order marginals. As is apparent from the plot in Figure 1, a test based on second-order marginals has higher power. This result is not surprising because the lack of fit is in the second-order associations, and including third- or fourth-order marginals in the test statistic dilutes the test.

Figure 1. Estimated power vs. sample size (two-way hierarchical effect).

Orthogonal components for the approximated non-centrality parameter with sample size set at 400 are shown in Table 1, where it can be seen that the effective order of the alternative is actually 1. The first orthogonal component corresponds to the second-order marginal for variables $Y_1$ and $Y_2$. Since this marginal is the location of the lack of fit for the model, a one degree of freedom test based on this component has higher power than a test of any other order.
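The power entries in Table 1 appear consistent with evaluating a non-central chi-squared distribution at a non-centrality parameter equal to $n$ times the cumulative sum of components, with degrees of freedom equal to the number of components pooled. The sketch below checks this reading using only the standard library. It is an illustration under that assumption, not the author's procedure: the series-based chi-squared routines are simple stand-ins for a statistical library and are intended only for moderate arguments such as those used here.

```python
from math import exp, lgamma, log

def chi2_cdf(x, df):
    """Central chi-squared CDF from the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= t / (a + k)
        total += term
        k += 1
    return min(1.0, total * exp(a * log(t) - t - lgamma(a)))

def chi2_critical(df, alpha=0.05):
    """Upper-alpha critical value found by bisection on the CDF."""
    lo, hi = 0.0, 10.0 * df + 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(ncp, df, alpha=0.05):
    """P(non-central chi-squared > critical value), using the Poisson
    mixture representation of the non-central distribution."""
    crit = chi2_critical(df, alpha)
    weight = exp(-ncp / 2.0)                 # Poisson(ncp/2) probability at j = 0
    cdf, mass, j = 0.0, 0.0, 0
    while mass < 1.0 - 1e-12 and j < 2000:
        cdf += weight * chi2_cdf(crit, df + 2 * j)
        mass += weight
        j += 1
        weight *= (ncp / 2.0) / j
    return 1.0 - cdf

n = 400
power_first = power(n * 0.034831, df=1)      # first component of Table 1 only
power_all = power(n * 0.034884, df=22)       # all 22 components pooled
print(round(power_first, 3), round(power_all, 3))
```

Under this reading, the one degree of freedom test retains almost all of the non-centrality while spending a single degree of freedom, which is why pooling further components lowers the power.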
Including any additional components dilutes the test. The one degree of freedom test based on the first orthogonal component is actually a score test of the null hypothesis $H_0\colon \lambda^{Y_1,Y_2}_{ij} = 0$, where all other pairs of variables have association equal to zero. Since the log-linear Rasch model requires equality among all two-way effects, Figure 1 is based on a model for the null hypothesis which is a nested version of the model under the alternative hypothesis. Maydeu-Olivares and Joe (2005) also considered asymptotic power for the item response model in the case where item associations under the null hypothesis can be obtained by a constraint on the item associations under the alternative hypothesis.

Figures 2 and 3 show power curves generated using the same procedure and model as in Figure 1, except $\lambda^{Y_1,Y_2}_{11} = \lambda^{Y_1,Y_2}_{00} = -\lambda^{Y_1,Y_2}_{01} = -\lambda^{Y_1,Y_2}_{10} = 0.20$ in Figure 2, and $\lambda^{Y_1,Y_2}_{11} = \lambda^{Y_1,Y_2}_{00} = -\lambda^{Y_1,Y_2}_{01} = -\lambda^{Y_1,Y_2}_{10} = 0.30$ in Figure 3. The test based on second-order marginals maintains higher power even as $\lambda^{Y_1,Y_2}_{ij}$ increases.

Figure 2. Estimated power vs. sample size (two-way hierarchical effect).

Figure 3. Estimated power vs. sample size (two-way hierarchical effect).

Table 1. Orthogonal components and associated power (two-way effect)

Component  Marginal   zeta-hat_i^2  Cumulative sum  Estimated power
 1         (1,2)      0.034831      0.034831        .96
 2         (1,3)      1.896E-6      0.034833        .93
 3         (1,4)      2.391E-6      0.034835        .89
 4         (1,5)      3.120E-6      0.034838        .86
 5         (2,3)      4.625E-6      0.034843        .84
 6         (2,4)      6.862E-6      0.034850        .81
 7         (2,5)      0.000011      0.034861        .79
 8         (3,4)      3.548E-15     0.034861        .76
 9         (3,5)      1.064E-14     0.034861        .74
10         (1,2,3)    1.574E-6      0.034862        .72
11         (1,2,4)    1.938E-6      0.034864        .70
12         (1,2,5)    2.444E-6      0.034867        .68
13         (1,3,4)    1.719E-10     0.034867        .67
14         (1,3,5)    2.363E-10     0.034867        .65
15         (1,4,5)    3.52E-10      0.034867        .64
16         (2,3,4)    6.148E-10     0.034867        .62
17         (2,3,5)    1.228E-9      0.034867        .61
18         (2,4,5)    3.703E-9      0.034867        .59
19         (1,2,3,4)  2.853E-6      0.034870        .58
20         (1,2,3,5)  4.751E-6      0.034874        .57
21         (1,2,4,5)  9.485E-6      0.034884        .56
22         (1,3,4,5)  4.108E-16     0.034884        .55

Components that are identically equal to zero are omitted from the table. Power calculated with $n = 400$, $\alpha = .05$.

7.3. Three-way effect

Figures 4–6 show power curves under the condition that the null hypothesis is false due to a single three-way interaction among variables $Y_2$, $Y_3$ and $Y_4$. The log-linear Rasch model of the null hypothesis allows no three-way interactions.
The curves shown in Figure 4 were obtained under frequencies generated from a log-linear model that now includes three-way associations:

$$\begin{aligned}\log(m_s) = {} & \lambda + \lambda^{Y_1}_g + \lambda^{Y_2}_h + \lambda^{Y_3}_i + \lambda^{Y_4}_j + \lambda^{Y_5}_k + \lambda^{Y_1,Y_2}_{gh} + \lambda^{Y_1,Y_3}_{gi} + \lambda^{Y_1,Y_4}_{gj} + \lambda^{Y_1,Y_5}_{gk} \\ & + \lambda^{Y_2,Y_3}_{hi} + \lambda^{Y_2,Y_4}_{hj} + \lambda^{Y_2,Y_5}_{hk} + \lambda^{Y_3,Y_4}_{ij} + \lambda^{Y_3,Y_5}_{ik} + \lambda^{Y_4,Y_5}_{jk} + \lambda^{Y_2,Y_3,Y_4}_{hij},\end{aligned}$$

where $\lambda = 0.5$; $\lambda^{Y_1}_1 = \lambda^{Y_2}_1 = \lambda^{Y_3}_1 = \lambda^{Y_4}_1 = \lambda^{Y_5}_1 = 0.10$; $\lambda^{Y_i,Y_j}_{11} = \lambda^{Y_i,Y_j}_{00} = -\lambda^{Y_i,Y_j}_{01} = -\lambda^{Y_i,Y_j}_{10} = 0.20$ for $i, j = 1, 2, 3, 4, 5$; and $\lambda^{Y_2,Y_3,Y_4}_{001} = \lambda^{Y_2,Y_3,Y_4}_{010} = \lambda^{Y_2,Y_3,Y_4}_{100} = \lambda^{Y_2,Y_3,Y_4}_{111} = -\lambda^{Y_2,Y_3,Y_4}_{000} = -\lambda^{Y_2,Y_3,Y_4}_{011} = -\lambda^{Y_2,Y_3,Y_4}_{101} = -\lambda^{Y_2,Y_3,Y_4}_{110} = 0.20$. Since the two-way effects for all pairs of variables are constrained equal in the generating model, the model of the null hypothesis is false only with respect to the three-way effect. As can be seen in Figure 4, the test based on both second- and third-order marginals has higher power, and the test based on only the second-order marginals has very low power for detecting the three-way effect.

Table 2 shows the orthogonal components. From viewing the table, it appears that the effective order of the alternative is 16, but this is due only to the arbitrary order of the rows of the $H$ matrix. That is, row 16 corresponds to the third-order marginal for variables $Y_2$, $Y_3$ and $Y_4$, and if this row had been placed as the first row in the matrix, the effective order of the test would have been 1. A one degree of freedom test on this component would be a score test on the three-way effect.

Figure 4. Estimated power vs. sample size (three-way hierarchical effect).

Figure 5. Estimated power vs. sample size (three-way hierarchical effect).
For the curves shown in Figure 5,

$$\lambda^{Y_2,Y_3,Y_4}_{001} = \lambda^{Y_2,Y_3,Y_4}_{010} = \lambda^{Y_2,Y_3,Y_4}_{100} = \lambda^{Y_2,Y_3,Y_4}_{111} = -\lambda^{Y_2,Y_3,Y_4}_{000} = -\lambda^{Y_2,Y_3,Y_4}_{011} = -\lambda^{Y_2,Y_3,Y_4}_{101} = -\lambda^{Y_2,Y_3,Y_4}_{110} = 0.30.$$

The power of the test based on second-order marginals is still low. For the curves shown in Figure 6,

$$\lambda^{Y_2,Y_3,Y_4}_{001} = \lambda^{Y_2,Y_3,Y_4}_{010} = \lambda^{Y_2,Y_3,Y_4}_{100} = \lambda^{Y_2,Y_3,Y_4}_{111} = -\lambda^{Y_2,Y_3,Y_4}_{000} = -\lambda^{Y_2,Y_3,Y_4}_{011} = -\lambda^{Y_2,Y_3,Y_4}_{101} = -\lambda^{Y_2,Y_3,Y_4}_{110} = 0.50.$$

The test based on the second-order marginals has power larger than the size of the test when the sample size is quite large, although the power is low.

Figure 6. Estimated power vs. sample size (three-way hierarchical effect).

Table 2. Orthogonal components and associated power (three-way hierarchical effect)

Component  Marginal   zeta-hat_i^2  Cumulative sum  Estimated power
 1         (1,2)      0.000021      0.000021        .051
 2         (1,3)      0.000022      0.000043        .052
 3         (1,4)      0.000024      0.000067        .052
 4         (1,5)      0.000149      0.000215        .056
 5         (2,3)      0.000076      0.000291        .058
 6         (2,4)      0.000104      0.000396        .059
 7         (2,5)      0.000032      0.000428        .059
 8         (3,4)      0.000120      0.000548        .061
 9         (3,5)      1.82E-14      0.000548        .060
10         (1,2,3)    0.000040      0.000588        .060
11         (1,2,4)    0.000044      0.000632        .084
12         (1,2,5)    0.001303      0.002205        .087
13         (1,3,4)    0.000270      0.002205        .087
14         (1,3,5)    0.001795      0.004000        .122
15         (1,4,5)    0.002144      0.006144        .166
16         (2,3,4)    0.015416      0.021560        .579
17         (2,3,5)    9.658E-15     0.021560        .565
18         (2,4,5)    7.06E-15      0.021560        .552
19         (1,2,3,4)  0.000027      0.021587        .541
20         (1,2,3,5)  4.9918E-6     0.021592        .529
21         (1,2,4,5)  9.9999E-6     0.021602        .518
22         (1,3,4,5)  0.000030      0.021632        .509

Components that are identically equal to zero are omitted from the table. Power calculated with $n = 600$, $\alpha = .05$.
Figures 7 and 8 show power curves under the condition that the null hypothesis is false due to a non-hierarchical three-way effect. In other words, all two-way effects are equal to zero, but variables $Y_1$, $Y_2$ and $Y_3$ have a three-way interaction. Results here are similar to the curves for a hierarchical three-way effect.

Figure 7. Estimated power vs. sample size (three-way non-hierarchical effect).

7.4. Two- and three-way effects

Figures 9 and 10 show curves for power to detect a departure from the null hypothesis in both the second- and third-order associations. Under the alternative hypothesis, $Y_1$ and $Y_5$ have no association with the other variables, but $Y_2$, $Y_3$ and $Y_4$ have uniform pairwise associations. Since the model of the null hypothesis demands a homogeneous association for all pairs of variables, there is a discrepancy in the two-way associations. Furthermore, under the alternative hypothesis, a three-way interaction effect is present for variables $Y_2$, $Y_3$ and $Y_4$. Since the model of the null hypothesis does not allow a three-way interaction, there is a discrepancy also in the three-way associations. For the curves shown in Figure 9, the two-way association effect for variables $Y_2$, $Y_3$ and $Y_4$ was held constant and equal to 0.075, but the three-way association effect was varied over the range 0.0–0.30. In Figure 10, the two-way association effect for variables $Y_2$, $Y_3$ and $Y_4$ was held constant and equal to 0.10, while the three-way association effect was varied over the range 0.0–0.30. As can be seen in both figures, when the three-way effect is small, the test based on $X^2_{[2]}$ has considerably higher power than a test based on either $X^2_{[2:3]}$ or the full-information Pearson–Fisher statistic.
As the three-way association effect becomes larger, however, the power of the test based on $X^2_{[2]}$ rises only gradually, while the power of tests based on the other two statistics rises rapidly. In both Figures 9 and 10, the power of the test based on $X^2_{[2:3]}$ surpasses the power of the test based on $X^2_{[2]}$ when the three-way association effect becomes larger than the two-way association effect. In Figure 9 the curves cross at the point where the three-way association becomes larger than 0.075, and in Figure 10 they cross at the point where the three-way association becomes larger than 0.10.

Table 3 shows the orthogonal components that correspond to a slice of Figure 10 where the two- and three-way associations are both equal to 0.10, and the sample size is 1,000. As can be seen, the tests based on $X^2_{[2]}$ and $X^2_{[2:3]}$ have essentially the same power. The effective order of the alternative is 16, as in the case considered above for an alternative hypothesis with a departure from the null hypothesis due only to a three-way hierarchical effect. The effective order will shift as the three-way association changes from small to large.

Figure 8. Estimated power vs. sample size (three-way non-hierarchical effect).

Figure 9. Estimated power vs. sample size (three-way hierarchical effect; $n = 500$).

8. Example

The log-linear Rasch model of Section 6 was fitted to data from Bock and Lieberman (1970) on responses from a sample of size 1,000 to five items from Section 7 of the LSAT. Orthogonal components of $X^2_{PF}$ for this application are shown in Table 4, with two of the components very large relative to a central chi-squared distribution on one degree of freedom.
Figure 10. Estimated power vs. sample size (three-way hierarchical effect; $n = 500$).

Table 3. Orthogonal components and power (two- and three-way hierarchical effects)

Component   Marginal    $\hat{\gamma}_i^2$   $\sum_i \hat{\gamma}_i^2$   Estimated power
1           (1,2)       0.001402             0.001402                    .220
2           (1,3)       0.001709             0.003111                    .332
3           (1,4)       0.002129             0.005240                    .460
4           (1,5)       0.002782             0.008022                    .606
5           (2,3)       0.004281             0.012303                    .781
6           (2,4)       0.006307             0.018610                    .921
7           (2,5)       0.001287             0.019897                    .926
8           (3,4)       0.008932             0.028829                    .986
9           (3,5)       5.091E-13            0.028829                    .983
10          (1,2,3)     0.000086             0.028915                    .979
11          (1,2,4)     0.000105             0.029020                    .976
12          (1,2,5)     0.000176             0.029196                    .973
13          (1,3,4)     0.000189             0.029385                    .970
14          (1,3,5)     0.000320             0.029705                    .968
15          (1,4,5)     0.000461             0.030166                    .967
16          (2,3,4)     0.006629             0.036795                    .989
17          (2,3,5)     1.37E-13             0.036795                    .987
18          (2,4,5)     8.186E-13            0.036795                    .986
19          (1,2,3,4)   8.189E-6             0.036803                    .984
20          (1,2,3,5)   1.507E-6             0.036805                    .982
21          (1,2,4,5)   3.024E-6             0.036807                    .980
22          (1,3,4,5)   9.108E-6             0.036816                    .978

Components that are identically equal to zero are omitted from the table. Power calculated with $n = 1{,}000$, $\alpha = .05$.

The log-linear Rasch model is a special case of the homogeneous association model, obtained from that model by imposing an equality constraint among all two-way associations. The largest orthogonal component shows that this constraint produces an especially poor fit for the association between items 2 and 3. The third-order marginal for items 1, 2 and 5 is also poorly fitted. $X^2_{[2]} = 22.94$ on 9 degrees of freedom ($p < .007$), so the model does not fit well to the second-order marginals, and it follows from the result of this test that the model also does not fit well to the joint frequencies of the 32-cell full cross-classification.
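The tail probability quoted for the second-order statistic can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `chi2_sf` is a hypothetical helper introduced here for illustration, not something taken from the paper, and it assumes integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a central chi-squared variable.

    Hypothetical helper for illustration; assumes df is a positive integer.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Second-order statistic reported in the text: 22.94 on 9 degrees of freedom.
p_second_order = chi2_sf(22.94, 9)
print(f"p = {p_second_order:.4f}")
```

The computed value falls below .007, in line with the bound stated in the text.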
Although, in this case, the same conclusion would be reached by performing a test using $X^2_{PF}$, the results presented above show that a test based on $X^2_{[2]}$ has higher power because the source of poor fit is primarily in the second-order marginals. Both tests give the same result in this application because the sample size and effect size are large. If these LSAT items were selected by psychologists for the purpose of constructing a set of questions that would follow the Rasch model principles, the results for orthogonal components show that the content and wording of items 2 and 3 should be investigated to identify the source of the association not shared by the other item pairs. Salomaa (1990) found that a log-linear model with unconstrained two-way associations fits well to these frequencies. Results from such a model show that the association between items 2 and 3 is substantially stronger than the associations among other item pairs. A model that contains the entire set of two-way associations cannot be tested with $X^2_{[2]}$ because all of the second-order marginals are fitted exactly. Under this model of unconstrained two-way associations, the component of $X^2_{PF}$ for the third-order marginal among items 1, 2 and 5 is still fairly large at 5.41, but $X^2_{[3]} = 14.1$ on 10 degrees of freedom ($p < .168$), which leads to a conclusion that is consistent with Salomaa's result. Bock and Lieberman (1970) found the fit of the two-parameter item response model to be marginal ($.05 < p < .10$). In their results, items 2 and 3 had estimated slope parameters of larger magnitude than the other items.

Table 4. Orthogonal components of $X^2_{PF}$ for Rasch model (fitted to LSAT Section 7 data)

Component   Marginal    $\hat{\gamma}_i^2$   $\sum_j \hat{\gamma}_j^2$
1           (1,2)       0.39                 0.39
2           (1,3)       0.38                 0.78
3           (1,4)       0.58                 1.36
4           (1,5)       0.39                 1.75
5           (2,3)       17.70                19.45
6           (2,4)       0.12                 19.57
7           (2,5)       2.12                 21.69
8           (3,4)       0.25                 21.95
9           (3,5)       1.00                 22.94
10          (1,2,3)     0.87                 23.81
11          (1,2,4)     0.62                 24.43
12          (1,2,5)     8.07                 32.50
13          (1,3,4)     1.06                 33.56
14          (1,3,5)     1.51                 35.07
15          (1,4,5)     0.89                 35.96
16          (2,3,4)     0.77                 36.73
17          (2,3,5)     1.24                 37.97
18          (2,4,5)     0.01                 37.98
19          (1,2,3,4)   0.83                 38.81
20          (1,2,3,5)   0.39                 39.20
21          (1,2,4,5)   0.03                 39.24
22          (1,3,4,5)   5.15                 44.38

Components that are identically equal to zero are omitted from the table.

9. Discussion
$X^2_{[t:u]}$ is an extension of the limited-information statistic from Reiser (1996) and Reiser and Lin (1999), designed to include marginals of any order from a multinomial distribution with $2^q$ cells. It was shown that the Pearson–Fisher statistic can be partitioned into orthogonal components associated with marginal frequencies, and that $X^2_{[t:u]}$ constitutes a sum of these orthogonal components. The non-centrality parameter for the large-sample distribution of $X^2_{[t:u]}$ was also partitioned into orthogonal components based on marginal distributions. Power calculations based on these orthogonal components, and assuming that each estimated expected cell frequency, $n\pi_s(\hat{\theta})$, is large, showed that a test based on low-order marginals, $X^2_{[2]}$, has higher power to detect lack of fit located in the second-order associations when compared with a statistic that incorporates higher-order marginals such as $X^2_{[2:3]}$ or the Pearson statistic. $X^2_{[2]}$, however, would be very insensitive to a lack of fit that is present only in the third-order marginals. Other limited-information statistics discussed in the paper can be expected to suffer the same low power in this type of situation. The result demonstrates that $X^2_{[2]}$ is focused on the second-order marginals.
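As an illustration of the kind of power calculation referred to here, the sketch below assumes that the statistic formed from the first k components follows a noncentral chi-squared distribution with k degrees of freedom and noncentrality equal to n times the cumulative sum of the components in Table 3. That reading is an assumption made for this example. The helper names (`chi2_sf`, `chi2_critical`, `noncentral_power`) are hypothetical, and the noncentral tail probability is evaluated as a Poisson mixture of central chi-squared tail probabilities.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a central chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_critical(df, alpha):
    """Critical value c with P(X > c) = alpha, located by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def noncentral_power(df, noncentrality, alpha=0.05, terms=200):
    """Power of a level-alpha test under a noncentral chi-squared alternative.

    Uses the Poisson-mixture representation: the noncentral variable is a
    central chi-squared on df + 2J degrees of freedom, J ~ Poisson(lambda / 2).
    """
    c = chi2_critical(df, alpha)
    mu = noncentrality / 2.0
    if mu <= 0.0:
        return chi2_sf(c, df)
    power = 0.0
    for j in range(terms):
        log_weight = -mu + j * math.log(mu) - math.lgamma(j + 1.0)
        power += math.exp(log_weight) * chi2_sf(c, df + 2 * j)
    return power

# Cumulative component sums from the first rows of Table 3, with n = 1,000.
n = 1000
cumulative_gamma_sq = [0.001402, 0.003111, 0.005240, 0.008022]
for k, total in enumerate(cumulative_gamma_sq, start=1):
    print(k, round(noncentral_power(k, n * total), 3))
```

Under this assumption the first two rows of Table 3 are reproduced to within rounding, which suggests, without establishing, that the tabled power values were obtained in a similar way.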
The optimal form of $X^2_{[t:u]}$ for a particular test will depend on the order of an alternative hypothesis, which is determined by the relative magnitude of parameters under the alternative model that contribute to lack of fit for the hypothesized model in marginals of differing order. In many applications of latent variable models in the social sciences, manifest variables are designed to have high bivariate association, and so $X^2_{[2]}$ or $X^2_{[1:2]}$ may be the best choice for a test statistic in this area of application. To guard against the possibility of failing to detect a departure from the null hypothesis $H_0\colon \pi = \pi(\theta)$ in higher-order marginals, the residual $X^2_{PF} - X^2_{[2]}$ could be examined, since $X^2_{[2]}$ is a component of $X^2_{PF}$. If the residual is large, relative to degrees of freedom, then inclusion of higher-order marginals may be warranted. Use of residuals in a similar way was suggested by Durbin and Knott (1972) in the context of the Cramér–von Mises statistic.

It is also apparent from the results presented here that a test based on an entire bank of marginals, such as the complete set of second-order marginals, may not be optimal in terms of power against a false null hypothesis. An alternative approach would be to estimate the optimal value for the number of components to be included in the test statistic. Eubank (1997), for example, presented a method for selecting the number of components that is similar to using Mallows' $C_p$ to choose the best regression model. In an application of a model to contingency tables, it may also be useful to examine each of the orthogonal components on a single degree of freedom in order to detect possible sources of lack of fit. As suggested by Lancaster (1969), a multiple comparison procedure should be used when assessing a set of components.
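The two diagnostics described above can be sketched with the values reported in Table 4. This is an illustration under stated assumptions, not the author's procedure: the residual degrees of freedom are taken to be the 22 tabled components minus the 9 second-order components, and a Bonferroni adjustment stands in for the multiple comparison procedure, which the text does not specify. The helper `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a central chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Orthogonal components from Table 4 (components 1 to 22).
components = [0.39, 0.38, 0.58, 0.39, 17.70, 0.12, 2.12, 0.25, 1.00,
              0.87, 0.62, 8.07, 1.06, 1.51, 0.89, 0.77, 1.24, 0.01,
              0.83, 0.39, 0.03, 5.15]

# Residual after removing the second-order statistic, using the cumulative
# totals printed in Table 4 (44.38 overall, 22.94 after component 9).
x2_pf, x2_second = 44.38, 22.94
residual = x2_pf - x2_second
residual_df = len(components) - 9   # assumption: 22 - 9 = 13
p_residual = chi2_sf(residual, residual_df)
print(f"residual = {residual:.2f} on {residual_df} df, p = {p_residual:.3f}")

# Screen single-df components with a Bonferroni threshold (an assumed choice).
alpha = 0.05
threshold = alpha / len(components)
flagged = [i + 1 for i, g in enumerate(components) if chi2_sf(g, 1) < threshold]
print("components flagged after Bonferroni adjustment:", flagged)
```

With these inputs only component 5, the (2,3) marginal, passes the Bonferroni threshold; the (1,2,5) component is large on one degree of freedom but does not survive the adjustment. A less conservative procedure could reach a different conclusion for that component.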
Acknowledgements
The author would like to thank Martin Knott for suggesting the idea of an orthogonal decomposition of the test statistics presented in this paper and for other helpful comments.

References
Agresti, A. (2002). Categorical data analysis. New York: Wiley.
Agresti, A., & Yang, M. C. (1987). An empirical investigation of some effects of sparseness in contingency tables. Computational Statistics & Data Analysis, 5, 9–21.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. New York: Oxford University Press.
Bartholomew, D. J., & Leung, S. O. (2002). A goodness of fit test for sparse $2^p$ contingency tables. British Journal of Mathematical and Statistical Psychology, 55, 1–15.
Bartholomew, D. J., & Tzamourani, P. (1999). The goodness of fit of latent trait models in attitude measurement. Sociological Methods and Research, 27, 525–546.
Birch, M. W. (1964). A new proof of the Pearson–Fisher theorem. Annals of Mathematical Statistics, 35, 818–824.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis. Cambridge, MA: MIT Press.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5–32.
Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests. Biometrics, 10, 417–451.
Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Cressie, N., & Holland, P. W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48, 129–141.
Durbin, J., & Knott, M. (1972). Components of Cramér–von Mises statistics I. Journal of the Royal Statistical Society, B, 34, 290–307.
Eubank, R. (1997). Testing goodness of fit with multinomial data. Journal of the American Statistical Association, 92, 1084–1093.
Fergusson, D. M., Horwood, L. J., & Lynskey, M. T. (1995). The stability of disruptive childhood behaviours. Journal of Abnormal Child Psychology, 23, 379–396.
Fisher, R. A. (1941). Statistical methods for research workers (8th ed.). Edinburgh: Oliver and Boyd.
Haberman, S. J. (1973). The analysis of residuals in cross-classified tables. Biometrics, 29, 205–220.
Haberman, S. J. (1988). A warning on the use of chi-squared statistics with frequency tables with small expected cell counts. Journal of the American Statistical Association, 83, 555–560.
Hall, P. (1985). Tailor-made tests of goodness of fit. Journal of the Royal Statistical Society, B, 47, 125–131.
Kendall, M. G. (1952). The advanced theory of statistics (5th ed., Vol. 1). London: Griffin.
Kendall, M. G., & Stuart, A. S. (1973). The advanced theory of statistics (3rd ed., Vol. 2). London: Griffin.
Knott, M., & Tzamourani, P. (1997). Fitting a latent trait model for missing observations to racial prejudice data. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 224–252). Münster: Waxmann.
Koehler, K. J. (1986). Goodness-of-fit tests for log-linear models in sparse contingency tables. Journal of the American Statistical Association, 81, 483–493.
Koehler, K. J., & Larntz, K. (1980). An empirical investigation of goodness-of-fit statistics for sparse multinomials. Journal of the American Statistical Association, 75, 336–344.
Lancaster, H. O. (1969). The chi-squared distribution. New York: Wiley.
Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus (rev. ed.). New York: Wiley.
Maydeu-Olivares, A. (2001). Limited information estimation and testing of Thurstonian models for paired comparison data under multiple judgement sampling. Psychometrika, 66, 209–228.
Maydeu-Olivares, A., & Joe, H. (2005). Limited and full information estimation and goodness-of-fit testing in $2^n$ contingency tables: A unified framework. Journal of the American Statistical Association, 100, 1009–1020.
Mitra, S. K. (1958). On the limiting power function of the frequency chi-square test. Annals of Mathematical Statistics, 29, 1221–1233.
Moore, D. S. (1977). Generalized inverses, Wald's method, and the construction of chi-squared tests of fit. Journal of the American Statistical Association, 72, 131–137.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551–560.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press.
Rayner, J. C. W., & Best, D. J. (1989). Smooth tests of goodness of fit. New York: Oxford University Press.
Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete multivariate data. New York: Springer-Verlag.
Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509–528.
Reiser, M., & Lin, G. (1999). A goodness-of-fit test for the latent class model when expected frequencies are small. In M. Sobel & M. Becker (Eds.), Sociological methodology 1999 (pp. 81–111). Boston: Blackwell.
Reiser, M., & Schuessler, K. (1991). A hierarchy for some latent structure models. Sociological Methods and Research, 19, 419–465.
Salomaa, H. (1990). Factor analysis of dichotomous data. Helsinki: Statistical Society.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Schuessler, K. F. (1982). Measuring social life feelings. San Francisco: Jossey-Bass.
Tate, M. W., & Hyer, L. A. (1973). Inaccuracy of the chi-squared test of goodness of fit when expected frequencies are small. Journal of the American Statistical Association, 68, 836–841.
Tjur, T. (1982). A connection between Rasch's item analysis model and a multiplicative Poisson model. Scandinavian Journal of Statistics, 9, 23–30.
Tollenaar, N., & Mooijaart, A. (2003). Type I errors and power of the parametric bootstrap goodness-of-fit test: Full and limited information. British Journal of Mathematical and Statistical Psychology, 56, 271–288.

Received 11 March 2006; revised version received 4 March 2007

Appendix
The equivalence of $X^2_{T-g-1}$ and $X^2_{PF}$ can be established as shown in Theorems 7.1.1 and 7.1.2 and Section 7.2 in Rayner and Best (1989). Assume that $H$ is a $T - g - 1$ by $T$ matrix with rank $T - g - 1$. It is not necessary to assume that $H$ represents transformations to obtain marginal proportions; that is, the proof is valid for any transformation matrix that has rank $T - g - 1$. From Section 3,

$$X^2_{T-g-1} = n\, r' H' \hat{\Omega}_e^{-1} H r.$$

Using matrix $F$ as defined in Section 4,

$$X^2_{T-g-1} = n\, r' H' \hat{F} \hat{F}^{-1} \hat{\Omega}_e^{-1} (\hat{F}')^{-1} \hat{F}' H r = n\, r' H' \hat{F} (\hat{F}' \hat{\Omega}_e \hat{F})^{-1} \hat{F}' H r = n\, r' (\hat{H}^{*})' \hat{H}^{*} r. \tag{19}$$

Define

$$(\hat{H}^{\dagger})' = \left( (\hat{H}^{*})' \;\vdots\; D(\pi(\hat{\theta}))^{-1} \pi(\hat{\theta}) \;\vdots\; D(\pi(\hat{\theta}))^{-1} \hat{G}' (\hat{A}' \hat{A})^{-1/2} \right).$$

Since $\pi(\hat{\theta})' D(\pi(\hat{\theta}))^{-1} \pi(\hat{\theta}) = 1$ and $\hat{G} D(\pi(\hat{\theta}))^{-1} \pi(\hat{\theta}) = 0$, it follows that $\hat{H}^{\dagger} D(\pi(\hat{\theta})) (\hat{H}^{\dagger})' = I_T$ and $D(\pi(\hat{\theta}))^{-1} = (\hat{H}^{\dagger})' \hat{H}^{\dagger}$. Then,

$$(\hat{H}^{*})' \hat{H}^{*} = D(\pi(\hat{\theta}))^{-1} - D(\pi(\hat{\theta}))^{-1} \pi(\hat{\theta}) \pi(\hat{\theta})' D(\pi(\hat{\theta}))^{-1} - D(\pi(\hat{\theta}))^{-1} \hat{G}' \left( \hat{G} D(\pi(\hat{\theta}))^{-1} \hat{G}' \right)^{-1} \hat{G} D(\pi(\hat{\theta}))^{-1}.$$

Substituting into equation (19),

$$X^2_{T-g-1} = n\, r' D(\pi(\hat{\theta}))^{-1} r,$$

as $\pi(\hat{\theta})' D(\pi(\hat{\theta}))^{-1} r = 0$ and $\hat{G} D(\pi(\hat{\theta}))^{-1} r = 0$ because it is the score vector equation that the maximum-likelihood estimator satisfies.
Under a simple null hypothesis, a similar proof with $g = 0$ can be used to show that $X^2_{T-1} = X^2_{[1:q]} = X^2_P$, the Pearson statistic (see also Maydeu-Olivares & Joe, 2005, for a proof under a simple null hypothesis).
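The two identities on which the Appendix argument rests can be verified in a few lines. The derivation below is a sketch under assumptions that are not spelled out in the Appendix itself: $D(\pi)$ is taken to be the diagonal matrix with the elements of $\pi$ on its diagonal, $G$ is read as the $g \times T$ matrix of derivatives of $\pi(\theta)$ with respect to $\theta$, $r = p - \pi(\hat{\theta})$ is read as the vector of raw residuals with $p$ the observed proportions, and $\mathbf{1}_T$ denotes a vector of ones (a symbol introduced here).

```latex
\begin{align*}
D(\pi)^{-1}\pi &= \mathbf{1}_T, \\
\pi' D(\pi)^{-1} \pi &= \mathbf{1}_T' \pi = \sum_{s=1}^{T} \pi_s = 1, \\
G D(\pi)^{-1} \pi &= G \mathbf{1}_T
  = \frac{\partial}{\partial \theta} \sum_{s=1}^{T} \pi_s(\theta) = 0, \\
\pi' D(\pi)^{-1} r &= \mathbf{1}_T' (p - \pi) = 1 - 1 = 0 .
\end{align*}
```

The third line uses only the fact that the cell probabilities sum to one for every value of $\theta$, so the derivative of their sum vanishes.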