Copyright © The British Psychological Society
Reproduction in any form (including the internet) is prohibited without prior permission from the Society
British Journal of Mathematical and Statistical Psychology (2008), 61, 331–360
© 2008 The British Psychological Society
The British Psychological Society (www.bpsjournals.co.uk)
Goodness-of-fit testing using components based
on marginal frequencies of multinomial data
Mark Reiser*
Arizona State University, USA
The goodness-of-fit test based on Pearson’s chi-squared statistic is sometimes
considered to be an omnibus test that gives little guidance to the source of poor fit
when the null hypothesis is rejected. It has also been recognized that the omnibus test
can often be outperformed by focused or directional tests of lower order. In this paper,
a test is considered for a model on a data table formed by the cross-classification of
q dichotomous variables, and a score statistic on overlapping cells that correspond to
the first- through qth-order marginal frequencies is presented. Then orthogonal
components of the Pearson–Fisher statistic are defined on marginal frequencies. The
orthogonal components may be used to form test statistics, and a log-linear version of
an item response model is used to investigate the order and dilution of a test based on
these components, as well as the projection of components onto the space of lower-order marginals. The advantage of the components in terms of power and detection of
the source of poor fit is demonstrated. Overcoming the adverse effects of sparseness
provides another motive for using components based on marginal frequencies because
an asymptotic chi-squared distribution will be more reliable for a statistic formed on
overlapping cells if expected frequencies in the joint distribution are small.
1. Introduction
The goodness-of-fit test based on Pearson’s chi-squared statistic is sometimes considered
to be an omnibus test that gives little guidance to the source of poor fit when the null
hypothesis is rejected. It has also been recognized that the omnibus test can often be
outperformed by focused or directional tests of lower order. Goodness-of-fit tests are
frequently employed in psychological research when logistic regressions and log-linear
models are used for applications such as tests of independence, item response models,
latent class models and discrete-state transition models. Psychologists sometimes
employ transition models to study child development. Applications of such models
often involve large cross-classified frequency tables. Fergusson, Horwood, and Lynskey
(1995), for example, used a discrete-state transition model to study disruptive childhood behaviours. Behaviour was classified into one of two categories at five time points, producing a cross-classified table with $2^5 = 32$ cells. When the table has so many cells, traditional goodness-of-fit tests are at a disadvantage because their power may be reduced.
If the model fits poorly in such an application, lack of fit is often due to misspecification
of the two-way associations among behaviours measured across the time points. If so,
traditional goodness-of-fit tests, which incorporate degrees of freedom from higher-way
associations that are not included in the model, are less sensitive than a lower-order or
focused test to the misspecification of the two-way associations. This loss of sensitivity
is sometimes known as dilution (Kendall & Stuart, 1973, Chapter 30).

* Correspondence should be addressed to Mark Reiser, School of Social and Family Dynamics, Arizona State University, Box 873701, Tempe, AZ 85287-3701, USA (e-mail: [email protected]).
DOI:10.1348/000711007X204215
In any area of study, the size of the table grows exponentially with the number of
variables. From a study of social life feelings, Schuessler (1982) produced a self-determination scale consisting of 14 items. An item response model for 14 dichotomous
items would require a table with 16,384 cells. Unless the sample of observations is very
large, a table of this size will have sparse frequencies. Another disadvantage for
traditional goodness-of-fit tests is that sparseness may adversely affect the validity of
using the chi-squared distribution as an approximation for the distribution of statistics
such as the Pearson and likelihood ratio, even if the sample size could be considered
large (Koehler & Larntz, 1980; Koehler, 1986). Working with tables that are so extensive
highlights the need for overcoming the effects of sparseness, obtaining higher power
with a focused test, and finding more guidance on the source of poor fit.
In this paper, methods based on orthogonal components and marginal frequencies
are considered for testing a model fit to the $T = 2^q$ frequencies from the cross-classification of $q$ dichotomous variables. The proposed methods are in an area that
has recently become a topic of significant practical interest in psychology because there
is growing recognition that traditional goodness-of-fit tests based on the likelihood
ratio or Pearson statistics have considerable shortcomings, as discussed above, for an
application with a large number of variables. Marginal frequencies can be obtained by
summation across joint frequencies. In Section 2, the transformation from joint
proportions to marginal proportions is presented in terms of linear combinations. In
Section 3, a score statistic is defined on a set, or a subset, of the marginal frequencies
from the first to the qth order. In Section 4, orthogonal components defined on marginal
frequencies are obtained for the Pearson–Fisher statistic, and the score statistic from
Section 3 is shown to be a sum of these orthogonal components.
Given the shortcomings of traditional tests, there are two primary motivations for
considering tests based on orthogonal components defined on marginal frequencies.
First, a test based on a subset of marginals is essentially a lower-order cell-focusing test
(Rayner & Best, 1989, Chapter 7) that should have higher power than a traditional test to
detect a departure from certain null hypotheses in specific directions for any finite
sample size. Second, marginal frequencies are overlapping cells, and since these
overlapping cells have larger frequencies than the cells from the joint distribution of the
full cross-classification, using them to form a test statistic should improve the reliability
of the asymptotic chi-squared distribution when expected frequencies in the joint
distribution are small due to sparseness (Hall, 1985).
While most of the previous work on using marginal frequencies for goodness-of-fit
testing has been carried out in the context of model selection when data are sparse, the
emphasis of this paper is directed more towards the power of a focused test based on
orthogonal components against various false null hypotheses when data are not necessarily
sparse. An approach to performing power calculations for the test of a composite null
hypothesis using components defined on marginal frequencies is presented in Section 5.
The log-linear version of the Rasch latent trait model is reviewed in Section 6, and some
advantages of using orthogonal components in a focused test for model selection are
demonstrated in Section 7, where the log-linear Rasch model is used to investigate power,
the effective order of alternative hypotheses, dilution of the test statistics, as well as the
projection onto the space of lower-order marginals. The advantage of examining
components for the purpose of detecting the source of poor fit is demonstrated in Section 8,
which includes an example of fitting a model to aptitude item responses.
Results on power comparisons to be presented in Section 7 demonstrate that in
addition to overcoming the adverse effects of sparseness, another primary motivation
for using a test based on components for lower-order marginals is the attainment of
higher power than traditional tests in some circumstances even when data are not
sparse. Thus, the results presented below will establish the case that tests based on
orthogonal components associated with marginal proportions should be regularly
examined for certain types of applications in psychology and social sciences in general.
Some results also indicate that a test based on lower-order marginals may be insensitive
to a departure from the null hypothesis in the higher-order marginals. To guard against
this possibility in practice, a recommendation is given to examine the residual
chi-squared statistic, similar to the recommendation given by Durbin and Knott (1972)
in their study of components of the Cramér–von Mises statistic.
Christoffersson (1975) originally introduced the idea of using first- and second-order
marginals for a test of fit in dichotomous variable factor analysis. Muthén (1978)
improved the statistic given by Christoffersson, although neither of these authors
presented their test as attaining higher power or as a remedy for sparseness. The
approach presented here is a direct extension of results from Reiser (1996) on using
first- and second-order marginals to test the fit of item response models and Reiser and
Lin (1999) on testing the fit of latent class models. A goodness-of-fit statistic focused on
lower-order marginals can also be viewed as a special case of the score statistic given by
Rayner and Best (1989). The idea of decomposing Pearson’s chi-squared statistic into
components has a long history which is discussed in Lancaster (1969).
As mentioned earlier, overcoming effects of sparseness is one of the primary motives
for using tests based on orthogonal components associated with marginal frequencies.
There is a large literature on the topic of sparse frequencies and their effect on
traditional goodness-of-fit tests for multinomial models. Frequency tables are said to be
sparse when the ratio of the sample size to the number of cells is relatively small (Agresti
& Yang, 1987). Cochran (1954), Fisher (1941), Cramér (1946), Kendall (1952) and Tate
and Hyer (1973) have given recommendations for minimum expected frequency.
Lancaster (1969, p. 175) gives the following summary.
Some general conclusions can be made – it is probably desirable not to have any expectation
less than unity; with several degrees of freedom, for class frequencies of 5 or more, the
distributions of the Pearson $\chi^2$ approximate satisfactorily to the asymptotic or theoretical $\chi^2$
distributions. If there are a number of classes, perhaps a third or quarter of them can have
expectations in the interval, 1 to 5, without causing serious departures of the distribution of the
Pearson $\chi^2$ from the theoretical ... the parameter of non-centrality of the $\chi^2$ test may be greatly
diminished if too much ‘pooling’ is carried out ...
More recently, Haberman (1988) warns that given any minimum expected cell size
under the null hypothesis and given any significance level, it is possible to make the
power of $X^2$ arbitrarily close to zero by the selection of a large enough number of cells
and suitable cell probabilities for the null and alternative hypotheses.
If sparseness is present in a situation when a goodness-of-fit test is desired on a
multinomial model, tests based on components formed from lower-order marginals can
overcome the adverse effects of sparseness, as shown in Reiser and Lin (1999), because
the overlapping cells have larger frequencies than the cells of the full cross-classification. As mentioned previously, the overlapping cells result in a statistic for
which the asymptotic chi-squared distribution is more reliable. Overcoming the effects
of sparseness to obtain valid tests was the original motivation for using statistics based
on marginal frequencies. Of course, sparseness becomes more of a concern when
sample size is smaller, which is also a condition under which utilizing a test that
possesses higher power becomes more vital. Hence, the two motivations discussed here
for using tests based on components formed from lower-order marginals are both
frequently pertinent to the choice of a test statistic when fitting models to data.
Other work in the area of tests based on lower-order marginals as a remedy for
sparseness includes Knott and Tzamourani (1997), who advise that a chi-squared or
likelihood ratio statistic may not be useful in a test of fit when a latent trait model is
applied to a large number of possible response patterns. They suggest that it would be
informative to instead compare observed and fitted values for first-, second- and third-order marginal frequencies when assessing model fit. Bartholomew and Tzamourani
(1999) also examine sparseness issues for goodness-of-fit testing with latent trait models.
Bartholomew and Leung (2002) developed an alternative statistic on second-order
marginals in the context of testing a two-parameter latent trait model, and Maydeu-Olivares (2001) examined limited-information testing of Thurstonian models for paired
comparisons. Tollenaar and Mooijaart (2003) investigated the performance of a modified
version of the statistic presented in Reiser (1996) under non-standard conditions.
Maydeu-Olivares and Joe (2005) proposed a set or hierarchy of statistics based on
marginals with an application to the item response model. Their approach is closely
related to the technique proposed here, but their hierarchy does not correspond to the
decomposition of the Pearson–Fisher statistic as given in Section 4.
Although overcoming the effects of sparseness is a fundamental reason for using
orthogonal components based on marginals, those adverse effects are not investigated
in this paper. An important area for future research is the performance of components of
different orders when data have various degrees of sparseness. As the order of marginal
proportions becomes higher, the associated orthogonal components of $X^2_{PF}$ are
estimated less precisely because the higher-dimensional cross-classification produces
frequencies that become more sparse. A significant question to be addressed in
additional studies that would employ Monte Carlo simulations is what orders of
components can be adequately evaluated using an approximate chi-squared distribution
under various conditions of sparse frequencies.
2. Marginal proportions
This section includes a presentation of transformations from joint proportions or
frequencies to marginal proportions as a prelude to testing a model based on the fit to
marginal frequencies.
2.1. First- and second-order marginals
The relationship between joint proportions and first- and second-order marginals can be
shown using zeros and ones to code the levels of dichotomous response variables. Then,
a q-dimensional vector of zeros and ones, sometimes called a response pattern, will
indicate a specific cell from the contingency table formed by the cross-classification of q
response variables. A T-dimensional set of response patterns can be generated by varying
the levels of the $q$th variable most rapidly, the $(q-1)$th variable next, etc. Define $V$ as
the $T \times q$ matrix with response patterns as rows. For $q = 3$,
$$
V = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 1 & 1
\end{bmatrix}.
$$
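Because Python's `itertools.product` varies its last position fastest, it yields the response patterns in exactly the row order described above. A minimal sketch under that observation, assuming NumPy; the helper name `response_patterns` is ours, not the paper's:

```python
from itertools import product

import numpy as np


def response_patterns(q: int) -> np.ndarray:
    """Return the T x q matrix V (T = 2**q) of 0/1 response patterns.

    itertools.product varies its last position fastest, so the qth variable
    changes most rapidly, the (q - 1)th next, and so on, as in the text.
    """
    return np.array(list(product((0, 1), repeat=q)), dtype=int)


V = response_patterns(3)
print(V)
```

For 14 items, `response_patterns(14)` has 16,384 rows, one per cell of the full cross-classification.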
Let $v_{is}$ represent element $i$ of response pattern $s$, and let $Y$ be a vector of dichotomous
variables. Also, define $\theta$ as a parameter vector for a model of interest and $\pi_s(\theta)$ as the
expected proportion for cell $s$ as a function of the parameter vector $\theta$. Then, under the
model, the first-order marginal proportion for variable $Y_i$ can be defined as
$$P_i(\theta) = \Pr(Y_i = 1 \mid \theta) = \sum_s v_{is}\,\pi_s(\theta),$$
and the true first-order marginal proportion is given by
$$P_i = \Pr(Y_i = 1) = \sum_s v_{is}\,\pi_s.$$
The summation across the frequencies associated with the response patterns to obtain
the marginal proportions represents a linear transformation of the frequencies in the
multinomial vector $\pi$, which can be implemented via multiplication by a certain matrix,
denoted here generically by the symbol $H$. The symbol $H_{[t]}$ denotes the transformation
matrix that would produce marginals of order $t$. The symbol $H_{[t:u]}$, $t \le u \le q$, denotes
the transformation matrix that would produce marginals from order $t$ up to and
including order $u$. Furthermore, $H_{[t]} \equiv H_{[t:t]}$ and $H \equiv H_{[t:u]}$. There will be occasions to
delete certain rows from the matrix $H_{[t:u]}$ due to collinearity, and the symbol $H_{[t:u],-d}$
denotes the matrix $H_{[t:u]}$ with $d$ rows deleted.
Matrix $H_{[1]}$ can be defined from matrix $V$ such that
$$H_{[1]} = V'.$$
Under the model, the second-order marginal proportion for variables $Y_i$ and $Y_j$ can be
defined as
$$P_{ij}(\theta) = \Pr(Y_i = 1, Y_j = 1 \mid \theta) = \sum_s v_{is} v_{js}\,\pi_s(\theta),$$
and the true second-order marginal proportion is given by
$$P_{ij} = \Pr(Y_i = 1, Y_j = 1) = \sum_s v_{is} v_{js}\,\pi_s.$$
For second-order marginals, where $j = 1, 2, \ldots, q-1$; $i = j+1, \ldots, q$; $s = 1, 2, \ldots, T$;
and $l = i - j + \sum_{0<r<j}(q - r)$, element $ls$ of $H_{[2]}$ is given by
$$
[H_{[2]}]_{ls} = \begin{cases} 1, & \text{if } v_{is} = v_{js} = 1, \\ 0, & \text{otherwise.} \end{cases}
$$
Alternatively, matrix $H_{[2]}$ can be defined by forming Hadamard products (Magnus &
Neudecker, 1999, Section 3.6) among the columns of the matrix $V$:
$$
H_{[2]} = \begin{bmatrix}
(v_1 \circ v_2)' \\
(v_1 \circ v_3)' \\
\vdots \\
(v_1 \circ v_q)' \\
(v_2 \circ v_3)' \\
\vdots \\
(v_2 \circ v_q)' \\
\vdots \\
(v_{q-1} \circ v_q)'
\end{bmatrix},
$$
where $v_f$ represents column $f$ of matrix $V$ and $v_f \circ v_g$ represents the Hadamard product of
columns $f$ and $g$.
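The two definitions of $H_{[2]}$ (the element-wise rule with row index $l$, and stacked Hadamard products of the columns of $V$) can be checked against each other numerically. The sketch below assumes NumPy; `h2_hadamard` and `h2_elementwise` are hypothetical helper names, and the 1-based indices of the text are translated to Python's 0-based indexing:

```python
from itertools import combinations, product

import numpy as np


def response_patterns(q):
    # Rows of V: 0/1 patterns with the last variable varying fastest.
    return np.array(list(product((0, 1), repeat=q)), dtype=int)


def h2_hadamard(V):
    # One row per pair (f, g), f < g, ordered (1,2), (1,3), ..., (q-1,q);
    # each row is the Hadamard (element-wise) product of columns f and g of V.
    q = V.shape[1]
    return np.array([V[:, f] * V[:, g] for f, g in combinations(range(q), 2)])


def h2_elementwise(V):
    # Element-wise rule: [H[2]]_{ls} = 1 if v_is = v_js = 1, else 0,
    # with l = i - j + sum_{0<r<j} (q - r); i, j, l are 1-based here.
    T, q = V.shape
    H = np.zeros((q * (q - 1) // 2, T), dtype=int)
    for j in range(1, q):
        for i in range(j + 1, q + 1):
            l = i - j + sum(q - r for r in range(1, j))
            H[l - 1] = V[:, i - 1] * V[:, j - 1]
    return H


V = response_patterns(3)
H1 = V.T                 # H[1] = V'
H2 = h2_hadamard(V)      # rows: Y1Y2, Y1Y3, Y2Y3
print(H2)
```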
2.2. Higher-order marginals
The third-order marginal proportions for variables $Y_i$, $Y_j$ and $Y_k$ can be obtained by
employing the matrix $H_{[3]}$, which can also be defined as Hadamard products among the
columns of $V$,
$$
H_{[3]} = \begin{bmatrix}
(v_1 \circ v_2 \circ v_3)' \\
(v_1 \circ v_2 \circ v_4)' \\
\vdots \\
(v_1 \circ v_2 \circ v_q)' \\
\vdots \\
(v_2 \circ v_3 \circ v_4)' \\
\vdots \\
(v_2 \circ v_3 \circ v_q)' \\
\vdots \\
(v_{q-2} \circ v_{q-1} \circ v_q)'
\end{bmatrix},
$$
and then, for example,
$$
H_{[1:3]} = \begin{bmatrix} H_{[1]} \\ \hline H_{[2]} \\ \hline H_{[3]} \end{bmatrix}.
$$
For $q = 3$,
$$
H_{[1:3]} = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}. \tag{1}
$$
A general matrix $H_{[t:u]}$ to obtain marginals of any order can be defined in a similar
fashion using Hadamard products among the columns of $V$. $H_{[1:q]}$ gives a mapping from
joint proportions to the entire set of $2^q - 1$ marginal proportions:
$$P = H_{[1:q]}\,\pi, \tag{2}$$
where
$$P = (P_1\; P_2\; P_3 \ldots P_q\; P_{12}\; P_{13} \ldots P_{q-1,q}\; P_{123} \ldots P_{q-2,q-1,q} \ldots P_{123\ldots q})'$$
is the vector of marginal proportions.
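The same Hadamard-product construction extends to any orders $t$ through $u$ by iterating over subsets of variables. A sketch, assuming NumPy and a hypothetical helper `h_matrix`, that rebuilds matrix (1) for $q = 3$ and applies expression (2) to an illustrative (uniform) joint distribution; subsets within each order are taken in lexicographic order, matching the row order of $H_{[2]}$ and $H_{[3]}$:

```python
from itertools import combinations, product

import numpy as np


def response_patterns(q):
    # Rows of V: 0/1 patterns with the last variable varying fastest.
    return np.array(list(product((0, 1), repeat=q)), dtype=int)


def h_matrix(V, t, u):
    """Stack H[t], H[t+1], ..., H[u] (hypothetical helper).

    Each row is the Hadamard product of the columns of V belonging to one
    subset of k variables, t <= k <= u.
    """
    q = V.shape[1]
    rows = [np.prod(V[:, list(s)], axis=1)
            for k in range(t, u + 1)
            for s in combinations(range(q), k)]
    return np.array(rows, dtype=int)


q = 3
V = response_patterns(q)
H = h_matrix(V, 1, q)                 # H[1:q], with 2**q - 1 rows
pi = np.full(2 ** q, 1 / 2 ** q)      # illustrative joint probabilities (uniform)
P = H @ pi                            # expression (2): P = H[1:q] pi
print(H.shape, P)
```

Appending a row of ones to $H_{[1:q]}$ gives a nonsingular $T \times T$ matrix, which is one numerical way to see that the joint probabilities can be recovered from the $2^q - 1$ marginals.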
2.3. Residuals
Define the unstandardized residual $r_s = \hat p_s - \pi_s(\hat\theta)$, where $\hat p_s = n_s/n$ is element $s$ of $\hat p$,
the vector of multinomial proportions; $n_s$ = element $s$ of $\mathbf n$, the vector of observed
frequencies; $n$ = total sample size $= \sum_{s=1}^{T} n_s$; $\hat\theta$ = parameter estimator vector; $\pi_s(\hat\theta)$ =
estimated expected proportion for cell $s$; and denote the vector of unstandardized
residuals as $r$ with element $r_s$.
A vector of simple residuals for marginals of any order may be defined such that
$$e = H(\hat p - \pi(\hat\theta)) = Hr,$$
and a vector, $\varphi$, of differences between the marginals specified by the relevant model
and the true population marginals may be defined for marginals of any order such that
$$\varphi = H(\pi - \pi(\theta)).$$
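To make the residual definitions concrete, the sketch below computes $r$ and $e = Hr$ for made-up frequencies. The fitted model is an assumption chosen only for illustration (it is not a model from the paper): mutual independence of the $q = 3$ variables, whose maximum-likelihood estimates are the observed first-order marginals. NumPy is assumed, and all helper names and data are hypothetical:

```python
from itertools import combinations, product

import numpy as np


def response_patterns(q):
    return np.array(list(product((0, 1), repeat=q)), dtype=int)


def h_matrix(V, t, u):
    # Hadamard products of columns of V for every subset of orders t..u.
    q = V.shape[1]
    return np.array([np.prod(V[:, list(s)], axis=1)
                     for k in range(t, u + 1)
                     for s in combinations(range(q), k)])


V = response_patterns(3)
counts = np.array([20, 10, 15, 5, 12, 8, 18, 12])   # hypothetical frequencies n_s
n = counts.sum()
p_hat = counts / n                                   # observed proportions

# Hypothetical fitted model: mutual independence, with ML estimates equal to the
# observed first-order marginals; pi_s = prod_i theta_i^v_is (1 - theta_i)^(1 - v_is).
theta_hat = V.T @ p_hat
pi_hat = np.prod(np.where(V == 1, theta_hat, 1 - theta_hat), axis=1)

r = p_hat - pi_hat          # unstandardized residuals r_s
H = h_matrix(V, 1, 3)       # H[1:3]
e = H @ r                   # marginal residuals e = H r
print(np.round(e, 4))
```

Because this particular model fits the first-order marginals exactly, the first three elements of $e$ are zero.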
3. A test of fit using marginal distributions
3.1. The logic of employing marginal frequencies
Traditional full-information statistics assess the fit of a multinomial model by testing the
null hypothesis $H_0\colon \pi = \pi(\theta)$, where $\pi(\theta)$ is a vector of multinomial probabilities as a
function of $\theta$. The null hypothesis that a vector of differences, $\varphi$, as defined in the
previous section, is equal to zero can be written in the equivalent form $H_0\colon H\pi = H\pi(\theta)$.
Logically, testing $H_0\colon \pi = \pi(\theta)$ using lower-order marginals is an
application of ‘denying the consequent’. The hypothesis $\pi = \pi(\theta)$ implies that $H\pi = H\pi(\theta)$
must hold. The converse, of course, is not true. But if $H\pi = H\pi(\theta)$ is not true, then
$\pi = \pi(\theta)$ cannot be true. In logic, this type of argument is known as modus tollens.
Thus, for $u < q$, the relationship between these null hypotheses works as follows:
‘reject $H_0\colon H\pi = H\pi(\theta)$’ is a sufficient but not necessary condition for ‘reject $H_0\colon \pi = \pi(\theta)$’;
‘do not reject $H_0\colon H\pi = H\pi(\theta)$’ is a necessary but not sufficient condition
for ‘do not reject $H_0\colon \pi = \pi(\theta)$’. ‘Do not reject $H_0\colon H\pi = H\pi(\theta)$’ is not a sufficient
condition because it is possible that lack of fit may be manifested only in marginals of
higher order than $u$, where $u$ is defined as in $H_{[t:u]}$. Power studies presented below
address this possibility. The omnibus null hypothesis, $H_0\colon \pi = \pi(\theta)$, can be stated in
terms of orthogonal components, as demonstrated in Section 4.
3.2. The test statistic
Let $\Sigma_e$ represent the covariance matrix of the residuals, $e$. Using the matrix $H_{[t:u]}$ as
given previously, an extended version of the statistic from Reiser (1996) and Reiser and
Lin (1999) can be defined as follows. The statistic
$$X^2_{[t:u]} = e'\hat\Sigma_e^{-1}e \tag{3}$$
may be used to test $H_0\colon \varphi_{[t:u]} = 0$, where
$$\hat\Sigma_e = n^{-1}\hat\Omega_e, \tag{4}$$
with
$$\Omega_e = H\left(D(\pi) - \pi\pi' - G(A'A)^{-1}G'\right)H',$$
evaluated at the maximum-likelihood estimates $\hat\pi$ and $\hat\theta$, where $D(\pi)$ is the diagonal
matrix with $(s, s)$th element equal to $\pi_s(\theta)$,
$$A = D(\pi)^{-1/2}\,\frac{\partial\pi(\theta)}{\partial\theta}$$
and
$$G = \frac{\partial\pi(\theta)}{\partial\theta}.$$
$\Omega_r = D(\pi) - \pi\pi' - G(A'A)^{-1}G'$ is the asymptotic covariance matrix of $\sqrt{n}\,r$ (see
Haberman, 1973). Matrices are evaluated with $\theta = \hat\theta$, which may be the maximum-likelihood estimator.
If the model satisfies the regularity conditions given by Birch
(1964), and if the columns of $H'$ are linearly independent and are also linearly
independent of the columns of $G$, then $\hat\Sigma_e$ will be non-singular, and a conventional
inverse can be used to calculate $X^2_{[t:u]}$. If there is a violation of certain regularity
conditions, including the presence of linear dependencies, then a generalized inverse
would be needed to calculate $X^2_{[t:u]}$.
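Expressions (3) and (4) translate almost line for line into matrix code. The following is a sketch, not the author's implementation: it assumes NumPy, uses a hypothetical independence model (q independent Bernoulli variables, so $g = q = 3$) and made-up counts as a stand-in for a model of interest, and computes $X^2_{[2]}$ with a conventional inverse, which is appropriate here because no row of $H_{[2]}$ is perfectly fitted under this model:

```python
from itertools import combinations, product

import numpy as np


def response_patterns(q):
    return np.array(list(product((0, 1), repeat=q)), dtype=int)


def h_matrix(V, t, u):
    q = V.shape[1]
    return np.array([np.prod(V[:, list(s)], axis=1)
                     for k in range(t, u + 1)
                     for s in combinations(range(q), k)])


def independence_model(V, theta):
    """Hypothetical example model: q independent Bernoulli(theta_i) variables.

    Returns pi(theta) and the T x g Jacobian G = d pi / d theta.
    """
    pi = np.prod(np.where(V == 1, theta, 1 - theta), axis=1)
    # d pi_s / d theta_i = pi_s * (v_is / theta_i - (1 - v_is) / (1 - theta_i))
    G = pi[:, None] * (V / theta - (1 - V) / (1 - theta))
    return pi, G


def x2_marginal(counts, H, pi, G):
    """X^2_[t:u] = e' Sigma_e^{-1} e, following expressions (3) and (4)."""
    n = counts.sum()
    r = counts / n - pi                              # unstandardized residuals
    e = H @ r                                        # marginal residuals
    A = G / np.sqrt(pi)[:, None]                     # A = D(pi)^{-1/2} G
    omega_r = np.diag(pi) - np.outer(pi, pi) - G @ np.linalg.solve(A.T @ A, G.T)
    sigma_e = H @ omega_r @ H.T / n                  # expression (4)
    return e @ np.linalg.solve(sigma_e, e)           # conventional inverse


V = response_patterns(3)
counts = np.array([20, 10, 15, 5, 12, 8, 18, 12])   # hypothetical data
theta_hat = V.T @ counts / counts.sum()              # ML estimates under independence
pi_hat, G_hat = independence_model(V, theta_hat)
x2_2 = x2_marginal(counts, h_matrix(V, 2, 2), pi_hat, G_hat)
print(round(x2_2, 3))
```

In this toy example $\Omega_e$ has rank 3, so $X^2_{[2]}$ would be referred to a chi-squared distribution with 3 degrees of freedom. Using $H_{[1:2]}$ instead would make $\hat\Sigma_e$ singular, because the first-order marginals are fitted exactly, and a generalized inverse would then be needed.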
The limiting distribution of $X^2_{[t:u]}$ as $n \to \infty$ can be shown to be the $\chi^2$ distribution
because $e$ is a linear combination of the elements of $r$, $\hat\Sigma_e \xrightarrow{P} \Sigma_e$, and $e \xrightarrow{L} \mathrm{MVN}(\varphi, \Sigma_e)$; see
Moore (1977) for the principles employed in constructing chi-squared tests. The
regularity conditions for the asymptotic chi-squared distribution are given by Birch
(1964) and are discussed by Bishop, Fienberg, and Holland (1975). The regularity
conditions include the assumptions that $\omega$, the true value of $\theta$, is not on the boundary of
the parameter space $\Theta$; $\pi_s(\omega) > 0$ for $s = 1, \ldots, T$, so $\pi$ does not lie on the boundary of
the relevant parameter space; $\pi(\theta)$ is totally differentiable at $\omega$; and the Jacobian matrix
$\partial\pi(\theta)/\partial\theta$ is of full rank, so $\pi(\theta)$ maps a small neighbourhood of $\omega$ into a small
neighbourhood of $\pi(\omega)$. Furthermore, the assumption $n \to \infty$ is accompanied by an
implicit assumption that the number of cells $T$ remains fixed, so that expected cell
frequencies are assured of becoming large with $n$ (see Read & Cressie, 1988, Section
4.3). Since the statistics proposed here use overlapping cells in the form of lower-order
marginals, the requirement of large expected frequencies is more likely to be met than for traditional
statistics when $T$ is large relative to $n$. If a generalized inverse is needed to calculate
$X^2_{[t:u]}$, then additional regularity conditions must be satisfied in order to establish
convergence of $X^2_{[t:u]}$ to a chi-squared variate. These additional regularity conditions are
given in Section 8.5 of Magnus and Neudecker (1999).
The degrees of freedom are determined by the rank of $\Omega_e$. The focus of this paper is on
$X^2_{[2]}$, $X^2_{[2:3]}$ and $X^2_{[2:4]}$, which can be calculated using $H_{[2]}$, $H_{[2:3]}$ and $H_{[2:4]}$ as defined in the
previous section. In general, $X^2_{[2]}$ will have degrees of freedom up to $\min(2^q - g - 1,\ q(q-1)/2)$,
and $X^2_{[2:3]}$ will have degrees of freedom up to $\min(2^q - g - 1,\ q(q-1)(q+1)/6)$, where $g$ is the number of estimated parameters;
$q(q-1)/2$ and $q(q-1)(q+1)/6$ are the numbers of second-order marginals and of second- plus third-order marginals, respectively. As
demonstrated in Section 6, some model parameterizations may reduce the rank of $\Omega_e$ and
hence the degrees of freedom for $X^2_{[2]}$ and $X^2_{[t:u]}$ in general. Any linear dependency among
the rows of $H$ and the columns of $G$ will produce a marginal or sum of marginals that is
perfectly fitted when calculating $X^2_{[t:u]}$ and will lead to the loss of a degree of freedom. A
second-order marginal that is perfectly fitted under the model will reduce the degrees of
freedom for $X^2_{[t:u]}$ by 1 if $t \le 2 \le u$, and a third-order marginal that is perfectly fitted under
the model will reduce the degrees of freedom for $X^2_{[t:u]}$ by 1 if $t \le 3 \le u$, etc.
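The degree-of-freedom bounds are simple counts: there are $\binom{q}{k}$ marginals of order $k$, and no statistic in this family can exceed the $T - g - 1 = 2^q - g - 1$ degrees of freedom of the Pearson–Fisher statistic. A small sketch with a hypothetical helper `max_df`; it returns only the upper bound, while the actual degrees of freedom are the rank of $\Omega_e$, which perfectly fitted marginals can lower:

```python
from math import comb


def max_df(q: int, g: int, orders: range) -> int:
    """Upper bound on the degrees of freedom of X^2_[t:u], with orders = range(t, u + 1).

    q is the number of dichotomous variables and g the number of estimated parameters.
    """
    n_marginals = sum(comb(q, k) for k in orders)   # C(q, k) marginals of order k
    return min(2 ** q - g - 1, n_marginals)


# Example: 14 dichotomous items and a hypothetical model with g = 15 parameters.
print(max_df(14, 15, range(2, 3)))   # bound for X^2_[2]
print(max_df(14, 15, range(2, 4)))   # bound for X^2_[2:3]
```

The counts reproduce the closed forms in the text, since $\binom{q}{2} = q(q-1)/2$ and $\binom{q}{2} + \binom{q}{3} = q(q-1)(q+1)/6$.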
$X^2_{[2]}$ and $X^2_{[2:3]}$, for $q > 2$ and $q > 3$ respectively, represent full-information
estimation followed by a limited-information test of fit. Reiser (1996) introduced $X^2_{[1:2]}$
with an application to the logit latent trait model, and Reiser and Lin (1999) developed
$X^2_{[2]}$ for applications to the latent class model. In these applications, $X^2_{[1:2]}$ and $X^2_{[2]}$ were
shown to perform well as goodness-of-fit test statistics even when sparseness is present
in the joint frequencies, because the frequencies in the overlapping cells formed by the
lower-order marginals are usually not sparse.
3.3. Relationship to other statistics
In the preceding section, $H$ was presented as a matrix of constants. If, instead, a random
variable $H_l$ is considered that takes on value $h_{ls}$ with probability $\pi_s(\theta_a)$, where $\pi_s(\theta_a)$
represents a probability under a Neyman smooth alternative hypothesis, then the test
statistic given above would be written as
$$\hat X^2_{[t:u]} = r'\hat H'\hat\Sigma_e^{-1}\hat H r, \tag{5}$$
where $\hat H = H(\hat\theta)$. Written in this form, $\hat X^2_{[t:u]}$ can be seen as a special case of the score
statistic given in Theorem 7.1.1 of Rayner and Best (1989). Rayner and Best present
$H$ as a general matrix, not necessarily specifically for producing marginal probabilities.
When $H$ produces marginal probabilities, as defined previously in expression (2), the
use of $\hat X^2_{[t:u]}$ becomes both an overlapping cells test and a focused test. One benefit of an
overlapping cells test, first proposed by Hall (1985), is improved reliability of the
asymptotic chi-squared distribution when expected cell frequencies are small. In $\hat X^2_{[t:u]}$,
the marginals constitute the overlapping cells. A benefit of a focused test is increased
power to detect a departure from the null hypothesis in a specific direction. $\hat X^2_{[t:u]}$ is
focused when it is used for a test on lower-order marginals instead of the full set of
marginals from the first through $q$th order. Studies are reported below for the purpose of
examining the power of tests based on marginal frequencies without regard to the
sparseness issue.
For a cross-classified table, the Pearson chi-squared statistic is obtained by comparing
observed probabilities with the probabilities specified under the null hypothesis. Since
no parameters are estimated, the degrees of freedom are known to be equal to the
number of cells minus 1, which in this case is $T - 1$. The Pearson–Fisher statistic is
obtained in a similar fashion, except that the probabilities under the null hypothesis depend
on unknown parameters, and so observed probabilities are compared with estimated
expected probabilities that are calculated using parameters estimated under the model
of interest:
$$X^2_{PF} = n\,r'D(\pi(\hat\theta))^{-1}r.$$
If $g$ parameters are estimated, then the degrees of freedom are $T - g - 1$. Bartholomew
(1987, p. 94) demonstrates that the joint probability function of $Y$, a $q$-dimensional
vector of dichotomous variables, can be uniquely expressed in terms of the $2^q - 1$
marginal probabilities from first to $q$th order (see expression (2)). This implies that the
Pearson and Pearson–Fisher statistics can be calculated by comparing observed with
expected marginal frequencies, using marginals up to order $q$, and hence
$$X^2_{[1:q]} = X^2_{PF}.$$
See the Appendix for a related proof.
Although $X^2_{[1:q]}$ will reproduce the Pearson–Fisher statistic, in fact, fewer than $2^q - 1$
marginals are required to achieve this purpose for a composite null hypothesis. For a
composite null, some residuals on the marginals are degenerate variables identically
equal to zero due to linear dependencies among the rows of $H_{[1:q]}$ and columns of $G$.
Only $2^q - g - 1$ rows of $H$ are linearly independent of the columns of $G$. In other words,
the equivalence $X^2_{[1:q]} = X^2_{PF}$ will hold with a conventional inverse if certain rows, $g$ in number, are deleted from
$H_{[1:q]}$. Which rows can or should be deleted will depend on the particular model of
interest. With commonly used hierarchical log-linear models, for example, a first-order
effect is included in the model for each observed variable. The first-order effect
produces first-order marginals that are fitted exactly, which implies that $X^2_{[1:q]} = X^2_{[2:q]}$.
Thus, the rows corresponding to $H_{[1]}$ can be deleted from $H_{[1:q]}$ for testing such a
model. Depending on the other effects included in the model, other rows may also be
deleted. Deleting such rows can improve the accuracy of numerical calculations.
Recall that $H_{[t:u],-d}$ denotes the matrix $H_{[t:u]}$ with $d$ rows deleted. Let $X^2_{T-g-1}$ denote
a full-information statistic, where $e$ and $\hat\Sigma_e$ are calculated using $H = H_{[1:q],-g}$ and $g$
represents the number of estimated model parameters. Then,
$$X^2_{T-g-1} = e'\hat\Sigma_e^{-1}e$$
is a special case of expression (3), and
$$X^2_{T-g-1} = X^2_{PF}.$$
The proof of this equivalence follows from results in Theorems 7.1.1 and 7.1.2 and
Section 7.2 of Rayner and Best (1989), as shown in the Appendix.
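Both equivalences can be checked numerically on a toy example. In the sketch below (NumPy assumed; the data, helper names and the independence model are illustrative stand-ins, not taken from the paper), $X^2_{[1:q]}$ under a simple uniform null is compared with Pearson's $X^2$; then, for the composite independence null with $g = q$ estimated parameters, the $g$ first-order rows are deleted so that $H = H_{[2:q]}$ has $T - g - 1$ rows before comparing with $X^2_{PF}$:

```python
from itertools import combinations, product

import numpy as np


def response_patterns(q):
    return np.array(list(product((0, 1), repeat=q)), dtype=int)


def h_matrix(V, t, u):
    q = V.shape[1]
    return np.array([np.prod(V[:, list(s)], axis=1)
                     for k in range(t, u + 1)
                     for s in combinations(range(q), k)])


def x2_marginal(counts, H, pi, G=None):
    """e' Sigma_e^{-1} e; G=None means a simple null (no estimated parameters)."""
    n = counts.sum()
    e = H @ (counts / n - pi)
    omega_r = np.diag(pi) - np.outer(pi, pi)
    if G is not None:
        A = G / np.sqrt(pi)[:, None]               # A = D(pi)^{-1/2} G
        omega_r = omega_r - G @ np.linalg.solve(A.T @ A, G.T)
    return e @ np.linalg.solve(H @ omega_r @ H.T / n, e)


def pearson(counts, pi):
    """Pearson (or Pearson-Fisher) statistic n r' D(pi)^{-1} r."""
    n = counts.sum()
    r = counts / n - pi
    return n * np.sum(r ** 2 / pi)


q = 3
V = response_patterns(q)
counts = np.array([20, 10, 15, 5, 12, 8, 18, 12])    # hypothetical data

# Simple null (uniform pi, no parameters): H[1:q] reproduces Pearson's X^2.
pi0 = np.full(2 ** q, 1 / 2 ** q)
x2_simple = x2_marginal(counts, h_matrix(V, 1, q), pi0)

# Composite null (independence, g = q): H[2:q] = H[1:q] without the g first-order rows.
theta_hat = V.T @ counts / counts.sum()
pi_hat = np.prod(np.where(V == 1, theta_hat, 1 - theta_hat), axis=1)
G_hat = pi_hat[:, None] * (V / theta_hat - (1 - V) / (1 - theta_hat))
x2_full = x2_marginal(counts, h_matrix(V, 2, q), pi_hat, G_hat)

print(x2_simple, pearson(counts, pi0))
print(x2_full, pearson(counts, pi_hat))
```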
Maydeu-Olivares and Joe (2005) developed a statistic, $M_r$, that is closely related to $X^2_{[1:r]}$. The two statistics are not equivalent, however, because instead of $\hat{\Omega}_e^{-1}$ in the quadratic form, $M_r$ uses
$$\hat{C}_r = (H\hat{\Gamma}H')^{-1} - (H\hat{\Gamma}H')^{-1}H\hat{G}\left(\hat{G}'H'(H\hat{\Gamma}H')^{-1}H\hat{G}\right)^{-1}\hat{G}'H'(H\hat{\Gamma}H')^{-1},$$
where $\hat{\Gamma} = D(\pi(\hat\theta)) - \pi(\hat\theta)\pi(\hat\theta)'$, and H is always equal to $H_{[1:r]}$ when applied to the definition of $M_r$. $\hat{\Omega}_e$ is a generalized inverse of $\hat{C}_r$. Since $\hat{C}_r$ appears in the quadratic form, the degrees of freedom for $M_r$ do not match the degrees of freedom for $X^2_{[1:r]}$ when $r < q$, and the two statistics are not equivalent under that condition. Also, $X^2_{[t:u]}$ is more general in two ways. First, it allows the statistic to omit some marginals of an entire lower order, because including them may reduce the power of a test. Second, Maydeu-Olivares and Joe adopt a condition under which their statistic does not apply in certain circumstances where $X^2_{[t:u]}$ would apply. This condition is $r \geq r_0$, where $r_0$ is the smallest integer r such that the model is (locally) identified from the joint moments up to order r. Consider, for example,
(locally) identified from the joint moments up to order r. Consider, for example,
estimating and testing a log-linear model with a non-hierarchical three-way interaction.
The third-order marginal tables contain the sufficient statistics for this model (Agresti,
2002), and these tables identify the model. Thus, $r_0 = 3$. However, if the fitted model
happens to specify the wrong three-way interaction, the model might be usefully
assessed using a test on the second-order marginals, especially for a sparse data table.
$X^2_{[1:2]}$ or $X^2_{[2]}$ could be used for this test, but $M_2$ would not be applicable. Another
limited-information statistic has also been studied by Maydeu-Olivares (2001).
The original limited-information statistic with features of cell collapsing and cell
focusing was given by Christoffersson (1975). In our notation, his statistic would be written as
$$r'H'_{[1:2]}\left[H_{[1:2]}\left(D(\hat{p}) - \hat{p}\hat{p}'\right)H'_{[1:2]}\right]^{-1}H_{[1:2]}r.$$
This statistic is similar to $X^2_{[1:2]}$, but parameter estimation is not incorporated into the covariance matrix, which also uses observed proportions instead of $\pi(\hat\theta)$. Not surprisingly, simulations show that this statistic performs less well than $X^2_{[1:2]}$ when sample sizes are smaller. Christoffersson's statistic could be easily extended to include
higher-order marginals. However, even if marginals from first to qth order were
included, this statistic would not be equivalent to the Pearson–Fisher statistic. Muthén
(1978) developed a modified version of Christoffersson’s statistic.
A statistic on only second-order marginals was presented by Bartholomew and Leung
(2002). Their statistic uses a diagonal covariance matrix for e, with a less efficient
estimator for variance than the one presented here. As a score statistic, $X^2_{[t:u]}$ possesses
certain optimality properties not enjoyed by these other statistics.
4. Orthogonal components
4.1. Overview
In this section, results from Lancaster (1969, Sections 8.4 and 9.12) are used to decompose $X^2_{T-g-1}$, and hence the Pearson–Fisher statistic, into orthogonal components associated with marginals from order 1 to q. Explicit rules for obtaining independent components are also given in Agresti (2002, p. 84). The individual components themselves are useful as a guide to how the model of interest may fail to fit the observations. In addition, different versions of $X^2_{[t:u]}$ from the previous section can be shown to be sums of these orthogonal components.
Define the standardized cell residual (Cochran, 1954) as
$$z_s = \left(\pi_s(\hat\theta)\right)^{-1/2}\left(\hat{p}_s - \pi_s(\hat\theta)\right).$$
Of course, $n\sum_s z_s^2 = X^2_{PF}$. Since the columns of $H'$ form a basis for the $2^q$ cells of the multinomial on which $X^2_{PF}$ is calculated, $X^2_{PF}$ can be decomposed into orthogonal components associated with marginal frequencies by obtaining the sequential sum of squares from a weighted regression of z on the columns of $H'$. Here, H can be either $H_{[1:q]}$ or $H_{[1:q],-g}$. If the decomposition is based on $H_{[1:q]}$, then g of the components will be degenerate random variables identically equal to zero. If the decomposition is based on $H_{[1:q],-g}$, then none of the components will be identically equal to zero.
The sequential sums of squares mentioned earlier can be represented as the terms of an inner product, $\hat\gamma'\hat\gamma$, to be completely defined below. Let $\hat\gamma_j^2$, a term of $\hat\gamma'\hat\gamma$, denote an orthogonal component of $X^2_{PF}$. For $q = 3$, the maximum number of orthogonal components is seven, and they could be partitioned as follows:
$$\left(\hat\gamma_1^2 \;\; \hat\gamma_2^2 \;\; \hat\gamma_3^2 \;\middle|\; \hat\gamma_4^2 \;\; \hat\gamma_5^2 \;\; \hat\gamma_6^2 \;\middle|\; \hat\gamma_7^2\right).$$
$\hat\gamma_1^2$, $\hat\gamma_2^2$ and $\hat\gamma_3^2$ are components produced from the first three rows of the matrix $H_{[1:3]}$ shown earlier in expression (1), and are associated with first-order marginals. $\hat\gamma_4^2$, $\hat\gamma_5^2$ and $\hat\gamma_6^2$ are components produced from rows 4, 5 and 6 of $H_{[1:3]}$ and are associated with second-order marginals. Finally, $\hat\gamma_7^2$ is a component produced from row 7 of $H_{[1:3]}$ and is associated with the single third-order marginal.
In an application to a log-linear model, for example, each component is associated with a specific model effect, and some components may be identically equal to zero. To test a model of independence on frequency counts, a log-linear model with only first-order effects might be fitted. In the case of the independence model for $q = 3$, three of the components would be identically equal to zero because the first-order marginals for variables 1, 2 and 3 are fitted exactly. Thus, under this model and with $q = 3$, $\hat\gamma_1^2 = \hat\gamma_2^2 = \hat\gamma_3^2 = 0$. Each component that is identically equal to zero corresponds to an estimated parameter. The other components correspond to association effects not included in the model. $\hat\gamma_4^2$ would indicate whether there is a lack of fit for the two-way association between variables 1 and 2, $\hat\gamma_5^2$ would give a similar indication for variables 1 and 3, and $\hat\gamma_6^2$ would refer to the association between variables 2 and 3. $\hat\gamma_7^2$ would indicate lack of fit in the three-way association among variables 1, 2 and 3. Used in this way, an orthogonal component is similar to a Wald statistic for the absence of a certain interaction in a log-linear model, but it is not equivalent to the Wald statistic because the components are order-dependent. That is, the values taken on by these orthogonal components depend on the order of the columns in $H'$. In other log-linear models, components might indicate lack of fit due to constraints on parameters.
For multinomial models that are not log-linear, components would not necessarily be
associated with specific model effects.
In general, and assuming that the rows of H are ordered in the same manner as shown in expression (1), if $b_1$ represents the number of columns in $H'$ that produce first-order marginals, then
$$\hat\gamma_1^2 + \hat\gamma_2^2 + \cdots + \hat\gamma_{b_1}^2 = X^2_{[1]},$$
where $X^2_{[1]}$ is a test statistic on the first-order marginals as developed in the previous section. If $b_2$ is the number of columns from $H'$ that produce second-order marginals then, for $0 < b_1 \leq q$ and $0 < b_2 \leq \frac{1}{2}q(q-1)$,
$$X^2_{[1:2]} = \sum_{j=1}^{b_1+b_2}\hat\gamma_j^2.$$
Sums of squares and orthogonal components can be defined in a similar fashion for additional partitions of the $H'$ matrix. For example, if $b_3$ is the number of columns in $H'$ that produce third-order marginals, then for $0 < b_3 \leq \binom{q}{3}$ and $b_2 > 0$,
$$X^2_{[1:3]} = \sum_{j=1}^{b_1+b_2+b_3}\hat\gamma_j^2.$$
4.2. Weighted regression
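To calculate the orthogonal components using a weighted regression, the appropriate weight matrix, $\hat{W}$, is the estimated covariance matrix of the standardized residuals:
$$\hat{W} = I - \pi(\hat\theta)^{1/2}\left(\pi(\hat\theta)^{1/2}\right)' - \hat{A}\left(\hat{A}'\hat{A}\right)^{-1}\hat{A}', \qquad (6)$$
where $\hat{A}$ is A evaluated at $\theta = \hat\theta$, $\hat{G}$ is G evaluated at $\theta = \hat\theta$, and $\pi(\hat\theta)$ is $\pi(\theta)$ evaluated at $\theta = \hat\theta$. Let the vector z, where
$$z = D(\pi(\hat\theta))^{-1/2}\left(\hat{p} - \pi(\hat\theta)\right), \qquad (7)$$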
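have elements $z_s$, as defined earlier. The weight matrix $\hat{W}$ can be applied to z, but it produces no effect:
$$\hat{W}D(\pi(\hat\theta))^{-1/2}\left(\hat{p} - \pi(\hat\theta)\right) = D(\pi(\hat\theta))^{-1/2}\left(\hat{p} - \pi(\hat\theta)\right) = z.$$
This holds because z is orthogonal to $\pi(\hat\theta)^{1/2}$ (the residuals $\hat{p}_s - \pi_s(\hat\theta)$ sum to zero) and, by the likelihood equations, to the columns of $\hat{A}$.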
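A quick numerical check of $\hat{W}z = z$, as a NumPy sketch for a hypothetical $2^3$ table fitted by the independence log-linear model (chosen only because its MLE is closed-form). The form $A = D(\pi)^{-1/2}\,\partial\pi/\partial\theta'$ is assumed, matching the structure of (6).

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
q = 3
# Cells of the 2^q table, ordered lexicographically by response pattern.
patterns = np.array(list(itertools.product([0, 1], repeat=q)), dtype=float)
counts = rng.integers(5, 50, size=2 ** q).astype(float)   # hypothetical counts
p_hat = counts / counts.sum()

# Independence model: the MLE is the product of the observed first-order marginals.
marg = patterns.T @ p_hat                                   # observed P(Y_i = 1)
pi_hat = np.prod(np.where(patterns == 1, marg, 1 - marg), axis=1)

# Log-linear model matrix (one column per variable); G = [D(pi) - pi pi'] X.
G = (np.diag(pi_hat) - np.outer(pi_hat, pi_hat)) @ patterns
A = np.diag(pi_hat ** -0.5) @ G                             # A = D(pi)^{-1/2} G

# Expression (6): W = I - pi^{1/2} (pi^{1/2})' - A (A'A)^{-1} A'.
s = np.sqrt(pi_hat)
W = np.eye(2 ** q) - np.outer(s, s) - A @ np.linalg.inv(A.T @ A) @ A.T

# Expression (7): standardized residuals.
z = (p_hat - pi_hat) / s
print(np.max(np.abs(W @ z - z)))   # zero up to rounding error
```

Because $\hat{W}$ is a projection, it is also idempotent, which the test below confirms.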
Finally, $H'_{[1:q]}$ can be premultiplied by $D(\pi(\hat\theta))^{-1/2}$ to adjust for the standardization of the residuals, and if the weight matrix is then applied, let the resulting matrix be $\hat{M}$, where
$$\hat{M} = \hat{W}D(\pi(\hat\theta))^{-1/2}H'_{[1:q]}. \qquad (8)$$
With these adjustments, the orthogonal components, $\hat\gamma_j^2$, of $X^2_{T-g-1}$ can be obtained from the sequential sum of squares resulting from an ordinary regression of z on the columns of $\hat{M}$.
Let $h_l'$ represent row l of H, and let $[H']_l$ represent column l of the matrix $H'$. Then, using results from linear models for the regression that produces the orthogonal components, the sum of squares that constitutes the first component is given by
$$SS([H']_1) = \hat\gamma_1^2 = n\,z'\hat{W}D(\pi(\hat\theta))^{-1/2}[H']_1\left(h_1'D(\pi(\hat\theta))^{-1/2}\hat{W}D(\pi(\hat\theta))^{-1/2}[H']_1\right)^{-}h_1'D(\pi(\hat\theta))^{-1/2}\hat{W}z,$$
where $B^{-}$ indicates the generalized inverse of the matrix B.
The orthogonal complement of $[H']_2$ to $[H']_1$ is given by
$$[H']_2^{\perp} = [H']_2 - [H']_1\left(h_1'\hat{W}[H']_1\right)^{-}h_1'\hat{W}[H']_2,$$
and the sequential sum of squares that constitutes the second orthogonal component is given by
$$SS([H']_2 \mid [H']_1) = \hat\gamma_2^2 = n\,z'\hat{W}D(\pi(\hat\theta))^{-1/2}[H']_2^{\perp}\left(([H']_2^{\perp})'D(\pi(\hat\theta))^{-1/2}\hat{W}D(\pi(\hat\theta))^{-1/2}[H']_2^{\perp}\right)^{-}([H']_2^{\perp})'D(\pi(\hat\theta))^{-1/2}\hat{W}z.$$
Sequential sums of squares for additional orthogonal components may be obtained in a
similar manner.
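Sequential sums of squares taken in a fixed column order are exactly what a QR (Gram-Schmidt) factorization delivers, so the regression in this subsection can be sketched as below. The sketch assumes $\hat{M}$ has full column rank (the $H_{[1:q],-g}$ version); random matrices stand in for $\hat{M}$ and z, and each entry would still be multiplied by n to put it on the $X^2_{PF}$ scale. The function name is hypothetical.

```python
import numpy as np

def sequential_ss(M, z):
    """Sequential (Type I) regression sums of squares of z on the columns of M,
    taken in column order. Assumes M has full column rank."""
    Q, _ = np.linalg.qr(M)        # orthonormalizes the columns in their given order
    return (Q.T @ z) ** 2          # entry j: SS added by column j given columns before it

rng = np.random.default_rng(0)
M = rng.normal(size=(16, 5))      # hypothetical stand-in for M-hat in (8)
z = rng.normal(size=16)           # hypothetical stand-in for z in (7)

ss = sequential_ss(M, z)
# The components add up to the full regression sum of squares ...
full = z @ M @ np.linalg.solve(M.T @ M, M.T @ z)
print(np.isclose(ss.sum(), full))
# ... and the first one matches the single-column formula used for SS([H']_1).
m1 = M[:, 0]
print(np.isclose(ss[0], (m1 @ z) ** 2 / (m1 @ m1)))
# Reordering the columns changes the individual components but not their sum.
ss_rev = sequential_ss(M[:, ::-1], z)
print(np.isclose(ss_rev.sum(), full))
```

The reordering check mirrors the earlier remark that the components depend on the order of the columns in $H'$.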
4.3. Cholesky decomposition
Alternatively, the orthogonal components may be obtained using a Cholesky decomposition. The matrix $H_{[1:q],-g}$ should be used in the Cholesky decomposition because conventional methods for calculating the Cholesky factor will fail if the matrix of interest is not positive definite. Consider the $(T-g-1) \times 2^q$ matrix $H^*$ that has full row rank, where $H^* = F'H_{[1:q],-g}$, and where F is the upper triangular matrix such that $F'\Omega_eF = I$; $F = (C')^{-1}$, where C is the lower triangular Cholesky factor of $\Omega_e$ ($\Omega_e = CC'$). Premultiplication by $F' = C^{-1}$ orthonormalizes the matrix $H_{[1:q],-g}$ with respect to $D(\pi) - \pi\pi' - G(A'A)^{-1}G'$. Then,
$$X^2_{T-g-1} = n\,r'(\hat{H}^*)'\hat{H}^*r, \qquad (9)$$
$$\hat\gamma = n^{1/2}\hat{F}'Hr = n^{1/2}\hat{H}^*r, \qquad (10)$$
where $\hat{H}^* = H^*(\hat\theta)$ and $\hat{F}$ is the matrix F evaluated at $\theta = \hat\theta$. Then $X^2_{T-g-1} = \hat\gamma'\hat\gamma = \sum_{j=1}^{T-g-1}\hat\gamma_j^2$. Since $\hat\gamma$ has asymptotic covariance matrix $F'\Omega_eF = I_{T-g-1}$, the elements $\hat\gamma_j^2$ are asymptotically independent $\chi^2_1$ random variables.
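A NumPy sketch of the Cholesky route, with a random positive-definite matrix standing in for $\Omega_e$ and a random vector for $e$, which plays the role of $n^{1/2}Hr$ (both hypothetical). It checks that $F = (C')^{-1}$ satisfies $F'\Omega_eF = I$ and that the squared elements of $F'e$ add up to $e'\Omega_e^{-1}e$.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 6                                    # hypothetical T - g - 1
B = rng.normal(size=(k, k))
Omega_e = B @ B.T + k * np.eye(k)        # hypothetical positive-definite covariance of e
e = rng.normal(size=k)                   # hypothetical residual vector, already scaled by sqrt(n)

C = np.linalg.cholesky(Omega_e)          # lower triangular, Omega_e = C C'
F = np.linalg.inv(C.T)                   # F = (C')^{-1} is upper triangular
print(np.allclose(F.T @ Omega_e @ F, np.eye(k)))   # F' Omega_e F = I

gamma = F.T @ e                          # orthogonal components before squaring
x2 = e @ np.linalg.solve(Omega_e, e)     # e' Omega_e^{-1} e
print(np.isclose(gamma @ gamma, x2))     # sum of gamma_j^2 reproduces the quadratic form

# F' is lower triangular, so gamma_j depends only on e_1, ..., e_j: this is why
# the components are tied to the row order of H.
print(np.allclose(np.triu(F.T, 1), 0))
```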
The orthogonal components may be examined individually, or they may be pooled into components due to first-order marginals, second-order marginals, etc., as described earlier. Define η as a vector of orthogonal parameters on the cell probabilities, with $\eta = F'H(\pi - \pi(\theta)) = F'\xi$. The omnibus null hypothesis $H_0\colon \pi = \pi(\theta)$ is equivalent to the null hypothesis $H_0\colon \eta_1 = \eta_2 = \eta_3 = \cdots = \eta_{T-g-1} = 0$. A test using an individual component of $X^2_{T-g-1}$ is a test of an individual element of η. A test based on only the first- and second-order marginals, for example, would apply to the null hypothesis $H_0\colon \eta_1 = \eta_2 = \eta_3 = \cdots = \eta_{b_1+b_2} = 0$, which could be considered to be a component of the omnibus null hypothesis.
Other limited-information statistics discussed in the previous section do not permit a
decomposition into components of the Pearson–Fisher statistic. The statistics of
Christoffersson (1975) and Bartholomew and Leung (2002) would not be equivalent to
the Pearson–Fisher statistic even if marginals from the first to the qth order were
incorporated. Although the statistic Mr from Maydeu-Olivares and Joe (2005) is
equivalent to $X^2_{T-g-1} = X^2_{[1:q],-g}$ when $r = q$, the statistic $M_r$, unlike $X^2_{[1:r],-d}$, does not
form a component of the Pearson–Fisher statistic when $r < q$ because the degrees of
freedom for $M_r$ do not match the degrees of freedom for components based on marginals up
to order r, as discussed earlier.
5. Power analysis
5.1. Non-centrality parameter
When a model and unknown parameters are estimated and fitted to a set of
cell frequencies, the test of fit is an assessment of a composite null hypothesis.
In this situation, numerical simulations are sometimes used to investigate power
to detect a false null hypothesis. Asymptotic power of the Pearson statistic for the
situation of a composite null hypothesis can be considered using a sequence of local
alternatives,
$$\pi_n = \pi(\theta) + \delta/\sqrt{n}, \qquad (11)$$
where $\pi_n$ is the vector of true probabilities indexed by the sample size n. In this approach, the best fit of the model to the population gives $\pi_s(\theta)$ as the probability for cell s, but the true probability differs from that value by $\delta_s/\sqrt{n}$. The model lack of fit goes to zero at the rate $n^{-1/2}$ as n approaches infinity. With this technique, Mitra (1958) shows that $X^2_{PF}$ has a limiting non-central chi-squared distribution with non-centrality parameter λ, where
$$\lambda = \delta'\,\mathrm{Diag}[\pi(\theta)]^{-1}\delta,$$
and $df = T - g - 1$, where $T = 2^q$ in the present case. Under the condition $H = H_{[1:q],-g}$, and using a strategy similar to that used in the Appendix, it can be shown that
$$\lambda = \delta'\,\mathrm{Diag}[\pi(\theta)]^{-1}\delta = \delta'H'\Omega_e^{-1}H\delta. \qquad (12)$$
Based on the right-hand side of this expression, it is possible to decompose the non-centrality parameter into orthogonal components associated with marginals of order 1 to q in a manner very close to the earlier decomposition of $X^2_{PF}$.
Using a Cholesky decomposition of the non-centrality parameter, let
$$\zeta = F'H\delta = H^*\delta, \qquad (13)$$
where F and $H^*$ are as defined in Section 4. Then $\lambda = \zeta'\zeta$, and the orthogonal components are $\zeta_j^2$, where $\zeta_j$ is an element of ζ. These components may be used to calculate the power for tests based on marginals of differing order. For example, the non-centrality parameter for $X^2_{[1:2]}$ is given by
$$n\sum_{j=1}^{q(q+1)/2}\zeta_j^2.$$
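The identity $\lambda = \zeta'\zeta$ follows in one line from the defining property of F. A short derivation, using only the definitions in (12) and (13):
$$F'\Omega_eF = I \;\Longrightarrow\; \Omega_e = (F')^{-1}F^{-1} \;\Longrightarrow\; \Omega_e^{-1} = FF',$$
$$\zeta'\zeta = \delta'H'FF'H\delta = \delta'H'\Omega_e^{-1}H\delta = \lambda.$$
Because $F'$ is lower triangular, $\zeta_j$ involves only the first j elements of $H\delta$, which is why partial sums of the $\zeta_j^2$ correspond to tests on marginals up to a given order when the rows of H are ordered as in expression (1).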
5.2. An approximation for l
Although expression (11) implies that for large samples the non-central chi-squared
approximation is valid when the model is barely incorrect, Agresti (2002) suggests that
it is often reasonable to adopt expression (11) for fixed, finite n in order to approximate
the distribution of $X^2_{PF}$ even though it might not be expected to hold as substantially
more data are obtained. For purposes of power calculations under fixed, finite n, as
presented below in Section 7, cell proportions were generated from a known model, with parameter vector $\theta^a$. These proportions were multiplied by a selected initial sample size such as $n_0 = 1{,}000$. The model of the null hypothesis was then analysed using maximum likelihood, with the resulting cell frequencies used as input without any added random variability. Let $\theta_a$ be the vector that maximizes the function
$$F(\pi, \pi(\theta)) = n\sum_s \pi_s\log(\pi_s(\theta))$$
when $\pi(\theta^a)$ is the vector of multinomial proportions. $\delta^*$ was then chosen such that
$$\delta^* = \sqrt{n}\left(\pi^a - \pi(\theta_a)\right),$$
where $\pi^a = \pi(\theta^a)$ corresponds to the known generated cell proportions. This method uses $\delta^{*\prime}\,\mathrm{Diag}[\pi(\theta_a)]^{-1}\delta^*$ as an approximation to λ as given in expression (12), assuming that if $\theta_a$ is close to the value specified by the null hypothesis,
$$n\left(\pi_n - \pi(\theta_n)\right)'\mathrm{Diag}[\pi(\theta_n)]^{-1}\left(\pi_n - \pi(\theta_n)\right) = \lambda + o(1),$$
where $\theta_n$ is the vector maximizing $F(\pi_n, \pi(\theta))$. A similar approximation was used by Satorra and Saris (1985) for the likelihood ratio statistic applied to structural equation models.
The chosen value of $\delta^*$ can be used to approximate the non-centrality parameter, $\lambda_0$, for the initial sample size $n_0$. Orthogonal components based on the right-hand side of expression (13) can be calculated using $\Omega_e$ evaluated at $\theta = \theta_a$. The symbol $\hat\zeta_j^2$ is used below in Section 7 to represent $\zeta_j^2$ calculated in this way. The non-centrality parameter for any other sample size, say simply n, can be approximated using the expression $\lambda \approx (n/n_0)\lambda_0$. Some Monte Carlo simulations were conducted to cross-validate this approach to the calculation of power for fixed, finite n, and the Monte Carlo results were very close to those obtained using the method described here. It should be noted that for some of the hypotheses examined in Section 7, asymptotic power could be calculated using a definition for δ that would be essentially equivalent to the definition given in expression (18) in Maydeu-Olivares and Joe (2005).
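The approximation $\lambda \approx (n/n_0)\lambda_0$ turns each table of components into a power curve. The sketch below uses only the Python standard library and assumes that the tabulated $\hat\zeta_j^2$ are on a per-observation scale, so that a test on the first i components has $\lambda = n\sum_{j \le i}\hat\zeta_j^2$ and i degrees of freedom; this assumption reproduces the first row of Table 1. The function names are hypothetical, and the non-central chi-squared tail is computed as a Poisson mixture of central tails rather than by any particular library routine.

```python
import math

def chi2_sf(x, df):
    """P(chi2_df > x) for integer df >= 1 and x > 0, using the recursion
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) for the regularized
    upper incomplete gamma function Q, with s = df / 2 and y = x / 2."""
    y = x / 2.0
    if df % 2 == 0:
        s, tail = 1.0, math.exp(-y)               # Q(1, y): chi-squared on 2 df
    else:
        s, tail = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y): chi-squared on 1 df
    while 2 * s < df:
        tail += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1.0
    return tail

def ncx2_sf(x, df, nc, terms=300):
    """P(noncentral chi2_df(nc) > x) as a Poisson(nc / 2) mixture of central
    chi-squared tails on df + 2j degrees of freedom, j = 0, 1, 2, ..."""
    if nc == 0:
        return chi2_sf(x, df)
    half = nc / 2.0
    total = 0.0
    for j in range(terms):
        weight = math.exp(j * math.log(half) - half - math.lgamma(j + 1))
        total += weight * chi2_sf(x, df + 2 * j)
    return total

def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-squared, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_power(zeta2, n, alpha=0.05):
    """Approximate power of a test on the components in zeta2 (df = len(zeta2)),
    taking lambda = n * sum(zeta2), i.e. lambda ~ (n / n0) * lambda0."""
    df = len(zeta2)
    return ncx2_sf(chi2_critical(alpha, df), df, n * sum(zeta2))

print(round(approx_power([0.034831], n=400), 2))   # 0.96
```

With n = 400 and the first component listed in Table 1, this gives approximately .96, the first power entry in that table; passing the first i components instead gives the power of the i-degree-of-freedom test.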
6. Application to a log-linear latent structure model
This section presents a psychometric model commonly known as the Rasch model. A
log-linear version of the Rasch model will be used in power calculations demonstrated
below.
6.1. The Rasch model
Latent structure models are used to postulate latent variables to account for the
association among manifest variables. When a latent structure model includes a
continuous latent variable, it is more commonly known as a latent trait or item response
model. Suppose that data are available on q manifest variables, or items, for each of n individuals; each variable can take only the values 0 and 1. Denote by $Y_{ij}$ the response of the jth individual on the ith variable, and by $X_j$ the latent variable associated with individual j. $\beta_{0i}$ is an intercept parameter for item i and $\beta_{1i}$ is the slope parameter that represents the association between item i and the latent variable X. The model that includes both intercept and slope parameters is sometimes referred to as the two-parameter model. A restriction of equality can be imposed on the q slope parameters, and the common slope parameter can be absorbed into the scale of X. Then, the logit version of the one-parameter or Rasch (1980) model is as follows:
$$Y_{ij} \mid \beta_{0i}, X_j \;\; \text{independent}, \qquad (14)$$
$$P(Y_{ij} = 1 \mid \beta_{0i}, X_j = x_j) = \left(1 + \exp(-\beta_{0i} - x_j)\right)^{-1}, \qquad (15)$$
for $i = 1, \ldots, q$; $j = 1, \ldots, n$, where $X_1, \ldots, X_n$ are independent and identically distributed random variables with common cumulative distribution function F(·).
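Expressions (14) and (15) give cell probabilities only after a choice of F. The sketch below assumes F is the standard normal distribution (the text leaves F general) and uses NumPy's Gauss-Hermite rule to compute the $2^q$ response-pattern probabilities. The intercept values and the function name are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def rasch_cell_probs(beta0, n_nodes=40):
    """Probabilities of the 2^q response patterns under (14)-(15), integrating
    over X ~ N(0, 1) (an assumed choice of F) with Gauss-Hermite quadrature."""
    q = len(beta0)
    nodes, weights = hermegauss(n_nodes)      # rule for the weight exp(-x^2 / 2)
    weights = weights / weights.sum()         # normalize to the N(0, 1) density
    patterns = np.array(list(itertools.product([0, 1], repeat=q)))
    # (15): P(Y_i = 1 | x) = 1 / (1 + exp(-beta0_i - x)); one row per node.
    p1 = 1.0 / (1.0 + np.exp(-(np.asarray(beta0)[None, :] + nodes[:, None])))
    # (14): local independence, so multiply item probabilities within each node.
    like = np.prod(np.where(patterns[None, :, :] == 1,
                            p1[:, None, :], 1 - p1[:, None, :]), axis=2)
    return patterns, weights @ like

patterns, pi = rasch_cell_probs([-1.0, -0.5, 0.0, 0.5, 1.0])
print(pi.sum(), patterns[:, 2] @ pi)   # total probability 1; P(Y_3 = 1) = 0.5 here
```

The second value is exactly one half because the item with intercept 0 is symmetric about the mean of the assumed normal latent distribution.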
In practice, the Rasch model is frequently an unrealistic approach for a model of the
responses to a set of items because the requirement for equal slopes is too restrictive. It
is similar to imposing a requirement on a factor analysis model that all manifest variables
must have the same factor loading. Often, an application of the two-parameter model
will be more likely to result in a successful fit. The one- and two-parameter models are
both models for the cell counts of a multinomial distribution. The methods presented in
this paper can be used for any model fitted to a $2^q$ table. The Rasch model has been
selected to demonstrate power calculations in the next section because a log-linear
version of the model is available. Using the log-linear version to demonstrate power
calculations has two advantages: it is convenient both for demonstrating the influence
of higher-order interactions and for estimating the model with widely available software. A
more extensive presentation of the latent structure model as a logistic regression model
is given in Reiser (1996).
6.2. The generalized Rasch model
Tjur (1982) and Cressie and Holland (1983) demonstrate the equivalence of the logit
version of the Rasch model to a log-linear ‘generalized’ version. In this generalized
version, let $m_s$ be the expected count for cell s; then
$$\log(m_s) = \lambda + \lambda^{Y_1}_f + \lambda^{Y_2}_g + \cdots + \lambda^{Y_q}_h + \lambda^T_t, \qquad (16)$$
where $\lambda^{Y_i}_f$ is the effect for level f of manifest variable i, and $\lambda^T_t$ is an effect for respondents with the same total score, $t = 0, 1, \ldots, q$.
A set of individuals who have the same total score, $c = \sum_i y_{ij}$, has been called a score group. Whereas the logit version requires an assumption regarding the distribution of the latent variable, this distribution is left unspecified in the log-linear version. Reiser and Schuessler (1991) demonstrate applications of the various models discussed earlier. The model given in expression (16) is a special case of the homogeneous association model.
For this model, the parameter vector θ contains the λ parameters
$$\theta' = \left(\lambda \;\; \lambda^{Y_1}_0 \;\; \lambda^{Y_1}_1 \;\; \lambda^{Y_2}_0 \;\; \lambda^{Y_2}_1 \;\; \ldots \;\; \lambda^{Y_q}_0 \;\; \lambda^{Y_q}_1 \;\; \lambda^T_0 \;\; \ldots \;\; \lambda^T_q\right).$$
Familiar restrictions are required on these parameters in order to identify the model. For a log-linear model,
$$\pi_s(\theta) = \frac{e^{x_s'\theta}}{\sum_l e^{x_l'\theta}}, \qquad (17)$$
where $x_s'$ is row s of the model matrix X defined below, and for the log-linear Rasch model,
$$\frac{\partial\pi(\theta)}{\partial\theta} = [D(\pi) - \pi\pi']X, \qquad (18)$$
where X is a $2^q \times g$ full-rank model matrix that contains a column for each manifest item and columns for $q - 1$ score groups, and g is the dimension of θ. The methods given in this paper can be employed with any multinomial model applied to a $2^q$ table, and other models would have a different expression for $\partial\pi(\theta)/\partial\theta$.
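To make (17) and (18) concrete, the NumPy sketch below builds one possible model matrix X for the log-linear Rasch model: a 0/1 column per item plus indicators for total scores $t = 2, \ldots, q$, which supplies the $q - 1$ score-group columns. Which two score groups are absorbed (here t = 0 and t = 1) is our assumption, as are the parameter values and function names. It then compares (18) with central finite differences. For q = 5 this X has g = 9 columns, so $T - g - 1 = 32 - 9 - 1 = 22$, consistent with the 22 components listed in Tables 1 and 2.

```python
import itertools
import numpy as np

def rasch_model_matrix(q):
    """Hypothetical 2^q x (2q - 1) model matrix: one 0/1 column per item plus
    indicators for total scores t = 2, ..., q."""
    Y = np.array(list(itertools.product([0, 1], repeat=q)), dtype=float)
    t = Y.sum(axis=1)
    S = np.column_stack([(t == k).astype(float) for k in range(2, q + 1)])
    return np.hstack([Y, S])

def cell_probs(theta, X):
    """Expression (17): pi_s(theta) = exp(x_s' theta) / sum_l exp(x_l' theta)."""
    eta = X @ theta
    w = np.exp(eta - eta.max())          # subtract the max for numerical stability
    return w / w.sum()

def jacobian(theta, X):
    """Expression (18): d pi / d theta' = [D(pi) - pi pi'] X."""
    pi = cell_probs(theta, X)
    return (np.diag(pi) - np.outer(pi, pi)) @ X

q = 5
X = rasch_model_matrix(q)
rng = np.random.default_rng(3)
theta = rng.normal(scale=0.5, size=X.shape[1])      # hypothetical parameter values

# Central finite differences, one column per parameter.
h = 1e-6
fd = np.column_stack([
    (cell_probs(theta + h * u, X) - cell_probs(theta - h * u, X)) / (2 * h)
    for u in np.eye(X.shape[1])
])
print(X.shape, np.max(np.abs(fd - jacobian(theta, X))))
```

Adding a column of ones to X raises the rank by one, which confirms that the chosen score-group columns do not duplicate the normalizing constant in (17).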
Using the log-linear Rasch model as the model for $\pi(\theta)$ under the null hypothesis,
and $q = 5$ dichotomous manifest variables, power against various alternative
hypotheses was investigated for fixed, finite n by employing the methods described
in Section 5.
7. Results
7.1. Linear dependencies
Under the log-linear Rasch model, the first-order marginal frequencies are fitted exactly.
Since they do not contribute to the test of a null hypothesis, first-order marginals are not
shown in the results presented below. Also under the model, there is a linear
dependency among the second-order marginal proportions such that they sum to 1.
Specifically, there is a linear dependency among the columns of the model matrix X and the columns of $H'_{[2]}$. Therefore, the number of orthogonal components for second-order marginals is one less than the number of second-order marginals. A similar linear dependency exists among each set of marginal proportions from order 3 to q. For $q = 5$, the single fifth-order marginal is fitted exactly.
7.2. Two-way effect
Figures 1–3 show power curves for fixed, finite n under the condition that the null
hypothesis is false due to the specification of two-way effects among variables in the
model. True frequencies for the alternative hypothesis were generated under the log-linear model
$$\log(m_s) = \lambda + \lambda^{Y_1}_g + \lambda^{Y_2}_h + \lambda^{Y_3}_i + \lambda^{Y_4}_j + \lambda^{Y_5}_k + \lambda^{Y_1,Y_2}_{gh}.$$
For results shown in Figure 1, this log-linear model was used with $\lambda = 0.5$, $\lambda^{Y_1}_1 = \lambda^{Y_2}_1 = \lambda^{Y_3}_1 = \lambda^{Y_4}_1 = \lambda^{Y_5}_1 = 0.10$ and $\lambda^{Y_1,Y_2}_{11} = \lambda^{Y_1,Y_2}_{00} = -\lambda^{Y_1,Y_2}_{01} = -\lambda^{Y_1,Y_2}_{10} = 0.10$. The log-linear Rasch model demands equality for all two-way effects, so in this application it is a false model because the two-way effect for variables $Y_1$ and $Y_2$ is not equal to the other two-way effects. In the figures, the solid curve shows approximate power for a test based on only the second-order marginals, the curve with constant-length dashes shows approximate power for a test based on second- and third-order marginals, and the curve with alternating long and short dashes shows approximate power for the full-information test based on second-, third- and fourth-order marginals. As is apparent from the plot in Figure 1, a test based on second-order marginals has higher
power. This result is not surprising because the lack of fit is in the second-order associations, and including third- or fourth-order marginals in the test statistic dilutes the test.
Figure 1. Estimated power vs. sample size (two-way hierarchical effect).
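The generating model for Figure 1 can be reproduced in a few lines. This NumPy sketch builds the $2^5$ cell probabilities from the parameter values listed above, setting the level-0 main effects to 0 (an assumption that does not affect odds ratios), and confirms that only the (1,2) marginal table carries an association: its odds ratio is $e^{4 \times 0.10} = e^{0.4}$, whereas, for example, the (1,3) table has odds ratio 1. The function name is hypothetical.

```python
import itertools
import math
import numpy as np

q = 5
patterns = np.array(list(itertools.product([0, 1], repeat=q)))
lam0, main1, assoc = 0.5, 0.10, 0.10
log_m = (lam0
         + main1 * patterns.sum(axis=1)        # lambda^{Y_i}_1 = 0.10; level 0 set to 0 (assumed)
         # Y1-Y2 effect: +0.10 on cells 00 and 11, -0.10 on cells 01 and 10.
         + np.where(patterns[:, 0] == patterns[:, 1], assoc, -assoc))
pi = np.exp(log_m) / np.exp(log_m).sum()       # lambda = 0.5 cancels on normalization

def marginal_odds_ratio(pi, i, j):
    """Odds ratio of the (i, j) second-order marginal table (0-based indices)."""
    t = np.zeros((2, 2))
    for pat, p in zip(patterns, pi):
        t[pat[i], pat[j]] += p
    return t[1, 1] * t[0, 0] / (t[0, 1] * t[1, 0])

print(marginal_odds_ratio(pi, 0, 1), math.exp(4 * assoc))   # both about 1.4918
print(marginal_odds_ratio(pi, 0, 2))                        # 1.0: no Y1-Y3 association
```

Because $Y_3$, $Y_4$ and $Y_5$ are independent of $(Y_1, Y_2)$ in this generating model, collapsing over them leaves the $Y_1$-$Y_2$ odds ratio unchanged, which is why the whole departure from the Rasch model sits in a single second-order marginal.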
Orthogonal components for the approximated non-centrality parameter with sample
size set at 400 are shown in Table 1, where it can be seen that the effective order of the
alternative is actually 1. The first orthogonal component corresponds to the second-order marginal for variables $Y_1$ and $Y_2$. Since this marginal is the location of the lack of fit for the model, a one-degree-of-freedom test based on this component has higher power than a test of any other order. Including any additional components dilutes the test. The one-degree-of-freedom test based on the first orthogonal component is actually a score test of the null hypothesis $H_0\colon \lambda^{Y_1,Y_2}_{ij} = 0$, where all other pairs of variables have association equal to zero.
Since the log-linear Rasch model requires equality among all two-way effects, Figure 1
is based on a model for the null hypothesis which is a nested version of the model under
the alternative hypothesis. Maydeu-Olivares and Joe (2005) also considered asymptotic
power for the item response model in the case where item associations under the null hypothesis can be obtained by a constraint on the item associations under the alternative hypothesis.
Figure 2. Estimated power vs. sample size (two-way hierarchical effect).
Figure 3. Estimated power vs. sample size (two-way hierarchical effect).
Figures 2 and 3 show power curves generated using the same procedure and model as in Figure 1, except that $\lambda^{Y_1,Y_2}_{11} = \lambda^{Y_1,Y_2}_{00} = -\lambda^{Y_1,Y_2}_{01} = -\lambda^{Y_1,Y_2}_{10} = 0.20$ in Figure 2, and $\lambda^{Y_1,Y_2}_{11} = \lambda^{Y_1,Y_2}_{00} = -\lambda^{Y_1,Y_2}_{01} = -\lambda^{Y_1,Y_2}_{10} = 0.30$ in Figure 3. The test based on second-order marginals maintains higher power even as $\lambda^{Y_1,Y_2}_{ij}$ increases.

Table 1. Orthogonal components and associated power (two-way effect)

Component  Marginal     $\hat\zeta_i^2$   $\sum_{j=1}^{i}\hat\zeta_j^2$   Estimated power
 1         (1,2)        0.034831          0.034831                        .96
 2         (1,3)        1.896E-6          0.034833                        .93
 3         (1,4)        2.391E-6          0.034835                        .89
 4         (1,5)        3.120E-6          0.034838                        .86
 5         (2,3)        4.625E-6          0.034843                        .84
 6         (2,4)        6.862E-6          0.034850                        .81
 7         (2,5)        0.000011          0.034861                        .79
 8         (3,4)        3.548E-15         0.034861                        .76
 9         (3,5)        1.064E-14         0.034861                        .74
10         (1,2,3)      1.574E-6          0.034862                        .72
11         (1,2,4)      1.938E-6          0.034864                        .70
12         (1,2,5)      2.444E-6          0.034867                        .68
13         (1,3,4)      1.719E-10         0.034867                        .67
14         (1,3,5)      2.363E-10         0.034867                        .65
15         (1,4,5)      3.52E-10          0.034867                        .64
16         (2,3,4)      6.148E-10         0.034867                        .62
17         (2,3,5)      1.228E-9          0.034867                        .61
18         (2,4,5)      3.703E-9          0.034867                        .59
19         (1,2,3,4)    2.853E-6          0.034870                        .58
20         (1,2,3,5)    4.751E-6          0.034874                        .57
21         (1,2,4,5)    9.485E-6          0.034884                        .56
22         (1,3,4,5)    4.108E-16         0.034884                        .55

Components that are identically equal to zero are omitted from the table.
Power calculated with $n = 400$, $\alpha = .05$.
7.3. Three-way effect
Figures 4–6 show power curves under the condition that the null hypothesis is false due
to a single three-way interaction among variables Y2, Y3 and Y4. The log-linear Rasch
model of the null hypothesis allows no three-way interactions. The curves shown in
Figure 4 were obtained under frequencies generated from a log-linear model that now
includes three-way associations:
$$\begin{aligned}\log(m_s) = {}& \lambda + \lambda^{Y_1}_g + \lambda^{Y_2}_h + \lambda^{Y_3}_i + \lambda^{Y_4}_j + \lambda^{Y_5}_k + \lambda^{Y_1,Y_2}_{gh} + \lambda^{Y_1,Y_3}_{gi} + \lambda^{Y_1,Y_4}_{gj} + \lambda^{Y_1,Y_5}_{gk}\\ &+ \lambda^{Y_2,Y_3}_{hi} + \lambda^{Y_2,Y_4}_{hj} + \lambda^{Y_2,Y_5}_{hk} + \lambda^{Y_3,Y_4}_{ij} + \lambda^{Y_3,Y_5}_{ik} + \lambda^{Y_4,Y_5}_{jk} + \lambda^{Y_2,Y_3,Y_4}_{hij},\end{aligned}$$
where $\lambda = 0.5$, $\lambda^{Y_1}_1 = \lambda^{Y_2}_1 = \lambda^{Y_3}_1 = \lambda^{Y_4}_1 = \lambda^{Y_5}_1 = 0.10$, $\lambda^{Y_i,Y_j}_{11} = \lambda^{Y_i,Y_j}_{00} = -\lambda^{Y_i,Y_j}_{01} = -\lambda^{Y_i,Y_j}_{10} = 0.20$ for $i, j = 1, 2, 3, 4, 5$, and $\lambda^{Y_2,Y_3,Y_4}_{001} = \lambda^{Y_2,Y_3,Y_4}_{010} = \lambda^{Y_2,Y_3,Y_4}_{100} = \lambda^{Y_2,Y_3,Y_4}_{111} = -\lambda^{Y_2,Y_3,Y_4}_{000} = -\lambda^{Y_2,Y_3,Y_4}_{011} = -\lambda^{Y_2,Y_3,Y_4}_{101} = -\lambda^{Y_2,Y_3,Y_4}_{110} = 0.20$. Since the two-way effects for all pairs of variables are constrained equal in the generating model, the model of the null hypothesis is false only with respect to the three-way effect. As can be seen in Figure 4, the test based on both second- and third-order marginals has higher power, and the test based on only the second-order marginals has very low power for detecting the three-way effect. Table 2 shows the orthogonal components. From viewing the table, it appears that the effective order of the alternative is 16, but this is due only to the arbitrary order of the rows of the H matrix. That is, row 16 corresponds to the third-order marginal for variables $Y_2$, $Y_3$ and $Y_4$, and if this row had been placed as the first row in the matrix, the effective order of the test would have been 1. A one-degree-of-freedom test on this component would be a score test on the three-way effect.
Figure 4. Estimated power vs. sample size (three-way hierarchical effect).
Figure 5. Estimated power vs. sample size (three-way hierarchical effect).
For the curves shown in Figure 5, $\lambda^{Y_2,Y_3,Y_4}_{001} = \lambda^{Y_2,Y_3,Y_4}_{010} = \lambda^{Y_2,Y_3,Y_4}_{100} = \lambda^{Y_2,Y_3,Y_4}_{111} = -\lambda^{Y_2,Y_3,Y_4}_{000} = -\lambda^{Y_2,Y_3,Y_4}_{011} = -\lambda^{Y_2,Y_3,Y_4}_{101} = -\lambda^{Y_2,Y_3,Y_4}_{110} = 0.30$. The power of the test based on second-order marginals is still low. For the curves shown in Figure 6, $\lambda^{Y_2,Y_3,Y_4}_{001} = \lambda^{Y_2,Y_3,Y_4}_{010} = \lambda^{Y_2,Y_3,Y_4}_{100} = \lambda^{Y_2,Y_3,Y_4}_{111} = -\lambda^{Y_2,Y_3,Y_4}_{000} = -\lambda^{Y_2,Y_3,Y_4}_{011} = -\lambda^{Y_2,Y_3,Y_4}_{101} = -\lambda^{Y_2,Y_3,Y_4}_{110} = 0.50$.
Figure 6. Estimated power vs. sample size (three-way hierarchical effect).
Table 2. Orthogonal components and associated power (three-way hierarchical effect)

Component  Marginal     $\hat\zeta_i^2$   $\sum_{j=1}^{i}\hat\zeta_j^2$   Estimated power
 1         (1,2)        0.000021          0.000021                        .051
 2         (1,3)        0.000022          0.000043                        .052
 3         (1,4)        0.000024          0.000067                        .052
 4         (1,5)        0.000149          0.000215                        .056
 5         (2,3)        0.000076          0.000291                        .058
 6         (2,4)        0.000104          0.000396                        .059
 7         (2,5)        0.000032          0.000428                        .059
 8         (3,4)        0.000120          0.000548                        .061
 9         (3,5)        1.82E-14          0.000548                        .060
10         (1,2,3)      0.000040          0.000588                        .060
11         (1,2,4)      0.000044          0.000632                        .084
12         (1,2,5)      0.001303          0.002205                        .087
13         (1,3,4)      0.000270          0.002205                        .087
14         (1,3,5)      0.001795          0.004000                        .122
15         (1,4,5)      0.002144          0.006144                        .166
16         (2,3,4)      0.015416          0.021560                        .579
17         (2,3,5)      9.658E-15         0.021560                        .565
18         (2,4,5)      7.06E-15          0.021560                        .552
19         (1,2,3,4)    0.000027          0.021587                        .541
20         (1,2,3,5)    4.9918E-6         0.021592                        .529
21         (1,2,4,5)    9.9999E-6         0.021602                        .518
22         (1,3,4,5)    0.000030          0.021632                        .509

Components that are identically equal to zero are omitted from the table.
Power calculated with $n = 600$, $\alpha = .05$.
The test based on the second-order marginals has power larger than the size of the test
when the sample size is quite large, although the power is low.
Figures 7 and 8 show power curves under the condition that the null hypothesis is
false due to a non-hierarchical three-way effect. In other words, all two-way effects are
equal to zero, but variables Y1, Y2 and Y3 have a three-way interaction. Results here are
similar to the curves for a hierarchical three-way effect.
7.4. Two- and three-way effects
Figures 9 and 10 show curves for power to detect a departure from the null hypothesis in both the second- and third-order associations. Under the alternative hypothesis, $Y_1$ and $Y_5$ have no association with the other variables, but $Y_2$, $Y_3$ and $Y_4$ have uniform pairwise associations. Since the model of the null hypothesis demands a homogeneous association for all pairs of variables, there is a discrepancy in the two-way associations. Furthermore, under the alternative hypothesis, a three-way interaction effect is present for variables $Y_2$, $Y_3$ and $Y_4$. Since the model of the null hypothesis does not allow a three-way interaction, there is also a discrepancy in the three-way associations. For the curves shown in Figure 9, the two-way association effect for variables $Y_2$, $Y_3$ and $Y_4$ was held constant and equal to 0.075, but the three-way association effect was varied over the range 0.0–0.30. In Figure 10, the two-way association effect for variables $Y_2$, $Y_3$ and $Y_4$ was held constant and equal to 0.10, while the three-way association effect was varied over the range 0.0–0.30. As can be seen in both figures, when the three-way effect is small, the test based on $X^2_{[2]}$ has considerably higher power than a test based on either $X^2_{[2:3]}$ or the full-information Pearson–Fisher statistic. As the three-way association effect becomes larger, however, the power of the test based on $X^2_{[2]}$ rises only gradually, while the power of tests based on the other two statistics rises rapidly. In both Figures 9 and 10, the power of the test based on $X^2_{[2:3]}$ surpasses the power of the test based on $X^2_{[2]}$ when the three-way association effect becomes larger than the two-way association effect. In Figure 9 the curves cross at the point where the three-way association becomes larger than 0.075, and in Figure 10 they cross at the point where the three-way association becomes larger than 0.10.
Figure 7. Estimated power vs. sample size (three-way non-hierarchical effect).
Table 3 shows the orthogonal components that correspond to a slice of Figure 10
where the two- and three-way associations are both equal to 0.10, and the sample size is
1,000. As can be seen, the tests based on X 2½2 and X 2½2:3 have essentially the same power.
The effective order of the alternative is 16, as in the case considered above for an
Figure 8. Estimated power vs. sample size (three-way non-hierarchical effect).
Copyright © The British Psychological Society
Reproduction in any form (including the internet) is prohibited without prior permission from the Society
Goodness-of-fit testing using components
355
Figure 9. Estimated power vs. sample size (three-way hierarchical effect; n ¼ 500).
alternative hypothesis with a departure from the null hypothesis due only to a three-way
hierarchical effect. The effective order will shift as the three-way association changes
from small to large.
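The power column of Table 3 appears consistent with treating the test based on components 1 through k as a non-central chi-squared test on k degrees of freedom with non-centrality $\lambda_k = n\sum_{j\le k}\hat\gamma_j^2$, so that power $= P\{\chi^2_k(\lambda_k) > \chi^2_{k,1-\alpha}\}$; with n = 1,000, component 1 gives $\lambda_1 = 1.402$ and a power of about .220. The sketch below computes these powers under that assumption using only the Python standard library: the central tail uses the incomplete-gamma recurrences for integer degrees of freedom, and the non-central tail is a Poisson mixture of central tails. The function names are illustrative and are not taken from the paper.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a central chi-squared variable with a positive integer df."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j < df/2} y**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: Q(1/2, y) = erfc(sqrt(y)), then Q(a+1, y) = Q(a, y) + y**a * e**-y / Gamma(a+1)
    total = math.erfc(math.sqrt(y))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


def ncx2_sf(x: float, df: int, nc: float) -> float:
    """P(X > x) for a non-central chi-squared variable, as a Poisson(nc/2) mixture."""
    if nc <= 0.0:
        return chi2_sf(x, df)
    half = nc / 2.0
    upper = int(half + 12.0 * math.sqrt(half) + 30)  # truncation point of the mixture
    total = 0.0
    for i in range(upper + 1):
        weight = math.exp(-half + i * math.log(half) - math.lgamma(i + 1.0))
        total += weight * chi2_sf(x, df + 2 * i)
    return total


def chi2_critical(alpha: float, df: int) -> float:
    """Upper-alpha critical value of the central chi-squared distribution, by bisection."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_by_order(cumulative, n, alpha=0.05):
    """Power of the test on components 1..k for k = 1, 2, ..., with lambda_k = n * cumulative[k-1]."""
    powers = []
    for k, cum in enumerate(cumulative, start=1):
        powers.append(ncx2_sf(chi2_critical(alpha, k), k, n * cum))
    return powers


# Cumulative sums of the squared orthogonal components in Table 3
TABLE3_CUMULATIVE = [
    0.001402, 0.003111, 0.005240, 0.008022, 0.012303, 0.018610, 0.019897, 0.028829,
    0.028829, 0.028915, 0.029020, 0.029196, 0.029385, 0.029705, 0.030166, 0.036795,
    0.036795, 0.036795, 0.036803, 0.036805, 0.036807, 0.036816,
]

powers = power_by_order(TABLE3_CUMULATIVE, n=1000)
for k, pw in enumerate(powers, start=1):
    print(f"components 1..{k:2d}: power = {pw:.3f}")
```

Because the non-centrality is unchanged from component 8 to component 9 while the degrees of freedom increase, the computed power must drop there, and the largest value occurs at component 16, in line with the power column of Table 3.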
Figure 10. Estimated power vs. sample size (three-way hierarchical effect; n = 500).
Table 3. Orthogonal components and power (two- and three-way hierarchical effects)
Component  Marginal     γ̂ᵢ²          Σ_{j≤i} γ̂ⱼ²   Estimated power
1          (1,2)        0.001402     0.001402       .220
2          (1,3)        0.001709     0.003111       .332
3          (1,4)        0.002129     0.005240       .460
4          (1,5)        0.002782     0.008022       .606
5          (2,3)        0.004281     0.012303       .781
6          (2,4)        0.006307     0.018610       .921
7          (2,5)        0.001287     0.019897       .926
8          (3,4)        0.008932     0.028829       .986
9          (3,5)        5.091E-13    0.028829       .983
10         (1,2,3)      0.000086     0.028915       .979
11         (1,2,4)      0.000105     0.029020       .976
12         (1,2,5)      0.000176     0.029196       .973
13         (1,3,4)      0.000189     0.029385       .970
14         (1,3,5)      0.000320     0.029705       .968
15         (1,4,5)      0.000461     0.030166       .967
16         (2,3,4)      0.006629     0.036795       .989
17         (2,3,5)      1.37E-13     0.036795       .987
18         (2,4,5)      8.186E-13    0.036795       .986
19         (1,2,3,4)    8.189E-6     0.036803       .984
20         (1,2,3,5)    1.507E-6     0.036805       .982
21         (1,2,4,5)    3.024E-6     0.036807       .980
22         (1,3,4,5)    9.108E-6     0.036816       .978
Components that are identically equal to zero are omitted from the table.
Power calculated with n = 1,000, α = .05.
8. Example
The log-linear Rasch model of Section 6 was fitted to data from Bock and Lieberman (1970) on responses from a sample of size 1,000 to five items from Section 7 of the LSAT. Orthogonal components of $X^2_{PF}$ for this application are shown in Table 4; two of the components are very large relative to a central chi-squared distribution on one degree of freedom. The log-linear Rasch model is a special case of the homogeneous association model, obtained from that model by imposing an equality constraint among all two-way associations. The largest orthogonal component shows that this constraint produces an especially poor fit for the association between items 2 and 3. The third-order marginal for items 1, 2 and 5 is also poorly fitted. With $X^2_{[2]} = 22.94$ on 9 degrees of freedom (p < .007), the model does not fit the second-order marginals well, and it follows from the result of this test that the model also does not fit the joint frequencies of the 32-cell full cross-classification well. Although the same conclusion would be reached in this case with a test based on $X^2_{PF}$, the results presented above show that a test based on $X^2_{[2]}$ has higher power when, as here, the source of poor fit lies primarily in the second-order marginals. Both tests give the same result in this application because the sample size and effect size are large. If these LSAT items had been selected by psychologists to construct a set of questions that follow the Rasch model principles, the results for the orthogonal components would indicate that the content and wording of items 2 and 3 should be investigated to identify the source of the association not shared by the other item pairs.
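Because $X^2_{[t:u]}$ is a sum of single-degree-of-freedom orthogonal components, the statistic reported above can be recovered directly from Table 4. The minimal Python sketch below sums the rounded components whose marginal has order t through u and counts the non-zero components as degrees of freedom (components identically equal to zero do not appear in Table 4), which reproduces the 9 degrees of freedom reported for $X^2_{[2]}$. The helper names are hypothetical, and rounding to two decimals means the sums can differ from the reported values in the last digit.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a central chi-squared variable with a positive integer df."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j < df/2} y**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from erfc(sqrt(y)) and raise the gamma shape parameter one step at a time
    total = math.erfc(math.sqrt(y))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


# Table 4 (rounded): (marginal, squared orthogonal component); zero components omitted.
TABLE4 = [
    ((1, 2), 0.39), ((1, 3), 0.38), ((1, 4), 0.58), ((1, 5), 0.39), ((2, 3), 17.70),
    ((2, 4), 0.12), ((2, 5), 2.12), ((3, 4), 0.25), ((3, 5), 1.00),
    ((1, 2, 3), 0.87), ((1, 2, 4), 0.62), ((1, 2, 5), 8.07), ((1, 3, 4), 1.06),
    ((1, 3, 5), 1.51), ((1, 4, 5), 0.89), ((2, 3, 4), 0.77), ((2, 3, 5), 1.24),
    ((2, 4, 5), 0.01),
    ((1, 2, 3, 4), 0.83), ((1, 2, 3, 5), 0.39), ((1, 2, 4, 5), 0.03), ((1, 3, 4, 5), 5.15),
]


def x2_orders(components, t, u):
    """Hypothetical helper: sum the components whose marginal has order t..u; return (stat, df)."""
    selected = [g2 for marginal, g2 in components if t <= len(marginal) <= u]
    return sum(selected), len(selected)


x2_2, df_2 = x2_orders(TABLE4, 2, 2)    # second-order marginals only
x2_pf, df_pf = x2_orders(TABLE4, 1, 5)  # every order: the full Pearson-Fisher statistic
print(f"X2[2] = {x2_2:.2f} on {df_2} df, p = {chi2_sf(x2_2, df_2):.4f}")
print(f"X2_PF = {x2_pf:.2f} on {df_pf} df, p = {chi2_sf(x2_pf, df_pf):.4f}")
```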
Salomaa (1990) found that a log-linear model with unconstrained two-way associations fits these frequencies well. Results from such a model show that the association between items 2 and 3 is substantially stronger than the associations among the other item pairs. A model that contains the entire set of two-way associations cannot be tested with $X^2_{[2]}$ because all of the second-order marginals are fitted exactly. Under this model of unconstrained two-way associations, the component of $X^2_{PF}$ for the third-order marginal among items 1, 2 and 5 is still fairly large at 5.41, but $X^2_{[3]} = 14.1$ on 10 degrees of freedom (p ≈ .168), which leads to a conclusion consistent with Salomaa's result. Bock and Lieberman (1970) found the fit of the two-parameter item response model to be marginal (.05 < p < .10). In their results, items 2 and 3 had estimated slope parameters of larger magnitude than the other items.
Table 4. Orthogonal components of $X^2_{PF}$ for the Rasch model (fitted to LSAT Section 7 data)
Component  Marginal     γ̂ᵢ²     Σ_{j≤i} γ̂ⱼ²
1          (1,2)        0.39     0.39
2          (1,3)        0.38     0.78
3          (1,4)        0.58     1.36
4          (1,5)        0.39     1.75
5          (2,3)        17.70    19.45
6          (2,4)        0.12     19.57
7          (2,5)        2.12     21.69
8          (3,4)        0.25     21.95
9          (3,5)        1.00     22.94
10         (1,2,3)      0.87     23.81
11         (1,2,4)      0.62     24.43
12         (1,2,5)      8.07     32.50
13         (1,3,4)      1.06     33.56
14         (1,3,5)      1.51     35.07
15         (1,4,5)      0.89     35.96
16         (2,3,4)      0.77     36.73
17         (2,3,5)      1.24     37.97
18         (2,4,5)      0.01     37.98
19         (1,2,3,4)    0.83     38.81
20         (1,2,3,5)    0.39     39.20
21         (1,2,4,5)    0.03     39.24
22         (1,3,4,5)    5.15     44.38
Components that are identically equal to zero are omitted from the table.
9. Discussion
$X^2_{[t:u]}$ is an extension of the limited-information statistic from Reiser (1996) and Reiser and Lin (1999), designed to include marginals of any order from a multinomial distribution with $2^q$ cells. It was shown that the Pearson–Fisher statistic can be partitioned into orthogonal components associated with marginal frequencies, and that $X^2_{[t:u]}$ is a sum of these orthogonal components.
The non-centrality parameter for the large-sample distribution of $X^2_{[t:u]}$ was also partitioned into orthogonal components based on marginal distributions. Power calculations based on these orthogonal components, assuming that each estimated expected cell frequency $n\pi_s(\hat\theta)$ is large, showed that a test based on low-order marginals, $X^2_{[2]}$, has higher power to detect lack of fit located in the second-order associations than a statistic that incorporates higher-order marginals, such as $X^2_{[2:3]}$ or the Pearson statistic. $X^2_{[2]}$, however, would be very insensitive to a lack of fit that is present only in the third-order marginals. Other limited-information statistics discussed in the paper can be expected to suffer the same low power in this type of situation. The result demonstrates that $X^2_{[2]}$ is focused on the second-order marginals. The optimal form of $X^2_{[t:u]}$ for a particular test will depend on the order of the alternative hypothesis, which is determined by the relative magnitude of the parameters under the alternative model that contribute to lack of fit of the hypothesized model in marginals of differing order.
In many applications of latent variable models in the social sciences, manifest variables are designed to have high bivariate association, and so $X^2_{[2]}$ or $X^2_{[1:2]}$ may be the best choice for a test statistic in this area of application. To guard against the possibility of failing to detect a departure from the null hypothesis $H_0\colon \pi = \pi(\theta)$ in higher-order marginals, the residual $X^2_{PF} - X^2_{[2]}$ could be examined, since $X^2_{[2]}$ is a component of $X^2_{PF}$.
If the residual is large, relative to degrees of freedom, then inclusion of higher-order
marginals may be warranted. Use of residuals in a similar way was suggested by Durbin
and Knott (1972) in the context of the Cramér–von Mises statistic. It is also apparent
from the results presented here that a test based on an entire bank of marginals, such as
the complete set of second-order marginals, may not be optimal in terms of power
against a false null hypothesis. An alternative approach would be to estimate the optimal
value for the number of components to be included in the test statistic. Eubank (1997),
for example, presented a method for selecting the number of components that is similar
to using Mallows’ Cp to choose the best regression model. In an application of a model to
contingency tables, it may also be useful to examine each of the orthogonal components
on a single degree of freedom in order to detect possible sources of lack of fit. As
suggested by Lancaster (1969), a multiple comparison procedure should be used when
assessing a set of components.
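The residual check and a multiple comparison procedure can be illustrated with the rounded LSAT components in Table 4. The minimal Python sketch below forms $X^2_{PF} - X^2_{[2]}$ on the remaining 13 degrees of freedom and, as one simple choice of multiple comparison procedure, applies a Bonferroni adjustment that tests each single-degree-of-freedom component at level α/m. Bonferroni is a swapped-in illustration here, not necessarily the procedure Lancaster (1969) had in mind, and the list and helper names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a central chi-squared variable with a positive integer df."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{j < df/2} y**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from erfc(sqrt(y)) and raise the gamma shape parameter one step at a time
    total = math.erfc(math.sqrt(y))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


# Squared orthogonal components from Table 4, in table order (rounded to 2 dp).
COMPONENTS = [0.39, 0.38, 0.58, 0.39, 17.70, 0.12, 2.12, 0.25, 1.00,
              0.87, 0.62, 8.07, 1.06, 1.51, 0.89, 0.77, 1.24, 0.01,
              0.83, 0.39, 0.03, 5.15]
N_SECOND_ORDER = 9  # components 1-9 belong to the second-order marginals

# Residual X2_PF - X2[2]: what the second-order statistic leaves unexamined
x2_pf = sum(COMPONENTS)
x2_2 = sum(COMPONENTS[:N_SECOND_ORDER])
residual = x2_pf - x2_2
residual_df = len(COMPONENTS) - N_SECOND_ORDER
residual_p = chi2_sf(residual, residual_df)

# Bonferroni: flag component i if its 1-df p-value is below alpha / m
alpha = 0.05
m = len(COMPONENTS)
flagged = [i for i, g2 in enumerate(COMPONENTS, start=1) if chi2_sf(g2, 1) < alpha / m]

print(f"residual = {residual:.2f} on {residual_df} df, p = {residual_p:.3f}")
print("components flagged after Bonferroni adjustment:", flagged)
```

With these values only component 5, the (2,3) marginal, passes the Bonferroni threshold; an adjustment of this kind is conservative, so a component such as the (1,2,5) marginal can look large on its own yet not be flagged.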
Acknowledgements
The author would like to thank Martin Knott for suggesting the idea of an orthogonal
decomposition of the test statistics presented in this paper and for other helpful comments.
References
Agresti, A. (2002). Categorical data analysis. New York: Wiley.
Agresti, A., & Yang, M. C. (1987). An empirical investigation of some effects of sparseness in
contingency tables. Computational Statistics & Data Analysis, 5, 9–21.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. New York: Oxford
University Press.
Bartholomew, D. J., & Leung, S. O. (2002). A goodness of fit test for sparse $2^p$ contingency tables.
British Journal of Mathematical and Statistical Psychology, 55, 1–15.
Bartholomew, D. J., & Tzamourani, P. (1999). The goodness of fit of latent trait models in attitude
measurement. Sociological Methods and Research, 27, 525–546.
Birch, M. W. (1964). A new proof of the Pearson–Fisher theorem. Annals of Mathematical
Statistics, 35, 818–824.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis.
Cambridge, MA: MIT Press.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items.
Psychometrika, 35, 179–197.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5–32.
Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests.
Biometrics, 10, 417–451.
Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Cressie, N., & Holland, P. W. (1983). Characterizing the manifest probabilities of latent trait
models. Psychometrika, 48, 129–141.
Durbin, J., & Knott, M. (1972). Components of Cramér–von Mises statistics I. Journal of the Royal
Statistical Society, B, 34, 290–307.
Eubank, R. (1997). Testing goodness of fit with multinomial data. Journal of the American
Statistical Association, 92, 1084–1093.
Fergusson, D. M., Horwood, L. J., & Lynskey, M. T. (1995). The stability of disruptive childhood
behaviours. Journal of Abnormal Child Psychology, 23, 379–396.
Fisher, R. A. (1941). Statistical methods for research workers (8th ed.). Edinburgh: Oliver and
Boyd.
Haberman, S. J. (1973). The analysis of residuals in cross-classified tables. Biometrics, 29,
205–220.
Haberman, S. J. (1988). A warning on the use of chi-squared statistics with frequency
tables with small expected cell counts. Journal of the American Statistical Association, 83,
555–560.
Hall, P. (1985). Tailor-made tests of goodness of fit. Journal of the Royal Statistical Society, B, 47,
125–131.
Kendall, M. G. (1952). The advanced theory of statistics (5th ed., Vol. 1). London: Griffin.
Kendall, M. G., & Stuart, A. S. (1973). The advanced theory of statistics (3rd ed., Vol. 2). London:
Griffin.
Knott, M., & Tzamourani, P. (1997). Fitting a latent trait model for missing observations to racial
prejudice data. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class
models in the social sciences (pp. 224–252). Münster: Waxmann.
Koehler, K. J. (1986). Goodness-of-fit tests for log-linear models in sparse contingency tables.
Journal of the American Statistical Association, 81, 483–493.
Koehler, K. J., & Larntz, K. (1980). An empirical investigation of goodness-of-fit statistics for
sparse multinomials. Journal of the American Statistical Association, 75, 336–344.
Lancaster, H. O. (1969). The chi-squared distribution. New York: Wiley.
Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus (rev. ed.). New York: Wiley.
Maydeu-Olivares, A. (2001). Limited information estimation and testing of Thurstonian
models for paired comparison data under multiple judgement sampling. Psychometrika,
66, 209–228.
Maydeu-Olivares, A., & Joe, H. (2005). Limited and full information estimation and goodness-of-fit
testing in $2^n$ contingency tables: A unified framework. Journal of the American Statistical
Association, 100, 1009–1020.
Mitra, S. K. (1958). On the limiting power function of the frequency chi-square test. Annals of
Mathematical Statistics, 29, 1221–1233.
Moore, D. S. (1977). Generalized inverses, Wald’s method, and the construction of chi-squared
tests of fit. Journal of the American Statistical Association, 72, 131–137.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43,
551–560.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The
University of Chicago Press.
Rayner, J. C. W., & Best, D. J. (1989). Smooth tests of goodness of fit. New York: Oxford University Press.
Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete multivariate data.
New York: Springer-Verlag.
Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika,
61, 509–528.
Reiser, M., & Lin, G. (1999). A goodness-of-fit test for the latent class model when expected
frequencies are small. In M. Sobel & M. Becker (Eds.), Sociological methodology 1999
(pp. 81–111). Boston: Blackwell.
Reiser, M., & Schuessler, K. (1991). A hierarchy for some latent structure models. Sociological
Methods and Research, 19, 419–465.
Salomaa, H. (1990). Factor analysis of dichotomous data. Helsinki: Statistical Society.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis.
Psychometrika, 50, 83–90.
Schuessler, K. F. (1982). Measuring social life feelings. San Francisco: Jossey-Bass.
Tate, M. W., & Hyer, L. A. (1973). Inaccuracy of the chi-squared test of goodness of fit when expected
frequencies are small. Journal of the American Statistical Association, 68, 836–841.
Tjur, T. (1982). A connection between Rasch’s item analysis model and a multiplicative Poisson
model. Scandinavian Journal of Statistics, 9, 23–30.
Tollenaar, N., & Mooijaart, A. (2003). Type I errors and power of the parametric bootstrap
goodness-of-fit test: Full and limited information. British Journal of Mathematical and
Statistical Psychology, 56, 271–288.
Received 11 March 2006; revised version received 4 March 2007
Appendix
The equivalence of $X^2_{T-g-1}$ and $X^2_{PF}$ can be established as shown in Theorems 7.1.1 and 7.1.2 and Section 7.2 of Rayner and Best (1989). Assume that $H$ is a $(T-g-1) \times T$ matrix with rank $T-g-1$. It is not necessary to assume that $H$ represents transformations to obtain marginal proportions; that is, the proof is valid for any transformation matrix that has rank $T-g-1$. From Section 3,
$$X^2_{T-g-1} = n\, r' H' \hat V_e^{-1} H r.$$
Using matrix $F$ as defined in Section 4,
$$X^2_{T-g-1} = n\, r' H' \hat F \hat F^{-1} \hat V_e^{-1} (\hat F')^{-1} \hat F' H r = n\, r' H' \hat F (\hat F' \hat V_e \hat F)^{-1} \hat F' H r = n\, r' \hat H' \hat H r. \qquad (19)$$
Define
$$(\hat H^{\dagger})' = \Big( \hat H' \;\vdots\; D(\pi(\hat\theta))^{-1} \pi(\hat\theta) \;\vdots\; D(\pi(\hat\theta))^{-1} \hat G' (\hat A' \hat A)^{-1/2} \Big).$$
Since $\pi(\hat\theta)' D(\pi(\hat\theta))^{-1} \pi(\hat\theta) = 1$ and $\hat G D(\pi(\hat\theta))^{-1} \pi(\hat\theta) = 0$, it follows that $\hat H^{\dagger} D(\pi(\hat\theta)) (\hat H^{\dagger})' = I_T$ and $D(\pi(\hat\theta))^{-1} = (\hat H^{\dagger})' \hat H^{\dagger}$. Then,
$$\hat H' \hat H = D(\pi(\hat\theta))^{-1} - D(\pi(\hat\theta))^{-1} \pi(\hat\theta) \pi(\hat\theta)' D(\pi(\hat\theta))^{-1} - D(\pi(\hat\theta))^{-1} \hat G' \big(\hat G D(\pi(\hat\theta))^{-1} \hat G'\big)^{-1} \hat G D(\pi(\hat\theta))^{-1}.$$
Substituting into equation (19),
$$X^2_{T-g-1} = n\, r' D(\pi(\hat\theta))^{-1} r = X^2_{PF},$$
as $\pi(\hat\theta)' D(\pi(\hat\theta))^{-1} r = 0$ and $\hat G D(\pi(\hat\theta))^{-1} r = 0$, the latter because it is the score vector equation that the maximum-likelihood estimator satisfies.
Under a simple null hypothesis, a similar proof with $g = 0$ can be used to show that $X^2_{T-1} = X^2_{[1:q]} = X^2_P$, the Pearson statistic (see also Maydeu-Olivares and Joe, 2005, for a proof under a simple null hypothesis).
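The $g = 0$ case can be checked numerically. The sketch below assumes a simple null hypothesis with known cell probabilities π for q = 2 binary variables (T = 4 cells), takes H to be the matrix that maps cell proportions to the first- and second-order marginal proportions, uses $H(D(\pi) - \pi\pi')H'$ as the covariance matrix of $\sqrt{n}\,Hr$, and compares the resulting quadratic form with the Pearson statistic. The cell probabilities, observed proportions and sample size are made-up illustrative values, not data from the paper, and the helper names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small square a)."""
    m = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = s / aug[r][r]
    return x


def x2_marginals(H, pi, p, n):
    """n (Hr)' [H (D(pi) - pi pi') H']^{-1} (Hr) with r = p - pi (simple null assumed)."""
    T = len(pi)
    r = [p[k] - pi[k] for k in range(T)]
    sigma = [[(pi[k] if k == j else 0.0) - pi[k] * pi[j] for j in range(T)] for k in range(T)]
    hr = [sum(h[k] * r[k] for k in range(T)) for h in H]
    V = [[sum(ha[k] * sigma[k][j] * hb[j] for k in range(T) for j in range(T)) for hb in H]
         for ha in H]
    z = solve(V, hr)
    return n * sum(u * v for u, v in zip(hr, z))


def x2_pearson(pi, p, n):
    """Pearson statistic n * sum (p - pi)^2 / pi."""
    return n * sum((pk - qk) ** 2 / qk for pk, qk in zip(p, pi))


# Cells ordered (Y1, Y2) = (0,0), (0,1), (1,0), (1,1)
H = [
    [0, 0, 1, 1],  # P(Y1 = 1)
    [0, 1, 0, 1],  # P(Y2 = 1)
    [0, 0, 0, 1],  # P(Y1 = 1, Y2 = 1)
]
pi = [0.4, 0.2, 0.1, 0.3]   # illustrative null probabilities
p = [0.30, 0.25, 0.15, 0.30]  # illustrative observed proportions
n = 200

stat_marg = x2_marginals(H, pi, p, n)
stat_pearson = x2_pearson(pi, p, n)
print(f"{stat_marg:.6f} {stat_pearson:.6f}")  # both equal 12.5 for these values
```

Any other $3 \times 4$ matrix H of rank 3 whose row space excludes the vector of ones gives the same value, which is the point of the remark that H need not represent marginal proportions.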