
Journal of Educational Statistics
Spring 1988, Vol. 13, No. 1, pp. 19-43
An Assessment of the Dimensionality of Three
SAT-Verbal Test Editions
Linda L. Cook
Neil J. Dorans
Daniel R. Eignor
Educational Testing Service
Key words: factor analysis, binary data, item parcelling

Author note: This study was supported by Educational Testing Service through Program Research Planning Council funding. The opinions expressed herein are those of the authors, however, and should not be interpreted as policy statements by either ETS or the College Board. Support for the payment of voluntary page charges was provided by the College Board Programs Division of ETS. The authors' names appear in alphabetical order.
A strong assumption made by most commonly used item response theory
(IRT) models is that the data are unidimensional, that is, statistical dependence among item scores can be explained by a single ability dimension.
First-order and second-order factor analyses were conducted on correlation
matrices among item parcels of SAT-Verbal items. The item parcels were
constructed to yield correlation matrices that were amenable to linear factor
analyses. The first-order analyses were employed to assess the effective
dimensionality of the item parcel data. Second-order analyses were employed to test meaningful hypotheses about the structure of the data. Parcels
were constructed for three SAT-Verbal editions. The dimensionality analyses revealed that one SAT-Verbal test edition was less parallel to the other
two editions than these other editions were to each other. Refinements in the
dimensionality methodology and a more systematic dimensionality assessment are logical extensions of the present research.
In recent years there has been considerable research and interest devoted
to the use of item response theory (IRT) in the solutions to a variety of
measurement problems (see Hambleton, 1983; Lord, 1980). Because of the
special properties of test data characterized by IRT models, users are often
able to solve problems not amenable to solution through the use of traditional psychometric methods. However, in order for IRT to be useful in the
solution of measurement problems, certain fairly strong assumptions about
the data must be met. One of the most important of these assumptions is
the assumption of unidimensionality. Most IRT models that are currently
used with binary scored item response data assume that the probability of
a correct response to an item can be modeled by a mathematical function
that assumes a single ability dimension common to all items. For reasons to
be developed later in this section of the paper, researchers working on IRT
applications that involve binary scored item response data typically have
assumed, without empirical verification, that the items which appear to test
a skill or content area are unidimensional (Divgi, 1981). This assumption
is almost surely inappropriate for many types of test data (Drasgow &
Parsons, 1983). This lack of empirical verification of the unidimensionality
assumption has generally been caused by the difficulties involved in the
assessment of the dimensionality of binary scored item response data.
A variety of methods have been advanced over the past half century for
assessing the unidimensionality assumption for binary scored item response
data. Hattie (1981, 1985) has provided a comprehensive review of these
methods. Two categories of procedures currently seem to be applied to the
problem; they may be described in very general terms as procedures that
make use of information on item parameter estimates or residuals from
fitting a particular IRT model and procedures that make use of factor
analytic procedures, or indices based on a factor analysis, applied before a
specific IRT model is fit. (See Rosenbaum, 1984, however, for an approach
that falls into neither category.) The IRT-only procedures focus on testing
whether the assumption of unidimensionality holds, whereas the factor
analytic procedures allow for multiple dimensions.
IRT-Only Approaches
If the one-parameter logistic model and conditional maximum likelihood estimation techniques are used, a number of statistical tests of
the unidimensionality assumption follow directly from the estimation of
item parameters over different groups of people or subsets of items
(see Gustafsson, 1980; van den Wollenberg, 1982a, 1982b). If the one-parameter or two-parameter normal ogive model is used with marginal
maximum likelihood estimation procedures (Bock & Lieberman, 1970), a
residuals-based test of the unidimensionality assumption exists. McDonald
(1982), while presenting IRT models that utilize marginal maximum likelihood estimation procedures as special cases of the random regressors
factor analytic model, has suggested that the set of residual item covariances after fitting a unidimensional model be studied for indications of
departures from unidimensionality. Hattie (1981, 1984), in a large scale
study of indices of unidimensionality, studied McDonald's suggested procedure along with a number of other proposed measures, and found that
McDonald's suggestion provided the best results.
Because the one-parameter or Rasch model is in many cases not applicable to the analysis of binary scored multiple choice item response data
(Divgi, 1986; Fischer, 1978), and because researchers often object to the
assumption of normally distributed abilities, usually utilized in the random
regressors factor analytic model (McDonald, 1982; but see Bock & Aitkin,
1981, for a recently developed approach that does not depend on the
assumption of normally distributed abilities), many researchers work instead with the three-parameter logistic model and unconditional maximum
likelihood estimation procedures, as used, for instance, in the computer
program LOGIST (Wingersky, 1983; Wingersky, Barton, & Lord, 1982).
For this model and estimation procedure, direct statistical and data-based
tests of the unidimensionality assumption, which follow from the parameter
estimation process, do not presently exist, although data-based tests could
easily be developed (McDonald, 1982). Bejar (1980) has developed a procedure, comparable to one of the procedures suggested for the Rasch
model by Gustafsson (1980), that is applicable with the more complex
three-parameter logistic model in the unconditional maximum likelihood
estimation context. The procedure requires both a priori knowledge about
the test items, so that a subset of the total set of items can be formed that
is clearly unidimensional, and the subsequent availability of item parameter
estimates for the total set of items and the subset deemed unidimensional.
Because this kind of information (in particular, item parameter estimates
for the subset in the Bejar procedure) is usually unavailable at the time
researchers wish to assess the unidimensionality assumption, many of them
have chosen instead, when working with binary scored item response data,
to forego tests based on parameter estimates or residuals and instead to use
factor analytic procedures with the individual item data to assess dimensionality, usually working with phi, or when possible, tetrachoric correlation coefficients. In addition, Hambleton and Rovinelli (1986) have provided some examples of situations in which Bejar's procedure was unable
to detect multidimensionality in the data.
Factor Model Approaches
A complete review of the literature documenting the theoretical and
practical problems involved in the linear factor analysis of phi and tetrachoric correlation coefficients is beyond the scope of this paper. What
follows is a brief review of the major issues. Carroll (1945, 1961, 1983) has
documented the problems inherent in the factor analysis of phi coefficients;
these correlations depend not only on the strength of relationship between
the variables being correlated, but upon the means of the variables as well.
Further, factor analysis of phi coefficients produced by the same underlying
structure, but dichotomized at different points, can conform to factor
models with different structures and possibly different numbers of factors
(Mislevy, 1986). The existence of additional artifactual or "difficulty" factors when phi coefficients are factor analyzed via a linear model has been
a well-discussed phenomenon in the literature. McDonald and Ahlawat
(1974) provided a review of previous work on the issue of "difficulty" factors
and offered an alternative explanation for the existence of these factors.
According to them, artifactual factors are not the result of the characteristics of the particular items being analyzed, but rather, are the result of the
fact that a nonlinear model is needed to characterize the regression of item
score on ability instead of the linear model that is implied in linear factor
analysis.
Factor analysis of tetrachoric correlation coefficients theoretically should
circumvent the problem of "difficulty" factors. However, other problems
can occur when tetrachorics are factor analyzed. Carroll (1945, 1961, 1983)
has documented the problems involved in the factor analysis of tetrachoric
correlation coefficients based on binary scored multiple-choice items where
guessing is possible. In this context, failure to take guessing effects into
account will again produce artifactual factors and misleading information as
to the number of factors needed to account for the data. Carroll (1945) has
given formulas to correct for the effects of guessing on tetrachoric correlation coefficients and, hence, eliminate the existence of artifactual "guessing" factors. Mislevy (1986), in discussing these formulas further, points
out that it is often the case that after Carroll's (1945) correction is applied,
adjustments to elements of the matrix may still need to be made (see also
Hulin, Drasgow, & Parsons, 1983) and even when this is done, the sample
tetrachoric correlation matrix may not necessarily be positive definite (see
Lord & Novick, 1968, p. 349), thereby ruling out the use of many of the
more desirable factor analytic estimation procedures.
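As a small aside (ours, not the authors'), the positive definiteness of a correlation matrix can be checked from its eigenvalues; the matrix below is invented to show how a set of pairwise-estimated correlations can fail the check:

import numpy as np

# A made-up "tetrachoric-like" correlation matrix.  Because sample tetrachorics
# are estimated pairwise, the assembled matrix need not be positive definite.
R = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.9],
              [0.1, 0.9, 1.0]])

eigenvalues = np.linalg.eigvalsh(R)          # eigenvalues of a symmetric matrix
print(eigenvalues)
print("positive definite:", bool(eigenvalues.min() > 0))   # False for this matrix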
Given the practical problems involved in the linear factor analysis of
binary scored item response data using tetrachorics (i.e., the matrices are
often nonpositive definite) and the assumptions that must be met in order
for the procedure to be viable (no or correctable guessing and normally
distributed traits), three other approaches that provide viable options to the
problem of assessing dimensionality at the item level have been developed.
These procedures either avoid the use of tetrachorics with the linear factor
analytic model or avoid the use of the linear model altogether. A variation
of the third approach to be discussed was used in the research described in
this paper.
Nonlinear factor models. Hambleton and Rovinelli (1986) have provided
one possible approach to the problem, which is based on McDonald's
(1981) suggested procedure for studying dimensionality with the random
regressors factor analytic model. This involves looking at the residual covariances between items after fitting successively more complex nonlinear
factor models. A special nonlinear polynomial factor analytic model developed by McDonald (1982; Etezadi-Amoli & McDonald, 1983) was used by
Hambleton and Rovinelli in their study. No assumptions about the distribution of latent traits had to be made with this procedure. Phi coefficients
were used in the analyses.
IRT-based factor analyses. The second procedure or set of procedures
involves the use of recent advancements with the random regressors factor
analytic model that circumvent the problems involved with factor analyzing
sample tetrachoric matrices. Tucker (1983) refers to these procedures as
item characteristic function approaches. They provide a blending of factor
analytic and item response theory techniques. That is, factor analysis no
longer needs to be thought of as an auxiliary procedure to be applied as a
check on dimensionality before a unidimensional IRT model can be utilized. When using these factor analytic procedures, if a unidimensional
model is indicated, the item parameters specifying the relationship between
item score and the underlying trait will be estimated. These procedures
involve a generalized least squares approach attributable to Christoffersson
(1975) and marginal maximum likelihood full information factor analysis
based on the work of Bock and Aitkin (1981). Mislevy (1986) has provided
an excellent review of these approaches, along with the closely related
procedure attributable to Muthen (1978, 1984). These approaches are
based on the usual assumption that the underlying traits are multinormally
distributed, although Mislevy suggests that the Bock and Aitkin approach
for multiple traits, currently operationalized in the program TESTFACT
(Wilson, Wood, and Gibbons, 1984), can be generalized so that empirically
derived distributions, rather than the normal, can be used to characterize
the traits. This has been done already for the unidimensional case in the
program BILOG (Mislevy & Bock, 1983). Also, Tucker has been working
on an alternate procedure that circumvents multinormal distributional assumptions on the latent traits.
Linearization. The final procedure is based on the work of Cattell (1956,
1974) and involves the linear factor analysis of item parcel data, or mini-tests, made up of small collections of nonoverlapping items thought to
measure the same underlying dimension or dimensions. Again, the assumption of multinormality of the underlying traits is made. Data on individual
items are no longer used directly in deriving the correlation matrix. Some
practical justification for aggregating the data into mini-tests and using
linear factor analysis appears in the summary section of McDonald's 1981
article:
(1) In principle, a set of n tests or n binary items is unidimensional if and
only if the set fits a (generally nonlinear) common factor model with just
one common factor.
(2) In checking the unidimensionality of a set of tests, a simple, appropriate, ancillary assumption is that the regressions of the tests on the factors
are linear. (p. 113)
If item parcel data is to be used in a linear factor analytic study, of serious
concern is the method chosen for defining the subsets from the total set of
items and then placing items into parcels within a subset. Cattell and
Burdsal (1975) recommend doing two factor analyses, one on the items to
define the item dimensions for forming subsets within which the parcels will
be formed and then one on the parcels to assess dimensionality. Because
the first factor analysis suggested involves all the problems inherent in the
factor analysis of item data, unless one of the more recently developed
random regressors factor analytic approaches can reasonably be applied, it
would appear that a non-factor analytic procedure for the formation of item
subsets, such as using item types as defined by well-developed test specifications, would provide a suitable approach. Another concern when using
item parcel data in factor analytic studies is the possibility of propagation
of artifactual factors at the item parcel level (see Swinton & Powers, 1980).
The use of item parcel data instead of individual item data tends to "linearize" the basic nonlinear relationship between score and underlying trait
that exists at the item level, thereby providing some justification for the use
of linear factor analysis of a matrix of product moment correlations without
concern for artifactual factors due to nonlinearity (McDonald & Ahlawat,
1974). If, however, the parcels are of widely differing difficulty, artifactual
factors may still possibly result. Such factors will inhibit a reasonable assessment of the dimensionality of the data.
Purpose
The purpose of this study was to look at the dimensionality of the
Scholastic Aptitude Test Verbal (SAT-Verbal) section. An item parcelling
approach was used in the study, in conjunction with contemporary factor
analytic techniques. More specifically, three SAT-Verbal editions were
chosen for dimensionality assessment. It was hypothesized, because of
differing numbers of items and underlying content specifications, that one
of the editions chosen might have an underlying factor structure that differed from the other two. Linear factor analysis of item parcel data was
used for dimensionality assessment. For each test edition, a series of confirmatory factor analyses using the LISREL V computer program
(Joreskog & Sorbom, 1981) were performed to assess dimensionality. The
results of the factor analyses were then related to the manifest differences
between the editions under study.
Methodology
The verbal section of the Scholastic Aptitude Test (SAT) was selected for
this dimensionality study. This test is of a multiple choice variety and has
been described as measuring developed verbal reasoning abilities that are
related to successful performance in college. It is intended to supplement
the secondary school record and other information about the student in
assessing readiness for college-level work.
Test specifications for the SAT-Verbal section have not remained constant over the years. Test booklets containing SAT editions administered
prior to the fall of 1974 consisted of two 45-minute sections (one SAT-Verbal and one SAT-Mathematical) and three 30-minute sections (one
SAT-Verbal, one SAT-Mathematical, and one experimental containing an
anchor test or pretest). The two SAT-Verbal sections contained a total of
90 five-choice items composed of 53 reading comprehension items (18
sentence completions and 7 reading passages each of which is followed by
5 items based on the passage) and 37 vocabulary items (18 antonym items
and 19 analogy items). One of the SAT-Verbal editions used in this study
was developed to these pre-1974 specifications. Test booklets containing
SAT editions administered since the fall of 1974 (which includes the other
two SAT-Verbal editions used in this study), consist of six 30-minute sections: two SAT-Verbal sections, two SAT-Mathematical sections, one Test
of Standard Written English, and one experimental section. The two SAT-Verbal sections contain a total of 85 five-choice items composed of 40
reading comprehension items (15 sentence completions and five reading
passages each of which is followed by 5 items based on the passage) and 45
vocabulary items (25 antonym items and 20 analogy items). Regardless of
specifications, raw scores on SAT-Verbal are obtained scores that have
been corrected for guessing. Raw scores are computed by the formula
R - W/k, where R is the number of correct responses, W is the number of
incorrect responses, and (k + 1) equals the number of answer choices per
item.
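As a small illustration (ours, not part of the original scoring programs), the formula-scoring rule just described can be written as:

def formula_score(num_right, num_wrong, num_choices=5):
    # Formula score R - W/k, where k + 1 is the number of answer choices per item.
    k = num_choices - 1
    return num_right - num_wrong / k

# Five-choice SAT-Verbal items: 50 right and 20 wrong gives 50 - 20/4 = 45.0.
print(formula_score(50, 20))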
Choice of Test Editions for Analysis
Three SAT-Verbal test editions were chosen for the factor analyses performed for this study. An attempt was made to locate two test editions that
could be considered somewhat nonparallel and then to select a third edition
that could be considered reasonably parallel to one of the editions previously chosen. Edition V4 was chosen to be the nonparallel edition; it
contained five more items than the other two SAT-Verbal editions (90 in
total) and was built to different content specifications. The other two editions that were chosen have edition designations X2 and Y3. Both of those
editions contained the same number of items (85 in total), were built to the
same content specifications, and were fairly similar both in reliability and
overall difficulty level.
Formation of Item Parcels
The parcelling approach used in this paper attempts to circumvent the
problems associated with factoring item data by factoring item parcel
scores, that is, sums of scores on a small subset of items, which are more
amenable to analysis by a linear factor model than item data.
Parcel construction principles. As noted earlier, it is well documented
(e.g., Carroll, 1945, 1983) that a linear factor analysis of a matrix of phi coefficients computed from binary item data with an underlying unidimensional structure will portray the data as multidimensional, with a second dimension clearly
related to item difficulty. As McDonald and Ahlawat (1974) argued, part
of the problem is that a linear regression model is inappropriate for the
item/factor regression, which has to be nonlinear given the bounded nature
of dichotomous data and the unbounded metric assumed for the underlying
factor.
For the parcel approach to avoid the problems of factoring phi coefficients, the parcels must be constructed in a fashion that is sensitive to these
problems. The major reason for constructing parcel scores is to achieve a
matrix of correlations or covariances that is not affected by item difficulty
and the nonlinearity of the item/factor regression. Parcel construction
should attempt to "linearize" the data by attempting to remove the effects
of nonlinearity and differences in item difficulty.
To mitigate the effects of differences in item difficulty and nonlinearity,
parcel scores should have approximately equal means and variances. In the
terminology of classical test theory, the parcels should be constructed to be
parallel to each other. To achieve parallel parcels, it is essential to place
approximately equal numbers of easy, middle difficulty, and hard items
within each parcel such that each parallel parcel is composed of several
nonparallel items.
A critical question that needs to be addressed is how many items are
needed for a parcel. Experience (Drasgow & Dorans, 1982) indicates that
a minimum of three items is needed and that six or seven are clearly enough, provided that the items within a parcel are adequately spaced so that the parcel score distribution is approximately normal. A statistical justification for the
parallel parcelling approach might be drawn from the work of Drasgow and
Dorans (1982) where they introduce the notion of a categorization attenuation factor that reduces the correlation between two continuous variables
when one variable is polychotomized. Parallel parcelling can be viewed as
a heuristic approach to converting dichotomous data into polychotomous
data with an eye toward minimizing the size of the categorization attenuation factor. Ideally, an algorithm should be worked out for parallel parcelling. Such an algorithm would be designed to achieve a small constant
categorization attenuation factor across parcels. In this paper, an iterative
process that requires expert judgment and which focuses only on the first
two moments of the parcel score distribution was used. If the parcel approach is to be used extensively, an algorithm would appear essential.
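The authors used an iterative, judgment-based process rather than an algorithm, but as one hypothetical sketch of what a parallel parcelling algorithm might look like (the function and variable names are ours), items could be sorted by difficulty and dealt out across parcels in alternating order so that parcel difficulty means and spreads come out roughly equal:

import numpy as np

def parallel_parcels(deltas, n_parcels):
    # Sort items by difficulty and deal them out across parcels, reversing
    # direction on alternate passes, so parcel difficulty means and spreads
    # come out roughly equal.
    order = np.argsort(deltas)
    parcels = [[] for _ in range(n_parcels)]
    for pass_num, start in enumerate(range(0, len(order), n_parcels)):
        block = list(order[start:start + n_parcels])
        if pass_num % 2 == 1:
            block.reverse()
        for parcel, item in zip(parcels, block):
            parcel.append(int(item))
    return parcels

# Example: 20 hypothetical equated deltas split into 4 parcels of 5 items each.
rng = np.random.default_rng(0)
print(parallel_parcels(rng.normal(11.0, 2.5, size=20), 4))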
Construction of specific item parcels. Items from each SAT-Verbal edition were separated into item subsets on a within-edition basis using the
four item types contained in the test: sentence completion items, antonym
items, analogy items, and items based on reading passages. Within each of
the four item subsets, items were placed into parcels of four to seven items
such that the mean difficulty level and the standard deviation of the difficulties of the parcels were approximately the same. The building of parcels
of comparable difficulty was accomplished by assigning items to parcels
based upon their equated delta difficulty indices. (See Hecht & Swineford,
1981, for an explanation of delta difficulty indices and the process of delta
equating.) Within each of the four subsets of items, the same number of
parcels were formed across each of the three editions. Figure 1 contains the
number of items within each of the four item subsets of SAT-Verbal for
each of the three editions and the number of parcels within each of the
subsets.
Scores for examinees on the item parcels were formed, and then correlations were computed between parcels both within and across subtests
for each edition. The correlations among the parcels were used as input to
the LISREL V program.
LISREL V: First-order and Second-order Models
The LISREL V computer program (Joreskog & Sorbom, 1981) fits and
tests models for linear structural relationships among quantitative variables. As mentioned earlier, the primary reason for developing item parcels
was to yield variance-covariance matrices that were amenable to a linear
factor analysis.

FIGURE 1. Factor loading matrices and parcel descriptions for SAT-Verbal editions. [First-order factor loading matrix: 17 parcels by 4 factors, with X marking a free loading and 0 a loading fixed at zero; parcels 1-3 load only on the first factor, parcels 4-8 only on the second, parcels 9-12 only on the third, and parcels 13-17 only on the fourth.]

Item type                Parcels    Number of items
                                    Edition V4    Editions X2 and Y3
Sentence completions     1-3        18            15
Antonyms                 4-8        18            25
Analogies                9-12       19            20
Reading passage items    13-17      35            25
Totals                              90            85

Both first-order factor analysis and second-order factor
analysis are special cases of the LISREL V model. First-order factor analyses were employed in this study to assess the "effective" dimensionality of
the item parcels, that is, the number of factors needed to adequately describe the covariation among item parcels. Second-order factor analyses
were employed to test meaningful hypotheses about the structure of the
data.
Common factor model. The traditional first-order common factor model is

y = Ax + Du,   (1)

where y is an n-by-1 vector of observable scores on the n item parcels; x is a k-by-1 vector of non-observable scores on the k common factors that account for covariation among the n parcels; A is an n-by-k matrix of common factor loadings describing the regressions of the n parcel scores on the k factor scores; u is an n-by-1 vector of unobservable unique scores, which could be further decomposed into measurement error and scores on specific factors; and D is an n-by-n diagonal matrix of uniqueness loadings.

The n-by-n covariance matrix among the item parcels can be expressed as

Cyy = A Cxx A' + D²,   (2)

where Cxx is the k-by-k matrix of factor covariances, and D² is an n-by-n diagonal matrix of unique variances.
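As a concrete illustration of Equations 1 and 2 (with invented loadings and factor correlation, not estimates from the SAT-Verbal data), the following sketch builds the parcel correlation matrix implied by a two-factor simple structure:

import numpy as np

# Hypothetical loadings (A) for 6 parcels on 2 correlated common factors,
# the factor correlation matrix (Cxx), and unique variances (D squared)
# chosen so that the implied matrix Cyy has a unit diagonal.
A = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
Cxx = np.array([[1.00, 0.85],
                [0.85, 1.00]])
communalities = np.sum((A @ Cxx) * A, axis=1)
D2 = np.diag(1.0 - communalities)            # diagonal matrix of unique variances
Cyy = A @ Cxx @ A.T + D2                     # Equation 2
print(np.round(Cyy, 3))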
One goal of a factor analysis is to identify the number of common factors
needed to fit the off-diagonal elements of Cyy. This is known as the number-of-factors problem. LISREL V was used to assess the number-of-factors problem in the following fashion. For each test edition studied, the fit of a single-common-factor model to the correlation matrix among item parcels (correlation matrices were used to simplify proportion-of-variance interpretations and to reduce the impact of variable-length parcels on the multifactor solutions) was examined. Next, the fit of a very general two-common-factor model to the same data was examined. The two-common-factor
models were essentially unconstrained in that no restrictions were imposed
on the factor loading matrix A. Consequently, the two-factor solutions
were not readily interpretable in a substantive fashion. They did, however,
permit assessment of the number-of-factors question.
To achieve interpretable results, a second-order factor model was used in
a more traditional confirmatory application of the LISREL approach. A
second-order factor analysis can be thought of as a factor analysis of the
first-order factors. (See Schmid & Leiman, 1957, for a discussion of one
approach to hierarchical or second-order factor analysis.) It is a particularly
fruitful approach to employ when one suspects that correlations among the
first-order factors can be explained by a single second-order common factor. Such a model is particularly applicable to item data that one suspects
is essentially unidimensional. Drasgow and Parsons (1983) suggested a
second-order factor model that influenced the choice of the model and
approach used in this study.
Second-order model. The second-order factor model fitted to the first-order common factors, x, is

x = bz + Fv,   (3)

where z represents a score on the second-order general factor; b is the k-by-1 vector of loadings of the k first-order factors on z; F is a k-by-k diagonal matrix of loadings of the k first-order factors on their corresponding group factors; and v is a k-by-1 vector containing the k group factor scores.
This second-order factor model decomposes each first-order factor into
a second-order common factor that influences all first-order factors, and a
second-order group factor which influences performance only on that first-order factor. If the contribution of the second-order common factor to
every first-order factor is large, the correlations among the first-order factors will be close to unity. If the second-order group factor for a particular
first-order factor is relatively large, then the correlations of that first-order
factor with other first-order factors will be among the lowest in the first-order factor correlation matrix.
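Stated compactly (this restatement is ours, under the usual identification assumptions that z and the group factor scores v are standardized and mutually uncorrelated and that the first-order factors are standardized):

Var(x_i) = b_i² + f_i² = 1   and   Corr(x_i, x_j) = b_i b_j   (i ≠ j),

so b_i² is the proportion of first-order factor variance attributable to the second-order common factor and f_i² is the proportion attributable to the group factor. As a check against the results reported later, the Edition V4 entries in Table 2 give sqrt(.98 × .85) ≈ .91, close to the tabled correlation of .92 between the sentence completions and antonyms factors.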
As with the first-order factor analyses, the fit of the second-order factor
models to the data was assessed. More importantly, substantive interpretations were attached to the second-order solutions. The substantive
interpretations followed from the nature of the item parcels.
For the three SAT-Verbal test editions, 17 parcels were constructed:
three sentence completions parcels; five antonyms parcels; four analogies
parcels; and five parcels for items based on reading passages. The first-order factor loading matrix is highly restricted with simple structure corresponding to item type. In other words, the three sentence completions
parcels load on a sentence completions factor only, the five antonyms
parcels load on the antonyms factor only, and so forth (see Figure 1 for a
more detailed summary of the parcels and simple structure). Thus, the
second-order factor model contains a second-order common verbal factor
and four independent second-order group factors corresponding to each of
the four verbal item types. To the extent that the first-order factor variance
explained by the second-order common factor is large, the data are
unidimensional. On the other hand, a sizeable second-order group factor
on a particular item type, say reading passage items, would indicate
that this item type is making the largest contribution to violations of
unidimensionality.
To summarize, both first-order factor analyses and second-order factor
analyses were employed. The first-order analyses focused on the number of
factors or "effective" dimensionality issue. The second-order analyses were
more confirmatory and focused on assessing hypothesized structures suggested by the item types and content areas measured by the tests. Fit of the
model to the data was the dominant concern in the first-order analyses.
Decomposition of first-order factor variance into second-order common
and group specific components was the main concern of the second-order
analyses.
LISREL V's Indices of Fit
LISREL V provides several indices of fit that are described by Joreskog and Sorbom (1981). When LISREL V provides maximum likelihood estimates of free parameters, it also provides the likelihood ratio χ² statistic with associated degrees of freedom and probability level. Ideally, this index should be helpful in assessing competing models for the data because, under certain conditions, the difference in χ² values is itself chi-square distributed with degrees of freedom equal to the difference in degrees of freedom associated with the two competing models. However, it is important to keep in mind that this difference in χ² values is asymptotically distributed as chi square only if one model is a special case of the other model and the larger model is true. This difference in χ² values indicates whether the parameters that are estimated in the more general model add anything to the fit of the model for the data. It should be noted that Joreskog and Sorbom also cite several other reasons why the χ² indices should be used with caution.
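A minimal sketch of the nested-model difference test just described (the helper function is ours; the p value comes from the chi-square survival function in scipy):

from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    # Difference in likelihood ratio chi-squares for nested models, referred
    # to a chi-square distribution with the difference in degrees of freedom.
    diff = chi2_restricted - chi2_general
    df_diff = df_restricted - df_general
    return diff, df_diff, chi2.sf(diff, df_diff)

# Edition V4, one- versus two-factor first-order solutions (see Table 1):
# a drop of 423 on 18 degrees of freedom.
print(chi_square_difference(605, 119, 182, 101))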
Another goodness of fit index provided by LISREL V is the root mean square residual,

RMSR = [2 Σ_{i=1}^{k} Σ_{j=1}^{i} (c_ij − ĉ_ij)² / k(k + 1)]^{1/2},   (4)

where k is the number of observed variables, and c_ij and ĉ_ij are elements of the observed and fitted covariance matrices. The RMSR is a useful descriptive index for comparing the fit of two different models for the data.
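A short sketch of the RMSR as reconstructed in Equation 4 (the function name is ours, and we assume the sum runs over the lower triangle of the matrices, diagonal included):

import numpy as np

def rmsr(observed, fitted):
    # Root mean square residual between observed and fitted covariance
    # (or correlation) matrices, following Equation 4; the sum runs over
    # the lower triangle, diagonal included.
    k = observed.shape[0]
    resid = np.tril(observed - fitted)
    return np.sqrt(2.0 * np.sum(resid ** 2) / (k * (k + 1)))

# Identical matrices give an RMSR of zero.
print(rmsr(np.eye(3), np.eye(3)))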
In addition to these indices of global fit, LISREL V provides individual
residuals in both raw and standardized forms. The standardized residuals
are taken from standard asymptotics based on normality; hence the standardized residual is assumed to be asymptotically a standard normal variable. Joreskog and Sorbom (1981) suggest that standardized residuals with
values greater than two in absolute value merit close examination. For an
effective summary of the fit of individual models, LISREL V presents
Q-plots of the normalized residuals against normal quantiles. The slope of the plotted points is indicative of model fit, and it is possible to evaluate model fit by visual inspection of the Q-plots. One can imagine a straight line passing through the plotted points and compare the slope of this line with a 45° line represented on the plots by small dots. Slopes close to one represent moderate fit, and slopes smaller than one represent poor fit. Perfect
fit is represented by points falling in a straight line perpendicular to the
abscissa.
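One way to produce such a Q-plot outside of LISREL is sketched below; the plotting orientation and the plotting-position formula for the normal quantiles are our choices and are not necessarily those used by LISREL V:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def q_plot(normalized_residuals):
    # Plot sorted normalized residuals against standard normal quantiles,
    # with a dotted 45-degree reference line for judging the slope.
    r = np.sort(np.asarray(normalized_residuals, dtype=float))
    n = r.size
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    fig, ax = plt.subplots()
    ax.scatter(q, r, s=12)
    lo, hi = min(q.min(), r.min()), max(q.max(), r.max())
    ax.plot([lo, hi], [lo, hi], linestyle=":", linewidth=1)
    ax.set_xlabel("Normal quantiles")
    ax.set_ylabel("Normalized residuals")
    return fig, ax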
Results
The factor analytic results are presented in the following fashion. For
each SAT-Verbal test edition, the number-of-factors question is assessed by
examining the fit of first-order factor solutions. Comparability of the hypothesized second-order factor structures is then examined across the three
test editions.
Number of Factors
Figure 2 contains examples of Q-plots of normalized residuals used to assist in assessing fit in this study; the plots are for SAT-Verbal Edition V4. There are four panels in this figure.

FIGURE 2. Normalized residuals plots and indices of fit for SAT-Verbal Edition V4. [Panels: one factor first-order solution; two factor first-order solution; one second-order common factor and four second-order group factors solution; two second-order common factors and four second-order group factors solution.]

The two left panels summarize the fit of a one factor first-order solution and a two factor first-order solution, respectively, while the two right panels summarize the fit of two second-order factor solutions: a solution with one second-order common factor and
four second-order group factors (one each for sentence completions, antonyms, analogies, and items based on reading passages), and a solution
with two independent second-order common factors and the same four
second-order group factors. The top left panel reveals that a single first-order factor solution does not fit the V4 item parcel correlation matrix. The
residuals plot reveals a sizeable number of large positive residuals, which is
indicative of underfactoring. In the bottom left panel it can be seen that
adding a second first-order factor results in a very noticeable improvement
in fit. The indices of fit, χ² and RMSR, for SAT-Verbal Edition V4 are
shown in the left-hand column of Table 1, which contains summaries of the
fit for all three SAT-Verbal editions. For Edition V4, the RMSR is halved
from .026 to .013, and the χ² exhibits a sizeable drop from 605 (df = 119)
to 182 (df = 101), an unquestionably significant improvement in fit.
The information contained in the top right panel of Figure 2 reveals that
a second-order solution with a restrictive factor pattern, one second-order
common factor and four second-order group factors, fits the V4 item parcel
correlations very well. Adding a second second-order common factor, orthogonal to the first (the bottom right panel in Figure 2), produces a slight but statistically significant improvement in fit, as seen in Table 1, dropping the χ² from 176 (df = 115) to 152 (df = 114).

TABLE 1
Summary of fit for three SAT-Verbal editions

Model                                Edition V4              Edition X2              Edition Y3
One factor first-order               RMSRa = .026            RMSR = .027             RMSR = .026
                                     (χ², df) = (605, 119)   (χ², df) = (681, 119)   (χ², df) = (653, 119)
Two factor first-order               RMSR = .013             RMSR = .017             RMSR = .017
                                     (χ², df) = (182, 101)   (χ², df) = (310, 101)   (χ², df) = (296, 101)
Three factor first-order             Not done                RMSR = .010             Not done
                                                             (χ², df) = (124, 82)
One second-order common factor       RMSR = .014             RMSR = .012             RMSR = .013
  and four second-order              (χ², df) = (176, 115)   (χ², df) = (145, 115)   (χ², df) = (169, 115)
  group factors
Two second-order common factors      RMSR = .013             RMSR = .012             RMSR = .013
  and four second-order              (χ², df) = (152, 114)   (χ², df) = (143, 114)   (χ², df) = (163, 114)
  group factors

a Root mean square residual.
The normalized residuals plots for SAT-Verbal Edition X2 were similar
to those for Edition V4, revealing that one factor was clearly inadequate
and that addition of the second first-order factor improved the fit noticeably. In fact, as seen in the middle column of Table 1, three first-order
factors were really needed to provide a tight fit to the data for Edition X2.
As seen in Table 1, taking a third first-order factor results in a χ² of 124
(df = 82) and RMSR of .010 for Edition X2.
The fit obtained for the first-order solutions and the fit obtained for the
second-order solutions for Edition X2 can be contrasted using data in the
middle column of Table 1. A restrictive confirmatory second-order solution
that is theory-based fits better than the less restrictive first-order factor
solutions. Indices in the middle column of Table 1 reveal that a model with one second-order common factor and four second-order group factors fits the X2 item
parcels correlation matrix very well and also indicate that adding a second
second-order common factor is unnecessary. Thus a model that requires
only one second-order common factor to account for correlations between
parcels composed of different item types fits the data very well. Recall that
for V4 the addition of a second second-order common factor improved the
fit slightly but significantly.
The right-hand column of Table 1 summarizes the fit results for SAT-Verbal Edition Y3. As was the case for Edition X2, at least two first-order
factors are needed to fit the Y3 item parcels correlations. As with Edition
X2, the second-order solution with one second-order common factor and
four second-order group factors provides a very good fit to the data.
Adding a second second-order common factor improves the fit very little.
Second-order Structures
For all three SAT-Verbal editions, the hypothesized second-order factor
solutions fit the data well. Table 2 contains a numerical summary of the
single second-order common factor solutions. Here the relative contributions of the single second-order common factor and each of the four second-order group factors to the first-order parcel factors are tabled. In addition,
Table 2 contains the correlations among the four first-order factors. One
aspect of the data presented in Table 2 is immediately obvious. For every
SAT-Verbal edition, the second-order common factor is large relative to
the second-order group factors. This fact can be observed in the first-order
factor correlations, all of which are .80 or higher, and in the variance
contributions portion of the table. For example, for Edition V4, the second-order common factor accounts for 98% of the sentence completions factor
variance, 85% of the antonyms factor variance, 93% of the analogies factor
variance, and 82% of the reading passage items factor variance.
TABLE 2
Relative contributions of one second-order common and four second-order group factors to variance of first-order parcel factors for three SAT-Verbal editions

                                        First-order factors
                                    Sentence            Antonyms   Analogies   Reading passage
Test edition                        completions (I)     (II)       (III)       items (IV)
V4   second-order common factor       .98                 .85        .93         .82
     second-order group factors       .02a                .15        .07         .18
X2   second-order common factor       .97                 .92        .82         .81
     second-order group factors       .03a                .08        .18         .19
Y3   second-order common factor       .96                 .88        .86         .84
     second-order group factors       .04a                .12        .14         .16

First-order factor correlations

Edition V4         I      II     III    IV
  I               1.0
  II               .92   1.0
  III              .96    .89   1.0
  IV               .90    .84    .88   1.0

Edition X2         I      II     III    IV
  I               1.0
  II               .94   1.0
  III              .89    .87   1.0
  IV               .89    .86    .81   1.0

Edition Y3         I      II     III    IV
  I               1.0
  II               .92   1.0
  III              .91    .87   1.0
  IV               .90    .86    .85   1.0

a Not significantly different from zero (p < .01).

Looking across test editions (down columns in the table), it can be seen
that the second-order common factor accounts for almost all of the sentence completions factor variance on all three test editions. In contrast, the
reading passage items factor has the largest second-order group factor on
all three editions. For Edition V4, the second-order common factor is more
closely related to the analogies factor than the antonyms factor; for Edition
X2, the opposite is true. For Edition Y3, the second-order common factor
is only slightly more related to the antonyms factor than it is to the analogies factor.
Table 1 included descriptions of the fit of a second-order solution that
allowed for a second second-order common factor. Table 3 summarizes
these solutions. It can be seen, from the information summarized in Table
3, that for Editions X2 and Y3, inclusion of a second second-order common
factor adds nothing to the solution. This fact can be observed in the minuscule contributions of this second second-order common factor (.00 or .01)
to first-order factor variance. Note also that for Editions X2 and Y3, the
correlations among first-order factors remained virtually unchanged when
the second second-order common factor was added (compare correlations
in Tables 2 and 3).
In contrast, addition of a second second-order common factor has an
impact on the solution for Edition V4. Note that the antonym second-order
group factor is reduced substantially, while the reading passage second-order group factor is reduced somewhat. This second second-order common factor makes a non-trivial contribution to the variance of the antonym and reading passage first-order factors. As the note to Table 3
indicates, this second second-order common factor has positive weights for
the vocabulary item types, antonyms, and analogies, and negative loadings
for the reading item types, sentence completions, and reading passage
items. Consequently, inclusion of the second second-order common factor
increases the correlations between the vocabulary item type factors, and
decreases their correlations with the reading item type factors.
Dropping Reading Passage Items Parcels
The results contained in Tables 1-3 suggest two conclusions. First, SAT-Verbal is not strictly unidimensional and most of the lack of unidimensionality can be attributed to the reading passage items. Second, the
content structure for Edition V4 differs from that for Editions X2 and Y3.
Edition V4 needs a second second-order common factor to explain the
correlations among the item parcels, a factor that Editions X2 and Y3 do
not require.
To evaluate the supposition that the reading passage items are the major
reason for lack of unidimensionality, factor analyses were conducted on
reduced item parcels correlation matrices obtained by excluding the five
reading passage items parcels from the matrices. These analyses for the
reduced matrices parallel those conducted for the full item parcels correlation matrices.
TABLE 3
Relative contributions of two second-order common and four second-order group factors to variance of first-order parcel factors for three SAT-Verbal editions

                                          First-order factors
                                      Sentence            Antonyms   Analogies   Reading passage
Test edition                          completions (I)     (II)       (III)       items (IV)
V4   second-order common factor 1       .96                 .91        .92         .84
     second-order common factor 2a      .00                 .06        .00         .06
     second-order group factors         .04                 .03        .08         .10
X2   second-order common factor 1       .97                 .92        .82         .81
     second-order common factor 2a      .01                 .00        .01         .01
     second-order group factors         .02                 .08        .17         .18
Y3   second-order common factor 1       .96                 .89        .86         .84
     second-order common factor 2a      .01                 .01        .01         .01
     second-order group factors         .03                 .10        .13         .15

First-order factor correlations

Edition V4         I      II     III    IV
  I               1.0
  II               .92   1.0
  III              .94    .92   1.0
  IV               .91    .81    .87   1.0

Edition X2         I      II     III    IV
  I               1.0
  II               .94   1.0
  III              .89    .87   1.0
  IV               .89    .86    .81   1.0

Edition Y3         I      II     III    IV
  I               1.0
  II               .92   1.0
  III              .91    .87   1.0
  IV               .90    .86    .85   1.0

a For all three test editions, first-order loadings on second-order common factor 2 were positive for analogies and antonyms and negative for sentence completion and reading passage item parcels. With the exception of antonyms and reading passage items on Edition V4, these loadings on the second second-order common factor were trivial.
The data presented in Figure 3 and Table 4 for Edition V4 parallel those presented in Figure 2 and Table 1. Dropping the reading passage items does
not result in a drop in the number of first-order factors needed to fit the
data. The single factor first-order solutions, however, are somewhat better
here than they were when the reading passage items parcels were included.
Hence, the reading passage items parcels, while a major contributor, are
not the sole reason for lack of unidimensionality. Table 5 provides more
evidence on this point. From the information presented in this table, it can
be seen that the analogies second-order group factors are sizeable for
Edition X2 and Y3.

FIGURE 3. Normalized residuals plots and indices of fit for SAT-Verbal Edition V4 (excluding reading passage items parcels). [Panels: two factor first-order solution; one second-order common factor and three second-order group factors solution.]

TABLE 4
Summary of fit for three SAT-Verbal editions (excluding reading passage items)

Model                                 Edition V4              Edition X2              Edition Y3
One factor first-order                RMSRa = .019            RMSR = .022             RMSR = .024
                                      (χ², df) = (149, 54)    (χ², df) = (306, 54)    (χ², df) = (246, 54)
Two factor first-order                RMSR = .013             RMSR = .012             RMSR = .014
                                      (χ², df) = (81, 41)     (χ², df) = (84, 41)     (χ², df) = (110, 41)
One second-order common factor        RMSR = .014             RMSR = .011             RMSR = .013
  and three second-order              (χ², df) = (83, 51)     (χ², df) = (71, 51)     (χ², df) = (85, 51)
  group factors

a Root mean square residual.

Recent research by Lawrence and Dorans (1987) found that a speededness factor contributed to departure from unidimensionality for items at the end of the SAT-Mathematical sections. For Editions X2 and
Y3, half of the 20 analogy items in each edition appear at the end of one of the two 30-minute verbal sections. Perhaps speededness is a contributing factor to the sizeable second-order analogy group factors obtained for
Editions X2 and Y3.
One also can see from Table 5 that the structure for Edition V4 still gives
evidence of being different from that of X2 and Y3. In fact, V4 appears to
be the most unidimensional of the three test editions when reading passages
are excluded from the analysis. This finding is consistent with the speededness hypothesis noted earlier since on Edition V4 the last items in each
verbal section were reading passage items. The structures for X2 and Y3,
on the other hand, appear quite parallel. Thus, removing the reading
passage items parcels results in data (the remaining item types) that are
more unidimensional and clarifies the structural differences between Edition V4 and Editions X2 and Y3, structural differences that may be related
to test speededness.
To summarize, the results of the factor analyses indicate that the SAT-Verbal editions can be considered to be slightly multidimensional, and to
exhibit some departures from edition-to-edition parallelism. Edition V4
appears to be more unidimensional than the other two editions when reading passage items are excluded, and, as was hypothesized, less parallel to
Editions X2 and Y3 than the latter two editions are to each other. Removing the item type for which the second-order group factor contributed
the most to parcel variance (reading passage items), although providing
data of a more unidimensional nature, did not result in what could be
considered a truly unidimensional set of items for any of the test editions.
TABLE 5
Relative contributions of one second-order common and three second-order group factors to variance of first-order parcel factors for three SAT-Verbal editions (excluding reading passage items)

                                        First-order factors
                                    Sentence            Antonyms   Analogies
Test edition                        completions (I)     (II)       (III)
V4   second-order common factor       .93                 .90        .94
     second-order group factors       .07                 .10        .06
X2   second-order common factor       .96                 .92        .82
     second-order group factors       .04a                .08        .18
Y3   second-order common factor       .93                 .90        .87
     second-order group factors       .07                 .10        .13

First-order factor correlations

Edition V4         I      II     III
  I               1.0
  II               .92   1.0
  III              .93    .92   1.0

Edition X2         I      II     III
  I               1.0
  II               .94   1.0
  III              .89    .87   1.0

Edition Y3         I      II     III
  I               1.0
  II               .92   1.0
  III              .90    .88   1.0

a Not significantly different from zero (p < .01).
Discussion
This research was conducted in an attempt to develop a better understanding of the dimensionality of the SAT-Verbal section. Previous examination of underlying factor structure has been hampered by the difficulties
associated with assessing dimensionality when using binary item data. In an
attempt to circumvent some of these difficulties, item parcels were constructed in this study. Construction of these parcels was guided by content
and item type considerations, and by a desire to produce correlations that
could be fit by linear factor models. The resultant correlation matrices were
subjected to a series of confirmatory factor analyses employing the
LISREL V model. The dimensionality analyses clearly verified that SAT-Verbal Edition V4 was less parallel to the other two editions, Editions X2
and Y3, than these other two editions were to each other. The interaction
of test speededness with format of the test may account for some of those
structural differences.
The methodology used in this study should be refined in regard to the
manner in which item parcels are formed. The parcels or item subsets used
in the study were formed using item types as defined by content specifications. Cattell and Burdsal (1975) suggest, however, that the results of a
factor analysis be used to define the item dimensions for forming subsets.
At the time that the data analysis activities were performed for this study,
the computer program TESTFACT (Wilson, Wood, & Gibbons, 1984),
which provides a marginal maximum likelihood full information factor
analysis of item level data, had not been written. A combination of methodologies, with full information factor analysis used to define the subsets
and then the methodology described in this report used to assess dimensionality, would be a natural extension of the current procedure. A combination of methodologies, rather than reliance on only full information
factor analysis, also makes sense given the costs involved in running TESTFACT and the additional fact that it is not a good procedure to use for long
tests with a fairly large number of hypothesized factors (Mislevy, 1986).
Given the strict adherence to item type composition observed for the
Scholastic Aptitude Test, the verbal and mathematical sections of this test
seem quite amenable to continued dimensionality analyses. These analyses
should uncover more general (and perhaps contrasting) trends in dimensionality and edition-to-edition parallelism. These results could then be
related to the quality of IRT true-score equating, currently in use with the
SAT, and some statements could be made concerning the robustness of
IRT equating to violations of the unidimensionality assumption. Eventually, this approach might yield diagnostics that could be used to arrive at
more informed psychometric decisions about test specifications, and about
the equating and scoring of the SAT.
References
Bejar, I. I. (1980). A procedure for investigating the unidimensionality of achievement tests based on item parameter estimates. Journal of Educational Measurement, 17, 283-296.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of
item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
Carroll, J. B. (1945). The effect of difficulty and chance success on correlations
between items or between tests. Psychometrika, 10, 1-19.
Carroll, J. B. (1961). The nature of the data, or how to choose a correlation
coefficient. Psychometrika, 26, 347-372.
Carroll, J. B. (1983). The difficulty of a test and its factor composition revisited. In
S. Messick & H. Wainer (Eds.), Principals of modern psychological measurement: A festschrift for Frederic M. Lord. Hillsdale, NJ: Erlbaum.
Cattell, R. B. (1956). Validation and intensification of the Sixteen Personality
Factor Questionnaire. Journal of Clinical Psychology, 12, 205-214.
Cattell, R. B. (1974). Radial parcel factoring versus item factoring in defining
personality structure in questionnaires: Theory and experimental checks. Australian Journal of Psychology, 26, 103-119.
Cattell, R. B., & Burdsal, C. A. (1975). The radial parcel double factoring design:
A solution to the item-vs-parcel controversy. Multivariate Behavioral Research,
10, 165-179.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
Divgi, D. R. (1981, April). Potential pitfalls in applications of item response theory.
Paper presented at the annual meeting of the National Council on Measurement
in Education, Los Angeles.
Divgi, D. R. (1986). Does the Rasch model really work for multiple choice items?
Not if you look closely. Journal of Educational Measurement, 23, 283-298.
Drasgow, F., & Dorans, N. J. (1982). Robustness of estimates of the squared
multiple correlation and squared cross-validity coefficient to violations of multivariate normality. Applied Psychological Measurement, 6, 185-200.
Drasgow, F., & Parsons, C. K. (1983). Application of unidimensional item response
theory models to multidimensional data. Applied Psychological Measurement, 7,
189-199.
Etezadi-Amoli, J., & McDonald, R. P. (1983). A second-generation nonlinear
factor analysis. Psychometrika, 48, 315-342.
Fischer, G. H. (1978). Probabilistic test models and their applications. German
Journal of Psychology, 2, 298-319.
Gustafsson, J-E. (1980). Testing and obtaining fit of data to the Rasch model.
British Journal of Mathematical and Statistical Psychology, 33, 205-233.
Hambleton, R. K. (1983). Applications of item response theory. Vancouver, BC:
Educational Research Institute of British Columbia.
Hambleton, R. K., & Rovinelli, R. J. (1986). Assessing the dimensionality of a set
of test items. Applied Psychological Measurement, 10, 287-302.
Hattie, J. A. (1981). Decision criteria for determining unidimensionality. Unpublished doctoral dissertation, University of Toronto.
Hattie, J. A. (1984). An empirical study of various indices for determining unidimensionality. Multivariate Behavioral Research, 19, 49-78.
Hattie, J. A. (1985). Methodology review: Assessing unidimensionality of tests and
items. Applied Psychological Measurement, 9, 139-164.
Hecht, L. W., & Swineford, F. (1981). Item analysis at Educational Testing Service.
Princeton, NJ: Educational Testing Service.
Hulin, C. L., Drasgow, F., & Parsons, C.K. (1983). Item response theory: Application to psychological measurement. Homewood, IL: Dow Jones-Irwin.
Joreskog, K. G., & Sorbom, D. (1981). LISREL V-Analysis of linear structural
relationships by the method of maximum likelihood. Chicago, IL: International
Educational Services.
Lawrence, I. M., & Dorans, N.J. (1987, April). An assessment of the dimensionality
of SAT-Mathematical. Paper presented at the annual meeting of the National
Council on Measurement in Education, Washington, DC.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores.
Reading, MA: Addison Wesley.
McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of
Mathematical and Statistical Psychology, 34, 100-117.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory.
Applied Psychological Measurement, 6, 379-396.
McDonald, R. P., & Ahlawat, K. S. (1974). Difficulty factors in binary data. British
Journal of Mathematical and Statistical Psychology, 27, 82-99.
Mislevy, R. J. (1986). Recent developments in the factor analysis of categorical
variables. Journal of Educational Statistics, 11, 3-31.
Mislevy, R. J., & Bock, R. D. (1983). BILOG: Item analysis and test scoring for
binary logistic models. Chicago, IL: International Educational Services.
Muthen, B. (1978). Contributions to factor analysis of dichotomous variables.
Psychometrika, 43, 551-560.
Muthen, B. (1984). A general structural equation model with dichotomous, ordered
categorical, and continuous latent variable indicators. Psychometrika, 49,
115-132.
Rosenbaum, P. R. (1984). Testing the conditional independence and monotonicity
assumptions of item response theory. Psychometrika, 49, 425-435.
Schmid, J., & Leiman, J. (1957). The development of hierarchical factor solutions.
Psychometrika, 22, 53-61.
Swinton, S. S., & Powers, D. E. (1980). A factor analytic study of the restructured
GRE Aptitude Test (GRE Board Professional Report GREB No. 77-6P). Princeton, NJ: Educational Testing Service.
Tucker, L. R. (1983). Searching for structure in binary data. In S. Messick & H.
Wainer (Eds.), Principals of modern psychological measurement: A festschrift for
Frederic M. Lord. Hillsdale, NJ: Erlbaum.
Van den Wollenberg, A. L. (1982a). Two new test statistics for the Rasch model.
Psychometrika, 47, 123-140.
Van den Wollenberg, A. L. (1982b). A simple and effective method to test the
dimensionality axiom of the Rasch model. Applied Psychological Measurement,
6, 83-91.
Wilson, D., Wood, R. L., & Gibbons, R. (1984). TESTFACT: Test scoring and item
factor analysis. Chicago, IL: Scientific Software.
Wingersky, M. S. (1983). LOGIST: A program for computing maximum likelihood
procedures for logistic test models. In R. K. Hambleton (Ed.), Applications of
item response theory. Vancouver, B.C.: Educational Research Institute of British
Columbia.
Wingersky, M. S., Barton, M. A., & Lord, F. M. (1982). LOGIST V user's guide.
Princeton, NJ: Educational Testing Service.
Authors
LINDA L. COOK, Principal Measurement Specialist, Educational Testing Service,
Princeton, NJ 08541. Specializations: item response theory, educational measurement.
NEIL J. DORANS, Senior Measurement Statistician, Educational Testing Service,
Princeton, NJ 08541. Specializations: quantitative psychology, educational measurement.
DANIEL R. EIGNOR, Principal Measurement Specialist, Educational Testing
Service, Princeton, NJ 08541. Specializations: item response theory, educational
measurement.