Journal of Educational Statistics
Spring 1988, Vol. 13, No. 1, pp. 19-43

An Assessment of the Dimensionality of Three SAT-Verbal Test Editions

Linda L. Cook
Neil J. Dorans
Daniel R. Eignor
Educational Testing Service

This study was supported by Educational Testing Service through Program Research Planning Council funding. The opinions expressed herein are those of the authors, however, and should not be interpreted as policy statements by either ETS or the College Board. Support for the payment of voluntary page charges was provided by the College Board Programs Division of ETS. The authors' names appear in alphabetical order.

Downloaded from http://jebs.aera.net at PENNSYLVANIA STATE UNIV on May 12, 2016

Key words: factor analysis, binary data, item parcelling

A strong assumption made by most commonly used item response theory (IRT) models is that the data are unidimensional, that is, that statistical dependence among item scores can be explained by a single ability dimension. First-order and second-order factor analyses were conducted on correlation matrices among item parcels of SAT-Verbal items. The item parcels were constructed to yield correlation matrices that were amenable to linear factor analyses. The first-order analyses were employed to assess the effective dimensionality of the item parcel data. Second-order analyses were employed to test meaningful hypotheses about the structure of the data. Parcels were constructed for three SAT-Verbal editions. The dimensionality analyses revealed that one SAT-Verbal test edition was less parallel to the other two editions than these other editions were to each other. Refinements in the dimensionality methodology and a more systematic dimensionality assessment are logical extensions of the present research.

In recent years there has been considerable research and interest devoted to the use of item response theory (IRT) in the solution of a variety of measurement problems (see Hambleton, 1983; Lord, 1980). Because of the special properties of test data characterized by IRT models, users are often able to solve problems not amenable to solution through traditional psychometric methods. However, in order for IRT to be useful in the solution of measurement problems, certain fairly strong assumptions about the data must be met. One of the most important of these is the assumption of unidimensionality. Most IRT models currently used with binary scored item response data assume that the probability of a correct response to an item can be modeled by a mathematical function of a single ability dimension common to all items.

For reasons developed later in this section, researchers working on IRT applications that involve binary scored item response data typically have assumed, without empirical verification, that items which appear to test a skill or content area are unidimensional (Divgi, 1981). This assumption is almost surely inappropriate for many types of test data (Drasgow & Parsons, 1983). The lack of empirical verification of the unidimensionality assumption has generally been caused by the difficulties involved in assessing the dimensionality of binary scored item response data.

A variety of methods have been advanced over the past half century for assessing the unidimensionality assumption for binary scored item response data. Hattie (1981, 1985) has provided a comprehensive review of these methods. Two categories of procedures currently seem to be applied to the problem. In very general terms, they are procedures that use item parameter estimates or residuals from fitting a particular IRT model, and procedures that apply factor analysis, or indices based on a factor analysis, before a specific IRT model is fit.
(See Rosenbaum, 1984, however, for an approach that falls into neither category.) The IRT-only procedures focus on testing whether the assumption of unidimensionality holds, whereas the factor analytic procedures allow for multiple dimensions.

IRT-Only Approaches

If the one-parameter logistic model and conditional maximum likelihood estimation techniques are used, a number of statistical tests of the unidimensionality assumption follow directly from the estimation of item parameters over different groups of people or subsets of items (see Gustafsson, 1980; van den Wollenberg, 1982a, 1982b). If the one-parameter or two-parameter normal ogive model is used with marginal maximum likelihood estimation procedures (Bock & Lieberman, 1970), a residuals-based test of the unidimensionality assumption exists. McDonald (1982), while presenting IRT models that use marginal maximum likelihood estimation procedures as special cases of the random regressors factor analytic model, has suggested that the residual item covariances remaining after fitting a unidimensional model be studied for indications of departures from unidimensionality. Hattie (1981, 1984), in a large-scale study of indices of unidimensionality, examined McDonald's suggested procedure along with a number of other proposed measures and found that McDonald's suggestion provided the best results.
Because the one-parameter or Rasch model is in many cases not applicable to the analysis of binary scored multiple-choice item response data (Divgi, 1986; Fischer, 1978), and because researchers often object to the assumption of normally distributed abilities usually made in the random regressors factor analytic model (McDonald, 1982; but see Bock & Aitkin, 1981, for a recently developed approach that does not depend on the assumption of normally distributed abilities), many researchers work instead with the three-parameter logistic model and unconditional maximum likelihood estimation procedures, as used, for instance, in the computer program LOGIST (Wingersky, 1983; Wingersky, Barton, & Lord, 1982). For this model and estimation procedure, direct statistical and data-based tests of the unidimensionality assumption that follow from the parameter estimation process do not presently exist, although data-based tests could easily be developed (McDonald, 1982). Bejar (1980) has developed a procedure, comparable to one of the procedures suggested for the Rasch model by Gustafsson (1980), that is applicable with the more complex three-parameter logistic model in the unconditional maximum likelihood estimation context. The procedure requires both a priori knowledge about the test items, so that a clearly unidimensional subset of the total set of items can be formed, and the subsequent availability of item parameter estimates for both the total set of items and the subset deemed unidimensional.
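For reference, the three-parameter logistic item characteristic function mentioned above can be sketched as follows. The parameter names a (discrimination), b (difficulty), and c (lower asymptote) and the scaling constant D = 1.7, conventionally used so that the logistic curve approximates the normal ogive, are standard notation supplied here rather than taken from the paper; the function name `p_correct_3pl` is hypothetical.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability on the single latent dimension (unidimensionality)
    a: item discrimination; b: item difficulty; c: lower asymptote ("guessing")
    D: scaling constant; 1.7 makes the logistic close to the normal ogive
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# At theta == b the probability lies halfway between c and 1.
print(round(p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2), 3))  # 0.6
# Very low ability: the probability approaches the guessing floor c.
print(round(p_correct_3pl(theta=-10.0, a=1.0, b=0.0, c=0.2), 3))  # 0.2
```

Because every item's response probability depends on the same single theta, the model embodies the unidimensionality assumption whose verification is at issue here.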
Because item parameter estimates for the unidimensional subset required by the Bejar procedure are usually unavailable at the time researchers wish to assess the unidimensionality assumption, many researchers working with binary scored item response data have chosen instead to forgo tests based on parameter estimates or residuals and to apply factor analytic procedures to the individual item data, usually working with phi or, when possible, tetrachoric correlation coefficients. In addition, Hambleton and Rovinelli (1986) have provided some examples of situations in which Bejar's procedure was unable to detect multidimensionality in the data.

Factor Model Approaches

A complete review of the literature documenting the theoretical and practical problems involved in the linear factor analysis of phi and tetrachoric correlation coefficients is beyond the scope of this paper. What follows is a brief review of the major issues. Carroll (1945, 1961, 1983) has documented the problems inherent in the factor analysis of phi coefficients; these correlations depend not only on the strength of the relationship between the variables being correlated, but also on the means of the variables. Further, factor analysis of phi coefficients produced by the same underlying structure, but dichotomized at different points, can conform to factor models with different structures and possibly different numbers of factors (Mislevy, 1986). The existence of additional artifactual or "difficulty" factors when phi coefficients are factor analyzed via a linear model is a well-discussed phenomenon in the literature. McDonald and Ahlawat (1974) provided a review of previous work on the issue of "difficulty" factors and offered an alternative explanation for the existence of these factors.
According to them, artifactual factors are not the result of the characteristics of the particular items being analyzed; rather, they result from the fact that a nonlinear model, instead of the linear model implied in linear factor analysis, is needed to characterize the regression of item score on ability.

Factor analysis of tetrachoric correlation coefficients theoretically should circumvent the problem of "difficulty" factors. However, other problems can occur when tetrachorics are factor analyzed. Carroll (1945, 1961, 1983) has documented the problems involved in the factor analysis of tetrachoric correlation coefficients based on binary scored multiple-choice items where guessing is possible. In this context, failure to take guessing effects into account will again produce artifactual factors and misleading information as to the number of factors needed to account for the data. Carroll (1945) has given formulas to correct tetrachoric correlation coefficients for the effects of guessing and, hence, to eliminate artifactual "guessing" factors. Mislevy (1986), in discussing these formulas further, points out that after Carroll's (1945) correction is applied, adjustments to elements of the matrix often still need to be made (see also Hulin, Drasgow, & Parsons, 1983), and that even when this is done, the sample tetrachoric correlation matrix is not necessarily positive definite (see Lord & Novick, 1968, p. 349), which rules out many of the more desirable factor analytic estimation procedures.
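The dependence of phi coefficients on item difficulty can be illustrated with a small simulation. This is a sketch under assumptions of our own, not the authors' analysis: a normally distributed latent trait, a common loading of 0.7 for every item (so every pair of underlying continuous responses correlates 0.49), no guessing, and hypothetical thresholds. The function name `simulate_binary_items` is hypothetical. All items are generated from one factor; only their dichotomization points differ.

```python
import numpy as np


def simulate_binary_items(n_examinees, thresholds, loading, seed=0):
    """Dichotomize continuous responses that share ONE latent factor.

    Each continuous response is loading * theta + sqrt(1 - loading**2) * error,
    so the data are strictly unidimensional; only the cut points differ.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_examinees)                  # single latent trait
    errors = rng.standard_normal((n_examinees, len(thresholds)))
    latent = loading * theta[:, None] + np.sqrt(1.0 - loading**2) * errors
    # Score 1 (correct) when the latent response exceeds the item's threshold
    return (latent > np.asarray(thresholds)).astype(float)


# Hypothetical items: two of middle difficulty, one easy (cut -1.5), one hard (cut 1.5)
items = simulate_binary_items(100_000, thresholds=[0.0, 0.0, -1.5, 1.5], loading=0.7)
phi = np.corrcoef(items, rowvar=False)  # product-moment r of 0/1 scores is phi

print("phi, middle vs. middle:", round(phi[0, 1], 3))
print("phi, easy vs. hard:    ", round(phi[2, 3], 3))
```

The middle-difficulty pair yields a phi several times larger than the easy/hard pair, even though both pairs have the same underlying correlation. A linear factor analysis of such a matrix can therefore pick up a dimension tied to difficulty rather than content.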
Given the practical problems involved in the linear factor analysis of binary scored item response data using tetrachorics (i.e., the matrices are often not positive definite) and the assumptions that must be met for the procedure to be viable (no guessing or correctable guessing, and normally distributed traits), three other approaches that provide viable options for assessing dimensionality at the item level have been developed. These procedures either avoid the use of tetrachorics with the linear factor analytic model or avoid the use of the linear model altogether. A variation of the third approach to be discussed was used in the research described in this paper.

Nonlinear factor models. Hambleton and Rovinelli (1986) have provided one possible approach to the problem, based on McDonald's (1981) suggested procedure for studying dimensionality with the random regressors factor analytic model. This approach involves examining the residual covariances between items after fitting successively more complex nonlinear factor models. A special nonlinear polynomial factor analytic model developed by McDonald (1982; Etezadi-Amoli & McDonald, 1983) was used by Hambleton and Rovinelli in their study. No assumptions about the distribution of latent traits had to be made with this procedure. Phi coefficients were used in the analyses.

IRT-based factor analyses. The second procedure, or set of procedures, involves the use of recent advancements with the random regressors factor analytic model that circumvent the problems involved in factor analyzing sample tetrachoric matrices. Tucker (1983) refers to these procedures as item characteristic function approaches. They provide a blending of factor analytic and item response theory techniques.
That is, factor analysis no longer needs to be thought of as an auxiliary procedure to be applied as a check on dimensionality before a unidimensional IRT model can be used. When these factor analytic procedures are used and a unidimensional model is indicated, the item parameters specifying the relationship between item score and the underlying trait are estimated. These procedures involve a generalized least squares approach attributable to Christoffersson (1975) and marginal maximum likelihood full information factor analysis based on the work of Bock and Aitkin (1981). Mislevy (1986) has provided an excellent review of these approaches, along with the closely related procedure attributable to Muthén (1978, 1984). These approaches are based on the usual assumption that the underlying traits are multinormally distributed, although Mislevy suggests that the Bock and Aitkin approach for multiple traits, currently operationalized in the program TESTFACT (Wilson, Wood, & Gibbons, 1984), can be generalized so that empirically derived distributions, rather than the normal, can be used to characterize the traits. This has already been done for the unidimensional case in the program BILOG (Mislevy & Bock, 1983). Also, Tucker has been working on an alternate procedure that circumvents multinormal distributional assumptions on the latent traits.

Linearization. The final procedure is based on the work of Cattell (1956, 1974) and involves the linear factor analysis of item parcel data, or minitests, made up of small collections of nonoverlapping items thought to measure the same underlying dimension or dimensions. Again, the assumption of multinormality of the underlying traits is made. Data on individual items are no longer used directly in deriving the correlation matrix.
Some practical justification for aggregating the data into minitests and using linear factor analysis appears in the summary section of McDonald's 1981 article:

(1) In principle, a set of n tests or n binary items is unidimensional if and only if the set fits a (generally nonlinear) common factor model with just one common factor. (2) In checking the unidimensionality of a set of tests, a simple, appropriate, ancillary assumption is that the regressions of the tests on the factors are linear. (p. 113)

If item parcel data are to be used in a linear factor analytic study, a serious concern is the method chosen for defining subsets of the total set of items and then placing items into parcels within each subset. Cattell and Burdsal (1975) recommend doing two factor analyses: one on the items, to define the item dimensions used to form the subsets within which the parcels will be formed, and then one on the parcels, to assess dimensionality. Because the first of these factor analyses involves all the problems inherent in the factor analysis of item data (unless one of the more recently developed random regressors factor analytic approaches can reasonably be applied), a non-factor-analytic procedure for forming item subsets, such as using item types as defined by well-developed test specifications, would appear to provide a suitable approach. Another concern when using item parcel data in factor analytic studies is the possibility of propagating artifactual factors at the item parcel level (see Swinton & Powers, 1980).
The use of item parcel data instead of individual item data tends to "linearize" the basic nonlinear relationship between score and underlying trait that exists at the item level, thereby providing some justification for the linear factor analysis of a matrix of product-moment correlations without concern for artifactual factors due to nonlinearity (McDonald & Ahlawat, 1974). If, however, the parcels are of widely differing difficulty, artifactual factors may still result. Such factors would inhibit a reasonable assessment of the dimensionality of the data.

Purpose

The purpose of this study was to examine the dimensionality of the verbal section of the Scholastic Aptitude Test (SAT-Verbal). An item parcelling approach was used in conjunction with contemporary factor analytic techniques. More specifically, three SAT-Verbal editions were chosen for dimensionality assessment. Because of differing numbers of items and underlying content specifications, it was hypothesized that one of the editions chosen might have an underlying factor structure that differed from those of the other two. Linear factor analysis of item parcel data was used for dimensionality assessment: for each test edition, a series of confirmatory factor analyses was performed using the LISREL V computer program (Jöreskog & Sörbom, 1981). The results of the factor analyses were then related to the manifest differences between the editions under study.

Methodology

The verbal section of the Scholastic Aptitude Test (SAT) was selected for this dimensionality study. This multiple-choice test has been described as measuring developed verbal reasoning abilities that are related to successful performance in college. It is intended to supplement the secondary school record and other information about the student in assessing readiness for college-level work. Test specifications for the SAT-Verbal section have not remained constant over the years.
Test booklets containing SAT editions administered prior to the fall of 1974 consisted of two 45-minute sections (one SAT-Verbal and one SAT-Mathematical) and three 30-minute sections (one SAT-Verbal, one SAT-Mathematical, and one experimental section containing an anchor test or pretest). The two SAT-Verbal sections contained a total of 90 five-choice items: 53 reading comprehension items (18 sentence completions and 7 reading passages, each followed by 5 items based on the passage) and 37 vocabulary items (18 antonym items and 19 analogy items). One of the SAT-Verbal editions used in this study was developed to these pre-1974 specifications. Test booklets containing SAT editions administered since the fall of 1974 (which include the other two SAT-Verbal editions used in this study) consist of six 30-minute sections: two SAT-Verbal sections, two SAT-Mathematical sections, one Test of Standard Written English, and one experimental section. The two SAT-Verbal sections contain a total of 85 five-choice items: 40 reading comprehension items (15 sentence completions and 5 reading passages, each followed by 5 items based on the passage) and 45 vocabulary items (25 antonym items and 20 analogy items). Regardless of specifications, raw scores on SAT-Verbal are corrected for guessing. Raw scores are computed by the formula R - W/k, where R is the number of correct responses, W is the number of incorrect responses, and k + 1 equals the number of answer choices per item; for these five-choice items, the raw score is therefore R - W/4.

Choice of Test Editions for Analysis

Three SAT-Verbal test editions were chosen for the factor analyses performed for this study.
An attempt was made to locate two test editions that could be considered somewhat nonparallel and then to select a third edition that could be considered reasonably parallel to one of the editions previously chosen. Edition V4 was chosen to be the nonparallel edition; it contained five more items than the other two SAT-Verbal editions (90 in total) and was built to different content specifications. The other two editions chosen have edition designations X2 and Y3. Both of these editions contained the same number of items (85 in total), were built to the same content specifications, and were fairly similar in both reliability and overall difficulty level.

Formation of Item Parcels

The parcelling approach used in this paper attempts to circumvent the problems associated with factoring item data by factoring item parcel scores, that is, sums of scores on small subsets of items, which are more amenable to analysis by a linear factor model than item data are.

Parcel construction principles. As noted earlier, it is well documented (e.g., Carroll, 1945, 1983) that a linear factor analysis of a matrix of phi coefficients from binary item data with an underlying unidimensional structure will indicate multidimensionality, with a second dimension clearly related to item difficulty. As McDonald and Ahlawat (1974) argued, part of the problem is that a linear regression model is inappropriate for the item/factor regression, which has to be nonlinear given the bounded nature of dichotomous data and the unbounded metric assumed for the underlying factor. For the parcel approach to avoid the problems of factoring phi coefficients, the parcels must be constructed in a fashion that is sensitive to these problems.
The major reason for constructing parcel scores is to obtain a matrix of correlations or covariances that is not affected by item difficulty and the nonlinearity of the item/factor regression. Parcel construction should attempt to "linearize" the data by removing the effects of nonlinearity and differences in item difficulty. To mitigate these effects, parcel scores should have approximately equal means and variances; in the terminology of classical test theory, the parcels should be constructed to be parallel to each other. To achieve parallel parcels, it is essential to place approximately equal numbers of easy, middle difficulty, and hard items within each parcel, so that each parallel parcel is composed of several nonparallel items.

A critical question is how many items are needed for a parcel. Experience (Drasgow & Dorans, 1982) indicates that at least three are needed and that six or seven are clearly enough, provided that the items within a parcel are adequately spaced in difficulty so that the parcel score distribution is approximately normal. A statistical justification for the parallel parcelling approach might be drawn from the work of Drasgow and Dorans (1982), who introduced the notion of a categorization attenuation factor that reduces the correlation between two continuous variables when one variable is polychotomized. Parallel parcelling can be viewed as a heuristic approach to converting dichotomous data into polychotomous data with an eye toward minimizing the size of the categorization attenuation factor. Ideally, an algorithm should be worked out for parallel parcelling. Such an algorithm would be designed to achieve a small, constant categorization attenuation factor across parcels.
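The following is a minimal sketch of the kind of algorithm called for above. It is not the authors' procedure, since their parcels were built by an iterative, judgment-based process. It assumes items are characterized only by their delta indices and uses a simple heuristic of our choosing, a serpentine ("snake draft") assignment: sort items from easiest to hardest and deal them out to parcels in alternating directions, so that each parcel receives easy, middle, and hard items. The delta conversion, 13 + 4z with z = Φ⁻¹(1 - p), follows the usual ETS definition (stated here as an assumption; see Hecht & Swineford, 1981). The function names and proportions correct are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist, mean


def delta_index(p_correct: float) -> float:
    """ETS-style delta: 13 + 4z with z = Phi^{-1}(1 - p); harder items get larger deltas."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p_correct)


def serpentine_parcels(deltas: dict[str, float], n_parcels: int) -> list[list[str]]:
    """Deal items, sorted easiest to hardest, into parcels in snake order.

    Hypothetical heuristic: parcel order runs 1..n, then n..1, then 1..n, and so on,
    so every parcel receives easy, middle, and hard items and parcel mean
    difficulties come out roughly equal. It does not explicitly control parcel
    score variances or the categorization attenuation factor.
    """
    ordered = sorted(deltas, key=deltas.get)          # easiest (smallest delta) first
    parcels: list[list[str]] = [[] for _ in range(n_parcels)]
    for i, item in enumerate(ordered):
        sweep, pos = divmod(i, n_parcels)             # which pass, position within pass
        idx = pos if sweep % 2 == 0 else n_parcels - 1 - pos
        parcels[idx].append(item)
    return parcels


# Hypothetical proportions correct for 15 sentence-completion items (not SAT data)
p_values = [0.92, 0.88, 0.85, 0.80, 0.76, 0.71, 0.66, 0.60,
            0.55, 0.49, 0.43, 0.37, 0.30, 0.24, 0.18]
deltas = {f"item{j + 1}": delta_index(p) for j, p in enumerate(p_values)}
parcels = serpentine_parcels(deltas, n_parcels=3)
for k, parcel in enumerate(parcels, start=1):
    print(f"parcel {k}: {parcel}, mean delta {mean(deltas[i] for i in parcel):.2f}")
```

With 15 items and 3 parcels, each parcel receives 5 items with closely matched mean deltas. A fuller algorithm would also iterate on the variances of the parcel scores, which this sketch leaves unbalanced except by accident.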
In this paper, an iterative process that requires expert judgment and that focuses only on the first two moments of the parcel score distribution was used. If the parcel approach is to be used extensively, an algorithm would appear essential.

Construction of specific item parcels. Items from each SAT-Verbal edition were separated into item subsets on a within-edition basis using the four item types contained in the test: sentence completion items, antonym items, analogy items, and items based on reading passages. Within each of the four item subsets, items were placed into parcels of four to seven items such that the mean and the standard deviation of the item difficulties were approximately the same across parcels. Parcels of comparable difficulty were built by assigning items to parcels on the basis of their equated delta difficulty indices. (See Hecht & Swineford, 1981, for an explanation of delta difficulty indices and the process of delta equating.) Within each of the four subsets of items, the same number of parcels was formed for each of the three editions. Figure 1 contains the number of items within each of the four item subsets of SAT-Verbal for each of the three editions and the number of parcels within each of the subsets.

Scores for examinees on the item parcels were formed, and correlations were then computed between parcels both within and across subtests for each edition. The correlations among the parcels were used as input to the LISREL V program.

LISREL V: First-order and Second-order Models

The LISREL V computer program (Jöreskog & Sörbom, 1981) fits and tests models for linear structural relationships among quantitative variables. As mentioned earlier, the primary reason for developing item parcels was to yield variance-covariance matrices that were amenable to a linear factor analysis.
Both first-order factor analysis and second-order factor analysis are special cases of the LISREL V model. First-order factor analyses were employed in this study to assess the "effective" dimensionality of the item parcels, that is, the number of factors needed to adequately describe the covariation among item parcels. Second-order factor analyses were employed to test meaningful hypotheses about the structure of the data.

FIGURE 1. Factor loading matrices and parcel descriptions for SAT-Verbal editions

Item type                 Parcels   Number of items,   Number of items,
                                    Edition V4         Editions X2 and Y3
Sentence completions      1-3       18                 15
Antonyms                  4-8       18                 25
Analogies                 9-12      19                 20
Reading passage items     13-17     35                 25
Totals                              90                 85

Factor loading matrix (17 parcels by 4 first-order factors): each parcel has a free loading (X) on the factor for its own item type and fixed zero loadings (0) on the other three factors.

Common factor model. The traditional first-order common factor model is

$$\mathbf{y} = \Lambda \mathbf{x} + D\mathbf{u}, \qquad (1)$$

where y is an n-by-1 vector of observable scores on the n item parcels; x is a k-by-1 vector of non-observable scores on the k common factors that account for covariation among the n parcels; Λ is an n-by-k matrix of common factor loadings describing the regressions of the n parcel scores on the k factor scores; u is an n-by-1 vector of unobservable unique scores, which could be further decomposed into measurement error and scores on specific factors; and D is an n-by-n diagonal matrix of uniqueness loadings. The n-by-n covariance matrix among the item parcels can be expressed as

$$C_{yy} = \Lambda C_{xx} \Lambda' + D^{2}, \qquad (2)$$

where C_xx is the k-by-k matrix of factor covariances, and D² is an n-by-n diagonal matrix of unique variances. One goal of a factor analysis is to identify the number of common factors needed to fit the off-diagonal elements of C_yy. This is known as the number-of-factors problem.
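To make equations (1) and (2) and the number-of-factors problem concrete, here is a sketch using a hypothetical population of six parcels generated by two correlated item-type factors. It compares one-factor and two-factor solutions by the off-diagonal residuals they leave. For simplicity it uses iterated principal-axis factoring with orthogonal factors (C_xx = I). That is a swapped-in technique: LISREL V, as used in this study, fits the models by maximum likelihood. All function names and numbers are hypothetical.

```python
import numpy as np


def implied_covariance(loadings, factor_cov, unique_var):
    """Equation (2): C_yy = Lambda C_xx Lambda' + D^2."""
    lam = np.asarray(loadings, dtype=float)            # n x k factor loadings
    cxx = np.asarray(factor_cov, dtype=float)          # k x k factor covariances
    return lam @ cxx @ lam.T + np.diag(unique_var)     # D^2: unique variances on diagonal


def principal_axis(corr, n_factors, n_iter=200):
    """Iterated principal-axis loadings for orthogonal factors (C_xx = I).

    Illustrative stand-in only; LISREL V estimates by maximum likelihood.
    """
    r = np.array(corr, dtype=float)                    # working copy
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(r))         # start: squared multiple correlations
    for _ in range(n_iter):
        np.fill_diagonal(r, h2)                        # reduced matrix: communalities on diagonal
        vals, vecs = np.linalg.eigh(r)
        top = np.argsort(vals)[::-1][:n_factors]       # indices of the largest eigenvalues
        lam = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2 = (lam ** 2).sum(axis=1)                    # updated communalities
    return lam


def max_offdiag_residual(corr, lam):
    """Largest absolute off-diagonal residual left by orthogonal loadings lam."""
    fitted = implied_covariance(lam, np.eye(lam.shape[1]), 1.0 - (lam ** 2).sum(axis=1))
    resid = np.asarray(corr) - fitted
    return float(np.abs(resid[~np.eye(resid.shape[0], dtype=bool)]).max())


# Hypothetical population: six parcels, two item-type factors correlated 0.5
lam_true = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
corr = implied_covariance(lam_true, [[1.0, 0.5], [0.5, 1.0]], [0.36] * 6)
for k in (1, 2):
    print(f"{k} factor(s): max |off-diagonal residual| = "
          f"{max_offdiag_residual(corr, principal_axis(corr, k)):.3f}")
```

The one-factor solution leaves sizeable off-diagonal residuals, and they follow the item-type blocks (too small within a type, too large across types). The two-factor solution reproduces the off-diagonal elements essentially exactly, so here the number-of-factors problem is answered by two.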
LISREL V was used to assess the number-of-factors problem in the following fashion. For each test edition studied, the fit of a single-common-factor model to the correlation matrix among item parcels was examined first. (Correlation matrices were used to simplify proportion-of-variance interpretations and to reduce the impact of variable-length parcels on the multifactor solutions.) Next, the fit of a very general two-common-factor model to the same data was examined. The two-common-factor models were essentially unconstrained in that no restrictions were imposed on the factor loading matrix Λ. Consequently, the two-factor solutions were not readily interpretable in a substantive fashion. They did, however, permit assessment of the number-of-factors question.

To achieve interpretable results, a second-order factor model was used in a more traditional confirmatory application of the LISREL approach. A second-order factor analysis can be thought of as a factor analysis of the first-order factors. (See Schmid & Leiman, 1957, for a discussion of one approach to hierarchical or second-order factor analysis.) It is a particularly fruitful approach when one suspects that correlations among the first-order factors can be explained by a single second-order common factor. Such a model is particularly applicable to item data suspected of being essentially unidimensional. Drasgow and Parsons (1983) suggested a second-order factor model that influenced the choice of the model and approach used in this study.

Second-order model.
The second-order factor model fitted to the first-order common factors, x, is

$$\mathbf{x} = \mathbf{b} z + F \mathbf{v}, \qquad (3)$$

where z represents a score on the second-order general factor; b is the k-by-1 vector of loadings of the k first-order factors on z; F is a k-by-k diagonal matrix of loadings of the k first-order factors on their corresponding group factors; and v is a k-by-1 vector containing the k group factor scores. This second-order factor model decomposes each first-order factor into a second-order common factor that influences all first-order factors, and a second-order group factor which influences performance only on that first-order factor. If the contribution of the second-order common factor to every first-order factor is large, the correlations among the first-order factors will be close to unity. If the second-order group factor for a particular first-order factor is relatively large, then the correlations of that first-order factor with other first-order factors will be among the lowest in the first-order factor correlation matrix.

As with the first-order factor analyses, the fit of the second-order factor models to the data was assessed. More importantly, substantive interpretations were attached to the second-order solutions. The substantive interpretations followed from the nature of the item parcels. For the three SAT-Verbal test editions, 17 parcels were constructed: three sentence completions parcels; five antonyms parcels; four analogies parcels; and five parcels for items based on reading passages. The first-order factor loading matrix is highly restricted, with simple structure corresponding to item type. In other words, the three sentence completions parcels load on a sentence completions factor only, the five antonyms parcels load on the antonyms factor only, and so forth (see Figure 1 for a more detailed summary of the parcels and simple structure).
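The decomposition implied by equation (3) can be sketched numerically. We assume, as is usual but not spelled out above, that z and the group factors v are standardized and mutually uncorrelated and that the first-order factors have unit variance. Under those assumptions the first-order factor correlation matrix is b b' + F², and b_i² / (b_i² + f_i²) is the share of first-order factor i's variance due to the second-order common factor. The function names and loadings below are hypothetical, not estimates from this study.

```python
import numpy as np


def first_order_correlations(b, f):
    """C_xx implied by x = b z + F v with unit-variance, uncorrelated z and v."""
    b = np.asarray(b, dtype=float)
    return np.outer(b, b) + np.diag(np.asarray(f, dtype=float) ** 2)


def general_factor_share(b, f):
    """Share of each first-order factor's variance due to the second-order common factor."""
    b2 = np.asarray(b, dtype=float) ** 2
    return b2 / (b2 + np.asarray(f, dtype=float) ** 2)


# Hypothetical loadings for sentence completions, antonyms, analogies, reading passages
b = np.array([0.95, 0.95, 0.95, 0.80])
f = np.sqrt(1.0 - b ** 2)            # group-factor loadings giving unit-variance factors
phi_x = first_order_correlations(b, f)
share = general_factor_share(b, f)

print(np.round(phi_x, 3))   # the reading factor has the lowest correlations with the others
print(np.round(share, 3))   # reading: 0.64 common, 0.36 group-specific
```

In this illustration, the relatively large group factor for the fourth first-order factor shows up exactly as the text describes: its correlations with the other first-order factors are the lowest in the matrix.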
With this simple structure, the second-order factor model contains a second-order common verbal factor and four independent second-order group factors, one for each of the four verbal item types. To the extent that the first-order factor variance explained by the second-order common factor is large, the data are unidimensional. On the other hand, a sizeable second-order group factor for a particular item type, say reading passage items, would indicate that this item type is making the largest contribution to violations of unidimensionality.

To summarize, both first-order factor analyses and second-order factor analyses were employed. The first-order analyses focused on the number of factors, or "effective" dimensionality, issue. The second-order analyses were more confirmatory and focused on assessing hypothesized structures suggested by the item types and content areas measured by the tests. Fit of the model to the data was the dominant concern in the first-order analyses. Decomposition of first-order factor variance into second-order common and group-specific components was the main concern of the second-order analyses.

LISREL V's Indices of Fit

LISREL V provides several indices of fit that are described by Jöreskog and Sörbom (1981). When LISREL V provides maximum likelihood estimates of free parameters, it also provides the likelihood ratio χ² statistic with associated degrees of freedom and probability level. Ideally, this index should be helpful in assessing competing models for the data because, under certain conditions, the difference in χ² values is itself chi-square distributed, with degrees of freedom equal to the difference in degrees of freedom associated with the two competing models.
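The likelihood-ratio difference test reduces to subtracting the two χ² values and their degrees of freedom, then taking a chi-square upper-tail probability. A minimal sketch follows. It assumes integer degrees of freedom, so that the tail probability has a closed form from standard identities for the incomplete gamma function. The function names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1, erfc(sqrt(y)), then add y^(k - 1/2) e^{-y} / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp((k - 0.5) * math.log(y) - y - math.lgamma(k + 0.5))
    return total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Nested-model chi-square difference test.

    Meaningful only if the restricted model is a special case of the general
    model and the general model is true (asymptotic result).
    """
    diff = chi2_restricted - chi2_general
    ddf = df_restricted - df_general
    return diff, ddf, chi2_sf(diff, ddf)


# Hypothetical chi-square values for two nested models (not results from this study)
diff, ddf, p = chi2_difference_test(chi2_restricted=30.0, df_restricted=10,
                                    chi2_general=12.0, df_general=8)
print(f"difference = {diff:.1f} on {ddf} df, p = {p:.3g}")
```

A small p suggests that the extra parameters of the general model improve fit beyond chance, subject to the conditions stated in the docstring.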
However, the difference in χ² values between two models is asymptotically distributed as chi square only if one model is a special case of the other and the larger model is true. Under these conditions, the difference indicates whether the parameters estimated in the more general model add anything to the fit of the model to the data. Joreskog and Sorbom also cite several other reasons why the χ² indices should be used with caution.

Another goodness-of-fit index provided by LISREL V is the root mean square residual,

$$\mathrm{RMSR} = \left[ \frac{2}{k(k+1)} \sum_{i=1}^{k} \sum_{j=1}^{i} \left( c_{ij} - \hat{c}_{ij} \right)^{2} \right]^{1/2}, \qquad (4)$$

where k is the number of observed variables, and c_ij and ĉ_ij are elements of the observed and fitted covariance matrices, respectively. The RMSR is a useful descriptive index for comparing the fit of two different models for the data.

In addition to these indices of global fit, LISREL V provides individual residuals in both raw and standardized forms. The standardized residuals are based on standard asymptotic theory under normality; hence each standardized residual is assumed to be asymptotically a standard normal variable. Joreskog and Sorbom (1981) suggest that standardized residuals greater than two in absolute value merit close examination. For an effective summary of the fit of individual models, LISREL V presents Q-plots of the normalized residuals against normal quantiles. The slope of the plotted points is indicative of model fit, so model fit can be evaluated by visual inspection of the Q-plots. One can imagine a straight line passing through the plotted points and compare the slope of this line with a 45° line, represented on the plots by small dots. Slopes close to one represent moderate fit, slopes smaller than one represent poor fit, and slopes steeper than one represent good fit. Perfect fit is represented by points falling on a straight line perpendicular to the abscissa.
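Equation 4 and the Q-plot reading rule can be illustrated with a short standard-library sketch. rmsr follows Equation 4 directly. qplot_slope only approximates the LISREL display: it places the normalized residuals on the abscissa and normal quantiles on the ordinate (so that perfect fit is a vertical line, as described above), assumes plotting positions (i − 0.5)/n, and summarizes the imagined straight line by an ordinary least-squares slope. Both function names and the example matrices are hypothetical.

```python
import math
from statistics import NormalDist


def rmsr(observed, fitted):
    """Root mean square residual of Equation 4: squared residuals are averaged
    over the k(k+1)/2 lower-triangle elements, diagonal included."""
    k = len(observed)
    total = 0.0
    for i in range(k):
        for j in range(i + 1):  # j runs from 0 to i
            total += (observed[i][j] - fitted[i][j]) ** 2
    return math.sqrt(2.0 * total / (k * (k + 1)))


def qplot_slope(normalized_residuals):
    """Least-squares slope of normal quantiles (ordinate) on the sorted
    normalized residuals (abscissa). A slope above 1 suggests good fit,
    about 1 moderate fit, and below 1 poor fit."""
    r = sorted(normalized_residuals)
    n = len(r)
    if r[0] == r[-1]:
        return math.inf  # identical residuals: the points form a vertical line
    # Assumed plotting positions (i - 0.5)/n; LISREL's own choice may differ.
    q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_r, mean_q = sum(r) / n, sum(q) / n
    sxx = sum((a - mean_r) ** 2 for a in r)
    sxy = sum((a - mean_r) * (b - mean_q) for a, b in zip(r, q))
    return sxy / sxx


# Hypothetical observed and fitted correlation matrices (not study data).
obs = [[1.00, 0.50, 0.40], [0.50, 1.00, 0.45], [0.40, 0.45, 1.00]]
fit = [[1.00, 0.47, 0.42], [0.47, 1.00, 0.45], [0.42, 0.45, 1.00]]
print(f"RMSR = {rmsr(obs, fit):.4f}")
```

Shrinking every residual by half doubles the fitted Q-plot slope, which is why steeper lines signal better fit.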
Results

The factor analytic results are presented as follows. For each SAT-Verbal test edition, the number-of-factors question is assessed by examining the fit of first-order factor solutions. Comparability of the hypothesized second-order factor structures is then examined across the three test editions.

Number of Factors

Figure 2 contains examples of the Q-plots of normalized residuals used to assist in assessing fit in this study; the plots are for SAT-Verbal Edition V4. There are four panels in this figure. The two left panels summarize the fit of a one-factor first-order solution and a two-factor first-order solution, respectively, while the two right panels summarize the fit of two second-order factor solutions: a solution with one second-order common factor and four second-order group factors (one each for sentence completions, antonyms, analogies, and items based on reading passages), and a solution with two independent second-order common factors and the same four second-order group factors.

[FIGURE 2. Normalized residuals plots and indices of fit for SAT-Verbal Edition V4. Panels: One Factor First-Order Solution; Two Factor First-Order Solution; One Second-Order Common Factor and Four Second-Order Group Factors Solution; Two Second-Order Common Factors and Four Second-Order Group Factors Solution.]

The top left panel reveals that a single first-order factor solution does not fit the V4 item parcel correlation matrix. The residuals plot shows a sizeable number of large positive residuals, which is indicative of underfactoring. In the bottom left panel it can be seen that adding a second first-order factor results in a very noticeable improvement in fit.
The indices of fit, χ² and RMSR, for SAT-Verbal Edition V4 are shown in the left-hand column of Table 1, which summarizes the fit for all three SAT-Verbal editions. For Edition V4, the RMSR is halved from .026 to .013, and χ² drops sharply from 605 (df = 119) to 182 (df = 101), an unquestionably significant improvement in fit. The top right panel of Figure 2 reveals that a second-order solution with a restrictive factor pattern, one second-order common factor and four second-order group factors, fits the V4 item parcel correlations very well. Adding a second second-order common factor, orthogonal to the first (the bottom right panel in Figure 2), produces a slight but statistically significant improvement in fit, as seen in Table 1, dropping χ² from 176 (df = 115) to 152 (df = 114).

TABLE 1
Summary of fit for three SAT-Verbal editions

Model                               V4               X2               Y3
One factor first-order              .026; 605 (119)  .027; 681 (119)  .026; 653 (119)
Two factor first-order              .013; 182 (101)  .017; 310 (101)  .017; 296 (101)
Three factor first-order            Not done         .010; 124 (82)   Not done
One second-order common factor and
  four second-order group factors   .014; 176 (115)  .012; 145 (115)  .013; 169 (115)
Two second-order common factors and
  four second-order group factors   .013; 152 (114)  .012; 143 (114)  .013; 163 (114)

Note. Each cell gives RMSR(a); χ² (df).
(a) Root mean square residual.

The normalized residuals plots for SAT-Verbal Edition X2 were similar to those for Edition V4, revealing that one factor was clearly inadequate and that adding a second first-order factor improved the fit noticeably.
In fact, as seen in the middle column of Table 1, three first-order factors were needed to provide a tight fit to the data for Edition X2: extracting a third first-order factor yields a χ² of 124 (df = 82) and an RMSR of .010. The fit obtained for the first-order solutions for Edition X2 can be contrasted with that obtained for the second-order solutions using the middle column of Table 1. A restrictive, theory-based confirmatory second-order solution fits better than the less restrictive first-order factor solutions. The indices in the middle column of Table 1 reveal that one second-order common factor and four second-order group factors fit the X2 item parcels correlation matrix very well, and they also indicate that adding a second second-order common factor is unnecessary. Thus a model that requires only one second-order common factor to account for correlations between parcels composed of different item types fits the data very well. Recall that for V4 the addition of a second second-order common factor improved the fit slightly but significantly.

The right-hand column of Table 1 summarizes the fit results for SAT-Verbal Edition Y3. As was the case for Edition X2, at least two first-order factors are needed to fit the Y3 item parcels correlations. Also as with Edition X2, the second-order solution with one second-order common factor and four second-order group factors provides a very good fit to the data. Adding a second second-order common factor improves the fit very little.

Second-order Structures

For all three SAT-Verbal editions, the hypothesized second-order factor solutions fit the data well. Table 2 contains a numerical summary of the single second-order common factor solutions. It tabulates the relative contributions of the single second-order common factor and of each of the four second-order group factors to the first-order parcel factors.
In addition, Table 2 contains the correlations among the four first-order factors. One aspect of the data presented in Table 2 is immediately obvious: for every SAT-Verbal edition, the second-order common factor is large relative to the second-order group factors. This can be seen both in the first-order factor correlations, all of which are .80 or higher, and in the variance contributions portion of the table. For example, for Edition V4, the second-order common factor accounts for 98% of the sentence completions factor variance, 85% of the antonyms factor variance, 93% of the analogies factor variance, and 82% of the reading passage items factor variance.

Looking across test editions (down the columns of the table), it can be seen that the second-order common factor accounts for almost all of the sentence completions factor variance on all three test editions. In contrast, the reading passage items factor has the largest second-order group factor on all three editions.

TABLE 2
Relative contributions of one second-order common and four second-order group factors to variance of first-order parcel factors for three SAT-Verbal editions

Edition  Second-order factor    I      II     III    IV
V4       Common factor          .98    .85    .93    .82
         Group factors          .02(a) .15    .07    .18
X2       Common factor          .97    .92    .82    .81
         Group factors          .03(a) .08    .18    .19
Y3       Common factor          .96    .88    .86    .84
         Group factors          .04(a) .12    .14    .16

First-order factor correlations

     V4                      X2                      Y3
     I    II   III  IV       I    II   III  IV       I    II   III  IV
I    1.0                     1.0                     1.0
II   .92  1.0                .94  1.0                .92  1.0
III  .96  .89  1.0           .89  .87  1.0           .91  .87  1.0
IV   .90  .84  .88  1.0      .89  .86  .81  1.0      .90  .86  .85  1.0

Note. First-order factors: I = sentence completions, II = antonyms, III = analogies, IV = reading passage items.
(a) Not significantly different from zero (p < .01).
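Because the one-common-factor model implies that the correlation between first-order factors i and j equals b_i b_j, while the common-factor share of factor j's variance equals b_j², the first-order factor correlations in Table 2 can be approximately recovered from the variance shares alone. The sketch below assumes that all second-order loadings are positive; the function name is hypothetical, and because the tabled shares are rounded, agreement can only be expected to about two decimals.

```python
import math
from itertools import combinations


def implied_factor_correlations(common_share):
    """Correlations b_i * b_j implied by a single second-order common factor,
    given each first-order factor's common-factor variance share b_j**2.
    Assumes every second-order loading b_j is positive."""
    b = {name: math.sqrt(share) for name, share in common_share.items()}
    return {(i, j): b[i] * b[j] for i, j in combinations(b, 2)}


# Common-factor variance shares for Edition X2, taken from Table 2
# (I = sentence completions, II = antonyms, III = analogies, IV = reading passages).
x2_shares = {"I": 0.97, "II": 0.92, "III": 0.82, "IV": 0.81}
for (i, j), r in implied_factor_correlations(x2_shares).items():
    print(f"r({i}, {j}) = {r:.2f}")
```

For Edition X2 the printed values (.94, .89, .89, .87, .86, .81) match the tabled correlations to two decimals, which is what one would expect if a single second-order common factor accounts for the relations among the four item-type factors.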
For Edition V4, the second-order common factor is more closely related to the analogies factor than to the antonyms factor; for Edition X2, the opposite is true. For Edition Y3, the second-order common factor is only slightly more related to the antonyms factor than to the analogies factor.

Table 1 included descriptions of the fit of a second-order solution that allowed for a second second-order common factor. Table 3 summarizes these solutions. As Table 3 shows, for Editions X2 and Y3, inclusion of a second second-order common factor adds nothing to the solution. This can be seen in the minuscule contributions (.00 or .01) of this second second-order common factor to first-order factor variance. Note also that for Editions X2 and Y3, the correlations among first-order factors remained virtually unchanged when the second second-order common factor was added (compare the correlations in Tables 2 and 3). In contrast, adding a second second-order common factor has an impact on the solution for Edition V4: the antonyms second-order group factor is reduced substantially (from .15 to .03), while the reading passage second-order group factor is reduced somewhat (from .18 to .10). The second second-order common factor thus accounts for a non-trivial share of the antonyms and reading passage first-order factor variance that was previously attributed to their second-order group factors. As the note to Table 3 indicates, this second second-order common factor has positive loadings for the vocabulary item types (antonyms and analogies) and negative loadings for the reading item types (sentence completions and reading passage items). Consequently, including it increases the correlations between the vocabulary item type factors and decreases their correlations with the reading item type factors.

Dropping Reading Passage Items Parcels

The results contained in Tables 1-3 suggest two conclusions.
First, SAT-Verbal is not strictly unidimensional, and most of the lack of unidimensionality can be attributed to the reading passage items. Second, the content structure for Edition V4 differs from that for Editions X2 and Y3: Edition V4 needs a second second-order common factor to explain the correlations among the item parcels, a factor that Editions X2 and Y3 do not require. To evaluate the supposition that the reading passage items are the major reason for the lack of unidimensionality, factor analyses were conducted on reduced item parcels correlation matrices obtained by excluding the five reading passage items parcels. These analyses of the reduced matrices parallel those conducted for the full item parcels correlation matrices.

TABLE 3
Relative contributions of two second-order common and four second-order group factors to variance of first-order parcel factors for three SAT-Verbal editions

Edition  Second-order factor    I      II     III    IV
V4       Common factor 1        .96    .91    .92    .84
         Common factor 2(a)     .00    .06    .00    .06
         Group factors          .04    .03    .08    .10
X2       Common factor 1        .97    .92    .82    .81
         Common factor 2(a)     .01    .00    .01    .01
         Group factors          .02    .08    .17    .18
Y3       Common factor 1        .96    .89    .86    .84
         Common factor 2(a)     .01    .01    .01    .01
         Group factors          .03    .10    .13    .15

First-order factor correlations

     V4                      X2                      Y3
     I    II   III  IV       I    II   III  IV       I    II   III  IV
I    1.0                     1.0                     1.0
II   .92  1.0                .94  1.0                .92  1.0
III  .94  .92  1.0           .89  .87  1.0           .91  .87  1.0
IV   .91  .81  .87  1.0      .89  .86  .81  1.0      .90  .86  .85  1.0

Note. First-order factors: I = sentence completions, II = antonyms, III = analogies, IV = reading passage items.
(a) For all three test editions, the loadings of the first-order factors on second-order common factor 2 were positive for analogies and antonyms and negative for sentence completions and reading passage items. With the exception of antonyms and reading passage items on Edition V4, these loadings on the second second-order common factor were trivial.

The data presented in Figure 3 and Table 4 for Edition V4 parallel those presented in Figure 2 and Table 1. Dropping the reading passage items does not reduce the number of first-order factors needed to fit the data. The single-factor first-order solutions, however, fit somewhat better here than they did when the reading passage items parcels were included. Hence the reading passage items parcels, while a major contributor, are not the sole reason for the lack of unidimensionality.

[FIGURE 3. Normalized residuals plots and indices of fit for SAT-Verbal Edition V4 (excluding reading passage items parcels). Panels: Two Factor First-Order Solution; One Second-Order Common Factor and Three Second-Order Group Factors Solution.]

TABLE 4
Summary of fit for three SAT-Verbal editions (excluding reading passage items)

Model                               V4              X2              Y3
One factor first-order              .019; 149 (54)  .022; 306 (54)  .024; 246 (54)
Two factor first-order              .013; 81 (41)   .012; 84 (41)   .014; 110 (41)
One second-order common factor and
  three second-order group factors  .014; 83 (51)   .011; 71 (51)   .013; 85 (51)

Note. Each cell gives RMSR(a); χ² (df).
(a) Root mean square residual.

Table 5 provides more evidence on this point. It shows that the analogies second-order group factors are sizeable for Editions X2 and Y3. Recent research by Lawrence and Dorans (1987) found that a speededness factor contributed to departure from unidimensionality for items at the end of the SAT-Mathematical sections. For Editions X2 and Y3, half of the 20 analogy items in each edition appear at the end of one of the two 30-minute verbal sections. Perhaps speededness contributes to the sizeable second-order analogy group factors obtained for Editions X2 and Y3.

Table 5 also shows that the structure for Edition V4 still gives evidence of being different from that of X2 and Y3. In fact, V4 appears to be the most unidimensional of the three test editions when reading passages are excluded from the analysis. This finding is consistent with the speededness hypothesis, since on Edition V4 the last items in each verbal section were reading passage items. The structures for X2 and Y3, on the other hand, appear quite parallel. Thus, removing the reading passage items parcels yields data (the remaining item types) that are more unidimensional and clarifies the structural differences between Edition V4 and Editions X2 and Y3, differences that may be related to test speededness.

To summarize, the results of the factor analyses indicate that the SAT-Verbal editions can be considered slightly multidimensional and exhibit some departures from edition-to-edition parallelism. Edition V4 appears to be more unidimensional than the other two editions when reading passage items are excluded and, as was hypothesized, less parallel to Editions X2 and Y3 than the latter two editions are to each other. Removing the item type whose second-order group factor contributed the most to parcel variance (reading passage items), although it yielded data of a more unidimensional nature, did not result in what could be considered a truly unidimensional set of items for any of the test editions.
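The degrees of freedom for two of the models in Tables 1 and 4 can be reproduced by simple counting, which may help in reading those tables. The sketch assumes that each model was fitted to the p-by-p parcel correlation matrix, counted as p(p + 1)/2 distinct elements; that every parcel has one free first-order loading and one free unique variance; and, for the second-order model, that each first-order factor has one free loading on the single second-order common factor, with its group factor variance then fixed by standardization. Under these assumptions the counts match the reported df for these two models only; the other models involve identification choices not described in this section, so they are not attempted. The function names are hypothetical.

```python
def distinct_moments(p):
    """Number of distinct elements in a p-by-p symmetric matrix."""
    return p * (p + 1) // 2


def df_one_factor(p):
    # Free parameters: p first-order loadings + p unique variances.
    return distinct_moments(p) - 2 * p


def df_second_order_one_common(p, k):
    # Free parameters: p first-order loadings (simple structure), p unique
    # variances, and k loadings of the first-order factors on the common factor.
    return distinct_moments(p) - (2 * p + k)


# 17 parcels in 4 item types; 12 parcels in 3 item types once reading passages are dropped.
print(df_one_factor(17), df_second_order_one_common(17, 4))  # Table 1 reports 119 and 115
print(df_one_factor(12), df_second_order_one_common(12, 3))  # Table 4 reports 54 and 51
```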
TABLE 5
Relative contributions of one second-order common and three second-order group factors to variance of first-order parcel factors for three SAT-Verbal editions (excluding reading passage items)

Edition  Second-order factor    I      II     III
V4       Common factor          .93    .90    .94
         Group factors          .07    .10    .06
X2       Common factor          .96    .92    .82
         Group factors          .04(a) .08    .18
Y3       Common factor          .93    .90    .87
         Group factors          .07    .10    .13

First-order factor correlations

     V4                 X2                 Y3
     I    II   III      I    II   III      I    II   III
I    1.0                1.0                1.0
II   .92  1.0           .94  1.0           .92  1.0
III  .93  .92  1.0      .89  .87  1.0      .90  .88  1.0

Note. First-order factors: I = sentence completions, II = antonyms, III = analogies.
(a) Not significantly different from zero (p < .01).

Discussion

This research was conducted to develop a better understanding of the dimensionality of the SAT-Verbal section. Previous examination of its underlying factor structure has been hampered by the difficulties associated with assessing dimensionality with binary item data. In an attempt to circumvent some of these difficulties, item parcels were constructed in this study. Construction of these parcels was guided by content and item type considerations and by a desire to produce correlations that could be fit by linear factor models. The resulting correlation matrices were subjected to a series of confirmatory factor analyses employing the LISREL V model. The dimensionality analyses clearly verified that SAT-Verbal Edition V4 was less parallel to the other two editions, Editions X2 and Y3, than these two editions were to each other. The interaction of test speededness with the format of the test may account for some of these structural differences.
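Item parcelling, as used here, amounts to summing binary item scores within item-type subsets and then correlating the resulting parcel scores. The sketch below shows only that mechanical step; the function names, the item-to-parcel assignment, and the response data are all hypothetical, and it does not reproduce how the parcels in this study were actually balanced.

```python
import math


def parcel_scores(responses, parcels):
    """Sum 0/1 item scores within each parcel for every examinee."""
    return {name: [sum(row[i] for i in items) for row in responses]
            for name, items in parcels.items()}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def parcel_correlations(scores):
    """Correlations between every pair of parcels."""
    names = list(scores)
    return {(a, b): pearson(scores[a], scores[b])
            for idx, a in enumerate(names) for b in names[idx + 1:]}


# Hypothetical 0/1 responses of six examinees to six items of one item type.
responses = [
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
]
# Hypothetical assignment of item indices to two parcels.
parcels = {"P1": [0, 2, 4], "P2": [1, 3, 5]}
scores = parcel_scores(responses, parcels)
print(scores)
print(parcel_correlations(scores))
```

Because each parcel score takes several values rather than two, correlations among parcels are less affected by item difficulty than phi coefficients among single binary items, which is the motivation for parcelling described in the text.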
The methodology used in this study should be refined with regard to the manner in which item parcels are formed. The parcels, or item subsets, used in this study were formed using item types as defined by content specifications. Cattell and Burdsal (1975) suggest, however, that the results of a factor analysis be used to define the item dimensions for forming subsets. At the time the data analyses for this study were performed, the computer program TESTFACT (Wilson, Wood, & Gibbons, 1984), which provides a marginal maximum likelihood full information factor analysis of item-level data, had not been written. A combination of methodologies, with full information factor analysis used to define the subsets and the methodology described in this report then used to assess dimensionality, would be a natural extension of the current procedure. Such a combination, rather than reliance on full information factor analysis alone, also makes sense given the costs involved in running TESTFACT and the fact that it is not well suited to long tests with a fairly large number of hypothesized factors (Mislevy, 1986).

Given the strict adherence to item type composition observed for the Scholastic Aptitude Test, the verbal and mathematical sections of this test seem quite amenable to continued dimensionality analyses. These analyses should uncover more general (and perhaps contrasting) trends in dimensionality and edition-to-edition parallelism. These results could then be related to the quality of the IRT true-score equating currently in use with the SAT, and some statements could be made concerning the robustness of IRT equating to violations of the unidimensionality assumption. Eventually, this approach might yield diagnostics that could be used to arrive at more informed psychometric decisions about test specifications and about the equating and scoring of the SAT.
References

Bejar, I. I. (1980). A procedure for investigating the unidimensionality of achievement tests based on item parameter estimates. Journal of Educational Measurement, 17, 283-296.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
Carroll, J. B. (1945). The effect of difficulty and chance success on correlations between items or between tests. Psychometrika, 10, 1-19.
Carroll, J. B. (1961). The nature of the data, or how to choose a correlation coefficient. Psychometrika, 26, 347-372.
Carroll, J. B. (1983). The difficulty of a test and its factor composition revisited. In S. Messick & H. Wainer (Eds.), Principals of modern psychological measurement: A festschrift for Frederic M. Lord. Hillsdale, NJ: Erlbaum.
Cattell, R. B. (1956). Validation and intensification of the Sixteen Personality Factor Questionnaire. Journal of Clinical Psychology, 12, 205-214.
Cattell, R. B. (1974). Radial parcel factoring versus item factoring in defining personality structure in questionnaires: Theory and experimental checks. Australian Journal of Psychology, 26, 103-119.
Cattell, R. B., & Burdsal, C. A. (1975). The radial parcel double factoring design: A solution to the item-vs-parcel controversy. Multivariate Behavioral Research, 10, 165-179.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
Divgi, D. R. (1981, April). Potential pitfalls in applications of item response theory. Paper presented at the annual meeting of the National Council on Measurement in Education, Los Angeles.
Divgi, D. R. (1986). Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement, 23, 283-298.
Drasgow, F., & Dorans, N. J. (1982). Robustness of estimates of the squared multiple correlation and squared cross-validity coefficient to violations of multivariate normality. Applied Psychological Measurement, 6, 185-200.
Drasgow, F., & Parsons, C. K. (1983). Application of unidimensional item response theory models to multidimensional data. Applied Psychological Measurement, 7, 189-199.
Etezadi-Amoli, J., & McDonald, R. P. (1983). A second-generation nonlinear factor analysis. Psychometrika, 48, 315-342.
Fischer, G. H. (1978). Probabilistic test models and their applications. German Journal of Psychology, 2, 298-319.
Gustafsson, J.-E. (1980). Testing and obtaining fit of data to the Rasch model. British Journal of Mathematical and Statistical Psychology, 33, 205-233.
Hambleton, R. K. (1983). Applications of item response theory. Vancouver, BC: Educational Research Institute of British Columbia.
Hambleton, R. K., & Rovinelli, R. J. (1986). Assessing the dimensionality of a set of test items. Applied Psychological Measurement, 10, 287-302.
Hattie, J. A. (1981). Decision criteria for determining unidimensionality. Unpublished doctoral dissertation, University of Toronto.
Hattie, J. A. (1984). An empirical study of various indices for determining unidimensionality. Multivariate Behavioral Research, 19, 49-78.
Hattie, J. A. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.
Hecht, L. W., & Swineford, F. (1981). Item analysis at Educational Testing Service. Princeton, NJ: Educational Testing Service.
Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item response theory: Application to psychological measurement. Homewood, IL: Dow Jones-Irwin.
Joreskog, K. G., & Sorbom, D. (1981). LISREL V: Analysis of linear structural relationships by the method of maximum likelihood. Chicago, IL: International Educational Services.
Lawrence, I. M., & Dorans, N. J. (1987, April). An assessment of the dimensionality of SAT-Mathematical. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, DC.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 34, 100-117.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6, 379-396.
McDonald, R. P., & Ahlawat, K. S. (1974). Difficulty factors in binary data. British Journal of Mathematical and Statistical Psychology, 27, 82-99.
Mislevy, R. J. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11, 3-31.
Mislevy, R. J., & Bock, R. D. (1983). BILOG: Item analysis and test scoring for binary logistic models. Chicago, IL: International Educational Services.
Muthen, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
Muthen, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Rosenbaum, P. R. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425-435.
Schmid, J., & Leiman, J. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53-61.
Swinton, S. S., & Powers, D. E. (1980). A factor analytic study of the restructured GRE Aptitude Test (GRE Board Professional Report GREB No. 77-6P). Princeton, NJ: Educational Testing Service.
Tucker, L. R. (1983). Searching for structure in binary data. In S. Messick & H. Wainer (Eds.), Principals of modern psychological measurement: A festschrift for Frederic M. Lord. Hillsdale, NJ: Erlbaum.
Van den Wollenberg, A. L. (1982a). Two new test statistics for the Rasch model. Psychometrika, 47, 123-140.
Van den Wollenberg, A. L. (1982b). A simple and effective method to test the dimensionality axiom of the Rasch model. Applied Psychological Measurement, 6, 83-91.
Wilson, D., Wood, R. L., & Gibbons, R. (1984). TESTFACT: Test scoring and item factor analysis. Chicago, IL: Scientific Software.
Wingersky, M. S. (1983). LOGIST: A program for computing maximum likelihood procedures for logistic test models. In R. K. Hambleton (Ed.), Applications of item response theory. Vancouver, BC: Educational Research Institute of British Columbia.
Wingersky, M. S., Barton, M. A., & Lord, F. M. (1982). LOGIST V user's guide. Princeton, NJ: Educational Testing Service.

Authors

LINDA L. COOK, Principal Measurement Specialist, Educational Testing Service, Princeton, NJ 08541. Specializations: item response theory, educational measurement.
NEIL J. DORANS, Senior Measurement Statistician, Educational Testing Service, Princeton, NJ 08541. Specializations: quantitative psychology, educational measurement.
DANIEL R. EIGNOR, Principal Measurement Specialist, Educational Testing Service, Princeton, NJ 08541. Specializations: item response theory, educational measurement.