PSYCHOMETRIKA — VOL. 78, NO. 4, 830–855
OCTOBER 2013
DOI: 10.1007/s11336-013-9333-5

EXPLANATORY MULTIDIMENSIONAL MULTILEVEL RANDOM ITEM RESPONSE MODEL: AN APPLICATION TO SIMULTANEOUS INVESTIGATION OF WORD AND PERSON CONTRIBUTIONS TO MULTIDIMENSIONAL LEXICAL REPRESENTATIONS

SUN-JOO CHO, JENNIFER K. GILBERT, AND AMANDA P. GOODWIN

PEABODY COLLEGE OF VANDERBILT UNIVERSITY

This paper presents an explanatory multidimensional multilevel random item response model and its application to reading data with a multilevel item structure. The model includes multilevel random item parameters that allow consideration of variability in item parameters at both the item and item-group levels. Item-level random item parameters were included to model unexplained variance remaining when item-related covariates were used to explain variation in item difficulties. Item-group-level random item parameters were included to model dependency in item responses among items having the same item stem. Using the model, this study examined the dimensionality of a person's word knowledge, termed lexical representation, and how aspects of morphological knowledge contributed to lexical representations for different persons, items, and item groups.

Key words: lexical representation, multidimensional item response model, multilevel item structure, random item parameters.

1. Introduction

A multidimensional item response model has been used to investigate the cognitive nature of tasks (e.g., Janssen & De Boeck, 1999). An aim of the current paper is to extend the multidimensional item response model to a model that takes into account multidimensional ability due to item groups and a multilevel item structure in which items are at the lower level and item groups are at the higher level. These extensions are illustrated by answering important research questions in reading theory and education.
The multidimensional item response model incorporates dimensions with regard to both persons and items as a measurement model, written as

logit P(y_{ji} = 1 | θ_j, β_i) = θ_j − β_i,   (1)

where j is a person index (j = 1, …, J), i is an item index (i = 1, …, I), d is a dimension index (d = 1, …, D), θ_j = [θ_{j1}, …, θ_{jd}, …, θ_{jD}]′ is a multidimensional ability parameter, and β_i = [β_{i1}, …, β_{id}, …, β_{iD}]′ is a multidimensional difficulty parameter for a between-item design, where an item is allowed to load on only one dimension.

Individual ability differences among persons (θ_j in Equation (1)) and difficulty differences among items (β_i in Equation (1)) can be explained using covariates such as person characteristics and item features, respectively, and/or the interaction between them. Current methods might recommend a two-stage procedure: parameter estimates of the measurement model are obtained first, and then a regression model is used to explain the estimates using covariates. However, the two-stage procedure may provide distorted estimates of covariate effects because the measurement errors of the estimates are not incorporated. Within an explanatory item response theory (IRT) framework (De Boeck & Wilson, 2004), the parameters of the measurement model and the effects of covariates on the parameters in a structural model can be investigated simultaneously. With the simultaneous approach, the measurement error of the estimated parameters is taken into account when the effects of covariates on parameters are estimated (e.g., Adams, Wilson, & Wu, 1997, for person parameters; Janssen, Schepers, & Peres, 2004, for item parameters).

Requests for reprints should be sent to Sun-Joo Cho, Vanderbilt University, Peabody #H213A, 230 Appleton Place, Nashville, TN 37203, USA. E-mail: [email protected]

© 2013 The Psychometric Society
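The measurement model in Equation (1) is easy to evaluate numerically. The following Python sketch (ours, not the paper's; the parameter values are made up) computes a response probability for a between-item design, where an item contributes only through the single dimension it loads on:

```python
import math

def p_correct(theta_jd, beta_id):
    """Inverse logit of (theta_jd - beta_id): the probability that person j
    answers item i correctly, where d is the one dimension item i loads on."""
    return 1.0 / (1.0 + math.exp(-(theta_jd - beta_id)))

# Hypothetical values: ability 1.0 on the relevant dimension, difficulty 0.0.
p = p_correct(1.0, 0.0)  # about 0.73
```

A person whose ability equals the item's difficulty has probability 0.5, which is what makes the difference θ − β interpretable on the logit scale.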
Item features or covariates have been used to explain item difficulty parameters without allowing random residuals, as in the linear logistic test model (Fischer, 1973). However, it is preferable to allow the random residuals because item covariates may not explain all variation in the parameters. Estimated residual variances provide information about how much remains unexplained, and hence how much room there might be for improving explanation by including more covariates (De Boeck, 2008). Furthermore, it has been shown that omitting the random residuals can lead to underestimated standard errors for the covariate effects on the IRT model parameters (Janssen et al., 2004; Mislevy, 1988). Also, in certain tests, items are nested in item groups, such as item families used in an item-generation context (Bejar, 1993). For example, to measure word reading ability, a family of words such as happiness, happily, unhappy, and unhappily can be created as an item group made of items from the same item stem, happy. Local dependency in item responses among items in the item group is of concern in the multilevel item structure, where items are at the lower level and item groups are at the higher level. Ignoring such local dependency can lead to less accurate item parameter estimates (Cho, De Boeck, Embretson, & Rabe-Hesketh, 2013; Glas & van der Linden, 2003; Johnson & Sinharay, 2005; Sinharay, Johnson, & Williamson, 2003). Dependency in item responses due to within-item-group similarities can be taken into account by having random item parameters at the item-group level (Cho et al., 2013). Multidimensional extensions of item response models with random item parameters at the item-group level (called multidimensional multilevel random item response models hereafter) have not been described and applied in the literature thus far.
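To make the item-family idea concrete, the following Python sketch (ours, with illustrative variance values, not estimates from the paper) generates item difficulties with a shared item-group component, so that items built from the same stem, such as happiness and happily from happy, have correlated difficulties:

```python
import random

random.seed(0)

def simulate_item_difficulties(n_groups, items_per_group, sd_group, sd_item):
    """Difficulty of each item = item-group effect (shared by all items from
    the same stem) + an item-specific residual."""
    difficulties = {}
    for g in range(n_groups):
        beta_g = random.gauss(0.0, sd_group)        # shared stem effect
        for i in range(items_per_group):
            difficulties[(g, i)] = beta_g + random.gauss(0.0, sd_item)
    return difficulties

# 39 stems x 3 items each = 117 items, mirroring the empirical study's layout.
betas = simulate_item_difficulties(39, 3, sd_group=1.0, sd_item=0.5)
```

When sd_group is large relative to sd_item, most of the difficulty variation lies between stems, which is exactly the dependency an item-group random effect absorbs.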
The purpose of this paper is (1) to present extensions of explanatory multidimensional item response models to models with multilevel random item parameters, called explanatory multidimensional multilevel random item response models (EMMRIRM), and (2) to use the EMMRIRM to examine the interrelationships between the different aspects of a person's word knowledge on both the person side and the item side, and to explain these interrelationships using person-by-item, person, and item covariates. We use the term multilevel random item parameters to indicate that there are random item parameters at both the item and item-group levels. In the EMMRIRM, a random item parameter at the item level is included to model unexplained variance remaining when item-related covariates are used to explain variation in item difficulties. A random item parameter at the item-group level is included to model dependency among items having the same item stem. In the following, the measurement issues related to assessing lexical representations and the research questions to be answered using the EMMRIRM are described.

2. Measurement of Multidimensional Lexical Representations

2.1. Multidimensional Lexical Representations

Reading researchers are interested in determining how to support students in comprehending text. One factor that contributes to reading comprehension involves how much a reader knows about the words within the text. This word knowledge, termed a lexical representation, ideally comprises a deep and consistent representation of a word and its orthographic, phonological, grammatical, and semantic properties, which can then be used with greater ease to support text comprehension (Perfetti & Hart, 2001, 2002; Perfetti, 2007).
This definition suggests that lexical representations are multidimensional in nature because a word has multiple sources of information that can be tapped by asking individuals to spell, decode, and define words (Perfetti & Hart, 2001, 2002; Perfetti, 2007). For example, when attempting to comprehend a text by retrieving the lexical representations of words (e.g., stratagem), a reader may use multiple sources of information to gain the identity of the word, including its decoding, spelling, and meaning. For morphologically complex words like extremity, which are made by combining a root word (i.e., extreme) and an affix (i.e., -ity), research also suggests students use root-word knowledge (i.e., knowledge of extreme) to figure out the larger derived word (i.e., extremity). Studies suggest, for example, that students use root-word knowledge to support reading (Carlisle & Stone, 2005), spelling (Deacon & Bryant, 2005, 2006), and determining the meaning (Anglin, 1993; Tyler & Nagy, 1989, 1990) of the related derived word. In the present study, we want to measure individual differences in lexical representations of words via performance on related but separate dimensions of knowledge (Perfetti & Hart, 2002). We first formally investigate the dimensionality of the lexical representations of words that individuals have, and then we test how various aspects of root-word knowledge support lexical representations for different words and persons.

2.2. Item, Item-Group, and Person Specific Lexical Representations

A person may have knowledge of some lexical representation informational sources, but not others, suggesting that lexical representations are test-item specific. That is, decoding stratagem, spelling stratagem, and knowing the meaning of stratagem each represent different item knowledge related to the word stratagem.
Lexical representations are also person and item-group (or word) specific, which means that different persons are likely to vary in the lexical representations of the different item groups encountered. This is because certain person characteristics (i.e., vocabulary knowledge, reading comprehension skills, and morphological awareness) and certain item-group features (i.e., frequency, transparency, consistency, and number of morphemes) make it easier to build a deep and consistent lexical representation of an item group. Prior researchers in reading education have used overall performance on literacy measures to group persons because of the difficulty of examining person, item, and item-group features related to lexical representations simultaneously. For example, when examining differences in how persons perform on aspects of lexical representations, common practice would suggest researchers either aggregate scores on a test of decoding, spelling, and meaning or use structural equation modeling to run regression models with each aspect of the person or item lexical representation as an outcome predicted by person or item characteristics (e.g., Vellutino, Tunmer, Jaccard, & Chen, 2007). Because total scores are used, researchers cannot separate how person characteristics such as reading comprehension or vocabulary knowledge contribute to performance on particular items, nor how item-group features such as frequency or transparency contribute to such performance.
In fact, as conveyed by studies discussed in Perfetti (2007), researchers examining the latter questions of lexical representation have resorted to separating person characteristics from item characteristics by creating groups of similar reader profiles (such as above-average or below-average performance on various literacy measures) or similar item demands (such as high-frequency versus low-frequency words, or single versus multimorphemic words) and looking at differences between persons and items, respectively.

2.3. Multilevel Item Structure in the Measurement of Lexical Representations

In our empirical study, the three subtasks (decoding, spelling, and meaning) each have their own items, but the items share the same item stem. For example, the same word stratagem can be used to measure three aspects of the lexical representation, including decoding, spelling, and meaning information, which are represented by the following tasks: "Please read the following word aloud: stratagem," "Please spell the word stratagem on your paper," and "Do you know the meaning of the word stratagem?", respectively. These different types of information about the same word (i.e., subtasks) are considered as the lower-level words (i.e., items), nested within the higher-level stimulus words (i.e., item groups).

2.4. Empirical Research Questions

The main empirical research question is how aspects of word knowledge contribute to lexical representations for different adolescent readers and for different words. Within this larger research question, our study is guided by three research questions, specifically:

1. How do the various sources of lexical information (i.e., decoding, spelling, and meaning) relate to one another?

2. What aspects of knowledge (decoding, spelling, and knowing the meaning and morphological neighbors) of a root word are related to decoding, spelling, and knowing the meaning of a related derived word?
In general, children learn short and simple words before they learn long and complex words. Models of word knowledge suggest that children may use their knowledge of a smaller root word (covet) to help them learn larger related words (covetousness). Perfetti's (2007) lexical quality hypothesis states that lexical representations of words comprise interrelated dimensions of word knowledge, such that knowledge of a word's spelling, for example, is likely to associate with knowledge of a word's pronunciation. Because derived words contain root words, it is likely that aspects of knowledge about root words will associate with aspects of knowledge about derived words. For instance, knowledge of how to spell a root word will likely associate not only with the spelling of the related derived word, but also with the pronunciation of the related derived word.

3. Do word (or item-group) level characteristics (i.e., frequency, consistency, transparency, and number of morphemes) and person-level characteristics (i.e., vocabulary knowledge, reading comprehension, and morphological awareness) affect lexical representations?

These three questions will be answered from the results of the EMMRIRM. Such models allow us to better understand both the dimensionality of lexical representations and the sources of knowledge that contribute to lexical representations while taking into account differences between items, item groups, and persons. The knowledge produced from this study will provide a deeper understanding of how root-word knowledge and derived-word features bolster lexical representations, which will provide insights into possible instructional paths that may support students in building higher-quality lexical representations and, therefore, support reading comprehension as well. Next, the EMMRIRM is detailed to incorporate the measurement issues related to lexical representations and to answer the research questions described earlier.
Model parameter estimation issues are discussed subsequently.

3. Explanatory Multidimensional Multilevel Random Item Response Model

The EMMRIRM is described for a between-item design.

3.1. Measurement Model: Multidimensional Multilevel Random Item Response Model

There are both crossed and nested classifications in the item response data, where items are nested within item groups. Every item is offered to all persons and every person responds to all items. Thus, the item and person classifications are found at the same level and they are crossed. In addition to the crossed design, there is a multilevel design in which items are nested within item groups on the item side. To frame this data structure within the multilevel literature, item responses at Level 1 are cross-classified with persons and items at Level 2. Items are nested within item groups at Level 3 on the item side, as shown in Figure 1.

[FIGURE 1. Diagram for multidimensional multilevel random item response model.]

With the crossed design at Level 2, a person random effect and an item random effect are crossed random effects when they are used to model individual differences over persons and to model differences in item difficulty over items, respectively. With the nested design at Level 3 on the item side, an item-group random effect and an item random effect are nested when the item-group random effect is taken into account to model differences in item-group difficulties over item groups. The person random effect and the item-group random effect are crossed at a different level. The measurement model, the multidimensional multilevel random item response model (MMRIRM), can be described as follows:

logit P(y_{jig} = 1 | θ_j^(2), β_i^(2), β_g^(3)) = λ + θ_j^(2) − β_i^(2) − β_g^(3),   (2)

where

• j is an index for a person (j = 1, …, J),
• i is an index for an item (i = 1, …, I),
• g is an index for an item group (g = 1, …, G),
• d is an index for a dimension (i.e., a task) (d = 1, …, D),
• the superscript (2) refers to Level 2,
• the superscript (3) refers to Level 3,
• θ_j^(2) = [θ_{j1}, …, θ_{jd}, …, θ_{jD}]′ is a multidimensional ability parameter and refers to the person dimensions,
• β_i^(2) = [β_{i1}, …, β_{id}, …, β_{iD}]′ is a multidimensional item difficulty parameter and refers to the item dimensions,
• λ = [λ_1, …, λ_d, …, λ_D]′ is an intercept, a logit for the probability of a correct response of an 'average' person on an 'average' item, and
• β_g^(3) is an item-group difficulty parameter.

In the MMRIRM, multidimensionality exists for both persons (θ_j^(2)) and items (β_i^(2)) to model different tasks at Level 2. The multidimensional ability parameter, θ_j^(2), is assumed to follow a multivariate normal (MN) distribution: θ_j^(2) = [θ_{j1}, …, θ_{jd}, …, θ_{jD}]′ ∼ MN(μ_(D×1), Σ_(D×D)). The dimension-specific item difficulty, β_{id}, in the multidimensional item difficulty parameter is assumed to follow a normal (N) distribution: β_{id} ∼ N(μ_{βd}, σ²_{βd}). The item-group difficulty parameter, β_g^(3), is also assumed to follow a normal distribution: β_g^(3) ∼ N(μ_β, σ²_β). To identify the model, the means of all random effects are set to 0. After the population parameters of the random effects for abilities and difficulties are estimated, individual person scores and item difficulties at both the item and item-group levels can be obtained using empirical Bayes prediction.

Figure 1 depicts the MMRIRM, Equation (2). In the figure, the squares and ellipses represent manifest and latent variables, respectively. Item responses for each dimension are described as follows: y_{g1} = [y_{11g1}, …, y_{jig1}, …, y_{JI₁g1}]′ for Dimension 1, y_{gd} = [y_{11gd}, …, y_{jigd}, …, y_{JI_d gd}]′ for Dimension d, and y_{gD} = [y_{11gD}, …, y_{jigD}, …, y_{JI_D gD}]′ for Dimension D, where I_d is the number of items nested within a dimension.
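The data-generating process of Equation (2) can be illustrated with a short simulation. This Python sketch is ours (the paper's analyses used R); the variance components and intercepts are made-up values, and the layout (39 item groups by 3 dimensions, 117 items) simply mirrors the empirical design:

```python
import math
import random

random.seed(1)

J, G, D = 172, 39, 3                  # persons, item groups, dimensions (tasks)
lam = [0.5, 0.2, -0.1]                # hypothetical task intercepts lambda_d

theta = [[random.gauss(0, 1.0) for _ in range(D)] for _ in range(J)]            # theta_jd
beta_group = [random.gauss(0, 0.8) for _ in range(G)]                           # beta_g
beta_item = {(g, d): random.gauss(0, 0.5) for g in range(G) for d in range(D)}  # beta_id

def draw_response(j, g, d):
    """One Bernoulli draw from logit P = lambda_d + theta_jd - beta_id - beta_g."""
    eta = lam[d] + theta[j][d] - beta_item[(g, d)] - beta_group[g]
    return 1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0

y = {(j, g, d): draw_response(j, g, d)
     for j in range(J) for g in range(G) for d in range(D)}
```

Each item is indexed here by its (group, dimension) pair, reflecting the between-item design in which every stem contributes exactly one item per task.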
At Level 2, the dimension-specific item difficulties, β_{id}, are considered as random over items within a dimension, and the dimension-specific abilities, θ_{jd}, are also considered as random over persons. At Level 3, the item-group difficulty, β_g^(3), is considered as random over item groups. Superscripts on parameters indicate the level in the data structure. Without Level 3, the data structure of the MMRIRM is similar to that of a double-structure structural equation model (González, De Boeck, & Tuerlinckx, 2008). Both the person and item modes of the data array are modeled simultaneously in the MMRIRM, while both the person and situation modes of the data array are modeled simultaneously in the double-structure structural equation model.

Latent intraclass correlations¹ for both items and item groups (ρ(IG)) and for items (ρ(I)) can be calculated to investigate dependency in latent item responses among items and item groups. A between-item design, where each item loads on only one dimension (which is true of our empirical data), was assumed in the calculation of the latent intraclass correlations. A latent response formulation for the MMRIRM is introduced to define the latent intraclass correlations. Let there be a latent response y*_{jig} such that the observed response is y_{jig} = 1 if y*_{jig} > 0 and y_{jig} = 0 otherwise. Assuming that

y*_{jig} = λ + θ_j^(2) − β_i^(2) − β_g^(3) + ε_{jig},   (3)

and that the error ε_{jig} follows a logistic distribution (mean = 0, variance = π²/3) produces the model in Equation (2) for the observed responses, y.

¹ We used the term "latent" intraclass correlations to distinguish them from the "manifest" correlation, which refers to the conventional product-moment correlation between observed binary responses, y (Rodríguez & Elo, 2003).
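Once the variance components are estimated, both latent intraclass correlations are simple variance ratios (Equations (5)–(8) below). A Python sketch with hypothetical variance components (ours, not the paper's estimates):

```python
import math

def latent_iccs(item_vars, group_var):
    """Latent intraclass correlations for a between-item design.

    item_vars: dimension-specific item-difficulty variances (sigma^2_{beta d})
    group_var: item-group difficulty variance (sigma^2_beta)
    The denominator is the total latent-response variance, which includes
    the logistic error variance pi^2 / 3.
    """
    total = sum(item_vars) + group_var + math.pi ** 2 / 3
    rho_ig = (sum(item_vars) + group_var) / total   # same item and item group
    rho_i = group_var / total                       # same item group, different items
    return rho_ig, rho_i

# Hypothetical values for sigma^2_{beta d} (three tasks) and sigma^2_beta:
rho_ig, rho_i = latent_iccs([0.5, 0.4, 0.6], 1.2)
```

Because ρ(IG)'s numerator adds the item variances to the shared group variance, ρ(IG) ≥ ρ(I) by construction.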
With a between-item design, the variance (Var) of the multidimensional difficulty parameter, β_i^(2), conditional on the person abilities, can be represented as the sum of the dimension-specific variances of item difficulties across the item dimensions as follows:

Var(β_i^(2) | θ_j^(2), θ_{j′}^(2)) = Σ_{d=1}^{D} σ²_{βd}.   (4)

ρ(IG), the correlation (Corr) among latent responses y*_{jig} for the same item i and the same item group g, conditional on the person abilities, is defined as follows:

ρ(IG) = Corr(y*_{jig}, y*_{j′ig} | θ_j^(2), θ_{j′}^(2))
      = Cov(y*_{jig}, y*_{j′ig} | θ_j^(2), θ_{j′}^(2)) / √[Var(y*_{jig} | θ_j^(2), θ_{j′}^(2)) · Var(y*_{j′ig} | θ_j^(2), θ_{j′}^(2))]   (5)
      = (Σ_{d=1}^{D} σ²_{βd} + σ²_β) / (Σ_{d=1}^{D} σ²_{βd} + σ²_β + π²/3),   (6)

with a constant covariance (Cov) for any latent responses from different persons on the same item and item group, Cov(y*_{jig}, y*_{j′ig} | θ_j^(2), θ_{j′}^(2)) = Σ_{d=1}^{D} σ²_{βd} + σ²_β.

ρ(I), the correlation among latent responses for the same item group g but different items i and i′, conditional on the person abilities, is defined as follows:

ρ(I) = Corr(y*_{jig}, y*_{j′i′g} | θ_j^(2), θ_{j′}^(2))
     = Cov(y*_{jig}, y*_{j′i′g} | θ_j^(2), θ_{j′}^(2)) / √[Var(y*_{jig} | θ_j^(2), θ_{j′}^(2)) · Var(y*_{j′i′g} | θ_j^(2), θ_{j′}^(2))]   (7)
     = σ²_β / (Σ_{d=1}^{D} σ²_{βd} + σ²_β + π²/3),   (8)

with a constant covariance for any latent responses from different items and different persons in the same item group, Cov(y*_{jig}, y*_{j′i′g} | θ_j^(2), θ_{j′}^(2)) = σ²_β. The value of ρ(IG) is expected to be higher than that of ρ(I) because latent item responses are more correlated for the same item and item group than for different items within the same item group.

3.2. Measurement Model and Structural Model: Explanatory Multidimensional Multilevel Random Item Response Model

In item response data, three kinds of covariates may exist: (1) person-by-item covariates (W_{ji}), (2) person covariates (Z_j), and (3) item covariates (X_i). In the structural model, these covariates can be used to explain the parameters in the measurement model. Specifically, individual differences in abilities, θ_j^(2), can be explained using person-by-item covariates (W_{ji}) and person covariates (Z_j), allowing residuals, ε_j^(2). Item difficulties, β_i^(2), can be explained using person-by-item covariates (W_{ji}) and item covariates (X_i), allowing residuals, ε_i^(2). The EMMRIRM is written as follows:

logit P(y_{jig} = 1 | ε_j^(2), ε_i^(2), β_g^(3)) = λ + Σ_{k=1}^{K} δ_k W_{ji.k} + Σ_{r=1}^{R} ϑ_r Z_{j.r} + Σ_{h=1}^{H} γ_h X_{i.h} + ε_j^(2) − ε_i^(2) − β_g^(3),   (9)

where

• k is an index for a person-by-item covariate (k = 1, …, K),
• r is an index for a person covariate (r = 1, …, R),
• h is an index for an item covariate (h = 1, …, H),
• W_{ji.k} is person-by-item covariate k,
• Z_{j.r} is person covariate r,
• X_{i.h} is item covariate h,
• δ_k = [δ_{k1}, …, δ_{kd}, …, δ_{kD}]′ contains the dimension-specific effects of the person-by-item covariates,
• ϑ_r = [ϑ_{r1}, …, ϑ_{rd}, …, ϑ_{rD}]′ contains the dimension-specific effects of the person covariates,
• γ_h = [γ_{h1}, …, γ_{hd}, …, γ_{hD}]′ contains the dimension-specific effects of the item covariates,
• ε_j^(2) = [ε_{j1}, …, ε_{jd}, …, ε_{jD}]′ is a multidimensional residual ability parameter, and
• ε_i^(2) = [ε_{i1}, …, ε_{id}, …, ε_{iD}]′ is a multidimensional residual item difficulty parameter.

In the EMMRIRM, multidimensional residuals exist for both persons (ε_j^(2)) and items (ε_i^(2)) to take into account the unexplained variance in each dimension-specific person and item dimension, respectively, at Level 2.
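To fix ideas, the linear predictor of Equation (9) for a single response on dimension d can be assembled as below. This is an illustrative sketch of ours: the covariate interpretations in the comments and all coefficient values are invented, not estimates from the study.

```python
def linear_predictor(lam_d, delta_d, W_ji, vartheta_d, Z_j, gamma_d, X_i,
                     eps_jd, eps_id, beta_g):
    """Equation (9) for one response on dimension d: intercept + covariate
    effects + person residual - item residual - item-group difficulty."""
    fixed = (lam_d
             + sum(d_k * w for d_k, w in zip(delta_d, W_ji))     # person-by-item terms
             + sum(v_r * z for v_r, z in zip(vartheta_d, Z_j))   # person terms
             + sum(g_h * x for g_h, x in zip(gamma_d, X_i)))     # item terms
    return fixed + eps_jd - eps_id - beta_g

eta = linear_predictor(lam_d=0.3,
                       delta_d=[0.8], W_ji=[1.0],      # e.g., a root-word score
                       vartheta_d=[0.02], Z_j=[10.0],  # e.g., centered vocabulary
                       gamma_d=[-0.5], X_i=[1.0],      # e.g., a word feature
                       eps_jd=0.1, eps_id=-0.2, beta_g=0.4)
```

The residuals ε_jd and ε_id play the roles that θ_jd and β_id played in Equation (2), now representing only what the covariates leave unexplained.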
ε_j^(2) is assumed to follow a multivariate normal (MN) distribution: ε_j^(2) = [ε_{j1}, …, ε_{jd}, …, ε_{jD}]′ ∼ MN(μ_(D×1), Σ_(D×D)). The dimension-specific item residual, ε_{id}^(2), is assumed to follow a normal (N) distribution: ε_{id} ∼ N(μ_{εd}, σ²_{εd}). To identify the model, the means of all random effects are set to 0.

Figure 2 depicts the EMMRIRM, Equation (9). In the figure, the arrows from covariates to latent variables denote regression coefficients. At Level 2, a dimension-specific residual item difficulty, ε_i^(2) = [ε_{i1}, …, ε_{id}, …, ε_{iD}]′, is considered as random over items within a dimension, and a dimension-specific residual ability, ε_j^(2) = [ε_{j1}, …, ε_{jd}, …, ε_{jD}]′, is also considered as random over persons.

3.3. Parameter Estimation

The MMRIRM and EMMRIRM have both crossed and nested random effects for binary responses. Maximum likelihood estimation of models for binary responses is challenging because the marginal likelihood does not have a closed form and therefore requires numerical or Monte Carlo integration. The computational burden is low in cases where random effects and their integrals are nested (e.g., Rabe-Hesketh, Skrondal, & Pickles, 2005), but this is not the case for crossed random effects, which require evaluation of high-dimensional integrals. Three solutions exist for crossed random effects models with binary responses: (1) approximation of the integral with numerical integration techniques such as (adaptive) Gauss–Hermite quadrature and Monte Carlo integration, (2) approximation of the integrand, and (3) simulation-based methods such as Markov chain Monte Carlo (MCMC). An overview of these three estimation methods has been provided by Cho and Rabe-Hesketh (2011) and Cho, Partchev, and De Boeck (2012). Explanatory item response models are often presented as generalized linear and nonlinear mixed-effects models (De Boeck & Wilson, 2004).
The MMRIRM and EMMRIRM are also special cases of generalized linear mixed-effects models. All models in this study are fit with the lmer function in the R lme4 library (Bates, Maechler, & Bolker, 2011), which has been used for linear and generalized linear mixed-effects models. It is flexible software that can be used to estimate parameters of various Rasch-family item response models, including those with crossed random effects (see De Boeck, Bakker, Zwitser, Nivard, Hofman, Tuerlinckx, & Partchev, 2011). The lmer function is based on the Laplace approximation as an approximation for the integrand and is, therefore, computationally efficient for high-dimensional item response models compared to approximations for the integral and simulation-based methods such as MCMC (Cho et al., 2012). The conditional modes of the random effects will be extracted using the R extractor function ranef.

[FIGURE 2. Diagram for explanatory multidimensional multilevel random item response model.]

4. Method

4.1. Data Description

4.1.1. Samples  In this study, 172 participants (98 seventh graders, 74 eighth graders, 83 males) from a suburban middle school had complete data. The school had 11% minority students and 5% of students receiving support through the free and reduced-price lunch program (Tennessee Department of Education Report Card, 2011). These data were also analyzed and discussed substantively in a paper by Goodwin, Gilbert, and Cho (2013).

4.1.2. Measures  In this study, three components constitute the dimensions of lexical representations: decoding, spelling, and meaning. The first source of information related to lexical representations is the ability to decode the word from text. For example, when a reader sees the word stratagem within a text, does it bring to mind a phonological representation such that the reader can pronounce stratagem? The second source of information related to lexical representations is spelling.
Spelling involves translating a phonological representation of a word (i.e., the pronunciation of stratagem) into an orthographic representation of the word (i.e., the correct spelling of the word stratagem). The third source of information is self-perception of knowing the meaning of the word, which implies knowing the word's definition, context, and connotation within one's lexicon that can be used to understand what meaning is being communicated by that word (i.e., what meaning is being communicated by stratagem). The three subtasks were formulated to isolate the three components.

Persons were asked to do the three different tasks with 39 stimulus words (i.e., item groups). The stimulus words were morphologically complex words such that each word contained a root word and at least one affix. Words that are derived from root words are dubbed "derived words." After piloting derived words from Carlisle and Katz (2006), we adapted the word list because of ceiling effects. In our adaptation, we added words from middle school texts that were less frequent and, therefore, less likely to produce ceiling effects. The list of these stimulus words is shown in the Appendix. The 39 words were presented in 3 different tasks, which equals 117 unique items (39 stimulus words × 3 tasks).

Decoding of the Derived Word (DD)  Participants were shown each of the 39 derived words separately. They were asked to read each word aloud into a tape recorder. Correct (score of 1)/incorrect (score of 0) scores were obtained by a research staff member listening to the audio files. Cronbach's alpha for these 39 items was 0.85.

Spelling of the Derived Word (SD)  A research staff member read each of the 39 words aloud to a group of participants. For words that had a homophone (e.g., precedent), a sentence was used to clarify the meaning of the target word.
Participants wrote their spellings on a provided answer sheet, and each spelling was scored as correct (score of 1) or incorrect (score of 0). Cronbach's alpha for these 39 items was 0.87. Derived words were spelled before they were read to minimize the possibility that students were primed with visual forms of the words prior to reading them. Measures were administered within a short amount of time in order to assess students' knowledge about a word at a single point in time.

Meaning of the Derived Word (MD) (i.e., Self-Perception)  A research staff member read each of the 39 words aloud to a group of participants. After each word was read, participants were asked to rate their knowledge of the word by marking a box associated with no, some, or full knowledge. Dichotomous scoring was applied such that no knowledge was given a value of 0 and some or full knowledge was given a value of 1. Cronbach's alpha for these 39 items was 0.87. Due to limits in testing time, self-perception of knowledge was used as a proxy for knowledge of the word's meaning. Administration of spelling and decoding items was separated by at least one week, but items were administered within a short period of time in order to gain a snapshot of word knowledge at a specific point in time (Tyler & Nagy, 1989).

4.1.3. Covariates  Descriptive statistics of the person-by-word covariates, person covariates, and word (i.e., item-group) covariates are shown in Table 1. Person-by-item covariates vary across both persons and items, person covariates vary across persons but not across items, and item covariates vary across items but not across persons.

4.1.4. Person-by-Word Covariates  The 39 root words that corresponded to the 39 derived words were presented in 3 different tasks. These measures were presented after the derived-word tasks. We acknowledge the concern that derived-word tasks could potentially prime root-word tasks.
However, the developmental and instructional progression of word reading is that simple words (e.g., root words) are taught and learned before complex words (e.g., derived words); therefore, priming was minimized by presenting derived-word tasks prior to root-word tasks rather than the reverse.

Decoding of the Root Word (DR)  Procedures for decoding of the root are the same as the procedures for decoding of the derived word. Cronbach's alpha for 34 items (5 items had no variability) was 0.76.

TABLE 1.
Descriptive statistics of measures and covariates.

                                       Mean      SD     Min    Max   Scale
Measures
  Decoding Derived [DD]                0.75    0.43       0      1   Dichotomous
  Spelling Derived [SD]                0.62    0.48       0      1   Dichotomous
  Meaning Derived [MD]                 0.74    0.44       0      1   Dichotomous
Person-by-word covariates
  Decoding Root [DR]                   0.91    0.29       0      1   Dichotomous
  Spelling Root [SR]                   0.81    0.40       0      1   Dichotomous
  Meaning Root [MER]                   0.89    0.31       0      1   Dichotomous
  Morphology Root [MOR]                0.65    0.48       0      1   Dichotomous
Person covariates
  Morphological Awareness [MA]        58.39    9.60      19     69   Raw score
  Reading Comprehension [RC]         563.06   36.10     468    660   Extended scale score
  Vocabulary [V]                     561.51   33.79     458    668   Extended scale score
Word covariates
  Consistency [CON]                    0.50    0.15    0.27   0.80   Proportion
  Frequency Derived [FD]              37.80    8.83   22.10  53.00   Log
  Frequency Root [FR]                 45.51   10.35   20.80  65.70   Log
  Num. of Morphemes [NM]               2.79    0.73       2      5   Raw score
  Orth Phon Opaqueness [OPO]           0.26    0.44       0      1   Dichotomous
  Phon Opaqueness [PO]                 0.46    0.51       0      1   Dichotomous
  Semantic Opaqueness [SEM]            0.31    0.47       0      1   Dichotomous

Note. Orth: Orthographic; Phon: Phonological.

Spelling of the Root Word (SR)  Procedures for spelling of the root are the same as the procedures for spelling of the derived word. Root words were spelled after the derived words to minimize priming. Cronbach's alpha for 38 items (1 item had no variability) was 0.85.
Meaning of the Root Word (MER) (i.e., Self Perception) Procedures for meaning of the root are the same as procedures for meaning of the derived. Again, this task was completed after the derived-word task to minimize priming. Cronbach’s alpha for 34 items (5 items had no variability) was 0.80. Morphology of the Root Word (MOR) A research staff member read each of the 39 words aloud to a group of participants and asked participants to write down as many morphologically related words as possible. Students were provided with the example of forget and told that morphologically related words would be words that shared the root and its meaning, such as forgetful, forgetting, forgettable, etc., but not words that just had similar meaning and, therefore, not words or phrases such as not remember, overlook, or put out of your mind. Responses with no related words were scored as 0; responses with one or more related words were scored as 1. Accuracy of spelling was not factored in and, therefore, any related words that were phonologically possible were scored as correct. Cronbach’s alpha for these 39 items was 0.97. 4.1.5. Person Covariates Morphological Awareness (MA) Morphological awareness scores comprised 70 items from three tasks adapted from the literature: Test of Knowledge of Derivational Relationships (Mahony, 1994), Derivational Suffix Test (Singson, Mahony, & Mann, 2000), and Syncat-real Test, also known as the Real Word Derivational Suffix Task (Mahony, 1994). A research staff member read aloud directions and examples, and participants were asked to complete the written tasks silently at their own pace. The staff member read items aloud to participants if participants requested such support, in order to minimize confounds with decoding, though the actual number of requests was minimal. Items were scored as correct (1) or incorrect (0). Cronbach’s alpha was 0.92.
The first task included 25 of the 42 original items from the Test of Knowledge of Derivational Relationships from Mahony (1994). Participants were asked whether pairs of words were morphologically related. The second task included 20 items from the pseudoword written section of the Derivational Suffix Test from Singson et al. (2000). Each item was a nonsense root with real suffixes. Participants were asked to choose the most appropriate pseudoword to complete the given sentence. The third task included 25 items from Mahony’s (1994) Syncat-real test. Participants were asked to choose the best word to complete the sentence, with answer choices containing four derivations of the same stem. Reading Comprehension (RC) Reading comprehension was assessed with the Gates–MacGinitie Standardized Test of Reading Comprehension, Form S of Levels 7–9 (MacGinitie, MacGinitie, Maria, & Dreyer, 2000). This test consisted of 11 passages with a total of 48 multiple-choice comprehension questions. Extended scale scores were used in this analysis. Extended scale scores provide a continuous scale that can be interpreted across test forms and age groups. Test developers report that the median performance for both the Gates Comprehension and Vocabulary tests is 550 for seventh-grade students and 575 for eighth-grade students; thus, our sample was average-performing for their grade range, although variability was evident from each standard deviation. Vocabulary Knowledge (V) Vocabulary knowledge was assessed with the Gates–MacGinitie Standardized Test of Reading Vocabulary, a multiple-choice test (MacGinitie et al., 2000). Participants silently read an underlined word within a phrase and then chose the word or phrase that means most nearly the same. Extended scale scores were used in this analysis. 4.1.6.
Word Covariates Consistency (CON) Consistency is a measure of how well the pronunciation of a word matches the pronunciation of other similarly spelled words in the English language. Consistency of derived words was established by first parsing each word into individual rime units (the vowel and following consonants, if any, in a syllable). For each rime unit in the word, we calculated the percentage of rimes in the CELEX database (Baayen, Piepenbrock, & van Rijn, 1993) that had the same pronunciation as in the derived word as opposed to a different pronunciation. Then we averaged the consistency ratings across all rimes in the word. For each phonogram (i.e., -er in the first syllable of the target word thermosphere), a proportion was created to represent the rate of other words in which the phonogram is pronounced /ɜ/, as in thermosphere, versus all words in which it occurs but is pronounced as anything besides /ɜ/, as in the word sergeant. Frequency of Derived Word (FD) The Educator’s Word Frequency Guide (Zeno, Ivens, Millard, & Duvvuri, 1995) was used to code the frequency of the derived word; this database consists of a corpus of 60,527 sample texts. Frequency was reported as the standard frequency index (SFI), which is a logarithmic transformation of U (the frequency of the type per million tokens weighted by how widely dispersed [D] a word is across subject areas). Frequency of Root Word (FR) Root-word frequency was established in the same manner as FD. Number of Morphemes (NM) The number of morphemes was coded for each derived word based on the English Lexicon Project (Balota, Yap, Cortese, Hutchison, Kessler, Loftis, Neely, Nelson, Simpson, & Treiman, 2007, http://elexicon.wustl.edu/) to represent the number of meaning units contained in each item. For example, telegraph was coded as two morphemes (tele + graph), whereas unquestionably was coded as five morphemes (un + quest + tion + able + ly).
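The rime-level consistency measure described above averages, over a word’s rime units, the proportion of corpus words that share the target pronunciation. A minimal sketch, with invented counts standing in for the CELEX tallies:

```python
# Toy sketch of the rime-consistency measure described above.
# The counts are invented for illustration; the study used the CELEX database.

def rime_consistency(counts_same: list[int], counts_total: list[int]) -> float:
    """Average, over a word's rime units, of the proportion of corpus words
    that pronounce each rime the same way as the target word."""
    proportions = [same / total for same, total in zip(counts_same, counts_total)]
    return sum(proportions) / len(proportions)

# Hypothetical two-syllable word: the first rime matches in 40 of 100 corpus
# words, the second in 90 of 100.
print(round(rime_consistency([40, 90], [100, 100]), 2))  # -> 0.65
```

The resulting proportions are on the same 0–1 scale as the CON covariate in Table 1 (observed range 0.27–0.80).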
Orthographic/Phonological Opaqueness (OPO) Two dummy variables were created to compare transparent words to words that are orthographically and phonologically opaque (OPO) or only phonologically opaque (PO). Transparent derived words have no orthographic or phonological change from their root word (e.g., graph: telegraph); they made up the reference group and were given values of 0 in OPO and PO. Derived words with both a phonological and an orthographic change from their root word were given a value of 1 in OPO and 0 in PO. Following the suggestion of McCutchen, Logan, and Biangardi-Orpe (2009), we considered items with deletion of the final e (e.g., convene: convention) as orthographically stable because of the high degree of orthographic overlap in the root words and the regularity of this rule in English. Derived words with only a phonological change from their root word were given a value of 0 in OPO (e.g., verify: veritable) and 1 in PO (e.g., reside: residence). Semantic Opaqueness (SEM) A dummy variable was created to classify derived words as semantically opaque or transparent. Following Nagy and Anderson (1984), each word was coded on a 6-point scale; words with scores of 0–2 were classified as semantically transparent (0) and words with scores of 3–5 were classified as semantically opaque (1). Derived words in the latter category had meanings that could not be inferred from the meaning of their root words with minimal or reasonable help. A more detailed data description can be found in Goodwin, Gilbert, Cho, and Kearns (2012).

TABLE 2.
Item structure for three confirmatory models and model selection.

Task        Model 1D    Model 2D      Model 3D
Decoding    1           1  0          1  0  0
Spelling    1           1  0          0  1  0
Meaning     1           0  1          0  0  1

Model selection       1D         2D         3D
No. of parameters     6          9          13
Log-likelihood       −7876.4    −7692.2    −7654.8
AIC                  15765      15402      15336
BIC                  15812      15473      15438

4.2.
Confirmatory Measurement Models and Model Selection Three hypotheses were considered using confirmatory models. Perfetti’s (2007) lexical quality hypothesis suggests that multiple person dimensions of word knowledge comprise lexical representations. In our study, lexical representations were represented by three person dimensions of word knowledge: decoding, spelling, and meaning knowledge. First, we explored a model where these dimensions of knowledge represented a single person dimension (i.e., lexical representation). We hypothesized this single dimension because research suggests interdependence among decoding, spelling, and meaning word knowledge (Harm & Seidenberg, 2004; Perfetti, 2007; Vellutino et al., 2007). Next, we explored a two-dimensional model where lexical representations were represented by (decoding + spelling) and meaning. We hypothesized this model due to research supporting the self-teaching hypothesis, which shows evidence for orthographic (i.e., spelling) representations boosting phonological (i.e., decoding) representations (Bowey & Muller, 2005; Cunningham, Perry, Stanovich, & Share, 2002; Nation, Angell, & Castles, 2007). This research suggests that decoding and spelling may be more highly related to each other than to meaning, hence the two-dimensional model that merges decoding with spelling. Lastly, we hypothesized that lexical representations consist of three separate yet correlated person dimensions of knowledge because participants were likely to have different profiles of knowledge for different words. For example, a participant might know the meaning of a word, but not its spelling or decoding. These person dimensions of knowledge are distinct, so we explored a three-dimensional model that acknowledged these distinctions. The data structure of the one-, two-, and three-person-dimensional models (labeled Model 1D, Model 2D, and Model 3D, respectively) is shown in Table 2.
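Model selection among the three confirmatory models in Table 2 rests on Akaike’s AIC = 2k − 2 log L and Schwarz’s BIC = k log(n) − 2 log L. Taking n as the total number of item responses (172 persons × 117 items, assuming complete data, which is our assumption here), the tabled values can be reproduced to within rounding of the reported log-likelihoods:

```python
import math

def aic(k: int, loglik: float) -> float:
    """Akaike's (1974) information criterion."""
    return 2 * k - 2 * loglik

def bic(k: int, loglik: float, n: int) -> float:
    """Schwarz's (1978) Bayesian information criterion."""
    return k * math.log(n) - 2 * loglik

n = 172 * 117  # total item responses (assumes complete data)
models = {"1D": (6, -7876.4), "2D": (9, -7692.2), "3D": (13, -7654.8)}
for name, (k, ll) in models.items():
    print(name, round(aic(k, ll)), round(bic(k, ll, n)))
```

Both criteria are smallest for Model 3D, matching the selection reported in the text.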
The items of a specific task are coded as “1” if they are restricted to measure a person dimension and as “0” otherwise in Table 2. Model selection among the three confirmatory models was done for the MMRIRM (the unconditional model without any covariates) using information criteria, Akaike’s (1974) information criterion (AIC) and Schwarz’s (1978) Bayesian information criterion (BIC). After selection of the “best” MMRIRM, explanatory models, EMMRIRMs, are considered by adding covariates to the MMRIRM. Table 2 also presents the model fit results. In the three confirmatory models, the dimension-specific average parameters across persons and items and the correlations between person dimensions were estimated.2 Model 3D shows the best model fit based on both AIC and BIC as indices of overall model fit. Using standardized residuals based on results of Model 3D, item fit and person fit were investigated regarding how well Model 3D explains the responses to a particular item and person, respectively. There are 2 items out of 117 with standardized residuals less than −2 or greater than 2 across persons (item fit), and 11 persons out of 172 with standardized residuals less than −2 or greater than 2 across items (person fit). These item and person fit results indicate that Model 3D explains the responses relatively well. Accordingly, Model 3D is chosen as the measurement model (an unconditional model); as such, covariates are added to it in structural models.

2 The number of parameters in Model 1D is 6, including 3 variances of item dimensions, 1 variance of the item-group dimension, 1 average logit value across persons and items, and 1 variance of the person dimension; the number of parameters in Model 2D is 9, including 3 variances of item dimensions, 1 variance of the item-group dimension, 2 average logit values across persons and items, 2 variances of person dimensions, and 1 correlation between person dimensions; the number of parameters in Model 3D is 13, including 3 variances of item dimensions, 1 variance of the item-group dimension, 3 average logit values across persons and items, 3 variances of person dimensions, and 3 correlations between person dimensions.

4.3.
Analysis Continuous person covariates were grand-mean centered with respect to persons, and continuous item covariates were grand-mean centered with respect to items, before entry into structural models. Dichotomous covariates were coded as dummy variables. First, the null model without any covariates (abbreviated “Model 3D-0” throughout the paper) was fitted for research question 1. Next, 12 person-by-item covariates (3 dimensions × 4 person-by-word covariates) were added to Model 3D-0 (abbreviated “Model 3D-1”) to investigate the patterns of significance without person and item covariates. Finally, 9 person covariates (3 dimensions × 3 person covariates) and 21 item covariates (3 dimensions × 7 word covariates) were added to Model 3D-1 (abbreviated “Model 3D-2”) for research questions 2 and 3. Code is available from the first author upon request. 5. Results Results for the fixed effects in the three models are reported in Table 3, and those for the population parameters of the random effects are reported in Table 4. In Table 3, labels of covariates were created as follows: DD·∗ is a covariate for the decoding dimension, SD·∗ is a covariate for the spelling dimension, and MD·∗ is a covariate for the meaning dimension, where ∗ is the covariate name described in Table 1. 5.1. Consequences of Ignoring Multilevel Item Structure Based on the results of Model 3D-0, latent intraclass correlations, ρ(IG) and ρ(I), were calculated: ρ(IG) = 2.93/((1.63 + 0.95 + 0.33) + 2.93 + π²/3) = 0.321 and ρ(I) = ((1.63 + 0.95 + 0.33) + 2.93)/((1.63 + 0.95 + 0.33) + 2.93 + π²/3) = 0.640. These results indicate that latent item responses are more strongly correlated for items within the same item group than for items in different item groups, and are most strongly correlated for responses to the same item (ρ(I) > ρ(IG)).
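The latent intraclass correlations above follow directly from the Model 3D-0 variance components, with π²/3 as the residual variance of the underlying logistic latent response. A quick check in Python (the study itself used R):

```python
import math

# Model 3D-0 variance components, as reported in the text:
# item-level variances for the decoding, spelling, and meaning dimensions,
# plus the item-group-level variance.
sigma_items = 1.63 + 0.95 + 0.33   # summed item-level variances
sigma_group = 2.93                 # item-group-level variance
resid = math.pi ** 2 / 3           # logistic residual variance

total = sigma_items + sigma_group + resid
rho_IG = sigma_group / total                  # different items, same item group
rho_I = (sigma_items + sigma_group) / total   # repeated responses to the same item

print(round(rho_IG, 3), round(rho_I, 3))  # -> 0.321 0.64
```

Any nonzero rho_IG signals item-group dependency that a single-level item model would ignore.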
Furthermore, 32.1 % of the total item difficulty variance is explained by item groups, implying that there is non-ignorable dependency in latent item responses among items nested within item groups. Model 3D-0 fits the data better than Model 3D-0 without the group-level random item difficulty (βd(3)), based on the information criteria AIC and BIC. Variances of random item difficulties and predicted item difficulties obtained using empirical Bayes prediction were compared between Model 3D-0 without βd(3) and Model 3D-0 to investigate empirically the consequences of ignoring the multilevel item structure. Variances of random item difficulties are larger when the group-level random item difficulty (βd(3)) is ignored: variance estimates without βd(3) are 5.161, 3.871, and 3.383, while those with βd(3) are 1.627, 0.953, and 0.340 for the decoding, spelling, and meaning dimensions, respectively. Pearson correlations between predicted item difficulties from the two models are 0.508, 0.571, and 0.687 for the decoding, spelling, and meaning dimensions, respectively, which indicates that item difficulties can be quite different when the dependency among items is ignored.

TABLE 3.
Estimates and standard errors (SE) of fixed effects within the explanatory multidimensional multilevel random item response model.

                     Model 3D-0             Model 3D-1             Model 3D-2
                     Est.    SE      z      Est.    SE      z      Est.    SE      z
Intercept
  λ1 [DD]            1.827  0.307   5.962   0.667  0.334   1.983   1.317  0.532   2.475
  λ2 [SD]            0.855  0.332   2.575  −0.301  0.362  −0.831   0.942  0.602   1.564
  λ3 [MD]            2.069  0.364   5.685   0.889  0.393   2.260   1.725  0.578   2.984
Persons-by-items
  δ11 [DD·DR]                               0.667  0.130   5.125   0.656  0.129   5.077
  δ21 [DD·SR]                               0.009  0.106   0.084  −0.035  0.106  −0.333
  δ31 [DD·MER]                              0.461  0.133   3.476   0.445  0.131   3.387
  δ41 [DD·MOR]                              0.188  0.093   2.026   0.183  0.093   1.974
  δ12 [SD·DR]                               0.002  0.140   0.013   0.001  0.141   0.007
  δ22 [SD·SR]                               0.472  0.103   4.595   0.454  0.103   4.409
  δ32 [SD·MER]                              0.643  0.138   4.651   0.634  0.138   4.593
  δ42 [SD·MOR]                              0.287  0.088   3.265   0.268  0.088   3.047
  δ13 [MD·DR]                               0.143  0.140   1.024   0.101  0.141   0.712
  δ23 [MD·SR]                               0.031  0.115   0.266   0.052  0.115   0.449
  δ33 [MD·MER]                              0.784  0.143   5.482   0.777  0.144   5.390
  δ43 [MD·MOR]                              0.505  0.100   5.048   0.460  0.101   4.570
Persons
  ϑ11 [DD·MA]                                                      0.058  0.010   6.080
  ϑ21 [DD·RC]                                                      0.005  0.003   1.616
  ϑ31 [DD·V]                                                       0.013  0.003   0.995
  ϑ12 [SD·MA]                                                      0.040  0.011   3.711
  ϑ22 [SD·RC]                                                      0.007  0.003   2.187
  ϑ32 [SD·V]                                                       0.006  0.003   1.871
  ϑ13 [MD·MA]                                                     −0.015  0.015  −1.012
  ϑ23 [MD·RC]                                                      0.010  0.004   2.370
  ϑ33 [MD·V]                                                       0.014  0.004   3.274
Items
  γ11 [DD·CON]                                                     0.027  1.539   0.017
  γ21 [DD·FD]                                                      0.081  0.028   2.892
  γ31 [DD·FR]                                                      0.046  0.023   2.013
  γ41 [DD·NM]                                                     −0.144  0.311  −0.462
  γ51 [DD·OPO]                                                    −0.727  0.623  −1.166
  γ61 [DD·PO]                                                     −1.254  0.584  −2.149
  γ71 [DD·SEM]                                                     0.511  0.504   1.013
  γ12 [SD·CON]                                                    −0.728  1.755  −0.415
  γ22 [SD·FD]                                                      0.089  0.032   2.798
  γ32 [SD·FR]                                                      0.051  0.026   1.924
  γ42 [SD·NM]                                                      0.513  0.356   1.438
  γ52 [SD·OPO]                                                    −1.877  0.712  −2.636
  γ62 [SD·PO]                                                     −1.433  0.666  −2.152
  γ72 [SD·SEM]                                                    −0.232  0.575  −0.403
  γ13 [MD·CON]                                                     0.040  1.644   0.024
  γ23 [MD·FD]                                                      0.184  0.030   6.129
  γ33 [MD·FR]                                                      0.061  0.025   2.501
  γ43 [MD·NM]                                                      0.547  0.333   1.646
  γ53 [MD·OPO]                                                    −1.486  0.664  −2.237
  γ63 [MD·PO]                                                     −1.151  0.628  −1.832
  γ73 [MD·SEM]                                                     0.476  0.540   0.882

5.2. Consequences of Ignoring Random Residuals There were large unexplained variances in item difficulties and abilities even after variability was explained by person-by-item, person, and item covariates. When person and item random residuals are ignored in Model 3D-2, results of Model 3D-2 without residuals (not reported in a table) differed in two ways from those of Model 3D-2, consistent with previous findings (Janssen et al., 2004). First, the effects of person and item covariates were smaller in Model 3D-2 without residuals. Second, the standard errors of these estimates were smaller in Model 3D-2 without residuals than in Model 3D-2. Consequently, significance testing results for the effects of covariates differed between Model 3D-2 without residuals and Model 3D-2: more significant covariate effects were found in Model 3D-2 without residuals than in Model 3D-2. 5.3. Answers to Research Questions Research Question 1. The fixed effects from Model 3D-0 in Table 3 indicated that, for the ‘average’ person and item, the probability of knowing the meaning of a derived word (1/(1 + exp(−2.069)) = 0.89) was somewhat higher than the probability of decoding the derived word (1/(1 + exp(−1.827)) = 0.86), which in turn was higher than the probability of spelling the derived word (1/(1 + exp(−0.855)) = 0.70). The population parameter estimates of random effects reported in Table 4 suggest that variability was evident for the lexical representation dimensions across persons, items, and item groups.
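The probabilities just reported are inverse-logit transformations of the Model 3D-0 intercepts; a quick check in Python:

```python
import math

def inv_logit(x: float) -> float:
    """Probability corresponding to a logit value."""
    return 1.0 / (1.0 + math.exp(-x))

# Model 3D-0 intercepts for the 'average' person and item (Table 3).
intercepts = {"decoding": 1.827, "spelling": 0.855, "meaning": 2.069}
for dim, lam in intercepts.items():
    print(dim, round(inv_logit(lam), 2))
```

This reproduces 0.86, 0.70, and 0.89 for decoding, spelling, and meaning, respectively.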
The correlations for the person dimensions are also listed in Table 4. Among persons in Model 3D-0, decoding and spelling were correlated at 0.843, decoding and meaning at 0.494, and spelling and meaning at 0.590. For research question 1, these correlations indicated that decoding, spelling, and meaning of derived words are separate, yet correlated, person dimensions of lexical representations. Comparing the variances and correlations of person dimensions in Model 3D-1 and Model 3D-2 (the explanatory models) with those of Model 3D-0 (the measurement model) revealed that individual differences in the three person dimensions were partly explained by dimension-specific person covariates and dimension-specific person-by-item covariates. Residual variances in the person, item, and item group dimensions were still present in Model 3D-1 and Model 3D-2 as unexplained variances.

TABLE 4.
Estimates and standard errors (SE) of population parameters of random effects in the explanatory multidimensional multilevel random item response model.

                                Model 3D-0   Model 3D-1   Model 3D-2
Persons Σθ [Var.-Corr.]
  Var. Decoding                   1.329        1.103        0.511
  Var. Spelling                   1.548        1.253        0.715
  Var. Meaning                    2.016        1.733        1.333
  Corr. Decoding-Spelling         0.843        0.812        0.673
  Corr. Decoding-Meaning          0.494        0.404        0.237
  Corr. Spelling-Meaning          0.590        0.510        0.358
Items
  σβ1 [Variance] (Decoding)       1.627        1.638        0.427
  σβ2 [Variance] (Spelling)       0.953        0.949        0.772
  σβ3 [Variance] (Meaning)        0.340        0.330        0.220
Item groups
  σβ [Variance]                   2.933        2.527        1.432

TABLE 5.
Item difficulties from Model 3D-0.

                        Item level                                          Item-group level
                   Decoding         Spelling         Meaning
Derived word       β       Rank     β       Rank     β       Rank          β       Rank
benefactor        −0.932   39       1.196    5       1.563    5           −1.554   32
thermosphere      −0.754   38       0.385   13       1.886    2           −1.932   35
stratagem         −0.516   37       0.855    7       1.701    4            1.244   10
financially       −0.430   36       1.319    2      −0.659   29           −0.841   28
biometric         −0.239   35      −0.946   35       1.855    3           −1.630   33
phonetic          −0.235   34       0.711    8       0.815   11            1.628    9
aquascape         −0.211   33      −0.835   34       2.262    1           −0.312   22
telegraph         −0.208   32       0.188   16      −0.475   27           −2.072   37
distinguish       −0.206   31       0.673    9      −1.091   33           −1.677   34
irrelevant        −0.178   30       1.298    3      −1.491   36           −0.228   20
circumscribe      −0.177   29      −0.305   25       1.418    6            0.088   15
significance      −0.160   28       1.339    1      −1.800   38           −0.505   24
dormant           −0.146   27       0.408   12      −0.412   25           −0.747   26
expeditious       −0.142   26       0.549   10       0.130   18            0.695   13
veritable         −0.116   25       0.454   11       0.307   14            0.945   12
residence         −0.102   24       0.986    6      −1.840   39           −1.166   31
economical        −0.027   23      −0.234   23      −0.569   28           −1.975   36
usurpatory        −0.018   22       0.045   18       1.241    7            2.221    6
irascible          0.002   21       1.284    4      −0.093   21            3.803    1
tranquility        0.034   20      −0.291   24      −0.203   23           −0.970   29
perception         0.037   19      −0.505   28       0.561   12           −0.227   19
migratory          0.040   18      −0.968   36       1.043    9           −0.749   27
dictator           0.056   17      −0.155   21      −1.406   35           −2.532   39
convention         0.066   16      −0.561   29       0.021   19           −1.121   30
disingenuous       0.095   15       0.212   15       0.480   13            2.337    5
discretionary      0.102   14      −0.045   20       0.210   15            1.119   11
sagacity           0.114   13      −0.182   22       1.141    8            2.482    4
diagnostician      0.202   12       0.249   14      −0.448   26            1.702    8
disenchantment     0.244   11      −1.414   38       0.969   10           −0.497   23
reinstate          0.248   10      −0.734   31      −0.025   20           −0.157   18
extremity          0.256    9      −0.442   27      −0.127   22            0.625   14
restriction        0.283    8      −0.800   33      −1.153   34           −2.101   38
covetousness       0.315    7       0.032   19       0.189   16            3.156    2
precedent          0.320    6       0.069   17      −0.685   30            1.743    7
hypersensitivity   0.328    5      −0.392   26      −0.941   31           −0.070   17
unquestionably     0.419    4      −0.768   32      −1.084   32           −0.703   25
malignant          0.467    3      −0.587   30       0.147   17            2.492    3
composite          0.723    2      −1.005   37      −1.786   37           −0.065   16
nationalistic      0.760    1      −2.089   39      −0.238   24           −0.293   21

For descriptive purposes, predicted values of item difficulties were obtained from Model 3D-0 and are reported in Table 5. Table 5 shows that words varied in their difficulty for each of the three item dimensions, controlling for the other dimensions. Item-group-level difficulty can be interpreted as the word difficulty after accounting for the dimension-specific item difficulties, i.e., the decoding, spelling, and meaning difficulties, on the item side. Of the decoding items, controlling for spelling and meaning, the easiest were benefactor and thermosphere whereas the most difficult were nationalistic and composite. The easiest spelling items, controlling for decoding and meaning, were nationalistic and disenchantment and the hardest were significance and irrelevant. Also, the easiest items to define, controlling for decoding and spelling, were residence and significance and the hardest were aquascape and thermosphere. Finally, after taking into account the difficulties of decoding, spelling, and meaning, the easiest words (item groups) were dictator and restriction whereas the most difficult were irascible and covetousness. The word nationalistic is the epitome of item variability: it is the most difficult word to read (rank = 1), the easiest to spell (rank = 39), and moderately difficult to define (rank = 24). After taking into account the difficulty of decoding, spelling, and defining nationalistic, the word nationalistic was a word of average difficulty (rank = 21). Substantively, the average ranking of nationalistic is intuitive when considering that it has approximately average scores on the item-group features (e.g., consistency, frequency, number of morphemes), which we expect to explain item-group dependencies. Research Question 2.
Because we were interested in the effects of various aspects of root-word knowledge on dimensions of lexical representations, we added dimension-specific root variables to the model. Model 3D-1 was fitted without the effects of person and item covariates, whereas Model 3D-2 included all effects. Comparing the root effects across the models in Table 3 revealed that although the estimates differed somewhat between the two models, the same pattern of significance emerged. Coefficient interpretation is based on Model 3D-2 (see the third column of Table 3 for results). Controlling for overall knowledge of lexical representations and the other root effects, root decoding was significantly associated only with derived decoding (δ11 = 0.656, z = 5.077, p < 0.001), not derived spelling (δ12 = 0.001, z = 0.007, p = 0.995) or meaning (δ13 = 0.101, z = 0.712, p = 0.476). Similarly, controlling for all the other variables, root spelling was significantly associated only with its counterpart, derived spelling (δ22 = 0.454, z = 4.409, p < 0.001), not decoding (δ21 = −0.035, z = −0.333, p = 0.739) or meaning (δ23 = 0.052, z = 0.449, p = 0.653). On the other hand, root meaning and morphology were significantly associated with all three lexical representation dimensions: derived decoding (δ31 = 0.445, z = 3.387, p < 0.001 and δ41 = 0.183, z = 1.974, p = 0.048, respectively), derived spelling (δ32 = 0.634, z = 4.593, p < 0.001 and δ42 = 0.268, z = 3.047, p = 0.002, respectively), and derived meaning (δ33 = 0.777, z = 5.390, p < 0.001 and δ43 = 0.460, z = 4.570, p < 0.001, respectively). Research Question 3. To address our research question regarding effects of person and item group characteristics on lexical representations, we added dimension-specific person and item group effects to the model and fit Model 3D-2. See the third column of Table 3 for results.
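The reported z statistics are Wald tests (estimate divided by standard error) with two-sided normal p-values. A sketch for δ41 [DD·MOR]; small discrepancies from the tabled z = 1.974 and p = 0.048 reflect rounding of the reported estimate and SE:

```python
import math

def wald_test(est: float, se: float) -> tuple[float, float]:
    """Wald z statistic and two-sided p-value under a standard normal."""
    z = est / se
    # Normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# delta_41 [DD.MOR] from Model 3D-2 (Table 3): estimate 0.183, SE 0.093.
z, p = wald_test(0.183, 0.093)
print(round(z, 2), round(p, 3))  # close to the tabled z = 1.974, p = 0.048
```

The same function applied to δ13 (0.101, SE 0.141) gives a clearly nonsignificant p, consistent with the text.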
Controlling for all other variables in the model, the person characteristic of morphological awareness was significantly associated with derived decoding (ϑ11 = 0.058, z = 6.080, p < 0.001) and spelling (ϑ12 = 0.040, z = 3.711, p < 0.001). Reading comprehension was significantly related to derived spelling (ϑ22 = 0.007, z = 2.187, p = 0.029) and meaning (ϑ23 = 0.010, z = 2.370, p = 0.018), controlling for all other effects, whereas vocabulary knowledge was related only to derived meaning (ϑ33 = 0.014, z = 3.274, p = 0.001). Controlling for the other variables in the model, three item characteristics were significantly related to derived decoding: frequency of the derived word (γ21 = 0.081, z = 2.892, p = 0.004), frequency of the root word (γ31 = 0.046, z = 2.013, p = 0.044), and phonological opaqueness (γ61 = −1.254, z = −2.149, p = 0.032). Frequency of the derived word and phonological opaqueness were also significantly related to derived spelling (γ22 = 0.089, z = 2.798, p = 0.005, and γ62 = −1.433, z = −2.152, p = 0.031, respectively), in addition to orthographic-phonological opaqueness (γ52 = −1.877, z = −2.636, p = 0.008). Frequency of the derived word (γ23 = 0.184, z = 6.129, p < 0.001), frequency of the root word (γ33 = 0.061, z = 2.501, p = 0.012), and orthographic-phonological opaqueness (γ53 = −1.486, z = −2.237, p = 0.025) were significantly related to derived meaning, controlling for all other variables in the model. 6. Discussion In this study, multidimensional item response models were extended to a model with multilevel random item parameters to investigate how aspects of morphological knowledge contribute to lexical representations for different persons, items, and item groups, taking a different approach from most research in reading education. 6.1.
Summary of Empirical Study First, results suggest that lexical representations are best represented by three separate but correlated dimensions of decoding, spelling, and meaning rather than two (decoding + spelling and meaning) or one dimension (decoding + spelling + meaning). Second, root-word (e.g., isolate) knowledge differentially contributes to the decoding, spelling, and meaning represented within the lexical representation of a related derived word (e.g., isolation). Controlling for other covariates in the model, knowledge of root-word decoding, meaning, and morphological relatives were significant covariates of derived-word decoding. Knowledge of root-word spelling, meaning, and morphological relatives was significantly related to derived-word spelling. Knowledge of root-word meaning and morphological relatives was significantly related to derived-word meaning. Our third finding is that variability in lexical representations that remains after controlling for root-word knowledge can partly be explained by person and item characteristics. Person characteristics of reading comprehension, morphological awareness, and vocabulary knowledge, and item-group features of frequency of the derived and root word, orthographic-phonological opaqueness, and phonological opaqueness, were significantly related to knowledge of lexical representations, although different patterns of significant relations emerged for the separate dimensions. A more in-depth discussion of the substantive results can be found in Goodwin et al. (2012). 6.2. Methodological Discussions The main extensions of the multidimensional item response models presented in this paper were to allow random difficulty parameters at both the item and item group levels. The purpose of having random item parameters differed by level: item-level random item parameters were used to model unexplained variance left over when item-related covariates were used to explain variation in item difficulties.
Group-level random item parameters were employed to model dependency in item responses among items having the same item stem. Consequences of ignoring the random residuals and the multilevel item structure were shown. Several methodological justifications for using the EMMRIRM presented in this study are described below. Measurement Model To check whether there were large variations in item discriminations among items in our empirical data, a 2-parameter item response model with random item difficulty and discrimination parameters was fit using a hierarchical Bayesian approach. Normal distributions on item parameters were used as priors. Half-t priors on standard deviations, recommended for small sample sizes (Gelman & Hill, 2007, p. 428), were used as hyperpriors. The mean and variance of item discriminations were 1.353 and 0.009, respectively, while those of item difficulties were −1.305 and 3.773, respectively. The small variance of item discriminations indicated that the quality of items in discriminating among people did not differ substantially across items, and that there was little variability in item discriminations to be explained by person-by-item and item covariates. Thus, a model without discriminations was chosen as the measurement model in the explanatory item response models. Omission of Covariates The effects of covariates and the population parameters of the random effects were estimated simultaneously in the explanatory item response models. Even though the measurement error of the estimated parameters can be taken into account in estimating the effects of covariates, parameter estimates can be affected by the inclusion of covariates (Hickendorff, Heiser, van Putten, & Verhelst, 2009; Verhelst & Verstralen, 2002).
In our illustration, there were person covariates and person-by-item covariates to explain a multidimensional person parameter, and dimension-specific item covariates and person-by-item covariates to explain a multidimensional item parameter. Because we allow multidimensional residuals for both persons and items in the model, the omission of additional covariates does not compromise the prediction of individual person and item parameters. However, significance testing results for the effects of covariates can differ depending on the sets of covariates that are considered. Possible alternative models with different sets of covariates were fitted; however, interpretations of the effects of covariates were similar among the alternative models we considered. Downward Bias in Using Laplace Approximation The Laplace approximation implemented in the R lmer function was used to estimate model parameters. The Laplace approximation can perform poorly for dichotomous responses with small cluster sizes (e.g., the number of items), with a downward bias in the estimated variance components (Cho & Rabe-Hesketh, 2011; Goldstein & Rasbash, 1996; Raudenbush, Yang, & Yosef, 2000; Rodríguez & Goldman, 1995). The downward bias in variance components can lead to overestimated correlations (Cho et al., 2012). However, our empirical data set has a large number of items and item groups (117 items and 39 item groups), so downward bias is not of much concern, given the results of previous simulation studies (Cho & Rabe-Hesketh, 2011; Cho et al., 2012). Distributional Assumption on Item Difficulty Parameters In the R lmer function, all random effects for persons and items are assumed to follow a normal or multivariate normal distribution. It is usual to assume person parameters to be from a normal or multivariate normal distribution, as is common in marginal maximum likelihood estimation for item response models.
However, random item parameters are not commonly used in most IRT applications. A normal distribution has been used for item difficulties in random item response model applications under various estimation methods (Geerlings, Glas, & van der Linden, 2011; Glas & van der Linden, 2003; Sinharay et al., 2003). It has been shown that item difficulties can be predicted precisely using empirical Bayes prediction under the normality assumption even when the "true" population distribution departs from a normal distribution (Hofman & De Boeck, 2010). In addition, it has been shown that item difficulties obtained under a normal approximation are similar to those obtained under a discrete approximation used as a nonparametric approach (Cho & Rabe-Hesketh, 2011).

Multidimensional Test Design

We described the EMMRIRMs for our specific test design, a between-item design in which an item is allowed to load on only one dimension. A within-item design, in which an item is allowed to load on more than one dimension, is not applicable to our empirical data set. For example, an item such as "Please read the word nationalistic" would not also load onto spelling nationalistic or saying whether you know what nationalistic means. When a test design is a within-item design, the model description, the model identification constraints, the parameter interpretation, and the calculation of intraclass correlations for items and/or item groups will differ from those presented in this paper.

The Number of Parameters in Random Item Modeling

Because a multidimensional item response model is a complex model, it requires sufficiently large sample sizes to estimate the model parameters. However, with a random item approach such as the EMMRIRM, the number of item parameters to be estimated is greatly reduced compared with a fixed item approach.
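The between-item design described above can be encoded as a loading (Q) matrix in which each item loads on exactly one dimension. The sketch below builds such a matrix with the dimensions and counts of this application (3 item dimensions, 39 items each); the representation itself is a standard illustration, not code from the paper.

```python
# Between-item design: each of the 117 items loads on exactly one of the
# 3 dimensions (39 items per dimension), matching this application's design.
N_DIMS, ITEMS_PER_DIM = 3, 39

q_matrix = []
for d in range(N_DIMS):
    for _ in range(ITEMS_PER_DIM):
        # One 1 per row marks the single dimension the item measures.
        row = [1 if k == d else 0 for k in range(N_DIMS)]
        q_matrix.append(row)

assert len(q_matrix) == 117
# Between-item design check: every row sums to 1 (one loading per item).
assert all(sum(row) == 1 for row in q_matrix)
# A within-item design would instead allow rows containing more than one 1.
print(len(q_matrix), q_matrix[0], q_matrix[40])
```

The row-sum check is exactly what distinguishes the two designs: relaxing it to allow multiple 1s per row turns the same structure into a within-item design, with the consequences for identification and interpretation noted above.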
For example, for the measurement model of our application (Model 3D), the number of item parameters is 4 in the random item approach, comprising 3 variances of the item dimensions and 1 variance of the item-group dimension, whereas it is 156 in the fixed item approach, comprising 117 (39 × 3) item difficulties and 39 item-group difficulties. For the most complex model, Model 3D-2, the total number of parameters is 55: 3 variances of the 3 item dimensions, 1 variance of the item-group dimension, 3 average logit values across persons and items (intercepts), 3 variances of the 3 person dimensions, 3 correlations between person dimensions, and 42 effects of covariates. Given the 172 participants and the structure of the various covariates in the application, we did not run into any estimation problems, such as convergence failures, in estimating the 55 parameters of Model 3D-2. However, future research using extensive simulation studies is required to investigate the minimum sample sizes needed to estimate the model parameters.

Model Extensions

The EMMRIRMs can be extended to 2-parameter or 3-parameter item response models. Unfortunately, the R lmer function we used in this study cannot be applied to non-Rasch models, although there is a procedure to implement 2-parameter random item response models using the R lmer function (Jeon & Rabe-Hesketh, 2012). Estimating model parameters will be more challenging when all item parameters (i.e., difficulty, discrimination, and guessing) are random; in this case, maximum likelihood estimation would involve high-dimensional integration. On the other hand, it may be easier to use a hierarchical Bayesian method with MCMC for complex random item response models (e.g., Geerlings et al., 2011). However, the use of priors and hyperpriors on random item discriminations is not yet established in the psychometric literature (see Cho & Rabe-Hesketh, 2012, for a survey of the use of priors and hyperpriors on a random item discrimination).
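The parameter counts reported above for the random and fixed item approaches reduce to simple bookkeeping. The tally below uses only the quantities stated in this section and is just an arithmetic check, not part of the original analysis.

```python
# Item parameters, random item approach (Model 3D):
random_item = 3 + 1          # 3 item-dimension variances + 1 item-group variance

# Item parameters, fixed item approach:
fixed_item = 39 * 3 + 39     # 117 item difficulties + 39 item-group difficulties

# Total parameters of the most complex model, Model 3D-2:
model_3d2 = (3      # variances of the 3 item dimensions
             + 1    # variance of the item-group dimension
             + 3    # intercepts (average logits across persons and items)
             + 3    # variances of the 3 person dimensions
             + 3    # correlations between person dimensions
             + 42)  # effects of covariates

print(random_item, fixed_item, model_3d2)  # → 4 156 55
```

The contrast (4 versus 156 item parameters) is the sense in which the random item approach "greatly reduces" the estimation burden.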
Model Selection

AIC and BIC were used to compare alternative models in the empirical study. The appropriate use of information criteria in latent variable models is an ongoing area of research (e.g., Vaida & Blanchard, 2005). Further study is required to investigate the performance of AIC and BIC when comparing models that differ with respect to the covariates used and the number of dimensions in random item response models having crossed random effects.

6.3. Empirical Study

Implications for Practice

The findings of this study have educational implications. First, the results suggest that students can improve the quality of their lexical representations by developing multiple dimensions of word knowledge. For example, learning how to decode, spell, and determine the meaning of a word will likely support lexical representations. Additionally, developing root-word knowledge would be expected to further support lexical representations, perhaps through morphological instruction that teaches students about root words. Based on the results for item features, person characteristics, and their interactions, students will also need additional support with words that are opaque and less frequent, and readers who struggle with vocabulary, comprehension, and morphological awareness will need additional support in developing representations of higher quality. Overall, educators can help students improve their lexical representations by providing multiple exposures to words that develop multiple dimensions of word and root-word knowledge while also developing more general skills such as vocabulary knowledge, morphological awareness, and reading comprehension. Because this study deepens understanding of lexical representations, educators and researchers can use these findings to design more effective interventions to support the development of high-quality lexical representations, which may, in turn, improve text comprehension.
Appendix: List of Stimulus Words

Num.  Root word   Derived word
 1    benefit     benefactor
 2    sphere      thermosphere
 3    strategy    stratagem
 4    finance     financially
 5    meter       biometric
 6    phone       phonetic
 7    aqua        aquascape
 8    graph       telegraph
 9    distinct    distinguish
10    relative    irrelevant
11    scribe      circumscribe
12    sign        significance
13    dorm        dormant
14    expedite    expeditious
15    verify      veritable
16    reside      residence
17    economy     economical
18    usurp       usurpatory
19    irate       irascible
20    tranquil    tranquility
21    perceive    perception
22    migrate     migratory
23    dictate     dictator
24    convene     convention
25    genuine     disingenuous
26    discrete    discretionary
27    sage        sagacity
28    diagnose    diagnostician
29    enchant     disenchantment
30    state       reinstate
31    extreme     extremity
32    strict      restriction
33    covet       covetousness
34    precede     precedent
35    sense       hypersensitivity
36    quest       unquestionably
37    malign      malignant
38    compose     composite
39    nation      nationalistic

References

Adams, R.J., Wilson, M., & Wu, M. (1997). Multilevel item response models: an approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47–76.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Anglin, J.M. (1993). Vocabulary development: a morphological analysis. Monographs of the Society for Research in Child Development, 58(Serial No. 238), 1–166.
Baayen, R.H., Piepenbrock, R., & Van Rijn, H. (1993). The CELEX lexical data base on CD-ROM. Philadelphia: Linguistic Data Consortium.
Balota, D.A., Yap, M.J., Cortese, M.J., Hutchison, K.A., Kessler, B., Loftis, B., Neely, J.H., Nelson, D.L., Simpson, G.B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39, 445–459.
Bates, D., Maechler, M., & Bolker, B. (2011). lme4: linear mixed-effects models using S4 classes. R package version 0.999375-39. http://CRAN.R-project.org/package=lme4.
Bejar, I.I. (1993).
A generative approach to psychological and educational measurement. In N. Frederiksen, R.J. Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 323–359). Hillsdale: Erlbaum.
Bowey, J.A., & Muller, D. (2005). Phonological recoding and rapid orthographic learning in third graders' silent reading: a critical test of the self-teaching hypothesis. Journal of Experimental Child Psychology, 92, 203–219.
Carlisle, J.F., & Katz, L.A. (2006). Effects of word and morpheme familiarity on reading of derived words. Reading & Writing, 19, 669–693.
Carlisle, J.F., & Stone, C.A. (2005). Exploring the role of morphemes in word reading. Reading Research Quarterly, 40, 428–449.
Cho, S.-J., De Boeck, P., Embretson, S., & Rabe-Hesketh, S. (2013). Additive item structure models with random residuals: item modeling for explanation and item generation. Unpublished manuscript.
Cho, S.-J., Partchev, I., & De Boeck, P. (2012). Parameter estimation of multiple item profiles models. British Journal of Mathematical & Statistical Psychology, 65, 438–466.
Cho, S.-J., & Rabe-Hesketh, S. (2011). Alternating imputation posterior estimation of models with crossed random effects. Computational Statistics & Data Analysis, 55, 12–25.
Cho, S.-J., & Rabe-Hesketh, S. (2012). Random item discrimination marginal maximum likelihood estimation of item response models. Paper presented at the National Council on Measurement in Education, Vancouver, British Columbia.
Cunningham, A.E., Perry, K.E., Stanovich, K.E., & Share, D.L. (2002). Orthographic learning during reading: examining the role of self-teaching. Journal of Experimental Child Psychology, 82, 185–199.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39, 1–28.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models: a generalized linear and nonlinear approach. New York: Springer.
Deacon, S.H., & Bryant, P. (2005). What children do and do not know about the spelling of inflections and derivations. Developmental Science, 8, 583–594.
Deacon, S.H., & Bryant, P. (2006). Getting to the root: young writers' sensitivity to the role of root morphemes in the spelling of inflected and derived words. Journal of Child Language, 33, 401–417.
Fischer, G.H. (1973). Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Geerlings, H., Glas, C.A.W., & van der Linden, W.J. (2011). Modeling rule-based item generation. Psychometrika, 76, 337–359.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Glas, C.A.W., & van der Linden, W.J. (2003). Computerized adaptive testing with item cloning. Applied Psychological Measurement, 27, 247–261.
Goldstein, H., & Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A, 159, 505–513.
González, F., De Boeck, P., & Tuerlinckx, F. (2008). A double-structure structural equation model for three-mode data. Psychological Methods, 13, 337–353.
Goodwin, A.P., Gilbert, J.K., & Cho, S.-J. (2013). Morphological contributions to adolescent word reading: an item response approach. Reading Research Quarterly, 48, 39–60.
Goodwin, A.P., Gilbert, J.K., Cho, S.-J., & Kearns, D. (2012). Probing lexical representations: simultaneous modeling of word and reader contributions to multidimensional lexical representations. Unpublished manuscript.
Harm, M.W., & Seidenberg, M.S. (2004). Computing the meanings of words in reading: cooperative division of labor between visual and phonological processes. Psychological Review, 111, 662–720.
Hickendorff, M., Heiser, W.J., van Putten, C.M., & Verhelst, N.D.
(2009). Solution strategies and achievement in Dutch complex arithmetic: latent variable modeling of change. Psychometrika, 74, 331–350.
Hofman, A., & De Boeck, P. (2010). The utility of models with multidimensional random person and random item (stimuli) variables: a recovery study with lmer. Unpublished manuscript, University of Amsterdam.
Janssen, R., & De Boeck, P. (1999). Confirmatory analyses of componential test structure using multidimensional item response theory. Multivariate Behavioral Research, 34, 245–268.
Janssen, R., Schepers, J., & Peres, D. (2004). Models with item and item group predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: a generalized linear and nonlinear approach (pp. 189–212). New York: Springer.
Jeon, M., & Rabe-Hesketh, S. (2012). Profile-likelihood approach for estimating generalized linear mixed models with factor structures. Journal of Educational and Behavioral Statistics, 37, 518–542.
Johnson, M.S., & Sinharay, S. (2005). Calibration of polytomous item families using Bayesian hierarchical modeling. Applied Psychological Measurement, 29, 369–400.
MacGinitie, W.H., MacGinitie, R.K., Maria, K., & Dreyer, L.G. (2000). Gates-MacGinitie reading tests. Itasca: Riverside Publishing.
Mahony, D.L. (1994). Using sensitivity to word structure to explain variance in high school and college level reading ability. Reading & Writing: An Interdisciplinary Journal, 6, 19–44.
McCutchen, D., Logan, B., & Biangardi-Orpe, U. (2009). Making meaning: children's sensitivity to morphological information during word reading. Reading Research Quarterly, 44, 360–376.
Mislevy, R.J. (1988). Exploiting auxiliary information about items in the estimation of Rasch item difficulty parameters. Applied Psychological Measurement, 12, 281–296.
Nagy, W.E., & Anderson, R.C. (1984). How many words are there in printed school English? Reading Research Quarterly, 19, 304–330.
Nation, K., Angell, P., & Castles, A. (2007). Orthographic learning via self-teaching in children learning to read English: effects of exposure, durability, and context. Journal of Experimental Child Psychology, 96, 71–84.
Perfetti, C.A. (2007). Reading ability: lexical quality to comprehension. Scientific Studies of Reading, 11, 357–383.
Perfetti, C.A., & Hart, L. (2001). The lexical basis of comprehension skill. In D.S. Gorfein (Ed.), On the consequences of meaning selection: perspectives on resolving lexical ambiguity. Washington: American Psychological Association.
Perfetti, C.A., & Hart, L. (2002). The lexical quality hypothesis. In L. Verhoeven, C. Elbro, & P. Reitsma (Eds.), Precursors of functional literacy. Amsterdam/Philadelphia: Benjamins.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128, 301–323.
Raudenbush, S.W., Yang, M., & Yosef, M. (2000). Maximum likelihood for generalized linear models with nested random effects via high-order, multivariate Laplace approximation. Journal of Computational and Graphical Statistics, 9, 141–157.
Rodríguez, G., & Elo, I. (2003). Intra-class correlation in random-effects models for binary data. Stata Journal, 3, 32–46.
Rodríguez, G., & Goldman, N. (1995). An assessment of estimation procedures for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A, 158, 73–89.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Singson, M., Mahony, D., & Mann, V. (2000). The relation between reading ability and morphological skills: evidence from derivational suffixes. Reading & Writing: An Interdisciplinary Journal, 12, 219–252.
Sinharay, S., Johnson, M.S., & Williamson, D.M. (2003). Calibrating item families and summarizing the results using family expected response functions.
Journal of Educational and Behavioral Statistics, 28, 295–313.
Tennessee Department of Education (2011). Tennessee Department of Education report card. Retrieved August 28, 2011, from http://edu.reportcard.state.tn.us/pls/apex/f?p=200:1:4463180647001600.
Tyler, A., & Nagy, W. (1989). The acquisition of English derivational morphology. Journal of Memory and Language, 28, 649–667.
Tyler, A., & Nagy, W.E. (1990). Use of derivational morphology during reading. Cognition, 36, 17–34.
Vaida, F., & Blanchard, S. (2005). Conditional Akaike information for mixed effects models. Biometrika, 92, 351–370.
Vellutino, F.R., Tunmer, W.E., Jaccard, J.J., & Chen, R. (2007). Components of reading ability: multivariate evidence for a convergent skills model. Scientific Studies of Reading, 11, 3–32.
Verhelst, N.D., & Verstralen, H.H.F.M. (2002). Structural analysis of a univariate latent variable (SAUL): computer program and manual. Arnhem: CITO.
Zeno, S.M., Ivens, S.H., Millard, R.T., & Duvvuri, R. (1995). The educator's word frequency guide. Brewster: Touchstone Applied Science Associates.

Manuscript Received: 21 APR 2012
Final Version Received: 28 AUG 2012
Published Online Date: 15 MAR 2013