
PSYCHOMETRIKA — VOL. 78, NO. 4, 830–855
OCTOBER 2013
DOI: 10.1007/S11336-013-9333-5
EXPLANATORY MULTIDIMENSIONAL MULTILEVEL RANDOM ITEM RESPONSE
MODEL: AN APPLICATION TO SIMULTANEOUS INVESTIGATION OF WORD AND
PERSON CONTRIBUTIONS TO MULTIDIMENSIONAL LEXICAL REPRESENTATIONS
SUN-JOO CHO, JENNIFER K. GILBERT, AND AMANDA P. GOODWIN
PEABODY COLLEGE OF VANDERBILT UNIVERSITY
This paper presents an explanatory multidimensional multilevel random item response model and
its application to reading data with multilevel item structure. The model includes multilevel random item
parameters that allow consideration of variability in item parameters at both item and item group levels.
Item-level random item parameters were included to model unexplained variance remaining when item
related covariates were used to explain variation in item difficulties. Item group-level random item parameters were included to model dependency in item responses among items having the same item stem.
Using the model, this study examined the dimensionality of a person’s word knowledge, termed lexical
representation, and how aspects of morphological knowledge contributed to lexical representations for
different persons, items, and item groups.
Key words: lexical representation, multidimensional item response model, multilevel item structure, random item parameters.
1. Introduction
A multidimensional item response model has been used to investigate the cognitive nature
of tasks (e.g., Janssen & De Boeck, 1999). An aim of the current paper is to extend the multidimensional item response model to a model that takes into account multidimensional ability
due to item groups and a multilevel item structure where items are at the lower-level and item
groups are at the higher-level. These extensions are illustrated by answering important research
questions in reading theory and education.
The multidimensional item response model incorporates dimensions with regard to both
persons and items as a measurement model, written as
$$\operatorname{logit} P\left(y_{ji} = 1 \mid \boldsymbol{\theta}_j, \boldsymbol{\beta}_i\right) = \boldsymbol{\theta}_j - \boldsymbol{\beta}_i, \tag{1}$$
where j is a person index (j = 1, …, J), i is an item index (i = 1, …, I), d is a dimension index (d = 1, …, D), $\boldsymbol{\theta}_j = [\theta_{j1}, \ldots, \theta_{jd}, \ldots, \theta_{jD}]'$ is a multidimensional ability parameter, and $\boldsymbol{\beta}_i = [\beta_{i1}, \ldots, \beta_{id}, \ldots, \beta_{iD}]'$ is a multidimensional difficulty parameter for a between-item design where an item is allowed to load on only one dimension.
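As an illustration of Equation (1), the following is a minimal sketch, not code from the paper: it evaluates the response probability under a between-item design, where each item loads on exactly one dimension. The loading map `dim_of_item` and all parameter values are hypothetical.

```python
import math

def p_correct(theta, beta, dim_of_item, j, i):
    """P(y_ji = 1) = inverse-logit(theta_jd - beta_i), with d the one dimension item i loads on."""
    d = dim_of_item[i]             # between-item design: a single dimension per item
    eta = theta[j][d] - beta[i]    # logit of a correct response
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values: 2 persons, 3 items, 2 dimensions (e.g., decoding and spelling).
theta = [[0.5, -0.2], [1.0, 0.3]]   # theta_jd: ability of person j on dimension d
beta = [0.0, 0.8, -0.4]             # beta_i: difficulty of item i on its dimension
dim_of_item = [0, 0, 1]             # items 1-2 load on dimension 1, item 3 on dimension 2

print(round(p_correct(theta, beta, dim_of_item, j=0, i=0), 3))  # → 0.622
```

A person with ability 0.5 on an item of difficulty 0 has a logit of 0.5, i.e., about a 62 % chance of a correct response.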
Individual ability differences among persons (θ j in Equation (1)) and difficulty differences
among items (β i in Equation (1)) can be explained using covariates such as person characteristics and item features, respectively, and/or the interaction between them. Current methods might
recommend a two-stage procedure. Parameter estimates of the measurement model can be obtained and then a regression model can be used to explain estimates using covariates. However,
the use of the two-stage procedure may provide distorted estimates of covariate effects because
measurement errors of the estimates are not incorporated. Within an explanatory item response
theory (IRT) framework (De Boeck & Wilson, 2004), the parameters of the measurement model
Requests for reprints should be sent to Sun-Joo Cho, Vanderbilt University, Peabody #H213A, 230 Appleton Place,
Nashville, TN 37203, USA. E-mail: [email protected]
© 2013 The Psychometric Society
and the effect of covariates on the parameters in a structural model can be investigated simultaneously. With the simultaneous approach, measurement error of the estimated parameters is taken
into account when the effects of covariates on parameters are estimated (e.g., Adams, Wilson, &
Wu, 1997, for person parameters; Janssen, Schepers, & Peres, 2004, for item parameters).
Item features or covariates have been used to explain the item difficulty parameters without
allowing random residuals such as in the linear logistic test model (Fischer, 1973). However, it is
preferable to allow the random residuals because item covariates may not explain all variations
in parameters. Estimated residual variances provide information about how much remains unexplained, and hence how much room there might be for improving explanation by including more
covariates (De Boeck, 2008). Furthermore, it has been shown that omitting the random residuals can also lead to underestimated standard errors for the covariate effects on the IRT model
parameters (Janssen et al., 2004; Mislevy, 1988).
Also, in certain tests, items are nested in item groups, such as item families used in an
item generation context (Bejar, 1993). For example, to measure word reading ability, a family
of words such as happiness, happily, unhappy, and unhappily can be created as an item group
made of items which are from the same item stem, happy. Local dependency in item responses
among items in the item group is of concern in the multilevel item structure where items are at the
lower-level and item groups are at the higher-level. Ignoring such local dependency can lead to
less accurate item parameter estimates (Cho, De Boeck, Embretson, & Rabe-Hesketh, 2013; Glas
& van der Linden, 2003; Johnson & Sinharay, 2005; Sinharay, Johnson, & Williamson, 2003).
Dependency in item responses due to within item-group similarities can be taken into account by
having the random item parameters at the item-group level (Cho et al., 2013). Multidimensional
extensions of item response models with random item parameters at the item-group level (called
multidimensional multilevel random item response models hereafter) have not been described
and applied in the literature thus far.
The purpose of this paper is (1) to present extensions of explanatory multidimensional item
response models to models with multilevel random item parameters, called explanatory multidimensional multilevel random item response models (EMMRIRM), and (2) to use the EMMRIRM
to examine the interrelationships between the different aspects of a person’s word knowledge
both at the person side and at the item side and to explain these interrelationships using personby-item, person, and item covariates. We use the term multilevel random item parameters to
indicate that there are random item parameters at both item and item-group levels. In the EMMRIRM, a random item parameter at the item level is included to model unexplained variance
remaining when item related covariates are used to explain variation in item difficulties. A random item parameter at the item group level is included to model dependency among items having
the same item stem.
In the following, the measurement issues related to assessing lexical representations and
research questions to be answered using EMMRIRM are described.
2. Measurement of Multidimensional Lexical Representations
2.1. Multidimensional Lexical Representations
Reading researchers are interested in determining how to support students in comprehending
text. One factor that contributes to reading comprehension involves how much a reader knows
about the words within the text. This word knowledge, termed a lexical representation, ideally
comprises a deep and consistent representation of a word and its orthographic, phonological,
grammatical, and semantic properties, which can then be used with greater ease to support text
comprehension (Perfetti & Hart, 2001, 2002; Perfetti, 2007). This definition suggests that lexical
representations are multidimensional in nature because a word has multiple sources of information that can be tapped by asking individuals to spell, decode, and define words (Perfetti & Hart,
2001, 2002; Perfetti, 2007). For example, when attempting to comprehend a text by retrieving
the lexical representations of words (e.g., stratagem), a reader may use multiple sources of information to gain the identity of the word, including its decoding, spelling, and meaning. For
morphologically complex words like extremity, which are made by combining a root-word (i.e.,
extreme) and an affix (i.e., -ity), research also suggests students use root-word knowledge (i.e.,
knowledge of extreme) to figure out the larger derived word (i.e., extremity). Studies suggest, for
example, that students use root-word knowledge to support reading (Carlisle & Stone, 2005),
spelling (Deacon & Bryant, 2005, 2006), and determining the meaning (Anglin, 1993; Tyler &
Nagy, 1989, 1990) of the related derived word.
In the present study, we want to measure individual differences in lexical representations
of words via performance on related but separate dimensions of knowledge (Perfetti & Hart,
2002). We first formally investigate the dimensionality of lexical representations of words that
individuals have, and then we test how various aspects of root-word knowledge support lexical
representations for different words and persons.
2.2. Item, Item-Group, and Person Specific Lexical Representations
A person may have knowledge of some lexical representation informational sources, but not
others, suggesting that lexical representations are test-item specific. That is, decoding stratagem,
spelling stratagem, and knowing the meaning of stratagem each represent different item knowledge related to the word stratagem. Lexical representations are also person and item group (or
word) specific, which means that different persons are likely to vary in the lexical representations of different item groups encountered. This is because certain person characteristics (i.e.,
vocabulary knowledge, reading comprehension skills, and morphological awareness) and certain
item-group features (i.e., frequency, transparency, consistency, and number of morphemes) make
it easier to build a deep and consistent lexical representation about an item group.
Prior researchers in reading education have used overall performance on literacy measures to
group persons because of the difficulty of examining person, item, and item group features related
to lexical representations simultaneously. For example, when examining differences in how persons perform on aspects of lexical representations, common practice would suggest researchers
aggregate scores on a test of decoding, spelling, and meaning and use structural equation
modeling to run regression models with each aspect of the person or item lexical representation
as an outcome being predicted by person or item characteristics (e.g., Vellutino, Tunmer, Jaccard,
& Chen, 2007). Because total scores are used, researchers cannot separate how person characteristics such as reading comprehension or vocabulary knowledge contribute to performance on
particular items nor how item group features such as frequency or transparency contribute to such
performance. In fact, as conveyed by studies discussed in Perfetti (2007), researchers examining
the latter questions of lexical representation have resorted to separating person characteristics
from item characteristics through creating groups of similar reader profiles (such as above average or below average performance on various literacy measures) or similar item demands (such as
high frequency versus low frequency words or single versus multimorphemic words) and looking
at differences between persons and items, respectively.
2.3. Multilevel Item Structure in the Measurement of Lexical Representations
In our empirical study, the three subtasks, decoding, spelling, and meaning, each have their own
item, but they share the same item stem. For example, the same word stratagem can be used to
measure three aspects of the lexical representation including decoding, spelling, and meaning information, which are represented by the following tasks: “Please read the following word aloud:
stratagem,” “Please spell the word stratagem on your paper,” and “Do you know the meaning of the word stratagem?”, respectively. These different types of information about the same word
(i.e., subtasks) are considered as the lower-level words (i.e., items), nested within the higher-level
stimulus words (i.e., item groups).
2.4. Empirical Research Questions
The main empirical research question is how aspects of word knowledge contribute to lexical representations for different adolescent readers and for different words. Within this larger
research question, our study is guided by three research questions, specifically:
1. How do the various sources of lexical information (i.e., decoding, spelling, and meaning)
relate to one another?
2. What aspects of knowledge (decoding, spelling, and knowing the meaning and morphological neighbors) of a root word are related to decoding, spelling, and knowing the meaning of a related derived word? In general, children learn short and simple words before
they learn long and complex words. Models of word knowledge suggest that children
may use their knowledge of a smaller root word (covet) to help them learn larger related
words (covetousness).
Perfetti’s (2007) lexical quality hypothesis states that lexical representations of
words comprise interrelated dimensions of word knowledge such that knowledge of a
word’s spelling, for example, is likely to associate with knowledge of a word’s pronunciation. Because derived words contain root words, it is likely that aspects of knowledge
about root words will associate with aspects of knowledge about derived words. For instance, knowledge of how to spell a root word will likely associate not only with the
spelling of the related derived word, but also the pronunciation of the related derived
word.
3. Do word (or item group) level characteristics (i.e., frequency, consistency, transparency,
and number of morphemes) and person level characteristics (i.e., vocabulary knowledge,
reading comprehension, and morphological awareness) affect lexical representations?
These three questions will be answered from results of the EMMRIRM. Such models allow us to better understand both the dimensionality of lexical representations and the sources
of knowledge that contribute to lexical representations while taking into account differences between items, item groups, and persons. The knowledge produced from this study will provide
a deeper understanding of how root-word knowledge and derived-word features bolster lexical
representations, which will provide insights into possible instructional paths that may support
students in building higher-quality lexical representations and, therefore, support reading comprehension as well.
Next, EMMRIRM is detailed to incorporate the measurement issues related to lexical representations and to answer the research questions described earlier. Model parameter estimation
issues are discussed subsequently.
3. Explanatory Multidimensional Multilevel Random Item Response Model
The EMMRIRM is described for a between-item design.
3.1. Measurement Model: Multidimensional Multilevel Random Item Response Model
There are both crossed and nested classifications in the item response data where items are nested within item groups. Every item is offered to all persons and every person responds to all items. Thus, the item and person classifications are found at the same level and they are crossed. In addition to the crossed design, there is a multilevel design in which items are nested within item groups on the item side. To frame this data structure within the multilevel literature, item responses at Level 1 are cross-classified with persons and items at Level 2. Items are nested within item groups at Level 3 on the item side, as shown in Figure 1.

FIGURE 1. Diagram for multidimensional multilevel random item response model.
With the crossed design at Level 2, a person random effect and an item random effect are
crossed random effects when they are used to model individual differences over persons and to
model differences in item difficulty over items, respectively. With the nested design at Level 3
in the item side, an item-group random effect and an item random effect are nested when the
item-group random effect is taken into account to model differences in item-group difficulties
over item groups. The person random effect and the item-group random effect are crossed at a
different level.
The measurement model, the multidimensional multilevel random item response model (MMRIRM), can be described as follows:

$$\operatorname{logit} P\left(y_{jig} = 1 \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\beta}^{(2)}_i, \beta^{(3)}_g\right) = \boldsymbol{\lambda} + \boldsymbol{\theta}^{(2)}_j - \boldsymbol{\beta}^{(2)}_i - \beta^{(3)}_g, \tag{2}$$

where

• j is an index for a person (j = 1, …, J),
• i is an index for an item (i = 1, …, I),
• g is an index for an item group (g = 1, …, G),
• d is an index for a dimension (i.e., a task) (d = 1, …, D),
• the superscript (2) refers to Level 2,
• the superscript (3) refers to Level 3,
• $\boldsymbol{\theta}^{(2)}_j = [\theta_{j1}, \ldots, \theta_{jd}, \ldots, \theta_{jD}]'$ is a multidimensional ability parameter and refers to the person dimensions,
• $\boldsymbol{\beta}^{(2)}_i = [\beta_{i1}, \ldots, \beta_{id}, \ldots, \beta_{iD}]'$ is a multidimensional item difficulty parameter and refers to the item dimensions,
• $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_d, \ldots, \lambda_D]'$ is an intercept, a logit for the probability of a correct response of an ‘average’ person on an ‘average’ item, and
• $\beta^{(3)}_g$ is an item-group difficulty parameter.
In the MMRIRM, multidimensionality exists for both persons ($\boldsymbol{\theta}^{(2)}_j$) and items ($\boldsymbol{\beta}^{(2)}_i$) to model different tasks at Level 2. The multidimensional ability parameter, $\boldsymbol{\theta}^{(2)}_j$, is assumed to follow a multivariate normal (MN) distribution: $\boldsymbol{\theta}^{(2)}_j = [\theta_{j1}, \ldots, \theta_{jd}, \ldots, \theta_{jD}]' \sim \mathrm{MN}(\boldsymbol{\mu}_{(D \times 1)}, \boldsymbol{\Sigma}_{(D \times D)})$. The dimension-specific item difficulty, $\beta^{(2)}_{id}$, in the multidimensional item difficulty parameter is assumed to follow a normal (N) distribution: $\beta^{(2)}_{id} \sim N(\mu_{\beta_d}, \sigma^2_{\beta_d})$. The item-group difficulty parameter, $\beta^{(3)}_g$, is also assumed to follow a normal distribution: $\beta^{(3)}_g \sim N(\mu_{\beta^{(3)}}, \sigma^2_{\beta^{(3)}})$. To identify the model, the means of all random effects are set to 0. After population parameters of random effects for abilities and difficulties are estimated, individual person scores and item difficulties at both item and item-group levels can be obtained using empirical Bayes prediction.
Figure 1 depicts the MMRIRM, Equation (2). In the figure, the squares and ellipses represent manifest and latent variables, respectively. Item responses for each dimension are described as follows: $\mathbf{y}_{g1} = [y_{11g1}, \ldots, y_{jig1}, \ldots, y_{JI_1g1}]'$ for Dimension 1, $\mathbf{y}_{gd} = [y_{11gd}, \ldots, y_{jigd}, \ldots, y_{JI_dgd}]'$ for Dimension d, and $\mathbf{y}_{gD} = [y_{11gD}, \ldots, y_{jigD}, \ldots, y_{JI_DgD}]'$ for Dimension D, where $I_d$ is the number of items nested within a dimension. At Level 2, dimension-specific item difficulties, $\beta^{(2)}_{id}$, are considered as random over items within a dimension, and dimension-specific abilities, $\theta^{(2)}_{jd}$, are also considered as random over persons. At Level 3, item-group difficulty, $\beta^{(3)}_g$, is considered as random over item groups. Superscripts on parameters indicate the level in the data structure.
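The crossed-and-nested structure above can be made concrete by simulation. The following hedged sketch (not the authors' code; all design sizes, variance components, and intercepts are invented) generates binary responses from the model in Equation (2) under a between-item design with one item per (item group, dimension) cell:

```python
import numpy as np

rng = np.random.default_rng(42)

J, G, D = 200, 39, 3          # persons, item groups, dimensions (tasks)
I = G * D                     # between-item design: one item per (group, dimension)

# Level-2 person abilities: theta_j ~ MN(0, Sigma), correlated across dimensions
Sigma = np.full((D, D), 0.5) + 0.5 * np.eye(D)
theta = rng.multivariate_normal(np.zeros(D), Sigma, size=J)     # J x D

# Level-2 item difficulties: beta_id drawn with a dimension-specific SD
sd_beta_d = np.array([0.8, 0.9, 0.7])
beta_item = rng.normal(0.0, sd_beta_d, size=(G, D))             # G x D

# Level-3 item-group difficulties: beta_g ~ N(0, sigma^2), shared by a group's items
beta_group = rng.normal(0.0, 0.6, size=G)                       # G

lam = np.array([0.3, -0.1, 0.2])                                # intercept per dimension

# logit P(y_jig = 1) = lambda_d + theta_jd - beta_id - beta_g, then Bernoulli draws
eta = (lam[None, None, :] + theta[:, None, :]
       - beta_item[None, :, :] - beta_group[None, :, None])     # J x G x D
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))                 # binary responses

print(y.shape)
```

Each response thus shares a person effect with the other responses of that person, an item effect with the other persons' responses to that item, and a group effect with all items built from the same stem.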
Without Level 3, the data structure of MMRIRM is similar to that of a double-structure
structural equation model (González, De Boeck, & Tuerlinckx, 2008). Both person and item
modes of the data array are modeled simultaneously in the MMRIRM while both person and
situation modes of the data array are modeled simultaneously in the double-structure structural
equation model.
Latent intraclass correlations¹ for both items and item groups (ρ(IG)) and for items (ρ(I)) can be calculated to investigate dependency in latent item responses among items and item groups. A between-item design where each item loads on only one dimension (which is true of our empirical data) was assumed in the calculation of the latent intraclass correlations.
A latent response formulation for the MMRIRM is introduced to define the latent intraclass correlations. Let there be a latent response $y^*_{jig}$ so that the observed response is $y_{jig} = 1$ if $y^*_{jig} > 0$ and $y_{jig} = 0$ otherwise. Assuming that

$$y^*_{jig} = \boldsymbol{\lambda} + \boldsymbol{\theta}^{(2)}_j - \boldsymbol{\beta}^{(2)}_i - \beta^{(3)}_g + \epsilon_{jig} \tag{3}$$

and that the error $\epsilon_{jig}$ follows a logistic distribution (mean = 0, variance = $\pi^2/3$) produces the model in Equation (2) for the observed responses, y.
1 We used the term “latent” intraclass correlations to distinguish it from the “manifest” correlation, which refers to
the conventional product-moment correlation between observed binary responses, y, (Rodríguez & Elo, 2003).
With a between-item design, the variance (Var) of the multidimensional difficulty parameter ($\boldsymbol{\beta}^{(2)}_i$), conditional on the person abilities, can be represented as the sum of the dimension-specific variances of item difficulties across item dimensions as follows:

$$\operatorname{Var}\left(\boldsymbol{\beta}^{(2)}_i \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right) = \sum_{d=1}^{D} \operatorname{Var}\left(\beta^{(2)}_{id} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right) = \sum_{d=1}^{D} \sigma^2_{\beta_d}. \tag{4}$$
The definition of ρ(IG) as the correlation (Corr) among latent responses ($y^*_{jig}$) for the same item i and the same item group g, conditional on the person abilities, can be defined as follows:

$$\rho(IG) = \operatorname{Corr}\left(y^*_{jig}, y^*_{j'ig} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right) = \frac{\operatorname{Cov}\left(y^*_{jig}, y^*_{j'ig} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right)}{\sqrt{\operatorname{Var}\left(y^*_{jig} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right) \cdot \operatorname{Var}\left(y^*_{j'ig} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right)}} \tag{5}$$

$$= \frac{\sum_{d=1}^{D} \sigma^2_{\beta_d} + \sigma^2_{\beta^{(3)}}}{\sum_{d=1}^{D} \sigma^2_{\beta_d} + \sigma^2_{\beta^{(3)}} + \frac{\pi^2}{3}}, \tag{6}$$

with a constant covariance (Cov) for any latent responses from different persons on the same item and item group, $\operatorname{Cov}(y^*_{jig}, y^*_{j'ig} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}) = \sum_{d=1}^{D} \sigma^2_{\beta_d} + \sigma^2_{\beta^{(3)}}$.
The definition of ρ(I) as the correlation among latent responses for the same item group g, but different items i and i′, conditional on the person abilities, can be defined as follows:

$$\rho(I) = \operatorname{Corr}\left(y^*_{jig}, y^*_{j'i'g} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right) = \frac{\operatorname{Cov}\left(y^*_{jig}, y^*_{j'i'g} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right)}{\sqrt{\operatorname{Var}\left(y^*_{jig} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right) \cdot \operatorname{Var}\left(y^*_{j'i'g} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}\right)}} \tag{7}$$

$$= \frac{\sigma^2_{\beta^{(3)}}}{\sum_{d=1}^{D} \sigma^2_{\beta_d} + \sigma^2_{\beta^{(3)}} + \frac{\pi^2}{3}}, \tag{8}$$

with a constant covariance for any latent responses from different items and different persons in the same item group, $\operatorname{Cov}(y^*_{jig}, y^*_{j'i'g} \mid \boldsymbol{\theta}^{(2)}_j, \boldsymbol{\theta}^{(2)}_{j'}) = \sigma^2_{\beta^{(3)}}$.
The value of ρ(IG) is expected to be higher than that of ρ(I) because latent responses to the same item share both the item-level and the item-group-level variance, whereas latent responses to different items in the same item group share only the item-group-level variance.
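Equations (5)–(8) reduce to simple ratios of variance components. The following sketch computes both latent intraclass correlations for hypothetical variance components (the values are illustrative, not estimates from the study):

```python
import math

def latent_iccs(sigma2_beta_d, sigma2_beta_g):
    """Return (rho_IG, rho_I) from dimension-specific item variances and the
    item-group variance, with logistic error variance pi^2 / 3 (Equations (6) and (8))."""
    item_var = sum(sigma2_beta_d)                 # sum_d sigma^2_beta_d
    total = item_var + sigma2_beta_g + math.pi ** 2 / 3
    rho_ig = (item_var + sigma2_beta_g) / total   # same item, same item group
    rho_i = sigma2_beta_g / total                 # different items, same item group
    return rho_ig, rho_i

# Hypothetical variance components for D = 3 dimensions and one group-level variance
rho_ig, rho_i = latent_iccs([0.64, 0.81, 0.49], 0.36)
print(round(rho_ig, 3), round(rho_i, 3))  # → 0.411 0.064
```

As expected, ρ(IG) exceeds ρ(I) because the numerator of ρ(IG) includes the item-level variance in addition to the item-group variance.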
3.2. Measurement Model and Structural Model: Explanatory Multidimensional Multilevel
Random Item Response Model
In item response data, three kinds of covariates may exist: (1) person-by-item covariates ($W_{ji}$), (2) person covariates ($Z_j$), and (3) item covariates ($X_i$). In the structural model, these covariates can be used to explain the parameters in the measurement model. Specifically, individual differences in abilities, $\boldsymbol{\theta}^{(2)}_j$, can be explained using person-by-item covariates ($W_{ji}$) and person covariates ($Z_j$), allowing residuals, $\boldsymbol{\epsilon}^{(2)}_j$. Item difficulties, $\boldsymbol{\beta}^{(2)}_i$, can be taken into account using person-by-item covariates ($W_{ji}$) and item covariates ($X_i$), allowing residuals, $\boldsymbol{\epsilon}^{(2)}_i$. The EMMRIRM is written as follows:

$$\operatorname{logit} P\left(y_{jig} = 1 \mid \boldsymbol{\epsilon}^{(2)}_j, \boldsymbol{\epsilon}^{(2)}_i, \beta^{(3)}_g\right) = \boldsymbol{\lambda} + \sum_{k=1}^{K} \boldsymbol{\delta}_k W_{ji.k} + \sum_{r=1}^{R} \boldsymbol{\vartheta}_r Z_{j.r} + \sum_{h=1}^{H} \boldsymbol{\gamma}_h X_{i.h} + \boldsymbol{\epsilon}^{(2)}_j - \boldsymbol{\epsilon}^{(2)}_i - \beta^{(3)}_g, \tag{9}$$

where
• k is an index for a covariate of the person-by-item interaction (k = 1, …, K),
• r is an index for a covariate of the person (r = 1, …, R),
• h is an index for a covariate of the item (h = 1, …, H),
• $W_{ji.k}$ is a person-by-item covariate k,
• $Z_{j.r}$ is a person covariate r,
• $X_{i.h}$ is an item covariate h,
• $\boldsymbol{\delta}_k = [\delta_{k1}, \ldots, \delta_{kd}, \ldots, \delta_{kD}]'$ contains the dimension-specific effects of the person-by-item covariates,
• $\boldsymbol{\vartheta}_r = [\vartheta_{r1}, \ldots, \vartheta_{rd}, \ldots, \vartheta_{rD}]'$ contains the dimension-specific effects of the person covariates,
• $\boldsymbol{\gamma}_h = [\gamma_{h1}, \ldots, \gamma_{hd}, \ldots, \gamma_{hD}]'$ contains the dimension-specific effects of the item covariates,
• $\boldsymbol{\epsilon}^{(2)}_j = [\epsilon_{j1}, \ldots, \epsilon_{jd}, \ldots, \epsilon_{jD}]'$ is a multidimensional residual ability parameter, and
• $\boldsymbol{\epsilon}^{(2)}_i = [\epsilon_{i1}, \ldots, \epsilon_{id}, \ldots, \epsilon_{iD}]'$ is a multidimensional residual item difficulty parameter.
In the EMMRIRM, multidimensional residuals exist for both persons ($\boldsymbol{\epsilon}^{(2)}_j$) and items ($\boldsymbol{\epsilon}^{(2)}_i$) to take into account unexplained variance in each dimension on the person and item sides, respectively, at Level 2. $\boldsymbol{\epsilon}^{(2)}_j$ is assumed to follow a multivariate normal (MN) distribution: $\boldsymbol{\epsilon}^{(2)}_j = [\epsilon_{j1}, \ldots, \epsilon_{jd}, \ldots, \epsilon_{jD}]' \sim \mathrm{MN}(\boldsymbol{\mu}_{(D \times 1)}, \boldsymbol{\Sigma}_{(D \times D)})$. The dimension-specific item residual, $\epsilon^{(2)}_{id}$, is assumed to follow a normal (N) distribution: $\epsilon^{(2)}_{id} \sim N(\mu_{\epsilon_d}, \sigma^2_{\epsilon_d})$. To identify the model, the means of all random effects are set to 0.
Figure 2 depicts the EMMRIRM, Equation (9). In the figure, the arrows from covariates to latent variables denote regression coefficients. At Level 2, a dimension-specific residual item difficulty, $\boldsymbol{\epsilon}^{(2)}_i = [\epsilon_{i1}, \ldots, \epsilon_{id}, \ldots, \epsilon_{iD}]'$, is considered as random over items within a dimension, and a dimension-specific residual ability, $\boldsymbol{\epsilon}^{(2)}_j = [\epsilon_{j1}, \ldots, \epsilon_{jd}, \ldots, \epsilon_{jD}]'$, is also considered as random over persons.
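The linear predictor in Equation (9) can be sketched term by term for a single response on dimension d. This is a hedged illustration only: the covariate values, coefficients, and residuals below are hypothetical, not quantities from the empirical study.

```python
import numpy as np

def emmrirm_logit(lam_d, delta_d, W_ji, vartheta_d, Z_j, gamma_d, X_i,
                  eps_jd, eps_id, beta_g):
    """logit P(y_jig = 1) = lambda_d + sum_k delta_kd W_ji.k + sum_r vartheta_rd Z_j.r
    + sum_h gamma_hd X_i.h + eps_jd - eps_id - beta_g (Equation (9), one dimension)."""
    return (lam_d
            + float(np.dot(delta_d, W_ji))     # person-by-item covariate effects
            + float(np.dot(vartheta_d, Z_j))   # person covariate effects
            + float(np.dot(gamma_d, X_i))      # item covariate effects
            + eps_jd                           # residual ability (Level 2)
            - eps_id                           # residual item difficulty (Level 2)
            - beta_g)                          # item-group difficulty (Level 3)

# Hypothetical example: K = 1 person-by-item, R = 2 person, H = 1 item covariate
eta = emmrirm_logit(lam_d=0.2, delta_d=[0.5], W_ji=[1.0],
                    vartheta_d=[0.3, -0.1], Z_j=[0.8, 1.2],
                    gamma_d=[-0.4], X_i=[1.0],
                    eps_jd=0.1, eps_id=-0.2, beta_g=0.3)
p = 1.0 / (1.0 + np.exp(-eta))   # response probability for this person-item pair
print(round(eta, 2))  # → 0.42
```

Covariates shift the logit through their dimension-specific coefficients, while the residual and item-group terms carry whatever the covariates leave unexplained.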
3.3. Parameter Estimation
MMRIRM and EMMRIRM have both crossed and nested random effects for binary responses. Maximum likelihood estimation of models for binary responses is challenging because
it requires estimation with numerical or Monte Carlo integration due to the marginal likelihood
not having a closed form. The computational burden is low in cases where random effects and
their integrals are nested (e.g., Rabe-Hesketh, Skrondal, & Pickles, 2005), but this is not the case
for crossed random effects, which require evaluation of high-dimensional integrals. Three solutions exist for crossed random effect models with binary responses: (1) approximation of the
integral with numerical integration techniques such as (adaptive) Gauss–Hermite quadrature and
Monte Carlo integration, (2) approximation for the integrand, and (3) simulation-based methods
such as Markov chain Monte Carlo (MCMC). An overview of these three estimation methods
has been provided by Cho and Rabe-Hesketh (2011) and Cho, Partchev, and De Boeck (2012).
Explanatory item response models are often presented as generalized linear and nonlinear mixed-effects models (De Boeck & Wilson, 2004). The MMRIRM and EMMRIRM are also special cases of generalized linear mixed-effects models. All models in this study are fit with the lmer function in the R lme4 package (Bates, Maechler, & Bolker, 2011), which has been used for linear and generalized linear mixed-effects models. It is flexible software that can be used to estimate parameters of various Rasch-family item response models, including those with crossed random effects (see De Boeck, Bakker, Zwitser, Nivard, Hofman, Tuerlinckx, & Partchev, 2011).
The lmer function is based on the Laplace approximation as an approximation for the integrand
and is, therefore, computationally efficient for high-dimensional item response models compared
to an approximation for the integral and simulation-based methods such as MCMC (Cho et al.,
2012). The conditional modes of the random effects will be extracted using the R extractor function ranef.
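The integration problem can be made concrete with a toy one-dimensional example (values hypothetical; this sketch uses Gauss–Hermite quadrature and Monte Carlo integration, i.e., approximations of the integral, rather than the Laplace approximation of the integrand used by lmer): the marginal probability of a correct response requires integrating over the normal ability distribution.

```python
import numpy as np

beta, sigma = 0.5, 1.2   # hypothetical item difficulty and ability SD

def invlogit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gauss-Hermite quadrature for integral of invlogit(theta - beta) * N(theta; 0, sigma^2)
# after the change of variable theta = sqrt(2) * sigma * x
nodes, weights = np.polynomial.hermite.hermgauss(30)
gh = np.sum(weights / np.sqrt(np.pi) * invlogit(np.sqrt(2) * sigma * nodes - beta))

# Monte Carlo integration: average over draws from the random-effect distribution
rng = np.random.default_rng(1)
mc = invlogit(rng.normal(0.0, sigma, 200_000) - beta).mean()

print(round(gh, 3), round(abs(gh - mc), 4))
```

With one random effect per nested level the quadrature is cheap, but with crossed person and item random effects the marginal likelihood no longer factors this way, which is what makes the approximations discussed above necessary.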
FIGURE 2. Diagram for explanatory multidimensional multilevel random item response model.
4. Method
4.1. Data Description
4.1.1. Samples In this study, 172 participants (98 seventh graders, 74 eighth graders, 83
males) from a suburban middle school had complete data. The school had 11 % minority students
and 5 % of students receiving support through the free and reduced lunch program (Tennessee
Department of Education Report Card, 2011). These data were also analyzed and discussed substantively in a paper by Goodwin, Gilbert, and Cho (2013).
4.1.2. Measures In this study, three components constitute the dimensions of lexical representations: decoding, spelling, and meaning. The first source of information related to lexical
representations is the ability to decode the word from text. For example, when a reader sees the
word stratagem within a text, does it bring to mind a phonological representation such that the
reader can pronounce stratagem? The second source of information related to lexical representations is spelling. Spelling involves translating a phonological representation of a word (i.e.,
the pronunciation of stratagem) into an orthographic representation of the word (i.e., the correct
spelling of the word stratagem). The third source of information is self-perception of knowing
the meaning of the word, which implies knowing the word’s definition, context, and connotation
within one’s lexicon that can be used to understand what meaning is being communicated by that
word (i.e., what meaning is being communicated by stratagem).
The three subtasks were formulated to isolate the three components. Persons were asked
to do the three different tasks with 39 stimulus words (i.e., item groups). The stimulus words
were morphologically complex words such that each word contained a root word and at least one
affix. Words that are derived from root words are dubbed “derived words.” After piloting derived
words from Carlisle and Katz (2006), we adapted the word list because of ceiling effects. In our
adaptation, we added words from middle school texts that were less frequent and, therefore, less
likely to experience ceiling effects. The list of these stimulus words is shown in the Appendix.
The 39 words were presented in 3 different tasks, which equals 117 unique items (39 stimulus
words × 3 tasks).
Decoding of the Derived Word (DD) Participants were shown each of the 39 derived words
separately. They were asked to read each word aloud into a tape recorder. Correct (score of
1)/incorrect (score of 0) scores were obtained by a research staff member listening to the audio
files. Cronbach’s alpha for these 39 items was 0.85.
Spelling of the Derived Word (SD) A research staff member read each of the 39 words
aloud to a group of participants. For words that had a homophone (e.g., precedent), a sentence
was used to clarify the meaning of the target word. Participants wrote their spellings on a provided answer sheet and each was scored as correct (score of 1) or incorrect (score of 0). Cronbach’s alpha for these 39 items was 0.87. Derived words were spelled before they were read to
minimize the possibility that students were primed with visual forms of the words prior to reading them. Measures were administered within a short amount of time in order to assess students’ knowledge about a word at a single point in time.
Meaning of the Derived Word (MD) (i.e., Self Perception) A research staff member read
each of the 39 words aloud to a group of participants. After each word was read, participants
were asked to rate their knowledge of the word by marking a box associated with no, some, or
full knowledge. Dichotomous scoring was applied such that no knowledge was given a value
of 0 and some or full knowledge was given a value of 1. Cronbach’s alpha for these 39 items
was 0.87. Due to limits in testing time, self-perception of knowledge was used as a proxy for
knowledge of the word’s meaning.
Administration of spelling and decoding items was separated by at least one week, but items
were administered within a short period of time in order to gain a snapshot of word knowledge
at a specific point in time (Tyler & Nagy, 1989).
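The reliability figure reported for each subtask is Cronbach's alpha over dichotomously scored items. The following sketch shows the computation on a made-up response matrix (not the study's data):

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items matrix of 0/1 scores:
    alpha = I / (I - 1) * (1 - sum of item variances / variance of total scores)."""
    n_items = len(scores[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 persons, 4 dichotomous items
scores = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # → 0.79
```

Items with no variability (as noted for some root-word items below) contribute zero item variance and are typically dropped before the computation, which is why the reported alphas are based on fewer than 39 items in those cases.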
4.1.3. Covariates Descriptive statistics of person-by-word covariates, person covariates,
and word (i.e., item group) covariates are shown in Table 1. Person-by-item covariates vary
across both persons and items, person covariates vary across persons but not across items, and
item covariates vary across items, but not across persons.
4.1.4. Person-by-Word Covariates The 39 root words that corresponded to the 39 derived
words were presented in 3 different tasks. These measures were presented after derived-word
tasks. We acknowledge the concern that derived-word tasks could potentially prime root-word tasks. However, the developmental and instructional progression of word reading is that simple words (e.g., root words) are taught and learned before complex words (e.g., derived words) and, therefore, priming was minimized by presenting derived-word tasks prior to root-word tasks rather than the reverse.
Decoding of the Root Word (DR) Procedures for decoding of the root are the same as
procedures for decoding of the derived. Cronbach’s alpha for 34 items (5 items had no variability)
was 0.76.
TABLE 1.
Descriptive statistics of measures and covariates.

Variable                            Mean      SD      Min      Max     Scale
Measures
  Decoding Derived [DD]             0.75     0.43     0        1       Dichotomous
  Spelling Derived [SD]             0.62     0.48     0        1       Dichotomous
  Meaning Derived [MD]              0.74     0.44     0        1       Dichotomous
Covariates
  Person-by-word
    Decoding Root [DR]              0.91     0.29     0        1       Dichotomous
    Spelling Root [SR]              0.81     0.40     0        1       Dichotomous
    Meaning Root [MER]              0.89     0.31     0        1       Dichotomous
    Morphology Root [MOR]           0.65     0.48     0        1       Dichotomous
  Person
    Morphological Awareness [MA]   58.39     9.60    19       69       Raw score
    Reading Comprehension [RC]    563.06    36.10   468      660       Extended scale score
    Vocabulary [V]                561.51    33.79   458      668       Extended scale score
  Word
    Consistency [CON]               0.50     0.15     0.27     0.80    Proportion
    Frequency Derived [FD]         37.80     8.83    22.10    53.00    Log
    Frequency Root [FR]            45.51    10.35    20.80    65.70    Log
    Num. of Morphemes [NM]          2.79     0.73     2        5       Raw score
    Orth Phon Opaqueness [OPO]      0.26     0.44     0        1       Dichotomous
    Phon Opaqueness [PO]            0.46     0.51     0        1       Dichotomous
    Semantic Opaqueness [SEM]       0.31     0.47     0        1       Dichotomous

Orth: Orthographic; Phon: Phonological.
Spelling of the Root Word (SR) Procedures for spelling of the root are the same as procedures for spelling of the derived. Root words were spelled after the derived words to minimize
priming. Cronbach’s alpha for 38 items (1 item had no variability) was 0.85.
Meaning of the Root Word (MER) (i.e., Self Perception) Procedures for meaning of the
root are the same as procedures for meaning of the derived. Again, this task was completed
after the derived-word task to minimize priming. Cronbach’s alpha for 34 items (5 items had no
variability) was 0.80.
Morphology of the Root Word (MOR) A research staff member read each of the 39 words
aloud to a group of participants and asked participants to write down as many morphologically
related words as possible. Students were provided with the example of forget and told morphologically related words would be words that shared the root and its meaning such as forgetful,
forgetting, forgettable, etc., but not words that just had similar meaning and, therefore, not words
or phrases such as not remember, overlook, or put out of your mind. Responses with no related
words were scored as 0. Responses with one or more related words were scored as 1. Accuracy of spelling was not factored in and, therefore, any related words that were phonologically possible were scored as correct. Cronbach's alpha for these 39 items was 0.97.
4.1.5. Person Covariates
Morphological Awareness (MA) Morphological awareness scores comprised 70 items
from three tasks adapted from the literature: Test of Knowledge of Derivational Relationships
(Mahony, 1994), Derivational Suffix Test (Singson, Mahony, & Mann, 2000), and Syncat-real
Test, also known as the Real Word Derivational Suffix Task (Mahony, 1994). A research staff
member read aloud directions and examples, and participants were asked to complete the written
tasks silently at their own pace. The staff member read items aloud to participants if participants
requested such support in order to minimize confounds with decoding, though the actual number
of requests was minimal. Items were scored as correct (1) or incorrect (0). Cronbach’s alpha was
0.92. The first task included 25 of the 42 original items from the Test of Knowledge of Derivational Relationships from Mahony (1994). Participants were asked whether pairs of words were
morphologically related. The second task included 20 items from the pseudoword written section
of the Derivational Suffix Test from Singson et al. (2000). Each item was a nonsense root with
real suffixes. Participants were asked to choose the most appropriate pseudoword to complete
the given sentence. The third task included 25 items from Mahony’s (1994) Syncat-real test.
Participants were asked to choose the best word to complete the sentence, with answer choices
containing four derivations of the same stem.
Reading Comprehension (RC) Reading comprehension was assessed with the Gates–
MacGinitie Standardized Test of Reading Comprehension, Form S of Levels 7–9 (MacGinitie, MacGinitie, Maria, & Dreyer, 2000). This test consisted of 11 passages with a total of 48
multiple choice comprehension questions. Extended scale scores were used in this analysis. Extended scale scores provide a continuous scale that can be interpreted across test forms and age
groups. Test developers report that the median performance for both the Gates Comprehension
and Vocabulary tests is 550 for seventh-grade students and 575 for eighth-grade students; thus,
our sample was average-performing for their grade range, although variability was evident from
each standard deviation.
Vocabulary Knowledge (V) Vocabulary knowledge was assessed with the Gates–MacGinitie Standardized Test of Reading Vocabulary, a multiple-choice test (MacGinitie et al., 2000). Participants silently read an underlined word within a phrase and then chose the word or phrase that meant most nearly the same. Extended scale scores were used in this analysis.
4.1.6. Word Covariates
Consistency (CON) Consistency is a measure of how well the pronunciation of a word
matches the pronunciation of other similarly spelled words in the English language. Consistency
of derived words was established by first parsing each word into individual rime units (the vowel
and following consonants, if any, in a syllable). For each rime unit in the word, we calculated
the percentage of rimes in the CELEX database (Baayen, Piepenbrock, & van Rijn, 1993) that
had the same pronunciation as the derived word as opposed to a different pronunciation. Then
we averaged the consistency ratings across all rimes in the word. For each phonogram (i.e., -er
in the first syllable of the target word thermosphere), a proportion was created to represent the
rate of other words in which the phonogram is pronounced /3/ like in thermosphere versus in all
words in which it occurs but is pronounced as anything besides /3/, like in the word sergeant.
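As an illustration of the rime-level consistency calculation described above (a sketch only; the counts below are hypothetical, not drawn from CELEX):

```python
def rime_consistency(rime_counts):
    """Average, over a word's rime units, the proportion of database words
    in which the rime shares the target word's pronunciation.

    rime_counts: list of (n_same_pronunciation, n_total) pairs, one per rime.
    """
    proportions = [same / total for same, total in rime_counts]
    return sum(proportions) / len(proportions)

# Hypothetical three-rime word: its rimes match the target pronunciation
# in 80/100, 45/90, and 30/60 database words, respectively.
con = rime_consistency([(80, 100), (45, 90), (30, 60)])  # 0.6
```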
Frequency of Derived Word (FD) The Educator’s Word Frequency Guide (Zeno, Ivens,
Millard, & Duvvuri, 1995) was used to code the frequency of the derived word; this database
consists of a corpus of 60,527 sample texts. Frequency was reported in standard frequency index
(SFI), which is the logarithmic transformation of U (the frequency of the type per million tokens
weighted by how widely dispersed [D] a word is across subject areas).
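As a reference point, the standard frequency index is conventionally defined as SFI = 10(log10 U + 4), where U is the dispersion-weighted frequency per million tokens; this is our reading of the SFI definition (not quoted from Zeno et al., 1995):

```python
import math

def sfi(u_per_million):
    """Standard frequency index: 10 * (log10(U) + 4), so a word occurring
    once per million tokens has SFI = 40."""
    return 10.0 * (math.log10(u_per_million) + 4.0)

# A word at 1 occurrence per million tokens -> SFI 40;
# at 100 occurrences per million -> SFI 60.
```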
Frequency of Root Word (FR)
as FD.
Root-word frequency was established in the same manner
Number of Morphemes (NM) The number of morphemes was coded for each derived
word based on the English Lexicon Project (Balota, Yap, Cortese, Hutchison, Kessler, Loftis,
Neely, Nelson, Simpson, & Treiman, 2007, http://elexicon.wustl.edu/) to represent the number
of meaning units contained in each item. For example, telegraph was coded as two morphemes
(tele + graph), whereas unquestionably was coded as five morphemes (un + quest + tion +
able + ly).
Orthographic/Phonological Opaqueness (OPO) Two dummy variables were created to
compare transparent words to words that are orthographically and phonologically opaque (OPO)
or simply phonologically opaque (PO). Transparent derived words have no orthographic or
phonological change from their root word (e.g., graph: telegraph); they made up the reference
group and were given values of 0 in OPO and PO. Derived words with both a phonological and
orthographic change from their root word were given a value of 1 in OPO and 0 in PO. Following the suggestion of McCutchen, Logan, and Biangardi-Orpe (2009), we considered items
with deletion of the final e (e.g., convene: convention) as orthographically stable because of the
high degree of orthographic overlap in the root words and the regularity of this rule in English.
Derived words with only a phonological change from their root word were given a value of 0 in
OPO (e.g., verify: veritable) and 1 in PO (e.g., reside: residence).
Semantic Opaqueness (SEM) A dummy variable was created to classify derived words as
semantically opaque or transparent. Following Nagy and Anderson (1984), each word was coded
on a 6-point scale and words with scores of 0–2 were classified as semantically transparent (0)
and words with scores of 3–5 were classified as semantically opaque (1). Derived words in the
latter category had meanings that could not be inferred from the meaning of their root words with
minimal or reasonable help.
A more detailed data description can be found in Goodwin, Gilbert, Cho, and Kearns (2012).
TABLE 2.
Item structure for three confirmatory models and model selection.

                        Model 1D    Model 2D      Model 3D
Task
  Decoding                 1          1  0         1  0  0
  Spelling                 1          1  0         0  1  0
  Meaning                  1          0  1         0  0  1
Model selection
  No. of parameters        6            9            13
  Log-likelihood       −7876.4      −7692.2      −7654.8
  AIC                   15765        15402        15336
  BIC                   15812        15473        15438
4.2. Confirmatory Measurement Models and Model Selection
Three hypotheses were considered using confirmatory models. Perfetti's (2007)
lexical quality hypothesis suggests multiple person dimensions of word knowledge comprise
lexical representations. In our study, lexical representations were represented by three person dimensions of word knowledge, including decoding, spelling, and meaning knowledge. First, we
explored a model where these dimensions of knowledge represented a single person dimension
(i.e., lexical representation). We hypothesized this single dimension because research suggests interdependence amongst decoding, spelling, and meaning word-knowledge (Harm & Seidenberg,
2004; Perfetti, 2007; Vellutino et al., 2007). Next, we explored a two-dimensional model where
lexical representations were represented by (decoding + spelling) and meaning. We hypothesized this model due to research supporting the self-teaching hypothesis, which shows evidence
for orthographic (i.e., spelling) representations boosting phonological (i.e., decoding) representations (Bowey & Muller, 2005; Cunningham, Perry, Stanovich, & Share, 2002; Nation, Angell, &
Castles, 2007). This research suggests that decoding and spelling may be more highly related to
each other than to meaning, hence the need for the two-dimensional model that merges decoding
with spelling. Lastly, we hypothesized that lexical representations consist of three separate yet correlated person dimensions of knowledge, because participants were likely to have different dimensions of knowledge for different words. For example, a participant might know the meaning of a word but not its spelling or decoding. Because these person dimensions of knowledge are distinct, we explored a three-dimensional model that acknowledged these distinctions.
The data structure of one-, two-, and three-person-dimensional models (labeled Model 1D,
Model 2D, and Model 3D, respectively) are shown in Table 2. The items of a specific task are
coded as “1” if they are restricted to measure a person dimension and are coded as “0” otherwise
in Table 2. Model selection among the three confirmatory models was based on the MMRIRM (an unconditional model without any covariates) using information criteria: Akaike's (1974) information criterion (AIC) and Schwarz's (1978) Bayesian information criterion (BIC). After selection of the "best" MMRIRM, explanatory models (EMMRIRMs) are considered by adding covariates to the MMRIRM.
Table 2 also presents the model fit results. In the three confirmatory models, the dimension-specific average parameters across persons and items and the correlations between person dimensions were estimated.2 Model 3D shows the best model fit based on both AIC and BIC as indices of
2 The number of parameters in Model 1D is 6, including 3 variances of item dimensions, 1 variance of the item-group dimension, 1 average logit value across persons and items, and 1 variance of the person dimension. The number of parameters in Model 2D is 9, including 3 variances of item dimensions, 1 variance of the item-group dimension, 2 average logit values across persons and items, 2 variances of person dimensions, and 1 correlation between person dimensions. The number of parameters in Model 3D is 13, including 3 variances of item dimensions, 1 variance of the item-group dimension, 3 average logit values across persons and items, 3 variances of person dimensions, and 3 correlations between person dimensions.
overall model fit. Using standardized residuals based on results of Model 3D, item fit and person
fit were investigated regarding how well Model 3D explains the responses to a particular item
and person, respectively. Two items out of 117 had standardized residuals less than −2 or greater than 2 across persons (item fit), and 11 persons out of 172 had standardized residuals less than −2 or greater than 2 across items (person fit). These item and person fit results indicate that Model 3D explains the responses relatively well. Accordingly, Model 3D was chosen as the measurement model (an unconditional model); as such, covariates are added to it in structural models.
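As a check on the model-selection numbers in Table 2, AIC and BIC can be recomputed from the reported log-likelihoods and parameter counts. The sample size used for BIC is our assumption, taken as the total number of item responses (172 persons × 117 items); under that assumption the sketch reproduces the AIC values exactly and the BIC values to within one unit:

```python
import math

def aic(loglik, n_params):
    # Akaike (1974): AIC = -2 * log-likelihood + 2 * number of parameters
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    # Schwarz (1978): BIC = -2 * log-likelihood + parameters * ln(sample size)
    return -2.0 * loglik + n_params * math.log(n_obs)

N = 172 * 117  # assumed BIC sample size: persons x items
models = {"1D": (-7876.4, 6), "2D": (-7692.2, 9), "3D": (-7654.8, 13)}
for name, (ll, p) in models.items():
    print(name, round(aic(ll, p)), round(bic(ll, p, N)))
```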
4.3. Analysis
Continuous person covariates were grand-mean centered with respect to persons and continuous item covariates were grand-mean centered with respect to items before entry into structural
models. Dichotomous covariates were coded as dummy variables. First, the null model without
any covariates (which will be abbreviated “Model 3D-0” throughout the paper) was fitted for
research question 1. Next, 12 person-by-item covariates (3 dimensions × 4 person-by-word covariates) were added to Model 3D-0 (which will be abbreviated “Model 3D-1” throughout the
paper) to investigate the patterns of significance without person and item covariates. Finally, 9
person covariates (3 dimensions × 3 person covariates) and 21 item covariates (3 dimensions × 7
word covariates) were added to Model 3D-1 (which will be abbreviated “Model 3D-2” throughout the paper) for research questions 2 and 3. Codes are available from the first author upon
request.
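As a minimal sketch of the covariate preprocessing just described (the values below are made up for illustration):

```python
def grand_mean_center(values):
    """Subtract the grand mean so the model intercept refers to an 'average' unit."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# e.g., centering hypothetical morphological awareness raw scores
ma_centered = grand_mean_center([45, 58, 61, 69, 52])
# Dichotomous covariates (e.g., phonological opaqueness) stay as 0/1 dummies.
```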
5. Results
Results for fixed effects in the three 3D models are reported in Table 3, and those for the population parameters of random effects are reported in Table 4. In Table 3, covariate labels were created as follows: DD·∗ denotes a covariate for the decoding dimension, SD·∗ a covariate for the spelling dimension, and MD·∗ a covariate for the meaning dimension, where ∗ is the covariate name described in Table 1.
5.1. Consequences of Ignoring Multilevel Item Structure
Based on results of Model 3D-0, latent intraclass correlations, ρ(IG) and ρ(I), were calculated:

ρ(IG) = [(1.63 + 0.95 + 0.33) + 2.93] / [(1.63 + 0.95 + 0.33) + 2.93 + π²/3] = 0.640,
ρ(I) = 2.93 / [(1.63 + 0.95 + 0.33) + 2.93 + π²/3] = 0.321.
These results indicate that latent item responses are more correlated when items are in the same item group than when they are in different item groups (ρ(IG) > ρ(I)). Furthermore, 32.1 % of the total item difficulty variance is explained by item groups, implying that there is non-ignorable dependency in latent item responses among items nested within item groups. Model 3D-0 also fits the data better than Model 3D-0 without σβd based on the information criteria, AIC and BIC.
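The latent intraclass correlations can be recomputed from the Model 3D-0 variance estimates, treating π²/3 as the level-1 residual variance of the logistic model (a sketch of the calculation, not the authors' code):

```python
import math

item_vars = 1.63 + 0.95 + 0.33   # item-level variances (decoding, spelling, meaning)
group_var = 2.93                 # item-group-level variance
total = item_vars + group_var + math.pi ** 2 / 3  # plus logistic residual pi^2/3

rho_ig = (item_vars + group_var) / total  # numerator per the text: item + item-group
rho_i = group_var / total                 # numerator per the text: item-group only

print(round(rho_ig, 3), round(rho_i, 3))  # 0.64 0.321
```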
Variances of random item difficulties (σβd) and predicted item difficulties using empirical Bayes prediction were compared between Model 3D-0 without βd and Model 3D-0 to investigate the consequences of ignoring the multilevel item structure empirically.
TABLE 3.
Estimates and standard errors (SE) of fixed effects within the explanatory multidimensional multilevel random item response model.

                      Model 3D-0                Model 3D-1                Model 3D-2
                   Est.    SE      z         Est.    SE      z         Est.    SE      z
Intercept
λ1 [DD]           1.827  0.307   5.962      0.667  0.334   1.983      1.317  0.532   2.475
λ1 [SD]           0.855  0.332   2.575     −0.301  0.362  −0.831      0.942  0.602   1.564
λ1 [MD]           2.069  0.364   5.685      0.889  0.393   2.260      1.725  0.578   2.984
Persons-by-items
δ11 [DD·DR]                                 0.667  0.130   5.125      0.656  0.129   5.077
δ21 [DD·SR]                                 0.009  0.106   0.084     −0.035  0.106  −0.333
δ31 [DD·MER]                                0.461  0.133   3.476      0.445  0.131   3.387
δ41 [DD·MOR]                                0.188  0.093   2.026      0.183  0.093   1.974
δ12 [SD·DR]                                 0.002  0.140   0.013      0.001  0.141   0.007
δ22 [SD·SR]                                 0.472  0.103   4.595      0.454  0.103   4.409
δ32 [SD·MER]                                0.643  0.138   4.651      0.634  0.138   4.593
δ42 [SD·MOR]                                0.287  0.088   3.265      0.268  0.088   3.047
δ13 [MD·DR]                                 0.143  0.140   1.024      0.101  0.141   0.712
δ23 [MD·SR]                                 0.031  0.115   0.266      0.052  0.115   0.449
δ33 [MD·MER]                                0.784  0.143   5.482      0.777  0.144   5.390
δ43 [MD·MOR]                                0.505  0.100   5.048      0.460  0.101   4.570
Persons
ϑ11 [DD·MA]                                                           0.058  0.010   6.080
ϑ21 [DD·RC]                                                           0.005  0.003   1.616
ϑ31 [DD·V]                                                            0.013  0.003   0.995
ϑ12 [SD·MA]                                                           0.040  0.011   3.711
ϑ22 [SD·RC]                                                           0.007  0.003   2.187
ϑ32 [SD·V]                                                            0.006  0.003   1.871
ϑ13 [MD·MA]                                                          −0.015  0.015  −1.012
ϑ23 [MD·RC]                                                           0.010  0.004   2.370
ϑ33 [MD·V]                                                            0.014  0.004   3.274
Item
γ11 [DD·CON]                                                          0.027  1.539   0.017
γ21 [DD·FD]                                                           0.081  0.028   2.892
γ31 [DD·FR]                                                           0.046  0.023   2.013
γ41 [DD·NM]                                                          −0.144  0.311  −0.462
γ51 [DD·OPO]                                                         −0.727  0.623  −1.166
γ61 [DD·PO]                                                          −1.254  0.584  −2.149
γ71 [DD·SEM]                                                          0.511  0.504   1.013
γ12 [SD·CON]                                                         −0.728  1.755  −0.415
γ22 [SD·FD]                                                           0.089  0.032   2.798
γ32 [SD·FR]                                                           0.051  0.026   1.924
γ42 [SD·NM]                                                           0.513  0.356   1.438
γ52 [SD·OPO]                                                         −1.877  0.712  −2.636
γ62 [SD·PO]                                                          −1.433  0.666  −2.152
γ72 [SD·SEM]                                                         −0.232  0.575  −0.403
γ13 [MD·CON]                                                          0.040  1.644   0.024
γ23 [MD·FD]                                                           0.184  0.030   6.129
γ33 [MD·FR]                                                           0.061  0.025   2.501
γ43 [MD·NM]                                                           0.547  0.333   1.646
γ53 [MD·OPO]                                                         −1.486  0.664  −2.237
γ63 [MD·PO]                                                          −1.151  0.628  −1.832
γ73 [MD·SEM]                                                          0.476  0.540   0.882
Variances of random item difficulties are larger when the group-level random item difficulty (βd^(3)) is ignored. Variance estimates without βd^(3) are 5.161, 3.871, and 3.383, while those with βd^(3) are 1.627, 0.953, and 0.340 for the decoding, spelling, and meaning dimensions, respectively. Pearson correlations between predicted item difficulties from the two different models are 0.508, 0.571, and 0.687 for the decoding, spelling, and meaning dimensions, respectively, which indicates that item difficulties can be quite different when the dependency among items is ignored.
5.2. Consequences of Ignoring Random Residuals
There were large unexplained variances in item difficulties and abilities even after variabilities were explained by person-by-item, person, and item covariates. When person and item
random residuals are ignored in Model 3D-2, results of Model 3D-2 without residuals (results
not reported in a table) differed in two ways from those of Model 3D-2, which are consistent
with previous findings (Janssen et al., 2004). First, the effects of person and item covariates
were smaller in Model 3D-2 without residuals. Second, the standard errors of these estimates were also smaller in Model 3D-2 without residuals than in Model 3D-2. Consequently, significance testing results for the effects of covariates differed between the two models: more significant covariate effects were found in Model 3D-2 without residuals than in Model 3D-2.
5.3. Answers to Research Questions
Research Question 1. The fixed effects from Model 3D-0 in Table 3 indicated that, for the 'average' person and item, the probability of knowing the meaning of a derived word (1/(1 + exp(−2.069)) = 0.89) was somewhat higher than the probability of decoding the derived word (1/(1 + exp(−1.827)) = 0.86), which in turn was higher than the probability of spelling the derived word (1/(1 + exp(−0.855)) = 0.70). The
population parameter estimates of random effects reported in Table 4 suggest that variability
was evident for the lexical representation dimensions across persons, items, and item groups.
The correlations for the person dimensions were also listed in Table 4. Among persons, as in
Model 3D-0, decoding and spelling were correlated at 0.843, decoding and meaning were correlated at 0.494, and spelling and meaning were correlated at 0.590. For research question 1, these
correlations indicated that decoding, spelling, and meaning of derived words are separate, yet
correlated person dimensions of lexical representations. Comparing variances and correlations
of person dimensions in Model 3D-1 and Model 3D-2 (the explanatory models) with those of
Model 3D-0 (the measurement model) revealed that individual differences in the three person dimensions were partly explained by dimension-specific person covariates and dimension-specific
person-by-item covariates. Residual variances in person, item, and item group dimensions were
still present in Model 3D-1 and Model 3D-2 as unexplained variances.
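The probabilities quoted above for the 'average' person and item are inverse-logit transforms of the Model 3D-0 intercepts in Table 3, which can be reproduced as:

```python
import math

def inv_logit(x):
    """Convert a logit (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Model 3D-0 intercepts: meaning 2.069, decoding 1.827, spelling 0.855
probs = {dim: round(inv_logit(b), 2)
         for dim, b in [("meaning", 2.069), ("decoding", 1.827), ("spelling", 0.855)]}
print(probs)  # {'meaning': 0.89, 'decoding': 0.86, 'spelling': 0.7}
```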
TABLE 4.
Estimates and standard errors (SE) of population parameters of random effects in the explanatory multidimensional multilevel random item response model.

                                Model 3D-0    Model 3D-1    Model 3D-2
Item groups
  σβ [Variance]                    2.933         2.527         1.432
Items
  σβ1 [Variance] (Decoding)        1.627         1.638         0.427
  σβ2 [Variance] (Spelling)        0.953         0.949         0.772
  σβ3 [Variance] (Meaning)         0.340         0.330         0.220
Persons: Σθ [Var.-Corr.]
  Variance, Decoding               1.329         1.103         0.511
  Variance, Spelling               1.548         1.253         0.715
  Variance, Meaning                2.016         1.733         1.333
  Corr., Decoding-Spelling         0.843         0.812         0.673
  Corr., Decoding-Meaning          0.494         0.404         0.237
  Corr., Spelling-Meaning          0.590         0.510         0.358
TABLE 5.
Item difficulties from Model 3D-0.

                                      Item level                                  Item-group level
                      Decoding          Spelling          Meaning
Derived words         β      Rank       β      Rank       β      Rank       β      Rank
benefactor         −0.932    39       1.196     5       1.563     5      −1.554    32
thermosphere       −0.754    38       0.385    13       1.886     2      −1.932    35
stratagem          −0.516    37       0.855     7       1.701     4       1.244    10
financially        −0.430    36       1.319     2      −0.659    29      −0.841    28
biometric          −0.239    35      −0.946    35       1.855     3      −1.630    33
phonetic           −0.235    34       0.711     8       0.815    11       1.628     9
aquascape          −0.211    33      −0.835    34       2.262     1      −0.312    22
telegraph          −0.208    32       0.188    16      −0.475    27      −2.072    37
distinguish        −0.206    31       0.673     9      −1.091    33      −1.677    34
irrelevant         −0.178    30       1.298     3      −1.491    36      −0.228    20
circumscribe       −0.177    29      −0.305    25       1.418     6       0.088    15
significance       −0.160    28       1.339     1      −1.800    38      −0.505    24
dormant            −0.146    27       0.408    12      −0.412    25      −0.747    26
expeditious        −0.142    26       0.549    10       0.130    18       0.695    13
veritable          −0.116    25       0.454    11       0.307    14       0.945    12
residence          −0.102    24       0.986     6      −1.840    39      −1.166    31
economical         −0.027    23      −0.234    23      −0.569    28      −1.975    36
usurpatory         −0.018    22       0.045    18       1.241     7       2.221     6
irascible           0.002    21       1.284     4      −0.093    21       3.803     1
tranquility         0.034    20      −0.291    24      −0.203    23      −0.970    29
perception          0.037    19      −0.505    28       0.561    12      −0.227    19
migratory           0.040    18      −0.968    36       1.043     9      −0.749    27
dictator            0.056    17      −0.155    21      −1.406    35      −2.532    39
convention          0.066    16      −0.561    29       0.021    19      −1.121    30
disingenuous        0.095    15       0.212    15       0.480    13       2.337     5
discretionary       0.102    14      −0.045    20       0.210    15       1.119    11
sagacity            0.114    13      −0.182    22       1.141     8       2.482     4
diagnostician       0.202    12       0.249    14      −0.448    26       1.702     8
disenchantment      0.244    11      −1.414    38       0.969    10      −0.497    23
reinstate           0.248    10      −0.734    31      −0.025    20      −0.157    18
extremity           0.256     9      −0.442    27      −0.127    22       0.625    14
restriction         0.283     8      −0.800    33      −1.153    34      −2.101    38
covetousness        0.315     7       0.032    19       0.189    16       3.156     2
precedent           0.320     6       0.069    17      −0.685    30       1.743     7
hypersensitivity    0.328     5      −0.392    26      −0.941    31      −0.070    17
unquestionably      0.419     4      −0.768    32      −1.084    32      −0.703    25
malignant           0.467     3      −0.587    30       0.147    17       2.492     3
composite           0.723     2      −1.005    37      −1.786    37      −0.065    16
nationalistic       0.760     1      −2.089    39      −0.238    24      −0.293    21
For descriptive purposes, predicted values of item difficulties were obtained from Model 3D-0 and are reported in Table 5. Table 5 shows that words varied in their difficulty for each of the three item dimensions, controlling for the other dimensions. Item-group level difficulty can be interpreted as the word difficulty after accounting for the dimension-specific item difficulties (i.e., decoding, spelling, and meaning difficulties) on the item side. Of the decoding items, controlling for spelling and meaning, the easiest were benefactor and thermosphere whereas the most difficult were nationalistic and composite. The easiest spelling items,
controlling for decoding and meaning, were nationalistic and disenchantment and the hardest
were significance and irrelevant. Also, the easiest items to define, controlling for decoding and
spelling, were residence and significance and the hardest were aquascape and thermosphere.
Finally, after taking into account the difficulties of decoding, spelling, and meaning, the easiest
words (item-groups) were dictator and restriction whereas the most difficult were irascible and
covetousness. The word nationalistic is the epitome of item variability: it is the most difficult
word to read (rank = 1), the easiest item to spell (rank = 39), and moderately difficult to define
(rank = 24). After taking into account the difficulty of decoding, spelling, and defining nationalistic, the word nationalistic was a word of average difficulty (rank = 21). Substantively, the
average ranking of nationalistic is intuitive when considering that it has approximately average
scores on the item-group features (e.g., consistency, frequency, number of morphemes, etc.),
which we expect to explain item-group dependencies.
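The Rank columns in Table 5 assign rank 1 to the largest (most difficult) predicted difficulty within a column; a minimal sketch of that ranking, using three made-up difficulties:

```python
def difficulty_ranks(betas):
    """Rank predicted difficulties so that rank 1 = most difficult (largest beta)."""
    order = sorted(range(len(betas)), key=lambda i: betas[i], reverse=True)
    ranks = [0] * len(betas)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

print(difficulty_ranks([3.803, -2.532, 0.088]))  # [1, 3, 2]
```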
Research Question 2. Because we were interested in the effects of various aspects of root-word
knowledge on dimensions of lexical representations, we added dimension-specific root variables to the model. Model 3D-1 was without the effects of person and item covariates whereas
Model 3D-2 included all effects. Comparing the root effects across the models in Table 3 revealed that although the estimates were somewhat different between the two models, the same
pattern of significance emerged. Coefficient interpretation is based on Model 3D-2 (see the third column of Table 3 for results). Controlling for overall knowledge of lexical representations and the other root effects, root decoding was significantly associated only with derived decoding (δ11 = 0.656, z = 5.077, p < 0.001), not derived spelling (δ12 = 0.001, z = 0.007, p = 0.995) or meaning (δ13 = 0.101, z = 0.712, p = 0.476). Similarly, controlling for all the other variables, root spelling was significantly associated only with its corollary, derived spelling (δ22 = 0.454, z = 4.409, p < 0.001), not decoding (δ21 = −0.035, z = −0.333, p = 0.739) or meaning (δ23 = 0.052, z = 0.449, p = 0.653). On the other hand, root meaning and morphology were significantly associated with all three lexical representation dimensions: derived decoding (δ31 = 0.445, z = 3.387, p < 0.001 and δ41 = 0.183, z = 1.974, p = 0.048, respectively), derived spelling (δ32 = 0.634, z = 4.593, p < 0.001 and δ42 = 0.268, z = 3.047, p = 0.002, respectively), and derived meaning (δ33 = 0.777, z = 5.390, p < 0.001 and δ43 = 0.460, z = 4.570, p < 0.001, respectively).
Research Question 3. To address our research question regarding effects of person and item
group characteristics on lexical representations, we added dimension-specific person and item
group effects to the model and fit Model 3D-2. See the third column of Table 3 for results. Controlling for all other variables in the model, the person characteristic of morphological awareness
was significantly associated with derived decoding (ϑ11 = 0.058, z = 6.080, p < 0.001) and spelling (ϑ12 = 0.040, z = 3.711, p < 0.001). Reading comprehension was significantly related to derived spelling (ϑ22 = 0.007, z = 2.187, p = 0.029) and meaning (ϑ23 = 0.010, z = 2.370, p = 0.018), controlling for all other effects, but vocabulary knowledge was related only to derived meaning (ϑ33 = 0.014, z = 3.274, p = 0.001).
Controlling for the other variables in the model, three item characteristics were significantly related to derived decoding: frequency of the derived word (γ21 = 0.081, z = 2.892, p = 0.004), frequency of the root word (γ31 = 0.046, z = 2.013, p = 0.044), and phonological opaqueness (γ61 = −1.254, z = −2.149, p = 0.032). Frequency of the derived word and phonological opaqueness were also significantly related to derived spelling (γ22 = 0.089, z = 2.798, p = 0.005, and γ62 = −1.433, z = −2.152, p = 0.031, respectively), in addition to orthographic-phonological opaqueness (γ52 = −1.877, z = −2.636, p = 0.008). Frequency of the derived word (γ23 = 0.184, z = 6.129, p < 0.001), frequency of the root word (γ33 = 0.061, z = 2.501, p = 0.012), and orthographic-phonological opaqueness (γ53 = −1.486, z = −2.237, p = 0.025) were significantly related to derived meaning, controlling for all other variables in the model.
6. Discussion
In this study, multidimensional item response models were extended to a model with multilevel random item parameters to investigate how aspects of morphological knowledge contribute
to lexical representations for different persons, items, and item groups, taking a different approach from most researchers in reading education.
6.1. Summary of Empirical Study
First, results suggest that lexical representations are best represented by three separate but
correlated dimensions of decoding, spelling, and meaning rather than two (decoding + spelling
and meaning) or one dimension (decoding + spelling + meaning). Second, root-word (e.g., isolate) knowledge differentially contributes to the decoding, spelling, and meaning represented
within the lexical representation of a related derived word (e.g., isolation). Controlling for other
covariates in the model, knowledge of root-word decoding, meaning, and morphological relatives
were significant covariates of derived-word decoding. Knowledge of root-word spelling, meaning, and morphological relatives were significantly related to derived-word spelling. Knowledge
of root-word meaning and morphological relatives were significantly related to derived-word
meaning. Our third finding is that variability in lexical representations that remains after controlling for root-word knowledge can partly be explained by person and item characteristics. Person
characteristics of reading comprehension, morphological awareness, and vocabulary knowledge
and item group features of frequency of the derived and root word, orthographic-phonological
opaqueness, and phonological opaqueness were significantly related to knowledge of lexical representations, although different patterns of significant relations emerged for the separate dimensions. A more in-depth discussion of the substantive results can be found in Goodwin et al.
(2012).
6.2. Methodological Discussions
The main extension of the multidimensional item response models presented in this paper was to allow random difficulty parameters at both the item and item-group levels. The purpose of the random item parameters differed by level. Item-level random item parameters were
used to model unexplained variance left over when item related covariates were used to explain
variations in item difficulties. Group-level random item parameters were employed to model
dependency in item responses among items having the same item stem. Consequences of ignoring the random residuals and the multilevel item structure were shown. Several methodological
justifications for using the EMMRIRM that we presented in this study are described below.
Measurement Model To check whether there were large variations in item discriminations among items in our empirical data, a 2-parameter item response model with random item difficulty and discrimination parameters was fit using a hierarchical Bayesian approach. Normal distributions were used as priors on the item parameters, and half-t priors, recommended for small sample sizes (Gelman & Hill, 2007, p. 428), were used as hyperpriors on the standard deviations. The mean and variance of the item discriminations were 1.353 and 0.009, respectively, while those of the item difficulties were −1.305 and 3.773. The small variance of the item discriminations indicated that the items did not differ substantially in how well they discriminated among people, leaving little variability in item discriminations to be explained by person-by-item and item covariates. Thus, a model without discrimination parameters was chosen as the measurement model in the explanatory item response models.
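A quick arithmetic check makes the "small variance" claim concrete. The calculation below uses only the posterior moments reported above; the 95% normal-range computation is our own illustration, not part of the paper's analysis.

```python
import math

# Posterior mean and variance of item discriminations reported in the text
mean_a, var_a = 1.353, 0.009
sd_a = math.sqrt(var_a)  # about 0.095

# Approximate 95% range of discriminations under normality
lo, hi = mean_a - 1.96 * sd_a, mean_a + 1.96 * sd_a
print(f"SD = {sd_a:.3f}, 95% range = [{lo:.3f}, {hi:.3f}]")
# The narrow range (roughly 1.17 to 1.54) suggests nearly equal
# discriminations, supporting a Rasch-type measurement model.
```

Under this reading, a common discrimination absorbed into the scale of the latent variable loses little information relative to the 2-parameter model.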
Omission of Covariates The effects of covariates and the population parameters of the random parameters were estimated simultaneously in the explanatory item response models. Even though the measurement error of the estimated parameters can be considered in estimating the effects of covariates, parameter estimates can be affected by the inclusion of covariates (Hickendorff, Heiser, van Putten, & Verhelst, 2009; Verhelst & Verstralen, 2002). In our illustration, there were person covariates and person-by-item covariates to explain a multidimensional person parameter, and dimension-specific item covariates and person-by-item covariates to explain a multidimensional item parameter. Because we allow multidimensional residuals for both persons and items in the model, the omission of additional covariates does not lead to problems in predicting individual person and item parameters. However, the significance testing results for the effects of covariates can differ depending on which sets of covariates are considered. Possible alternative models with different sets of covariates were fitted; however, interpretations of the effects of covariates were similar among the alternative models we considered.
Downward Bias in Using Laplace Approximation The Laplace approximation implemented in the R lmer function was used to estimate model parameters. The Laplace approximation can perform poorly for dichotomous responses with small cluster sizes (e.g., a small number of items), with a downward bias in the estimated variance components (Cho & Rabe-Hesketh, 2011; Goldstein & Rasbash, 1996; Raudenbush, Yang, & Yosef, 2000; Rodríguez & Goldman, 1995). This downward bias in variance components can in turn lead to overestimated correlations (Cho et al., 2012). However, our empirical data set has a large number of items and item groups (117 items and 39 item groups), so downward bias is not of much concern, given the results of previous simulation studies (Cho & Rabe-Hesketh, 2011; Cho et al., 2012).
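A stylized numeric illustration (our own, with made-up values) shows why downward-biased variances can inflate correlations: the correlation divides a covariance by the product of standard deviations, so shrinking the denominators pushes the ratio up.

```python
import math

# Hypothetical true variance components and covariance for two dimensions
var1, var2, cov = 1.0, 1.0, 0.5
true_corr = cov / math.sqrt(var1 * var2)  # 0.5

# Suppose estimation biases each variance downward by 20% while the
# covariance estimate is unaffected (a stylized assumption for illustration)
b1, b2 = 0.8 * var1, 0.8 * var2
biased_corr = cov / math.sqrt(b1 * b2)  # 0.625, overestimating 0.5

print(true_corr, biased_corr)
```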
Distributional Assumption on Item Difficulty Parameters In the R lmer function, all random effects for persons and items are assumed to follow a normal or multivariate normal distribution. It is usual to assume that person parameters come from a normal or multivariate normal distribution, as in marginal maximum likelihood estimation for item response models; random item parameters, however, are less common in IRT applications. A normal distribution has been used for item difficulties in random item response model applications with various estimation methods (Geerlings, Glas, & van der Linden, 2011; Glas & van der Linden, 2003; Sinharay et al., 2003). It has been shown that item difficulties can be predicted precisely using empirical Bayes prediction under the normality assumption even when the "true" population distribution departs from a normal distribution (Hofman & De Boeck, 2010). In addition, item difficulties estimated with a normal approximation have been shown to be similar to those estimated with a discrete approximation as a nonparametric approach (Cho & Rabe-Hesketh, 2011).
Multidimensional Test Design We described the EMMRIRMs for our specific test design, a between-item design in which each item is allowed to load on only one dimension. A within-item design, in which an item is allowed to load on more than one dimension, is not applicable to our empirical data set. For example, an item such as "Please read the word nationalistic" loads only on the decoding dimension, not on spelling nationalistic or saying whether you know what nationalistic means. When a test design is a within-item design, the model description, the model identification constraints, parameter interpretation, and calculation of intraclass correlations for items and/or item groups will differ from the descriptions presented in this paper.
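The between- versus within-item distinction can be expressed as a loading (Q) pattern in which rows are items and columns are dimensions (here: decoding, spelling, meaning). The toy matrices below are our own illustration, not part of the paper.

```python
# Between-item design, as in the empirical data set:
# each item loads on exactly one dimension
# (columns: decoding, spelling, meaning)
Q_between = [
    (1, 0, 0),  # "Please read the word nationalistic" -> decoding only
    (0, 1, 0),  # spell the word -> spelling only
    (0, 0, 1),  # say whether you know its meaning -> meaning only
]
assert all(sum(row) == 1 for row in Q_between)

# Within-item design (not applicable to our data): an item may load on
# several dimensions, which changes the identification constraints,
# parameter interpretation, and intraclass correlation calculations
Q_within = [
    (1, 1, 0),  # a hypothetical item tapping both decoding and spelling
    (1, 0, 1),
]
assert any(sum(row) > 1 for row in Q_within)
```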
The Number of Parameters in Random Item Modeling A multidimensional item response model, being a complex model, requires large enough sample sizes to estimate its parameters. However, with a random item approach such as in the EMMRIRM, the number of item parameters to be estimated is greatly reduced compared to a fixed item approach. For example, in the case of the measurement model of our application (Model 3D), the number of item parameters is 4 in the random item approach (3 variances for the item dimensions and 1 variance for the item-group dimension), while it is 156 in the fixed item approach (117 [39 × 3] item difficulties and 39 item-group difficulties). In the case of the most complex model, Model 3D-2, the total number of parameters is 55: 3 variances of the 3 item dimensions, 1 variance of the item-group dimension, 3 average logit values across persons and items (intercepts), 3 variances of the 3 person dimensions, 3 correlations between person dimensions, and 42 effects of covariates. Given the 172 participants and the structures of the various covariates in the application, we did not run into any estimation problems, such as convergence problems, in estimating the 55 parameters of Model 3D-2. However, future research is required to investigate minimum sample sizes for estimating model parameters via extensive simulation studies.
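The parameter counts above can be verified with simple arithmetic; the numbers below are taken directly from the text.

```python
n_groups, n_dims = 39, 3
n_items = n_groups * n_dims  # 117 items

# Fixed item approach: one difficulty per item plus one per item group
fixed_item_params = n_items + n_groups
assert fixed_item_params == 156

# Random item approach (Model 3D): only variances are estimated
random_item_params = n_dims + 1  # 3 item-dimension variances + 1 group variance
assert random_item_params == 4

# Most complex model (Model 3D-2)
total_3d2 = (3      # variances of the 3 item dimensions
             + 1    # variance of the item-group dimension
             + 3    # average logits (intercepts)
             + 3    # variances of the 3 person dimensions
             + 3    # correlations between person dimensions
             + 42)  # effects of covariates
assert total_3d2 == 55
```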
Model Extensions The EMMRIRMs can be extended to 2-parameter or 3-parameter item response models. Unfortunately, the R lmer function we used in this study cannot be used for non-Rasch models, although there is a procedure to implement 2-parameter random item response models with the lmer function (Jeon & Rabe-Hesketh, 2012). It will be more challenging to estimate model parameters when all kinds of item parameters (i.e., difficulty, discrimination, and guessing) are random; in this case, maximum likelihood estimation would involve high-dimensional integration. On the other hand, it may be easier to use a hierarchical Bayesian method with MCMC for complex random item response models (e.g., Geerlings et al., 2011). However, the use of priors and hyperpriors on random item discriminations is not yet established in the psychometric literature (see Cho & Rabe-Hesketh, 2012, for a survey).
Model Selection AIC and BIC were used to compare alternative models in the empirical study. The appropriate use of information criteria in latent variable models is an ongoing area of research (e.g., Vaida & Blanchard, 2005). Further study is required to investigate the performance of AIC and BIC when comparing models that differ with respect to the covariates used and the number of dimensions for random item response models having crossed random effects.
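For reference, AIC (Akaike, 1974) and BIC (Schwarz, 1978) penalize the deviance by the number of parameters. The deviance value below is made up purely for illustration, and which sample size to use for BIC is itself part of the open question for models with crossed random effects.

```python
import math

def aic(deviance, n_params):
    # AIC = -2 log-likelihood + 2 * number of parameters
    return deviance + 2 * n_params

def bic(deviance, n_params, n_obs):
    # BIC = -2 log-likelihood + log(sample size) * number of parameters
    return deviance + math.log(n_obs) * n_params

# Hypothetical deviance for a model with 55 parameters (as in Model 3D-2),
# taking n as the 172 persons x 117 items = 20124 item responses
dev, p, n = 10000.0, 55, 172 * 117
print(aic(dev, p), round(bic(dev, p, n), 1))
```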
6.3. Empirical Study Implications for Practice
The findings of this study have educational implications. First, results suggest that students
can improve the quality of their lexical representations through developing multiple dimensions
of word knowledge. For example, learning how to decode, spell, and determine the meaning of a
word will likely support lexical representations. Additionally, developing root-word knowledge
would be expected to further support lexical representations, perhaps through morphological instruction that teaches students about root words. Based on results with item features, person
characteristics, and their interactions, students will also need additional support with words that are opaque and less frequent, and readers who struggle with vocabulary, comprehension, and morphological awareness will need additional support in developing representations of higher quality. Overall, educators can help students improve their lexical representations by providing multiple exposures to words that develop multiple dimensions of word and root-word knowledge while also developing more general skills such as vocabulary knowledge, morphological awareness, and reading comprehension. Because this study deepens understanding of lexical representations, educators and researchers can use these findings to design more effective interventions to support the development of high-quality lexical representations, which may, in turn, improve text comprehension.
Appendix: List of Stimulus Words

Num.  Root word  Derived word
1     benefit    benefactor
2     sphere     thermosphere
3     strategy   stratagem
4     finance    financially
5     meter      biometric
6     phone      phonetic
7     aqua       aquascape
8     graph      telegraph
9     distinct   distinguish
10    relative   irrelevant
11    scribe     circumscribe
12    sign       significance
13    dorm       dormant
14    expedite   expeditious
15    verify     veritable
16    reside     residence
17    economy    economical
18    usurp      usurpatory
19    irate      irascible
20    tranquil   tranquility
21    perceive   perception
22    migrate    migratory
23    dictate    dictator
24    convene    convention
25    genuine    disingenuous
26    discrete   discretionary
27    sage       sagacity
28    diagnose   diagnostician
29    enchant    disenchantment
30    state      reinstate
31    extreme    extremity
32    strict     restriction
33    covet      covetousness
34    precede    precedent
35    sense      hypersensitivity
36    quest      unquestionably
37    malign     malignant
38    compose    composite
39    nation     nationalistic
References
Adams, R.J., Wilson, M., & Wu, M. (1997). Multilevel item response models: an approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47–76.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19,
716–723.
Anglin, J.M. (1993). Vocabulary development: a morphological analysis. Monographs of the Society for Research in
Child Development, 58, 1–166. Serial #238.
Balota, D.A., Yap, M.J., Cortese, M.J., Hutchison, K.A., Kessler, B., Loftis, B., Neely, J.H., Nelson, D.L., Simpson,
G.B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39, 445–459.
Bates, D., Maechler, M., & Bolker, B. (2011). lme4: linear mixed-effects models using S4 classes. R package version 0.999375-39. http://CRAN.R-project.org/package=lme4.
Baayen, R.H., Piepenbrock, R., & Van Rijn, H. (1993). The CELEX lexical data base on CD-ROM. Philadelphia: Linguistic Data Consortium.
Bejar, I.I. (1993). A generative approach to psychological and educational measurement. In N. Frederiksen, R.J. Mislevy,
& I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 323–359). Hillsdale: Erlbaum.
Bowey, J.A., & Muller, D. (2005). Phonological recoding and rapid orthographic learning in third graders’ silent reading:
a critical test of the self-teaching hypothesis. Journal of Experimental Child Psychology, 92, 203–219.
Carlisle, J.F., & Stone, C.A. (2005). Exploring the role of morphemes in word reading. Reading Research Quarterly, 40,
428–449.
Carlisle, J.F., & Katz, L.A. (2006). Effects of word and morpheme familiarity on reading of derived words. Reading &
Writing, 19, 669–693.
Cho, S.-J., & Rabe-Hesketh, S. (2011). Alternating imputation posterior estimation of models with crossed random
effects. Computational Statistics & Data Analysis, 55, 12–25.
Cho, S.-J., Partchev, I., & De Boeck, P. (2012). Parameter estimation of multiple item profiles models. British Journal of
Mathematical & Statistical Psychology, 65, 438–466.
Cho, S.-J., & Rabe-Hesketh, S. (2012). Random item discrimination marginal maximum likelihood estimation of item response models. Paper presented at the National Council on Measurement in Education, Vancouver, British Columbia.
Cho, S.-J., De Boeck, P., Embretson, S., & Rabe-Hesketh, S. (2013). Additive item structure models with random residuals: item modeling for explanation and item generation. Unpublished manuscript.
Cunningham, A.E., Perry, K.E., Stanovich, K.E., & Share, D.L. (2002). Orthographic learning during reading: examining
the role of self-teaching. Journal of Experimental Child Psychology, 82, 185–199.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of
item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39, 1–28.
De Boeck, P., & Wilson, M. (2004). Explanatory item response models: a generalized linear and nonlinear approach.
New York: Springer.
Deacon, S.H., & Bryant, P. (2005). What children do and do not know about the spelling of inflections and derivations.
Developmental Science, 8, 583–594.
Deacon, S.H., & Bryant, P. (2006). Getting to the root: young writers’ sensitivity to the role of root morphemes in the
spelling of inflected and derived words. Journal of Child Language, 33, 401–417.
Fischer, G.H. (1973). Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–
374.
Geerlings, H., Glas, C.A.W., & van der Linden, W.J. (2011). Modeling rule-based item generation. Psychometrika, 76,
337–359.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge
University Press.
Glas, C.A.W., & van der Linden, W.J. (2003). Computerized adaptive testing with item cloning. Applied Psychological
Measurement, 27, 247–261.
Goldstein, H., & Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of
the Royal Statistical Society. Series A, 159, 505–513.
González, F., De Boeck, P., & Tuerlinckx, F. (2008). A double-structure structural equation model for three-mode data.
Psychological Methods, 13, 337–353.
Goodwin, A.P., Gilbert, J.K., & Cho, S.-J. (2013). Morphological contributions to adolescent word reading: an item
response approach. Reading Research Quarterly, 48, 39–60.
Goodwin, A.P., Gilbert, J.K., Cho, S.-J., & Kearns, D. (2012). Probing lexical representations: simultaneous modeling
of word and reader contributions to multidimensional lexical representations. Unpublished manuscript.
Harm, M.W., & Seidenberg, M.S. (2004). Computing the meanings of words in reading: cooperative division of labor
between visual and phonological processes. Psychological Review, 111, 662–720.
Hickendorff, M., Heiser, W.J., van Putten, C.M., & Verhelst, N.D. (2009). Solution strategies and achievement in Dutch
complex arithmetic: latent variable modeling of change. Psychometrika, 74, 331–350.
Hofman, A., & De Boeck, P. (2010). The utility of models with multidimensional random person and random item
(stimuli) variables: a recovery study with lmer. Unpublished manuscript, University of Amsterdam.
Janssen, R., & De Boeck, P. (1999). Confirmatory analyses of componential test structure using multidimensional item
response theory. Multivariate Behavioral Research, 34, 245–268.
Janssen, R., Schepers, J., & Peres, D. (2004). Models with item and item group predictors. In P. De Boeck & M. Wilson
(Eds.), Explanatory item response models: a generalized linear and nonlinear approach (pp. 189–212). New York:
Springer.
Jeon, M., & Rabe-Hesketh, S. (2012). Profile-likelihood approach for estimating generalized linear mixed models with
factor structures. Journal of Educational and Behavioral Statistics, 37, 518–542.
Johnson, M.S., & Sinharay, S. (2005). Calibration of polytomous item families using Bayesian hierarchical modeling.
Applied Psychological Measurement, 29, 369–400.
MacGinitie, W.H., MacGinitie, R.K., Maria, K., & Dreyer, L.G. (2000). Gates-MacGinitie reading tests. Itasca: Riverside
Publishing.
Mahony, D.L. (1994). Using sensitivity to word structure to explain variance in high school and college level reading
ability. Reading & Writing: An Interdisciplinary Journal, 6, 19–44.
McCutchen, D., Logan, B., & Biangardi-Orpe, U. (2009). Making meaning: children’s sensitivity to morphological information during word reading. Reading Research Quarterly, 44, 360–376.
Mislevy, R.J. (1988). Exploiting auxiliary information about items in the estimation of Rasch item difficulty parameters.
Applied Psychological Measurement, 12, 281–296.
Nagy, W.E., & Anderson, R.C. (1984). How many words are there in printed school English? Reading Research Quarterly, 19, 304–440.
Nation, K., Angell, P., & Castles, A. (2007). Orthographic learning via self-teaching in children learning to read English:
effects of exposure, durability, and context. Journal of Experimental Child Psychology, 96, 71–84.
Perfetti, C.A. (2007). Reading ability: lexical quality to comprehension. Scientific Studies of Reading, 11, 357–383.
Perfetti, C.A., & Hart, L. (2001). The lexical basis of comprehension skill. In D.S. Gorfein (Ed.), On the consequences of meaning selection: perspectives on resolving lexical ambiguity. Washington: American Psychological Association.
Perfetti, C.A., & Hart, L. (2002). The lexical quality hypothesis. In L. Verhoeven, C. Elbro, & P. Reitsma (Eds.), Precursors of functional literacy. Amsterdam/Philadelphia: Benjamins.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent
variable models with nested random effects. Journal of Econometrics, 128, 301–323.
Raudenbush, S.W., Yang, M., & Yosef, M. (2000). Maximum likelihood for generalized linear models with nested random
effects via high-order, multivariate Laplace approximation. Journal of Computational and Graphical Statistics, 9,
141–157.
Rodríguez, G., & Goldman, N. (1995). An assessment of estimation procedures for multilevel models with binary responses. Journal of the Royal Statistical Society. Series A, 158, 73–89.
Rodríguez, G., & Elo, I. (2003). Intra-class correlation in random-effects models for binary data. Stata Journal, 3, 32–46.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Singson, M., Mahony, D., & Mann, V. (2000). The relation between reading ability and morphological skills: evidence
from derivational suffixes. Reading & Writing: An Interdisciplinary Journal, 12, 219–252.
Sinharay, S., Johnson, M.S., & Williamson, D.M. (2003). Calibrating item families and summarizing the results using
family expected response functions. Journal of Educational and Behavioral Statistics, 28, 295–313.
Tennessee Department of Education Report Card (2011). Tennessee Department of Education Report Card. Retrieved from http://edu.reportcard.state.tn.us/pls/apex/f?p=200:1:4463180647001600 on 8-28-11.
Tyler, A., & Nagy, W. (1989). The acquisition of English derivational morphology. Journal of Memory and Language,
28, 649–667.
Tyler, A., & Nagy, W.E. (1990). Use of derivational morphology during reading. Cognition, 36, 17–34.
Vaida, F., & Blanchard, S. (2005). Conditional Akaike information for mixed effects models. Biometrika, 92, 351–370.
Vellutino, F.R., Tunmer, W.E., Jaccard, J.J., & Chen, R. (2007). Components of reading ability: multivariate evidence for
a convergent skills model. Scientific Studies of Reading, 11, 3–32.
Verhelst, N.D., & Verstralen, H.H.F.M. (2002). Structural analysis of a univariate latent variable (SAUL) (computer
program and manual). Arnhem: CITO.
Zeno, S.M., Ivens, S.H., Millard, R.T., & Duvvuri, R. (1995). The educator’s word frequency guide. Brewster: Touchstone
Applied Science Associates.
Manuscript Received: 21 APR 2012
Final Version Received: 28 AUG 2012
Published Online Date: 15 MAR 2013