Measurement and Research Department Reports
A Logistic Latent Class Model for Multiple Choice Items
H.H.F.M. Verstralen
Cito
Arnhem, 1997
97-1
This manuscript has been submitted for publication. No part of this manuscript
may be copied or reproduced without permission.
Abstract
A logistic latent class model for the analysis of options of a class of multiple
choice items is presented. For each item a set of latent classes with a chain
structure is assumed. The probability of latent class membership is modeled by a
logistic function. The conditional probability of the observed response, the
selection of an option, given the latent class membership is assumed to be
constant. The model can be viewed as a generalization of Nedelsky’s (1954)
method to determine a pass/fail score. Apart from giving a more detailed model
of the process of solving a multiple choice item, an increase in the precision of
latent variable estimates in comparison with binary scoring is achieved. The model
is shown to possess some favorable psychometric properties.
Key words: item response models, latent class models, maximum marginal
likelihood, ability estimation, multiple choice data.
Introduction
It has long been realized that binary scoring of multiple choice questions neglects
valuable information on the latent variable that is contained in the particular
choice of an option. Using a logistic function to model the trace lines of options,
Bock (1972), for instance, found that using the information in the selected options
has the same effect on the precision of estimated ability as doubling the test
length. The increase in precision, however, is largely restricted to respondents with
lower test scores. Thissen (1976) found similar results in an application of Bock's
model to the Raven test.
Samejima (1979) noticed an awkward property of Bock’s (1972) model. The
model implies that each item has an option that is selected with probability
approaching unity as ability decreases to minus infinity. The examples provided
by Bock show, however, that he was aware of this and assumed that the
’awkward’ option was explicitly mentioned among the options as "no response".
This property is also clearly visible in many examples of fitted trace lines in
Thissen (1976). Samejima solved the problem for multiple choice items without
an explicit "no response" option, by the introduction of two latent classes per
item, analogously to Birnbaum’s 3PL. Respondents in the first latent class respond
according to Bock’s model, and respondents in the second latent class, called
"Don’t Know" select randomly from the complete set of options. Clearly, the
observed "No response" by Bock is identical with the nonobserved "Don’t know"
by Samejima. The probability of latent class membership is given by a logistic
function of the latent variable. Thissen and Steinberg (1984, 1997) enlarged
Samejima’s model by estimating the conditional probabilities for the options given
the "Don't Know" latent class. This adds $\sum_i J_i$ parameters to the model, where
$J_i + 1$ denotes the number of options of item $i$. An item with five options has 14
parameters in this model, although the data analyst may impose restrictions.
The SERE (Solution-Error Response-Error) model of Kelderman (1988)
assumes the same two latent classes, but is formulated as an incomplete latent
class analysis model (Haberman, 1979). The SERE model was generalized to the
GSERE model by Westers and Kelderman (1993), and Westers (1993). Whereas
the SERE model described a unidimensional latent trait and two latent classes,
the GSERE models generalize to a multidimensional latent trait and polytomously
scored latent classes.
This ’trend’ towards more latent classes to account for option selection is
broken by Ramsay (1991) and Abrahamowicz and Ramsay (1992), where, as in
Bock's (1972) model, trace lines of options are modeled directly, without the
mediation of latent classes. Because they use versatile M-splines, instead of a
logistic model, their approach does not suffer from the peculiarity inherent in the
model of Bock.
In this report one type of IRT model for multiple choice questions, called
Nedelsky models after Nedelsky (1954), will be treated. In these models latent
classes are associated with an item, and are characterized by the subset of options
from which the subject selects his response. The correct option is assumed to be
a member of all allowed option subsets. We will distinguish two types of latent
class structures. In the first latent class structure, each option has a positive
probability, dependent on the ability of the respondent, to be a member of the
latent class from which the response is selected. Because all subsets of incorrect
options are allowed in this model it is called the Full Nedelsky model, or for short
the Full model. In the second model the difficulty order among the options
restricts the set of latent classes to a chain that follows the difficulty order of the
options. Only subsets of options are allowed with the following property. If a
certain option is not a member of the selection set, all options that are more easily
exposed as false are not members either. This latter model is called the Chain
Nedelsky model, or for short the Chain model. As already mentioned it is
assumed that the correct option is contained in all latent classes. This report is
concerned with the Chain model.
Nedelsky models, like the GSERE model, follow the trend towards more latent
classes per item to account for option selection. However, whereas (G)SERE
moves toward Latent Class Analysis, these models remain within the tradition of
item response theory. In particular, the Chain model treated here follows the
OPLM and GPCM traditions (Verhelst et al., 1994; Verhelst & Glas, 1995; Muraki,
1992). In the (G)SERE model every possible response vector defines a latent
state, which causes the number of latent states to grow exponentially with the
number of items. Consequently, with the usual number of items the amount of
latent states creates a problem. In the Chain Nedelsky model it is assumed that
the probability of membership of a latent class is modeled by a logistic function
of a latent variable. We will assume a small number of latent classes per item, so
that the number of latent classes only grows linearly, and not very fast, with the
number of items. Like in GSERE models it is also assumed that membership of
a latent class is associated with a latent item score.
The chain structure of the allowed subsets of options in the Chain model, can
have a foundation in developmental psychology. For instance, the order of the
options may implement a sequence of Piagetian mental operations that are
invariably acquired one after the other. Another source for an order among the
options is the introduction of degrees of partial correctness.
It is an important property of Nedelsky models that the probability of selecting
the correct option strictly increases with $\theta$; this property does not hold for the
other models discussed so far.
The Chain Nedelsky Model
In the Chain model, the space of latent classes related to an item is reduced to a
small number of classical subsets of options, with a chain structure. This means
that the first latent class selects randomly from all options, the second latent class
from all but the easiest false option, etc., and the last latent class selects only from
the correct option, and therefore always answers correctly. A multiple choice item
that behaves according to this model is called a 'Chain Nedelsky item'. Nedelsky
(1954) conceived of a method to determine a pass/fail score for a multiple choice
test:
1 Imagine a person with a borderline ability, whose passing or failing the test is
considered immaterial.
2 Determine for every item the options that such a person must be able to
recognize as false.
3 Assume that the chosen option is randomly drawn from the remaining options.
4 By adding the probabilities of a correct response one obtains an estimate of
the expected score of a borderline subject. This score is taken as the pass/fail
score.
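As an illustration, a minimal computational sketch of this procedure in Python follows; the per-item counts of options judged recognizable as false by a borderline candidate are hypothetical.

```python
# A minimal sketch of Nedelsky's (1954) procedure. The per-item counts of
# options a borderline candidate can recognize as false are hypothetical.
items = [
    {"n_options": 4, "recognized_false": 2},  # guesses among the remaining 2
    {"n_options": 4, "recognized_false": 1},
    {"n_options": 5, "recognized_false": 3},
]

def nedelsky_cutoff(items):
    """Expected score of a borderline candidate: sum over items of the
    probability of drawing the correct option from the remaining options."""
    return sum(1.0 / (it["n_options"] - it["recognized_false"]) for it in items)

print(nedelsky_cutoff(items))   # 0.5 + 1/3 + 0.5 = 1.33 on this 3-item test
```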
The Nedelsky procedure explicates a hypothesis about the solution of multiple
choice questions, and it is interesting in its own right to build a model that enables
one to investigate and test this hypothesis. Therefore, Nedelsky's method is
generalized in two directions: 1. the selection of the subset of incorrect options is
probabilistic, and 2. it can be modeled for all ability levels.
In the sequel it is always assumed that the options are indexed in their order
of easiness, called their natural order. For instance, if an item has four options
{0123} in their natural order then option 0 is most easily exposed as false, and
option 3 is correct.
The set of subsets of options of a Chain item not recognized as false is denoted
by $A_N$. Denote an element $\alpha_r \in A_N$ as an indicator vector on the ordered set of
alternatives of an item. Taking as an example a Chain item with four options, and
omitting commas between the elements of a vector $\alpha$, the set $A_N$ can be written
as {(1111), (0111), (0011), (0001)}. Clearly option 0 is recognized as false most
easily, and option 3, the correct option, is the most difficult (impossible) to
recognize as such. However, when a person tries this item his latent class $\alpha \in A_N$
is not observed. If he recognizes the set of options {01} as false, he is presumed
to be a member of the latent class $\alpha_2 = (0011)$, also denoted as latent class 2, and
earns a latent item score of 2. Being a member of $\alpha_2$ he selects randomly from the
option set {23}, and consequently has a probability of 0.5 to produce the observed
correct response by selecting option 3.
In the above example the number of options equals the number of latent
classes. However, it may be the case that the options {01} are (almost) invariably
together recognized as false. When this happens only three latent classes remain
{(1111), (0011), (0001)}. Consequently, the number of latent classes is equal to or
less than the number of options. Moreover, in such a case the latent score is no
longer equal to the number of options recognized as false. The two simultaneously
recognized options are counted as one, so that the index of the latent class equals
the latent score in all cases. In our example the latent scores of the latent classes
{(1111),(0011),(0001)} are respectively 0, 1, and 2.
If a person has a certain ability with respect to a set of Chain items, then not
only can this ability be estimated more accurately than with models for binary
scored MC items but it also allows a more informative interpretation. A certain
ability implies for each item not only the expected number of options that are
recognized as false, but also the most probable subset of options recognized as
false.
Because the assumptions of the Chain Nedelsky model may be unduly strict for
some items, a relaxation in the same vein as Thissen and Steinberg (1984) was
investigated. By dropping the assumption that the subject selects the alternative
of his choice from the remaining alternatives on a purely random basis,
alternatives perceived as correct may still differ in attractiveness. This relaxation
is, however, introduced at the cost of doubling the number of parameters.
Moreover, it appeared that this relaxed model runs into serious estimation
problems, because of indeterminacy.
Although the relaxed model will be dropped in the sequel as a viable model it
still is instructive to start the formal treatment with the relaxed Chain model. The
Chain model proper is then easily indicated. Consider a set of multiple choice
items $i\ (i = 1,\ldots,I)$ with options $j\ (j = 0,\ldots,J_i)$, in their natural order, with $J_i$ the correct
option. Let $\theta$ be a real latent ability parameter. An underscored symbol denotes
a vector. Denote the ordered set of latent classes associated with item $i$ with
$\langle\alpha_{i0},\ldots,\alpha_{iR_i}\rangle$, also denoted with their ordered set of indices $\langle 0,\ldots,R_i\rangle$, with
$R_i \le J_i$. Denote the random variable that takes values in the set of latent classes
of $i$ with $A_i$. Let $\eta_{ir}$ be a real parameter associated with latent class $r$ of item
$i$, and $a_i$ a discrimination parameter. The probability of being a member of
$r \in \{0,\ldots,R_i\}$ is modeled by

$$\psi_{ir}(\theta) \equiv \psi_i(r;\theta) \equiv P(A_i = \alpha_{ir}\mid\theta) = \frac{\exp\{a_i(r\theta - \eta_{ir})\}}{\sum_{t=0}^{R_i}\exp\{a_i(t\theta - \eta_{it})\}}. \tag{1}$$
To identify the model, $\eta_{i0}$ is set equal to 0. Thus, a parameter vector $\eta_i$ is
identified by $(\eta_{i1},\ldots,\eta_{iR_i})$. Occasionally the reparameterization $\beta_{ir} = \eta_{ir} - \eta_{i,r-1}$
is used. Furthermore, $a_i$ is a discrimination parameter, which may be estimated,
as in the 2PL and 3PL models (Birnbaum, 1968) and GPCM (Muraki, 1992), or
considered known as part of the model hypothesis, as in OPLM, or indeed in the
Rasch model, where all $a_i$ are presumed to be equal. Considering $a_i$ a constant
will be referred to as the OPLM approach, estimating $a_i$ as the GPCM approach.
Attach to alternative $j$ of item $i$ the parameter $\nu_{ij} > 0$. Identify the symbol $\nu_{ij}$
with the vector of length $J_i + 1$ with all elements equal to zero except element $j$,
which is equal to $\nu_{ij}$, and denote with $\nu_i$ the vector of length $J_i + 1$ with element
$j$ equal to $\nu_{ij}$. Denote the random variable that takes values in the set of options
of item $i$ with $X_i$. Then the conditional probability of selecting option $j$ of item $i$
($X_i = j$), given that the respondent is a member of latent class $r$, is modeled by

$$P_{ir}(j) \equiv P(X_i = j\mid A_i = \alpha_{ir}) = \frac{(\nu_{ij},\,\alpha_{ir})}{(\nu_i,\,\alpha_{ir})}, \tag{2}$$
where $(x, y)$ denotes the inner product of the vectors $x$ and $y$. In the (unrelaxed)
Chain model $\nu_i$ is assumed a unit vector $\mathbf{1}$. For a four option Chain item with all
four latent classes Formula (2) generates conditional probabilities of the options
given the latent class as shown in Table 1. For instance, with $\nu = \mathbf{1}$, if a person
scores in the "don't know anything class" (1111) his probability to select any
option equals 1/4. The $\nu$ parameter vector is identified up to a multiplicative
constant. Therefore, we arbitrarily set $\nu_{iJ_i} = 1.0$. Formula (2) follows the model
on choice behavior of Luce (1959).
Table 1
The conditional probabilities for a four option Chain item

Latent class r   Alpha    Option 0   Option 1   Option 2   Option 3
0                1111      1/4        1/4        1/4        1/4
1                0111      0          1/3        1/3        1/3
2                0011      0          0          1/2        1/2
3                0001      0          0          0          1
It is obvious from Formula (1) that a person obtains a latent score $r$ on item $i$
when he is a member of latent class $\alpha_{ir}$. Combining Formulas (2) and (1) gives
the probability function to select option $j$ of item $i$:

$$f_i(j;\theta) \equiv P(X_i = j\mid\theta) = \sum_{t=0}^{R_i} P_{it}(j)\,\psi_{it}(\theta). \tag{3}$$
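As a concrete illustration, a minimal Python sketch of Formulas (1)-(3) for a single Chain item follows; the helper names and parameter values are ours (and it assumes the reparameterization $\beta_{ir} = \eta_{ir} - \eta_{i,r-1}$ described above), not estimates from this report.

```python
import numpy as np

def psi(theta, a, eta):
    """Formula (1): latent class probabilities; eta[0] is fixed at 0."""
    r = np.arange(len(eta))
    z = a * (r * theta - eta)
    w = np.exp(z - z.max())            # guard against overflow
    return w / w.sum()

def P_class_option(n_options):
    """Formula (2) with nu = 1: class r selects uniformly from options r..J."""
    P = np.zeros((n_options, n_options))
    for r in range(n_options):
        P[r, r:] = 1.0 / (n_options - r)
    return P

def f(j, theta, a, eta, P):
    """Formula (3): marginal probability of selecting option j at ability theta."""
    return psi(theta, a, eta) @ P[:, j]

# illustrative item: four options, a = 1, beta = (-0.5, 0.0, 0.5) as in Figures 1 and 2
eta = np.concatenate(([0.0], np.cumsum([-0.5, 0.0, 0.5])))
P = P_class_option(4)
for th in (-2.0, 0.0, 2.0):
    print(th, [round(f(j, th, 1.0, eta, P), 3) for j in range(4)])
```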
Figure 1
Latent characteristic curves of a four option item with three latent
item parameters β = (-0.5, 0.0, 0.5)
Figure 2
Option characteristic curves of a four option Chain item with three latent
item parameters β = (-0.5, 0.0, 0.5)
Figure 1 shows an example of the latent class probability functions of a Chain
item with $a = 1$, and four latent classes with parameter vector $\beta$ = (-0.5, 0.0, 0.5).
The option probability functions are displayed in Figure 2.
Some Characteristic Properties
Love (1997) gives attention to two important properties of models for multiple
choice questions. The first property states that the probability of selecting the
correct option rises with θ . A model with this property is called monotone. The
second property states that the likelihood ratio of the correct option and a false
option is nondecreasing in θ for all false options. An item with this property is
said to possess rising selection ratios. It is shown that the Chain model is
monotone, and also that Chain items possess a stronger property than rising
selection ratios. Given the natural order of the options of a Chain item it is shown
that the likelihood ratio of two options j and k is rising if the order of j is larger
than the order of k. One may call this property rising natural ratios.
A third property states that the conditional probabilities to select an option given
θ follow the natural order of the options. Therefore, the same holds for the
marginal probabilities as well. The section is closed with a discussion on the
problems with the relaxed model.
1 The Chain Nedelsky model is monotone
The probability to score in the correct option strictly increases with $\theta$ in the
Chain model. This also holds for the relaxed Chain model with variable $\nu$. The
derivative of the probability to select the correct option $J_i$ is given by

$$\sum_s P_{iJ_i s}\,\partial_\theta\psi_s(\theta) = a_i\Bigl[\sum_s s\,P_{iJ_i s}\,\psi_s(\theta) - \bar s(\theta)\,f_i(J_i;\theta)\Bigr] = a_i\,\mathrm{Cov}\bigl(s,\,P_{iJ_i s}\mid\theta\bigr) > 0,$$

with $\bar s(\theta)$ the mean latent score at $\theta$. The last inequality holds because it follows
from Formula (2) that $P_{iJ_i s}$ increases with $s$, and at each value of $\theta$ the variance
of $s$ is positive. If latent classes are allowed to have different discriminations $a_{is}$,
a sufficient condition for the property to hold is that the class weight $a_{is}s$
increases with $s$.
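A quick numerical check of this monotonicity property for the illustrative four-option item used above (a sketch under our own parameter choices, not a proof):

```python
import numpy as np

def psi(theta, a, eta):
    r = np.arange(len(eta))
    z = a * (r * theta - eta)
    w = np.exp(z - z.max())
    return w / w.sum()

# Chain item with four options/classes, a = 1, eta = (0, -0.5, -0.5, 0)
eta = np.array([0.0, -0.5, -0.5, 0.0])
P_correct_given_class = np.array([1/4, 1/3, 1/2, 1.0])   # P_r(J), Formula (2)

grid = np.linspace(-6, 6, 241)
f_correct = np.array([psi(t, 1.0, eta) @ P_correct_given_class for t in grid])
assert np.all(np.diff(f_correct) > 0)     # strictly increasing in theta
print(f_correct[[0, 120, 240]].round(3))
```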
2 A Chain Nedelsky item has rising natural ratios
Consider a Chain item $i$ with options $0,\ldots,J_i$ indexed in their natural order, and
assume that the number of different latent classes also equals the number of
options. Index the latent class with the index of the easiest option that it contains.
For instance, latent class $0 = \{0,\ldots,J_i\}$, latent class $1 = \{1,\ldots,J_i\}$, and latent class
$J_i = \{J_i\}$. Denote the probability to select option $j$ with $f_j \equiv f_i(j;\theta)$ (Formula
(3)). Differentiation with respect to $\theta$ is indicated by a prime. Item $i$ has rising natural
ratios iff

$$\left(\frac{f_j}{f_{j-1}}\right)' \ge 0 \iff f_j'\,f_{j-1} - f_j\,f_{j-1}' \ge 0.$$
The last expression equals (using $f_j = f_{j-1} + P_{jj}\,\psi_j$)

$$f_j'\,f_{j-1} - f_j\,f_{j-1}' = P_{jj}\,\psi_j'\,f_{j-1} - P_{jj}\,\psi_j\,f_{j-1}' = P_{jj}\bigl(\psi_j'\,f_{j-1} - \psi_j\,f_{j-1}'\bigr), \tag{4}$$

which is larger than 0 iff

$$a\,\psi_j\,(j - \bar s)\,f_{j-1} - \psi_j\sum_{r=0}^{j-1} P_{j-1,r}\,\psi_r' \ge 0 \iff j\,f_{j-1} - \sum_{r=0}^{j-1} r\,P_{j-1,r}\,\psi_r \ge 0.$$

The left part of the last expression is larger than $f_{j-1} \ge 0$, since $r \le j-1$ in the sum
and $\sum_{r=0}^{j-1} P_{j-1,r}\,\psi_r = f_{j-1}$.
If a latent class is omitted and $R_i < J_i$, there are options that form a tie in the
natural order. The likelihood ratio of tied options is identically equal to 1.
As is clearly visible from option 2 in Figure 2, rising natural ratios do not imply
that the characteristic curve of a false option is decreasing in $\theta$.
From this result, and the second part of Formula (4), directly follows the
interesting property

$$\left(\frac{\psi_j(\theta)}{f_k(\theta)}\right)' \ge 0, \quad\text{with } k < j,$$

which means that the ratio of the probability of a latent class and the probability
of any option that it does not contain is rising in $\theta$.
3 The conditional option probabilities of the options given θ follow the natural
order of the options, and, therefore also their p-values
Denote with $r_j$ the latent class where options $0,\ldots,j-1$ are recognized as false but
not options $j,\ldots,J_i$. It follows from Formula (3) that in the Chain model

$$f(j;\theta) - f(j-1;\theta) = \frac{1}{J_i + 1 - j}\,\psi_{i r_j}(\theta) > 0. \tag{5}$$
(5)
Consequently, the order of the conditional option probabilities of a Chain item
given
θ is invariant for θ . This means that the latent classes of a Chain item can be
inferred from the p-values of the options. The observed p-values can, of course,
deviate from the expected order as a result of random sampling. However, with
enough data, a change of order of options will only occur when the expected pvalues of options are relatively close. As a result the latent class where both
options have a different status will be almost empty, and can be omitted.
An example of observed counts from simulated data of a Chain item is given in
the left subtable of Table 3. The expected counts in the right subtable of Table
3 are based on estimation of the relaxed model with ν estimated as well.
Consequently the expected counts per row do not necessarily increase with the
natural order of the options, as is evident from the first row. Table 2 also serves
to clarify the estimation problems with the relaxed Chain model. The
results are obtained from a simulated data set of 18 Chain items with all
$\eta$-parameters equal to 0, and, of course, all $\nu$-parameters equal to 1.
Table 2
An example to show that the relaxed Chain Nedelsky model is not identified.
The table shows estimated relaxed Chain model parameters of a Chain item in a
simulated data set. The presented parameters deviate grossly from the original
parameters. Nevertheless the APD statistic shows an excellent model fit
(see also Table 3)

ItNr   Label      APD      (df)    P(chi2 >)
17     Item_17    8.347    (12)    0.75851

Lat   Alpha    Eta       SE-Eta    Nu         SE-Nu
1.    0111      6.671    28.049     85.388    0.360
2.    0011     -2.329     0.616    178.415    6.917
3.    0001     -2.073     6.914      1.656    0.816
Table 3
Observed and expected option counts from a generated data set in
homogeneous ability groups of the Chain item in Table 2. The expected
counts are based on estimating the relaxed model. The options are naturally
ordered. The contributions of the options to the PD-statistic are listed below
the expected counts

17      Item_17
            Observed Counts             Expected Counts                  Tot
Grp         0    1    2    3            0      1      2      3
1.          4   12    7    8            4.8   10.0    7.6    8.5          31
2.          5    7    7   15            3.4    7.2   10.2   13.2          34
3.          2    3   10   14            1.6    3.4    9.4   14.5          29
4.          0    1   19   30            0.9    2.0   15.1   31.9          50
5.          0    0    8   19            0.1    0.3    6.2   20.4          27
6.          0    0    2   27            0.0    0.1    4.4   24.5          29
Tot        11   23   53  113           11.0   23.0   53.0  113.0         200
PD (12) 0.75851                          2.2    1.4    4.0    0.8       8.347
Reestimation of the parameters in the relaxed model results in a parameter drift
that continues from iteration to iteration at the same pace. The parameter values
given in Table 2 were obtained after 100 iterations, and grossly deviate from their
original values. However, comparison of the observed and expected counts
(Table 3) reveals an excellent fit, as is also indicated by the Power-Divergence
statistic (see below) and its P-value. Clearly the relaxed Chain Nedelsky model is
underdetermined. The conclusion must be that one cannot have the Thissen
and Steinberg (1984) advantage of differential option attractiveness in more than
one latent class. In the Thissen and Steinberg approach the differential
attractiveness is exclusive to the "don't know" latent class. Because of the
apparent uniqueness problem with the relaxed model the sequel is restricted to
the Chain Nedelsky model proper.
Estimating the parameters in the Chain model can proceed along well-known
pathways if the latent classes of all items are given. Therefore, in the next sections
it is assumed that for all items the subsets of options that identify the latent
classes are known. They will be inferred from the p-values of the options and
from the expected counts per latent class, as is discussed in the section on initial
estimates.
MML Estimation of the Chain Nedelsky Model by an EM
algorithm
With a complete design the data $X$ consist of $V$ response vectors $x_v\ (v = 1,\ldots,V)$,
where vector $x_v = (x_{v1},\ldots,x_{vI})$ contains the responses of subject $v$ on $I$ items, with
$x_{vi}$ the index of the option chosen by $v$ from the $J_i + 1$ options of item $i$. Assume
that the options are indexed according to their natural order, from the easiest false
option with index 0 up to the correct option with index $J_i$. Further, assume that
for each respondent $v$ a one-dimensional latent variable $\theta_v$ is randomly drawn
from a density function $g(.)$. After drawing $\theta_v$, for each item a latent class $r_i$ is
drawn from the multinomial distribution given by $\psi_i(\theta_v)$ (Formula (1)). The
option $x_{vi}$ for item $i$ is drawn from the multinomial given by $P_{ir_i}$ (Formula (2)).
Model estimation will be accomplished by an EM-procedure (Dempster et al.,
1977; McLachlan & Krishnan, 1997). By considering not only the person
parameter $\theta_v$, but also the vector of latent classes $r_v$ as missing data, a relatively
simple EM-procedure is obtained. Denote the joint distribution of the missing
data $\theta$ and $r$ with $g(\theta, r)$. Then the log marginal likelihood of $X$ is given by

$$\ell_M(\lambda; X) = \sum_v \ell_{Mv}(\lambda; x_v) = \sum_v \ln\int\sum_r\prod_i P_{ir_i}(x_{vi})\,g(\theta, r)\,d\theta.$$
Assuming conditionally independent latent item responses,

$$g(\theta, r) = g(\theta)\prod_i\psi_i(r_i;\theta),$$

where $P_{ir}(x_{vi})$ is given by Formula (2), $\psi_i(r;\theta)$ by Formula (1), and $\lambda$ is the
vector of model parameters. Consistent estimation of the parameter vector $\lambda$ is
achieved by maximizing $\ell_M$ (Kiefer & Wolfowitz, 1956). This can be done with
the EM procedure, or the Newton-Raphson (NR) procedure. Although the NR
procedure converges much faster than EM algorithms, the latter is more robust
for less accurate starting values of $\lambda$, and is much cheaper per iteration.
Therefore, the EM procedure is preferred.
Bock and Aitkin (1981) show how the EM algorithm can be applied with a
continuous nonobserved variable by assuming $g(.)$ to be a normal density, and by
performing numerical integration with the Gauss-Hermite procedure. The Gauss-Hermite
procedure (Press et al., 1992) approximates the integral of the product
of a function $f(x)$ with a normal density $g(x)$ by

$$\int f(x)\,g(x)\,dx \approx \sum_{q=1}^{Q} w_q\,f(x_q),$$

with $w_q$ the appropriate weight at $x_q$.
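A small numerical illustration of this approximation for a standard normal $g$ (a sketch; the integrand and the quadrature order are arbitrary choices):

```python
import numpy as np

# Gauss-Hermite quadrature for integrals of f(x) against a standard normal
# density, as used in the Bock-Aitkin EM. hermegauss gives nodes/weights for
# the weight function exp(-x^2/2); dividing the weights by sqrt(2*pi) turns
# that weight function into a normal density.
def normal_quadrature(Q):
    nodes, weights = np.polynomial.hermite_e.hermegauss(Q)
    return nodes, weights / np.sqrt(2.0 * np.pi)

f = lambda x: x**2                      # arbitrary integrand; E[X^2] = 1
nodes, weights = normal_quadrature(21)
print(np.sum(weights * f(nodes)))       # approximately 1.0
```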
Distinguish the item parameters and the parameters for the marginal ability
distribution in the parameter vector $\lambda$ as $\lambda_1$ and $\lambda_2$, respectively. The first step in the
EM algorithm is to specify the log likelihood of the complete data:

$$\ell_c(\lambda;\theta, R, X) = \sum_v\ell_{cv}(\lambda;\theta_v, r_v, x_v) = \sum_v\sum_i\bigl[\ln P(x_{vi}\mid r_{vi}) + \ln\psi(r_{vi};\lambda_1\mid\theta_v)\bigr] + \sum_v\ln g(\theta_v;\lambda_2). \tag{6}$$
Because $\theta_v$ and $r_v$ are not observed, they are considered random variables with
a conditional posterior distribution $h(r, \theta\mid x_v)$, see Formula (18) in Appendix B.
It is easily shown that for any model the first order derivatives of the
sum over subjects $v$ of the posterior expected $\ell_{cv}$, where the expectation is taken
over the posterior distribution, here $h(r, \theta\mid x_v)$, equal the first order derivatives
of the marginal loglikelihood. Therefore both are maximized by the same
parameter values, and

$$\ell_M = C + E_{R\Theta}\,\ell_c(\lambda; R, \theta\mid X),$$

with $C$ a constant. In Appendix B the posterior expected $\ell_c(.)$ is derived as

$$E_{R\Theta}\,\ell_c(\lambda; R, \theta\mid X) = \sum_v\int\Bigl[\sum_{i,r}\ln\psi_i(r\mid\theta)\,h(r\mid\theta, x_{vi}) + \ln g(\theta)\Bigr]\,h(\theta\mid x_v)\,d\theta + D, \tag{7}$$
with $D$ a constant, and where

$$h(r\mid\theta, x_{vi}) = \frac{P_{ir}(x_{vi})\,\psi_{ir}(\theta)}{f_i(x_{vi};\theta)},$$

and, assuming locally independent overt item responses as well,

$$h(\theta\mid x_v) = \frac{\prod_i f_i(x_{vi};\theta)\,g(\theta)}{\int\prod_i f_i(x_{vi};\theta)\,g(\theta)\,d\theta}.$$

Observe that $P(x_{vi}\mid r_i)$ is independent of variable parameters in the Chain
model, and may therefore be considered absorbed in $D$.
An EM algorithm maximizes the expected loglikelihood given by Formula (7)
in an iterative process with two stages (Tanner, 1994; McLachlan & Krishnan,
1997). In each iteration the Q-function

$$Q(\lambda;\lambda^*) = \sum_v\int\Bigl[\sum_{i,r}\ln\psi_i(r;\lambda_1\mid\theta)\,h(r;\lambda^*\mid\theta, x_{vi}) + \ln g(\lambda_2;\theta)\Bigr]\,h(\theta;\lambda^*\mid x_v)\,d\theta$$

is maximized with respect to the unstarred $\lambda$-parameters. The starred $\lambda$-parameters
are inherited from the previous iteration. More elaboration is to be
found in Appendix D.
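The E-step of such an iteration amounts to computing, for each respondent, the posterior weights $h(\theta_q\mid x_v)$ over the quadrature points and the class posteriors $h(r\mid\theta_q, x_{vi})$ per item. A minimal sketch with hypothetical parameters and a single respondent follows (the helper names and values are ours, not the report's implementation):

```python
import numpy as np

def class_probs(theta, a, eta):
    """psi_r(theta) for one item, Formula (1); eta[0] is fixed at 0."""
    r = np.arange(len(eta))
    z = a * (r * theta - eta)
    w = np.exp(z - z.max())                  # numerical stability
    return w / w.sum()

def option_given_class(n_options):
    """P_r(j) for a Chain item whose classes coincide with the options."""
    P = np.zeros((n_options, n_options))
    for r in range(n_options):
        P[r, r:] = 1.0 / (n_options - r)
    return P

# hypothetical parameters for two 4-option items
items = [dict(a=1.0, eta=np.array([0.0, -0.5, 0.0, 0.5])),
         dict(a=1.0, eta=np.array([0.0, 0.3, 0.6, 0.9]))]
x_v = np.array([3, 2])                       # one respondent's chosen options

nodes, weights = np.polynomial.hermite_e.hermegauss(15)
weights = weights / np.sqrt(2 * np.pi)       # weights for a standard normal prior

# E-step for this respondent: posterior over quadrature nodes h(theta_q | x_v)
# and, per item and node, posterior over latent classes h(r | theta_q, x_vi).
like_q = np.ones_like(nodes)
post_class = []
for it, j in zip(items, x_v):
    P = option_given_class(4)
    psi_q = np.array([class_probs(t, it["a"], it["eta"]) for t in nodes])
    f_q = psi_q @ P[:, j]                             # f_i(x_vi; theta_q), Formula (3)
    post_class.append(psi_q * P[:, j] / f_q[:, None]) # h(r | theta_q, x_vi)
    like_q *= f_q
post_theta = like_q * weights
post_theta /= post_theta.sum()                        # h(theta_q | x_v)
# These posterior weights are the quantities the M-step would use to update
# eta (and the prior parameters) by maximizing the Q-function.
print(post_theta.round(3))
```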
Estimation of Latent Person Parameters
Three methods for the estimation of person parameters will be discussed:
Expected posterior, Maximum Likelihood and Modal posterior.
The Expected posterior (EAP) is the most straightforward. Given a response
pattern $x$ it takes as the estimate $\hat\theta$ of $\theta$ the expectation of $\theta$ over the posterior
distribution $h(\theta\mid x)$ as given by Formula (17) in Appendix B:

$$\mathrm{EAP}(x;\lambda) = \int\theta\,h(\theta;\lambda\mid x)\,d\theta.$$

Its uncertainty of estimation is expressed by the standard deviation of $h(\theta\mid x_v)$:

$$\mathrm{Var}\bigl[\mathrm{EAP}(x;\lambda)\bigr]^{1/2} = \Bigl[\int\theta^2\,h(\theta\mid x)\,d\theta - \mathrm{EAP}(x;\lambda)^2\Bigr]^{1/2}.$$
Given that a marginal model is used, the modal posterior (MAP) estimator is a
reasonable alternative, as is the maximum likelihood (ML) estimator. The derivation
of formulas for their estimation is given in Appendix E.
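A sketch of the EAP and its posterior standard deviation by quadrature, for a single hypothetical item response (with several items, the per-node likelihood would be the product of the $f_i(x_{vi};\theta_q)$ over items); the helper names and parameters are ours:

```python
import numpy as np

def psi(theta, a, eta):
    r = np.arange(len(eta))
    z = a * (r * theta - eta)
    w = np.exp(z - z.max())
    return w / w.sum()

# hypothetical one-item "response pattern": the correct option of a
# four-option Chain item with eta = (0, -0.5, -0.5, 0) was selected
eta = np.array([0.0, -0.5, -0.5, 0.0])
P_correct = np.array([1/4, 1/3, 1/2, 1.0])

nodes, weights = np.polynomial.hermite_e.hermegauss(31)
weights = weights / np.sqrt(2 * np.pi)          # standard normal prior g

like = np.array([psi(t, 1.0, eta) @ P_correct for t in nodes])
post = weights * like
post /= post.sum()                              # h(theta_q | x)

eap = np.sum(post * nodes)
sd = np.sqrt(np.sum(post * nodes**2) - eap**2)  # posterior standard deviation
print(round(eap, 3), round(sd, 3))
```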
Testing the Model
For the present model, tests can be constructed that have power against several
types of model violations.
1 If the discrimination parameter is fixed (the OPLM approach is taken) one
needs a test against misspecification of its value.
2 The hypothesis that the marginal distributions are adequately described by a
normal distribution needs to be tested.
3 A general test against model fit per item, and for all items together is also
needed.
Although it will not be pursued any further it is worth mentioning that one may
question the hypothesis that the discrimination for all latent classes within an item
is identical, and the current hypothesis on the subsets of options that define the
latent classes per item. Perhaps model fit can be improved with another set of
latent classes.
The first two tests can take the form of a Lagrange Multiplier test (LM-test). An
introduction to the LM-test within a larger context can be found in Buse (1982).
The idea for the LM-test originates with Rao (1948), there called the 'score test',
and with Aitchison and Silvey (1958). An application within the context of IRT
models can be found in Glas and Verhelst (1995). To compute the LM statistic the
status of the constants of concern is changed into that of variable parameters of the
likelihood function, which is evaluated at the maximum likelihood estimates of the
original parameters and with the new parameters at the values of their original
constants. The LM-test statistic can now be expressed as

$$LM(\xi_1,\ldots,\xi_U) = \ell^{(1)T}\,\bigl[\ell^{(2)}\bigr]^{-1}\,\ell^{(1)},$$
where the superscripted number between parentheses indicates the order of
differentiation, and the superscripted T indicates transposition. The new
parameters $\xi$ are indexed by $u\ (u = 1,\ldots,U)$. $LM(\xi)$ is chi-square distributed
with $U$ degrees of freedom. Because the likelihood is evaluated at its maximum
for the original parameters, the elements of the first derivative that correspond to
the original parameters are all zero. This simplifies the computation as follows.
Denote the complete vector of original and new parameters by $(\lambda, \xi)$, and select
with $F(\lambda)$ the vector of elements of the first derivatives of $\ell_M$ with respect to the
elements of $\lambda$. Likewise, for instance, $I(\lambda, \xi)$ indicates the part of the observed
information matrix $I\ (= -\ell^{(2)})$ with the rows for $\lambda$ and the columns for $\xi$. Then

$$LM(\xi) = F(\xi)^T\,W^{-1}\,F(\xi), \tag{8}$$

with

$$W = I(\xi,\xi) - I(\xi,\lambda)\,I(\lambda,\lambda)^{-1}\,I(\lambda,\xi).$$

Note that $I(\lambda,\lambda)^{-1}$ is already computed for the evaluation of the standard errors
of the parameter estimates, and that the computation is especially simple if $\xi$
contains only one element. Consequently, the LM-test $LM_a$ against the chosen
discrimination index of an item $i$ is now readily constructed.
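In numpy, Formula (8) is only a few lines; in the sketch below the score vector for $\xi$ and the blocks of the observed information are placeholders standing in for quantities produced by a fitted model.

```python
import numpy as np

def lm_statistic(F_xi, I_xixi, I_xilam, I_lamlam):
    """Formula (8): LM = F(xi)' W^{-1} F(xi) with
    W = I(xi,xi) - I(xi,lambda) I(lambda,lambda)^{-1} I(lambda,xi)."""
    W = I_xixi - I_xilam @ np.linalg.solve(I_lamlam, I_xilam.T)
    return float(F_xi @ np.linalg.solve(W, F_xi))

# placeholder blocks of the observed information and the score for xi
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
I = A @ A.T + 6 * np.eye(6)                 # a positive definite stand-in
I_lamlam, I_xixi, I_xilam = I[:5, :5], I[5:, 5:], I[5:, :5]
F_xi = np.array([0.8])
print(lm_statistic(F_xi, I_xixi, I_xilam, I_lamlam))   # compare to chi2(1)
```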
For an LM-test against the hypothesized normal marginal distribution, conceive
of the Gauss-Hermite weight coefficients as the current parameters. According to
Formula (8) the first and second derivatives of the log marginal likelihood with
respect to the weights $w_q$ of the Gauss-Hermite points for $q = 1,\ldots,Q-1$ are
needed. Let $w_Q = C - \sum_{q=1}^{Q-1} w_q$. The second derivatives can be obtained by the
method of Louis (Appendix A). Mixed second derivatives of the complete
loglikelihood with respect to a marginal parameter and an item parameter are
always zero; the posterior covariances of the first derivatives, however, are not. For a
response pattern $x$ we have

$$\partial_q\ln f(x) \equiv \frac{\partial}{\partial w_q}\ln f(x) = \frac{f(x\mid\theta_q) - f(x\mid\theta_Q)}{f(x)}$$

(for $q = 1,\ldots,Q-1$), and

$$\partial_{qq'}\ln f(x) = -\,\frac{\bigl[f(x\mid\theta_q) - f(x\mid\theta_Q)\bigr]\,\bigl[f(x\mid\theta_{q'}) - f(x\mid\theta_Q)\bigr]}{f(x)^2}.$$
If one is primarily interested in deviations of the skewness and kurtosis from those
of the normal distribution, the lambda-distribution (Ramberg et al., 1979) can be
used as a more general alternative. The lambda-distribution family is given as an
inverse function

$$\theta = R(P(\theta)) = \lambda_1 + \bigl[P^{\lambda_3} - (1 - P)^{\lambda_4}\bigr]/\lambda_2, \qquad 0 < P < 1.$$

The density with respect to $\theta$ is given by

$$p(\theta) = \frac{dP(\theta)}{d\theta} = \left[\frac{dR(P)}{dP}\right]^{-1} = \frac{\lambda_2}{\lambda_3\,P(\theta)^{\lambda_3 - 1} + \lambda_4\,\bigl(1 - P(\theta)\bigr)^{\lambda_4 - 1}}.$$

The lambda-distribution that closely approximates the standard normal
distribution has parameter values $\lambda_1 = 0$, $\lambda_2 = 0.1975$, $\lambda_3 = 0.1349$, and $\lambda_4 = 3.0$.
The approximation to the $N(\mu, \sigma^2)$ distribution is obtained by a simple
transformation of the first two parameters: $\lambda_1(\mu,\sigma) = \lambda_1\sigma + \mu$, $\lambda_2(\mu,\sigma) = \lambda_2/\sigma$.
Considering λ3 and λ4 as variable parameters, and calculating Formula (8), yields
an LM-test against the normal prior.
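A sketch of the quantile function $R(P)$ and density $p(\theta)$ as written above; the shape values below are placeholders (the calibration quoted in the text would be substituted in practice):

```python
import numpy as np

def lambda_quantile(P, lam):
    """R(P) = lam1 + (P**lam3 - (1-P)**lam4) / lam2, for 0 < P < 1."""
    l1, l2, l3, l4 = lam
    return l1 + (P**l3 - (1.0 - P)**l4) / l2

def lambda_density(P, lam):
    """Density of theta = R(P): lam2 / (lam3 P**(lam3-1) + lam4 (1-P)**(lam4-1))."""
    l1, l2, l3, l4 = lam
    return l2 / (l3 * P**(l3 - 1.0) + l4 * (1.0 - P)**(l4 - 1.0))

lam = (0.0, 0.1975, 0.13, 0.13)   # placeholder lambda values, not the report's
P = np.linspace(0.01, 0.99, 5)
print(lambda_quantile(P, lam).round(3))
print(lambda_density(P, lam).round(3))
```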
Read and Cressie, 1988 present a family of general tests against fit, which they
call Power-Divergence (PD) statistics. To apply a PD-statistic, the respondents
have to be partitioned into groups of homogeneous ability, for instance according
to an estimate of θ , e.g., the EAP. Let the set of respondents be partitioned into
G groups of similar size, such that each group contains respondents of similar
ability. Then a PD-statistic for item i is given by:
$$PD_i = \frac{2}{\kappa(\kappa+1)}\sum_{j=0}^{J_i}\sum_{g=1}^{G} O_{gij}\left[\left(\frac{O_{gij}}{E_{gij}}\right)^{\kappa} - 1\right].$$

They recommend $\kappa = 2/3$. $O_{gij}$ is the number of observed respondents in group $g$
that selected option $j$ of item $i$, and $E_{gij}$ its expected value,

$$E_{gij} = N_g\sum_s P_{is}(j)\,\psi_{is}(\theta_g),$$

with $N_g$ the number of respondents in ability group $g$, and $\theta_g$ an ability estimate
for group $g$. An overall test for model fit is obtained by $\sum_i PD_i$.
$PD_i$ is asymptotically chi-square distributed with $GJ_i - R_i$ (OPLM) or
$GJ_i - (R_i + 1)$ (GPCM) degrees of freedom. Note that $J_i + 1$ denotes the number
of options, and $R_i$ the number of latent class parameters. In practical applications $E_{gij}$
tends to be small (often smaller than 0.25) for easily exposed false options in
groups of high ability. Moreover, it may be instructive to have a better
interpretable contribution to the PD-statistic per option. Therefore, we will use
the modified version by Anscombe (Read & Cressie, 1988, pg 96), abbreviated to
APD:

$$APD_i = \sum_{j=0}^{J_i}\sum_{g=1}^{G}\frac{\Bigl[\bigl(O_{gij} + \tfrac14\bigr)^{2/3} - \bigl(E_{gij} + \tfrac{1}{12}\bigr)^{2/3}\Bigr]^2}{\tfrac49\,E_{gij}^{1/3}}.$$

They claim that each term of this sum behaves very much like a $\chi^2$ with one
degree of freedom, also with low values for $E_{gij}$. Therefore, for each option one
obtains a $\chi^2$-statistic with (almost) $G$ degrees of freedom (neglecting the
parameters estimated with the same data).
In the application of the APD-statistic to the Chain model, the degrees of
freedom can only be approximately indicated. The APD-statistics are derived for
multinomial models of contingency tables. The contingency tables used here are
a very crude representation of the data with which the model parameters are
estimated.
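Assuming the PD and APD formulas as reconstructed above, a minimal Python sketch for one item's group-by-option table of observed and expected counts (hypothetical numbers) could look as follows.

```python
import numpy as np

def pd_statistic(O, E, kappa=2.0/3.0):
    """Power-Divergence statistic of Read and Cressie (1988)."""
    O, E = np.asarray(O, float), np.asarray(E, float)
    return (2.0 / (kappa * (kappa + 1.0))) * np.sum(O * ((O / E)**kappa - 1.0))

def apd_statistic(O, E):
    """Anscombe-modified PD: each cell is approximately chi-square(1),
    also for small expected counts (as reconstructed from the text)."""
    O, E = np.asarray(O, float), np.asarray(E, float)
    num = ((O + 0.25)**(2.0/3.0) - (E + 1.0/12.0)**(2.0/3.0))**2
    return np.sum(num / ((4.0/9.0) * E**(1.0/3.0)))

# hypothetical 3 ability groups x 4 options
O = [[4, 12, 7, 8], [2, 3, 10, 14], [0, 1, 19, 30]]
E = [[4.8, 10.0, 7.6, 8.5], [1.6, 3.4, 9.4, 14.5], [0.9, 2.0, 15.1, 31.9]]
print(round(pd_statistic(O, E), 3), round(apd_statistic(O, E), 3))
```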
Initial Estimates
The EM estimation procedure iteratively calculates new estimates of all the
parameters, given estimates from the previous iteration. This leaves us with the
problem of providing initial estimates. In general it is advisable that the initial
estimates do not deviate too much from the final estimates. However, the Chain
model poses an extra problem. Not only is the parameter vector $\lambda$ unknown,
among which the $\eta$-parameters of the latent classes, but the latent classes
themselves are also not given a priori. They must be inferred from the (weak) order
of the options. Therefore, the question of which latent classes are needed to
account for the observed proportions per option, and the initial estimation of $\eta$
and $\sigma^2$, will be addressed.
First the natural order of the false options must be assessed. This is fairly
simple in the Chain model. Below Formula (5) it is explained that for any θ the
likelihood of the options follows their natural order. Therefore, the natural order
is inferred from the observed marginal probabilities of the options of an item. The
smaller the probability of an option the easier it is exposed as false.
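A minimal sketch of this first step, with hypothetical counts (the correct option, here the last entry, is excluded from the ordering):

```python
import numpy as np

# Observed option counts of one item; the last entry is the correct option.
counts = np.array([11, 53, 23, 113])
false_counts = counts[:-1]

# The smaller the marginal proportion, the more easily the option is exposed
# as false, so the natural order of the false options is by ascending count.
natural_order = np.argsort(false_counts)
print(natural_order)     # indices of the false options, easiest-to-expose first
```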
Initially all latent class parameters are set to zero and, with the OPLM
approach, the standard deviation is equated to the reciprocal of the geometric
mean of the discrimination indices of the items. Each iteration loops over all items
$k$ and groups the respondents into homogeneous ability groups independently of
their response to item $k$, based on the current parameters. The groups are
indexed by $g\ (g = 1,\ldots,G)$. The ability of the members of group $g$ is denoted by $\theta_g$.
The algorithm that classifies a subject independently of his response to item $k$ is
explained in Appendix C. The standardized abilities $\theta_g/\hat\sigma$ are considered known
constants, with $\hat\sigma$ the current estimate of $\sigma$. The probability for a respondent
from group $g$ to be in latent class $r$ of item $i$ is expressed by

$$\psi_{gir} = \frac{\exp\{a_i(r\theta_g - \eta_{ir})\}}{\sum_{s=0}^{R_i}\exp\{a_i(s\theta_g - \eta_{is})\}},$$

with $\eta_{i0} = 0$. The expected probability for a person from group $g$ to select option
$j$ from item $i$ is then given by

$$\pi_{gij} = \sum_{s=0}^{R_i} P_{is}(j)\,\psi_{gis}.$$

The loglikelihood of the observed counts $N_{gij}$ of respondents from group $g\ (g = 1,\ldots,G)$
selecting option $j$ of item $i$ is given by

$$\ell(\lambda\mid\alpha) = \sum_{g,i,j} N_{gij}\,\ln\pi_{gij}.$$

This loglikelihood is to be maximized for $\lambda$ with the Newton-Raphson method for
each item separately, and for $\hat\sigma$. Formulas for the maximization of $\ell(\lambda\mid\alpha)$ are
given in Appendix F.
With the new parameters new partitions (for each item independently of its
responses) of respondents into homogeneous ability groups can be calculated, and,
given this partition new parameters estimated. This procedure can be repeated
until some convergence criterion is met.
Because latent classes with only a few expected members cause large
estimation errors, and possibly even non-positive observed information matrices,
it is checked whether the set of latent classes of an item can be reduced. The
following pragmatic rule is applied. If the expected number of observations in
latent class $r$ of item $i$ falls below 1% of the number of observations on item $i$,
latent class $r$ is omitted, because the number of expected members of $r$ is deemed
negligible. If for some item a latent class is omitted, the initial estimation
procedure is restarted. The above procedure could, in principle, also be executed
with the EM-algorithm as outlined in the previous sections. However, an
EM-estimation process is rather time consuming. Therefore, before proceeding
with the EM-algorithm, an initial estimation procedure is preferably inserted.
The Gain of Information on θ with the Chain Nedelsky Model
To evaluate the gain in estimation accuracy for the latent person parameter θ the
information on θ using option scoring with the Chain model must be compared
with the information using binary scoring. The asymptotic standard error of
estimation for θ is given by the square root of the inverse of the information
function. The test information function is simply the sum of the item information
functions over the items in the test. Let

$$P_j = P_j(\theta) = \sum_r P_{jr}\,\psi_r(\theta), \qquad \bar r(\theta) = \sum_r r\,\psi_r(\theta),$$

then the expected information function with option scoring is given by

$$I_o(\theta) = \sum_{j=0}^{J}\frac{P_j'(\theta)^2}{P_j(\theta)} = \sum_{j=0}^{J}\frac{\bigl[a\sum_r P_{jr}\,\psi_r\,(r - \bar r)\bigr]^2}{P_j},$$

where $P_j(\theta)$ represents the probability of selecting option $j$ at ability
$\theta$, and $P_j'(\theta)$ its derivative with respect to $\theta$. The Fisher information with binary
scoring is given by

$$I_b(\theta) = \frac{P_J'(\theta)^2}{P_J(\theta)} + \frac{P_J'(\theta)^2}{1 - P_J(\theta)} = \frac{P_J'(\theta)^2}{P_J\,(1 - P_J)} = \frac{\bigl[a\sum_r P_{Jr}\,\psi_r\,(r - \bar r)\bigr]^2}{P_J\,(1 - P_J)}.$$
Note that the Chain model is not compared with, for instance, the Rasch model,
but with the Chain model with only two observations (right, wrong). Both models
show the same characteristic curve for the correct option.
The expected information that would be obtained about $\theta$ if the latent class $r$
would be observed is given by

$$I_c(\theta) = \sum_r\frac{\psi_r'(\theta)^2}{\psi_r(\theta)} = a^2\sum_r\psi_r\,(r - \bar r)^2.$$
By way of illustration Figure 3a gives the three information functions for the item
in Figures 1 and 2, and Figure 3b the relative efficiency of option scoring relative
to binary scoring and of class scoring relative to option scoring.
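A sketch computing the three information functions of the reconstructed formulas for the illustrative item of Figures 1 and 2 ($a = 1$, $\beta$ = (-0.5, 0.0, 0.5)); the helper names are ours.

```python
import numpy as np

a = 1.0
eta = np.concatenate(([0.0], np.cumsum([-0.5, 0.0, 0.5])))  # beta -> eta
R = np.arange(4)
P = np.zeros((4, 4))                       # P[r, j], Formula (2) with nu = 1
for r in range(4):
    P[r, r:] = 1.0 / (4 - r)

def info(theta):
    z = a * (R * theta - eta)
    psi = np.exp(z - z.max()); psi /= psi.sum()
    rbar = np.sum(R * psi)
    pj = psi @ P                                  # option probabilities
    dpj = a * ((psi * (R - rbar)) @ P)            # their derivatives
    I_o = np.sum(dpj**2 / pj)                     # option scoring
    I_b = dpj[-1]**2 / (pj[-1] * (1 - pj[-1]))    # binary scoring
    I_c = a**2 * np.sum(psi * (R - rbar)**2)      # latent class scoring
    return I_o, I_b, I_c

for th in (-2.0, 0.0, 2.0):
    print(th, tuple(round(x, 3) for x in info(th)))
```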
Figure 3a
Fisher information functions of the item in Figures 1 and 2 for option
scoring in the Chain Nedelsky model, binary scoring, and latent class scoring
Figure 3b
Relative efficiency of option scoring relative to binary scoring, and class
scoring relative to option scoring of the item in Figures 1 and 2
Figure 3b suggests that for large θ the relative efficiency of class/option scoring
approaches 2. Although devoid of any practical significance, it is interesting to
understand this property. Consider an arbitrary item that fits the Full model. Let
$k$ be the most difficult incorrect option, and let $\alpha_J = \{J\}$ be the latent class with
only the correct option and $\alpha_k = \{k, J\}$ the one but highest latent class, which also
contains option $k$, with respectively $\psi_J(\theta)$ and $\psi_k(\theta)$ their probability functions.
For large $\theta$, $\psi_J'$ tends to zero, while $\psi_J$ does not exceed 1. Therefore, for large $\theta$
the contribution of $\alpha_J$ to the information vanishes. Excluding $\alpha_J$, it is easily
verified that, although vanishing itself, $\psi_k$ increasingly dominates the probabilities
for all other subsets $r$ for large $\theta$, because

$$\frac{\partial}{\partial\theta}\,\frac{\psi_k}{\psi_r} > 0 \quad\text{for all } r \ne k, J, \qquad\text{while}\qquad \lim_{\theta\to\infty}\,(\bar r - k) = 1.$$

Therefore,

$$\lim_{\theta\to\infty} I_o(\theta) = a^2\,\frac{\bigl[P_{kk}\,\psi_k\,(k - \bar r)\bigr]^2}{p_k} + a^2\,\frac{\bigl[P_{Jk}\,\psi_k\,(k - \bar r)\bigr]^2}{p_J} \approx a^2\,\frac{\bigl[P_{kk}\,\psi_k\,(k - \bar r)\bigr]^2}{p_k} = \frac{a^2\,\psi_k}{2},$$

where the second term of the second expression becomes insignificant because $p_k$
tends to zero, while $p_J$ tends to 1. Consequently,

$$\lim_{\theta\to\infty}\frac{I_c(\theta)}{I_o(\theta)} = \frac{a^2\,\psi_k}{a^2\,\psi_k/2} = 2.$$
From Figure 3b it can also be inferred that especially for the lower abilities an
interesting gain in precision is to be expected when option scoring is applied
compared to simple binary scoring. However, the information gain seems less than
shown by Bock (1972). More than twice the amount of information is obtained
only for standardized $\theta < -0.80$, that is, for just more than 20% of the population.
I think that the main reason for this discrepancy is the implicit property of the
model of Bock that the "Don't know" latent class is observed, whereas in the
Chain model it is not (except when the easiest option is chosen, but this happens
only infrequently). A much larger gain in accuracy of estimation could be
achieved if the latent response (the subset of options recognized as false)
could be observed. It clearly pays to devote attention to models that use subsets
of options as data.
An Application
To show how the Chain Nedelsky model can be applied to real data, responses to
34 reading comprehension items from 995 pupils from group 7 (the class before
the last year in elementary school) were analyzed with the Chain model. The kind
of test (a CLIB test, see Appendix G for an explanation) is described in Staphorsius
(1994). Of the 995 records 844 were complete and used to estimate the model
parameters. The OPLM and the GPCM approaches were both taken. First the
OPLM approach is discussed. As a first step in the analysis the parameters were
estimated with all discrimination indices set equal to 1. The analysis showed a bad
fit to the model, largely due to improper discrimination indices. To obtain integer
estimates for the discrimination indices, the data set was converted into binary
scores and processed with OPCAT (Verstralen, 1996a and b).
Table 4
General results of successive steps in the analysis of
the Reading Comprehension data

Analysis                          LogML      Tot APD   (DF)    p(chi2>APD)
OPLM
  a_i = 1                         -31679.7   1645.7    (961)   0.00000
  a_i from OPCAT                  -31615.6   1531.6    (961)   0.00000
  a_i adapted                     -31506.0   1306.2    (961)   0.00000
  23 items P(APD) > .01           -22463.5    698.7    (651)   0.09542
GPCM
  all items                       -31491.8   1290.4    (928)   0.00000
  23 items P(APD) > .01           -22233.8    709.6    (629)   0.01390
  23 items OPLM                   -22460.6    694.4    (628)   0.03359
The analysis still showed model deviations due to ill-chosen discrimination indices.
Therefore, several discrimination indices were changed according to their LM-statistic
and the size of the first NR-step. As a last step the eleven items with a
p-value of their APD-statistic less than 0.01 were omitted from the analysis. For
the remaining items 920 records were complete. The GPCM analysis is relatively
simple compared with the OPLM approach. First all items were estimated.
Table 5
Overview of item-statistics of the last OPLM analysis
of the Reading Comprehension data

Item   a   Delta_a   LMa²     p(chi2>)   APD     (DF)   p(chi2>)
1.     1    0.03     1.676    0.1922     19.53   (28)   0.8813
2.     3   -0.05     0.102    0.7449     29.77   (28)   0.3741
4.     3    0.16     1.100    0.2946     32.22   (28)   0.2655
5.     3    0.21     1.099    0.2948     26.05   (29)   0.6229
6.     4    0.00     0.000    0.9436     28.96   (28)   0.4146
8.     2    0.00     0.000    0.9446     33.07   (28)   0.2328
9.     3    0.06     0.090    0.7581     25.88   (28)   0.5799
10.    2    0.00     0.003    0.9122     22.23   (28)   0.7711
11.    4   -0.02     0.006    0.8973     36.09   (28)   0.1399
12.    5    0.06     0.021    0.8559     33.62   (29)   0.2533
13.    6    0.35     0.176    0.6777     41.17   (28)   0.0517
14.    5   -0.02     0.001    0.9229     26.77   (30)   0.6358
15.    3   -0.03     0.032    0.8362     31.45   (28)   0.2970
17.    3    0.04     0.021    0.8569     23.78   (28)   0.6937
19.    5   -0.06     0.016    0.8672     35.69   (29)   0.1826
20.    2    0.10     0.427    0.5212     41.71   (29)   0.0596
21.    3    0.01     0.002    0.9183     28.98   (30)   0.5188
24.    2    0.05     0.095    0.7529     38.25   (28)   0.0935
26.    5   -0.37     0.642    0.4285     21.09   (28)   0.8218
29.    3    0.06     0.041    0.8206     36.25   (28)   0.1361
30.    4    0.08     0.093    0.7550     31.18   (28)   0.3089
31.    3   -0.07     0.129    0.7185     23.42   (28)   0.7119
32.    3   -0.12     0.322    0.5778     31.50   (28)   0.2953

² See the discussion at the end of Appendix D
Table 6
Begin of the table of parameter estimates from the last analyses with the OPLM and
GPCM approaches of the Reading Comprehension data. Alpha: indicator vector
of the subset of options from which a subject in the corresponding latent class
picks his choice. E#Obs: expected number of observations in the latent class
OPLM Approach

ItNr   Label     a    APD      (df)   p(chi2>APD)
  Lat   Alpha      Eta³      SE-Eta    E#Obs
1      Item_1    1    19.527   (28)   0.88130
  1.    11011       1.275     0.806      54.7
  2.    01011       0.963     0.504      71.7
  3.    01001       0.919     0.402      74.5
  4.    00001      -0.971     0.168     507.6
2      Item_2    3    29.772   (28)   0.37409
  1.    11011      -0.255     0.184      63.9
  2.    10011      -0.035     0.326      20.6
  3.    00011      -0.595     0.130      86.8
  4.    00001      -1.285     0.107     689.5
4      Item_4    3    32.217   (28)   0.26549
  1.    11101       0.339     0.346      31.8
  2.    11001      -0.145     0.113      96.7
  3.    10001       0.276     0.260      23.9
  4.    00001      -0.774     0.075     615.8

Sigma 0.198 (SEE 0.0065).

GPCM Approach

ItNr   Label     a      SE-a³    APD      (df)   p(chi2>APD)
  Lat   Alpha      Eta³      SE-Eta³   E#Obs
1      Item_1    0.27   0.039    19.253   (27)   0.86087
  1.    11011       4.019     2.846      59.3
  2.    01011       2.742     1.852      77.9
  3.    01001       2.861     1.620      74.5
  4.    00001      -4.004     0.727     508.3
2      Item_2    0.52   0.076    28.646   (27)   0.37818
  1.    11011      -1.157     1.104      62.4
  2.    10011       0.491     2.426      17.9
  3.    00011      -2.979     0.746      90.1
  4.    00001      -6.858     0.726     692.2
4      Item_4    0.46   0.057    23.744   (27)   0.64486
  1.    11101       3.204     3.160      25.2
  2.    11001      -0.166     0.825      93.6
  3.    10001       3.147     2.543      18.8
  4.    00001      -4.353     0.510     626.6

³ See the discussion at the end of Appendix D
Figure 4a
Fisher information curves from the last analysis of the Reading Comprehension
data
Figure 4b
Relative efficiency of option scoring relative to binary scoring, and class
scoring relative to option scoring of the remaining items in the last
analysis of the Reading Comprehension data
The second step is similar to the last step in the OPLM analysis, where the 11 items
with a p-value of their APD-statistic smaller than 0.01 were omitted. These were
not the same 11 items as were omitted in the OPLM analysis. General results of
the analyses are given in Table 4. The results of both approaches are quite similar.
The last line of Table 4 gives results of the GPCM analysis of the 23 items that
remained in the last OPLM analysis. It appears that continuous discrimination
parameters yield a small increase of the loglikelihood. The degrees of freedom
and the p-value of the APD test give a more truthful picture than those produced
by the OPLM-approach, however.
Table 5 gives a summary of the item statistics, and Table 6 gives some item
parameters and their standard errors.
Figure 4a shows the Fisher information of binary scoring, option scoring and
class scoring. Figure 4b shows the relative efficiencies of option scoring to binary
scoring and of class scoring to option scoring. The information gain is less than
obtained in the theoretical example displayed in Figures 3a and 3b. Option scoring
gives more than twice as much information as binary scoring only for standardized
$\theta < -3.7$, in the realm of rare lower abilities ($p \approx 0.0001$). More than 1.2 times
improvement is obtained from standardized $\theta < -1.52$ ($p \approx 0.064$), whereas in the
example it is obtained from $\theta < 0.68$ ($p \approx 0.75$). The relatively low information
gain must largely be attributed to the easiness of the test, with low frequencies of
incorrect options.
Table 7
An example of misfit

ItNr   Label    a   Delta_a   LMa     p(chi2>)
3      Item_3   2   0.22      4.048   0.04174

            Observed Counts                Expected Counts                         Tot
Grp         0    1    2    3     4         0      1      2      3      4
1.          6   21   19   13    36         4.3   12.6   12.6   23.3    42.0          95
2.          2   10   20   14    54         2.9   10.6   10.6   23.4    52.5         100
3.          3    5   14   22    64         2.3    9.4    9.4   23.9    63.1         108
4.          1    9   11   39   114         2.6   12.4   12.4   35.8   110.8         174
5.          0    3    2   29    65         1.1    5.7    5.7   18.8    67.7          99
6.          1    2    2   15    77         0.7    4.4    4.4   16.7    70.7          97
7.          0    6    4   18    75         0.5    3.5    3.5   15.6    79.8         103
8.          0    1    1   14    81         0.2    2.1    2.1   11.7    80.9          97
Tot        13   57   73  164   566        14.7   60.7   60.7  169.3   567.6         873
APD (29) 0.00686                            3.8   12.7   16.8   15.8     2.0      51.131
Inspection of Table 6 shows that the standard errors of estimation of the item
parameters are relatively large (see also the discussion at the end of Appendix D).
Surely an important cause is that the test is relatively easy, so that the number of
responses in false options is relatively low. Nevertheless, it seems reasonable to
conclude that an estimation of acceptable precision demands more observations.
The deviances from the model highlight interesting aspects of the psychometric
properties of individual items and options. Table 7, and Figures 5 and 6 show why
the first item that was omitted from the last analysis does not fit the model. The
lower line after APD in the option columns of the 'Expected' matrix shows the
extent to which each option deviates from the model. Directly behind APD,
between parentheses, one finds the degrees of freedom; the value of APD itself
appears in the last column labeled 'Tot'. The first matrix displays for each option
the observed
number of responses in eight homogeneous ability groups. The second matrix
gives the expected number of observations given the model.
Figure 5
An example of a badly fitting item. Option characteristic curves of item 3
(Reading Comprehension data) and observed proportions. Options 1 and 2 share
the same expected curve. 'Obsj' refers to observed probabilities of option j in
eight homogeneous ability groups. The solid curves represent the expected option
probabilities, the lowest curve for option 0, etc.
As Figure 5 shows, the misfit is not caused by the correct option. The observed
proportions for the correct option closely follow the curve of expected
proportions.
However, Figure 6 shows that options 1 and 2 of item 3 display more steepness
than the overall picture of item 3 suggests. Both are more attractive for lower ability
groups and (somewhat) less attractive for higher ability groups than given by the model.
Option 3 of item 3
shows rather erratic behavior. It is almost equally attractive for lower and higher
abilities, but much more attractive for intermediate abilities. To compare this item
with a well fitting item, Figures 7 and 8 show the response characteristic curves
and observed proportions of item 17.
Figure 6
Only incorrect options of item 3 (Reading Comprehension data).
Options 1 and 2 share the same expected curve. The observations of
option 3 show a clear bump in the middle region.
Figure 7
An example of a well fitting item. Option characteristic curves of item 17
(Reading Comprehension data) and observed proportions
Figure 8
Only incorrect options of item 17 (Reading Comprehension data)
Conclusions and Discussion
The Chain Nedelsky model implements a rather strict theory of the solution of
multiple choice questions. Consequently many items will deviate from the model
in a statistically significant sense, especially with larger data sets of several
thousands of respondents. However, not only the items that conform to the model
are interesting, it is also interesting how items deviate from the model. Options
that show a deviance are for some reason, perhaps to be highlighted by the item
constructor, more or less attractive for certain ability groups. The item shown in
Figure 6 is an interesting example in this respect (a translation is given in
Appendix G). Options 0, 1 and 2 are all three to be characterized as afterthoughts one may have in reading such a text. Option 3 (not everyone is willing
to do this (visit the doctor)) really refers to the first part of the text, but does not
provide a smooth connection between the two text halves. It would have been a
correct choice if it had to be inserted a few lines higher. This is the most attractive
distractor and has more attraction for respondents with average ability than
indicated by the model. One may tentatively conclude that a more complex
process is at work here than simple elimination of alternatives.
As experience with OPLM shows, the estimation of ability is, in general, only
marginally affected by the presence of deviating items, provided they are not
specially selected. Therefore, it is not good practice to omit items from a test
battery just because they significantly deviate from the model. This means that
most items can be retained when abilities are estimated with the Chain model.
Ramsay (1997, pg. 383) expresses this view by saying: "A simple wrong model can
be more useful for many statistical purposes than a complex correct model."
The application of the Chain model may add to the accuracy of ability
estimation as compared with binary scoring. However, the gain in accuracy
depends on the difficulty level of the items. With easy items there will be few
observations of false options. In that case false options can hardly add to the
accuracy of ability estimation. The added benefit will be more conspicuous with
more difficult items and many selections of false options. However, Figures 3 and
4 clearly show that a large gain in efficiency is to be expected from direct
observation of the latent classes themselves instead of indirectly via the options.
References
Abrahamowicz, M., & Ramsay, J.O. (1992). Multicategorical spline model for item
response theory. Psychometrika, 57, 5-27.
Aitchison, J., & Silvey, S.D. (1958). Maximum likelihood estimation of parameters
subject to restraints. Annals of Mathematical Statistics, 29, 813-828.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an
examinee’s ability. In: F.M. Lord, & M.R. Novick (Eds.). Statistical theories of
mental test scores (pp. 395-479). Reading, MA: Addison-Wesley.
Bishop, Y.M.M., Fienberg, S.E., & Holland, P.W. (1975). Discrete multivariate
analysis: theory and practice. Cambridge, Mass: MIT Press.
Bock, R.D. (1972). Estimating item parameters and latent ability when responses are
scored in one or more categories. Psychometrika, 37, 29-51.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item
parameters: an application of an EM-algorithm. Psychometrika, 46, 443-459.
Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: an
expository note. The American Statistician, 36, 153-157.
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood estimation
from incomplete data via the EM Algorithm (with discussion). Journal of the
Royal Statistical Society, Series B, 39, 1-38.
Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In: G.H. Fischer,
& I.W. Molenaar (Eds.). Rasch models: Their foundations, recent developments,
and applications. New York: Springer.
Haberman, S.J. (1979). Analysis of qualitative data. Vol. 2. New developments. New
York: Academic Press.
Kelderman, H. (1988). An IRT model for item responses that are subject to omission
and/or intrusion errors. (Research Report 88-16) Enschede: University of Twente,
Faculty of Educational Science and Technology.
Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator
in the presence of infinitely many incidental parameters. Annals of Mathematical
Statistics, 27, 887-903.
Louis, Th. A. (1982). Finding the information matrix when using the EM algorithm.
J.R. Statist. Soc. B, 44, 2, 226-233.
Love, Th.E. (1997). Distractor selection ratios. Psychometrika, 62, 51-62.
Luce, R.D. (1959). Individual choice behavior. New York: Wiley.
37
McLachlan, G.J., & Krishnan, Th. (1997). The EM algorithm and extensions. New
York: Wiley.
Muraki, E. (1992). A generalized partial credit model: application of an EM
algorithm. Applied Psychological Measurement, 16, 159-176.
Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and
Psychological Measurement, 14, 3-19.
Press, W.H., Flannery, B.P., Teukolsky, S.A., & Vetterling, W.T. (1992). Numerical
recipes in Pascal. Cambridge: Cambridge University Press.
Ramsay, J.O. (1991). Kernel smoothing approaches to nonparametric item
characteristic curve estimation. Psychometrika, 56, 611-630.
Rao, C.R. (1948). Large sample tests of statistical hypotheses concerning several
parameters with applications to problems of estimation. Proceedings of the
Cambridge Philosophical Society, 44, 50-57.
Read, T.R.C., & Cressie, N.A.C. (1988). Goodness-of-fit statistics for discrete
multivariate data. New York: Springer.
Samejima, F. (1979). A new family of models for the multiple choice item. (Research
Report No. 79-4), Knoxville: University of Tennessee, Department of Psychology.
Staphorsius, G. (1994). Leesbaarheid en leesvaardigheid (Readability and reading
ability). Dissertation, University of Twente. Arnhem: Cito.
Tanner, M.A. (1994). Tools for statistical inference. New York: Springer.
Thissen, D. (1976). Information in wrong responses to the Raven progressive
matrices. Journal of Educational Measurement, 13, 201-214.
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items.
Psychometrika, 49, 501-519.
Thissen, D., & Steinberg, L. (1997). A response model for multiple-choice items. In:
W.J. van der Linden, & R.K. Hambleton (Eds.). Handbook of modern item
response theory. New York: Springer.
Verhelst, N.D. (1993). On the standard errors of parameter estimators in the Rasch
model. (Measurement and Research Department Reports 93-1). Arnhem: Cito.
Verhelst, N.D., & Glas, C.A.W. (1993). A dynamic generalization of the Rasch
model. Psychometrika, 58, 395-415.
Verhelst, N.D., Glas, C.A.W., & Verstralen, H.H.F.M. (1995). OPLM: One-parameter logistic model. Computer program and manual. Arnhem: Cito.
Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model. In: G.H.
Fischer, & I.W. Molenaar (Eds.). Rasch models: Their foundations, recent
developments, and applications. New York: Springer.
Verstralen, H. (1996a). Estimating integer category weights in two IRT models for
polytomous items. (Measurement and Research Department Reports 96-1).
Arnhem: Cito.
Verstralen, H. (1996b). Evaluating ability in a korfball game. (Measurement and
Research Department Reports 96-2). Arnhem: Cito.
Westers, P. (1993). The solution-error response-error model: a method for the
examination of test item bias. Thesis, Enschede: University of Twente, Department
of Education.
Westers, P., & Kelderman, H. (1993). Generalizations of the solution-error response-error model. (Research Report 93-1). Enschede: University of Twente, Department
of Education.
Appendix A
Formula 3.2’ in Louis (1982) is not well suited to a setting with independent
cases, because it contains a summation over all pairs of cases. McLachlan and
Krishnan (1997, p. 113, Formula 4.9), following Louis, use a notation that is
also unclear from a computational perspective. This appendix aims to clarify
the issue. The same result is proven in Verhelst and Glas (1993); the proof
below, however, is simpler and more perspicuous.
Denote the observed data with $X$ and the missing data with $\theta$, possibly vector valued. The (incomplete) loglikelihood $\ell(\lambda; X)$ can, by virtue of experimental independence of the rows (the cases) $v$ $(v = 1,\ldots,V)$ of $X$, be written as:
$$\ell(\lambda; X) = \sum_{v=1}^{V} \ell_v(\lambda; x_v).$$
Therefore, any partial derivative of any order of $\ell(\cdot)$ can be written as the sum over $v$. In particular, element $(i,j)$ of the matrix $H$ of second derivatives of $\ell(\cdot)$ can be written as:
$$H_{ij} = \frac{\partial^2 \ell(\lambda; X)}{\partial\lambda_i\,\partial\lambda_j} = -I_{ij}(\lambda; X) = -\sum_{v=1}^{V} I_{vij}(\lambda; x_v).$$
According to Louis (1982, § 3.1, Formula 3.2):
$$I_{vij}(\lambda; x_v) = -\mathcal{E}\!\left[\frac{\partial^2 \ell_{cv}(\lambda; \theta, x_v)}{\partial\lambda_i\,\partial\lambda_j}\right] - \mathrm{Cov}\!\left(\frac{\partial \ell_{cv}(\lambda; \theta, x_v)}{\partial\lambda_i},\ \frac{\partial \ell_{cv}(\lambda; \theta, x_v)}{\partial\lambda_j}\right), \qquad (9)$$
where expectation and covariance are taken with respect to the posterior
distribution of the missing data given the observed data. The loglikelihoods in the
right part of (9) are the complete loglikelihoods of observing $x_v$ and $\theta_v$. The
missing data $\theta$ is not indexed in (9) to emphasize that it is the random variable
that makes the loglikelihood the random variable of which expectation and
covariance are, as in (21), to be taken over the posterior distribution $h(\theta \mid x_v)$.
The covariance represents the expected additional information on $\lambda$ obtained by
observing $\theta$ in addition to already having observed $x_v$. Summation over $v$ of (9)
results in Formula (21).
From the Formula mentioned in McLachlan and Krishnan one could easily
conclude that the posterior expected first order partial derivative of the
loglikelihood given the observations of a subject $v$ vanishes, and that, therefore,
the pairwise product can be omitted in the covariance per subject. However,
although the sum over subjects of a first order derivative vanishes at the maximum
likelihood estimate of the parameters, the sum over subjects of a product of two
first order derivatives does not vanish in general.
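As an illustration only, the following sketch (in Python; the function name, the array layouts, and the discretization of the missing data into a finite set of configurations are assumptions introduced here, not part of the report) accumulates the per-subject observed information according to (9) and (21).

```python
import numpy as np

def observed_information(post_w, grads, hessians):
    # Per-subject observed information via Louis (1982), Eqs. (9)/(21), for
    # independent cases.  Assumed inputs, one entry per subject v:
    #   post_w[v]   : (K,)      posterior weights of K missing-data configurations given x_v
    #   grads[v]    : (K, P)    complete-data score vectors  d l_cv / d lambda
    #   hessians[v] : (K, P, P) complete-data Hessians       d^2 l_cv / d lambda^2
    P = grads[0].shape[1]
    info = np.zeros((P, P))
    for w, g, h in zip(post_w, grads, hessians):
        w = w / w.sum()                                   # normalized posterior weights
        exp_hess = np.einsum('k,kij->ij', w, h)           # E[complete Hessian | x_v]
        mean_g = w @ g                                    # E[complete score | x_v]
        cov_g = np.einsum('k,ki,kj->ij', w, g, g) - np.outer(mean_g, mean_g)
        info += -exp_hess - cov_g                         # contribution of subject v, Eq. (9)
    return info
```

Note that the covariance is formed within each subject before summation: as argued above, the per-subject expected score does not vanish in general, so its outer product cannot be dropped.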
Appendix B
Derivation of the expected loglikelihood over the posterior distribution of $r$ and
$\theta$ given the response vector $x$.
First some notation:
$i \in \{1,\ldots,I\}$ : item index;
$r_i \in \{0,\ldots,R_i\}$ : latent class index of item $i$;
$r = (r_1,\ldots,r_I)$ : vector of latent class indices;
$r^{(k)} = (r_1,\ldots,r_{k-1},r_{k+1},\ldots,r_I)$ : $r$ without component $k$;
$x_v$ : vector of responses of person $v$.
The complete data of subject $v$ are $r_v$, $\theta_v$ and $x_v$, and the complete loglikelihood is given by:
$$\ell_v = \ell_v(\lambda; r_v, \theta_v, x_v) = \ln P(x_v \mid r_v) + \ln \psi(r_v \mid \theta_v) + \ln g(\theta_v).$$
Because the observations are independent across subjects, we first concentrate on
the likelihood of the record of one subject and omit the subscript v.
The EM algorithm maximizes the expected loglikelihood over the posterior distribution of the missing observations $(r, \theta)$ given the observations $x$:
$$\mathcal{E}_h\bigl[\ell(\lambda; r, \theta \mid x)\bigr] = \int \sum_r \bigl[\ln P(x \mid r) + \ln \psi(r \mid \theta) + \ln g(\theta)\bigr]\, h(r, \theta \mid x)\, d\theta$$
$$= \int \Bigl\{\sum_r \bigl[\ln P(x \mid r) + \ln \psi(r \mid \theta)\bigr]\, h(r \mid \theta, x) + \ln g(\theta)\Bigr\}\, h(\theta \mid x)\, d\theta, \qquad (10)$$
because
$$h(r, \theta \mid x) = h(r \mid \theta, x)\, h(\theta \mid x). \qquad (11)$$
Expression (10) does not show a procedure for computation that can be expected to halt within reasonable time for the numbers of items usually encountered in practice. To derive an equivalent expression for Formula (10) that does show an efficient procedure for computation, first the conditional loglikelihood given $\theta$ (the central part of (10)),
$$\mathcal{E}_{h\mid\theta}\bigl[\ell(\lambda; r, x \mid \theta)\bigr] = \sum_r \bigl[\ln P(x \mid r) + \ln \psi(r \mid \theta)\bigr]\, h(r \mid \theta, x), \qquad (12)$$
is investigated. Assuming conditional independence between item responses (overt and latent), the posterior distribution $h(\cdot)$ in Formula (12) can be written as:
$$h(r \mid \theta, x) = \frac{P(x \mid r)\,\psi(r \mid \theta)}{h(x \mid \theta)} = \prod_i \frac{P(x_i \mid r_i)\,\psi(r_i \mid \theta)}{h(x_i \mid \theta)} = \prod_i h(r_i \mid \theta, x_i). \qquad (13)$$
The first transition in (13) holds because $x_i$ only depends on $r_i$, and $r_i$ only on $\theta$. The first part of the expected conditional loglikelihood in Formula (12) can be written as:
$$\sum_r \ln P(x \mid r)\, h(r \mid \theta, x) = \sum_r \sum_i \ln P(x_i \mid r_i) \prod_i h(r_i \mid \theta, x_i) = \sum_i \sum_r \ln P(x_i \mid r_i) \prod_i h(r_i \mid \theta, x_i)$$
$$= \sum_{i \neq k} \sum_r \ln P(x_i \mid r_i) \prod_i h(r_i \mid \theta, x_i) + \sum_r \ln P(x_k \mid r_k) \prod_i h(r_i \mid \theta, x_i), \qquad (14)$$
where $k$ is an arbitrary element from $\{1,\ldots,I\}$. The second part of the last line in (14) can be written as:
$$\sum_r \ln P(x_k \mid r_k) \prod_i h(r_i \mid \theta, x_i) = \sum_{r_k} \ln P(x_k \mid r_k)\, h(r_k \mid \theta, x_k) \sum_{r^{(k)}} \prod_{i \neq k} h(r_i \mid \theta, x_i)$$
$$= \sum_{r_k} \ln P(x_k \mid r_k)\, h(r_k \mid \theta, x_k). \qquad (15)$$
Because the sum over $r^{(k)}$ of the products of $h(\cdot)$ expresses the (posterior) probability that an arbitrary partial vector $r^{(k)}$ from its complete domain occurs, this sum equals one. Because (15) holds for each $k \in \{1,\ldots,I\}$, Formula (14) simplifies to:
$$\sum_{i=1}^{I} \sum_{r_i} \ln P(x_i \mid r_i)\, h(r_i \mid \theta, x_i). \qquad (16)$$
The transition from (14) to (16) makes computation feasible. Whereas (14) contains $\prod_i (R_i + 1)$ summands, (16) only contains $\sum_i (R_i + 1)$ summands. For instance, for a test with 40 items with four options each, the number of summands reduces from $4^{40} \approx 1.2 \times 10^{24}$ to $4 \times 40 = 1.6 \times 10^{2}$, a reduction of order $10^{22}$.
With the second part of the expected loglikelihood in Formula (12) (the part with $\ln \psi$) the same exercise can be executed. Therefore, Formula (12) can be written as:
$$\mathcal{E}_{h\mid\theta}\bigl[\ell(\lambda; r, x \mid \theta)\bigr] = \sum_{i=1}^{I} \sum_{r_i} \bigl[\ln P(x_i \mid r_i) + \ln \psi(r_i \mid \theta)\bigr]\, h(r_i \mid \theta, x_i).$$
Combination according to Formula (11) of
$$h(\theta \mid x) = \frac{h(x \mid \theta)\, g(\theta)}{f(x)}, \qquad (17)$$
with the first line of Formula (13) gives
$$h(r, \theta \mid x) = \frac{P(x \mid r)\,\psi(r \mid \theta)\, g(\theta)}{f(x)}. \qquad (18)$$
The expected loglikelihood over the posterior distribution of the missing observations $r$ and $\theta$ given the vector of option choices $x$ (Formula (10)) can now be written as
$$\mathcal{E}_{R\Theta}\bigl[\ell(\lambda; r, \theta \mid x)\bigr] = \int \Bigl\{\sum_i \sum_{r_i} \bigl[\ln P(x_i \mid r_i) + \ln \psi(r_i \mid \theta)\bigr]\, h(r_i \mid \theta, x_i) + \ln g(\theta)\Bigr\}\, h(\theta \mid x)\, d\theta. \qquad (19)$$
The expected loglikelihood for all observations on all subjects is simply the sum
over subjects of Formula (19).
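A minimal sketch of how Formula (19) can be evaluated for one subject on a quadrature grid, exploiting the per-item structure of (16). The function name and array shapes are assumptions introduced here for illustration.

```python
import numpy as np

def expected_complete_loglik(h_theta, h_r, logP, logpsi, logg):
    # Formula (19) for one subject, with assumed shapes:
    #   h_theta : (Q,)      posterior h(theta_q | x)
    #   h_r     : (I, Q, R) posterior h(r_i | theta_q, x_i)
    #   logP    : (I, R)    ln P(x_i | r_i) for the observed option x_i
    #   logpsi  : (I, Q, R) ln psi_ir(theta_q)
    #   logg    : (Q,)      ln g(theta_q)
    # Cost per quadrature point: sum_i (R_i + 1) terms, as in (16).
    inner = np.einsum('iqr,iqr->q', h_r, logP[:, None, :] + logpsi)
    return float(np.sum(h_theta * (inner + logg)))
```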
Appendix C
Classification into homogeneous ability groups
independent of the response to item k
Let $g(\theta) = N(0,1)$ be the prior ability density. Define $H + 2$ ability groups by selecting $H + 1$ edges $\xi_h$ $(h = 1,\ldots,H+1)$ on the latent continuum. In the algorithm to obtain initial estimates the C-scale boundaries for the standard normal are used: $\xi_h = -2.25\,(0.50)\,2.25$. For the calculation of the APD statistic homogeneous groups are formed with about equal numbers of observations. Persons are classified for each item $i$ separately into one of these ability groups on the basis of their standardized EAP from their response pattern omitting their response $x_i$ on item $i$. In the initial estimation routine EAPs outside the range $[-2.25, 2.25]$ are neglected, and the midpoint $\theta_h = (\xi_h + \xi_{h+1})/2$ of the interval $[\xi_h, \xi_{h+1}]$ is taken as the common ability estimate for the members of class $h$.
For each item $i$ and each ability group $h$ one obtains the number $N_{hij}$ of respondents in group $h$ that responded in category $j$ of item $i$ as follows. The a posteriori density of $\theta$ given a response vector $x$ is given by
$$g(\theta \mid x) = \frac{\prod_{i=1}^{I} f(x_i \mid \theta)\, g(\theta)}{\int \prod_{i=1}^{I} f(x_i \mid \theta)\, g(\theta)\, d\theta}.$$
The integral in the denominator is approximated with the Gauss-Hermite sum with $Q$ points
$$\int_{-\infty}^{\infty} \prod_i f(x_i \mid \theta)\, g(\theta)\, d\theta \approx \sum_{q=1}^{Q} w_q \prod_i f(x_i \mid \theta_q), \qquad (20)$$
where $w_q$ are the Gauss-Hermite weights associated with the $Q$ Gauss-Hermite values $\theta_q$.
First calculate the $Q$ summands
$$y_q = w_q \prod_i f(x_i \mid \theta_q)$$
of (20), and denote the response vector $x$ omitting the response from item $i$ with $x^{(i)}$. Then the a posteriori distribution of $\theta$ at $\theta_q$ given the response pattern $x^{(i)}$ is given by $g(\theta_q \mid x^{(i)}) \propto z_{iq} = y_q / f(x_i \mid \theta_q)$, and the $EAP_i$ given $x^{(i)}$ is found with
$$EAP(x^{(i)}) = \frac{\sum_q \theta_q\, z_{iq}}{\sum_q z_{iq}}.$$
Suppose $x_{vi} = j$ and $\xi_h \le EAP_{vi} < \xi_{h+1}$; then $v$ contributes 1 to $N_{hij}$. It is conceivable that for $i' \neq i$ the estimate $EAP_{vi'}$ is contained in another interval $h' \neq h$ than $EAP_{vi}$. In that case $v$ contributes 1 to $N_{h'i'j'}$, and not to $N_{hi'j'}$.
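A sketch of this EAP computation, reusing the summands $y_q$ of (20) so that removing item $i$ only requires a division. The function name and array shapes are assumptions for illustration.

```python
import numpy as np

def eap_omitting_each_item(f_iq, w_q, theta_q):
    # EAP(x^(i)) for every item i, reusing the summands y_q of (20).  Assumed shapes:
    #   f_iq    : (I, Q) values f(x_i | theta_q) for the observed responses
    #   w_q     : (Q,)   Gauss-Hermite weights for the N(0, 1) prior
    #   theta_q : (Q,)   quadrature nodes
    y_q = w_q * np.prod(f_iq, axis=0)   # summands of (20) for the complete pattern
    z = y_q[None, :] / f_iq             # z_iq = y_q / f(x_i | theta_q): item i divided out
    return (z @ theta_q) / z.sum(axis=1)
```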
Appendix D
Formulas for the execution of the EM-algorithm,
to calculate the standard errors of estimation,
and the LM-test for $a_i$
In the first stage of each iteration, the E-step, $h(\theta; \lambda^* \mid x_v)$ is evaluated for each response vector at the Gauss-Hermite quadrature points given the starred parameter estimates. Moreover, the following statistics are accumulated.
Let the $J_i$ dimensional vector $\rho_i$ have elements
$$\rho_{ir} = \sum_v \int h(r; \lambda^* \mid \theta, x_{vi})\, h(\theta; \lambda^* \mid x_v)\, d\theta,$$
and the $Q$ dimensional vector $\tau$ elements
$$\tau_q = \sum_v h(\theta_q; \lambda^* \mid x_v).$$
If not OPLM but the GPCM approach is used and $a_i$ is estimated, the quantity
$$\sum_{v,q} r\,\theta_q\, h(r; \lambda^* \mid \theta_q, x_{vi})\, h(\theta_q; \lambda^* \mid x_v)$$
is also accumulated for each item $i$ and latent class $r$, and the following expressions are needed:
$$b_i = \ln a_i,$$
$$st_{irq} = st_{ir}(\theta_q) = r\,\theta_q + \eta_{ir},$$
$$sp_{iq} = sp_i(\theta_q) = \sum_r \psi_{ir}\, st_{irq},$$
$$sh_{viq} = sh_{vi}(\theta_q) = \sum_r st_{irq}\, h(r; \lambda^* \mid \theta_q, x_{vi}).$$
Moreover, a normal prior distribution $p(b) = N(\mu_b, \sigma_b^2)$ on $b$ is assumed:
$$\ln p(b) = -\frac{(b - \mu_b)^2}{2\sigma_b^2} + D,$$
with $D$ a constant, $\mu_b$ the mean of $b$ in the previous iteration, and $\sigma_b^2$ provided by the user.
In the maximization step, the M-step, the first derivatives are evaluated:
$$\frac{\partial Q(\lambda, \lambda^*)}{\partial \eta_{ir}} \approx a_i\left(\rho_{ir} - \sum_q \tau_q\, \psi_{ir}(\theta_q)\right),$$
$$\frac{\partial Q(\lambda, \lambda^*)}{\partial b_i} \approx a_i\left(\sum_r \eta_{ir}\,\rho_{ir} + \sum_{v,q}\sum_r r\,\theta_q\, h(r; \lambda^* \mid \theta_q, x_{vi})\, h(\theta_q; \lambda^* \mid x_v) - \sum_q sp_{iq}\,\tau_q\right) - \frac{b_i - \mu_b}{\sigma_b^2},$$
and with the Newton-Raphson procedure a value $\lambda_1$ is found for which the first derivatives vanish.
The second derivatives are
$$\frac{\partial^2 Q(\lambda, \lambda^*)}{\partial \eta_{ir}\,\partial \eta_{is}} \approx a_i^2 \sum_q \psi_{ir}(\theta_q; \eta_i)\bigl(\psi_{is}(\theta_q; \eta_i) - \delta_{rs}\bigr)\tau_q,$$
$$\frac{\partial^2 Q(\lambda, \lambda^*)}{\partial b_i\,\partial \eta_{ir}} \approx -a_i^2 \sum_q \psi_{ir}(\theta_q)\bigl(st_{irq} - sp_{iq}\bigr)\tau_q,$$
$$\frac{\partial^2 Q(\lambda, \lambda^*)}{\partial b_i\,\partial b_i} \approx -a_i^2 \sum_q \left(\sum_r \psi_{ir}(\theta_q)\, st_{irq}^2 - sp_{iq}^2\right)\tau_q - \frac{1}{\sigma_b^2}.$$
Mixed second derivatives of parameters from different items are all zero.
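A sketch of the M-step update for the category parameters of a single item with Newton-Raphson, using the first and second derivatives given above. The accumulated E-step statistics are assumed to be available in the shapes indicated, and $\eta_{i0}$ is fixed at zero for identification; this normalization, the function name, and the array layout are assumptions made here for illustration, not necessarily the program's choices.

```python
import numpy as np

def m_step_eta(eta, a_i, rho, tau, theta_q, n_iter=10):
    # Newton-Raphson update of the category parameters eta_ir of one item,
    # using the accumulated E-step statistics rho_ir and tau_q (assumed shapes):
    #   eta     : (R+1,) current values, with eta[0] fixed at 0 for identification
    #   a_i     : discrimination index of the item
    #   rho     : (R+1,) posterior class totals rho_ir
    #   tau     : (Q,)   posterior weights tau_q at the quadrature points theta_q
    eta = np.asarray(eta, dtype=float).copy()
    r = np.arange(len(eta))
    for _ in range(n_iter):
        st = a_i * (np.outer(theta_q, r) + eta[None, :])    # a_i (r theta_q + eta_ir)
        psi = np.exp(st - st.max(axis=1, keepdims=True))
        psi /= psi.sum(axis=1, keepdims=True)               # psi_ir(theta_q), shape (Q, R+1)
        grad = a_i * (rho - tau @ psi)                      # dQ/d eta_ir
        hess = a_i**2 * (np.einsum('q,qr,qs->rs', tau, psi, psi) - np.diag(tau @ psi))
        eta[1:] -= np.linalg.solve(hess[1:, 1:], grad[1:])  # Newton step on the free parameters
    return eta
```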
In the OPLM approach, or with more than one population, the parameters $(\mu, \sigma^2)$ are to be estimated as well. They can be estimated by the EM-method for regular exponential families, by equating the expected sufficient statistics with their posterior expectations (Tanner, 1994). The sufficient statistics $m$ and $s^2$ for respectively $\mu$ and $\sigma^2$ are
$$m = \frac{1}{V}\sum_v \theta_v \quad\text{and}\quad s^2 = \frac{1}{V}\sum_v \theta_v^2 - m^2.$$
The expectations of $m$ and $s^2$ as a function of the model parameters are
$$\mathcal{E}(m; \lambda_2) = \int \theta\, g(\theta; \lambda_2)\, d\theta = \mu,$$
$$\mathcal{E}(s^2; \lambda_2) = \int \theta^2 g(\theta; \lambda_2)\, d\theta - \mu^2 = \sigma^2.$$
The posterior expectations of $m$ and $s^2$ are
$$\mathcal{E}(m; \lambda \mid X) = \frac{1}{V}\sum_v \int \theta\, h(\theta; \lambda \mid x_v)\, d\theta,$$
$$\mathcal{E}(s^2; \lambda \mid X) = \frac{1}{V}\sum_v \int \theta^2 h(\theta; \lambda \mid x_v)\, d\theta - \mathcal{E}(m; \lambda \mid X)^2.$$
In the M-step the new estimate of $\sigma^2$ is simply equal to $\mathcal{E}(s^2; \lambda \mid X)$. To identify the model $\mu$ can be set at zero; in the GPCM approach $\sigma$ also has to be fixed, e.g., at 1.
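A sketch of the update of the population parameters from the posterior expectations of the sufficient statistics; the array shapes and the function name are assumptions for illustration.

```python
import numpy as np

def update_population(post_w, theta_q):
    # Posterior expectations of the sufficient statistics m and s^2.  Assumed shapes:
    #   post_w  : (V, Q) posterior weights h(theta_q | x_v), rows summing to one
    #   theta_q : (Q,)   quadrature nodes
    m = np.mean(post_w @ theta_q)              # E(m | X)
    s2 = np.mean(post_w @ theta_q**2) - m**2   # E(s^2 | X): new estimate of sigma^2
    return m, s2
```

Under the normalization $\mu = 0$ only the $\sigma^2$ update would be used.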
To calculate the standard errors of estimation the matrix of second derivatives of $M(\cdot)$ is needed. In the present case section 3.2 (the independence case, Formula 3.2') from Louis (1982) is applicable (see also Appendix A). Indicating derivatives by superscripted parameters, the information matrix and the matrix $H$ of second derivatives of $M$ are given by
$$I_{ij} = -H_{ij} = -M^{\lambda_i \lambda_j} = -\frac{\partial^2 Q(\lambda, \lambda^*)}{\partial\lambda_i\,\partial\lambda_j} - \sum_v \mathrm{Cov}\!\left(\frac{\partial \ell_{cv}(\lambda)}{\partial\lambda_i},\ \frac{\partial \ell_{cv}(\lambda)}{\partial\lambda_j}\ \Big|\ x_v\right), \qquad (21)$$
where $\ell_{cv}(\lambda)$ denotes the complete loglikelihood given in Formula (6) for respondent $v$. The expectation and covariance are to be taken with respect to the posterior density $h(r, \theta; \lambda \mid x_v)$ given by Formula (13, Appendix B). All derivatives are given above; only the posterior covariances of the first derivatives are still needed.
The posterior covariance of the first derivatives with respect to parameters associated with different items, or with an item and the prior distribution of $\theta$, is simply derived from the first derivatives. Therefore, given that
$$\mathcal{E}\bigl(\ell_{cv}^{\eta_{ir}} \mid x_v\bigr) = a_i \int \bigl[h(r \mid \theta, x_{vi}) - \psi_{ir}(\theta)\bigr]\, h(\theta \mid x_v)\, d\theta,$$
and
$$\mathcal{E}\bigl(\ell_{cv}^{b_i} \mid x_v\bigr) = a_i \int \bigl[sh_{vi}(\theta) - sp_i(\theta)\bigr]\, h(\theta \mid x_v)\, d\theta,$$
the posterior covariance of the first derivatives with respect to, for instance, $b_i$ and $\eta_{i'r}$ is simply given by:
$$\mathrm{Cov}\bigl(\ell_{cv}^{b_i}, \ell_{cv}^{\eta_{i'r}} \mid x_v\bigr) = a_i a_{i'} \int \bigl[sh_{vi}(\theta) - sp_i(\theta)\bigr]\bigl[h(r \mid \theta, x_{vi'}) - \psi_{i'r}(\theta)\bigr]\, h(\theta \mid x_v)\, d\theta - \mathcal{E}\bigl(\ell_{cv}^{b_i} \mid x_v\bigr)\,\mathcal{E}\bigl(\ell_{cv}^{\eta_{i'r}} \mid x_v\bigr).$$
However, when two parameters are associated with the same item, the posterior distribution of $r$ is also involved in the covariance:
$$\mathrm{Cov}\bigl(\ell_{cv}^{\eta_{ir}}, \ell_{cv}^{\eta_{is}} \mid x_v\bigr) = a_i^2 \int \bigl[\psi_{ir}\psi_{is} - h(r \mid \theta, x_{vi})\,\psi_{is} - h(s \mid \theta, x_{vi})\,\psi_{ir} + \delta_{rs}\, h(r \mid \theta, x_{vi})\bigr]\, h(\theta \mid x_v)\, d\theta - \mathcal{E}\bigl(\ell_{cv}^{\eta_{ir}} \mid x_v\bigr)\,\mathcal{E}\bigl(\ell_{cv}^{\eta_{is}} \mid x_v\bigr),$$
$$\mathrm{Cov}\bigl(\ell_{cv}^{b_i}, \ell_{cv}^{b_i} \mid x_v\bigr) = a_i^2 \int \sum_r h(r \mid \theta, x_{vi})\bigl(st_{ir}(\theta) - sp_i(\theta)\bigr)^2\, h(\theta \mid x_v)\, d\theta - \mathcal{E}\bigl(\ell_{cv}^{b_i} \mid x_v\bigr)^2,$$
$$\mathrm{Cov}\bigl(\ell_{cv}^{b_i}, \ell_{cv}^{\eta_{ir}} \mid x_v\bigr) = a_i^2 \int \Bigl[h(r \mid \theta, x_{vi})\bigl(st_{ir}(\theta) - sp_i(\theta)\bigr) - \psi_{ir}(\theta)\bigl(sh_{vi}(\theta) - sp_i(\theta)\bigr)\Bigr]\, h(\theta \mid x_v)\, d\theta - \mathcal{E}\bigl(\ell_{cv}^{b_i} \mid x_v\bigr)\,\mathcal{E}\bigl(\ell_{cv}^{\eta_{ir}} \mid x_v\bigr).$$
To obtain the standard errors of estimation for $\mu$ and $\sigma$, and the covariances with item parameters, the following formulas are used. Let
$$m_v^i = \int \left(\frac{\theta - \mu}{\sigma}\right)^{i} h(\theta \mid x_v)\, d\theta$$
be the $i$-th posterior moment of the standardized theta given response vector $x_v$ of subject $v$. Then
$$\mathcal{E}\bigl(\ell_{cv}^{\mu} \mid x_v\bigr) = \frac{1}{\sigma}\int \frac{\theta - \mu}{\sigma}\, h(\theta \mid x_v)\, d\theta = \frac{1}{\sigma}\, m_v^1,$$
$$\mathcal{E}\bigl(\ell_{cv}^{\sigma} \mid x_v\bigr) = \frac{1}{\sigma}\left[\int \left(\frac{\theta - \mu}{\sigma}\right)^2 h(\theta \mid x_v)\, d\theta - 1\right] = \frac{1}{\sigma}\bigl(m_v^2 - 1\bigr),$$
$$\mathrm{Cov}\bigl(\ell_{cv}^{\mu}, \ell_{cv}^{\mu} \mid x_v\bigr) = \frac{1}{\sigma^2}\, m_v^2 - \mathcal{E}\bigl(\ell_{cv}^{\mu} \mid x_v\bigr)^2,$$
$$\mathrm{Cov}\bigl(\ell_{cv}^{\mu}, \ell_{cv}^{\sigma} \mid x_v\bigr) = \frac{1}{\sigma^2}\bigl(m_v^3 - m_v^1\bigr) - \mathcal{E}\bigl(\ell_{cv}^{\mu} \mid x_v\bigr)\,\mathcal{E}\bigl(\ell_{cv}^{\sigma} \mid x_v\bigr),$$
$$\mathrm{Cov}\bigl(\ell_{cv}^{\sigma}, \ell_{cv}^{\sigma} \mid x_v\bigr) = \frac{1}{\sigma^2}\bigl(m_v^4 - 2 m_v^2 + 1\bigr) - \mathcal{E}\bigl(\ell_{cv}^{\sigma} \mid x_v\bigr)^2.$$
Note that after the inversion of the information matrix, the variance of $a_i$ is obtained from the variance of $b_i$ by multiplication with $a_i^2$ (see the delta method in Bishop et al., 1975).
For the OPLM approach we have, under the assumption that the first order partial derivatives with respect to $\eta$ are zero, that the LM-statistic for the discrimination index $a_i$ is given by
$$LM_a = M^{a\,T}\, W^{-1}\, M^{a}, \qquad W = I^{(a,a)} - I^{(a,\lambda)}\, I^{(\lambda,\lambda)^{-1}}\, I^{(\lambda,a)}.$$
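A sketch of this LM-statistic with $W$ computed from a partitioned information matrix. Here `score_a` stands for the vector of first derivatives with respect to the tested discrimination parameters evaluated at the restricted estimates; the function name and the index mask are assumptions for illustration.

```python
import numpy as np

def lm_statistic(score_a, info, idx_a):
    # LM_a = score_a' W^{-1} score_a with
    # W = I(a,a) - I(a,lambda) I(lambda,lambda)^{-1} I(lambda,a).
    #   score_a : first derivatives w.r.t. the tested discrimination parameters
    #   info    : full information matrix over (a, lambda)
    #   idx_a   : boolean NumPy mask selecting the a-rows/columns of info
    I_aa = info[np.ix_(idx_a, idx_a)]
    I_al = info[np.ix_(idx_a, ~idx_a)]
    I_ll = info[np.ix_(~idx_a, ~idx_a)]
    W = I_aa - I_al @ np.linalg.solve(I_ll, I_al.T)
    return float(score_a @ np.linalg.solve(W, score_a))
```

When a single item's discrimination is tested, `score_a` and `W` are scalars and the statistic has one degree of freedom.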
Some simplifications are introduced into the calculations of the LM-statistics and the standard errors of estimation as reported in Tables 5 and 6, to keep the advantage of the EM-algorithm that the calculations are largely restricted to individual items. The partial second derivatives of the complete loglikelihood with respect to parameters from different items are zero, and, generally, the posterior covariances of first derivatives with respect to parameters associated with different items are small relative to the information of within-item parameter pairs, at least with the normalization chosen here $(\mu = 0)$. With respect to SEEs this normalization is comparable to setting the mean difficulty parameter in the Rasch model to zero. Under this last normalization Verhelst (1993) shows that the 'diagonal' SEE, as calculated here, overestimates the true SEE by a factor of approximately $k/(k-1)$, with $k$ the number of items in a complete design. With incomplete designs the matter is more complicated. Population parameters are treated likewise. Consequently, the influence of the covariances between item parameters of different items, and between item and population parameters, on the values of the LM-statistics and the estimation errors is neglected. Taking them into account, however, slows down the computations quadratically with the number of items (per booklet). Moreover, already with a moderate number of items (more than 45 items with 5 options), the order of the information matrix becomes too large to be inverted reliably. Therefore, this part of the information matrix is neglected and the calculations are restricted to each item separately.
Appendix E
Formulas to obtain MAP and ML estimates of θ by Newton’s
method
For reasons of exposition first the maximum likelihood (ML) method of
estimation is treated. The ML estimator is the value of θ that maximizes the
probability f (x | θ ) of response pattern x , disregarding the prior marginal density
g (θ) . Newton’s iterative method is used to calculate the estimates. Let:
$$\bar r_i = \bar r_i(\theta) = \sum_{r=1}^{R_i} r\,\psi_{ir}(\theta) \quad\text{(the expected latent score on item } i \text{ with ability } \theta\text{)},$$
$$\frac{d\bar r_i}{d\theta} = a_i\left(\sum_{r=1}^{R_i} r^2 \psi_{ir} - \bar r_i^{\,2}\right),$$
$$w_i = w_i(\theta) = \sum_{r=1}^{R_i} r\, P_{ir}(x_i)\,\psi_{ir} \quad\text{(for } P_{ir}(j) \text{ see Formula (2)), and}$$
$$v_i = v_i(\theta) = \sum_{r=1}^{R_i} r^2 P_{ir}(x_i)\,\psi_{ir}, \quad\text{then}$$
$$f_i' = \frac{d f_i(x_i)}{d\theta} = a_i\bigl(w_i - \bar r_i f_i\bigr),$$
$$l_f' = \frac{d \ln f(x \mid \theta)}{d\theta} = \sum_{i=1}^{I} \frac{f_i'}{f_i} = \sum_{i=1}^{I} a_i\left(\frac{w_i}{f_i} - \bar r_i\right), \quad\text{and} \qquad (22)$$
$$l_f'' = \frac{d^2 \ln f(x \mid \theta)}{d\theta^2} = \sum_{i=1}^{I}\left[\frac{a_i\Bigl(a_i\bigl(v_i - w_i\bar r_i\bigr) - \dfrac{d\bar r_i}{d\theta}\, f_i - \bar r_i f_i'\Bigr)}{f_i} - \left(\frac{f_i'}{f_i}\right)^2\right].$$
Given an estimate $\theta^{(i)}$ of $\theta$ a new ML-estimate is given by
$$\theta^{(i+1)} = \theta^{(i)} - \frac{l_f'}{l_f''}. \qquad (23)$$
The EAP estimate can be used as an initial estimate $\theta^{(0)}$.
The modal posterior (MAP) estimator is obtained in the same way as the ML estimator except for a slight change in Formulas (22) and (23), because the probability $f(x \mid \theta)$ of the response pattern is multiplied by the density $g(\theta)$ to obtain the posterior likelihood: $(\theta - \mu)/\sigma^2$ is subtracted from the first derivative $l_f'$, and $1/\sigma^2$ is subtracted from the second derivative $l_f''$.
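A sketch of these Newton iterations for a single response vector. The data structures (one array of conditional option probabilities and one vector of category parameters per item) and the function name are assumptions for illustration; passing `sigma=np.inf` removes the prior terms and yields the ML estimate.

```python
import numpy as np

def map_theta(x, P, a, eta, mu=0.0, sigma=1.0, theta0=0.0, n_iter=20):
    # Newton iterations (22)-(23) for one response vector (assumed structures):
    #   x   : (I,) observed option indices
    #   P   : list of (R_i+1, J_i) arrays with P_ir(j)
    #   a   : (I,) discrimination indices
    #   eta : list of (R_i+1,) arrays with category parameters eta_ir
    theta = theta0
    for _ in range(n_iter):
        l1, l2 = 0.0, 0.0
        for i in range(len(x)):
            r = np.arange(len(eta[i]))
            st = a[i] * (r * theta + eta[i])
            psi = np.exp(st - st.max()); psi /= psi.sum()   # psi_ir(theta)
            p = P[i][:, x[i]]                               # P_ir(x_i)
            f = np.sum(p * psi)                             # f_i
            rbar = np.sum(r * psi)                          # expected latent score
            w = np.sum(r * p * psi)                         # w_i
            v = np.sum(r**2 * p * psi)                      # v_i
            fp = a[i] * (w - rbar * f)                      # f_i'
            rbar_p = a[i] * (np.sum(r**2 * psi) - rbar**2)  # d rbar_i / d theta
            fpp = a[i] * (a[i] * (v - rbar * w) - rbar_p * f - rbar * fp)
            l1 += fp / f
            l2 += fpp / f - (fp / f) ** 2
        l1 -= (theta - mu) / sigma**2                       # MAP correction to l_f'
        l2 -= 1.0 / sigma**2                                # MAP correction to l_f''
        theta -= l1 / l2                                    # Newton step (23)
    return theta
```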
Appendix F
Initial Parameter Estimates
Denote the response vector without the response to item $i$ with $x^{(i)}$. Let the respondents be grouped into homogeneous ability groups on the basis of $x^{(i)}$. Let the probability of $v$ to obtain latent score $r$, given that $v$ is a member of group $g$, be given by
$$\psi_{gir} = \frac{\exp\bigl(a_i(r\,\theta_g + \eta_{ir})\bigr)}{1 + \sum_s \exp\bigl(a_i(s\,\theta_g + \eta_{is})\bigr)}.$$
The probability for a member of group $g$ to select option $j$ of item $i$ is given by
$$\pi_{gij} = \sum_{s=0}^{R_i} P_{is}(j)\,\psi_{gis}.$$
Then the loglikelihood of the observed counts $N_{gij}$ of respondents from group $g$ who select option $j$ of item $i$ is given by
$$\ell(\eta, a) = \sum_{g,i,j} N_{gij}\, \ln \pi_{gij}.$$
Denote the derivatives with respect to $\eta_{ir}$ and $\ln\sigma$ respectively by $\partial_{ir}$ and $\partial_\sigma$. The derivative of $\psi$ with respect to $\eta_{ir}$ is
$$\partial_{ir}\,\psi_{gis} = a_i\,\psi_{gis}\bigl(\delta_{rs} - \psi_{gir}\bigr),$$
so that
$$\partial_{ir}\,\pi_{gij} = a_i\,\psi_{gir}\left(P_{ir}(j) - \sum_{s=0}^{R_i} P_{is}(j)\,\psi_{gis}\right) = a_i\,\psi_{gir}\bigl(P_{ir}(j) - \pi_{gij}\bigr),$$
and the second order partial derivatives are
$$\partial_{irs}\,\pi_{gij} = a_i^2\,\psi_{gir}\Bigl[\delta_{rs}\bigl(P_{ir}(j) - \pi_{gij}\bigr) - \psi_{gis}\bigl(P_{ir}(j) - \pi_{gij}\bigr) - \psi_{gis}\bigl(P_{is}(j) - \pi_{gij}\bigr)\Bigr].$$
To obtain the derivatives of $\pi_{gij}$ with respect to $\ln\sigma$, let
$$ps_g = \sum_s P_{is}(j)\,\psi_{gis}\, s, \qquad ps2_g = \sum_s P_{is}(j)\,\psi_{gis}\, s^2,$$
$$\bar s_g = \sum_s \psi_{gis}\, s, \qquad \overline{s^2}_g = \sum_s \psi_{gis}\, s^2, \qquad\text{and}\qquad t_{gij} = ps_g - \bar s_g\,\pi_{gij},$$
then
$$\partial_\sigma\,\pi_{gij} = a_i\,\theta_g\, t_{gij}, \quad\text{and}$$
$$\partial_\sigma^2\,\pi_{gij} = (a_i\,\theta_g)^2\Bigl[ps2_g - \bar s_g\, ps_g - \bigl(\overline{s^2}_g - \bar s_g^{\,2}\bigr)\pi_{gij} - \bar s_g\, t_{gij}\Bigr] + a_i\,\theta_g\, t_{gij}.$$
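A sketch of the group-level option probabilities $\pi_{gij}$ that underlie these initial estimates. The shapes are assumed, with $\eta_{i0} = 0$ so that the denominator of $\psi_{gir}$ contains the leading 1; the function name is introduced here for illustration only.

```python
import numpy as np

def group_option_probs(a_i, eta_i, P_i, theta_g):
    # pi_gij = sum_s P_is(j) psi_gis for all groups g and options j of one item.
    # Assumed shapes (eta_i[0] = 0, so the denominator of psi_gir contains the leading 1):
    #   eta_i   : (R+1,)   category parameters eta_ir
    #   P_i     : (R+1, J) conditional option probabilities P_is(j)
    #   theta_g : (G,)     common ability values of the groups
    r = np.arange(len(eta_i))
    st = a_i * (np.outer(theta_g, r) + eta_i[None, :])
    psi = np.exp(st - st.max(axis=1, keepdims=True))
    psi /= psi.sum(axis=1, keepdims=True)   # psi_gir, shape (G, R+1)
    return psi @ P_i                        # pi_gij, shape (G, J)
```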
Appendix G
An example of a CLIB item that explains itself,
and a translation of item 3 of the Reading comprehension test
(a)
There are many different types of stories. There are funny, beautiful, sweet and
also boring stories. There are scary or creepy ones. Long and short ones. In a little
while you will be asked to read a number of stories or texts. They are neither very
long nor very short. Some of them are quite enjoyable, but there are others you
will probably find boring.
You __________ (1)
The texts in this book look different than the stories you usually find in books.
Some parts of the story have been left out. There is a line with a number above
it at the place where something is missing. You have to ’figure out’ what the
missing part is. Next to the texts are numbers which match those above the empty
spaces. Under these numbers are groups of words. One of the choices fits in the
empty space. You can only choose one answer. If you understood the text, you
will know the correct answer. You will have to understand not only the text that
comes before the empty space but also the text that follows it. If you only read
what comes before the line, you might choose the wrong answer. Even after you
have read the entire text, you must choose your answer carefully. Don’t pick an
answer too quickly. Think about it. There may be two or even three answers that
appear to be correct. Remember, only one is really correct. Choose the answer
you think best fits the text.
1
A can learn something from them.
B can also write them yourself.
C shouldn’t complain.
D should sit still and concentrate on your work.
E must read them completely and read them well.
Item 3
.....(some preceding text)
Leprosy doesn’t cure just by itself. If not combated it worsens. It starts with a rash.
Later on the whole body is covered with ulcers. People with leprosy can be cured.
However, as soon as they discover the rash they have to visit a doctor. Often they
are afraid to do so. Relatives and neighbors might find out that they have
contracted leprosy. They fear that they would therefore shun them .... That’s why
there are still people with leprosy. But __________ (1)
Where do you find leprosy? Almost exclusively in poor countries. In some of
those countries there are too few doctors. Consequently people cannot always be
helped in time. Yet there is another reason. Many people in poor countries are
convinced that leprosy cannot be cured. That a doctor is of no use to them. It is
not just out of fear that they do not visit a doctor if they have contracted leprosy.
Also because they think it senseless.
1
A that is not the only problem.
B clean water is also in short supply.
C there also are lonesome people.
D not everyone is prepared to do this.
E schools are expensive.