Constrained and Unconstrained Multivariate Normal Finite Mixture

Multivariate Behavioral Research, 39(1), 69-98
Copyright © 2004, Lawrence Erlbaum Associates, Inc.
Constrained and Unconstrained Multivariate Normal Finite
Mixture Modeling of Piagetian Data
Conor V. Dolan, Brenda R. J. Jansen, and Han L. J. van der Maas
University of Amsterdam
We present the results of multivariate normal mixture modeling of Piagetian data. The
sample consists of 101 children, who carried out a (pseudo-)conservation computer task on
four occasions. We fitted both cross-sectional mixture models and longitudinal models based
on a Markovian transition model. Piagetian theory of cognitive development provides a
strong basis for the number and interpretation of the components in the mixtures. Most
studies of Piagetian development have been based on mixture modeling of discrete responses.
The present results show that normal mixture modeling is a useful approach when responses
are continuous and approximately normal within the components. Multivariate normal
mixture modeling has the advantage that the covariance structure within the components may
be modeled. Generally, the results are consistent with the presence of distinct modes of
responding, which supports the hypothesis of stage-wise development.
Introduction
Multivariate normal finite mixtures have been the subject of considerable
recent interest (Arminger & Stein, 1997; Dolan & van der Maas, 1998;
Jedidi, Jagpal, & DeSarbo, 1997; Muthén, 2002; Muthén & Shedden, 1999;
Yung, 1997). One reason for this interest is that mixtures offer a way to
account for population heterogeneity in the statistical modeling of data. In this
context, population heterogeneity means that the cases in the sample are
drawn from two or more distinct sub-populations, but the sub-population
membership of the individual cases is unknown. Failure to take such
heterogeneity into account may result in the adoption of a model that does
not actually hold in any sub-population. In addition, heterogeneity violates the
assumption that the data are independently and identically distributed. As a
consequence, various results associated with maximum likelihood theory
(standard errors, null-distribution of test statistics) may be unreliable.
While finite mixture modeling offers interesting possibilities, its successful
application depends in part on a strong theoretical underpinning and/or on the
good separation (e.g., large mean differences) of the sub-populations. One
area of research that is characterized by a strong theory is Piagetian
cognitive development. Piagetian theory provides a good indication both of the
expected number of sub-populations and of the type of task or test that
discriminates well between the sub-populations. To date, most applications
of mixture modeling in the area of Piagetian development have involved
discrete responses (e.g., the latent class model). The aim of this article is to
present the results of multivariate normal mixture modeling of Piagetian data.
The results demonstrate the feasibility of multivariate normal mixture
modeling and the usefulness of strong theory in this type of modeling.

Correspondence concerning this article should be directed to Conor Dolan, Department
of Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam, The
Netherlands.
Below we first discuss recent applications of mixture modeling in studies
of Piagetian development. Subsequently, we present the Piagetian task and
the sample of the present study. The dataset consists of responses to four
computerized (pseudo-)conservation items, which were completed on four
occasions. As explained below, conservation is an important ability in
Piagetian developmental theory. The sample size is relatively small, and the data
include many missing observations. Next, we present the models and the
methods we used to fit the normal mixtures. We used the quasi-Newton and EM
algorithms to obtain maximum likelihood (ML) estimates. We present results of
cross-sectional and longitudinal analyses. The former are based on
unconstrained and constrained mixture models; the latter are based on highly
constrained mixture models.
Mixture Modeling of Piagetian Data
The results presented below pertain to the transition from the pre-operational stage (2-7 years) to the concrete operational stage (7-12 years) in Piaget’s theory
of cognitive development (Piaget & Inhelder, 1969). This transition is
associated with striking changes in cognitive abilities. In contrast to concrete
operational children, pre-operational children typically fail tasks that require
proportional and analogical reasoning. Their thinking is rigid and lacking in
logic, and their focus is limited to a single aspect of a problem at a time. The
unidimensionality of their thinking is evident in tasks that require the ability to
conserve. This is the ability to understand that certain physical properties of
objects (weight, volume, quantity) are invariant, despite changes in their
appearance.
Children’s responses to Piagetian tasks are well suited to finite mixture
modeling for a number of reasons. First, the stages of development are
mutually exclusive, and second, the transitions are abrupt. The stages may
thus be viewed as levels of a latent nominal variable (Rindskopf, 1987).
Thomas, who pioneered the application of mixture modeling in the study of
Piagetian development, has argued convincingly that the concept of
developmental stage may be operationalized as a distinct component
distribution in a finite mixture (Thomas, 1994; Thomas & Lohaus, 1993;
Thomas, Lohaus, & Kessler, 1999; Thomas & Turner, 1991). Finite mixture
modeling provides a framework for assessing development, objective
criteria for determining a child’s present stage from his or her responses, and a
means to define qualitative differences in performance and to relate these to
the use of different rules (e.g., see Jansen & van der Maas, 1997). In
addition, van der Maas and Molenaar (1992) suggested the application of
finite mixture modeling to investigate multimodality, which is an important
criterion in their mathematical catastrophe model for developmental
transitions in Piagetian cognitive development.
Finite mixture modeling has been used to study behavior in several
domains of Piagetian cognitive development. Most of these applications have
involved models for discrete responses, for example, mixtures of
multinomials, or latent class models (McCutcheon, 1987; Wolfe, 1970), and
mixtures of binomials (e.g., Everitt & Hand, 1981; McLachlan & Peel, 2000).
Thomas and Turner (1991) and Thomas, Lohaus, and Kessler (1999)
investigated performance on the water-level task using mixtures of binomials.
In the water-level task, subjects are asked to draw the water line in a vessel
(half filled with liquid) that is depicted at various angles relative to the
horizon. This spatial task is used to assess development in children's
understanding of the horizontal-vertical coordinate system. Pre-operational
children typically do not understand that the water line remains horizontal,
and usually draw a line parallel to the bottom of the vessel. Thomas and
Hettmansperger (2001) presented a comprehensive transition model, based
on a discrete time stationary Markov process, to investigate development on
this task. This model accommodates both qualitatively different modes of
responding at each occasion and switches between the modes from
one occasion to the next. Binomial mixtures were also used by Hosenfeld,
van der Maas, and van den Boom (1997) in a study of the development of
analogical reasoning.
In addition to binomial mixtures, latent class analysis has been applied
extensively to study responses on the balance scale task, which is a test of
proportional reasoning. In the balance scale task, subjects predict the side to
which the balance scale will tip, given varying numbers and positions of
weights relative to the fulcrum. Siegler (1981) suggested a rule assessment
methodology to determine the exact rule that a given child uses to solve
balance scale problems. Originally, he distinguished four rules and designed six
balance scale problems to determine the rule applied by a given subject.
Jansen and van der Maas (1997, 2001, 2002) and Boom, Hoijtink, and
Kunnen (2001) used latent class analysis to carry out rule assessment in the
balance scale task. Jansen and van der Maas (2002) also applied latent class
analysis in a detailed study of the transition between rules.
The applications mentioned above are based on discrete models. In the
case of the balance scale and analogical reasoning tasks this is a natural
choice. In the case of the water level task, where the response variable is
continuously distributed, Thomas discretizes responses in order to proceed
with discrete mixture modeling. Concerning normal finite mixtures, Thomas
et al. (1999) state that “(...) normal mixture procedures are not robust:
Parameter estimates can be exceedingly poor and misleading, particularly in
those cases where the component distributions are asymmetrical in shape
(...)” (p. 1034).
We are not convinced that normal mixtures are quite as unsuited as
Thomas suggests. Little is known about the feasibility of normal mixture
modeling of Piagetian data, as there have been few applications. One
advantage of normal mixture models is that they allow one to estimate (and
model) the covariance among responses within each component. In latent
class analysis, the responses conditional on class membership are
independent. Multivariate normal mixture models may also be applied to
longitudinal (panel) data. Discrete time transition models (Kemeny & Snell,
1976; Thomas & Hettmansperger, 2001; van de Pol & Langeheine, 1990) for
normally distributed responses give rise to highly constrained normal mixture
models.
The aim of the article is thus to present the results of multivariate normal
finite mixture analyses of data relating to Piagetian cognitive development.
Below we first explain the experimental task that we used to assess
conservation. Subsequently we describe the sample. We explain how we
obtained maximum likelihood parameters and present the models that we
fitted. We present the cross-sectional and longitudinal results, and we
conclude this article with a discussion.
The Pseudo-Conservation Anticipation Task
The data relate to the conservation of an amount of liquid, when it is
poured from one vessel to another, differently shaped, vessel. Typically,
children in the concrete operational stage (henceforth: conservers)
understand that the level of the water will change with the shape of the
vessels, but that the amount of water is invariant. Children in the pre-operational stage (non-conservers) often think that the level of the water is
indicative of the amount: the higher the level, the greater the amount of
water. To assess this ability, a sample of children completed a computer test
on 11 occasions. During the test, the children were asked to anticipate the
water level in the event that the water, presented in one glass, were poured
into a second, differently shaped, glass. They indicated the predicted level on
the computer screen by pressing pre-selected keys on the keyboard. These
keys controlled the water level, which was depicted as a horizontal line in an
empty glass, next to the glass containing the water before it is poured. The
children’s responses were recorded automatically. The test comprised four
items, which are depicted in Figure 1.
During the instruction children were shown three-dimensional glasses
with rectangular forms and a constant third dimension (i.e., depth). This
dimension was omitted in the two-dimensional computer display (Figure 1).
This instruction was given to minimize the possible confusion concerning the
correct rule (i.e., height × width, rather than the rule based on round glasses,
π × radius²). An anonymous reviewer pointed out that we cannot be sure
that the children interpret this task as a liquid conservation task rather than
an area conservation task in view of the nature of the stimuli (depth
information is lacking in the depiction of the vessels in Figure 1). This is an
important issue, which was addressed in an additional small study. The
Figure 1
Task Stimuli in Computer Test of Conservation
Children are required to predict the water level when the water in the left vessel is poured
into the right vessel.
results indicated that responses on the present task correlated highly (about
.79) with responses to a similar task, in which the depiction of the vessel did
include depth information. However, this result and the explicit instruction
notwithstanding, we agree that we cannot be completely certain that the
children interpreted the task in the intended manner. To avoid confusion on
this point we refer to the task as a pseudo-conservation anticipation task,
as suggested by the reviewer.
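Because the glasses have a constant depth, conserving the amount of liquid amounts to conserving the area height × width in the two-dimensional display. The following minimal Python sketch illustrates this rule and the scoring of a response as its signed distance (in cm) from the correct level. The glass dimensions, the function names, and the sign convention (response minus correct level) are hypothetical and are not taken from the Figure 1 stimuli.

```python
def correct_level(h_source: float, w_source: float, w_target: float) -> float:
    """Water level (cm) in the target glass that conserves the amount of liquid.

    With a constant depth, the amount is proportional to height * width,
    so h_target * w_target must equal h_source * w_source.
    """
    if w_target <= 0:
        raise ValueError("target width must be positive")
    return h_source * w_source / w_target


def score(response: float, correct: float) -> float:
    """Signed deviation (cm) of an anticipated level from the correct level."""
    return response - correct


# Hypothetical item: 4 cm of liquid in a 3 cm wide glass, poured into a 6 cm wide glass.
target = correct_level(4.0, 3.0, 6.0)
aligned = 4.0  # a non-conserver who simply copies the source level
print(target, score(aligned, target), score(target, target))  # prints: 2.0 2.0 0.0
```

In this scoring, a conserver who anticipates the level exactly obtains 0, whereas an alignment response is displaced by an amount that depends only on the shapes of the two glasses.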
Ideally, given a large sample in the appropriate age range, one would expect
children to be conservers or non-conservers, and their responses to the items to
be bimodal at each measurement occasion. Consider the first item (top left in
Figure 1). We measured responses in cm from the correct level, which we
assigned the value of 0 cm. Given this scale, we expect non-conservers to have
a mean of about .75, and conservers to have a mean of about zero. Similar
predictions can be made in the case of the other items: the expected means are
–2, 2, and –.5 in the pre-operational sample, and 0, 0, and 0 in the
concrete operational sample. In addition to the predictions concerning the means,
we expect non-conservers to have smaller standard deviations than
conservers. We base this on the supposition that non-conservers carry out an
alignment, as they believe that the water level will not change (Dolan & van der
Maas, 1998; see also Thomas, Lohaus, & Kessler, 1999). Conservers know that
the level of the liquid will change. To generate a response, they have to estimate
the new level. The task is thus more difficult, as it appeals to the spatial ability to
estimate the new water level. We predict a larger standard deviation in the
conservers, because here individual differences in this spatial ability are expected
to come into play. The same reasoning may be applied to the covariance
matrices within each component. Assuming again that the non-conservers carry
out a simple alignment to generate a response, we do not expect the covariances
between the responses to the four items to be large. We assume that the
individual differences in the accuracy of the alignment are dominated by random
error. In the conservers, individual differences in the accuracy of the response
are likely to be due in part to systematic individual differences in spatial ability. In
other words, we expect variable and correlated responses in the conservers and
uncorrelated and much less variable responses in the non-conservers.
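These expectations can be made concrete by simulating responses to the four items from a two-component normal mixture. In the following NumPy sketch the means follow the predictions above, whereas the standard deviations, the within-conserver correlation, the mixing proportion, and the function name are hypothetical values chosen only to mirror the predicted ordering (small, uncorrelated errors in the non-conservers; larger, correlated errors in the conservers).

```python
import numpy as np

rng = np.random.default_rng(2004)

# Expected means (cm from the correct level) for the four items, as predicted above.
mu_nc = np.array([0.75, -2.0, 2.0, -0.5])  # non-conservers: alignment responses
mu_c = np.zeros(4)                          # conservers: centred on the correct level

# Hypothetical dispersion values, chosen only to mirror the predicted ordering.
sd_nc, sd_c, rho_c = 0.2, 0.6, 0.5
cov_nc = sd_nc**2 * np.eye(4)  # small, uncorrelated errors
cov_c = sd_c**2 * ((1 - rho_c) * np.eye(4) + rho_c * np.ones((4, 4)))  # larger, correlated


def simulate(n: int, p_nc: float = 0.4):
    """Draw n response vectors; label 0 = non-conserver, 1 = conserver."""
    labels = (rng.random(n) >= p_nc).astype(int)
    x = np.empty((n, 4))
    n_nc = int(np.sum(labels == 0))
    x[labels == 0] = rng.multivariate_normal(mu_nc, cov_nc, size=n_nc)
    x[labels == 1] = rng.multivariate_normal(mu_c, cov_c, size=n - n_nc)
    return x, labels


x, labels = simulate(101)
print(x.shape)  # (101, 4)
```

Histograms of each column of such simulated data are bimodal, with a narrow mode at the alignment value and a broader mode at zero.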
These expectations apply both to the covariance structure of the four
items at each occasion and to the longitudinal covariance structure of each
item over the occasions. In the case of the repeated measures, we fitted a
highly constrained two-state Markovian transition model (Kemeny & Snell,
1976). In this model children may remain in their current stage, or may
switch stages between the occasions (Thomas & Hettmansperger, 2002).
We outline these expectations in terms of the means and covariances below.
Sample
The sample consists of 101 children from 4 groups of a Montessori
school in Amsterdam, the Netherlands. The children carried out the
conservation test on a computer, which was placed in the classroom. The
children completed the test 11 times (for details, see van der Maas, 1993).
At the first session only, the children completed the test under supervision.
Here we limited our analyses to occasions 1, 4, 7, and 10 (called occasions
1, 2, 3, and 4 below). Summary statistics relating to the ages of the children
are shown in Table 1. The average time between testing was 2 months.
Figure 2 depicts the 16 (4 items × 4 occasions) histograms of the data,
which are clearly bimodal. A good deal of the data are missing, as can be
seen in Table 2. The dataset is available on request.
Method and Models
In view of the modest sample size and the missing data, we decided to
carry out two sets of analyses: four cross-sectional analyses of the four
items at each occasion, and four longitudinal analyses of each item over
the four occasions. We fitted several multivariate normal mixture models
subject to various constraints. The M-component multivariate normal
mixture density is defined as follows:

$$f[x_j; \Sigma(\theta), \mu(\theta), p] = \sum_{i=1}^{M} p_i \, g_i(x_j; \theta_i). \qquad (1)$$

The P × MP matrix $\Sigma(\theta)$ contains the M P × P covariance matrices $[\Sigma_{(1)}, \Sigma_{(2)}, \dots, \Sigma_{(M)}]$, and the MP-dimensional vector $\mu(\theta)$ contains the M P-dimensional mean vectors $[\mu_{(1)}^t, \mu_{(2)}^t, \dots, \mu_{(M)}^t]^t$. The M-dimensional
Table 1
Summary Statistics Relating to Age in Months of Subjects

Occasion   Mean   Median   Std.Dev.   Minimum   Maximum
1          93.5   92.9     10.8       73        126
2          95.3   94.7     10.8       75        128
3          97.2   96.5     10.8       77        130
4          99.2   98.7     10.8       79        131
Figure 2
Histograms of Observed Data
The dotted lines indicate the positions of the expected means in the conservers and non-conservers. The histograms on the first row are those of the test scores obtained at occasion 1. The histograms from left to right correspond to the stimuli shown in Figure 1.
vector $p$ contains the M mixing proportions $(p_1, p_2, \dots, p_M)$, which may be
viewed as probabilities: $\sum_{i=1}^{M} p_i = 1$ and $0 < p_i < 1$. The vector $\theta = (\theta_1^t, \theta_2^t, \dots, \theta_M^t)^t$ contains unknown parameters, which are used to model the means and
covariance matrices. The $i$th component distribution, $g_i(x_j; \theta_i)$, is

$$g_i[x_j; \Sigma_{(i)}, \mu_{(i)}] = (2\pi)^{-P/2} \, |\Sigma_{(i)}|^{-1/2} \exp\left\{-\tfrac{1}{2} [x_j - \mu_{(i)}]^t \, \Sigma_{(i)}^{-1} \, [x_j - \mu_{(i)}]\right\}, \qquad (2)$$
Table 2
Number and Distribution of Missing Observations

Number of Missing Observations Given Each Item
              Occasion
Item            1     2     3     4
1               6    30    41    17
2               8    27    37    19
3               7    23    30    15
4               6    20    35    16

Number of Missing Items at Each Occasion
              Occasion
Missing items   1     2     3     4
0              89    59    52    69
1               7    20    13    17
2               0     4     5     4
3               0     0     4     2
4               5    18    27     9

Number of Missing Occasions Given Each Item
              Item
Missing occ.    1     2     3     4
0              44    40    48    46
1              41    39    36    35
2              11    14    13    18
3               3     8     3     2
4               0     0     1     0

Note. Top: number of missing observations by occasion and item. Middle: number of
missing items at each occasion (e.g., at occasion 3, 2 items were missing in 5 subjects).
Bottom: number of missing repeated measures by item (e.g., the data of item 4 were missing
at 3 occasions in 2 subjects).
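the P-variate normal density with (P × P) covariance matrix $\Sigma_{(i)}$ and P-dimensional mean vector $\mu_{(i)}$. The vector $\theta_i$ contains the parameters used
to model the covariance matrix and the mean vector within the $i$th
component. In the case of an unconstrained mean vector and covariance
matrix, the elements of $\theta_i$ include the non-redundant elements of the mean
vector and covariance matrix. For instance, in the case of P = 4, we have:

$$\Sigma_{(i)} = \begin{bmatrix} \sigma^2_{i11} & & & \\ \sigma^2_{i21} & \sigma^2_{i22} & & \\ \sigma^2_{i31} & \sigma^2_{i32} & \sigma^2_{i33} & \\ \sigma^2_{i41} & \sigma^2_{i42} & \sigma^2_{i43} & \sigma^2_{i44} \end{bmatrix}, \qquad \mu_{(i)} = (\mu_{i1}\ \mu_{i2}\ \mu_{i3}\ \mu_{i4})^t,$$

so that $\theta_i = (\sigma^2_{i11}\ \sigma^2_{i21} \dots \sigma^2_{i43}\ \sigma^2_{i44}\ \mu_{i1} \dots \mu_{i4})^t$. In the case of the latent
profile model (Lazarsfeld & Henry, 1968; Wolfe, 1970), the off-diagonals are
zero, so that $\theta_i = (\sigma^2_{i11}\ \sigma^2_{i22}\ \sigma^2_{i33}\ \sigma^2_{i44}\ \mu_{i1} \dots \mu_{i4})^t$.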
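Equations (1) and (2) can be evaluated directly. The following NumPy sketch is our illustration, not the FORTRAN 77 code used in the study; it computes the component density, the mixture density, and the length of $\theta_i$ with and without the latent profile restriction. The function names are hypothetical.

```python
import numpy as np


def normal_density(x, mean, cov):
    """Equation (2): the P-variate normal density g_i evaluated at x."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)  # [x - mu]^t Sigma^{-1} [x - mu]
    return float(np.exp(-0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)))


def mixture_density(x, props, means, covs):
    """Equation (1): f(x) = sum_i p_i g_i(x)."""
    return sum(p * normal_density(x, m, s) for p, m, s in zip(props, means, covs))


def theta_length(p: int, latent_profile: bool = False) -> int:
    """Number of elements of theta_i: covariance elements plus P means."""
    n_cov = p if latent_profile else p * (p + 1) // 2
    return n_cov + p


print(theta_length(4), theta_length(4, latent_profile=True))  # 14 8
```

For P = 4, an unconstrained component thus has 10 covariance elements and 4 means, whereas a latent profile component has 4 variances and 4 means.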
We estimated parameters by maximum likelihood estimation (Azzalini,
1996; McLachlan & Peel, 2000, Chapter 2) using two FORTRAN 77
programs, provisionally called MIXEM and MIXLIS.¹ These programs
maximize the log-likelihood of the multivariate normal finite mixture:

$$\log L(\theta, p) = \sum_{j=1}^{N} \log f[x_j; \Sigma(\theta), \mu(\theta), p]. \qquad (3)$$
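Under the missing-at-random assumption, the log-likelihood in equation (3) can be evaluated by letting each case contribute the mixture of the marginal normal densities of its observed responses. The following NumPy sketch illustrates this principle with missing responses coded as NaN. It is not the Ghahramani and Jordan (1994) EM algorithm or the Finkbeiner (1979) algorithm used in MIXEM and MIXLIS, and the function names are hypothetical.

```python
import numpy as np


def log_normal_density(x, mean, cov):
    """Logarithm of the P-variate normal density of equation (2)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)


def mixture_loglik(data, props, means, covs):
    """Equation (3), with missing responses coded as NaN (assumed MAR).

    Each case contributes log sum_i p_i g_i(x_obs), where g_i is the normal
    density of the observed variables only: the component mean vector and
    covariance matrix are restricted to the observed entries.
    """
    total = 0.0
    for row in np.asarray(data, dtype=float):
        obs = ~np.isnan(row)
        if not obs.any():
            continue  # a case without observations adds nothing to the likelihood
        terms = np.array([
            np.log(p) + log_normal_density(row[obs], np.asarray(m, dtype=float)[obs],
                                           np.asarray(s, dtype=float)[np.ix_(obs, obs)])
            for p, m, s in zip(props, means, covs)
        ])
        top = terms.max()
        total += top + np.log(np.exp(terms - top).sum())  # log-sum-exp for stability
    return float(total)
```

Subtracting the largest term before exponentiating (the log-sum-exp device) avoids underflow when one component fits a case much better than the others.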
We used MIXEM in the cross-sectional analyses. This program fits
unconstrained mixtures and latent profile models with fixed or estimated
means. It maximizes the log-likelihood by means of the EM algorithm
(McLachlan & Peel, 2000, p. 82). We used the EM algorithm of Ghahramani
and Jordan (1994), which can accommodate missing data, assuming that they
are missing at random (Little & Rubin, 1989; Rovine, 1994). As local maxima
may be a problem in fitting unconstrained multivariate normal mixtures, we
used a large set (e.g., 5,000) of random starting values. We provided the upper
and lower bounds of the parameters and generated random starting values
drawn from the uniform distribution defined by these bounds (for a similar
method, see McLachlan & Peel, 2000, p. 55). In working through the random
starting vectors, we retained each admissible solution that yielded a larger
value of the likelihood function than the best solution found so far. Once the program
terminated, we checked whether the solution was acceptable and interpretable.
As demonstrated by McLachlan and Peel (2000, p. 100), spurious ML solutions
may be due to extremely small variances. In MIXEM, such solutions were avoided by
discarding solutions with extremely small within-component variances.

¹ McLachlan and Peel (2000) provide an overview of mixture software, including their own freely
available EMMIX. We chose to use our own program MIXEM, as it was especially written to
handle missing data and to generate starting values as described above. The FORTRAN 77 program
MIXEM is available upon request. Schmittmann, Dolan, and Neale (2003) are currently investigating the possibility of fitting the present transition models in the freely available Mx program
(Neale, Boker, Xie, & Maes, 1999).
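The random-start strategy can be sketched as follows. This NumPy sketch handles complete data only (the missing-data extension of Ghahramani and Jordan, 1994, is omitted), draws starting means and variances uniformly between hypothetical bounds derived from the data, and discards any solution with a variance below a hypothetical threshold `min_var`. It illustrates the strategy described above and is not the MIXEM program; all names are hypothetical.

```python
import numpy as np


def e_step(x, props, means, covs):
    """Posterior probabilities (n x M), equation (4), and the log-likelihood, equation (3)."""
    n, p = x.shape
    log_w = np.empty((n, len(props)))
    for i, (pi, mu, cov) in enumerate(zip(props, means, covs)):
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        quad = np.einsum("nj,nj->n", diff @ np.linalg.inv(cov), diff)
        log_w[:, i] = np.log(pi) - 0.5 * (p * np.log(2 * np.pi) + logdet + quad)
    top = log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w - top)
    loglik = float(np.sum(top[:, 0] + np.log(w.sum(axis=1))))
    return w / w.sum(axis=1, keepdims=True), loglik


def m_step(x, tau):
    """Weighted mixing proportions, mean vectors, and unconstrained covariance matrices."""
    nk = tau.sum(axis=0)
    means = (tau.T @ x) / nk[:, None]
    covs = np.array([((x - mu) * t[:, None]).T @ (x - mu) / k
                     for mu, t, k in zip(means, tau.T, nk)])
    return nk / len(x), means, covs


def fit_random_starts(x, m, n_starts=50, n_iter=200, min_var=1e-3, seed=0):
    """Return (loglik, props, means, covs) of the best admissible EM solution, or None."""
    rng = np.random.default_rng(seed)
    lo, hi, var = x.min(axis=0), x.max(axis=0), x.var(axis=0)  # hypothetical bounds
    best = None
    for start in range(n_starts):
        props = np.full(m, 1.0 / m)
        means = rng.uniform(lo, hi, size=(m, x.shape[1]))
        covs = np.array([np.diag(rng.uniform(0.1, 1.0, size=var.shape) * var)
                         for _ in range(m)])
        try:
            with np.errstate(all="ignore"):
                for it in range(n_iter):
                    tau, _ = e_step(x, props, means, covs)
                    props, means, covs = m_step(x, tau)
                _, loglik = e_step(x, props, means, covs)
        except np.linalg.LinAlgError:
            continue  # singular covariance matrix: inadmissible run
        variances = np.diagonal(covs, axis1=1, axis2=2)
        if not np.isfinite(loglik) or variances.min() < min_var:
            continue  # discard spurious solutions with (near-)zero variances
        if best is None or loglik > best[0]:
            best = (loglik, props, means, covs)
    return best
```

Each start runs a fixed number of EM iterations; a convergence criterion on the change in log-likelihood would be a natural refinement.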
The longitudinal analyses were carried out using the program MIXLIS.
This program allows us to specify a structural equation model within each
component. Parameter estimates may be fixed, free, or subject to equality
constraints. The loglikelihood is maximized by means of the quasi-Newton
routine NPSOL (Gill, Murray, Saunders, & Wright, 1986) using exact
derivatives (e.g., Dolan & van der Maas, 1998; Jedidi, Jagpal, & DeSarbo,
1997; Yung, 1997). We consider the following simple model:

$$\Sigma_{(i)} = \Lambda_i \Psi_i \Lambda_i^t \quad \text{and} \quad \mu_{(i)} = \nu_i,$$

where the subscript $i$ denotes the component and the superscript $t$ denotes
transposition. The vector $\theta_i$ contains the unknown parameters in $\Lambda_i$, $\Psi_i$, and
$\nu_i$. For instance, in the case of the latent profile model, we specify $\Psi_i = I$,
$\mathrm{diag}(\Lambda_i) = [\sigma_{i11}, \sigma_{i22}, \sigma_{i33}, \sigma_{i44}]$, and $\nu_i = [\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}]^t$. Like MIXEM,
MIXLIS can handle missing data, on the assumption that the data are missing
at random (Little & Rubin, 1989; Rovine, 1994). The algorithm used in
MIXLIS to handle missing data is described in Finkbeiner (1979). Unlike
MIXEM, MIXLIS requires user-supplied starting values and parameter
bounds, rather than just bounds. Because the longitudinal models are highly
constrained, local maxima pose less of a problem.
Below we require the posterior probability, denoted $\tau_{ij}$, that a given subject $j$
belongs to a given component $i$, given his or her data. By Bayes’ theorem
(McLachlan & Peel, 2000, p. 20), this probability is calculated as follows:

$$\tau_{ij} = \frac{p_i \, g_i[x_j; \Sigma_{(i)}, \mu_{(i)}]}{f[x_j; \Sigma(\theta), \mu(\theta), p]}. \qquad (4)$$
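Equation (4) can be evaluated for a single response vector as in the following NumPy sketch. The logarithms of $p_i g_i(x_j)$ are shifted by their maximum before exponentiation to avoid numerical underflow, and a child may then be assigned to the component with the largest posterior probability. The function name and the example values (an item 1 response of .70 cm, with hypothetical component standard deviations) are illustrative assumptions.

```python
import numpy as np


def posterior_probs(x, props, means, covs):
    """Equation (4): tau_i = p_i g_i(x) / f(x) for one response vector x."""
    x = np.asarray(x, dtype=float)
    log_terms = np.empty(len(props))
    for i, (p_i, mu, cov) in enumerate(zip(props, means, covs)):
        cov = np.asarray(cov, dtype=float)
        diff = x - np.asarray(mu, dtype=float)
        _, logdet = np.linalg.slogdet(cov)
        quad = diff @ np.linalg.solve(cov, diff)
        log_terms[i] = np.log(p_i) - 0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)
    w = np.exp(log_terms - log_terms.max())  # proportional to p_i g_i(x)
    return w / w.sum()                       # dividing by the sum divides by f(x)


# Hypothetical item 1 response of .70 cm; components: non-conserver, conserver.
tau = posterior_probs([0.70], [0.4, 0.6], [[0.75], [0.0]], [[[0.04]], [[0.36]]])
stage = ["non-conserver", "conserver"][int(np.argmax(tau))]
```

Because the mixture density $f$ is the sum of the numerators, normalizing the terms to sum to one is equivalent to dividing by $f$.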
Cross-Sectional Models
In the cross-sectional analyses, we fitted a series of exploratory two- and
three-component mixtures to the responses to the four items observed at
each occasion. Two components are expected on the basis of theory (viz.
conservers and non-conservers). A third component may be required to
account for irregular responding (Thomas, Lohaus, & Kessler, 1999). In all,
we fitted 4 models to the data observed at each occasion, that is, a total of 16
analyses. We fitted the following models.
Unconstrained Model
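This is a mixture with two or three components with unconstrained
covariance matrices and mean vectors. In the case of the three-component
model, we have:

$$\begin{aligned}
\theta_1 &= [\sigma^2_{111}\ \sigma^2_{121} \dots \sigma^2_{143}\ \sigma^2_{144}\ \mu_{11}\ \mu_{12}\ \mu_{13}\ \mu_{14}],\\
\theta_2 &= [\sigma^2_{211}\ \sigma^2_{221} \dots \sigma^2_{243}\ \sigma^2_{244}\ \mu_{21}\ \mu_{22}\ \mu_{23}\ \mu_{24}],\\
\theta_3 &= [\sigma^2_{311}\ \sigma^2_{321} \dots \sigma^2_{343}\ \sigma^2_{344}\ \mu_{31}\ \mu_{32}\ \mu_{33}\ \mu_{34}],
\end{aligned}$$

where $\sigma^2_{i11}, \sigma^2_{i21}, \dots, \sigma^2_{i43}, \sigma^2_{i44}$ and $\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}$ are the elements of the
covariance matrix and mean vector, respectively, in component $i$. Including
the mixing proportions, the number of parameters is 29 in the case of the two-component model (10 × 2 elements in the covariance matrices [cv], 4 × 2
means [me], and 1 mixing proportion [mp]), and 44 in the case of the three-component model (10 × 3 cv + 4 × 3 me + 2 mp).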
Fixed Means Model
This is a mixture with two or three components, with fixed means in the
two components representing the non-conservers and the conservers. We did
not constrain the means in the additional component. The covariances within
each component were estimated. In the case of the three-component model,
we have:

$$\begin{aligned}
\theta_{nc} &= (\sigma^2_{nc11}\ \sigma^2_{nc21} \dots \sigma^2_{nc43}\ \sigma^2_{nc44}\ \ .75\ \ {-2}\ \ 2\ \ {-.5}),\\
\theta_{c} &= (\sigma^2_{c11}\ \sigma^2_{c21} \dots \sigma^2_{c43}\ \sigma^2_{c44}\ \ 0\ \ 0\ \ 0\ \ 0),\\
\theta_3 &= (\sigma^2_{311}\ \sigma^2_{321} \dots \sigma^2_{343}\ \sigma^2_{344}\ \mu_{31}\ \mu_{32}\ \mu_{33}\ \mu_{34}),
\end{aligned}$$

where $\theta_{nc}$ is the parameter vector in the non-conservers and $\theta_c$ is the
parameter vector in the conservers. The number of parameters is 21 in the
two-component model (10 × 2 cv + 1 mp) and 36 in the three-component
model (10 × 3 cv + 4 me + 2 mp).
Latent Profile Model
We fitted latent profile models with two and three components. In the
case of the three-component model, we have:
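$$\begin{aligned}
\theta_1 &= (\sigma^2_{111}\ \sigma^2_{122}\ \sigma^2_{133}\ \sigma^2_{144}\ \mu_{11}\ \mu_{12}\ \mu_{13}\ \mu_{14}),\\
\theta_2 &= (\sigma^2_{211}\ \sigma^2_{222}\ \sigma^2_{233}\ \sigma^2_{244}\ \mu_{21}\ \mu_{22}\ \mu_{23}\ \mu_{24}),\\
\theta_3 &= (\sigma^2_{311}\ \sigma^2_{322}\ \sigma^2_{333}\ \sigma^2_{344}\ \mu_{31}\ \mu_{32}\ \mu_{33}\ \mu_{34}).
\end{aligned}$$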
In the latent profile model the covariances within each component are fixed
to zero. The number of parameters is 17 in the two-component model (2 × 4
cv + 2 × 4 me + 1 mp), and 26 in the three-component model (3 × 4 cv + 3 ×
4 me + 2 mp).
Latent Profile + Fixed Means Model
We fitted latent profile models with fixed means in the two components
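representing non-conservers and conservers. In the case of the three-component model:

$$\begin{aligned}
\theta_{nc} &= (\sigma^2_{nc11}\ \sigma^2_{nc22}\ \sigma^2_{nc33}\ \sigma^2_{nc44}\ \ .75\ \ {-2}\ \ 2\ \ {-.5}),\\
\theta_{c} &= (\sigma^2_{c11}\ \sigma^2_{c22}\ \sigma^2_{c33}\ \sigma^2_{c44}\ \ 0\ \ 0\ \ 0\ \ 0), \text{ and}\\
\theta_3 &= (\sigma^2_{311}\ \sigma^2_{322}\ \sigma^2_{333}\ \sigma^2_{344}\ \mu_{31}\ \mu_{32}\ \mu_{33}\ \mu_{34}).
\end{aligned}$$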
In the three-component model, we estimated the means in the additional
component, because we lack a hypothesis concerning these means. The
number of parameters is 9 in the two-component model (2 × 4 cv + 1 mp),
and 18 in the three-component model (3 × 4 cv + 4 me + 2 mp).
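The parameter counts of the eight cross-sectional models follow a single rule: covariance elements per component (P(P + 1)/2, or P under the latent profile restriction), plus P means for every component whose means are free, plus M – 1 mixing proportions. A minimal Python sketch of this bookkeeping, with a hypothetical function name, assuming P = 4 items and that fixed means apply to the first two components only:

```python
def n_params(m: int, p: int = 4, latent_profile: bool = False, fixed_means: bool = False) -> int:
    """Free parameters of an M-component normal mixture for P items.

    latent_profile: zero within-component covariances (P variances per component).
    fixed_means: the non-conserver and conserver means are fixed, so only the
    means of a third (additional) component are estimated.
    """
    cov_per_component = p if latent_profile else p * (p + 1) // 2
    components_with_free_means = max(m - 2, 0) if fixed_means else m
    return m * cov_per_component + components_with_free_means * p + (m - 1)


for name, lp, fm in [("unconstrained", False, False), ("fixed means", False, True),
                     ("latent profile", True, False), ("latent profile + fixed means", True, True)]:
    # prints 29 44, 21 36, 17 26, and 9 18 for the two- and three-component models
    print(f"{name:30s}", n_params(2, latent_profile=lp, fixed_means=fm),
          n_params(3, latent_profile=lp, fixed_means=fm))
```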
Longitudinal Models
In the longitudinal analyses, we fitted a series of highly constrained
mixtures to the responses to each item observed at the four occasions. The
mixture model represents a two state transition model (e.g., Kemeny & Snell,
1976). We assumed that a child is either a conserver or non-conserver at
each occasion, and that any transition may take place between the occasions.
Possible transitions are thus (a) from non-conserver to conserver; (b) from non-conserver to non-conserver; (c) from conserver to conserver; and (d) from
conserver to non-conserver. This model gives rise to a 16-component ($2^4$)
mixture model, as shown in the first column of Table 3.
In principle, it is possible to estimate the 15 mixing proportions (Table 3
column 2). This is however problematic because the number of mixing
proportions increases exponentially with the number of occasions. It is
therefore preferable to derive mixing proportions from a transition model
(Kemeny & Snell, 1976; Thomas & Hettmansperger, 2001). Let p0 (1 – p0)
denote the probability that a child is a non-conserver (conserver) at the first
Table 3
16 Mixing Proportions, Given Initial Probabilities and Transition Probabilities

Component             Proportion   Stationary transition model         Transition model,
(occ. 1 to 4)                                                          c absorbing (p = 1)
1   nc, nc, nc, nc    p1           p0*q*q*q                            p0*q*q*q
2   nc, nc, nc, c     p2           p0*q*q*(1 – q)                      p0*q*q*(1 – q)
3   nc, nc, c, nc     p3           p0*q*(1 – q)*(1 – p)                0
4   nc, nc, c, c      p4           p0*q*(1 – q)*p                      p0*q*(1 – q)
5   nc, c, nc, nc     p5           p0*(1 – q)*(1 – p)*q                0
6   nc, c, nc, c      p6           p0*(1 – q)*(1 – p)*(1 – q)          0
7   nc, c, c, nc      p7           p0*(1 – q)*p*(1 – p)                0
8   nc, c, c, c       p8           p0*(1 – q)*p*p                      p0*(1 – q)
9   c, nc, nc, nc     p9           (1 – p0)*(1 – p)*q*q                0
10  c, nc, nc, c      p10          (1 – p0)*(1 – p)*q*(1 – q)          0
11  c, nc, c, nc      p11          (1 – p0)*(1 – p)*(1 – q)*(1 – p)    0
12  c, nc, c, c       p12          (1 – p0)*(1 – p)*(1 – q)*p          0
13  c, c, nc, nc      p13          (1 – p0)*p*(1 – p)*q                0
14  c, c, nc, c       p14          (1 – p0)*p*(1 – p)*(1 – q)          0
15  c, c, c, nc       p15          (1 – p0)*p*p*(1 – p)                0
16  c, c, c, c        p16          (1 – p0)*p*p*p                      (1 – p0)

Note. Components are derived from a two-state transition model. nc stands for non-conserver and c stands for conserver. p0 is the probability that a given child is a non-conserver at occasion 1; p is the probability of a transition from c to c, and q is the probability of a transition from nc to nc. Each component represents a distinct trajectory spanning occasion 1 to 4.
measurement occasion. Let p (1 – p) denote the conditional probability that
a child is a conserver (non-conserver) at occasion t, given that it was a
conserver at occasion t – 1. Let q (1 – q) denote the conditional probability that
a child is a non-conserver (conserver) at occasion t, given that it was a
non-conserver at occasion t – 1. This basic transition model gives rise to the
mixing proportions shown in Table 3, column 3. We assume that the
transition probabilities (p and q) do not change over time. Note that in this
transition model, the concrete operational stage is not absorbing in the sense
that children may switch from concrete operational to pre-operational
responding (Thomas & Hettmansperger, 2001; Thomas, Lohaus, & Kessler,
1999). Assuming that conserving is an absorbing state (i.e., the transition
from conserver to non-conserver is not possible, p = 1), the number of
components is reduced to 5, as shown in column 4 of Table 3. We call these
models the constrained transition model (3 parameters p0, p, q) and the
absorbing constrained transition model (2 parameters: p0 and q).
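The mixing proportions in Table 3 can be generated by multiplying the initial probability by one transition probability per step along each trajectory. The following Python sketch, with a hypothetical function name and hypothetical parameter values, does this for all 16 trajectories; it confirms that the proportions sum to one, that the trajectory nc, c, c, nc has proportion p0*(1 – q)*p*(1 – p), and that setting p = 1 leaves exactly the five non-zero components of the absorbing model.

```python
from itertools import product


def mixing_proportions(p0: float, p: float, q: float, n_occ: int = 4) -> dict:
    """Trajectory proportions under the stationary two-state transition model.

    p0: probability of being a non-conserver ("nc") at occasion 1
    p:  probability of a conserver-to-conserver (c -> c) transition
    q:  probability of a non-conserver-to-non-conserver (nc -> nc) transition
    """
    stay = {"nc": q, "c": p}
    props = {}
    # product() lists trajectories in the order of Table 3 (nc before c, last occasion fastest).
    for path in product(("nc", "c"), repeat=n_occ):
        prob = p0 if path[0] == "nc" else 1 - p0
        for prev, cur in zip(path, path[1:]):
            prob *= stay[prev] if cur == prev else 1 - stay[prev]
        props[path] = prob
    return props


free = mixing_proportions(p0=0.6, p=0.9, q=0.7)       # hypothetical values
absorbing = mixing_proportions(p0=0.6, p=1.0, q=0.7)  # conserving is absorbing
print(len(free), round(sum(free.values()), 12), sum(v > 0 for v in absorbing.values()))  # 16 1.0 5
```

Only three parameters (p0, p, q) generate all 15 free mixing proportions, which is what makes the transition model attractive as the number of occasions grows.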
As we analyzed responses to the same item observed at four occasions,
we constrained means, standard deviations, and correlations accordingly.
We employed the model $\Sigma_{(i)} = \Lambda_i \Psi_i \Lambda_i^t$ and $\mu_{(i)} = \nu_i$. We estimated
standard deviations instead of variances, that is, the diagonal elements of $\Lambda_i$
were included as free parameters. We estimated correlations instead of
covariances in $\Psi_i$, that is, $\mathrm{diag}(\Psi_i) = \mathrm{diag}(I)$. We first present the 4 models
we considered for the means and covariance structure. Below we discuss
possible models for the transition probabilities. We fitted the following
models.
1. Latent profile model with equality constraints on means and standard
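deviations. The latent profile model implies $\Psi_i = I$. In terms of the mean vectors and
diagonal covariance matrices, the model is:

$$\begin{aligned}
\mathrm{diag}(\Lambda_1) &= (\sigma_{nc}\ \sigma_{nc}\ \sigma_{nc}\ \sigma_{nc}), & \nu_1 &= (\mu_{nc}\ \mu_{nc}\ \mu_{nc}\ \mu_{nc}),\\
\mathrm{diag}(\Lambda_2) &= (\sigma_{nc}\ \sigma_{nc}\ \sigma_{nc}\ \sigma_{c}), & \nu_2 &= (\mu_{nc}\ \mu_{nc}\ \mu_{nc}\ \mu_{c}),\\
\mathrm{diag}(\Lambda_3) &= (\sigma_{nc}\ \sigma_{nc}\ \sigma_{c}\ \sigma_{nc}), & \nu_3 &= (\mu_{nc}\ \mu_{nc}\ \mu_{c}\ \mu_{nc}),\\
&\ \,\vdots & &\ \,\vdots\\
\mathrm{diag}(\Lambda_{14}) &= (\sigma_{c}\ \sigma_{c}\ \sigma_{nc}\ \sigma_{c}), & \nu_{14} &= (\mu_{c}\ \mu_{c}\ \mu_{nc}\ \mu_{c}),\\
\mathrm{diag}(\Lambda_{15}) &= (\sigma_{c}\ \sigma_{c}\ \sigma_{c}\ \sigma_{nc}), & \nu_{15} &= (\mu_{c}\ \mu_{c}\ \mu_{c}\ \mu_{nc}),\\
\mathrm{diag}(\Lambda_{16}) &= (\sigma_{c}\ \sigma_{c}\ \sigma_{c}\ \sigma_{c}), & \nu_{16} &= (\mu_{c}\ \mu_{c}\ \mu_{c}\ \mu_{c}).
\end{aligned}$$

Given the equality constraints, the estimated parameters accounting for the
means and covariance matrices are $\theta = (\sigma_c\ \sigma_{nc}\ \mu_c\ \mu_{nc})$. In combination with
the parameters p0, p, and q, this model has 7 parameters.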
2. Latent profile model with fixed means and constrained standard
deviations. This is the same as model 1, except for the means, which are
fixed to their expected values. For instance, in item 1 (Figure 1, top left):
$$\begin{aligned}
\nu_1 &= (.75\ \ .75\ \ .75\ \ .75),\\
\nu_2 &= (.75\ \ .75\ \ .75\ \ 0),\\
\nu_3 &= (.75\ \ .75\ \ 0\ \ .75),\\
&\ \,\vdots\\
\nu_{14} &= (0\ \ 0\ \ .75\ \ 0),\\
\nu_{15} &= (0\ \ 0\ \ 0\ \ .75),\\
\nu_{16} &= (0\ \ 0\ \ 0\ \ 0).
\end{aligned}$$

In this model, the estimated parameters accounting for the covariance matrices
are $\theta = (\sigma_c\ \sigma_{nc})$. In combination with the parameters p0, p, and q, this
model has 5 parameters.
3. Covariance model: mixture model with equality constraints on means, standard deviations, and correlations. The means and standard deviations are estimated subject to the same equality constraints as in model 1. In addition, we estimated three correlations to take into account the possible dependency between the responses over time: a correlation between the responses of the non-conservers ($\rho_{nc}$), a correlation between the responses of the conservers ($\rho_c$), and a correlation between the responses of children who switch between occasions ($\rho_t$; the subscript t stands for transition). The correlation does not depend on the direction of the switch (non-conserver to conserver, or vice versa). The matrices $R_i$ are as follows:
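$$
R_1 = \begin{pmatrix} 1 & & & \\ \rho_{nc} & 1 & & \\ \rho_{nc} & \rho_{nc} & 1 & \\ \rho_{nc} & \rho_{nc} & \rho_{nc} & 1 \end{pmatrix}, \quad
R_2 = \begin{pmatrix} 1 & & & \\ \rho_{nc} & 1 & & \\ \rho_{nc} & \rho_{nc} & 1 & \\ \rho_{t} & \rho_{t} & \rho_{t} & 1 \end{pmatrix}, \quad
R_3 = \begin{pmatrix} 1 & & & \\ \rho_{nc} & 1 & & \\ \rho_{t} & \rho_{t} & 1 & \\ \rho_{nc} & \rho_{nc} & \rho_{t} & 1 \end{pmatrix}, \quad \ldots
$$
$$
R_{14} = \begin{pmatrix} 1 & & & \\ \rho_{c} & 1 & & \\ \rho_{t} & \rho_{t} & 1 & \\ \rho_{c} & \rho_{c} & \rho_{t} & 1 \end{pmatrix}, \quad
R_{15} = \begin{pmatrix} 1 & & & \\ \rho_{c} & 1 & & \\ \rho_{c} & \rho_{c} & 1 & \\ \rho_{t} & \rho_{t} & \rho_{t} & 1 \end{pmatrix}, \quad
R_{16} = \begin{pmatrix} 1 & & & \\ \rho_{c} & 1 & & \\ \rho_{c} & \rho_{c} & 1 & \\ \rho_{c} & \rho_{c} & \rho_{c} & 1 \end{pmatrix}
$$
Only the lower triangle of each symmetric matrix is shown. Two occasions at which the child is in the same state receive the correlation of that state ($\rho_{nc}$ or $\rho_c$); two occasions in different states receive $\rho_t$.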
Given the equality constraints, the estimated parameters accounting for the means and covariance matrices are $\theta = (\mu_c, \mu_{nc}, \sigma_c, \sigma_{nc}, \rho_c, \rho_t, \rho_{nc})$. In combination with the parameters p0, p, and q, this model has 10 parameters.
4. Covariance model with fixed means: mixture model with fixed means and equality constraints on standard deviations and correlations. This is the same as model 3, except that the means are fixed to their expected values. In this model, the estimated parameters accounting for the covariance matrices are $\theta = (\sigma_c, \sigma_{nc}, \rho_c, \rho_t, \rho_{nc})$. In combination with the parameters p0, p, and q, this model has 8 parameters.
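To make the correlation pattern of the covariance model concrete, the sketch below builds $\mu_i$, $R_i$, and $\Sigma_i = \Delta_i R_i \Delta_i^{t}$ for all 16 components from the seven parameters in $\theta$. It is an illustrative NumPy sketch; the function and argument names are hypothetical. Setting all three correlations to zero gives the latent profile structure of model 1, and passing the expected values as fixed means corresponds to models 2 and 4. The example uses the item 3 estimates reported in Table 8.

```python
from itertools import product

import numpy as np


def component_moments(mu, sigma, rho, n_occasions=4):
    """Mean vector, correlation matrix, and covariance matrix of every component.

    mu, sigma: dicts with keys "nc" and "c" (one mean and one SD per state,
               equal across occasions)
    rho:       dict with keys "nc", "c", and "t"; occasions j and k get
               rho[state] if the child is in the same state at both,
               and rho["t"] otherwise
    Returns a list of (pattern, mean, R, Sigma) in component order 1..2**T.
    """
    moments = []
    for pattern in product(("nc", "c"), repeat=n_occasions):
        mean = np.array([mu[s] for s in pattern])
        delta = np.diag([sigma[s] for s in pattern])
        corr = np.eye(n_occasions)
        for j in range(n_occasions):
            for k in range(j + 1, n_occasions):
                key = pattern[j] if pattern[j] == pattern[k] else "t"
                corr[j, k] = corr[k, j] = rho[key]
        cov = delta @ corr @ delta.T  # Sigma_i = Delta_i R_i Delta_i^t
        moments.append((pattern, mean, corr, cov))
    return moments


# Item 3 estimates of model 3 (Table 8)
mu = {"nc": 2.03, "c": 0.041}
sigma = {"nc": 0.088, "c": 0.436}
rho = {"nc": -0.059, "c": 0.729, "t": 0.122}

pattern, mean, corr, cov = component_moments(mu, sigma, rho)[13]  # component 14
print(pattern)
print(corr)
```

Keeping the correlation rule in one place (same state versus switch) mirrors the three-correlation constraint of model 3: the 16 correlation matrices differ only in which of $\rho_{nc}$, $\rho_c$, and $\rho_t$ each off-diagonal cell receives.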
We first fitted the constrained transition model (Table 3, column 3) to determine how well the four models fitted the data. Subsequently, we fitted the model of choice with mixing proportions constrained according to the absorbing constrained transition model (Table 3, column 4).
Model Comparisons
Insofar as they concern the number of components, model comparisons are complicated by the fact that the number of components in a mixture cannot be established by means of a likelihood ratio test (Everitt & Hand, 1981; McLachlan & Peel, 2000). Information criteria, such as Akaike's Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978), are often used to determine the number of components. Although these criteria are based on the same (untenable) regularity conditions as the likelihood ratio test, their utility in this context is well established (McLachlan & Peel, 2000).
Compared to AIC, BIC tends to favor more parsimonious models. In view of the modest sample size, we report BIC, which is calculated as
$$
\mathrm{BIC} = -2\log L(\theta, p) + \log(N)\,\mathrm{NPAR},
$$
where N is the sample size and NPAR is the number of free parameters. In addition to BIC, we report the ICLBIC (ICL stands for Integrated Classification Likelihood; see McLachlan & Peel, 2000), which is calculated as
$$
\mathrm{ICLBIC} = -2\log L(\theta, p) + 2\,\mathrm{EN}(\tau) + \log(N)\,\mathrm{NPAR}, \qquad \mathrm{EN}(\tau) = -\sum_{i=1}^{N}\sum_{j=1}^{M} \tau_{ij}\log \tau_{ij},
$$
where $\tau_{ij}$ is defined in Equation 4, and the summation is over the i = 1, ..., N subjects and the j = 1, ..., M components. McLachlan and Peel (2000, section 6.11) report good results for the ICLBIC criterion in a simulation study.
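Both criteria can be computed directly from the quantities reported in Tables 4 and 7. The sketch below is a minimal NumPy illustration, not the software used in the study. It assumes the natural logarithm, which reproduces, for example, BIC = –149.3 for item 1, model 1 in Table 7 with N = 101. It also takes the matrix of posterior probabilities τ as given.

```python
import numpy as np


def bic(minus2ll, npar, n):
    """BIC = -2 log L + log(N) * NPAR (natural logarithm)."""
    return minus2ll + np.log(n) * npar


def classification_entropy(tau):
    """EN(tau) = -sum_i sum_j tau_ij log(tau_ij), with 0 log 0 taken as 0."""
    tau = np.asarray(tau, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(tau > 0, tau * np.log(tau), 0.0)
    return -terms.sum()


def iclbic(minus2ll, npar, n, tau):
    """ICLBIC = BIC + 2 * EN(tau): BIC penalized for fuzzy classification."""
    return bic(minus2ll, npar, n) + 2.0 * classification_entropy(tau)


# Table 7, item 1, model 1: -2LL = -181.6, NPAR = 7, N = 101
print(round(bic(-181.6, 7, 101), 1))  # → -149.3
```

Because EN(τ) is zero only when every subject is assigned to a single component with certainty, ICLBIC is never smaller than BIC. The gap between the two columns in Tables 4 and 7 therefore reflects how poorly separated the components are.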
Various models considered here are nested. For instance, the latent profile models are nested within the models that include within-component covariances. In the case of nested models, comparison may be based on the loglikelihood ratio (Azzalini, 1996). Loglikelihood difference tests have to be interpreted with caution, as they are based on asymptotic theory, which generally requires large sample sizes. In addition, it is known that reliability problems relating to small N are exacerbated in mixture modeling when the components are poorly separated (Dolan & van der Maas, 1998).
Cross-Sectional Analyses: Results
The results of the cross-sectional analyses are shown in Table 4. We first compare the unconstrained models to determine the number of components. At occasion 1, BIC and ICLBIC favor the two-component model (ICLBIC –350.9 vs. –343.2; BIC –352.6 vs. –347.8). At the other occasions, BIC and ICLBIC favor the three-component model. The difference between the first and subsequent occasions may be due to the fact that the children completed the test at the first occasion under supervision. In addition, Hosenfeld et al. (1997), in their study of analogical reasoning, observed that children respond differently given repeated exposure to the same test. Given the choice of the number of components based on BIC and ICLBIC (2 at occasion 1; 3 at occasions 2 to 4), we consider the sequence of three constrained models: fixed means (model 2 in Table 4), latent profile (model 3), and latent profile + fixed means (model 4).
BIC (–376.1) and ICLBIC (–375.7) favor the latent profile model at the first occasion. At occasion 2, ICLBIC favors the latent profile model (124.2), but BIC favors the latent profile + fixed means model (115.0). However, the difference in BIC between models 3 and 4 is very small (116.7 vs. 115.0). At occasions 3 and 4, BIC and ICLBIC favor the latent profile + fixed means model. In view of these findings, we report the results of the latent profile model (model 3) in Table 5, and of the latent profile + fixed means model (model 4) in Table 6. The results in Table 6 are limited to the three-component models of occasions 2 to 4.
The mixing proportion at occasion 1 indicates that 82% of the subjects
are non-conservers. The means in the non-conserver component are close
to the expected values (.73, –1.99, 2.04, –.53). The means in the conserver
component deviate from the expected values of zero, but the standard errors
suggest that this deviation is not large in the case of items 3 and 4. The
estimated standard deviations agree with expectation in that they are 4 to 11
times larger in the conserver component.
The estimated mixing proportions at occasion 2 indicate that 48% of the children are non-conservers and 36% are conservers. The means of the non-conservers are close to the expected values (.71, –2.02, 2.04, –.55), as are those of the conservers. In the conservers, only the mean of the first item deviates from its expected value (.22, s.e. .06). As at occasion 1, the standard deviations of the non-conservers are much smaller than those of the conservers. The third component includes 16% of the children. The means of items 2, 3, and 4 lie between those in the conservers and non-conservers, and the standard deviations are relatively large. This component may account for random responding (see Thomas & Lohaus, 1993). The results of model 4 at occasion 2 (Table 6) are more in line with this interpretation. Here we find intermediate mean values on all items (.19, –1.06, .60, –.18) and relatively large standard deviations. The mixing proportions at occasion 2 are similar in the two models (.48, .36, .16 vs. .51, .30, .19).

Table 4
Results of Cross-Sectional Analyses
t   M   model   NPAR    –2LL      BIC      ICLBIC    comparison   ΔNPAR   –2ΔLL
1   2   1       29     –485.0    –352.6    –350.9
1   2   2       21     –433.8    –338.0    –336.9    1 vs. 2        8      51.2
1   2   3       17     –453.7    –376.1    –375.7    1 vs. 3       12      31.3
1   2   4        9     –398.6    –357.5    –357.4    1 vs. 4       20      86.4
1   3   1       44     –548.6    –347.8    –343.2
1   3   2       36     –518.5    –354.2    –338.0    1 vs. 2        8      30.0
1   3   3       26     –515.2    –396.6    –371.4    1 vs. 3       18      33.4
1   3   4       18     –473.5    –391.3    –355.8    1 vs. 4       26      75.1
2   2   1       29       29.9     158.0     158.8
2   2   2       21       69.9     162.5     163.0    1 vs. 2        8      40.0
2   2   3       17       60.0     135.1     135.4    1 vs. 3       12      30.1
2   2   4        9      106.6     146.3     146.4    1 vs. 4       20      76.7
2   3   1       44      –56.6     137.7     148.2
2   3   2       36      –20.9     138.1     151.9    1 vs. 2        8      35.7
2   3   3       26        1.8     116.7     124.2    1 vs. 3       18      58.4
2   3   4       18       34.4     115.0     134.6    1 vs. 4       26      91.0
3   2   1       29      –12.3     112.4     114.4
3   2   2       21       26.2     116.5     117.5    1 vs. 2        8      38.5
3   2   3       17        4.5      77.7      79.6    1 vs. 3       12      16.8
3   2   4        9       46.5      85.2      86.0    1 vs. 4       20      58.8
3   3   1       44      –83.4     105.9     111.8
3   3   2       36      –49.3     105.6     113.2    1 vs. 2        8      34.1
3   3   3       26      –52.2      59.6      72.9    1 vs. 3       18      31.2
3   3   4       18      –26.0      51.4      64.3    1 vs. 4       26      57.4
4   2   1       29       72.3     203.4     204.4
4   2   2       21      122.3     217.3     217.8    1 vs. 2        8      50.0
4   2   3       17      116.5     193.4     194.8    1 vs. 3       12      44.2
4   2   4        9      175.8     216.5     217.0    1 vs. 4       20     103.5
4   3   1       44      –11.8     187.1     194.1
4   3   2       36       14.4     177.2     183.9    1 vs. 2        8      26.2
4   3   3       26       28.9     146.5     158.0    1 vs. 3       18      40.7
4   3   4       18       60.4     141.7     155.2    1 vs. 4       26      72.2
Note. t is the occasion and M the number of components. LL denotes the loglikelihood. In the loglikelihood difference tests (–2ΔLL), the unconstrained (2- or 3-component) mixture is the reference model. NPAR is the number of parameters and ΔNPAR the difference in the number of parameters between models. Model 1: unconstrained; model 2: fixed means; model 3: latent profile; model 4: fixed means & latent profile.

Table 5
Cross-Sectional Results
comp.  p          μ1          μ2            μ3           μ4           σ1           σ2           σ3           σ4
Occasion 1
NC     .82 (.04)  .73 (.01)   –1.99 (.01)   2.04 (.01)   –.53 (.01)   .09 (.007)   .09 (.007)   .08 (.007)   .08 (.007)
C      .18 (–)    .36 (.09)   –.58 (.13)    .44 (.21)    –.13 (.11)   .39 (.07)    .51 (.09)    .90 (.15)    .46 (.08)
Occasion 2
NC     .48 (.06)  .71 (.01)   –2.02 (.02)   2.04 (.01)   –.55 (.02)   .08 (.009)   .09 (.01)    .07 (.01)    .10 (.01)
C      .36 (–)    .22 (.06)   –.09 (.05)    .05 (.05)    .04 (.10)    .30 (.04)    .27 (.04)    .27 (.04)    .56 (.07)
?      .16 (.04)  .16 (.12)   –1.61 (.24)   1.10 (.28)   –.43 (.16)   .31 (.08)    .74 (.16)    .93 (.19)    .57 (.11)
Occasion 3
NC     .53 (.06)  .74 (.014)  –2.01 (.012)  2.04 (.010)  –.53 (.017)  .08 (.01)    .08 (.008)   .06 (.008)   .10 (.012)
C      .40 (.06)  .30 (.08)   –.23 (.11)    .41 (.15)    –.01 (.11)   .42 (.06)    .55 (.08)    .78 (.11)    .56 (.08)
?      .07 (–)    .35 (.22)   –1.91 (.05)   –.44 (.19)   –.57 (.02)   .45 (.15)    .08 (.03)    .39 (.13)    .04 (.014)
Occasion 4
NC     .45 (.05)  .73 (.02)   –2.03 (.01)   2.03 (.01)   –.54 (.01)   .12 (.018)   .06 (.008)   .07 (.007)   .06 (.007)
C      .22 (.08)  .01 (.03)   –.07 (.06)    –.01 (.06)   .24 (.09)    .13 (.02)    .21 (.05)    .23 (.04)    .34 (.08)
?      .33 (–)    .31 (.09)   –1.22 (.18)   .93 (.10)    –.30 (.13)   .48 (.07)    .87 (.12)    .98 (.13)    .66 (.09)
Note. Estimates and standard errors (in parentheses) of the latent profile model fitted at occasions 1 to 4. NC: non-conservers; C: conservers; ?: third component. The parameter p is the mixing proportion; μ and σ are the mean and standard deviation, respectively. The subscript refers to the item (1 to 4).

Table 6
Cross-Sectional Results
comp.  p          μ1          μ2            μ3           μ4           σ1           σ2           σ3           σ4
Occasion 2
NC     .51        .75         –2            2            –.5          .08          .10          .10          .11
C      .30        0           0             0            0            .37          .24          .20          .57
?      .19 (.05)  .19 (.09)   –1.06 (.31)   .60 (.28)    –.18 (.19)   .29 (.06)    .95 (.18)    .87 (.19)    .66 (.12)
Occasion 3
NC     .52        .75         –2            2            –.5          .06          .08          .07          .10
C      .29        0           0             0            0            .40          .32          .22          .49
?      .18 (–)    .60 (.12)   –1.15 (.30)   .76 (.36)    –.48 (.18)   .36 (.08)    .91 (.20)    1.23 (.25)   .49 (.11)
Occasion 4
NC     .47        .75         –2            2            –.5          .11          .07          .08          .08
C      .22        0           0             0            0            .14          .24          .22          .42
?      .31 (–)    .28 (.09)   –1.22 (.19)   .90 (.21)    –.28 (.14)   .48 (.06)    .88 (.13)    .97 (.14)    .69 (.10)
Note. Estimates of the latent profile model with fixed means fitted at occasions 2, 3, and 4. The means of the non-conserver (NC) and conserver (C) components are fixed to their expected values. The parameter p is the mixing proportion; μ and σ are the mean and standard deviation, respectively. Standard errors (in parentheses) are reported only for the third component (?). The subscript refers to the item (1 to 4).
The results at occasion 3 are less clear-cut with respect to the third component. The mixing proportion of the third component is smaller (.07), and this component cannot be interpreted very well as a random responding component: notably, the means and standard deviations of items 2 and 4 resemble those of the non-conservers. The results obtained with model 4, however, are easier to interpret, as the third component may be interpreted as a random responding component: the estimates of the means are intermediate, and the standard deviations are relatively large.
At occasion 4, the results of both models 3 and 4 are consistent with the presence of three groups of children: conservers, non-conservers, and random responders (as observed in the water level task, e.g., Thomas & Lohaus, 1993). The agreement between models 3 and 4 is greatest at occasion 4: the estimates of the mixing proportions and of the means in the third component are almost identical in these models.
The estimates of the mixing proportions suggest that the proportion of non-conservers does not change much from occasion 2 to 4 (.51, .52, .47 in model 4). Compared to occasions 2 and 3, the size of the third component increases (from .19 and .18 to .31 in model 4).
Longitudinal Analyses: Results
The results of the longitudinal analyses are shown in Table 7. We consider models 1 to 4. Of these four, BIC and ICLBIC favor model 1 (latent profile model) in the case of items 1, 2, and 4, and model 3 (covariance model) in the case of item 3. Fixing the means to the expected values (model 2) is not tenable judging
by the loglikelihood ratio tests (45.9, 31.9, 33.4, 48.9, df = 2; see also the
comparison model 3 vs. 4). The loglikelihood ratio between model 1 and
model 3 may serve as a test of the significance of the correlations of the test
scores between occasions. As shown in Table 7, the differences equal 6.6, 12.4, 33.2, and 6.9, with 3 degrees of freedom. The critical value of the $\chi^2$ distribution with 3 degrees of freedom at $\alpha = .05$ is 7.81, so that the (omnibus) hypothesis of zero correlations may be rejected (given $\alpha = .05$) in the case of items 2 and 3. To
ease presentation, we present the parameter values only of model 3
(covariance model). We also fitted this model with the mixing proportions
constrained according to the absorbing constrained transition model (Table 3,
column 4). The loglikelihood ratios indicated that this model was not tenable
compared to model 3 in Table 7 (30.6, 38.2, 154.7, and 17.8, items 1 to 4,
respectively, df = 1).
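The significance judgments in the preceding paragraph can be checked against the upper tail of the $\chi^2$ distribution. In practice one would call `scipy.stats.chi2.sf`. To keep this sketch dependency-free, it instead uses the closed-form series for integer degrees of freedom. It only illustrates the arithmetic; as noted in the section on model comparisons, these asymptotic tests should be interpreted with caution at this sample size.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df.

    Uses the closed-form series for integer degrees of freedom,
    so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


# Model 1 vs. model 3 in Table 7 (-2 times the loglikelihood difference, df = 3)
for item, stat in zip((1, 2, 3, 4), (6.6, 12.4, 33.2, 6.9)):
    p_value = chi2_sf(stat, 3)
    print(item, stat, round(p_value, 4), "reject" if p_value < 0.05 else "retain")
```

At df = 3 the tail probability at 7.81 is about .05, matching the critical value quoted in the text; items 2 and 3 fall below .05, items 1 and 4 do not.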
Table 7
Results of Longitudinal Analyses
item  model  NPAR    –2LL      BIC      ICLBIC   comparison   ΔNPAR   –2ΔLL
1     1       7     –181.6    –149.3     15.3
1     2       5     –135.7    –122.6     29.9    1 vs. 2        2      45.9
1     3      10     –188.2    –142.0     24.4    1 vs. 3        3       6.6
1     4       8     –154.0    –117.1     35.0    3 vs. 4        2      34.1
2     1       7       76.4     108.7    179.6
2     2       5      108.3     131.3    204.8    1 vs. 2        2      31.9
2     3      10       64.4     110.5    193.5    1 vs. 3        3      12.4
2     4       8       88.7     125.6    203.6    3 vs. 4        2      34.1
3     1       7        1.68     33.9     86.9
3     2       5       35.2      58.2    111.1    1 vs. 2        2      33.4
3     3      10      –31.5      14.5     67.4    1 vs. 3        3      33.2
3     4       8        –.2      36.6     89.6    3 vs. 4        2      31.3
4     1       7      –39.8      –7.5    160.6
4     2       5        9.1      32.2    185.1    1 vs. 2        2      48.9
4     3      10      –46.7       –.6    169.1    1 vs. 3        3       6.9
4     4       8        9.7      46.6    208.7    3 vs. 4        2      56.4
Note. Model 1: latent profile; model 2: latent profile + fixed means; model 3: covariance model; model 4: covariance model with fixed means. Mixing proportions in models 1 to 4 are subject to the constraints in Table 3, column 3. ΔNPAR and –2ΔLL refer to the loglikelihood difference test for the listed comparison.
Table 8 contains the parameter estimates of model 3. The results largely agree with expectation. The means of the non-conservers are close to the expected values: .74, –2.01, 2.03, and –.54. The means of the conservers agree closely with the expected values in items 3 and 4 (.041 and –.071), but less well in items 1 and 2 (.265, s.e. .043, and –.417, s.e. .089). The standard deviations are in line with expectation: those in the conserver component are much larger than those in the non-conserver component. The correlations observed in the case of item 3 again agree with expectation. Notably, the correlation between the scores of the conservers over time is large (.729). This is consistent with the notion that individual differences in spatial ability play a role in this task among the conservers. In the non-conserver component, the correlation is low (–.059). The correlation between the scores of those who switch from conserver to non-conserver, or vice versa, is also low (.122). In the case of items 1, 2, and 4, we obtained similar results, albeit apparently not (statistically) significant in the case of items 1 and 4.

Table 8
Parameter Estimates and Standard Errors of Longitudinal Model 3
           item 1          item 2          item 3          item 4
           est.   s.e.     est.   s.e.     est.   s.e.     est.   s.e.
μnc        .740   .004    –2.01   .006     2.03   .005    –.545   .005
μc         .265   .043    –.417   .089     .041   .056    –.071   .061
σnc        .047   .004     .092   .006     .088   .004     .061   .005
σc         .394   .026     .647   .070     .436   .039     .579   .037
ρnc        .122   .106    –.101   .055    –.059   .066     .071   .088
ρt        –.178   .106     .048   .496     .122   .081    –.078   .128
ρc         .285   .180     .553   .108     .729   .070     .249   .109
p1         .263   .050     .435   .051     .405   .049     .294   .050
p2         .110   .008     .102   .011     .116   .009     .112   .009
p3         .022   .007     .015   .006     .024   .007     .027   .009
p4         .134   .016     .111   .017     .125   .016     .128   .017
p5         .022   .007     .015   .006     .024   .007     .027   .009
p6         .009   .004     .004   .002     .007   .003     .010   .005
p7         .027   .008     .016   .006     .026   .006     .030   .009
p8         .163   .007     .121   .026     .134   .026     .147   .029
p9         .018   .007     .014   .008     .014   .007     .020   .009
p10        .007   .003     .003   .002     .004   .002     .008   .004
p11        .001   .001     .001   .001     .001   .001     .002   .001
p12        .009   .004     .004   .002     .004   .002     .009   .004
p13        .021   .008     .015   .007     .015   .006     .023   .009
p14        .009   .004     .004   .002     .004   .002     .009   .004
p15        .026   .008     .017   .007     .016   .006     .027   .009
p16        .157   –        .123   –        .081   –        .128   –
p0         .745            .818            .859            .776
p          .855            .879            .835            .826
q          .711            .810            .778            .727
Note. μ, σ, and ρ denote mean, standard deviation, and correlation, respectively. The subscript nc (c) stands for non-conserver (conserver); the subscript t denotes transitional. The parameters p1 to p16 are the mixing proportions, which are a function of the parameters p0, p, and q.
The estimates of the parameters p0, p, and q are also shown in Table 8. The probability of being a non-conserver at occasion 1 is high: .74, .82, .86, and .77 for items 1 to 4, respectively. The conditional probability of remaining in a given state at occasion k + 1, given that one was in that state at occasion k, is high: .85, .88, .83, and .83 (remaining a conserver), and .71, .81, .78, and .73 (remaining a non-conserver). As is to be expected, the probabilities of switching from non-conserver to conserver (.29, .19, .22, .27) are larger than those of switching in the reverse direction (.15, .12, .17, .17).
The probability of being a non-conserver at occasion 1 (on average .80) agrees with the cross-sectional result at occasion 1 (.82). We know from Table 6 that the proportions of non-conservers at occasions 2 to 4 are about .51, .52, and .47, respectively. By summing the mixing proportions in Table 8, we obtain estimates of the proportion of non-conservers at each occasion. For instance, the sum of p1 to p8 is the estimate of the proportion of non-conservers at occasion 1 (see Table 3). The probabilities of being a non-conserver at occasions 2 to 4 are .56, .46, .40 (item 1); .68, .59, .52 (item 2); .69, .59, .52 (item 3); and .60, .50, .45 (item 4). On average, we have .63, .53, and .47. These results agree well with those obtained in the cross-sectional analyses, with the exception of the second occasion (.63 vs. .51).
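Summing mixing proportions over the components that share a state at a given occasion gives the marginal proportions discussed above. The sketch below is minimal and illustrative. It assumes the component ordering used in the model definitions (component 2 = nc nc nc c, ..., component 16 = c c c c) and uses the item 1 estimates from Table 8; the helper name is hypothetical.

```python
from itertools import product

# Estimated mixing proportions p1..p16 for item 1 (Table 8, model 3)
P_ITEM1 = [0.263, 0.110, 0.022, 0.134, 0.022, 0.009, 0.027, 0.163,
           0.018, 0.007, 0.001, 0.009, 0.021, 0.009, 0.026, 0.157]


def proportion_in_state(props, state="nc", n_occasions=4):
    """Marginal proportion in `state` at each occasion.

    Sums p_i over the components whose pattern has `state` at that occasion;
    patterns are generated with the last occasion varying fastest.
    """
    patterns = list(product(("nc", "c"), repeat=n_occasions))
    return [sum(w for w, pat in zip(props, patterns) if pat[t] == state)
            for t in range(n_occasions)]


print([round(x, 2) for x in proportion_in_state(P_ITEM1)])  # → [0.75, 0.56, 0.46, 0.4]
```

The first entry is the sum of p1 to p8; the remaining three are the item 1 values quoted in the text.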
The mixing proportions vary somewhat over the items. Overall, we find
that the largest proportions are associated with components 1, 2, 4, 8 and 16,
that is, the components associated with the absorbing constrained transition
model (Table 3 column 4). Adding the mixing proportions of these
components, we find that they account for 83%, 89%, 86%, and 82% of the
responses to items 1 to 4, respectively.
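The components singled out here are exactly those without a conserver-to-non-conserver switch, that is, the components that remain under the absorbing constrained transition model. The sketch below identifies them by pattern and sums their proportions for item 1 (Table 8). The helper name is illustrative, and the component ordering is assumed to be the one used in the model definitions.

```python
from itertools import product

# Estimated mixing proportions p1..p16 for item 1 (Table 8, model 3)
P_ITEM1 = [0.263, 0.110, 0.022, 0.134, 0.022, 0.009, 0.027, 0.163,
           0.018, 0.007, 0.001, 0.009, 0.021, 0.009, 0.026, 0.157]


def no_return_to_nonconserving(pattern):
    """True if the pattern never moves from conserver back to non-conserver."""
    return not any(a == "c" and b == "nc" for a, b in zip(pattern, pattern[1:]))


patterns = list(product(("nc", "c"), repeat=4))
absorbing = [i + 1 for i, pat in enumerate(patterns) if no_return_to_nonconserving(pat)]
share = sum(P_ITEM1[i - 1] for i in absorbing)
print(absorbing)        # → [1, 2, 4, 8, 16]
print(round(share, 2))  # → 0.83
```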
To provide insight into the actual scores of the subjects, we assigned
subjects to components on the basis of the maximum posterior probability
calculated in model 3 (see Equation 4). The expected means and the actual
scores on item 3 over the 4 occasions are shown in Figure 3. The
components 6 and 9 to 14 are depicted, but no cases were assigned to these
components, because the associated mixing proportions are close to zero and
the sample size is small.
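Modal assignment of this kind can be sketched as follows. The posterior probability of component j for a response vector x is taken to be the usual Bayes-rule quantity $\tau_j = p_j f_j(x) / \sum_k p_k f_k(x)$, with $f_j$ the multivariate normal density; this sketch assumes that this is what Equation 4 expresses. It also assumes complete data; with missing responses, each density would have to be evaluated on the observed occasions only. The two-component example borrows the item 1 means and standard deviations from Table 8 purely for illustration and ignores the correlations. All function names are hypothetical.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal distribution at x."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + quad)


def posterior_probabilities(x, weights, means, covs):
    """tau_j = p_j f_j(x) / sum_k p_k f_k(x), computed on the log scale."""
    with np.errstate(divide="ignore"):          # log(0) = -inf for empty components
        log_terms = np.log(np.asarray(weights, dtype=float))
    log_terms = log_terms + np.array([mvn_logpdf(x, m, c) for m, c in zip(means, covs)])
    log_terms = log_terms - log_terms.max()     # guard against underflow
    tau = np.exp(log_terms)
    return tau / tau.sum()


def modal_component(x, weights, means, covs):
    """1-based index of the component with the largest posterior probability."""
    return int(np.argmax(posterior_probabilities(x, weights, means, covs))) + 1


# Illustration only: an all-nc and an all-c component, equal weights
means = [np.full(4, 0.740), np.full(4, 0.265)]
covs = [np.eye(4) * 0.047 ** 2, np.eye(4) * 0.394 ** 2]
weights = [0.5, 0.5]
print(modal_component([0.74, 0.75, 0.73, 0.76], weights, means, covs))   # → 1
print(modal_component([0.10, 0.50, -0.30, 0.40], weights, means, covs))  # → 2
```

Working on the log scale matters here. With standard deviations as small as those of the non-conservers, raw densities for distant response vectors underflow to zero, and subtracting the maximum before exponentiating avoids that.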
Figure 3
Observed Data (squares) and Expected Means (circles connected by a solid line) of Item 3
Observed data were assigned to the components on the basis of the maximum posterior
probability. No cases were assigned to components 6, 9, 10, 11, 12, 13, 14, due to the small
mixing proportions and small sample size.
Discussion
The aim of this article was to demonstrate the usefulness and feasibility
of multivariate normal mixture modeling of Piagetian data. Most applications
of mixture modeling in this area are based on discrete modeling of balance
scale task data (Boom et al., 2001; Jansen & van der Maas, 1997, 2001,
2002) or water level task data (Thomas & Hettmansperger, 2001; Thomas, Lohaus, & Kessler, 1999; Thomas & Lohaus, 1993). Normal mixture modeling is a viable option when responses within the components are approximately normal.
We have obtained fairly consistent and comprehensible results, given the small sample size and extensive missing data. Although the results are consistent with the expected behavior on a conservation task, an anonymous reviewer pointed out that we cannot be completely sure that the present task is valid as an anticipation (liquid) conservation test. However, this does not detract from our finding that normal mixture modeling is feasible in this context.
Bearing in mind the limitations of the present task, the results of the cross-sectional analyses suggest that a two-component (occasion 1) or a three-component (occasions 2 to 4) mixture is required to account for the data. As mentioned, the differences between these occasions may be due to the supervision during testing at the first occasion. At all four occasions, the non-conserver component is quite distinct: the means are about equal to the expected values, and the standard deviations are relatively small. The conserver component is present but less distinct: the means deviate from the expected values, and the standard deviations are much larger, as expected. The third component (occasions 2 to 4) generally appears to account for fairly irregular responses. Insofar as this component accommodates irregular responding, it may be similar to the random responder component identified in Thomas's analyses of the water level task, or it may accommodate the responses of transitional children.
Normal mixture modeling has the advantage that the within-component covariance structure may be estimated. However, our expectations concerning the correlations within the conserver components were not borne out in the cross-sectional analyses: BIC and ICLBIC consistently favored the latent profile models. The mixing proportions suggest that about 80% of the subjects were non-conservers at occasion 1, and roughly half (between 45% and 53%) at occasions 2, 3, and 4. The drop in non-conservers from occasion 1 to 2 may be due to children discussing the task among themselves.
In the longitudinal analyses, we fitted a highly constrained two-state transition model. The aim of the model is to account for the presence of conservers and non-conservers at each occasion, and for the possible switches between the occasions. The results of fitting this constrained 16-component mixture model are largely in line with expectation. Again we found that the estimated means of the non-conservers resembled the expected values and that their standard deviations were relatively small. The means in the conserver component deviated somewhat from the expected values in items 1 and 2. The standard deviations in the conserver component are larger, as expected. We found that the within-component correlations (i.e., between the occasions) agreed well with expectation, although the correlations were significant only for items 2 and 3. These correlations are an interesting source of information concerning the nature of the task requirements within a given component. They are consistent with the presence of systematic individual differences in the (visual) ability that is required to generate a response, given that these children do know the correct response.
The model in which conserving was an absorbing state was found to be untenable. Switching from conserving (i.e., understanding the correct response) to non-conserving (producing an incorrect response) may seem strange. However, this behavior is known to occur (Thomas & Hettmansperger, 2001; Thomas, Lohaus, & Kessler, 1999). Transitional children may display such variation in their behavior (van der Maas & Molenaar, 1992). We did find that most subjects are in the components that do not include switching from the conserver component to the non-conserver component (components 1, 2, 4, 8, and 16; see Table 3).
We did not attempt to fit a three-state (component) transition model. A third component may be required to account for irregular, transitional behavior (see Thomas & Hettmansperger, 2001; Thomas, Lohaus, & Kessler, 1999). Although a three-state model includes 81 ($3^4$) components, given 4 occasions, it is computationally feasible due to the many constraints. Our main reasons for not doing so are the small sample size (N = 101) and the extensive missing data (see Table 2). In addition, in modeling the same item repeatedly, transitional behavior, insofar as it concerns inconsistent responding, is accommodated by the components that include switching from the conserver to the non-conserver responses.
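The component count and the number of free transition parameters follow from simple arithmetic. With S states and T occasions there are $S^T$ state patterns. A stationary first-order Markov chain has S − 1 initial-state probabilities and S(S − 1) transition probabilities. For S = 2 this gives the three parameters p0, p, and q; for S = 3 it gives 8, which illustrates why an 81-component model would remain heavily constrained. The state label "tr" below is a hypothetical name for the third state.

```python
from itertools import product

# Hypothetical state labels for a three-state model:
# non-conserver, transitional, conserver
THREE_STATES = ("nc", "tr", "c")


def n_components(n_states, n_occasions):
    """Number of state patterns (mixture components): S ** T."""
    return n_states ** n_occasions


def n_transition_parameters(n_states):
    """Free probabilities of a stationary first-order Markov chain:
    (S - 1) initial-state probabilities plus S * (S - 1) transition probabilities."""
    return (n_states - 1) + n_states * (n_states - 1)


patterns = list(product(THREE_STATES, repeat=4))
print(len(patterns), n_components(3, 4))                      # → 81 81
print(n_transition_parameters(2), n_transition_parameters(3))  # → 3 8
```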
The model that we used to constrain the mixing proportions is a discrete-time stationary transition model. Markovian transition models have been considered extensively as models of development (Brainerd, 1979; Collins & Wugalter, 1992; Langeheine, 1994; Rindskopf, 1987; Thomas & Hettmansperger, 2001; van de Pol & Langeheine, 1990). These models have been applied mainly to discrete data. As is clear from our longitudinal results, similar models can be applied to normally distributed responses using normal mixture modeling. A variety of models may be considered in addition to the two that we fitted (Table 3, columns 3 and 4). As discussed by van de Pol and Langeheine (1990), these include non-stationary transition models, mover-stayer models (which include components for an excessive number of (non-)conservers), and multi-group models. Discrete-data versions of the models can be fitted using the PANMARK program (see van de Pol & Langeheine, 1990). Schmittmann, Dolan, and Neale (2003) have shown that the normal versions of these models can be fitted using the Mx program (Neale, Boker, Xie, & Maes, 1999).
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 716-723.
Arminger, G. & Stein, P. (1997). Finite mixtures of covariance structure models with
regressors. Sociological Methods and Research, 26, 148-182.
Azzalini, A. (1996). Statistical inference based on the likelihood. London: Chapman and Hall.
Boom, J., Hoijtink, H., & Kunnen, S. (2001). Rules in the balance: Classes, strategies, or rules for the balance scale task. Cognitive Development, 16, 717-736.
Brainerd, C. J. (1979). Markovian interpretations of conservation learning. Psychological
Review, 86, 181-213.
Collins, L. M. & Wugalter, S. E. (1992). Latent class models for stage-sequential dynamic
latent variables. Multivariate Behavioral Research, 27, 131-157.
Dolan, C. V. & van der Maas, H. L. J. (1998). Fitting multivariate normal mixtures subject
to structural equation modeling. Psychometrika, 63, 227-253.
Everitt, B. S. & Hand, D. J. (1981). Finite mixture distributions. London: Chapman and
Hall.
Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing.
Psychometrika, 44, 409-420.
Ghahramani, Z. & Jordan, M. I. (1994). Supervised learning from incomplete data via an EM algorithm. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems 6. San Francisco: Morgan Kaufmann Publishers.
Gill, P. E., Murray, W., Saunders, M. A., & Wright, M. H. (1986). User’s guide for NPSOL (version 5.0-2) (Tech. Rep. SOL 86-2). Stanford, CA: Department of Operations Research, Stanford University.
Hosenfeld, B., van der Maas, H. L. J., & van den Boom, D. C. (1997). Indicators of discontinuous change in the development of analogical reasoning. Journal of Experimental Child Psychology, 64, 367-395.
Jansen, B. R. J. & van der Maas, H. L. J. (1997). Statistical test of the rule assessment
methodology by latent class analysis. Developmental Review, 17, 321-357.
Jansen, B. R. J. & van der Maas, H. L. J. (2001). Evidence for the phase transition from rule
I to rule II on the balance scale task. Developmental Review, 21, 450-494.
Jansen, B. R. J. & van der Maas, H. L. J. (2002). The development of children’s rule use on
the balance scale task. Journal of Experimental Child Psychology, 81, 383-416.
Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997). STEMM: A general finite mixture structural equation model. Journal of Classification, 14, 23-50.
Kemeny, J. G. & Snell, J. L. (1976). Finite Markov chains. New York: Springer Verlag.
Langeheine, R. (1994). Latent variable Markov models. In A. von Eye & C. C. Clogg (Eds.),
Latent variable analysis. Applications for developmental research. Thousand Oaks:
Sage.
Lazarsfeld, P. F. & Henry, N. W. (1968). Latent structure analysis. New York: Houghton
Mifflin.
Little, R. J. & Rubin, D. B. (1989). The analysis of social science data with missing values.
Sociological Methods and Research, 18, 292-326.
McCutcheon, A. L. (1987). Latent class analysis. Beverly Hills, CA: Sage.
McLachlan, G. & Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons.
Muthén, B. O. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika,
29, 81-117.
Muthén, B. & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the
EM algorithm. Biometrics, 55, 463-469.
Neale, M. C., Boker, S. M., Xie, G., & Maes, H. H. (1999). Mx: Statistical modeling (5th
edition). Richmond, VA: Authors.
Piaget, J. & Inhelder, B. (1969). The psychology of the child. New York: Basic Books.
Rindskopf, D. (1987). Using latent class analysis to test developmental models.
Developmental Review, 7, 66-85.
Rovine, M. J. (1994). Latent variables models and missing data analysis. In A. von Eye &
C. C. Clogg (Eds.), Latent variable analysis. Applications for developmental research.
Thousand Oaks: Sage.
Schmittmann, V. D., Dolan, C. V., & Neale, M. C. (2003). Discrete Markov models for
normally distributed response data: specification and fitting in Mx. Manuscript
submitted for publication.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Siegler, R. S. (1981). Developmental sequences within and between concepts. Monographs
of the Society for Research in Child Development, 46, No. 189.
Thomas, H. (1994). Mixture decomposition when the components are of unknown form. In
A. von Eye & C. C. Clogg (Eds.), Latent variable analysis. Applications for developmental
research. Thousand Oaks: Sage.
Thomas, H. & Hettmansperger, T. P. (2001). Modelling change in cognitive understanding with finite mixtures. Applied Statistics, 50, 435-448.
Thomas, H. & Lohaus, A. (1993). Modeling growth and individual differences in spatial tasks. Monographs of the Society for Research in Child Development, 58, No. 237.
Thomas, H., Lohaus, A., & Kessler, T. (1999). Stability and change in longitudinal water
level task performance. Developmental Psychology, 35, 1024-1037.
Thomas, H. & Turner, G. F. W. (1991). Individual differences and development in water-level task performance. Journal of Experimental Child Psychology, 51, 171-194.
van der Maas, H. L. J. (1993). Catastrophe analysis of stagewise cognitive development.
Unpublished doctoral thesis. University of Amsterdam, Department of Psychology.
van der Maas, H. L. J. & Molenaar, P. C. M. (1992). Stagewise cognitive development: An
application of catastrophe theory. Psychological Review, 99, 395-417.
van de Pol, F. & Langeheine, R. (1990). Mixed Markov latent class models. Sociological Methodology, 20, 213-248.
Wolfe, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivariate
Behavioral Research, 5, 329-350.
Yung, Y. F. (1997). Finite mixtures in confirmatory factor-analysis models. Psychometrika,
62, 297-330.
Accepted May, 2003.