
Measurement, 6: 25–53, 2008
Copyright © Taylor & Francis Group, LLC
ISSN 1536-6367 print / 1536-6359 online
DOI: 10.1080/15366360802035497
Latent Variable Theory
Denny Borsboom
University of Amsterdam
This paper formulates a metatheoretical framework for latent variable modeling.
It does so by spelling out the difference between observed and latent variables.
This difference is argued to be purely epistemic in nature: We treat a variable
as observed when the inference from data structure to variable structure can be
made with certainty and as latent when this inference is prone to error. This
difference in epistemic accessibility is argued to be directly related to the data-generating process, i.e., the process that produces the concrete data patterns on
which statistical analyses are executed. For a variable to count as observed through
a set of data patterns, the relation between variable structure and data structure
should be (a) deterministic, (b) causally isolated, and (c) of equivalent cardinality.
When any of these requirements is violated, (part of) the variable structure should
be considered latent. It is argued that, on these criteria, observed variables are rare
to nonexistent in psychology; hence, psychological variables should be considered
latent until proven observed.
Key words: latent variables, measurement theory, philosophy of science, psychometrics,
test theory
In the past century, a number of models have been proposed that formulate
probabilistic relations between theoretical constructs and empirical data. These
models posit a hypothetical structure and specify how the location of an object in
this structure relates to the object’s location on a set of indicator variables. It is
common to refer to the hypothetical structure in question as a latent structure and
to the indicator variables as observed variables. In general, models that follow
the idea set forth above are called latent variable models.
There are several kinds of latent variable models, which are often categorized
in terms of the types of observed and latent variables to which they apply.
If the observed and latent variables are both continuous, then the resulting
model is called a factor model (Jöreskog, 1971; Lawley & Maxwell, 1963;
Bollen, 1989); if the observed variable is categorical and the latent variable
is continuous, then we have an Item Response Theory (IRT) model (Rasch,
1960; Birnbaum, 1968; Hambleton & Swaminathan, 1985; Embretson & Reise,
2000; Sijtsma & Molenaar, 2002); if the observed and latent variables are both
categorical, the resulting model is known as a latent class model (Lazarsfeld &
Henry, 1968; Goodman, 1974); and if the observed variable is continuous while
the latent variable is categorical, then we get a mixture model (McLachlan &
Peel, 2000), which upon appropriate distributional assumptions becomes a latent
profile model (Lazarsfeld & Henry, 1968; Bartholomew, 1987).
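To make this taxonomy concrete, one might simulate data under two of its cells; the sketch below (with arbitrary, invented parameter values) generates continuous indicators from a one-factor model and dichotomous indicators from a two-class latent class model.

```python
# Illustrative sketch, not taken from the paper: data under two cells of the taxonomy.
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of persons

# Factor model: continuous latent variable, continuous indicators.
theta = rng.normal(size=n)                       # latent factor scores
loadings = np.array([0.8, 0.7, 0.6])             # invented factor loadings
noise = rng.normal(scale=0.5, size=(n, 3))       # unique factors / measurement error
x_continuous = theta[:, None] * loadings + noise

# Latent class model: categorical latent variable, categorical indicators.
classes = rng.integers(0, 2, size=n)             # two latent classes
p_endorse = np.where(classes[:, None] == 1, 0.8, 0.2)  # class-specific endorsement probabilities
x_categorical = rng.binomial(1, p_endorse, size=(n, 3))

print(x_continuous[:2].round(2))
print(x_categorical[:2])
```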
However, various mixed forms of these models are possible. For instance,
at the latent level, one may have several distinct systems of continuous latent
variables that themselves define latent classes (Lubke & Muthén, 2005; Rost,
1990), and at the observed front these models may also relate to a mixture of
categorical and continuous observed variables (e.g., Moustaki, 1996; Moustaki &
Knott, 2000). In fact, any model that relates some kind of latent structure to an
observed structure could be called a latent variable model; and the possibilities
regarding the dimensionality and form of these structures are endless, as is the
number of functions that can be used to relate one to the other.
Like most statistical techniques, latent variable modeling is not an isolated
statistical number crunching endeavor but part of a research procedure embedded
in a set of more or less closely associated ideas, norms, and practices regarding
the proper treatment of data in scientific research. The present paper represents
an attempt to make these ideas explicit by articulating a fitting metatheoretical
framework for latent variable modeling. To distinguish this framework from
latent variable models themselves, we may indicate it with the term latent
variable theory, which indicates that latent variable modeling is central to it and
at the same time emphasizes that the theory is broader in scope than the purely
statistical formulation of latent variable models.
WHAT BINDS LATENT VARIABLE MODELS?
Mathematically, latent variable models specify a generalized regression function
that can be written as f(E(X)) = g(θ), where f is a link function, E is the expectation operator, X denotes a matrix of observed variables, θ is a latent structure,
and g is some function that relates the latent structure to the observed variables.
If, upon a suitable choice of f, the function g is linear, then the resulting family of
models is covered by Generalized Linear Item Response Theory (Mellenbergh,
1994). This is true for most of the models used in factor analysis and IRT.
By expanding the matrices X and θ to apply to series of observations made
at different time points, models for time series, like the hidden Markov model
(Rabiner, 1989; Visser, Raijmakers, & Molenaar, 2002) or the dynamic factor
model (Molenaar, 1985), may be formulated; it is also possible to model inter- and intraindividual differences simultaneously (Hamaker, Molenaar, & Nesselroade, 2007).
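The shared form f(E(X)) = g(θ) can be made explicit in a brief sketch; the one below (with invented item and factor parameters) writes out the expected response under a logit link, as in a two-parameter IRT model, and under an identity link, as in a linear factor model.

```python
# Minimal sketch, purely for illustration: two instances of f(E(X)) = g(theta).
import numpy as np

def expected_response_irt(theta, a, b):
    """2PL IRT: logit link, so E(X) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def expected_response_factor(theta, loading, intercept):
    """Linear factor model: identity link, so E(X) = intercept + loading * theta."""
    return intercept + loading * theta

theta = np.linspace(-3, 3, 7)          # hypothetical latent positions
print(expected_response_irt(theta, a=1.2, b=0.0).round(3))
print(expected_response_factor(theta, loading=0.8, intercept=5.0).round(3))
```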
Thus, a latent variable model is simply a model that relates the expectation
of observables to a latent structure through some regression function. However,
most people working in latent variable modeling have a strong intuition that
the group of latent variable models comprises a homogeneous structure, in the
sense that they have something in common that separates them from other
commonly used statistical models (say, analysis of variance or principal components analysis). It is, however, useful to note that no such delineation follows
from the mathematical structure of the model. Mathematically, all that is being
said in this structure is that the expectation of some set of variables is a function
of another set of variables, and it is difficult to say why this should specify a
latent variable model. In fact, if we should take the mathematical structure itself
to define latent variable models, then virtually all statistical techniques count
as latent variable models, because it is in the nature of statistical techniques to
specify a relation between the expectation of one set of variables and another
set of variables. Hence, on this basis a latent variable model would be indistinguishable from, say, analysis of variance, a technique that one intuitively feels
should not be included as a latent variable model. If one wants to explain what
binds latent variable models, an appeal to the mathematical structure of the
model does not do the trick.
Clearly, what makes a latent variable model a latent variable model is not
the mathematical structure that is being used to link different sets of variables.
Rather, the important feature of the regression function central to latent variable
models is that the left-hand side of the equation contains a set of observed
variables, whereas the right-hand side contains a latent structure. Hence, if we
insist on distinguishing latent variable models from observed variable models,
we need to make clear what this distinction consists in.
LATENT AND OBSERVED VARIABLES
What is the difference between latent and observed variables? The use of the
term latent, but especially the term observed, suggests that this distinction
is of an epistemological character. Observed variables are variables that are
somehow epistemically accessible to the researcher, whereas latent variables are
not epistemically accessible. It is customary to illustrate this distinction with
substantive examples. Thus, one says that IQ scores are recorded, but general
intelligence is not; hence general intelligence is a latent variable and IQ an
observed variable. However, in order to characterize latent variable models
generally, it is not sufficient to point to some illustrative examples. Neither is
it clarifying to give a tautological, and hence not very informative, characterization of latent variables, as is not uncommon in the literature, for instance when
scholars say that “a latent variable is a variable that is not directly measured,”
or “a latent variable is a theoretical construct,” or “a latent variable is a variable
that underlies the observations.” It is important to understand somewhat more
precisely what the distinction between observed and latent variables amounts to.
This matter is not entirely straightforward. The reason for this is not so much
that the concept of a latent variable, as a hypothetical structure of inter- or
intraindividual differences, is problematic, but rather that it is difficult to grasp
the idea that a variable might be observed. Take familiar examples of variables
that, in statistical analyses, are commonly conceptualized as observed variables,
such as sex or age. It is hard to uphold the idea that the distinction between
these variables and variables that are seen as latent, such as general intelligence,
lies in the fact that the first are observed whereas the second are not. In a strict
reading of the word observed, nobody can claim to have observed sex, length,
or age. These are theoretical constructs just as well as general intelligence is.
Age is not a concrete object, subject to our perceptual processes, like stones or
trees or people might be taken to be, but a theoretical dimension. Theoretical
dimensions do not fall in the category of observable things. Thus, when one says
that one has learned, upon interrogation of the twins John and Jane, that John
is 15 minutes older than Jane, one cannot claim to have thereby observed the
variable age. On the basis of one’s observation of John and Jane, one has made
an inference regarding their relative positions on the dimension age, but it is not
thereby true that one has observed age itself.
It seems that, while the distinction between latent and observed variables has
something to do with a difference in the epistemic accessibility of such variables,
it is not literally a distinction between observable and unobservable things. So the
distinction between latent and observed variables does not parallel the distinction
between commonly discussed observables and unobservables like, say, rocks
and quarks. The interesting thing is that the main difference here does not lie
at the front of latent variables (both the existence of general intelligence and
the existence of quarks involve a hypothesis on the structure of the world that
is indirectly tested by experimental data) but at the front of observed variables.
Age is not observable in the sense that concrete objects like rocks are but is
itself a rather abstract dimensional concept.
Thus, observed variables like age and latent variables like intelligence seem to
have more in common than one might think. This can be further illustrated when
we examine the context in which latent variable models are used. For instance,
at the university where I work, we have testing sessions in which the entire
population of first-year students fills in a substantial number of psychological
tests as a part of their curriculum. When I analyze the so obtained data for,
say, sex differences in extraversion, I will conceptualize the variable sex as
an observed variable. Nobody will object to this practice. Strictly speaking,
however, I have not even myself observed the people in question, because I was
not present at the testing sessions, let alone their sex. What I observe when I
am doing data analyses is a column of ones and zeros that has the variable label
sex attached to it—nothing more. To make matters worse, I also have a column
that contains the scores on an extraversion test, and these have the variable label
extraversion attached to them, so in this sense there appears to be very little
difference with regard to the situation with the variable sex. Yet I am not inclined
to treat extraversion as an observed variable; rather I will treat the test scores as
indicators of a latent structure. Why is this?
It is tempting, in the context of justifying my discriminative practices
regarding the treatment of the variables sex and extraversion in data analysis,
to justify my policy on the grounds that extraversion is not identical to a test
score. This is consonant with the dominant opinion in the literature on testing:
Extraversion is a theoretical construct with surplus meaning over the test scores
(Cronbach & Meehl, 1955; Messick, 1989). Therefore I need to conceptualize
extraversion as a latent variable. Unfortunately, however, exactly the same line
of reasoning holds for the variable sex. What I observe is that, in the row that
corresponds to subject no. 2005967, the column sex contains the entry 1. This is
what I have to work with. But of course one’s sex is not identical to a number
occurring in the relevant entry in my data file any more than extraversion is, so it
also carries surplus meaning; and I need to infer the relevant property (e.g., being
male) from the data just as well.
Thus, following this line of reasoning, equating sex with the column of ones
and zeros in my data file is no more justified than equating extraversion with the
score on an extraversion test. Therefore, we must draw two conclusions. First,
it is misleading to take the distinction between latent and observed variables
literally, because observed variables are not at all observed in the normal sense
of the word. Second, the difference in handling observed and latent variables
in actual data analysis cannot be defended by referring to the surplus meaning
of latent variables as theoretical constructs, for observed variables carry such
surplus meaning just as well. Moreover, in both cases it is necessary to make an
inference from an observed data pattern to an underlying property. It seems that
the distinction between latent and observed variables, which appears to be crucial
in distinguishing latent variable modeling techniques from statistical techniques
in general, has to be justified differently.
EPISTEMOLOGICAL PLANES
Both in the case of latent and of observed variables, the researcher makes an
inference from observed data patterns to the conclusion that objects have or do
not have a certain property (e.g., being of above average intelligence; being
male). However, there is an important distinction between these inferences:
Namely, the probability of an erroneous conclusion on the basis of data is judged
differently. That is, if sex is conceptualized as an observed variable, then the
researcher assumes that if, say, 1 is observed in the column that codes for sex, it
is certain that the corresponding person is male. For a latent variable this is not
the case. Upon observing an IQ score of 120, the researcher does not assume that
it is certain that the person has above average intelligence; the person in question
may, for instance, have been fortunate in guessing the right answers. Thus, the
inference to the person’s property (i.e., having above average intelligence) is, at
most, probable given the data but not certain.
It seems to me that the terminology of latent and observed variables codes
precisely this distinction. When we treat a variable as observed, we mean nothing
more than that we assume that the location of a person on that variable can be
inferred with certainty from the data. When we treat a variable as latent, we mean
that the inference in question cannot be made with certainty. It is important to
see that this formulates the distinction between latent and observed variables as a
purely epistemological distinction. As such, this distinction is partly a function of
the observer. Thus variables are not inherently latent or observed. They can only
be latent or observed with respect to the data at hand, or, in other words, with
respect to the observer and his or her measurement equipment. There is no need,
however, to assume that the distinction between latent and observed variables
is an ontological distinction between different kinds of variables. Thus, in this
view, latent variables are no more of a mystery than observed variables (although
observed variables are considerably more mysterious than commonly supposed).
The relativity of the predicates observed and latent is graphically illustrated
in Figure 1. Figure 1 shows a data plane, which contains strictly observable
data structures (appropriately arranged strings of zeroes and ones), and a world
plane, which represents the structures in the world to which inferences are to
be made (say, “Ed is male”). Such inferences are represented as arrows that run
from the data plane to the world plane. If an inference is certain, the connection
between the data plane and the world plane is drawn as a continuous arrow; if
it is not certain, the connection is drawn as a dotted arrow. Thus, in the figure,
data structure x gives a certain inference to ξ. Hence x is an observed variable
with respect to ξ. Data structure y gives a certain inference to ζ, whereas z does
not give a certain inference to ζ. Hence ζ is observed with respect to y but not
with respect to z.
Now consider two researchers, John and Jane. The epistemic accessibility of
the world for these researchers is represented by their epistemological planes,
which are drawn inside the data plane in Figure 1. John has access to data
structures x and y, but Jane has access to only z. Thus, ζ is an observed variable
for John, but it is a latent variable for Jane. Hence, in this scheme of thinking,
[Figure 1: a data plane containing the data structures x, y, and z, together with John's and Jane's epistemological planes, and a world plane containing the variables ξ and ζ.]
FIGURE 1 The structure of the epistemic process. Boxes represent observed data structures
that sustain certain (continuous arrows) or uncertain (dotted arrows) inferences to variables
(represented as spheres).
the predicates latent and observed are relative to the epistemological state of the
researcher. A variable that I consider latent, given the data structures available
for me, may be considered observed by somebody who has access to different
data structures. This is consonant with the view of Bollen (2002), who holds
that variables that are considered latent today may, upon improvement
of measurement procedures, be considered observed in the future.
Now the term relative as it is used here must be interpreted with caution,
for it has a tendency to activate associations with relativist schemes of thinking
in the philosophy of science, which are in the present context inappropriate.
Relative here does not mean that the researcher is justified in treating variables
as observed or unobserved as he or she pleases. It means that the sentence "ζ is
observed for John" may be true, whereas "ζ is observed for Jane" may be false,
even though we are talking about the same structure ζ. It is evident, however,
that this observer dependence does not imply that the question whether these
statements are true for any given observer is a relative issue.
Thus, although a variable may be latent for one person and observed for the
next, it could be defended that whether it is latent or observed for any given
person is a matter of fact. This is intuitively plausible. A researcher who states
that she treats sex as an observed variable (meaning that the inferences from the
data structure to the variable structure are epistemically certain) will be given
the benefit of the doubt in most situations. But a person who claims to have
observed intelligence will be considered as either methodologically naive or a
scientific con artist. It appears that, apart from the knowledge that the researcher
has on how to make inferences regarding variables on the basis of data patterns,
there is also an objective difference between the state of affairs in these two
situations that determines whether the variables can be treated as observed or
not. The question then becomes, what is this difference?
CAUSAL STRUCTURES AND EPISTEMOLOGICAL
ACCESSIBILITY
One answer that suggests itself is that the degree of epistemological accessibility
is a function of the causal structure that gives rise to the observations. Such
a view would hold that a variable can be taken as observed if the patterns in
the data structure have the proper causal antecedents. The question then is what
these proper causal antecedents are.
Some basic notation and definitions are required to explicate the problem
situation. The researcher observes data patterns in his or her data file (e.g., John
has produced data pattern 010010; Jane, 100101). Define the equivalence relation
∼ with respect to the data patterns D. The notation D_a ∼ D_b then means that two
objects a and b have produced the same data pattern. The equivalence relation
is reflexive (D_a ∼ D_a is true), symmetric (if D_a ∼ D_b, then D_b ∼ D_a), and transitive
(if D_a ∼ D_b and D_b ∼ D_c, then D_a ∼ D_c) and therefore partitions the objects into
equivalence classes that make up the elements of the data structure. It is assumed
here that the same construction can be made for the variable structure (i.e., for
any three objects a, b, and c having variable levels θ_a, θ_b, and θ_c, it is true that
[a] θ_a ∼ θ_a; [b] if θ_a ∼ θ_b, then θ_b ∼ θ_a; and [c] if θ_a ∼ θ_b and θ_b ∼ θ_c, then θ_a ∼ θ_c).
Call this system the variable structure. The variable structure may be much richer
than this (e.g., when one assumes that the levels of the variable are ordered,
or quantitative), but the existence of equivalence classes would seem to be the
minimal requirement for present purposes. The assumption that the measured
variable has a definite structure requires that such a thing as a variable structure
exists independently of our observations; i.e., it involves realism about the structure
of the variable measured. It should be noted that this is not a metaphysically
innocent assumption (e.g., see Borsboom, 2005).
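The partition induced by ∼ is easy to operationalize; the following sketch, using hypothetical objects and data patterns, groups objects that produced identical patterns into the same equivalence class.

```python
# Sketch (hypothetical objects and patterns): the equivalence classes of the data
# structure, where a ~ b iff D_a equals D_b.
from collections import defaultdict

data_patterns = {
    "John": "010010",
    "Jane": "100101",
    "James": "010010",
    "Joan": "100101",
    "Jill": "111111",
}

classes = defaultdict(list)
for obj, pattern in data_patterns.items():
    classes[pattern].append(obj)   # same pattern -> same equivalence class

for pattern, members in classes.items():
    print(pattern, members)
# John and James fall into the same class because D_John ~ D_James.
```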
We thus have two sets of equivalence classes, one pertaining to the data
patterns (e.g., to strings like 100110) and the other to the variable structure.
The data patterns are what the researcher looks at on a computer screen when
doing the statistical analyses; the variable structure is inherent in the attribute
the researcher wants to measure, i.e., a structure that is “out there.” The question
before us is what the relation between the variable structure and the data structure
should be like for us to be justified in treating the structure as an observed
variable.
One option that suggests itself is that the causal chain (i.e., the measurement
process) that links the variable structure to the data structure be deterministic. In
this view, we could imagine a mapping of the distinction between deterministic
and probabilistic epistemological relations to a corresponding distinction between
probabilistic and deterministic causal processes that give rise to the data. Thus,
if a variable stands in a deterministic causal relation to the observations in the
data structure, then the inferences from observations to that variable are also
deterministic, and hence, one might think, it can be treated as observed. I will
indicate the requirement that the data structure depends deterministically on a
variable structure as the requirement of determination.
This idea is consonant with the way observed variables are viewed in certain
latent variable modeling quarters. For instance, in the factor model, which has
continuous indicators and a continuous latent variable, setting the error variance
of an indicator variable equal to zero is equivalent to treating the factor as
an observed variable. In the logic of the factor model, this treatment could be
interpreted to involve the specification of a deterministic causal relation between
the factor and its indicators. Thus, observed variables can then be viewed as
limiting cases; they are latent variables that have been measured with perfect
reliability.
Although in a simple, unidimensional factor model such a view can be upheld,
I do not think that it will work in general. An immediate complication arises, for
example, in the case of multidimensionality. Suppose that two latent variables
conjointly determine the structure of the observations deterministically via some
function x = f(θ_1, θ_2). In this case, the value of θ_1 can be assessed from the
observed value of x if θ_2 is known, and the value of θ_2 can be assessed if
θ_1 is known, but it is not possible to identify both at the same time from x.
Hence, in this case, a deterministic causal structure gives rise to the data, but
epistemological accessibility is limited; it is not possible to make inferences to
θ_1 and θ_2 with certainty. Therefore, neither θ_1 nor θ_2 can be considered observed
with respect to the epistemological plane determined by x. That the generating
causal structure is deterministic is not a sufficient condition for treating a variable
as observed.
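A minimal numerical illustration of this underdetermination, assuming purely for the sake of the example that f is additive (the argument does not depend on that choice), is the following.

```python
# Sketch assuming f(theta1, theta2) = theta1 + theta2 for illustration only:
# the data-generating function is deterministic, yet x does not identify the pair.
candidates = [(1.0, 2.0), (0.5, 2.5), (3.0, 0.0), (-1.0, 4.0)]

for theta1, theta2 in candidates:
    x = theta1 + theta2                       # deterministic mapping to the datum x
    print(f"theta1={theta1:+.1f}, theta2={theta2:+.1f} -> x={x:.1f}")
# Every pair above yields x = 3.0, so observing x leaves theta1 and theta2 latent.
```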
A plausible requirement that might deal with this problem could be called
causal isolation. For a variable to be observed, it must not only be related
to the data structure in a deterministic fashion but also be the only causally
relevant structure at work in producing variation in the data structure. This is the
philosophical pendant of the commonly made requirement of unidimensionality
in latent variable modeling. Thus, all the variation present in the data structure
must be uniquely determined by the variable structure. It could be argued, in
a worldview holding that every event has a deterministic cause, that causal
isolation implies determination: If the variable structure is the only causal agent
at work, then there cannot be noise in the data, for such noise would mean
that something besides the variable structure is responsible for part of the variation,
which contradicts causal isolation. In the present work, however, I will not confine myself
to a deterministic worldview, so as to leave open the possibility that there may
be genuinely probabilistic processes in nature. Hence I will treat causal isolation
and determination as distinct requirements.
Are determination and causal isolation sufficient for a variable to be
considered observed? It appears that this is not the case. Consider, for instance,
a Rasch model (Rasch, 1960), which has a unidimensional continuous latent
structure, dichotomous indicators, and a logistic item response function. Now
suppose that the indicators are improved so that whether they take the value 1
or 0 depends on the latent structure only, and that they do so in a deterministic
way. This means that the item response functions become step functions, i.e.,
we get Guttman’s (1950) model. In this case, the conditions of determination
and causal isolation are met. Nevertheless the variable structure cannot be taken
as observed. This is because the structure is much richer than the structure of
the observations: The variable structure is continuous, and therefore contains
infinitely many levels, whereas the data structure has only five possible data
patterns (0000, 1000, 1100, 1110, and 1111). Although inferences of the type
“John, who has pattern 1000, has a lower position on the latent variable than
Jane, who has pattern 1100" are not subject to error, inferences of the type "John
and James both have data pattern 1100, and therefore they have the same position
on the latent variable” cannot be made with certainty; hence, even though certain
aspects of the variable structure are observable here, part of it is still hidden
from our view.
For an observed variable, we need not only determination and causal isolation;
it should also be the case that the number of distinct data patterns is the same
as the number of distinct variable positions occupied by the objects that gave
rise to the data structure. Call this the requirement of equivalent cardinality. In
the case of the Guttman model discussed above, there will ordinarily be more
variable positions than data patterns, which means that the cardinality of data
structure and variable structure is not the same.
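The cardinality problem is easy to exhibit numerically; the sketch below, with hypothetical item locations, evaluates a deterministic Guttman model over many distinct latent positions and recovers only the five admissible data patterns.

```python
# Sketch (item locations are invented): a deterministic Guttman model with four
# items maps a continuous theta onto only five possible data patterns.
import numpy as np

item_locations = np.array([-1.5, -0.5, 0.5, 1.5])   # hypothetical step locations

def guttman_pattern(theta):
    # An item is answered positively iff theta is at or above the item location.
    return "".join("1" if theta >= b else "0" for b in item_locations)

thetas = np.linspace(-3, 3, 1001)                    # many distinct latent positions
patterns = {guttman_pattern(t) for t in thetas}
print(sorted(patterns))   # ['0000', '1000', '1100', '1110', '1111']: only 5 classes
```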
Now suppose the requirements of determination, causal isolation, and equivalent cardinality are met. What does this imply for the relation between the
variable structure and the data structure? First, equivalent cardinality means that
there are as many distinct data patterns as there are distinct variable positions
occupied by the objects that gave rise to the data structure. Second, determination means that the causal chain relating the variable structure to the data
structure is deterministic, so that there is no measurement error or noise. Third,
causal isolation means that the data pattern a given object produces is exclusively
dependent on that object’s position in the variable structure, which precludes
situations where the same data pattern can originate from distinct positions, as
might be the case in a multidimensional variable structure. If these conditions
are met, then this implies that distinct variable positions correspond to distinct
data patterns (the mapping of variable positions into data patterns is injective)
and distinct data patterns correspond to distinct variable positions (the mapping
is surjective). An injective and surjective (hence, bijective) mapping constitutes an isomorphism; in this case, the data structure and variable structure are
isomorphic up to equivalence. Determination and causal isolation ensure that
the isomorphism exists for the right reasons, i.e., that the causal antecedents of
the isomorphism indeed involve the variable measured and do not arise as an
accident (as could, for instance, be the case if a column of data arose through
the tossing of a coin and happened to turn out such that it is isomorphic to that
variable). It seems plausible to me that, in this situation, the variable can be
considered observed.
Thus, in the case of observed variables, the data-gathering process must be
set up in such a way that distinct positions on the variable measured (e.g., being
male versus being female) translate into distinct positions in the data structure
(e.g., 1 or 0 in the relevant column of the data file) with no further residue; as
a result, there is a perfect one-to-one correspondence between the equivalence
classes of the data structure and those of the variable structure. Moreover, this
isomorphism exists “just like that,” i.e., does not require any additional activity
on part of the researcher: This is simply the way the data come in.
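Whether a given coding scheme satisfies this one-to-one correspondence can itself be checked mechanically; the sketch below (with hypothetical codings) tests whether the mapping from variable positions to data patterns is injective and whether every observed pattern is the image of some position.

```python
# Sketch (hypothetical coding schemes): checking whether the mapping from variable
# positions to data patterns is bijective, i.e., an isomorphism up to equivalence.
def is_isomorphic(position_to_pattern, observed_patterns):
    patterns = list(position_to_pattern.values())
    injective = len(patterns) == len(set(patterns))        # distinct positions -> distinct patterns
    surjective = set(observed_patterns) <= set(patterns)   # every data pattern has a variable position
    return injective and surjective

sex_coding = {"male": "1", "female": "0"}                  # deterministic, causally isolated coding
print(is_isomorphic(sex_coding, observed_patterns=["0", "1", "1", "0"]))   # True

guttman_coding = {0.3: "1100", 0.4: "1100"}                # two positions collapse into one pattern
print(is_isomorphic(guttman_coding, observed_patterns=["1100"]))           # False
```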
THE INTERNAL STRUCTURE OF ATTRIBUTES
The notion of observability as conceptualized above does not imply that all of
the relations present in the variable structure are epistemically accessible to the
researcher. For instance, suppose that the variable is quantitative (Michell, 1997)
and that the above conditions hold. In this case, the fact that different variable
positions map into unique data patterns is not enough to utilize the quantitative
nature of the variable in question. This requires additional steps on part of the
researcher; namely, numerical values have to be assigned to the objects that
produced the data patterns that preserve the quantitative structure of the variable
measured. To see why this is so, it is useful to realize that, even when the quite
strong conditions of determination, causal isolation, and equivalent cardinality
hold, the researcher still only has an isomorphism up to equivalence to work
with. As it stands, all that can be done with this is nominal measurement: the
allocation of objects into unordered categories.
Now, it is important to avoid the connotation of arbitrary labeling that
surrounds the idea of a nominal scale. In the present scheme of thinking, there
is no arbitrary labeling going on; whether or not two objects get the same label
is fully determined by the variable structure. Thus, if a psychologist is doing
diagnostic work by labeling some people as depressed and others as normal, this
is quite insufficient to speak of nominal measurement. One can speak of nominal
measurement only if (a) the variable structure is indeed made up of these two
categories, and (b) the psychologist’s diagnostic work (in which the psychologist
may act as the measurement instrument) is of such a character that it leads to data
patterns that are isomorphic with the variable structure (i.e., there is a perfect
correspondence between the variable structure and the data structure). One can
view this as a latent class model without error. Such a model will be quite hard
to realize in practice.
If the data at hand are to support more complicated research practices than
allocation to unordered categories, while the idea of a variable being observed is
to be retained, then the isomorphism between the data patterns and the variable
structure will have to extend beyond the preservation of equivalence classes
required for nominal measurement. The problem of explicating which relations
ought to hold among the objects in the population in order for stronger representations to be constructible has been taken up in the literature on axiomatic
measurement theory (Krantz, Luce, Suppes, & Tversky, 1971; Narens & Luce,
1986).
Note that the concept of homomorphism (many-to-one mapping), as utilized in
axiomatic measurement theory, applies to the relation between the set of objects
and the variable structure (many objects may occupy the same position on the
variable). The notion of observability as defined here requires an isomorphism
(one-to-one and onto mapping), not between the set of objects and the variable
structure, but between the variable structure and the data structure (each position
in the variable structure is uniquely associated with a data pattern in the data
structure and vice versa). Hence, in this sense, the present scheme of thinking is
consistent with axiomatic measurement theory but applies to a different part of
the measurement process. The deterministic nature of the models considered in
the representational measurement literature is congruent with the requirements
that were argued necessary to speak of an observed variable. The different levels
of measurement as introduced by Stevens (1946) and refined in Krantz et al.
(1971) can then be thought of as specifying the detail to which the variable
structure is mapped in the data structure.
One may note that the requirements made here are rather minimalist in the
sense that they do not imply full observability of the internal structure of the
variable. One might, for instance, counter against the presently proposed views
that a variable like length has internal relations (e.g., it sustains transitivity) that
are not necessarily observable under the requirements made here. For instance,
if one assigned randomly selected numbers to objects of different lengths, then
the variable length would still count as observed provided that one does this
consistently, and in a way that leads the numbers to causally depend on length
(i.e., such that objects of the same length receive the same numbers, and objects
of different length receive different numbers). However, the so constructed scale
would not have the properties of the scales we commonly use to measure length
(e.g., preservation of transitive relations, invariance up to multiplication by a
constant). Clearly, if the variable figures in statistical analyses that place stronger
assumptions on this structure (e.g., when the variable is assumed to be linearly
related to some other variable), then more aspects of the variable structure must
be observed, that is, preserved in the data structure.
The reason for not making principled requirements on this score is that the
measurement level is a property of the scale in question, not of the variable
measured. This follows from the fact that one can measure a variable like length
on nominal, ordinal, interval, and ratio levels, so that a measurement level cannot
be uniquely attached to a variable. What one can say, however, is that some
variables allow for different scale levels than others; length is measurable on a
ratio scale whereas sex is not. This can be construed as a dispositional property
of the variable in question (if appropriate methods were followed, the variable
could be measured on a ratio scale level), which has as its base (Rozeboom,
1973) the internal structure of the variable. In this scheme of thinking, a variable
could be said to be quantitative in the sense of Michell (1997) if it sustains such
a dispositional statement; the base of this disposition could then be construed
to lie in the internal structure of the variable, as articulated axiomatically by
Hölder in 1901 for quantitative attributes (Michell & Ernst, 1996, 1997; see also
Michell, 1997, 1999).
It is important to note that the process of scale construction requires the
researcher to do more than just record data patterns; these commonly have to be
assigned function values in order to construct the type of isomorphisms required
for scales stronger than the nominal one. In such cases, the epistemological plane
is partly a function of the knowledge that the researcher has concerning how
to assign function values to different data patterns. In assigning such function
values, the researcher is actively expanding the data structure in order to achieve
a stronger isomorphism.
The methods of extensive measurement (Campbell, 1920; Krantz et al., 1971)
can be thought of as specifying ways to construct data structures for quantitative
variables that form one of the most powerful isomorphisms possible, namely a
ratio scale. Extensive measurement is based on experimental tests of the quantitative structure of the attribute (Michell, 1997, 1999) through concatenation.
Concatenation is the combination of two objects to form a new one; in the
case of length, for instance, by laying two rods end-to-end. When the attribute
measured combines in an additive fashion (so that the new rod’s length equals
the sum of the original rods’ lengths), it is straightforward to form a ratio scale
by repeated application of the concatenation operation (see Campbell, 1920,
p. 180, for a lucid description of this process). Extensive measurement is possible
for those physical attributes that can be concatenated (e.g., mass and length)
but has so far proven inapplicable to psychological attributes such as intelligence and extraversion. An alternative for establishing the quantitative structure
of variables, known as conjoint measurement (Luce & Tukey, 1964), can at
least in theory be used to establish quantitative scales without a concatenation
operation. However, this technique is rarely used in psychology (Michell, 1997,
1999). As a result, it is unknown whether variables that figure prominently in
psychological testing, such as intelligence or personality variables, can be taken
to have quantitative structure.
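The logic of concatenation can be sketched in a few lines; the example below, assuming an idealized additive concatenation operation and an arbitrarily chosen unit rod, assigns ratio-scale values by counting how many copies of the unit, laid end-to-end, match each object.

```python
# Sketch (idealized, invented values): ratio-scale measurement of length by
# counting concatenated copies of a unit rod.
unit = 0.1                                  # hypothetical standard rod (in meters)

def concatenate(lengths):
    # Additive concatenation: laying rods end-to-end sums their lengths.
    return sum(lengths)

def ratio_scale_value(rod_length, unit=unit):
    copies = 0
    while concatenate([unit] * (copies + 1)) <= rod_length + 1e-9:
        copies += 1
    return copies                           # the rod's length in units of the standard

rods = [0.3, 0.6, 1.2]
print([ratio_scale_value(r) for r in rods])   # [3, 6, 12]: ratios are preserved
```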
A quantitative variable structure is isomorphic to the real line (Michell, 1997),
and, of course, representing variables as lines is very common in latent variable
models with continuous variables (e.g., consider factors in a factor model). One
could, indeed, defend the thesis that such models assume that latent variables
have quantitative structure. However, it is important to see that this does not
mean that a variable like general intelligence is on equal footing with a variable
like length. There are at least two reasons for this. First, the requirements to
speak of an observed variable (as defined above) are not met in the case of
general intelligence (in the case of length they are at least approximately met for
middle-sized objects), so that the formation of equivalence classes on the basis
of observations is very hard: We do not know how to establish that two people
are equally intelligent, independently of looking at their test scores (in contrast,
we do know how to establish that two rods are equally long without using a tape
measure, namely by using our naked eye). Second, the quantitative structure of
length is an established fact, directly testable through concatenation, whereas the
quantitative structure of general intelligence is a hypothesis, which has not been
subjected to direct empirical tests (although one could defend the thesis that
such hypotheses are indirectly tested in latent variable modeling; Borsboom &
Mellenbergh, 2004).
Probabilistic Structures and Latent Variable Models
This is where we are now. A set of data patterns can be treated as an observed
variable if (a) the data patterns bear a deterministic causal connection to that
variable, (b) the variable in question is the only cause of variation in the measures,
and (c) the cardinality of the variable structure and data structure is the same. In
this case we have an isomorphism up to equivalence. This establishes nominal
measurement; if stronger scales are to be formed, this requires the demonstration
that stronger relations between the objects that form equivalence classes (e.g.,
greater than) exist and are preserved in function values assigned to these objects
on the basis of the data patterns they generated. This requires a sensible way
of assigning such function values—that is, scale construction. Whether such
stronger representations are possible depends on the internal structure of the
variable (e.g., whether it is quantitative) as well as on the resources of the
researcher (whether the researcher knows how to assign the function values).
The work in the representational theory of measurement (Krantz et al., 1971)
treats this problem in detail.
In cases where we measure a variable, while one or more of the above requirements are violated, we should consider that variable latent. In principle, this
may occur either because determination is violated (as in probabilistic models,
like the Rasch model), because causal isolation is violated (as in multidimensional deterministic models), or because equivalent cardinality is violated (as in
a unidimensional deterministic IRT model, like the Guttman model). The most
extensively studied case is the case in which the relation between the variable
structure and the data structure is probabilistic; hence I will consider this case
in some detail.
When the measurement process is assumed to be probabilistic, this means that
shifting along the levels of the variable affects not the data patterns themselves
but the probability with which they arise. Thus, in this case, the researcher cannot
map the response patterns D to the variable structure. It is, however, possible to
map the probability of the response patterns, P(D), to the variable structure. This
is what most latent variable models currently in use do. The exact mapping is
given by the function that relates the response probabilities to the latent structure
(i.e., the item response function). Ideally, this function should be determined by
substantive considerations on the relation between the latent structure and the
response process (see Tuerlinckx & De Boeck, 2005, or Dolan, Jansen, & Van
der Maas, 2004, for some examples); in practice, however, the choice is often
one of mathematical convenience.
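Under such a model, a latent position is thus mapped to a probability distribution over data patterns rather than to a single pattern; the sketch below illustrates this for a three-item Rasch model with hypothetical item difficulties.

```python
# Sketch (hypothetical item difficulties): under a Rasch model, a latent position
# maps to a probability distribution over data patterns, P(D | theta).
import itertools
import numpy as np

difficulties = np.array([-1.0, 0.0, 1.0])        # assumed item parameters

def pattern_probability(pattern, theta):
    p_correct = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
    probs = np.where(np.array(pattern) == 1, p_correct, 1.0 - p_correct)
    return float(np.prod(probs))                 # local independence: probabilities multiply

theta = 0.5
for pattern in itertools.product([0, 1], repeat=3):
    print(pattern, round(pattern_probability(pattern, theta), 3))
# The eight probabilities sum to 1; the same theta can generate any of the patterns.
```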
To accommodate the fact that in the context of measurement (rather
than, for instance, prediction) the measured variable must have causal relevance
for the observations (see Borsboom, Mellenbergh, & Van Heerden, 2004), a
nondeterministic notion of causality can be adopted. Various schemes for
conceptualizing probabilistic causality exist and may be used for this purpose.
The notions of causality closest in spirit to common practices in latent variable
modeling are those of Pearl (2000) and Spirtes, Glymour, and Scheines (2000),
in which causal relations are represented in graphs, and tested via the conditional
independence relations that they imply. In this literature, many latent variable
models (namely all those that are unidimensional, have multiple indicators, and
satisfy local independence) would be classified as (unobserved) common cause
models. Common cause models are characterized by the fact that the common
cause “screens off” covariation between its effects: If X is the common cause of
Y and Z, then Y and Z must be conditionally independent given X. In the latent
variable modeling literature, essentially the same requirement is known as local
independence (i.e., the indicators are assumed to be statistically independent
conditional on the latent variable).
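The screening-off property can be illustrated with a small simulation; in the sketch below (a binary latent variable and invented response probabilities, chosen purely for simplicity), the two indicators correlate marginally but are approximately uncorrelated within levels of the latent variable.

```python
# Sketch (simulated, invented parameters): a common cause "screens off" the
# covariation between its indicators, i.e., local independence holds.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
theta = rng.integers(0, 2, size=n)               # binary latent variable, for simplicity
p = np.where(theta == 1, 0.8, 0.2)
y = rng.binomial(1, p)                           # indicator 1 depends on theta only
z = rng.binomial(1, p)                           # indicator 2 depends on theta only

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

print("marginal corr(y, z):", round(corr(y, z), 3))                          # clearly positive
print("corr within theta=0:", round(corr(y[theta == 0], z[theta == 0]), 3))  # approximately 0
print("corr within theta=1:", round(corr(y[theta == 1], z[theta == 1]), 3))  # approximately 0
```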
In the situation where the relation between data patterns and the variable
measured is probabilistic, the causal isolation requirement may be satisfied in
modified form. Naturally, variation in the latent variable cannot be the cause
of all variation in the data patterns (otherwise it would bear a deterministic
relation to the data). However, it is often taken to be the only cause of systematic
variation in the data; that is, the probabilities P(D) depend only on the latent
variable. In the latent variable literature, this notion is known as the assumption
of unidimensionality. However, this assumption is not strictly necessary, as
the existence of multidimensional IRT models and multiple factor models with
cross-loadings testifies.
The equivalent cardinality requirement is normally violated in probabilistic
models, because the number of latent variable positions is usually different from
the number of distinct data patterns. For instance, in an IRT model that number
is greater (because the latent variable is continuous, whereas the observations are
categorical), and in a latent profile model it is smaller (because the observations
are continuous, whereas the latent structure is categorical). Also, due to the
probabilistic structure of latent variable models, it is possible that objects with
different positions on the latent variable obtain the same response patterns and
that objects with the same position on the latent variable obtain different response
patterns.
An interesting question that arises on the present viewpoint is how to
determine the scale level of variables in the case of a nondeterministic
measurement structure. In the latent variable modeling literature, it has usually
been assumed that the scale level is determined by the class of transformations
that leave the empirical predictions of the model invariant (i.e., the probabilities
assigned to data patterns; see Fischer & Molenaar, 1995; Perline, Wright, &
Wainer, 1979; see also Kyngdon, 2008; Borsboom & Zand Scholten, in press).
This is in accordance with the fact that latent variable models construct a mapping
between the variable structure and the probability of data patterns, rather than
between the variable structure and the data patterns themselves. So, for instance,
the empirical predictions of the Rasch model are invariant up to linear transformations of the parameters in the model, and these parameters are therefore
considered to lie on an interval scale.
It can be doubted whether this is in keeping with the intended definition
of scale levels, as for instance utilized in Krantz et al. (1971); in these works,
the isomorphism that should be preserved is one between the actual function
values assigned to objects and an empirical relational system in which they play
their part (the empirical relational system could be taken to constitute a variable
structure in the terminology used in this paper). In latent variable models, the
actual function values assigned are model-based estimates. A relevant question
would therefore seem to be under which class of transformations of these actual
function values the isomorphism between the variable structure and the data
structure is preserved. But the immediate answer to this question is “never.”
The reason is that one can preserve an isomorphism only if that isomorphism
is present in the first place. And under a latent variable model, there cannot
be an isomorphism between the data patterns (and hence the function values
assigned to them) and the variable structure; in fact, as has been argued in this
paper, the fact that no such isomorphism exists is exactly what necessitates the
use of a latent variable model. Therefore, the question of what the scale type of
variables is in the case of a nondeterministic measurement structure is, as far as I can
see, open to investigation; in fact, it is not entirely clear to me that the concept
of scale types, interpreted in this particular sense, applies to a nondeterministic
measurement model.
In scientific research, but especially in sciences like psychology, the
assumption that variables are observed is often too strong. In such cases, dropping
this assumption is plausible given the substantive context. However, this is not a
free lunch; it should be noted that dropping the observability assumption saddles
one with serious problems. First, the inference from data patterns to variable
positions is no longer automatic; second, such inferences can only be made under
the assumption that a particular probabilistic model generated the data, and the
number of candidate models is basically infinite, so one has to choose among
them; third, the structure of the latent space could be entirely different from that
of the observations (e.g., the observations are continuous while the latent space
is categorical or vice versa) and there is no easy way of figuring out what it looks
like. Of course, there are many ways of attacking these problems—in fact, how
best to do this is what much of the work in latent variable modeling is about;
relevant topics are model specification, identification, parameter estimation, and
model selection. Nevertheless, it is worth noting that giving up the observability
assumption means that hard work is going to have to be done.
Some Conceptual Problems of Latent Variable Theory
It has been argued here that what differs across situations where one is inclined
to use a latent variable model, and situations where one is not, is the degree to
which one can plausibly assume the inferences from data structure to variable
structure to be free of error. This view is predicated on a categorical distinction
between the structure of data and the structure of variables. In all cases, whether
labeled observed or latent, an inference from data structure to variable structure
is required. This inference has to be based on some theory concerning what is
often called the data-generating process.
Now it is clear that, according to the present scheme of thinking, the primary
causal agents that are supposed to figure in such a theory are variables. This
means that we must grant a serious ontological status to variables; they are
supposed to exist, have some definite structure, and be causally relevant to the
data. However, there are several problems in assigning this role to variables
(some of which have been discussed by Borsboom, Mellenbergh, & Van Heerden,
2003). Most of these problems follow from the abstract character of variables.
It is useful to discuss these issues to show that they are not detrimental to the
theoretical framework proposed here.
The abstract nature of variables. To see why the abstract nature of
variables is problematic, it is useful to note that variables, if they are taken to
exist, are not, properly speaking, localized in space or time. People vary in age,
but although the people to which different ages are ascribed may be localized
in space and time, their ages are not. For instance, on October 1, 2007, at
approximately 12:28 local time, I am writing this sentence at the approximate
coordinates (52°23′ N, 4°55′ E); my age at the present time is approximately
33 years and 10 months. This localization means that a traveler who happened
to walk into my office right now would encounter a human being by the name
of Denny Borsboom. But the traveler will not find Denny Borsboom’s age at
these coordinates. Neither will it help the traveler to continue his journey to my
home address or place of birth. Although this will allow him to pick up various
clues as to the property in question, he will not encounter my age there either.
There is, in fact, no place in the universe where the traveler will stumble across
an object and be able to exclaim, “Aha! Finally, there it is! The age of Denny
Borsboom!”
This conceptual impossibility applies, as far as I can see, to all variables as
utilized in scientific theories. It does not really matter whether these are the
subject of currently accepted scientific theory or not. To be sure, one can say that
the cup of coffee standing at my desk right now has a certain mass. But it would
be strange to say that the cup’s mass is itself located on my desk. The situation
is even more complicated when we consider variables in their generic form,
rather than specific instances of their levels. Although the sort of language abuse
implied by a statement to the effect that the particular mass of this particular
cup of coffee is located on my desk may be considered a proverbial mode of
speech, stating that the variable mass is located somewhere in the universe is
beyond the tolerable bounds of absurdity.
This lack of localization also applies to psychological variables. Extraversion,
general intelligence, spatial ability, attitudes, and self-efficacy are not in people’s
heads. When we open up a person’s head, we find a sort of gray jelly, not
psychological variables like general intelligence. It is to me a somewhat absurd
idea that we may, in the not too distant future, localize general intelligence
in the brain. This has nothing to do with the fact that general intelligence is
a psychological rather than a physical thing. It has to do with the fact that it
is a variable and not a concrete object; as such it has an inherently abstract
character—just like length, age, and volume do—that precludes it being localized
anywhere, and hence it cannot be localized in the brain either. The most that
reductionists can hope for when it comes to the reduction of psychological
variables is that data patterns pertaining to, say, the number of neurons in a
person’s head and data patterns pertaining to the performance on psychological
tests will conform to a unidimensional measurement model, so that both of these
data structures could be taken to depend on the same variable. In that case
the variable that gives rise to variation in psychological test scores would be
identical to the variable that gives rise to variation in, say, counts of the number
of neurons in people’s heads. Note, however, that even in this case the variable
itself would not be in anybody’s head.
This issue should not be confused with the fact that between-subjects attributes
and dimensions are not the same as within-subjects attributes and dimensions, a
point made in, among others, Borsboom, Mellenbergh, and Van Heerden (2003,
2004); Borsboom and Dolan (2006); Cervone (2005); Hamaker, Molenaar, and
Nesselroade (2007); and Molenaar (2004). Within-subjects attributes also refer
to an abstract structure, be it one that describes variation across time points
rather than across subjects. Although we call such dimensions intraindividual or
within-subjects dimensions, this use of language should not be taken to mean that
they are literally inside persons. Strictly speaking, at each time point a person
can only be said to occupy one of the levels of such a variable. Such variables
are person-specific but not inside the person in any physical sense.
The causal relevance of variables. The question that now occurs is how
such an abstract, nonlocal thing as a variable can have causal effects. Causal
effects are often taken to describe a relation between events. One event happens
and then necessitates the occurrence of another event. If a number of such events
are coupled, we speak of a causal chain. A variable, however, is not an event
and hence cannot enter directly in such a causal chain. Thus, a different way of
thinking about this issue has to be found. There are several ways in which this
matter can be construed.
First, one may think about variables as describing structural differences in
properties across different individuals, across time, or both. These properties,
which are not variables but may be seen as levels of a variable, are then attached
to individual objects at a given time point. The structure of a variable derives
from differences in these properties. Thus, in this view, length (in the abstract)
is not a property of an object, but “being 7 inches long” is. Although such a
property is not an event, the measurement procedure does consist of a sequence
of events that lead to a given data pattern, and this sequence may be interpreted as a causal chain. The property of being 7 inches long could be viewed
as a parameter in the model that describes this sequence of events. Because
different objects have different values for this parameter, they get different
data patterns. Depending on the precise nature of the causal chain, these data
patterns may sustain various measurement levels. A causal role for the variable
measured can then be construed, because the measurement procedure sustains
counterfactuals of the type “if this object had been shorter, the measurement
procedure would have led to a different data pattern”; these can, for instance,
be interpreted in terms of possible-world semantics (Lewis, 1973; Kripke,
1980).
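As a toy illustration of this reading, the sketch below (with invented noise and rounding parameters; it describes no actual procedure) treats the object's length as a parameter of the chain of events that produces the recorded data pattern, so that counterfactually varying the parameter yields a different pattern:

```python
import numpy as np

def measure_length(true_length_inches: float, seed: int = 1) -> int:
    """Toy measurement chain: the object's length is a parameter of the process
    that produces the recorded data pattern (a reading in whole eighths of an
    inch, with a little procedural noise)."""
    rng = np.random.default_rng(seed)
    reading = true_length_inches + rng.normal(scale=0.02)  # alignment error
    return round(reading * 8)                              # ruler marked in eighths

actual = measure_length(7.0)
counterfactual = measure_length(6.0)   # "if the object had been shorter ..."
print(actual, counterfactual)          # ... a different data pattern results
```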
A difficulty with this approach is that it reifies the scale level. That is,
in interpreting “being 7 inches long” as an inherent property of an object,
independent of any other objects that exist and independent of the details of the
observational procedure, the scale level is disconnected from our scaling efforts
and viewed as a feature of nature rather than as a result of our activities. I find
it difficult to believe, however, that such a property actually exists independent
from our measurement efforts. Also, the statement is meaningless unless there is a
unit of measurement corresponding to the word inch, and this unit is not absolute
but a matter of convention. Conventions, of course, are not very promising
candidates as ingredients of reality.
A second way of construing the issue is by taking the variable structure as a
primitive and deriving properties, like “being 7 inches long,” from an object’s
place in this structure in conjunction with our scaling activities, including any
conventions that may be coupled to these activities. The advantage of such a
view is that the concept of “place in a structure” is relational by definition. This
means that the same object can occupy a different place in different structures
(for instance, between-subjects structures of individual differences versus within-subjects structures of change in time). Also, there is no need to reify the scale
level. Nevertheless, given that the object does occupy a certain place in the
variable structure, and that a given measurement procedure has been followed,
and that certain scaling conventions are in place, the resulting outcome value
is fixed. Similarly, if the object had occupied a different place, and the same
measurement procedure had been followed, and the same scaling conventions
had been in place, then the resulting outcome value would have been different.
This appears to me a prima facie plausible way to construe the reality and causal
relevance of variables.
To sustain counterfactuals like those proposed above, the measurement
procedure must have an element of lawfulness. The reason is that such
counterfactuals (e.g., “if this object had occupied a different position on the
variable, we would have observed a different measurement outcome”) involve
a thought experiment that considers what would have happened in a world that
is different from the one we inhabit. In order for the outcome of such a thought
experiment to be definite, there has to be a lawful relationship between the
parameter varied (e.g., the mass of the object) and the consequences of such
variation (a different measurement outcome). If there is no such lawfulness, the
outcome of counterfactuals is indeterminate. Thus, measurement as conceptualized here involves a lawful relation between the variable measured and the
measurement outcome. A measurement model can be considered to spell out the
structure of such a law.
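For instance, one familiar way of spelling out such a law, given here only as a generic illustration and not as the particular model this argument requires, is the two-parameter logistic IRT model, in which the probability of the data pattern value X_{ij} = 1 is a fixed function of object i's position \theta_i on the variable:

P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}.

With the item parameters a_j and b_j held fixed, the counterfactual "if \theta_i had been different, the probability of this data pattern would have been different" has a definite answer, which is exactly what the lawfulness requirement demands.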
There is an important consequence of this analysis. Namely, for data patterns
to count as measures of some variable structure, there must be a causal law
that connects positions in the variable structure with the values of measurement
outcomes; however, that law does not describe a causal system at work in the
individual objects subjected to the measurement process. The causal system
consists of variables connected by parameters; the individual objects occupy
places in the variable structure that pertains to them. But, just like variables
themselves are not in the individuals that occupy their levels, the causal system
is not inside the individuals measured. What is required from the object measured
is that it behaves in accordance with the causal system, not that the object
contains, or is itself, the causal system. Exactly the same holds for intraindividual
variation and the causal relations that govern it. The system describes how the
individual varies over time; it is true of the individual but is not located in the
individual.
Pragmatics and the context of explanation. Causal relations are tied to
explanations. When we ask why something happened, we expect an answer that
subsumes the event under a set of general causal laws that explain its occurrence.
There is a natural tendency to ask of such an explanation that it is true. Truth,
however, is exclusive in the sense that most people believe that there cannot be
two distinct true causal explanations of a phenomenon. Thus, if a true explanation
of a phenomenon has been given, there is no room left for another one.
This view, however, leads to problems that result from the pragmatic character
of why-questions (Van Fraassen, 1980). Consider the following example. As the
phenomenon to be explained, we take a penalty missed by Dutch soccer player
Frank de Boer during the penalty shootout in the semifinal of the 2000 European
Championship. Suppose that we ask why De Boer missed this penalty. Is there
a single correct causal explanation that answers this question? A moment’s
reflection shows that this is not the case. To see why this is so, consider the
following specifications of the question: (1) Why did Frank de Boer miss this
penalty (rather than a given other penalty), and (2) why did Frank de Boer miss
the penalty (rather than other players in the shootout like, say, Patrick Kluivert,
who scored)?
Both of these questions are perfectly bona fide requests for a causal explanation, but the answers given need not be the same. Any answer to question
1 will seek to delineate the circumstances that set apart this penalty from the
many others taken by De Boer. For instance, in this case he had already missed
a penalty earlier in the match, which is likely to be included in the required
explanation. The answer to question 2, in contrast, will seek to delineate the
differences between Kluivert and De Boer at that particular moment in time; say,
Kluivert was handling the pressure better than De Boer. These two explanations
introduce distinct dimensions of variation: having missed a previous penalty
versus not having missed a previous penalty for question 1, and being able to
handle the pressure versus not being able to handle the pressure for question 2.
Thus, the causal systems these answers invoke do not include the same variable
structures.1 Nevertheless, they are plausible causal explanations that involve one
and the same event.

1 One may think that these explanations may be merged into one by introducing the additional
hypothesis that De Boer was handling the pressure less well (answer 1) because he had missed a
penalty before whereas Kluivert had not (answer 2). This, however, does not work because it so
happens that Kluivert had also missed a penalty earlier in the match.
The same situation occurs in psychological measurement. Suppose that John
has correctly answered an item in an IQ test, and we ask for the explanation
of this event. When we consider the question why John answered the item
correctly, while Jane did not, we will make reference to dimensions of individual
differences between John and Jane. But when we ask why John answered the
item correctly this time, while he failed it a few years ago, we will invoke John’s
pattern of development over time. These two distinct causal stories may both be
true, even though they explain one and the same event.
We may conclude that the occurrence of a data pattern in itself does not
have a unique causal explanation; the question “why did data pattern D arise?”
is ambiguous because it does not articulate a contrast class of alternatives.
Is this a problem for the presently articulated view? I suggest it is not. The
reason for this is that the type of causal relations that is important in a
measurement context should depend on the specification of a contrast class of
alternatives because that class of alternatives is constitutive of the variable to be
measured.
To see this, consider the following example. A match is lit in a wooden house,
and this causes the house to burn down. There are many causal stories to be
told about this situation depending on the contrast classes of alternatives that
are chosen. With respect to a class of alternative wooden houses in which no
match was lit, lighting versus not lighting the match is identified as the causally
relevant variable. But with respect to a class of alternative houses in which a
match was also lit, but where the houses did not burn down due to the fact that
they were built of concrete rather than wood, the causally relevant variable is
not whether the match was lit but whether the house was made out of concrete
or wood. So, in the first population of houses, the observable data patterns
house burned down/not burned down can be considered to measure the variable
match was lit/match was not lit. In the second population of houses, the same
data patterns can be considered to measure the variable house was made of
wood/house was made of concrete. Finally, if the intention is to measure change
over time, we may consider the observation that the house burned down at time t
as an indicator of a transition that took place at an earlier time t’ (i.e., the match
was lit).
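A minimal simulation sketch of the two populations just described (all numbers hypothetical) makes the role of the contrast class explicit: the same causal rule and the same observed pattern, burned versus not burned, track the match in one population and the building material in the other.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

def burned(match_lit, wooden):
    # Same causal rule in both populations: a house burns down
    # if a match is lit in it and it is made of wood.
    return match_lit & wooden

# Population 1: every house is wooden; whether a match is lit varies.
match_1 = rng.integers(0, 2, size=n)
wood_1 = np.ones(n, dtype=int)
print(np.corrcoef(burned(match_1, wood_1), match_1)[0, 1])  # 1.0: tracks the match

# Population 2: a match is lit in every house; the building material varies.
match_2 = np.ones(n, dtype=int)
wood_2 = rng.integers(0, 2, size=n)
print(np.corrcoef(burned(match_2, wood_2), wood_2)[0, 1])   # 1.0: tracks the material
```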
Thus, the same data patterns can measure different variables depending on
the contrast class of alternatives that they attach to. By picking out a given
contrast class, the researcher specifies the domain of variation of the attribute to
be measured through a given measurement procedure. Thus, any single observed
data pattern is “polygamous” in that it can serve as a measure of distinct variable
structures depending on the research context.2 This explains, for instance, why
it is possible to use one and the same test score (say, a person’s performance
on a digit span test) as an indicator of individual differences in one context
(e.g., when one studies the factorial structure of working memory capacity tests
by examining their covariance structure), as an indicator of the effect of experimental manipulations in another (e.g., when one studies the effect of interference
on test performance by inspecting mean differences across conditions), and as an
indicator of changes in cognitive functioning in studying development (e.g., when
one examines a time series of repeated administrations of the test for the same
person at different times).

2 This should not be taken to mean that, to some degree, everything measures everything else,
as for instance a correlational conception of validity would imply (Borsboom, Mellenbergh, & Van
Heerden, 2004). When treating an observed data pattern as a measure of a variable, that variable has
to be causally relevant to the occurrence of the data pattern. However, the example does show that
what can be taken as causally relevant depends on the selection of a contrast class, i.e., is dependent
on the comparison in which the data pattern figures; and this may vary over studies depending on
the details of the investigation.
In psychological research, the most relevant domains of variation are variation
over time, variation over people, and variation over situations. It is important
in empirical research that the utilized domain of variation matches the purposes
of the researcher. For instance, one should not expect data on interindividual
differences to be informative of the structure of intraindividual processes unless
there is an explicit rationale to justify such an expectation (which may involve
very strong assumptions; see Molenaar, 2004). Nor should one take the stability
of individual differences in, say, personality test scores over time to be indicative
of consistency of behavior over different situations (Mischel, 1968). In general,
one should be very careful when making inferences to domains of variation that
were not themselves sampled in the research setup.
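The following hypothetical simulation, in the spirit of the ergodicity argument in Molenaar (2004) but not taken from any of the cited papers, shows how severe such a mismatch can be: two variables whose person means are strongly positively related across persons are nevertheless negatively related within each person over time.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_times = 200, 200

# Between-person part: two trait means that are positively related across persons.
trait = rng.normal(size=n_persons)
mu_x, mu_y = 2 * trait, 2 * trait

# Within-person part: occasion-to-occasion fluctuations that are negatively related.
e = rng.normal(size=(n_persons, n_times))
x = mu_x[:, None] + e
y = mu_y[:, None] - e + 0.3 * rng.normal(size=(n_persons, n_times))

# Interindividual domain: correlation of person means is strongly positive.
print(np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1])

# Intraindividual domain: within a single person over time it is strongly negative.
print(np.corrcoef(x[0], y[0])[0, 1])
```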
IMPLICATIONS FOR PSYCHOLOGICAL RESEARCH
Latent variable modeling is rising in popularity but is still not a standard
tool in many areas of psychology. This continues to surprise me:
One would expect that, in a field as plagued by measurement problems as
psychology, latent variable modeling would be commonly used to get at least
some grip on the relation between the data and the variables
one intends to measure, if only to determine whether one can get the job done
without a latent variable model (for instance, because under reasonable model
assumptions the sumscore is good enough for research purposes). But this is not
the case. Instead, researchers use all sorts of procedures to construct variables,
which are subsequently treated as observed; and these procedures involve an
awkward number of arbitrary decisions and unclear assumptions (e.g., see
Borsboom, 2006).
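As a rough sketch of the kind of check alluded to above, namely whether the sumscore is good enough under reasonable model assumptions, the following hypothetical example generates data from a simple Rasch-type model and verifies that the unweighted sumscore tracks the latent variable closely; the Rasch form, the item difficulties, and the sample size are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_persons, n_items = 2000, 20

theta = rng.normal(size=n_persons)                    # latent variable
b = np.linspace(-2, 2, n_items)                       # hypothetical item difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))  # Rasch-type response probabilities
responses = (rng.uniform(size=p.shape) < p).astype(int)

sumscore = responses.sum(axis=1)
print(np.corrcoef(sumscore, theta)[0, 1])             # typically around .9 or higher
```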
To give one example, in psychology, the task of specifying a structure
for psychological attributes, or of testing hypotheses concerning that structure,
is not widely perceived as a challenge for empirical researchers. Michell
(1997, 1999) has called attention to this problem by exposing the fact that
psychological attributes like extraversion or intelligence are often considered
to have quantitative structure, even though no serious theoretical motivation or
empirical backup for this assumption exists. The converse problem, which is that
attributes are assumed to be categorical where they might as well be continuous,
occurs frequently in psychiatric research; there, the categorical structure of
the diagnostic categories in the Diagnostic and Statistical Manual of Mental
Disorders (American Psychiatric Association, 1994) is often unproblematically
equated with the structure of the mental disorders thought to underlie them.
In both cases, the choice for the structure of psychological variables appears
to be mainly a function of historically determined conventions that have little
empirical or theoretical support. Now, there exist some papers on the problem
of distinguishing between different latent variable structures (Molenaar & Von
Eye, 1994; De Boeck, Wilson, & Acton, 2005; Waller & Meehl, 1998; Maraun,
Slaney, & Goddyn, 2003), but these represent relatively isolated efforts by
methodologists and do not play a part in a coordinated massive attack by
the psychological research community. One would expect—naively perhaps—
that determining the structure of one’s central theoretical terms (e.g., mental
disorders) is a matter of monumental importance for any science. But, judging
from where the research activity in psychology concentrates, apparently this is
not a widely shared conviction.
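As a crude illustration of the empirical question at stake, and emphatically not a substitute for the taxometric and model-comparison procedures cited above, one can at least ask whether a data set is better described as one homogeneous population or as a mixture of latent classes. The sketch below, on entirely hypothetical "symptom" scores, compares Gaussian mixture models with one, two, and three components by BIC; a serious analysis would also include theoretically motivated continuous (e.g., factor) alternatives.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# Hypothetical symptom scores generated from two latent classes.
class_a = rng.normal(loc=0.0, scale=1.0, size=(700, 4))
class_b = rng.normal(loc=2.0, scale=1.0, size=(300, 4))
X = np.vstack([class_a, class_b])

# Compare one-, two-, and three-class descriptions; lower BIC is preferred
# (here the two-class description should win, matching the generating structure).
for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, round(gm.bic(X), 1))
```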
In fact, I do not think that many psychological researchers worry about
measurement problems; it actually seems that very few realize what their
magnitude really is. The reason for this is that psychology is in the grip of
a rather awkward form of operationalism. The general feeling appears to be
that if one constructs the data file in such a way that it contains numbers, and
these numbers are run through the most popular analyses today—like analysis
of variance or principal components analyses—then conclusions that concern
these numbers unproblematically generalize to the psychological attributes that
the researcher is interested in. In such procedures, psychological attributes are
equated with, or assumed to be isomorphic to, the numbers in the data file.
It is obvious that the assumption here is that the researcher is dealing with
observed variables as defined earlier in this paper. It is also obvious, however,
that this assumption is far too strong for most psychological measurement
procedures.
This is of course not meant to imply that every researcher should use a latent
variable model in every type of research; in many cases, simple functions of
the data patterns (like the sumscore) may be fine for the purposes at hand. It is
meant to imply that a researcher who cannot plausibly argue that he or she is
dealing with observed variables should be aware that the attributes
measured do not automatically conform to the way the numbers are constructed
in a data file; hence, that there is a problem; hence, that something should be
done about it. What that something is—e.g., making an all-out modeling effort,
or estimating the robustness of observed variable techniques under plausible
modeling assumptions—depends on the research context.
The present investigations do suggest that the onus of proof lies with the
researcher who wants to assume that his or her variables are observed. Presently,
this is not the case: Researchers use observed variable techniques, except when
there are exceptional circumstances that lead them to use latent variable models.
It would seem rather more plausible that the researcher proceeds in the opposite
direction: using latent variable techniques unless there are circumstances that
justify or necessitate treating psychological attributes as observed. As has been
argued in this paper, the researcher who treats variables as observed is making
some very strong assumptions about the quality of the measurement procedures
that have been utilized. If such assumptions lack justification, which would seem
to be the rule rather than the exception in psychology, this is theoretically inadequate (although not necessarily practically inadequate). Thus, the conclusion
that can be drawn from the analysis presented here is simple: A psychological
variable is latent until proven observed.
DISCUSSION
In this paper, an attempt was made to construct the conceptual foundations for
latent variable modeling under the more general heading of latent variable theory.
It was argued that there is no reason to make an ontological distinction between
latent and observed variables; hence, all variables are ontologically on a par. What
differs between situations where one treats variables as latent or observed is the
degree to which the researcher assumes variable structures to be epistemically
accessible. To treat variables as observed is to assume full accessibility; that is,
inferences from data to variable structure are assumed to be without error. Such
accessibility requires that the causal process that gives rise to variation in data
patterns is deterministic, that the variable measured is causally isolated in the
sense that it is the only variable at work in producing variation in data patterns,
and that the number of distinct patterns in the data equals the number of levels
of the variable measured.
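A small sketch, using an invented four-level variable, may help to fix these accessibility conditions: when the relation between variable and data pattern is deterministic, causally isolated, and of matching cardinality, the inference back to the variable level is exact, whereas introducing noise, a second causally relevant variable, or a cardinality mismatch makes it error-prone.

```python
import numpy as np

rng = np.random.default_rng(6)
levels = rng.integers(0, 4, size=10_000)        # true variable with 4 levels

# Observed case: deterministic, isolated, one distinct pattern per level.
patterns = levels.copy()                        # identity mapping
print((patterns == levels).mean())              # 1.0: inference is exact

# Violation (a): the relation is stochastic.
flip = rng.uniform(size=levels.size) < 0.2
noisy = np.where(flip, rng.integers(0, 4, size=levels.size), levels)
print((noisy == levels).mean())                 # < 1.0: inference prone to error

# Violation (b): another variable also shapes the pattern (no causal isolation).
nuisance = rng.integers(0, 2, size=levels.size)
mixed = levels + nuisance
print(np.unique(mixed[levels == 1]))            # patterns 1 and 2 both occur for level 1

# Violation (c): cardinality mismatch (4 levels mapped onto only 2 patterns).
coarse = (levels >= 2).astype(int)
print(len(np.unique(coarse)), "patterns for", len(np.unique(levels)), "levels")
```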
If one or more of these assumptions are violated, then the inference from
data patterns to variable structure is prone to error. The variable the researcher
intends to measure is then to be conceptualized as a latent variable. In setting
up a model for such a situation, the researcher faces the problem of specifying
the structure of the variable in question as well as the function that relates this
structure to the variation in data patterns. These choices are ideally made on
substantive grounds, although in practice this is seldom the case. With regard
to the choice of form for the latent variable structure, there appears to be a
strong influence of one’s statistical upbringing; for instance, those who are
accustomed to working with factor models seem to conceptualize theoretical
constructs as continuous dimensions more or less automatically. Indeed, one
sometimes wonders whether psychologists are sufficiently aware that
psychological attributes need not behave as linearly ordered dimensions
(e.g., like the factors in a factor model) and that making this assumption
when it is false may seriously distort the interpretation of research
findings. With regard to the choice of the function that relates the observations to
the latent variable structure, mathematical convenience appears to be a primary
determinant; for instance, assuming a logistic function in an IRT model, or
assuming linearity in a factor model, enables standard parameter estimation
procedures and widely available software to be used.
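For reference, the two conventional choices mentioned here can be written down explicitly; these are the standard textbook forms, not anything specific to the present argument. The linear one-factor model is

x_{ij} = \tau_j + \lambda_j \theta_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_j^2),

and the logistic IRT model takes P(X_{ij} = 1 \mid \theta_i) to be a logistic function of a_j(\theta_i - b_j), as displayed earlier. Both functional forms are mathematically convenient, but neither follows from substantive theory about the attribute being measured.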
Although one should not downplay the importance of such practical concerns,
I think that psychology stands to gain considerably if more attention is devoted
to the substantive underpinnings of such modeling assumptions. The reason
for this is not so much technical as theoretical: Thinking about the relation
between a psychological attribute and the data patterns that are supposed to
measure it forces a deeper investigation into the nature of the attribute and the
way the measurement instrument is supposed to work. It requires one to spell
out, at least at a very coarse level, why one is justified in treating the data
patterns as measurements; i.e., it gives one the beginnings of an argument for the
validity of the measurement instrument used. Such arguments are badly needed
in psychological measurement.
Unfortunately, in many areas of psychology and the social sciences,
the construction of measurement models is not among
researchers’ favorite activities. Also, there is a widespread trust in the representational power of the numbers that happen to pop up in the data files researchers
feed to statistical computing programs. Rarely is there an explicit recognition
that these numbers may not actually represent theoretical attributes very well.
Of course, with regard to the theoretical attributes hypothesized in the social
sciences (e.g., fearfulness, depression, intelligence), we know very little, and
this greatly complicates the informed construction of a measurement model.
On the other hand, when it comes to the representation of such attributes in
the data files used in empirical research, we can be quite certain about one
thing: In this particular area of science, observed variables probably do not
exist.
ACKNOWLEDGMENTS
I would like to thank Conor Dolan for his comments on an earlier draft of this
paper. This research was supported by NWO innovational research grant no.
451-03-068.
REFERENCES
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders
(4th ed.). Washington, DC: American Psychiatric Publishing.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. London: Griffin.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability.
In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479).
Reading, MA: Addison-Wesley.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of
Psychology, 53, 605–634.
Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics.
Cambridge: Cambridge University Press.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
Borsboom, D., & Dolan, C. V. (2006). Why g is not an adaptation: A comment on Kanazawa.
Psychological Review, 113, 433–437.
Borsboom, D., & Mellenbergh, G. J. (2004). Why psychometrics is not pathological: A comment on
Michell. Theory & Psychology, 14, 105–120.
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent
variables. Psychological Review, 110, 203–219.
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological
Review, 111, 1061–1071.
Borsboom, D., & Zand Scholten, A. (2008). The Rasch model and additive conjoint measurement
theory from the perspective of psychometrics. Theory & Psychology, 18, 111–117.
Campbell, N. R. (1920). Physics, the elements. Cambridge: Cambridge University Press.
Cervone, D. (2005). Personality architecture: Within-person structures and processes. Annual Review
of Psychology, 56, 423–452.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin, 52, 281–302.
De Boeck, P., Wilson, M., & Acton, G. S. (2005). A conceptual and psychometric framework for
distinguishing categories and dimensions. Psychological Review, 112, 129–158.
Dolan, C. V., Jansen, B. R. J., & Van der Maas, H. L. J. (2004). Constrained and unconstrained normal
finite mixture modeling of multivariate conservation data. Multivariate Behavioral Research, 39,
69–98.
Embretson, S. E., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Fischer, G. H., & Molenaar, I. W. (1995). Rasch models: Foundations, recent developments, and
applications. New York: Springer.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.
Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stoufer, L. Guttman, E. A. Suchman,
P. L. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Studies in social psychology in World War
II: Vol. IV. Measurement and prediction (pp. 60–90). Princeton, NJ: Princeton University Press.
Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and applications.
Boston: Kluwer-Nijhoff.
Hamaker, E. L., Nesselroade, J. R., & Molenaar, P. C. M. (2007). The integrated trait-state model.
Journal of Research in Personality, 41, 295–315.
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement (Vol. 1).
New York: Academic Press.
Kripke, S. A. (1980). Naming and necessity. Oxford: Blackwell.
Kyngdon, A. (2008). The Rasch model from the perspective of the representational theory of
measurement. Theory & Psychology, 18, 89–109.
Lawley, D. N., & Maxwell, A. E. (1963). Factor analysis as a statistical method. London: Butterworth.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Lewis, D. (1973). Counterfactuals. Oxford: Blackwell.
Lubke, G. H., & Muthén, B. (2005). Investigating population heterogeneity with factor mixture
models. Psychological Methods, 10, 21–39.
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental
measurement. Journal of Mathematical Psychology, 1, 1–27.
Maraun, M., Slaney, K., & Goddyn, L. (2003). An analysis of Meehl’s MAXCOV-HITMAX
procedure for the case of dichotomous indicators. Multivariate Behavioral Research, 38, 81–112.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Mellenbergh, G. J. (1994). Generalized Linear Item Response Theory. Psychological Bulletin, 115,
300–307.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103).
Washington, DC: American Council on Education and National Council on Measurement in
Education.
Michell, J., & Ernst, C. (1996). The axioms of quantity and the theory of measurement: Part I, an
English translation of Hölder (1901). Journal of Mathematical Psychology, 40, 235–252.
Michell, J., & Ernst, C. (1997). The axioms of quantity and the theory of measurement: Part II, an
English translation of Hölder (1901). Journal of Mathematical Psychology, 41, 345–356.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British
Journal of Psychology, 88, 355–383.
Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept.
Cambridge: Cambridge University Press.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Molenaar, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time series.
Psychometrika, 50, 181–202.
Molenaar, P. C. M. (2004). A manifesto on psychology as ideographic science: Bringing the person
back into scientific psychology, this time forever. Measurement, 2, 201–218.
Molenaar, P. C. M., & Von Eye, A. (1994). On the arbitrary nature of latent variables. In A. von
Eye & C. C. Clogg (Eds.), Latent variables analysis (pp. 226–242). Thousand Oaks: Sage.
Moustaki, I. (1996). A latent trait and a latent class model for mixed observed variables. British
Journal of Mathematical and Statistical Psychology, 49, 313–334.
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391–411.
Narens, L., & Luce, R. D. (1986). Measurement: The theory of numerical assignments. Psychological
Bulletin, 99, 166–180.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, England: Cambridge
University Press.
Perline, R., Wright, B. D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement.
Applied Psychological Measurement, 3, 237–255.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech
recognition. Proceedings of the IEEE, 77, 257–286.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen:
Paedagogiske Institut.
Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis.
Applied Psychological Measurement, 14, 271–282.
Rozeboom, W. W. (1973). Dispositions revisited. Philosophy of Science, 40, 59–74.
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand
Oaks: Sage.
Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, prediction, and search. Cambridge,
MA: MIT Press.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 667–680.
Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter.
Psychometrika, 70, 629–650.
Van Fraassen, B. C. (1980). The scientific image. Oxford: Clarendon Press.
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov models to
psychological data. Scientific Programming, 10, 185–199.
Waller, N. G., & Meehl, P. E. (1998). Multivariate taxonometric procedures. Thousand Oaks: Sage.