
STRUCTURAL EQUATION MODELING, 9(2), 213–232
Copyright © 2002, Lawrence Erlbaum Associates, Inc.
Fit of Different Models for
Multitrait–Multimethod Experiments
Irmgard W. Corten and Willem E. Saris
Department of Methods and Techniques
University of Amsterdam
Germà Coenders
Department of Economics
University of Girona
William van der Veld, Chris E. Aalberts, and Charles Kornelis
Department of Methods and Techniques
University of Amsterdam
In the past, several models have been developed for the estimation of the reliability
and validity of measurement instruments from multitrait–multimethod (MTMM) experiments. Suggestions have been made for additive, multiplicative, and correlated
uniqueness models, and recently Coenders and Saris (2000) proposed a procedure to test these models against one another. In this article, the different models suggested for the analysis of MTMM matrixes have been compared for their fit to 87 data
sets collected in the United States (Andrews, 1984; Rodgers, Andrews, & Herzog,
1992), Austria (Költringer, 1995), and the Netherlands (Scherpenzeel & Saris,
1997). As most variables are categorical, the analysis has been carried out on the basis of polychoric–polyserial correlation coefficients and of Pearson correlations. The
fit of the models based on polychoric correlations is much worse than the fit of models based on product moment correlations, but in both cases a model that assumes additive method effects fits most data sets better than the other models, including the
so-called multiplicative models.
Campbell and Fiske (1959) proposed assessing the validity of measures by means
of a comparison of correlations in a multitrait–multimethod (MTMM) matrix. This
Requests for reprints should be sent to Willem E. Saris, University of Amsterdam, Binnengasthuis, Oudezijds Achterburgwal 237, 1012 DL Amsterdam, The Netherlands. E-mail: [email protected]
matrix is the result of an MTMM experiment, in which multiple traits are measured
by several methods. Their procedure has been frequently used in determining the
validity of measures, but many authors, beginning with Althauser, Heberlein, and
Scott (1971), have pointed to a number of shortcomings in the analysis of the
matrices. Alternative procedures for evaluating MTMM data (e.g., confirmatory
factor analysis [CFA] models) have been proposed by several researchers (Alwin,
1974; Andrews, 1984; Jöreskog, 1971; Kenny, 1976).
In the CFA approach to MTMM data, each observed variable loads on one trait
factor and one method factor. These method factors represent the variation due to
the method used. It is usually assumed that the error terms are uncorrelated and
that method factors are not correlated with trait factors. The latter assumption allows the variance to be decomposed into trait variance, method variance, and random error variance in order to assess measurement quality.
Widaman (1985) developed a taxonomy of hierarchically nested CFA models
for MTMM data with or without method factors and with or without correlations
among trait factors and among method factors. Marsh (1989) claimed that these
CFA models of MTMM data often led to ill-defined solutions and he expanded
Widaman’s taxonomy with the correlated uniqueness (CU) model. In this model
there are no method factors; instead, method effects are represented as correlated
errors. When the number of traits is three, the CU model is equivalent to the CFA
model in which method factor correlations are assumed to be zero. When the number of traits is greater than three, the CFA model is nested within the CU model.
Marsh and Bailey (1991) found that the CU model almost always led to proper solutions.
The early CFA models specified additive method effects. When method effects
are additive, the size of the method effects does not vary across traits. According to
Andrews (1984) and Kumar and Dillon (1992), additive effects can be expected
when methods consist of category labels or scale lengths in questionnaires. Individuals may differ in the way they use a certain response scale, and it may be expected that the use of such a scale by a single individual is more or less constant
across traits.
As early as 1967, Campbell and O’Connell suggested that the method effect
could also be multiplicative; that is, the method correlation would be proportional
to the correlations between the traits of interest. Processes that can lead to multiplicative effects include, for example, differential augmentation and differential attenuation (Campbell & O’Connell, 1967, 1982). According to Bagozzi and Yi
(1991) and Campbell and O’Connell (1967), multiplicative method effects are
quite common when raters are used as methods. Coenders, Satorra, and Saris
(1997) suggested that ordinal measurement also leads to multiplicative method effects because the categorization errors of closely related variables can be correlated. The direct product (DP) model was developed by Browne (1984) for analyzing multiplicative method effects.
Coenders and Saris (2000) showed that the CU model, which is an additive
model, can be used to specify both additive and multiplicative method effects by
imposing certain restrictions on the correlated errors. In fact, they showed that, for
designs with three methods, a CU model with multiplicative constraints is locally
equivalent (Luijben, 1989) to the DP model (Coenders & Saris, 1998), whereas a
CU model with additive constraints is locally equivalent to certain constrained versions of the CFA model. They also showed that the CU model can be used to test
for additive and multiplicative method effects, by testing the suggested constraints.
In this article we focus on the fit of the general and constrained CU models. This is
done by means of the percentage of nonconvergence and improper solutions, and the
standardized root mean square residual (RMSR), adjusted for the number of free parameters. Before doing so, the CU model and the constraints used for defining additive and multiplicative method effects are described in the following sections.
CORRELATED UNIQUENESS MODEL
In this section, we follow the formulation of Coenders and Saris (2000). The CU
model (Kenny, 1976; Marsh, 1989; Marsh & Bailey, 1991) belongs to the family of
factor analysis models. The model is specified as
xij = λij ξi + δij    ∀ i, j    (1)
where xij is the measurement of Trait I with Method J, expressed in deviations from
the mean; δij is the random measurement error plus method effect component for
xij; ξi are the standardized trait factors with correlations ϕii′; and λij is the loading of
xij on ξi (when standardized and squared it can be interpreted as a measurement
quality indicator).
The specification includes the conventional assumption of no correlation between trait factors and error terms:
cov(δij ξi′) = 0    ∀ i, j, i′    (2)
where i, i′ identify the traits, and j, j′ the methods. In all equations in this article, i
may be equal to i′ and j may be equal to j′, unless explicitly stated to the contrary.
Covariances among error terms corresponding to pairs of variables measured
with the same method (monomethod error covariances) constitute unrestricted
model parameters, represented as cov(δij δi′ j ). The inclusion of such parameters is
a very straightforward manner of accounting for method effects. Covariances
among error terms corresponding to pairs of variables measured with two different
methods (heteromethod error covariances) are constrained to be zero:
cov(δij δi′j′) = 0    if j ≠ j′    (3)
If the cov(δijδi′j′) are not equal to zero in the population, then the model is
misspecified. This could be seen as a possible drawback of the CU model. In some
circumstances, some of the methods of measurement are so similar that this assumption cannot be expected to hold (de Wit, 1994). Yet, this assumption has
rarely been successfully relaxed in applied research: The DP model and most versions of the CFA models also make this assumption. Relaxation of the assumption
often leads to nonconvergence, inadmissibility and problems of
underidentification (Andrews, 1984; Bagozzi, 1993; Bagozzi & Yi, 1991; Marsh
& Bailey, 1991; Saris, 1990a). Furthermore, even with heteromethod error
covariances present, when fitting models which assume zero heteromethod error
covariances, the bias of the estimates is fairly minor (Saris, 1990b; Scherpenzeel,
1995), at least if method effects are not too strong. Marsh (1989) and Marsh and
Bailey (1991) went further to show that, even in the cases in which the CU model is
misspecified due to heteromethod error covariances, the CU estimates are closer to
the population parameter values than the estimates of the correctly specified but
highly unstable CFA model in which the assumption of uncorrelated heteromethod
covariances has been relaxed. The comparatively better performance of the CU
model is due to the fact that the lower sampling variability of the CU estimates outweighs the bias arising from the violation of the model’s assumptions. Therefore,
Coenders and Saris (2000) concluded that the restrictive assumptions of the CU
model are not so serious and are more than compensated for by the model’s lack of
practical problems. Marsh (1989) and Marsh and Bailey (1991) reported that the
CU model rarely leads to problems of empirical underidentification, failure to converge, or inadmissible estimates.
Figure 1 shows a path diagram of the CU model for a design with t = 3 traits and m =
3 methods. The variables measured with the same method appear together in the diagram. Each variable is only affected by one trait factor and an error term. The
covariances for the pairs of variables measured with the same method account for the
method effects. The CU model is scale invariant according to the definition of Cudeck
(1989) and can be shown to lead to the following implied variances and covariances:
var(xij) = λij² + var(δij)
cov(xij xi′j′) = λij φii′ λi′j′    if i ≠ i′ and j ≠ j′
cov(xij xi′j) = λij φii′ λi′j + cov(δij δi′j)    if i ≠ i′
cov(xij xij′) = λij λij′    if j ≠ j′
(4)
where var(xij ) is a variance, cov(xij xi′j′ ) is a heterotrait–heteromethod covariance,
cov(xij xi′j) is a heterotrait–monomethod covariance, and cov(xij xij′) is a
monotrait–heteromethod covariance. The terms cov(δij δi′j), which are added to the
heterotrait–monomethod covariances, generally make them larger than the corresponding heterotrait–heteromethod covariances. The equation also decomposes
the variance of the xij variables into trait variance (λij²) and error variance (var(δij)),
the latter including both random error and method variance.

FIGURE 1  Path diagram of the CU model for three traits and three methods.
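As a numerical illustration of Equation 4, the following sketch builds the implied 9 × 9 covariance matrix for a three-trait, three-method CU design. All parameter values (loadings, trait correlations, and the constant monomethod error covariance of 0.10) are made up for illustration and are not estimates from the article's data; numpy is assumed.

```python
import numpy as np

t, m = 3, 3
lam = np.array([[0.8, 0.7, 0.9],      # lam[i, j]: loading of x_ij on trait i
                [0.9, 0.6, 0.8],
                [0.7, 0.8, 0.7]])
phi = np.array([[1.0, 0.5, 0.4],      # trait correlations phi_ii'
                [0.5, 1.0, 0.3],
                [0.4, 0.3, 1.0]])

# One t x t block of error (co)variances per method: equal off-diagonal
# monomethod error covariances, and diagonals chosen so each variable
# has unit variance (purely for readability of the result).
theta = []
for j in range(m):
    block = np.full((t, t), 0.10)
    np.fill_diagonal(block, 1.0 - lam[:, j] ** 2)
    theta.append(block)

# Build the implied covariance matrix of Equation 4.
# Variables are ordered method by method: x_11, x_21, x_31, x_12, ...
sigma = np.zeros((t * m, t * m))
for j in range(m):
    for jp in range(m):
        for i in range(t):
            for ip in range(t):
                val = lam[i, j] * phi[i, ip] * lam[ip, jp]
                if j == jp:               # monomethod: add error (co)variance
                    val += theta[j][i, ip]
                sigma[j * t + i, jp * t + ip] = val

print(np.round(sigma, 3))
```

Note how the heterotrait–monomethod entries exceed the corresponding heterotrait–heteromethod entries by exactly the error covariance term, as stated in the text.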
A particularly interesting case of the CU model is the congeneric measurement
(CM) model, one of the most commonly used measurement error models
(Jöreskog, 1969, 1971). The specification of the CM model is identical to that of
the CU model except for Equation 3, which is reexpressed as
cov(δij δi′j′) = 0    if i ≠ i′ or j ≠ j′    (5)
so that both heteromethod and monomethod error covariances are constrained to
zero and method effects are thus ignored. If there are no method effects, the CM
model is correct and scale invariant. If the method effects are important, the model
is misspecified and the CU model should be used instead.
Although the CU model has traditionally been linked to the modeling of additive method effects, it does not restrict monomethod error covariances at all, so that
it may be used to model both additive and multiplicative method effects, as has
been shown by Coenders and Saris (2000; see the next section).
DEFINITION OF ADDITIVE AND MULTIPLICATIVE
METHOD EFFECTS
Coenders and Saris (2000) gave two original alternative definitions of additive and
multiplicative method effects, which are more closely related to the current modeling approaches than those given in Campbell and O’Connell (1967) and in Browne
(1984). The definitions assume that the CU model holds. This, of course, implies that
heteromethod error covariances are zero. The CU model suggests that method effects can be found in the parameters reflecting error covariances among variables
measured with the same method; that is, the cov(δij δi′j) terms in Equation 4. Accordingly, all the definitions presented implicitly entail different restrictions on them.
Definitions
Definition A. Method effects are called additive if the population MTMM
covariance matrix can be fitted by a CU model in which the error covariance parameters fulfill the restriction:
cov(δij δi′j) = aj    ∀ j, i ≠ i′    (6)
where aj ≥ 0 is a constant related to Method J. In other words, method effects are
additive if all error covariance parameters related to Method J are equal.
Definition M. Method effects are called multiplicative if the population
MTMM covariance matrix can be fitted by a CU model in which the error
covariance parameters fulfill the restriction:
cov(δij δi′j) = bj φii′    ∀ j, i ≠ i′    (7)
where bj ≥ 0 is a constant related to Method J and φii′ is the correlation between
Trait I and Trait I′. In other words, method effects are multiplicative if the error
covariance parameters related to Method J are proportional to the trait correlation
φii′.
Definitions A and M are appealing in their simplicity, but they are not problem
free. It can be shown that a CU model with the restrictions in Equations 6 or 7 is
scale dependent in the sense given by Cudeck (1989); among other things this
makes analysis of covariance and correlation matrixes yield different results and
invalidates statistical tests when analyzing correlation matrixes. Moreover, Definitions A and M are not justifiable if measurements with the same method have different units of measurement, although this situation is quite rare in MTMM designs. Definitions AI and MI solve both problems, thus leading to scale invariant
models. It is clear that these definitions are the same as A and M except for scale
corrections.
Definition AI. Method effects are called additive if the population MTMM
covariance matrix can be fitted by a CU model in which the error covariance parameters fulfill the restriction:
cov(δij δi′j) = cj λij λi′j    ∀ j, i ≠ i′    (8)
where cj ≥ 0 is a constant related to Method J and the λs are trait loadings.
Definition MI. Method effects are called multiplicative if the population
MTMM covariance matrix can be fitted by a CU model in which the error
covariance parameters fulfill the restriction:
cov(δij δi′j) = dj φii′ λij λi′j    ∀ j, i ≠ i′    (9)
where dj ≥ 0 is a constant related to Method J.
We label the CU models with these constraints as CUA, CUAI, CUM, and
CUMI models, all of which are nested into the regular CU model. In the following
sections we detect which model best fits a large number of data sets from different
countries. In order to discuss this issue, we first discuss the data and the methods to
be used.
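The four restrictions in Equations 6 through 9 can be made concrete with a small sketch. The constants aj, bj, cj, dj, the loadings, and the trait correlations below are hypothetical values chosen for illustration only; numpy is assumed.

```python
import numpy as np

phi = np.array([[1.0, 0.5, 0.4],      # trait correlations phi_ii'
                [0.5, 1.0, 0.3],
                [0.4, 0.3, 1.0]])
lam_j = np.array([0.8, 0.9, 0.7])     # loadings of the three traits on Method j
a_j, b_j, c_j, d_j = 0.10, 0.20, 0.15, 0.30  # hypothetical method-j constants

def error_cov(kind):
    """Monomethod error covariances cov(delta_ij, delta_i'j), i != i',
    implied by each constrained CU model (Equations 6-9)."""
    cov = np.zeros((3, 3))
    for i in range(3):
        for ip in range(3):
            if i == ip:
                continue
            if kind == "CUA":                               # Eq. 6: constant
                cov[i, ip] = a_j
            elif kind == "CUM":                             # Eq. 7: proportional to phi
                cov[i, ip] = b_j * phi[i, ip]
            elif kind == "CUAI":                            # Eq. 8: scale-invariant additive
                cov[i, ip] = c_j * lam_j[i] * lam_j[ip]
            elif kind == "CUMI":                            # Eq. 9: scale-invariant multiplicative
                cov[i, ip] = d_j * phi[i, ip] * lam_j[i] * lam_j[ip]
    return cov

for kind in ("CUA", "CUM", "CUAI", "CUMI"):
    print(kind, "\n", np.round(error_cov(kind), 3))
```

The CUA pattern is flat, whereas the multiplicative patterns track the trait correlations; this is the structural difference the constrained models test.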
DATA
The data for this study come from several surveys in four different countries and
have been used in previous analyses by other authors: the United States (Andrews,
1984), Austria (Költringer, 1995), Belgium (Billiet, Loosveldt, & Waterplas,
1986), and the Netherlands (Scherpenzeel & Saris, 1997). Most of these experiments were conducted in the context of an international study of quality of survey
questions (Saris & Van Meurs, 1990). The MTMM experiments were always incorporated within larger surveys. Within the MTMM experiments, all traits were
measured with the same set of methods. The variations in method in these studies
concern many different characteristics such as scale characteristics, question formulation aspects, and the data collection procedure.
In the United States, the surveys were carried out by the University of Michigan. Part of the data were gathered in 1984 and 1987 as a part of panel surveys. In
these cases personal interviews were used. The other surveys were conducted in
1979 by telephone interviews. In contrast to the study of Andrews (1984), the data
sets in this article have been reduced to nonoverlapping sets.
In Austria, all data were gathered in 1992 by one Austrian survey research institute (IFES) using only personal interviews. In the Netherlands, several research organizations collected the data using several data collection techniques. These include telepanel interviewing, computer-assisted telephone interviewing, mail
questionnaires, and computer-assisted self-interviewing with the interviewer present.1 In Belgium, face-to-face interviewing was used with occasionally a dropoff
part of the questionnaire.
In all cases, the data come from national or at least regional samples. Many
samples are larger than 1,000 people. For each experiment, a correlation matrix
was estimated. In total, 87 MTMM matrixes were analyzed: 15 from the United
States, 10 from Austria, and 62 from the Netherlands and Belgium. Missing values
were handled by listwise deletion.
METHOD
In all studies, responses for at least three traits (questions) have been collected. The
formulation of these questions was varied at least three times in order to study the
variation in reliability and validity of these different forms. Two different designs
have been used. In cross-sectional studies, all three method variations were administered in a single survey, leading to three repetitions within one survey. As
a rule, the interval between the repeated measures was at least 20 min, as it was
found that people usually cannot remember their previous answer after 20 min if
similar questions have been asked in the meantime (Van Meurs & Saris, 1990). The
second design was a panel design in which the same question is repeated only
twice in one wave but is repeated twice again in the next wave. Usually one of the
methods was used twice in these studies, at two points in time; this exact repetition of the same method was generally treated as a separate method.

1For an explanation of these data collection techniques, see Scherpenzeel and Saris (1997).
Estimation
Several scales used in the surveys are categorical and it has been shown that this
can cause a very large bias in the estimates of the correlations (e.g., O’Brien, 1985;
O’Brien & Homer, 1987; Olsson, 1979b; Parry & McArdle, 1991; Quiroga, 1992).
Polychoric–polyserial correlations (Olsson, 1979a; Olsson, Drasgow, & Dorans,
1982) have been suggested as an alternative. Polychoric–polyserial correlations
assume that for each observable ordinal variable yi there is an underlying normally
distributed continuous variable y*i related to yi through the step function:
yi = k    when τi,k−1 < y*i ≤ τi,k    (10)

where the parameters τi,1, …, τi,mi−1 are called thresholds. Polychoric–polyserial
correlations are the estimates of the correlations of the underlying normal y*i variables. A polychoric correlation is used to relate two ordinal variables, both assumed to be generated by the step function in Equation 10. A polyserial correlation
is used to relate a variable that is assumed to follow the function in Equation 10 to a
continuously observed variable.
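A minimal simulation of the step function in Equation 10, assuming numpy and arbitrary thresholds: a standard normal latent variable y* is cut into a four-category ordinal variable y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underlying standard normal variable y* (Equation 10), with hypothetical
# thresholds tau defining a 4-category ordinal variable y.
y_star = rng.standard_normal(10_000)
tau = np.array([-1.0, 0.0, 1.0])      # tau_{i,1}, ..., tau_{i,m_i - 1}

# y = k when tau_{k-1} < y* <= tau_k (with implicit tau_0 = -inf, tau_m = +inf).
# searchsorted with side='left' reproduces the "<= tau_k" boundary convention.
y = np.searchsorted(tau, y_star) + 1  # categories coded 1..4

print(np.bincount(y)[1:])             # observed category frequencies
```

A polychoric correlation then estimates the correlation of the underlying y* variables from the cross-tabulation of two such discretized variables.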
In a study on the reliability and validity of ordinal survey measures, Saris,
Scherpenzeel, and van Wijk (1998) found that the reliability estimates are lower if
Pearson correlation coefficients are used instead of the polychoric–polyserial correlations, whereas the validities remain approximately the same. It is controversial
whether the lower Pearson reliability estimates can be considered to be bias or to
be a proper reflection of the lower measurement quality of ordinal data with respect to continuous data (Coenders et al., 1997). Furthermore, in accordance with
previous literature (Coenders et al., 1997; Homer & O’Brien, 1988) Saris et al.
(1998) found that the estimated correlations between the trait factors hardly varied
with the kind of correlation coefficients used.
In this study, both kinds of correlation coefficients are used, to see whether this
affects the fit of the general and constrained CU models. All correlations were estimated with the PRELIS program (Jöreskog & Sörbom, 1988).
The CU models were estimated using the maximum likelihood estimation procedure in the LISREL8 program (Jöreskog & Sörbom, 1989, 1993). When the maximum
likelihood estimators are used for polychoric–polyserial correlation matrixes, the estimates are consistent if the underlying continuous variables follow a multivariate normal distribution (Babakus, Ferguson, & Jöreskog, 1987), but the statistical tests are incorrect. However, more advanced and theoretically sounder estimation methods such
as weighted least squares may even fail to be consistent unless the number of variables is very small and the sample size very large (Muthén & Kaplan, 1989).
When the computer program stated that the matrix to be analyzed was not positive definite, unweighted least squares was used as the estimation procedure,
which is consistent for Pearson, polychoric, and polyserial correlations whenever maximum likelihood is consistent.
When a scale dependent model is estimated, statistical tests are not possible for
any type of correlation matrix or estimation procedure.
Comparison of the Fit
The general model and the four constrained CU models were fitted to the MTMM matrixes. The fit of these models was compared with regard to the percentage of
nonconvergence, the percentage of improper solutions, and the standardized RMSR.
Nonconvergence occurs when, even after repeated iterations, an estimation algorithm is unable to reduce changes in the estimates to values small enough to stop the iteration process. There are several causes of nonconvergence: (a) incongruence between the
observed correlations and the model, either because of sampling fluctuations or of
model misspecification; (b) the specifications of the program regarding the convergence criterion, the number of iterations allowed, and the starting values; and (c)
the complexity of the model regarding, for instance, the total number of variables,
the ratio of the number of latent variables to the number of observed variables, the
presence of nonlinear constraints (e.g., in the CUAI, CUM, and CUMI models), or
the number of free parameters (higher in the CU model).
Improper solutions refer to parameter estimates that are outside their permissible range of values, such as negative error variances (Heywood cases) and correlations greater than one. Van Driel (1987) discussed three causes of negative error
variances: sampling fluctuations combined with true parameter values close to
zero, misspecification of the model, and underidentification of the model. Van
Driel showed that it is possible to discover the probable cause of a Heywood case
by examining the confidence interval around the negative parameter estimate.
When positive values fall within the confidence interval and its width is comparable to that for proper estimates, the cause is probably sampling error. Gerbing and
Anderson (1987) showed that when sampling error is the cause of the improper solution, fixing this parameter at zero has hardly any influence on the other parameter estimates or on the overall Goodness-of-Fit Index (GFI; Gerbing & Anderson,
1987). Therefore, negative error variances that were small and nonsignificant were
not regarded as unacceptable parameter estimates in this study.
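Van Driel's diagnostic for a Heywood case can be sketched in a few lines; the estimate and standard error below are hypothetical.

```python
# Van Driel-style check on a negative error variance (Heywood case):
# if positive values fall inside the 95% confidence interval, sampling
# error is the likely cause rather than misspecification.
estimate, std_error = -0.04, 0.03          # hypothetical values
ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
sampling_error_likely = ci[1] > 0          # CI contains positive values
print(ci, sampling_error_likely)
```

When this check passes (and the interval width is comparable to that of proper estimates), fixing the offending variance at zero barely changes the remaining estimates, as Gerbing and Anderson (1987) reported.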
As mentioned in a previous section, the CUA and CUM models are nested
within the general CU model. This means that one could test for additive and
multiplicative method effects by using the difference in χ2 between the CUA
(CUM) model and the CU model. This difference in χ2 values of two nested models is itself distributed as χ2 with degrees of freedom equal to the difference in degrees of freedom for the two models. Yet, the χ2 difference
test shares the characteristics of the “single” χ2 test (likelihood ratio [LR] test)
and, consequently, the same problems arise as in the standard use of the LR test.
First, as mentioned in the previous section, the assumptions underlying the test are
not met in our case. Second, the test provides no information on its statistical power,
which tends to be neglected (Bentler & Bonett, 1980; Saris, den Ronden, & Satorra
1987; Saris & Satorra 1988; Satorra & Saris 1985; Saris, Satorra, & Sörbom 1987).
The power of the test is the probability that an incorrect model will be rejected. However, when the power of the test is high (i.e., the test statistic is very sensitive to minor
specification errors), one does not know whether a significant value of the test statistic
is due to large or small specification errors in the model and the simple rejection of the
model is no longer interpretable. On the other hand, when the power of the test is low,
even large specification errors have a low probability of being detected. Making decisions on the basis of the test statistic is therefore not justified without taking into account the power of the test. The power of the test certainly depends on the kind and the
size of the misspecification in the model and on the size of the sample. However, the
power of the test also varies with the characteristics of the model (e.g., the number of
indicators per factor, the size of the loadings and the correlation between the factors).
Saris and Satorra (1988) showed that the power of the .05 level test for the same kind
and size of error can vary from .05 to 1. As a consequence, under certain conditions a
specification error will almost certainly lead to the rejection of the model while under
other conditions the same error will not be detected.
Furthermore, the LR test is usually applied as a test of the whole model. Yet, it is
doubtful that one can use the test in this way. As Saris et al. (1987) showed, the
power of the test has a different sensitivity for different types of errors. Therefore,
the LR test does not test all parameters of the model but only those for which it is
sensitive. One could only draw conclusions if one were to evaluate the power of the
test for all kinds of errors of specification in the model.
Due to the problems in applying the LR test, alternative measures of fit have
been developed. These measures can quantify the degree of fit of the model along a
continuum.2 Yet, many of these fit indexes are actually based on the χ2 statistic and
show similar application problems as the LR test. As has been shown with simulated and real data, the indexes (with few exceptions such as the Tucker–Lewis Index; Marsh, Balla, & Hau, 1996) are also substantially affected by sample size.
2Examples are the Akaike’s Information Criterion, the Schwarz Criterion, the GFI and the Adjusted
Goodness-of-Fit Index, the standardized RMSR, the Bentler–Bonett Index, the Relative Fit Index, the
Tucker–Lewis Index, the ∆2 (or Incremental Fit Index) of Bollen, the Fit Index and the comparative fit
index of Bentler (Relative Noncentrality Index of Marsh and McDonald), the root mean squared error
of approximation and the Expected Cross-Validation Index. (See, e.g., Bentler, 1990; Bentler & Bonett,
1980; Bollen, 1989; Browne & Cudeck, 1993; Marsh et al., 1988; Mulaik et al., 1984.)
Corten, Saris, and Satorra (2001) expected all fit indexes based on the χ2 statistic
also to be sensitive to the characteristics of the model with the same consequences
as with the LR test: Under certain conditions a given misspecification will be detected, under others it will not. Corten et al. conducted a Monte Carlo study on the
dependence of several fit indexes on model characteristics. The preliminary results
of this study show that most of the fit indexes indeed vary with the characteristics
of the model. Finally, inferences based on the LR test and the LR test-based fit indexes are suspect for scale dependent models estimated with correlation matrixes
and whenever the statistical assumptions of the LR test fail to hold.
A descriptive measure of the size of the residuals that is unrelated to the LR statistical theory has been chosen to compare the fit of the different models, namely
the standardized RMSR. The standardized RMSR is a measure of the average deviation of the observed correlations from the correlations reproduced by the hypothesized model and equals zero if the model perfectly describes the data. Of course,
the standardized RMSR partially suffers from some of the problems of the other fit
indexes (e.g., the dependence of sample size). However, a study by Hu and Bentler
(1998) of the sensitivity of fit indexes to detect model misspecification has shown
that the ML-based standardized RMSR is the most sensitive index.
Because the standardized RMSR has the drawback that it gets smaller
whenever more parameters are freed, even if the relevance of the added parameters
is low, we adjusted the standardized RMSR for the loss of degrees of freedom.
Normally the formula for the standardized RMSR is

SRMSR = √( Σi=1..p Σj=1..i [(sij − σ̂ij)/√(sii sjj)]² / [p(p + 1)/2] )    (11)
To take the parsimony of the models into account, we multiplied the standardized RMSR by the square root of the number of distinct variances and covariances
and divided it by the square root of the degrees of freedom to achieve a so-called
parsimony standardized RMSR:

PSRMSR = √( Σi=1..p Σj=1..i [(sij − σ̂ij)/√(sii sjj)]² / df )    (12)
In this way we derive the average residual per degree of freedom.
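Equations 11 and 12 can be sketched as follows; the observed and model-implied matrices below are toy values, not from the article's data. The parsimony version simply replaces the p(p + 1)/2 divisor with the model's degrees of freedom.

```python
import numpy as np

def psrmsr(S, Sigma_hat, df):
    """Parsimony standardized RMSR (Equation 12): squared standardized
    residuals over the lower triangle (diagonal included), divided by
    the model's degrees of freedom, then square-rooted."""
    p = S.shape[0]
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            resid = (S[i, j] - Sigma_hat[i, j]) / np.sqrt(S[i, i] * S[j, j])
            total += resid ** 2
    return np.sqrt(total / df)

def srmsr(S, Sigma_hat):
    """Standardized RMSR (Equation 11): same residuals averaged over
    the p(p + 1)/2 distinct variances and covariances."""
    p = S.shape[0]
    return psrmsr(S, Sigma_hat, p * (p + 1) / 2)

# Toy observed and model-implied correlation matrices.
S = np.array([[1.00, 0.42, 0.30],
              [0.42, 1.00, 0.25],
              [0.30, 0.25, 1.00]])
Sigma_hat = np.array([[1.00, 0.40, 0.32],
                      [0.40, 1.00, 0.24],
                      [0.32, 0.24, 1.00]])
print(srmsr(S, Sigma_hat), psrmsr(S, Sigma_hat, df=2))
```

Because the two measures share the same residual sum, PSRMSR equals SRMSR rescaled by √(p(p + 1)/2) / √df, matching the verbal description above.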
Because the standardized RMSR offers no test comparable to the χ2 difference test to determine which of the models fits the data significantly better, the
five models were compared by means of a sign test on related samples with regard to
the value of the parsimony standardized RMSR. A t test was not advisable because
the distribution of the parsimony standardized RMSR was far from normal.
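The sign test used here can be sketched as follows. Table 2 reports a normal-approximation z statistic; this sketch uses the exact binomial form of the test instead, and the PSRMSR values for the two models are hypothetical.

```python
from math import comb

def sign_test(diffs):
    """Two-sided exact sign test on paired differences (ties dropped):
    under H0, positive and negative signs are equally likely (p = 0.5)."""
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n, k = pos + neg, min(pos, neg)
    p_value = sum(comb(n, x) for x in range(k + 1)) * 2 * 0.5 ** n
    return neg, pos, min(p_value, 1.0)

# Hypothetical PSRMSR values for two models on the same 10 data sets.
psrmsr_cu  = [0.51, 0.48, 0.55, 0.60, 0.47, 0.52, 0.58, 0.49, 0.53, 0.50]
psrmsr_cua = [0.49, 0.47, 0.56, 0.58, 0.45, 0.50, 0.57, 0.48, 0.51, 0.49]
diffs = [a - b for a, b in zip(psrmsr_cu, psrmsr_cua)]
print(sign_test(diffs))   # (negative, positive, two-sided p)
```

A preponderance of positive differences here would indicate that the second model attains the smaller PSRMSR on most data sets, which is how the comparisons in Table 2 are read.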
RESULTS
In this section, the results of the study are described. First, the results of the analysis on the Pearson correlations are presented, followed by those of the analysis on
the polychoric–polyserial correlations.
The percentage of nonconvergence and the percentage of improper solutions
are the first issue on which the fit of the different models is compared. For this
comparison all experiments were used. In the case of the comparison of the parsimony standardized RMSR, only those experiments were used where all five models (the general CU model and the four constrained CU models) converged and led
to proper solutions.
Analyses Based on Pearson Correlations
Table 1 shows the percentage of nonconvergence for the five kinds of models.
Nonconvergence occurred in 1% to 8% of the cases. The percentage in which
the iterative procedure does not lead to convergence is higher for the constrained
models than for the general CU model; however, in general, one can conclude
that convergence is not really a problem with the CU models, as was stated by
Marsh and Bailey (1991). Analysis of the Pearson correlations produced no improper solutions (small, nonsignificant negative error variances were not counted as improper solutions).
A sign test for related samples was conducted to see whether there is any systematic difference in parsimony standardized RMSR between the five models. The
TABLE 1
Percentage of Nonconvergence, Percentage of Improper Solutions, Mean
and Standard Deviation of PSRMSR, Pearson Correlations

Model   % Nonconvergence   % Improper Solutions   Average PSRMSR   SD PSRMSR
CU             1                    0                  0.51           0.27
CUA            2                    0                  0.52           0.28
CUAI           5                    0                  0.61           0.30
CUM            6                    0                  0.70           0.54
CUMI           8                    0                  0.74           0.56

Note. PSRMSR = Parsimony Standardized Root Mean Square Residual; CU = correlated uniqueness; CUA = CU, additive method effects, scale-dependent; CUAI = CU, additive method effects, scale-invariant; CUM = CU, multiplicative method effects, scale-dependent; CUMI = CU, multiplicative method effects, scale-invariant.
TABLE 2
Sign Test for Related Samples With Regard to the Value of the PSRMSR,
Pearson Correlations

Pairs        Negative Differences   Positive Differences   No. of Ties      z      Asymptotic Significance (2-Tailed)
CU–CUA                24                    56                  0         +3.466            0.001
CU–CUAI               43                    37                  0         –0.559            0.576
CU–CUM                59                    21                  0         –4.137            0.000
CU–CUMI               62                    18                  0         –4.808            0.000
CUA–CUAI              57                     7                 16         –6.125            0.000
CUM–CUMI              55                    10                 15         –5.458            0.000
CUA–CUM               71                     6                  3         –7.293            0.000
CUAI–CUMI             58                    17                  5         –4.619            0.000

Note. PSRMSR = Parsimony Standardized Root Mean Square Residual; CU = correlated uniqueness; CUA = CU, additive method effects, scale-dependent; CUAI = CU, additive method effects, scale-invariant; CUM = CU, multiplicative method effects, scale-dependent; CUMI = CU, multiplicative method effects, scale-invariant.
results are shown in Table 2. In the first four rows, the general CU model is compared with the constrained models. The sign test values, together with the numbers of negative and positive differences, show that the scale-dependent CU model that assumes additive method effects fits the data better than the CU model without constraints. There is no difference in fit between the scale-invariant additive model and the general model, whereas a model with multiplicative method effects fits the data worse than the general model.
In the last four rows, the constrained models are compared with each other. The figures show that a scale-dependent model leads to a smaller value of the parsimony standardized RMSR than a scale-invariant model. It is also clear, at least for the data sets used in this study, that the parsimony standardized RMSR is on average smaller for a model with additive method effects than for a model that assumes multiplicative method effects.
Analyses Based on Polychoric–Polyserial Correlations
Table 3 shows the first part of the results for the polychoric–polyserial correlations. The percentage of nonconvergence varies between 2% and 16% and is higher for the constrained models than for the general model. Whereas improper solutions did not occur when Pearson correlations were used, Table 3 shows that when the analyses are conducted on polychoric–polyserial correlations, improper solutions occur rather often. For the general CU model, negative error variances or correlations greater than one (or less than –1) were found in about 25% of the experiments. For the restricted models, this percentage was lower, between 14% and 17%.
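The improper-solution criterion used here (negative error variance estimates, or correlation estimates greater than one or less than –1) is simple to check mechanically. A minimal sketch, with a function name of our own choosing:

```python
def is_improper(error_variances, correlations):
    """Flag an estimated solution as improper: Heywood cases
    (negative error variance estimates) or correlation estimates
    outside the admissible [-1, 1] range."""
    return (any(v < 0 for v in error_variances)
            or any(abs(r) > 1 for r in correlations))

# A solution containing one negative error variance is improper:
print(is_improper([0.30, -0.05, 0.12], [0.85, 0.40]))  # True
```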
TABLE 3
Percentage of Nonconvergence, Percentage of Improper Solutions, Mean
and Standard Deviation of PSRMSR, Polychoric Correlations

Model   % Nonconvergence   % Improper Solutions   Average PSRMSR   SD PSRMSR
CU              2                   24                0.066          0.232
CUA             5                   17                0.062          0.028
CUAI            7                   14                0.083          0.037
CUM            16                   16                0.085          0.048
CUMI           15                   14                0.094          0.058

Note. PSRMSR = Parsimony Standardized Root Mean Square Residual; CU = correlated uniqueness; CUA = CU, additive method effects, scale-dependent; CUAI = CU, additive method effects, scale-invariant; CUM = CU, multiplicative method effects, scale-dependent; CUMI = CU, multiplicative method effects, scale-invariant.
When these results are compared with those for the Pearson correlations, there are striking differences. With polychoric–polyserial correlations, nonconvergence occurs about twice as often as with Pearson correlations. Whereas the Pearson correlations yielded no improper solutions, with polychoric–polyserial correlations improper solutions do constitute a problem. With regard to the percentages of nonconvergence and improper solutions, with the polychoric–polyserial correlations the general CU model leads to at least as many problems as the constrained models that assume additive method effects, whereas with the Pearson correlations the general model hardly ever produced nonconvergence or improper solutions.
The fact that improper solutions do not occur in the analysis of Pearson correlations but do occur with polychoric–polyserial correlations can be explained from
the tendency of polychoric and polyserial correlations to yield higher reliability estimates and thus error variances closer to zero (Saris et al., 1998).
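This mechanism can be illustrated with a small simulation (all names and settings below are ours, chosen for illustration): categorizing bivariate normal y* variables into a few ordered categories pulls the Pearson correlation of the observed items below the latent correlation that the polychoric coefficient is designed to recover.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                                  # latent correlation of y*
cov = [[1.0, rho], [rho, 1.0]]
y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)

# crude 4-category ordinal items, with thresholds at -1, 0, 1
x = np.digitize(y, bins=[-1.0, 0.0, 1.0])

r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
print(r < rho)  # True: the Pearson correlation of the items is attenuated
```

Because the polychoric coefficient undoes this attenuation, it yields higher correlation and reliability estimates, and hence error variance estimates closer to the zero boundary.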
Again, a sign test was carried out on the value of the parsimony standardized RMSR. From Table 4, one can infer that on average the parsimony standardized RMSR is smaller for the CU model without restrictions than for the multiplicative models and the scale-invariant additive model. Yet the scale-dependent CU model with additive constraints fits the data better than the CU model without constraints. This is more or less in agreement with the results of the sign test on the parsimony standardized RMSR for the Pearson correlations.
Comparing the parsimony standardized RMSR of the four restricted models, one can see that scale-dependent models lead to a smaller value than scale-invariant models. Furthermore, the parsimony standardized RMSR is smaller for a model that assumes additive method effects than for a model in which method effects are constrained to be multiplicative. These results correspond to those for the Pearson correlations.
TABLE 4
Sign Test for Related Samples With Regard to the Value of the PSRMSR,
Polychoric Correlations

Pairs        Negative Differences   Positive Differences   No. of Ties      z      Asymptotic Significance (2-Tailed)
CU–CUA                11                    38                  0         +3.714            0.000
CU–CUAI               32                    17                  0         –2.000            0.046
CU–CUM                38                    11                  0         –3.714            0.000
CU–CUMI               40                     9                  0         –4.286            0.000
CUA–CUAI              44                     2                  3         –6.045            0.000
CUM–CUMI              35                     6                  8         –4.373            0.000
CUA–CUM               43                     4                  2         –5.543            0.000
CUAI–CUMI             31                    17                  1         –1.876            0.061

Note. PSRMSR = Parsimony Standardized Root Mean Square Residual; CU = correlated uniqueness; CUA = CU, additive method effects, scale-dependent; CUAI = CU, additive method effects, scale-invariant; CUM = CU, multiplicative method effects, scale-dependent; CUMI = CU, multiplicative method effects, scale-invariant.
CONCLUSIONS
The results of these analyses show that the scale-dependent CU model that assumes additive method effects fits the data better than the CU model without restrictions on the correlated errors and better than the CU models with multiplicative method effects. Its percentages of nonconvergence and improper solutions are the same as or lower than those of the other models. Moreover, its average parsimony standardized RMSR is significantly smaller. This applies to both Pearson and polychoric correlations.
Comparing the four constrained models, the inferences are also the same for both kinds of correlation coefficients. Scale-dependent models lead to a smaller value of the parsimony standardized RMSR than scale-invariant models, and CU models with additive constraints fit better, in terms of the parsimony standardized RMSR, than models that assume multiplicative method effects.
The conclusion of the study is clear. The CU model that assumes additive method effects and is scale dependent should be preferred for comparative purposes over the general CU model and the models with multiplicative method effects. The CUA model is equivalent to a CFA model in which correlations among method factors are constrained to zero and method loadings are constrained to be equal within a given method (Coenders & Saris, 2000), whereas the CUMI model is equivalent to a DP model for designs with three methods (Coenders & Saris, 2000). Therefore, the results suggest that the CFA model used by the International Research Group on Methodology and Comparative Survey in its measurement
quality studies is a better choice than the multiplicative models like the DP model
or even, in most cases, than the CU model (Saris & van Meurs, 1990).
Finally, with respect to the polychoric–polyserial correlation coefficients, the study shows that it is not always profitable to use these statistics, because the numbers of improper solutions and of nonconvergent solutions increase considerably. If one is interested in estimating the disattenuated correlations among trait factors, one is better advised to use the product–moment correlation coefficients: one obtains about the same results without paying this technical penalty. The estimates of the factor loadings will, however, be lower, reflecting the lower measurement quality of the ordinal observed variables with respect to the underlying continuous y* variables.
REFERENCES
Althauser, R. P., Heberlein, T. A., & Scott, R. A. (1971). A causal assessment of validity: The augmented multitrait–multimethod matrix. In H. M. Blalock, Jr. (Ed.), Causal models in the social sciences (pp. 151–169). Chicago: Aldine.
Alwin, D. (1974). An analytic comparison of four approaches to the interpretation of relationships in
the multitrait–multimethod matrix. In H. L. Costner (Ed.), Sociological methodology 1973–1974
(pp. 79–105). San Francisco: Jossey-Bass.
Andrews, F. M. (1984). Construct validity and error components of survey measures: A structural modeling approach. Public Opinion Quarterly, 48, 409–442.
Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of
Marketing Research, 24, 222–229.
Bagozzi, R. P. (1993). Assessing construct validity in personality research. Applications to measures of
self-esteem. Journal of Research in Personality, 27, 49–87.
Bagozzi, R. P., & Yi, Y. (1991). Multitrait–multimethod matrices in consumer research. Journal of
Consumer Research, 17, 426–439.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107,
238–246.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of
covariance structures. Psychological Bulletin, 88, 588–606.
Billiet, J., Loosveldt, G., & Waterplas, L. (1986). Het survey-interview onderzocht: Effecten van het ontwerp en gebruik van vragenlijsten op de kwaliteit van de antwoorden [The survey interview examined: Effects of the design and use of questionnaires on the quality of the answers]. Leuven, Belgium: Sociologisch Onderzoeksinstituut KU Leuven.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Browne, M. W. (1984). The decomposition of multitrait–multimethod matrices. British Journal of
Mathematical and Statistical Psychology, 37, 1–21.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S.
Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.
Campbell, D. T., & O’Connell, E. J. (1967). Method factors in multitrait–multimethod matrices: Multiplicative rather than additive? Multivariate Behavioral Research, 2, 409–426.
Campbell, D. T., & O’Connell, E. J. (1982). Methods as diluting trait relationships rather than adding
irrelevant systematic variance. In D. Brinberg & L. Kidder (Eds.), Forms of validity in research (pp.
93–111). San Francisco: Jossey-Bass.
Coenders, G., & Saris, W. E. (1998). Relationship between a restricted correlated uniqueness model
and a direct product model for multitrait–multimethod data. In A. Ferligoj (Ed.), Advances in methodology, data analysis and statistics: Metodološki Zvezki (pp. 151–172). Ljubljana, Slovenia: FDV.
Coenders, G., & Saris, W. E. (2000). Testing nested additive, multiplicative and general multitrait
multimethod models. Structural Equation Modeling, 7, 219–250.
Coenders, G., Satorra, A., & Saris, W. E. (1997). Alternative approaches to structural modeling of ordinal data: A Monte Carlo study. Structural Equation Modeling, 4, 261–282.
Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin, 105, 317–327.
de Wit, H. (1994). Cijfers en hun achterliggende realiteit: De MTMM-kwaliteitsparameters op hun
kwaliteit onderzocht [Numbers and their underlying reality: The MTMM quality parameters for
measurement quality research]. Unpublished doctoral dissertation, Catholic University of Leuven,
Belgium.
Gerbing, D. W., & Anderson, J. C. (1987). Improper solutions in the analysis of covariance structures: Their interpretability and a comparison of alternate respecifications. Psychometrika, 52, 99–111.
Homer, P., & O’Brien, R. M. (1988). Using LISREL models with crude rank category measures. Quality and Quantity, 22, 191–201.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika, 34, 183–202.
Jöreskog, K. G. (1971). Statistical analysis of congeneric tests. Psychometrika, 36, 109–133.
Jöreskog, K. G., & Sörbom, D. (1988). PRELIS: A program for multivariate data screening and data summarization (2nd ed.). Mooresville, IN: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL VII: A guide to the program and applications. Chicago:
SPSS.
Jöreskog, K. G., & Sörbom, D. (1993). New features in LISREL8. Chicago: Scientific Software International.
Kenny, D. A. (1976). An empirical application of confirmatory factor analysis to the multitrait–
multimethod matrix. Journal of Experimental Social Psychology, 12, 247–252.
Költringer, R. (1995). Measurement quality in Austrian personal interview surveys. In W. E. Saris & A.
Münnich (Eds.), The multitrait–multimethod approach to evaluate measurement instruments (pp.
207–224). Budapest, Hungary: Eötvös University Press.
Kumar, A., & Dillon, W. R. (1992). An integrative look at the use of additive and multiplicative
covariance structure models in the analysis of MTMM data. Journal of Marketing Research, 29,
51–64.
Luijben, T. (1989). Statistical guidance for model modification in covariance structure analysis. Amsterdam: Sociometric Research Foundation.
Marsh, H. W. (1989). Confirmatory factor analysis of multitrait–multimethod data: Many problems and
a few solutions. Applied Psychological Measurement, 13, 335–361.
Marsh, H. W., & Bailey, M. (1991). Confirmatory factor analyses of multitrait–multimethod data: A comparison of the behavior of alternative models. Applied Psychological Measurement, 15, 47–70.
Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of incremental fit indices: A clarification
of mathematical and empirical properties. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 315–353), Mahwah, NJ: Lawrence
Erlbaum Associates, Inc.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor
analysis: The effect of sample size. Psychological Bulletin, 103, 391–410.
Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430–445.
Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19–30.
O’Brien, R. M. (1985). The relationship between ordinal measures and their underlying values: Why all
the disagreement? Quality and Quantity, 19, 265–277.
O’Brien, R. M., & Homer, P. (1987). Correction for coarsely categorized measures: LISREL’s
polyserial and polychoric correlations. Quality and Quantity, 21, 349–360.
Olsson, U. (1979a). Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika, 44, 443–460.
Olsson, U. (1979b). On the robustness of factor analysis against crude categorization of observations.
Multivariate Behavioral Research, 14, 485–500.
Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika,
47, 337–347.
Parry, C. D. H., & McArdle, J. J. (1991). An applied comparison of methods for least-squares factor
analysis of dichotomous variables. Applied Psychological Measurement, 15, 35–46.
Quiroga, A. M. (1992). Studies of the polychoric correlation and other correlation measures for ordinal
variables. Unpublished doctoral dissertation, University of Uppsala.
Rodgers, W. L., Andrews, F. M., & Herzog, A. R. (1992). Quality of survey measures: A structural
modeling approach. Journal of Official Statistics, 8, 251–275.
Saris, W. E. (1990a). The choice of a model for evaluation of measurement instruments. In W. E. Saris
& A. van Meurs (Eds.), Evaluation of measurement instruments by meta-analysis of
multitrait–multimethod studies (pp. 118–129). Amsterdam: North Holland.
Saris, W. E. (1990b). Models for evaluation of measurement instruments. In W. E. Saris & A. van Meurs
(Eds.), Evaluation of measurement instruments by meta-analysis of multitrait–multimethod studies
(pp. 52–80). Amsterdam: North Holland.
Saris, W. E., den Ronden, J., & Satorra, A. (1987). Testing structural equation models. In P. Cuttance &
R. Ecob (Eds.), Structural modelling by example (pp. 202–220). New York: Cambridge University
Press.
Saris, W. E., Satorra, A., & Sörbom, D. (1987). The detection and correction of specification errors in
structural equation models. Sociological Methodology, 17, 105–129.
Saris, W. E., & Satorra, A. (1988). Characteristics of structural equation models which affect the power
of the Likelihood Ratio Test. In W. E. Saris & I. N. Gallhofer (Eds.), Sociometric research (Vol. 2, pp.
220–236). London: Macmillan.
Saris, W. E., Scherpenzeel, A. C., & van Wijk, T. (1998). Validity and reliability of subjective social indicators: The effect of different measures of association. Social Indicators Research, 45, 173–199.
Saris, W. E., & van Meurs, A. (Eds.). (1990). Evaluation of measurement instruments by meta-analysis
of multitrait–multimethod matrices. Amsterdam: North Holland.
Satorra, A., & Saris, W. E. (1985). The power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Scherpenzeel, A. C. (1995). A question of quality: Evaluating survey questions by
multitrait–multimethod studies. Unpublished doctoral dissertation, University of Amsterdam.
Scherpenzeel, A. C., & Saris, W. E. (1997). The validity and reliability of survey questions. A meta
analysis of MTMM studies. Sociological Methods and Research, 25, 341–383.
van der Veld, W., & Saris, W. E. (2000). Problems in multitrait–multimethod studies. Kwantitatieve Methoden, 64, 87–116.
van Driel, O. P. (1978). On various causes of improper solutions in maximum likelihood factor analysis. Psychometrika, 43, 225–243.
van Meurs, A., & Saris, W. E. (1990). Memory effects in MTMM studies. In W. E. Saris & A. van
Meurs (Eds.), Evaluation of measurement instruments by meta-analysis of multitrait-multimethod
matrices (pp. 136–146). Amsterdam: North Holland.
Widaman, K. F. (1985). Hierarchically nested covariance structure models for multitrait–multimethod
data. Applied Psychological Measurement, 9, 1–26.