Analysis of structural equation models based on a
mixture of continuous and ordinal random variables in
the case of complex survey data
Stephen du Toit
Scientific Software International
Summary
The present paper deals with the use of design weights to fit SEM models to a mixture
of continuous and ordinal manifest variables with or without missing values with
optional specification of stratum and/or cluster variables. It also deals with the issue of
robust standard error estimation and the adjustment of the chi-square goodness of fit
statistic.
Results based on real data are presented.
Research was supported by SBIR grant 5R44AA014999-03 from NIAAA to Scientific
Software International.
1 Introduction
We assume that the population from which the sample data are obtained can be
stratified into $H$ strata. Within each stratum $h$, $n_h$ clusters or primary sampling units
(PSUs) are drawn, and within the $h$-th stratum and $k$-th cluster, $n_{hk}$ ultimate sampling
units (USUs) are drawn with design weights $w_{hkl}$, where $l$ denotes the $l$-th USU
within the $k$-th cluster, which in turn is nested within stratum $h$.
In the subsequent sections we briefly discuss parameter estimation and the Taylor
linearization method employed in LISREL to produce robust standard error estimates
under single stage sampling.
2 A reparameterization of the general LISREL model
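$$y_i = \upsilon + \Lambda(\alpha + u_i) + \varepsilon_i$$

$$\upsilon = \begin{pmatrix} \tau_y \\ \tau_x \end{pmatrix}$$

$$\Lambda = \begin{pmatrix} \Lambda_y (I - B)^{-1} & 0 \\ 0 & \Lambda_x \end{pmatrix}$$

$$\alpha = \begin{pmatrix} \alpha_y + \Gamma\kappa \\ \kappa \end{pmatrix}$$

$$u_i = \begin{pmatrix} w_i + \Gamma v_i \\ v_i \end{pmatrix}$$

$$\mathrm{Cov}(u_i) = \Psi = \begin{pmatrix} \Psi_y + \Gamma\Phi\Gamma' & \Gamma\Phi \\ \Phi\Gamma' & \Phi \end{pmatrix}$$

$$\varepsilon_i = \begin{pmatrix} \varepsilon_y \\ \varepsilon_\delta \end{pmatrix}$$

$$\mathrm{Cov}(\varepsilon_i) = \Theta = \begin{pmatrix} \Theta_\varepsilon & \Theta_{\varepsilon\delta} \\ \Theta_{\delta\varepsilon} & \Theta_\delta \end{pmatrix}$$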
From the definitions above it follows that:
• $\Lambda_y$ of the general LISREL model is part of the first $p \times m$ submatrix of $\Lambda$
• $\Lambda_x$ of the general LISREL model is the final $q \times n$ submatrix of $\Lambda$
• $B$ of the general LISREL model is an $m \times m$ matrix with zero diagonal
elements and is part of the first $p \times m$ submatrix of $\Lambda$
• $\Gamma$ of the general LISREL model is an $m \times n$ matrix that enters both $\alpha$ and $\Psi$
• $\tau_y$ of the general LISREL model consists of the first $p$ elements of $\upsilon$
• $\tau_x$ of the general LISREL model consists of the final $q$ elements of $\upsilon$
• $\alpha_y$ of the general LISREL model is part of the first $m$ elements of $\alpha$
• $\kappa$ of the general LISREL model consists of the last $n$ elements of $\alpha$
• $\Theta_{\varepsilon\delta} = \Theta_{\delta\varepsilon}'$ of the general LISREL model is the off-diagonal $p \times q$ submatrix of $\Theta$
• $\Phi$ of the general LISREL model is the final $n \times n$ submatrix of $\Psi$
• $\Psi_y$ of the general LISREL model is part of the first $m \times m$ submatrix of $\Psi$
• $\Theta_\varepsilon$ of the general LISREL model is the first $p \times p$ submatrix of $\Theta$
• $\Theta_\delta$ of the general LISREL model is the final $q \times q$ submatrix of $\Theta$
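The block structure of the reparameterization can be checked numerically. The following is a minimal sketch in Python with NumPy; the function name `reparameterize` and all numeric values are hypothetical and chosen only for illustration, and the code is not taken from LISREL.

```python
import numpy as np

def reparameterize(lam_y, lam_x, B, Gamma, alpha_y, kappa, Psi_y, Phi):
    """Assemble Lambda, alpha and Psi of y = upsilon + Lambda(alpha + u) + eps."""
    p, m = lam_y.shape
    q, n = lam_x.shape
    inv_IB = np.linalg.inv(np.eye(m) - B)              # (I - B)^{-1}
    Lam = np.block([[lam_y @ inv_IB, np.zeros((p, n))],
                    [np.zeros((q, m)), lam_x]])
    alpha = np.concatenate([alpha_y + Gamma @ kappa, kappa])
    Psi = np.block([[Psi_y + Gamma @ Phi @ Gamma.T, Gamma @ Phi],
                    [Phi @ Gamma.T, Phi]])
    return Lam, alpha, Psi

# Toy values (assumed): p = q = 2 indicators, m = n = 1 latent variable.
Lam, alpha, Psi = reparameterize(
    lam_y=np.array([[1.0], [0.8]]), lam_x=np.array([[1.0], [0.6]]),
    B=np.zeros((1, 1)), Gamma=np.array([[0.5]]),
    alpha_y=np.array([1.0]), kappa=np.array([2.0]),
    Psi_y=np.array([[0.3]]), Phi=np.array([[1.0]]))
```

With these toy values $\Lambda$ is $4 \times 2$, and $\Psi$ is symmetric with $\Phi$ in its lower-right block.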
3 Mixture of ordinal and continuous dependent variables
Let
$$y_i = \begin{pmatrix} y_{0i} : p_i \times 1 \\ y_{Ni} : q_i \times 1 \end{pmatrix}, \quad i = 1, 2, \ldots, N,$$
where $N$ denotes the number of cases and where the $(p_i + q_i) \times 1$ vector $y_i$ of
manifest variables is partitioned into a $p_i \times 1$ vector $y_{0i}$ of ordinal and a $q_i \times 1$
vector $y_{Ni}$ of continuous manifest variables.
It is further assumed that the SEM model has $m$ latent variables $u_i$, where
$u_1, u_2, \ldots, u_N$ are i.i.d. $N(0, \Psi)$.
The likelihood function for case $i$ is evaluated as
$$f(y_i) = \int_{u_i} f(y_{0i}, y_{Ni}, u_i)\, du_i = \int_{u_i} f(y_{0i}, y_{Ni} \mid u_i)\, g(u_i)\, du_i$$
Under the assumption of conditional independence, it follows that
$$f(y_i) = \int_{u_i} \prod_{j=1}^{p_i} f(y_{0ij} \mid u_i) \cdot \prod_{j=1}^{q_i} f(y_{Nij} \mid u_i)\, g(u_i)\, du_i.$$
Hence
$$f(y_i) = \int_{u_i} \exp\left\{ \sum_{j=1}^{p_i} \ln f(y_{0ij} \mid u_i) + \sum_{j=1}^{q_i} \ln f(y_{Nij} \mid u_i) + \ln g(u_i) \right\} du_i$$
In general, a closed-form solution to this integral does not exist. To evaluate
integrals of the type described above, we use a direct implementation of Gauss-Hermite quadrature.
With this rule, an integral of the form
$$I(t) = \int f(t) \exp\left[-t^2\right] dt$$
is approximated by the sum
$$I(t) \approx \sum_{u=1}^{Q} w_u f(z_u),$$
where $w_u$ and $z_u$ are weights and nodes of the Hermite polynomial of degree $Q$.
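As an illustration of the rule above, here is a minimal sketch in Python that obtains the nodes and weights from NumPy's `hermgauss` routine. The helper name `gauss_hermite` and the two test integrands are assumptions made for this example, not part of LISREL.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite(f, Q):
    """Approximate the integral of f(t) * exp(-t**2) over the real line."""
    z, w = hermgauss(Q)          # nodes z_u and weights w_u
    return float(np.sum(w * f(z)))

# Polynomial integrand: a Q-point rule is exact up to degree 2Q - 1.
approx_poly = gauss_hermite(lambda t: t**2, Q=5)
exact_poly = np.sqrt(np.pi) / 2.0

# Smooth non-polynomial integrand with a known closed form.
approx_cos = gauss_hermite(np.cos, Q=10)
exact_cos = np.sqrt(np.pi) * np.exp(-0.25)
```

Both approximations agree with the closed-form values to near machine precision, since the integrands are smooth and centered where the nodes lie.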
Adaptive quadrature generally requires fewer points and weights than non-adaptive
quadrature to yield equally accurate estimates of the model parameters and standard
errors. The reason is that the adaptive procedure uses the empirical Bayes means and
covariances, updated at each iteration, to shift and scale the quadrature
locations of each case (subject) so that they are placed under the peak of the
corresponding integrand.
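The effect of shifting and scaling the nodes can be sketched as follows. In this minimal Python example the location `mu` and scale `sigma` are simply supplied by hand, standing in for the empirical Bayes mean and standard deviation; the integrand `h` and both function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def plain_gh(h, Q):
    """Non-adaptive rule for the integral of h(u) du (nodes centered at 0)."""
    z, w = hermgauss(Q)
    return float(np.sum(w * np.exp(z**2) * h(z)))

def adaptive_gh(h, mu, sigma, Q):
    """Same rule after the substitution u = mu + sqrt(2) * sigma * z."""
    z, w = hermgauss(Q)
    u = mu + np.sqrt(2.0) * sigma * z
    return float(np.sqrt(2.0) * sigma * np.sum(w * np.exp(z**2) * h(u)))

# A sharply peaked integrand centered far from zero (assumed toy example).
mu, sigma = 3.0, 0.1
h = lambda u: (1.0 + (u - mu)**2) * np.exp(-0.5 * ((u - mu) / sigma)**2)
exact = sigma * np.sqrt(2.0 * np.pi) * (1.0 + sigma**2)

adaptive = adaptive_gh(h, mu, sigma, Q=5)
plain = plain_gh(h, Q=5)
```

With five points the shifted and scaled rule reproduces the integral, whereas the unshifted rule places every node where the integrand is essentially zero.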
$f(y_{0ij} \mid u_i)$: Ordinal variables
Suppose that $y_{0ij}$ has $C$ categories, then
$$P(y_{0ij} = c) = P(y_{0ij} \le c) - P(y_{0ij} \le c - 1), \quad c = 1, 2, \ldots, C - 1,$$
where $P(y_{0ij} \le 0) = 0$
and $P(y_{0ij} = C) = 1 - P(y_{0ij} \le C - 1)$.
In LISREL the logit, probit, log-log and complementary log-log link functions are available.
For the logistic link function, for example,
$$P(y_{0ij} \le c) = \frac{1}{1 + \exp(-\eta_{ij})},$$
where $\eta_{ij} = \tau_{ic} + \lambda_{ij}'(\alpha + u_i)$.
In the case of ordinal variables it is assumed that the corresponding elements of $\upsilon$
are fixed at zero and the corresponding variances in $\Theta$ at one. The parameter $\tau_{ic}$ is the so-called
threshold parameter.
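The category probabilities under the logistic link can be sketched in a few lines of Python. The function name `category_probabilities`, the threshold values and the value of the linear part $\lambda'(\alpha + u)$ are assumptions made for this illustration.

```python
import numpy as np

def category_probabilities(tau, linear_part):
    """P(y = c), c = 1..C, from C - 1 increasing thresholds (logistic link)."""
    tau = np.asarray(tau, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(tau + linear_part)))   # P(y <= c), c = 1..C-1
    cum = np.concatenate(([0.0], cum, [1.0]))          # P(y <= 0) = 0, P(y <= C) = 1
    return np.diff(cum)                                # successive differences

# Four categories, thresholds chosen for illustration only.
probs = category_probabilities(tau=[-1.0, 0.0, 1.0], linear_part=0.0)
```

The differences of the cumulative probabilities are non-negative as long as the thresholds are increasing, and they sum to one by construction.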
$f(y_{Nij} \mid u_i)$: Normal variables
$$f(y_{Nij} \mid u_i) = (2\pi\theta_{jj})^{-1/2} \exp\left\{ -\frac{1}{2\theta_{jj}} \left( y_{Nij} - \mu_{ij} \right)^2 \right\},$$
where $\mu_{ij} = \upsilon_j + \lambda_i'(\alpha + u_i)$.
The parameter $\upsilon_j$ is the so-called intercept parameter and $\theta_{jj}$ the residual variance.
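Putting the two conditional densities together, the marginal likelihood of one case can be approximated by quadrature. The sketch below assumes the simplest possible setting, which is not stated in the text: a single latent variable $u \sim N(0, \psi)$, $\alpha = 0$, one ordinal and one continuous indicator, and non-adaptive Gauss-Hermite quadrature. All names and values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_likelihood(y_ord, y_cont, tau, lam_ord, ups, lam_cont,
                        theta, psi, Q=20):
    """f(y) for one ordinal (logit link) and one normal indicator."""
    z, w = hermgauss(Q)
    u = np.sqrt(2.0 * psi) * z                 # nodes on the scale of u
    # Ordinal part: rows of `cum` hold P(y <= c | u) for c = 0..C.
    cum = 1.0 / (1.0 + np.exp(-(tau[:, None] + lam_ord * u)))
    cum = np.vstack([np.zeros(Q), cum, np.ones(Q)])
    p_ord = cum[y_ord] - cum[y_ord - 1]        # observed category in 1..C
    # Continuous part: normal density with mean ups + lam_cont * u.
    mu = ups + lam_cont * u
    dens = np.exp(-0.5 * (y_cont - mu)**2 / theta) / np.sqrt(2.0 * np.pi * theta)
    return float(np.sum(w * p_ord * dens) / np.sqrt(np.pi))

tau = np.array([-1.0, 0.0, 1.0])
f_i = marginal_likelihood(y_ord=2, y_cont=0.3, tau=tau, lam_ord=1.0,
                          ups=0.0, lam_cont=0.5, theta=1.0, psi=1.0)
```

When the ordinal loading is set to zero the integral has a closed form (a category probability times a normal density with variance $\lambda^2\psi + \theta$), which gives a convenient check of the quadrature.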
Special case
$y_{0i}$ depends only on $u_{0i} : m_0 \times 1$
$y_{Ni}$ depends only on $u_{Ni} : m_N \times 1$
where
$$u_i : (m \times 1) = \begin{pmatrix} u_{0i} \\ u_{Ni} \end{pmatrix} \quad \text{and} \quad m = m_0 + m_N.$$
Then
$$f(y_{0i}, y_{Ni}) = \int_{u_0} \int_{u_N} f(y_{0i}, y_{Ni}, u_{0i}, u_{Ni})\, du_{Ni}\, du_{0i}$$
$$= \int_{u_0} \left\{ \int_{u_N} \prod_{j=1}^{q_i} f(y_{Nij} \mid u_N) \cdot g(u_N \mid u_0)\, du_N \right\} \times \prod_{j=1}^{p_i} f(y_{0ij} \mid u_0) \cdot g(u_0)\, du_0,$$
Hence
$$f(y_i) = \int_{u_0} \exp\left\{ \ln f(y_{Ni} \mid u_0) + \sum_{j=1}^{p_i} \ln f(y_{0ij} \mid u_0) + \ln g(u_0) \right\} du_0$$
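$\ln f(y_{Ni} \mid u_0)$
Since $y_{Ni} \sim N(\mu_i, \Sigma_i)$, where $y_{Ni} = \upsilon + \Lambda_{Ni}\alpha + \Lambda_{Ni}u_{Ni}$ and hence $\mu_i = \upsilon + \Lambda_{Ni}\alpha$,
$\Sigma_i = \Lambda_{Ni}\Psi_{22}\Lambda_{Ni}' + D_\theta$, and $u_{0i} \sim N(\alpha, \Psi_{11})$, it follows from well-known results for
normal conditional distributions that
$$y_{Ni} \mid u_{0i} \sim N\left( \mu_{y \cdot 0}, \Sigma_{y \cdot 0} \right)$$
where
$$\mu_{y \cdot 0} = \upsilon + \Lambda_{Ni}\alpha + \Lambda_{Ni}\Psi_{21}\Psi_{11}^{-1}(u_{0i} - \alpha)$$
$$\Sigma_{y \cdot 0} = \Sigma_i - \Lambda_{Ni}\Psi_{21}\Psi_{11}^{-1}\Psi_{12}\Lambda_{Ni}'$$
Therefore
$$\ln f(y_{Ni} \mid u_0) = -\frac{q_i}{2}\ln 2\pi - \frac{1}{2}\ln\left|\Sigma_{y \cdot 0}\right| - \frac{1}{2}\left(y_{Ni} - \mu_{y \cdot 0}\right)' \Sigma_{y \cdot 0}^{-1} \left(y_{Ni} - \mu_{y \cdot 0}\right)$$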
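The conditional moments and the resulting log-density can be evaluated as in the following minimal Python sketch. It assumes that $\alpha$ is partitioned conformably with $u_i$ into `alpha0` and `alphaN` (a labeling introduced here, not in the text) and uses arbitrary toy values.

```python
import numpy as np

def conditional_loglik(y_N, u0, ups, Lam_N, alpha0, alphaN, Psi, D_theta):
    """Moments and log-density of y_N given u0 under joint normality."""
    m0 = len(u0)
    Psi11, Psi12 = Psi[:m0, :m0], Psi[:m0, m0:]
    Psi21, Psi22 = Psi[m0:, :m0], Psi[m0:, m0:]
    Sigma = Lam_N @ Psi22 @ Lam_N.T + D_theta           # marginal covariance
    K = Lam_N @ Psi21 @ np.linalg.inv(Psi11)            # regression of y_N on u0
    mu_c = ups + Lam_N @ alphaN + K @ (u0 - alpha0)     # conditional mean
    Sigma_c = Sigma - K @ Psi12 @ Lam_N.T               # conditional covariance
    r = y_N - mu_c
    _, logdet = np.linalg.slogdet(Sigma_c)
    q = len(y_N)
    loglik = -0.5 * (q * np.log(2.0 * np.pi) + logdet
                     + r @ np.linalg.solve(Sigma_c, r))
    return mu_c, Sigma_c, float(loglik)

# Toy values (assumed): one ordinal-side and one continuous-side latent variable.
Psi = np.array([[1.0, 0.5], [0.5, 2.0]])
mu_c, Sigma_c, ll = conditional_loglik(
    y_N=np.array([2.0]), u0=np.array([1.0]), ups=np.array([1.0]),
    Lam_N=np.array([[2.0]]), alpha0=np.array([0.0]), alphaN=np.array([0.0]),
    Psi=Psi, D_theta=np.array([[0.5]]))
```

In this scalar example the conditional variance equals $\lambda^2(\psi_{22} - \psi_{21}^2/\psi_{11}) + \theta$, which can be verified by hand.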
4 Parameter estimation
Parameter estimation is relatively straightforward and can be summarized by the
following two steps.
Step 1
Calculate the natural logarithm of the likelihood function, $\ln L$, where
$$\ln L = \sum_{h=1}^{H} \sum_{k=1}^{n_h} \sum_{l=1}^{n_{hk}} w_{hkl} \ln f(y_{hkl} \mid \gamma) \qquad (1)$$
Step 2
Obtain an estimate $\hat{\gamma}$ of $\gamma$ by solving the set of simultaneous equations
$$\left. \frac{\partial \ln L}{\partial \gamma} \right|_{\gamma = \hat{\gamma}} = 0 \qquad (2)$$
In general, no closed-form solution to the set of equations (2) exists, and therefore
parameter estimates are obtained iteratively using the Fisher scoring algorithm:
$$\hat{\gamma}^{(t+1)} = \hat{\gamma}^{(t)} + I_n^{-1}\left(\hat{\gamma}^{(t)}\right) g\left(\hat{\gamma}^{(t)}\right) \qquad (3)$$
where $\hat{\gamma}^{(t)}$ denotes the parameter values at iteration $t$, $t = 1, 2, \ldots$, $g(\cdot)$ denotes the
gradient vector, and $I_n(\cdot)$ denotes the information matrix.
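The two steps can be illustrated on a much simpler model than the SEM likelihood. The sketch below applies Fisher scoring to a design-weighted logistic regression, used here only as a stand-in because its gradient and information matrix have closed forms; the data, weights and function name are hypothetical.

```python
import numpy as np

def fisher_scoring(X, y, w, tol=1e-6, max_iter=50):
    """Maximize sum_i w_i * ln f(y_i | gamma) for a logistic model."""
    gamma = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        g = X.T @ (w * (y - p))                           # weighted gradient
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X   # weighted information
        step = np.linalg.solve(info, g)                   # I^{-1} g
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:                    # all changes below tol
            break
    return gamma

x = np.array([-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
w = np.array([1.0, 2.0, 1.0, 1.5, 1.0, 2.0, 1.0, 1.0])    # design weights
X = np.column_stack([np.ones_like(x), x])
gamma_hat = fisher_scoring(X, y, w)
```

At convergence the weighted gradient is numerically zero, which is the condition in equation (2).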
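Let $C = \dfrac{\partial \gamma^{*\prime}}{\partial \gamma}$, presented in symbolic form as
$$
C = \left[ \begin{array}{l|ccccc}
\gamma \backslash \gamma^{*\prime} & [\mathrm{vec}\,\Lambda]' & \alpha' & [\mathrm{vecs}\,\Psi]' & [\mathrm{vecs}\,\Theta]' & \upsilon' \\
\hline
\mathrm{vec}\,\Lambda_y & C_{11} & 0 & 0 & 0 & 0 \\
\mathrm{vec}\,\Lambda_x & C_{21} & 0 & 0 & 0 & 0 \\
\mathrm{vec}\,B & C_{31} & 0 & 0 & 0 & 0 \\
\alpha_y & 0 & C_{42} & 0 & 0 & 0 \\
\kappa & 0 & C_{52} & 0 & 0 & 0 \\
\mathrm{vec}(\Gamma) & 0 & C_{62} & C_{63} & 0 & 0 \\
\mathrm{vecs}(\Psi_y) & 0 & 0 & C_{73} & 0 & 0 \\
\mathrm{vecs}(\Phi) & 0 & 0 & C_{83} & 0 & 0 \\
\mathrm{vecs}(\Theta_\varepsilon) & 0 & 0 & 0 & C_{94} & 0 \\
\mathrm{vecs}(\Theta_\delta) & 0 & 0 & 0 & C_{10,4} & 0 \\
\tau_y & 0 & 0 & 0 & 0 & C_{11,5} \\
\tau_x & 0 & 0 & 0 & 0 & C_{12,5}
\end{array} \right]
$$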
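Using the chain rule for matrix differentiation, it follows that
$$g(\gamma) = C \frac{\partial \ln L}{\partial \gamma^*} \qquad (4)$$
and
$$I_n(\gamma) = -E\left[ C \frac{\partial^2 \ln L}{\partial \gamma^* \partial \gamma^{*\prime}} C' \right] \qquad (5)$$
Iterations are continued until $\left| \hat{\gamma}_i^{(t+1)} - \hat{\gamma}_i^{(t)} \right| < \varepsilon \ \ \forall\, i = 1, 2, \ldots, nfree$, where $\varepsilon$ is a small
scalar value, e.g. $10^{-6}$.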
5 Approximate covariance matrix of estimators
An approximate expression for the asymptotic covariance matrix of $\hat{\gamma}$ is given by
$$\mathrm{Cov}(\hat{\gamma}) \approx I_n^{-1}(\gamma)\, G\, I_n^{-1}(\gamma)$$
Using results derived by Binder (1983) and Fuller (1975), it follows that, under single
stage sampling with replacement (WR) or without replacement (WOR)
$$G = \sum_{h=1}^{H} \frac{n_h (1 - f_h)}{n_h - 1} \sum_{i=1}^{n_h} \left( t_{hi.} - t_{h..} \right)\left( t_{hi.} - t_{h..} \right)'$$
where
• $n_h = \sum_{j=1}^{n_{hi}} m_{hij}$, with $m_{hij}$ the number of cases within stratum $h$, cluster $i$, and
USU $j$.
• $f_h = n_h / N_h$, the sampling rate for stratum $h$.
• $t_{hij} = g_{hij}(\hat{\gamma})$, where $g_{hij}(\hat{\gamma})$ is the $hij$-th contribution to the gradient vector
$g(\gamma)$.
• $t_{hi.} = \sum_{j=1}^{m_{hij}} t_{hij}$
• $t_{h..} = \dfrac{1}{n_h} \sum_{i=1}^{n_h} t_{hi.}$
In practice, we assume a zero contribution to G for strata that contain a single PSU
(cluster). Additionally, if there is no variable to define clusters, the observations
within each stratum are treated as being the primary sampling units.
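The computation of $G$ and of the resulting sandwich covariance matrix can be sketched as follows. This is a minimal Python illustration, not LISREL's implementation: it takes $n_h$ to be the number of PSUs in stratum $h$ (as in the Introduction), takes the per-case gradient contributions as given, and uses a hypothetical function name and toy numbers.

```python
import numpy as np

def linearization_G(scores, stratum, psu, f=None):
    """Sum over strata of the between-PSU variation of the score totals."""
    scores = np.asarray(scores, dtype=float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    k = scores.shape[1]
    G = np.zeros((k, k))
    for h in np.unique(stratum):
        in_h = stratum == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        if n_h < 2:
            continue                      # single-PSU stratum contributes zero
        # t_hi. : total of the gradient contributions within each PSU
        t_hi = np.array([scores[in_h & (psu == i)].sum(axis=0) for i in psus])
        dev = t_hi - t_hi.mean(axis=0)    # t_hi. - t_h..
        f_h = 0.0 if f is None else f[h]  # sampling rate, 0 if not supplied
        G += n_h * (1.0 - f_h) / (n_h - 1) * (dev.T @ dev)
    return G

# One parameter, two strata; stratum "B" has a single PSU.
scores = [[1.0], [2.0], [5.0], [7.0]]
G = linearization_G(scores, stratum=["A", "A", "A", "B"], psu=[1, 1, 2, 1])
info = np.array([[2.0]])                  # assumed information matrix
cov = np.linalg.inv(info) @ G @ np.linalg.inv(info)
```

If no cluster variable is available, passing a distinct `psu` value for every case reproduces the convention of treating observations as the primary sampling units.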
6 Adjustment to the chi-square goodness of fit statistics
Simulation studies indicated that the $\chi^2_{LR}$-statistic based on the difference between
two deviance statistics generally yields too high a rejection rate.
Let
$$d_1 = \mathrm{tr}\left( I_n(\hat{\gamma}_1)\, \mathrm{Cov}(\hat{\gamma}_1) \right), \quad d_2 = \mathrm{tr}\left( I_n(\hat{\gamma}_2)\, \mathrm{Cov}(\hat{\gamma}_2) \right),$$
where $I_n(\hat{\gamma}_s)$, $s = 1, 2$, denotes the information matrix under $H_1$ and $H_2$.
A correction to the $\chi^2$-statistic for testing the difference in two deviance statistics is
given by
$$\chi^2_{robust} = c \times \chi^2_{LR},$$
where (for $nfree_2 > nfree_1$)
$$c = \frac{nfree_2 - nfree_1}{\mathrm{abs}(d_1 - d_2)},$$
and where $nfree_1$ and $nfree_2$ respectively denote the total number of parameters to be
estimated under the $H_1$ and $H_2$ models.
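A minimal Python sketch of the correction factor follows. The function name and the toy matrices are assumptions; in particular, the robust covariance matrices are constructed here as twice the inverse information, mimicking a constant design effect of 2.

```python
import numpy as np

def robust_chi_square(chi2_lr, info1, cov1, info2, cov2):
    """Scale the deviance-difference statistic by c = (nfree2 - nfree1) / |d1 - d2|."""
    d1 = np.trace(info1 @ cov1)
    d2 = np.trace(info2 @ cov2)
    nfree1, nfree2 = info1.shape[0], info2.shape[0]   # requires nfree2 > nfree1
    c = (nfree2 - nfree1) / abs(d1 - d2)
    return c * chi2_lr, c

info1 = np.diag([2.0, 4.0])            # model H1: 2 free parameters
info2 = np.diag([2.0, 4.0, 5.0])       # model H2: 3 free parameters
cov1 = 2.0 * np.linalg.inv(info1)      # robust covariance, design effect 2
cov2 = 2.0 * np.linalg.inv(info2)
chi2_robust, c = robust_chi_square(10.0, info1, cov1, info2, cov2)
```

When the robust covariance matrix equals the inverse information matrix, $d_s = nfree_s$ and the factor reduces to $c = 1$, so the statistic is left unchanged.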
7 Example: Confirmatory Factor Analysis Model
The data set forms part of the data library of the Alcohol and Drug Services Study
(ADSS). The ADSS is a national study of substance abuse treatment facilities and
clients. Background data and data on the substance abuse of a sample of 1752
clients were obtained. The sample was stratified by census region, and within each of
the four census regions a sample was obtained for each of three facility treatment
types.
The following variables included in the PSF were selected from the survey data:
o CENREG: This variable indicates the census region and has four categories,
these being "Northeast", "Midwest", "South", and "West" respectively.
o FACTYPE: The facility treatment type has four categories, too, representing
facilities with "residential treatment", "outpatient methadone treatment",
"outpatient non-methadone treatment", and "more than one type of
treatment" respectively.
o COCEU: An indicator variable with value "1" if the respondent has ever used
cocaine, and "0" otherwise.
o MAREU: An indicator variable with value "1" if the respondent has ever used
marijuana, and "0" otherwise.
o DEPR: This indicator variable is coded "1" if the respondent is depressed,
and "0" otherwise.
o EDU: A categorical variable representing the respondent's level of education
at admission. It has 5 categories, these being (from 1 to 5) "less than 8
years", "8 – 11 years or less than High School graduate", "High School
graduate / GED", "some college", and "college graduate / postgraduate".
o JAILR: This indicator variable indicates whether the respondent had a prison
or jail record prior to admission.
o NUMTE: A count variable, indicating the total number of treatment episodes
prior to admission.
From the main menu bar, select the Data, Survey Design … option.
Add the survey design variables to the appropriate text boxes.
The SIMPLIS syntax file is shown below. The only difference between this and the
usual SIMPLIS syntax is the addition of the paragraph
$ADAPQ(12) CLL
Other options are PROBIT, LOGLOG and LOGIT.
A portion of the LISREL output is shown next.
Fit statistics and threshold values are shown below.
8 Example: Confirmatory Factor Analysis Model with latent
variable relationship and latent variable means
Mean Model
Alternative Parameterization
9 Example: Confirmatory Factor Analysis Model with a mixture
of ordinal and continuous variables
Further Reading
An, A.B. (2003). Performing Logistic Regression on Survey Data with the New
SURVEYLOGISTIC. Paper 258-27 presented at SUGI 27, held on April 14-17, 2003.
Orlando, Florida.
Binder, D.A. (1983). On the Variances of Asymptotically Normal Estimators from
Complex Surveys. International Statistical Review, 51, 279-292.
Binder, D.A., & Patak, Z. (1994). Use of estimating functions for individual
estimation from complex surveys, Journal of the American Statistical Association,
83, 1035-1043.
Chambers, R., & Skinner, C.J. (Eds.) (2003). Analysis of Survey Data. New York: Wiley.
Feder, M., Nathan, G., & Pfeffermann, D. (2000). Multilevel Modeling of Complex
Survey Longitudinal Data with Time Varying Random Effects, Survey Methodology,
26(1), 53-65, Statistics Canada.
Fuller, W.A. (1975). Regression Analysis for Sample Survey. Sankhya, Series C, 37,
117-132.
Horvitz, D. G., and Thompson, D. J. (1952). A generalization of sampling
without replacement from a finite universe, Journal of the American Statistical
Association, 47, 663 - 685.
Jöreskog, K.G. & Sörbom, D. (2004). LISREL 8.70 for Windows [Computer
Software]. Lincolnwood, IL: Scientific Software International, Inc.
Jöreskog, K.G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A
comparison of three approaches, Multivariate Behavioral Research, 36(3), 347-387.
Graubard & Korn (1993). Hypothesis testing with complex survey data, JASA, 88,
629-641.
Kish, L. (1965). Survey Sampling, New York: John Wiley.
Kish, L., & Frankel, M.R. (1974). Inference from Complex Samples, Journal of Royal
Statistical Society, B(36), 1-37.
Muthen, B.O. (1984). A general structural equation model with dichotomous, ordinal,
categorical, and continuous latent variable indicators, Psychometrika, 49, 115-132.
Muthen, BO., & Satorra, A. (1995). Complex sample data in structural equation
modeling, in: P Marsden (Ed.). Sociological Methodology, 216-316.
Nathan, G. (1988). Inference Based on Data from Complex Sample Designs, in:
Handbook of Statistics, P.R. Krishnaiah & C.R. Rao (Eds.), Amsterdam: Elsevier, 247-266.
Rao, J.N.K. (1975). Unbiased variance estimation for multistage designs, Sankhya,
C(37), 133-139.
Rao, J.N.K. (1994). Estimation of totals and distributing functions using auxiliary
information at the estimation stage, J. Official Statist., 10, 153-165.
Rao, J.N.K., & Scott, A.J. (1981). The Analysis of Categorical Data from Complex
Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way
Tables, Journal of the American Statistical Association, 76, 221-230.
Rao, J.N.K., Scott, A.J., & Skinner, C.J. (1998). Quasi-score test with survey data,
Statistica Sinica 8, 1059-1070.
Sarndal, C.E., Swensson, B., & Wretman, J. (1992). Model assisted survey
sampling, New York: Springer.
SAS Institute, Inc. (2004). SAS/STAT®: User’s Guide. Cary, NC: SAS Institute, Inc.
Satorra, A., & Bentler, PM. (1994). Corrections to test statistics and standard
errors in covariance structure analysis, in: A. van Eye & C.C. Clogg (Eds.), Latent
variable Analysis in Developmental Research, 285-305, Thousand Oaks, CA: Sage
Publications.
Shapiro, G. M., & Bateman, D.V. (1978). A better alternative to the collapsed
stratum variance estimate, Proceedings of the Survey Research Methods, American
Statistical Association, 451-456.
Skinner, C.J., Holt, D., & Smith, T.M.F. (1989). Analysis of Complex Surveys,
Chichester: Wiley.
Skinner, C. J. (1989). Domain means, regression and multivariate analysis. In
Analysis of Complex Surveys (eds. C.J. Skinner, D. Holt and T.M.F. Smith) 59-87,
Wiley.
Smith, T., and Holmes, D. (1989). Multivariate Analysis. In Analysis of Complex
Surveys (eds. C.J. Skinner, D. Holt and T.M.F. Smith) 165-190, Wiley.
SUDAAN User Manual Release 8.0, Second Edition. (2002). Research Triangle
Institute.
Traat, I., Meister, K., & Sostra, K. (2001). Statistical inference in sampling theory,
Theory of stochastic processes, 7(23), no. 1-2, 301-316.
Wolter, K. M. (1985). Introduction to Variance Estimation, New York: Springer-Verlag.
Yates, F., and Grundy, P.M. (1953). Selection without replacement from within
strata with probability proportional to size, Journal of the Royal Statistical Society,
B(15), 253 - 261.