An Introduction to Latent Curve Models
Instructor: Shelley Blozis, UC Davis
Outline
• Longitudinal panel study design
• Latent curve models
• Missing data
• Some examples of fitting latent curve models using Mplus and SAS PROC MIXED
Longitudinal Panel Design
• A sample of subjects is observed at multiple points in time
• The timing of the measurements can be the same for all individuals
  – Fixed Occasions Design
• Or the timing can vary between individuals
  – Varying Occasions Design
[Figure: Intellectual Development, test score by age]
Latent Growth Curve Models
• One of several different statistical methods for the analysis of longitudinal panel data
• Assume that all individuals in a population have the same functional form
  – But the parameters of the function vary between individuals
  – This allows for the individual curves to vary
[Figure: Intellectual Development, test score by age]
BRIEF DETOUR: COMMON FACTOR ANALYSIS
The latent curve model is based on a latent variable model
Common factor model
• $X_1 = \lambda_1 \xi + \delta_1$
• $X_2 = \lambda_2 \xi + \delta_2$
• $X_3 = \lambda_3 \xi + \delta_3$
• $X = \Lambda\xi + \delta$
  – $X$: set of manifest variables
  – $\Lambda$: factor loading matrix
  – $\xi$: factors (latent variables)
  – $\delta$: set of uniquenesses
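Here is a minimal numerical sketch of what this model implies for the covariances of X. It is our own illustration and does not come from the slides. It assumes the uniquenesses are uncorrelated with each other and with ξ. Under that assumption the implied covariance matrix is Σ = ΛΦΛ' + Θ, where Φ is the factor variance and Θ is the diagonal matrix of uniqueness variances. All numeric values are hypothetical.

```python
import numpy as np

# Hypothetical loadings of X1, X2, X3 on a single factor (Λ is 3 x 1)
lam = np.array([[0.8], [0.7], [0.6]])
phi = np.array([[1.0]])                  # variance of the factor ξ
theta = np.diag([0.36, 0.51, 0.64])      # uniqueness variances (Θ, diagonal)

# Covariance matrix of X implied by X = Λξ + δ, with ξ and δ uncorrelated
sigma = lam @ phi @ lam.T + theta
print(np.round(sigma, 2))
```

With these values each X has unit variance. The covariance of X1 and X2 is λ1λ2φ = 0.8 × 0.7 × 1 = 0.56.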
Common factor model with a mean structure
• $X = \tau + \Lambda\xi + \delta$
  – $X$ is the set of manifest variables
  – $\tau$ is the set of intercepts
  – $\Lambda$ is the factor loading matrix
  – $\xi$ is the set of factors (latent variables)
  – $\delta$ is the set of uniquenesses
Latent Variables
• Multiple manifest variables serve as indicators of an underlying, unobserved variable
• Indicators are reflective of the construct
• Examples: intelligence, self-esteem, life satisfaction
Similar to the common factor analysis model with a mean structure, the latent curve model is used to account for the means, variances, and covariances of the observed scores measured over time.
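The sketch below applies this idea to linear growth. It assumes factor means κ, a factor covariance matrix Φ, and a common residual variance σ² with independent residuals. Under these assumptions the latent curve model implies the mean vector Λκ and the covariance matrix ΛΦΛ' + σ²I. The ages are the four assessment ages used for one child later in this deck. The values of κ, Φ and σ² are hypothetical, not estimates from the study.

```python
import numpy as np

ages = np.array([5.58, 7.83, 10.58, 17.17])
# Basis functions: a column of ones (intercept) and a column of ages (slope)
Lam = np.column_stack([np.ones_like(ages), ages])

# Hypothetical population parameters (illustration only)
kappa = np.array([2.0, 0.9])          # factor means: intercept, slope
Phi = np.array([[4.0, 0.5],
                [0.5, 0.1]])          # factor variances and their covariance
sigma2 = 1.5                          # common residual variance

mu = Lam @ kappa                                         # implied means
Sigma = Lam @ Phi @ Lam.T + sigma2 * np.eye(len(ages))   # implied covariances
print(np.round(mu, 2))
print(np.round(Sigma, 2))
```

Fitting the model amounts to choosing κ, Φ and σ² so that these implied moments reproduce the observed means, variances and covariances as closely as possible.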
Making the distinction
• Factor analysis
  – The factors of a factor analysis model represent unobservable variables, such as intelligence or self-esteem
• In a latent curve model, the factors represent characteristics of change
  – Not latent variables in the standard sense
  – Represent unobservable characteristics of change
Longitudinal panel study: intellectual test scores for n = 100 children assessed up to four times at different ages
[Figure: Intellectual Development, test score by age]
• Let $y_{ti}$ be an observed test score
  – $t$ is an occasion, $t = 1, \ldots, 4$
  – $i$ is the individual
[Figure: Intellectual Development, test score by age]
• A latent curve model: $y_i = \Lambda\eta_i + \varepsilon_i$
  – Based on a common factor model with a mean structure
• Using this framework, specify a form of change for the response
If a measured response is modeled by a linear function of time, then a latent curve model could be specified as
where
Model intellectual test scores using a linear function of children’s age
[Figure: Intellectual Development, test score by age]
• Individual i’s expected rate of change
• The expected value of $y_i$ at $\text{Age}_i = 0$
• The expected value of $y_i$ at $(\text{Age}_i - 5)$
Linear Growth Model
• From this model we can estimate each individual’s set of coefficients
• We can also estimate the population trajectory
• But often the intent is to describe variation between individuals in the coefficients and possibly account for this variation
Use the individual-specific coefficients to estimate each person’s trajectory
We can then work with the model to describe variation between individuals in their response level and rate of change
Linear growth model
• The coefficients are each a sum of
  – a fixed effect (a constant for the population) and
  – a random effect (unique to the individual)
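Here is a small arithmetic sketch of this decomposition. The names beta0 and beta1 for the fixed effects are our own labels, since the slides name only the random effects b0i and b1i. All values are hypothetical.

```python
# Hypothetical fixed effects (population values) and one child's random effects
beta0, beta1 = 2.0, 0.9
b0i, b1i = 0.5, -0.1

# Each person-specific coefficient = fixed effect + random effect
intercept_i = beta0 + b0i
slope_i = beta1 + b1i

def linear_trajectory(t, intercept, slope):
    """Underlying (error-free) linear trajectory evaluated at time t."""
    return intercept + slope * t

times = (0, 1, 2, 3)
population = [linear_trajectory(t, beta0, beta1) for t in times]      # typical curve
child_i = [linear_trajectory(t, intercept_i, slope_i) for t in times]  # child i's curve
```

Setting b0i = b1i = 0 recovers the population trajectory. Each child's random effects shift the level and steepness of that child's line away from it.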
Model Assumptions: The residual
• Assume that the responses of each individual follow an underlying trajectory; here we assume linear growth
• The observations are assumed to be due to this trajectory plus other factors that are captured in the residual
  – Other factors include possible measurement error, as well as other factors apart from time
An illustration
[Figure: observed scores that include measurement error; the “inherent” trajectory for an individual; scores free of measurement error]
Model Assumptions: The residual
• Possible assumptions about the residuals
  – Independent between individuals
  – Independent within individuals
  – Constant variance across time
Model Assumptions: The person-specific coefficients
• Assume the coefficients vary between individuals according to the random effects, $b_{0i}$ and $b_{1i}$
The ‘typical’ trajectory: an illustration
[Figure: the “inherent” trajectory for an individual; observed scores]
Source: Harring & Blozis (2014), Behav Res, 46, 372-384
Model Assumptions: The person-specific coefficients
• Each individual can have a curve that is distinct from those of others
• Under the model, $b_{0i}$ and $b_{1i}$ are often assumed to be normally distributed with means equal to 0
  – Each has a variance to describe individual differences
  – The two random effects can covary
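The assumptions on this slide and on the residual slide can be turned into a data-generating sketch. The sketch assumes the following:
• fixed, equally spaced occasions (hypothetical)
• hypothetical fixed effects
• a bivariate normal distribution for (b0i, b1i) with means 0 and covariance matrix G
• residuals that are independent with constant variance

None of the values come from the study.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n = 100                                   # number of individuals
times = np.array([0.0, 1.0, 2.0, 3.0])    # hypothetical fixed occasions
beta = np.array([2.0, 0.9])               # hypothetical fixed intercept and slope
G = np.array([[4.0, 0.5],
              [0.5, 0.1]])                # variances and covariance of (b0i, b1i)
sigma2 = 1.5                              # residual variance

# Random effects: bivariate normal, means 0, free to covary
b = rng.multivariate_normal(mean=np.zeros(2), cov=G, size=n)   # n x 2
coefs = beta + b                          # person-specific intercepts and slopes

# Residuals: independent between and within individuals, constant variance
e = rng.normal(0.0, np.sqrt(sigma2), size=(n, len(times)))

# Observed scores: each person's line evaluated at each occasion, plus residual
y = coefs[:, [0]] + coefs[:, [1]] * times + e                  # n x T
print(y.shape)
```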
Model Assumptions: The population model
• Describes the typical response
Figure adapted from Blozis & Harring (2016), Structural Equation Modeling, 23(6), 904-920.
Latent Curve Models versus Multilevel Models
• We have options in how we approach the analysis
  – Latent curve models
    • As a structural equation model (SEM)
  – Multilevel models
• It’s possible to specify equivalent models under the two approaches and, by applying the same estimation methods, obtain identical results
Estimating the model: A latent curve model vs. a multilevel model
• Using Mplus to take the latent curve model approach and SAS PROC MIXED (or R) to take the multilevel model approach
  – Fit a linear growth model to repeated measures of intellectual ability
Latent Curve Model Approach
• The intercept and slope are the latent variables
• Time (Age) is incorporated into the model as specific, constrained values of the factor loadings, which may be unique to each person
• Factor loadings are specified as fixed, constrained to specific values
• The means of ‘Int’ and ‘Slope’ relate to the population model
• The variances of ‘Int’ and ‘Slope’ relate to variation in the level and rate of change across individuals
• ‘Int’ and ‘Slope’ are free to covary
Data model: $y_i = \Lambda\eta_i + \varepsilon_i$
• The factor loading matrix, Λ, reflects assumptions about the pattern of change in the response variable
  – Each column is known as a ‘basis function’
  – For linear change, Λ has two columns
    • The first is a column of ones to represent the intercept
    • The second is a column with values equal to the times of measurement
  – In our example, time is represented by the child’s ages at each assessment
$$\Lambda = \begin{pmatrix} 1 & 5.58 \\ 1 & 7.83 \\ 1 & 10.58 \\ 1 & 17.17 \end{pmatrix}$$
Data model: $y_i = \Lambda\eta_i + \varepsilon_i$
• The vector of factors, $\eta_i$, is unknown and varies across individuals
  – For an individual, the elements that make up $\eta_i$ represent different aspects of change in y
  – For linear change, $\eta_i$ contains two factors
    • $\eta_i = (\eta_{0i}, \eta_{1i})'$
    • $\eta_{0i}$ is individual i’s intercept
    • $\eta_{1i}$ is individual i’s slope
• The factors are weights, each linked to a basis function defined in the factor loading matrix
Data model: $y_i = \Lambda\eta_i + \varepsilon_i$
• According to the model
  – An observed score is modeled as a weighted linear combination of the basis functions plus a residual
  – The underlying trajectory for an individual is given by $\Lambda\eta_i$
  – The residual is the difference between the observed scores and the individual’s underlying trajectory
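A sketch of these three statements for one child. It uses the ages from the Λ matrix on the previous slide. The factor scores η_i and the observed scores y_i are hypothetical.

```python
import numpy as np

ages = np.array([5.58, 7.83, 10.58, 17.17])
Lam = np.column_stack([np.ones(len(ages)), ages])   # Λ: column of ones, column of ages

eta_i = np.array([1.0, 0.8])            # hypothetical (η0i, η1i) for child i
y_i = np.array([5.9, 7.1, 9.8, 14.6])   # hypothetical observed scores

trajectory_i = Lam @ eta_i              # underlying trajectory Λη_i (weighted basis functions)
resid_i = y_i - trajectory_i            # residual ε_i = y_i − Λη_i
```

Each element of trajectory_i is η0i × 1 + η1i × age, so the factors act as weights on the two basis functions.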
Estimation using Mplus
• Obtain maximum likelihood estimates of model parameters
• Estimate
  – Factor means
  – Factor variances and their covariance
  – A common variance of the time-specific residuals
Setting up the data file
• Using the latent variable model approach, set up the data file in wide format (one row per child)
• Save as an ASCII file with no header
• Indicate missing data
Mplus syntax for fitting a linear growth model
• Random effects: the | symbol is used in conjunction with TYPE=RANDOM to name and define the random effects
• Specifying ‘maximum likelihood’ estimation; we have other options for estimation
• The model assumes that the variances of the residuals are constant across time
  – “(1)” will constrain the residual variances to be equal
Results from Mplus
Fitting the model using a multilevel model approach
• Using SAS PROC MIXED
  – Using the same estimator as was used in Mplus (maximum likelihood)
• PROC MIXED requires data in long format (one row per child per assessment)
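A minimal sketch of the wide-to-long restructuring in Python. The deck does this step in SAS, and that code is not reproduced here. The field names y, age and wave and all values are hypothetical; famid follows the identifier used on the data slide. Occasions without a response are skipped. This is harmless because PROC MIXED does not use rows with a missing response anyway.

```python
# Hypothetical wide-format records: one row per child; None marks a missed wave
wide = [
    {"famid": 1, "y": [8.0, 9.5, 12.0, None], "age": [4.0, 6.0, 9.0, None]},
    {"famid": 2, "y": [7.0, 10.0, 11.5, 16.0], "age": [5.0, 7.5, 10.0, 15.0]},
]

def wide_to_long(records):
    """Return one row per child per observed occasion (long format)."""
    long_rows = []
    for rec in records:
        for wave, (score, age) in enumerate(zip(rec["y"], rec["age"]), start=1):
            if score is None:          # skip occasions with no response
                continue
            long_rows.append({"famid": rec["famid"], "wave": wave,
                              "age": age, "y": score})
    return long_rows

long_data = wide_to_long(wide)
for row in long_data:
    print(row)
```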
Data file in wide format
• Data for the first child (famid = 1)
• Data for the 2nd child (famid = 2)
Bring data into SAS
PROC MIXED syntax for fitting a linear growth model with random coefficients
Results from SAS PROC MIXED
• Fixed intercept and slope
• Variance of the random intercept
• Covariance between the random intercept and random slope
• Variance of the random slope
Mplus and PROC MIXED comparison
Missing Data
• Missing data are often encountered in longitudinal panel studies
  – Participants may miss one or more of the planned assessments, including those who drop from a study and do not return
Intelligence test scores
Only 16% of cases have complete data for all 4 waves
How latent curve models and multilevel models handle missing data
• Even though many participants have incomplete data for the 4 waves, all participants are included in the analysis
• This is due to the method of estimation
  – Maximum likelihood
  – Does not require complete data for the response variable
In the intelligence study, scores are studied according to age, which differs between children
• For children who have complete data for all 4 waves, Λ is composed of 4 rows and 2 columns, with ages unique to the child:
$$\Lambda = \begin{pmatrix} 1 & 5.58 \\ 1 & 7.83 \\ 1 & 10.58 \\ 1 & 17.17 \end{pmatrix}$$
• For children who have data for waves 1-3 but not wave 4, Λ is composed of 3 rows and 2 columns, with ages unique to the child:
$$\Lambda = \begin{pmatrix} 1 & 2.33 \\ 1 & 5.17 \\ 1 & 11.42 \end{pmatrix}$$
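A sketch of why incomplete cases still contribute under maximum likelihood. Each child's likelihood uses only the rows of Λ, and of the mean vector and covariance matrix, for the waves that child actually completed. The ages are those from the two Λ matrices on the slide. The scores and parameter values are hypothetical, and the optimizer that would maximize the total over the parameters is omitted.

```python
import numpy as np

def person_loglik(y_obs, ages_obs, kappa, Phi, sigma2):
    """Normal log-likelihood of one child's observed scores under linear growth."""
    ages_obs = np.asarray(ages_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    # Λ_i has one row per observed wave only
    Lam_i = np.column_stack([np.ones(len(ages_obs)), ages_obs])
    mu_i = Lam_i @ kappa
    Sigma_i = Lam_i @ Phi @ Lam_i.T + sigma2 * np.eye(len(ages_obs))
    dev = y_obs - mu_i
    _, logdet = np.linalg.slogdet(Sigma_i)
    quad = dev @ np.linalg.solve(Sigma_i, dev)
    return -0.5 * (len(dev) * np.log(2 * np.pi) + logdet + quad)

# Hypothetical parameter values and scores
kappa = np.array([2.0, 0.9])
Phi = np.array([[4.0, 0.5], [0.5, 0.1]])
sigma2 = 1.5

complete = person_loglik([7.2, 9.0, 11.8, 17.0], [5.58, 7.83, 10.58, 17.17],
                         kappa, Phi, sigma2)
partial = person_loglik([4.1, 6.5, 12.3], [2.33, 5.17, 11.42], kappa, Phi, sigma2)
total = complete + partial   # sample log-likelihood: sum over children
```

Estimation then searches for the κ, Φ and σ² that maximize this sum over all children. No child needs all four waves to enter it.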
Due to the way in which the parameters of the models are estimated, missing response data (e.g., intelligence test scores) are handled without dropping incomplete cases
• This is different from other statistical methods, such as ANOVA, that require complete data for all cases for estimation
Assumptions about missing data
• For complete-case methods (e.g., ANOVA), data are assumed to be missing completely at random (MCAR)
  – MCAR = whether data are missing or not is independent of the missing values as well as any observed data
• For available-case methods (e.g., latent curve models estimated by maximum likelihood), data are assumed to be missing at random (MAR)
  – MAR = whether data are missing or not may depend on the observed data but, given the observed data, is independent of the missing values
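A small simulation sketch contrasting the two mechanisms. It uses a hypothetical two-wave study in which only the wave-2 score can be missing. Under MCAR, the chance of missingness ignores every score. Under the MAR rule assumed here, children with a low observed wave-1 score miss wave 2, but the unobserved wave-2 value itself plays no role. The distributions, cutoff and probability are hypothetical.

```python
import random

random.seed(2)

def mcar_missing(y1, y2, p=0.3):
    """MCAR: y2 goes missing with probability p, regardless of y1 or y2."""
    return random.random() < p

def mar_missing(y1, y2, cutoff=10.0):
    """MAR: y2 goes missing when the observed wave-1 score y1 is low;
    the value of y2 itself does not enter the rule."""
    return y1 < cutoff

# Hypothetical (wave-1, wave-2) scores for 10,000 children
pairs = [(random.gauss(10, 3), random.gauss(12, 3)) for _ in range(10_000)]
share_mcar = sum(mcar_missing(a, b) for a, b in pairs) / len(pairs)
share_mar = sum(mar_missing(a, b) for a, b in pairs) / len(pairs)
```

Under the MAR rule, the children who remain at wave 2 have higher wave-1 scores than those who drop out. A complete-case analysis would therefore be biased. Maximum likelihood uses the observed wave-1 scores of all children and so accounts for the selection.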
Back to our example
• Linear growth is one possible model to describe test scores
• Can we improve on model fit by considering a nonlinear form of change, such as by applying a quadratic growth model?
Interpretation
• Age is centered at 5 years
• Intercept
• Linear slope
• Quadratic slope
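This sketch assumes the usual quadratic specification, since the slide's equation is not reproduced here. Under that specification the basis functions are 1, (Age − 5) and (Age − 5)². The sketch builds those rows of Λ and the implied rate of change. Under this form:
• the intercept is the expected score at age 5
• the linear slope is the instantaneous rate of change at age 5
• the rate of change itself changes by twice the quadratic coefficient per year of age

Ages and coefficients are hypothetical.

```python
def quadratic_basis(ages, center=5.0):
    """Rows of Λ for quadratic growth: 1, (age - c), (age - c)**2."""
    return [[1.0, a - center, (a - center) ** 2] for a in ages]

def expected_score(age, b0, b1, b2, center=5.0):
    """Quadratic trajectory with age centered at `center`."""
    t = age - center
    return b0 + b1 * t + b2 * t ** 2

def rate_of_change(age, b1, b2, center=5.0):
    """Derivative of the quadratic trajectory with respect to age."""
    return b1 + 2 * b2 * (age - center)

rows = quadratic_basis([5.0, 7.0, 10.0])          # hypothetical ages
curve = [expected_score(a, 3.0, 1.2, -0.1) for a in (5.0, 7.0, 10.0)]
```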
Slight modification of the Mplus syntax
Quadratic Growth Model - Results
Comparison of Model Fit: Linear Growth vs. Quadratic Growth
For nested models, we can use the likelihood ratio test to compare fit
• Calculate the deviance for each model
  – Deviance = -2*loglikelihood
• Statistic: chi-square = difference in deviances
• Linear growth: deviance = -2*(-387.508) = 775.016
• Quadratic growth: deviance = -2*(-363.494) = 726.988
• Chi-square = 775.016 - 726.988 = 48.028
  – df = difference in the number of free parameters between models
  – df = 10 - 6 = 4 (the quadratic model adds one factor mean, one factor variance, and two factor covariances)
  – Chi-square of 48.028 with 4 df, p < .0001
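The same arithmetic as a short script. It computes the p-value from the closed-form chi-square upper tail, which holds only for even degrees of freedom. The parameter counts behind df = 4 are our reading of the two models. Linear growth has 6 free parameters: 2 factor means, 2 factor variances, 1 covariance, and 1 residual variance. Quadratic growth has 10: 3 means, 3 variances, 3 covariances, and 1 residual variance.

```python
import math

def deviance(loglik):
    """Deviance = -2 * log-likelihood."""
    return -2.0 * loglik

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2:
        raise ValueError("closed form used here requires even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

dev_linear = deviance(-387.508)
dev_quadratic = deviance(-363.494)
chi_sq = dev_linear - dev_quadratic

n_params_linear, n_params_quadratic = 6, 10   # free-parameter counts (our reading)
df = n_params_quadratic - n_params_linear
p_value = chi2_sf_even_df(chi_sq, df)
print(round(chi_sq, 3), df, p_value)
```

For df = 4 the tail probability reduces to e^(−x/2)(1 + x/2). At x = 48.028 this is far below .0001, in line with the slide.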
Accounting for between-person differences in the random effects
• Mother’s IQ test score
• Repeated measures of child test score
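One common way to write such a conditional model regresses the growth factors on mother's IQ: η0i = α0 + γ0·MomIQi + ζ0i and η1i = α1 + γ1·MomIQi + ζ1i. This form is an assumption, because the slide's syntax is not reproduced here. The sketch computes the expected intercept, slope and score for given values of mother's IQ. Centering at 100 and all coefficient values are hypothetical.

```python
def expected_growth_factors(mom_iq, alpha0, gamma0, alpha1, gamma1, center=100.0):
    """Expected intercept and slope for a child, given mother's IQ (centered)."""
    x = mom_iq - center
    return alpha0 + gamma0 * x, alpha1 + gamma1 * x

def expected_score(t, mom_iq, **params):
    """Expected child score at time t under the conditional linear growth model."""
    intercept, slope = expected_growth_factors(mom_iq, **params)
    return intercept + slope * t

params = dict(alpha0=2.0, gamma0=0.05, alpha1=0.9, gamma1=0.01)   # hypothetical
low = expected_score(3.0, 85, **params)    # child of a mother with IQ 85
high = expected_score(3.0, 115, **params)  # child of a mother with IQ 115
```

The residuals ζ0i and ζ1i carry the between-child differences that mother's IQ leaves unexplained. Their variances are what the conditional model estimates in place of the unconditional factor variances.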
Mplus syntax for a conditional growth model
Conditional Growth Model - Results
Conditional Growth Model Results
One more model: Evaluate our assumptions about the within-subject residuals
• Typical assumption
  – Residuals are independent with constant variance across time
• Other structures are possible
  – Autocorrelation
  – Independent with heterogeneous variances across time
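A sketch of the residual covariance matrices these options imply for T occasions. The AR(1) form, with correlation ρ^|t−s|, is one common autocorrelation structure and assumes equally spaced occasions. The "adjacent" form lets only neighboring residuals covary. It is closer in spirit to the next slide, although the exact Mplus specification used there is not reproduced here. All values are hypothetical.

```python
import numpy as np

def cov_independent(T, sigma2):
    """Independent residuals with constant variance: σ² I."""
    return sigma2 * np.eye(T)

def cov_heterogeneous(variances):
    """Independent residuals with a separate variance at each occasion."""
    return np.diag(variances)

def cov_ar1(T, sigma2, rho):
    """First-order autoregressive: corr(e_t, e_s) = rho ** |t - s|."""
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return sigma2 * rho ** lags

def cov_adjacent(T, sigma2, c):
    """Constant variance; only adjacent occasions covary (covariance c)."""
    S = sigma2 * np.eye(T)
    idx = np.arange(T - 1)
    S[idx, idx + 1] = c
    S[idx + 1, idx] = c
    return S

print(cov_ar1(4, 1.5, 0.4).round(3))
```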
Allow for autocorrelation between adjacent residuals
Results
Just an introduction to latent curve models
• As a framework for the analysis of longitudinal panel data, these models offer many possibilities
• Are there any benefits to a latent curve model approach versus a multilevel approach?
  – Yes!
Resources
• Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
• Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An introduction to latent variable growth curve modeling: Concepts, issues, and application (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
• Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 23, 323-355.
• Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press.