Latent Class Analysis
Karen Bandeen-Roche
October 27, 2016
Objectives
For you to leave here knowing…
• When is the latent class analysis (LCA) model useful?
• What is the LCA model and its underlying assumptions?
• How are LCA parameters interpreted?
• How are LCA parameters commonly estimated?
• How is LCA fit adjudicated?
• What are considerations for identifiability / estimability?
Motivating Example
Frailty of Older Adults
“…the sixth age shifts into the lean and
slipper’d pantaloon, with spectacles on nose and
pouch on side, his youthful hose well sav’d, a
world too wide, for his shrunk shank…”
-- Shakespeare, “As You Like It”
The Frailty Construct
Fried et al., J Gerontol 2001; Bandeen-Roche et al., J Gerontol, 2006
Frailty as a latent variable
• “Underlying”: status or degree of syndrome
• “Surrogates”: Fried et al. (2001) criteria
– weight loss above threshold
– low energy expenditure
– low walking speed
– weakness beyond threshold
– exhaustion
Part I:
Model
Latent class model
[Path diagram: the measurement model relates the latent frailty variable η to observed indicators Y1, …, Ym with errors ε1, …, εm; the structural model describes η itself]
Well-used latent variable models

                         Observed variable scale
Latent variable scale    Continuous                              Discrete
Continuous               Factor analysis (LISREL)                Discrete FA; IRT (item response)
Discrete                 Latent profile; growth mixture analysis Latent class; latent class regression

General software: MPlus, Latent Gold, WinBUGS (Bayesian), NLMIXED (SAS)
Analysis of underlying subpopulations
Latent class analysis
[Diagram: the population is divided into J latent classes with prevalences P1, …, PJ; within class j (Ui = j), the indicators Y1, …, YM are reported with conditional probabilities πj1, …, πjM]
Lazarsfeld & Henry, Latent Structure Analysis, 1968; Goodman, Biometrika, 1974
Latent Variables: What?
Integrands in a hierarchical model
• Observed variables (i = 1, …, n): Y_i = M-variate; x_i = P-variate
• Focus: response (Y) distribution F_{Y|x}(y | x); its x-dependence
• Model:
  – Y_i generated from latent (underlying) U_i:
    F_{Y|U,x}(y | U = u, x; π)   (Measurement)
  – Focus on distribution, regression re U_i:
    F_{U|x}(u | x; β)   (Structural)
• Overall, hierarchical model:
  F_{Y|x}(y | x) = ∫ F_{Y|U,x}(y | U = u, x) dF_{U|x}(u | x)
Latent Variable Models
Latent Class Regression (LCR) Model
• Model:
  f_{Y|x}(y | x) = Σ_{j=1}^{J} P_j Π_{m=1}^{M} π_mj^{y_m} (1 − π_mj)^{1−y_m}
• Structural model: [U_i | x]
  Pr{U_i = j} = Pr{η_i = j} = P_j, j = 1, …, J
• Measurement model: [Y_i | U_i]
  π_mj = Pr{Y_im = 1 | U_i = j} = Pr{Y_im = 1 | η_i = j}
  = “conditional probabilities”; π is M × J
• Compare to general form:
  F_{Y|x}(y | x) = ∫ F_{Y|U,x}(y | U = u, x) dF_{U|x}(u | x)
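The mixture density above is a one-liner in code. As a minimal sketch in Python/NumPy (the parameter values below are illustrative toy numbers, not estimates from any dataset):

```python
import numpy as np

def lca_density(y, P, pi):
    """Mixture density f(y) = sum_j P_j * prod_m pi_mj^y_m (1 - pi_mj)^(1 - y_m).

    y  : length-M binary response vector
    P  : length-J class prevalences (sum to 1)
    pi : M x J matrix of conditional probabilities pi[m, j]
    """
    y = np.asarray(y)[:, None]                 # M x 1, broadcasts against M x J
    per_item = pi ** y * (1 - pi) ** (1 - y)   # item-level probabilities per class
    return float(P @ per_item.prod(axis=0))    # mix class-conditional products

# Toy 2-class, 3-item example
P = np.array([0.7, 0.3])
pi = np.array([[0.1, 0.8],
               [0.2, 0.9],
               [0.1, 0.7]])
fy = lca_density([1, 1, 0], P, pi)   # 0.7*0.018 + 0.3*0.216 = 0.0774
```

The class-conditional product encodes the conditional-independence assumption discussed on the next slide.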
Latent Variable Models
Latent Class Regression (LCR) Model
• Model:
  f_{Y|x}(y | x) = Σ_{j=1}^{J} P_j Π_{m=1}^{M} π_mj^{y_m} (1 − π_mj)^{1−y_m}
• Measurement assumptions: [Y_i | U_i]
  – Conditional independence
    - {Y_i1, …, Y_iM} mutually independent conditional on U_i
    - Reporting heterogeneity unrelated to measured, unmeasured characteristics
Analysis of underlying subpopulations
Method: Latent class analysis
• Seeks homogeneous subpopulations
• Features that characterize latent groups
– Prevalence in overall population
– Proportion reporting each symptom
– Number of classes = fewest needed to achieve homogeneity / conditional independence
Latent class analysis
Prediction
• Of interest: Pr(C=j|Y=y)
= posterior probability of class membership
• Once model is fit, a straightforward calculation
Pr(C = j | Y = y) = Pr(C = j, Y = y) / Pr(Y = y)

  = P_j Π_{m=1}^{M} π_mj^{y_m} (1 − π_mj)^{1−y_m} / [ Σ_{k=1}^{J} P_k Π_{m=1}^{M} π_mk^{y_m} (1 − π_mk)^{1−y_m} ]

  = θ_ij when evaluated at y_i
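This Bayes-rule calculation is equally direct once the model is fit. A sketch, reusing toy parameters (assumed already estimated, not taken from the slides):

```python
import numpy as np

def posterior(y, P, pi):
    """theta_j = Pr(C = j | Y = y), by Bayes' rule from fitted LCA parameters."""
    y = np.asarray(y)[:, None]
    joint = P * (pi ** y * (1 - pi) ** (1 - y)).prod(axis=0)  # P_j * prod_m (...)
    return joint / joint.sum()                                # normalize over classes

P = np.array([0.7, 0.3])
pi = np.array([[0.1, 0.8],
               [0.2, 0.9],
               [0.1, 0.7]])
theta = posterior([1, 1, 0], P, pi)   # sums to 1 across the J classes
```

A response pattern with many positive items pulls the posterior toward the class with high conditional probabilities, as expected.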
Part II:
Fitting
Estimation
Broad Strokes
• Maximum likelihood
– EM Algorithm
– Simplex method (Dayton & Macready, 1988)
– Possibly with weighting, robust variance correction
• ML software
– Specialty: Mplus, Latent Gold
– Stata: gllamm
– SAS: macro
– R: poLCA
• Bayesian: WinBUGS
Estimation
Methods other than EM algorithm
• Bayesian
• MCMC methods (e.g. per WinBUGS)
• A challenge: label-switching
• Reversible-jump methods
• Advantages: feasibility, philosophy
• Disadvantages
• Prior choice (high-dimensional; avoiding illogic)
• Burn-in, duration
• May obscure identification problems
Estimation
Likelihood maximization: E-M algorithm
A process of averaging over missing data – in
this case, missing data is class membership.
Estimation
Likelihood maximization: E-M algorithm
• Rationale: LVs as “missing” data
• Brief review
• “Complete” data W = {Y, x, u}
• Complete-data log likelihood
  ℓ_w(φ | w) = log f_{Y,U|x}(y, u | x; φ), taken as a function of φ
• Iterate between
  • (k+1)st E-step: evaluate Q(φ | φ^(k)) = E_{U|y,x}[ ℓ_w(φ | W) | y, x; φ^(k) ]
  • (k+1)st M-step: maximize Q(φ | φ^(k)) with respect to φ
• Convergence to a local likelihood maximum under regularity
Dempster, Laird, and Rubin, JRSSB, 1977
Estimation
EM example: Latent Class Model
max L = Σ_{i=1}^{n} log{ Σ_{j=1}^{J} P_j Π_{m=1}^{M} π_mj^{y_im} (1 − π_mj)^{1−y_im} } + ψ( Σ_{j=1}^{J} P_j − 1 )

∂L/∂π_mj : Σ_{i=1}^{n} θ_ij (y_im − π_mj) / [ π_mj (1 − π_mj) ] = 0 ⇒ π̂_mj = Σ_{i=1}^{n} y_im θ_ij / Σ_{h=1}^{n} θ_hj

∂L/∂P_j : Σ_{i=1}^{n} θ_ij − n P_j = 0 ⇒ P̂_j = (1/n) Σ_{i=1}^{n} θ_ij
EM-Algorithm
Latent class model
A process of averaging over missing data – in this case, the missing data are class membership.
1. Choose a starting set of posterior probabilities
2. Use them to estimate P and π (M-step)
3. Calculate the log likelihood
4. Use the estimates of P and π to calculate posterior probabilities (E-step)
5. Repeat 2–4 until the log likelihood stops changing.
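Steps 1–5 can be sketched compactly. A minimal NumPy implementation, assuming binary items and complete data (the function name and interface are my own, not from any package; the closed-form M-step updates are the ones derived on the preceding slide):

```python
import numpy as np

def lca_em(Y, J, n_iter=500, tol=1e-8, seed=0):
    """EM for the binary latent class model, following steps 1-5 above.

    Y : n x M binary data matrix; J : number of classes.
    Returns (P, pi, loglik): prevalences, M x J conditional probs, final LL.
    """
    rng = np.random.default_rng(seed)
    n, M = Y.shape
    theta = rng.dirichlet(np.ones(J), size=n)     # step 1: random starting posteriors
    ll_old = -np.inf
    for _ in range(n_iter):
        # M-step (step 2): weighted-average updates P_j = mean theta, pi = weighted mean of Y
        P = theta.mean(axis=0)
        pi = (Y.T @ theta) / theta.sum(axis=0)    # pi[m, j]
        pi = np.clip(pi, 1e-10, 1 - 1e-10)
        # step 3: log likelihood; E-step (step 4): new posteriors via Bayes' rule
        logjoint = np.log(P) + Y @ np.log(pi) + (1 - Y) @ np.log(1 - pi)   # n x J
        lse = np.logaddexp.reduce(logjoint, axis=1)
        ll = lse.sum()
        theta = np.exp(logjoint - lse[:, None])
        if ll - ll_old < tol:                     # step 5: stop when LL stabilizes
            break
        ll_old = ll
    return P, pi, ll

# Illustrative run on simulated two-class data (hypothetical, not the WHAS data)
rng = np.random.default_rng(1)
z = rng.random(300) < 0.4
probs = np.where(z[:, None], [0.8, 0.7, 0.9, 0.6], [0.1, 0.2, 0.1, 0.3])
Y = (rng.random((300, 4)) < probs).astype(float)
P_hat, pi_hat, ll = lca_em(Y, J=2)
```

As the next slide warns, a single random start can land in a local maximum; in practice one reruns `lca_em` over many seeds and keeps the best log likelihood.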
Global and Local Maxima
Multiple starting values very important!
Example: Frailty
Women’s Health & Aging Studies
• Longitudinal cohort studies to investigate
– Causes / course of physical and cognitive disability
– Physiological determinants of frailty
– Up to 7 rounds spanning 15 years
• Companion studies in community, Baltimore, MD
– ≥ moderately disabled women 65+ years: n=1002
– ≤ mildly disabled women 70-79 years: n=436
• This project: n=786 age 70-79 years at baseline
– Probability-weighted analyses
Guralnik et al., NIA, 1995; Fried et al., J Gerontol, 2001
Example: Latent Frailty Classes
Women’s Health and Aging Study
Conditional Probabilities (π)
Criterion                2-Class Model            3-Class Model
                         CL. 1       CL. 2        CL. 1      CL. 2         CL. 3
                         “NONFRAIL”  “FRAIL”      “ROBUST”   “INTERMED.”   “FRAIL”
Weight Loss              .073        .26          .072       .11           .54
Weakness                 .088        .51          .029       .26           .77
Slowness                 .15         .70          .004       .45           .85
Low Physical Activity    .078        .51          .000       .28           .70
Exhaustion               .061        .34          .027       .16           .56
Class Prevalence (P)(%)  73.3        26.7         39.2       53.6          7.2
Bandeen-Roche et al., J Gerontol, 2006
Example: Latent Frailty Classes
Women’s Health and Aging Study
Conditional Probabilities (π)
Criterion                2-Class Model            3-Class Model
                         CL. 1       CL. 2        CL. 1      CL. 2         CL. 3
                         “NONFRAIL”  “FRAIL”      “ROBUST”   “INTERMED.”   “FRAIL”
Weight Loss              .073        .26          .072       .11           .54
Weakness                 .088        .51          .029       .26           .77
Slowness                 .15         .70          .004       .45           .85
Low Physical Activity    .078        .51          .000       .28           .70
Exhaustion               .061        .34          .027       .16           .56
Class Prevalence (P)(%)  73.3        26.7         39.2       53.6          7.2

We estimate that 26% in the “frail” subpopulation (2-class model) exhibit weight loss.
Bandeen-Roche et al., J Gerontol, 2006
Part III:
Evaluating Fit
Choosing the Number of Classes
• A priori theory
• Chi-square goodness of fit
• Entropy
• Information statistics
  – AIC, BIC, others
• Lo-Mendell-Rubin (LMR)
  – Not recommended (designed for normal Y)
• Bootstrapped likelihood ratio test
Entropy
Measures classification error
0 – terrible
1 – perfect
E = 1 − [ −Σ_{i=1}^{N} Σ_{j=1}^{J} θ̂_ij log θ̂_ij ] / [ N log(J) ],

where θ̂_ij is the estimated posterior probability Pr(C_i = j | Y_i).
Dias & Vermunt (2006)
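As a sketch, the relative entropy above can be computed directly from the N × J matrix of posterior class probabilities (a hand-rolled helper, not from any package):

```python
import numpy as np

def relative_entropy(theta):
    """E = 1 - [sum_i sum_j -theta_ij log theta_ij] / (N log J); 1 = perfect."""
    theta = np.clip(theta, 1e-12, 1.0)   # guard against log(0)
    N, J = theta.shape
    return 1 + (theta * np.log(theta)).sum() / (N * np.log(J))

# Crisp assignments give entropy near 1; uniform posteriors give 0
perfect = np.eye(2)[np.array([0, 1, 0, 1])]   # each unit assigned with certainty
uniform = np.full((4, 2), 0.5)                # classification carries no information
e_perfect = relative_entropy(perfect)
e_uniform = relative_entropy(uniform)
```

The two extreme cases make the 0-terrible / 1-perfect scale above concrete.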
Information Statistics
• s = # of parameters
• N= sample size
• smaller values are better
• AIC: -2LL+2s
• BIC: -2LL + s*log(N)
BIC is typically recommended
- Theory: consistent for selection in model family
- Nylund et al, Struct Eq Modeling, 2007
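For a binary LCA the penalty terms are easy to tabulate from the parameter count s = (J − 1) + J·M. A sketch (the log-likelihood value below is made up purely for illustration):

```python
import numpy as np

def lca_aic_bic(loglik, M, J, N):
    """AIC and BIC for a J-class, M-item binary LCA with s = (J-1) + J*M parameters."""
    s = (J - 1) + J * M
    return -2 * loglik + 2 * s, -2 * loglik + s * np.log(N)

# Hypothetical fit: M = 5 items, J = 3 classes, N = 786 (the WHAS sample size);
# loglik = -2500.0 is an invented number, used only to show the arithmetic
aic, bic = lca_aic_bic(loglik=-2500.0, M=5, J=3, N=786)
```

With s = 17 here, BIC's log(N) penalty exceeds AIC's factor of 2 for any N ≥ 8, which is why BIC tends to prefer fewer classes.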
Likelihood Ratio Tests
• LCA models with different # of classes NOT
nested appropriately for direct LRT.
• Rather: LRT to compare a given model to
the “saturated” model
– LCA df (binary case): (J − 1) + J·M
  • P parameters (sum to 1): J − 1
  • π parameters (M items × J classes): J·M
– Saturated df: 2^M − 1
– Goodness-of-fit df: 2^M − J(M + 1)
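The degree-of-freedom bookkeeping above reduces to one line; a sketch for the binary case:

```python
def lca_gof_df(M, J):
    """Goodness-of-fit df = (2^M - 1) - [(J - 1) + J*M] = 2^M - J*(M + 1)."""
    return 2 ** M - J * (M + 1)

# Frailty example from these slides: M = 5 criteria, J = 3 classes
df3 = lca_gof_df(5, 3)   # 32 - 18 = 14
df2 = lca_gof_df(5, 2)   # 32 - 12 = 20
```

A non-positive value signals that the model has at least as many parameters as the saturated multinomial, foreshadowing the identifiability discussion in Part IV.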
Bootstrapped Likelihood Ratio Test
• In the absence of knowledge about theoretical
distribution of difference in –2LL, can construct
empirical distribution from data.
• Per Nylund et al. (2007) simulation studies, it performs “best”
Example: Frailty Construct Validation
Women’s Health & Aging Studies
• Internal convergent validity
• Criteria manifestation is syndromic
“a group of signs and symptoms that occur
together and characterize a particular abnormality”
- Merriam-Webster Medical Dictionary
Validation: Frailty as a syndrome
Method: Latent class analysis
• If criteria characterize syndrome:
– At least two groups (otherwise, no co-occurrence)
– No subgrouping of symptoms (otherwise,
more than one abnormality characterized)
Conditional Probabilities of Meeting Criteria
in Latent Frailty Classes WHAS
Criterion                2-Class Model            3-Class Model
                         CL. 1       CL. 2        CL. 1      CL. 2         CL. 3
                         “NONFRAIL”  “FRAIL”      “ROBUST”   “INTERMED.”   “FRAIL”
Weight Loss              .073        .26          .072       .11           .54
Weakness                 .088        .51          .029       .26           .77
Slowness                 .15         .70          .004       .45           .85
Low Physical Activity    .078        .51          .000       .28           .70
Exhaustion               .061        .34          .027       .16           .56
Class Prevalence (%)     73.3        26.7         39.2       53.6          7.2
Bandeen-Roche et al., J Gerontol, 2006
Results: Frailty Syndrome Validation
• Data: Women’s Health and Aging Study
• Single-population model fit: inadequate
• Two-population model fit: good
– Pearson χ2 p-value=.22; minimized AIC, BIC
• Frailty criteria prevalence stepwise across classes—no
subclustering
• Syndromic manifestation well indicated
Example
Residual checking
• Frailty construct
Part IV:
Identifiability / Estimability
Identifiability
• Rough idea for “non”-identifiability: More unknowns than
there are (independent) equations to solve for them
• Definition: Consider a family of distributions FΦ = {F(y, φ); φ ∈ Φ}. The parameter φ ∈ Φ is (globally) identifiable iff there exists no φ* ≠ φ in Φ with F(y, φ) = F(y, φ*) a.e.
Identifiability
Related concepts
• Local identifiability
• Basic idea: ϕ identified within a neighborhood
• Definition: F is locally identifiable at φ0 if there exists a neighborhood τ ⊂ Φ about φ0 such that F(y; φ0) = F(y; φ) ⇒ φ = φ0 for all φ ∈ τ.
• Estimability, empirical identifiability: The information
matrix for ϕ given y1,…,yn is non-singular.
Identifiability
Latent class (binary Y)
• Latent class analysis (measurement only)
  • Parameter dimension (saturated, binary Y): 2^M − 1
  • Unconstrained J-class model: (J − 1) + J·M parameters
  • Need 2^M ≥ J(M + 1) (necessary, not sufficient)
• Local identifiability: evaluate the Jacobian of the likelihood function (Goodman, 1974)
• Estimability: avoid fewer than 10 allocations per “cell”
  • n > 10·2^M (rule of thumb)
Identifiability / estimability
Frailty example
• Latent class analysis
• Need 2^M ≥ J(M + 1) (necessary, not sufficient)
  • M = 5; J = 3: 32 ≥ 3·(5 + 1) = 18 – YES
  • By this criterion, could fit up to 5 classes (J(5 + 1) ≤ 32 ⇒ J ≤ 5)
• Local identifiability: evaluate the Jacobian of the likelihood function (Goodman, 1974)
• Estimability: n > 10·2^M
  • n = 786 > 10·2^5 = 320 – YES
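These two rules of thumb are simple enough to script; a sketch (the function name is my own):

```python
def lca_rules_of_thumb(M, J):
    """Necessary-condition and sample-size heuristics from the slides above.

    Returns (necessary_condition_holds, recommended_minimum_n).
    """
    necessary = 2 ** M >= J * (M + 1)   # parameter-count check, not sufficient
    n_min = 10 * 2 ** M                 # >= 10 allocations per response "cell"
    return necessary, n_min

# Frailty example: M = 5 criteria, J = 3 classes
ok, n_min = lca_rules_of_thumb(M=5, J=3)   # 32 >= 18, and n should exceed 320
```

Note these are heuristics only: passing the counting check does not guarantee identifiability, which is why the Jacobian check (Goodman, 1974) is still recommended.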
Objectives
For you to leave here knowing…
• When is the latent class analysis (LCA) model useful?
• What is the LCA model and its underlying assumptions?
• How are LCA parameters interpreted?
• How are LCA parameters commonly estimated?
• How is LCA fit adjudicated?
• What are considerations for identifiability / estimability?