Linear and Generalized Linear Mixed Models with Flexible Random Effects Distribution for Longitudinal Data

Daowen Zhang
North Carolina State University
[email protected]
http://www.stat.ncsu.edu/dzhang2/

Based on joint works with Marie Davidian and Junliang Chen

OUTLINE
1. Introduction and motivation
2. Linear mixed models with flexible random effects distribution
   SNP for random effects distribution
   Estimation and Inference
3. Generalized linear mixed models with flexible random effects distribution
   Monte Carlo EM algorithm with "double" rejection sampling
4. Application
5. Simulation studies
6. Summary and discussion
1. Introduction and Motivation

Longitudinal data: Clinical trials, epidemiological studies, social science studies, etc.

Features of longitudinal data:
  - Each subject has repeated measures over time.
  - Our interest is not limited to the subjects in the sample; instead, we want to make inference for the population from which the sample is drawn.
  - Data from the same subject tend to be more similar than data from other subjects => correlation.
  - Correlation has to be taken into account to yield valid inference.

An example: Framingham study
In this study, each of 2634 participants was examined every 2 years over a 10-year period for his/her cholesterol level.

Study objectives:
1. How does cholesterol level change over time?
2. How are the cholesterol level and its change associated with sex and baseline age?

A subset of 200 subjects' data is used for illustrative purposes.
[Figure: Cholesterol level over time for a subset of 200 subjects from Framingham study. Panel title: Cholesterol levels over time. x-axis: Time in years (0 to 10); y-axis: Cholesterol level (150 to 400).]

Linear Mixed Model for Longitudinal Data

Data:
  - Response $y_{ij}$ for subject $i$ at the $j$th time point.
  - Covariates: $x_{ij}$ ($p \times 1$) and $s_{ij}$ ($q \times 1$).

Model:
\[ y_{ij} = x_{1ij}\beta_1 + \cdots + x_{pij}\beta_p + s_{ij}^T b_i + U_i(t_{ij}) + \epsilon_{ij}, \]
where the $\beta$'s are fixed effects, $b_i$ are random effects, $U_i(t_{ij})$ is a stochastic process, and $\epsilon_{ij}$ is "measurement error".

Common assumptions:
  - $b_i \sim N(\mu, D(\omega))$.
  - $U_i(t_{ij})$ is a mean zero Gaussian stochastic process.
  - $\epsilon_{ij} \sim N(0, \sigma^2)$.
=> The likelihood function has a closed form.
For Framingham data, we may entertain the following model to address some of the questions:
\[ y_{ij} = b_{0i} + b_{1i} t_{ij} + \beta_1 \mathrm{age}_i + \beta_2 \mathrm{sex}_i + \beta_3 \mathrm{age}_i t_{ij} + \beta_4 \mathrm{sex}_i t_{ij} + \epsilon_{ij}, \]
where $y_{ij}$ is the cholesterol level of subject $i$ measured at the $j$th time point, $\epsilon_{ij} \sim N(0, \sigma^2)$, and $b_i = (b_{0i}, b_{1i})$ is assumed to have a bivariate normal distribution
\[ \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N\left\{ \begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix}, \begin{pmatrix} D_{00} & D_{01} \\ D_{01} & D_{11} \end{pmatrix} \right\}. \]
However, this assumption may be too restrictive, yielding invalid or inefficient inference for fixed effects and random effects.

[Figure: Histogram of 200 estimated subject-specific intercepts. x-axis: Estimates of subject specific intercepts (150 to 350); y-axis: Percentage (0 to 40).]
2. Linear Mixed Models with Flexible Random Effects Distribution

Model:
\[ y_{ij} = x_{ij}^T \beta + s_{ij}^T b_i + U_i(t_{ij}) + \epsilon_{ij}, \quad i = 1, \ldots, m; \; j = 1, \ldots, n_i, \]
where $b_i$ has a smooth but unspecified distribution.

Q: How to model the distribution of $b_i$?
A: Seminonparametric (SNP) representation of Gallant and Nychka (1987):
\[ b_i = \mu + R Z_i, \]
where $\mu$ is $q \times 1$, $R$ is $q \times q$ lower triangular, and $Z_i$ is $q \times 1$.

Approximate the density $h(z)$ of $Z_i$ by
\[ h_K(z) = P_K^2(z) \varphi_q(z), \]
where $P_K(z)$ is a $K$th order polynomial and $\varphi_q(z)$ is the density of $N(0, I_q)$.
K = tuning parameter.

When $K = 2$, $q = 2$, $z = (z_1, z_2)$,
\[ P_2(z) = a_{00} + a_{10} z_1 + a_{01} z_2 + a_{20} z_1^2 + a_{11} z_1 z_2 + a_{02} z_2^2. \]

$h_K(z)$ is a density =>
\[ \int h_K(z)\, dz = 1 \iff E\{P_K^2(U)\} = 1, \quad U \sim N(0, I_q). \]
$E\{P_K^2(U)\} = a^T A a$ ($A > 0$), so the density constraint becomes
\[ a^T A a = 1 \iff a^T B^2 a = 1 \iff c^T c = 1, \]
where $c = Ba$.

When $K = 0$, $h_K(z) = N(0, I_q)$.
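The constraint $a^T A a = 1$ can be checked numerically in the simplest case. The sketch below assumes $q = 1$, so that $A_{jk} = E(U^{j+k})$ for $U \sim N(0, 1)$; the function names are illustrative and not taken from the slides.

```python
import math

def normal_moment(n):
    """E[U^n] for U ~ N(0, 1): 0 for odd n, (n - 1)!! for even n."""
    if n % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(n - 1, 0, -2):
        result *= k
    return result

def snp_norm_const(a):
    """a^T A a with A[j][k] = E[U^(j+k)], which equals E{P_K(U)^2}."""
    return sum(a[j] * a[k] * normal_moment(j + k)
               for j in range(len(a)) for k in range(len(a)))

def snp_density(z, a):
    """h_K(z) = P_K(z)^2 * phi(z) for q = 1.

    The coefficients are rescaled first so that a^T A a = 1 holds.
    """
    scale = math.sqrt(snp_norm_const(a))
    poly = sum((coef / scale) * z ** j for j, coef in enumerate(a))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return poly * poly * phi
```

With `a = [1.0]` (K = 0) the function reduces to the standard normal density, in line with the last remark above.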
Polar coordinate transformation:
\[ c_1 = \sin(\psi_1) \]
\[ c_2 = \cos(\psi_1)\sin(\psi_2) \]
\[ \vdots \]
\[ c_{d-1} = \cos(\psi_1)\cos(\psi_2)\cdots\sin(\psi_{d-1}) \]
\[ c_d = \cos(\psi_1)\cos(\psi_2)\cdots\cos(\psi_{d-1}), \]
with $\psi_t \in (-\pi/2, \pi/2]$, $t = 1, \ldots, d-1$.

The tuning parameter K need not be very large to make SNP flexible.

[Figure: Some SNP densities for K = 2. Two panels; x-axis: z; y-axis: Density.]
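The polar coordinate transformation turns the constraint $c^T c = 1$ into an unconstrained problem in the angles. A minimal sketch of the mapping, with an illustrative function name:

```python
import math

def polar_to_unit_vector(psi):
    """Map angles psi_1, ..., psi_{d-1} to c_1, ..., c_d with c^T c = 1."""
    c = []
    cos_prod = 1.0  # running product cos(psi_1) * ... * cos(psi_{t-1})
    for angle in psi:
        c.append(cos_prod * math.sin(angle))
        cos_prod *= math.cos(angle)
    c.append(cos_prod)  # last coordinate is the product of all cosines
    return c
```

Any choice of angles gives a vector on the unit sphere, so an optimizer can work with the angles directly without enforcing the constraint.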
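Estimation and Inference

Model in matrix notation: substituting $b_i = \mu + R Z_i$ into the model,
\[ y_{ij} = x_{ij}^T \beta + s_{ij}^T (\mu + R Z_i) + e_{ij}, \quad e_{ij} = U_i(t_{ij}) + \epsilon_{ij}, \]
\[ \Rightarrow \; Y_i = V_i \delta + S_i^T R Z_i + e_i, \]
where $\delta = (\beta^T, \mu^T)^T$.

Likelihood: Given $K$, the log-likelihood of model parameters $\theta$:
\[ \ell(\theta; Y) = \sum_{i=1}^m \log\{ f(Y_i; \theta) \}, \]
where
\[ f(Y_i; \theta) = \int f(Y_i \mid z; \theta)\, P_K^2(z)\, \varphi_q(z)\, dz. \]

Note: $f(Y_i \mid z; \theta)$ is a normal density with mean $V_i \delta + S_i^T R z$ and variance $\mathrm{Var}(e_i)$
=> $f(Y_i \mid z; \theta)\varphi_q(z)$ is joint normal when $Z_i \sim N(0, I_q)$,
=>
\[ f(Y_i \mid Z_i; \theta)\, \varphi_q(Z_i) = g(Y_i; \theta)\, g(Z_i \mid Y_i; \theta) \]
=> $Z_i \mid Y_i$ is normal with some mean and variance (depending on $\theta$).
=>
\[ f(Y_i; \theta) = g(Y_i; \theta) \int P_K^2(z)\, g(z \mid Y_i; \theta)\, dz = g(Y_i; \theta)\, E_{Z_i \mid Y_i; \theta}\{P_K^2(Z_i)\}, \]
=>
\[ \ell(\theta; Y) = \sum_{i=1}^m \log\{ g(Y_i; \theta) \} + \sum_{i=1}^m \log\left[ E_{Z_i \mid Y_i; \theta}\{P_K^2(Z_i)\} \right] \]
has a closed form expression!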
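To see why the expectation term is available in closed form, consider the simplest nontrivial case. The worked equation below assumes $q = 1$ and $K = 1$; $m_i$ and $v_i$ are placeholder names, not from the slides, for the mean and variance of the normal distribution of $Z_i \mid Y_i$.

```latex
% Assumptions for illustration: q = 1, K = 1, Z_i | Y_i ~ N(m_i, v_i).
\begin{align*}
P_1(z) &= a_0 + a_1 z, \\
E_{Z_i \mid Y_i;\theta}\{P_1^2(Z_i)\}
  &= a_0^2 + 2 a_0 a_1\, E(Z_i \mid Y_i) + a_1^2\, E(Z_i^2 \mid Y_i) \\
  &= a_0^2 + 2 a_0 a_1 m_i + a_1^2 (m_i^2 + v_i).
\end{align*}
```

For larger $K$ the same argument applies, since $P_K^2$ is a polynomial and every moment of a normal distribution has a closed form.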
Inference for fixed effects:
  - Optimizer nlpqn in SAS is used to maximize $\ell(\theta; Y)$.
  - Initial value of $\theta$ can be obtained by maximizing
    \[ \ell_p(\theta; Y) = \ell(\theta; Y) - N(a^T A a - 1). \]
  - Variance for $\hat\theta$:
    \[ \mathrm{Var}(\hat\theta) = \left[ -\frac{\partial^2 \ell(\hat\theta; Y)}{\partial\theta\, \partial\theta^T} \right]^{-1}. \]

Inference for $b_i$:
  - $\hat b_i = \hat\mu + \hat R \hat Z_i$, where $\hat Z_i$ is the posterior mode or posterior mean (has closed form).

Reference: D. Zhang and M. Davidian (Biometrics, 2001).

Choice of K

1. Akaike Information Criterion (AIC):
\[ \mathrm{AIC} = \ell(\hat\theta; Y) - p_{net}. \]
2. Schwarz Bayesian Information Criterion (BIC):
\[ \mathrm{BIC} = \ell(\hat\theta; Y) - 0.5\, p_{net} \log(N). \]
3. Hannan-Quinn Criterion (HQ):
\[ \mathrm{HQ} = \ell(\hat\theta; Y) - p_{net} \log(\log(N)). \]

Larger is better!
AIC prefers larger models, BIC prefers smaller models, HQ is intermediate.
3. Generalized Linear Mixed Models with Flexible Random Effects Distribution

Data:
  - Response: $y_{ij} \mid b_i$ conditionally independent with $\mu_{ij}^b = E[y_{ij} \mid b_i]$, $\mathrm{Var}[y_{ij} \mid b_i] = \phi\, w_{ij}^{-1} v(\mu_{ij}^b)$.
  - Covariates: $x_{ij}$ and $s_{ij}$.

Model: GLMM with SNP random effects
\[ g(\mu_{ij}^b) = x_{ij}^T \beta + s_{ij}^T b_i, \quad i = 1, \ldots, m; \; j = 1, \ldots, n_i, \]
where $g(\cdot)$ is a link function such as the logit link for binary data, and the distribution of $b_i$ is approximated by SNP:
\[ b_i = \mu + R Z_i. \]

Challenge: Likelihood function does not have a closed form!

EM algorithm: Treat $y$ as observed data, $b = (b_1, \ldots, b_m)$ as missing data, $(y, b)$ as "complete data". Given $\theta^{(r)}$,
  - E-step:
    \[ Q(\theta \mid \theta^{(r)}) = E\{\log f(y, b; \theta) \mid y, \theta^{(r)}\} = \int \log f(y, b; \theta)\, f(b \mid y; \theta^{(r)})\, db. \]
  - M-step: Maximize $Q(\theta \mid \theta^{(r)})$ w.r.t. $\theta$ to get $\theta^{(r+1)}$.
  - Back to E-step with $\theta^{(r+1)}$ until convergence.

Advantage of EM algorithm:
\[ \ell(\theta^{(r+1)}; y) \ge \ell(\theta^{(r)}; y) \quad \text{for all } r. \]

Problem of using EM algorithm for GLMMs with SNP random effects:
  - E-step has to calculate integrations.
  - M-step is not easy to carry out.
Monte Carlo EM algorithm:
  - E-step: Obtain a random sample $b^{(1)}, \ldots, b^{(L)}$ from $f(b \mid y; \theta^{(r)})$ and approximate $Q(\theta \mid \theta^{(r)})$ by the MC average
    \[ Q_L(\theta \mid \theta^{(r)}) = \frac{1}{L} \sum_{l=1}^{L} \log f(y, b^{(l)}; \theta)
       = \frac{1}{L} \sum_{l=1}^{L} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \log f(y_{ij} \mid b_i^{(l)}; \beta, \phi)
       + \frac{1}{L} \sum_{l=1}^{L} \sum_{i=1}^{m} \log f_K(b_i^{(l)}; \theta). \]
  - M-step: Maximize $Q_L(\theta \mid \theta^{(r)})$ w.r.t. $\theta$ to get $\theta^{(r+1)}$.
  - Back to E-step with $\theta^{(r+1)}$ until convergence.
  - Update $L$ if necessary.

Question: How to get a random sample $b^{(1)}, \ldots, b^{(L)}$?

"Double" rejection sampling scheme

First rejection sampling from $f_K(b_i; \theta^{(r)})$:
  - Form an envelope for $h_K(z; \theta^{(r)})$:
    \[ 0 \le h_K(z; \theta^{(r)}) \le d_K(z; \theta^{(r)}). \]
  - Standardize $d_K(z; \theta^{(r)})$:
    \[ g_K(z; \theta^{(r)}) = d_K(z; \theta^{(r)}) \Big/ \int d_K(t; \theta^{(r)})\, dt. \]
    ($g_K(z; \theta^{(r)})$ = weighted sum of densities of variables of the form $(-1)^V \sqrt{\chi^2_p}$, $V \sim \mathrm{Bernoulli}(0.5)$.)
  - (a) Generate $u \sim U(0, 1)$, $z \sim g_K(z; \theta^{(r)})$; (b) if $u \le h_K(z; \theta^{(r)}) / d_K(z; \theta^{(r)})$ then accept $z$; otherwise go to (a) until a $z$ is obtained (called $z_i$).
  - $b_i = \mu^{(r)} + R^{(r)} z_i$.
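The first rejection step can be sketched for $q = 1$. The envelope used here is one possible choice, not necessarily the one on the slide: $d_K(z) = (\sum_j |a_j| |z|^j)^2 \varphi(z)$, which is a weighted sum of terms $|z|^n \varphi(z)$. Each such term, once normalized, is the density of $(-1)^V \sqrt{\chi^2_{n+1}}$ with $V \sim \mathrm{Bernoulli}(0.5)$. Function names are illustrative, and `a` is assumed to satisfy the density constraint already.

```python
import math
import random

def abs_normal_moment(n):
    """E|U|^n for U ~ N(0, 1)."""
    return 2.0 ** (n / 2.0) * math.gamma((n + 1) / 2.0) / math.sqrt(math.pi)

def sample_snp(a, rng):
    """Draw one z from h_K(z) = P_K(z)^2 phi(z), q = 1, by rejection sampling.

    Assumed envelope: d_K(z) = (sum_j |a_j| |z|^j)^2 phi(z).
    """
    K = len(a) - 1
    # Expand the envelope as sum_n weights[n] * |z|^n * phi(z), n = 0..2K.
    weights = [0.0] * (2 * K + 1)
    for j in range(K + 1):
        for k in range(K + 1):
            weights[j + k] += abs(a[j] * a[k])
    # Mass of each component; these are the mixture weights of g_K.
    masses = [w * abs_normal_moment(n) for n, w in enumerate(weights)]
    while True:
        # (a) draw z from g_K: pick a component, then a signed chi variable.
        n = rng.choices(range(2 * K + 1), weights=masses)[0]
        magnitude = math.sqrt(rng.gammavariate((n + 1) / 2.0, 2.0))
        z = magnitude if rng.random() < 0.5 else -magnitude
        # (b) accept with probability h_K(z) / d_K(z); phi(z) cancels.
        poly = sum(coef * z ** j for j, coef in enumerate(a))
        bound = sum(abs(coef) * abs(z) ** j for j, coef in enumerate(a))
        if rng.random() * bound ** 2 <= poly ** 2:
            return z
```

An accepted draw `z` would then be mapped to a random effect through $b_i = \mu^{(r)} + R^{(r)} z_i$.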
Second rejection sampling from $f_K(b_i \mid y_i; \theta^{(r)})$:
1. Generate $b_i$ from the first rejection sampling scheme.
2. Generate $u \sim U(0, 1)$. If
   \[ u \le f(y_i \mid b_i; \beta^{(r)}, \phi^{(r)}) / \tau_i, \quad \tau_i = \sup_b f(y_i \mid b; \beta^{(r)}, \phi^{(r)}), \]
   then accept $b_i$; otherwise, return to step 1 until a $b_i$ is accepted.

Note:
  - The acceptance rate of the first rejection sampling is usually high ($\ge$ 50%).
  - Depending on data, the acceptance rate of the second rejection sampling can be very low.

Monte Carlo EM algorithm with "double" rejection:
1. Choose $K$, $\theta^{(0)}$, $L$. Set $r = 0$.
2. Generate $b^{(l)}$ from $f(b \mid y; \theta^{(r)})$ ($l = 1, \ldots, L$) using "double" rejection sampling.
3. Calculate $Q_L(\theta \mid \theta^{(r)})$.
4. Maximize $Q_L(\theta \mid \theta^{(r)})$ w.r.t. $\theta$ to get $\theta^{(r+1)}$.
5. Construct a $100(1 - \alpha)\%$ CE for $\theta^{(r+1)}$. If $\theta^{(r)}$ is inside the CE, then set $L = L + [L/k]$ ($k = 3$, say).
6. At convergence, set $\theta^{(r+1)}$ to be the MLE of $\theta$; otherwise go to step 2.

Benefit: Allows MC error to be calculated at each iteration.
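The second rejection step can be illustrated with a toy conditional model. The sketch below assumes a single subject with $y$ successes out of $n$ binary responses and $\mathrm{logit}(p) = b$, so that $\sup_b f(y \mid b)$ is attained at $p = y/n$. The first-stage sampler `draw_prior` is a stand-in supplied by the caller, and all names are hypothetical.

```python
import math
import random

def binomial_kernel(y, n, b):
    """f(y | b) up to the binomial coefficient, with logit(p) = b."""
    p = 1.0 / (1.0 + math.exp(-b))
    return p ** y * (1.0 - p) ** (n - y)

def second_rejection(y, n, draw_prior, rng):
    """Draw b from f(b | y) by thinning draws from the random effects density.

    draw_prior(rng) plays the role of the first rejection sampler.
    """
    p_hat = y / n
    tau = p_hat ** y * (1.0 - p_hat) ** (n - y)  # sup over b of the kernel
    while True:
        b = draw_prior(rng)              # step 1: draw from f_K(b)
        if rng.random() * tau <= binomial_kernel(y, n, b):
            return b                     # step 2: accept if u <= f(y | b) / tau
```

Because the binomial coefficient appears in both the kernel and its supremum, it cancels in the acceptance ratio. When the data are very informative, the kernel is small for most first-stage draws, which is why the acceptance rate of this step can be very low.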
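Variance of $\hat\theta$:
\[ \mathrm{Var}(\hat\theta) = \left[ \sum_{i=1}^{m} \frac{\partial \log f(y_i; \hat\theta)}{\partial\theta} \frac{\partial \log f(y_i; \hat\theta)}{\partial\theta^T} \right]^{-1}. \]

$f(y_i; \theta)$ can be written
\[ f(y_i; \theta) = \int f(y_i \mid z; \theta)\, P_K^2(z; \theta)\, \varphi_q(z)\, dz = E\{ f(y_i \mid Z; \theta)\, P_K^2(Z; \theta) \}, \]
where $Z \sim N(0, I_q)$.

Approximate $f(y_i; \theta)$ by
\[ f(y_i; \theta) \approx \frac{1}{L} \sum_{l=1}^{L} \{ f(y_i \mid z^{(l)}; \theta)\, P_K^2(z^{(l)}; \theta) \}, \]
where $z^{(1)}, \ldots, z^{(L)}$ is a random sample from $N(0, I_q)$.

Reference: J. Chen, D. Zhang and M. Davidian (Biostatistics, 2001(?)).

4. Application to Framingham Data

Data: $y_{ij}$ = cholesterol level/100, $t_{ij}$ = (year - 5)/10, sex and baseline age.

Model:
\[ y_{ij} = b_{0i} + b_{1i} t_{ij} + \beta_1 \mathrm{age}_i + \beta_2 \mathrm{sex}_i + \beta_3 \mathrm{age}_i t_{ij} + \beta_4 \mathrm{sex}_i t_{ij} + \epsilon_{ij}. \]
  - The distribution of $(b_{0i}, b_{1i})$ is approximated by a bivariate SNP density with $K = 0, 1, 2$.
  - $\beta_3$ ($\beta_4$) tells how baseline age (sex) affects the change of cholesterol level.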
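The Monte Carlo approximation of $f(y_i; \theta)$ can be sketched as follows. For illustration this assumes $q = 1$ and a scalar response, and takes the conditional density as a caller-supplied function; none of the names come from the slides.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mc_marginal_density(y, a, cond_density, n_draws, rng):
    """Approximate E{ f(y | Z) P_K(Z)^2 } with Z ~ N(0, 1) by a sample average.

    a holds the polynomial coefficients; cond_density(y, z) returns f(y | z).
    """
    total = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        poly = sum(coef * z ** j for j, coef in enumerate(a))
        total += cond_density(y, z) * poly * poly
    return total / n_draws
```

A quick sanity check is the case $K = 0$ with a normal conditional density, $y \mid z \sim N(\mu + r z, \sigma^2)$, where the exact marginal is $N(\mu, r^2 + \sigma^2)$ and the sample average should be close to it.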
Model Selection Criteria

Criterion         K = 0        K = 1        K = 2
Log-likelihood    -147.3518    -135.4209    -135.3278
AIC               -157.3518    -147.4209    -150.3279
BIC               -182.1059    -177.1258    -187.459
HQ                -166.7404    -158.6873    -164.4107

All criteria selected K = 1.

Regression Coefficient Estimates

                              K = 0              K = 1              K = 2
Parameter                     Estimate(SE)       Estimate(SE)       Estimate(SE)
$\beta_1$ (age)               0.0148(0.0035)     0.0115(0.0032)     0.0128(0.0032)
$\beta_2$ (sex)               -0.0064(0.0549)    -0.0011(0.0473)    -0.0285(0.0462)
$\beta_3$ (age x year)        -0.0114(0.0028)    -0.0112(0.0028)    -0.0104(0.0028)
$\beta_4$ (sex x year)        0.1799(0.0450)     0.1799(0.0454)     0.1677(0.0453)
$E(b_{0i})$                   1.7219(0.1505)     1.8608(0.1404)     1.8161(0.1407)
$E(b_{1i})$                   0.6800(0.1213)     0.6711(0.1226)     0.6419(0.1225)

Males tend to have a larger change rate than females; older people tend to have smaller change rates, etc.
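The three criteria can be recomputed from the log-likelihoods in the table. The slides do not state $p_{net}$ or $N$; the values below are assumptions backed out from the table itself ($p_{net}$ = 10, 12, 15 from the gap between the log-likelihood and AIC, and $N$ of about 1044 from the BIC column).

```python
import math

def information_criteria(loglik, p_net, n_obs):
    """AIC, BIC and HQ in the 'larger is better' form used on the slides."""
    return {
        "AIC": loglik - p_net,
        "BIC": loglik - 0.5 * p_net * math.log(n_obs),
        "HQ": loglik - p_net * math.log(math.log(n_obs)),
    }

# (log-likelihood, p_net) for K = 0, 1, 2; p_net values are inferred, not given.
fits = {0: (-147.3518, 10), 1: (-135.4209, 12), 2: (-135.3278, 15)}
N_ASSUMED = 1044  # inferred from the BIC column, not stated in the source

table = {k: information_criteria(ll, p, N_ASSUMED) for k, (ll, p) in fits.items()}
selected = {name: max(table, key=lambda k: table[k][name])
            for name in ("AIC", "BIC", "HQ")}
```

Under these assumed inputs each criterion is largest at K = 1, which agrees with the conclusion stated below the table.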
[Figure: Estimated density for $(b_{0i}, b_{1i})$. Perspective plot; axes: Intercept, Slope, Density.]

[Figure: Contour plot of the estimated density for $(b_{0i}, b_{1i})$. x-axis: Intercept (0.5 to 3.5); y-axis: Slope (0.2 to 1.2).]

[Figure: Estimated marginal density for $b_{0i}$. x-axis: Intercept (0.5 to 3.5); y-axis: Density.]

[Figure: Estimated marginal density for $b_{1i}$. x-axis: Slope (0.2 to 1.2); y-axis: Density.]
5. Simulation Studies

True model:
\[ y_{ij} = b_i + t_{ij}\beta_1 + w_i\beta_2 + \epsilon_{ij}, \quad i = 1, \ldots, 100; \; j = 1, \ldots, 5. \]
$t_{ij} = j - 3$, $\beta_1 = 2$; $w_i = I(i \le 50)$, $\beta_2 = 1$; $\epsilon_{ij} \sim N(0, 0.5^2)$.
  - Case 1: $b_i \sim 0.7 N(-3, 1) + 0.3 N(2, 1)$ (mixture of normals);
  - Case 2: $b_i \sim N(-1.5, 6.25)$.

100 data sets were simulated.
Fit the model with $K = 0, 1, 2$ to each data set.
  - Case 1: AIC preferred $K = 1, 2$ 35%, 65% (BIC: 76%, 24%; HQ: 56%, 44%).
  - Case 2: AIC preferred $K = 0, 1, 2$ 84%, 7%, 9% (BIC: 97%, 3%, 0%; HQ: 89%, 5%, 6%).

Simulation results, 100 data sets: MC Ave. and MC SD are average and standard deviation of the estimates, Ave. SE is average of estimated standard errors, RE is Monte Carlo mean square error for the indicated fit divided by that for K = 0. True values of parameters are in parentheses.

                      K = 0                        Preferred by BIC                     Preferred by HQ
Parameter             MC Ave.  MC SD   Ave. SE     MC Ave.  MC SD   Ave. SE  RE         MC Ave.  MC SD   Ave. SE  RE

(a) Mixture Scenario
$\beta_1$ (2)         2.000    0.017   0.016       2.000    0.017   0.016    1.00       2.000    0.017   0.016    1.00
$\beta_2$ (1)         1.158    0.472   0.493       1.034    0.234   0.209    0.21       1.028    0.230   0.208    0.23
$E(b)$ (-1.5)         -1.614   0.369   0.349       -1.552   0.275   0.269    0.52       -1.549   0.273   0.269    0.52
var(b) (6.25)         6.045    0.638   0.862       6.098    0.654   0.690    1.01       6.099    0.655   0.695    1.00
$\sigma$ (0.5)        0.498    0.018   0.018       0.498    0.018   0.018    1.00       0.498    0.018   0.018    1.00

(b) Normal Scenario
$\beta_1$ (2)         2.000    0.017   0.016       2.000    0.017   0.016    1.00       2.000    0.017   0.016    1.00
$\beta_2$ (1)         0.994    0.512   0.489       0.987    0.533   0.487    1.17       0.990    0.550   0.479    1.08
$E(b)$ (-1.5)         -1.491   0.363   0.346       -1.487   0.373   0.345    1.09       -1.489   0.380   0.343    1.05
var(b) (6.25)         5.955    0.789   0.849       5.957    0.790   0.861    1.00       5.958    0.790   0.863    1.00
$\sigma$ (0.5)        0.498    0.018   0.018       0.498    0.018   0.018    1.00       0.498    0.018   0.018    1.00
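One data set from the Case 1 design can be generated with a few lines of code. This is a sketch of the stated true model only, not the authors' simulation code; the function names are illustrative, and the indicator is read as $w_i = I(i \le 50)$.

```python
import random

def mixture_moments(weights, means, variances):
    """Mean and variance of a finite mixture of normal distributions."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean, second - mean * mean

def simulate_case1(rng, m=100, n_i=5, beta1=2.0, beta2=1.0, sigma=0.5):
    """Rows (i, j, t_ij, w_i, y_ij) from y_ij = b_i + t_ij*beta1 + w_i*beta2 + e_ij."""
    data = []
    for i in range(1, m + 1):
        # b_i ~ 0.7 N(-3, 1) + 0.3 N(2, 1)
        if rng.random() < 0.7:
            b = rng.gauss(-3.0, 1.0)
        else:
            b = rng.gauss(2.0, 1.0)
        w = 1.0 if i <= 50 else 0.0
        for j in range(1, n_i + 1):
            t = j - 3
            y = b + t * beta1 + w * beta2 + rng.gauss(0.0, sigma)
            data.append((i, j, t, w, y))
    return data
```

The helper `mixture_moments([0.7, 0.3], [-3.0, 2.0], [1.0, 1.0])` gives a mean of -1.5 and a variance of 6.25, which match the true values of $E(b)$ and var(b) listed in the table and the normal distribution used in Case 2.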
[Figure: (a) True (solid) and estimated densities: normal (long dashed), BIC (dotted). (b) Estimated densities by HQ. Both panels: x-axis: x (-6 to 4); y-axis: Densities (0.0 to 0.4).]

6. Summary and Discussion

  - The proposed models can be useful for analyzing longitudinal data when the distributional assumption of random effects in mixed models is violated.
  - The SNP approach is capable of detecting departure from the normal.
  - Simulation studies show satisfactory performance of the inference procedure.
  - Computation is relatively straightforward for normal data, but could be intensive for non-normal data.
  - Current research: Extend SNP approach to other popular models.