Latent-Variable Models for Longitudinal Data with Bivariate
Ordinal Outcomes
David Todem,1,∗ KyungMann Kim2 and Emmanuel Lesaffre3
1 Department of Statistics, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706, U.S.A.
2 Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 600 Highland Ave., Madison, WI 53792, U.S.A.
3 Biostatistical Centre, K.U.Leuven, Kapucijnenvoer 35, B-3000 Leuven, Belgium
∗ email: [email protected]
Summary. We use the concept of a latent variable to derive the joint distribution of bivariate
ordinal outcomes, and then extend the model to allow for longitudinal data. Specifically, we relate
the observed ordinal outcomes using threshold values to a bivariate latent variable, which is then
modeled as a linear mixed model. Random effects terms are used to tie together all repeated
observations from the same subject. The cross-sectional association between the two outcomes is
modeled through the correlation coefficient of the bivariate latent variable, conditional on random
effects. Assuming conditional independence given random effects, the marginal likelihood, under
the missing data at random assumption, is approximated using an adaptive Gaussian quadrature
for numerical integration. The model provides fixed effects parameters that are subject-specific,
but retain the population-averaged interpretation when properly scaled. This is particularly well
suited for the situation in which population comparisons and individual level contrasts are of equal
importance. Data from a randomized intervention trial of a cardiovascular educational program
where the responses of interest are changes in hypertension and hypercholesterolemia status illustrate
the proposed model. Generalization from bivariate to multivariate models is also discussed.
Key words: Adaptive Gaussian quadrature; Bivariate ordinal outcome; Importance sampling;
Latent variable; Maximum marginal likelihood; Random effects; Repeated measures; Threshold
values.
1. Introduction
Multiple outcomes are often used as primary endpoints in many longitudinal studies. For example,
in growth curve studies not only the height of the child is measured, but also the weight. Often,
even length measurements are taken of various parts of the body (e.g., arm, leg). Another example
is the panel study, a sociological study in which multiple questions
are posed to the subject to measure the outcomes of interest. The same is true in clinical trials, but
more importantly the effect of a new treatment is usually evaluated from the analysis of a bivariate
response vector comprising an efficacy and a safety parameter. Additionally, it is often the case
that subjects’ responses to treatment are classified according to an ordinal or graded scale, e.g.,
the visual analogue scale.
An example of bivariate ordinal data in time is provided by a longitudinal trial of a cardiovascular educational program in which the responses of interest are changes in hypertension and
hypercholesterolemia status. Each of the 266 participants in the analysis sample was randomized either
to an audio-dietary educational intervention group or to a non-intervention group (control group).
At each of the two follow-up times, 4 months and 12 months, the two outcomes were derived from
changes in blood pressure and serum cholesterol from baseline. These outcomes were re-coded to
ordinal scales according to whether there was a positive change (1), no change (2) or a negative
change (3), based on the criteria established by the NIH-sponsored expert panels (NCEP, 1993;
JNC, 1993). Not only were the marginal effects of the education intervention on blood pressure and
cholesterol of interest, but so were education effects on the association between these two outcomes
(Ten Have and Morabia, 1999). It was also of interest to know if the answer depended on the time.
The literature suggests that dietary interventions do not typically impact both blood pressure and
cholesterol. There is a need to investigate the effect of the intervention on the association between
the two outcomes, to understand why this is the case.
Statistical analysis of bivariate ordinal outcome data in time raises a number of challenging
issues. It is well known, for example, that repeated measurements from the same subject over time
necessitate the use of methods for correlated data. Multiplicity of the outcomes at any timepoint
is another important issue. Most modeling approaches in the literature either deal with cross-sectional data or are restricted to longitudinal univariate outcomes. Limited work has been done
on bivariate ordinal outcomes in time. Although separate models can be fitted to each outcome,
such an approach fails to borrow strength across the outcome variables. By exploiting the correlation structure with a multivariate model, efficiency and power could be greatly increased (O’Brien,
1984). Several authors have developed maximum likelihood procedures for cross-sectional multivariate ordinal outcomes. Glonek and McCullagh (1995), and Molenberghs and Lesaffre (1994)
approached the specification of the joint probability of multivariate observations through the first- and higher-order marginal parameters. One important advantage of this approach is that multivariate observations of unequal numbers of variables per subject may be analyzed quite naturally. Kim
(1995) also considered a maximum likelihood estimation for bivariate ordinal measures by using a
constrained latent variable specific to ophthalmologic studies. Williamson and Kim (1996) further
developed marginal mean regression techniques based on the Plackett (1965) and Dale (1986) global
odds ratios as a measure of association. None of these models was meant for longitudinal outcomes
and therefore could not be used for multivariate ordered categorical data in time. Ten Have and
Morabia (1999) then extended the original Dale (1986) model to accommodate the time component in analyzing longitudinal bivariate binary outcomes. Basically, their model uses the concept
of global odds ratios to represent the cross-sectional association and random effects terms for the
longitudinal association. They also included random effects in the global odds ratios which can be
seen as a means to construct a goodness-of-fit test for the basic starting model.
Although this extended Dale model as proposed by Ten Have and Morabia (1999) is certainly
desirable in situations where the interest is in studying the within-subject evolution, it does not
accommodate population-averaged comparisons, because the logit link function
does not have a simple representation of the marginal means of the outcomes. Specifically, the fixed
effects parameters have a subject-specific interpretation and describe on average how a subject’s
probability of experiencing the event of interest depends on time (Zeger, Liang and Albert, 1988).
This may not be relevant in studies where population comparisons and within-subject comparisons
are of equal importance. Another drawback of the logit link is that it does not reduce to the usual
logit model when each marginal outcome has only a single random effect and only one observation
per level of this random effect (McCulloch, 1994). Finally, this model was restricted to only
bivariate binary outcomes in time.
A number of authors have recently proposed regression techniques for longitudinal multinomial
outcomes (Clayton, 1992; Gange, Linton, Scott, DeMets, and Klein, 1993; Miller, Davis, and
Landis, 1994; Lipsitz, Kim and Zhao, 1994). These authors have predominantly adopted the
generalized estimating equations (GEE) of Liang and Zeger (1986) to proportional odds models
for clustered ordinal responses. However, in longitudinal studies, this method is not necessarily
appropriate. Indeed, when the missing data mechanism is not completely at random, the standard
GEE methods provide biased parameter estimates. Furthermore, the GEE-based approach, as a
distribution-free methodology, does not lend itself to classical tools for model checking. Hence, the
search for other alternatives continues.
Our approach is based on the concept of a latent variable that allows a full likelihood-based
modeling of longitudinal bivariate ordinal responses with ignorable missing data. We will generalize
the work of Hedeker and Gibbons (1994) to allow for multiple outcomes and that of Lesaffre and
Molenberghs (1991), Lesaffre and Kaufmann (1992), and Kim (1995) to allow for longitudinal
outcomes. Specifically, the random effects terms are introduced in the model via the latent variable
to model the induced longitudinal association or other heterogeneity among subjects. The cross-sectional association is modeled through the correlation coefficient of the underlying multivariate
normal latent variable, conditional on random effects. Random effects can also be included in the
correlation coefficient. Of course, this will complicate the model considerably. But the inclusion
is justified on two grounds: (1) the cross-sectional association is of interest in itself, certainly in a
model for bivariate responses; and (2) the inclusion of a random effects structure can be seen as a
means to construct a goodness-of-fit test for the basic starting model. Ochi and Prentice (1984) have
also developed a class of latent-variable models to construct equi-correlated probit models. One
limitation of their approach is that it was restricted to multivariate binary outcomes although its
extension to ordinal outcomes is straightforward. Furthermore, their model did not accommodate
multiple random effects and could not model negative correlations between the marginal outcomes.
Finally, our approach has the advantage over the Ochi and Prentice model in that it does not require
the mean to be constant within levels of the random effect. Latent-variable models have also been
described in developmental toxicity studies where the interest was to specify the joint distribution
of discrete and continuous outcomes (see e.g. Catalano and Ryan, 1992; and Fitzmaurice and Laird,
1995). But our interest is in bivariate ordinal outcomes.
In Section 2, we describe the mixed effects model for bivariate ordinal responses. For this
we describe the concept of latent variable in the bivariate setting, and then extend the model
to accommodate longitudinal data. Section 3 describes the estimation procedure for the model
parameters. In Section 4, the proposed model is applied to the example data set and compared to a
population-averaged bivariate model assuming independence across time and to univariate probit
random effects models. Finally, the usefulness and future extension of the model are discussed in
Section 5.
2. A latent-variable model for bivariate ordinal data in time
2.1 Model formulation
Due to the lack of a natural distribution for ordinal outcomes, it is convenient conceptually
to assume that the observed ordinal responses Y = (Y1 , Y2 )T are generated from an underlying
latent variable W = (W1 , W2 )T with two sets of threshold values a = (a1 , a2 , ..., ar−1 )T and b =
(b1 , b2 , ..., bs−1 )T , where r and s represent the number of ordinal levels for the first and the second
marginal outcome, respectively. Specifically, the univariate responses Y1kt and Y2kt for unit k, e.g.,
subject in the longitudinal setting and cluster in repeated data setting, at time point τkt fall in
category i and j, respectively, if the first component W1kt of the latent response exceeds ai−1 but
does not exceed ai , and similarly for the second component W2kt with respect to bj−1 and bj . Hence,
letting a0 = b0 = −∞ and ar = bs = ∞, the model formulation at the first stage is given by
(Y1kt = i, Y2kt = j) ⇔ (ai−1 ≤ W1kt < ai , bj−1 ≤ W2kt < bj )
∀i, j 1 ≤ i ≤ r and 1 ≤ j ≤ s.
(1)
And for the cumulative events, we get
(Y1kt ≤ i, Y2kt ≤ j) ⇔ (W1kt < ai , W2kt < bj ) ∀i, j 1 ≤ i ≤ r and 1 ≤ j ≤ s.
(2)
The threshold values must be monotonically increasing to reflect the ordinal nature of the
observed outcomes. And for a binary outcome, only one threshold value representing the usual
intercept is needed.
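
The threshold mapping in (1) can be sketched in a few lines of Python. This is only an illustration: the threshold values, the function names `to_category` and `observe`, and the use of a binary second outcome are assumptions made for the example and are not taken from the paper.

```python
import bisect

# Hypothetical, monotonically increasing thresholds (not estimates from the paper).
A_THRESHOLDS = [-0.5, 0.8]   # a_1 < a_2, so the first outcome has r = 3 levels
B_THRESHOLDS = [0.0]         # a single threshold, so the second outcome is binary

def to_category(w, thresholds):
    """Return i in 1..r with a_{i-1} <= w < a_i, where a_0 = -inf and a_r = +inf."""
    # bisect_right counts the thresholds that are <= w, which is i - 1.
    return bisect.bisect_right(thresholds, w) + 1

def observe(w1, w2):
    """Map a bivariate latent response (w1, w2) to the observed pair (y1, y2)."""
    return to_category(w1, A_THRESHOLDS), to_category(w2, B_THRESHOLDS)
```

For instance, `observe(-1.0, 0.3)` places the first latent component below a1 and the second above b1.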
At the second stage, we consider a mixed effects regression model for the bivariate latent response as follows:
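\[
W_{\ell kt} = x_{\ell kt}\beta_\ell + z_{\ell kt} d_{\ell k} + \varepsilon_{\ell kt}, \quad \ell = 1, 2, \qquad
d_k \sim N(0, D) \ \text{and} \ \varepsilon_{kt} \sim N(0, H_{kt}). \tag{3}
\]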
We further assume that dk = (dT1k , dT2k )T and εkt = (ε1kt , ε2kt )T are independent. x1kt and x2kt are
the 1 × q1 and 1 × q2 design row vectors for the fixed effects with associated column vector slopes β1
and β2 , respectively. Also, z1kt and z2kt are the 1 × r1 and 1 × r2 design row vectors for the random
effects associated with the column vectors of unknown random effects d1k and d2k , respectively.
The vector εkt is the residual vector with covariance matrix
\[
H_{kt} = \begin{pmatrix} \sigma_1^2 & \rho_{kt}\sigma_1\sigma_2 \\ \rho_{kt}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.
\]
The parameters β1 and β2 are common to all subjects, while d1k and d2k are subject-specific.
For subject k, z1k and z2k often contain fixed (time-independent) subject-specific covariates,
but time-varying covariates are also possible as indicated by Lesaffre, Todem and Verbeke (2000).
To obtain a well-formulated model such as in Morrell, Pearson and Brant (1997), these matrices are
restricted to satisfy the conditions rank(xℓk |zℓk ) = rank(xℓk ), ℓ = 1, 2, where (xℓk |zℓk ) represents
the matrix obtained by stacking the matrices xℓk and zℓk .
To complete the model formulation, the correlation coefficient ρkt of the bivariate latent variable
given dk , [Wkt |dk ], may depend on covariates, with a design row vector x3kt and corresponding
slope vector β3 . This correlation coefficient is modeled using the Fisher transformation,
\[
\log\frac{1 + \rho_{kt}}{1 - \rho_{kt}} = x_{3kt}\beta_3
\]
to ensure that −1 < ρkt < 1.
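
As a small illustration of why this transformation keeps the correlation in range, the link can be inverted in closed form: solving log((1 + ρ)/(1 − ρ)) = η for ρ gives ρ = tanh(η/2). The sketch below uses hypothetical function names and a scalar linear predictor η standing in for x3kt β3.

```python
import math

def rho_from_linear_predictor(eta):
    """Invert log((1 + rho) / (1 - rho)) = eta, which gives rho = tanh(eta / 2)."""
    return math.tanh(eta / 2.0)

def linear_predictor_from_rho(rho):
    """Forward transformation used in the model for the correlation."""
    return math.log((1.0 + rho) / (1.0 - rho))
```

Any real value of the linear predictor therefore corresponds to a valid correlation, so β3 can be estimated without constraints.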
The parameters βℓ and σℓ , ℓ = 1, 2, are not jointly identifiable, but the ratios βℓ /σℓ , ℓ = 1, 2,
are estimable (Catalano and Ryan, 1992). Hence, the bivariate latent response, conditional on dk ,
can be re-scaled by assuming that σ1 = σ2 = 1. Furthermore, following Gibbons and Bock (1987),
it is useful to orthogonalize the random effects by letting dk = T θk where T , a lower triangular
matrix with positive diagonal elements, is the Cholesky factor of D, i.e., T T^T = D. The
re-parameterized model is then given by


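\[
\begin{cases}
E(W_k \mid \theta_k) = x_k\beta + z_k T\theta_k \\[1ex]
\log\dfrac{1 + \rho_k}{1 - \rho_k} = x_{3k}\beta_3 \\[1ex]
\theta_k \sim N(0, I_{r_1 + r_2}) \ \text{and} \ T T^T = D
\end{cases} \tag{4}
\]
where
\[
x_k = \begin{pmatrix} x_{1k} & 0 \\ 0 & x_{2k} \end{pmatrix}, \quad
z_k = \begin{pmatrix} z_{1k} & 0 \\ 0 & z_{2k} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
\]
and I_{r1+r2} is the identity matrix of order r1 + r2 . The covariance matrix for subject k is equal to Vk = zk D zk^T + Hk , where Hk =
Diag(Hk1 , Hk2 , ..., Hknk ).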
For the sake of computing analytically the partial derivatives of the log-likelihood with respect
to the Cholesky factor T of D, the matrix T and the corresponding vector of independent
random effects θk are decomposed respectively as
\[
T = \begin{pmatrix} T_{11} & 0 \\ T_{21} & T_{22} \end{pmatrix} \quad \text{and} \quad
\theta_k = \begin{pmatrix} \theta_{1k} \\ \theta_{2k} \end{pmatrix}.
\]
The model as written in (4) yields the following regression model for each marginal component with
uncorrelated random effects:
E(W1kt |θk ) = x1kt β1 + z1kt T11 θ1k
E(W2kt |θk ) = x2kt β2 + z2kt (T21 θ1k + T22 θ2k )
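
To make the reparameterization concrete, the following sketch treats the simplest case of one random intercept per outcome (r1 = r2 = 1, z1kt = z2kt = 1), so that D is 2 × 2 and T11, T21, T22 are scalars. The function names and all numerical values are illustrative assumptions, not quantities from the paper.

```python
import math

def cholesky_2x2(d11, d12, d22):
    """Lower-triangular factor (T11, T21, T22) of D = [[d11, d12], [d12, d22]]."""
    t11 = math.sqrt(d11)
    t21 = d12 / t11
    t22 = math.sqrt(d22 - t21 * t21)   # requires D to be positive definite
    return t11, t21, t22

def latent_means(x1b1, x2b2, factor, theta):
    """Conditional means e1, e2 of the latent responses given theta = (theta1, theta2)."""
    t11, t21, t22 = factor
    theta1, theta2 = theta
    e1 = x1b1 + t11 * theta1                  # x1 b1 + z1 T11 theta1 with z1 = 1
    e2 = x2b2 + t21 * theta1 + t22 * theta2   # x2 b2 + z2 (T21 theta1 + T22 theta2)
    return e1, e2
```

Working with θk rather than dk means that the integration in the likelihood is always against a standard normal density, whatever the value of D.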
2.2 Features of the model and parameter interpretation
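The model as described above accounts for the cross-sectional, longitudinal and the cross-sectional-by-longitudinal association as follows:
\[
\begin{cases}
\mathrm{Corr}(W_{1kt}, W_{2kt}) = \dfrac{\rho_{kt} + z_{1kt}\,\mathrm{Cov}(d_{1k}, d_{2k})\,z_{2kt}^T}{\sqrt{(1 + z_{1kt}\,\mathrm{Var}(d_{1k})\,z_{1kt}^T)(1 + z_{2kt}\,\mathrm{Var}(d_{2k})\,z_{2kt}^T)}} \\[3ex]
\mathrm{Corr}(W_{\ell kt}, W_{\ell kt'}) = \dfrac{z_{\ell kt}\,\mathrm{Var}(d_{\ell k})\,z_{\ell kt'}^T}{1 + z_{\ell kt}\,\mathrm{Var}(d_{\ell k})\,z_{\ell kt}^T}, \quad \ell = 1, 2; \ t \neq t' \\[3ex]
\mathrm{Corr}(W_{1kt}, W_{2kt'}) = \dfrac{z_{1kt}\,\mathrm{Cov}(d_{1k}, d_{2k})\,z_{2kt'}^T}{\sqrt{(1 + z_{1kt}\,\mathrm{Var}(d_{1k})\,z_{1kt}^T)(1 + z_{2kt'}\,\mathrm{Var}(d_{2k})\,z_{2kt'}^T)}}, \quad t \neq t'
\end{cases}
\]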
The correlation coefficient ρkt accounts implicitly for the cross-sectional association between the
two actual outcomes given the subject-specific random effects dk . The multivariate nature of the
(subject) random effects accounts for both the longitudinal association and the cross-sectional-by-longitudinal association, respectively, through the variances and the covariance of the marginal
components of d1k and d2k . The correlation structure of the model, captured through ψℓℓ′ktt′ =
Corr(Wℓkt , Wℓ′kt′ ), ℓ, ℓ′ = 1, 2; t, t′ = 1, 2, ..., nk , can be represented schematically by considering the
two latent outcomes at two time points t and t′ as shown in Figure 1.
[Figure 1 about here.]
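
For a model with one random intercept per outcome (z1kt = z2kt = 1), the three types of latent correlation reduce to simple expressions in ρ, Var(d1k), Var(d2k) and Cov(d1k, d2k). The sketch below evaluates them; the function name and the dictionary keys are hypothetical, and the restriction to random intercepts is an assumption made to keep the example scalar.

```python
import math

def latent_correlations(rho, var1, var2, cov12):
    """Latent-scale correlations for a correlated random-intercept model.

    rho   : residual correlation given the random effects
    var1  : Var(d1k), var2 : Var(d2k), cov12 : Cov(d1k, d2k)
    """
    scale = math.sqrt((1.0 + var1) * (1.0 + var2))
    return {
        "cross_sectional": (rho + cov12) / scale,     # W1kt with W2kt
        "longitudinal_1": var1 / (1.0 + var1),        # W1kt with W1kt'
        "longitudinal_2": var2 / (1.0 + var2),        # W2kt with W2kt'
        "cross_by_longitudinal": cov12 / scale,       # W1kt with W2kt'
    }
```

Setting `cov12 = 0` removes the cross-sectional-by-longitudinal association while leaving the longitudinal association within each outcome intact.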
On the latent scale,
\[
E(W_{kt} \mid \theta_k) = x_{kt}\beta + z_{kt} T \theta_k \quad \text{and} \quad E(W_{\ell kt}) = x_{\ell kt}\beta_\ell, \quad \ell = 1, 2.
\]
Therefore, the fixed effects parameters have both the subject-specific and marginal interpretations,
as is typical in linear mixed models (see Diggle et al., 1994).
This does not hold on the data scale. However, population-averaged parameters can be expressed
as a factor of subject-specific parameters and therefore are equivalent with respect to model testing
and reduction. From equation (2), the cumulative distribution of (Y1kt , Y2kt ) conditional on random
effects is given as follows:
\[
\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j \mid d_k) = \Phi_{\rho_{kt}}(a_i - x_{1kt}\beta_1 - z_{1kt}d_{1k}, \ b_j - x_{2kt}\beta_2 - z_{2kt}d_{2k}) \tag{5}
\]
Taking the expectation of the conditional probability in equation (5) with respect to the distribution
of the random effects dk and using the results of the theorem given in Appendix A, we get


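\[
\mathrm{pr}(Y_{1kt} \le i, Y_{2kt} \le j) = \Phi_{\rho_{kt}}\!\left( \frac{a_i - x_{1kt}\beta_1}{\sqrt{1 + z_{1kt}\,\mathrm{Var}(d_{1k})\,z_{1kt}^T}}, \ \frac{b_j - x_{2kt}\beta_2}{\sqrt{1 + z_{2kt}\,\mathrm{Var}(d_{2k})\,z_{2kt}^T}} \right) \tag{6}
\]
which gives for each marginal component
\[
\begin{cases}
\mathrm{pr}(Y_{1kt} \le i) = \Phi\!\left( \dfrac{a_i - x_{1kt}\beta_1}{\sqrt{1 + z_{1kt}\,\mathrm{Var}(d_{1k})\,z_{1kt}^T}} \right) \\[3ex]
\mathrm{pr}(Y_{2kt} \le j) = \Phi\!\left( \dfrac{b_j - x_{2kt}\beta_2}{\sqrt{1 + z_{2kt}\,\mathrm{Var}(d_{2k})\,z_{2kt}^T}} \right)
\end{cases} \tag{7}
\]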
where Φ(.) and Φρkt (.) represent, respectively, the cumulative distribution function (CDF) of a
univariate standard normal and the bivariate standard normal with a correlation coefficient ρkt .
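
The marginal expression in (7) can be checked numerically for a scalar random intercept (z1kt = 1): averaging the conditional probability Φ(ai − x1kt β1 − d1k) over d1k ~ N(0, Var(d1k)) should reproduce the rescaled probit. The sketch below does this with a plain trapezoid rule; the function names, grid size and numerical inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def population_averaged(coef, var_d):
    """Scale a subject-specific coefficient or threshold by 1 / sqrt(1 + Var(d))."""
    return coef / math.sqrt(1.0 + var_d)

def marginal_cum_prob(a_i, xb, var_d):
    """pr(Y <= i) in the form of (7) for a random-intercept model."""
    return STD.cdf(population_averaged(a_i - xb, var_d))

def averaged_conditional_prob(a_i, xb, var_d, n=4001, span=8.0):
    """Trapezoid-rule average of Phi(a_i - xb - d) over d ~ N(0, var_d)."""
    sd = math.sqrt(var_d)
    dist_d = NormalDist(0.0, sd)
    lo = -span * sd
    step = 2.0 * span * sd / (n - 1)
    total = 0.0
    for m in range(n):
        d = lo + m * step
        weight = 0.5 if m in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * STD.cdf(a_i - xb - d) * dist_d.pdf(d)
    return total * step
```

The two functions agree to numerical precision, which is the scalar version of the result invoked from Appendix A.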
Hence, for the first marginal outcome, it is easily seen from equation (7) that the population-averaged parameters associated with β1 and ai are smaller in magnitude, provided that Var(d1k ) is
positive and estimable, and are given respectively by
\[
\frac{\beta_1}{\sqrt{1 + z_{1kt}\,\mathrm{Var}(d_{1k})\,z_{1kt}^T}} \quad \text{and} \quad \frac{a_i}{\sqrt{1 + z_{1kt}\,\mathrm{Var}(d_{1k})\,z_{1kt}^T}}.
\]
2.3 Generalization of the method to multivariate ordinal outcomes in time
The proposed model for bivariate ordinal outcomes can be easily extended to the case where
m ordered categorical outcomes Ykt = (Y1kt , Y2kt , ..., Ymkt )T are observed over time. One typically
assumes the existence of an underlying multivariate normal response Wkt = (W1kt , W2kt , ..., Wmkt )T
which is related to the observed outcome through m vectors of threshold values a1 , a2 , ..., am−1 and
am , where aℓ = (aℓ1 , aℓ2 , ..., aℓ,sℓ−1 )T , with sℓ being the number of ordered levels of the ℓth outcome.
Specifically, the model formulation at the first stage is given by,
\[
(Y_{1kt} = i_1, \ldots, Y_{mkt} = i_m) \Leftrightarrow (a_{1,i_1 - 1} \le W_{1kt} < a_{1,i_1}, \ldots, a_{m,i_m - 1} \le W_{mkt} < a_{m,i_m})
\quad \forall \ i_1, i_2, \ldots, i_m \ \text{in the set of possible outcomes.}
\]
At the second stage, we consider a mixed effects regression model for the latent response as follows,
\[
W_{\ell kt} = x_{\ell kt}\beta_\ell + z_{\ell kt} d_{\ell k} + \varepsilon_{\ell kt}, \quad \ell = 1, 2, \ldots, m, \qquad
d_k \sim N(0, D) \ \text{and} \ \varepsilon_{kt} \sim N(0, H_{kt}).
\]
For a relatively small number, m, of cross-sectional outcomes, the correlation matrix Hkt may
be left unstructured. However, when m is large, a structure could be given to Hkt , assuming for
example that its off-diagonal elements are all equal. This structure corresponds to that of the
equi-correlated probit model of Ochi and Prentice (1984). The optimization and the estimation
can proceed from there.
3. Maximum marginal likelihood estimation
Given the above random effects regression model for the latent bivariate response W and relations
(1) and (3), the joint probability of the bivariate ordinal response conditional on θk is given by
pr(Y1kt = i, Y2kt = j|θk ) = pr(ai−1 ≤ W1kt < ai , bj−1 ≤ W2kt < bj |θk )
= Φρkt (ai − e1kt , bj − e2kt ) − Φρkt (ai−1 − e1kt , bj − e2kt )
− Φρkt (ai − e1kt , bj−1 − e2kt ) + Φρkt (ai−1 − e1kt , bj−1 − e2kt )
where eℓkt = E(Wℓkt |θk ), ℓ = 1, 2.
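
The rectangle formula above needs only a bivariate standard normal CDF. The sketch below is one self-contained way to evaluate it, using the identity Φρ(a, b) = ∫ φ(x) Φ((b − ρx)/√(1 − ρ²)) dx over x < a and Simpson's rule. The integration scheme, the truncation at ±9, and the names `bvn_cdf` and `cell_prob` are assumptions of this example; the paper does not state how Φρ was computed.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def bvn_cdf(a, b, rho, n=2000):
    """pr(W1 < a, W2 < b) for a standard bivariate normal with correlation rho."""
    if a == -math.inf or b == -math.inf:
        return 0.0
    lo, hi = -9.0, min(a, 9.0)       # the density is negligible beyond +/- 9
    if hi <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return STD.pdf(x) * STD.cdf((b - rho * x) / s)

    step = (hi - lo) / n             # n must be even for Simpson's rule
    total = integrand(lo) + integrand(hi)
    for m in range(1, n):
        total += (4.0 if m % 2 else 2.0) * integrand(lo + m * step)
    return total * step / 3.0

def cell_prob(i, j, a, b, e1, e2, rho):
    """pr(Y1 = i, Y2 = j | theta) given conditional latent means e1 and e2."""
    cuts_a = [-math.inf] + list(a) + [math.inf]
    cuts_b = [-math.inf] + list(b) + [math.inf]

    def F(u, v):
        return bvn_cdf(u - e1, v - e2, rho)

    return (F(cuts_a[i], cuts_b[j]) - F(cuts_a[i - 1], cuts_b[j])
            - F(cuts_a[i], cuts_b[j - 1]) + F(cuts_a[i - 1], cuts_b[j - 1]))
```

Summing `cell_prob` over all (i, j) returns 1 up to integration error, which is a convenient sanity check on the thresholds and signs.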
Assuming that the bivariate responses of subject k are conditionally independent given θk ,
the joint probability L(yk |θk ) for the observed outcome matrix yk is equal to the product of the
conditional probabilities of all time-point responses and is given by
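\[
L(y_k \mid \theta_k) = \prod_{t=1}^{n_k} \prod_{i=1}^{r} \prod_{j=1}^{s} \left[ \mathrm{pr}(Y_{1kt} = i, Y_{2kt} = j \mid \theta_k) \right]^{I(y_{1kt} = i) \times I(y_{2kt} = j)}
\]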
where I(.) is the indicator variable.
Then the marginal density of Yk in the population, i.e., the contribution of subject k to the
marginal likelihood, is the integral of L(yk |θk ) weighted by the joint density function of the transformed random effects terms, namely,
\[
L(y_k) = (2\pi)^{-(r_1 + r_2)/2} \int_{\mathbb{R}^{r_1 + r_2}} L(y_k \mid \theta_k) \exp(-\|\theta_k\|^2 / 2) \, d\theta_k
\]
where R^{r1+r2} is the (r1 + r2 )-dimensional Euclidean space, with ‖·‖ being the Euclidean norm.
The marginal likelihood for a sample of N independent subjects is given by L = ∏_{k=1}^{N} L(yk ).
Maximizing the log-likelihood, log L, with respect to the vector of model parameters Θ = (β1 , β2 , β3 , a, b, vec(T )), with vec(T ) being
the vector of unique non-zero elements of T , yields the likelihood
equation
\[
\frac{\partial \log L}{\partial \Theta} = \sum_{k=1}^{N} L^{-1}(y_k) \frac{\partial L(y_k)}{\partial \Theta} = 0
\]
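which will give the MLE, provided that the Hessian ∂² log L/∂Θ∂Θ^T is negative definite at the optimum solution. The key
computational features rely then on evaluating
\[
\frac{\partial L(y_k)}{\partial \Theta} = \int_{\mathbb{R}^{r_1 + r_2}} \left[ \sum_{t=1}^{n_k} \sum_{i=1}^{r} \sum_{j=1}^{s} I(y_{1kt} = i) \times I(y_{2kt} = j) \, p_{ktij}^{-1} \frac{\partial p_{ktij}}{\partial \Theta} \right]
\times L(y_k \mid \theta_k) \, (2\pi)^{-(r_1 + r_2)/2} \exp(-\|\theta_k\|^2 / 2) \, d\theta_k
\]
where pktij = pr(Y1kt = i, Y2kt = j|θk ). This leads to the computation of ∂pktij /∂Θ.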
For the first univariate outcome, the partial derivative of pktij with respect to β1 is given by
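\[
\frac{\partial p_{ktij}}{\partial \beta_1} = \Bigg[
-\phi(a_i - e_{1kt}) \, \Phi\!\left( \frac{b_j - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
+ \phi(a_{i-1} - e_{1kt}) \, \Phi\!\left( \frac{b_j - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
\]
\[
\qquad + \phi(a_i - e_{1kt}) \, \Phi\!\left( \frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
- \phi(a_{i-1} - e_{1kt}) \, \Phi\!\left( \frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
\Bigg] x_{1kt}
\]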
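where φ(.) is the probability density function of the standard normal distribution. The partial
derivative of pktij with respect to a threshold value ai′ , i′ = 1, ..., r − 1, is
\[
\frac{\partial p_{ktij}}{\partial a_{i'}} =
\delta_{i,i'} \, \phi(a_i - e_{1kt}) \, \Phi\!\left( \frac{b_j - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
- \delta_{i-1,i'} \, \phi(a_{i-1} - e_{1kt}) \, \Phi\!\left( \frac{b_j - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
\]
\[
\qquad - \delta_{i,i'} \, \phi(a_i - e_{1kt}) \, \Phi\!\left( \frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_i - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
+ \delta_{i-1,i'} \, \phi(a_{i-1} - e_{1kt}) \, \Phi\!\left( \frac{b_{j-1} - e_{2kt} - \rho_{kt}(a_{i-1} - e_{1kt})}{\sqrt{1 - \rho_{kt}^2}} \right)
\]
where δi,i′ = 1 if i = i′ and 0 otherwise.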
Differentiating pktij with respect to β2 and threshold values bj , j = 1, ...s − 1, can be done
analogously.
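Differentiating pktij with respect to β3 via the chain rule gives
\[
\frac{\partial p_{ktij}}{\partial \beta_3} = \big[ \phi_{\rho_{kt}}(a_i - e_{1kt}, b_j - e_{2kt}) - \phi_{\rho_{kt}}(a_{i-1} - e_{1kt}, b_j - e_{2kt})
- \phi_{\rho_{kt}}(a_i - e_{1kt}, b_{j-1} - e_{2kt}) + \phi_{\rho_{kt}}(a_{i-1} - e_{1kt}, b_{j-1} - e_{2kt}) \big] \,
\frac{2 x_{3kt} \, e^{x_{3kt}\beta_3}}{(1 + e^{x_{3kt}\beta_3})^2}.
\]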
Differentiating pktij with respect to vec(T ) = (vec(T11 ), vec(T21 ), vec(T22 )) requires the computation of ∂pktij /∂eℓkt (derived above);
\[
\frac{\partial e_{\ell kt}}{\partial \mathrm{vec}(T_{\ell\ell})} = (\theta_{\ell k} \otimes z_{\ell kt}) J_{r_\ell}^T, \quad \ell = 1, 2; \qquad \text{and} \qquad
\frac{\partial e_{2kt}}{\partial \mathrm{vec}(T_{21})} = \theta_{1k} \otimes z_{2kt},
\]
where ⊗ is the direct product operator. As noted by Hedeker and Gibbons (1994), J_{rℓ} is the transformation
matrix of Magnus (1988), with dimension rℓ (rℓ + 1)/2 × rℓ², which eliminates the elements above
the main diagonal.
The likelihood equations can be solved using a Quasi-Newton approach where Θγ , the parameter
at step γ, is updated as follows,
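\[
\Theta_{\gamma + 1} = \Theta_\gamma + I_e(\Theta_\gamma; y)^{-1} \frac{\partial \log L}{\partial \Theta_\gamma}
\]
where Ie (Θγ ; y), an empirical and consistent estimator of the information matrix at step γ (derived
in Appendix B), is given by
\[
I_e(\Theta_\gamma; y) = \sum_{k=1}^{N} L^{-2}(y_k) \frac{\partial L(y_k)}{\partial \Theta_\gamma} \left( \frac{\partial L(y_k)}{\partial \Theta_\gamma} \right)^T - N \bar{S} \bar{S}^T \tag{8}
\]
with
\[
\bar{S} = \frac{1}{N} \sum_{k=1}^{N} L^{-1}(y_k) \frac{\partial L(y_k)}{\partial \Theta_\gamma}.
\]
At the optimum point, i.e., γ = γmax ,
\[
I_e(\Theta_{\gamma_{\max}}; y) = \sum_{k=1}^{N} L^{-2}(y_k) \frac{\partial L(y_k)}{\partial \Theta_{\gamma_{\max}}} \left( \frac{\partial L(y_k)}{\partial \Theta_{\gamma_{\max}}} \right)^T
\]
can be
inverted to get the asymptotic variance-covariance matrix of the model parameter estimates.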
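
The update rule with the empirical information in (8) can be illustrated on a deliberately tiny problem: a single threshold a in the binary probit model pr(Yk = 1) = Φ(a), with no covariates or random effects. This toy model, the made-up data and the function names are assumptions of the sketch; it only shows the mechanics of the iteration, not the full model.

```python
from statistics import NormalDist

STD = NormalDist()

def score_contributions(a, y):
    """Per-subject scores d log L_k / da for pr(Y_k = 1) = Phi(a), y_k in {0, 1}."""
    p = STD.cdf(a)
    dens = STD.pdf(a)
    return [(y_k - p) * dens / (p * (1.0 - p)) for y_k in y]

def empirical_information(scores):
    """Scalar analogue of (8): sum of squared scores minus N times the squared mean score."""
    n = len(scores)
    s_bar = sum(scores) / n
    return sum(s * s for s in scores) - n * s_bar * s_bar

def fit_threshold(y, a=0.0, max_iter=100, tol=1e-12):
    """Iterate a <- a + I_e^{-1} * (d log L / da) until the step is negligible."""
    for _ in range(max_iter):
        scores = score_contributions(a, y)
        step = sum(scores) / empirical_information(scores)
        a += step
        if abs(step) < tol:
            break
    return a

y = [1] * 30 + [0] * 70      # made-up data: 30 events among 100 subjects
a_hat = fit_threshold(y)
```

In this toy case the maximum likelihood estimate is known in closed form, Φ⁻¹(0.3), which gives a direct check on the iteration. At convergence the mean score vanishes, so the information reduces to the sum of squared score contributions.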
3.1 Adaptive Gaussian quadrature and computations
Gaussian quadrature rules are used to approximate integrals of functions with respect to a
given kernel by a weighted average of the integrand evaluated at predetermined abscissas as in
Pinheiro and Bates (2000). This methodology relies basically on the concept of orthogonal functions
for which a high degree of accuracy is attained when the integrand is sufficiently smooth. The
weights and abscissas used in Gaussian quadrature rules for the most common kernels, including the
normal kernel, can be obtained from the tables of Abramowitz and Stegun (1964). If Q univariate
quadrature points are requested, then an (r1 + r2 )-dimensional integral requires Q^{r1+r2} multivariate
points Pq^T = (Pq1 , Pq2 , ..., Pq,r1+r2 ), q = 1, ..., Q^{r1+r2} , with associated weights given by the product
of the corresponding univariate weights, Π(Pq ) = ∏_{h=1}^{r1+r2} Π(Pqh ).
As the number of random effects, r1 + r2 , increases, the number of multidimensional quadrature points
increases exponentially in the quadrature solution. However, several authors have reported that
the number of points in each univariate dimension can be reduced for higher dimensional integrals
without impairing the accuracy of the approximations. For example, we found that as few as five
points per dimension were sufficient to obtain adequate accuracy when random intercept models
were fitted.
Hence, the contribution L(yk ) of subject k to the likelihood can be approximated by
\[
L(y_k) \approx \pi^{-(r_1 + r_2)/2} \sum_{q=1}^{Q^{r_1 + r_2}} L(y_k \mid \sqrt{2} P_q) \, \Pi(P_q). \tag{9}
\]
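
A minimal sketch of the non-adaptive rule (9) follows. To keep it short it assumes a single ordinal outcome with one random intercept (so r1 + r2 = 1) rather than the bivariate model, and it hard-codes the standard 5-point Gauss-Hermite nodes and weights for the kernel exp(−x²). The function names, thresholds and inputs are illustrative, not the authors' implementation.

```python
import math
from statistics import NormalDist

STD = NormalDist()
# Standard 5-point Gauss-Hermite rule for the kernel exp(-x^2).
GH_NODES = (-2.020182870456086, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.020182870456086)
GH_WEIGHTS = (0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591)

def cond_lik(y, xb, sigma, cuts, theta):
    """L(y | theta): product over time points of pr(Y_t = y_t | theta)."""
    full = [-math.inf] + list(cuts) + [math.inf]
    lik = 1.0
    for y_t, xb_t in zip(y, xb):
        e = xb_t + sigma * theta          # conditional latent mean at time t
        lik *= STD.cdf(full[y_t] - e) - STD.cdf(full[y_t - 1] - e)
    return lik

def marginal_lik_gh(y, xb, sigma, cuts):
    """Quadrature approximation in the form of (9) with one dimension and Q = 5."""
    total = sum(w * cond_lik(y, xb, sigma, cuts, math.sqrt(2.0) * p)
                for p, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)
```

Because the weights sum to √π, the approximated probabilities of all response patterns still add up to one, even when the individual values carry quadrature error.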
The Gaussian quadrature, as noted by Pinheiro and Bates (2000), can be viewed as a deterministic version of Monte Carlo integration in which the random sample of θk is generated from the
N (0, Ir1 +r2 ) distribution. In a pure Gaussian quadrature approach, the quadrature points and the corresponding weights are fixed beforehand, but in Monte Carlo, they are left to random choice. Because
importance sampling tends to be much more efficient than simple Monte Carlo integration, we consider the equivalent of importance sampling in the Gaussian quadrature context, which is termed
the adaptive Gaussian quadrature by Pinheiro and Bates (2000). Here, the grid of the abscissas
in the scale of θk is centered around the conditional mode θ̂k rather than 0, as in (9). We recall
that
\[
L(y_k) = (2\pi)^{-(r_1 + r_2)/2} \int_{\mathbb{R}^{r_1 + r_2}} \exp\!\left( \log(L(y_k \mid \theta_k)) - \|\theta_k\|^2 / 2 \right) d\theta_k
\]
For ease of notation, let
\[
H(y_k, \hat{\theta}_k) = \left. \frac{\partial^2 h(y_k, \theta_k)}{\partial \theta_k \, \partial \theta_k^T} \right|_{\theta_k = \hat{\theta}_k}
\]
where h(yk , θk ) = log(L(yk |θk )) − ‖θk‖²/2 and θ̂k = arg max_{θk} h(yk , θk ). In addition, we consider
the scaling of θk using H^{1/2}(yk , θ̂k ), the Cholesky decomposition of H(yk , θ̂k ), as follows,
\[
\theta_k = \hat{\theta}_k + H^{1/2}(y_k, \hat{\theta}_k) \, \theta_k^{*}.
\]
The adaptive Gaussian quadrature is then given by
\[
L(y_k) \approx (2\pi)^{-(r_1 + r_2)/2} \sum_{q=1}^{Q^{r_1 + r_2}} \exp\!\left( h(y_k, \hat{\theta}_k + H^{1/2}(y_k, \hat{\theta}_k) P_q) + \|P_q\|^2 \right) \Pi(P_q)
\]
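
The idea of recentring and rescaling the grid can be sketched for a scalar random intercept. The code below follows the commonly used textbook form of adaptive Gauss-Hermite quadrature, in which the nodes are placed at θ̂ + √2 s Pq with s = (−h″(θ̂))^(−1/2) and the sum is multiplied by √2 s; this scaling convention, the finite-difference Newton search for the mode, the univariate ordinal model and all names are assumptions of the sketch and may differ in detail from the authors' implementation.

```python
import math
from statistics import NormalDist

STD = NormalDist()
# Standard 5-point Gauss-Hermite rule for the kernel exp(-x^2).
GH_NODES = (-2.020182870456086, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.020182870456086)
GH_WEIGHTS = (0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591)

def h(theta, y, xb, sigma, cuts):
    """h(y, theta) = log L(y | theta) - theta^2 / 2 for a scalar random intercept."""
    full = [-math.inf] + list(cuts) + [math.inf]
    log_lik = 0.0
    for y_t, xb_t in zip(y, xb):
        e = xb_t + sigma * theta
        p = STD.cdf(full[y_t] - e) - STD.cdf(full[y_t - 1] - e)
        log_lik += math.log(max(p, 1e-300))   # guard against underflow to zero
    return log_lik - 0.5 * theta * theta

def derivatives(theta, y, xb, sigma, cuts, eps=1e-4):
    """Central finite-difference first and second derivatives of h at theta."""
    h_minus = h(theta - eps, y, xb, sigma, cuts)
    h_zero = h(theta, y, xb, sigma, cuts)
    h_plus = h(theta + eps, y, xb, sigma, cuts)
    return (h_plus - h_minus) / (2.0 * eps), (h_plus - 2.0 * h_zero + h_minus) / (eps * eps)

def mode_and_scale(y, xb, sigma, cuts, iters=50):
    """Newton search for the conditional mode and the scale s = (-h'')^(-1/2)."""
    theta = 0.0
    for _ in range(iters):
        grad, curv = derivatives(theta, y, xb, sigma, cuts)
        theta -= grad / curv
    _, curv = derivatives(theta, y, xb, sigma, cuts)
    return theta, 1.0 / math.sqrt(-curv)

def marginal_lik_adaptive(y, xb, sigma, cuts):
    """Adaptive Gauss-Hermite approximation of the subject's marginal likelihood."""
    mode, scale = mode_and_scale(y, xb, sigma, cuts)
    total = sum(w * math.exp(h(mode + math.sqrt(2.0) * scale * p, y, xb, sigma, cuts) + p * p)
                for p, w in zip(GH_NODES, GH_WEIGHTS))
    return math.sqrt(2.0) * scale * total / math.sqrt(2.0 * math.pi)
```

Centring the nodes at the mode matters most for subjects whose responses are all in an extreme category, because their conditional likelihood is concentrated far from zero.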
3.2 Goodness-of-fit
As the model is likelihood-based, likelihood-ratio test (LRT) statistics could be used to test for
statistical significance of the fixed effects terms in the model. However, testing for the need or the
reduction of the dimensionality of the random effects results in a non-standard problem since the
null set does not lie in the interior of the parameter space (Chernoff, 1954).
An informal assessment of the model fit can be performed by comparing the observed proportions to the fitted probabilities using expressions in (6) and (7), marginally and jointly for the two
outcomes. Others have used this approach for monitoring random effects models (see e.g. Legler
and Ryan, 1997; Hedeker and Gibbons, 1994; and Ten Have and Morabia, 1999).
4. Example
To illustrate the application of the proposed model to longitudinal bivariate ordinal outcomes,
we examined data collected in the cardiovascular educational study. These data were previously
analyzed by Ten Have and Morabia (1999) with outcomes re-coded to binary scales (negative vs. no
change or positive changes). Modeling the data as binary outcomes results in a loss of information
although they are more convenient to handle from a medical standpoint.
As noted by these authors, the data set contains missing outcomes at the 4-month visit and at the
12-month visit. Specifically, out of 266 subjects, there were 208 subjects with reported outcomes at
the 4-month visit and 243 subjects at the 12-month visit. Missingness was less severe at 12 months
due to additional efforts by the study coordinators to contact study patients at 12 months. Preliminary analyses were performed to see if the missing visits were related to the unobserved outcomes.
For example, analyses of the dropout indicator, at 4 months and at 12 months, using the blood
pressure and cholesterol level at the previous visit as covariates, were performed. No association
involving these covariates was found at a 5% significance level. Therefore, the missingness in these
data was considered to be ignorable. Under such conditions, the maximum likelihood estimation
yields consistent and unbiased estimates (Rubin, 1976).
From a statistical perspective, the effects under investigation, for each component of the model,
are intervention (audio intervention being the reference group), time (12 months being the reference
time), and intervention-time interaction. The intervention-time interaction effect corresponds to
the difference in intervention effect between the two follow-up times. The proposed latent-variable model
denoted as MEBO1 with correlated random intercept terms is first fitted to the data. A simple
version of MEBO1 denoted as MEBO2, which assumes independence between the random effects of
each marginal component is also considered. The aim here is to assess the robustness of the model
estimates under MEBO1. A third model, referred to as IBO, is a naive bivariate model in that it
omits the random effects parameters while assuming the same fixed effects structure as MEBO1.
Results are also presented for two separate random intercept models (MEUO1 and MEUO2), one
for each response, to assess the robustness of the marginal probit estimates when the outcomes are
jointly analyzed. The results obtained from these last models were exactly the same as the ones
provided by MIXOR (a FORTRAN program for modeling univariate ordinal outcomes in time,
proposed by Hedeker and Gibbons, 1994). Each of the five models, estimated using GAUSS
(Aptech, 1990) and displayed in Table 1, contains all these effects in each component.
Sensitivity of the estimates. As expected, the impact of including random effects in the bivariate
probit model (MEBO1) is similar to the impact reported in the literature for the univariate ordinal
mixed effects model. The estimates and standard errors under MEBO1 consistently exceed the
corresponding population-averaged counterparts (IBO) reflecting the heterogeneity among subjects
as noted by Zeger et al. (1988). We should note, however, that this trend is not observed for
the estimates of the correlation coefficient ρkt . A theoretical explanation for this is yet to be
found. Table 1 also reveals that the marginal probit parameter estimates and standard errors
under MEBO1 are very similar to the analogous parameter estimates and standard errors under
the univariate MEUO1 and MEUO2. However, this is not necessarily an indication of orthogonality
between the marginal probit and the correlation coefficient parameter estimates for at least one
reason: the marginal probit standard errors under the MEBO1 are smaller than the analogous
standard errors under MEUO1 and MEUO2 in contrast to what is expected for the bivariate probit
model without random effects. Finally, the estimates of model MEBO2 that assumes independence
between the random effects are very close to those under model MEBO1. In fact, the observed value
of the LRT statistic is 0.1112, which is less than 3.8415, the critical point based on a χ² distribution
with 1 degree of freedom. This suggests that ignoring the cross-sectional-by-longitudinal association
is not a major issue in these data.
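
The comparison of the LRT statistic with the χ² critical point can be reproduced with the standard library alone, using the fact that a χ² variable with 1 degree of freedom is the square of a standard normal. The function name is hypothetical; the statistic 0.1112 is the value reported in the text.

```python
import math
from statistics import NormalDist

def chi2_1df_sf(x):
    """pr(chi-square with 1 df > x) = pr(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

lrt = 0.1112                                   # observed LRT statistic from the text
critical = NormalDist().inv_cdf(0.975) ** 2    # 5% critical point of chi-square(1)
p_value = chi2_1df_sf(lrt)
```

The resulting p-value is far above 0.05, in line with the conclusion that MEBO2 fits these data about as well as MEBO1.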
[Table 1 about here.]
Model interpretations. Under the MEBO1 model, the intervention-time interaction effect estimate on the probit for cholesterol is -0.3859 (0.2728), which is not significant at the 5% level (p=0.1571).
By rescaling this estimate to get the population-averaged parameter estimates, subjects on the
audio-intervention arm, in spite of the lack of statistical significance, exhibit higher probabilities
of observing a positive or at least no change in the cholesterol status than subjects under the
non-intervention arm at 4 months. And this gap between the two intervention groups is greater
at 12 months. Under the same model, the intervention-time interaction effect estimate on the
probit for blood pressure is -0.5975 (0.2666), which is significant at 5% level (p=0.025). Again
by rescaling this estimate to get the population-averaged parameter estimates, subjects under the
audio-intervention arm exhibit higher probabilities of observing a positive or at least no change in
the blood pressure status than subjects under the non-intervention arm at 12 months. However, at
4 months, participants under the non-intervention arm perform better. Therefore the intervention
becomes important when additional educational materials are introduced between 4 and 12 months.
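The p-values and the rescaling step quoted above can be illustrated numerically. The sketch below assumes a random-intercept probit model with unit residual variance on the latent scale, so that a subject-specific quantity is divided by sqrt(1 + Var(d1k)) to obtain its population-averaged counterpart, and it assumes the sign convention "threshold minus linear predictor". Both are assumptions of this illustration (function and variable names are ours); they are adopted because they appear to reproduce the "Marginal (CHOL)" entries of Table 2 from the MEBO1 estimates in Table 1.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def wald_p_value(estimate, std_error):
    """Two-sided p-value of the Wald statistic estimate / std_error."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - std_normal.cdf(z))

def marginal_cum_prob(threshold, linear_predictor, random_intercept_var):
    """Population-averaged P(Y <= category) under a probit random-intercept model.

    Assumes unit residual variance on the latent scale and the convention
    threshold minus linear predictor (both are assumptions of this sketch).
    """
    scale = math.sqrt(1.0 + random_intercept_var)
    return std_normal.cdf((threshold - linear_predictor) / scale)

# MEBO1 estimates for the cholesterol component (Table 1).
thresh1 = -1.8949
thresh2 = thresh1 + 2.4548           # Table 1 reports Thresh.2 - Thresh.1
interv, time12, interv_by_time = -0.1989, 0.0609, -0.3859
var_d1 = 1.2135 ** 2                 # Var(d1k) = T11^2

print("Wald p (CHOL interaction):", round(wald_p_value(-0.3859, 0.2728), 4))
print("Wald p (BP interaction):  ", round(wald_p_value(-0.5975, 0.2666), 4))

fitted = {}
for arm, x_int in (("No", 0), ("Yes", 1)):
    for visit, x_time in ((4, 0), (12, 1)):
        eta = interv * x_int + time12 * x_time + interv_by_time * x_int * x_time
        fitted[(arm, visit)] = marginal_cum_prob(thresh2, eta, var_d1)
        print(f"intervention={arm:3s} visit={visit:2d}  "
              f"P(CHOL <= 2) = {fitted[(arm, visit)]:.4f}")
```

Under these assumptions the four fitted values agree with the model-based probabilities 0.6391, 0.6245, 0.6853 and 0.7547 reported in Table 2, which shows how the widening gap between arms at 12 months follows from the interaction estimate.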
[Table 2 about here.]
Goodness-of-fit. In an informal assessment of goodness-of-fit, Table 2 compares the observed
cumulative proportions and MEBO1-based cumulative fitted probabilities for each intervention by
time combination. A very good agreement between the observed and predicted marginal and joint
proportions is indicated, which should be expected given that all three components of the model
include the intervention-time interaction.
5. Discussion
We have described latent-variable models for analyzing longitudinal bivariate ordinal outcomes
which provide both person-specific and marginal covariate-response interpretations of the fixed
effects model parameters. The model formulation has two stages. First, the ordinal response is
related to a continuous latent variable via the threshold concept. Second, a classical multivariate
mixed model for the latent response is formulated. Assuming conditional independence given the
random effects terms, the marginal likelihood for ordinal outcomes is approximated using an adaptive
Gauss-Hermite quadrature approach to numerical integration.
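For orientation, one common form of the adaptive Gauss-Hermite rule is sketched here for a scalar random effect b; the bivariate random effects of the proposed model require the multivariate analogue. The notation is ours: h(b) stands for the conditional likelihood of subject k multiplied by the random-effects density, and (x_q, w_q), q = 1, ..., Q, are the standard Gauss-Hermite nodes and weights for the weight function exp(-x²).

```latex
\[
\int h(b)\,db \;\approx\; \sqrt{2}\,\hat\sigma_k \sum_{q=1}^{Q} w_q \exp(x_q^2)\,
  h\bigl(\hat b_k + \sqrt{2}\,\hat\sigma_k x_q\bigr),
\qquad
\hat b_k = \arg\max_b h(b), \quad
\hat\sigma_k^2 = \Bigl[-\frac{\partial^2 \log h(b)}{\partial b^2}\Big|_{b=\hat b_k}\Bigr]^{-1}.
\]
```

Centering and scaling the nodes at the subject-specific mode and curvature is what distinguishes the adaptive rule from ordinary Gauss-Hermite quadrature, and it typically allows far fewer nodes for the same accuracy.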
An important feature of these models is that they allow irregularly spaced measurements across
time, time-dependent and time-independent covariates and ignorable missing data. Although the data
set used to illustrate our methodology does not have a continuous covariate, our estimation
approach can in principle accommodate one. For example, if the data set is highly unbalanced
with respect to time, we can easily fit a random intercept and a random slope with respect to time
to model each subject profile.
Although the proposed models are motivated by the two-stage modeling approach, it is the
marginal models that are fitted to the data. Hence, inferences based on the marginal models do
not explicitly assume the presence of random effects representing the natural heterogeneity among
subjects. However, competing models with the same marginal fit correspond to the same maximized
likelihood and equal fixed effects estimates.
An important question with respect to our approach is whether we actually believe in the
threshold model and the unobserved latent variable or if we merely use it as a device to handle
ordinal data. Several authors reported in the literature that latent-variable models require large
data sets in order to estimate all model parameters (see e.g., Garrett and Zeger, 2001). In many
instances, it is unclear if there is enough data to estimate the model parameters uniquely or with
any precision. To deal with this issue, we fitted the proposed model using different parameter
starting values to check if the model is identifiable. This technique is obviously too empirical
and heuristic. Therefore, there is a need to develop a formal procedure for monitoring latent
variable models. Garrett and Zeger (2001) have proposed the use of a Bayesian approach where
the posterior distribution of each parameter is compared to the prior distribution. Other authors
have also used this Bayesian approach, under the terminology Rubin’s posterior predictive check, for
model monitoring when tools such as the LRT are not applicable. Typically, the use of the limiting
distribution of the LRT statistic to test the need for random effects or a reduction of their
dimension is not valid because the regularity condition that the null hypothesis lie in the
interior of the parameter space is not met.
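To illustrate the boundary problem in its simplest form, consider testing a single variance component against zero with no other parameter on the boundary. In that case the limiting null distribution is known to be an equal mixture of a point mass at zero and a χ² distribution with 1 degree of freedom, rather than the χ² with 1 degree of freedom itself. The sketch below is an illustration under that assumption only; it is not a procedure used in this paper, and the function names are ours.

```python
import math

def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

def boundary_lrt_p_value(lrt):
    """p-value under the 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(lrt)

lrt = 2.7055  # hypothetical LRT statistic, for illustration only
print("naive chi-square(1) p-value:", round(chi2_1_sf(lrt), 4))
print("mixture p-value:            ", round(boundary_lrt_p_value(lrt), 4))
```

In this simple case the naive reference distribution is conservative: it doubles the p-value. Tests that reduce the dimension of a vector of random effects involve other mixtures and are not covered by this sketch.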
Though our models were built from the features of the data at hand, they have general application to situations where multiple ordinal outcomes are recorded over time. However, in some
situations such as the presence of informative missing data, the proposed models need to be extended accordingly. The model could also be extended by assuming that the random effects terms
follow a non-parametric distribution for which the support points and the frequencies have to be
estimated from the data. Even the number of support points could be estimated. This
will definitely lead to larger standard errors for the parameters, since the normality assumptions
bring additional information to the model formulation. All these issues will be the focus of future work.
Acknowledgements
The first author wishes to thank Dr Thomas Ten Have for the permission to use data from the
cardiovascular educational program study. Financial support for this work was provided by the
University of Wisconsin-Madison through the Graduate School, Medical School and Comprehensive
Cancer Center. The first author finally acknowledges the financial support from the IBC/ENAR
student award for the presentation of this work during the ENAR 2002 Spring meeting.
References
Aptech, S. (1990). GAUSS 2.1 User’s manual. Kent, WA: Author.
Catalano, P. J. and Ryan, L. M. (1992). Bivariate latent variable models for clustered discrete and
continuous outcomes. Journal of the American Statistical Association 87, 651–658.
Chernoff, H. (1954). On the distribution of the likelihood ratio. Annals of Mathematical Statistics
25, 573–578.
Dale, J. (1986). Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics 42,
907–917.
Diggle, P., Liang, K. and Zeger, S. (1994). Analysis of longitudinal data. Clarendon Press, Oxford.
Fitzmaurice, G. M. and Laird, N. M. (1995). Regression models for a bivariate discrete and
continuous outcome with clustering. Journal of the American Statistical Association 90, 845–852.
Garrett, E. and Zeger, S. (2001). Assessing estimability of latent class models using a Bayesian
estimation approach. In The impact of technology on biometrics. ENAR Spring Meeting.
Gibbons, R. and Bock, R. D. (1987). Trend in correlated proportions. Psychometrika 52, 113–124.
Glonek, G. F. V. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal
Statistical Society Series B 57, 533–546.
Hedeker, D. and Gibbons, R. (1994). A random-effects ordinal regression model for multilevel
analysis. Biometrics 50, 933–944.
Kim, K. (1995). A bivariate cumulative probit regression model for ordered categorical data.
Statistics in Medicine 14, 1341–1352.
Kim, K. and Todem, D. (2000). A GAUSS program for fitting bivariate probit model with applications
to ophthalmologic studies. Technical report, Department of Biostatistics, University of
Wisconsin-Madison.
Legler, J. M. and Ryan, L. M. (1997). Latent variable model for teratogenesis using multiple binary
outcomes. Journal of the American Statistical Association 92, 13–20.
Lesaffre, E. and Kaufmann, H. (1992). Existence and uniqueness of the maximum likelihood
estimator for a multivariate probit model. Journal of the American Statistical Association 87,
805–811.
Lesaffre, E. and Molenberghs, G. (1991). Multivariate probit analysis: a neglected procedure in
medical statistics. Statistics in Medicine 10, 1391–1403.
Lesaffre, E., Todem, D. and Verbeke, G. (2000). Flexible modelling of the covariance matrix in a
linear random effects model. Biometrical Journal 42, 807–822.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models.
Biometrika 73, 13–22.
Lipsitz, S., Kim, K. and Zhao, L. (1994). Analysis of repeated categorical data using generalized
estimating equations. Statistics in Medicine 13, 1149–1163.
McCulloch, C. E. (1994). Maximum likelihood variance components estimation for binary data.
Journal of the American Statistical Association 89, 330–335.
Molenberghs, G. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a
multivariate Plackett distribution. Journal of the American Statistical Association 89, 633–644.
Morrell, C., Pearson, J. D. and Brant, L. J. (1997). Linear transformations of linear mixed effects
models. The American Statistician 51, 338–343.
O’Brien, P. C. (1984). Procedures for comparing multiple endpoints. Biometrics 40, 1079–1087.
Ochi, Y. and Prentice, R. L. (1984). Likelihood inference in a correlated probit regression model.
Biometrika 71, 531–543.
Pinheiro, J. C. and Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. Springer, New York.
Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical
Association 60, 516–522.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592.
Ten Have, T. and Morabia, A. (1999). Mixed effects models with bivariate and univariate association
parameters for longitudinal bivariate binary response data. Biometrics 55, 85–93.
Williamson, J. and Kim, K. (1996). A global odds ratio regression model for bivariate ordered
categorical data from ophthalmologic studies. Statistics in Medicine 15, 1507–1518.
Zeger, S. L., Liang, K.-Y. and Albert, P. (1988). Models for longitudinal data: A generalized
estimating equation approach. Biometrics 44, 1049–1060.
Appendix A
Computation of the marginal cumulative probabilities
In order to deduce the marginal probability distributions of (Y1kt, Y2kt), we let ΦC(u) be the
bivariate normal distribution function with argument u, mean vector 0 and covariance matrix C.
We also let φC(v, µ) be the bivariate normal density function with argument v, mean vector µ and
covariance matrix C.
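Theorem 1. If
\[
\mathrm{pr}(Y_{1kt} \le i,\, Y_{2kt} \le j \mid d_k) = \Phi_C(u + x_{kt}\beta + z_{kt} d_k) \tag{A.1}
\]
where u = (ai, bj) and dk is a normally distributed random vector with mean E(dk) and covariance matrix D, then
\[
\mathrm{pr}(Y_{1kt} \le i,\, Y_{2kt} \le j) = \Phi_{z_{kt} D z_{kt}^T + C}\bigl(u + x_{kt}\beta + z_{kt} E(d_k)\bigr) \tag{A.2}
\]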
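Proof. Integrating out the random effects dk in (A.1), we find, after a variable transformation and
a change of the order of integration,
\[
\mathrm{pr}(Y_{1kt} \le i,\, Y_{2kt} \le j)
  = \int_{-\infty}^{a_i + x_{1kt}\beta_1} \int_{-\infty}^{b_j + x_{2kt}\beta_2} f(\omega)\, d\omega \tag{A.3}
\]
where
\[
f(\omega) = \int_{\mathbb{R}^{r_1 + r_2}} \phi_D\bigl(u, E(d_k)\bigr)\, \phi_C(\omega, -z_{kt} u)\, du
\]
with R^{r1+r2} being the support space of the random effects dk (inside this integral, u denotes the
integration variable for dk, not the threshold vector). The upper limits in (A.3) are written with the
same sign convention as (A.1), that is, as the components of u + xkt β.
Using standard results on normal distributions, one can show that
\[
f(\omega) = \phi_{z_{kt} D z_{kt}^T + C}\bigl(\omega, -z_{kt} E(d_k)\bigr) \tag{A.4}
\]
Inserting (A.4) in (A.3) yields (A.2), which concludes the proof.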
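Theorem 1 can be checked numerically in its scalar form, where C = 1 and a single normal random intercept is integrated out. The sketch below is an illustration under that simplification: it uses ordinary (non-adaptive) Gauss-Hermite quadrature, builds the nodes with a plain bisection routine to stay within the Python standard library, and takes its numerical inputs from Table 1 purely for illustration. All function names are ours.

```python
import math
from statistics import NormalDist

def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x) from its three-term recurrence."""
    h_prev, h = 0.0, 1.0
    for k in range(n):
        h_prev, h = h, x * h - k * h_prev
    return h

def gauss_hermite_normal(n):
    """Nodes and weights with sum(w * g(x)) approximating E[g(Z)], Z ~ N(0, 1)."""
    roots = []
    for k in range(1, n + 1):
        bound = math.sqrt(4 * k + 2)  # every root of He_k lies in (-bound, bound)
        edges = [-bound] + roots + [bound]
        new_roots = []
        # Roots of He_k interlace those of He_{k-1}: one root per interval.
        for lo, hi in zip(edges[:-1], edges[1:]):
            sign_lo = hermite_he(k, lo) > 0
            for _ in range(100):  # plain bisection
                mid = 0.5 * (lo + hi)
                if (hermite_he(k, mid) > 0) == sign_lo:
                    lo = mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [math.factorial(n) / (n * hermite_he(n - 1, x)) ** 2 for x in roots]
    return roots, weights

def marginal_prob_quadrature(a, z, mean_d, var_d, n_nodes=30):
    """Integrate Phi(a + z*d) over d ~ N(mean_d, var_d) numerically."""
    std_normal = NormalDist()
    nodes, weights = gauss_hermite_normal(n_nodes)
    sd = math.sqrt(var_d)
    return sum(w * std_normal.cdf(a + z * (mean_d + sd * x))
               for x, w in zip(nodes, weights))

def marginal_prob_closed_form(a, z, mean_d, var_d):
    """Scalar version of (A.2) with C = 1: Phi((a + z*E(d)) / sqrt(z*D*z + 1))."""
    return NormalDist().cdf((a + z * mean_d) / math.sqrt(z * var_d * z + 1.0))

a = -1.8949            # plays the role of u + x*beta (illustrative value)
z, mean_d = 1.0, 0.0   # random intercept with zero mean
var_d = 1.2135 ** 2    # illustrative variance D

p_quad = marginal_prob_quadrature(a, z, mean_d, var_d)
p_exact = marginal_prob_closed_form(a, z, mean_d, var_d)
print(f"quadrature: {p_quad:.6f}   closed form: {p_exact:.6f}")
```

The two numbers should agree closely, which is the content of (A.2): integrating a probit probability over a normal random effect simply inflates the variance of the latent scale.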
Appendix B
Approximation to the observed information matrix
Assume the data y = (y1, y2, ..., yN) is a matrix of independent random vectors with a common
probability density function L(Θ, yk) ≡ L(yk). The log-likelihood and the score functions for N
independent subjects are given respectively by
\[
\log L = \sum_{k=1}^{N} \log L(y_k)
\quad\text{and}\quad
S(\Theta; y) = \sum_{k=1}^{N} s(\Theta; y_k),
\qquad\text{where}\quad
s(\Theta; y_k) = \frac{\partial \log L(y_k)}{\partial \Theta}.
\]
Hence the expected information matrix for the whole data is I(Θ) = N i(Θ), with
i(Θ) = E[s(Θ; yk) s^T(Θ; yk)] = Cov(s(Θ; yk)) being the information contained in a
single observation. The empirical information in a single observation can be estimated by
\[
\hat{i}(\Theta) = \frac{1}{N} \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k)
  - \frac{1}{N^2}\, S(\Theta; y)\, S^T(\Theta; y).
\]
The empirical information matrix Ie(Θ; y) based on N independent subjects and displayed in equation (8) is then given by
\[
I_e(\Theta; y) = \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k)
  - \frac{1}{N}\, S(\Theta; y)\, S^T(\Theta; y)
\]
which, for Θ = Θ̂ (MLE), reduces to
\[
I_e(\hat\Theta; y) = \sum_{k=1}^{N} s(\hat\Theta; y_k)\, s^T(\hat\Theta; y_k)
\]
since S(Θ̂; y) = 0.
Ie (Θ̂; y) is commonly used in practice to approximate the observed information matrix which
is difficult if not impossible to compute analytically; see for example Hedeker and Gibbons (1996),
Kim and Todem (2000).
Ie (Θ̂; y)/N is a consistent estimator of i(Θ) at the maximum likelihood. The use of this estimator
can be justified also in the following sense:
\[
I(\Theta; y) = -\frac{\partial^2 \log L}{\partial\Theta\,\partial\Theta^T}
  = -\sum_{k=1}^{N} \frac{\partial^2 \log L(y_k)}{\partial\Theta\,\partial\Theta^T}
  = \sum_{k=1}^{N} s(\Theta; y_k)\, s^T(\Theta; y_k)
    - \sum_{k=1}^{N} \frac{1}{L(y_k)} \frac{\partial^2 L(y_k)}{\partial\Theta\,\partial\Theta^T}
\]
where I(Θ; y) is the observed information matrix, which is different from the expected version I(Θ)
defined earlier. The second term on the right-hand side of the last equality has zero expectation. Hence,
\[
I(\hat\Theta; y) \approx \sum_{k=1}^{N} s(\hat\Theta; y_k)\, s^T(\hat\Theta; y_k) = I_e(\hat\Theta; y)
\]
where the accuracy of this approximation depends on how close Θ̂ is to Θ. In the particular case
of multinomially distributed data, Kim and Todem (2000) have shown that the second term in the
decomposition of I(Θ; y) above is zero, and so the approximation holds with equality.
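The multinomial special case is easy to verify numerically. The sketch below uses hypothetical counts for a three-level outcome with free cell probabilities (p1, p2), computes the empirical information from the subject-level scores and compares it with the analytically derived observed information at the MLE. The data and function names are illustrative assumptions, not part of the cardiovascular trial analysis.

```python
# Hypothetical counts of a three-level ordinal outcome (illustration only).
counts = [30, 50, 20]
n_subjects = sum(counts)

# MLE of the free cell probabilities (p3 = 1 - p1 - p2).
p1_hat = counts[0] / n_subjects
p2_hat = counts[1] / n_subjects

def score(category, p1, p2):
    """Score vector d log L(y_k) / d(p1, p2) for one subject in `category`."""
    p3 = 1.0 - p1 - p2
    if category == 0:
        return (1.0 / p1, 0.0)
    if category == 1:
        return (0.0, 1.0 / p2)
    return (-1.0 / p3, -1.0 / p3)

def empirical_information(counts, p1, p2):
    """sum_k s_k s_k^T - (1/N) S S^T, together with the total score S."""
    n = sum(counts)
    outer = [[0.0, 0.0], [0.0, 0.0]]
    total = [0.0, 0.0]
    for category, m in enumerate(counts):
        s = score(category, p1, p2)  # identical for the m subjects in this cell
        for i in range(2):
            total[i] += m * s[i]
            for j in range(2):
                outer[i][j] += m * s[i] * s[j]
    info = [[outer[i][j] - total[i] * total[j] / n for j in range(2)]
            for i in range(2)]
    return info, total

def observed_information(counts, p1, p2):
    """Minus the Hessian of the log-likelihood, written out analytically."""
    n1, n2, n3 = counts
    p3 = 1.0 - p1 - p2
    c = n3 / p3 ** 2
    return [[n1 / p1 ** 2 + c, c], [c, n2 / p2 ** 2 + c]]

emp_info, total_score = empirical_information(counts, p1_hat, p2_hat)
obs_info = observed_information(counts, p1_hat, p2_hat)
print("total score at MLE:", total_score)
print("empirical:", emp_info)
print("observed: ", obs_info)
```

Because each subject's likelihood contribution is linear in (p1, p2), its second derivative vanishes, so the two matrices coincide up to rounding, and the total score is zero at the MLE.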
[Figure 1 placeholder. Diagram labels: latent variables W1kt, W1kt’, W2kt, W2kt’; correlations ψ11ktt’, ψ12ktt, ψ12ktt’, ψ12kt’t’, ψ22ktt’.]
Figure 1. The correlation structure of the proposed model on the latent scale
Table 1
Parameter estimates (standard errors) under the MEBO1, MEBO2, MEUO1, MEUO2 and IBO
models fitted to the cardiovascular trial data

| Component | Parameter | MEBO1 | MEBO2 | MEUO1 | MEUO2 | IBO |
|---|---|---|---|---|---|---|
| CHOL | Thresh.1 | −1.8949 (0.2195) | −1.8977 (0.2187) | −1.8965 (0.2283) | — | −1.2158 (0.1144) |
| CHOL | Thresh.2 − Thresh.1 | 2.4548 (0.2195) | 2.4589 (0.2186) | 2.4588 (0.2167) | — | 1.5677 (0.0777) |
| CHOL | Interv. (audio) | −0.1989 (0.2252) | −0.1989 (0.2241) | −0.1992 (0.2309) | — | −0.1585 (0.1534) |
| CHOL | Time (12 months) | 0.0609 (0.1796) | 0.0602 (0.1794) | 0.0590 (0.2059) | — | 0.0653 (0.1883) |
| CHOL | Interv. by Time | −0.3859 (0.2728) | −0.3860 (0.2728) | −0.3824 (0.2713) | — | −0.2593 (0.2735) |
| BP | Thresh.1 | −1.0736 (0.1481) | −1.0747 (0.1482) | — | −1.0722 (0.1569) | −0.8009 (0.1089) |
| BP | Thresh.2 − Thresh.1 | 1.8167 (0.1234) | 1.8194 (0.1235) | — | 1.8167 (0.1431) | 1.3470 (0.0661) |
| BP | Interv. (audio) | 0.2745 (0.2035) | 0.2760 (0.2031) | — | 0.2773 (0.1943) | 0.1940 (0.1505) |
| BP | Time (12 months) | 0.2421 (0.1676) | 0.2413 (0.1674) | — | 0.2424 (0.1657) | 0.1854 (0.1738) |
| BP | Interv. by Time | −0.5975 (0.2666) | −0.5977 (0.2641) | — | −0.5977 (0.2368) | −0.4337 (0.2571) |
| CHOL-BP | Intercept | 0.2209 (0.4513) | 0.1805 (0.4330) | — | — | 0.1021 (0.2306) |
| CHOL-BP | Interv. (audio) | −0.0384 (0.6213) | −0.0467 (0.6193) | — | — | −0.1355 (0.3251) |
| CHOL-BP | Time (12 months) | 0.1600 (0.7528) | 0.1794 (0.7478) | — | — | 0.0921 (0.3953) |
| CHOL-BP | Interv. by Time | 0.0094 (1.0037) | 0.0051 (1.0012) | — | — | −0.0133 (0.5341) |
| Random effects variance terms (Cholesky) | T11 | 1.2135 (0.1734) | 1.2176 (0.1720) | 1.2158 (0.1708) | — | — |
| Random effects variance terms (Cholesky) | T21 | −0.0407 (0.1274) | — | — | — | — |
| Random effects variance terms (Cholesky) | T22 | 0.9239 (0.1274) | 0.9287 (0.1275) | — | 0.9249 (0.1422) | — |
| log L | | −860.3047 | −860.3603 | −411.4020 | −449.7377 | −895.9990 |

MEBO1 and MEBO2: Mixed Effects Bivariate Ordinal model under correlated (T21 ≠ 0) and uncorrelated
(T21 = 0) random effects respectively; MEUO1 and MEUO2: Mixed Effects Univariate Ordinal model for
the first and second marginal respectively; IBO: Bivariate Ordinal model under independence.
CHOL and BP: Probit parameters for Cholesterol response and Blood pressure response respectively;
CHOL-BP: Parameters (on the Fisher transformation scale) of the underlying correlation for
Cholesterol-Blood pressure response.
T11, T21 and T22: Cholesky decomposition of the random intercept variance terms (Var(d1k) = T11²,
Cov(d1k, d2k) = T11 T21 and Var(d2k) = T22² + T21²).
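The Cholesky terms in the table note translate into the random-intercept covariance matrix as follows. This is a small sketch using the MEBO1 estimates from Table 1; the implied correlation between the two random intercepts is a derived quantity that is not reported in the table.

```python
import math

# MEBO1 Cholesky estimates from Table 1.
T11, T21, T22 = 1.2135, -0.0407, 0.9239

# D = T T^T with T lower triangular, as in the table note.
var_d1 = T11 ** 2
cov_d12 = T11 * T21
var_d2 = T22 ** 2 + T21 ** 2
corr_d12 = cov_d12 / math.sqrt(var_d1 * var_d2)

print(f"Var(d1k) = {var_d1:.4f}, Cov(d1k, d2k) = {cov_d12:.4f}, "
      f"Var(d2k) = {var_d2:.4f}, corr = {corr_d12:.4f}")
```

The implied correlation is close to zero, which agrees with the small LRT statistic comparing MEBO1 with MEBO2.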
Table 2
Model-based (MEBO1) marginal and joint cumulative probabilities for changes in Cholesterol and
Blood pressure status with the corresponding observed cumulative proportions in parentheses

| Event | Intervention | Visit (months) | CHOL = 1 | CHOL ≤ 2 | Marginal (BP) |
|---|---|---|---|---|---|
| BP = 1 | No | 4 | 0.0262 (0.0242) | 0.1407 (0.1371) | 0.2153 (0.2177) |
| BP = 1 | No | 12 | 0.0209 (0.0099) | 0.1104 (0.1188) | 0.1670 (0.1485) |
| BP = 1 | Yes | 4 | 0.0237 (0.0168) | 0.1121 (0.1261) | 0.1612 (0.1765) |
| BP = 1 | Yes | 12 | 0.0496 (0.0467) | 0.1814 (0.1776) | 0.2331 (0.2150) |
| BP ≤ 2 | No | 4 | 0.0826 (0.1048) | 0.4557 (0.4435) | 0.7074 (0.7016) |
| BP ≤ 2 | No | 12 | 0.0731 (0.0693) | 0.4111 (0.4059) | 0.6436 (0.6535) |
| BP ≤ 2 | Yes | 4 | 0.0907 (0.0756) | 0.4375 (0.4454) | 0.6245 (0.6218) |
| BP ≤ 2 | Yes | 12 | 0.1446 (0.1121) | 0.555 (0.5794) | 0.7274 (0.7383) |
| Marginal (CHOL) | No | 4 | 0.1141 (0.1290) | 0.6391 (0.6210) | — |
| Marginal (CHOL) | No | 12 | 0.1068 (0.0990) | 0.6245 (0.6139) | — |
| Marginal (CHOL) | Yes | 4 | 0.1404 (0.1513) | 0.6853 (0.6891) | — |
| Marginal (CHOL) | Yes | 12 | 0.1916 (0.1682) | 0.7547 (0.7850) | — |

The changes in cholesterol status (CHOL) and blood pressure (BP) are three-level ordinal outcomes where
1 represents a positive change; 2 no change; and 3 a negative change.