
Lifetime Data Anal (2007) 13:513–531
DOI 10.1007/s10985-007-9064-y
An extended random-effects approach to modeling
repeated, overdispersed count data
Geert Molenberghs · Geert Verbeke ·
Clarice G. B. Demétrio
Received: 19 August 2007 / Accepted: 28 September 2007 / Published online: 14 November 2007
© Springer Science+Business Media, LLC 2007
Abstract Non-Gaussian outcomes are often modeled using members of the
so-called exponential family. The Poisson model for count data falls within this tradition. The family in general, and the Poisson model in particular, are at the same
time convenient since mathematically elegant, but in need of extension since often
somewhat restrictive. Two of the main rationales for existing extensions are (1) the
occurrence of overdispersion, in the sense that the variability in the data is not adequately captured by the model’s prescribed mean-variance link, and (2) the accommodation of data hierarchies owing to, for example, repeatedly measuring the outcome
on the same subject, recording information from various members of the same family,
etc. There is a variety of overdispersion models for count data, such as, for example, the negative-binomial model. Hierarchies are often accommodated through the
inclusion of subject-specific, random effects. Though not always, one conventionally
assumes such random effects to be normally distributed. While both of these issues
may occur simultaneously, models accommodating them at once are less than common. This paper proposes a generalized linear model, accommodating overdispersion
and clustering through two separate sets of random effects, of gamma and normal
type, respectively. This is in line with the proposal by Booth et al. (Stat Model 3:179–
181, 2003). The model extends both classical overdispersion models for count data
G. Molenberghs (B)
Center for Statistics, Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]
G. Verbeke
Biostatistical Centre, Katholieke Universiteit Leuven, Leuven, Belgium
e-mail: [email protected]
C. G. B. Demétrio
ESALQ, Piracicaba, SP, Brazil
e-mail: [email protected]
(Breslow, Appl Stat 33:38–44, 1984), in particular the negative binomial model, as
well as the generalized linear mixed model (Breslow and Clayton, J Am Stat Assoc
88:9–25, 1993). Apart from model formulation, we briefly discuss several estimation
options, and then settle for maximum likelihood estimation with both fully analytic
integration as well as hybrid between analytic and numerical integration. The latter is
implemented in the SAS procedure NLMIXED. The methodology is applied to data
from a study in epileptic seizures.
Keywords Bernoulli model · Beta-binomial model · Maximum likelihood ·
Negative-binomial model · Poisson model
1 Introduction
Next to continuous and binary outcomes, counts, also termed Poisson data, feature
prominently in the modeling literature and in applied modeling work. It is common to
place such models within the generalized linear modeling (GLM) framework (Nelder
and Wedderburn 1972; McCullagh and Nelder 1989; Agresti 2002). This framework
allows one to specify first and second moments only, on the one hand, or full distributional assumptions, on the other hand. In the latter case, the exponential family
(McCullagh and Nelder 1989) provides an elegant and encompassing mathematical framework, since it has the normal, Bernoulli/binomial, and Poisson models as
prominent members.
The elegance of the framework draws from certain linearity properties of the log-likelihood function, producing mathematically convenient score equations and, ultimately, straightforward-to-use inferential instruments, both for point and interval
estimation and for hypothesis testing.
Nevertheless, it has been clear for several decades, for binary data but even more
so for counts, that a key feature of the GLM framework and the exponential family, the so-called mean-variance relationship, may be overly restrictive. By this relationship, we indicate that the variance is a deterministic function of the mean. For
example, for Bernoulli outcomes with success probability µ = π , the variance is
v(µ) = π(1 − π ), while for counts following the Poisson model, the relationship is
even simpler: v(µ) = µ. Note that, in the case of continuous, normally distributed
outcomes, the mean and variance are entirely separate parameters. While i.i.d. binary
data cannot contradict the mean-variance relationship, i.i.d. counts can. This explains
why early work has been devoted to formulating models that explicitly allow for overdispersion. For binary data, hierarchies need to be present in the data in order to violate
the mean-variance link. One such class of hierarchies is with repeated measures or
longitudinal data, where an outcome on a study subject is recorded repeatedly over
time. With such models gaining momentum, not only for the Gaussian case (Verbeke
and Molenberghs 2000), but also for non-Gaussian data (Molenberghs and Verbeke
2005), extensions of the GLM framework have been formulated.
Breslow has contributed in substantial ways to both strands of extensions for the
Poisson model. Breslow (1984) is devoted to overdispersion in the Poisson model,
also termed extra-Poisson variation. One of the important models in this respect is the
negative-binomial model, where the natural parameter is assumed to follow a gamma
distribution. Details of the model will be given in Sect. 3.2. Lawless (1987) also contributed to this class of extensions. Even better known is Breslow’s work on extending the GLM for repeated measures using normally distributed random effects, the
so-called generalized linear mixed model (GLMM), proposed by Breslow and Clayton (1993) and Wolfinger and O’Connell (1993). Also noteworthy in this context is the
paper by Engel and Keen (1994). Such random effects not only modify the variance
structure, they also induce correlation between the repeated measures on the same
subject.
Conceivably, there are situations where overdispersion and the need for hierarchical modeling occur simultaneously. Of course, overdispersion can be cast within the
context of hierarchical modeling, but it is useful to note that the hierarchical framework encompasses, at the same time, overdispersion and correlation between repeated
observations. We believe, in concordance with Booth et al. (2003) who formulated a
model combining normal and gamma random effects, that formulating an extended
model encompassing both not only responds to a data-analytic need, it also is a fitting
tribute to Professor Norman Breslow’s seminal work in these areas.
The paper is organized as follows. In Sect. 2, a motivating case study, stemming
from patients with epileptic seizures, is introduced. Section 3 presents three main
ingredients: (1) standard generalized linear models, with emphasis on the Poisson model
(Sect. 3.1); (2) classical models for overdispersion (Sect. 3.2); (3) models for repeated
measures with normal random effects, with focus on the Poisson-normal model (Sect.
3.3). In Sect. 4 our combined model, in line with the proposal by Booth et al. (2003),
is proposed. Section 5 considers various estimation methods and then settles for likelihood-based estimation, employing a combination of analytic and numerical integration
techniques for the marginal density. The case study is analyzed in Sect. 6.
2 A clinical trial in epileptic patients
The data considered here are obtained from a randomized, double-blind, parallel group
multi-center study for the comparison of placebo with a new anti-epileptic drug (AED),
in combination with one or two other AEDs. The study is described in full detail
in Faught et al. (1996). The randomization of epilepsy patients took place after a
12-week baseline period that served as a stabilization period for the use of AEDs,
and during which the number of seizures was counted. After that period, 45 patients
were assigned to the placebo group, 44 to the active (new) treatment group. Patients
were then measured weekly. Patients were followed (double-blind) during 16 weeks,
after which they were entered into a long-term open-extension study. Some patients
were followed for up to 27 weeks. The outcome of interest is the number of epileptic
seizures experienced during the last week, i.e., since the last time the outcome was
measured. The key research question is whether or not the additional new treatment
reduces the number of epileptic seizures. As a summary of the data, Fig. 1 shows a
frequency plot, over all visits, over both treatment groups. We observe a very skewed
distribution, with the largest observed value equal to 73 seizures in one week's time. Average
and median evolutions are shown in Fig. 2. The unstable behavior can be explained
Fig. 1 Epilepsy data: frequency plot, over all visits, over both treatment groups
Fig. 2 Epilepsy data: average and median evolutions over time
by the presence of extreme values, but is also the result of the fact that very few
observations are available at some of the time-points, especially past week 20. This
is also reflected in Table 1, which shows the number of measurements at a selection
of time-points. Note the serious drop in number of measurements past the end of the
actual double-blind period, i.e., past week 16. An interesting aspect of using these data
is that they were used as one of the three illustrative examples in Booth et al. (2003).
3 Review of key ingredients
In Sect. 3.1, we will first describe the conventional exponential family and generalized
linear modeling based on it. Section 3.2 is devoted to a brief review of models for overdispersion.
Table 1 Epilepsy data: number of measurements available at a selection of time-points, for both treatment groups separately

Week    Placebo    Treatment    Total
1       45         44           89
5       42         42           84
10      41         40           81
15      40         38           78
16      40         37           77
17      18         17           35
20      2          8            10
27      0          3            3
3.1 Standard generalized linear models
A random variable Y follows an exponential family distribution if the density is of the
form
f(y) ≡ f(y|θ, φ) = exp{ φ^{−1}[yθ − ψ(θ)] + c(y, φ) },   (1)
for a specific set of unknown parameters θ and φ, and for known functions ψ(·) and
c(·, ·). Often, θ and φ are termed ‘natural parameter’ (or ‘canonical parameter’) and
‘scale parameter,’ respectively.
It can easily be shown (Molenberghs and Verbeke 2005) that the first two moments
follow from the function ψ(·) as:
E(Y) = µ = ψ'(θ),   (2)
Var(Y) = σ² = φψ''(θ).   (3)
An important implication is that, in general, the mean and variance are related
through σ² = φψ''[ψ'^{−1}(µ)] = φv(µ), with v(·) the so-called variance function,
describing the mean-variance relationship.
In some situations, e.g., when quasi-likelihood methods are employed (McCullagh
and Nelder 1989, Molenberghs and Verbeke 2005), no full distributional assumptions
are made, but one rather restricts to specifying the first two moments (2) and (3). In
such an instance, the variance function v(µ) can be chosen in accordance with a particular member of the exponential family. If not, then parameters cannot be estimated
using maximum likelihood principles. Instead, a set of estimating equations needs to
be specified, the solution of which is referred to as the quasi-likelihood estimates.
In a regression context, where one wishes to explain variability between outcome
values based on measured covariate values, the model needs to incorporate covariates.
This leads to so-called generalized linear models. Let Y_1, . . . , Y_N be a set of independent outcomes, and let x_1, . . . , x_N represent the corresponding p-dimensional vectors
of covariate values. It is assumed that all Yi have densities f (yi |θi , φ) which belong to
the exponential family, but a different natural parameter θi is allowed per observation.
Specification of the generalized linear model is completed by modeling the means µi
as functions of the covariate values. More specifically, it is assumed that
µ_i = h(θ_i) = h(x_i'β),
for a known function h(·), and with β a vector of p fixed unknown regression coefficients. Usually, h^{−1}(·) is called the link function. In most applications, the so-called natural link function is used, i.e., h(·) = ψ'(·), which is equivalent to assuming
θ_i = x_i'β. Hence, it is assumed that the natural parameter satisfies a linear regression
model.
Turning to the central exponential family member for this paper, let Y be Poisson
distributed with mean λ, denoted by Y ∼ Poi(λ). The density can be written as
f(y) = e^{−λ} λ^y / y! = exp{ y ln λ − λ − ln y! },   (4)
from which it follows that the Poisson distribution belongs to the exponential family,
with natural parameter θ equal to ln λ, mean µ = λ, scale parameter φ = 1, and
variance function v(µ) = µ = λ. The logarithm is the natural link function, leading
to the classical Poisson regression model Y_i ∼ Poi(λ_i), with ln λ_i = x_i'β.
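For concreteness, such a Poisson regression can be fitted in SAS with, for instance, PROC GENMOD. The minimal sketch below borrows the variable names used later in the Appendix C programs (a count outcome nseizw, a treatment indicator trt, and a time covariate time, in a data set test) and is meant purely as an illustration:

proc genmod data=test;
  class trt;
  * log-linear Poisson regression with a treatment-specific time trend;
  model nseizw = trt trt*time / dist=poisson link=log;
run;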
3.2 Overdispersion models
It is clear from (4) that the standard Poisson model forces equality between mean and
variance. However, comparing the sample average with the sample variance might
already reveal this assumption is not in line with a particular set of data. Therefore, a
number of extensions have been proposed (Breslow 1984; Lawless 1987). A straightforward step is to allow the overdispersion parameter φ to differ from 1, so that (3) produces
Var(Y ) = φµ. This is in line with the moment-based approach mentioned in the previous section, although one can still think of such moments as arising from a random
sum of Poisson variables, a point made by Hinde and Demétrio (1998a,b).
An elegant way forward is through a two-stage approach. A commonly encountered instance assumes that Y_i|λ_i ∼ Poi(λ_i) and then further that λ_i is a random
variable with E(λ_i) = µ_i and Var(λ_i) = σ_i². Using iterated expectations, it follows
that
E(Y_i) = E[E(Y_i|λ_i)] = E(λ_i) = µ_i,
Var(Y_i) = E[Var(Y_i|λ_i)] + Var[E(Y_i|λ_i)] = E(λ_i) + Var(λ_i) = µ_i + σ_i².
Note that we have not assumed a particular distributional form for the random
effects λi . A common choice is the gamma distribution, leading to the so-called negative-binomial model. The full model will be discussed in Sect. 5.
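In practice, the resulting negative-binomial model for independent counts can be fitted directly with, for example, PROC GENMOD (dist=negbin); note that GENMOD parameterizes the variance as µ + kµ² through a dispersion parameter k, rather than through (α, β). A minimal sketch, again borrowing the Appendix C variable names:

proc genmod data=test;
  class trt;
  * negative-binomial regression; the dispersion parameter k is estimated by ML;
  model nseizw = trt trt*time / dist=negbin link=log;
run;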
Note that it is easy to extend this model to the case of repeated measurements. We
then assume a hierarchical data structure, where now Y_ij denotes the jth outcome measured for cluster (subject) i, i = 1, . . . , N, j = 1, . . . , n_i, and Y_i is the n_i-dimensional
vector of all measurements available for cluster i.
In this case, the scalar λ_i becomes a vector λ_i = (λ_i1, . . . , λ_i n_i)', with E(λ_i) = µ_i
and Var(λ_i) = Σ_i. Similar logic as in the univariate case produces E(Y_i) = µ_i and
Var(Y_i) = M_i + Σ_i, where M_i is a diagonal matrix with the vector µ_i along the
diagonal. Note that the diagonal structure of Mi reflects the conditional independence
assumption: all dependence between measurements on the same unit stem from the
random effects. A versatile class of models results. For example, assuming the components of λi are independent, a pure overdispersion model results, without correlation
between the repeated measures. On the other hand, assuming λ_ij = λ_i, i.e., all components are equal, then Var(Y_i) = M_i + σ_i² J_{n_i}, where J_{n_i} is an n_i × n_i dimensional
matrix of ones. Such a structure can be seen as a Poisson version of compound symmetry. Of course, one can also allow for general correlation structures between the
components of λi .
Alternatively, this repeated version of the overdispersion model can be combined
with normal random effects in the linear predictor. Models of this type have been
proposed by Thall and Vail (1990) and Dean (1991). Let us first review this well-known
instance of a generalized linear mixed model (Sect. 3.3), and then move on to our
proposed combination of both (Sect. 4).
3.3 Models with normal random effects: the Poisson-normal model
The generalized linear mixed model is the most frequently used random-effects model
in the context of non-Gaussian repeated measurements. Not only is it a rather straightforward extension of the generalized linear model for univariate data to the context
of clustered measurements, it also bears resemblance to the linear mixed model for
continuous, normally distributed, repeated measures.
In full generality, one assumes that, conditionally on q-dimensional random effects
b_i, often assumed to be drawn independently from the N(0, D) distribution, the outcomes Y_ij are
independent with exponential-family densities of the form
f_i(y_ij|b_i, β, φ) = exp{ φ^{−1}[y_ij θ_ij − ψ(θ_ij)] + c(y_ij, φ) },   (5)
with η(µ_ij) = η[E(Y_ij|b_i)] = x_ij'β + z_ij'b_i for a known link function η(·), with x_ij
and z_ij p-dimensional and q-dimensional vectors of known covariate values, with β
a p-dimensional vector of unknown fixed regression coefficients, and with φ a scale
parameter. Finally, let f (bi |D) be the density of the N (0, D) distribution for the
random effects bi .
For the specific case of Poisson data, the assumptions are
Y_ij ∼ Poi(λ_ij),   (6)
ln(λ_ij) = x_ij'β + z_ij'b_i,   (7)
b_i ∼ N(0, D).   (8)
When indeed normality for the random effects is assumed, the mean vector and
variance-covariance matrix of Yi can be derived relatively easily. Details of the calculations are given in Appendix A.
The mean vector µ_i = E(Y_i) has components:
µ_ij = exp( x_ij'β + (1/2) z_ij'D z_ij ),   (9)
with the variance-covariance matrix
Var(Y_i) = M_i + M_i ( e^{Z_i D Z_i'} − J_{n_i} ) M_i.   (10)
In the special case of univariate data with a single normal random effect bi ∼
N(0, d), expressions (9) and (10) simplify to:
µ_i = exp( x_i'β + d/2 ),   (11)
Var(Y_i) = µ_i + µ_i² (e^d − 1),   (12)
the latter being of the well-known quadratic form.
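For intuition, a brief sketch of how (11) and (12) arise in this univariate case (the general calculation is deferred to Appendix A): conditionally on b_i, Y_i is Poisson with mean λ_i = e^{x_i'β + b_i}, so that λ_i is lognormal with E(e^{b_i}) = e^{d/2} and E(e^{2b_i}) = e^{2d}. Hence
E(Y_i) = E(λ_i) = e^{x_i'β} e^{d/2} = µ_i,
Var(Y_i) = E(λ_i) + Var(λ_i) = µ_i + e^{2 x_i'β} ( e^{2d} − e^{d} ) = µ_i + µ_i² (e^d − 1).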
4 Models combining overdispersion with normal random effects
Combining ideas from the overdispersion models in Sect. 3.2 and the Poisson-normal
model of Sect. 3.3, we can specify, as did Booth et al. (2003), who were the first in this
respect, a model for repeated Poisson data with overdispersion, by extending (6)–(8) to
Y_ij ∼ Poi(λ_ij),   (13)
λ_ij = θ_ij exp( x_ij'β + z_ij'b_i ),   (14)
b_i ∼ N(0, D),   (15)
E(θ_i) = E[(θ_i1, . . . , θ_i n_i)'] = Φ_i,   (16)
Var(θ_i) = Σ_i.   (17)
This model has the same structure as the one by Booth et al. (2003). In addition,
we will provide expressions for the mean vector, the variance-covariance matrix and,
in Sect. 5, for the joint marginal probabilities.
The mean vector µ_i = E(Y_i) now has components:
µ_ij = φ_ij exp( x_ij'β + (1/2) z_ij'D z_ij ),   (18)
with the variance-covariance matrix
Var(Y_i) = M_i + M_i ( P_i − J_{n_i} ) M_i,   (19)
where M_i is defined as before and the ( j, k)th element of P_i equals
p_i,jk = ((σ_i,jk + φ_ij φ_ik)/(φ_ij φ_ik)) · exp( (1/2) z_ij'D z_ik ) · exp( (1/2) z_ik'D z_ij ).   (20)
Details of the calculations are given in Appendix A.
In the special case of univariate data with a single random θ_i with mean φ_i and
variance σ_i², and a single normal random effect b_i ∼ N(0, d), expressions (18) and (19)
simplify to:
µ_i = φ_i exp( x_i'β + d/2 ),   (21)
Var(Y_i) = µ_i + µ_i² ( e^d − 1 + (σ_i²/φ_i²) e^d ).   (22)
It is instructive to see that (22) is still quadratic in µ_i, exactly like (12), but now
with a coefficient depending on both random effects.
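Two limiting cases of (21)–(22) make the roles of the two random effects concrete. Setting σ_i² = 0 (no gamma overdispersion, θ_i ≡ φ_i) recovers the Poisson-normal variance (12). Setting d = 0 (no normal random effect) yields
Var(Y_i) = µ_i + µ_i² σ_i²/φ_i²,  with µ_i = φ_i e^{x_i'β},
the familiar negative-binomial-type quadratic mean-variance relationship.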
Note that there are other examples of models including non-normal random effects.
For example, Lee and Nelder (1996, 2001, 2003), in a series of influential papers, proposed hierarchical generalized linear models, where random effects can be non-normal. In addition, these authors proposed a computationally feasible estimating method,
both in numerical as well as in statistical terms.
5 Estimation
We first consider the classical negative-binomial model, then consider the conventional
Poisson-normal model, whereafter the combined model is tackled.
5.1 The negative-binomial model
In the classical univariate overdispersion model, a common choice for the distribution
of the parameter is (dropping the index i) λ ∼ Gamma(α, β), with density
f(λ) = (1/(β^α Γ(α))) λ^{α−1} e^{−λ/β},
where Γ(·) is the gamma function. Straightforward algebra produces
P(Y = y) = ∫_0^{+∞} ( λ^y e^{−λ} / y! ) · (1/(β^α Γ(α))) λ^{α−1} e^{−λ/β} dλ
         = \binom{α + y − 1}{α − 1} ( β/(β + 1) )^y ( 1/(β + 1) )^α.
The corresponding mean and variance are then given by αβ and αβ(β + 1), respectively.
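These moments follow directly by iterated expectations, using E(λ) = αβ and Var(λ) = αβ² for the Gamma(α, β) density above:
E(Y) = E[E(Y|λ)] = E(λ) = αβ,
Var(Y) = E[Var(Y|λ)] + Var[E(Y|λ)] = E(λ) + Var(λ) = αβ + αβ² = αβ(β + 1).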
5.2 The Poisson-normal and combined models
Random-effects models can be fitted by maximization of the marginal likelihood,
obtained by integrating out the random effects from conditional densities of the form
(5), in particular from their Poisson-normal form as specified by (6)–(8). The likelihood contribution of subject i generally is
f_i(y_i|β, D, φ) = ∫ ∏_{j=1}^{n_i} f_ij(y_ij|b_i, β, φ) f(b_i|D) db_i,   (23)
from which the likelihood for β, D, and φ is derived as
L(β, D, φ) = ∏_{i=1}^{N} f_i(y_i|β, D, φ)
           = ∏_{i=1}^{N} ∫ ∏_{j=1}^{n_i} f_ij(y_ij|b_i, β, φ) f(b_i|D) db_i.   (24)
The key problem in maximizing (24) is the presence of N integrals over the
q-dimensional random effects bi . Generally, no closed-form solution exists, in which
case one resorts to such methods as numerical integration, series expansion methods,
including penalized quasi-likelihood and marginal quasi-likelihood, Laplace approximation, etc. For an overview, see Molenberghs and Verbeke (2005). Some of these
methods have been implemented in standard software, including the SAS procedures
NLMIXED for numerical integration and GLIMMIX for expansion methods. Several
of the series expansion methods tend to exhibit bias, an issue taken up in Breslow and
Lin (1995).
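As an illustration of these software options, a Poisson-normal model with a random intercept could, for instance, be fitted with adaptive quadrature in the SAS procedure GLIMMIX. The sketch below borrows the variable names of the Appendix C programs (nseizw, trt, time, id) and mirrors the type of random-intercept model used later for the epilepsy data; it is indicative rather than definitive:

proc glimmix data=test method=quad;
  class id trt;
  * separate intercept and slope per treatment arm, Poisson conditional distribution;
  model nseizw = trt trt*time / noint dist=poisson link=log solution;
  * normally distributed random intercept per subject;
  random intercept / subject=id;
run;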
In some special cases, these integrals can be worked out analytically. The best
known example is the linear mixed effects model (Verbeke and Molenberghs 2000).
Fortunately, also the Poisson-normal model lends itself to such analytic calculations.
We already produced mean and variance expressions (9) and (10), but also the full
joint probabilities can be derived. Now, the same is true for the combined model, for
which also mean and variance expressions (18) and (19) have been derived.
As shown in Appendix B, the joint probability of Y_i takes the form:
P(Y_i = y_i) = ∑_t [ ∏_{j=1}^{n_i} \binom{α_j + y_ij + t_j − 1}{α_j − 1} \binom{y_ij + t_j}{y_ij} (−1)^{t_j} β_j^{y_ij + t_j} ]
   × exp( ∑_{j=1}^{n_i} (y_ij + t_j) x_ij'β )
   × exp{ (1/2) [ ∑_{j=1}^{n_i} (y_ij + t_j) z_ij ]' D [ ∑_{j=1}^{n_i} (y_ij + t_j) z_ij ] }.   (25)
In the above equation, the vector-valued index t = (t_1, . . . , t_{n_i})' ranges over all non-negative integer vectors.
When there are no overdispersion random effects, (25) simplifies to:
P(Y_i = y_i) = (1/∏_{j=1}^{n_i} y_ij!) ∑_t ( (−1)^{∑_{j=1}^{n_i} t_j} / ∏_{j=1}^{n_i} t_j! ) exp( ∑_{j=1}^{n_i} (y_ij + t_j) x_ij'β )
   × exp{ (1/2) [ ∑_{j=1}^{n_i} (y_ij + t_j) z_ij ]' D [ ∑_{j=1}^{n_i} (y_ij + t_j) z_ij ] }.   (26)
It is instructive to consider these expressions for univariate data. In this case, (25) simplifies to
P(Y_i = y_i) = ∑_{t=0}^{+∞} \binom{α + y_i + t − 1}{α − 1} \binom{y_i + t}{y_i} (−1)^t β^{y_i + t} exp[ x_i'β (y_i + t) + (1/2) d (y_i + t)² ],   (27)
and (26) becomes
P(Y_i = y_i) = (1/y_i!) ∑_{t=0}^{+∞} ( (−1)^t / t! ) exp[ x_i'β (y_i + t) + (1/2) d (y_i + t)² ].   (28)
It is easy to show that, when no normal random effect is present, (27) reduces to the
standard negative-binomial model, while (28) becomes the standard Poisson model.
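To make the second reduction explicit, set d = 0 in (28); the series then collapses to an exponential series:
P(Y_i = y_i) = (1/y_i!) e^{x_i'β y_i} ∑_{t=0}^{+∞} (−e^{x_i'β})^t / t! = ( λ_i^{y_i} / y_i! ) e^{−λ_i},  with λ_i = e^{x_i'β},
which is the standard Poisson probability.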
While such expressions can usefully be employed to implement maximum likelihood
estimation, with numerical accuracy governed by the number of terms included in the
series, one can also proceed by what we will term partial marginalization. By this we
refer to integrating (13)–(17) over the gamma random effects only, leaving the normal
random effects untouched. The corresponding probability is:
P(Y_ij = y_ij|b_i) = \binom{α_j + y_ij − 1}{α_j − 1} ( β_j/(1 + κ_ij β_j) )^{y_ij} ( 1/(1 + κ_ij β_j) )^{α_j} κ_ij^{y_ij},   (29)
where κ_ij = exp[x_ij'β + z_ij'b_i]. Note that, with this approach, we assume that the gamma random effects are independent within a subject. This is reasonable, given that the correlation between repeated measures is induced by the normal random effects.
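Expression (29) is simply the negative-binomial calculation of Sect. 5.1 applied conditionally on b_i: with Y_ij|θ_ij, b_i ∼ Poi(θ_ij κ_ij) and θ_ij ∼ Gamma(α_j, β_j),
P(Y_ij = y_ij|b_i) = ∫_0^{+∞} ( (θ_ij κ_ij)^{y_ij} e^{−θ_ij κ_ij} / y_ij! ) (1/(β_j^{α_j} Γ(α_j))) θ_ij^{α_j − 1} e^{−θ_ij/β_j} dθ_ij,
and carrying out the gamma integral yields (29); note that the factors (β_j/(1 + κ_ij β_j))^{y_ij} κ_ij^{y_ij} in (29) can equivalently be written as (κ_ij β_j/(1 + κ_ij β_j))^{y_ij}.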
Now, it is easy to obtain the fully marginalized probability by numerically integrating the normal random effects out of (29), using a tool such as the SAS procedure
NLMIXED, which allows for normal random effects in arbitrary, user-specified models.
It is important to realize that not all parameters may be simultaneously identifiable.
For example, the gamma-distribution parameters α and β are not simultaneously identifiable when the linear-predictor part is also present, since there is aliasing with the
intercept term. Therefore, one can set, for example, β equal to a constant, removing
the identifiability problem. It is then clear that α, in the univariate case, or the set of
α j in the repeated-measures case, describe the additional overdispersion, in addition
to what stems from the normal random effect(s).
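To see the aliasing concretely, consider the univariate mean (21) and split the linear predictor, purely for illustration, into an intercept and the remaining terms, x_i'β = β_0 + x̃_i'β̃. With φ_i = αβ,
µ_i = φ_i e^{β_0 + x̃_i'β̃ + d/2} = (αβ e^{β_0}) e^{x̃_i'β̃ + d/2},
so that only the product αβ e^{β_0} is identified; fixing β (for instance, β = 20 in the Appendix C program) removes the aliasing.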
Note that the estimation options considered here are not the only ones. One might,
for example, opt for fully Bayesian inferences. Alternatively, the EM algorithm can be
used, in line with Booth et al. (2003). These authors also considered non-parametric
maximum likelihood, in the spirit of Aitkin (1999) and Alfò and Aitkin (2000).
6 Analysis of the epilepsy data
We will analyze the epilepsy data, introduced in Sect. 2. Note that the data were analyzed before in Molenberghs and Verbeke (2005, Ch. 19), using generalized estimating
equations (Liang and Zeger 1986) and the Poisson-normal model. These authors used
a slightly different parameterization.
Let Yi j represent the number of epileptic seizures patient i experiences during week
j of the follow-up period. Also, let ti j be the time-point at which Yi j has been measured, ti j = 1, 2, . . . until at most 27. Let us consider the combined model (13)–(17),
with specific choices
ln(λ_ij) = (β_00 + b_i) + β_01 t_ij  if placebo,
ln(λ_ij) = (β_10 + b_i) + β_11 t_ij  if treated,   (30)
where the random intercept bi is assumed to be zero-mean normally distributed with
variance d. We consider the special cases of (1) the ordinary Poisson model, (2) the negative-binomial model, and (3) the Poisson-normal model, together with (4) the combined
model. The SAS implementation is briefly discussed in Appendix C. Estimates (standard errors) are presented in Table 2. Clearly, both the negative-binomial model and
the Poisson-normal model are important improvements, in terms of the likelihood, relative to the ordinary Poisson model. This should come as no surprise since the latter
Table 2 Epilepsy study

Effect                         Parameter       Poisson              Negative-binomial
                                               Estimate (s.e.)      Estimate (s.e.)
Intercept placebo              β_00            1.2662 (0.0424)      −1.0966 (0.1219)
Slope placebo                  β_01            −0.0134 (0.0043)     −0.0126 (0.0111)
Intercept treatment            β_10            1.4531 (0.0383)      0.8809 (0.1196)
Slope treatment                β_11            −0.0328 (0.0038)     −0.0352 (0.0101)
Difference in slopes           β_11 − β_01     −0.0195 (0.0058)     −0.0227 (0.0150)
Ratio of slopes                β_11/β_01       2.4576 (0.8481)      2.8084 (2.6070)
Negative-binomial parameter    α               –                    0.4773 (0.5775)
Variance of random intercepts  d               –                    –
−2 log-likelihood                              11,590→ −1492        −6755

Effect                         Parameter       Poisson-normal       Combined
                                               Estimate (s.e.)      Estimate (s.e.)
Intercept placebo              β_00            0.8179 (0.1677)      −2.9862 (0.1965)
Slope placebo                  β_01            −0.0143 (0.0044)     −0.0248 (0.0077)
Intercept treatment            β_10            0.6475 (0.1701)      −3.2420 (0.1982)
Slope treatment                β_11            −0.0120 (0.0043)     −0.0118 (0.0075)
Difference in slopes           β_11 − β_01     0.0023 (0.0062)      0.0130 (0.0107)
Ratio of slopes                β_11/β_01       0.8398 (0.3979)      0.4751 (0.3345)
Negative-binomial parameter    α               –                    2.4640 (0.2113)
Variance of random intercepts  d               1.1568 (0.1844)      1.1289 (0.1850)
−2 log-likelihood                              6272= −6810(g.l.)    −7664

Parameter estimates and standard errors for the regression coefficients in (1) the Poisson model, (2) the negative-binomial model, (3) the Poisson-normal model, and (4) the combined model. Estimation was done by maximum likelihood using numerical integration over the normal random effect, if present
unrealistically assumes there is neither overdispersion nor correlation within the outcomes, while clearly both are present. In addition, when considering the combined
model, there is a very strong improvement in fit when gamma and normal random
effects are simultaneously allowed for. This strongly affects the point and precision
estimates of such key parameters as the slope difference and the slope ratio. There is
also an impact on hypothesis testing. The Poisson model leads to unequivocal significance for both the difference (p = 0.0008) and the ratio (p = 0.0038), whereas for the
Poisson-normal model this is not the case for the slope difference (p = 0.7115), while some significance is maintained for the ratio (p = 0.0376). Since the Poisson-normal is commonly
used, it is likely that in practice one would decide in favor of a treatment effect when
considering the slope ratio. This is no longer true with the negative-binomial model,
where the p-values change to p = 0.1310 and p = 0.2815, respectively. Of course,
one must not forget that, while the negative-binomial model accommodates overdispersion, the θi j random effects are assumed independent, implying independence
between repeated measures. Again, this is not realistic and therefore the combined
model is a more viable candidate, corroborated further by the aforementioned likelihood comparison. This model produces non-significant p-values of p = 0.2260 and
p = 0.1591, respectively.
Thus, in conclusion, whereas the conventionally used and broadly implemented
Poisson-normal model would suggest a significant effect of treatment, our combined
model issues a message of caution, because there is no evidence whatsoever regarding
a treatment difference.
Molenberghs and Verbeke (2005, Ch. 19), considered a Poisson-normal model with
random intercepts as well as random slopes in time. It is interesting to note that, when
allowing for such an extension in our models, the random slopes improve the fit of the
Poisson-normal model, but not of the combined one (details not shown).
Recall that the data were analyzed, too, by Booth et al. (2003). While we considered
four different models, these authors focused on the Poisson-normal and combined
implementations. There are further differences in actual fixed-effects and randomeffects models considered, as well as in us further considering inferences for differences and ratios.
7 Concluding remarks
In this paper, we have argued that, rather than choosing between normal and gamma
random effects, both can be usefully integrated into a single model, which we have
termed the combined model. The flexibility of the model stems from using the normal random effects to induce association between repeated Poisson data, whereas the
gamma distributed random factor in the log-linear predictor fine tunes the overdispersion. The model produces the standard negative-binomial and Poisson-normal models
as special cases, both when there are repeated measures as well as with univariate
outcomes. Of course, the basic Poisson model is another special case, jointly of all
aforementioned special cases.
We derived explicit expressions not only for the means, variances, and covariances of
the combined model and all its submodels, but also for the joint marginal probability of the
outcome vector, for which closed-form solutions have been obtained.
In terms of estimation, we have focused on maximum likelihood estimation. This
can be done by integrating over the random effects, either fully analytically, using the
explicit expressions derived, or by combining analytic and numeric techniques. The
latter has been implemented in the SAS procedure NLMIXED and applied to a case
study in patients with epileptic seizures. The data analysis has shown that not only
does the combined method improve the fit over its special cases, there is also an impact
on the conclusions about key scientific parameters of the model.
We have focused on counts, for which explicit expressions can be derived. It is
perfectly possible to consider other types of outcomes. A general framework, encompassing the binary and Poisson types as special cases, is currently under investigation.
Acknowledgements Financial support from the IAP research network #P6/03 of the Belgian Government
(Belgian Science Policy) is gratefully acknowledged. The third author is supported by CNPq, a Brazilian
Science Funding Agency.
Appendix
A Mean and variance for the combined model
Let us derive the mean and variance expressions (18) and (19). The mean of a component can easily be derived as follows:
E(Y_ij) = E{E[E(Y_ij|θ_ij, b_i)]}
        = E{E[θ_ij exp(x_ij'β + z_ij'b_i)]}
        = E[φ_ij exp(x_ij'β + z_ij'b_i)]
        = φ_ij exp(x_ij'β) E[exp(z_ij'b_i)]
        = φ_ij exp( x_ij'β + (1/2) z_ij'D z_ij ),
where the last step follows from the following consideration. Note that the function
f = f_ij = z_ij'b_i is normally distributed with variance d = z_ij'D z_ij. Hence, the
expectation takes the form
E(e^f) = (1/√(2πd)) ∫ e^f e^{−f²/(2d)} df   (31)
       = (1/√(2πd)) ∫ e^{−(f − d)²/(2d)} e^{d/2} df
       = e^{d/2}.   (32)
Turning to the variance, a general expression is:
Var(Y_ij) = E{E[Var(Y_ij|θ_ij, b_i)]} + E{Var[E(Y_ij|θ_ij, b_i)]} + Var{E[E(Y_ij|θ_ij, b_i)]}
          = T_1 + T_2 + T_3.   (33)
We will derive expressions for each of the three terms in turn. Write θ_ij exp(x_ij'β + z_ij'b_i) = θ_ij κ_ij for brevity. Then it is easy to see, using that mean and variance are the
same in the Poisson model, and equal to θ_ij κ_ij in our notation:
T_1 = E(θ_ij) E(κ_ij),   (34)
T_2 = E(θ_ij²) E(κ_ij²) − E(θ_ij)² E(κ_ij²),   (35)
T_3 = E(θ_ij)² [ E(κ_ij²) − E(κ_ij)² ],   (36)
where expectations are taken over the appropriate random effect. Substituting (34)–(36) into (33) and simplifying produces:
Var(Y_ij) = E(θ_ij) E(κ_ij) + E(θ_ij²) E(κ_ij²) − E(θ_ij)² E(κ_ij)².   (37)
Similar logic, applied to the covariance, produces, for j ≠ k:
Cov(Y_ij, Y_ik) = E(κ_ij κ_ik) Cov(θ_ij, θ_ik) + Cov(κ_ij, κ_ik) E(θ_ij) E(θ_ik)   (38)
                = E(κ_ij κ_ik) E(θ_ij θ_ik) − E(κ_ij) E(κ_ik) E(θ_ij) E(θ_ik).   (39)
Clearly, the means, variances, and covariances of the θ_ij effects are
E(θ_ij) = φ_ij,  Var(θ_ij) = σ_i,jj,  Cov(θ_ij, θ_ik) = σ_i,jk,   (40)
respectively. For the κ_ij components, (31) produces the mean form
E(κ_ij) = e^{x_ij'β + (1/2) z_ij'D z_ij}.   (41)
Computations similar to (31) lead to:
Var(κ_ij) = e^{2 x_ij'β + z_ij'D z_ij} ( e^{z_ij'D z_ij} − 1 ).   (42)
The corresponding covariance form is:
Cov(κ_ij, κ_ik) = e^{x_ij'β + x_ik'β + (1/2)(z_ij'D z_ij + z_ik'D z_ik)} ( e^{z_ij'D z_ik} − 1 ).   (43)
Plugging (40)–(43) into (37) and (39), and re-organizing terms, leads to (19). Clearly,
these expressions also produce their simplified counterparts for the special cases,
including the negative-binomial model, the Poisson-normal model, and the Poisson
model, not only for the repeated-measures case, but clearly also for the further simplifications produced by considering univariate data only.
B Full marginal density for the combined model
Although the Poisson-normal case follows from the combined model by taking limits over the gamma distribution only, it is easier not to follow this route but rather to
consider the Poisson-normal model separately. To derive (26), first write the terms
of the linear predictor as x_ij'β = γ_ij and z_ij'b_i = c_ij, for convenience. The joint
probability follows from:
P(Y_i = y_i) = ∫ ∏_{j=1}^{n_i} (1/y_ij!) e^{(γ_ij + c_ij) y_ij} e^{−e^{γ_ij + c_ij}} · (1/(|D|^{1/2} (2π)^{q/2})) e^{−(1/2) b_i'D^{−1} b_i} db_i.   (44)
We will use the following series expansion of the double exponential factor:
e^{−e^{γ_ij + c_ij}} = ∑_{t=0}^{+∞} ( (−1)^t e^{t γ_ij} / t! ) e^{t c_ij}.
Plugging this expression into (44) and re-ordering terms produces:
P(Y_i = y_i) = (1/∏_{j=1}^{n_i} y_ij!) ∑_{t=(t_1,...,t_{n_i})} (−1)^{∑_{j=1}^{n_i} t_j} (1/∏_{j=1}^{n_i} t_j!) e^{∑_{j=1}^{n_i} (y_ij + t_j) γ_ij}
   × ∫ e^{∑_{j=1}^{n_i} (y_ij + t_j) c_ij} (1/(|D|^{1/2} (2π)^{q/2})) e^{−(1/2) b_i'D^{−1} b_i} db_i.   (45)
The integral in (45) can be reorganized in a fashion similar to (31), and hence it reduces
to exp( (1/2) w_i'D w_i ), with w_i = ∑_{j=1}^{n_i} (y_ij + t_j) z_ij. Substituting this back in (45),
and re-ordering factors, leads to (26).
Now turning to the combined model, the parameters λ_ij change to
λ_ij = θ_ij e^{x_ij'β + z_ij'b_i} = θ_ij e^{γ_ij + c_ij} = e^{γ̃_ij + c_ij},
with γ_ij and c_ij the shorthand notation introduced above, and γ̃_ij absorbing the gamma
random effect. With this notation, P(Y_i = y_i|θ_i) takes the form (26). We then merely
have to integrate further over the gamma random effects, assuming that the θ_ij are independent, gamma-distributed random variables θ_ij ∼ Gamma(α_j, β_j). Re-expressing
γ̃_ij in its constituents, the integral factor is:
∫ (1/(β_j^{α_j} Γ(α_j))) θ_ij^{α_j − 1} e^{−θ_ij/β_j} θ_ij^{y_ij + t_j} dθ_ij = (1/(β_j^{α_j} Γ(α_j))) β_j^{α_j + y_ij + t_j} Γ(α_j + y_ij + t_j).
Combining this expression with the other factors, re-ordering terms, and using properties of the gamma function, produces (25).
Obviously, the above computations simplify without problem to the special cases
of the ordinary Poisson model and the negative-binomial models, respectively. The
Poisson-normal and combined model expressions for univariate data follow without
any problem, too.
C SAS implementation
A SAS program, using the procedure NLMIXED, for the standard Poisson-normal
model is as follows:
proc nlmixed data=test qpoints=50;
title 'Poisson-normal Model';
parms int0=0.5 slope0=-0.1 int1=1 slope1=0.1 sigma=1;
if (trt = 0) then eta = int0 + b + slope0*time;
else if (trt = 1) then eta = int1 + b + slope1*time;
lambda = exp(eta);
model nseizw ~ poisson(lambda);
random b ~ normal(0,sigma**2) subject = id;
estimate 'difference in slope' slope1-slope0;
estimate 'ratio of slopes' slope1/slope0;
estimate 'variance RIs' sigma**2;
run;
The special case of the Poisson model simply follows from removing the RANDOM
statement, and inclusion of random slopes next to random intercepts implies the following straightforward change to four of the lines of code:
. . .
parms int0=0.5 slope0=-0.1 int1=1 slope1=0.1 d11=1 rho=0 d22=0.1;
if (trt = 0) then eta = int0 + b1 + (slope0+b2)*time;
else if (trt = 1) then eta = int1 + b1 + (slope1+b2)*time;
. . .
random b1 b2 ~ normal([0,0],[d11,rho*sqrt(d11)*sqrt(d22),d22]) subject = id;
. . .
These programs make use of the built-in Poisson likelihood. In view of the combined
model, it is instructive to re-implement them using the so-called general likelihood,
i.e., user-defined likelihood, feature, for which the following lines replace their counterpart in the earlier program:
. . .
loglik=-lambda+nseizw*eta;
model nseizw ~ general(loglik);
. . .
Now, as noted by Booth et al. (2003), who construct a program similar in spirit, the
general likelihood feature is ideally suited to implement the combined model, using
(29), the partially integrated version of the model:
proc nlmixed data=test qpoints=50;
title 'Poisson-Combined Model';
parms int0=0.5 slope0=-0.1 int1=1 slope1=0.1 sigma=1 alpha=5;
if (trt = 0) then eta = int0 + b + slope0*time;
else if (trt = 1) then eta = int1 + b + slope1*time;
lambda = exp(eta);
beta=20;
loglik=lgamma(alpha+y)-lgamma(alpha)+y*log(beta)
       -(y+alpha)*log(1+beta*lambda)+y*eta;
model y ~ general(loglik);
random b ~ normal(0,sigma**2) subject = id;
estimate 'difference in slope' slope1-slope0;
estimate 'ratio of slopes' slope1/slope0;
estimate 'variance RIs' sigma**2;
run;
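Since NLMIXED accepts ESTIMATE statements for arbitrary functions of the parameters, fitted marginal means based on (21) can be requested as well. A sketch, assuming the parameterization of the program above (φ = αβ with β fixed at 20, and d = sigma**2), for the placebo arm at week 1:

* marginal mean for placebo at week 1, following (21) with phi = alpha*beta;
estimate 'marginal mean, placebo, week 1'
         alpha*20*exp(int0 + slope0*1 + 0.5*sigma**2);

Analogous statements would give fitted means at other weeks or in the treated arm.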
The implementation of the combined model is relatively easy and certainly of the
same order of programming complexity as the classical Poisson-normal model.
When comparing our program to the one by Booth et al. (2003), note that we employ a somewhat different programming logic, and that the program is supplemented with additional ESTIMATE statements for key inferential parameters.
References
1. Agresti A (2002) Categorical data analysis, 2nd edn. John Wiley & Sons, New York
2. Aitkin M (1999) A general maximum likelihood analysis of variance components in generalized linear
models. Biometrics 55:117–128
3. Alfò M, Aitkin M (2000) Random coefficient models for binary longitudinal responses with attrition.
Stat Comput 10:279–288
4. Booth JG, Casella G, Friedl H, Hobert JP (2003) Negative binomial loglinear mixed models. Stat
Model 3:179–181
5. Breslow NE (1984) Extra-Poisson variation in log-linear models. Appl Stat 33:38–44
6. Breslow NE, Clayton DG (1993) Approximate inference in generalized linear mixed models. J Am
Stat Assoc 88:9–25
7. Breslow NE, Lin X (1995) Bias correction in generalized linear mixed models with a single component of dispersion. Biometrika 82:81–91
8. Dean CB (1991) Estimating equations for mixed-Poisson models. In: Godambe VP (ed) Estimating
functions. Oxford University Press, Oxford
9. Engel B, Keen A (1994) A simple approach for the analysis of generalized linear mixed models.
Statistica Neerlandica 48:1–22
10. Faught E, Wilder BJ, Ramsay RE, Reife RA, Kramer LD, Pledger GW, Karim RM (1996) Topiramate
placebo-controlled dose-ranging trial in refractory partial epilepsy using 200-, 400-, and 600-mg daily
dosages. Neurology 46:1684–1690
11. Hinde J, Demétrio CGB (1998) Overdispersion: models and estimation. Comput Stat Data Anal
27:151–170
12. Hinde J, Demétrio CGB (1998) Overdispersion: models and estimation. XIII Sinape, São Paulo
13. Lawless J (1987) Negative binomial and mixed Poisson regression. Can J Stat 15:209–225
14. Lee Y, Nelder JA (1996) Hierarchical generalized linear models (with discussion). J R Stat Soc Ser
B 58:619–678
15. Lee Y, Nelder JA (2001) Hierarchical generalized linear models: a synthesis of generalized linear
models, random-effect models and structured dispersions. Biometrika 88:987–1006
16. Lee Y, Nelder JA (2003) Extended-REML estimators. J Appl Stat 30:845–856
17. Liang K-Y, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika
73:13–22
18. McCullagh P, Nelder JA (1989) Generalized linear models. Chapman & Hall, London
19. Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York
20. Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc Ser A 135:370–384
21. Thall PF, Vail SC (1990) Some covariance models for longitudinal count data with overdispersion.
Biometrics 46:657–671
22. Verbeke G, Molenberghs G (2000) Linear mixed models for longitudinal data. Springer-Verlag,
New York
23. Wolfinger R, O’Connell M (1993) Generalized linear mixed models: a pseudo-likelihood approach.
J Stat Comput Simul 48:233–243