arXiv:math/0505600v1 [math.ST] 27 May 2005
The Annals of Statistics
2005, Vol. 33, No. 2, 522–541
DOI: 10.1214/009053604000001255
© Institute of Mathematical Statistics, 2005
ASYMPTOTIC RESULTS WITH GENERALIZED ESTIMATING
EQUATIONS FOR LONGITUDINAL DATA
By R. M. Balan1 and I. Schiopu-Kratina
University of Ottawa and Statistics Canada
We consider the marginal models of Liang and Zeger [Biometrika
73 (1986) 13–22] for the analysis of longitudinal data and we develop
a theory of statistical inference for such models. We prove the existence, weak consistency and asymptotic normality of a sequence of
estimators defined as roots of pseudo-likelihood equations.
1. Introduction. Longitudinal data sets arise in biostatistics and lifetime testing problems when the responses of the individuals are recorded
repeatedly over a period of time. By controlling for individual differences,
longitudinal studies are well-suited to measure change over time. On the
other hand, they require the use of special statistical techniques because the
responses on the same individual tend to be strongly correlated. In a seminal
paper Liang and Zeger (1986) proposed the use of generalized linear models
(GLM) for the analysis of longitudinal data.
In a cross-sectional study, a GLM is used when there are reasons to believe that each response y_i depends on an observable vector x_i of covariates [see the monograph of McCullagh and Nelder (1989)]. Typically this dependence is specified by an unknown parameter β and a link function µ via the relationship µ_i(β) = µ(x_i^T β), where µ_i(β) is the mean of y_i. For one-dimensional observations, the maximum quasi-likelihood estimator β̂_n is defined as the solution of the equation

(1)    Σ_{i=1}^n µ̇_i(β) v_i(β)^{-1} (y_i − µ_i(β)) = 0,

where µ̇_i is the derivative of µ_i and v_i(β) is the variance of y_i. Note that this equation simplifies considerably if we assume that v_i(β) = φ_i µ̇(x_i^T β), with a
Received May 2003; revised March 2004.
Supported by the Natural Sciences and Engineering Research Council of Canada.
AMS 2000 subject classifications. Primary 62F12; secondary 62J12.
Key words and phrases. Generalized estimating equations, generalized linear model,
consistency, asymptotic normality.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2005, Vol. 33, No. 2, 522–541. This reprint differs from the original in pagination
and typographic detail.
nuisance scale parameter φ_i. In fact (1) is a genuine likelihood equation if the y_i's are independent with densities c(y_i, φ_i) exp{φ_i^{-1}[(x_i^T β)y_i − b(x_i^T β)]}.
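As an illustration (ours, not part of the original paper), equation (1) can be solved by Newton–Raphson. For the canonical log link with φ_i = 1, the factor µ̇_i(β) v_i(β)^{-1} reduces to x_i, so the score becomes Σ_i x_i (y_i − exp(x_i^T β)). A minimal sketch for simulated Poisson data:

```python
import numpy as np

def fit_quasi_likelihood(x, y, n_iter=25):
    """Solve equation (1) for the canonical log link mu(t) = exp(t):
    with v_i(b) = mu_dot(x_i^T b), the score reduces to
    sum_i x_i (y_i - exp(x_i^T b)), solved here by Newton-Raphson."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        mu = np.exp(x @ beta)
        score = x.T @ (y - mu)            # quasi-score at current beta
        fisher = x.T @ (mu[:, None] * x)  # negative Jacobian of the score
        beta += np.linalg.solve(fisher, score)
    return beta

rng = np.random.default_rng(0)
x = np.column_stack([np.ones(500), rng.normal(size=500)])
beta0 = np.array([0.5, -0.3])
y = rng.poisson(np.exp(x @ beta0)).astype(float)
beta_hat = fit_quasi_likelihood(x, y)
```

The fitted β̂_n makes the quasi-score vanish numerically and should be close to β_0 for moderate sample sizes.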
In a longitudinal study, the components of an observation yi = (yi1 , . . . , yim )T
represent repeated measurements at times 1, . . . , m for subject i. The approach proposed by Liang and Zeger is to impose the usual assumptions of
a GLM only for the marginal scalar observations yij and the p-dimensional
design vectors xij . If the correlation matrices within individuals are known
(but the entire likelihood is not specified), then the m-dimensional version
of (1) becomes a generalized estimating equation (GEE).
In this article we prove the existence, weak consistency and asymptotic
normality of a sequence of estimators, defined as solutions (roots) of pseudolikelihood equations [see Shao (1999), page 315]. We work within a nonparametric set-up similar to that of Liang and Zeger and build upon the
impressive work of Xie and Yang (2003).
Our approach differs from that of Liang and Zeger (1986), Xie and Yang
(2003) and Schiopu-Kratina (2003) in the treatment of the correlation structure of the data recorded for the same individual across time. As in Rao
(1998), we first obtain a sequence of preliminary consistent estimators (β̃n )n
of the main parameter β0 (under the “working independence assumption”),
which we use to consistently estimate the average of the true individual correlations. We then create the pseudo-likelihood equations whose solutions
provide our final sequence of consistent estimators of the main parameter.
In practice, the analyst would first use numerical approximation methods
(like the Newton–Raphson method) to solve a simple estimating equation,
where each individual correlation matrix is the identity matrix. The next
step would be to solve for β in the pseudo-likelihood equation, in which all
the quantities can be calculated from the data. This approach eliminates the
need to introduce nuisance parameters or to guess at the correlation structures, and thus avoids some of the problems associated with these methods
[see pages 112 and 113 of Fahrmeir and Tutz (1994)]. We note that the assumptions that we require for this two-step procedure [our conditions (ÃH), (Ĩ_w), (C̃_w)] are only slightly more stringent than those of Xie and Yang (2003). They reduce to conditions related to the "working independence assumption" when the average of the true correlation matrices is asymptotically nonsingular [our hypothesis (H)].
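The two-step procedure just described can be sketched numerically. In the linear case (identity link, unit variances, so A_i = I and step 1 is ordinary least squares) the pseudo-likelihood equation is linear in β and step 2 amounts to a feasible generalized least squares update. All names below are ours and the setup is a simplified illustration, not the paper's general algorithm:

```python
import numpy as np

def two_step_gee(xs, ys):
    """Two-step estimation in the linear case (identity link, unit
    variances, A_i = I): step 1 solves the working-independence GEE
    (ordinary least squares); step 2 estimates the average correlation
    matrix from the step-1 residuals and solves the pseudo-likelihood
    equation sum_i X_i^T R_tilde^{-1} (y_i - X_i beta) = 0."""
    n, m, p = xs.shape
    X = xs.reshape(n * m, p)
    Y = ys.reshape(n * m)
    beta_tilde = np.linalg.solve(X.T @ X, X.T @ Y)   # step 1
    resid = ys - xs @ beta_tilde                     # (n, m) residual vectors
    R_tilde = resid.T @ resid / n                    # (1/n) sum_i e_i e_i^T
    Rinv = np.linalg.inv(R_tilde)
    H = sum(xi.T @ Rinv @ xi for xi in xs)           # step 2: linear system
    g = sum(xi.T @ Rinv @ yi for xi, yi in zip(xs, ys))
    return np.linalg.solve(H, g), R_tilde

rng = np.random.default_rng(1)
n, m = 400, 3
xs = rng.normal(size=(n, m, 2))
beta0 = np.array([1.0, -2.0])
L = np.linalg.cholesky(0.5 * np.eye(m) + 0.5)        # exchangeable correlation 0.5
ys = xs @ beta0 + rng.normal(size=(n, m)) @ L.T
beta_hat, R_tilde = two_step_gee(xs, ys)
```

Here R_tilde estimates the common exchangeable correlation matrix of the simulated errors, and no nuisance correlation parameter has to be specified by the analyst.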
As in Lai, Robbins and Wei (1979), where the linear model is treated,
we relax the assumption of independence between subjects and consider
residuals which form a martingale difference sequence. Thus our results are
more general than results published so far, for example, Xie and Yang (2003)
for GEE, and Shao (1992) for GLM.
Since a GEE is not a derivative, most of the technical difficulties surface
when proving the existence of roots of such general estimating equations.
Two distinct methods have been developed to deal with this problem. One
gives a local solution of the GEE and relies on the classical proof of the
inverse function theorem [Yuan and Jennrich (1998) and Schiopu-Kratina
(2003)]. The other method, which uses a result from topology, was first
brought into this context by Chen, Hu and Ying (1999) and was extensively
used by Xie and Yang (2003) in their proof of consistency. We adopt this
second method, which facilitates a comparison of our results to those of Xie
and Yang (2003) and incorporates the inference results for GLM contained
in the seminal work of Fahrmeir and Kaufmann (1985).
This article is organized as follows. Section 2 is dedicated to the existence
and weak consistency of a sequence of estimators of the main parameter.
To accommodate the estimation of the average of the correlation matrices
in the martingale set-up, we require two conditions: (C1) is a boundedness
condition on the (2+δ)-moments of the normalized residuals, whereas (C2) is
a consistency condition on the normalized conditional covariance matrix. In
this context we use the martingale strong law of large numbers of Kaufmann
(1987). Section 3 presents the asymptotic normality of our estimators. This
is obtained under slightly stronger conditions than those of Xie and Yang
(2003), by applying the classical martingale central limit theorem [see Hall
and Heyde (1980)]. For ease of exposition, we have placed the more technical
proofs in the Appendix.
We introduce first some matrix notation [see Schott (1997)]. If A is a
p × p matrix, we will denote with kAk its spectral norm, with det(A) its
determinant and with tr(A) its trace. If A is a symmetric matrix, we denote
by λmin (A)[λmax (A)] its minimum (maximum) eigenvalue. For any matrix
A, kAk = {λmax (AT A)}1/2 . For a p-dimensional vector x, we use the Euclidean norm kxk = (xT x)1/2 = tr(xxT )1/2 . We let A1/2 be the symmetric
square root of a positive definite matrix A and A−1/2 = (A1/2 )−1 . Finally,
we use the matrix notation A ≤ B if λT Aλ ≤ λT Bλ for any p-dimensional
vector λ.
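The conventions above can be checked numerically; a small sketch (ours) using NumPy, where `eigh` supplies the spectral decomposition used to form the symmetric square root:

```python
import numpy as np

# Check of the notational conventions above on a random example:
# spectral norm ||A|| = {lambda_max(A^T A)}^{1/2}, and the symmetric
# square root A^{1/2} of a positive definite A, with A^{-1/2} = (A^{1/2})^{-1}.
rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3))
A = B @ B.T + 3.0 * np.eye(3)            # symmetric positive definite

spec_norm = np.sqrt(np.linalg.eigvalsh(B.T @ B)[-1])   # {lambda_max(B^T B)}^{1/2}

w, V = np.linalg.eigh(A)                 # spectral decomposition of A
A_half = V @ np.diag(np.sqrt(w)) @ V.T   # symmetric square root
A_mhalf = np.linalg.inv(A_half)          # A^{-1/2}
```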
Throughout this article we will assume that the number of longitudinal observations on each individual is fixed and equal to m. More precisely, we will denote with y_i := (y_i1, ..., y_im)^T, i ≤ n, a longitudinal data set consisting of n respondents, where the components of y_i represent measurements at different times on subject i. The observations y_ij are recorded along with a corresponding p-dimensional vector x_ij of covariates and the marginal expectations and variances are specified in terms of the regression parameter β through θ_ij = x_ij^T β as follows:

(2)    µ_ij(β) := E_β(y_ij) = µ(θ_ij),    σ_ij^2(β) := Var_β(y_ij) = µ̇(θ_ij),

where µ is a continuously differentiable link function with µ̇ > 0, that is, we consider only canonical link functions.
Here are the most commonly used such link functions:

1. In the linear regression, µ(y) = y.
2. In the log regression for count data, µ(y) = exp(y).
3. In the logistic regression for binary data, µ(y) = exp(y)/[1 + exp(y)].
4. In the probit regression for binary data, µ(y) = Φ(y), where Φ is the standard normal distribution function; we have Φ̇(y) = (2π)^{-1/2} exp(−y²/2).
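A compact illustration (ours) of these four link functions and their derivatives µ̇; the probit derivative below is the standard normal density, as stated above:

```python
import numpy as np
from math import erf

# The four canonical link functions mu and their derivatives mu_dot,
# matching items 1-4 above (the probit derivative is the standard
# normal density (2*pi)^{-1/2} exp(-y^2/2)).
links = {
    "linear":   (lambda y: y,                     lambda y: np.ones_like(y)),
    "log":      (np.exp,                          np.exp),
    "logistic": (lambda y: 1.0 / (1.0 + np.exp(-y)),
                 lambda y: np.exp(-y) / (1.0 + np.exp(-y)) ** 2),
    "probit":   (lambda y: 0.5 * (1.0 + np.vectorize(erf)(y / np.sqrt(2.0))),
                 lambda y: np.exp(-y ** 2 / 2.0) / np.sqrt(2.0 * np.pi)),
}

t = np.linspace(-2.0, 2.0, 5)
# mu_dot > 0 everywhere: each canonical link is strictly increasing.
positivity = {name: bool(np.all(md(t) > 0)) for name, (mu, md) in links.items()}
```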
In the sequel the unknown parameter β lies in an open set B ⊆ Rp and
β0 is the true value of this parameter. We normally drop the parameter β0
to avoid cumbersome notation.
Let µ_i(β) = (µ_i1(β), ..., µ_im(β))^T, A_i(β) = diag(σ_i1^2(β), ..., σ_im^2(β)) and Σ_i(β) := Cov_β(y_i). Note that Σ_i = A_i^{1/2} R̄_i A_i^{1/2}, where R̄_i is the true correlation matrix of y_i at β_0. Let X_i = (x_i1, ..., x_im)^T.
We consider the sequence ε_i(β) = (ε_i1(β), ..., ε_im(β))^T with ε_ij(β) = y_ij − µ_ij(β), and we assume that the residuals (ε_i)_{i≥1} form a martingale difference sequence, that is,

E(ε_i | F_{i−1}) = 0    for all i ≥ 1,

where F_i is the minimal σ-field with respect to which ε_1, ..., ε_i are measurable. This is a natural generalization of the case of independent observations.
Finally, to avoid keeping track of various constants, we agree to denote
with C a generic constant which does not depend on n, but is different from
case to case.
2. Asymptotic existence and consistency. We consider the generalized estimating equations (GEE) of Xie and Yang (2003) in the case when the "working" correlation matrices are R_i^indep = I for all i. This is also known as the "working independence" case, the word "independence" referring to the observations on the same individual. Let (β̃_n)_n be a sequence of estimators such that

(3)    P(g_n^indep(β̃_n) = 0) → 1    and    β̃_n →_P β_0,

where g_n^indep(β) = Σ_{i=1}^n X_i^T ε_i(β) is the "working independence" GEE.
The following quantities have been used extensively in the work of Xie and Yang (2003) and play an important role in the conditions for the existence and consistency of β̃_n:

H_n^indep = Σ_{i=1}^n X_i^T A_i X_i,

π_n^indep := [max_{i≤n} λ_max((R_i^indep)^{-1})] / [min_{i≤n} λ_min((R_i^indep)^{-1})] = 1,

τ̃_n^indep := m max_{i≤n} λ_max((R_i^indep)^{-1}) = m,

(γ_n^{(0)})^indep := max_{i≤n, j≤m} x_ij^T (H_n^indep)^{-1} x_ij.
We will also use the following maxima:

k_n^[2](β) = max_{i≤n} max_{j≤m} |µ̈(θ_ij)/µ̇(θ_ij)|,    k_n^[3](β) = max_{i≤n} max_{j≤m} |µ^{(3)}(θ_ij)/µ̇(θ_ij)|.

The fact that the residuals (ε_i)_{i≥1} form a martingale difference sequence does not change the proofs of Theorem 2 and Theorem A.1(ii) of Xie and Yang (2003). Following their work, we conclude that the sufficient conditions for the existence of a sequence (β̃_n)_n with the desired property (3) are:

(AH)^indep    for any r > 0, k_n^{[l],indep} = sup_{β ∈ B_n^indep(r)} k_n^[l](β), l = 2, 3, are bounded,
(I_w^*)^indep    λ_min(H_n^indep) → ∞,
(C_w^*)^indep    n^{1/2} (γ_n^{(0)})^indep → 0,

where B_n^indep(r) := {β; ‖(H_n^indep)^{1/2}(β − β_0)‖ ≤ m^{1/2} r}. We denote by (C)^indep the set of conditions (AH)^indep, (I_w^*)^indep, (C_w^*)^indep.
It turns out that, in practice, the analyst will have to verify conditions similar to (C)^indep in order to produce the estimators that we propose (see Remark 5). All the classical examples corresponding to our link functions 1–4 are within the scope of our theory. We present below two new examples.
Example 1. Suppose that p = 2. Let x_ij = (a_ij, b_ij)^T, u_n = Σ_{i≤n,j≤m} σ_ij^2 a_ij^2, v_n = Σ_{i≤n,j≤m} σ_ij^2 b_ij^2 and w_n = Σ_{i≤n,j≤m} σ_ij^2 a_ij b_ij. In this case

H_n^indep = ( u_n  w_n ; w_n  v_n ),

λ_max(H_n^indep) = (u_n + v_n + d_n)/2 and λ_min(H_n^indep) = (u_n + v_n − d_n)/2, with d_n := √((u_n − v_n)² + 4w_n²). Note that w_n = √(u_n v_n) cos θ_n for θ_n ∈ [0, π] and det(H_n^indep) = u_n v_n sin² θ_n [see also page 79 of McCullagh and Nelder (1989)]. Suppose that

α := lim inf_{n→∞} sin² θ_n > 0.

Since

1/λ_min(H_n^indep) = λ_max(H_n^indep)/det(H_n^indep) = (u_n + v_n + d_n)/(2 u_n v_n sin² θ_n),

one can show that condition (I_w^*)^indep is equivalent to min(u_n, v_n) → ∞. On the other hand,

x_ij^T (H_n^indep)^{-1} x_ij = (1/sin² θ_n) [a_ij²/u_n − 2 w_n a_ij b_ij/(u_n v_n) + b_ij²/v_n] ≤ (1/sin² θ_n) (a_ij/u_n^{1/2} + b_ij/v_n^{1/2})².

Condition (C_w^*)^indep holds if n^{1/2} max_{i≤n,j≤m} (u_n^{-1/2} a_ij + v_n^{-1/2} b_ij)² → 0.
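The closed-form eigenvalues and determinant of Example 1 can be verified numerically for a particular choice of u_n, v_n, θ_n (the values below are arbitrary illustrations, not from the paper):

```python
import numpy as np

# Check of Example 1's closed-form spectrum for arbitrary illustrative
# values: H = [[u, w], [w, v]] with w = sqrt(u v) cos(theta), so that
# lambda_max = (u + v + d)/2, lambda_min = (u + v - d)/2 with
# d = sqrt((u - v)^2 + 4 w^2), and det(H) = u v sin^2(theta).
u, v, theta = 5.0, 3.0, 1.0
w = np.sqrt(u * v) * np.cos(theta)
H = np.array([[u, w], [w, v]])
d = np.sqrt((u - v) ** 2 + 4.0 * w ** 2)
lam = np.linalg.eigvalsh(H)              # eigenvalues in ascending order
```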
Example 2. The case of a single covariate with p different levels [one-way ANOVA; see also Example 3.13 of Shao (1999)] is usually treated by identifying each of these levels with one of the p-dimensional vectors e_1, ..., e_p, where e_k has the kth component 1 and all the other components 0. We can say that x_ij ∈ {e_1, ..., e_p} for all i ≤ n, j ≤ m. In this case, H_n^indep is a diagonal matrix. More precisely,

H_n^indep = Σ_{k=1}^p ν_n^{(k)} e_k e_k^T,

where ν_n^{(k)} = Σ_{i≤n, j≤m; x_ij = e_k} σ_ij^2. Let ν_n = min_{k≤p} ν_n^{(k)}. Condition (I_w^*)^indep is equivalent to ν_n → ∞ and condition (C_w^*)^indep is equivalent to n^{1/2} ν_n^{-1} → 0.
The method introduced by Liang and Zeger (1986) and developed recently
in Xie and Yang (2003) relies heavily on the “working” correlation matrices
Ri (α) which are chosen arbitrarily by the statistician (possibly containing a
nuisance parameter α) and are expected to be good approximations of the
unknown true correlation matrices R̄i .
In the present paper, we consider an alternative approach in which at each step n, the "working" correlation matrices R_i(α), i ≤ n, are replaced by the random matrix

R̃_n := (1/n) Σ_{i=1}^n A_i(β̃_n)^{-1/2} ε_i(β̃_n) ε_i(β̃_n)^T A_i(β̃_n)^{-1/2},

which depends only on the data set and is shown to be a (possibly biased) consistent estimator of the average of the true correlation matrices

R̄̄_n := (1/n) Σ_{i=1}^n R̄_i.
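As an illustration (ours), R̃_n is straightforward to compute once a first-step estimator β̃_n is available. The sketch below uses the canonical log link, for which σ_ij²(β) = µ̇(θ_ij) = exp(θ_ij), and for simplicity plugs in the true β_0 in place of β̃_n; the simulated components are independent within subjects, so the true R̄_i is the identity:

```python
import numpy as np

def correlation_estimator(xs, ys, beta_tilde):
    """R_tilde_n = (1/n) sum_i A_i(b)^{-1/2} eps_i(b) eps_i(b)^T A_i(b)^{-1/2}
    for the canonical log link: mu_ij = exp(x_ij^T b) and
    sigma_ij^2 = mu_dot(theta_ij) = mu_ij, so A_i(b) = diag(mu_i1, ..., mu_im)."""
    mu = np.exp(xs @ beta_tilde)          # (n, m) marginal means
    y_star = (ys - mu) / np.sqrt(mu)      # rows are A_i^{-1/2} eps_i
    return y_star.T @ y_star / len(ys)

rng = np.random.default_rng(3)
n, m = 2000, 3
xs = rng.normal(scale=0.3, size=(n, m, 2))
beta0 = np.array([0.4, 0.2])
ys = rng.poisson(np.exp(xs @ beta0)).astype(float)   # independent within subjects
R_tilde = correlation_estimator(xs, ys, beta0)        # true R_bar_i = I here
```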
The consistency of R̃_n is obtained under the following two conditions imposed on the (normalized) residuals y_i^* = A_i^{-1/2} ε_i, with E(y_i^* y_i^{*T}) = R̄_i:

(C1) there exists a δ ∈ (0, 2] such that sup_{i≥1} E(‖y_i^*‖^{2+δ}) < ∞,
(C2) (1/n) Σ_{i=1}^n V_i →_P 0, where V_i = E(y_i^* y_i^{*T} | F_{i−1}) − R̄_i.
Remark 1. Condition (C1) is a bounded moment requirement which is usually needed for verifying the conditions of a martingale limit theorem, while condition (C2) is satisfied if the observations are independent. Condition (C2) is in fact a requirement on the (normalized) conditional covariance matrix V_n = Σ_{i=1}^n E(y_i^* y_i^{*T} | F_{i−1}). More precisely, if the following hypothesis holds true:

(H) there exists a constant C > 0 such that λ_min(R̄̄_n) ≥ C for all n,

then condition (C2) is equivalent to R̄̄_n^{-1/2} (V_n/n) R̄̄_n^{-1/2} − I →_P 0 [which is similar to (3.1) of Hall and Heyde (1980) or (4.2) of Shao (1992)]. Note that (H) is implied by the following stronger hypothesis, which is needed in Section 3:

(H′) There exists a constant C > 0 such that λ_min(R̄_i) ≥ C for all i.

Hypothesis (H′) is satisfied if R̄_i = R̄ for all i, where R̄ is nonsingular.
The following result is essential for all our developments.

Theorem 1. Let R_n = E(R̃_n). Under conditions (C)^indep, (C1) and (C2), we have

R̃_n − R_n → 0 in L¹ (elementwise).

If the convergence in condition (C2) is almost sure, then R̃_n − R_n → 0 a.s. (elementwise). The same conclusion holds if R_n is replaced by R̄̄_n.

Proof. Let R̂_n = n^{-1} Σ_{i=1}^n A_i^{-1/2} ε_i ε_i^T A_i^{-1/2} and note that E(R̂_n) = R̄̄_n. Our result will be a consequence of the following two propositions, whose proofs are given in Appendix A.

Proposition 1. Under conditions (C1) and (C2), we have

R̂_n − R̄̄_n → 0 in L¹ (elementwise).

Proposition 2. Under conditions (C)^indep, (C1) and (C2), we have

R̃_n − R̂_n → 0 in L¹ (elementwise).
In what follows we will assume that the inverse of the (nonnegative definite) random matrix R̃_n exists with probability 1, for every n. We consider the following pseudo-likelihood equation:

(4)    Σ_{i=1}^n D_i(β)^T Ṽ_i,n(β)^{-1} ε_i(β) = 0,

where D_i(β) = A_i(β) X_i and Ṽ_i,n(β) := A_i(β)^{1/2} R̃_n A_i(β)^{1/2}. Note that (4) can be written as

g̃_n(β) := Σ_{i=1}^n X_i^T A_i(β)^{1/2} R̃_n^{-1} A_i(β)^{-1/2} ε_i(β) = 0.
We consider also the estimating function

g_n(β) = Σ_{i=1}^n X_i^T A_i(β)^{1/2} R_n^{-1} A_i(β)^{-1/2} ε_i(β).

Note that M_n := Cov(g_n) = Σ_{i=1}^n X_i^T A_i^{1/2} R_n^{-1} R̄_i R_n^{-1} A_i^{1/2} X_i.
As in Xie and Yang (2003), we introduce the following quantities:

H_n := Σ_{i=1}^n X_i^T A_i^{1/2} R_n^{-1} A_i^{1/2} X_i,    π_n := λ_max(R_n^{-1})/λ_min(R_n^{-1}),

τ̃_n := m λ_max(R_n^{-1}),    γ_n^{(0)} := max_{i=1,...,n} max_{j=1,...,m} (x_ij^T H_n^{-1} x_ij),    γ̃_n = τ̃_n γ_n^{(0)}.
Remark 2. A few comments about τ̃_n are worth mentioning. First, M_n ≤ τ_n H_n, where τ_n := max_{i≤n} λ_max(R_n^{-1} R̄_i) ≤ τ̃_n. Also, since r_jk^{(n)} − r̄̄_jk^{(n)} → 0 and |r̄̄_jk^{(n)}| ≤ 1, we can assume that |r_jk^{(n)}| ≤ 2 for n large enough (here r_jk^{(n)}, r̄̄_jk^{(n)} are the elements of the matrices R_n, resp. R̄̄_n). Therefore τ̃_n ≥ 1/2.
The reason why we prefer to work with τ̃_n instead of τ_n will become apparent in the proof of Proposition 3 (given in Appendix A.2). Another reason is, of course, the fact that τ̃_n does not depend on the unknown matrices R̄_i.
Our approach requires a slight modification of the conditions introduced by Xie and Yang (2003) to accommodate the use of τ̃_n instead of τ_n. Let B̃_n(r) := {β; ‖H_n^{1/2}(β − β_0)‖ ≤ (τ̃_n)^{1/2} r}. Our conditions are:

(ÃH)    for any r > 0, k̃_n^[l] = sup_{β ∈ B̃_n(r)} k_n^[l](β), l = 2, 3, are bounded,
(Ĩ_w)    (τ̃_n)^{-1} λ_min(H_n) → ∞,
(C̃_w)    (π_n)² γ̃_n → 0, and n^{1/2} π_n γ̃_n → 0.
Remark 3. Note that (Ĩ_w) implies (I_w^*)^indep, which implies λ_min(H_n) → ∞. This follows from the inequalities

(1/(2m)) H_n^indep ≤ λ_min(R_n^{-1}) · H_n^indep ≤ H_n ≤ λ_max(R_n^{-1}) · H_n^indep = (τ̃_n/m) H_n^indep.
Remark 4. Our conditions depend on the matrix R_n, which cannot be written in a closed form. Since R̃_n − R_n →_P 0, it is desirable to express our conditions in terms of the matrix R̃_n. In practice, if the sample size is large enough, one may choose to verify conditions (ÃH), (Ĩ_w), (C̃_w) by using R̃_n (instead of R_n) in the definitions of H_n, π_n, γ̃_n.
Remark 5. If we suppose that hypothesis (H) holds, then for n large

C/2 ≤ λ_min(R_n) ≤ λ_max(R_n) ≤ 2m.

In this case (τ̃_n)_n and (π_n)_n are bounded, C′(γ_n^{(0)})^indep ≤ γ_n^{(0)} ≤ C(γ_n^{(0)})^indep, and for every r > 0 there exists r′ > 0 such that B̃_n(r) ⊆ B_n^indep(r′). Therefore, conditions (ÃH), (Ĩ_w), (C̃_w) are equivalent to (AH)^indep, (I_w^*)^indep, (C_w^*)^indep, respectively. In order to verify (H), it is sufficient to check that there exists a constant C > 0 such that

det(R̃_n) ≥ C    for all n a.s.

under the hypothesis of Theorem 1.
We need to consider the derivatives

D̃_n(β) := −∂g̃_n(β)/∂β^T,    D_n(β) := −∂g_n(β)/∂β^T.
The next theorem is a modified version of Theorem A.2, respectively Theorem A.1(ii), of Xie and Yang (2003).

Theorem 2. Under conditions (ÃH) and (C̃_w):

(i) for every r > 0,

sup_{β ∈ B̃_n(r)} ‖H_n^{-1/2} D_n(β) H_n^{-1/2} − I‖ →_P 0;

(ii) there exists c_0 > 0 such that for every r > 0,

P(D_n(β) ≥ c_0 H_n for all β ∈ B̃_n(r)) → 1.

Proof. (i) The first two terms produced by the decomposition D_n(β) = H_n(β) + B_n(β) + E_n(β) are shown to be bounded by π_n² γ̃_n, whereas the third term is bounded in L² by √n π_n γ̃_n. [Here H_n(β), B_n(β), E_n(β) have the same expressions as those given in Xie and Yang (2003) with R_i(α), i ≤ n, replaced by R_n.] The arguments are essentially the same as those used in Lemmas A.1(ii), A.2(ii) and A.3(ii) of Xie and Yang (2003). The fact that we are replacing the "working" correlation matrices R_i(α), i = 1, ..., n, with the matrix R_n and we assume that (ε_i)_{i≥1} is a martingale difference sequence does not influence the proof. Finally we note that (ii) is a consequence of (i).
The next two results are intermediate steps that are used in the proof of
our main result. Their proofs are given in Appendix A.2.
Proposition 3. Suppose that the conditions of Theorem 1 hold. Then

(τ̃_n)^{-1/2} H_n^{-1/2} (g̃_n − g_n) →_P 0.

Proposition 4. Suppose that the conditions of Theorem 1 hold. Under conditions (ÃH) and (C̃_w),

sup_{β ∈ B̃_n(r)} ‖H_n^{-1/2} [D̃_n(β) − D_n(β)] H_n^{-1/2}‖ →_P 0.
The next theorem is our main result. It shows that under our slightly modified conditions (ÃH), (Ĩ_w), (C̃_w) and the additional conditions of Theorem 1, one can obtain a solution β̂_n of the pseudo-likelihood equation g̃_n(β) = 0 which is also a consistent estimator of β_0.

Theorem 3. Suppose that the conditions of Theorem 1 hold. Under conditions (ÃH), (Ĩ_w) and (C̃_w), there exists a sequence (β̂_n)_n of random variables such that

P(g̃_n(β̂_n) = 0) → 1    and    β̂_n →_P β_0.
Proof. Let ε > 0 be arbitrary and r = r(ε) = [(24p)/(c_1² ε)]^{1/2}, where c_1 is a constant to be specified later. We consider the events

Ẽ_n := {‖H_n^{-1/2} g̃_n‖ ≤ inf_{β ∈ ∂B̃_n(r)} ‖H_n^{-1/2}(g̃_n(β) − g̃_n)‖},
Ω̃_n := {D̃_n(β̄) nonsingular, for all β̄ ∈ B̃_n(r)}.

By Lemma A of Chen, Hu and Ying (1999), it follows that on the event Ẽ_n ∩ Ω̃_n there exists β̂_n ∈ B̃_n(r) such that g̃_n(β̂_n) = 0. Therefore, it remains to prove that P(Ẽ_n ∩ Ω̃_n) > 1 − ε for n large.
By Taylor's formula and Lemma 1 of Xie and Yang (2003) we obtain that for any β ∈ ∂B̃_n(r) there exist β̄ ∈ B̃_n(r) and a p × 1 vector λ, ‖λ‖ = 1, such that

‖H_n^{-1/2}(g̃_n(β) − g̃_n)‖ ≥ |λ^T H_n^{-1/2} D̃_n(β̄) H_n^{-1/2} λ| · r(τ̃_n)^{1/2}
≥ {|λ^T H_n^{-1/2} D_n(β̄) H_n^{-1/2} λ| − |λ^T H_n^{-1/2} [D̃_n(β̄) − D_n(β̄)] H_n^{-1/2} λ|} · r(τ̃_n)^{1/2}.

By Theorem 2(ii) there exists c_0 > 0 such that

(5)    P(λ^T H_n^{-1/2} D_n(β) H_n^{-1/2} λ ≥ c_0 for all β ∈ B̃_n(r), for all λ, ‖λ‖ = 1) > 1 − ε/6

when n is large. Let c_0′ ∈ (0, c_0) be arbitrary. By Proposition 4,

(6)    P(|λ^T H_n^{-1/2} [D̃_n(β) − D_n(β)] H_n^{-1/2} λ| ≤ c_0′ for all β ∈ B̃_n(r), for all λ) > 1 − ε/6

when n is large. Therefore, if we put c_1 := c_0 − c_0′, we have

(7)    P(inf_{β ∈ ∂B̃_n(r)} ‖H_n^{-1/2}(g̃_n(β) − g̃_n)‖ ≥ c_1 r(τ̃_n)^{1/2}) > 1 − ε/3.

From (5) and (6) we can also conclude that P(Ω̃_n) > 1 − ε/3 for n large.
On the other hand, by Chebyshev's inequality and our choice of r, we have P(‖H_n^{-1/2} g_n‖ ≤ c_1 r(τ̃_n)^{1/2}/2) > 1 − ε/6 for all n. By Proposition 3, P(‖H_n^{-1/2}(g̃_n − g_n)‖ ≤ c_1 r(τ̃_n)^{1/2}/2) > 1 − ε/6 for n large. Hence

(8)    P(‖H_n^{-1/2} g̃_n‖ ≤ c_1 r(τ̃_n)^{1/2}) > 1 − ε/3.

From (7) and (8) we obtain that P(Ẽ_n) > 1 − (2ε)/3 for n large. This concludes the proof of the asymptotic existence.
We proceed now with the proof of the weak consistency. Let δ > 0 be arbitrary. By (Ĩ_w) we have τ̃_n/λ_min(H_n) < (δ/r)² for n large. We know that on the event Ẽ_n ∩ Ω̃_n there exists β̂_n ∈ B̃_n(r) such that g̃_n(β̂_n) = 0. Therefore, on this event

‖β̂_n − β_0‖ ≤ ‖H_n^{-1/2}‖ · ‖H_n^{1/2}(β̂_n − β_0)‖ ≤ [λ_min(H_n)]^{-1/2} · (τ̃_n)^{1/2} r < δ

for n large. This proves that P(‖β̂_n − β_0‖ ≤ δ) > 1 − ε for n large.

3. Asymptotic normality. Let c_n = λ_max(M_n^{-1} H_n). In this section we will suppose that (c_n τ̃_n)_n is bounded.
Theorem 4. Under the conditions of Theorem 3,

M_n^{-1/2} g̃_n = M_n^{-1/2} H_n (β̂_n − β_0) + o_P(1).

Proof. On the set {g̃_n(β̂_n) = 0, β̂_n ∈ B̃_n(r)}, we have g̃_n = D̃_n(β̄_n)(β̂_n − β_0) for some β̄_n ∈ B̃_n(r), by Taylor's formula. Multiplication with M_n^{-1/2} yields

M_n^{-1/2} g̃_n = M_n^{-1/2} H_n^{1/2} A_n H_n^{1/2} (β̂_n − β_0) + M_n^{-1/2} H_n (β̂_n − β_0),

where A_n := H_n^{-1/2} D̃_n(β̄_n) H_n^{-1/2} − I = o_P(1), by Theorem 2(i) and Proposition 4. The result follows since ‖M_n^{-1/2} H_n^{1/2}‖ ≤ c_n^{1/2} and ‖H_n^{1/2}(β̂_n − β_0)‖ ≤ (τ̃_n)^{1/2} r.
Let γ_n^{(D)} := max_{1≤i≤n} λ_max(H_n^{-1/2} X_i^T A_i^{1/2} R_n^{-1} A_i^{1/2} X_i H_n^{-1/2}). Note that γ_n^{(D)} ≤ C d_n² γ̃_n, where d_n = max_{i≤n, j≤m} σ_ij. We consider the following conditions:

(Ñ_δ) there exists a δ > 0 such that:
  (i) Y := sup_{i≥1} E(‖y_i^*‖^{2+δ} | F_{i−1}) < ∞ a.s.;
  (ii) (c_n τ̃_n)^{1+2/δ} γ_n^{(D)} → 0,
(C2)′ max_{i≤n} λ_max(V_i) →_P 0.

Remark 6. Note that condition (Ñ_δ)(i), with Y integrable, implies condition (C1), whereas condition (C2)′ is a stronger form of (C2). Part (ii) of condition (Ñ_δ) was introduced by Xie and Yang (2003).
The following result gives the asymptotic distribution of g̃_n.

Lemma 1. Suppose that the conditions of Theorem 1 hold. Under conditions (Ñ_δ), (C2)′ and (H′),

M_n^{-1/2} g̃_n →_d N(0, I).

Proof. We note that

M_n^{-1/2} g̃_n = M_n^{-1/2} g_n + M_n^{-1/2} (g̃_n − g_n)

and ‖M_n^{-1/2}(g̃_n − g_n)‖ ≤ (c_n τ̃_n)^{1/2} ‖(τ̃_n)^{-1/2} H_n^{-1/2}(g̃_n − g_n)‖ →_P 0, by Proposition 3. Therefore it is enough to prove that M_n^{-1/2} g_n →_d N(0, I). By the Cramér–Wold theorem, this is equivalent to showing that for all λ, ‖λ‖ = 1,

(9)    λ^T M_n^{-1/2} g_n = Σ_{i=1}^n Z_{n,i} →_d N(0, 1),

where Z_{n,i} = λ^T M_n^{-1/2} X_i^T A_i^{1/2} R_n^{-1} A_i^{-1/2} ε_i. Note that E(Z_{n,i} | F_{i−1}) = 0 for all i ≤ n, that is, {Z_{n,i}; i ≤ n, n ≥ 1} is a martingale difference array.
Relationship (9) follows by the martingale central limit theorem with the Lindeberg condition [see Corollary 3.1 of Hall and Heyde (1980)] if

(10)    Σ_{i=1}^n E[Z_{n,i}² I(|Z_{n,i}| > ε) | F_{i−1}] → 0    a.s.

and

(11)    Σ_{i=1}^n E(Z_{n,i}² | F_{i−1}) →_P 1.
Relationship (10) follows from condition (Ñ_δ) exactly as in Lemma 2 of Xie and Yang (2003) with ψ(t) = t^{δ/2}. Relationship (11) follows from conditions (C2)′ and (H′):

|Σ_{i=1}^n E(Z_{n,i}² | F_{i−1}) − 1| = |Σ_{i=1}^n [E(Z_{n,i}² | F_{i−1}) − E(Z_{n,i}²)]|
= |Σ_{i=1}^n λ^T M_n^{-1/2} X_i^T A_i^{1/2} R_n^{-1} V_i R_n^{-1} A_i^{1/2} X_i M_n^{-1/2} λ|
≤ max_{1≤i≤n} λ_max(V_i) · max_{1≤i≤n} λ_max(R̄_i^{-1}) · λ^T M_n^{-1/2} M_n M_n^{-1/2} λ
≤ C^{-1} max_{i≤n} λ_max(V_i) →_P 0.
Putting together the results in Theorem 4 and Lemma 1, we obtain the asymptotic normality of the estimator β̂_n.

Theorem 5. Under the conditions of Theorem 3 and conditions (Ñ_δ), (C2)′ and (H′),

M_n^{-1/2} H_n (β̂_n − β_0) →_d N(0, I).

Remark 7. In applications we would need a version of Theorem 5 where M_n is replaced by a consistent estimator. We suggest the estimator proposed by Liang and Zeger (1986) [see also Remark 8 of Xie and Yang (2003)]. The details of the proof are omitted.
APPENDIX

A.1. The following lemma is a consequence of Kaufmann's (1987) martingale strong law of large numbers and can be viewed as a stronger version of Theorem 2.19 of Hall and Heyde (1980).

Lemma A.1. Let (x_i)_{i≥1} be a sequence of random variables and let (F_i)_{i≥1} be a sequence of increasing σ-fields such that x_i is F_i-measurable for every i ≥ 1. Suppose that sup_i E|x_i|^α < ∞ for some α ∈ (1, 2]. Then

(1/n) Σ_{i=1}^n (x_i − E(x_i | F_{i−1})) → 0    a.s. and in L^α.

Proof. Note that y_i = x_i − E(x_i | F_{i−1}), i ≥ 1, is a martingale difference sequence. By the conditional Jensen inequality

|y_i|^α ≤ 2^{α−1}{|x_i|^α + |E(x_i | F_{i−1})|^α} ≤ 2^{α−1}{|x_i|^α + E(|x_i|^α | F_{i−1})}

and sup_{i≥1} E|y_i|^α ≤ 2^α sup_{i≥1} E|x_i|^α < ∞. Hence

Σ_{i≥1} E|y_i|^α / i^α ≤ sup_{i≥1} E|y_i|^α · Σ_{i≥1} 1/i^α < ∞.

The lemma follows by Theorem 2 of Kaufmann (1987) with p = 1, B_i = i.
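A quick simulation (ours) of Lemma A.1 for a dependent AR(1)-type sequence, where E(x_i | F_{i−1}) = 0.8 x_{i−1} and the centered averages reduce to the sample mean of the innovations:

```python
import numpy as np

# Simulation of Lemma A.1: x_i = 0.8 x_{i-1} + z_i with iid N(0,1)
# innovations z_i, so E(x_i | F_{i-1}) = 0.8 x_{i-1} and the centered
# average (1/n) sum_i (x_i - E(x_i | F_{i-1})) equals the mean of the z_i,
# which must tend to 0 (here alpha = 2: square-integrable terms).
rng = np.random.default_rng(5)
n = 50_000
z = rng.normal(size=n)
x = np.zeros(n)
x[0] = z[0]
for i in range(1, n):
    x[i] = 0.8 * x[i - 1] + z[i]
prev = np.concatenate([[0.0], x[:-1]])
centered_mean = np.mean(x - 0.8 * prev)   # equals z.mean() by construction
```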
Proof of Proposition 1. We denote by r̂_jk^{(n)}, r̄̄_jk^{(n)}, v_jk^{(i)} (j, k = 1, ..., m) the elements of the matrices R̂_n, R̄̄_n, V_i, respectively. We write

(12)    r̂_jk^{(n)} − r̄̄_jk^{(n)} = (1/n) Σ_{i=1}^n (y_ij^* y_ik^* − E(y_ij^* y_ik^* | F_{i−1})) + (1/n) Σ_{i=1}^n v_jk^{(i)}.

The first term converges to zero almost surely and in L^{1+δ/2} by applying Lemma A.1 with x_i = y_ij^* y_ik^*, and using condition (C1). The second term converges to zero in probability by condition (C2). This convergence is also in L^{1+δ/2} because the sequence {n^{-1} Σ_{i=1}^n v_jk^{(i)}}_n has uniformly bounded moments of order 1 + δ/2 and hence is uniformly integrable.
Proof of Proposition 2. We denote by r̃_jk^{(n)} (j, k = 1, ..., m) the elements of the matrix R̃_n. Let δ̃_i,jk := [σ_ij σ_ik]/[σ_ij(β̃_n) σ_ik(β̃_n)] − 1, Δµ̃_ij := µ_ij(β̃_n) − µ_ij(β_0) and

Δ̃(ε_ij ε_ik) := ε_ij(β̃_n) ε_ik(β̃_n) − ε_ij ε_ik = (Δµ̃_ij)(Δµ̃_ik) − (Δµ̃_ij) ε_ik − (Δµ̃_ik) ε_ij.

With this notation, we have

r̃_jk^{(n)} − r̂_jk^{(n)} = (1/n) Σ_{i=1}^n ε_ij(β̃_n) ε_ik(β̃_n)/[σ_ij(β̃_n) σ_ik(β̃_n)] − (1/n) Σ_{i=1}^n ε_ij ε_ik/(σ_ij σ_ik)
= (1/n) Σ_{i=1}^n Δ̃(ε_ij ε_ik)/(σ_ij σ_ik) + (1/n) Σ_{i=1}^n δ̃_i,jk ε_ij ε_ik/(σ_ij σ_ik) + (1/n) Σ_{i=1}^n δ̃_i,jk Δ̃(ε_ij ε_ik)/(σ_ij σ_ik).

From here, we conclude that

|r̃_jk^{(n)} − r̂_jk^{(n)}| ≤ U_n,jk + max_{i≤n} |δ̃_i,jk| · {U_n,jk + (1/n) Σ_{i=1}^n |y_ij^* y_ik^*|},

where

U_n,jk := (1/n) Σ_{i=1}^n |Δµ̃_ij| · |Δµ̃_ik|/(σ_ij σ_ik) + (1/n) Σ_{i=1}^n (|Δµ̃_ij|/σ_ij) · |y_ik^*| + (1/n) Σ_{i=1}^n (|Δµ̃_ik|/σ_ik) · |y_ij^*| = U_n,jk^[1] + U_n,jk^[2] + U_n,jk^[3].

Recall that our estimator β̃_n was obtained in the proof of Theorem 2 of Xie and Yang (2003) as a solution of the GEE in the case when all the "working" correlation matrices are R_i^indep = I. One of the consequences of the result of Xie and Yang is that for every fixed ε > 0, there exist r = r_ε and N = N_ε such that, if we denote Ω_n,ε = {β̃_n lies in B_n^indep(r)}, then

P(Ω_n,ε) ≥ 1 − ε    for all n ≥ N.

We define β̃_n to be equal to β_0 on the event Ω_n,ε^c. Therefore,

on Ω_n,ε^c:    max_{i≤n} |δ̃_i,jk| = 0    and    Δµ̃_ij = 0.

Using Taylor's formula and condition (AH)^indep, we can conclude that on the event Ω_n,ε there exists a constant C = C_ε such that

|δ̃_i,jk| = |µ̇(x_ij^T β_0)/µ̇(x_ij^T β̃_n) − 1| ≤ C · (γ_n^{(0)})^indep · (m^{1/2} r)    for all i ≤ n,

(1/n) Σ_{i=1}^n (Δµ̃_ij)²/σ_ij² = n^{-1} (β̃_n − β_0)^T { Σ_{i=1}^n [σ_ij²(β̄_n)/σ_ij²]² σ_ij² x_ij x_ij^T } (β̃_n − β_0)
≤ n^{-1} (β̃_n − β_0)^T { Σ_{i=1}^n X_i^T A_i^{1/2} [A_i(β̄_n) A_i^{-1}]² A_i^{1/2} X_i } (β̃_n − β_0)
≤ n^{-1} max_{i≤n} λ_max²[A_i(β̄_n) A_i^{-1}] · ‖(H_n^indep)^{1/2} (β̃_n − β_0)‖²
≤ C n^{-1} (m² r).

Note also that E[n^{-1} Σ_{i=1}^n (y_ij^*)²] = E[r̂_jj^{(n)}] = O(1), since r̂_jj^{(n)} − r̄̄_jj^{(n)} → 0 in L¹ and r̄̄_jj^{(n)} = 1. Applying the Cauchy–Schwarz inequality to each of the three sums that form U_n,jk, we can conclude that

E[U_n,jk^[1]] → 0    and    E[(U_n,jk^[l])²] → 0, l = 2, 3.

On the other hand,

E[max_{i≤n} |δ̃_i,jk| · U_n,jk] = ∫_{Ω_n,ε} max_{i≤n} |δ̃_i,jk| · U_n,jk dP ≤ C (γ_n^{(0)})^indep ∫_{Ω_n,ε} U_n,jk dP → 0,

E[max_{i≤n} |δ̃_i,jk| · (1/n) Σ_{i=1}^n |y_ij^* y_ik^*|] ≤ C (γ_n^{(0)})^indep [E(r̂_jj^{(n)})]^{1/2} [E(r̂_kk^{(n)})]^{1/2} → 0.
A.2. Proof of Proposition 3. Let h_ijk^{(0)} = [σ_ij²/σ_ik²]^{1/2}, R̃_n^{-1} := Q̃_n = (q̃_jk^{(n)})_{j,k=1,...,m} and R_n^{-1} := Q_n = (q_jk^{(n)})_{j,k=1,...,m}. With this notation, we write

(τ̃_n)^{-1/2} H_n^{-1/2} (g̃_n − g_n) = Σ_{j,k=1}^m (q̃_jk^{(n)} − q_jk^{(n)}) · { (τ̃_n)^{-1/2} H_n^{-1/2} Σ_{i=1}^n h_ijk^{(0)} x_ij ε_ik }.

By Theorem 1, q̃_jk^{(n)} − q_jk^{(n)} →_P 0 for every j, k. The result will follow once we prove that {(τ̃_n)^{-1/2} H_n^{-1/2} Σ_{i=1}^n h_ijk^{(0)} x_ij ε_ik}_n is bounded in L² for every j, k. Since (ε_ik)_{i≥1} is a martingale difference sequence, we have

E(‖(τ̃_n)^{-1/2} H_n^{-1/2} Σ_{i=1}^n h_ijk^{(0)} x_ij ε_ik‖²)
= (τ̃_n)^{-1} tr{ H_n^{-1/2} ( Σ_{i=1}^n (h_ijk^{(0)})² σ_ik² x_ij x_ij^T ) H_n^{-1/2} }
= (τ̃_n)^{-1} tr{ H_n^{-1/2} ( Σ_{i=1}^n σ_ij² x_ij x_ij^T ) H_n^{-1/2} }
≤ (τ̃_n)^{-1} (4m τ̃_n) tr(I) = 4mp,

because Σ_{i=1}^n σ_ij² x_ij x_ij^T ≤ Σ_{i=1}^n X_i^T A_i X_i ≤ λ_max(R_n) H_n ≤ 4m τ̃_n H_n.

Proof of Proposition 4. We write

D_n(β) = H_n(β) + B_n(β) + E_n(β),    D̃_n(β) = H̃_n(β) + B̃_n(β) + Ẽ_n(β),

where H̃_n(β), B̃_n(β), Ẽ_n(β) have the same expressions as H_n(β), B_n(β), E_n(β), with R_n replaced by R̃_n. Our result will follow by the following three lemmas.
Lemma A.2. Suppose that condition (ÃH) holds. If (π_n γ̃_n)_n is bounded, then for any r > 0 and for any p × 1 vector λ with ‖λ‖ = 1,

sup_{β ∈ B̃_n(r)} |λ^T H_n^{-1/2} [H̃_n(β) − H_n(β)] H_n^{-1/2} λ| →_P 0.

Lemma A.3. Suppose that condition (ÃH) holds. If (π_n² γ̃_n)_n is bounded, then for any r > 0 and for any p × 1 vector λ with ‖λ‖ = 1,

sup_{β ∈ B̃_n(r)} |λ^T H_n^{-1/2} [B̃_n(β) − B_n(β)] H_n^{-1/2} λ| →_P 0.

Lemma A.4. Suppose that condition (ÃH) holds. If (n^{1/2} γ̃_n)_n is bounded, then for any r > 0 and for any p × 1 vector λ with ‖λ‖ = 1,

sup_{β ∈ B̃_n(r)} |λ^T H_n^{-1/2} [Ẽ_n(β) − E_n(β)] H_n^{-1/2} λ| →_P 0.
Proof of Lemma A.2. Using Theorem 1 and the fact that |r_jk^{(n)}| ≤ 2 for n large, we have A_n = R_n^{1/2} R̃_n^{-1} R_n^{1/2} − I = R_n^{1/2} (R̃_n^{-1} − R_n^{-1}) R_n^{1/2} →_P 0 (elementwise). For every β,

|λ^T H_n^{-1/2} [H̃_n(β) − H_n(β)] H_n^{-1/2} λ|
= |Σ_{i=1}^n λ^T H_n^{-1/2} X_i^T A_i(β)^{1/2} R_n^{-1/2} A_n R_n^{-1/2} A_i(β)^{1/2} X_i H_n^{-1/2} λ|
≤ max{|λ_max(A_n)|, |λ_min(A_n)|} · {λ^T H_n^{-1/2} H_n(β) H_n^{-1/2} λ}.

The result follows, since one can show that for every β ∈ B̃_n(r)

(13)    |λ^T H_n^{-1/2} H_n(β) H_n^{-1/2} λ − 1| ≤ λ^T H_n^{-1/2} H_n^[1](β) H_n^{-1/2} λ + 2|λ^T H_n^{-1/2} H_n^[2](β) H_n^{-1/2} λ| ≤ C π_n γ̃_n + 2C(π_n γ̃_n)^{1/2} ≤ C,

where

H_n^[1](β) = Σ_{i=1}^n X_i^T (A_i(β)^{1/2} − A_i^{1/2}) R_n^{-1} (A_i(β)^{1/2} − A_i^{1/2}) X_i,
H_n^[2](β) = Σ_{i=1}^n X_i^T (A_i(β)^{1/2} − A_i^{1/2}) R_n^{-1} A_i^{1/2} X_i.

We used the fact that sup_{β ∈ B̃_n(r)} max_{i≤n} λ_max{(A_i(β)^{1/2} A_i^{-1/2} − I)²} ≤ C γ̃_n, which follows by condition (ÃH) as in Lemma B.1(ii) of Xie and Yang (2003).
18
R. M. BALAN AND I. SCHIOPU-KRATINA
We used the fact that supβ∈Be
1/2
n (r)
−1/2
maxi≤n λmax {(Ai (β)Ai
− I)2 } ≤ C γ̃n ,
g as in Lemma B.1(ii) of Xie and Yang (2003).
which follows by condition (AH)
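The first step of the proof of Lemma A.2 uses a standard spectral sandwich: for a symmetric matrix $A$ and arbitrary matrices $B_i$, $|\lambda^T(\sum_i B_i^T A B_i)\lambda| \le \max\{|\lambda_{\max}(A)|,|\lambda_{\min}(A)|\}\cdot\lambda^T(\sum_i B_i^T B_i)\lambda$, since $\lambda_{\min}(A)\,B^T B \preceq B^T A B \preceq \lambda_{\max}(A)\,B^T B$. A small numerical illustration (not from the paper; the dimensions and random matrices are arbitrary):

```python
# Check: |l^T (sum_i B_i^T A B_i) l| <= max(|lmax(A)|, |lmin(A)|) * l^T (sum_i B_i^T B_i) l
# for symmetric A, arbitrary B_i, and a unit vector l.
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 4, 3, 10
A = rng.standard_normal((m, m)); A = (A + A.T) / 2       # symmetric, plays the role of A_n
Bs = [rng.standard_normal((m, p)) for _ in range(n)]
lam = rng.standard_normal(p); lam /= np.linalg.norm(lam)

lhs = abs(lam @ sum(B.T @ A @ B for B in Bs) @ lam)
evals = np.linalg.eigvalsh(A)                             # ascending eigenvalues
rhs = max(abs(evals[0]), abs(evals[-1])) * (lam @ sum(B.T @ B for B in Bs) @ lam)
print(lhs <= rhs + 1e-9)
```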
Proof of Lemma A.3. Let $w_{i,n}(\beta)^T = \lambda^T H_n^{-1/2} X_i^T G_i^{[1]}(\beta)\,\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}\, R_n^{-1/2}$ and $z_{i,n}(\beta) = R_n^{-1/2} A_i(\beta)^{-1/2}(\mu_i - \mu_i(\beta))$. We have
\[
|\lambda^T H_n^{-1/2}[\widetilde B_n^{[1]}(\beta) - B_n^{[1]}(\beta)]H_n^{-1/2}\lambda|
= \Biggl| \sum_{i=1}^n w_{i,n}(\beta)^T A_n z_{i,n}(\beta) \Biggr|
\le \|A_n\| \Biggl\{ \sum_{i=1}^n \|w_{i,n}(\beta)\|^2 \Biggr\}^{1/2} \Biggl\{ \sum_{i=1}^n \|z_{i,n}(\beta)\|^2 \Biggr\}^{1/2}
\]
by using the Cauchy–Schwarz inequality. Methods similar to those developed in the proof of Lemma A.2(ii) of Xie and Yang (2003) show that for any $\beta\in\widetilde B_n(r)$, $\sum_{i=1}^n \|w_{i,n}(\beta)\|^2 \le C\pi_n\gamma_n^{(0)}$ and $\sum_{i=1}^n \|z_{i,n}(\beta)\|^2 \le C\pi_n\tilde\tau_n \lambda_{\max}(H_n^{-1/2} H_n(\bar\beta) H_n^{-1/2}) \le C\pi_n\tilde\tau_n$ [using (13) for the last inequality]. Hence
\[
\sup_{\beta\in\widetilde B_n(r)} |\lambda^T H_n^{-1/2}[\widetilde B_n^{[1]}(\beta) - B_n^{[1]}(\beta)]H_n^{-1/2}\lambda| \le C\|A_n\|\pi_n(\tilde\gamma_n)^{1/2} \stackrel{P}{\to} 0.
\]
Let $v_{i,n}(\beta)^T = \lambda^T H_n^{-1/2} X_i^T A_i(\beta)^{1/2} R_n^{-1/2} A_n R_n^{-1/2}\,\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}\, G_i^{[2]}(\beta) A_i(\beta)^{1/2} R_n^{-1/2}$. We have
\[
|\lambda^T H_n^{-1/2}[\widetilde B_n^{[2]}(\beta) - B_n^{[2]}(\beta)]H_n^{-1/2}\lambda|
= \Biggl| \sum_{i=1}^n v_{i,n}(\beta)^T z_{i,n}(\beta) \Biggr|
\le \Biggl\{ \sum_{i=1}^n \|v_{i,n}(\beta)\|^2 \Biggr\}^{1/2} \Biggl\{ \sum_{i=1}^n \|z_{i,n}(\beta)\|^2 \Biggr\}^{1/2}.
\]
One can prove that for any $\beta\in\widetilde B_n(r)$, $\sum_{i=1}^n \|v_{i,n}(\beta)\|^2 \le C\pi_n\gamma_n^{(0)} \lambda_{\max}(A_n^2)\times \{\lambda^T H_n^{-1/2} H_n(\beta) H_n^{-1/2}\lambda\} \le C\pi_n\gamma_n^{(0)}\|A_n\|^2$ [using (13) for the last inequality]. Hence
\[
\sup_{\beta\in\widetilde B_n(r)} |\lambda^T H_n^{-1/2}[\widetilde B_n^{[2]}(\beta) - B_n^{[2]}(\beta)]H_n^{-1/2}\lambda| \le C\|A_n\|\pi_n(\tilde\gamma_n)^{1/2} \stackrel{P}{\to} 0.
\]
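Both displays in the proof of Lemma A.3 use the Cauchy–Schwarz inequality for sums of inner products, $|\sum_i w_i^T z_i| \le (\sum_i \|w_i\|^2)^{1/2}(\sum_i \|z_i\|^2)^{1/2}$, which follows by stacking the $w_i$ and $z_i$ into single long vectors. A quick numerical check (not from the paper; dimensions and data are arbitrary):

```python
# Check: |sum_i w_i^T z_i| <= (sum_i ||w_i||^2)^{1/2} * (sum_i ||z_i||^2)^{1/2},
# i.e. Cauchy-Schwarz applied to the stacked vectors (w_1,...,w_n), (z_1,...,z_n).
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 5
w = rng.standard_normal((n, d))
z = rng.standard_normal((n, d))

lhs = abs(np.sum(np.einsum('ij,ij->i', w, z)))   # |sum of rowwise inner products|
rhs = np.sqrt(np.sum(w**2)) * np.sqrt(np.sum(z**2))
print(lhs <= rhs)
```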
Proof of Lemma A.4. We write $\widetilde E_n(\beta) - E_n(\beta) = [\widetilde E_n^{[1]}(\beta) - E_n^{[1]}(\beta)] + [\widetilde E_n^{[2]}(\beta) - E_n^{[2]}(\beta)]$ and we use a decomposition which is similar to that given in the proof of Lemma A.3(ii) of Xie and Yang (2003). More precisely, we write
\[
\lambda^T H_n^{-1/2}[\widetilde E_n^{[1]}(\beta) - E_n^{[1]}(\beta)]H_n^{-1/2}\lambda = T_n^{[1]} + T_n^{[3]}(\beta) + T_n^{[5]}(\beta),
\]
\[
\lambda^T H_n^{-1/2}[\widetilde E_n^{[2]}(\beta) - E_n^{[2]}(\beta)]H_n^{-1/2}\lambda = T_n^{[2]} + T_n^{[4]}(\beta) + T_n^{[6]}(\beta),
\]
where $T_n^{[l]}(\beta) = \sum_{j,k=1}^m (\tilde q_{jk}^{(n)} - q_{jk}^{(n)}) \cdot S_{n,jk}^{[l]}(\beta)$ for $l = 1,\dots,6$ and
\[
S_{n,jk}^{[1]} = \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}]_j\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_j\, h_{ijk}^{(0)} x_{ij}\varepsilon_{ik},
\]
\[
S_{n,jk}^{[3]}(\beta) = \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}(\beta)]_j\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_j\, [A_i(\beta)^{-1/2} A_i^{1/2} - I]_k\, h_{ijk}^{(0)} x_{ij}\varepsilon_{ik},
\]
\[
S_{n,jk}^{[5]}(\beta) = \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2}(G_i^{[1]}(\beta) - G_i^{[1]})]_j\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_j\, h_{ijk}^{(0)} x_{ij}\varepsilon_{ik},
\]
\[
S_{n,jk}^{[2]} = \lambda^T H_n^{-1/2} \sum_{i=1}^n [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_k\, [G_i^{[2]} A_i^{1/2}]_k\, h_{ijk}^{(0)} x_{ij}\varepsilon_{ik},
\]
\[
S_{n,jk}^{[4]}(\beta) = \lambda^T H_n^{-1/2} \sum_{i=1}^n [A_i^{-1/2} A_i(\beta)^{1/2} - I]_j\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_k\, [G_i^{[2]}(\beta) A_i^{1/2}]_k\, h_{ijk}^{(0)} x_{ij}\varepsilon_{ik},
\]
\[
S_{n,jk}^{[6]}(\beta) = \lambda^T H_n^{-1/2} \sum_{i=1}^n [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_k\, [(G_i^{[2]}(\beta) - G_i^{[2]}) A_i^{1/2}]_k\, h_{ijk}^{(0)} x_{ij}\varepsilon_{ik}
\]
(here we have denoted by $[\Delta]_j$ the $j$th element on the diagonal of a matrix $\Delta$).

Since $\tilde q_{jk}^{(n)} - q_{jk}^{(n)} \stackrel{P}{\to} 0$, it is enough to prove that $\{S_{n,jk}^{[1]}\}_n$, $\{S_{n,jk}^{[2]}\}_n$ and $\{\sup_{\beta\in\widetilde B_n(r)} |S_{n,jk}^{[l]}(\beta)|\}_n$, $l = 3,4,5,6$, are bounded in $L^2$ for every $j,k = 1,\dots,m$.

We have
\[
E(|S_{n,jk}^{[1]}|^2)
\le \operatorname{tr}\Biggl\{ H_n^{-1/2} \Biggl( \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}]_j^2\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_j^2\, (h_{ijk}^{(0)})^2 \sigma_{ik}^2\, x_{ij}x_{ij}^T \Biggr) H_n^{-1/2} \Biggr\}
\]
\[
\le C\gamma_n^{(0)} \operatorname{tr}\Biggl\{ H_n^{-1/2} \Biggl( \sum_{i=1}^n \sigma_{ij}^2\, x_{ij}x_{ij}^T \Biggr) H_n^{-1/2} \Biggr\}
\le C\gamma_n^{(0)}(4mp\tilde\tau_n) = C\tilde\gamma_n \le C.
\]
By the Cauchy–Schwarz inequality, for every $\beta\in\widetilde B_n(r)$,
\[
|S_{n,jk}^{[3]}(\beta)|^2
\le \Biggl\{ \sum_{i=1}^n [A_i^{-1/2} G_i^{[1]}(\beta)]_j^2\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_j^2\, [A_i(\beta)^{-1/2} A_i^{1/2} - I]_k^2 \Biggr\}
\times \Biggl\{ \sum_{i=1}^n (h_{ijk}^{(0)})^2 \varepsilon_{ik}^2 (\lambda^T H_n^{-1/2} x_{ij})^2 \Biggr\}
\]
\[
\le Cn\gamma_n^{(0)}\tilde\gamma_n \cdot \Biggl\{ \lambda^T H_n^{-1/2} \Biggl( \sum_{i=1}^n (h_{ijk}^{(0)})^2 \varepsilon_{ik}^2\, x_{ij}x_{ij}^T \Biggr) H_n^{-1/2}\lambda \Biggr\}.
\]
Hence $E(\sup_{\beta\in\widetilde B_n(r)} |S_{n,jk}^{[3]}(\beta)|^2) \le Cn\gamma_n^{(0)}\tilde\gamma_n \cdot \{\lambda^T H_n^{-1/2}(\sum_{i=1}^n \sigma_{ij}^2\, x_{ij}x_{ij}^T) H_n^{-1/2}\lambda\} \le Cn\gamma_n^{(0)}\tilde\gamma_n \cdot (4m\tilde\tau_n) \le Cn(\tilde\gamma_n)^2 \le C$.

Similarly, by the Cauchy–Schwarz inequality, for every $\beta\in\widetilde B_n(r)$,
\[
|S_{n,jk}^{[5]}(\beta)|^2
\le \Biggl\{ \sum_{i=1}^n [A_i^{-1/2}(G_i^{[1]}(\beta) - G_i^{[1]})]_j^2\, [\mathrm{diag}\{X_i H_n^{-1/2}\lambda\}]_j^2 \Biggr\}
\times \Biggl\{ \sum_{i=1}^n (h_{ijk}^{(0)})^2 \varepsilon_{ik}^2 (\lambda^T H_n^{-1/2} x_{ij})^2 \Biggr\}
\le Cn\gamma_n^{(0)}\tilde\gamma_n \cdot \Biggl\{ \lambda^T H_n^{-1/2} \Biggl( \sum_{i=1}^n (h_{ijk}^{(0)})^2 \varepsilon_{ik}^2\, x_{ij}x_{ij}^T \Biggr) H_n^{-1/2}\lambda \Biggr\}
\]
and $E(\sup_{\beta\in\widetilde B_n(r)} |S_{n,jk}^{[5]}(\beta)|^2) \le Cn(\tilde\gamma_n)^2 \le C$. The terms $S_{n,jk}^{[l]}(\beta)$, $l = 2,4,6$, can be treated by similar methods.

REFERENCES
Chen, K., Hu, I. and Ying, Z. (1999). Strong consistency of maximum quasi-likelihood
estimators in generalized linear models with fixed and adaptive designs. Ann. Statist.
27 1155–1163. MR1740117
Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the
maximum likelihood estimator in generalized linear models. Ann. Statist. 13 342–368.
MR773172
Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York. MR1284203
Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York. MR624435
Kaufmann, H. (1987). On the strong law of large numbers for multivariate martingales.
Stochastic Process. Appl. 26 73–85. MR917247
Lai, T. L., Robbins, H. and Wei, C. Z. (1979). Strong consistency of least squares
estimates in multiple regression. II. J. Multivariate Anal. 9 343–361. MR548786
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized
linear models. Biometrika 73 13–22. MR836430
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman
and Hall, London. MR727836
Rao, J. N. K. (1998). Marginal models for repeated observations: Inference with survey data. In Proc. Section on Survey Research Methods 76–82. Amer. Statist. Assoc.,
Alexandria, VA.
Schiopu-Kratina, I. (2003). Asymptotic results for generalized estimating equations with
data from complex surveys. Rev. Roumaine Math. Pures Appl. 48 327–342. MR2038208
Schott, J. R. (1997). Matrix Analysis for Statistics. Wiley, New York. MR1421574
Shao, J. (1992). Asymptotic theory in generalized linear models with nuisance scale parameters. Probab. Theory Related Fields 91 25–41. MR1142760
Shao, J. (1999). Mathematical Statistics. Springer, New York. MR1670883
Xie, M. and Yang, Y. (2003). Asymptotics for generalized estimating equations with
large cluster sizes. Ann. Statist. 31 310–347. MR1962509
Yuan, K.-H. and Jennrich, R. I. (1998). Asymptotics of estimating equations under
natural conditions. J. Multivariate Anal. 65 245–260. MR1625893
Department of Mathematics
and Statistics
University of Ottawa
Ottawa, Ontario
Canada K1N 6N5
e-mail: [email protected]
Statistics Canada
HSMD 16RHC
Ottawa, Ontario
Canada K1A 0T6
e-mail: [email protected]