ST 790, Homework 3 Solutions
Spring 2017
1. Standard errors for covariance parameters.
(a) For brevity, let $V_{\xi_k,i} = \partial/\partial\xi_k\, V_i(\xi, x_i)$ and $V_{\xi_k\xi_\ell,i} = \partial^2/\partial\xi_k\partial\xi_\ell\, V_i(\xi, x_i)$, for $k, \ell = 1, \ldots, Q$; write $\hat V_i = V_i(\hat\xi, x_i)$, $\hat V_{\xi_k,i}$, and $\hat V_{\xi_k\xi_\ell,i}$ to denote evaluation at $\hat\xi$; and $V_{0i}$, $V_{\xi_k 0,i}$, and $V_{\xi_k\xi_\ell 0,i}$ to denote evaluation at $\xi_0$.
Invoking the standard M-estimation argument, we have
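\[
0 = m^{-1/2} \sum_{i=1}^m
\begin{pmatrix}
(Y_i - X_i\hat\beta)^T \hat V_i^{-1} \hat V_{\xi_1,i} \hat V_i^{-1} (Y_i - X_i\hat\beta) - \mathrm{tr}(\hat V_i^{-1} \hat V_{\xi_1,i}) \\
\vdots \\
(Y_i - X_i\hat\beta)^T \hat V_i^{-1} \hat V_{\xi_Q,i} \hat V_i^{-1} (Y_i - X_i\hat\beta) - \mathrm{tr}(\hat V_i^{-1} \hat V_{\xi_Q,i})
\end{pmatrix}
\]
\[
\approx m^{-1/2} \sum_{i=1}^m
\begin{pmatrix}
(Y_i - X_i\beta_0)^T V_{0i}^{-1} V_{\xi_1 0,i} V_{0i}^{-1} (Y_i - X_i\beta_0) - \mathrm{tr}(V_{0i}^{-1} V_{\xi_1 0,i}) \\
\vdots \\
(Y_i - X_i\beta_0)^T V_{0i}^{-1} V_{\xi_Q 0,i} V_{0i}^{-1} (Y_i - X_i\beta_0) - \mathrm{tr}(V_{0i}^{-1} V_{\xi_Q 0,i})
\end{pmatrix}
\tag{1}
\]
\[
{} + A_m\, m^{1/2}(\hat\beta - \beta_0) + E_m\, m^{1/2}(\hat\xi - \xi_0),
\]
where
\[
A_m = m^{-1} \sum_{i=1}^m
\begin{pmatrix}
-2 (Y_i - X_i\beta_0)^T V_{0i}^{-1} V_{\xi_1 0,i} V_{0i}^{-1} X_i \\
\vdots \\
-2 (Y_i - X_i\beta_0)^T V_{0i}^{-1} V_{\xi_Q 0,i} V_{0i}^{-1} X_i
\end{pmatrix},
\]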
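and $E_m$ is the $(Q \times Q)$ matrix with $(\ell, k)$ element
\[
m^{-1} \sum_{i=1}^m \Big\{ (Y_i - X_i\beta_0)^T \big( -V_{0i}^{-1} V_{\xi_k 0,i} V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} + V_{0i}^{-1} V_{\xi_k\xi_\ell 0,i} V_{0i}^{-1} - V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{\xi_k 0,i} V_{0i}^{-1} \big) (Y_i - X_i\beta_0)
- \mathrm{tr}\big( -V_{0i}^{-1} V_{\xi_k 0,i} V_{0i}^{-1} V_{\xi_\ell 0,i} + V_{0i}^{-1} V_{\xi_k\xi_\ell 0,i} \big) \Big\}.
\tag{2}
\]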
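It is clear that $A_m \xrightarrow{p} 0$, as the model for the mean is correctly specified. By the standard result that $E(Z^T A Z) = \mathrm{tr}(AV) = \mathrm{tr}(VA)$ for $Z$ with mean zero and covariance matrix $V$, it is clear that the expectation of a summand in (2) is
\[
\mathrm{tr}\big( -V_{0i}^{-1} V_{\xi_k 0,i} V_{0i}^{-1} V_{\xi_\ell 0,i} + V_{0i}^{-1} V_{\xi_k\xi_\ell 0,i} - V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{\xi_k 0,i} + V_{0i}^{-1} V_{\xi_k 0,i} V_{0i}^{-1} V_{\xi_\ell 0,i} - V_{0i}^{-1} V_{\xi_k\xi_\ell 0,i} \big)
= -\mathrm{tr}(V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{\xi_k 0,i}) = -\mathrm{tr}(V_{0i}^{-1} V_{\xi_k 0,i} V_{0i}^{-1} V_{\xi_\ell 0,i}).
\tag{3}
\]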
Thus, $E_m \xrightarrow{p} -\Lambda$, where $\Lambda$ is the $(Q \times Q)$ symmetric matrix whose $(\ell, k)$ element is
\[
\lim_{m\to\infty} m^{-1} \sum_{i=1}^m \mathrm{tr}(V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{\xi_k 0,i}).
\tag{4}
\]
Denote (1) by $C_m$. Assuming that $\Lambda$ is invertible, we can rewrite the above compactly as
\[
m^{1/2}(\hat\xi - \xi_0) \approx \Lambda^{-1} C_m.
\tag{5}
\]
Clearly, as the estimating equation is unbiased, $C_m$ has mean zero. By the hint in the problem statement with $\mu = 0$, it is straightforward to see that the covariance matrix of a summand in (1) has $(\ell, \ell)$ element
\[
2\,\mathrm{tr}(V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{0i} V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{0i}) = 2\,\mathrm{tr}(V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{\xi_\ell 0,i})
\]
and similarly $(\ell, k)$ element
\[
2\,\mathrm{tr}(V_{0i}^{-1} V_{\xi_\ell 0,i} V_{0i}^{-1} V_{\xi_k 0,i}).
\]
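It follows by the central limit theorem that
\[
C_m \xrightarrow{L} N(0, 2\Lambda),
\]
so that, by Slutsky's theorem and because $\Lambda^{-1}(2\Lambda)\Lambda^{-1} = 2\Lambda^{-1}$,
\[
m^{1/2}(\hat\xi - \xi_0) \xrightarrow{L} N(0, 2\Lambda^{-1}),
\]
where $\Lambda$ is the $(Q \times Q)$ symmetric matrix whose $(\ell, k)$ element is given in (4).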
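As a numerical illustration of the result in (a), the following minimal sketch evaluates the finite-$m$ version of (4) and the implied approximate standard errors. The function name `lambda_matrix`, the compound-symmetric covariance model $V_i = \sigma^2 I + \tau^2 J$, and all numerical values are assumptions made for illustration only; they are not part of the problem.

```python
import numpy as np

def lambda_matrix(V_list, dV_lists):
    """Average over individuals of tr(V_i^{-1} dV_{l,i} V_i^{-1} dV_{k,i}),
    i.e. the finite-m analogue of the (l, k) element in (4)."""
    Q = len(dV_lists[0])
    Lam = np.zeros((Q, Q))
    for V, dVs in zip(V_list, dV_lists):
        Vinv = np.linalg.inv(V)
        W = [Vinv @ dV for dV in dVs]          # V_i^{-1} V_{xi_k, i}
        for l in range(Q):
            for k in range(Q):
                Lam[l, k] += np.trace(W[l] @ W[k])
    return Lam / len(V_list)

# Hypothetical example: V = sigma2 * I + tau2 * J with xi = (sigma2, tau2),
# so the derivative matrices are I and J.
n, m = 4, 50
sigma2, tau2 = 1.5, 0.8
I, J = np.eye(n), np.ones((n, n))
V = sigma2 * I + tau2 * J

Lam = lambda_matrix([V] * m, [[I, J]] * m)
avar = 2.0 * np.linalg.inv(Lam)        # covariance of m^{1/2}(xi_hat - xi_0) under normality
se = np.sqrt(np.diag(avar) / m)        # approximate standard errors of xi_hat
```

With a single variance parameter and $V_i = \sigma^2 I_n$, the same function gives $\Lambda = n/\sigma^4$, so the approximate variance of $\hat\sigma^2$ is $2\sigma^4/(mn)$, the familiar result under normality.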
(b) From the above general argument, we can start from (5). Here, from (1), $C_m$ becomes
\[
C_m = m^{-1/2} \sum_{i=1}^m \{(Y_i - X_i\beta_0)^2 V_{0i}^{-1} - 1\}
\begin{pmatrix}
V_{0i}^{-1} V_{\xi_1 0,i} \\
\vdots \\
V_{0i}^{-1} V_{\xi_Q 0,i}
\end{pmatrix},
\tag{6}
\]
and, from (4), $\Lambda$ reduces to the $(Q \times Q)$ symmetric matrix whose $(\ell, k)$ element is
\[
\lim_{m\to\infty} m^{-1} \sum_{i=1}^m V_{0i}^{-2} V_{\xi_\ell 0,i} V_{\xi_k 0,i}.
\tag{7}
\]
Because now
\[
\mathrm{var}\{(Y_i - X_i\beta_0)^2 V_{0i}^{-1} - 1\} = 2 + \kappa,
\]
a summand of (6) has mean zero and covariance matrix $(2 + \kappa)$ times the symmetric $(Q \times Q)$ matrix with $(\ell, k)$ element
\[
V_{0i}^{-2} V_{\xi_\ell 0,i} V_{\xi_k 0,i},
\]
so that the central limit theorem gives
\[
C_m \xrightarrow{L} N\{0, (2 + \kappa)\Lambda\}.
\]
It follows that
\[
m^{1/2}(\hat\xi - \xi_0) \xrightarrow{L} N\{0, (2 + \kappa)\Lambda^{-1}\},
\]
as required.
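The variance identity used in (b) can be checked by simulation. The sketch below assumes that $\kappa$ denotes excess kurtosis, so that $E\{(Y_i - X_i\beta_0)^4\}/V_{0i}^2 = 3 + \kappa$ and hence $\mathrm{var}(Z^2 - 1) = E(Z^4) - 1 = 2 + \kappa$ for a standardized deviation $Z$; this reading is consistent with $\kappa = 0$ recovering the constant 2 in (a). The helper name, seed, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(790)
N = 400_000

def var_of_standardized_square(z):
    """Sample variance of z^2 - 1 for draws z with mean 0 and variance 1."""
    return np.var(z**2 - 1.0)

z_norm = rng.standard_normal(N)                        # excess kurtosis 0
z_unif = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)   # variance 1, excess kurtosis -1.2

v_norm = var_of_standardized_square(z_norm)   # theory: 2 + 0
v_unif = var_of_standardized_square(z_unif)   # theory: 2 - 1.2
```

The uniform case shows how far the leading constant can move from 2 when the true distribution is not normal.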
This problem demonstrates that it is possible to derive standard errors for the components of $\hat\xi$, most notably those corresponding to the distinct elements of $D$ in a subject-specific linear mixed effects model under the assumption of normality. However, the form of these standard errors depends heavily on the assumption of normality, as the leading constant “2” holds only when normality holds. If normality doesn’t hold, the resulting standard errors could seriously misrepresent the uncertainty in estimation of $\xi$. The result in (b) suggests we might try to estimate $\kappa$, but estimation of higher-order moments (beyond the second) is very difficult; moreover, for $n_i > 1$, the large sample covariance matrix of $\hat\xi$ could also depend on the coefficient of skewness of the true distribution.
This is an example of the more general phenomenon that inference on second moments is
harder and more dependent on the form of the true distribution of the data than inference on
the mean.
2. Balanced data. We have
\[
X_i = (\,Z^* \;\; 0\,), \quad V_i = V_1^* = Z^* D_1 Z^{*T} + \sigma^2 I_n, \quad i = 1, \ldots, m_1,
\]
\[
X_i = (\,0 \;\; Z^*\,), \quad V_i = V_2^* = Z^* D_2 Z^{*T} + \sigma^2 I_n, \quad i = m_1 + 1, \ldots, m.
\]
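First, note that
\[
X_i^T V_i^{-1} X_i =
\begin{pmatrix}
Z^{*T} V_1^{*-1} Z^* & 0 \\
0 & 0
\end{pmatrix},
\quad i = 1, \ldots, m_1,
\]
\[
X_i^T V_i^{-1} X_i =
\begin{pmatrix}
0 & 0 \\
0 & Z^{*T} V_2^{*-1} Z^*
\end{pmatrix},
\quad i = m_1 + 1, \ldots, m.
\]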
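Thus, it is straightforward to see that
\[
\sum_{i=1}^m X_i^T V_i^{-1} X_i =
\begin{pmatrix}
m_1 Z^{*T} V_1^{*-1} Z^* & 0 \\
0 & m_2 Z^{*T} V_2^{*-1} Z^*
\end{pmatrix},
\]
so that
\[
\left( \sum_{i=1}^m X_i^T V_i^{-1} X_i \right)^{-1} =
\begin{pmatrix}
m_1^{-1} (Z^{*T} V_1^{*-1} Z^*)^{-1} & 0 \\
0 & m_2^{-1} (Z^{*T} V_2^{*-1} Z^*)^{-1}
\end{pmatrix}.
\]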
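Similarly,
\[
\sum_{i=1}^m X_i^T V_i^{-1} Y_i =
\begin{pmatrix}
Z^{*T} V_1^{*-1} \sum_{i=1}^{m_1} Y_i \\
0
\end{pmatrix}
+
\begin{pmatrix}
0 \\
Z^{*T} V_2^{*-1} \sum_{i=m_1+1}^{m} Y_i
\end{pmatrix}
=
\begin{pmatrix}
m_1 Z^{*T} V_1^{*-1} \bar Y_1 \\
m_2 Z^{*T} V_2^{*-1} \bar Y_2
\end{pmatrix},
\]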
where $\bar Y_k$, $k = 1, 2$, are the obvious means of the $Y_i$. Thus, substituting these results and multiplying out, we get
\[
\hat\beta =
\begin{pmatrix}
(Z^{*T} V_1^{*-1} Z^*)^{-1} Z^{*T} V_1^{*-1} \bar Y_1 \\
(Z^{*T} V_2^{*-1} Z^*)^{-1} Z^{*T} V_2^{*-1} \bar Y_2
\end{pmatrix}.
\]
By entirely similar calculations, we have
\[
\hat\beta_{OLS} =
\begin{pmatrix}
(Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_1 \\
(Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_2
\end{pmatrix}.
\]
Thus, we need to show for $k = 1, 2$, that
\[
(Z^{*T} V_k^{*-1} Z^*)^{-1} Z^{*T} V_k^{*-1} \bar Y_k = (Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k.
\]
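The reduction of the generalized least squares estimator to the group-mean form above can be checked numerically. The sketch below builds the sums over individuals directly and compares them with the closed form; the dimensions, the random $D_k$, the value of $\sigma^2$, and the simulated responses are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, m1, m2 = 5, 2, 6, 4                      # hypothetical sizes
sigma2 = 0.7
Zs = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # Z*: intercept and time

def random_cov(q):
    A = rng.standard_normal((q, q))
    return A @ A.T + np.eye(q)                 # positive definite D_k

D = [random_cov(q), random_cov(q)]
V = [Zs @ Dk @ Zs.T + sigma2 * np.eye(n) for Dk in D]
zero = np.zeros((n, q))
X = [np.hstack([Zs, zero]), np.hstack([zero, Zs])]   # X_i for groups 1 and 2

groups = [0] * m1 + [1] * m2
Y = [rng.standard_normal(n) for _ in groups]         # arbitrary responses

# GLS estimator from the sums over individuals
XtVX = sum(X[g].T @ np.linalg.solve(V[g], X[g]) for g in groups)
XtVY = sum(X[g].T @ np.linalg.solve(V[g], y) for g, y in zip(groups, Y))
beta_gls = np.linalg.solve(XtVX, XtVY)

# Closed form based on the group means
Ybar = [np.mean([y for g, y in zip(groups, Y) if g == k], axis=0) for k in (0, 1)]
beta_blocks = np.concatenate([
    np.linalg.solve(Zs.T @ np.linalg.solve(V[k], Zs),
                    Zs.T @ np.linalg.solve(V[k], Ybar[k]))
    for k in (0, 1)
])
```

The two vectors `beta_gls` and `beta_blocks` agree up to floating-point error, reflecting the block-diagonal structure of $\sum_i X_i^T V_i^{-1} X_i$.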
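By the matrix inversion formula in Appendix A, we have
\[
\begin{aligned}
V_k^{*-1} &= (Z^* D_k Z^{*T} + \sigma^2 I_n)^{-1} \\
&= \sigma^{-2} I_n - \sigma^{-4} Z^* (D_k^{-1} + Z^{*T} Z^* \sigma^{-2})^{-1} Z^{*T} \\
&= \sigma^{-2} [\, I_n + Z^* \{-(\sigma^2 D_k^{-1} + Z^{*T} Z^*)\}^{-1} Z^{*T} \,].
\end{aligned}
\]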
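For reference, the identity being applied here is presumably the Woodbury form of the matrix inversion formula. Appendix A is not reproduced in this document, so the generic symbols $A$, $U$, $C$, $W$ below are an assumed notation rather than the appendix's own.

```latex
\[
(A + U C W)^{-1} = A^{-1} - A^{-1} U \,(C^{-1} + W A^{-1} U)^{-1}\, W A^{-1},
\qquad
A = \sigma^2 I_n,\quad U = Z^*,\quad C = D_k,\quad W = Z^{*T}.
\]
```

Substituting these choices gives $\sigma^{-2} I_n - \sigma^{-4} Z^* (D_k^{-1} + \sigma^{-2} Z^{*T} Z^*)^{-1} Z^{*T}$, which is the second line of the display.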
Thus,
\[
(Z^{*T} V_k^{*-1} Z^*) = \sigma^{-2} [\, Z^{*T} Z^* + (Z^{*T} Z^*) \{-(\sigma^2 D_k^{-1} + Z^{*T} Z^*)\}^{-1} (Z^{*T} Z^*) \,].
\]
Now apply the matrix inversion formula again to this expression:
\[
\begin{aligned}
(Z^{*T} V_k^{*-1} Z^*)^{-1} &= \sigma^2 [\, (Z^{*T} Z^*)^{-1} - (Z^{*T} Z^*)^{-1} (Z^{*T} Z^*) \\
&\qquad \times \{-\sigma^2 D_k^{-1} - (Z^{*T} Z^*) + (Z^{*T} Z^*)(Z^{*T} Z^*)^{-1}(Z^{*T} Z^*)\}^{-1} (Z^{*T} Z^*)(Z^{*T} Z^*)^{-1} \,] \\
&= \sigma^2 \{(Z^{*T} Z^*)^{-1} + \sigma^{-2} D_k\}.
\end{aligned}
\]
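Likewise,
\[
\begin{aligned}
Z^{*T} V_k^{*-1} \bar Y_k &= \sigma^{-2} Z^{*T} \{ I_n - Z^* (\sigma^2 D_k^{-1} + Z^{*T} Z^*)^{-1} Z^{*T} \} \bar Y_k \\
&= \sigma^{-2} \{ Z^{*T} \bar Y_k - (Z^{*T} Z^*)(\sigma^2 D_k^{-1} + Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k \}.
\end{aligned}
\]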
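Combining and multiplying, we have
\[
\begin{aligned}
(Z^{*T} V_k^{*-1} Z^*)^{-1} Z^{*T} V_k^{*-1} \bar Y_k
&= \{(Z^{*T} Z^*)^{-1} + \sigma^{-2} D_k\} \{ Z^{*T} \bar Y_k - (Z^{*T} Z^*)(\sigma^2 D_k^{-1} + Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k \} \\
&= (Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k - \{\sigma^2 D_k^{-1} + (Z^{*T} Z^*)\}^{-1} Z^{*T} \bar Y_k \\
&\qquad + \sigma^{-2} D_k Z^{*T} \bar Y_k - \sigma^{-2} D_k (Z^{*T} Z^*) \{\sigma^2 D_k^{-1} + (Z^{*T} Z^*)\}^{-1} Z^{*T} \bar Y_k \\
&= (Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k + \Big[ -\{\sigma^2 D_k^{-1} + (Z^{*T} Z^*)\}^{-1} + \sigma^{-2} D_k \\
&\qquad - \sigma^{-2} D_k (Z^{*T} Z^*) \{\sigma^2 D_k^{-1} + (Z^{*T} Z^*)\}^{-1} \Big] Z^{*T} \bar Y_k.
\end{aligned}
\]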
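The second term on the right hand side can be written as
\[
\begin{aligned}
&\Big[ -I + \sigma^{-2} D_k (\sigma^2 D_k^{-1} + Z^{*T} Z^*) - \sigma^{-2} D_k (Z^{*T} Z^*) \Big] (\sigma^2 D_k^{-1} + Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k \\
&\quad = \Big[ -I + I + \sigma^{-2} D_k (Z^{*T} Z^*) - \sigma^{-2} D_k (Z^{*T} Z^*) \Big] (\sigma^2 D_k^{-1} + Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k = 0,
\end{aligned}
\]
where $I$ is the identity matrix of the same dimension as $D_k$.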
Thus, we have demonstrated that
\[
(Z^{*T} V_k^{*-1} Z^*)^{-1} Z^{*T} V_k^{*-1} \bar Y_k = (Z^{*T} Z^*)^{-1} Z^{*T} \bar Y_k, \quad k = 1, 2,
\]
as required.
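The two key identities in this argument can also be confirmed numerically for a single group $k$. The sketch below uses an arbitrary design matrix, a randomly generated positive definite $D_k$, and an arbitrary vector standing in for $\bar Y_k$; all of these values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, sigma2 = 6, 2, 1.3                         # hypothetical values
Zs = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])   # Z*
A = rng.standard_normal((q, q))
Dk = A @ A.T + 0.5 * np.eye(q)                   # positive definite D_k
Vk = Zs @ Dk @ Zs.T + sigma2 * np.eye(n)         # V_k*
ybar = rng.standard_normal(n)                    # stands in for the group mean

ZtZ = Zs.T @ Zs
ZtVinvZ = Zs.T @ np.linalg.solve(Vk, Zs)

# Identity: (Z*^T V_k*^{-1} Z*)^{-1} = sigma^2 (Z*^T Z*)^{-1} + D_k
lhs = np.linalg.inv(ZtVinvZ)
rhs = sigma2 * np.linalg.inv(ZtZ) + Dk

# GLS and OLS estimators based on the same group mean
beta_gls = np.linalg.solve(ZtVinvZ, Zs.T @ np.linalg.solve(Vk, ybar))
beta_ols = np.linalg.solve(ZtZ, Zs.T @ ybar)
```

Here `lhs` matches `rhs` and `beta_gls` matches `beta_ols` up to rounding, whatever positive definite $D_k$ is drawn, which is the content of the result: with balanced data and $X_i$ built from $Z^*$, the weighting by $V_k^{*-1}$ has no effect on the estimator.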
3. Age-related macular degeneration clinical trial, continued. Figure 5.4 is a bit of a mess, but
plots of random subsets of the data in each treatment group suggest that it is reasonable
to posit a hierarchical model where individual-specific trajectories follow a straight line relationship, with visual acuity measures pretty variable within some subjects. This would of
course lead to a population mean model that is also linear. That is, letting $Y_{ij}$ be the acuity for individual $i$ at time $t_{ij}$, we can posit an individual-level model as
\[
Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}.
\]
You probably adopted this as your basic individual-level model.
The first thing you probably did was to investigate assumptions on the within- and among-individual sources of variation and correlation. As noted in class, a common approach for this purpose is to adopt a basic population model that does not include relationships between the individual-specific parameters ($\beta_{0i}$ and $\beta_{1i}$ here) and among-individual covariates, beyond an obvious categorical covariate like treatment group in a randomized study. Under this approach, it would be reasonable to adopt the initial model
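\[
\begin{aligned}
\beta_{0i} &= \beta_{00}(1 - \delta_i) + \beta_{01}\delta_i + b_{0i}, \\
\beta_{1i} &= \beta_{10}(1 - \delta_i) + \beta_{11}\delta_i + b_{1i},
\end{aligned}
\tag{8}
\]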
say. Given that this is a randomized study, a simplification would be to take the typical
intercept to be the same in both treatment groups; i.e.,
\[
\beta_{0i} = \beta_0 + b_{0i}.
\]
In fact, we can test for this in the preferred model (and there is little evidence that the intercepts are different). You may have adopted a fancier model involving baseline lesion severity
from the start and done this in the context of this model; that’s fine, too.
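To make the structure of the individual-level model and the population model (8) concrete, the following sketch simulates data from them. Every numerical value (visit times, fixed effects, $D$, within-individual standard deviation, sample size) is hypothetical and chosen only for illustration, and the within-individual deviations are taken to be independent here as a simplification.

```python
import numpy as np

rng = np.random.default_rng(2017)

# All numerical values below are hypothetical, chosen only for illustration.
times = np.array([4.0, 12.0, 24.0, 52.0])      # assumed unequally spaced visit times
m = 200                                        # number of individuals
delta = rng.integers(0, 2, size=m)             # treatment indicator delta_i (0 or 1)
beta00, beta01 = 55.0, 55.0                    # typical intercepts by group
beta10, beta11 = -0.20, -0.30                  # typical slopes by group
D = np.array([[150.0, -0.5], [-0.5, 0.05]])    # var(b_i), common to both groups
sigma_e = 7.0                                  # within-individual standard deviation

b = rng.multivariate_normal(np.zeros(2), D, size=m)          # rows are (b_0i, b_1i)
beta0i = beta00 * (1 - delta) + beta01 * delta + b[:, 0]     # first line of (8)
beta1i = beta10 * (1 - delta) + beta11 * delta + b[:, 1]     # second line of (8)
e = sigma_e * rng.standard_normal((m, times.size))           # independent e_ij (simplification)
Y = beta0i[:, None] + beta1i[:, None] * times[None, :] + e   # Y_ij, one row per individual
```

Setting `beta00 == beta01`, as done here, corresponds to the simplification of a common typical intercept in both treatment groups.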
In the attached programs, we investigate different assumptions on $\mathrm{var}(e_i \mid x_i)$ and $\mathrm{var}(b_i \mid x_i)$ under this basic model. Using SAS, we can investigate models with a different $D$ matrix for each treatment group for $\mathrm{var}(b_i \mid x_i)$ (which can’t be done with lme in R). The time points are not equally spaced, so if we want to investigate within-individual correlation, we need to use something like exponential correlation. I tried fitting this including a “nugget” component for measurement error, but the fit either would not converge or led to a non-positive definite $D$ matrix, suggesting overparameterization. Given that visual acuity is a count, measurement error is probably negligible, so that within-individual variation is due primarily to the realization process. Based on the information criteria for the models fitted in the SAS program, I settled on the model with common $D$ and common within-individual exponential correlation structure.
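The marginal covariance matrix implied by a common $D$ and an exponential within-individual correlation structure can be sketched as follows. The parameterization $\Gamma_{jk} = \exp(-|t_j - t_k|/\rho)$ is one common form of exponential correlation and is an assumption here, as are the function name `marginal_cov` and all numerical values; the attached SAS program may parameterize the structure differently.

```python
import numpy as np

def marginal_cov(times, D, sigma2, rho):
    """V_i = Z_i D Z_i^T + sigma2 * Gamma, with Gamma_jk = exp(-|t_j - t_k| / rho).

    Z_i has columns (1, t_ij), matching a random intercept and slope."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])
    Gamma = np.exp(-np.abs(t[:, None] - t[None, :]) / rho)   # handles unequal spacing
    return Z @ D @ Z.T + sigma2 * Gamma

# Hypothetical values for illustration
times = [4.0, 12.0, 24.0, 52.0]
D = np.array([[150.0, -0.5], [-0.5, 0.05]])
V = marginal_cov(times, D, sigma2=49.0, rho=10.0)
```

Because the correlation depends only on the distance $|t_j - t_k|$, the same function applies to any set of observation times, which is why this structure suits unequally spaced visits.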
Interpretation of the questions in (i)-(iv) should be from a subject-specific (SS) perspective. In (iii) and (iv), the questions involve whether or not the representations of individual intercept (baseline acuity) and slope in (8) should be modified to include systematic dependence on lesion grade. You might have decided to look at this first and then answer (i) and (ii) or answer
(i) and (ii) using the basic model; either strategy is fine. Basing these on the basic model
effectively averages across the distribution of lesion grade in the population of patients and
thereby could be viewed as providing a more “global” analysis. In the attached programs,
we consider (i) and (ii) under the basic model and then tackle (iii) and (iv). See the output for
results.
You should have explained the process you went through to arrive at the models you selected
and explained how you formulated and addressed the questions of interest in terms of your
models. You should have also explained any diagnostic plots and analyses you constructed
and commented on what your analyses imply about the reliability of the model assumptions
(normality, etc).