ST 790, Homework 5 Solutions
Spring 2017
1. (a) In shorthand notation, let

\[
B_i = X_i^T(x_i, \beta)\, V_i^{-1}(\beta, \xi, x_i) = X_i^T V_i^{-1} \quad (p \times n).
\]

Then it is straightforward that if B_i = (B_{i1}, ..., B_{in}), then

\[
B_{ij} = \sum_{k=1}^{n} X_{ik} V_{ijk},
\]

where X_{ij} is the jth column of X_i^T and V_{ijk} is the (j, k) element of V_i^{-1}. With these definitions,
we can write the estimating equation as
\[
\sum_{i=1}^{m} \sum_{j=1}^{n} \frac{R_{ij}}{\pi_j(H_{i,j-1})}\, B_{ij}\{Z_{ij} - f_j(a_i, \beta)\}. \tag{1}
\]
Then for an individual who drops out at time j + 1, so for whom R_{i1} = 1, ..., R_{ij} = 1, R_{i,j+1} = 0, ..., R_{in} = 0, from (1), the summand becomes

\[
\sum_{k=1}^{j} \pi_k^{-1}(H_{i,k-1})\, B_{ik}\{Z_{ik} - f_k(a_i, \beta)\}
= \pi_1^{-1} B_{i1}\{Z_{i1} - f_1(a_i, \beta)\} + \pi_2^{-1}(H_{i1}) B_{i2}\{Z_{i2} - f_2(a_i, \beta)\} + \cdots + \pi_j^{-1}(H_{i,j-1}) B_{ij}\{Z_{ij} - f_j(a_i, \beta)\}.
\]
Note that B_{ij} defined above is different from X_i^{(j)T}(\beta)\{V_i^{(j)}(\beta, \xi, x_i)\}^{-1} defined on p. 267 of the notes.
(b) If all the λj (Hi,j−1 ) are correctly specified and MAR holds, then
\[
\pi_j(H_{i,j-1}) = \mathrm{pr}(R_{ij} = 1 \mid \widetilde{Z}_i, x_i) = \mathrm{pr}(R_{ij} = 1 \mid H_{i,j-1}).
\]
Then consider the ith summand, which from (1) is

\[
\sum_{j=1}^{n} \frac{R_{ij}}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij} - f_j(a_i, \beta)\},
\]

where we now emphasize that B_{ij}(x_i) depends on x_i (but not on \widetilde{Z}_i). Now for the jth term in the summand
\[
\begin{aligned}
E\left[\frac{R_{ij}}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij}-f_j(a_i,\beta)\}\,\Big|\,x_i\right]
&=E\left[E\left\{\frac{R_{ij}}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij}-f_j(a_i,\beta)\}\,\Big|\,\widetilde{Z}_i,x_i\right\}\,\Big|\,x_i\right]\\
&=E\left[\frac{E(R_{ij}\mid\widetilde{Z}_i,x_i)}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij}-f_j(a_i,\beta)\}\,\Big|\,x_i\right]\\
&=E\left[\frac{\mathrm{pr}(R_{ij}=1\mid\widetilde{Z}_i,x_i)}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij}-f_j(a_i,\beta)\}\,\Big|\,x_i\right]\\
&=E\left[\frac{\mathrm{pr}(R_{ij}=1\mid H_{i,j-1})}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij}-f_j(a_i,\beta)\}\,\Big|\,x_i\right]\\
&=E\left[\frac{\pi_j(H_{i,j-1})}{\pi_j(H_{i,j-1})}\, B_{ij}(x_i)\{Z_{ij}-f_j(a_i,\beta)\}\,\Big|\,x_i\right]\\
&=E\left[B_{ij}(x_i)\{E(Z_{ij}\mid x_i)-f_j(a_i,\beta)\}\right]=0 \tag{2}
\end{aligned}
\]

by MAR and the fact that the f_j(a_i, β) are correctly specified.
Applying this argument to each term in the summand yields the result.
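The mechanics of this argument can be illustrated with a small simulation (a sketch, not part of the original solution): under MAR dropout, the complete-case mean is biased, but weighting each observed term by the inverse of its (correctly specified) observation probability, as in (1), restores unbiasedness. All values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical two-occasion setting: Z1 always observed, Z2 subject to
# MAR dropout with probability depending only on the observed Z1
# (playing the role of the history H_{i,1}).
z1 = rng.normal(0.0, 1.0, n)
z2 = 1.0 + 0.5 * z1 + rng.normal(0.0, 1.0, n)     # true E(Z2) = 1.0

pi2 = 1.0 / (1.0 + np.exp(-(0.5 + z1)))           # pr(R2 = 1 | Z1), correctly specified
r2 = rng.uniform(size=n) < pi2

naive = z2[r2].mean()                 # complete-case mean: biased under MAR
ipw = np.mean(r2 * z2 / pi2)          # inverse-probability-weighted, as in (1)

print(f"true 1.000  naive {naive:.3f}  IPW {ipw:.3f}")
```

Because dropout depends on Z1 and Z2 is correlated with Z1, the complete-case mean overshoots the truth, while the weighted mean does not.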
2. Write R_i, Z_i, etc. for short here, so that

\[
\widehat{b}_i = D Z_i^T R_i^{-1}(Y_i - f_i).
\]

There are b̂_i in Z_i and f_i, but this doesn't matter to the following argument, as all we are trying to do is reexpress (9.85) as (9.87).

There are two things to show. We can simplify (9.85) to

\[
p(Y_i \mid x_i; \beta, \gamma, D) \approx (2\pi)^{-n_i/2}\, |R_i|^{-1/2}\, |D|^{-1/2}\, |D^{-1} + Z_i^T R_i^{-1} Z_i|^{-1/2}
\times \exp\{-(1/2)(Y_i - f_i)^T R_i^{-1}(Y_i - f_i) - (1/2)\,\widehat{b}_i^T D^{-1} \widehat{b}_i\}.
\]
Thus, to show the equivalence, we first want to show that

\[
|R_i|\,|D|\,|D^{-1} + Z_i^T R_i^{-1} Z_i| = |R_i + Z_i D Z_i^T|. \tag{3}
\]
This is straightforward by using standard matrix results. One way is to invoke the following identity: if A is (p × q) and B is (q × p), then

\[
|I_p + AB| = |I_q + BA|.
\]
We apply this result to

\[
|D^{-1} + Z_i^T R_i^{-1} Z_i| = |D^{-1}(I + D Z_i^T R_i^{-1} Z_i)| = |D|^{-1}\,|I + D Z_i^T R_i^{-1} Z_i|.
\]

The second term in the last expression is thus equal to

\[
|I + Z_i D Z_i^T R_i^{-1}| = |R_i + Z_i D Z_i^T|\,|R_i|^{-1}.
\]

Putting all together, we have

\[
|D^{-1} + Z_i^T R_i^{-1} Z_i| = |D|^{-1}\,|R_i + Z_i D Z_i^T|\,|R_i|^{-1}.
\]

Thus, from above, we can conclude (3).
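Identity (3) is also easy to check numerically; the following sketch (dimensions and matrices arbitrary, assuming positive definite R_i and D) does so with NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
ni, q = 5, 2                             # n_i observations, q random effects

Z = rng.normal(size=(ni, q))
A = rng.normal(size=(ni, ni))
R = A @ A.T + ni * np.eye(ni)            # positive definite R_i
B = rng.normal(size=(q, q))
D = B @ B.T + q * np.eye(q)              # positive definite D

Rinv, Dinv = np.linalg.inv(R), np.linalg.inv(D)
lhs = np.linalg.det(R) * np.linalg.det(D) * np.linalg.det(Dinv + Z.T @ Rinv @ Z)
rhs = np.linalg.det(R + Z @ D @ Z.T)
print(lhs, rhs)   # the two determinants agree up to rounding
```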
Now we need to deal with the term in the exponential. We want to show that

\[
(Y_i - f_i)^T R_i^{-1}(Y_i - f_i) + \widehat{b}_i^T D^{-1} \widehat{b}_i = (u_i - f_i)^T (R_i + Z_i D Z_i^T)^{-1}(u_i - f_i), \tag{4}
\]

where we have defined

\[
u_i = Y_i + Z_i \widehat{b}_i,
\]

so that (u_i − f_i) = (Y_i − h_i) and h_i is defined on page 322. We can write

\[
\widehat{b}_i = D Z_i^T R_i^{-1}(u_i - f_i - Z_i \widehat{b}_i),
\]
which leads to

\[
\widehat{b}_i + D Z_i^T R_i^{-1} Z_i \widehat{b}_i = D(D^{-1} + Z_i^T R_i^{-1} Z_i)\widehat{b}_i = D Z_i^T R_i^{-1}(u_i - f_i),
\]

so that we finally obtain

\[
\widehat{b}_i = (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1}(u_i - f_i).
\]
Note that we can write the left hand side of (4) as

\[
(u_i - f_i)^T R_i^{-1}(u_i - f_i) - \widehat{b}_i^T Z_i^T R_i^{-1}(u_i - f_i) - (u_i - f_i)^T R_i^{-1} Z_i \widehat{b}_i + \widehat{b}_i^T (Z_i^T R_i^{-1} Z_i + D^{-1})\widehat{b}_i.
\]
Now simplifying this and inserting the expression above for b̂_i, we obtain

\[
\begin{aligned}
&(u_i - f_i)^T R_i^{-1}(u_i - f_i) - (u_i - f_i)^T R_i^{-1} Z_i (Z_i^T R_i^{-1} Z_i + D^{-1})^{-1} Z_i^T R_i^{-1}(u_i - f_i)\\
&\quad - (u_i - f_i)^T R_i^{-1} Z_i (Z_i^T R_i^{-1} Z_i + D^{-1})^{-1} Z_i^T R_i^{-1}(u_i - f_i)\\
&\quad + (u_i - f_i)^T R_i^{-1} Z_i (Z_i^T R_i^{-1} Z_i + D^{-1})^{-1} (Z_i^T R_i^{-1} Z_i + D^{-1}) (Z_i^T R_i^{-1} Z_i + D^{-1})^{-1} Z_i^T R_i^{-1}(u_i - f_i).
\end{aligned}
\]

This simplifies further to

\[
(u_i - f_i)^T R_i^{-1}(u_i - f_i) - (u_i - f_i)^T R_i^{-1} Z_i (Z_i^T R_i^{-1} Z_i + D^{-1})^{-1} Z_i^T R_i^{-1}(u_i - f_i),
\]

which can be rewritten as

\[
(u_i - f_i)^T \{R_i^{-1} - R_i^{-1} Z_i (Z_i^T R_i^{-1} Z_i + D^{-1})^{-1} Z_i^T R_i^{-1}\}(u_i - f_i).
\]
Using standard matrix inversion results (the Woodbury identity), the middle term is equal to

\[
(R_i + Z_i D Z_i^T)^{-1}.
\]

Substituting this gives the desired result.
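Identity (4) can likewise be verified numerically, with b̂_i = D Z_i^T R_i^{-1}(Y_i − f_i) and u_i = Y_i + Z_i b̂_i as defined above (again with arbitrary positive definite matrices; a sketch, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(2)
ni, q = 6, 2

Z = rng.normal(size=(ni, q))
A = rng.normal(size=(ni, ni)); R = A @ A.T + ni * np.eye(ni)
Bm = rng.normal(size=(q, q));  D = Bm @ Bm.T + q * np.eye(q)
Y, f = rng.normal(size=ni), rng.normal(size=ni)

Rinv, Dinv = np.linalg.inv(R), np.linalg.inv(D)
bhat = D @ Z.T @ Rinv @ (Y - f)          # bhat_i = D Z_i' R_i^{-1} (Y_i - f_i)
u = Y + Z @ bhat                         # u_i = Y_i + Z_i bhat_i

lhs = (Y - f) @ Rinv @ (Y - f) + bhat @ Dinv @ bhat
rhs = (u - f) @ np.linalg.inv(R + Z @ D @ Z.T) @ (u - f)
print(lhs, rhs)   # equal up to rounding
```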
3. (a) This is a subject-specific generalized linear mixed effects model. Thus, (3) is the individual-specific probability of having moderate to severe onycholysis at time t_ij for subject i, so that exp(β0 + β1 t_ij + b_1i) is the odds and β0 + β1 t_ij + b_1i is the log odds of moderate to severe onycholysis for individual i at time t_ij. Accordingly, β0 + b_1i is the log odds at baseline (t_i1 = 0),
so that β0 is the average or “typical” value of the log odds in the population of individuals;
i.e., under the usual convention, the log odds for the “typical individual” who has the “typical”
value of log odds. There is no random effect associated with the change in log odds over
time, so that, for all i, the amount by which individual log odds changes from baseline after
one time unit is β1 , and β1 is the “typical” value of this change in log odds. Thus, β0 and β1
characterize the probability of moderate to severe onycholysis at any time tij for an individual
for whom the log odds of moderate to severe onycholysis at baseline and the change in log
odds per unit time are equal to the “typical” or average values in the population.
(b) This is a population-averaged model. Thus, (5) is a model for the overall population probability of having moderate to severe onycholysis at time tij for a randomly chosen individual
in the population. Thus, β0 and β1 characterize this overall population probability. β0 is thus
the log odds, and exp(β0 ) is the odds that a randomly chosen individual from the population
will have moderate to severe onycholysis at baseline. β1 is the change from baseline to 1 time unit post-baseline in the log odds that a randomly chosen individual will have moderate to severe onycholysis, and exp(β1) is the multiplicative factor by which the odds that a randomly chosen individual from the population will have moderate to severe onycholysis changes in one time unit.
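The contrast between the subject-specific β0 in (3) and the population-averaged β0 in (5) can be made concrete numerically: the marginal (population-averaged) probability E{expit(β0 + b_i)} is not expit(β0) when the random-effect variance is large. The sketch below uses values roughly like the quadrature estimates in part (c), with the covariate term ignored, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
expit = lambda x: 1.0 / (1.0 + np.exp(-x))

# Illustrative subject-specific values (roughly the quadrature fit in
# part (c), covariate ignored); not a substitute for the actual fit.
beta0, D = -1.6, 16.0
b = rng.normal(0.0, np.sqrt(D), 1_000_000)

p_typical = expit(beta0)                 # probability for the "typical" subject (b_i = 0)
p_marginal = expit(beta0 + b).mean()     # population-averaged probability

print(f"typical {p_typical:.3f}  marginal {p_marginal:.3f}")
```

The marginal probability is roughly twice the typical-subject probability here, so the implied marginal log odds (around −0.6) is far closer to zero than β0 = −1.6, the attenuation one expects when moving from subject-specific to population-averaged parameters.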
(c) On the course web page are two SAS programs with their corresponding log files and
output. The first fits (3)-(4), with a single “intercept” random effect, and the second fits (3)-(7)
with the population model (8), with “intercept” and “slope” random effects. Several different methods are used as implemented in proc glimmix, proc nlmixed, and the glimmix
macro – these include linearization about zero (“MQL” type method), linearization about current estimates of random effects (“PQL” type methods), full Laplace approximation, and full
adaptive quadrature with 10, 20, and 50 quadrature points.
For model (3)-(4), here are the results. The first six columns are using proc glimmix; the
Laplace and quadrature (with 10, 20, and 50 quadrature points) results in columns 3-6 are
almost identical to those from proc nlmixed. The seventh column shows the results from the
glimmix macro.
          “MQL”     “PQL”   Laplace   Quad-10   Quad-20   Quad-50   “PQL” (macro)
β0      -0.5477   -0.7318   -2.6624   -1.7190   -1.6971   -1.6972   -0.7624
β1      -0.1692   -0.2765   -0.3959   -0.3906   -0.3883   -0.3885   -0.2949
β2      -0.0697   -0.0971   -0.1457   -0.1432   -0.1424   -0.1424   -0.1023
D        2.4674    4.6348   20.7329   16.4896   16.0121   16.0349    5.3789
The results are not very consistent across methods. The approximate linearization methods
give results that are markedly different from those using the full Laplace approximation or
adaptive quadrature. The results for quadrature using 20 and 50 quadrature points are similar, suggesting that a sufficient number of points have been used to achieve good accuracy.
Ten quadrature points do not seem to be sufficient.
Clearly, the linearization approximations are not very accurate. This is a binary response,
and it is well-known that “PQL” approximation can lead to nontrivial bias in estimators for the
components of β and D. This is likely what is going on here. I would definitely use adaptive
quadrature in this kind of situation; it is trivial to perform and appears to yield good accuracy
with sufficient number of quadrature points. Even the full Laplace approximation, which is
equivalent to adaptive quadrature with 1 point, doesn’t do too badly.
Now consider the fit of the model (3)-(7) with two random effects. Here are the results; no
entries mean that the algorithm did not converge.

          “MQL”   “PQL”    Laplace    Quad-10   Quad-20   Quad-50   “PQL” (macro)
β0          –       –     -12.3377   -3.6437   -5.4770   -5.4770    0.2768
β1          –       –     -10.8985   -0.9031   -0.8856   -0.8856   -1.6057
β2          –       –      -1.6011   -0.2383   -0.5940   -0.5940   -0.5460
D11         –       –      5461.88   93.2093    179.98    179.98   84.9839
D12         –       –     -18.3139   -6.1531  -13.4215  -13.4215  -15.3465
D22         –       –       124.55    1.0921    2.1579    2.1579    6.6298
The results are all over the map. In fact, implementation of Laplace and the quadrature
methods using proc nlmixed instead did not necessarily give the same results (which could
be a starting value issue). It is hard to know what to believe here. Perhaps the results
with quadrature with 20 and 50 points, which agree, are the most reliable, but the fact that
different results were obtained with proc nlmixed with 50 points is worrisome. The data
analyst should be very wary. Probably what is going on is that the model with two random
effects is barely practically identifiable. Lack of agreement, failure to converge, and crazy
results usually reflect the need to reassess the model.

(d) The results for estimation of β using proc glimmix are β̂ = (−0.5794, −0.1714, −0.07786)^T. Not surprisingly, these
estimates are most similar to those from “MQL” for model (3)-(4), which linearizes about 0 so
is essentially fitting the same mean model. They are thus very different from those from the
most accurate quadrature fit of the subject-specific model (3)-(4). Clearly, the parameters
in the population-averaged and subject-specific models represent different quantities with
different interpretations.
4. Here is a plot of the data:

[Figure: individual concentration (mcg/ml) versus time (hours) profiles; concentrations range from 0 to about 1.5 mcg/ml over 0 to 35 hours.]
The individual-specific profiles appear to be well-represented by a one compartment model
with first order absorption and elimination. Thus, there are three parameters, fractional rate
of absorption ka , representing absorption characteristics; clearance Cl, representing elimination characteristics; and volume of distribution V , representing distribution characteristics.
Because of the known tendency for distributions of these parameters to be skewed, I parameterized the model in terms of β1 = log ka , β2 = log Cl, and β3 = log V . I adopted a subject-specific nonlinear mixed effects model in which these parameters vary across individuals (so
each has an associated random effect). I also assumed the power model for within-individual
variance and allowed the power δ to be estimated. Because there are enough time points
on each individual to fit the one compartment model separately to each, it would be possible
to do this, obtain residuals and predicted values for each individual, pool these across individuals, and plot residuals against predicted values to get a visual impression of the nature
of within-individual variance. You may have done this.
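For reference, the one compartment model with first order absorption and elimination mentioned above has the standard closed form C(t) = Dose·ka/{V(ka − ke)}·(e^{−ke t} − e^{−ka t}) with ke = Cl/V after a single oral dose. The sketch below (dose and parameter values hypothetical, complete absorption assumed) is not the course code.

```python
import numpy as np

def conc_1cpt(t, dose, ka, Cl, V):
    """Concentration at time t after a single oral dose under a
    one-compartment model with first-order absorption (rate ka) and
    first-order elimination (rate ke = Cl/V); complete absorption assumed."""
    ke = Cl / V
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# hypothetical subject-level parameter values on the original scale
t = np.linspace(0.0, 36.0, 73)
c = conc_1cpt(t, dose=500.0, ka=1.0, Cl=50.0, V=300.0)
print(round(c.max(), 3), "mcg/ml peak near", t[c.argmax()], "hours")
```

The resulting profile rises to a single peak a couple of hours post-dose and then decays, the shape seen in the plotted data.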
In the context of this model, the investigators are interested in characterizing the average values of ka , Cl, and V and whether or not these averages are systematically related to subject
characteristics. The model is parameterized with these on the log scale; for the purpose of
investigating associations, it is reasonable to operate on the transformed scale and consider
the evidence supporting whether or not the mean values of these on the transformed scales
are associated with these covariates. This is the standard approach to this sort of question.
To this end, I first fit this general model with a second stage model involving no individual-specific covariates and obtained the empirical Bayes estimates of the random effects and
plotted them against the covariates (box plots or scatter plots as appropriate, not shown).
These suggest possible relationships, particularly with weight, which I put on the log scale to
make the relationship appear linear, making it easier to model. I also dichotomized creatinine
clearance (see program) to obtain a binary indicator of renal impairment. You may have
made different choices.
I also fit the full blown model with all the covariates; I did not include creatinine clearance in log ka because physiologically it isn’t plausible that absorption is associated with renal impairment, but I could have included it. With all the stuff being estimated, some of the
impairment, but I could have included it. With all the stuff being estimated, some of the
estimates are probably not very precise, resulting in “high” p-values. The strong association
between log Cl and log V and weight is evident, as is weaker evidence for associations with
gender. You may have tried fitting several models before settling on a final model. The
important associations the investigators would like to know about then follow from what you
included in your final model.
Based on the final model, the investigators are interested in the average values of ka , Cl
and V in the population and if these are systematically associated with subject characteristics. There are lots of ways to interpret and address these questions. The models here
are parameterized with these parameters on the log scale. One possibility is to just report
average or “typical values” on the log scale, along with standard errors. So, for example, for
the model with β2i = log Cli depending on log weight and age, e.g.,
β2i = β20 + β21 log(weight) + β22 age + b2i ,
one might calculate the sample averages of log weight and age and report the estimated
average value of β2i ,
β20 + β21 log(weight) + β22 age,
evaluated at these. Or provide estimates for a range of values of weight and age. Of course,
it stands to reason that the investigators would like estimates of these quantities on their
original scales. In the above model, we thus want the expectation of Cli = exp(β2i ),
E(Cli ) = E{exp(β 2i )} = exp{β20 + β21 log(weight) + β22 age}E{exp(b2i )}
= exp{β20 + β21 log(weight) + β22 age} exp(D22 /2),
using the fact that b2i ∼ N (0, D22 ). An estimator is thus found by substituting the estimators
for β2 and D22 .
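The lognormal-mean factor exp(D22/2) used above is easy to confirm by simulation (the variance value here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

D22 = 0.09                               # hypothetical variance of b_2i
b = rng.normal(0.0, np.sqrt(D22), 2_000_000)

mc = np.exp(b).mean()                    # Monte Carlo estimate of E{exp(b_2i)}
closed = np.exp(D22 / 2.0)               # lognormal-mean formula
print(mc, closed)
```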
Of course, if we report an estimate, we’d better report a standard error to provide an assessment of the quality of the estimation procedure. To obtain a standard error to go along with this estimate, given the approximate large sample joint covariance matrix of β̂ and D̂, one can use the delta method applied to this and similar expressions to obtain an approximate standard error. (Under normality of everything, β̂ and D̂ are independent.) Unfortunately, as we have discussed in class, the large sample approximate sampling distribution of D̂ is not very reliable, and these are cumbersome to obtain from the software (if it provides them at all – nlme() has the apVar attribute that produces a large sample covariance matrix for the estimators of all the covariance and variance parameters in the model, but it is difficult to interpret; it is possible to get this out of nlinmix and nlmixed with some wrangling for the former).
It is thus customary to do one of two things. One is to pretend that the covariance parameters are “known” and treat the estimates as fixed, and apply the delta method to expressions like that above only in the elements of β̂. The other and most widely adopted is as follows. By a linear Taylor series about b_2i = 0 (its mean),

\[
Cl_i \approx \exp\{\beta_{20} + \beta_{21}\log(\text{weight}) + \beta_{22}\,\text{age}\} + \exp\{\beta_{20} + \beta_{21}\log(\text{weight}) + \beta_{22}\,\text{age}\}\, b_{2i},
\]

which implies E(Cl_i) ≈ exp{β20 + β21 log(weight) + β22 age}. Whether it is made explicit or not, this approximation to the “typical” clearance rate is what would ordinarily be done in practice and is the standard approximation used in PK and in interpretations in generalized linear mixed effects models as in the discussion of the epileptic seizure example in Chapter 9. One can then use the delta method to get standard errors based on the large sample covariance matrix of β̂.
In the program, we demonstrate implementation of the above approximations. We can extract the approximate covariance matrix for the sampling distribution of β̂ as shown in the program. For parameters on the log scale, let a be a vector of the same length as β̂ such that the quantity of interest can be represented as exp(a^T β̂). Then the delta method proceeds by a linear Taylor series in this quantity in β̂ about the “true value” of β; substituting the estimate β̂ for β, we have that var{exp(a^T β̂)} can be approximated/estimated by

\[
\exp(2 a^T \widehat{\beta})\,\{a^T \widehat{\mathrm{var}}(\widehat{\beta})\, a\},
\]

where var̂(β̂) is the estimated large sample covariance matrix of β̂. Calculation is shown in the R program.
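A sketch of this delta-method calculation (all numbers hypothetical; this is not the course R program):

```python
import numpy as np

def se_exp_linear(a, beta_hat, V):
    """Delta-method standard error of exp(a' beta_hat):
    var{exp(a' beta_hat)} ~ exp(2 a' beta_hat) (a' V a)."""
    return np.exp(a @ beta_hat) * np.sqrt(a @ V @ a)

# hypothetical estimates: intercept, log-weight slope, age slope for log Cl
beta_hat = np.array([1.2, 0.75, -0.01])
V = np.diag([0.004, 0.001, 1e-6])        # hypothetical covariance matrix of beta_hat
a = np.array([1.0, np.log(70.0), 40.0])  # evaluate at weight 70 kg, age 40

est = np.exp(a @ beta_hat)               # estimated "typical" clearance
se = se_exp_linear(a, beta_hat, V)
print(f"estimate {est:.2f}, delta-method SE {se:.2f}")
```

With a covariance matrix this small, the first-order delta-method standard error agrees closely with the sampling standard deviation of exp(a^T β̂) under the approximate normal distribution of β̂.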
You may have done something different. The important thing is that you recognized that you needed to do something to report at least approximate estimates and standard errors.