ST 790, Homework 4 Solutions
Spring 2017
1. (a) We suppress the dependence on β and ξ for brevity and write the numerator of (1) as
$$p(y_i|x_i, b_i)\,p(b_i) = (2\pi)^{-n_i/2}(2\pi)^{-q/2}|R_i|^{-1/2}|D|^{-1/2} \exp\{-(1/2)(Y_i - X_i\beta - Z_i b_i)^T R_i^{-1}(Y_i - X_i\beta - Z_i b_i) - (1/2)\, b_i^T D^{-1} b_i\}.$$
First, it is straightforward to demonstrate, by plugging into the formula in Appendix A, that
$$V_i^{-1} = (Z_i D Z_i^T + R_i)^{-1} = R_i^{-1} - R_i^{-1} Z_i (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1} = R_i^{-1} - R_i^{-1} Z_i A_i^{-1} Z_i^T R_i^{-1},$$
where
$$A_i = D^{-1} + Z_i^T R_i^{-1} Z_i.$$
The expression in the exponent multiplied by −2 can be written as
$$\begin{aligned}
&(Y_i - X_i\beta)^T R_i^{-1}(Y_i - X_i\beta) - (Y_i - X_i\beta)^T R_i^{-1} Z_i b_i - b_i^T Z_i^T R_i^{-1}(Y_i - X_i\beta) + b_i^T Z_i^T R_i^{-1} Z_i b_i + b_i^T D^{-1} b_i \\
&= b_i^T (D^{-1} + Z_i^T R_i^{-1} Z_i) b_i + (Y_i - X_i\beta)^T \{R_i^{-1} - R_i^{-1} Z_i (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1}\}(Y_i - X_i\beta) \\
&\quad + (Y_i - X_i\beta)^T R_i^{-1} Z_i (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta) \\
&\quad - (Y_i - X_i\beta)^T R_i^{-1} Z_i b_i - b_i^T Z_i^T R_i^{-1}(Y_i - X_i\beta) \\
&= (Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta) + b_i^T A_i b_i - b_i^T A_i A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta) \\
&\quad - (Y_i - X_i\beta)^T R_i^{-1} Z_i A_i^{-1} A_i b_i + (Y_i - X_i\beta)^T R_i^{-1} Z_i A_i^{-1} A_i A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta) \\
&= (Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta) + \{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}^T A_i \{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}.
\end{aligned}$$
Thus,
$$p(y_i|x_i, b_i)\,p(b_i) = (2\pi)^{-n_i/2}(2\pi)^{-q/2}|R_i|^{-1/2}|D|^{-1/2} \exp[-(1/2)(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta) - (1/2)\{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}^T A_i \{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}].$$
The denominator of (1) is thus
$$\begin{aligned}
\int p(y_i|x_i, b_i)\,p(b_i)\,db_i &= (2\pi)^{-n_i/2}(2\pi)^{-q/2}|R_i|^{-1/2}|D|^{-1/2} \exp\{-(1/2)(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta)\} \\
&\quad \times \int \exp[-(1/2)\{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}^T A_i \{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}]\,db_i \\
&= (2\pi)^{-n_i/2}(2\pi)^{-q/2}|R_i|^{-1/2}|D|^{-1/2} \exp\{-(1/2)(Y_i - X_i\beta)^T V_i^{-1}(Y_i - X_i\beta)\}\,|A_i|^{-1/2}(2\pi)^{q/2}.
\end{aligned}$$
It follows that (1) can be written as
$$p(b_i|y_i, x_i) = (2\pi)^{-q/2}|A_i|^{1/2} \exp[-(1/2)\{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}^T A_i \{b_i - A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)\}],$$
showing that $p(b_i|y_i, x_i)$ is normal with mean/mode $A_i^{-1} Z_i^T R_i^{-1}(Y_i - X_i\beta)$ and covariance matrix $A_i^{-1}$. Again using the results in Appendix A,
$$A_i^{-1} = (D^{-1} + Z_i^T R_i^{-1} Z_i)^{-1} = D - D Z_i^T (Z_i D Z_i^T + R_i)^{-1} Z_i D = D - D Z_i^T V_i^{-1} Z_i D.$$
It follows that
$$\begin{aligned}
A_i^{-1} Z_i^T R_i^{-1} &= \{D - D Z_i^T (Z_i D Z_i^T + R_i)^{-1} Z_i D\} Z_i^T R_i^{-1} \\
&= D Z_i^T \{I_{n_i} - (Z_i D Z_i^T + R_i)^{-1} Z_i D Z_i^T\} R_i^{-1} \\
&= D Z_i^T (Z_i D Z_i^T + R_i)^{-1} \{Z_i D Z_i^T + R_i - Z_i D Z_i^T\} R_i^{-1} \\
&= D Z_i^T V_i^{-1},
\end{aligned}$$
showing that the mean/mode can be written as $D Z_i^T V_i^{-1}(Y_i - X_i\beta)$, as required, and the covariance matrix is $D - D Z_i^T V_i^{-1} Z_i D$.
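The matrix identities above are easy to spot-check numerically. The following numpy sketch (illustrative only; the attached course programs are in R, and all dimensions and matrices here are made up for the check) verifies the Woodbury-type expression for $V_i^{-1}$, the equivalence $A_i^{-1}Z_i^T R_i^{-1} = DZ_i^T V_i^{-1}$, and the covariance identity $A_i^{-1} = D - DZ_i^T V_i^{-1} Z_i D$ on random symmetric positive definite matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
ni, q = 5, 2

# Random SPD matrices standing in for R_i (n_i x n_i) and D (q x q), random Z_i
M = rng.normal(size=(ni, ni)); R = M @ M.T + ni * np.eye(ni)
N = rng.normal(size=(q, q));   D = N @ N.T + q * np.eye(q)
Z = rng.normal(size=(ni, q))

V = Z @ D @ Z.T + R
Rinv = np.linalg.inv(R)
A = np.linalg.inv(D) + Z.T @ Rinv @ Z          # A_i = D^{-1} + Z^T R^{-1} Z

# Woodbury: V^{-1} = R^{-1} - R^{-1} Z A^{-1} Z^T R^{-1}
Vinv_woodbury = Rinv - Rinv @ Z @ np.linalg.inv(A) @ Z.T @ Rinv
assert np.allclose(Vinv_woodbury, np.linalg.inv(V))

# Posterior mean matrix: A^{-1} Z^T R^{-1} = D Z^T V^{-1}
assert np.allclose(np.linalg.inv(A) @ Z.T @ Rinv,
                   D @ Z.T @ np.linalg.inv(V))

# Posterior covariance: A^{-1} = D - D Z^T V^{-1} Z D
assert np.allclose(np.linalg.inv(A),
                   D - D @ Z.T @ np.linalg.inv(V) @ Z @ D)
```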
(b) We have
$$\hat{b}_i = D Z_i^T V_i^{-1}(Y_i - X_i\hat{\beta}), \qquad \hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$
in the “stacked” notation. Let $\beta_0$ be the true value of $\beta$, so that $E(Y_i|\tilde{x}) = X_i\beta_0$ and $E(Y|\tilde{x}) = X\beta_0$. Easily,
$$E(\hat{b}_i|\tilde{x}) = D Z_i^T V_i^{-1}\{X_i\beta_0 - X_i(X^T V^{-1} X)^{-1} X^T V^{-1} X\beta_0\} = 0.$$
Thus $\mathrm{var}(\hat{b}_i|\tilde{x}) = E(\hat{b}_i\hat{b}_i^T|\tilde{x})$, which equals
$$E\{D Z_i^T V_i^{-1}(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T V_i^{-1} Z_i D\}.$$
Now
$$\begin{aligned}
E\{(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T\} &= E[\{Y_i - X_i\beta_0 - X_i(\hat{\beta} - \beta_0)\}\{Y_i - X_i\beta_0 - X_i(\hat{\beta} - \beta_0)\}^T] \\
&= E\{(Y_i - X_i\beta_0)(Y_i - X_i\beta_0)^T\} - E\{(Y_i - X_i\beta_0)(\hat{\beta} - \beta_0)^T\} X_i^T \\
&\quad - X_i\, E\{(\hat{\beta} - \beta_0)(Y_i - X_i\beta_0)^T\} + X_i\, E\{(\hat{\beta} - \beta_0)(\hat{\beta} - \beta_0)^T\} X_i^T,
\end{aligned}$$
where
$$E\{(Y_i - X_i\beta_0)(Y_i - X_i\beta_0)^T\} = V_i.$$
Also
$$\hat{\beta} - \beta_0 = (X^T V^{-1} X)^{-1} X^T V^{-1}(Y - X\beta_0),$$
so that
$$\begin{aligned}
E\{(Y_i - X_i\beta_0)(\hat{\beta} - \beta_0)^T\} &= E\left[(Y_i - X_i\beta_0)(Y - X\beta_0)^T V^{-1} X (X^T V^{-1} X)^{-1}\right] \\
&= E\left[(Y_i - X_i\beta_0)\{(Y_1 - X_1\beta_0)^T, \ldots, (Y_m - X_m\beta_0)^T\}\right] \mathrm{diag}(V_1^{-1}, \ldots, V_m^{-1}) \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} (X^T V^{-1} X)^{-1} \\
&= (0, \cdots, V_i, \cdots, 0)\, \mathrm{diag}(V_1^{-1}, \ldots, V_m^{-1}) \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} (X^T V^{-1} X)^{-1} \\
&= (0, \cdots, I_{n_i}, \cdots, 0) \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} (X^T V^{-1} X)^{-1} \\
&= X_i (X^T V^{-1} X)^{-1}.
\end{aligned}$$
By similar calculations,
$$\begin{aligned}
E\{(\hat{\beta} - \beta_0)(\hat{\beta} - \beta_0)^T\} &= E\left[(X^T V^{-1} X)^{-1} X^T V^{-1} \begin{pmatrix} Y_1 - X_1\beta_0 \\ \vdots \\ Y_m - X_m\beta_0 \end{pmatrix} \{(Y_1 - X_1\beta_0)^T, \ldots, (Y_m - X_m\beta_0)^T\}\, V^{-1} X (X^T V^{-1} X)^{-1}\right] \\
&= (X^T V^{-1} X)^{-1} X^T V^{-1} V V^{-1} X (X^T V^{-1} X)^{-1} = (X^T V^{-1} X)^{-1}.
\end{aligned}$$
Thus, putting these together, we obtain
$$\begin{aligned}
E\{(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T\} &= V_i - X_i(X^T V^{-1} X)^{-1} X_i^T - X_i(X^T V^{-1} X)^{-1} X_i^T + X_i(X^T V^{-1} X)^{-1} X_i^T \\
&= V_i - X_i(X^T V^{-1} X)^{-1} X_i^T.
\end{aligned}$$
Thus,
$$\mathrm{var}(\hat{b}_i|\tilde{x}) = D Z_i^T V_i^{-1}\{V_i - X_i(X^T V^{-1} X)^{-1} X_i^T\} V_i^{-1} Z_i D = D Z_i^T \{V_i^{-1} - V_i^{-1} X_i (X^T V^{-1} X)^{-1} X_i^T V_i^{-1}\} Z_i D,$$
as required.
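Because $\hat{b}_i$ is a linear function of the stacked $Y$, the mean and variance results above can be checked exactly (no Monte Carlo needed) with a little numpy. The sketch below, with $R_i = I$ and arbitrary illustrative choices of $Z_i$, $X_i$, and $D$, writes $\hat{b}_i = LY$ and confirms $LX\beta_0 = 0$ and $LVL^T$ equals the closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
m, ni, p, q = 4, 3, 2, 2                 # m clusters, n_i obs each (illustrative)

D0 = np.diag([1.0, 0.5])                  # random-effects covariance D
Zs = [rng.normal(size=(ni, q)) for _ in range(m)]
Xs = [rng.normal(size=(ni, p)) for _ in range(m)]
Vs = [Z @ D0 @ Z.T + np.eye(ni) for Z in Zs]   # V_i with R_i = I

X = np.vstack(Xs)
V = np.zeros((m * ni, m * ni))            # block-diagonal V
for j, Vj in enumerate(Vs):
    V[j*ni:(j+1)*ni, j*ni:(j+1)*ni] = Vj
Vinv = np.linalg.inv(V)
H = np.linalg.inv(X.T @ Vinv @ X)         # (X^T V^{-1} X)^{-1}

i = 0                                     # check the first cluster
Si = np.zeros((ni, m * ni)); Si[:, :ni] = np.eye(ni)   # Y_i = Si @ Y
Vi_inv = np.linalg.inv(Vs[i])

# bhat_i = D Z_i^T V_i^{-1} (Y_i - X_i betahat) = L Y, linear in Y
L = D0 @ Zs[i].T @ Vi_inv @ (Si - Xs[i] @ H @ X.T @ Vinv)

beta0 = rng.normal(size=p)
assert np.allclose(L @ X @ beta0, 0)      # E(bhat_i | xtilde) = 0

# var(bhat_i | xtilde) = L V L^T should match the closed form derived above
closed = D0 @ Zs[i].T @ (Vi_inv - Vi_inv @ Xs[i] @ H @ Xs[i].T @ Vi_inv) @ Zs[i] @ D0
assert np.allclose(L @ V @ L.T, closed)
```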
(c) Because $E(\hat{b}_i - b_i|\tilde{x}) = 0$ from above,
$$\mathrm{var}(\hat{b}_i - b_i|\tilde{x}) = E\{(\hat{b}_i - b_i)(\hat{b}_i - b_i)^T|\tilde{x}\} = E(\hat{b}_i\hat{b}_i^T|\tilde{x}) - E(\hat{b}_i b_i^T|\tilde{x}) - E(b_i\hat{b}_i^T|\tilde{x}) + E(b_i b_i^T|\tilde{x}),$$
where we calculated $E(\hat{b}_i\hat{b}_i^T|\tilde{x})$ above, and $E(b_i b_i^T|\tilde{x}) = D$. So we calculate
$$E(b_i\hat{b}_i^T|\tilde{x}) = E\left[b_i\{Y_i - X_i\beta_0 - X_i(\hat{\beta} - \beta_0)\}^T\right] V_i^{-1} Z_i D.$$
Now
$$\begin{aligned}
E\left[b_i\{Y_i - X_i\beta_0 - X_i(\hat{\beta} - \beta_0)\}^T\right] &= E\{b_i(Y_i - X_i\beta_0)^T\} - E\{b_i(Y - X\beta_0)^T\}\, V^{-1} X (X^T V^{-1} X)^{-1} X_i^T \\
&= D Z_i^T - E\left[b_i\{(Y_1 - X_1\beta_0)^T, \ldots, (Y_m - X_m\beta_0)^T\}\right] \mathrm{diag}(V_1^{-1}, \ldots, V_m^{-1}) \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} (X^T V^{-1} X)^{-1} X_i^T \\
&= D Z_i^T - (0, \ldots, D Z_i^T, \ldots, 0)\, \mathrm{diag}(V_1^{-1}, \ldots, V_m^{-1}) \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} (X^T V^{-1} X)^{-1} X_i^T \\
&= D Z_i^T - (0, \ldots, D Z_i^T V_i^{-1}, \ldots, 0) \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} (X^T V^{-1} X)^{-1} X_i^T \\
&= D Z_i^T - D Z_i^T V_i^{-1} X_i (X^T V^{-1} X)^{-1} X_i^T.
\end{aligned}$$
Putting these together,
$$\begin{aligned}
\mathrm{var}(\hat{b}_i - b_i|\tilde{x}) &= D Z_i^T \{V_i^{-1} - V_i^{-1} X_i (X^T V^{-1} X)^{-1} X_i^T V_i^{-1}\} Z_i D \\
&\quad - \{D Z_i^T V_i^{-1} Z_i D - D Z_i^T V_i^{-1} X_i (X^T V^{-1} X)^{-1} X_i^T V_i^{-1} Z_i D\} \\
&\quad - \{D Z_i^T V_i^{-1} Z_i D - D Z_i^T V_i^{-1} X_i (X^T V^{-1} X)^{-1} X_i^T V_i^{-1} Z_i D\} + D \\
&= D - D Z_i^T \{V_i^{-1} - V_i^{-1} X_i (X^T V^{-1} X)^{-1} X_i^T V_i^{-1}\} Z_i D,
\end{aligned}$$
as required.
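The result in (c) admits the same exact linear-map check as in (b), using $\mathrm{cov}(b_i, Y) = (0, \ldots, DZ_i^T, \ldots, 0)$ since $Y_i = X_i\beta_0 + Z_ib_i + e_i$. All matrices below are arbitrary illustrative choices; the sketch assembles the four covariance pieces and confirms they sum to $D - DZ_i^T\{V_i^{-1} - V_i^{-1}X_i(X^TV^{-1}X)^{-1}X_i^TV_i^{-1}\}Z_iD$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, ni, p, q = 4, 3, 2, 2

D0 = np.diag([1.0, 0.5])
Zs = [rng.normal(size=(ni, q)) for _ in range(m)]
Xs = [rng.normal(size=(ni, p)) for _ in range(m)]
Vs = [Z @ D0 @ Z.T + np.eye(ni) for Z in Zs]    # R_i = I for simplicity

X = np.vstack(Xs)
V = np.zeros((m * ni, m * ni))
for j, Vj in enumerate(Vs):
    V[j*ni:(j+1)*ni, j*ni:(j+1)*ni] = Vj
Vinv = np.linalg.inv(V)
H = np.linalg.inv(X.T @ Vinv @ X)

i = 0
Si = np.zeros((ni, m * ni)); Si[:, :ni] = np.eye(ni)
Vi_inv = np.linalg.inv(Vs[i])
L = D0 @ Zs[i].T @ Vi_inv @ (Si - Xs[i] @ H @ X.T @ Vinv)   # bhat_i = L Y

CbY = D0 @ Zs[i].T @ Si       # cov(b_i, Y): D Z_i^T in block i, zero elsewhere

# var(bhat_i - b_i) = var(bhat_i) - cov(bhat_i, b_i) - cov(b_i, bhat_i) + D
var_diff = L @ V @ L.T - L @ CbY.T - CbY @ L.T + D0

W = Vi_inv - Vi_inv @ Xs[i] @ H @ Xs[i].T @ Vi_inv
assert np.allclose(var_diff, D0 - D0 @ Zs[i].T @ W @ Zs[i] @ D0)
```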
2. (a) See attached programs and plot of the data with the fitted model superimposed.
(b) The estimators of the PK parameters are $\hat{k}_a = \exp(\hat{\beta}_1)$, $\widehat{\mathrm{Cl}} = \exp(\hat{\beta}_2)$, and $\hat{V} = \exp(\hat{\beta}_3)$.
Given that we have approximate standard errors for the components of $\hat{\beta}$ based on the asymptotic theory, an obvious approach is to use the delta method. Generically, if we let $a(\beta)$ be a real-valued function of $\beta$, then by the usual linear Taylor series
$$a(\hat{\beta}) \approx a(\beta_0) + a_\beta^T(\beta_0)(\hat{\beta} - \beta_0),$$
where $a_\beta(\beta)$ is the vector of partial derivatives of $a(\beta)$ with respect to $\beta$, so that
$$a(\hat{\beta}) \;\dot{\sim}\; N\{a(\beta),\; a_\beta^T(\hat{\beta})\,\widehat{\Sigma}\,a_\beta(\hat{\beta})\},$$
where $\widehat{\Sigma}$ is the large-sample approximate covariance matrix for $\hat{\beta}$. Taking $a(\beta) = \exp(\beta_k)$ for $k = 1, 2, 3$, $a_\beta(\beta_0)$ has $\exp(\beta_{k0})$ in the $k$th position and zeroes elsewhere, in which case it is straightforward that
$$a_\beta^T(\hat{\beta})\,\widehat{\Sigma}\,a_\beta(\hat{\beta}) = \exp(2\hat{\beta}_k)\,\widehat{\Sigma}_{kk}.$$
This is an approximate expression for the variance of the estimator, so you should have multiplied the standard error for $\hat{\beta}_k$ by $\exp(\hat{\beta}_k)$ for $k = 1, 2, 3$. See the R program for the results.
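For concreteness, the delta-method computation amounts to one line. The estimates and covariance matrix below are hypothetical stand-ins (the real values come from the fitted model in the R program); the sketch also verifies the shortcut against the general $a_\beta^T\widehat{\Sigma}a_\beta$ form.

```python
import numpy as np

# Hypothetical fitted values and asymptotic covariance for betahat
# (illustrative numbers only, not results from the actual fit)
betahat = np.array([0.40, -1.20, 2.10])          # log k_a, log Cl, log V
Sigma = np.diag([0.05, 0.02, 0.01])              # Sigma-hat

pk_est = np.exp(betahat)                          # k_a, Cl, V on original scale
se_beta = np.sqrt(np.diag(Sigma))
se_pk = pk_est * se_beta                          # delta method: exp(betahat_k) * SE_k

# Equivalently, a_beta^T Sigma a_beta with a_beta = exp(beta_k) in position k
for k in range(3):
    a = np.zeros(3); a[k] = np.exp(betahat[k])
    assert np.isclose(a @ Sigma @ a, se_pk[k] ** 2)
```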
3. (a) Expanding
$$\sum_{j=1}^{n} D_j^T(\eta) V_j^{-1}(\eta)\{s_j(\eta) - m_j(\eta)\} = 0 \qquad (1)$$
about $\eta^*$ “close to” $\eta$ gives
$$\begin{aligned}
0 &\approx \sum_{j=1}^{n} D_j^T(\eta^*) V_j^{-1}(\eta^*)\{s_j(\eta^*) - m_j(\eta^*)\} \\
&\quad + \left[\sum_{j=1}^{n} \partial/\partial\eta\,\{D_j^T(\eta^*) V_j^{-1}(\eta^*)\}\{s_j(\eta^*) - m_j(\eta^*)\} + \sum_{j=1}^{n} D_j^T(\eta^*) V_j^{-1}(\eta^*)\,\partial/\partial\eta\,\{s_j(\eta^*) - m_j(\eta^*)\}\right](\eta - \eta^*).
\end{aligned}$$
Now $\partial/\partial\eta\, s_j(\eta) = \partial/\partial\eta\, s_j[\{Y_j - f(x_j, \beta)\}^2]$, where this notation is meant to indicate that $s_j(\eta)$ is a function of $\eta$ only through this argument. By the chain rule, letting “$'$” denote differentiation with respect to the argument, we have
$$s_j'[\{Y_j - f(x_j, \beta)\}^2]\,\partial/\partial\eta\,[\{Y_j - f(x_j, \beta)\}^2] = -2\{Y_j - f(x_j, \beta^*)\}\, s_j'[\{Y_j - f(x_j, \beta^*)\}^2]\,\partial/\partial\eta\,\{f(x_j, \beta^*)\}.$$
Substituting this into the above and using the definition of $D_j(\eta)$, we obtain
$$\begin{aligned}
0 &\approx \sum_{j=1}^{n} D_j^T(\eta^*) V_j^{-1}(\eta^*)\{s_j(\eta^*) - m_j(\eta^*)\} \\
&\quad + \sum_{j=1}^{n} \partial/\partial\eta\,\{D_j^T(\eta^*) V_j^{-1}(\eta^*)\}\{s_j(\eta^*) - m_j(\eta^*)\}(\eta - \eta^*) \\
&\quad - \sum_{j=1}^{n} D_j^T(\eta^*) V_j^{-1}(\eta^*) D_j(\eta^*)(\eta - \eta^*) \\
&\quad - 2\sum_{j=1}^{n} \{Y_j - f(x_j, \beta^*)\}\, D_j^T(\eta^*) V_j^{-1}(\eta^*)\, s_j'(\eta^*)\,\partial/\partial\eta\,\{f(x_j, \beta^*)\}(\eta - \eta^*).
\end{aligned}$$
The term on the second line is negligible by virtue of the fact that E(sj (η ∗ )|x j ) ≈ mj (η ∗ ), so
that this term contains the product of two “small” terms. The term in the fourth line depends
on the product (η − η ∗ ){Yj − f (x j , β ∗ )}, which is also “small.” Disregarding these terms yields
the approximation
$$0 \approx \sum_{j=1}^{n} D_j^T(\eta^*) V_j^{-1}(\eta^*)\{s_j(\eta^*) - m_j(\eta^*)\} - \sum_{j=1}^{n} D_j^T(\eta^*) V_j^{-1}(\eta^*) D_j(\eta^*)(\eta - \eta^*).$$
Writing this in terms of the “stacked” notation and rearranging then suggests the iterative update
$$\eta^{(a+1)} = \eta^{(a)} + \{D_{(a)}^T V_{(a)}^{-1} D_{(a)}\}^{-1} D_{(a)}^T V_{(a)}^{-1}(s_{(a)} - m_{(a)}),$$
as required.
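A minimal sketch of this update, assuming generic callables for the stacked $m$, $D$, and $V$ (all names and the toy problem are illustrative, not from the course programs): on a linear problem with $V = I$ the iteration reduces to weighted least squares and converges in a single step, which gives a quick sanity check.

```python
import numpy as np

def eta_update(eta, s, m_fun, D_fun, V_fun):
    """One step of eta_(a+1) = eta_(a) + (D'V^{-1}D)^{-1} D'V^{-1}(s - m)."""
    m, Dmat, V = m_fun(eta), D_fun(eta), V_fun(eta)
    Vinv = np.linalg.inv(V)
    step = np.linalg.solve(Dmat.T @ Vinv @ Dmat, Dmat.T @ Vinv @ (s - m))
    return eta + step

# Toy problem: m_j(eta) = d_j^T eta (linear) and V = I, so the estimating
# equation sum_j d_j (s_j - d_j^T eta) = 0 is ordinary least squares and
# the update converges in one iteration from any starting value.
rng = np.random.default_rng(3)
n, r = 50, 2
Dmat = rng.normal(size=(n, r))
s = Dmat @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)

eta = eta_update(np.zeros(r), s,
                 m_fun=lambda e: Dmat @ e,
                 D_fun=lambda e: Dmat,
                 V_fun=lambda e: np.eye(n))

# At convergence the stacked estimating equation is (numerically) zero
assert np.allclose(Dmat.T @ (s - Dmat @ eta), 0)
```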
4. In this situation, we have $n = 3$ repeated measurements on each of $m$ individuals at equally spaced times, where the overall pattern of correlation is AR(1), so that
$$\Gamma_i(\alpha, x_i) = \begin{pmatrix} 1 & \alpha & \alpha^2 \\ \alpha & 1 & \alpha \\ \alpha^2 & \alpha & 1 \end{pmatrix}.$$
There are no covariates. For brevity, write $f_j = f(t_j, \beta)$, so that we have $E(Y_{ij}) = f_j$ and $\mathrm{var}(Y_{ij}) = \sigma^2 g_j^2 = \sigma^2 f_j$, so that $g_j^2 = f_j$. Because we have both the variance scale parameter $\sigma^2$ and the scalar correlation parameter $\alpha$, $\xi = (\sigma^2, \alpha)^T$, to estimate, there will be 6 terms in $u_i$: three squared terms and three cross-product terms. Placing these in the order on page 242, it follows that
$$u_i = \begin{pmatrix} (Y_{i1} - f_1)^2 \\ (Y_{i1} - f_1)(Y_{i2} - f_2) \\ (Y_{i1} - f_1)(Y_{i3} - f_3) \\ (Y_{i2} - f_2)^2 \\ (Y_{i2} - f_2)(Y_{i3} - f_3) \\ (Y_{i3} - f_3)^2 \end{pmatrix}, \qquad
v_i = \begin{pmatrix} \sigma^2 f_1 \\ \sigma^2 \alpha f_1^{1/2} f_2^{1/2} \\ \sigma^2 \alpha^2 f_1^{1/2} f_3^{1/2} \\ \sigma^2 f_2 \\ \sigma^2 \alpha f_2^{1/2} f_3^{1/2} \\ \sigma^2 f_3 \end{pmatrix},$$
$$E_i = \begin{pmatrix} 2\sigma f_1 & 0 \\ 2\sigma\alpha f_1^{1/2} f_2^{1/2} & \sigma^2 f_1^{1/2} f_2^{1/2} \\ 2\sigma\alpha^2 f_1^{1/2} f_3^{1/2} & 2\sigma^2\alpha f_1^{1/2} f_3^{1/2} \\ 2\sigma f_2 & 0 \\ 2\sigma\alpha f_2^{1/2} f_3^{1/2} & \sigma^2 f_2^{1/2} f_3^{1/2} \\ 2\sigma f_3 & 0 \end{pmatrix}.$$
To deduce the “covariance matrix” $Z_i(\beta, \xi)$ under the “Gaussian working assumption,” we need to use the definition of $u_i$ above along with the condition (8.23) that holds under normality. In general, (8.23) for $u_i$ defined as above implies that
$$Z_i = \sigma^4 \begin{pmatrix}
2g_1^4 & 2\alpha g_1^3 g_2 & 2\alpha^2 g_1^3 g_3 & 2\alpha^2 g_1^2 g_2^2 & 2\alpha^3 g_1^2 g_2 g_3 & 2\alpha^4 g_1^2 g_3^2 \\
\cdot & (1+\alpha^2) g_1^2 g_2^2 & \alpha(1+\alpha^2) g_1^2 g_2 g_3 & 2\alpha g_1 g_2^3 & 2\alpha^2 g_1 g_2^2 g_3 & 2\alpha^3 g_1 g_2 g_3^2 \\
\cdot & \cdot & (1+\alpha^4) g_1^2 g_3^2 & 2\alpha^2 g_1 g_2^2 g_3 & \alpha(1+\alpha^2) g_1 g_2 g_3^2 & 2\alpha^2 g_1 g_3^3 \\
\cdot & \cdot & \cdot & 2g_2^4 & 2\alpha g_2^3 g_3 & 2\alpha^2 g_2^2 g_3^2 \\
\cdot & \cdot & \cdot & \cdot & (1+\alpha^2) g_2^2 g_3^2 & 2\alpha g_2 g_3^3 \\
\cdot & \cdot & \cdot & \cdot & \cdot & 2g_3^4
\end{pmatrix},$$
where the “$\cdot$” entries follow by symmetry and, as above, $g_j = f_j^{1/2}$, $j = 1, 2, 3$.
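The entries of $Z_i$ can be checked against the Gaussian fourth-moment (Isserlis) identity $\mathrm{cov}(e_je_k, e_le_m) = \sigma^4(c_{jl}c_{km} + c_{jm}c_{kl})$ with $c_{jk} = \alpha^{|j-k|}g_jg_k$, where $e_j = Y_{ij} - f_j$. The numpy sketch below (numerical values for $\alpha$ and the $g_j$ are arbitrary) builds $Z_i/\sigma^4$ from that identity and spot-checks it against the displayed entries.

```python
import numpy as np

# Illustrative values for alpha and g_j = f_j^{1/2}; any positive values work
a, g = 0.4, np.array([1.3, 0.8, 1.1])

# c_jk = cov(e_j, e_k)/sigma^2 = alpha^{|j-k|} g_j g_k under AR(1)
c = np.array([[a**abs(j - k) * g[j] * g[k] for k in range(3)]
              for j in range(3)])

# Index pairs in the order of u_i: (1,1),(1,2),(1,3),(2,2),(2,3),(3,3)
pairs = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

# Gaussian working assumption (Isserlis, sigma^4 factored out):
# cov(e_j e_k, e_l e_m) = c_jl c_km + c_jm c_kl
Z = np.array([[c[j, l] * c[k, m] + c[j, m] * c[k, l]
               for (l, m) in pairs] for (j, k) in pairs])

g1, g2, g3 = g
# Spot-check several of the displayed entries
assert np.isclose(Z[0, 0], 2 * g1**4)
assert np.isclose(Z[1, 1], (1 + a**2) * g1**2 * g2**2)
assert np.isclose(Z[1, 2], a * (1 + a**2) * g1**2 * g2 * g3)
assert np.isclose(Z[2, 4], a * (1 + a**2) * g1 * g2 * g3**2)
assert np.isclose(Z[2, 5], 2 * a**2 * g1 * g3**3)
assert np.allclose(Z, Z.T)                 # Z_i is symmetric
```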
5. A plot of the raw data is not that informative, but we can summarize the proportions of
subjects with skin cancers at each year under each of placebo and active agent and plot
them (not shown):
year   placebo   beta-carotene
  1     0.172        0.168
  2     0.134        0.153
  3     0.134        0.171
  4     0.129        0.148
  5     0.167        0.186
The raw proportions are difficult to interpret; there is a suggestion of a quadratic pattern for
placebo, but the proportions for beta-carotene show no systematic pattern. In fact, the raw
proportions of skin cancers are higher for beta-carotene in all years except year 1, which
does not bode well for its effectiveness.
Choosing an initial model is thus difficult. In the attached program, we try fitting a model for which proportion is constant over time and one where it is quadratic in time for each treatment; we do not attempt to incorporate covariates yet. Because these data are miraculously
balanced, it is possible to fit an unstructured working correlation model in each case. The
estimated correlation matrix suggests that a working compound symmetric correlation structure is reasonable. Thus, a sensible approach is to adopt the working model and use the
robust sandwich standard errors “just in case.”
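As a hedged illustration of what the robust sandwich standard errors do (the actual analysis uses R with the study data, which are not reproduced here; everything below, including the simulated data and coefficient values, is an assumption for demonstration), the following numpy sketch fits a working-independence logistic model to simulated clustered binary outcomes and forms the sandwich $B^{-1}MB^{-1}$, where $M$ sums subject-level score outer products.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 200, 5                                 # m subjects, n years each (illustrative)

# Simulated clustered binary data: a shared subject effect induces
# within-subject correlation, as in the skin cancer study design
x = np.tile(np.arange(1, n + 1), m).astype(float)      # year
u = np.repeat(rng.normal(scale=0.5, size=m), n)        # shared subject effect
p_true = 1 / (1 + np.exp(-(-1.5 + 0.05 * x + u)))
y = rng.binomial(1, p_true).astype(float)
X = np.column_stack([np.ones(m * n), x])

# Working-independence logistic GEE (= ordinary logistic score) via Newton/IRLS
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))

# Robust sandwich: B^{-1} M B^{-1}, M summing score outer products by subject
mu = 1 / (1 + np.exp(-X @ beta))
B = X.T @ ((mu * (1 - mu))[:, None] * X)
Mmat = np.zeros((2, 2))
for i in range(m):
    sl = slice(i * n, (i + 1) * n)
    gi = X[sl].T @ (y[sl] - mu[sl])                    # subject-level score
    Mmat += np.outer(gi, gi)
Binv = np.linalg.inv(B)
robust_se = np.sqrt(np.diag(Binv @ Mmat @ Binv))
model_se = np.sqrt(np.diag(Binv))
# Compare robust_se with model_se; they differ once within-subject
# correlation is present, which is exactly the "just in case" protection
```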
There is evidence to support a quadratic trend for placebo but none for beta-carotene based
on either model-based or robust standard errors. We also go ahead and fit a model for which
proportion is linear in time (so constant rate of change) for each treatment, mainly to assess
if there is evidence of a long-term increasing or decreasing trend over the study period. The
results do not show evidence supporting this.
Based on these fits, to answer question (i) (Is there evidence that the probability of new skin
cancers changes over the study period for either treatment?), we might say that there is
evidence that the proportion of skin cancers in the population treated with placebo changes
over the study period in that it decreases and then rebounds by year 5, but there is no
evidence of a change for beta-carotene. The quadratic trend for placebo could well be an
artifact; given that no active agent was given to the placebo patients, we would not expect
there to be a change over time in proportion, particularly of the type observed. So we might
add a caveat to this effect. Given this, it would be perfectly reasonable to decide to stick with
the constant but different proportions model for further analyses.
To answer (ii) (Does beta-carotene lead to a lower probability of new skin cancers than
placebo after five years in this population? What is the probability of experiencing new skin
cancers at year 5 for each treatment?), given the pronounced quadratic trend for placebo, we
might base the comparison at five years on the quadratic model. The probability at year 5 is
a monotone transformation of the linear predictor, so we could construct a linear hypothesis
test of a difference in linear predictor at year 5. See the R program. I got a p-value of 0.54,
suggesting no difference.
To tackle (iii) (Is there evidence that the probability of experiencing new skin cancers over
the study period is associated with age, gender, previous cancer, or center?), because of
the above, I decided to look at this in the context of the model with constant proportions over
time, in which case the question reduces to one of whether or not any of these covariates
is associated with this constant probability in either treatment. As a first shot, I allowed the
dependence on covariates to be the same for both treatments. See the R program for results.
There appears to be strong evidence that, while treatment doesn’t seem to matter much,
the probability of skin cancers over the five years is strongly associated with age, gender,
exposure, and center. The last is interesting, as it suggests that the patient populations
might be systematically different at the different centers, perhaps because of demographics
or perhaps because of something else (preventative counseling or whatever).
The bottom line is that this was a pretty disappointing study! There does not appear to be
any difference between the treatments. However, the results show that the recurrence of
skin cancers does appear to be associated with several of the covariates.