ST 790, Homework 1 Solutions
Spring 2017
1. (a) Under these conditions, it is straightforward that
\[ Y_{ij} = \{(1 - g_i)(\beta_{0,G} + \beta_{1,G} t_{ij}) + g_i(\beta_{0,B} + \beta_{1,B} t_{ij})\} + b_{0i} + e_{ij}, \tag{1} \]
so that it is immediate that
\[ \mathrm{var}(Y_{ij} \mid g_i) = D + \sigma^2, \qquad \mathrm{cov}(Y_{ij}, Y_{ij'}) = D, \]
and thus
\[ \mathrm{corr}(Y_{ij}, Y_{ij'}) = \frac{D}{D + \sigma^2}. \]
This correlation is the same for all pairs of time points, and the variance is constant across
time, so it is clear that the covariance and correlation matrices for $\boldsymbol{Y}_i$ have the compound
symmetric form with the same variance at all time points.
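The compound symmetric form in (a) can be checked numerically. The following is a minimal sketch, assuming four time points (as in part (c)) and hypothetical values of D and σ²; the function names are illustrative and not part of the course materials.

```python
import numpy as np

def random_intercept_cov(D, sigma2, n_times=4):
    """Covariance matrix implied by model (1): D * 1 1^T + sigma2 * I."""
    ones = np.ones((n_times, 1))
    return D * (ones @ ones.T) + sigma2 * np.eye(n_times)

def cov_to_corr(V):
    """Convert a covariance matrix to the corresponding correlation matrix."""
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)

# Hypothetical values chosen only for illustration.
D, sigma2 = 4.0, 1.0
V = random_intercept_cov(D, sigma2)
R = cov_to_corr(V)

print(V)  # constant diagonal D + sigma2, constant off-diagonal D
print(R)  # every off-diagonal entry equals D / (D + sigma2)
```

With these values every off-diagonal correlation is D/(D + σ²), whatever pair of time points is chosen.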
(b) Comparing (1) to (3.1), both models involve an individual-specific random effect plus an
independent within-individual deviation, so that the variance, covariance, and correlation are
of the same form with $D$ playing the role of $\sigma_b^2$ and $\sigma^2$ playing the role of $\sigma_e^2$. Thus, both
models make the same assumption about the overall covariance and correlation structure
of a data vector; namely, that it is compound symmetric with constant variance. The major
difference is how the population mean response profile is represented. In (1), the population
means at each time point lie on a straight line, where the intercept and slope are possibly
different for each gender. In (3.1), no relationship between the population means at each
time point is assumed; each is modeled separately and is not constrained in any way to be
related to any other mean.
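(c) We now have
\[ Y_{ij} = \{(1 - g_i)(\beta_{0,G} + \beta_{1,G} t_{ij}) + g_i(\beta_{0,B} + \beta_{1,B} t_{ij})\} + b_{0i} + b_{1i} t_{ij} + e_{ij}. \]
Immediately,
\[ \mathrm{var}(Y_{ij}) = \mathrm{var}(b_{0i} + b_{1i} t_j + e_{ij}) = D_{11} + D_{22} t_j^2 + 2 D_{12} t_j + \sigma^2, \]
and it is straightforward that, for $j \neq j'$,
\[ \mathrm{cov}(Y_{ij}, Y_{ij'}) = \mathrm{cov}(b_{0i} + b_{1i} t_j + e_{ij},\; b_{0i} + b_{1i} t_{j'} + e_{ij'}) = D_{11} + D_{22} t_j t_{j'} + D_{12}(t_j + t_{j'}). \]
One can derive the entire covariance matrix with these elements by letting
\[ \boldsymbol{Z}_i = \begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_4 \end{pmatrix}, \]
and noting that
\[ \mathrm{var}(\boldsymbol{Y}_i \mid g_i) = \boldsymbol{Z}_i D \boldsymbol{Z}_i^T + \sigma^2 \boldsymbol{I}_4, \]
using the independence of $\boldsymbol{b}_i$ and $e_{ij}$. Upon evaluation, the $j$th diagonal element is
\[ \mathrm{var}(b_{0i} + b_{1i} t_j + e_{ij}) = D_{11} + D_{22} t_j^2 + 2 D_{12} t_j + \sigma^2, \tag{2} \]
and the $(j, j')$ off-diagonal element is
\[ \mathrm{cov}(b_{0i} + b_{1i} t_j + e_{ij},\; b_{0i} + b_{1i} t_{j'} + e_{ij'}) = D_{11} + D_{22} t_j t_{j'} + D_{12}(t_j + t_{j'}). \]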
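As a quick check of the algebra in (c), the matrix expression and the element-by-element formulas can be compared numerically. This is a sketch under assumed inputs: the ages in `t` and the values in `D` and `sigma2` are hypothetical, chosen only so that D is a valid covariance matrix.

```python
import numpy as np

# Hypothetical equally spaced ages and variance components (illustration only).
t = np.array([8.0, 10.0, 12.0, 14.0])
D = np.array([[4.0, -0.2],
              [-0.2, 0.05]])
sigma2 = 1.5

# Matrix form: Z D Z^T + sigma2 * I, with rows of Z equal to (1, t_j).
Z = np.column_stack([np.ones_like(t), t])
V = Z @ D @ Z.T + sigma2 * np.eye(len(t))

def elementwise_cov(t, D, sigma2):
    """Build the same matrix from the diagonal and off-diagonal formulas."""
    n = len(t)
    out = np.empty((n, n))
    for j in range(n):
        for k in range(n):
            out[j, k] = D[0, 0] + D[1, 1] * t[j] * t[k] + D[0, 1] * (t[j] + t[k])
            if j == k:
                out[j, k] += sigma2  # within-individual variance on the diagonal only
    return out

V_formula = elementwise_cov(t, D, sigma2)
print(np.allclose(V, V_formula))
```

Note that setting j = j' in the off-diagonal formula and adding σ² recovers the diagonal formula, which is why a single loop covers both cases.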
(d) For the situation in (a), as noted above, the covariance matrix and associated correlation
matrix exhibit the compound symmetric structure, where the covariance matrix has the same
variance at all time points. For (c), clearly, the induced covariance matrix does not reflect a
stationary correlation structure; the elements depend on the actual values of the times (ages)
tj and not just on differences of them. Accordingly, even though the times are equally spaced,
an autoregressive type model would not seem to be a very good approximation. Similarly,
at first look, a compound symmetry model would also seem not to be a good approximation,
as clearly the off-diagonal elements of var(Y i |gi ) change with tj . A completely unstructured
model would of course be capable of capturing the pattern of correlation; however, this would
not take account of the specific structure, with the covariances depending on the tj in this
particular way. One- and higher-order dependent models have correlations that taper off to
zero, which is not consistent with the form of the covariance matrix.
Thus, it seems that trying to approximate the induced overall pattern of correlation using one
of the “standard” models discussed in Chapter 2 is difficult to justify. In situations where the
variance of b0i is orders of magnitude larger than that of b1i , the induced correlation structure
can look approximately compound symmetric as in (a). In this situation, D11 >> D22, and if
D22 is very small (close to 0), it is often the case that the covariance D12 is also close to
0, as it would have to be if D22 were identically equal to zero (in which case there would
be no correlation between b0i and b1i at all, because b1i would be the fixed constant 0). We discuss this further in
Chapter 6 of the course.
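The remark about D11 being much larger than D22 can be illustrated with a small numerical comparison. This sketch uses hypothetical ages and hypothetical covariance matrices; `induced_corr` and `offdiag_range` are illustrative helper names.

```python
import numpy as np

def induced_corr(t, D, sigma2):
    """Correlation matrix induced by a random intercept and slope model."""
    Z = np.column_stack([np.ones_like(t), t])
    V = Z @ D @ Z.T + sigma2 * np.eye(len(t))
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)

def offdiag_range(R):
    """Spread (max minus min) of the off-diagonal correlations."""
    off = R[~np.eye(R.shape[0], dtype=bool)]
    return off.max() - off.min()

t = np.array([8.0, 10.0, 12.0, 14.0])  # hypothetical ages
sigma2 = 1.0

# Hypothetical D with a non-negligible slope variance.
D_general = np.array([[4.0, -0.2], [-0.2, 0.05]])
# Hypothetical D with D11 orders of magnitude larger than D22, and D12 near 0.
D_small_slope = np.array([[4.0, 1e-3], [1e-3, 1e-4]])

R_general = induced_corr(t, D_general, sigma2)
R_small = induced_corr(t, D_small_slope, sigma2)

print(offdiag_range(R_general))  # correlations vary noticeably with the ages
print(offdiag_range(R_small))    # correlations are nearly constant
```

In the second case the off-diagonal correlations are all close to D11/(D11 + σ²), the compound symmetric value from (a).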
2. Nonlinear models.
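(a) We first find $\mu(t) = E\{\mu_i(t)\}$. With $\mu_i(t) = \beta_{0i}\exp(-\beta_{1i} t)$, $\beta_{0i} = \beta_0 + b_{0i}$ and $\beta_{1i} = \beta_1 + b_{1i}$,
we have that
\[ \mu_i(t) = (\beta_0 + b_{0i}) \exp(-\beta_1 t)\exp(-b_{1i} t) = \beta_0 \exp(-\beta_1 t)\exp(-b_{1i} t) + b_{0i}\exp(-\beta_1 t)\exp(-b_{1i} t), \]
so that
\[ E\{\mu_i(t)\} = \beta_0 \exp(-\beta_1 t)\, E\{\exp(-b_{1i} t)\} + \exp(-\beta_1 t)\, E\{b_{0i}\exp(-b_{1i} t)\}. \tag{3} \]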
Thus, we must evaluate the two expectations on the right hand side when
\[ \boldsymbol{b}_i \sim N(\boldsymbol{0}, D), \qquad D = \begin{pmatrix} D_{11} & D_{12} \\ D_{12} & D_{22} \end{pmatrix}. \]
Consider $E\{\exp(-b_{1i} t)\}$. From above, $b_{1i} \sim N(0, D_{22})$. In general, if $X \sim N(\mu, \sigma^2)$, it has
moment generating function
\[ E(e^{sX}) = \int_{-\infty}^{\infty} e^{sx} (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2/(2\sigma^2)\}\, dx. \]
This integral can be computed analytically by “completing the square,” see, for example,
https://onlinecourses.science.psu.edu/stat414/node/153, resulting in
\[ E(e^{sX}) = \exp(\mu s + \sigma^2 s^2/2). \]
Applying this here with $\mu = 0$, $\sigma^2 = D_{22}$, and $s = -t$, we have immediately that
\[ E\{\exp(-b_{1i} t)\} = e^{D_{22} t^2/2}. \]
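The moment generating function result can be confirmed numerically without simulation. The sketch below uses Gauss-Hermite quadrature from NumPy; the values of `D22` and `t` are hypothetical, and `normal_expectation` is an illustrative helper, not something from the course notes.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normal_expectation(f, mu, sigma2, n_nodes=40):
    """Approximate E{f(X)} for X ~ N(mu, sigma2) by Gauss-Hermite quadrature."""
    z, w = hermegauss(n_nodes)          # nodes and weights for weight exp(-z^2 / 2)
    x = mu + np.sqrt(sigma2) * z        # map standard normal nodes to N(mu, sigma2)
    return np.sum(w * f(x)) / np.sqrt(2.0 * np.pi)

# Hypothetical values for illustration.
D22, t = 0.04, 3.0

numeric = normal_expectation(lambda b1: np.exp(-b1 * t), mu=0.0, sigma2=D22)
closed_form = np.exp(D22 * t**2 / 2.0)

print(numeric, closed_form)
```

The two numbers agree to many digits, consistent with E{exp(−b1i t)} = exp(D22 t²/2).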
Now consider $E\{b_{0i}\exp(-b_{1i} t)\}$. Write this as
\[ E[E\{b_{0i}\exp(-b_{1i} t) \mid b_{0i}\}] = E[b_{0i}\, E\{\exp(-b_{1i} t) \mid b_{0i}\}]. \]
Because $\boldsymbol{b}_i \sim N(\boldsymbol{0}, D)$, it follows by standard results for the conditional distributions of jointly
normal random variables that the distribution of $b_{1i}$ given $b_{0i}$ is
\[ b_{1i} \mid b_{0i} \sim N(D_{12} D_{11}^{-1} b_{0i},\; D_{22} - D_{12}^2 D_{11}^{-1}). \]
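One way to convince oneself of the conditional mean and variance is to rebuild D from them. If b1i = (D12/D11) b0i + r with r independent of b0i and var(r) = D22 − D12²/D11, then the covariance matrix of (b0i, b1i) must equal D. The sketch below checks this with a hypothetical D.

```python
import numpy as np

# Hypothetical covariance matrix of (b0i, b1i), for illustration only.
D = np.array([[4.0, -0.2],
              [-0.2, 0.04]])

slope = D[0, 1] / D[0, 0]                    # coefficient of b0i in E(b1i | b0i)
cond_var = D[1, 1] - D[0, 1] ** 2 / D[0, 0]  # var(b1i | b0i)

# (b0i, b1i)^T = A (b0i, r)^T, where r ~ N(0, cond_var) is independent of b0i.
A = np.array([[1.0, 0.0],
              [slope, 1.0]])
S = np.diag([D[0, 0], cond_var])
D_rebuilt = A @ S @ A.T

print(slope, cond_var)
print(np.allclose(D_rebuilt, D))
```

The conditional variance is positive whenever D is positive definite, since it is the determinant of D divided by D11.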
Thus, again using the moment generating function of a normal with $\mu = D_{12} D_{11}^{-1} b_{0i}$, $\sigma^2 = D_{22} - D_{12}^2 D_{11}^{-1}$, and $s = -t$,
\[ E\{\exp(-b_{1i} t) \mid b_{0i}\} = \exp\{-D_{12} D_{11}^{-1} b_{0i} t + (D_{22} - D_{12}^2 D_{11}^{-1}) t^2/2\} = e^{(D_{22} - D_{12}^2 D_{11}^{-1}) t^2/2}\, e^{-D_{12} D_{11}^{-1} b_{0i} t}. \]
It follows that
\[ E\{b_{0i}\exp(-b_{1i} t)\} = e^{(D_{22} - D_{12}^2 D_{11}^{-1}) t^2/2}\, E(b_{0i} e^{b_{0i} u}), \qquad u = -D_{12} D_{11}^{-1} t. \]
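Now, with $b_{0i} \sim N(0, D_{11})$,
\[
\begin{aligned}
E(b_{0i} e^{b_{0i} u}) &= \int_{-\infty}^{\infty} \frac{b_0 \exp(b_0 u)}{(2\pi D_{11})^{1/2}} \exp\{-(2D_{11})^{-1} b_0^2\}\, db_0 \\
&= \int_{-\infty}^{\infty} \frac{b_0}{(2\pi D_{11})^{1/2}} \exp\{-(2D_{11})^{-1}(b_0^2 - 2 D_{11} b_0 u)\}\, db_0.
\end{aligned}
\tag{4}
\]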
We can complete the square in the exponential function to obtain
\[ b_0^2 - 2 D_{11} b_0 u = (b_0 - D_{11} u)^2 - D_{11}^2 u^2, \]
so that
\[ \exp\{-(2D_{11})^{-1}(b_0^2 - 2 D_{11} b_0 u)\} = \exp(D_{11} u^2/2) \exp\{-(2D_{11})^{-1}(b_0 - D_{11} u)^2\}. \]
Substituting in (4), we have
\[ E(b_{0i} e^{b_{0i} u}) = e^{D_{11} u^2/2} \int_{-\infty}^{\infty} \frac{b_0}{(2\pi D_{11})^{1/2}} \exp\{-(2D_{11})^{-1}(b_0 - D_{11} u)^2\}\, db_0. \tag{5} \]
The integral in (5) is the expectation of a $N(D_{11} u, D_{11})$ random variable and thus equals
$D_{11} u$. It follows that
\[ E(b_{0i} e^{b_{0i} u}) = e^{D_{11} u^2/2} D_{11} u = e^{D_{11}(-D_{12} D_{11}^{-1} t)^2/2}\, D_{11}(-D_{12} D_{11}^{-1} t) = -D_{12} t \exp\{D_{12}^2 t^2/(2 D_{11})\}. \]
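The closed form for E(b0i exp(b0i u)) can also be checked by quadrature. This is a sketch with hypothetical values of D11, D12, and t; `expect_b0_exp` is an illustrative helper name.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expect_b0_exp(D11, u, n_nodes=40):
    """Approximate E{b0 * exp(b0 * u)} for b0 ~ N(0, D11) by Gauss-Hermite quadrature."""
    z, w = hermegauss(n_nodes)          # weight function exp(-z^2 / 2)
    b0 = np.sqrt(D11) * z
    return np.sum(w * b0 * np.exp(b0 * u)) / np.sqrt(2.0 * np.pi)

# Hypothetical values for illustration.
D11, D12, t = 4.0, -0.2, 3.0
u = -D12 / D11 * t

numeric = expect_b0_exp(D11, u)
closed_form = D11 * u * np.exp(D11 * u**2 / 2.0)
in_terms_of_t = -D12 * t * np.exp(D12**2 * t**2 / (2.0 * D11))

print(numeric, closed_form, in_terms_of_t)
```

All three values coincide, which confirms both the completed-square step and the substitution u = −D12 t / D11.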
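We can now combine with the above. Substituting all of these results in (3) yields
\[
\begin{aligned}
E\{\mu_i(t)\} &= \beta_0 e^{-\beta_1 t} e^{D_{22} t^2/2} - e^{-\beta_1 t} e^{(D_{22} - D_{12}^2 D_{11}^{-1}) t^2/2}\, D_{12} t\, e^{D_{12}^2 D_{11}^{-1} t^2/2} \\
&= \beta_0 e^{-\beta_1 t} e^{D_{22} t^2/2} - D_{12} t\, e^{-\beta_1 t} e^{D_{22} t^2/2} \\
&= e^{-\beta_1 t + D_{22} t^2/2} (\beta_0 - D_{12} t).
\end{aligned}
\]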
Thus,
\[ \mu(t) = (\beta_0 - D_{12} t)\, e^{-\beta_1 t + D_{22} t^2/2} \tag{6} \]
and with $\mu_i(t) = \mu(t) + B_i(t)$, we can write
\[ B_i(t) = \beta_{0i} e^{-\beta_{1i} t} - e^{-\beta_1 t + D_{22} t^2/2} (\beta_0 - D_{12} t). \]
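As an end-to-end check of (6), one can average the individual-specific mean over the bivariate normal distribution of the random effects numerically and compare with the closed form. The sketch below uses a tensor-product Gauss-Hermite rule with a Cholesky factor of D; all parameter values are hypothetical and the function names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def population_mean_numeric(t, beta0, beta1, D, n_nodes=40):
    """E{(beta0 + b0) exp(-(beta1 + b1) t)} for (b0, b1) ~ N(0, D), by 2-D quadrature."""
    z, w = hermegauss(n_nodes)
    L = np.linalg.cholesky(D)                 # b = L z has covariance D for standard normal z
    Z0, Z1 = np.meshgrid(z, z, indexing="ij")
    W = np.outer(w, w) / (2.0 * np.pi)        # normalised product weights
    b0 = L[0, 0] * Z0
    b1 = L[1, 0] * Z0 + L[1, 1] * Z1
    mu_i = (beta0 + b0) * np.exp(-(beta1 + b1) * t)
    return np.sum(W * mu_i)

def population_mean_closed_form(t, beta0, beta1, D):
    """Expression (6): (beta0 - D12 t) exp(-beta1 t + D22 t^2 / 2)."""
    return (beta0 - D[0, 1] * t) * np.exp(-beta1 * t + D[1, 1] * t**2 / 2.0)

# Hypothetical parameter values for illustration.
beta0, beta1 = 10.0, 0.5
D = np.array([[4.0, -0.2],
              [-0.2, 0.04]])

for t in (0.0, 1.0, 3.0):
    print(t, population_mean_numeric(t, beta0, beta1, D),
          population_mean_closed_form(t, beta0, beta1, D))
```

At t = 0 both expressions reduce to β0, which is a useful sanity check before comparing other times.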
Thus, although the form of the deviation depends on b0i in a linear fashion, it depends on b1i
in a rather complicated (nonlinear) way through β1i .
(b) Clearly, from the form of µi (t), β1i is the individual-specific rate of decay, and its mean β1
in the population has the interpretation as being the “typical” value of rate of decay. From
(6), which is the overall population mean of the response at t, the way in which this mean
changes with t is no longer straightforward; there is not a constant linear rate of decay (on the
log scale) with time. Clearly, it is thus not possible to interpret β1 in this model as the “rate of
decay” of the overall population mean response trajectory. The way in which the population
mean response “decays” over time is dictated by both β1 and D22 in a complicated way.
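This example thus illustrates the contention in the notes that, when models are nonlinear in the random effects,
it is generally not possible to interpret β1 both ways, as the “typical” individual-specific rate of decay and as the rate of decay of the population mean. This emphasizes that one must be clear about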
whether one is interested in subject-specific or population-averaged type inference.
(c) This is a similar situation to that described in the answer to 1(d) above. If D22 = 0, then
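effectively $\beta_{1i} = \beta_1$ for all $i$. Thus, if $D_{22} = 0$ and thus $D_{12} = 0$, the expressions for $\mu(t)$ and
$B_i(t)$ in (a) reduce to
\[ \mu(t) = \beta_0 e^{-\beta_1 t} \]
and
\[ B_i(t) = \beta_{0i} e^{-\beta_1 t} - e^{-\beta_1 t} \beta_0 = b_{0i} e^{-\beta_1 t}, \]
so that
\[ \mu_i(t) = \beta_0 e^{-\beta_1 t} + b_{0i} e^{-\beta_1 t}. \]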
(d) Here, β1 does have the dual interpretations. It is the “typical” rate of decay in the population because in fact the individual-specific rates of decay do not vary in the population,
so that everyone has the same rate. As the expression for µ(t) shows, it also has the interpretation as the rate of decay leading to the “typical” or overall population mean response.
It should be clear from the above that this follows because, under these conditions, µi (t) is
linear in the random effect b0i .
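The contrast between (b) and (d) can be made concrete by looking at the slope of log µ(t). From (6), and assuming β0 − D12 t > 0 so that the logarithm is defined, d/dt log µ(t) = −β1 + D22 t − D12/(β0 − D12 t). The sketch below evaluates this with hypothetical parameter values; the function name is illustrative.

```python
import numpy as np

def log_slope_population_mean(t, beta0, beta1, D12, D22):
    """d/dt log mu(t) for mu(t) = (beta0 - D12 t) exp(-beta1 t + D22 t^2 / 2).

    Assumes beta0 - D12 * t > 0 so that log mu(t) is defined.
    """
    return -beta1 + D22 * t - D12 / (beta0 - D12 * t)

t = np.array([0.0, 1.0, 2.0, 3.0])
beta0, beta1 = 10.0, 0.5  # hypothetical values

# Random decay rates, as in (a) and (b): the log-scale slope changes with t.
slopes_random = log_slope_population_mean(t, beta0, beta1, D12=-0.2, D22=0.04)
# D22 = D12 = 0, as in (c) and (d): the slope is -beta1 at every t.
slopes_fixed = log_slope_population_mean(t, beta0, beta1, D12=0.0, D22=0.0)

print(slopes_random)
print(slopes_fixed)
```

Only in the second case does the population mean decay at the constant rate β1 on the log scale.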
3. (a) If the pattern of change in mean response is the same in each group, then
the difference in mean response at each time j must be the same between groups, so that
the mean response profiles over time are parallel. Thus, the means $\mu_{\ell j}$ must satisfy for
$j = 1, \ldots, 4$
\[ \mu_{1j} - \mu_{2j} = a_{12}, \qquad \mu_{1j} - \mu_{3j} = a_{13}, \qquad \mu_{2j} - \mu_{3j} = a_{23}, \]
where $a_{12}$, $a_{13}$, $a_{23}$ are fixed constants that are the same for all $j$. Of course, any two of the
above conditions imply the third. This is the situation represented by the
statistical model (3.3) under the usual null hypothesis of no group by time interaction or of
parallelism.
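(b) Here,
\[ M = \begin{pmatrix} \mu_{11} & \mu_{12} & \mu_{13} & \mu_{14} \\ \mu_{21} & \mu_{22} & \mu_{23} & \mu_{24} \\ \mu_{31} & \mu_{32} & \mu_{33} & \mu_{34} \end{pmatrix}. \]
If we take
\[ U = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix} \]
(or equivalently let $U$ be another $(4 \times 3)$ matrix whose columns define appropriate contrasts
of all pairs of means), and let
\[ C = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix} \]
(or equivalently another $(2 \times 3)$ matrix whose rows define appropriate contrasts), we obtain
\[ MU = \begin{pmatrix} \mu_{11} - \mu_{12} & \mu_{12} - \mu_{13} & \mu_{13} - \mu_{14} \\ \mu_{21} - \mu_{22} & \mu_{22} - \mu_{23} & \mu_{23} - \mu_{24} \\ \mu_{31} - \mu_{32} & \mu_{32} - \mu_{33} & \mu_{33} - \mu_{34} \end{pmatrix}. \]
This matrix contains differences of pairs of means at different time points for each group
(rows). Premultiplying by $C$ gives
\[ CMU = \begin{pmatrix} (\mu_{11} - \mu_{12}) - (\mu_{21} - \mu_{22}) & (\mu_{12} - \mu_{13}) - (\mu_{22} - \mu_{23}) & (\mu_{13} - \mu_{14}) - (\mu_{23} - \mu_{24}) \\ (\mu_{11} - \mu_{12}) - (\mu_{31} - \mu_{32}) & (\mu_{12} - \mu_{13}) - (\mu_{32} - \mu_{33}) & (\mu_{13} - \mu_{14}) - (\mu_{33} - \mu_{34}) \end{pmatrix} \]
or, equivalently,
\[ CMU = \begin{pmatrix} (\mu_{11} - \mu_{21}) - (\mu_{12} - \mu_{22}) & (\mu_{12} - \mu_{22}) - (\mu_{13} - \mu_{23}) & (\mu_{13} - \mu_{23}) - (\mu_{14} - \mu_{24}) \\ (\mu_{11} - \mu_{31}) - (\mu_{12} - \mu_{32}) & (\mu_{12} - \mu_{32}) - (\mu_{13} - \mu_{33}) & (\mu_{13} - \mu_{33}) - (\mu_{14} - \mu_{34}) \end{pmatrix}. \]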
If the conditions in (a) hold, then any element of the first row equals a12 − a12 = 0, and any
element of the second row equals a13 − a13 = 0, so that CMU = 0. This problem forced
you to write out one of these expressions in detail and simply verify that the hypothesis of
parallelism or no group by time interaction can be written this way.
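The statement that CMU = 0 under parallelism is easy to verify numerically. The sketch below uses the U and C given in (b) with hypothetical group means: `M_parallel` has profiles that differ only by constant shifts, and `M_nonparallel` perturbs one mean to break parallelism.

```python
import numpy as np

# U and C as given in part (b).
U = np.array([[1, 0, 0],
              [-1, 1, 0],
              [0, -1, 1],
              [0, 0, -1]], dtype=float)
C = np.array([[1, -1, 0],
              [1, 0, -1]], dtype=float)

# Hypothetical means: a common profile shifted by a group-specific constant.
base_profile = np.array([20.0, 23.0, 27.0, 28.0])
group_shifts = np.array([0.0, -1.5, 2.0])
M_parallel = base_profile[None, :] + group_shifts[:, None]

# Break parallelism by changing group 3's mean at the last time point.
M_nonparallel = M_parallel.copy()
M_nonparallel[2, 3] += 1.0

print(C @ M_parallel @ U)     # all zeros
print(C @ M_nonparallel @ U)  # nonzero in the entry comparing groups 1 and 3 at times 3 and 4
```

Here a12 = 1.5 and a13 = −2.0 in the notation of (a), and each entry of CMU is a difference of two equal constants.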
4. Protein diets and growth of chicks. Here is a plot of the data; the chick-specific weight profiles
and overall sample means in each group appear to behave approximately like straight lines
but perhaps with some curvature for some chicks. The steepness of the weight gain appears
somewhat different for the different diets.
[Figure: chick weight (g) versus days, one panel per diet (Diet 1, Diet 2, Diet 3, Diet 4).]
These data are balanced but not equally spaced. Nonetheless, you could have used
the methods in Section 2.5 to do an informal examination of the correlation structure and
variance. The apparent pattern of correlation and covariance seems very different in each
group and certainly not compound symmetric. Overall variance increases markedly over
time, which can be seen visually in the plot of the data. Thus, the validity of the univariate
repeated measures analysis of variance you did is highly suspect, as it assumes compound
symmetry, constant variance over time, and the same covariance structure for all groups. In
fact, the test based on Mauchly’s criterion strongly rejects the Type H covariance structure. The multivariate
analysis (if you did it) does not require compound symmetry but still requires a common pattern of covariance, so it, too, is suspect. However, the evidence seems overwhelming that
chick weights over time are different under the different diets and the true pattern of change
is also different (the test of parallelism strongly rejects the null hypothesis). You should have
stated formal hypotheses that address these issues. The specialized tests obtained from
the orthogonal polynomial transformation that decomposes the overall hypothesis of parallelism into linear, quadratic, cubic, and quartic components suggest that there is evidence
of a difference among the four groups in the linear component. Evidence of differences in curvature is not overwhelming in the quadratic component. These tests are valid
even if compound symmetry does not hold, but they do require a common covariance structure across groups, which seems shaky here, so it is not appropriate to interpret these tests
too precisely. But they probably do reflect what the plots of sample means for each group
suggest, namely that the steepness of the profiles may be different.
5. Treatment of children exposed to lead. Here is a plot of the data. The individual-specific
lead profiles and overall sample means appear to be very different between placebo and
succimer, with the former being relatively flat, as one might expect given that the placebo
should have no effect on lead levels, and the latter showing a decline followed by a rebound.
Thus, the visual evidence is compelling that succimer lowers blood lead levels relative to
placebo, at least initially, and that there is a difference in the patterns of change.
[Figure: lead level (mcg/dL) versus week, one panel per treatment (Placebo, Succimer).]
The data are balanced but the observation times are not equally spaced. Examination of the
estimated covariance and correlation matrices in each group suggests that the correlation
pattern might be approximately compound symmetric within the placebo group. Within the
succimer group, the pattern of correlation is a bit more varied, although given the relatively
small sample size, it is not out of the question that the estimated overall correlation matrix
could exhibit such a pattern if the true correlation structure really is compound symmetric. It
is not clear, however, if the true correlation pattern is the same in both groups. The overall
variances in the placebo group are relatively constant over time, whereas those in the succimer group at weeks 1, 4, and 6 are much larger than that at baseline (week
0). It is questionable whether or not the assumptions underlying univariate and multivariate
repeated measures analyses of variance hold here.
From the spaghetti plots, there are some unusual observations in the succimer group. One
child’s trajectory drops considerably at 6 weeks relative to the value at week 4, while most
other children in this group show a rise. This could possibly be an error, but we don’t know
for sure. There are also a few other children in this group whose pattern is different from
the majority of the others, including one child who starts out with a relatively high lead level,
which drops substantially at week 1 but then rises to the highest observed values at weeks
4 and 6. Another child who starts with a high level has a week 1 level that increases, while
almost all other children stay flat or show a decrease. It would be interesting to carry out
analyses deleting unusual values to gauge their effect on results, but this would destroy the
balance. You may have deleted entire children with unusual patterns to assess their influence
on results.
The univariate hypothesis of parallelism is strongly rejected, which, despite the fact that the
model assumptions do not hold, probably is reflecting the obvious apparent difference in
the mean lead level patterns over time. Clearly, the visual evidence suggests that succimer
does initially lower blood lead levels, but they show signs of rebounding, while those in the
placebo group stay flat. This is reflected in the specialized tests of the linear and quadratic
components of the orthogonal polynomial transformation (which may not be reliable because
the covariance structure does not seem the same in each group). The test of the linear
component has an F statistic of 0.53 with p-value 0.47, suggesting insufficient evidence of
a difference. This reflects the fact that, if one envisions a straight line trajectory drawn
through each plot, both would look relatively flat. The test of the quadratic component yields
a very small p-value, which obviously is reflecting that the mean profile for the placebo group
shows no real curvature, while the one for succimer clearly does. The same is true for the
cubic component.
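To see what the linear, quadratic, and cubic components measure when the times are unequally spaced, here is a sketch of one common way to build orthogonal polynomial contrasts: orthonormalise the columns 1, t, t², t³ at the observation times (weeks 0, 1, 4, and 6) with a QR decomposition. This construction is an assumption made for illustration and is not necessarily the one used by the software for the course; the two mean profiles are hypothetical, not the observed sample means.

```python
import numpy as np

def orthogonal_polynomial_contrasts(times):
    """Return an (n x (n-1)) matrix whose columns are orthonormal polynomial contrasts.

    Column k (k = 0, 1, ...) is a polynomial of degree k + 1 in the times,
    orthogonal to all lower-degree polynomials. Signs are determined only up
    to the convention of the QR routine.
    """
    times = np.asarray(times, dtype=float)
    n = len(times)
    V = np.vander(times, N=n, increasing=True)  # columns 1, t, t^2, ..., t^(n-1)
    Q, _ = np.linalg.qr(V)
    return Q[:, 1:]                              # drop the constant column

weeks = np.array([0.0, 1.0, 4.0, 6.0])
P = orthogonal_polynomial_contrasts(weeks)

# Hypothetical mean profiles (illustration only).
flat_profile = np.full(4, 25.0)                   # no change over time
dip_profile = np.array([26.0, 13.0, 15.0, 20.0])  # decline followed by rebound

flat_scores = P.T @ flat_profile  # all components are zero for a flat profile
dip_scores = P.T @ dip_profile    # nonzero components reflect slope and curvature

print(P)
print(flat_scores, dip_scores)
```

A flat profile has zero linear, quadratic, and cubic components, whereas a profile that declines and then rebounds has nonzero higher-order components even if a straight line drawn through it looks fairly flat.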
Clearly, things are very different in each group. A statistical model that represents this explicitly and embodies more realistic assumptions about covariance would allow us to confirm
that.