ST 790, Homework 2 Solutions
Spring 2017
1. Finite-sample properties of estimators in a population-averaged model. (a) I did my simulation in R using the gls() function. I calculated the correct model-based standard errors and
the robust standard errors using the homegrown function robust.cov().
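The homegrown robust.cov() function itself is not reproduced in these solutions. As a rough sketch of the idea only, not the original code, and with all object names hypothetical, a cluster-level sandwich covariance for the OLS fit could be computed as follows.

```r
# Sketch of a sandwich (robust) covariance for OLS with clustering by
# individual: (X'X)^{-1} { sum_i X_i' r_i r_i' X_i } (X'X)^{-1}.
# X: stacked (N x p) design matrix; resid: stacked residuals; id: subject ids.
# (All names here are hypothetical, not the original robust.cov() code.)
sandwich.ols <- function(X, resid, id) {
  bread <- solve(crossprod(X))            # (X'X)^{-1}
  meat  <- matrix(0, ncol(X), ncol(X))
  for (i in unique(id)) {
    Xi <- X[id == i, , drop = FALSE]
    ri <- resid[id == i]
    u  <- crossprod(Xi, ri)               # X_i' r_i  (p x 1)
    meat <- meat + tcrossprod(u)          # X_i' r_i r_i' X_i
  }
  bread %*% meat %*% bread
}
# robust standard errors: sqrt(diag(sandwich.ols(X, resid, id)))
```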
(b) Here are summaries of the results I got:
## Monte Carlo Means
##   Method  beta_0G  beta_0B beta_1G beta_1B
## 1    OLS 17.30816 16.28057 0.47933 0.77929
## 2    AR1 17.30591 16.27785 0.4791  0.77944
## 3     CS 17.30816 16.28057 0.47933 0.77929
## 4     UN 17.31659 16.28067 0.47894 0.77895
##
## Monte Carlo Standard Deviations
##   Method beta_0G beta_0B beta_1G beta_1B
## 1    OLS 1.46852 1.19054 0.12244 0.09883
## 2    AR1 1.44568 1.16737 0.1198  0.09707
## 3     CS 1.46852 1.19054 0.12244 0.09883
## 4     UN 1.52283 1.20724 0.12641 0.10015
##
## Monte Carlo Mean of Model-based Standard Errors
##   Method beta_0G beta_0B beta_1G beta_1B
## 1    OLS 1.6263  1.34846 0.14488 0.12013
## 2    AR1 1.40116 1.16178 0.11753 0.09745
## 3     CS 1.12255 0.93077 0.08884 0.07366
## 4     UN 1.31126 1.08724 0.11012 0.0913
##
## Monte Carlo Mean of Robust Standard Errors
##   Method beta_0G beta_0B beta_1G beta_1B
## 1    OLS 1.3411  1.15341 0.11262 0.09696
## 2    AR1 1.32394 1.13978 0.11132 0.09574
## 3     CS 1.3411  1.15341 0.11262 0.09696
## 4     UN 1.26387 1.09565 0.10631 0.09196
##
## Monte Carlo MSE Ratios
##   Method MSE This/MSE AR1
## 1    OLS            0.966
## 2    AR1            1
## 3     CS            0.966
## 4     UN            0.914
(c) The Monte Carlo (MC) averages are all very close to the true values, which probably reflects the fact that all of these estimators are consistent for the true value of β. In fact, it appears that the estimators are all approximately unbiased at this finite sample size.
(d) The MSE ratios for all three estimators based on an incorrect covariance model are less than 1, demonstrating the loss of efficiency predicted by the theory. However, for OLS and CS (which give identical estimates, for reasons we will deduce later), the MSE ratio is quite close to 1, suggesting that, in this particular scenario, there is not much loss of precision for estimating β. The worst case is the unstructured model. Of course, in principle, this model is not "incorrect," as it subsumes the true AR(1) model. The problem with it is that it uses many more parameters to characterize the covariance structure than are necessary: 4(5)/2 = 10 parameters (4 variances and 6 correlation parameters), when the true covariance structure is characterized by only 2 parameters. So, here, the loss of efficiency of about 9% (which is nontrivial) is likely due to the need to estimate all those additional parameters, which degrades the quality of estimation. Recall that the first-order asymptotic theory says that, at least in large samples (large m), there should be no penalty paid for estimating covariance parameters relative to knowing them, even under an incorrect model. That is obviously not the case here! This probably reflects that this optimistic theoretical result may not be relevant in finite samples.
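For concreteness, the tabulated ratios are evidently the MC MSE under the true AR(1) fit divided by the MC MSE under each working structure, so that a value less than 1 indicates a loss of efficiency. A sketch with hypothetical object names, where est.ar1 and est.this are vectors of estimates of a given component of β across the Monte Carlo data sets:

```r
# Monte Carlo MSE: average squared deviation of the estimates from truth
# (object names est.ar1, est.this, beta.true are hypothetical)
mc.mse <- function(est, truth) mean((est - truth)^2)

# Ratio as tabulated; < 1 means the working structure "this" is less
# efficient (larger MSE) than the fit under the true AR(1) structure
mse.ratio <- mc.mse(est.ar1, beta.true) / mc.mse(est.this, beta.true)
```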
(e) Here is the ratio of the MC SDs to the average of the model-based standard errors:
##   Method beta_0G beta_0B beta_1G beta_1B
## 1    OLS   0.903   0.883   0.845   0.823
## 2    AR1   1.032   1.005   1.019   0.996
## 3     CS   1.308   1.279   1.378   1.342
## 4     UN   1.161   1.11    1.148   1.097
and here is the same for the robust standard errors:
##   Method beta_0G beta_0B beta_1G beta_1B
## 1    OLS   1.095   1.032   1.087   1.019
## 2    AR1   1.092   1.024   1.076   1.014
## 3     CS   1.095   1.032   1.087   1.019
## 4     UN   1.205   1.102   1.189   1.089
Under the true AR(1) model, the model-based standard errors do a pretty good job of approximating the standard deviation of the true sampling distribution of the estimator for β (the ratio of the MC SD to the average of the SEs is right around 1), suggesting that the asymptotic theory approximation is working pretty well at this sample size. However, when the assumed covariance structure is not correct (OLS, CS) or is overparameterized (UN), the model-based standard errors can overstate or understate the true sampling variation, sometimes substantially. This is consistent with what we might expect; the model-based formula is based on assuming that we have the correct model, and with OLS/CS we don't. And for UN, the first-order theory does not account for the additional variation due to estimating the covariance parameters, so the estimated SEs also understate the true sampling variation for that reason. The robust SEs, on the other hand, fix things up pretty well for OLS and CS, as the ratio of the true SD to the average of the robust standard errors is closer to 1 for all components. For UN, things are about the same, which makes sense, given that in this case the "meat" and "bread" of the "sandwich" should be approximately the same, so not much changes.
2. REML estimating equations. We show by brute force that a summand of (5.56) has conditional expectation (given x̃) zero when the model is correctly specified. Note that you were specifically asked to show that a summand has conditional expectation zero, so showing that the entire estimating equation has expectation zero is NOT a satisfactory answer to this problem.
Remember, the REML equation depends on $\hat{\beta}$, not $\beta$; thus we need to maintain $\hat{\beta}$ in (5.56) when showing that a summand has expectation zero. So consider the $i$th summand, which, writing $\partial V_i/\partial \xi_k = V_{\xi i}$, can be written as
\[
(Y_i - X_i\hat{\beta})^T V_i^{-1} V_{\xi i} V_i^{-1} (Y_i - X_i\hat{\beta}) - \mathrm{tr}(V_i^{-1} V_{\xi i}) + \mathrm{tr}\{(X^T V^{-1} X)^{-1} X_i^T V_i^{-1} V_{\xi i} V_i^{-1} X_i\}. \quad (1)
\]
We now find the conditional expectation of the leftmost term. Note that, with
\[
\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y,
\]
it is straightforward to observe by substitution that $E(Y_i - X_i\hat{\beta}) = 0$. So from the result for the expectation of a quadratic form in Appendix A, letting $A = V_i^{-1} V_{\xi i} V_i^{-1}$, the expectation we want is equal to
\[
\mathrm{tr}[\, E\{(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T \mid \tilde{x}\}\, A \,].
\]
So we need to find $E\{(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T \mid \tilde{x}\}$, the covariance matrix of $(Y_i - X_i\hat{\beta})$. Note that we can write, adding and subtracting $X_i\beta$ in $(Y_i - X_i\hat{\beta})$,
\[
\begin{aligned}
(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T
&= \{(Y_i - X_i\beta) - X_i(X^T V^{-1} X)^{-1} X^T V^{-1}(Y - X\beta)\} \\
&\qquad \times \{(Y_i - X_i\beta) - X_i(X^T V^{-1} X)^{-1} X^T V^{-1}(Y - X\beta)\}^T \\
&= (Y_i - X_i\beta)(Y_i - X_i\beta)^T - (Y_i - X_i\beta)(Y - X\beta)^T V^{-1} X (X^T V^{-1} X)^{-1} X_i^T \\
&\qquad - X_i(X^T V^{-1} X)^{-1} X^T V^{-1}(Y - X\beta)(Y_i - X_i\beta)^T \\
&\qquad + X_i(X^T V^{-1} X)^{-1} X^T V^{-1}(Y - X\beta)(Y - X\beta)^T V^{-1} X (X^T V^{-1} X)^{-1} X_i^T. \quad (2)
\end{aligned}
\]
The expectation of the first term in (2) is $V_i$, and, because $E\{(Y - X\beta)(Y - X\beta)^T \mid \tilde{x}\} = V$, the expectation of the last term can be seen to be
\[
X_i(X^T V^{-1} X)^{-1} X_i^T. \quad (3)
\]
For the first middle term in (2), observe that $(Y_i - X_i\beta)(Y - X\beta)^T$ is an $(n_i \times N)$ matrix of the form
\[
\{(Y_i - X_i\beta)(Y_1 - X_1\beta)^T, \ldots, (Y_i - X_i\beta)(Y_i - X_i\beta)^T, \ldots, (Y_i - X_i\beta)(Y_m - X_m\beta)^T\}.
\]
Clearly, the conditional expectation of this matrix is, by independence across $i$,
\[
(0, \ldots, V_i, \ldots, 0). \quad (4)
\]
Now $V^{-1}$ is a block diagonal matrix with $V_i^{-1}$, $i = 1, \ldots, m$, on the diagonal, and $X$ is the $(N \times p)$ matrix with the $X_i$, $i = 1, \ldots, m$, stacked. It can then be seen that premultiplying $V^{-1} X$ by (4) yields $X_i$ $(n_i \times p)$, so that the expectation of the first middle term in (2) is equal to (3). Likewise, as the second middle term in (2) is just the transpose of the first, its expectation is also equal to (3). Combining, we thus obtain
\[
E\{(Y_i - X_i\hat{\beta})(Y_i - X_i\hat{\beta})^T \mid \tilde{x}\} = V_i - X_i(X^T V^{-1} X)^{-1} X_i^T, \quad (5)
\]
so that the expectation we want is
\[
\mathrm{tr}[\, \{V_i - X_i(X^T V^{-1} X)^{-1} X_i^T\}\, V_i^{-1} V_{\xi i} V_i^{-1} \,]
= \mathrm{tr}(V_i^{-1} V_{\xi i}) - \mathrm{tr}\{(X^T V^{-1} X)^{-1} X_i^T V_i^{-1} V_{\xi i} V_i^{-1} X_i\}.
\]
It follows that the conditional expectation of (1) given $\tilde{x}$ is 0, demonstrating the result.
3. Chickweights, continued. Here is a plot of the data.
[Figure: spaghetti plots of weight (g) against days, in separate panels for Diet 1, Diet 2, Diet 3, and Diet 4.]
From the plot, although the trend looks approximately like a straight line in each treatment group, there seems to be some suggestion of possible curvature. Clearly, the investigators'
question (ii) is directed toward this issue – if the true population mean trajectory for any
treatment group is approximately quadratic rather than a straight line, then the rate of change
is no longer constant. In particular, if the mean for any group is of the form
β0 + β1 t + β2 t 2 ,
then, taking derivatives with respect to t, the rate of change at time t is
β1 + 2β2 t.
(6)
So a way to address the investigators’ questions is to posit a quadratic model for each treatment group and then explore whether or not there is sufficient evidence to suggest that there
is indeed curvature in any of the treatment groups. If there is, then it would make sense to
inform the investigators that there is evidence that the rate of change is not constant for any
treatments for which there is evidence of curvature and, for (iii), provide them with estimates
of the rate of change at selected time points using (6), accompanied by standard errors (of
course).
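As a concrete sketch of addressing (iii), with hypothetical names: given the estimated coefficients for a group and their estimated covariance matrix, the rate of change at time t and its standard error follow from the linear combination L = (0, 1, 2t) applied via (6).

```r
# Rate of change at time t for a quadratic mean b0 + b1*t + b2*t^2,
# with a standard error from the estimated covariance of (b0, b1, b2).
# (Function and argument names are hypothetical.)
rate.of.change <- function(beta, Sigma, t) {
  L <- c(0, 1, 2 * t)                       # gradient of beta1 + 2*beta2*t
  c(estimate = sum(L * beta),
    se = sqrt(drop(t(L) %*% Sigma %*% L)))
}
```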
In the attached programs in SAS and R we fit a quadratic model with different intercepts,
linear, and quadratic effects for each treatment group with different posited structures for the
overall covariance. This model allows for a quadratic mean trajectory that is possibly different
for each diet. Because there are missing values, we play it safe and fit using maximum
likelihood as in the discussion in Section 5.6 and use model-based standard errors.
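A minimal sketch of the R fit just described; the data frame and variable names, including the within-chick ordering variable occasion, are hypothetical.

```r
library(nlme)   # provides gls() with correlation and variance structures

# Separate intercept, linear, and quadratic effects for each diet;
# unstructured within-chick correlation with variances differing over
# time; fit by maximum likelihood because of the missing values.
# (Data frame and variable names are hypothetical.)
chick.un <- gls(weight ~ -1 + diet + diet:days + diet:I(days^2),
                data = chick,
                correlation = corSymm(form = ~ occasion | id),
                weights = varIdent(form = ~ 1 | days),
                method = "ML")
summary(chick.un)
```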
In SAS we are able to fit separate structures for each treatment group, so we use SAS to determine whether or not different structures are warranted. This appears not to be the case; a common unstructured covariance, with variances differing over time, is preferred among all the structures considered. So we can use either SAS or R to carry out further
analyses to address the questions. In both cases, we construct standard errors and tests
using the model-based covariance matrix, as there are missing values and the robust version
with an unstructured covariance matrix is really no different.
We did not simplify further to have a common intercept for all four diets on the basis of
randomization, but you may have. Of course, it is also possible to work with an alternative
parameterization of the model, treating one of the diets as the reference group. We fit the
model both ways in the attached program.
From the output of either program, we see that there appears to be strong evidence of
possible quadratic effects for all diets, and all diets have a positive quadratic coefficient
estimate. In addition, the test of whether or not the quadratic effects differ for at least one
pair of diets is highly significant, suggesting evidence that the nature of the curvature is
not the same across diets. See the programs for estimates and tests based on the overall
quadratic model.
You should have written out a formal model and stated your assumptions about the model,
discussed the steps you took to settle on an appropriate working covariance structure assumption, formalized and addressed the investigators’ questions in terms of your model, and
provided standard errors for any estimates you provided to the investigators.
4. Age-related macular degeneration trial. Figure 5.4 shows the spaghetti plots by treatment.
The visual impression is that visual acuity shows an overall decreasing mean trend over the
study period for each treatment. There is some suggestion of possible curvature, at least for
the active treatment, so we include quadratic effects in our initial models. We allow different
intercepts to investigate the integrity of the randomization initially as well.
It is critical to recognize that here, because there are missing observations, we must be
willing to believe that these data are missing according to a missing at random mechanism
and that multivariate normality of the visual acuity measures given covariates holds. In
practice, we would have the opportunity to discuss this with the investigators and base this assumption on the substantive issues. From the discussion in Section 5.6 of the notes, we
use maximum likelihood (ML) methods to carry out the analyses and model-based standard
errors.
See the attached programs for initial fits of these models with several plausible covariance
structures; because the times are not equally spaced, the only structures we consider are unstructured, compound symmetric, and exponential. Although visual inspection of the fully unstructured estimates for each group is suggestive that common compound symmetry might
be a reasonable approximation, a common unstructured covariance matrix is the preferred
structure from SAS based on AIC and BIC, so we adopt this for more fine-tuned analyses.
You may have investigated other models and may have made a different decision, which is
fine.
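A sketch of how two of the candidate structures might be fit and compared in R; again, the data frame and variable names are hypothetical.

```r
library(nlme)

# Quadratic means by treatment; all fits by ML because of missing data.
# (Data frame and variable names are hypothetical.)
fit.cs  <- gls(acuity ~ trt * (time + I(time^2)), data = armd,
               method = "ML", correlation = corCompSymm(form = ~ 1 | id))
fit.exp <- gls(acuity ~ trt * (time + I(time^2)), data = armd,
               method = "ML", correlation = corExp(form = ~ time | id))
anova(fit.cs, fit.exp)   # compare AIC/BIC
```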
In the attached SAS and R programs, evidence supporting quadratic effects is not strong,
so we scale back to linear models. There is also no evidence supporting different intercepts,
as we’d hope for a randomized study, so we assume a common intercept term as well. We
fit fancier models that allow the common intercept and slope in each treatment group to
depend on the categorical covariate lesion severity. There is strong evidence that baseline
mean visual acuity is associated with lesion severity, but fits of models allowing slopes to
depend on lesion severity do not suggest evidence of an association. So we decide on a
final linear model with intercepts different by lesion severity. In this model there is evidence
that the assumed constant rate of change over the study period in mean visual acuity is
different for the two groups and is in fact steeper for the treatment group.
You may have chosen a different model. You should have given a rationale for the model
you finally adopted and then stated that model and how you formalized and addressed the
questions of interest within it.