Including Covariates in Models

STAT 5200 Handout #27
ANCOVA: Including Covariates in Models (Ch. 17)
Up until now, all of our models have used experimental factors (A) whose levels have
been assigned (at random) to experimental units. Sometimes, however, experimental
units have additional characteristics (not assigned to them at random) that may affect the
response variable (Y). Such characteristics are generally considered uncontrolled
nuisance variables; we refer to them as “covariates”, and some literature uses the term
“concomitant variables”.
Analysis of Covariance (ANCOVA) involves adding these variables (X) to our model in
an appropriate way. They are usually treated similarly to blocking factors (due to lack of
randomization), and including them reduces the error variance in much the same way
that blocking does. We generally add them as linear effects (with “slope” parameter β):
ANOVA:
Yij = µ + Ai + εij
ANCOVA (assuming additive effects):
Yij = µ + Ai + β Xij + εij
– note that this assumes equal slopes
ANCOVA (allowing interaction):
Yij = µ + Ai + βi Xij + εij
– note that this allows nonparallel slopes
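To see what the two ANCOVA models imply, it can help to write out the mean response for treatment i at a covariate value x. The following is a sketch in the notation above, assuming only that the errors have mean zero; i and i' denote two different treatments.

```latex
\begin{align*}
\text{Equal slopes:} \quad
  & E[Y_{ij} \mid X_{ij} = x] = (\mu + A_i) + \beta x \\
\text{Separate slopes:} \quad
  & E[Y_{ij} \mid X_{ij} = x] = (\mu + A_i) + \beta_i x \\[6pt]
\text{Treatment difference, equal slopes:} \quad
  & A_i - A_{i'} \\
\text{Treatment difference, separate slopes:} \quad
  & (A_i - A_{i'}) + (\beta_i - \beta_{i'})\, x
\end{align*}
```

With equal slopes the treatment difference is the same at every value of x (parallel lines). With separate slopes it changes with x, so a treatment comparison has to be made at a stated covariate value.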
Extensions:
o for designs other than completely randomized design
o for mixed models
o for quadratic (or other non-linear) covariate effects
Note: A covariate must be observed before treatment application, or it may in fact be a
secondary response variable. Simply including a secondary response variable in the
model will cause interpretation to suffer; see text section 17.2.
Example: In an experiment studying treatments for leprosy, a pre-treatment score of
leprosy bacilli was recorded, and then subjects were randomly assigned to one of three
drugs (antibiotics ‘a’ and ‘d’, control ‘f’). After a certain length of time on treatment, the
leprosy patients were again scored on leprosy bacilli.
data drugtest;
input Drug $ PreTreatment PostTreatment @@;
datalines;
a 11 6 a 8 0 a 5 2 a 14 8 a 19 11
a 6 4 a 10 13 a 6 1 a 11 8 a 3 0
d 6 0 d 6 2 d 7 3 d 8 1 d 18 18
d 8 4 d 19 14 d 8 9 d 5 1 d 15 9
f 16 13 f 13 10 f 11 18 f 9 5 f 21 23
f 16 12 f 12 5 f 12 16 f 7 1 f 12 20
;
/* Fit ANOVA model */
proc glimmix data=drugtest;
class Drug;
model PostTreatment = Drug;
run;
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Drug          2       27      3.98   0.0305
/* Fit ANOVA model of difference */
data drugtest; set drugtest;
Diff = PostTreatment - PreTreatment;
proc glimmix data=drugtest;
class Drug;
model Diff = Drug; /* Note this assumes beta==1 */
run;
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Drug          2       27      2.42   0.1078
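The remark that the difference model assumes beta equal to 1 can be checked with one line of algebra. The sketch below uses the handout's notation, with Y the post-treatment score and X the pre-treatment score.

```latex
\begin{align*}
Y_{ij} - X_{ij} &= \mu + A_i + \varepsilon_{ij} \\
\Longleftrightarrow \quad
Y_{ij} &= \mu + A_i + 1 \cdot X_{ij} + \varepsilon_{ij}
\end{align*}
```

So an ANOVA on the difference is the equal-slopes ANCOVA model with the slope fixed at β = 1 rather than estimated from the data.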
/* Fit ANCOVA model */
proc glimmix data=drugtest plots=residualpanel;
class Drug;
model PostTreatment = Drug PreTreatment;
title1 'ANCOVA Model';
run;
/* Consider transformation */
data drugtest; set drugtest;
Post1 = PostTreatment + 1;
proc transreg data=drugtest;
model boxcox(Post1 / lambda=-1 to 1 by 0.05) =
class(Drug) identity(PreTreatment);
title1 'Box-Cox on response';
run;
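For reference, the usual form of the Box-Cox family searched over by the lambda list is sketched below; y stands for the shifted response Post1. The transformation needs a strictly positive response, which is why 1 is added first: several PostTreatment scores in the data are 0.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[8pt]
  \log y, & \lambda = 0
\end{cases}
```

A value of λ near 0.5 corresponds, up to a linear rescaling, to a square-root transformation, and λ = 1 corresponds to no transformation.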
data drugtest; set drugtest;
newPost = sqrt(PostTreatment);
newPre = sqrt(PreTreatment); /* Do this to preserve scale;
can run TRANSREG with this
to ensure sqrt still okay
for PostTreatment */
run;
proc glimmix data=drugtest plots=residualpanel;
class Drug;
model newPost = Drug newPre;
output out=out1 pred=newPosthat;
title1 'ANCOVA Model on sqrt scale';
run;
ANCOVA Model on sqrt scale
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Drug          2       26      1.27   0.2988
newPre        1       26     41.91   <.0001
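One way to summarize the equal-slopes fit is through covariate-adjusted treatment means. The sketch below is an illustrative addition, not part of the original handout code: it assumes the drugtest data set with newPost and newPre already created, and adds an LSMEANS statement and the SOLUTION option to the same model.

```sas
/* Sketch (not in the original handout): adjusted means for the
   equal-slopes ANCOVA on the square-root scale */
proc glimmix data=drugtest;
class Drug;
model newPost = Drug newPre / solution; /* SOLUTION prints the common slope estimate */
lsmeans Drug / diff cl; /* Drug means evaluated at the average newPre,
                           with pairwise differences and confidence limits */
title1 'ANCOVA Model on sqrt scale: adjusted means';
run;
```

Because the slopes are assumed equal, the pairwise differences between drugs do not depend on the covariate value at which the means are evaluated.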
data out1; set out1;
if Drug='a' then PredA = newPosthat;
if Drug='d' then PredD = newPosthat;
if Drug='f' then PredF = newPosthat;
proc sort data=out1; by newPre; /* To make connected
lines go left-to-right
in plot */
proc sgplot data=out1;
scatter x=newPre y=newPost / group=Drug;
series x=newPre y=PredA /
lineattrs=(pattern=thindot thickness=5);
series x=newPre y=PredD /
lineattrs=(pattern=longdash thickness=2);
series x=newPre y=PredF /
lineattrs=(pattern=solid thickness=2);
xaxis label='Square Root of PreTreatment Score';
yaxis label='Square Root of PostTreatment Score';
title1 'ANCOVA Model: Leprosy Data';
run;
/* Consider Interaction */
proc glimmix data=drugtest plots=residualpanel;
class Drug;
model newPost = Drug | newPre;
output out=out1 pred=newPosthat;
title1 'Interaction ANCOVA Model on sqrt scale';
run;
Interaction ANCOVA Model on sqrt scale
Type III Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
Drug               2       24      0.09   0.9164
newPre             1       24     36.02   <.0001
newPre*Drug        2       24      0.13   0.8750
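To read off the separate intercept and slope for each drug directly, the same interaction model can be refit in a different parameterization. The sketch below is an illustrative addition, not part of the original handout code: it assumes the drugtest data set with newPost and newPre, and uses the nested effect newPre(Drug) together with the NOINT option.

```sas
/* Sketch (not in the original handout): same fitted lines as Drug | newPre,
   but parameterized as one intercept and one slope per drug */
proc glimmix data=drugtest;
class Drug;
model newPost = Drug newPre(Drug) / noint solution;
   /* Drug         -> a separate intercept for each drug
      newPre(Drug) -> a separate slope for each drug   */
title1 'Interaction ANCOVA Model: separate intercepts and slopes';
run;
```

In this form the Solutions table is the part of interest. The Type III tests answer different questions here (for example, whether all intercepts are zero), so the test of equal slopes should still be taken from the newPre*Drug line of the Drug | newPre fit.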
data out1; set out1;
if Drug='a' then PredA = newPosthat;
if Drug='d' then PredD = newPosthat;
if Drug='f' then PredF = newPosthat;
proc sort data=out1; by newPre;
proc sgplot data=out1;
scatter x=newPre y=newPost / group=Drug;
series x=newPre y=PredA /
lineattrs=(pattern=thindot thickness=5);
series x=newPre y=PredD /
lineattrs=(pattern=longdash thickness=2);
series x=newPre y=PredF /
lineattrs=(pattern=solid thickness=2);
xaxis label='Square Root of PreTreatment Score';
yaxis label='Square Root of PostTreatment Score';
title1 'Interaction ANCOVA Model: Leprosy Data';
run;
(More on such quantitative predictors in STAT 5100 Linear Regression and Time Series)