Model Diagnostics for Cox Regression

Statistical Methods in Epi II (171:242)
Brian J. Smith, Ph.D.
March 5, 2003
Outliers
An outlier is a data point that is located away from the majority of
the data. Outliers are a concern in any analysis and are most easily
illustrated in the context of linear regression.
[Figure: two scatterplots of y against x, one panel labeled "Influential Outlier" and the other labeled "Outlier".]
Typically, in survival analyses, outliers are found among long
survivors. In general, individual subjects are outliers if they fail very
early or very late with respect to other subjects having similar
characteristics.
Goal: Detect outliers or influential observations that significantly
impact the estimates in our Cox regression model.
Residuals
In linear regression, residuals are simply computed as the observed
response variables minus the predicted values. We can then plot
the residuals to check that they are normally distributed. They can
also be plotted against covariates not included in the model to
explore possible relationships that are not accounted for. We would
like to perform comparable residual analyses in the Cox regression
setting. However, here we are modeling the hazard rate
\[ \lambda(t, x_i) = \lambda_0(t) \exp(\beta x_i) \]
which is not directly observable. As a result, the construction of
residuals is more involved. In fact, there have been many different
methods proposed to compute residuals for Cox regression. We
will discuss two:
1. Martingale Residuals
2. Deviance Residuals
Martingale Residuals
The Martingale residual can be explained as the difference between
the number of events (0 or 1) occurring for the ith individual during
follow-up and the number expected under the model. These
residuals are used primarily to identify patterns in the data that are
not explained by the model.
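The "observed minus expected" description can be made concrete with a minimal sketch. It assumes the usual form of the residual, r_i = delta_i - H0(t_i) exp(beta x_i), where delta_i is the event indicator and H0 is the Breslow estimate of the cumulative baseline hazard. The function name, the single covariate, the fixed value of beta, and the toy data are all hypothetical and are not taken from the lymphoma study.

```python
import math

def martingale_residuals(times, events, x, beta):
    """Martingale residuals for a one-covariate Cox model at a given beta.

    r_i = delta_i - H0(t_i) * exp(beta * x_i), with H0 the Breslow estimate
    of the cumulative baseline hazard.
    """
    risk = [math.exp(beta * xi) for xi in x]
    event_times = sorted({t for t, d in zip(times, events) if d == 1})

    # Breslow estimate: at each event time add
    # (number of events) / (sum of exp(beta * x) over subjects still at risk).
    cum_hazard = []
    running = 0.0
    for t in event_times:
        deaths = sum(1 for tj, dj in zip(times, events) if tj == t and dj == 1)
        at_risk = sum(r for tj, r in zip(times, risk) if tj >= t)
        running += deaths / at_risk
        cum_hazard.append((t, running))

    def baseline(t):
        # Cumulative baseline hazard accumulated up to and including time t.
        value = 0.0
        for event_time, h in cum_hazard:
            if event_time > t:
                break
            value = h
        return value

    # Observed events (0 or 1) minus the expected number under the model.
    return [d - baseline(t) * r for t, d, r in zip(times, events, risk)]


# Hypothetical toy data (not the lymphoma study).
times = [2, 3, 5, 7, 11, 13]
events = [1, 1, 0, 1, 1, 0]
x = [0, 1, 0, 1, 0, 1]
residuals = martingale_residuals(times, events, x, beta=0.5)
for t, r in zip(times, residuals):
    print(t, round(r, 3))
```

Each residual is at most 1, censored subjects always have residuals at or below zero, and with the Breslow estimate the residuals sum to zero. In practice beta would be the fitted coefficient vector rather than a fixed number.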
Lymphoma Example:
In the lymphoma study we found an interaction between the method
of bone marrow transplant and disease type. However, a linear
effect for waiting time to transplant did not appear to be significant
(p = 0.1000). Thus, we might propose the following model:
\[ \lambda(t; x) = \lambda_0(t) \exp\{ \beta_1\,\text{autologous} + \beta_2\,\text{hodgkins} + \beta_{12}\,\text{autologous} \times \text{hodgkins} + \beta_3\,\text{karnofsky} \}. \]
As a result of excluding the waiting times, there is a substantial
change in our parameter estimates:

                                 Coefficients
Covariate               Wait Included   Wait Excluded
Autologous                     0.6394          0.5327
Hodgkins                       2.7603          1.6831
Autologous × Hodgkins         -2.3709         -1.6526
Karnofsky                     -0.0495         -0.0547
Wait                          -0.0166               -
We may be concerned about the difference in our estimates, or the
investigators may be puzzled by the lack of an effect for waiting
times. Whatever the reason, we can use the Martingale residuals
to explore the relationship between waiting times and survival.
[Figure: Martingale residuals plotted against waiting times (days).]
The plot indicates that there is a nonlinear effect for this covariate.
Notice that the residuals are predominantly less than zero after
about 70 days; otherwise the residuals are more evenly scattered
about zero. Consequently, we might try the indicator variable
Variable   Levels            N   Percents
Wait70     0 = Wait < 70    36   84%
           1 = Wait ≥ 70     7   16%

The resulting parameter estimates are

Covariate               Coefficient   SE       p-value
Autologous                   0.6470   0.5922   0.27
Hodgkins                     2.7455   0.8294   0.0093
Autologous × Hodgkins       -2.5242   0.0801   0.011
Karnofsky                   -0.0543   0.0122   0.0001
Wait70                      -0.7585   0.3737   0.042
Deviance Residuals
Deviance residuals are a transformation of the Martingale residuals, defined so
that their values tend toward the standard normal distribution. They serve as
the semiparametric analog to the residuals utilized in linear regression. The
deviance residuals are plotted against the values of the linear
predictor in the Cox regression model. Observations that deviate
from the specified model will result in relatively large values for the
deviance residuals. Thus, these residuals are useful in detecting
outliers and points in the data that are not adequately described by
the model.
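A minimal sketch of the transformation may help here. It assumes the standard definition d_i = sign(r_i) sqrt(-2 [r_i + delta_i log(delta_i - r_i)]), where r_i is the Martingale residual and delta_i the event indicator; that formula is not written out in these notes. The function names are hypothetical, and the second helper simply reports the share of residuals inside (-1.96, 1.96), the range in which roughly 95% of standard normal values fall.

```python
import math

def deviance_residual(martingale, event):
    """Transform one Martingale residual into a deviance residual.

    Assumes the standard definition
    d = sign(r) * sqrt(-2 * (r + delta * log(delta - r))).
    """
    r = martingale
    if event == 1:
        inner = r + math.log1p(-r)   # log(1 - r); requires r < 1
    else:
        inner = r                    # censored subjects have r <= 0
    # Guard against tiny positive values caused by rounding.
    magnitude = math.sqrt(max(0.0, -2.0 * inner))
    return math.copysign(magnitude, r)

def fraction_within(residuals, bound=1.96):
    """Share of residuals whose absolute value is at most `bound`."""
    return sum(1 for d in residuals if abs(d) <= bound) / len(residuals)


# Hypothetical Martingale residuals paired with event indicators.
pairs = [(0.5, 1), (-0.3, 0), (-3.0, 1), (0.9, 1), (-1.2, 0)]
deviance = [deviance_residual(r, d) for r, d in pairs]
print([round(d, 3) for d in deviance])
print(fraction_within(deviance))
```

The transformation stretches Martingale residuals near 1 and shrinks large negative ones, which is what makes the result more symmetric about zero. A subject who fails much later than the model expects still ends up with a large negative deviance residual.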
Breast-Feeding Example:
Recall that we arrived at the final model for the breast-feeding
example
\[ \lambda(t; x) = \lambda_0(t) \exp\{ \beta_1\,\text{white} + \beta_2\,\text{black} + \beta_3\,\text{poverty} + \beta_4\,\text{smoke} + \beta_5\,\text{education} \}. \]
Consider the deviance residuals that result from fitting this model.

[Figure: deviance residuals plotted against the linear predictors.]
Approximately 95% of the deviance residuals would be expected to
fall within the interval (–1.96, 1.96). In this example, only 90.6% of
the residuals are within this range. Hence, the more extreme
residuals might be of some concern. Among the most extreme (the
smallest in this example) residuals are
Subject  Time  Event  White  Black  Poverty  Smoke  Alcohol  Care  Age  Education
     18   192      1      1      0        0      0        0     0   21         12
    518    96      1      0      0        1      0        0     0   19          8
    353   104      1      1      0        1      1        1     0   20         12
    594    96      1      1      0        1      1        0     1   18         10
    849   120      1      0      0        0      0        0     0   22         12
These are the subjects with the longest periods of breast-feeding.
Delta-Beta Plots
The Delta-Beta plot is one method of checking the influence of each
observation on the estimated model parameters. The idea is to
compare the parameter estimates $\hat{\beta}$ obtained from an analysis of
all observations to the parameter estimates $\hat{\beta}_j$ obtained from an
analysis excluding the jth observation. This is done for every
observation in the data set, and the changes $\Delta = \hat{\beta} - \hat{\beta}_j$ are then
plotted.
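The leave-one-out idea can be sketched as follows. This is only an illustration under simplifying assumptions: a single covariate, a hand-written Newton-Raphson maximization of the Cox partial likelihood (Breslow handling of ties), and hypothetical toy data. The function names are invented for the example, and real analyses would refit the full multi-covariate model with statistical software.

```python
import math

def fit_cox_1d(times, events, x, iters=50):
    """Newton-Raphson fit of a Cox model with a single covariate."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if di != 1:
                continue
            # Weighted sums over the risk set: subjects still observed at ti.
            s0 = s1 = s2 = 0.0
            for tk, xk in zip(times, x):
                if tk >= ti:
                    w = math.exp(beta * xk)
                    s0 += w
                    s1 += w * xk
                    s2 += w * xk * xk
            mean = s1 / s0
            score += xi - mean                 # first derivative term
            info += s2 / s0 - mean * mean      # negative second derivative term
        if info <= 0.0:
            break
        # Clamp the step as a simple safeguard against overshooting.
        step = max(-1.0, min(1.0, score / info))
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

def delta_betas(times, events, x):
    """Return the full-data estimate and beta - beta_j for every subject j."""
    full = fit_cox_1d(times, events, x)
    deltas = []
    for j in range(len(times)):
        keep = [i for i in range(len(times)) if i != j]
        beta_j = fit_cox_1d([times[i] for i in keep],
                            [events[i] for i in keep],
                            [x[i] for i in keep])
        deltas.append(full - beta_j)
    return full, deltas


# Hypothetical toy data (not the breast-feeding study).
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
x = [1.2, 0.3, 0.8, 0.5, -0.2, 0.9, 0.1, -0.5, 0.4, -0.8]
full, deltas = delta_betas(times, events, x)
for j, d in enumerate(deltas, start=1):
    print(j, round(d, 4))
```

Plotting the printed differences against the subject index gives a delta-beta plot for this single coefficient. Subjects whose removal shifts the estimate most stand out as the most influential.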
Breast-Feeding Example:
Delta-beta plots were constructed by excluding observations one at a time
and refitting the final model to obtain the associated
changes in the parameters. This was done for each of the 927
subjects in the data set. The changes are plotted against the
observation numbers as follows.
[Figure: delta-beta plots of beta - beta(j) against subject number, one panel each for white, black, poverty, smoke, and education.]
One of the subjects appears to have a relatively large delta-beta
value in the plots for the white, black, and poverty covariates. This
is subject 849, whose covariate values are
Subject  Time  Event  White  Black  Poverty  Smoke  Educ
    849   120      1      0      0        0      0    12
The average length of breast-feeding for subjects with the same
characteristics is thirteen weeks. Thus, at 120 weeks, this subject
breast-fed far longer than comparable subjects. Only one subject
continued to breast-feed for more than 120 weeks.
Testing the Proportional Hazards Assumption
A key assumption in the proposed Cox model is that of proportional
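hazards. This is seen in the hazard ratio equation
\[
\begin{aligned}
\frac{\lambda(t, x_1)}{\lambda(t, x_2)} &= \exp\{\beta (x_1 - x_2)\} \\
&= \exp\{\beta_1 (x_{11} - x_{12}) + \cdots + \beta_K (x_{K1} - x_{K2})\} \\
&= \exp\{\beta_1 (x_{11} - x_{12})\} \times \cdots \times \exp\{\beta_K (x_{K1} - x_{K2})\}.
\end{aligned}
\]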
For instance, a unit increase in the x1 covariate is associated with a
hazard ratio of
\[ \frac{\lambda(t, x_1 + 1)}{\lambda(t, x_1)} = \exp(\beta_1) \]
which is constant across time. However, we may
want to test this assumption. For example, in the leukemia example
want to test this assumption. For example, in the leukemia example
we had plotted the log-log transformed Kaplan-Meier curves as a
graphical check of proportionality between the hazard rates for the
placebo and 6-MP groups.
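The reasoning behind this graphical check is that under proportional hazards S_1(t) = S_0(t)^{exp(beta)}, so log(-log S_1(t)) = beta + log(-log S_0(t)) and the two transformed curves should be roughly parallel. The sketch below computes Kaplan-Meier estimates and the log-log transform for two groups. The function names and the toy data are hypothetical and are not the leukemia trial data.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    surv = 1.0
    curve = []
    for t in sorted({ti for ti, d in zip(times, events) if d == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, d in zip(times, events) if ti == t and d == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def log_minus_log(curve):
    """Apply log(-log S(t)); points with S(t) equal to 0 or 1 are skipped."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]


# Hypothetical toy data for two groups (1 = event, 0 = censored).
group_a = ([1, 2, 3, 4, 5, 8], [1, 1, 1, 1, 0, 1])
group_b = ([6, 7, 10, 13, 16, 22], [1, 0, 1, 1, 0, 1])
for label, (times, events) in (("A", group_a), ("B", group_b)):
    for t, value in log_minus_log(kaplan_meier(times, events)):
        print(label, t, round(value, 3))
```

Plotting the two sets of points against time gives the log-log plot described above. Curves that cross or clearly converge suggest that the proportional hazards assumption is questionable for that covariate.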
[Figure: log(-log S(t)) plotted against weeks for the placebo and 6-MP groups.]
A formal test of proportionality with respect to one of the covariates
can be performed with a Cox regression model. Consider the
following model
\[ \lambda(t; x_i) = \lambda_0(t) \exp\{ \beta_1 x_{1i} + \gamma\, g(t)\, x_{1i} + \beta_2 x_{2i} + \cdots + \beta_K x_{Ki} \} \]
where g(t) is a function of time. The resulting hazard ratio for a unit
increase in x1 is
\[ \frac{\lambda(t, x_1 + 1)}{\lambda(t, x_1)} = \exp\{\beta_1 + \gamma\, g(t)\} \]
which is not constant across time if $\gamma \neq 0$. Thus, a formal test of
proportionality can be carried out by fitting this Cox model and
testing whether the $\gamma$ parameter is significant. Common choices of g(t)
include g(t) = log(t) and g(t) = t. It may be necessary to center the
covariate about its mean in fitting this model.
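To see how the time interaction changes the hazard ratio, the short sketch below evaluates exp(beta1 + gamma g(t)) at a few time points. Here gamma is the name used for the coefficient on the g(t) x1 term, and the numerical values of beta1 and gamma are hypothetical, chosen only for illustration.

```python
import math

def hazard_ratio(t, beta1, gamma, g=math.log):
    """Hazard ratio for a unit increase in x1 at time t: exp(beta1 + gamma * g(t))."""
    return math.exp(beta1 + gamma * g(t))


# Hypothetical coefficient values, for illustration only.
beta1 = -1.5
for gamma in (0.0, 0.3):
    ratios = [round(hazard_ratio(t, beta1, gamma), 3) for t in (1, 5, 10, 20)]
    print("gamma =", gamma, ratios)
```

With gamma = 0 the ratio is the same at every time point, which is the proportional hazards case. With gamma different from zero and g(t) = log(t), the ratio drifts steadily as t grows, which is exactly the departure the test is designed to detect.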
Leukemia Example:
The Leukemia trial has but one covariate, the treatment group. For
the regression model, define the indicator variable
Variable   Levels         N   Percents
Drug       0 = Placebo   21   50%
           1 = 6-MP      21   50%
and fit the model
\[ \lambda(t; x_i) = \lambda_0(t) \exp\{ \beta\,\text{drug} + \gamma \log(t) \times \text{drug} \}. \]
The resulting test of proportional hazards is
Covariate       p-value
log(t) × Drug   0.63
Therefore, the assumption of proportional hazards across treatment
groups is not rejected (p = 0.63) and we need not include the time
interaction in the linear predictors for the model.