Proc PHreg plots, checking proportional hazards assumption

Lecture 27
1. Parallel features in Logistic and PHreg
2. Plots from PHreg
3. Graphical checks of proportional hazards assumption
4. Testing proportional hazards assumption
5. Time-varying explanatory variables
Breast cancer example
A 1987 study compared survival times of women diagnosed with breast cancer,
divided into two groups: staining test of biopsy tissue positive or negative.
Data from Collett (2003) Example 1.2.
Kaplan-Meier curves:
Proc Lifetest data=breast_cancer
     plots=(survival(atrisk=0 to 200 by 50));
  time surv_months * died(0);
  strata positive_stain;
run;
PHreg: proportional hazards regression
Proc PHreg data=breast_cancer;
  class positive_stain;
  model surv_months * died(0) = positive_stain
        / risklimits ties=efron;
run;
The response is specified in the same way as for Proc Lifetest.
Analysis of Maximum Likelihood Estimates

Parameter        DF   Parameter    Standard   Chi-Square   Pr > ChiSq
                      Estimate     Error
positive_stain    1   0.90933      0.50089    3.2957       0.0695

Parameter        Hazard    95% Hazard Ratio
                 Ratio     Confidence Limits
positive_stain   2.483     0.930     6.626
The hazard in the positive-stain group was estimated to be about 2.5 times that in the
negative-stain group, although this did not reach significance at the 5% level
(p = 0.0695; the 95% confidence interval, 0.930 to 6.626, includes 1).
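The hazard ratio and its confidence limits are back-transformations of the parameter estimate and its standard error. A minimal sketch of that arithmetic in a DATA step, using the numbers printed in the output; the dataset name hr_check and its variable names are illustrative, and 1.96 is assumed as the normal quantile for a Wald 95% interval:

```sas
data hr_check;
  beta = 0.90933;                          /* parameter estimate from the output */
  se   = 0.50089;                          /* its standard error */
  hazard_ratio = exp(beta);                /* about 2.483 */
  lower95 = exp(beta - 1.96*se);           /* about 0.930 */
  upper95 = exp(beta + 1.96*se);           /* about 6.626 */
  wald_chisq = (beta/se)**2;               /* about 3.296 */
  p_value = 1 - probchi(wald_chisq, 1);    /* about 0.0695 */
run;

proc print data=hr_check noobs;
run;
```

The interval is symmetric on the log scale, not on the hazard-ratio scale, which is why 2.483 is not midway between 0.930 and 6.626.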
Parallel features of Proc PHreg and Proc Logistic
In comparing two groups, PHreg compares ordering of subjects’ times to event.
Logistic compares proportions of subjects who had the event.
Can treat survival to a specific time, e.g. 30-day survival, as an event in logistic
regression if there is no censoring: every subject’s 30-day survival status is known.
• Both have CLASS statements, and specify interactions with * in the MODEL.
• Both regressions are on log scale, so we back-transform:
exp(regression coefficient)
• Class variable A:
Logistic: $\exp(\hat{\beta}_i)$ is the odds ratio comparing the i-th level of A to the reference level.
PHreg: $\exp(\hat{\beta}_i)$ is the hazard ratio or relative risk comparing the i-th level of A to the
reference level.
• Continuous variable X:
Logistic: $\exp(\hat{\beta}_X)$ is the odds ratio corresponding to a 1-unit increase in X,
comparing those with X = x + 1 to those with X = x.
PHreg: $\exp(\hat{\beta}_X)$ is the hazard ratio corresponding to a 1-unit increase in X,
comparing those with X = x + 1 to those with X = x.
• Odds ratios and hazard ratios are reported for main effects only. When the model
includes an interaction, e.g. age * treatment, no odds ratios or hazard ratios are
given for the variables involved.
You can get comparisons by naming the variable of interest and the levels of the
interacting predictor:
Logistic: ODDSRATIO age / at(treatment=all);
will give separate odds ratios for age in each treatment group
PHreg: HAZARDRATIO age / at(treatment=all);
will give separate hazard ratios for age in each treatment group
• Both procedures will do automatic stepwise model selection (SELECTION= option
of the MODEL statement).
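The interaction case can be sketched as follows. The dataset trial and the variables time, status, event, age, and treatment are hypothetical, chosen only to match the age * treatment example; the AT option fixes the levels of the interacting variable at which each ratio is computed.

```sas
/* Hypothetical data: trial, with survival time, status (0 = censored),
   binary outcome event, continuous age, and class variable treatment. */
proc phreg data=trial;
  class treatment;
  model time*status(0) = age treatment age*treatment / ties=efron;
  /* hazard ratio for a 1-unit increase in age, one per treatment level */
  hazardratio 'Age within treatment' age / at(treatment=all);
  /* hazard ratios comparing treatment levels at chosen ages */
  hazardratio 'Treatment at ages 50 and 60' treatment / at(age=50 60);
run;

proc logistic data=trial;
  class treatment;
  model event(event='1') = age treatment age*treatment;
  /* odds ratio for a 1-unit increase in age, one per treatment level */
  oddsratio 'Age within treatment' age / at(treatment=all);
run;
```

Naming the levels explicitly in AT avoids relying on each procedure's default choice of levels for the interacting variable.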
Plots from PHreg
PHreg will produce plots of estimated survival function for specified values of
covariates. Useful when there are several important predictors, and we want to
show their effects on survival function.
1. Create a new dataset with one observation for each set of covariate values, with a label
2. Turn on ODS graphics
3. Request plots in the Proc PHreg statement, and name the specifications dataset in
the COVARIATES= option of the BASELINE statement
Breast-cancer example: Kaplan-Meier estimates of survivor curves for two groups:
staining test of biopsy tissue positive or negative.
To get estimates of survivor curves for two groups from Proc PHreg, create new
dataset with one observation for each set of covariate values, with a label.
In this simple example, we specify only the stain groups:
data specifications;
  length label $10;              /* set the length before INPUT reads label */
  input positive_stain label $;
  cards;
1 positive
0 negative
;
The labels will be used in the plot legend.
Then request plots in the Proc PHreg statement, and call the specifications dataset
in the BASELINE statement.
ODS graphics on;
proc PHreg data=breast_cancer plots(overlay)=(survival cumhaz);
class positive_stain;
model surv_years * died(0)= positive_stain
/ risklimits ties=efron;
baseline covariates=specifications / rowid = label;
run;
ODS graphics off;
overlay: draw the curves for all covariate sets on the same plot
cumhaz: cumulative hazard (hazard accumulated from t = 0)
Is this the same as the Kaplan-Meier plot from Proc Lifetest?
The estimated survivor function that Proc PHreg plots is the common baseline survivor
function, applying the specified hazard ratio:
$$\left\{\hat{S}_0(t)\right\}^{\exp(\hat{\beta} x)}$$
where the baseline survivor function is derived from a smoothed cumulative
baseline hazard
$$\hat{S}_0(t) = \exp\left[-\int_0^t h_0(u)\,du\right]$$
The curves have events at the same times because they are based on the common
baseline survivor function, $\hat{S}_0$.
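To see what raising the baseline survivor function to the power of the hazard ratio does, here is a small numerical sketch. The baseline survival values are made up for illustration, the dataset name surv_check is hypothetical, and it is assumed that the baseline corresponds to the negative-stain group, so that the hazard ratio 2.483 from the earlier output applies to the positive-stain group.

```sas
data surv_check;
  hazard_ratio = 2.483;                  /* estimated hazard ratio from the earlier output */
  do s0 = 0.95, 0.90, 0.75, 0.50;        /* illustrative baseline survival values */
    s_positive = s0**hazard_ratio;       /* baseline survival raised to the hazard ratio */
    cumhaz0 = -log(s0);                  /* baseline cumulative hazard */
    cumhaz1 = -log(s_positive);          /* equals hazard_ratio * cumhaz0 */
    output;
  end;
run;

proc print data=surv_check noobs;
run;
```

Every row has cumhaz1 / cumhaz0 equal to the hazard ratio, and because both curves are transformations of the same baseline values, they change at the same times.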
Hazard is the time-specific event rate. Usually plotted as cumulative hazard,
because this is smoother:
Graphs to check whether hazards are really proportional
Proportional hazards assumption: ratio of hazards is constant and does not
depend on time:
$$\frac{h_A(t)}{h_B(t)} = r.$$
When this assumption fails, it is because the hazard ratio changes over time.
Connection to survivor function:
If $h_A(t) = r\,h_B(t)$ then $S_A(t) = \{S_B(t)\}^{r}$.
Depending on whether r > 1 or r < 1, $S_A(t)$ must always be below or above $S_B(t)$,
respectively, because $0 < S_B(t) < 1$.
Either way, $S_A(t)$ and $S_B(t)$ cannot cross.
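The connection follows in two steps from the definition of the cumulative hazard. A short derivation, writing $H(t)$ for the cumulative hazard and using $S(t) = \exp\{-H(t)\}$:

```latex
H_A(t) = \int_0^t h_A(u)\,du = r \int_0^t h_B(u)\,du = r\,H_B(t),
\qquad
S_A(t) = \exp\{-H_A(t)\} = \exp\{-r\,H_B(t)\}
       = \left[\exp\{-H_B(t)\}\right]^{r} = \{S_B(t)\}^{r}.
```

For example, with $S_B(t) = 0.8$ and $r = 2$, $S_A(t) = 0.8^2 = 0.64$.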
Proc Lifetest makes 3 graphs that provide visual checks of this assumption:
Proc Lifetest data=breast_cancer
     plots=(SURVIVAL LOGSURV LOGLOGS);
  time surv_months * died(0);
  strata positive_stain;
run;
As we have seen, SURVIVAL plots the estimated survivor functions.
If they cross, then the hazard ratio changes over time.
[Figure: SURVIVAL plots, two panels: Stomach Cancer, Breast Cancer]
LOGSURV plots the cumulative hazard function(s) $H(t) = -\log S(t)$
If hazards are proportional, then the larger cumulative hazard should be a multiple of
the smaller: $H_A(t) = r\,H_B(t)$.
Breast cancer example:
Stomach cancer example, where survivor curves crossed:
LOGLOGS gives a plot of log cumulative hazard $\log H(t) = \log\{-\log S(t)\}$
If hazards are proportional, then the LOGLOGS plot will show parallel curves:
$\log H_A(t) = c + \log H_B(t)$.
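The constant $c$ is the log hazard ratio. Taking logs of the proportionality relation for the cumulative hazards gives:

```latex
H_A(t) = r\,H_B(t)
\quad\Longrightarrow\quad
\log H_A(t) = \log r + \log H_B(t),
\qquad\text{so}\quad c = \log r .
```

Under a proportional hazards model with $r = \exp(\beta)$, the vertical distance between the two curves is therefore $\beta$ at every time. As a rough guide only, since the LOGLOGS plot is built from the Kaplan-Meier estimates rather than from the fitted model, the breast cancer estimate $\hat{\beta} = 0.909$ suggests a gap of about 0.9 on the log scale.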
Breast cancer example:
LOGLOGS for the stomach cancer example, where survivor curves crossed:
Testing the proportional hazards assumption
The proportional hazards assumption is that the ratio of hazards is a constant that
does not depend on time:
$$\frac{h_A(t)}{h_B(t)} = r.$$
When this assumption fails, it is because the hazard ratio changes over time.
To test this, add predictor for group*time interaction.
Evidence that group*time interaction is not zero is evidence against proportional
hazards.
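Why the interaction coefficient carries the test can be seen by writing the model out. A sketch, assuming a 0/1 group indicator $g$ and an interaction that is linear in time; the symbols $\beta_1$ and $\beta_2$ are introduced here for the group and group*time coefficients:

```latex
h(t \mid g) = h_0(t)\,\exp(\beta_1 g + \beta_2\, g\, t), \qquad g \in \{0, 1\},
\\[4pt]
\frac{h(t \mid g = 1)}{h(t \mid g = 0)} = \exp(\beta_1 + \beta_2 t).
```

The hazard ratio is constant in $t$ exactly when $\beta_2 = 0$, so testing $\beta_2 = 0$ tests proportional hazards against a hazard ratio that drifts log-linearly with time. Other forms of departure may not be picked up by this particular interaction.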
Breast cancer example: groups are positive_stain = 0, 1,
response time = surv_months.
group*time interaction:
positive_stain * surv_months?
Interaction combines response and predictor!
Predictors that change with time are defined inside PHreg, not in a DATA step,
because their values must be recomputed at each event time.
Proc PHreg data=breast_cancer;
  class positive_stain;
  model surv_months * died(0) = positive_stain PS_time
        / risklimits ties=efron;
  PS_time = positive_stain * surv_months;   /* group*time interaction */
run;
Breast cancer example:
Parameter          DF   Parameter    Standard   Chi-Square   Pr > ChiSq
                        Estimate     Error
positive_stain 0    1   -1.88112     0.98093    3.6775       0.0551
PS_time             1   -0.01371     0.01070    1.6412       0.2002
Null hypothesis of the test for interaction: hazards are proportional.
The PS_time coefficient is not significantly different from zero (p = 0.2002):
no evidence against proportional hazards.
[Figure: two panels: Stomach Cancer, Breast Cancer]
In the stomach cancer example, time is measured in years:
proc PHreg data=pubh.stomach_cancer;
  class group;
  model years*censor(1) = group group_time / risklimits ties=efron;
  group_time = group * years;
run;

Variable      DF   Parameter    Standard   Chi-Square   Pr > ChiSq
                   Estimate     Error
group          1   -1.11806     0.39591    7.9752       0.0047
group_time     1    0.78008     0.27731    7.9129       0.0049
The interaction is highly significant (p = 0.0049), which is strong evidence against
proportional hazards.
Time-varying predictor: Alport mice example
Compare survival of two groups of mice (“offspring 129” and “French”) with a
genetic kidney disease.
Kidney function measured from urine at 2, 4, and 6 months after birth.
All mice that survived to 6 months were censored.
We need a predictor whose value changes at 2, 4, and 6 months, like the time-varying
predictor in the test for non-proportional hazards.
Proc PHreg data=Alport;
  class group;
  model surv_days*early_death(0) = pr_cr group
        / risklimits ties=efron;
  /* pr_cr takes the 2-, 4-, or 6-month measurement depending on the time */
  if (surv_days < 60) then pr_cr = log_pr2;
  else if (surv_days < 120) then pr_cr = log_pr4;
  else pr_cr = log_pr6;
run;
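Inside Proc PHreg, programming statements are re-evaluated each time a risk set is formed, with the time variable taking the value of the current event time; that is what lets pr_cr switch between the three measurements for the same mouse. The switching rule can be checked on its own in a DATA step. This is only a sketch: the dataset name pr_cr_lookup, the variable t standing in for the event time, and the three measurement values are all made up for illustration.

```sas
data pr_cr_lookup;
  /* made-up measurements for one hypothetical mouse */
  log_pr2 = 1.2;
  log_pr4 = 1.8;
  log_pr6 = 2.5;
  do t = 30, 59, 60, 119, 120, 150;      /* candidate event times, in days */
    if t < 60 then pr_cr = log_pr2;
    else if t < 120 then pr_cr = log_pr4;
    else pr_cr = log_pr6;
    output;
  end;
run;

proc print data=pr_cr_lookup noobs;
  var t pr_cr;
run;
```

The listing shows the value changing exactly at days 60 and 120, which are the cut points used in the Proc PHreg step.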
Analysis of Maximum Likelihood Estimates

Parameter       DF   Parameter    Standard   Chi-Square   Pr > ChiSq
                     Estimate     Error
pr_cr            1   -0.30267     0.31149    0.9442       0.3312
group French     1   -1.98379     0.64142    9.5653       0.0020

Parameter       Hazard    95% Hazard Ratio       Label
                Ratio     Confidence Limits
pr_cr           0.739     0.401     1.360
group French    0.138     0.039     0.484        group French
Is pr_cr (urine protein, a measure of kidney function) associated with survival time?