The Baseline Hazard and the Proportional Hazards Model

• We’ll now look at the relationship between a survival
variable Y and an explanatory variable X; e.g., Y
could be remission time in a leukemia study and X
could be white blood cell count. X is sometimes
called the covariate or the regressor variable.
• Often there is more than one X variable, so we write X^T = (X1, …, Xp) when there are p explanatory variables (T = transpose). We write Yx for the response Y when X = x.
• Def 8.1: Let Yx denote the response depending on an observed vector X = x. A proportional hazards model for Yx is hx(y) = h0(y)g1(x), where g1 is a positive function of x and h0(y) is called the baseline hazard; it represents the hazard function for an individual having g1(x) = 1. Often g1(x) = exp(β1x1 + … + βpxp).
• Note how the “proportional” enters the picture (see p.
144 for definitions):
\[
\frac{h_{X_1}(y)}{h_{X_2}(y)} = \frac{h_0(y)\,g_1(X_1)}{h_0(y)\,g_1(X_2)} = \frac{g_1(X_1)}{g_1(X_2)}
\]
• The two hazards are for two different individuals,
distinguished by the values the explanatory variables
take on for them…note that the “baseline” hazard
cancels out
• In the simplest case, we work with the situation where the g1 function is exp(β1x1 + … + βpxp) - it satisfies the properties g1(x) ≥ 0 and g1(0) = 1, and the baseline hazard occurs when x = 0. The process of fitting this model follows the usual process of finding the best estimates of the beta values…
• Then the standard proportional hazards model in Def. 8.1 becomes:
hx(y) = h0(y) exp(β1x1 + … + βpxp)
• Then the baseline hazard corresponds to x=0 (all covariates=0)
• We’ll then estimate the betas using the given responses
and covariates…
NOTE: The hazard on the left equals the product of two
functions: the baseline hazard (which doesn’t involve the
covariates) and the other factor (which doesn’t involve the
survival time y).
• This is called the Cox proportional hazards model, and good estimates of the betas and of the hazard and survival curves can be obtained in a wide variety of situations; i.e., this model is very robust. It is called semiparametric since we don’t have to assume a particular model for the survival function.
• Let’s look at Example 8.1, where there is only one covariate, namely “group” (usually control and experimental are the only two values). The proportional hazard (or the hazard ratio) is
hX 1 (y) h0 (y)exp(  (1))

 exp(   0)  exp(  )
hX  0 (y) h0 (y)exp(  (0))
• So, if we could get an estimate of β (call it β-hat), we could then have an estimate of the hazard ratio between two individuals in the two groups; i.e., exp(β-hat), so we could say that
\[
h_{X=1}(y) \approx \exp(\hat\beta)\, h_{X=0}(y)
\]

• Note on page 145 in (8.3) that the proportional
hazards model has a so-called “power” effect on the
baseline survival function:
exp(  1 x1 ...   p x p )
Sx (y)  S0 (y)
y
• Here
S0 (y)  exp(  h0 (u)du)
0

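This power relationship is a one-line consequence of writing the survival function in terms of the cumulative hazard (a short derivation, not spelled out in the notes):
\[
S_x(y) = \exp\!\left(-\int_0^y h_x(u)\,du\right)
       = \exp\!\left(-e^{\beta_1 x_1 + \cdots + \beta_p x_p}\int_0^y h_0(u)\,du\right)
       = S_0(y)^{\exp(\beta_1 x_1 + \cdots + \beta_p x_p)}.
\]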
• Example 8.1 shows the effect of a single covariate
X=group:
\[
S_1(y) = S_0(y)^{\exp(\beta)}
\]
• Notice also that the ratio of two hazards cancels out
the baseline hazard and leaves a function that is
constant over time.

• SAS has a procedure that easily estimates the betas
in the proportional hazards model - for example, in
the remission times data:
proc phreg;
  model remtime*censor(0)=grp;
run;
/* or if we put a second covariate in */
proc phreg;
  model remtime*censor(0)=grp logWBC;
run;
/* note the use of the numeric variable grp,
   defined as grp=1 if group="pl" and 0 otherwise */
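A minimal sketch of how that numeric grp variable could be created from a character group variable; the input dataset name remission_raw here is an assumption, not part of the original program:
data remission;
  set remission_raw;      /* assumed raw dataset containing a character variable group */
  grp = (group = "pl");   /* 1 if group="pl" (placebo), 0 otherwise */
run;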
• Now let’s consider the remission data example in
more detail…get the SAS output for the 3 models:
– grp only (model 1)
– grp and logWBC (model 2)
– grp, logWBC, and interaction term grp*logWBC (model 3)
• For each model, we’ll do three things:
– do a statistical test of the null hypothesis beta=0
– get an estimate of the hazard ratio for each beta
– get a 95% confidence interval for the hazard ratio
• There are two statistics we can compute to do a
significance test of the betas:
– the Wald statistic is the estimate (beta-hat) divided by the standard error of the estimate. This statistic is approximately standard normal and the p-value is obtained from the normal table.
– the second statistic is the so-called likelihood ratio (LR)
statistic and is used to compare the models
• Use these statistics to compare model 3 with model 2; i.e., is the interaction term significant?
– the Wald statistic is −.342/.520 = −.66. The null hypothesis being tested is that beta = 0 (for the coefficient of the interaction term). Use the normal table to see that the two-sided p-value is 2·P(Z ≤ −.66) ≈ 2(.2554) = .5108.
– the LR statistic is computed as the difference between the −2 log L values of the two models: (−2 log L for model 2) − (−2 log L for model 3) = 144.559 − 144.131 = .428. Now treat this as chi-square with 1 d.f. (one parameter difference between the two models) under the null hypothesis that the interaction term has coefficient zero, and we have P(chisq(1) > .428) = .513.
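As a quick check of both p-values, SAS's PROBNORM and PROBCHI functions reproduce the table look-ups (a throwaway sketch; the numbers are taken from the calculations above):
data _null_;
  p_wald = 2*probnorm(-0.342/0.520);          /* two-sided normal p-value for the Wald test */
  p_lr   = 1 - probchi(144.559 - 144.131, 1); /* P(chisq(1) > .428) for the LR test */
  put p_wald= p_lr=;
run;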
• Notice that in each of the three printouts there is a section giving the values of three test statistics testing the so-called “Global Null Hypothesis: BETA=0”. In this case, BETA=0 refers to the vector of all the betas: β1 = β2 = … = βp = 0. The likelihood ratio chi-square statistic is obtained by subtracting the two −2 LOG(L) statistics (the one without covariates {no x’s} minus the one with covariates). If the null hypothesis is true, then this chi-square will have d.f. equal to the number of covariates in the model.
• This same difference in log-likelihoods can be used to compare any two (nested) models - the statistic is chi-square with d.f. equal to the difference in the number of covariates, assuming the null hypothesis that the “extra” betas = 0 is true.
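In symbols (a restatement of the rule above), if the reduced model omits k of the covariates in the full model, then under the null hypothesis that those k betas are zero,
\[
\bigl(-2\log L_{\text{reduced}}\bigr) - \bigl(-2\log L_{\text{full}}\bigr) \;\approx\; \chi^2_{k}.
\]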
• Now let’s look at the HRs in each of the three models…
• In model 1, the HR is estimated to be 4.523 (from SAS).
Let’s see how this is done… we’ve seen that
hX 1 (y) h0 (y)exp(  (1))

 exp(   0)  exp(  )
hX  0 (y) h0 (y)exp(  (0))

so if X=1 is the placebo group, then the maximum likelihood estimate of beta is 1.50919 (from SAS), so exp(1.50919) = 4.523066 is the estimated hazard ratio. This means that the hazard for an individual in the placebo group is more than 4.5 times the hazard for an individual in the treatment group (at all times), ignoring logWBC.
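The hazard ratio SAS reports is just this exponentiated coefficient, which is easy to verify directly (a throwaway check, not part of the original program):
data _null_;
  hr = exp(1.50919);   /* exponentiate the model 1 coefficient for grp */
  put hr=;             /* prints approximately 4.523 */
run;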
• Consider Model 2’s hazard ratios…
hX 1& logW BC (y) h0 (y)exp(1.29405(1)  1.60432log WBC)

 3.647529
hX  0& logW BC (y) h0 (y)exp(1.29405(0)  1.60432log WBC)
and
\[
\frac{h_{X=1,\ \log WBC+1}(y)}{h_{X=1,\ \log WBC}(y)}
= \frac{h_0(y)\exp(1.29405(1) + 1.60432(\log WBC + 1))}{h_0(y)\exp(1.29405(1) + 1.60432\log WBC)}
= \exp(1.60432) = 4.974476
\]
If we had a significant interaction term, the estimated HR could be
\[
\frac{h_{X=1,\ \log WBC}(y)}{h_{X=0,\ \log WBC}(y)}
= \frac{h_0(y)\exp(2.35494(1) + 1.80279\log WBC - 0.34220(1)(\log WBC))}{h_0(y)\exp(2.35494(0) + 1.80279\log WBC - 0.34220(0)(\log WBC))}
= \exp(2.35494 - 0.34220\log WBC),
\]
which now depends on the value of log WBC.
• To get confidence intervals around the estimated HRs, we compute beta-hat +/- 1.96 * SE(beta-hat) to get confidence intervals for the betas - then exponentiate the endpoints of the interval to get CIs for the HRs.
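For example, using the interaction coefficient from model 3 quoted earlier (estimate −.342, standard error .520) purely to illustrate the arithmetic:
\[
\hat\beta \pm 1.96\,SE(\hat\beta) = -0.342 \pm 1.96(0.520) = (-1.361,\ 0.677),
\quad\text{so the 95\% CI for the HR is } (e^{-1.361},\ e^{0.677}) \approx (0.26,\ 1.97).
\]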
• To get the adjusted survival curves for the two groups (adjusted for the covariates, i.e., using model 2), we use the baseline statement in proc phreg:
proc phreg;
  model remtime*censor(0)=grp logWBC;
  title "Model 2";
  baseline out=a survival=s upper=ucl lower=lcl;
  /* with no covariates= dataset, the curve is evaluated at the covariate means */
run;
proc print data=a; run; quit;
• To get the adjusted survival curves for specific values of the covariates, first create a dataset with the values you want to consider and then use the covariates= option as follows:
…
data b; grp=1; logWBC=2.93; run;
…
proc phreg data=remission;
  model remtime*censor(0)=grp logWBC;
  baseline out=a survival=s upper=ucl lower=lcl covariates=b / nomean;
run;
proc print data=a; run; quit;
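If you want a quick plot of the adjusted survival curve stored in the output dataset, one possibility (a sketch assuming ODS graphics is available; not part of the original notes) is:
proc sgplot data=a;
  step x=remtime y=s;   /* step-function plot of the adjusted survival estimates */
run;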