Intro to Survival Analysis

HSRP 734:
Advanced Statistical Methods
July 3, 2008
Objectives



- Describe the situations under which this method would be useful
- Describe censored data
- Describe the survivor function, the hazard function, and their relationship
What is Survival Analysis?

Survival analysis is a collection of statistical
procedures for data analysis for which the
outcome variable of interest is time to an event.
What do we mean by Time?


- Length of follow-up until the event of interest occurs
- Follow-up can start at (for example)
  1. Registration into a clinical trial
  2. Time of employment
- Age of the individual at the time of the event
What do we mean by Event?






- Usually we mean death, thus the name “survival” analysis
- Relapse
- Disease incidence
- Can also be a positive event
  - Discharge from psychiatric counseling
  - Normalization of WBC count
Why not just use what we already know?

Means, t-tests, and linear regression

Counts and chi-sq tests, and logistic regression
Censored Data

In survival analysis, an observation consists of two random components:
- An observed variable representing time (e.g., actual time until death, or time until last follow-up)
- A Bernoulli random variable (0, 1) indicating whether the observation is censored: 1 if we observed a failure, 0 if the observation is censored
Censored Data

Data are censored when we have only incomplete information about the exact survival time due to a random factor
- Non-informative censoring: whether an observation is censored or not is independent of the value of the observation
- Informative censoring: whether an observation is censored or not depends on the value of the observation
- We will require non-informative censoring mechanisms. If censoring is informative, then these methods will generate biased results.
Types of censoring



- Right censoring: true survival time is greater than what we observed
- Left censoring: true survival time is less than what we observed
- Interval censoring: subjects are not observed continuously and we only know the event happened between time A and time B (e.g., annual testing of the partner of an HIV+ individual)
Three common reasons for right censoring

- Person does not experience the event before the study ends
- Person is lost to follow-up during the study period
- Person withdraws from the study because of death (if death is not the outcome of interest) or some other reason (e.g., adverse drug reaction)
What does the data look like?
[Figure: follow-up timelines for subjects 1 to 5 on a time axis from 0 to 20, with "drop out" and "end of study" marked]
What does the data look like?
ID   Time   Status
1    20     0
2    14     1
3    17     0
4    13     0
5     5     1
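As a minimal sketch of how these records might be held in code, the Python below stores each subject as a (time, status) pair, with status 1 for an observed event and 0 for a right-censored time, as defined on the Censored Data slide. The `Observation` class and its field names are illustrative assumptions, not part of the course materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    subject_id: int
    time: float  # follow-up time on the study clock
    status: int  # 1 = event observed, 0 = right-censored


# The five subjects from the table above.
observations = [
    Observation(1, 20, 0),
    Observation(2, 14, 1),
    Observation(3, 17, 0),
    Observation(4, 13, 0),
    Observation(5, 5, 1),
]

n_events = sum(o.status for o in observations)
n_censored = len(observations) - n_events
total_follow_up = sum(o.time for o in observations)

for o in observations:
    # For a censored subject we only know that the true time exceeds o.time.
    relation = "T =" if o.status == 1 else "T >"
    print(f"subject {o.subject_id}: {relation} {o.time}")
```

Keeping the status flag next to the time is what lets later calculations treat a censored 20 as "at least 20" rather than as an event at 20.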
Survival Distribution

The probability distribution of the survival times can be described in five different, equivalent ways:
1. probability density function
2. cumulative distribution function
3. survival function = 1 - cumulative distribution function
4. hazard function
5. cumulative hazard function
Survival Distribution




- Distribution of times to event, called “survival times,” even when the “event” is not “death”
- Let T = survival time (T ≥ 0) and t = specified value for T
- Survival times follow a continuous distribution with times ranging from zero to infinity
- Ordinary methods for estimating and comparing continuous distributions cannot be used with survival data due to the presence of censoring
Probability Density Function f(t)

$$f(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P[t \le T < t + \Delta t]$$

- Difficult to estimate density directly because of censoring: the histogram is not a direct estimate of f(t)
Cumulative Distribution Function F(t)

$$F(t) = P[T \le t] = \int_0^t f(s)\,ds$$
Survival Function S(t)

$$S(t) = \Pr[T > t] = \int_t^{\infty} f(u)\,du = 1 - \int_0^t f(u)\,du = 1 - F(t)$$

- Monotone nonincreasing function
- S(0) = 1
- S(+∞) = 0
Hazard Function λ(t)

Instantaneous death rate at time t, given alive at time t

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{\text{Prob. event in } (t,\, t + \Delta t) \text{ given survived to } t}{\Delta t}$$

- Other names for hazard function include: Force of Mortality, Incidence Function
- Rates: probability per unit time (sec^-1, years^-1)
Hazard Function λ(t)

- So, you survived to time t; what is the probability that the event occurs in the next increment of time Δt?
- Now standardize this conditional probability to a per-unit-of-time basis by dividing by Δt.
- As the increment Δt gets very small (goes to 0), this standardized conditional probability becomes an instantaneous rate.
- Some simple features of λ(t), also written h(t):
  - λ(t) takes on values in the interval [0, ∞)
  - λ(t) could be instantaneously increasing, decreasing, or constant
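The limiting argument can be illustrated numerically. The sketch below assumes a Weibull distribution with unit scale and a hypothetical shape of 1.5 (chosen only for illustration; it does not come from the slides) and compares the standardized conditional probability with the closed-form hazard as the time increment shrinks.

```python
import math

SHAPE = 1.5  # hypothetical Weibull shape parameter, for illustration only


def survival(t):
    """S(t) for a Weibull distribution with unit scale: exp(-t**SHAPE)."""
    return math.exp(-(t ** SHAPE))


def hazard(t):
    """Closed-form hazard for the same distribution: SHAPE * t**(SHAPE - 1)."""
    return SHAPE * t ** (SHAPE - 1)


def conditional_rate(t, dt):
    """Pr(t <= T < t + dt | T >= t) / dt, computed from the survival function."""
    prob_event = (survival(t) - survival(t + dt)) / survival(t)
    return prob_event / dt


t = 2.0
for dt in (1.0, 0.1, 0.01, 0.001):
    print(f"dt={dt:<6} rate={conditional_rate(t, dt):.4f}  hazard={hazard(t):.4f}")
```

As the increment gets smaller, the printed rate should move toward the closed-form hazard. The rate can also exceed 1, which is why the hazard is a rate and not a probability.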
Cumulative Hazard Function Λ(t)

$$\Lambda(t) = \int_0^t \lambda(s)\,ds = -\ln S(t)$$
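The identity relating the cumulative hazard to the survival function follows from the earlier definitions in a few steps. A short sketch, assuming T is continuous with density f(t), so that Pr(T ≥ t) = S(t) and f(t) = -dS(t)/dt:

```latex
\begin{align*}
\lambda(t)
  &= \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t)}{\Delta t \, \Pr(T \ge t)}
   = \frac{f(t)}{S(t)} \\
  &= \frac{-\,dS(t)/dt}{S(t)}
   = -\frac{d}{dt} \ln S(t) \\
\Lambda(t)
  &= \int_0^t \lambda(s)\,ds
   = -\bigl[\ln S(t) - \ln S(0)\bigr]
   = -\ln S(t) \qquad \text{since } S(0) = 1
\end{align*}
```

Equivalently, S(t) = exp(-Λ(t)), which is the form most often used to move from a hazard model back to survival probabilities.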
Survival Distribution




- Any one of these five functions is enough to specify the survival distribution; there is an equivalence relationship between them.
- The most important models for survival analysis are models for the hazard rate λ(t)
- When λ(t) is high, S(t) decreases faster.
- When λ(t) is low, S(t) decreases more slowly.
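The link between the hazard and the survival function can be checked numerically. The sketch below assumes two hypothetical constant hazards (0.1 and 0.5 per unit time, chosen only for illustration) and recovers S(t) and f(t) from the hazard alone, using S(t) = exp(-Λ(t)) with Λ(t) obtained by trapezoidal integration. The function names are not from the slides.

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Trapezoidal approximation of the integral of hazard(s) over [0, t]."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


def survival_from_hazard(hazard, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def density_from_hazard(hazard, t):
    """f(t) = lambda(t) * S(t)."""
    return hazard(t) * survival_from_hazard(hazard, t)


def low_hazard(t):
    return 0.1  # hypothetical constant rate


def high_hazard(t):
    return 0.5  # hypothetical constant rate


for t in (1.0, 5.0, 10.0):
    s_low = survival_from_hazard(low_hazard, t)
    s_high = survival_from_hazard(high_hazard, t)
    print(f"t={t:>4}: S_low={s_low:.3f}  S_high={s_high:.3f}")
```

At every time point the high-hazard survival curve sits below the low-hazard one, matching the last two bullets above.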
Example
- Consider a clinical trial in patients with acute myelogenous leukemia (AML) comparing two groups of patients: no maintenance treatment with chemotherapy (X=0) vs. maintenance chemotherapy treatment (X=1)
- Demographic and other clinical variables present
- Time to relapse is of interest
- Event = relapse
Example
Group                         Weeks in remission (i.e., time to relapse)
Maintenance chemo (X=1)       9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
No maintenance chemo (X=0)    5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45

+ indicates a censored time to relapse; e.g., 13+ = more than 13 weeks to relapse
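The "+" notation maps directly onto the (time, status) representation used for censored data. A minimal sketch, assuming a hypothetical helper named `parse_times` that turns the strings in the table into weeks and a status flag (1 = relapse observed, 0 = censored):

```python
def parse_times(text):
    """Split a comma-separated list such as '9, 13, 13+' into weeks and status."""
    weeks, status = [], []
    for token in text.split(","):
        token = token.strip()
        censored = token.endswith("+")
        weeks.append(int(token.rstrip("+")))
        status.append(0 if censored else 1)  # 1 = relapse observed
    return weeks, status


maintained_weeks, maintained_status = parse_times(
    "9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+"
)
control_weeks, control_status = parse_times(
    "5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45"
)

print("maintained:", len(maintained_weeks), "patients,", sum(maintained_status), "relapses")
print("not maintained:", len(control_weeks), "patients,", sum(control_status), "relapses")
```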
Grouped data/life table analysis
- Divide the time period into intervals appropriate for the data
  - use more intervals in periods of changing incidence
- For each person, tally time spent at risk (person-time, e.g., person-years) in each interval
- Tally the events in each interval

Grouped data/life table analysis

Estimate the incidence rates (hazard rates) as the ratio of the number of events to the total time at risk in an interval:

$$\hat{\lambda} = \frac{\text{\# of events}}{\text{person-time}}$$
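The tallying steps can be sketched in Python using the AML remission times from the example. The 5-week intervals and the open-ended final interval follow the example; counting an event that falls exactly on a boundary in the interval that ends there is an assumption inferred from the example's counts. The names `tally` and `hazard_rate` are illustrative.

```python
import math

# AML example as (weeks, status) pairs; status 1 = relapse, 0 = censored.
maintained = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
              (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]
not_maintained = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
                  (23, 1), (27, 1), (30, 0), (33, 1), (43, 1), (45, 1)]

# Interval boundaries in weeks; the last interval is open-ended ("50+").
cuts = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, math.inf]


def tally(data, cuts):
    """Return (start, end, events, person_time) for each interval."""
    rows = []
    for start, end in zip(cuts, cuts[1:]):
        # Time each person spends at risk inside (start, end].
        person_time = sum(max(0, min(t, end) - start) for t, _ in data)
        # Assumption: an event exactly at `end` is counted in this interval.
        events = sum(1 for t, s in data if s == 1 and start < t <= end)
        rows.append((start, end, events, person_time))
    return rows


def hazard_rate(events, person_time):
    """Events per unit of person-time; None when nobody is at risk."""
    return events / person_time if person_time > 0 else None


maintained_rows = tally(maintained, cuts)
not_maintained_rows = tally(not_maintained, cuts)

for start, end, events, person_time in maintained_rows:
    print(start, end, events, person_time, hazard_rate(events, person_time))
```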
Example cont.
            Maintained on chemo            Not maintained on chemo
Interval    Events   Person-time (weeks)   Events   Person-time (weeks)
0-5         0        55                    2        60
5-10        1        54                    2        46
10-15       1        46                    1        37
15-20       1        38                    0        31
20-25       1        33                    1        28
25-30       0        28                    1        22
30-35       2        20                    1        13
35-40       0        15                    0        10
40-45       0        15                    2        8
45-50       1        8                     -        -
50+         0        111                   -        -
Survival function


- The “Survival Function” is defined as S(t) = Pr(Survived beyond time t)
- For example, suppose t = end of follow-up time bin 3

S(t) = Pr(Survived > t)
     = Pr(survived through bin 1 and survived through bin 2 and survived through bin 3)
     = Pr(survived bin 1) x Pr(survived bin 2 given survived bin 1) x Pr(survived bin 3 given survived bin 1 and bin 2)
Survival function


- Calculate probabilities of surviving through bin j of follow-up time by finding the complement of the probability of dying in bin j:
  Pr(Survived bin j) = 1 - Pr(died in bin j)
- Pr(“Die” in bin j) is approximated by

$$P_j = \frac{\text{\# events in bin } j}{\text{average number of people at risk in bin } j} = \frac{y_j}{N_j / L_j} = \frac{y_j L_j}{N_j}$$

where
yj = # of events in bin j
Nj = time at risk (person-time) in bin j
Lj = length of bin j (must be small for the approximation to work well)
Survival function

Then, use Pj, the probabilities of dying in bin j, to estimate the survival function S(t):

$$\hat{S}(t) = \prod_{j=1}^{t} \left[1 - \Pr(\text{Die in } j)\right] = \prod_{j=1}^{t} \left(1 - P_j\right) = \prod_{j=1}^{t} \left(1 - \frac{y_j L_j}{N_j}\right)$$

The calculations needed for Ŝ(t), the estimated survival function, are usually organized into a “life table”, as follows:
Sj = Pr(Survived beyond the end of bin j)
S0 = 1
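The product formula can be sketched as a running multiplication over bins. The Python below uses the bin lengths, person-time, and event counts from the AML example. Setting a negative 1 - Pj to zero follows the work-around the slides give for bins that are too wide, and the function name `life_table` is illustrative.

```python
def life_table(lengths, person_time, events):
    """Return (j, 1 - P_j, S_j) for each bin, starting from S_0 = 1."""
    rows = []
    surv = 1.0  # S_0 = 1
    for j, (L, N, y) in enumerate(zip(lengths, person_time, events), start=1):
        p_die = y * L / N                  # P_j = y_j * L_j / N_j
        p_survive = max(0.0, 1.0 - p_die)  # negative estimates are set to zero
        surv *= p_survive                  # S_j = S_{j-1} * (1 - P_j)
        rows.append((j, p_survive, surv))
    return rows


# Maintained on chemo: ten 5-week bins plus the final 111-week bin.
maintained = life_table(
    lengths=[5] * 10 + [111],
    person_time=[55, 54, 46, 38, 33, 28, 20, 15, 15, 8, 111],
    events=[0, 1, 1, 1, 1, 0, 2, 0, 0, 1, 0],
)

# Not maintained on chemo: nine 5-week bins with anyone at risk.
not_maintained = life_table(
    lengths=[5] * 9,
    person_time=[60, 46, 37, 31, 28, 22, 13, 10, 8],
    events=[2, 2, 1, 0, 1, 1, 1, 0, 2],
)

for j, p_survive, surv in maintained:
    print(j, round(p_survive, 2), round(surv, 2))
```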
Survival function
            Maintained on chemo          Not maintained on chemo
j    Lj     Nj    yj   1-Pj   Sj         Nj   yj   1-Pj   Sj
1    5      55    0    1      1          60   2    0.83   0.83
2    5      54    1    0.91   0.91       46   2    0.78   0.65
3    5      46    1    0.89   0.81       37   1    0.86   0.56
4    5      38    1    0.87   0.70       31   0    1      0.56
5    5      33    1    0.85   0.60       28   1    0.82   0.46
6    5      28    0    1      0.60       22   1    0.77   0.36
7    5      20    2    0.5    0.30       13   1    0.62   0.22
8    5      15    0    1      0.30       10   0    1      0.22
9    5      15    0    1      0.30       8    2    0*     0
10   5      8     1    0.38   0.11       -    -    -      0
11   111    111   0    1      0.11       -    -    -      0

* 1 - Pj was negative (-0.25) and was set to zero
Survival function


- Trouble with follow-up time bins that are too wide (bin 9, not maintained on chemo):
  1 - Pj = 1 - yj Lj / Nj = 1 - (10/8) = -0.25
  Work-around: set the probability, 1 - Pj, to zero whenever the estimate is negative
- To display the estimated survivor function, plot Ŝ(t) vs. t
  - For grouped data: plot Ŝ(t) at the end of each time interval, connecting the points with line segments (not steps like Kaplan-Meier)
  - At time = 0, plot Ŝ(t) = 1
Survival function
[Figure: estimated survival function Ŝ(t) (y-axis, 0.0 to 1.0) vs. weeks (x-axis, 0 to 50) for two groups: Maintained on chemo and Not maintained on chemo]