Survival Analysis

Biostatistics*
What is survival analysis?
Time origin: admission and randomization to the study, birth…
Endpoint: occurrence of an event
1. The time to failure
— The time to the failure of a physical component (mechanical or electrical)
— The time to the death of a biological unit (patient, animal, etc.)
— Death, disease, relapse, recovery
2. The time to learning of a skill
Incomplete Data:
— Censoring: individuals do not fail during the observation period
— Truncation: individuals are not observed in the study
Censoring
▪ Right censoring
▪ Left censoring
▪ Double censoring
▪ Interval censoring
▪ Truncation
— Survival functions and Hazard function
Let T be the random variable of survival time ( T ≥ 0) with probability density
function (pdf) f(t) and distribution function (cdf) F(t).
▪ The survival function S(t)
S(t) = 1- F(t) = P( T > t ).
▫ When T is a continuous random variable
$S(t) = \int_t^{\infty} f(u)\,du \;\Rightarrow\; f(t) = -\frac{dS(t)}{dt}$
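▫ When T is a discrete random variable
$p_j = P(T = t_j)$, $j = 1, 2, \ldots$, where $t_1 < t_2 < \cdots$.
The survival function for a discrete random variable T is
$S(t) = \sum_{t_j > t} p_j$.
Thus S(t) is a nonincreasing, right-continuous step function (right-continuous because S(t) = P(T > t) excludes the mass at t itself).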
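As a small illustration of the discrete case, the sketch below evaluates S(t) as the sum of the p_j over the support points t_j > t. The function name `discrete_survival` and the support points and probabilities are hypothetical, chosen only for this example.

```python
def discrete_survival(times, probs, t):
    """S(t) = P(T > t): sum of p_j over the support points t_j > t."""
    return sum(p for tj, p in zip(times, probs) if tj > t)

# Hypothetical support points t_1 < t_2 < ... and probabilities p_j (summing to 1).
times = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

for t in [0, 1, 1.5, 2, 3, 4]:
    print(t, round(discrete_survival(times, probs, t), 10))
```

Because the inequality t_j > t is strict, the value at t = 1 already excludes the mass at t = 1, so the values printed for t = 1 and t = 1.5 agree.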
▫ Shapes of survival function (continuous and discrete)
[Figure: two panels, each plotting S(t) (vertical axis, marked 0 and 1) against t.]
*
Content references:
1 戴政、江淑瓊, 生物醫學統計概論 (Introduction to Biomedical Statistics).
2 Rosner B., “Fundamentals of Biostatistics,” 6th edition. 2006.
3 Kutner, M. H., Nachtsheim, C. J. and Neter, J., 2006, “Applied Linear Regression Model 5th”.
▪ The hazard function λ(t)
$\lambda(t) = \lim_{dt \to 0} \frac{1}{dt} P(t < T < t + dt \mid T > t)$, $\lambda(t) \ge 0$.
▫ When T is a continuous random variable, $\lambda(t) = \frac{f(t)}{S(t)}$.
▫ Integrating λ(t) gives the cumulative hazard function
$H(t) = \int_0^t \lambda(u)\,du = \int_0^t \frac{f(u)}{1 - F(u)}\,du = -\ln[1 - F(u)]\Big|_0^t = -\ln[1 - F(t)] = -\ln S(t)$,
which leads to the important expression $S(t) = e^{-H(t)}$.
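The relation S(t) = exp(-H(t)) can be checked numerically. The sketch below assumes a Weibull distribution purely as an example (the notes do not single out any distribution); the shape and scale values and all function names are hypothetical. It integrates λ(u) = f(u)/S(u) with a simple midpoint rule and compares the result with -ln S(t).

```python
import math

def weibull_pdf(t, shape, scale):
    """Density f(t) of the Weibull distribution assumed for this example."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))

def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale):
    """lambda(t) = f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)

def cumulative_hazard(t, shape, scale, steps=20000):
    """H(t): integral of lambda(u) over [0, t], approximated by the midpoint rule."""
    h = t / steps
    return sum(hazard((i + 0.5) * h, shape, scale) for i in range(steps)) * h

shape, scale, t = 1.5, 2.0, 3.0   # hypothetical parameter values
H = cumulative_hazard(t, shape, scale)
S = weibull_survival(t, shape, scale)
print(H, -math.log(S))    # the two values should nearly agree
print(math.exp(-H), S)    # S(t) = exp(-H(t))
```

The midpoint rule is used only because it avoids evaluating the hazard at u = 0; any standard quadrature would serve.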
▫ When T is a discrete random variable
$\lambda(t_j) = P(T = t_j \mid T \ge t_j) = \frac{p_j}{S(t_{j-1})}$, where $S(t_0) = 1$.
$\because\; p_j = S(t_{j-1}) - S(t_j) \;\Rightarrow\; \lambda(t_j) = 1 - \frac{S(t_j)}{S(t_{j-1})}$
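Note that S(t) can be written as the product of conditional survival probabilities, and the relationship between S(t) and λ(t) is
$S(t) = \prod_{t_j \le t} \frac{S(t_j)}{S(t_{j-1})} = \prod_{t_j \le t} [1 - \lambda(t_j)]$; $H(t) = \sum_{t_j \le t} \lambda(t_j)$.
Here $H'(t)$ denotes a second form of the cumulative hazard (not a derivative):
$H'(t) = -\sum_{t_j \le t} \ln[1 - \lambda(t_j)] \approx H(t)$ when the $\lambda(t_j)$ are small, so that $S(t) = e^{-H'(t)} \approx e^{-H(t)}$.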
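A short numerical sketch of the discrete relations, assuming hypothetical probabilities at the first three support points: it computes λ(t_j) = p_j / S(t_{j-1}), rebuilds S(t) as the product of the factors 1 - λ(t_j), and compares the sum of the hazards with the log-based form of the cumulative hazard. The function and variable names are illustrative only.

```python
import math

def discrete_hazards(probs):
    """lambda(t_j) = p_j / S(t_{j-1}), taking S(t_0) = 1."""
    hazards, surv_prev = [], 1.0
    for p in probs:
        hazards.append(p / surv_prev)
        surv_prev -= p            # S(t_j) = S(t_{j-1}) - p_j
    return hazards

# Hypothetical probabilities at t_1, t_2, t_3; the remaining mass 0.90 lies beyond t_3.
probs = [0.02, 0.03, 0.05]
hazards = discrete_hazards(probs)

surv = math.prod(1 - lam for lam in hazards)           # product form of S(t_3)
H = sum(hazards)                                       # sum of the hazards
H_alt = -sum(math.log(1 - lam) for lam in hazards)     # log-based cumulative hazard

print(surv)                              # close to 1 - 0.02 - 0.03 - 0.05 = 0.90
print(H, H_alt)                          # nearly equal because the hazards are small
print(math.exp(-H_alt), math.exp(-H))    # exact versus approximate S(t_3)
```

The log-based form reproduces S(t) exactly, while the plain sum of hazards gives a close approximation only while the individual hazards stay small.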
▫ Shapes of hazard function
▫ Relationship between Survival and Hazard Function
▪ Mean Residual Life Function and Median Life
▫ Mean Residual Life Function
▫ The median lifetime
— Nonparametric Method in Survival Analysis
▪ One-Sample Nonparametric Methods
▫ Estimation of a survival function for uncensored survival data
$\hat{S}(t) = \frac{\#\text{ of subjects with survival time} > t}{\#\text{ of subjects in the data set}}$
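For uncensored data the estimator is just a proportion, as the following sketch shows. The survival times are hypothetical and the function name `empirical_survival` is chosen for this example.

```python
def empirical_survival(times, t):
    """Proportion of subjects whose survival time exceeds t."""
    return sum(1 for x in times if x > t) / len(times)

# Hypothetical uncensored survival times for five subjects.
times = [2, 3, 3, 5, 8]

for t in [0, 2, 3, 5, 8]:
    print(t, empirical_survival(times, t))
```

With ties, as at time 3 here, the estimate drops by the number of tied subjects divided by the sample size.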
▫ Estimation of a survival function for censored survival data
When T is discrete, suppose that T can take on the values $t_j$ ($t_1 < t_2 < \cdots$).
$\lambda(t) = P\{\text{expiring in the interval } (t, t + dt) \mid \text{survived past time } t\}$
$S(t) = \prod_{t_j \le t} [1 - \lambda(t_j)]$
Ordered failure time $t_{(j)}$ | # of failures at $t_{(j)}$: $d_j$ | # of censored in the interval from $t_{(j)}$ to $t_{(j+1)}$: $c_j$ | Risk set $R(t_{(j)})$
$t_{(0)} = 0$ | $d_0 = 0$ | $c_0$ | $R(t_{(0)})$
$t_{(1)}$ | $d_1$ | $c_1$ | $R(t_{(1)})$
⋮ | ⋮ | ⋮ | ⋮
$t_{(k)}$ | $d_k$ | $c_k$ | $R(t_{(k)})$
Risk set: $R(t_{(j)})$ is the set of individuals for whom $T \ge t_{(j)}$.
Estimating methods: Life-table method; Kaplan-Meier method.
Two samples comparison: Log-rank test; Wilcoxon test.
— Life-table method: estimating the survivor function for grouped survival data
1. Partition the time axis into m intervals, $I_j = (t_j, t_{j+1}]$, $j = 0, 1, \ldots, m-1$. The length of the jth interval is $h_j = t_{j+1} - t_j$; $t_0$ is zero (the origin of the time axis) and $t_m$ is the upper limit of the observation times.
2. Calculate the number of failures, the number of censored subjects, and the number of subjects alive and uncensored in each interval.
$n'_j$: number of survivors at the beginning of the jth interval.
$c_j$: number of censored subjects in the jth interval.
$d_j$: number of failures that occurred in the jth interval.
Here $n'_0 = N$ (the sample size) and $n'_{j+1} = n'_j - c_j - d_j$ for $j = 0, 1, \ldots, m-1$.
3. Define the number of subjects at risk during $I_j$: assume that censoring times are uniformly distributed over the interval, so the average number of subjects at risk in $I_j$ is $n_j = n'_j - c_j/2$. The conditional probability of failure in $I_j$ can be estimated by $d_j/n_j$, and the conditional survival probability by $1 - d_j/n_j$.
4. S(t) can be written as the product of conditional survival probabilities. Surviving to $t_j$ requires surviving the intervals $I_0, \ldots, I_{j-1}$, so the estimator of the survival probability at time t ($t_j \le t < t_{j+1}$) is
$\hat{S}(t) = \prod_{k=0}^{j-1} (1 - d_k/n_k)$, with the empty product equal to 1 when $j = 0$.
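The four steps can be sketched in a few lines of Python. The interval counts below are hypothetical, and the function name `life_table` and its row layout are choices made for this illustration. Intervals are indexed from 0 as in step 1, so the survival value stored for interval j is the estimate at the end of that interval, that is, the product of the factors for intervals 0 through j.

```python
def life_table(n0, deaths, censored):
    """Life-table rows (n_start, n_at_risk, q, surv_end), one per interval.

    n_start   : survivors at the beginning of the interval (n'_j)
    n_at_risk : average number at risk, n'_j - c_j / 2
    q         : conditional probability of failure, d_j / n_j
    surv_end  : estimated survival probability at the end of the interval
    """
    rows, n_start, surv = [], n0, 1.0
    for d, c in zip(deaths, censored):
        n_at_risk = n_start - c / 2     # censoring assumed uniform within the interval
        q = d / n_at_risk
        surv *= 1 - q
        rows.append((n_start, n_at_risk, q, surv))
        n_start = n_start - c - d       # n'_{j+1} = n'_j - c_j - d_j
    return rows

# Hypothetical grouped data: 100 subjects followed over three intervals.
deaths = [10, 8, 6]
censored = [4, 6, 2]
rows = life_table(100, deaths, censored)
for j, row in enumerate(rows):
    print(j, row)
```

The sketch assumes every interval still has subjects at risk; an interval with n_j = 0 would need separate handling.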
— Kaplan-Meier method (Product-Limit Estimator): estimating the survivor function for ungrouped data. Suppose the failure and censoring times $\{t_1, t_2, \ldots, t_N\}$ are known for N subjects in a random sample.
1. Let $t_{(1)} < t_{(2)} < \cdots < t_{(j)} < \cdots < t_{(m)}$ be the distinct ordered failure times observed among the N subjects ($m \le N$). Partition the time axis into m+1 intervals, $I_j = (t_{(j)}, t_{(j+1)}]$, $j = 0, 1, \ldots, m$, where $t_{(0)}$ is zero (the origin of the time axis) and $t_{(m+1)}$ is the upper limit of the observation times.
2. Calculate the number of failures, the number of censored subjects, and the number of subjects alive and uncensored.
$n_j$: number of subjects at risk at $t_{(j)}$. $c_j$: number of censored subjects at $t_{(j)}$.
$d_j$: number of failures that occurred at $t_{(j)}$.
3. The conditional probability of failure in $(t_{(j)} - \Delta t, t_{(j)}]$ can be estimated by $d_j/n_j$, and the conditional survival probability by $1 - d_j/n_j$. If a censoring time and a failure time are equal, we assume that the censoring occurs right after the failure.
Interval | Number of survivors | Number of deaths | Conditional survival probability beyond the interval
$(t_{(j)} - \Delta t, t_{(j)}]$ | $n_j$ | $d_j$ | $1 - d_j/n_j$
$(t_{(j)}, t_{(j+1)} - \Delta t]$ | $n_j - d_j$ | 0 | 1
4. The estimator of the survival probability at time t, where $t_{(j)} \le t < t_{(j+1)}$, is
$\hat{S}(t) = \prod_{k=1}^{j} (1 - d_k/n_k)$.
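A minimal sketch of the product-limit calculation, assuming each subject is recorded as an observed time plus an event indicator (1 = failure, 0 = censored). That encoding, the function name `kaplan_meier`, and the data are assumptions for the example. A subject censored at a failure time is counted as still at risk at that time, matching the convention that censoring occurs right after the failure.

```python
def kaplan_meier(times, events):
    """Return rows (t_j, n_j, d_j, S_hat) at each distinct failure time.

    times  : observed time for each subject (failure or censoring time)
    events : 1 if the time is a failure, 0 if it is a censoring time
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, rows = 1.0, []
    for tj in failure_times:
        n_j = sum(1 for t in times if t >= tj)    # risk set: T >= t_(j)
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        surv *= 1 - d_j / n_j                     # multiply in the factor (1 - d_j / n_j)
        rows.append((tj, n_j, d_j, surv))
    return rows

# Hypothetical data for seven subjects.
times = [1, 2, 2, 3, 4, 4, 5]
events = [1, 1, 0, 1, 0, 1, 0]
km = kaplan_meier(times, events)
for row in km:
    print(row)
```

The estimate is constant between failure times, so the value in each row applies from that failure time up to, but not including, the next one.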
— Two-Sample Nonparametric Methods
$H_0: S_1(t) = S_2(t)$ for each $t > 0$ vs. $H_1$: not $H_0$
Data: pool the two groups and order the distinct death times
$t_{(1)} < t_{(2)} < \cdots < t_{(j)} < \cdots < t_{(m)}$
For the two-sample test at a fixed time point $t_{(j)}$, the data can be arranged in a 2×2 table:
Group | # of deaths at $t_{(j)}$ | # of survivors beyond $t_{(j)}$ | # at risk at $t_{(j)}$
1 | $d_{1j}$ | $n_{1j} - d_{1j}$ | $n_{1j}$
2 | $d_{2j}$ | $n_{2j} - d_{2j}$ | $n_{2j}$
Total | $d_j$ | $n_j - d_j$ | $n_j$
The global test combines the 2×2 tables over all failure times to test
$H_0: S_1(t) = S_2(t)$ for each $t > 0$ vs. $H_1$: not $H_0$
Methods:
— For sparse data: Fisher’s exact test
Under $H_0: S_1(t_{(j)}) = S_2(t_{(j)})$, $d_{1j}$ follows a hypergeometric distribution.
Given the fixed margins $d_j$, $n_j - d_j$, $n_{1j}$, $n_{2j}$, the conditional probability of observing $d_{1j}$ is
$\binom{d_j}{d_{1j}} \binom{n_j - d_j}{n_{1j} - d_{1j}} \Big/ \binom{n_j}{n_{1j}}$,
where $E(d_{1j}) = e_{1j} = \frac{n_{1j} d_j}{n_j}$ and $V(d_{1j}) = v_{1j} = \frac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}$.
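The hypergeometric probability and its moments can be checked on a single 2×2 table. The margins used below (10 at risk, 4 of them in group 1, 3 deaths) are hypothetical, and the function names are chosen for this sketch.

```python
from math import comb

def hypergeom_prob(d1, d, n1, n):
    """P(d_1j = d1) given d deaths, n1 at risk in group 1, and n at risk in total."""
    return comb(d, d1) * comb(n - d, n1 - d1) / comb(n, n1)

def hypergeom_mean_var(d, n1, n):
    """Closed-form mean e_1j and variance v_1j of d_1j under H0."""
    n2 = n - n1
    mean = n1 * d / n
    var = n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
    return mean, var

# Hypothetical margins at one failure time.
n, n1, d = 10, 4, 3
probs = [hypergeom_prob(k, d, n1, n) for k in range(min(d, n1) + 1)]
mean, var = hypergeom_mean_var(d, n1, n)

# Moments computed directly from the probabilities, for comparison.
mean_direct = sum(k * p for k, p in enumerate(probs))
var_direct = sum((k - mean_direct) ** 2 * p for k, p in enumerate(probs))
print(probs)
print(mean, mean_direct)
print(var, var_direct)
```

Summing the probabilities of tables at least as extreme as the observed one gives the Fisher exact p-value for that failure time.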
— For large data: use the chi-square test statistic to test $H_0: S_1(t_{(j)}) = S_2(t_{(j)})$
▪ The log-rank test statistic
$U_L = \sum_{j=1}^{m} (d_{1j} - e_{1j})$; $V_L = \sum_{j=1}^{m} v_{1j}$
Under $H_0$, $W_L = U_L / \sqrt{V_L}$ and $W_L^2 \sim \chi^2_1$.
▪ The Wilcoxon test statistic
$U_W = \sum_{j=1}^{m} n_j (d_{1j} - e_{1j})$; $V_W = \sum_{j=1}^{m} n_j^2 v_{1j}$
Under $H_0$, $W_W^2 = U_W^2 / V_W \sim \chi^2_1$.
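Both statistics accumulate the same per-table quantities and differ only in the weight $n_j$. The sketch below assumes each group is given as lists of observed times and event indicators (1 = death, 0 = censored). That encoding, the function names, and the data are hypothetical. The p-value helper relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

def two_sample_statistics(times1, events1, times2, events2):
    """Accumulate U and V for the log-rank and Wilcoxon tests over all death times."""
    pooled = zip(times1 + times2, events1 + events2)
    death_times = sorted({t for t, e in pooled if e == 1})
    UL = VL = UW = VW = 0.0
    for tj in death_times:
        n1 = sum(1 for t in times1 if t >= tj)      # at risk in group 1
        n2 = sum(1 for t in times2 if t >= tj)      # at risk in group 2
        d1 = sum(1 for t, e in zip(times1, events1) if t == tj and e == 1)
        d2 = sum(1 for t, e in zip(times2, events2) if t == tj and e == 1)
        n, d = n1 + n2, d1 + d2
        e1 = n1 * d / n
        # With a single subject at risk the variance term is zero.
        v1 = n1 * n2 * d * (n - d) / (n ** 2 * (n - 1)) if n > 1 else 0.0
        UL += d1 - e1
        VL += v1
        UW += n * (d1 - e1)
        VW += n ** 2 * v1
    return {"UL": UL, "VL": VL, "UW": UW, "VW": VW,
            "log_rank": UL ** 2 / VL, "wilcoxon": UW ** 2 / VW}

def chi2_1_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical data: three deaths per group, no censoring.
stats = two_sample_statistics([1, 3, 5], [1, 1, 1], [2, 4, 6], [1, 1, 1])
print(stats)
print(chi2_1_pvalue(stats["log_rank"]), chi2_1_pvalue(stats["wilcoxon"]))
```

Weighting by $n_j$ makes the Wilcoxon statistic more sensitive to early differences between the groups, when more subjects are still at risk. The sketch assumes the accumulated variances are positive; degenerate data with $V_L = 0$ would need a guard.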