A General Class of Parametric Models for Recurrent Event Data

Russell Stocker
University of South Carolina
Research Supported by NSF Grant DMS 0102870 and NIH Grant GM056182
A General Class of Parametric Models for Recurrent Event Data – p.1/37
Overview of Talk
Overview of Reliability and Survival Analysis
Basics
Recurrent Events and Examples
General Class of Models
No Frailty Case
Frailty Case
Data Example
Future Work
Reliability and Survival Basics
Observe n units over a time period [0, t∗ ]
Ti is the lifetime random variable for unit i
$F(t) = P(T \le t)$ (Cumulative Distribution Function)
$f(t) = \frac{d}{dt} F(t)$ (Probability Density Function)
Popular Choices for f (t) include Exponential,
Gamma, Log-Normal, Inverse Gaussian, and
Weibull Distributions
Reliability and Hazard Function
Reliability or Survivor Function
R(t) = 1 − F (t) = 1 − P (T ≤ t) = P (T > t)
Hazard Rate Function
\[ \lambda(t) = \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \]
λ(t)∆t ≈ probability of a failure in the next instant
Cumulative Hazard Function
\[ \Lambda(t) = \int_0^t \lambda(w)\,dw \]
How do we connect λ(t), Λ(t), R(t), and f (t)?
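For a continuous lifetime the four functions determine one another: λ(t) = f(t)/R(t), Λ(t) = −log R(t), and therefore R(t) = exp(−Λ(t)) and f(t) = λ(t) exp(−Λ(t)). The sketch below checks these identities numerically for a Weibull lifetime. The parameter values and function names are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Hypothetical Weibull parameters, chosen only for illustration.
theta1, theta2 = 1.0, 2.0

def hazard(t):
    # lambda(t) = theta1 * theta2 * (theta1 * t)^(theta2 - 1)
    return theta1 * theta2 * (theta1 * t) ** (theta2 - 1)

def cum_hazard(t):
    # Lambda(t) = integral of the hazard from 0 to t = (theta1 * t)^theta2
    return (theta1 * t) ** theta2

def survivor(t):
    # R(t) = exp(-Lambda(t))
    return np.exp(-cum_hazard(t))

def density(t):
    # f(t) = lambda(t) * R(t)
    return hazard(t) * survivor(t)

t = np.linspace(0.1, 3.0, 30)

# lambda(t) = f(t) / R(t)
hazard_from_ratio = density(t) / survivor(t)

# Lambda(t) = -log R(t)
cum_hazard_from_log = -np.log(survivor(t))

# f(t) = -dR/dt, approximated by a central difference
h = 1e-6
density_numeric = -(survivor(t + h) - survivor(t - h)) / (2 * h)
```

Each derived quantity should agree with its direct definition up to floating-point and finite-difference error.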
Censoring
The time at which a unit fails may not be observed
in [0, t∗ ]
A unit may fail only after some time tci, so its failure time is not observed (Right Censoring)
For unit i, Ti > t∗ (Type I Censoring)
The n units may be observed until r units fail
(Type II Censoring)
Right Censoring may be generalized to random
censorship (tci ∼ Ci )
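The three censoring schemes can be sketched on simulated lifetimes as follows. The sample size, the Weibull and exponential distributions, and all variable names are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical settings: n units, study end t_star, stop after r failures.
n, t_star, r = 20, 1.5, 5
lifetimes = rng.weibull(2.0, size=n)          # latent lifetimes T_i

# Type I censoring: observation stops at the fixed time t_star.
obs_type1 = np.minimum(lifetimes, t_star)
failed_type1 = lifetimes <= t_star

# Type II censoring: observation stops at the r-th smallest lifetime.
t_r = np.sort(lifetimes)[r - 1]
obs_type2 = np.minimum(lifetimes, t_r)
failed_type2 = lifetimes <= t_r

# Random censorship: each unit has its own random censoring time C_i.
censor_times = rng.exponential(2.0, size=n)
obs_random = np.minimum(lifetimes, censor_times)
failed_random = lifetimes <= censor_times
```

In every scheme the analyst sees only the pair (observed time, failure indicator), never the latent lifetime of a censored unit.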
Recurrent Event Data
Data that arise when observing a repairable system, that is, a system that can be
brought back to operational condition through an intervention after a failure
occurs.
Examples
Repeated Fixing of Automobiles
Bugs in Software Programs
Recurrence of Tumors
Repeated Incidences of Strokes
Repetition of Conflict in a Geographical Region
Recidivism Rate of Criminals
Bladder Cancer Data Set
[Figure: "Bladder Data"; Subject plotted against calendar time]
General Mathematical Setting
n units in a study
The ith unit is observed over a time period $[0, \tau_i]$.
$S_{i0} < S_{i1} < S_{i2} < \cdots$ (Calendar Times)
$T_{ik} = S_{ik} - S_{ik-1}$ (Interoccurrence Times)
Counting and At Risk Processes
\[ N_i^\dagger(s) = \sum_{j=1}^{\infty} I(S_{ij} \le s,\ S_{ij} \le \tau_i) \]
The number of uncensored events observed for unit i by time s.
Yi† (s) = I(τi ≥ s)
Indicates if unit i is at risk at time s.
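The interoccurrence times, the counting process, and the at-risk process can be computed directly from one unit's calendar times. This is a minimal sketch; the function names and the example data are hypothetical.

```python
import numpy as np

def interoccurrence_times(calendar_times):
    """T_ik = S_ik - S_ik-1, where calendar_times = [S_i0, S_i1, ...]."""
    return np.diff(calendar_times)

def n_dagger(s, calendar_times, tau):
    """Count events S_ij (j >= 1) with S_ij <= s and S_ij <= tau."""
    events = np.asarray(calendar_times)[1:]    # drop the starting time S_i0
    return int(np.sum((events <= s) & (events <= tau)))

def y_dagger(s, tau):
    """At-risk indicator I(tau >= s)."""
    return int(tau >= s)

# Hypothetical unit: S_i0 = 0, events at 2, 5, 9, observation ends at tau = 7.
S = np.array([0.0, 2.0, 5.0, 9.0])
tau = 7.0
gaps = interoccurrence_times(S)
```

With these data the event at calendar time 9 falls after τ and is never counted, so `n_dagger(10.0, S, tau)` stays at 2 while `y_dagger(8.0, tau)` is 0.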
Martingales
Let (Ω, B, P ) be a probability space.
A filtration {Fs} is an increasing family of sub-σ-fields of B. (Event History)
A martingale Mi (s) is a stochastic process that
satisfies
1) E(|Mi (s)|) < ∞
2) E(Mi (s)|Ft ) = Mi (t) for any t<s
A martingale is a fair bet
A central limit theorem exists for martingales.
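A simple illustration of the "fair bet" property uses a homogeneous Poisson process, for which M(s) = N(s) − rate·s is a martingale. The simulation below is only a sketch under that assumption; the rate, time points, and number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings: Poisson rate, two time points t < s, replications.
rate, t, s, reps = 2.0, 1.0, 3.0, 20000

# Counts at time t, then independent Poisson increments on (t, s].
N_t = rng.poisson(rate * t, size=reps)
N_s = N_t + rng.poisson(rate * (s - t), size=reps)

# Compensated process M(u) = N(u) - rate * u.
M_t = N_t - rate * t
M_s = N_s - rate * s

# Fair bet: the increment M(s) - M(t) has mean zero ...
increment_mean = np.mean(M_s - M_t)

# ... and cannot be predicted from the past value M(t)
# (least-squares slope of the increment on M(t) is near zero).
slope = np.cov(M_s - M_t, M_t)[0, 1] / np.var(M_t, ddof=1)
```

Both `increment_mean` and `slope` should be close to zero, up to Monte Carlo error.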
Class of Models
Model given by Peña and Hollander (2003)
(Ω, B, P ) with filtration Fs (Event History)
Effective Age Process: Let $\{E_i(s) \mid 0 \le s \le s^*\}$ be a
class of observable predictable processes such that
1. $E_i(0) = e_{i0}$ almost surely, where $e_{i0} \in \mathbb{R}_+$;
2. $E_i(s) \ge 0$;
3. On $[S_{ik-1}, S_{ik})$, $E_i(s)$ is monotone and almost
surely differentiable with nonnegative derivative $E_i'(s)$.
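Two familiar special cases satisfy these conditions with $e_{i0} = 0$: minimal repair, where the effective age is the calendar time, and perfect repair, where the age restarts at zero after each event. The sketch below is an illustration only; the function names are hypothetical.

```python
import numpy as np

def effective_age_minimal_repair(s, event_times):
    """Minimal repair: E_i(s) = s, the unit is 'as bad as old' after each event."""
    return float(s)

def effective_age_perfect_repair(s, event_times):
    """Perfect repair: E_i(s) = s minus the time of the last event before s.

    Events strictly before s are used, so at an event time the function
    returns the age reached just before the repair.
    """
    event_times = np.asarray(event_times, dtype=float)
    past = event_times[event_times < s]
    last_event = past.max() if past.size else 0.0
    return float(s - last_event)

# Hypothetical unit with events at calendar times 2 and 5.
events = [2.0, 5.0]
ages = [effective_age_perfect_repair(s, events) for s in (1.0, 5.0, 6.0)]
```

Both versions are piecewise linear with derivative 1 between events, so the nonnegative-derivative condition holds.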
Big Picture
[Figure: Effective Age Process; $E_1(s)$ plotted against $s$, with $S_{11}$, $S_{12}$, $S_{13}$ and $\tau_1$ marked on the $s$ axis]
Compensator Process
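\[ A_i^\dagger(s \mid Z_i, X_i, \delta) = \int_0^s Y_i^\dagger(w)\,\lambda_i(w \mid Z_i, X_i, \delta)\,dw \]
where $\delta = (\theta, \alpha, \beta)^t$,
\[ \lambda_i(s \mid Z_i, X_i, \delta) = Z_i\,\lambda_0(E_i(s); \theta)\,\rho[N_i^\dagger(s-); \alpha]\,\psi(\beta^t X_i(s)). \]
$A_i^\dagger(s \mid Z_i, X_i, \delta)$ is called a compensator.
$M_i^\dagger(s) = N_i^\dagger(s) - A_i^\dagger(s)$ is a martingale with respect to $F_s$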
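To make the intensity and its compensator concrete, the sketch below evaluates both for one unit. It assumes a Weibull baseline $\lambda_0(t; \theta) = \theta_1\theta_2(\theta_1 t)^{\theta_2 - 1}$, $\rho[k; \alpha] = \alpha^k$, $\psi(u) = e^u$, time-fixed covariates, and $E_i(s) = s$; these are choices made for illustration and the function names are hypothetical.

```python
import numpy as np

def intensity(s, events, x, theta, alpha, beta, z=1.0):
    """lambda_i(s) = Z * lambda_0(s) * alpha^{N(s-)} * exp(beta' x), with E_i(s) = s."""
    theta1, theta2 = theta
    k = int(np.sum(events < s))                       # N_i(s-): events strictly before s
    baseline = theta1 * theta2 * (theta1 * s) ** (theta2 - 1)
    return z * baseline * alpha ** k * np.exp(np.dot(beta, x))

def compensator(s, events, tau, x, theta, alpha, beta, z=1.0):
    """A_i(s): integral of Y_i(w) * lambda_i(w) over [0, s], computed piecewise."""
    theta1, theta2 = theta
    end = min(s, tau)                                 # Y_i(w) = 0 beyond tau
    cuts = np.concatenate(([0.0], events[events < end], [end]))
    cum_baseline = (theta1 * cuts) ** theta2          # Lambda_0 at the cut points
    k = np.arange(len(cuts) - 1)                      # event count on each piece
    return z * np.exp(np.dot(beta, x)) * np.sum(alpha ** k * np.diff(cum_baseline))

# Hypothetical unit: events at 0.5 and 1.5, observation ends at tau = 2.
events = np.array([0.5, 1.5])
x, beta = np.array([0.0]), np.array([0.0])
a_value = compensator(3.0, events, 2.0, x, (1.0, 2.0), 0.5, beta)
```

Because the event count is constant between events, the integral reduces to a sum of baseline cumulative-hazard increments weighted by $\alpha^k$.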
Estimation Assuming No Frailty
Likelihood Equation
[Figure: timeline of $[0, s^*]$ with points $0, s_1, s_2, s_3$, each labeled with the probability $Y_i^\dagger(s_k)\,\lambda_i(s_k)\,\Delta s$]
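Let $\alpha_i(s \mid \delta) = Y_i^\dagger(s)\,\lambda_i(s \mid \delta)$, where $\delta = (\theta, \alpha, \beta)^t$.
\[
L(\delta) = \prod_{i=1}^{n} \Big\{ \mathop{\pi}_{w \in [0,s]} \alpha_i(w \mid \delta)^{\Delta N_i^\dagger(w)}\, (1 - \alpha_i(w \mid \delta))^{(1 - \Delta N_i^\dagger(w))} \Big\}
= \prod_{i=1}^{n} \Big\{ \Big[ \mathop{\pi}_{w \in [0,s]} \alpha_i(w \mid \delta)^{\Delta N_i^\dagger(w)} \Big] \exp\Big( -\int_0^s \alpha_i(w \mid \delta)\,dw \Big) \Big\}
\]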
Estimation Assuming No Frailty
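\[ l(\delta) = \sum_{i=1}^{n} \int_0^s \log\big(Y_i^\dagger(w)\,\lambda_i(w \mid \delta)\big)\,dN_i^\dagger(w) - \sum_{i=1}^{n} \int_0^s Y_i^\dagger(w)\,\lambda_i(w \mid \delta)\,dw \]
\[ U(\delta) = \sum_{i=1}^{n} \int_0^s \frac{\partial}{\partial \delta} \log\big[Y_i^\dagger(w)\,\lambda_i(w \mid \delta)\big]\,dM_i^\dagger(w) \]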
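The log-likelihood is a sum of log-intensities at the observed event times minus the integrated intensity over each observation window. The sketch below evaluates it under assumptions chosen for illustration: Weibull baseline, $\rho[k; \alpha] = \alpha^k$, $\psi(u) = e^u$, time-fixed covariates, $E_i(s) = s$, and no frailty. The function name and data layout are hypothetical.

```python
import numpy as np

def log_likelihood(theta, alpha, beta, units):
    """units: list of (event_times, tau, x), all event times in (0, tau]."""
    theta1, theta2 = theta
    total = 0.0
    for events, tau, x in units:
        events = np.asarray(events, dtype=float)
        lin = float(np.dot(beta, x))                  # beta' x_i
        k = np.arange(len(events))                    # N_i(s-) at each event time

        # Sum of log lambda_i over the observed events.
        log_lam = (np.log(theta1 * theta2)
                   + (theta2 - 1) * np.log(theta1 * events)
                   + k * np.log(alpha) + lin)

        # Integrated intensity over [0, tau], piece by piece between events.
        cuts = np.concatenate(([0.0], events, [tau]))
        pieces = np.diff((theta1 * cuts) ** theta2)
        integral = np.exp(lin) * np.sum(alpha ** np.arange(len(pieces)) * pieces)

        total += np.sum(log_lam) - integral
    return total

# Hypothetical data: one unit, events at 1, 2, 3, observed on [0, 4].
units = [([1.0, 2.0, 3.0], 4.0, np.array([0.0]))]
ll = log_likelihood((0.5, 1.0), 1.0, np.array([0.0]), units)
```

With $\theta_2 = 1$, $\alpha = 1$, and $\beta = 0$ the model is a homogeneous Poisson process, so the value reduces to $3\log(0.5) - 0.5 \cdot 4$. In practice the negative of this function would be handed to a numerical optimizer to solve $U(\delta) = 0$.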
Calendar/Gap Time Processes
\[ E_i(s, t) = I(E_i(s) \le t) \]
\[ N_i(s, t) = \int_0^s E_i(\eta, t)\,N_i^\dagger(d\eta) \]
\[ A_i(s, t) = \int_0^s E_i(\eta, t)\,A_i^\dagger(d\eta) \]
\[ M_i(s, t) = N_i(s, t) - A_i(s, t) = \int_0^s E_i(\eta, t)\,M_i^\dagger(d\eta) \]
Mi (·, t) is a martingale with respect to Fs , but Mi (s, ·)
is not a martingale with respect to Fs .
Notation
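\[ E_{ij-1}(s) = E_i(s)\,I(S_{ij-1} < s \le S_{ij}) \]
\[ \Upsilon_{ij}(s) = \frac{\rho[j-1; \alpha]\,\psi\big(\beta^t X_i(E_{ij-1}^{-1}(s))\big)}{E_{ij-1}'\big(E_{ij-1}^{-1}(s)\big)} \]
\[ I_{E_{ij-1}}(\eta, S_{ij-1}, S_{ij}) = I\big(E_{ij-1}(S_{ij-1}) < \eta \le E_{ij-1}(S_{ij})\big) \]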
Theoretical Results
For i = 1, 2, . . . , n,
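\[ A_i(s, t) = \int_0^t Y_i(s, \eta)\,\lambda_0(\eta; \theta_0)\,d\eta \]
where
\[
Y_i(s, \eta) = \sum_{j=1}^{N_i^\dagger((s \wedge \tau_i)-)} I_{E_{ij-1}}(\eta, S_{ij-1}, S_{ij})\,\Upsilon_{ij}(\eta)
+ I_{E_{iN_i^\dagger([s \wedge \tau_i]-)}}\big(\eta, S_{iN_i^\dagger([s \wedge \tau_i]-)}, (s \wedge \tau_i)\big)\,\Upsilon_{iN_i^\dagger([s \wedge \tau_i]-)}(\eta)
\]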
Theoretical Results
For any s ≥ 0 and t ≥ 0,
\[
W(s, t) \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^t H_i(s, w)\,M_i(s, dw)
= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^s H_i(s, E_i(\eta))\,M_i(d\eta, t).
\]
Transformed Score Process
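\[
U(\delta; s, t)
= \sum_{i=1}^{n} \int_0^s
\begin{pmatrix}
\nabla_\theta \log[\lambda_0(E_i(w); \theta)] \\
\nabla_\alpha \log[\rho[N_i^\dagger(w-); \alpha]] \\
\nabla_\beta \log[\psi(\beta^t X_i(w))]
\end{pmatrix} M_i(dw, t)
= \sum_{i=1}^{n} \int_0^t
\begin{pmatrix}
\nabla_\theta \log[\lambda_0(\eta; \theta)] \\
\nabla_\alpha \log[\rho[N_i^\dagger(E_i^{-1}(\eta)); \alpha]] \\
\nabla_\beta \log[\psi(\beta^t X_i(E_i^{-1}(\eta)))]
\end{pmatrix} M_i(s, d\eta)
\]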
Main Asymptotic Results
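The score process equation has a solution $(\hat\theta, \hat\alpha, \hat\beta)^t$ which is consistent.
\[ \sqrt{n}\,\big[ (\hat\theta, \hat\alpha, \hat\beta)^t - (\theta, \alpha, \beta)^t \big] \xrightarrow{d} N(0, \Sigma) \]
$\hat\Sigma = I^{-1}$ (observed information matrix)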
Simulation Studies
\[ \lambda_i(\cdot; \theta) = \theta_1 \theta_2\,(\theta_1 E_i(s))^{\theta_2 - 1}\,\alpha^{N_i^\dagger(s-)}\,\exp(\beta^t X_i) \]
θ1 = 1 and θ2 ∈ {.8, 2}
α ∈ {.8, 1, 1.05} and β = (1, −1)t
An item was repaired after a failure with
probability equal to 0.6 (Brown and Proschan
Imperfect Repair Model)
X1 ∼ Ber(0.5), X2 ∼ N(0, 1)
n ∈ {10, 30, 50}
τi ∼Exp(0.1) (Max Events=50)
1500 repetitions for each combination
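One way to generate a single unit's event times under this design is inverse-transform sampling: with $c_k = \alpha^k \exp(\beta^t X_i)$, current effective age $e$, and $V \sim \text{Exp}(1)$, the next gap $T$ solves $c_k[(\theta_1(e + T))^{\theta_2} - (\theta_1 e)^{\theta_2}] = V$. The sketch below is not the author's simulation code. It assumes that 0.6 is the probability of a perfect repair (effective age reset to 0) with minimal repair otherwise, that the effective age grows at unit rate between events, and that Exp(0.1) denotes rate 0.1; all names are hypothetical.

```python
import numpy as np

def simulate_unit(theta1, theta2, alpha, beta, x, tau,
                  p_perfect=0.6, max_events=50, rng=None):
    """Simulate event calendar times for one unit on [0, tau]."""
    rng = np.random.default_rng() if rng is None else rng
    mult = np.exp(np.dot(beta, x))         # exp(beta' x), fixed over time
    s, age, k = 0.0, 0.0, 0                # calendar time, effective age, event count
    events = []
    while k < max_events:
        c = mult * alpha ** k              # multiplier in force until the next event
        v = rng.exponential(1.0)           # unit-exponential draw
        # Solve c * [(theta1*(age+gap))^theta2 - (theta1*age)^theta2] = v for gap.
        gap = ((theta1 * age) ** theta2 + v / c) ** (1.0 / theta2) / theta1 - age
        if s + gap > tau:                  # next event would fall after censoring
            break
        s += gap
        events.append(s)
        k += 1
        # Brown-Proschan repair (assumed): perfect with probability p_perfect,
        # otherwise minimal, so the effective age keeps accumulating.
        age = 0.0 if rng.random() < p_perfect else age + gap
    return np.array(events)

rng = np.random.default_rng(0)
x = np.array([rng.binomial(1, 0.5), rng.normal()])    # X1 ~ Ber(0.5), X2 ~ N(0, 1)
tau = rng.exponential(1.0 / 0.1)                      # rate 0.1 (assumed), mean 10
sim_events = simulate_unit(1.0, 0.8, 0.8, np.array([1.0, -1.0]), x, tau, rng=rng)
```

Repeating this for n units and refitting the model on each replicate would give the kind of sampling distributions summarized in the histograms and tables that follow.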
Histograms
[Figure: histograms of α̂ and θ̂1 for n = 10, 30, 50; vertical axes show Density]
Histograms
[Figure: histograms of θ̂2 and β̂1 for n = 10, 30, 50; vertical axes show Density]
Histograms
[Figure: histograms of β̂2 for n = 10, 30, 50; vertical axes show Density]
Simulated Means
α     θ2   n    µ̂E      θ̄1     θ̄2     ᾱ      β̄1     β̄2
0.80  0.8  10   12.078  1.035  0.823  0.792  1.044  −1.037
0.80  0.8  30   12.194  1.020  0.807  0.797  1.010  −1.010
0.80  0.8  50   12.215  1.012  0.804  0.798  1.003  −1.009
1.00  0.8  10   37.765  1.009  0.807  1.000  1.013  −1.015
1.00  0.8  30   37.856  1.009  0.802  1.000  1.002  −1.002
1.00  0.8  50   37.781  1.000  0.801  1.000  1.001  −1.002
1.05  0.8  10   42.168  1.010  0.806  1.050  1.009  −1.008
1.05  0.8  30   42.165  1.006  0.801  1.050  1.001  −1.001
1.05  0.8  50   42.093  1.000  0.801  1.050  1.002  −1.002
Standard Error Comparisons
α     θ2   n    µ̂E      σ̂θ1    σ̃θ1    σ̂θ2    σ̃θ2
0.80  0.8  10   12.078  0.290  0.264  0.063  0.062
0.80  0.8  30   12.194  0.138  0.137  0.035  0.034
0.80  0.8  50   12.215  0.106  0.104  0.027  0.026
1.00  0.8  10   37.765  0.164  0.158  0.033  0.033
1.00  0.8  30   37.856  0.088  0.085  0.018  0.019
1.00  0.8  50   37.781  0.065  0.065  0.014  0.014
1.05  0.8  10   42.168  0.156  0.152  0.032  0.031
1.05  0.8  30   42.165  0.082  0.082  0.018  0.017
1.05  0.8  50   42.093  0.064  0.063  0.014  0.014
Standard Error Comparisons
α     θ2   n    µ̂E      σ̂α     σ̃α     σ̂β1    σ̃β1    σ̂β2    σ̃β2
0.80  0.8  10   12.078  0.022  0.021  0.252  0.238  0.160  0.151
0.80  0.8  30   12.194  0.012  0.011  0.120  0.121  0.076  0.075
0.80  0.8  50   12.215  0.009  0.009  0.089  0.092  0.057  0.057
1.00  0.8  10   37.765  0.004  0.004  0.137  0.132  0.090  0.084
1.00  0.8  30   37.856  0.002  0.002  0.069  0.067  0.043  0.042
1.00  0.8  50   37.781  0.002  0.002  0.052  0.051  0.032  0.032
1.05  0.8  10   42.168  0.004  0.004  0.133  0.124  0.083  0.077
1.05  0.8  30   42.165  0.002  0.002  0.063  0.063  0.040  0.039
1.05  0.8  50   42.093  0.002  0.002  0.049  0.048  0.030  0.030
Frailty Case
Zi is unobservable
$Z_i \sim \text{Gamma}(\eta, \eta)$ ($E(Z_i) = 1$ and $\text{Var}(Z_i) = 1/\eta$)
We must integrate out the $Z_i$'s from the full
likelihood
We obtain a marginal partial likelihood
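\[
L(\delta, \eta) = \prod_{i=1}^{n} \Bigg\{ \frac{\eta^\eta}{\Gamma(\eta)}\,
\frac{\Gamma(\eta + N_i^\dagger(s))}{\big(\eta + \int_0^s Y_i^\dagger(w)\,\lambda_i(w; \theta)\,dw\big)^{\eta + N_i^\dagger(s)}}
\times \mathop{\pi}_{w \in [0, s^*]} \big(Y_i^\dagger(w)\,\lambda_i(w; \delta)\big)^{\Delta N_i^\dagger(w)} \Bigg\}
\]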
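The Gamma factor in the marginal likelihood comes from a standard integral. As a sketch, write $N_i = N_i^\dagger(s)$ and $A_i = \int_0^s Y_i^\dagger(w)\,\lambda_i(w)\,dw$, with $\lambda_i$ taken without the frailty factor; this shorthand is introduced here only for the derivation. Conditional on $Z_i = z$, the unit's likelihood contribution is proportional to $z^{N_i} e^{-z A_i}$, and averaging over the Gamma$(\eta, \eta)$ density gives:

```latex
\begin{align*}
\int_0^\infty z^{N_i} e^{-z A_i}\,
  \frac{\eta^\eta}{\Gamma(\eta)}\, z^{\eta - 1} e^{-\eta z}\, dz
&= \frac{\eta^\eta}{\Gamma(\eta)}
   \int_0^\infty z^{\eta + N_i - 1} e^{-(\eta + A_i) z}\, dz \\
&= \frac{\eta^\eta}{\Gamma(\eta)}\,
   \frac{\Gamma(\eta + N_i)}{(\eta + A_i)^{\eta + N_i}} .
\end{align*}
```

The same calculation shows that, given the data, $Z_i$ follows a Gamma$(\eta + N_i, \eta + A_i)$ distribution.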
EM Algorithm
Step 1: Give initial guess for η and δ.
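Step 2 (E-Step):
\[ \hat{Z}_i = E(Z_i \mid \eta, \delta) = \frac{\eta + N_i^\dagger(s)}{\eta + \int_0^s Y_i^\dagger(w)\,\lambda_i(w \mid \delta)\,dw} \]
Step 3 (M-Step):
M-Step 1: Find $\hat\delta$ using $\hat{Z}_i$ in place of $Z_i$ in the partial
likelihood
M-Step 2: Find $\hat\eta$ by maximizing the marginal partial
likelihood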
Step 4: Check for convergence and repeat steps 2 and
3 if necessary.
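The E-step and the η-dependent part of the marginal log-likelihood used in M-Step 2 are both closed-form once each unit's event count and integrated intensity are known. The sketch below assumes those two quantities have already been computed for the current δ; the function names and example numbers are hypothetical.

```python
import math
import numpy as np

def e_step(eta, n_events, cum_intensity):
    """Z_hat_i = (eta + N_i) / (eta + A_i), with A_i the integrated intensity."""
    return (eta + np.asarray(n_events)) / (eta + np.asarray(cum_intensity))

def eta_objective(eta, n_events, cum_intensity):
    """Terms of the log marginal likelihood that depend on eta.

    Sum over units of:
    eta*log(eta) - log Gamma(eta) + log Gamma(eta + N_i) - (eta + N_i)*log(eta + A_i)
    """
    total = 0.0
    for n_i, a_i in zip(n_events, cum_intensity):
        total += (eta * math.log(eta) - math.lgamma(eta)
                  + math.lgamma(eta + n_i)
                  - (eta + n_i) * math.log(eta + a_i))
    return total

# Hypothetical summary for two units: event counts and integrated intensities.
n_events = [0, 3]
cum_intensity = [1.0, 1.0]
z_hat = e_step(2.0, n_events, cum_intensity)
```

A unit with more events than its integrated intensity predicts receives an estimated frailty above 1, and one with fewer receives a value below 1. M-Step 2 would maximize `eta_objective` over η > 0 with any one-dimensional optimizer.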
Visualization of Data
[Figure: "Bladder Data"; Subject plotted against calendar time]
Real Data Set
Bladder Cancer Data Set in Wei et al. (1989)
Recurrence times of bladder cancer for 86 subjects
Fit using Weibull Hazard
$\rho[N_i^\dagger(s-); \alpha] = \alpha^{N_i^\dagger(s-)}$
Covariates:
X1 indicates treatment (Placebo or Thiotepa)
X2 the size of the largest tumor
X3 number of initial tumors
Ei (s) = s
Bladder Data Estimates
Parameter  Frailty Estimates  Non Frailty Estimates
η          0.576              ∞
θ1         0.096              0.055
θ2         1.235              0.836
α          0.60               1.013
β1         -0.707             -0.389
β2         -0.026             -0.040
β3         0.280              0.159
Baseline Survivor Functions
[Figure: "Placebo vs. Thiotepa Baseline Survivor Functions"; Ŝ0(s) plotted against s for Placebo (F), Thiotepa (F), Placebo (NF), and Thiotepa (NF)]
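A baseline survivor curve of this kind can be sketched from the Weibull estimates in the table. The code assumes the parametrization $\lambda_0(t) = \theta_1\theta_2(\theta_1 t)^{\theta_2 - 1}$, so that $S_0(s) = \exp(-(\theta_1 s)^{\theta_2})$. It does not reproduce the treatment-specific curves in the figure, since how the covariates enter them is not stated on the slide; the function name is hypothetical.

```python
import numpy as np

def weibull_baseline_survivor(s, theta1, theta2):
    """S_0(s) = exp(-(theta1 * s)^theta2) under the assumed Weibull form."""
    return np.exp(-(theta1 * np.asarray(s, dtype=float)) ** theta2)

s_grid = np.linspace(0.0, 100.0, 101)

# Estimates from the bladder data table: frailty fit and non-frailty fit.
surv_frailty = weibull_baseline_survivor(s_grid, 0.096, 1.235)
surv_no_frailty = weibull_baseline_survivor(s_grid, 0.055, 0.836)
```

Both curves start at 1 and decrease in s.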
Future Work
Construction of Goodness of Fit Tests
Frailty Case
Bayesian Approach
Competing Risks with Masked Failures
Additive Hazards Model