slides

MARKED OUTCOME REGRESSION
REGRESSION METHODS FOR MARKED OUTCOMES
Brent A. Johnson
Department of Biostatistics and Bioinformatics
Emory University
13 July 2013
OUTLINE
MOTIVATION
BACKGROUND
METHODS
SIMULATIONS
DATA ANALYSIS
DISCUSSION
MOTIVATION
MOTIVATING APPLICATIONS
In some clinical studies, there is interest in an auxiliary endpoint measured at the time of failure, not merely in the time to failure.
If every participant in the study experiences a failure, then the observed data include the failure time and the auxiliary endpoint for the entire study sample.
On the other hand, if some participants do not experience the failure event by the end of the follow-up period, both their time to failure and their auxiliary endpoint are unobserved.
In this case, we say the failure time is right-censored and the auxiliary endpoint is dependently censored, since it is observed only when the failure time is.
VACCINE EFFICACY TRIALS
From 1998 to 2003, VaxGen Inc. conducted the first HIV vaccine efficacy trial.
5403 high-risk, uninfected subjects were randomized to the AIDSVAX vaccine (n1 = 3598) or placebo (n2 = 1805)
368 subjects became infected with the virus
Naturally, VaxGen hypothesized that vaccine efficacy would be higher against HIV strains with amino acid sequences similar to those of the strains used to make the vaccine (MN and GNE8)
Question: What is the efficacy of the AIDSVAX vaccine?
Note: The endpoint is only available for participants who acquire HIV
THERAPEUTIC AIDS TRIALS
ACTG 5095: randomized, multicenter clinical trial of HIV+, ART-naive patients
Study designed for 120 weeks of follow-up
1147 study participants randomized to one of three treatment groups: (i) ABC+3TC+ZDV, (ii) 3TC+ZDV+EFV, or (iii) ABC+3TC+ZDV+EFV
Questions of interest:
What is the (un)adjusted difference in resistance endpoints among ARTs at the time of virologic failure?
What is the (un)adjusted difference in immunological endpoints (CD4+, CD8+, etc.) among ARTs at the time of virologic failure?
Note: The resistance endpoint is only available at virologic failure
LIFETIME CENSORED MEDICAL COST
SWOG 9509: randomized clinical trial of untreated patients with advanced non-small cell lung cancer
408 study participants randomized to two treatment groups: (i) Paclitaxel plus Carboplatin, and (ii) Vinorelbine plus Cisplatin
Question: What is the (un)adjusted difference in lifetime medical costs between treatments?
Notes:
Lifetime cost accrues over time
Lifetime medical cost is observed only for those patients whose entire lifetime is observed
BACKGROUND
NOTATION
Consider the linear regression model
$$Y_i = X_i'\beta + \varepsilon_i, \quad i = 1, \ldots, n,$$
where
$Y_i$ := auxiliary endpoint of interest, e.g., lifetime medical cost
$T_i$ := time at which $Y_i$ is measured (event time)
$X_i$ := $d$-vector of risk factors
$\beta := (\beta_1, \ldots, \beta_d)'$ regression coefficients
$\varepsilon_i$ are i.i.d. errors
OBSERVED DATA
The true data are
$$(Y_i, T_i, X_i), \quad i = 1, \ldots, n.$$
However, the observed data are
$$(W_i, U_i, \Delta_i, X_i), \quad i = 1, \ldots, n,$$
where, with $C_i$ the censoring time,
$W_i := Y_i \cdot I(T_i \le C_i)$
$U_i := T_i \wedge C_i$
$\Delta_i := I(T_i \le C_i)$
Objective: Estimate the regression coefficients $\beta$.
IDENTIFIABILITY
Because of dependent censoring, there is concern about whether the marginal mean of lifetime medical cost is identifiable from the observed data.
Define
$\tau_T$ := upper limit of the support of $T$
$\tau_C$ := upper limit of the support of $C$
Then
If $\tau_T \le \tau_C$, the marginal distribution of $Y$ can be identified
If $\tau_C < \tau_T$, the joint distribution of $(Y, T)$ is never observed on $(-\infty, \infty) \times (\tau_C, \infty)$ and, hence, the marginal distribution of $Y$ may be nonidentifiable at every point of its support
MODELING THE TIME-RESTRICTED MARGINAL MEAN
One approach to the identifiability problem is to model the time-restricted lifetime medical cost instead of the lifetime medical cost.
Suppose we want to model the $L = 5$-year restricted medical cost, $Y^L$, defined as the cumulative medical cost up to $T$ or $L = 5$ years, whichever comes first. Define the time-restricted observables:
$W_i^L := Y_i^L \cdot I(T_i \wedge L \le C_i)$
$U_i^L := T_i \wedge L \wedge C_i$
$\Delta_i^L := I(T_i \wedge L \le C_i)$
Then, provided $L < \tau_C$ (so that $\mathrm{pr}(C > L) > 0$), we can estimate $E(Y^L)$ or model $E(Y^L \mid X)$.
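IPW ESTIMATION
To estimate the mean $E(Y^L)$, for example, Zhao and Tsiatis (1997) proposed the statistic
$$n^{-1} \sum_{i=1}^n \frac{\Delta_i^L W_i^L}{\mathrm{pr}\{C > \min(T_i, L)\}}.$$
In practice, the censoring survival probability in the denominator is replaced by a consistent estimate, such as the Kaplan-Meier estimate of the censoring distribution.
To model $E(Y^L \mid X)$ parametrically, a regression estimator may be constructed by replacing $W_i^L$ above with an estimating function (score), say $\psi$, and solving
$$0 = n^{-1} \sum_{i=1}^n \frac{\Delta_i^L \psi(W_i^L, X_i, \beta)}{\mathrm{pr}\{C > \min(T_i, L) \mid X_i\}}.$$
Using arguments from survival analysis, one can show that these estimators are consistent and asymptotically normal (CAN).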
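The weighted estimator of $E(Y^L)$ can be sketched in Python with NumPy, assuming that $\mathrm{pr}(C > t)$ is estimated by the Kaplan-Meier method applied to the censoring times and that the restricted marks $W_i^L$ are supplied directly. The helper names `km_censoring_survival` and `ipw_restricted_mean` are hypothetical. The sketch uses the identity $\Delta_i^L = \max\{\Delta_i, I(U_i \ge L)\}$, which follows from the definitions of $\Delta_i$, $U_i$ and $\Delta_i^L$, and the fact that $\min(T_i, L) = \min(U_i, L)$ whenever $\Delta_i^L = 1$.

```python
import numpy as np


def km_censoring_survival(u, delta):
    """Kaplan-Meier estimate of K(t) = pr(C > t).

    Censorings (delta == 0) play the role of 'events'; observed failures
    are treated as censored. Returns a vectorized step function K(t).
    """
    order = np.argsort(u, kind="stable")
    u_sorted = np.asarray(u, dtype=float)[order]
    is_cens = 1.0 - np.asarray(delta, dtype=float)[order]
    n = len(u_sorted)
    at_risk = n - np.arange(n)                  # number with U >= u_(k)
    surv = np.cumprod(1.0 - is_cens / at_risk)  # K just after each u_(k)

    def K(t):
        # index of the last sorted time <= t (right-continuous step function)
        idx = np.searchsorted(u_sorted, t, side="right")
        return np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])

    return K


def ipw_restricted_mean(w_L, u, delta, L):
    """Weighted estimator n^{-1} sum_i Delta_i^L W_i^L / K{min(T_i, L)}."""
    w_L = np.asarray(w_L, dtype=float)
    u = np.asarray(u, dtype=float)
    delta = np.asarray(delta, dtype=int)
    # Delta_i^L = I(T_i ^ L <= C_i) = max{Delta_i, I(U_i >= L)}
    delta_L = np.maximum(delta, (u >= L).astype(int))
    K = km_censoring_survival(u, delta)
    keep = delta_L == 1
    # for these subjects min(T_i, L) = min(U_i, L)
    weights = K(np.minimum(u[keep], L))
    return np.sum(w_L[keep] / weights) / len(u)
```

With no censoring every weight equals one, and the estimator reduces to the sample mean of the $W_i^L$.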
NOTES ON TIME-RESTRICTED MARGINAL MEAN
This technique has been studied extensively, with contributions by Zhao and Tsiatis (1997), Lin et al. (1997), Bang and Tsiatis (2000), Zhao and Tian (2001), Strawderman (2002), and Zhao et al. (2007).
Potential caveats:
Clearly, $Y$ and $Y^L$ are not the same quantity, although both may be of interest;
Recall that $Y = Y^L$ if $\min(T, L) = T$, but the two can be very different otherwise;
Example: Approximately 90% of patients living with ALS die within 7 years of the date of diagnosis. So, one expects models for $E(Y \mid X)$ and $E(Y^L \mid X)$ with $L = 7$ to be similar.
MODEL THE JOINT DISTRIBUTION
Huang and Louis (1998) showed that one may obviate the identifiability concern for the marginal mean by estimating the joint distribution of the mark and the time. Huang and Lovato (2002) extended these ideas to two-sample tests, and Huang (2002) proposed a regression coefficient estimator. Suppose
$$\begin{pmatrix} Y_i \\ T_i \end{pmatrix} = \begin{pmatrix} X_i'\beta \\ X_i'\gamma \end{pmatrix} + \varepsilon_i, \quad i = 1, \ldots, n,$$
where
$\gamma := (\gamma_1, \ldots, \gamma_d)'$ time-scale coefficients
$\varepsilon_i$ are i.i.d. according to an unspecified bivariate distribution function
Note: WLOG, assume $T_i$ is measured on the log scale
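COEFFICIENT ESTIMATION
Define the counting process $N_i(t, \gamma) = I(U_i - X_i'\gamma \le t, \Delta_i = 1)$ and, with $\theta = (\beta', \gamma')'$, the marked process $N_i^\dagger(t, \theta) = (W_i - X_i'\beta) N_i(t, \gamma)$. Then, define the pair of estimating functions:
$$S_\beta(\beta, \gamma; O) = \sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} \, dN_i^\dagger(t, \theta),$$
$$S_\gamma(\gamma; O) = \sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} \, dN_i(t, \gamma),$$
where $\bar X(t, \gamma) = \sum_i X_i R_i(t, \gamma) / \sum_i R_i(t, \gamma)$, the at-risk indicator is $R_i(t, \gamma) = I(U_i - X_i'\gamma \ge t)$, $\phi(t, \gamma)$ is a weight function satisfying regularity conditions, and $O$ is the observed data.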
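Because each $N_i(\cdot, \gamma)$ jumps once, at $t = U_i - X_i'\gamma$ and only when $\Delta_i = 1$, both integrals reduce to sums over uncensored subjects. The NumPy sketch below evaluates the pair of estimating functions, assuming either the log-rank weight $\phi \equiv 1$ or the Gehan weight $\phi(t, \gamma) = n^{-1}\sum_j R_j(t, \gamma)$ and a covariate matrix without an intercept column; the function name `estimating_functions` is hypothetical.

```python
import numpy as np


def estimating_functions(beta, gamma, W, U, delta, X, weight="logrank"):
    """Evaluate S_beta(beta, gamma; O) and S_gamma(gamma; O).

    X is an (n, d) covariate matrix without an intercept column.
    weight: "logrank" (phi = 1) or "gehan" (phi = fraction at risk).
    """
    n, d = X.shape
    e = U - X @ gamma                     # time-scale residuals U_i - X_i'gamma
    S_beta = np.zeros(d)
    S_gamma = np.zeros(d)
    for i in np.flatnonzero(delta == 1):  # N_i jumps only if Delta_i = 1
        at_risk = e >= e[i]               # R_j(t, gamma) at t = e_i
        xbar = X[at_risk].mean(axis=0)    # Xbar(t, gamma)
        phi = 1.0 if weight == "logrank" else at_risk.sum() / n
        contrib = phi * (X[i] - xbar)
        S_gamma += contrib                            # jump of size 1
        S_beta += contrib * (W[i] - X[i] @ beta)      # jump of size W_i - X_i'beta
    return S_beta, S_gamma
```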
COMMENTS
$N_i(t, \gamma)$ is an ordinary counting process that jumps by +1 at the time-scale residual $U_i - X_i'\gamma$ when $\Delta_i = 1$, whereas the marked process $N_i^\dagger(t, \theta)$ jumps by the random amount $W_i - X_i'\beta$;
$S_\gamma(\gamma; O)$ is the weighted log-rank estimating function (Tsiatis, 1990; Wei et al., 1990; Louis, 1981);
Solving $0 = (S_\beta', S_\gamma')'$ leads to a strongly consistent estimator $\hat\theta$ of $\theta = (\beta', \gamma')'$;
$n^{1/2}(\hat\theta - \theta_0)$ is asymptotically normal
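CLOSED-FORM SOLUTION
Note that $S_\gamma(\gamma; O)$ does not depend on $\beta$, so one can solve for the mark-scale coefficients in two stages.
Let $\hat\gamma_\phi$ be the weighted log-rank estimator of $\gamma$. Because $S_\beta$ is linear in $\beta$, solving $0 = S_\beta(\beta, \hat\gamma_\phi; O)$ gives
$$\tilde\beta_\phi = \left[\sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \hat\gamma_\phi)\{X_i - \bar X(t, \hat\gamma_\phi)\} X_i' \, dN_i(t, \hat\gamma_\phi)\right]^{-1} \times \left[\sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \hat\gamma_\phi)\{X_i - \bar X(t, \hat\gamma_\phi)\} W_i \, dN_i(t, \hat\gamma_\phi)\right].$$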
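Since the integrals are sums over uncensored subjects evaluated at $t = U_i - X_i'\hat\gamma_\phi$, the closed form amounts to a single $d \times d$ linear solve. A NumPy sketch, assuming the user supplies $\hat\gamma_\phi$, the covariate matrix has no intercept column, and $\phi$ is either the Gehan weight (fraction at risk) or the log-rank weight $\phi \equiv 1$; `huang_beta` is a hypothetical name.

```python
import numpy as np


def huang_beta(gamma_hat, W, U, delta, X, weight="gehan"):
    """Closed-form solution of 0 = S_beta(beta, gamma_hat; O)."""
    n, d = X.shape
    e = U - X @ gamma_hat                 # time-scale residuals at gamma_hat
    A = np.zeros((d, d))                  # sum of phi (X_i - Xbar) X_i'
    b = np.zeros(d)                       # sum of phi (X_i - Xbar) W_i
    for i in np.flatnonzero(delta == 1):
        at_risk = e >= e[i]
        xbar = X[at_risk].mean(axis=0)
        phi = at_risk.sum() / n if weight == "gehan" else 1.0
        c = phi * (X[i] - xbar)
        A += np.outer(c, X[i])
        b += c * W[i]
    return np.linalg.solve(A, b)
```

When the uncensored marks satisfy $W_i = X_i'\beta$ exactly, the right-hand sum equals $A\beta$, so the function returns $\beta$ for any supplied $\hat\gamma$; this makes a convenient sanity check.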
METHODS
LIMITATIONS OF HUANG'S ESTIMATOR
Despite its robustness on the time scale, Huang's estimator can be quite sensitive to outliers on the mark (cost) scale
Given the numerous articles describing the unusual features of cost data (e.g., strong right skewness), robustness on the mark scale seems at least as important as robustness on the time scale
Huang (2002) noted that one could construct a robust marked process through $N_i^\dagger(t, \theta) = \psi(W_i - X_i'\beta) N_i(t, \gamma)$, where $\psi(\cdot)$ is Huber's $\psi$ function, but offered no details on such an extension
I contend that such a robust extension would be nontrivial and would not follow from the standard arguments for robust M-estimators
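To make the sensitivity concrete, the sketch below compares the jump sizes of Huang's marked process, $W_i - X_i'\beta$, with Huber-clipped jumps $\psi(W_i - X_i'\beta)$ when one mark is an outlier. The tuning constant $k = 1.345$ is a conventional choice assumed here, and the sketch ignores the residual scale, which a practical Huber M-estimator would also need to estimate; `huber_psi` and `jump_sizes` are hypothetical names.

```python
import numpy as np


def huber_psi(r, k=1.345):
    """Huber's psi: identity on [-k, k], constant +/- k outside."""
    return np.clip(r, -k, k)


def jump_sizes(W, X, beta, delta, psi=None):
    """Jump sizes of the marked process for uncensored subjects.

    psi=None gives Huang's raw residual jumps W_i - X_i'beta;
    otherwise psi is applied to the residual first.
    """
    resid = W - X @ beta
    if psi is not None:
        resid = psi(resid)
    return resid[delta == 1]


X = np.array([[1.0], [2.0], [3.0]])
W = np.array([1.2, 1.9, 30.0])            # the third mark is an outlier
delta = np.array([1, 1, 1])
raw = jump_sizes(W, X, np.array([1.0]), delta)              # approx [0.2, -0.1, 27.0]
clipped = jump_sizes(W, X, np.array([1.0]), delta, huber_psi)  # approx [0.2, -0.1, 1.345]
```

The outlying mark dominates the raw jumps but contributes at most $k$ after clipping.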
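SUFFICIENT CONDITION FOR CONSISTENCY
Let
$\tilde Y_i(\beta) = W_i - X_i'\beta$
$f(\cdot)$ be a monotone function
Define $N_i^\dagger(t, \theta) = f\{\tilde Y_i(\beta)\} N_i(t, \gamma)$ and
$$\begin{aligned} S_\beta(\beta, \gamma; O) &= \sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} \, dN_i^\dagger(t, \theta) \\ &= \sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} f\{\tilde Y_i(\beta)\} \, dN_i(t, \gamma). \end{aligned}$$
Solving $0 = S_\beta(\beta, \gamma_0; O)$ leads to a consistent estimator of $\beta$.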
RANK-BASED ESTIMATOR
Let
$R\{\tilde Y_i(\beta)\}$ be the rank of $\tilde Y_i(\beta)$ among the uncensored residuals
Define a new estimator $\hat\theta$ as the solution to $0 = (S_\beta', S_\gamma')'$ with
$$S_\beta(\beta, \gamma; O) = \sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} R\{\tilde Y_i(\beta)\} \, dN_i(t, \gamma).$$
Under the same conditions as in Huang (2002), one can show that $\hat\theta$ is strongly consistent and $n^{1/2}(\hat\theta - \theta_0)$ is asymptotically normal, using somewhat different arguments than Huang (2002).
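A NumPy/SciPy sketch of the rank-based mark-scale score with a Gehan-type weight. Because $S_\beta$ is a step function of $\beta$, the sketch looks for an approximate root by minimizing $\|S_\beta\|^2$ with the derivative-free Nelder-Mead method; that solver choice is an assumption of this sketch, not a method stated on the slide, and `rank_score` and `rank_beta` are hypothetical names.

```python
import numpy as np
from scipy.optimize import minimize


def rank_score(beta, gamma, W, U, delta, X):
    """S_beta with R{Y~_i(beta)} in place of the raw residual (Gehan weight)."""
    n, d = X.shape
    e = U - X @ gamma                     # time-scale residuals T~_i(gamma)
    y_res = W - X @ beta                  # mark-scale residuals Y~_i(beta)
    unc = np.flatnonzero(delta == 1)
    S = np.zeros(d)
    for i in unc:
        at_risk = e >= e[i]
        xbar = X[at_risk].mean(axis=0)
        phi = at_risk.sum() / n           # Gehan weight
        rank = np.sum(y_res[unc] <= y_res[i])   # rank among uncensored residuals
        S += phi * (X[i] - xbar) * rank
    return S


def rank_beta(gamma_hat, W, U, delta, X, beta0):
    """Approximate root of S_beta(., gamma_hat) by minimizing its squared norm."""
    def objective(b):
        return float(np.sum(rank_score(b, gamma_hat, W, U, delta, X) ** 2))
    return minimize(objective, beta0, method="Nelder-Mead").x
```

Because the ranks depend on the uncensored residuals only through their order, adding a constant to every mark leaves the score unchanged, which is one reason the estimator is insensitive to the location of extreme marks.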
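MARTINGALE-BASED ARGUMENTS
For Huang's (2002) estimator,
$$S_\beta(\beta, \gamma; O) = \sum_{i=1}^n \int_{-\infty}^{\infty} \underbrace{\phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} \psi\{\tilde Y_i(\beta)\} \, dN_i(t, \gamma)}_{\mathcal{F}_t\text{-measurable}},$$
where
$$\mathcal{F}_t = \sigma\{N_i(u, \gamma),\ I(U_i - X_i'\gamma \le u, \Delta_i = 0),\ Y_i \cdot I(U_i - X_i'\gamma \le u, \Delta_i = 1),\ X_i;\ u \le t,\ i = 1, \ldots, n\}.$$
Evidently, this is not true for the rank process $R\{\tilde Y_i(\beta)\}$: the rank depends on the marks of all uncensored subjects, including those whose time-scale residuals exceed $t$, so it is not $\mathcal{F}_t$-measurable.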
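REWRITE SCORE AS FUNCTION OF EMPIRICAL MEASURE
WLOG, write $R\{\tilde Y_i(\beta)\} = \sum_j I\{\tilde Y_j(\beta) \le \tilde Y_i(\beta), \Delta_j = 1\}$ and let $\tilde T_i(\gamma) = U_i - X_i'\gamma$. Since $N_i(\cdot, \gamma)$ jumps only at $\tilde T_i(\gamma)$, and only if $\Delta_i = 1$,
$$\begin{aligned} S_\beta(\beta, \gamma; O) &= \sum_{i=1}^n \int_{-\infty}^{\infty} \phi(t, \gamma)\{X_i - \bar X(t, \gamma)\} R\{\tilde Y_i(\beta)\} \, dN_i(t, \gamma) \\ &= \sum_{i=1}^n \Delta_i \phi\{\tilde T_i(\gamma), \gamma\} [X_i - \bar X\{\tilde T_i(\gamma), \gamma\}] R\{\tilde Y_i(\beta)\} \\ &= \sum_{i=1}^n \sum_{j=1}^n \Delta_i \Delta_j \phi\{\tilde T_i(\gamma), \gamma\} [X_i - \bar X\{\tilde T_i(\gamma), \gamma\}] I\{\tilde Y_j(\beta) \le \tilde Y_i(\beta)\}. \end{aligned}$$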
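The last equality can be checked numerically. The sketch below, assuming the log-rank weight $\phi \equiv 1$ for simplicity, evaluates the single-sum form and the pairwise double-sum form of the rank score on simulated data; the function names and the simulation design are illustrative only.

```python
import numpy as np


def centered_covariates(e, X):
    """Rows X_i - Xbar{T~_i(gamma), gamma} for every subject i."""
    return np.array([X[i] - X[e >= e[i]].mean(axis=0) for i in range(len(e))])


def score_single_sum(beta, gamma, W, U, delta, X):
    e, y = U - X @ gamma, W - X @ beta
    cx = centered_covariates(e, X)
    # R{Y~_i} = sum_j I{Y~_j <= Y~_i, Delta_j = 1}
    ranks = np.array([np.sum((y <= y[i]) & (delta == 1)) for i in range(len(y))])
    return ((delta * ranks)[:, None] * cx).sum(axis=0)


def score_double_sum(beta, gamma, W, U, delta, X):
    e, y = U - X @ gamma, W - X @ beta
    cx = centered_covariates(e, X)
    # pair[i, j] = Delta_i Delta_j I{Y~_j <= Y~_i}
    pair = (delta[:, None] * delta[None, :]) * (y[None, :] <= y[:, None])
    return (pair.sum(axis=1)[:, None] * cx).sum(axis=0)
```

The double-sum form is the one that exposes $S_\beta$ as a functional of the empirical measure of pairs, which is what the asymptotic arguments work with.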
INFERENCE VIA MULTIPLIER BOOTSTRAP
Jin et al. (2006) presented a resampling scheme for the Buckley-James estimator, and we can adopt a similar strategy here.
1. Simulate $(Z_1, \ldots, Z_n)$ i.i.d. such that $E(Z_i) = \mathrm{var}(Z_i) = 1$ (e.g., standard exponential)
2. Compute the estimate $\hat\gamma_\phi^*$ by minimizing the $k$-step perturbed weighted Gehan loss function
3. Compute the estimate $\hat\beta_\phi^*$ by solving the perturbed system $0 = S_\beta^*(\beta, \hat\gamma_\phi^*; O, Z)$, where
$$S_\beta^*(\beta, \gamma; O, Z) = \sum_{i=1}^n Z_i \int_{-\infty}^{\infty} \phi^*(t, \gamma)\{X_i - \bar X^*(t, \gamma)\} R\{\tilde Y_i(\beta)\} \, dN_i(t, \gamma)$$
4. Repeat steps 1-3 $B$ times and use the Wald or percentile method to form confidence intervals
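A compact NumPy/SciPy sketch of steps 1-4 under simplifying assumptions that depart from the slide: it minimizes a plain (one-step) $Z$-perturbed Gehan loss, $\sum_i \sum_j Z_i Z_j \Delta_i (e_i - e_j)^-$, rather than the $k$-step version; it reads $\phi^*$ and $\bar X^*$ as $Z$-weighted versions of the Gehan weight and the at-risk mean; and it finds approximate roots with Nelder-Mead. All function names are hypothetical. $Z_i \sim \mathrm{Exp}(1)$ satisfies $E(Z_i) = \mathrm{var}(Z_i) = 1$.

```python
import numpy as np
from scipy.optimize import minimize


def perturbed_gehan_loss(gamma, U, delta, X, Z):
    """sum_i sum_j Z_i Z_j Delta_i (e_i - e_j)^-, with a^- = max(-a, 0)."""
    e = U - X @ gamma
    neg_part = np.maximum(e[None, :] - e[:, None], 0.0)   # [i, j] = (e_i - e_j)^-
    return float(np.sum((Z * delta)[:, None] * Z[None, :] * neg_part))


def perturbed_rank_score(beta, gamma, W, U, delta, X, Z):
    """S*_beta: Z-weighted rank score with Z-weighted phi* and Xbar*."""
    n, d = X.shape
    e = U - X @ gamma
    y = W - X @ beta
    unc = np.flatnonzero(delta == 1)
    S = np.zeros(d)
    for i in unc:
        risk = e >= e[i]
        z_risk = Z[risk]
        xbar = z_risk @ X[risk] / z_risk.sum()   # Xbar*(t, gamma)
        phi = z_risk.sum() / n                   # Gehan-type phi*(t, gamma)
        rank = np.sum(y[unc] <= y[i])
        S += Z[i] * phi * (X[i] - xbar) * rank
    return S


def fit(W, U, delta, X, Z):
    """Steps 2-3: gamma* from the Gehan loss, then beta* from S*_beta."""
    d = X.shape[1]
    gamma = minimize(perturbed_gehan_loss, np.zeros(d),
                     args=(U, delta, X, Z), method="Nelder-Mead").x

    def objective(b):
        return float(np.sum(perturbed_rank_score(b, gamma, W, U, delta, X, Z) ** 2))

    beta = minimize(objective, np.zeros(d), method="Nelder-Mead").x
    return beta, gamma


def multiplier_bootstrap(W, U, delta, X, B=200, seed=0):
    """Steps 1 and 4: B replicates of beta* with Z_i ~ Exp(1)."""
    rng = np.random.default_rng(seed)
    draws = np.empty((B, X.shape[1]))
    for r in range(B):
        Z = rng.exponential(1.0, size=len(U))   # E(Z_i) = var(Z_i) = 1
        draws[r], _ = fit(W, U, delta, X, Z)
    return draws
```

With `Z = np.ones(n)`, `fit` returns the unperturbed point estimates; a Wald interval is then `beta_hat ± 1.96 * draws.std(axis=0, ddof=1)`, and a percentile interval is `np.percentile(draws, [2.5, 97.5], axis=0)`.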
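SIMULATIONS
SCENARIO
Simulate data according to the model
$$\begin{pmatrix} Y_i \\ T_i \end{pmatrix} = \begin{pmatrix} X_i'\beta \\ X_i'\gamma \end{pmatrix} + \varepsilon_i, \quad i = 1, \ldots, n,$$
where $\beta = (1, 0.5)'$, $\gamma = (1, 1)'$;
$X_i$ are standard normal;
$C_i \sim \mathrm{Un}(0, 6)$;
and $\varepsilon_i$ are i.i.d. with contamination probability $\pi$,
$$(1 - \pi) \times N(0, \Omega) + \pi \times N\left\{\begin{pmatrix} 10 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right\},$$
with $\Omega = \{\omega_{jk}\}$, $\omega_{11} = \omega_{22} = 1$, and $\omega_{12} = \omega_{21} = 0.5$.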
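A NumPy sketch of this data-generating mechanism. The slide does not say on which scale $C_i$ acts, so the sketch assumes it censors $T_i$ directly; `simulate_marked` is a hypothetical name.

```python
import numpy as np


def simulate_marked(n, pi, rng):
    """Draw (W_i, U_i, Delta_i, X_i) from the contaminated bivariate model."""
    beta = np.array([1.0, 0.5])
    gamma = np.array([1.0, 1.0])
    omega = np.array([[1.0, 0.5], [0.5, 1.0]])
    X = rng.standard_normal((n, 2))
    # errors: N(0, Omega) w.p. 1 - pi, N((10, 0)', I_2) w.p. pi
    eps = rng.multivariate_normal(np.zeros(2), omega, size=n)
    contaminated = rng.uniform(size=n) < pi
    eps[contaminated] = rng.multivariate_normal(
        np.array([10.0, 0.0]), np.eye(2), size=int(contaminated.sum()))
    Y = X @ beta + eps[:, 0]            # mark (e.g., cost)
    T = X @ gamma + eps[:, 1]           # (log) event time
    C = rng.uniform(0.0, 6.0, size=n)   # censoring, assumed on the scale of T
    delta = (T <= C).astype(int)
    return Y * delta, np.minimum(T, C), delta, X
```

The contamination shifts only the mark component by 10, so it creates outlying costs while leaving the time scale untouched.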
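TABLE: Monte Carlo simulation results (columns grouped by contamination probability: 0%, 1%, 2.5%)

| n | Method | Weight | Parameter | Bias (0%) | SE (0%) | SEE (0%) | Bias (1%) | SE (1%) | SEE (1%) | Bias (2.5%) | SE (2.5%) | SEE (2.5%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | Huang | Gehan | β1 | 5 | 201 | 184 | -5 | 247 | 205 | 5 | 313 | 256 |
| 40 | Huang | Gehan | β2 | -5 | 202 | 185 | 6 | 256 | 206 | 16 | 301 | 254 |
| 40 | Huang | Log-rank | β1 | 1 | 175 | 170 | -7 | 209 | 190 | 6 | 273 | 241 |
| 40 | Huang | Log-rank | β2 | -6 | 189 | 173 | 8 | 235 | 190 | 18 | 297 | 248 |
| 40 | Robust | Gehan | β1 | 7 | 220 | 204 | -8 | 234 | 216 | -2 | 245 | 230 |
| 40 | Robust | Gehan | β2 | -7 | 220 | 205 | -6 | 227 | 214 | -5 | 232 | 230 |
| 40 | Robust | Log-rank | β1 | -1 | 196 | 236 | -7 | 204 | 248 | -3 | 208 | 265 |
| 40 | Robust | Log-rank | β2 | -8 | 206 | 248 | -2 | 211 | 252 | -2 | 206 | 271 |
| 60 | Huang | Gehan | β1 | 3 | 185 | 153 | 12 | 203 | 177 | 8 | 287 | 238 |
| 60 | Huang | Gehan | β2 | 7 | 179 | 154 | -16 | 222 | 181 | -38 | 274 | 230 |
| 60 | Huang | Log-rank | β1 | 1 | 162 | 141 | 9 | 193 | 163 | 2 | 261 | 225 |
| 60 | Huang | Log-rank | β2 | 1 | 155 | 142 | -13 | 208 | 166 | -25 | 255 | 226 |
| 60 | Robust | Gehan | β1 | 1 | 183 | 165 | 4 | 176 | 166 | 2 | 178 | 178 |
| 60 | Robust | Gehan | β2 | 7 | 179 | 163 | -13 | 184 | 172 | -19 | 184 | 181 |
| 60 | Robust | Log-rank | β1 | -2 | 172 | 169 | 4 | 166 | 175 | 3 | 169 | 184 |
| 60 | Robust | Log-rank | β2 | -2 | 150 | 168 | -8 | 167 | 174 | -11 | 169 | 182 |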
DATA ANALYSIS
ANALYSIS OF SWOG 9509
Randomized clinical trial of untreated patients with advanced non-small cell lung cancer
408 study participants randomized to two groups:
(1) Paclitaxel plus Carboplatin
(2) Vinorelbine plus Cisplatin
Objective: What is the expected difference in lifetime medical costs between the two treatments after adjusting for age and LDH?
MORE NOTES ON ANALYSIS
Cost of resource utilization
Resource utilization was monitored as part of the study
Collected at 3, 6, 12, 18, and 24 months
Costs assigned using standard procedures
Adjusted to 1998 dollars
10 participants had insufficient data and were removed, leaving 398 for analysis
TABLE: Analysis results for the lung cancer data on the log scale

| Weight | Variable | Huang: Estimate | Huang: SE | Huang: 95% CI | Robust: Estimate | Robust: SE | Robust: 95% CI |
|---|---|---|---|---|---|---|---|
| Gehan | Treatment | 0.405 | 0.134 | (0.140, 0.652) | 0.357 | 0.131 | (0.103, 0.612) |
| Gehan | LDH | 0.164 | 0.138 | (-0.094, 0.446) | 0.156 | 0.131 | (-0.102, 0.390) |
| Gehan | Age | -0.069 | 0.066 | (-0.202, 0.066) | -0.081 | 0.062 | (-0.200, 0.046) |
| Log-rank | Treatment | 0.338 | 0.121 | (0.109, 0.582) | 0.300 | 0.113 | (0.086, 0.532) |
| Log-rank | LDH | 0.141 | 0.121 | (-0.117, 0.366) | 0.118 | 0.112 | (-0.102, 0.336) |
| Log-rank | Age | -0.050 | 0.056 | (-0.168, 0.058) | -0.059 | 0.052 | (-0.168, 0.035) |
TABLE: Analysis results for the lung cancer data on the natural scale

| Weight | Variable | Huang: Estimate | Huang: SE | Huang: 95% CI | Robust: Estimate | Robust: SE | Robust: 95% CI |
|---|---|---|---|---|---|---|---|
| Gehan | Treatment | 8.251 | 3.485 | (1.266, 14.757) | 9.195 | 3.308 | (2.974, 15.680) |
| Gehan | LDH | 2.763 | 3.465 | (-3.892, 9.118) | 3.516 | 3.285 | (-2.918, 9.640) |
| Gehan | Age | -1.897 | 1.652 | (-5.219, 1.406) | -2.085 | 1.580 | (-5.353, 1.075) |
| Log-rank | Treatment | 8.227 | 3.551 | (1.064, 15.415) | 8.998 | 3.267 | (2.612, 15.799) |
| Log-rank | LDH | 3.588 | 3.559 | (1.064, 15.415) | 3.380 | 3.201 | (-3.234, 9.486) |
| Log-rank | Age | -1.232 | 1.677 | (-5.219, 1.406) | -1.793 | 1.516 | (-5.103, 0.993) |
SOME REMARKS ABOUT SWOG 9509
Lifetime medical cost is significantly higher in the Paclitaxel plus Carboplatin group than in the Vinorelbine plus Cisplatin group
Using the rank-based results on the natural scale, the average difference is ≈ $9,100 USD
With the log-rank weight, the 95% confidence interval for the treatment effect from the rank-based estimator is about $1,200 shorter than that from Huang's estimator (about $800 shorter with the Gehan weight)
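The interval-length comparison can be reproduced from the treatment rows of the natural-scale table. The short Python check below assumes, as the ≈ $9,100 figure suggests, that the natural-scale entries are in thousands of dollars.

```python
# Treatment-effect 95% CIs on the natural scale, copied from the table
ci = {
    "Gehan": {"Huang": (1.266, 14.757), "Robust": (2.974, 15.680)},
    "Log-rank": {"Huang": (1.064, 15.415), "Robust": (2.612, 15.799)},
}

shorter_by = {}
for weight, by_method in ci.items():
    huang_len = by_method["Huang"][1] - by_method["Huang"][0]
    robust_len = by_method["Robust"][1] - by_method["Robust"][0]
    shorter_by[weight] = huang_len - robust_len   # in table units

for weight, diff in shorter_by.items():
    # assumes table units of $1,000
    print(f"{weight}: rank-based CI shorter by about ${1000 * diff:,.0f}")
# Gehan: rank-based CI shorter by about $785
# Log-rank: rank-based CI shorter by about $1,164
```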
DISCUSSION
SOME CONCLUDING REMARKS
Proposed a robust, rank-based extension of the estimator of Huang (2002)
In simulation studies, the new estimator performed as well as or better than Huang's "semi-rank-based" estimator
Proposed a resampling scheme to approximate the sampling distribution of the coefficient estimator
The resampling scheme performed well in simulation studies. It should work equally well for any calibration-type estimator.
WHAT ABOUT COST ACCUMULATION?
Let $A(t)$ be the cumulative cost accrued by time $t$. Then
$Y = A(T)$
$Y^L = A(T \wedge L)$
$W^L = A(T \wedge L \wedge C)$
but $W \equiv Y \cdot I(T \le C) \ne A(T \wedge C)$
So, if cost data are collected on participants with censored failure times, then the current approach, which discards the costs accrued before censoring, may be somewhat wasteful.
For other problems, like HIV resistance, marks may not be collected or available until failure occurs.
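A toy illustration of the last point, assuming a hypothetical record format in which each participant's costs are logged as (time, amount) pairs: for a participant censored before failure, $W = 0$ even though $A(T \wedge C)$ is positive. The helper `cumulative_cost` and all numbers are made up for illustration.

```python
import numpy as np


def cumulative_cost(times, amounts, t):
    """A(t): total cost recorded at or before time t (hypothetical format)."""
    times = np.asarray(times, dtype=float)
    amounts = np.asarray(amounts, dtype=float)
    return float(amounts[times <= t].sum())


# one hypothetical participant: costs logged at 0.25, 0.5 and 1.0 years
times, amounts = [0.25, 0.5, 1.0], [10.0, 4.0, 6.0]
T, C, L = 2.0, 0.75, 1.5           # failure time, censoring time, horizon

Y = cumulative_cost(times, amounts, T)               # A(T) = 20.0
Y_L = cumulative_cost(times, amounts, min(T, L))     # A(T ^ L) = 20.0
W = Y * (T <= C)                                     # Y * I(T <= C) = 0.0
A_obs = cumulative_cost(times, amounts, min(T, C))   # A(T ^ C) = 14.0
```

Here the 14.0 units of cost accrued before censoring are observed but play no role in $W$.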