
A Broad Framework for Joint Modeling
And Some Tales From the Unexpected
Geert Molenberghs
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat)
Universiteit Hasselt & Katholieke Universiteit Leuven, Belgium
[email protected] & [email protected]
www.ibiostat.be
North Carolina State University, Raleigh, July 13, 2013
with: M. Aerts, M. Davidian, M.G. Kenward, D. Rizopoulos, A.A. Tsiatis, G. Verbeke
Common Ground for Various Settings

Sequential trials                outcome & sample size
Incomplete data                  outcome(s) & missingness process
Completely random sample size    outcome & sample size
‘Informative’ cluster size       outcome & cluster size
Classical survival               time to event & censorship
(Narrow) joint modeling          time to event(s) & longitudinal process
Random observation times         outcome & measurement schedule

A Classic:
Joint Models with Missing Data
f(y_i, r_i | X_i, θ, ψ)

Selection models:        f(y_i | X_i, θ) · f(r_i | X_i, y_i^o, y_i^m, ψ)

  MCAR: f(r_i | X_i, ψ)  −→  MAR: f(r_i | X_i, y_i^o, ψ)  −→  MNAR: f(r_i | X_i, y_i^o, y_i^m, ψ)

Pattern-mixture models:  f(y_i | X_i, r_i, θ) · f(r_i | X_i, ψ)

Shared-parameter models: f(y_i | X_i, b_i, θ) · f(r_i | X_i, b_i, ψ)

Generic Setting
f(y_i, c_i | θ, ψ) = f(y_i | θ) · f(c_i | y_i, ψ)
                   = f(y_i | c_i, θ, ψ) · f(c_i | θ, ψ)

Setting                          Y_i    C_i
Sequential trials                Y_i    N
Incomplete data                  Y_i    R_i
Completely random sample size    Y_i    N
‘Informative’ cluster size       Y_i    T_i
Classical survival               T_i    C_i
(Narrow) joint modeling          Y_i    T_i, . . .
Random observation times         Y_i    T_i

Fundamental Concepts
Ignorability:

  f(c_i | y_i, ψ) = f(c_i | y_i^o, ψ), combined with separability

Ancillarity:

  . S minimally sufficient statistic for θ
  . T | S does not contain information about θ

Completeness:
  . g(·) a measurable function of S
  . E[g(S)] = 0 for all θ  =⇒  g(S) = 0 a.e.

Weights:

  f(y_i | θ*, ψ*) = f(y_i | c_i, θ*) · f(c_i | ψ*) / f(c_i | y_i, θ*, ψ*)
                  = f(y_i | c_i, θ*) · w_i(y_i, c_i, θ*, ψ*)

  MAR: w_i(y_i, c_i) = w_i(y_i^o, c_i)

  Deterministic rule: w_i(y_i^o, c_i) ∈ {0, 1}

Simple Setting
• Observe Y_i ∼ N(μ, 1), i = 1, . . . , n

• Construct the sum statistic  K = Σ_{i=1}^n Y_i

• Apply a stopping rule, e.g., F(α + βK/n)

• If continuation applies, observe Y_i, i = n + 1, . . . , 2n

• Data: (Y_1, . . . , Y_N, N)
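
A minimal simulation sketch of this setting (the values of n, α, β, μ, the seed, and the name `run_trial` are illustrative choices, not taken from the talk):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2013)

def run_trial(mu, n=20, alpha=0.0, beta=2.0):
    """Draw n observations, stop with probability Phi(alpha + beta*K/n),
    otherwise continue and observe n more."""
    y = rng.normal(mu, 1.0, size=n)            # Y_i ~ N(mu, 1), i = 1, ..., n
    k = y.sum()                                # sum statistic K
    if rng.uniform() < norm.cdf(alpha + beta * k / n):
        return y, n                            # stopped at the interim look
    y = np.concatenate([y, rng.normal(mu, 1.0, size=n)])   # i = n+1, ..., 2n
    return y, 2 * n

y, N = run_trial(mu=0.3)
print(f"N = {N}, sample average = {y.mean():.3f}")   # the data are (Y_1, ..., Y_N, N)
```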
Stopping Rule

F(α + βk/n) = Φ(α + βk/n)

• Three important cases:

  β = 0       purely random sample size
  β ≠ 0       probabilistic stopping
  β → +∞      deterministic stopping

Four Relevant Distributions

Writing z = I(N = n) and ν = (α + βμ)/√(1 + β²/n):

Y:      ∏_{i=1}^N φ(y_i; μ)

N | Y:  Φ(α + βk/n)^z · [1 − Φ(α + βk/n)]^{1−z}

Y | N:  ∏_{i=1}^N φ(y_i; μ) · Φ(α + βk/n)^z [1 − Φ(α + βk/n)]^{1−z} / { Φ(ν)^z [1 − Φ(ν)]^{1−z} }

N:      Φ(ν)^z · [1 − Φ(ν)]^{1−z}
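
A quick Monte Carlo check of the marginal distribution of N (illustrative parameter values, not from the talk): the empirical stopping proportion should match Φ(ν).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha, beta, mu, reps = 20, 0.0, 2.0, 0.3, 200_000

k = rng.normal(mu, 1.0, size=(reps, n)).sum(axis=1)              # interim sums K
stopped = rng.uniform(size=reps) < norm.cdf(alpha + beta * k / n)

nu = (alpha + beta * mu) / np.sqrt(1 + beta**2 / n)
print(f"empirical P(N = n) = {stopped.mean():.4f}, Phi(nu) = {norm.cdf(nu):.4f}")
```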
Joint Likelihood

L(μ) = ∏_{i=1}^N φ(y_i; μ) · Φ(α + βk/n)^z · [1 − Φ(α + βk/n)]^{1−z}

The stopping factors depend on the data only through k, not on μ, so up to a constant:

  ℓ(μ) = −½ Σ_{i=1}^N (y_i − μ)²
  S(μ) = Σ_{i=1}^N (y_i − μ)
  H(μ) = −N

=⇒ the joint-likelihood estimator is the ordinary sample average.

Conditional Likelihood
L_c(μ) = ∏_{i=1}^N φ(y_i; μ) · Φ(α + βk/n)^z [1 − Φ(α + βk/n)]^{1−z} / { Φ(ν)^z [1 − Φ(ν)]^{1−z} }

N = n:

  ℓ(μ) = −½ Σ_{i=1}^N (y_i − μ)² − ln Φ(ν)
  S(μ) = Σ_{i=1}^N (y_i − μ) − β̃ · φ(ν)/Φ(ν)
  H(μ) = −N + β̃² · [ν Φ(ν) + φ(ν)] · φ(ν)/Φ(ν)²

N = 2n:

  ℓ(μ) = −½ Σ_{i=1}^N (y_i − μ)² − ln[1 − Φ(ν)]
  S(μ) = Σ_{i=1}^N (y_i − μ) + β̃ · φ(ν)/[1 − Φ(ν)]
  H(μ) = −N − β̃² · {ν [1 − Φ(ν)] − φ(ν)} · φ(ν)/[1 − Φ(ν)]²

(α̃, β̃, and ν are defined on the next slide.)
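
A sketch (illustrative parameter values; `conditional_mle` is a hypothetical helper) contrasting the joint-likelihood estimator, i.e. the sample average, with the conditional-likelihood estimator obtained by solving S(μ) = 0 above via root-finding:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def conditional_mle(y, N, n, alpha, beta):
    at = alpha / np.sqrt(1 + beta**2 / n)      # alpha-tilde
    bt = beta / np.sqrt(1 + beta**2 / n)       # beta-tilde

    def score(mu):                             # S(mu) for the two patterns above
        nu = at + bt * mu
        if N == n:                             # stopped at the interim look
            return (y - mu).sum() - bt * norm.pdf(nu) / norm.cdf(nu)
        return (y - mu).sum() + bt * norm.pdf(nu) / norm.sf(nu)

    return brentq(score, y.mean() - 5, y.mean() + 5)   # root lies near the average

rng = np.random.default_rng(7)
n, alpha, beta, mu = 20, 0.0, 2.0, 0.3
y = rng.normal(mu, 1.0, size=n)
N = n if rng.uniform() < norm.cdf(alpha + beta * y.sum() / n) else 2 * n
if N == 2 * n:
    y = np.concatenate([y, rng.normal(mu, 1.0, size=n)])
print(f"joint MLE = {y.mean():.3f}, conditional MLE = {conditional_mle(y, N, n, alpha, beta):.3f}")
```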
Information, Bias, MSE
• Notation:

  α̃ = α/√(1 + β²/n)
  β̃ = β/√(1 + β²/n)
  ν = (α + βμ)/√(1 + β²/n) = α̃ + β̃μ

• Information:

  I(μ)   = n[2 − Φ(α̃ + β̃μ)]
  I_c(μ) = n[2 − Φ(α̃ + β̃μ)] − β̃² φ(α̃ + β̃μ)² / { Φ(α̃ + β̃μ)[1 − Φ(α̃ + β̃μ)] }

• Counterintuitive?
N is not ancillary
• Some limits:

  β = 0    =⇒  I(μ) ≡ I_c(μ)
  n −→ ∞   =⇒  I(μ) = I_c(μ)
• Both estimators asymptotically unbiased
• The conditional score satisfies E[S_c(μ)] = 0:
  the conditional likelihood estimator is unbiased in finite samples
• Mean squared error
. β < +∞:

  MSE(μ̂)   =  1/(n[2 − Φ(ν)]) + β̃² φ(ν)²/(4n²)
  MSE(μ̂_c) ≃  1/(n[2 − Φ(ν)]) + β̃² φ(ν)² / { [2 − Φ(ν)]² Φ(ν)[1 − Φ(ν)] n² }

. β → +∞:

  MSE(μ̂)   =  1/(n[2 − Φ(√n μ)]) + φ(√n μ)²/(4n)
  MSE(μ̂_c) ≃  1/(n[2 − Φ(√n μ)]) + φ(√n μ)² / { [2 − Φ(√n μ)]² Φ(√n μ)[1 − Φ(√n μ)] n }
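
Plugging illustrative values into the two β < +∞ approximations above (no new theory, just arithmetic):

```python
import numpy as np
from scipy.stats import norm

n, alpha, beta, mu = 20, 0.0, 2.0, 0.3
nu = (alpha + beta * mu) / np.sqrt(1 + beta**2 / n)
bt2 = beta**2 / (1 + beta**2 / n)              # beta-tilde squared
Phi, phi = norm.cdf(nu), norm.pdf(nu)

mse_joint = 1 / (n * (2 - Phi)) + bt2 * phi**2 / (4 * n**2)
mse_cond = 1 / (n * (2 - Phi)) + bt2 * phi**2 / ((2 - Phi)**2 * Phi * (1 - Phi) * n**2)
print(f"MSE(joint) = {mse_joint:.5f}, MSE(conditional) ~ {mse_cond:.5f}")
```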
Likelihood Estimators: Conditional Bias
• Conditional likelihood estimator: conditionally unbiased
• Joint likelihood estimator (β < +∞):

  E(μ̂ | N = n)  = μ + β/√(1 + β²/n) · φ(ν)/(n Φ(ν))
  E(μ̂ | N = 2n) = μ − β/√(1 + β²/n) · φ(ν)/(2n [1 − Φ(ν)])

• Joint likelihood estimator (β → +∞):

  E(μ̂ | N = n)  = μ + φ(√n μ)/(√n Φ(√n μ))
  E(μ̂ | N = 2n) = μ − φ(√n μ)/(2√n [1 − Φ(√n μ)])
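
A Monte Carlo check of the first β < +∞ formula (illustrative values): the average of μ̂ over the stopped trials should sit above μ by β̃ φ(ν)/(n Φ(ν)).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n, alpha, beta, mu, reps = 20, 0.0, 2.0, 0.3, 200_000

k = rng.normal(mu, 1.0, size=(reps, n)).sum(axis=1)            # interim sums
stop = rng.uniform(size=reps) < norm.cdf(alpha + beta * k / n)

bt = beta / np.sqrt(1 + beta**2 / n)
nu = (alpha + beta * mu) / np.sqrt(1 + beta**2 / n)
theory = mu + bt * norm.pdf(nu) / (n * norm.cdf(nu))
print(f"empirical E(mu-hat | N = n) = {(k[stop] / n).mean():.4f}, formula = {theory:.4f}")
```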
• β → +∞ and n → +∞:

  . μ < 0:  E(μ̂ | N = n) → 0,   E(μ̂ | N = 2n) → μ
  . μ > 0:  E(μ̂ | N = n) → μ,   E(μ̂ | N = 2n) → μ/2

  . Yet: asymptotically unbiased
Lack of Completeness −→ Counterintuitive Results

• N fixed =⇒ K complete

• Nice results apply

• Lehmann-Scheffé theorem:

  unbiased + complete + sufficient  =⇒  best mean-unbiased

• Basu’s theorem:

  complete + sufficient  =⇒  independent of any ancillary statistic

• Not so in our case!

• Sufficient statistic is (N, K)

• Joint distribution:

  p_μ(N, k) = p_0(N, k) · exp(kμ − ½Nμ²)

  p_0(n, k)  = φ_n(k) · Φ(α + βk/n)

  p_0(2n, k) = φ_2n(k) · [ 1 − Φ( (α + βk/(2n)) / √((2n + β²)/(2n)) ) ]

• To establish incompleteness, we must find a function g(k, N) satisfying:

  g(k, 2n) · p_0(2n, k) = − ∫ φ_n(k − z) · g(z, n) · φ_n(z) · Φ(α + βz/n) dz

• Example:
  g(k, n)  =  ℓ / Φ(α + βk/n)

  g(k, 2n) = −ℓ / [ 1 − Φ( (α + βk/(2n)) / √((2n + β²)/(2n)) ) ]

  for an arbitrary constant ℓ.
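
A numerical check by quadrature (illustrative n, α, β; the constant ℓ set to 1) that this g has E_μ[g(K, N)] = 0 for several μ without vanishing, so (K, N) is not complete:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

n, alpha, beta = 20, 0.0, 2.0
s2 = np.sqrt(1 + beta**2 / (2 * n))            # = sqrt((2n + beta^2)/(2n))

def g(k, N):                                   # the example above, with l = 1
    if N == n:
        return 1.0 / norm.cdf(alpha + beta * k / n)
    return -1.0 / norm.sf((alpha + beta * k / (2 * n)) / s2)

def p(k, N, mu):                               # p_mu(N, k) = p_0(N, k) exp(k mu - N mu^2 / 2)
    if N == n:
        p0 = norm.pdf(k, 0, np.sqrt(n)) * norm.cdf(alpha + beta * k / n)
    else:
        p0 = norm.pdf(k, 0, np.sqrt(2 * n)) * norm.sf((alpha + beta * k / (2 * n)) / s2)
    return p0 * np.exp(k * mu - N * mu**2 / 2)

for mu in (-0.5, 0.0, 0.5):
    total = sum(quad(lambda k, N=N: g(k, N) * p(k, N, mu), -60, 60)[0] for N in (n, 2 * n))
    print(f"mu = {mu:+.1f}: E[g(K, N)] = {total:.2e}")
```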
Ordinary Sample Average
is Biased and Not Optimal
• Generalized sample average:

  μ̂ = (K/N) · [c · I(N = n) + d · I(N = 2n)]

• Expectation:

  E(μ̂) = dμ + (c − d) μ Φ(ν) + (2c − d)/(2n) · β/√(1 + β²/n) · φ(ν)

Generalized Sample Average: β = 0 (completely random sample size)

• A family of unbiased estimators:

  d = (1 − cΦ)/(1 − Φ)  =⇒  E(μ̂) = μ        (here Φ = Φ(α), since β = 0 gives ν = α)

• Minimum variance:

  c_opt = 1 − (1 − Φ) / ( 2n [μ² + (2 − Φ)/(2n)] )
  d_opt = 1 + Φ / ( 2n [μ² + (2 − Φ)/(2n)] )

• No uniform optimum
• Ordinary sample average is not optimal
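
A Monte Carlo sketch (illustrative values; K is the total sum over the N observations) comparing the ordinary sample average, c = d = 1, with the optimal unbiased pair (c_opt, d_opt) above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, alpha, mu, reps = 20, 0.0, 0.3, 400_000
Phi = norm.cdf(alpha)                           # beta = 0: P(N = n) = Phi(alpha)

denom = 2 * n * (mu**2 + (2 - Phi) / (2 * n))
c_opt, d_opt = 1 - (1 - Phi) / denom, 1 + Phi / denom

stop = rng.uniform(size=reps) < Phi             # completely random sample size
N = np.where(stop, n, 2 * n)
K = rng.normal(N * mu, np.sqrt(N))              # K | N ~ N(N mu, N), simulated directly

for name, est in (("ordinary", K / N), ("optimal ", K / N * np.where(stop, c_opt, d_opt))):
    print(f"{name}: mean = {est.mean():.4f}, MSE = {((est - mu)**2).mean():.5f}")
```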
Generalized Sample Average: β ≠ 0 (stopping rule)

• All generalized sample averages biased

• This includes the ordinary sample average!

• Asymptotically unbiased (as we know from the likelihood)
Generalization 1:
Arbitrary Number of Looks
• Our stopping rule Φ(α + βk/n) for β → +∞ can be generalized to an arbitrary number of looks

  n_1 < n_2 < . . . < n_L

  by applying Φ(α_j + β_j·k/n_j) at the j-th look (a simulation sketch follows below)
• Results carry over
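
A sketch of the multi-look version (the looks n_j, the pairs (α_j, β_j), and the name `run_multilook` are hypothetical choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)

def run_multilook(mu, looks=(10, 20, 40), alphas=(0.0, 0.0), betas=(2.0, 2.0)):
    """Apply Phi(alpha_j + beta_j * K_j / n_j) at each interim look n_1 < ... < n_{L-1};
    if the trial never stops, run on to the final analysis at n_L."""
    y = rng.normal(mu, 1.0, size=looks[0])
    for j, nj in enumerate(looks[:-1]):
        if rng.uniform() < norm.cdf(alphas[j] + betas[j] * y.sum() / nj):
            return y, nj                        # stopped at the j-th look
        y = np.concatenate([y, rng.normal(mu, 1.0, size=looks[j + 1] - nj)])
    return y, looks[-1]

y, N = run_multilook(mu=0.3)
print(f"N = {N}, sample average = {y.mean():.3f}")
```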
Generalization 2:
Completely Random Sample Size
• Allow for sample sizes with probabilities

  n     0    1    · · ·  m
  π_n   π_0  π_1  · · ·  π_m

• Incomplete sufficient statistic: g(z, n) = b_n with

  Σ_{n=0}^m π_n · b_n = 0

• Generalized sample average:

  μ̂ = Σ_{n=0}^m (K/n) · a_n · I(N = n)

• Unbiased generalized sample average: a_n = 1 + b_n

• Optimal estimator:

  a_n = [1/(μ² + σ²/n)] / [Σ_{k=0}^m π_k/(μ² + σ²/k)]
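
A small numeric sketch of the optimal weights (hypothetical π_n over n = 1, . . . , 4 and σ² = 1; n = 0 is left out since K/n is then undefined):

```python
import numpy as np

mu, sigma2 = 0.3, 1.0
ns = np.array([1, 2, 3, 4])
pis = np.array([0.1, 0.2, 0.3, 0.4])           # P(N = n), summing to 1

w = 1.0 / (mu**2 + sigma2 / ns)
a = w / (pis * w).sum()                        # optimal a_n (note: sum_n pi_n a_n = 1)
print("optimal a_n:", np.round(a, 3))

# MSE of mu-hat = a_n K / n given N = n: (a_n - 1)^2 mu^2 + a_n^2 sigma^2 / n
mse = lambda a_: (pis * ((a_ - 1)**2 * mu**2 + a_**2 * sigma2 / ns)).sum()
print(f"MSE optimal = {mse(a):.4f}, MSE ordinary (a_n = 1) = {mse(np.ones_like(a)):.4f}")
```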
Generalization 3:
Missing Data
• Our stopping rule Φ(α + βk/n) for 0 < β < +∞ corresponds to MAR missingness in independent data

• Can be generalized to longitudinal sequences

  Joint likelihood        ≡  selection model representation
  Conditional likelihood  ≡  pattern-mixture model
• Assume, for example, Y_i = (Y_i1, Y_i2):

  . first block always observed
  . second one possibly missing
• Sample average is not an option:

  μ̂_1 = (1/N) Σ_{i=1}^N y_i1

  μ̂_2 = (1/n) Σ_{i=1}^n y_i2 − Σ_21 Σ_11^{−1} [ (1/n) Σ_{i=1}^n y_i1 − (1/N) Σ_{i=1}^N y_i1 ]

  (with n the number of complete cases and N the total number of subjects)
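
A minimal sketch with scalar blocks (all values and the MAR dropout mechanism are illustrative): the completers-only average of the second block is biased, while μ̂_2 above recovers the truth; Σ_21 Σ_11^{−1} is estimated by the completers' regression slope.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
N_tot, mu1, mu2, rho = 5000, 0.0, 1.0, 0.6

y1 = rng.normal(mu1, 1.0, size=N_tot)
y2 = mu2 + rho * (y1 - mu1) + rng.normal(0, np.sqrt(1 - rho**2), size=N_tot)
obs = rng.uniform(size=N_tot) < norm.cdf(0.5 + y1)     # MAR: depends on y_i1 only

slope = np.cov(y1[obs], y2[obs])[0, 1] / np.var(y1[obs], ddof=1)  # Sigma_21 Sigma_11^{-1}
mu1_hat = y1.mean()                                    # first block: all N subjects
mu2_cc = y2[obs].mean()                                # completers only: biased
mu2_hat = mu2_cc - slope * (y1[obs].mean() - mu1_hat)  # the adjusted estimator
print(f"completers-only = {mu2_cc:.3f}, adjusted = {mu2_hat:.3f}, truth = {mu2}")
```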
Generalization 4:
Informative Cluster Sizes
• Either: Model for two random variables:
. Outcomes Y i
. Cluster size ti
• Or: Model for three random variables:
. Outcomes Y i
. Observed cluster size ti
. Theoretical cluster size ni ≥ ti
• Entirely similar to missing-data setting
Generalization 5:
Joint Model for
Longitudinal and Survival Data
• Model for two, three, or four random variables:
. Longitudinal process Y i
. Time-to-event outcome Ti
. Censoring time Ci
. Missing data process Ri
• Formulate an appropriate “shared-parameter model”
• Results carry over
Conclusions
• Commonality across seemingly unrelated settings
• Ordinary sample average loses its ‘perfect status’
  . Already with completely random sample size  ←−  lack of completeness
. More so with deterministic or probabilistic stopping rule
• It nevertheless is a valid estimator:
. Asymptotically unbiased (unbiased for CRSS)
. Almost optimal
. Asymptotically optimal
. Consistent with the likelihood