Multistate Markov Models for Analyzing
Incomplete Life History Data
by
Pai-Lien Chen
Department of Biostatistics
University of North Carolina
Institute of Statistics
Mimeo Series No. 2163T
July 1996
A dissertation submitted to the faculty of the University of North Carolina at
Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of
Philosophy in the Department of Biostatistics.
Chapel Hill
1996
© 1996
Pai-Lien Chen
ALL RIGHTS RESERVED
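Let
$$K_\theta = G(\theta)\left(G(\theta)'\left[\sum_{h=1}^{n} I_h(\theta)\right]^{-1} G(\theta)\right)^{-1} G(\theta)'$$
and define $B_{n;\theta}$ by
$$B_{n;\theta}' B_{n;\theta} = \left[\sum_{h=1}^{n} I_h(\theta)\right]^{-1};$$
we have that $B_{n;\theta} K_\theta B_{n;\theta}'$ is idempotent. Hence
$$\begin{aligned}
\operatorname{trace}\left(B_{n;\theta} K_\theta B_{n;\theta}'\right)
&= \operatorname{rank}\left(B_{n;\theta} K_\theta B_{n;\theta}'\right) \\
&= \operatorname{trace}\left(B_{n;\theta} G(\theta)\left(G(\theta)' B_{n;\theta}' B_{n;\theta} G(\theta)\right)^{-1} G(\theta)' B_{n;\theta}'\right) \\
&= \operatorname{trace}\left(\left(G(\theta)' B_{n;\theta}' B_{n;\theta} G(\theta)\right)\left(G(\theta)' B_{n;\theta}' B_{n;\theta} G(\theta)\right)^{-1}\right) \\
&= \operatorname{trace} I_{(m-1)k(k-1)} \\
&= (m-1)k(k-1).
\end{aligned}$$
Since $B_{n;\theta} K_\theta B_{n;\theta}'$ is idempotent, symmetric, of order $(m-1)k(k-1)$ and rank $(m-1)k(k-1)$, it is the identity matrix. Hence
$$K_\theta = (B_{n;\theta})^{-1}(B_{n;\theta}')^{-1} = \sum_{h=1}^{n} I_h(\theta).$$
Therefore, $Q_L - Q_W \to 0$ in probability, i.e., $Q_L \to \chi^2_{(m-1)k(k-1)}$.
To obtain the asymptotic $\chi^2_{(m-1)k(k-1)}$ distribution for $Q_R$, one can follow the same argument as Sen and Singer (1993), except substituting $\sum_{h=1}^{n} I_h(\theta_0)$ for $I(\theta)$.
•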
Note that if $m_h = m$ for all $h$, that is, we have panel observations, then the test statistics are equivalent to the Pearson goodness-of-fit test statistic.
2.4 Incorporation of Covariates
In many applications, covariates are measured on each individual under study, and interest centers on the relationship between those covariates and the events of the life history. In many cases, the covariates are time dependent. As mentioned in Chapter 1, a time-homogeneous Markov model cannot handle this situation.
Chapter 3
Piecewise Transition Intensity Markov Model
This chapter presents piecewise transition intensity Markov models for analyzing non-panel data. In Section 3.1, we first define a Markov model with a piecewise transition intensity function. A quasi-scoring method is applied to obtain the MLE of the piecewise intensities, and the large sample properties are also discussed. Testing homogeneity and the incorporation of covariates are considered in Sections 3.2 and 3.3. Section 3.4 presents the piecewise Weibull model and the Gompertz model. Section 3.5 discusses incorporating individual heterogeneity into the model.
3.1 First Order Markov Model
We consider models consisting of $k$ states $S = \{1, \ldots, k\}$, and let $X(t)$ denote the state occupied by an individual at time $t$ ($t \ge 0$). We say that $\{X(t) : t \ge 0\}$ is the life history of the individual. Given the individual's life history up to time $t$, $H_t$, transitions between states $(i, j)$, $i \neq j$, are assumed to be governed by transition intensity functions
$$\alpha_{ij}(t \mid H_t) = \lim_{\Delta \downarrow 0} \frac{1}{\Delta} P\{X(t + \Delta) = j \mid X(t) = i, X(s), 0 \le s < t\}, \qquad t \ge 0.$$
2.4.2 Incomplete Observation
In the previous section, we discussed the model with complete panel observations. However, incomplete observations, where states and/or covariates are missing, often occur in real life. There are three possible incomplete data patterns at time $s$:
(i). covariate $z_s$ is missing, state $X(s)$ is observed;
(ii). covariate $z_s$ is observed, state $X(s)$ is missing;
(iii). covariate $z_s$ and state $X(s)$ are both missing.
The last two incomplete data patterns usually arise in the non-panel observation model. For example, $z_{s,1}$ is the covariate representing the patient's age and $z_{s,2}$ is the patient's blood pressure at time $s$. Suppose that the observation time period for subject $h$ is $[t_{h,r-1}, t_{h,r}] = [t_{s-1}, t_{s+1}]$. Then the state at time $s$, $X(s)$, and the patient's blood pressure $z_{s,2}$ are unobservable for subject $h$. The unobserved $X(s)$ can be handled under the Markov assumption. However, we have to impute the unobserved covariates for the non-panel model.
Let $\hat z_{h,s}$ be the estimated covariate for subject $h$ at time $t_s$, at which the covariate is not observed in the study. Several imputation methods are proposed for the missing covariates:
(i). $\hat z_{h,s} = z_{h,s-1}$, where $z_{h,s-1}$ is the observed covariate at the previous time point $t_{s-1}$;
(ii). $\hat z_{h,s}$ is estimated by a linear function of the previous and next observed covariates, i.e.,
$$\hat z_{h,s} = \frac{t_s - t_{s-1}}{t_{s+1} - t_{s-1}}\, z_{h,s+1} + \frac{t_{s+1} - t_s}{t_{s+1} - t_{s-1}}\, z_{h,s-1};$$
(iii). $\hat z_{h,s}$ is estimated by the mean of the other observed covariates at time $s$, i.e.,
$$\hat z_{h,s} = \frac{\sum_{h=1}^{n} z_{h,s}\, y_{h,s}}{\sum_{h=1}^{n} y_{h,s}},$$
where $y_{h,s} = 1$ if subject $h$ is observed at time $t_s$, and 0 otherwise.
The feature of the first two imputation methods is that they use the individual's own information rather than that of others. An individual's unobserved covariates are more likely to resemble that individual's previous and/or next observed values than other individuals' values at the same time point. Note also that the second imputation method is better than the first one when the time distance, $t_{s+1} - t_{s-1}$, is large. The third imputation method does not imply that the mean is a good estimate; rather, it is a convenient one. After substituting the incomplete covariates, we can apply the same procedure as in Section 2.2 to estimate $\beta$.
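The three imputation rules can be sketched in a few lines of Python. This is only an illustration: the function names and the example values are assumptions made here, not part of the original text, and a single scalar covariate is assumed.

```python
def impute_previous(z_prev):
    """Method (i): carry the previously observed covariate forward."""
    return z_prev


def impute_linear(t_prev, t_s, t_next, z_prev, z_next):
    """Method (ii): linear interpolation between the previous and next
    observed covariate values of the same subject."""
    span = t_next - t_prev
    return (t_s - t_prev) / span * z_next + (t_next - t_s) / span * z_prev


def impute_mean(z_values, observed):
    """Method (iii): mean over the subjects that are observed at time t_s.

    z_values[h] is subject h's covariate and observed[h] is the indicator
    y_{h,s} (1 if observed at t_s, 0 otherwise); entries with indicator 0
    are ignored.
    """
    total = sum(z for z, y in zip(z_values, observed) if y)
    count = sum(1 for y in observed if y)
    return total / count


if __name__ == "__main__":
    # Hypothetical blood pressure readings at times 0 and 4, imputed at time 1.
    print(impute_previous(120.0))
    print(impute_linear(0.0, 1.0, 4.0, 120.0, 140.0))
    print(impute_mean([120.0, 130.0, 0.0], [1, 1, 0]))
```

Method (ii) weights the nearer observation more heavily, which is why it tends to be preferable to method (i) when the gap between the neighbouring observations is large.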
$$k_i + \sum_{l \in S_i} \sum_{q \in S_l + \{l\}} \alpha_{ql}(t \mid H_t^-)\,\Delta + o(\Delta)$$
$$\frac{1}{k_i}$$
where $k_i \ge 1$ is the number of elements in $S_i$. Therefore, the transition intensity function for a non-$i$ state transferring to $i \in S$ at time $t$ becomes
$$\alpha_{\cdot i}(t \mid H_t^-) = \frac{1}{k_i} \sum_{j \in S_i} \alpha_{ji}(t \mid H_t^-).$$
It shows that the intensity $\alpha_{\cdot i}(t \mid H_t^-)$ is an average of the intensity functions $\{\alpha_{ji}(t \mid H_t^-)\}_{j \in S_i}$. To simplify the notation, we write $\alpha_{ij}(t) = \alpha_{ij}(t \mid H_t^-)$ and $\alpha_{\cdot i}(t) = \alpha_{\cdot i}(t \mid H_t^-)$ for all $t$. Then the general form of the transition probability of being in state $j$ at time $t_s$ given the previously observed state $i$ ($i \neq j$) at time $t_{s-1}$, $P_{ij}(t_{s-1}, t_s) = P(X(t_s) = j \mid X(t_{s-1}) = i)$, can be displayed as follows:
where $S_j'$ is a subset of $S$, $j \in S$, such that for all $l \neq j$, $l \in S_j'$,
$$P\{X(t + \Delta) = l \mid X(t) = j\} > 0.$$
Suppose for now that a random sample of $n$ individuals is observed at times $t_0 < t_1 < \cdots < t_m$, where the individuals are not necessarily all observed at the same time points. As noted by Anderson et al. (1993), there are no explicit expressions for the general transition probabilities $P_{ij}(t_{s-1}, t_s)$. To deal with nonhomogeneous models, an appropriate and convenient assumption is needed to express the transition probability as a function of the transition intensity functions. In most medical research or chronic disease studies, it is reasonable to assume that at most two transitions can occur within an observed period. In this study, we adopt this restriction in pursuing the properties of the Markov model. Under the assumption, the transition probability can be represented as
$$P_{il}(t_{s-1}, t) = \int_{t_{s-1}}^{t} e^{-\int_{t_{s-1}}^{t'} \sum_{u \in S_i'} \alpha_{iu}(\tau)\,d\tau} \times \alpha_{il}(t') \times e^{-\int_{t'}^{t} \sum_{v \in S_l'} \alpha_{lv}(\tau)\,d\tau}\,dt'$$
and
If $\alpha_{ij}(t \mid H_t) = 0$ for all $j \neq i$ and for all $t$, we call state $i$ an absorbing state; otherwise state $i$ is a transient state. Under the Markov assumption, the transition function
$$\lim_{\Delta \downarrow 0} \frac{P\{X(t + \Delta) = j \mid X(t) = i\}}{\Delta} = \alpha_{ij}(t \mid H_t^-)$$
and
$$\alpha_{ii}(t \mid H_t^-) = -\sum_{j \in S \setminus \{i\}} \alpha_{ij}(t \mid H_t^-), \qquad t \ge 0,$$
where $H_t^-$, a sub-sigma field of $H_t$, stands for the process being in the non-absorbing states up to time $t$. Based on the feature of the periodic observation data, we define a similar transition intensity function $\alpha_{\cdot i}(t \mid H_t^-)$ for a non-$i$ state transferring to $i$, which is
$$\alpha_{\cdot i}(t \mid H_t^-) = \lim_{\Delta \downarrow 0} \frac{1}{\Delta} P\{X(t + \Delta) = i \mid X(t) \neq i, H_t^-\},$$
where $t \ge 0$. By the Markov assumption, we have
$$\begin{aligned}
\alpha_{\cdot i}(t \mid H_t^-)
&= \lim_{\Delta \downarrow 0} \frac{1}{\Delta} \sum_{l \in S_i + \{i\}} P\{X(t + \Delta) = i, X(t) = l \mid X(t) \neq i, H_t^-\} \\
&= \lim_{\Delta \downarrow 0} \frac{1}{\Delta} \sum_{l \in S_i} P\{X(t) = l \mid X(t) \neq i, H_t^-\}\, P\{X(t + \Delta) = i \mid X(t) = l\} \\
&= \sum_{l \in S_i} P\{X(t) = l \mid X(t) \neq i, H_t^-\} \lim_{\Delta \downarrow 0} \frac{1}{\Delta} P\{X(t + \Delta) = i \mid X(t) = l\} \\
&= \sum_{l \in S_i} P\{X(t) = l \mid X(t) \neq i, H_t^-\}\, \alpha_{li}(t \mid H_t^-),
\end{aligned}$$
where $S_i$ is a subset of $S$, $i \in S$, such that for all $l \neq i$, $l \in S_i$,
$$P\{X(t + \Delta) = i \mid X(t) = l\} > 0,$$
and $P\{X(t) = l \mid X(t) \neq i, H_t^-\}$ is the conditional probability of being in state $l$ at time $t$ given that the process had not been in an absorbing state before time $t$.
Suppose that $\alpha_{ij}(t \mid H_t^-)$ is a continuous function for all $t$ and $i, j \in S$; then
$$P\{X(t) = l \mid X(t) \neq i, H_t^-\}$$
$$\sum_{q \in S_l + \{l\}} P\{X(t) = l \mid X(t - \Delta) = q\}$$
If $B = A \neq 0$ then
As in the discussion in Chapter 2, let $P_s = (P_{ij}(t_{s-1}, t_s))_{i,j=1,\ldots,k}$ be the transition probability matrix for subjects in the time interval $[t_{s-1}, t_s)$, and denote by $[P_s]_{ij}$ the element of the matrix $P_s$ in the $i$th row and $j$th column. By the Chapman-Kolmogorov equation, the pseudo transition probability for subject $h$ being in state $j$ at time $t_{h,r}$, given state $i$ at time $t_{h,r-1}$, can be displayed as
$$P^*_{ij}(t_{h,r-1}, t_{h,r}; \alpha) = \left[\prod_{s=1}^{m} P_s^{\delta_{h,r}(s)}\right]_{ij}, \qquad (3.1.3)$$
where
$$\delta_{h,r}(s) = \begin{cases} 1 & \text{if } [t_{s-1}, t_s) \subset [t_{h,r-1}, t_{h,r}) \\ 0 & \text{otherwise} \end{cases}$$
and let $P_s^0$ be an identity matrix.
For example, in the two-state survival model with interval-censored data, the pseudo transition probability that a subject fails in the time interval $(L, R]$ is
where $\delta_s = 1$ if $(t_{s-1}, t_s] \subset [L, R]$, and 0 otherwise; $\delta_q = 1$ if $(t_{q-1}, t_q] \subset [L, t_s]$, and 0 otherwise. The survival function is estimated by
if $t < t_0$
if $t_{j-1} \le t < t_j$
if $t \ge t_m$,
where $1 \le j \le m$.
Note that we might have explicit expression for (3.1.3) as
P/j(th,r-I,th,r;a) =
~ e- :L:'=1 :LuES; 8h (ts-t s -1 )C>ius
LJ LJ
"
,T
IESJ s=l
X
{D 1 ( 1, s) [D 2 ( 1, s)D 3 ( 1, s)
where
+ D 4 ( 1, s)]} ,
(3.1.1 )
3.1.1 Piecewise Transition Intensity
We now apply Peto's (1973) idea to the multistate model again, and define the pseudo transition probability $P^*_{ij}(t_{s-1}, t_s; \alpha)$ for $P_{ij}(t_{s-1}, t_s)$ with the piecewise transition intensity functions $\alpha_{ij}(t) = \alpha_{ijs}$ for all $t \in [t_{s-1}, t_s]$ as
where
$$A = \sum_{v \in S_l'} \alpha_{lvs} - \sum_{u \in S_i'} \alpha_{ius}$$
and
$$B = \sum_{v \in S_j'} \alpha_{jvs} - \sum_{u \in S_i'} \alpha_{ius},$$
provided $A \neq 0$, $B \neq 0$ and $A \neq B$. Note that the first part of (3.1.2) is the probability that the subject did not go to the absorbing states from state $i$ in the time interval $[t_{s-1}, t_s)$.
Note also that if A
,..,)
P*ij ( t s-1, t s,......
_
-
L
e
- L ..-
ES'.
•
= 0 and
(t s -t s_1)CI'jus
If B
CtilsCtIJ·s
B
IESJ
If B = 0 and A
x
B =f. 0 then
=f. 0 then
= A = 0 then
x { 1 - e- B(t s - t s-l l} •
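To make the role of $A$ and $B$ concrete, the following Python sketch evaluates the probability of the single path $i \to l \to j$ within one interval of length `dt` when all intensities are constant, and checks it against a brute-force double integral over the two transition times. The closed form is derived here for illustration under the stated conditions $A \neq 0$, $B \neq 0$, $A \neq B$; it has the same structure as the mixture formula (3.5.27) and is not claimed to be the complete expression (3.1.2). All function names and numerical values are assumptions.

```python
import math


def two_step_probability(a_il, a_lj, lam_i, lam_l, lam_j, dt):
    """Probability of i -> l -> j within an interval of length dt.

    lam_i, lam_l, lam_j are the total exit intensities of states i, l, j
    (sums of alpha over reachable states), assumed constant on the interval.
    Requires A != 0, B != 0 and A != B.
    """
    A = lam_l - lam_i
    B = lam_j - lam_i
    e_i = math.exp(-lam_i * dt)
    e_l = math.exp(-lam_l * dt)
    e_j = math.exp(-lam_j * dt)
    return a_il * a_lj / A * ((e_i - e_j) / B + (e_l - e_j) / (A - B))


def two_step_probability_numeric(a_il, a_lj, lam_i, lam_l, lam_j, dt, n=400):
    """Midpoint-rule check: integrate over 0 < t1 < t2 < dt the density of
    leaving i at t1, leaving l at t2, then staying in j until dt."""
    h = dt / n
    total = 0.0
    for p in range(n):
        t1 = (p + 0.5) * h
        for q in range(p, n):
            t2 = (q + 0.5) * h
            # Cells on the diagonal lie only half inside the region t1 < t2.
            weight = 0.5 if q == p else 1.0
            total += weight * math.exp(
                -lam_i * t1 - lam_l * (t2 - t1) - lam_j * (dt - t2)
            )
    return a_il * a_lj * total * h * h


if __name__ == "__main__":
    args = (0.3, 0.2, 0.5, 0.4, 0.1, 2.0)  # hypothetical intensities and dt
    print(two_step_probability(*args))
    print(two_step_probability_numeric(*args))
```

The degenerate cases $A = 0$, $B = 0$ and $A = B$ make the closed form divide by zero, which is why the text treats them separately.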
Suppose now that $(t_{r^*-1}, t_{r^*})$ is the $r$th successive observation time interval and $M$ is the total number of different successive observation time intervals for individuals. Also let $n_{ijr^*}$ denote the number of individuals in state $i$ at $t_{r^*-1}$ and $j$ at $t_{r^*}$. Then $P^*_{ij}(t_{r^*-1}, t_{r^*}; \alpha)$ is one of the $\{P^*_{ij}(t_{h,r-1}, t_{h,r}; \alpha)\}_{h=1,\ldots,n,\; r=1,\ldots,m_h}$. Therefore, the likelihood can be rewritten as
$$L(\alpha) = \prod_{r=1}^{M} \prod_{i,j=1}^{k} P^*_{ij}(t_{r^*-1}, t_{r^*}; \alpha)^{n_{ijr^*}}. \qquad (3.1.5)$$
Note also that if each individual is observed at times $t_0, t_1, \ldots, t_m$, then we have panel observations and the model is the same as that of Kalbfleisch and Lawless (1985); then (3.1.5) can be written as
$$L(\alpha) = \prod_{s=1}^{m} \prod_{i,j=1}^{k} P^*_{ij}(t_{s-1}, t_s; \alpha)^{n_{ijs}}.$$
Furthermore, we define $D_1, D_2, \ldots, D_{m^*}$, $m^* \le m$, to be the smallest time intervals such that for given $h$ and $r$, $[t_{h,r-1}, t_{h,r}] \subseteq D_q$ for some $q \in \{1, \ldots, m^*\}$, and $\{D_q\}_{1 \le q \le m^*}$ satisfy the conditions, for $q \neq q'$,
$$D_q \cap D_{q'} = \begin{cases} \{t_s\} \text{ for some } s \in \{1, \ldots, m\} & \text{if } |q - q'| = 1 \\ \emptyset & \text{if } |q - q'| > 1 \end{cases} \qquad (3.1.6)$$
and
$$\bigcup_{q=1}^{m^*} D_q = [t_0, t_m]. \qquad (3.1.7)$$
From (3.1.6) and (3.1.7), there exists a unique time interval $D_q$ such that $[t_{s-1}, t_s] \subset D_q$.
Example 3.1: Suppose that there are two subjects in the study. We observe the first subject at time points 0, 2 and 4. The other one is observed at time points 0, 1, 2 and 4. Then the observation time intervals for the first subject are $[t_{1,0}, t_{1,1}] = [0,2]$ and $[t_{1,1}, t_{1,2}] = [2,4]$, and those for the other one are $[t_{2,0}, t_{2,1}] = [0,1]$, $[t_{2,1}, t_{2,2}] = [1,2]$ and $[t_{2,2}, t_{2,3}] = [2,4]$. $(t_0, t_1, t_2, t_3) = (0,1,2,4)$ are the follow-up time points of the study. The
$$D_3(l, s) = \frac{e^{-(t_s - t_{s-1})A_s} - e^{-(t_s - t_{s-1})B_s}}{B_s - A_s}$$
and
A",! = L
aj/''''! - L
/'ES~
B",! = La/v"'! - L
aiu"'!
, = s, s}, q},
uES:
vES;
and bh~r
, = q, S
aiu"'!
uES;
= 1 if [tsl-1,tsl] c [th,r-l,t s]; otherwise 0; also b'h,r
[fh,r-l, fh,r]; otherwise O.
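3.1.2 Maximum Likelihood Estimate
Pseudo likelihood function
We now define a random vector $X_h = (\delta_h^{ij}(r))_{1 \times m_h k^2}$ as in Chapter 2, with the binary random variable
$$\delta_h^{ij}(r) = \begin{cases} 1 & \text{if } X(t_{h,r}) = j, \text{ given } X(t_{h,r-1}) = i \\ 0 & \text{otherwise.} \end{cases}$$
By the definition, $X_1, X_2, \ldots, X_n$ are independent, and the p.d.f. of $X_h$ is
$$f(X_h; \alpha) = \prod_{r=1}^{m_h} \prod_{i,j}^{k} P^*_{ij}(t_{h,r-1}, t_{h,r}; \alpha)^{\delta_h^{ij}(r)} = \prod_{r=1}^{m_h} \prod_{i,j}^{k} \left[\prod_{s=1}^{m} P_s^{\delta_{h,r}(s)}(\alpha)\right]_{ij}^{\delta_h^{ij}(r)}$$
subject to $\sum_{i,j} \delta_h^{ij}(r) = 1$ for all $r = 1, \ldots, m_h$. Then the pseudo likelihood function is
$$L(\alpha) = \prod_{h=1}^{n} \prod_{r=1}^{m_h} \prod_{i,j}^{k} \left[\prod_{s=1}^{m} P_s^{\delta_{h,r}(s)}(\alpha)\right]_{ij}^{\delta_h^{ij}(r)}. \qquad (3.1.4)$$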
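A minimal Python sketch of how the pseudo likelihood can be evaluated for non-panel data follows. It assumes the one-interval matrices $P_s$ are already available (here built for a hypothetical two-state survival model with constant hazard on each interval), multiplies those whose interval lies inside a subject's observation interval, and sums the log of the entry for the observed transition. The helper names and data are illustrative assumptions, and states are indexed from 0.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product for small square matrices."""
    k = len(a)
    return [[sum(a[i][u] * b[u][j] for u in range(k)) for j in range(k)]
            for i in range(k)]


def pseudo_transition_matrix(grid, interval_matrices, start, end):
    """Chapman-Kolmogorov product of the P_s whose interval
    [grid[s-1], grid[s]] lies inside the observation interval [start, end]."""
    k = len(interval_matrices[0])
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for s in range(1, len(grid)):
        if start <= grid[s - 1] and grid[s] <= end:
            result = matmul(result, interval_matrices[s - 1])
    return result


def pseudo_log_likelihood(grid, interval_matrices, histories):
    """histories: one list of (time, state) visits per subject."""
    total = 0.0
    for visits in histories:
        for (t0, i), (t1, j) in zip(visits, visits[1:]):
            p = pseudo_transition_matrix(grid, interval_matrices, t0, t1)
            total += math.log(p[i][j])
    return total


def survival_step(alpha, dt):
    """Two-state (0 = alive, 1 = dead) matrix with constant hazard alpha."""
    e = math.exp(-alpha * dt)
    return [[e, 1.0 - e], [0.0, 1.0]]


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0, 4.0]
    alphas = [0.1, 0.2, 0.3]  # hypothetical piecewise hazards
    mats = [survival_step(a, grid[s + 1] - grid[s]) for s, a in enumerate(alphas)]
    histories = [
        [(0.0, 0), (2.0, 0), (4.0, 1)],            # observed at 0, 2, 4
        [(0.0, 0), (1.0, 0), (2.0, 0), (4.0, 0)],  # observed at 0, 1, 2, 4
    ]
    print(pseudo_log_likelihood(grid, mats, histories))
```

In practice the matrices would be recomputed from the current intensity estimates at each iteration of the fitting algorithm.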
$N_i(t_{r^*-1}) = \sum_{j=1}^{k} n_{ijr^*}$ be the number of subjects in state $i$ at time $t_{r^*-1}$. By first taking the expectation of $n_{ijr^*}$ conditional on $N_i(t_{r^*-1})$ and then using the fact that
we find that
(3.1.11)
which can be estimated by
$$I_{ss',uv,u'v'}(\alpha) = \sum_{r \in \Psi(D_q)} \sum_{i,j=1}^{k}$$
(3.1.12)
Before we perform the quasi-scoring algorithm, we first simplify the notation. Let $\operatorname{vec}(\cdot)$ be an operator stacking the columns of a matrix into a vector. Let $(q_1, \ldots, q_p) \subset \{1, \ldots, m\}$ be such that $[t_{q_j - 1}, t_{q_j}] \subset D_q$, $j = 1, \ldots, p$. We denote by $\Phi_q(\alpha)$ the corresponding $pk^2 \times 1$ vector. Let $\theta_q$ be the vector of length $pk(k-1) \times 1$ obtained by excluding $\alpha_{iiq_j}$ ($i = 1, \ldots, k$, $j = 1, \ldots, p$) from $\Phi_q(\alpha)$. Suppose that $\theta_q^0$ is an initial estimate of $\theta_q$, $S(\theta_q)$ is the $pk(k-1) \times 1$ vector $(S_{uvs}(\alpha))$, and $I(\theta_q)$ is the $pk(k-1) \times pk(k-1)$ matrix $(I_{ss',uv,u'v'}(\alpha))$. Then, by the quasi-scoring algorithm, an updated estimate can be obtained as
$$\theta_q^1 = \theta_q^0 + I(\theta_q^0)^{-1} S(\theta_q^0),$$
where it is assumed that $I(\theta_q^0)$ is nonsingular. The process is repeated with $\theta_q^1$ replacing $\theta_q^0$, and so on until $\theta_q$ converges.
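The scoring iteration can be illustrated on the smallest possible case. The sketch below is an assumption-laden toy, not the dissertation's implementation: a two-state survival model with one interval of length `dt`, a constant hazard `alpha`, `n_at_risk` subjects alive at the start and `n_failed` found dead at the end, so that the failure probability is $1 - e^{-\alpha\,dt}$. The score and expected information are those of this binomial likelihood, and the closed-form MLE $-\log(1 - d/n)/dt$ is available for comparison.

```python
import math


def scoring_constant_hazard(n_at_risk, n_failed, dt, alpha0=0.1,
                            tol=1e-12, max_iter=100):
    """Fisher scoring for a single constant hazard observed over one interval.

    Update rule: alpha_new = alpha_old + score / information.
    """
    alpha = alpha0
    for _ in range(max_iter):
        p = 1.0 - math.exp(-alpha * dt)  # failure probability
        # d/d alpha of  n_failed*log(p) + (n_at_risk - n_failed)*log(1 - p)
        score = n_failed * dt * (1.0 - p) / p - (n_at_risk - n_failed) * dt
        # Expected information: n * (dp/d alpha)^2 / (p * (1 - p))
        information = n_at_risk * dt ** 2 * (1.0 - p) / p
        step = score / information
        alpha += step
        if abs(step) < tol:
            break
    return alpha


if __name__ == "__main__":
    estimate = scoring_constant_hazard(n_at_risk=100, n_failed=30, dt=2.0)
    closed_form = -math.log(1.0 - 30 / 100) / 2.0
    print(estimate, closed_form)
```

Only first derivatives of the log likelihood enter the update, which is the practical appeal of scoring over Newton-Raphson in the multistate setting.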
pairs of successive observation time intervals are $[t_{1^*-1}, t_{1^*}] = [0,1]$, $[t_{2^*-1}, t_{2^*}] = [0,2]$, $[t_{3^*-1}, t_{3^*}] = [1,2]$, and $[t_{4^*-1}, t_{4^*}] = [2,4]$. One can find that $D_1 = [0,2]$ and $D_2 = [2,4]$.
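The construction in Example 3.1 can be reproduced with a short Python sketch. It assumes that the intervals $D_q$ are bounded by exactly those follow-up time points that fall strictly inside no subject's observation interval; the function names are illustrative.

```python
def distinct_observation_intervals(observation_times):
    """All different successive observation intervals, sorted."""
    intervals = set()
    for times in observation_times:
        intervals.update(zip(times, times[1:]))
    return sorted(intervals)


def smallest_common_intervals(observation_times):
    """Intervals D_q: cut [t_0, t_m] at every time point that is not
    strictly interior to any subject's observation interval."""
    intervals = distinct_observation_intervals(observation_times)
    points = sorted({t for times in observation_times for t in times})
    cuts = [t for t in points
            if not any(a < t < b for a, b in intervals)]
    return list(zip(cuts, cuts[1:]))


if __name__ == "__main__":
    subjects = [[0, 2, 4], [0, 1, 2, 4]]  # observation times from Example 3.1
    print(distinct_observation_intervals(subjects))
    print(smallest_common_intervals(subjects))
```

For the two subjects of the example, time point 1 lies inside the first subject's interval $[0,2]$, so it cannot separate two of the $D_q$.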
Let $\Psi(D_q)$ be the subset of $\{1, \ldots, M\}$ such that for all $r \in \Psi(D_q)$, $[t_{r^*-1}, t_{r^*}] \subset D_q$. Then (3.1.5) can be displayed as
$$L(\alpha) = \prod_{q=1}^{m^*} \prod_{r \in \Psi(D_q)} \prod_{i,j=1}^{k} \left[\prod_{s=1}^{m} P_s^{\delta_{r^*}(s)}(\alpha)\right]_{ij}^{n_{ijr^*}}. \qquad (3.1.8)$$
As pointed out by Turnbull (1976), there are two features of (3.1.8). First, when we search for $\alpha$ by maximizing $L(\alpha)$, any intensity function $\alpha_{ij}(t)$, where $t$ is outside the set $C = \bigcup_{q=1}^{m^*} D_q$, cannot be a maximum likelihood estimate of $\alpha$. Second, the likelihood is independent of the behavior of $\alpha$ within each interval $D_q$. Based on these features, we can estimate $\alpha$ independently within each time interval $D_q$.
3.1.3 The Quasi-scoring Procedure
From the pseudo likelihood, we can evaluate the score functions and second derivatives for each $q$, i.e.,
(3.1.9)
and
k
L L
nijr*
{
rEII1(D q ) i,j=l
x
Direct use of a Newton-Raphson algorithm would require the evaluation of both first and second derivatives. Kalbfleisch and Lawless (1985) used the scoring device, in which the second derivatives are replaced by estimates of their expectations. This leads to an algorithm where only first derivatives are needed.
Let
4. As noted by De Gruttola and Lagakos (1989), the MLE of $\theta$ can be nonunique for certain data structures. This nonuniqueness tends to occur as $m$ increases. Use of coarser time points is more likely to lead to a unique MLE, and will reduce the number of parameters and the convergence time of the algorithm. However, if the time is grouped too coarsely, the ability to distinguish subtle transition patterns is diminished. Thus, a trade-off must be made between attempting to capture all of the transition patterns and keeping the number of parameters at a manageable level.
3.1.4 Asymptotic Behavior of MLE
As noted by Rao (1973) and Kalbfleisch and Lawless (1985), with good initial values $\theta_q^0$, the algorithm produces $\hat\theta_q$, which is consistent for $\theta_q$. Hence, $I(\hat\theta_q)$ is a consistent estimate of $I(\theta_q)$. By an argument similar to the proof of Theorem 2.1, it is established that for non-panel studies, $Z_{q,n} = \sqrt{n}(\hat\theta_q - \theta_q)$ has an asymptotically multivariate normal distribution with mean vector 0 and asymptotic covariance matrix $[I(\theta_q)]^{-1}$, which can be estimated consistently by $[I(\hat\theta_q)]^{-1}$. Note that $Z_{q,n}$, $q = 1, \ldots, m^*$, are independent.
We discuss the asymptotic behavior of the estimated pseudo transition probabilities as follows. Suppose that the total number of observation time points, $m$, can increase to infinity in a fixed finite time interval; then we have
Lemma 3.1 Suppose that $\alpha_{ij}(t)$ is a continuous bounded function of $t$ for all $i, j \in S$. If $m \to \infty$ at an appropriate rate such that $\Delta_s = t_s - t_{s-1} \to 0$, then
$$\max_{s \in \{1, \ldots, m\}} \sup_{t \in [t_{s-1}, t_s]} |\alpha_{ijs} - \alpha_{ij}(t)| \to 0 \quad \text{as } m \to \infty;$$
therefore,
$$\sup_{t \in [t_{h,r-1}, t_{h,r}]} |P^*_{ij}(t_{h,r-1}, t; \alpha) - P_{ij}(t_{h,r-1}, t)| \to 0 \quad \text{as } m \to \infty$$
for all $i, j \in S$ and $h = 1, \ldots, n$.
Proof: Suppose that for given $s$, $s = 1, \ldots, m$, there exists one $t^* \in [t_{s-1}, t_s]$ such that
Remark
1. There is an advantage to reparameterizing the model by writing $\alpha_{ijs} = \exp(\lambda_{ijs})$, $i \neq j$. This is because the parameters $\lambda_{ijs}$ can take any real value whereas $\alpha_{ijs} \ge 0$. This reparametrization avoids problems that might arise when an iteration results in parameter vectors $\{\theta\}$ outside the parameter space. Note also that it is possible to have $\alpha_{ijs}$ very close to 0. When this happens, successive iterates of $\lambda_{ijs}$ will typically become large and negative. In this case, we might set $\alpha_{ijs} = 0$ and then fit the model again.
2. In some applications, a constant intensity may not be good enough. For example, the interval $[t_{s-1}, t_s]$ may be too large compared with the others. For smoothing purposes, one might choose $\alpha_{ij}(t)$ to be a linear function of $\alpha_{ij}(t_s)$ and $\alpha_{ij}(t_{s-1})$ for $t \in (t_{s-1}, t_s]$, i.e.,
$$\alpha_{ij}(t) = \frac{t - t_{s-1}}{t_s - t_{s-1}}\,\alpha_{ij}(t_s) + \frac{t_s - t}{t_s - t_{s-1}}\,\alpha_{ij}(t_{s-1}),$$
or a higher order linear combination of $\{\alpha_{ij}(\cdot)\}$.
For example, in two-state survival analysis, suppose that $\lambda = (t - t_{s-1})/(t_s - t_{s-1})$; then
$$\alpha(t) = \frac{t - t_{s-1}}{t_s - t_{s-1}}\,\alpha(t_s) + \frac{t_s - t}{t_s - t_{s-1}}\,\alpha(t_{s-1}).$$
The estimated survival function is
$$S(t) = \begin{cases}
1 & \text{if } t < t_0 \\[4pt]
e^{-\sum_{s=1}^{j-1} \frac{t_s - t_{s-1}}{2}\{\alpha(t_s) + \alpha(t_{s-1})\}} \times e^{-\frac{t - t_{j-1}}{2(t_j - t_{j-1})}\{(t - t_{j-1})\alpha(t_j) + (2t_j - t_{j-1} - t)\alpha(t_{j-1})\}} & \text{if } t_{j-1} < t < t_j \\[4pt]
0 & \text{if } t > t_m,
\end{cases}$$
where $1 \le j \le m$.
3. With regard to the starting values for the iterative procedure, assume that the time points $t_{h,r}$ represent exact transition times between the states for subject $h$. Then the starting values for the transition intensities can be estimated by
where $n_{i \cdot s} = \sum_{j=1}^{k} n_{ijs}$ is the number of individuals staying in state $i$ at time $t_{s-1}$.
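The survival function of Remark 2 can be evaluated with the short Python sketch below. It assumes that `alpha[s]` holds the hazard value at `grid[s]` and that the hazard is linearly interpolated between grid points; the function name and numbers are illustrative. The value 0 beyond the last grid point follows the case distinction stated in the remark.

```python
import math


def survival_linear_hazard(t, grid, alpha):
    """S(t) for a two-state model whose hazard is linear between grid points.

    grid  = [t_0, ..., t_m]
    alpha = [alpha(t_0), ..., alpha(t_m)]
    """
    if t < grid[0]:
        return 1.0
    if t > grid[-1]:
        return 0.0  # convention taken from the case distinction in the text
    cumulative = 0.0
    for j in range(1, len(grid)):
        lo, hi = grid[j - 1], grid[j]
        if t >= hi:
            # Whole interval: trapezoid area under the linear hazard.
            cumulative += (hi - lo) / 2.0 * (alpha[j] + alpha[j - 1])
        else:
            # Partial interval from lo up to t.
            cumulative += (t - lo) / (2.0 * (hi - lo)) * (
                (t - lo) * alpha[j] + (2.0 * hi - lo - t) * alpha[j - 1]
            )
            break
    return math.exp(-cumulative)


if __name__ == "__main__":
    grid = [0.0, 1.0, 3.0]
    alpha = [0.2, 0.4, 0.1]  # hypothetical hazard values at the grid points
    for t in (0.5, 1.0, 2.0, 3.0):
        print(t, survival_linear_hazard(t, grid, alpha))
```

The partial-interval term is the exact integral of the interpolated hazard from $t_{j-1}$ to $t$, so the curve is continuous at the grid points.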
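For $N \ge 1$ let us denote the event exhibited in the last expression, namely:
$$A_N(\epsilon) = \left\{\bigcup_{n > N} \bigcup_{s \ge 1} |\hat\alpha_{ijs} - \alpha_{ijs}| > \epsilon\right\}.$$
Then $A_N(\epsilon)$ is decreasing with $N$. Let
Under the regularity conditions,
$$\hat\alpha_{ijs} - \alpha_{ijs} \to 0 \quad \text{w.p.1 as } n \to \infty \text{ for all } s.$$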
That is there exists a null set N such that
Vw E
For each
n \ N:
the convergence, of
Wo,
Wo E
C and so
Then
B N +1
n\N
C BN and
C
lim
n-.oo
Qijs A
1
= O.
Qijs(W)
almost everywhere implies that
Qijs(WO)
= 1.
P(C)
nN=l B N = <p.
max
to
aijs(wO)
C. Hence
sE{l, ... ,m(n)}
aijs(w) -
Let us denote
BN(f)
= CnAN(f).
By Hewitt-Savage zero-one law
W P 1
Qijs I • .
---+ 0
as
m
--+ 00.
Then
lim
P(
N-.oo
max
sup
U max
lim P( U{ max
N-.oo
n>N sE{l, ... ,m(n)}
lim
P(
N-.oo
<
U
+
1
aijs -
Qijs(t)
sup
1
aijs -
Qijs
aijs -
Qijs
I> f)
n>N sE{l, ... ,m(n)} tE(ts-l,t.]
n>N sE{l, ... ,m(n)} tE(ts_l,t s]
sup
1
1Qijs
sup
-
Qijs(t)
+ Qijs -
Qijs(t)
I> f)
I
I} > f)
sE{l,oo.,m(n)} tE(ts-l,t.]
by Lemma 3.1,
max
sup
sE{l, ... ,m(n)} tE(ts_l,t s ]
1Ct.ijs -
Qij (t)
1---+ 0
as
m --+
00.
Hence,
lim
N -'00
P(
U{
max
1 aijs - Qijs 1
n>N sE{l, ... ,m(n)}
max
sup
sE{l,oo.,m(n)} tE(ts-l ,tsl
+
1aijs -
sup
sup
1 Qijs - Qijs(t) I}
sE{l,.:.,m(n)} tE(tS-l,tsj
Qijs(t)
I~ 0
as
n
> f)
= 0,
--+ 00.
Likewise, for fixed s
as
n
--+ 00.
•
Since $\alpha_{ij}(t)$ is continuous and bounded, $\sum_{s=1}^{m} \alpha_{ijs}$ is a bounded step function and satisfies the Lipschitz condition: for any two $u, v \in \{1, \ldots, m\}$ there exists some $k$, $0 < k < \infty$, such that
$$|\alpha_{iju} - \alpha_{ijv}| < k\,|u - v|.$$
By the property of Lipschitz functions (see, e.g., Dudley, 1989),
$$\sup_{t} \left|\sum_{s=1}^{m} \alpha_{ijs} - \alpha_{ij}(t)\right| \to 0 \quad \text{as } m \to \infty,$$
i.e.,
$$\max_{s \in \{1, \ldots, m\}} \sup_{t \in [t_{s-1}, t_s]} |\alpha_{ijs} - \alpha_{ij}(t)| \to 0 \quad \text{as } m \to \infty.$$
Since $P^*_{ij}(t_{h,r-1}, t; \alpha)$ is a function of $\alpha_{ijs}$, $s = 1, \ldots, m$, by the same argument we have
$$\sup_{t \in [t_{h,r-1}, t_{h,r}]} |P^*_{ij}(t_{h,r-1}, t; \alpha) - P_{ij}(t_{h,r-1}, t)| \to 0 \quad \text{as } m \to \infty$$
for all $i, j \in S$ and $h = 1, \ldots, n$.
•
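Theorem 3.1 Suppose that $m = m(n)$ is a function of $n$ such that $m(n) = o(n)$. Under the regularity conditions, if $\hat\alpha_{ijs}$ is the MLE of $\alpha_{ijs}$, where $i, j \in S$ and $s = 1, \ldots, m(n)$, then
$$\max_{s \in \{1, \ldots, m(n)\}} \sup_{t \in (t_{s-1}, t_s]} |\hat\alpha_{ijs} - \alpha_{ij}(t)| \to 0 \quad \text{w.p.1 as } n \to \infty$$
and
as $n \to \infty$.
Proof: First we show
$$\max_{s \in \{1, \ldots, m(n)\}} |\hat\alpha_{ijs} - \alpha_{ijs}| \to 0 \quad \text{w.p.1}$$
as $m \to \infty$ at an appropriate rate. We note that
$$\begin{aligned}
\lim_{N \to \infty} P\left(\bigcup_{n > N} \sup_{s \in \{1, \ldots, m(n)\}} |\hat\alpha_{ijs} - \alpha_{ijs}| > \epsilon\right)
&= \lim_{N \to \infty} P\left(\bigcup_{n > N} \bigcup_{s=1}^{m(n)} |\hat\alpha_{ijs} - \alpha_{ijs}| > \epsilon\right) \\
&\le \lim_{N \to \infty} P\left(\bigcup_{n > N} \bigcup_{s \ge 1} |\hat\alpha_{ijs} - \alpha_{ijs}| > \epsilon\right).
\end{aligned}$$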
By the same argument as in Theorem 2.4, it can be shown that Wald's statistic has an asymptotic $\chi^2_{(m-1)k(k-1)}$ distribution under $H$.
3.3 Incorporation of Covariates
As discussed in Section 2.4, suppose that each individual has an associated vector of $p$ covariates, $z'(t) = (z_1(t), \ldots, z_p(t))$, and assume that $z(t) = z_s$ for all $t \in [t_{s-1}, t_s)$. Then one model of the pseudo transition function with covariates can be presented as
(3.3.13)
Hence, for panel data, the pseudo likelihood in (2.1.5) is rewritten as
$$L(\beta) = \prod_{s=1}^{m} \prod_{i,j=1}^{k} P^*_{ij}(t_{s-1}, t_s; \beta)^{n_{ijs}}, \qquad (3.3.14)$$
where $n_{ijs}$ is the number of subjects in state $j$ at time $t_s$, given state $i$ at time $t_{s-1}$, and
where
$$A = \sum_{v \in S_l'} \eta_{lvs} - \sum_{u \in S_i'} \eta_{ius},$$
$$B = \sum_{v \in S_j'} \eta_{jvs} - \sum_{u \in S_i'} \eta_{ius},$$
and
$$\eta_{ijs} = e^{z_s' \beta_{ij}}.$$
Note that if $z(s) = z$ for all $s$, then this model reduces to the time-homogeneous model with time-independent covariates.
For a time-dependent coefficient, one can define the piecewise transition function as
3.2 Tests of Hypothesis about Time Homogeneity
In this section, by the same argument as in Section 2.3, we can apply Wald's statistic, the likelihood ratio statistic and the score statistic to test time homogeneity under the piecewise transition intensity assumption.
First let $\theta = (\theta^{1\prime}, \ldots, \theta^{m^*\prime})'$ be an $mk(k-1) \times 1$ parameter vector. We consider testing the hypothesis that the transition intensities do not vary with time; i.e.,
$$H: g(\theta) = 0 \quad \text{vs.} \quad A: g(\theta) \neq 0,$$
where $g: R^{mk(k-1)} \to R^{(m-1)k(k-1)}$ is defined as
$$g(\theta) = \begin{pmatrix} \theta_{k(k-1)1} - \theta_{k(k-1)2} \\ \vdots \\ \theta_{k(k-1)(m-1)} - \theta_{k(k-1)m} \end{pmatrix}.$$
Note that in non-panel studies, the distributions of the random vectors $X_h$, $h = 1, \ldots, n$, are independent but not identical. By a slight extension of the theory under the i.i.d. condition, the three test statistics are still applicable for testing the hypothesis. Here we just consider Wald's statistic. Suppose that $G(\theta) = (\partial/\partial\theta)\, g(\theta)$, i.e.,
$$G(\theta) = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-1 & 1 & \cdots & 0 \\
0 & -1 & \ddots & \vdots \\
\vdots & & \ddots & 1 \\
0 & 0 & \cdots & -1
\end{pmatrix}_{mk(k-1) \times (m-1)k(k-1)}$$
with $\operatorname{rank}(G(\theta)) = (m-1)k(k-1)$. By Section 3.1, the Fisher information matrix can be estimated by an $mk(k-1) \times mk(k-1)$ matrix
$$I(\hat\theta) = \operatorname{Diag}\{I(\hat\theta^1), \ldots, I(\hat\theta^{m^*})\}.$$
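As a rough illustration of the Wald approach, the Python sketch below treats the simplest case of one intensity per interval (for example a two-state survival model), with independent interval estimates and known variances. It assumes the usual quadratic form $g(\hat\theta)'\,[G' V G]^{-1} g(\hat\theta)$, where $V$ is the diagonal covariance matrix of the estimates; this form, the function names and the numbers are assumptions made for the example, since the statistic itself is not displayed in this section.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def wald_homogeneity(theta_hat, variances):
    """Wald statistic for H: theta_1 = ... = theta_m.

    g has components theta_s - theta_{s+1}; with a diagonal covariance
    matrix the covariance of g is tridiagonal.
    """
    m = len(theta_hat)
    g = [theta_hat[s] - theta_hat[s + 1] for s in range(m - 1)]
    cov = [[0.0] * (m - 1) for _ in range(m - 1)]
    for s in range(m - 1):
        cov[s][s] = variances[s] + variances[s + 1]
        if s + 1 < m - 1:
            cov[s][s + 1] = cov[s + 1][s] = -variances[s + 1]
    weights = solve_linear(cov, g)
    return sum(gi * wi for gi, wi in zip(g, weights))


if __name__ == "__main__":
    # Hypothetical interval estimates and their variances.
    print(wald_homogeneity([1.0, 2.0, 4.0], [1.0, 1.0, 1.0]))
```

Under the null hypothesis the statistic would be compared with a chi-square distribution with $m - 1$ degrees of freedom in this one-parameter-per-interval case.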
distribution, which is defined as
$$\alpha_{ij}(t) = \alpha_{ij}^{(0)}\, t^{\gamma - 1}, \qquad (3.4.17)$$
where $\alpha_{ij}^{(0)}$ is the baseline intensity for the transition from $i$ to $j$. The Weibull hazard function is increasing if $\gamma > 1$, decreasing if $\gamma < 1$, and constant if $\gamma = 1$. In the last case, the Weibull hazard is the same as the exponential hazard under the time-homogeneous assumption. In order to describe the transition patterns in the various time zones, we propose the piecewise Weibull intensity model as
(3.4.18)
The pseudo transition probability $P^*_{ij}(t_{s-1}, t_s; \alpha, \gamma)$ for $P_{ij}(t_{s-1}, t_s)$ with the piecewise Weibull transition intensity function is
where
$$A = \sum_{v \in S_l'} \alpha_{lv} - \sum_{u \in S_i'} \alpha_{iu}$$
and
$$B = \sum_{v \in S_j'} \alpha_{jv} - \sum_{u \in S_i'} \alpha_{iu}.$$
The MLE of $\alpha$ and $\gamma$ can be obtained by the same procedure as described in Section 3.3. Testing the homogeneous Markov model hypothesis, $H: \gamma = 1_{m \times 1}$, can be performed by the score test, Wald's test or the likelihood ratio test.
3.4.2 Piecewise Gompertz Model
In demography and actuarial work, the Gompertz distribution is widely used to represent event patterns. In this section we consider a hazard function from the Gompertz distribution, which is defined as
$$\alpha_{ij}(t) = \alpha_{ij}^{(0)}\, e^{\beta t}, \qquad (3.4.20)$$
The pseudo likelihood for panel data is the same as the above except that $\beta_{ij}$ is replaced by $\beta_{ijs}$. One can apply the quasi-scoring procedure to obtain the maximum likelihood estimate of $\beta$.
For non-panel studies, one can apply the same imputation methods to estimate the incomplete covariates.
We note that, based on the features of periodic observation, the risk set of state $i$ transferring to state $j$ is unobservable. Also, the assumption of piecewise intensity functions reduces the model to a completely parametric one. The partial likelihood method (Cox, 1975 and Kay, 1982) for estimating the parameters $\beta_{ijs}$ does not work in this circumstance.
3.4 Other Related Models
The drawback of the piecewise intensity model is that a large number of parameters, $mk(k-1)$, have to be estimated. Typically, if the number of observation time points $m$ is large, the piecewise intensity model might need a lot of computation. To simplify the problem, one might, for example, consider an additive model for the piecewise intensity functions, which is
$$\alpha_{ij}(t) = \alpha_{ij}^{(0)} + \gamma_s \quad \text{for all } t \in [t_{s-1}, t_s), \qquad (3.4.16)$$
where $\alpha_{ij}^{(0)}$ is the baseline intensity for the transition from $i$ to $j$. The pseudo transition probability can be obtained directly from (3.1.1), and the number of parameters decreases from $mk(k-1)$ to $m + k(k-1)$. The interpretation of this model is straightforward: each transition intensity equals the baseline intensity for that transition plus a quantity specific to the time interval. We next discuss two alternative models in the same spirit as the piecewise intensity model but with fewer parameters to be estimated.
3.4.1 Piecewise Weibull Model
The expression of the piecewise transition intensity corresponds to the hazard function from the exponential distribution. We might consider a hazard function from the Weibull
are dissimilar. The natural course of an event (or disease) varies a lot from person to person, as does the effect of covariates (e.g., the influence of the various risk factors). Especially in discrete-time follow-up studies, a lot of information is unavailable. The observed transitions may not be representative of any single individual, but are influenced by the unobservable heterogeneity. In such a situation heterogeneity is not only of interest in its own right, but actually distorts what is observed. In general, random effects are used to incorporate unobserved heterogeneity or frailty. Suppose that $\alpha_{ijs}^{(h)}$ is the transition intensity for subject $h$ for the transition from state $i$ to state $j$ in the time interval $(t_{s-1}, t_s]$; then a model that includes individual heterogeneity or frailty is
$$\alpha_{ijs}^{(h)} = z_h\, \alpha_{ijs}, \qquad (3.5.23)$$
where $\alpha_{ijs}$ is constant across $h$, and $z_h$ is a non-negative random variable (called a mixing variable). In addition, $z_1, \ldots, z_n$ are iid random variables with distribution $F(z)$. Then for each specified $F(z)$, we get a random effect model. One might assume that the expectation of $z_h$ is equal to 1. In this way $\alpha_{ijs}$ becomes in some sense an 'average' individual intensity. If the distribution $F(z)$ depends on a parameter $\gamma$, the parameters of interest are the underlying transition intensities $\alpha_{ijs}$ and the parameter $\gamma$. Note that before the parameter inference, we assume that the random variable $Z = z$ satisfies the conditions of a noninformative sampling scheme, i.e., conditional on $Z = z$, whether an observation is carried out at a certain time is independent of and noninformative about $z$. Under this assumption, we next discuss the proposed model with individual heterogeneity.
3.5.1 Panel Data with Heterogeneity
Now by (3.5.23), the conditional pseudo transition probability in (3.1.2) for subject $h$ is
where $\alpha_{ij}^{(0)}$ is the baseline intensity for the transition from $i$ to $j$. The Gompertz hazard function is increasing if $\beta > 0$, decreasing if $\beta < 0$, and constant if $\beta = 0$. In the last case, the Gompertz hazard is the same as the exponential hazard under the time-homogeneous assumption. With the same idea of distinguishing the transition patterns in the various time zones, we propose the piecewise Gompertz intensity model as
(3.4.21)
The pseudo transition probability $P^*_{ij}(t_{s-1}, t_s; \alpha, \beta_s)$ for $P_{ij}(t_{s-1}, t_s)$ with the piecewise Gompertz transition intensity function is
where
$$A = \sum_{v \in S_l'} \alpha_{lv} - \sum_{u \in S_i'} \alpha_{iu}$$
and
$$B = \sum_{v \in S_j'} \alpha_{jv} - \sum_{u \in S_i'} \alpha_{iu}.$$
The MLE of $\alpha$ and $\beta$ can be obtained by the quasi-scoring procedure described in Section 3.3. Testing the homogeneous Markov model hypothesis, $H: \beta = 0_{m \times 1}$, can also be performed by the score test.
Remark
The piecewise Weibull and Gompertz models not only reduce the number of parameters but also smooth the intensity functions in the various time intervals.
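The two hazard shapes can be compared with a small Python sketch. It only evaluates the basic hazards $\alpha^{(0)} t^{\gamma-1}$ and $\alpha^{(0)} e^{\beta t}$ for a single transition; the function names and parameter values are illustrative assumptions, and the piecewise versions of the models are not implemented here.

```python
import math


def weibull_hazard(t, baseline, gamma):
    """Weibull-type hazard: baseline * t**(gamma - 1), for t > 0."""
    return baseline * t ** (gamma - 1.0)


def gompertz_hazard(t, baseline, beta):
    """Gompertz hazard: baseline * exp(beta * t)."""
    return baseline * math.exp(beta * t)


if __name__ == "__main__":
    times = [0.5, 1.0, 2.0, 4.0]
    for gamma in (0.5, 1.0, 2.0):
        print("Weibull gamma =", gamma,
              [round(weibull_hazard(t, 0.3, gamma), 4) for t in times])
    for beta in (-0.2, 0.0, 0.2):
        print("Gompertz beta =", beta,
              [round(gompertz_hazard(t, 0.3, beta), 4) for t in times])
```

Setting `gamma = 1` or `beta = 0` returns the constant baseline, which is the exponential (time-homogeneous) special case mentioned in the text.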
3.5 Heterogeneity and Random Effect Model
So far we have assumed that, for a given time interval and pair of transition states, the transitions are the same among individuals. However, this is often not true. The basic reason is that individuals
and
$$\xi_3 = (t_s - t_{s-1}) \sum_{v \in S_l'} \alpha_{lvs}.$$
Then the marginal or observed data likelihood equals
$$L(\alpha, \gamma, \eta) = \prod_{s=1}^{m} \prod_{i,j=1}^{k} P^*_{ij}(t_{s-1}, t_s \mid \alpha, \gamma, \eta)^{n_{ijs}}. \qquad (3.5.28)$$
To obtain the maximum likelihood estimates of the parameters, one possibility is to make direct use of the observed data likelihood (3.5.28) [see, e.g., Aalen (1988)]. The quasi-scoring algorithm described in Section 3.1.3 can be used directly. The usual large sample properties of the maximum likelihood estimates $\hat\alpha$, $\hat\gamma$ and $\hat\eta$, and of the corresponding likelihood-based tests, are similar to those in the previous sections. In particular, under the assumption $EZ = 1$, the asymptotic distribution of the likelihood ratio test statistic for $\operatorname{var}(Z) = 1/\eta = 0$, i.e., for homogeneity among the study subjects, is still valid.
Another possibility for obtaining the MLE of the parameters from the full likelihood (3.5.26) is to use the EM algorithm. Under the piecewise transition intensity model, the joint distribution
i.e., conditional on the data, the $z_h$ are still independent and are linear combinations of gamma distributions with parameters $\gamma$ and $\xi_i + \eta$, $i = 1, 2, 3$, where the $\xi_i$ are defined above. Therefore, given the data and the current estimates $\hat\alpha$, $\hat\gamma$ and $\hat\eta$, the expectation step is
E-step:
$$\sum_{l \in S_j} \frac{\alpha_{ils}\alpha_{ljs}}{A} \times \gamma\eta^{\gamma} \times \left\{\frac{1}{B}\left((\xi_1 + \eta)^{-(\gamma+1)} - (\xi_2 + \eta)^{-(\gamma+1)}\right) + \frac{1}{A - B}\left((\xi_3 + \eta)^{-(\gamma+1)} - (\xi_2 + \eta)^{-(\gamma+1)}\right)\right\}. \qquad (3.5.29)$$
The M-step is then: compute $\hat\alpha$, $\hat\gamma$ and $\hat\eta$, the maximum likelihood estimates of $\alpha$, $\gamma$ and $\eta$, based on the full likelihood (3.5.26). Alternatively, the M-step can
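where
$$A = \sum_{v \in S_l'} \alpha_{lvs} - \sum_{u \in S_i'} \alpha_{ius}$$
and
$$B = \sum_{v \in S_j'} \alpha_{jvs} - \sum_{u \in S_i'} \alpha_{ius},$$
provided $A \neq 0$, $B \neq 0$ and $A \neq B$. Conditional on $\{z\}$ the likelihood is
$$L(\alpha \mid z) = \prod_{h=1}^{n} \prod_{s=1}^{m} \prod_{i,j=1}^{k} P_{h;ij}(t_{s-1}, t_s \mid \alpha, z_h). \qquad (3.5.25)$$
Suppose that $z_1, \ldots, z_n$ are iid with a gamma distribution; then the full likelihood can be evaluated by
$$L(\alpha, \gamma, \eta) = \prod_{h=1}^{n} \prod_{s=1}^{m} \prod_{i,j=1}^{k} f(z_h; \gamma, \eta)\, P_{h;ij}(t_{s-1}, t_s \mid \alpha, z_h), \qquad (3.5.26)$$
and $f$ is the gamma density
$$f(z; \gamma, \eta) = \frac{\eta^{\gamma}}{\Gamma(\gamma)}\, z^{\gamma - 1} e^{-\eta z}.$$
Using the fact
$$\int_0^{\infty} z^{k-1} e^{-\lambda z}\,dz = \frac{\Gamma(k)}{\lambda^{k}},$$
and that $z_h$, $h = 1, \ldots, n$, are iid, we integrate the joint distribution directly over $z$ to obtain the marginal distribution
$$\begin{aligned}
P^*_{ij}(t_{s-1}, t_s \mid \alpha, \gamma, \eta)
&= \int_0^{\infty} P_{h;ij}(t_{s-1}, t_s \mid \alpha, z_h)\, f(z_h; \gamma, \eta)\,dz_h \\
&= \sum_{l \in S_j} \frac{\alpha_{ils}\alpha_{ljs}}{A} \times \left\{\frac{1}{B}\left(\left(\frac{\eta}{\xi_1 + \eta}\right)^{\gamma} - \left(\frac{\eta}{\xi_2 + \eta}\right)^{\gamma}\right) + \frac{1}{A - B}\left(\left(\frac{\eta}{\xi_3 + \eta}\right)^{\gamma} - \left(\frac{\eta}{\xi_2 + \eta}\right)^{\gamma}\right)\right\},
\end{aligned} \qquad (3.5.27)$$
where $A$ and $B$ are defined as above, and
$$\xi_1 = (t_s - t_{s-1}) \sum_{u \in S_i'} \alpha_{ius},$$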
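The integration step behind (3.5.27) rests on the gamma Laplace transform $E[e^{-\xi Z}] = (\eta/(\xi + \eta))^{\gamma}$. The Python sketch below checks that identity numerically and then evaluates one summand of the marginal probability for a single intermediate state $l$. It assumes that $\xi_1$, $\xi_2$, $\xi_3$ are the interval length times the total exit intensities of states $i$, $j$ and $l$ respectively (the definition of $\xi_2$ is taken by analogy with the other two); names and values are illustrative.

```python
import math


def gamma_laplace(xi, shape, rate):
    """E[exp(-xi * Z)] for Z ~ Gamma(shape, rate)."""
    return (rate / (rate + xi)) ** shape


def gamma_laplace_numeric(xi, shape, rate, upper=40.0, n=40000):
    """Midpoint-rule check of the same expectation (use shape >= 1)."""
    h = upper / n
    const = rate ** shape / math.gamma(shape)
    total = 0.0
    for p in range(n):
        z = (p + 0.5) * h
        total += const * z ** (shape - 1.0) * math.exp(-(rate + xi) * z)
    return total * h


def marginal_two_step(a_il, a_lj, lam_i, lam_l, lam_j, dt, shape, rate):
    """One summand of the frailty-averaged probability for the path i -> l -> j.

    The prefactor a_il * a_lj / (A * B) is unchanged when every intensity is
    multiplied by z, so only the exponential terms are averaged over z.
    """
    A = lam_l - lam_i
    B = lam_j - lam_i
    m_i = gamma_laplace(lam_i * dt, shape, rate)  # term with xi_1
    m_j = gamma_laplace(lam_j * dt, shape, rate)  # term with xi_2 (assumed)
    m_l = gamma_laplace(lam_l * dt, shape, rate)  # term with xi_3
    return a_il * a_lj / A * ((m_i - m_j) / B + (m_l - m_j) / (A - B))


if __name__ == "__main__":
    print(gamma_laplace(0.7, 2.0, 2.0), gamma_laplace_numeric(0.7, 2.0, 2.0))
    print(marginal_two_step(0.3, 0.2, 0.5, 0.4, 0.1, 2.0, shape=2.0, rate=2.0))
```

With `shape == rate` the frailty has mean 1 and variance `1 / rate`, so letting both grow recovers the model without heterogeneity.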
be divided into two steps. In the first step, the MLE of $\alpha_{ijs}^{(h)}$ is obtained from the conditional likelihood (3.5.25), and based on the heterogeneity model in (3.5.23) we have
Step 2 is to substitute $\hat\alpha$ and $\hat z$ into the full likelihood (3.5.26) to obtain the MLE of $\gamma$ and $\eta$.
Note that, based on the features of the panel observations, $z_h$ does not depend on $h$ but only on the observed time interval and transition states. That is, conditional on the observation time points and transition states, the individuals can be viewed as homogeneous. Therefore, the individual heterogeneity model in (3.5.23) is equivalent to
(3.5.30)
where $\alpha_{ijs}$ is the baseline transition intensity and $z_{ijs}$ are iid across $h = 1, \ldots, n$.
3.5.2 Non-panel Data with Heterogeneity
In the non-panel study, we defined the time intervals $D_1, D_2, \ldots, D_{m^*}$ as disjoint subsets of $[t_0, t_m]$ in Section 3.1. A possible heterogeneity model is to assume that the individuals share the same frailty within each time interval $D_q$, $q = 1, \ldots, m^*$. Then the frailty model is
(3.5.31)
if $[t_{s-1}, t_s) \subset D_q$. Note that $\alpha_{ijs}$ is the baseline transition intensity and $z_q$ are iid across $i$, $j$ and $h$. If one can assume that the distribution of $z$ is a gamma distribution, the approach in the previous section can be used directly to obtain the MLE of $\alpha$, $\eta$ and $\gamma$.
Chapter 4
Some Measures For Model Selection
4.1
Introduction
Questions about the adequacy of a model are sometimes subjective. As noted by
Fienberg (1980), "Complicated models involving large numbers of parameters most
often fit a set data closely than a simpler model that is just a special case of the
complicated one. On the other hand, a simple model is often to be preferred over a
more complicated one that provides a better fit. There is thus a trade-off between
goodness-of-fit and simplicity, and the dividing line between the "best" model and
others that also fit the data adequately is clearly very fine". This chapter presents
several relative measures, based on predicted probabilities and predicted
events, that can be used to assess the adequacy of the Markov models. In
Section 4.2, a Pearson type statistic for non-panel studies and a Cochran-Mantel-Haenszel type statistic for panel studies are discussed. We show that these statistics
are applicable for goodness-of-fit tests. In Section 4.3, an asymptotic relative efficiency approach is proposed for parameter estimation and hypothesis testing. In the final
three sections we apply the predictive sample reuse approach, which is based on the
predicted outcome, to analyze the adequacy of Markov models. In Section 4.4, we
introduce a graphic method to assess the adequacy of the proposed models under
panel observations. In Section 4.5, we propose a summary measure of correctly
predicted events; large sample properties are also discussed. In Section 4.6, an error
rate approach is applied to adjust the predictive sample reuse method. Finally, some
remarks on the proposed measures are given.
4.2
Chi-square Statistics
4.2.1
Pearson Type Statistic
The adequacy of homogeneous Markov models can be assessed by comparing the observed transition frequencies, $n_{ijs}$, with the estimated expected frequencies, $e_{ijs} = n_{i.s} P^{*}_{ij}(t_{s-1}, t_s; \hat{\alpha})$, where $n_{i.s} = \sum_{j=1}^{k} n_{ijs}$. A Pearson type statistic has been applied to assess
model adequacy in several studies (e.g., Kalbfleisch and Lawless (1985), Kay
(1986)); it is defined as
\[
Q_P = \sum_{s=1}^{m} \sum_{i,j} \frac{(n_{ijs} - e_{ijs})^2}{e_{ijs}}. \tag{4.2.1}
\]
Expression (4.2.1) would be obtained if all possible transitions among the states in
$S$ are treated as outcomes of a multinomial distribution. The limiting chi-square
distribution of $Q_P$ has degrees of freedom equal to the number of independent cells in
the multinomial distribution minus the number of estimated parameters.
In a panel study, since the $P^{*}_{ij}(t_{s-1}, t_s; \hat{\alpha})$ are asymptotically independent for different $i$ or $s$, and if all
$k$ states in $S$ are transitive states, then the conditional distribution of $Q_P$ (given the
marginal frequencies, and under $H_0$) is asymptotically central chi-square
with $k(k-1)(m-1)$ degrees of freedom.
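In a non-panel study, one can reduce the transition information to the smallest
exclusive time intervals $\{D_q\}_{1 \le q \le m^*}$, where $D_1, D_2, \ldots, D_{m^*}$, $m^* \le m$, are the smallest
time intervals such that, for given $h$ and $r$, $[t_{h,r-1}, t_{h,r}] \subseteq D_q$ for some $q \in \{1, \ldots, m^*\}$,
and $\{D_q\}_{1 \le q \le m^*}$ satisfy the conditions, for $q \neq q'$,
\[
D_q \cap D_{q'} =
\begin{cases}
\{t_s\} \text{ for some } s \in \{1, \ldots, m\} & \text{if } |q - q'| = 1, \\
\emptyset & \text{if } |q - q'| > 1,
\end{cases}
\]
and
\[
\bigcup_{q=1}^{m^*} D_q = [t_0, t_m].
\]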
Let $n^{*}_{ijq} = \sum_{r' \in \Psi(D_q)} n_{ijr'}$ be the total number of transitions from state $i$ to
state $j$ within time interval $D_q$, where $n_{ijr'}$ is the number of subjects moving from
state $i$ to state $j$ in the $r'$th observed time interval. Then, for given $i$, we construct an
$m^* \times k$ contingency table
q\j      1             2             ...   k             Total
1        n*_{i11}      n*_{i21}      ...   n*_{ik1}      n*_{i.1}
2        n*_{i12}      n*_{i22}      ...   n*_{ik2}      n*_{i.2}
...
m*       n*_{i1m*}     n*_{i2m*}     ...   n*_{ikm*}     n*_{i.m*}
Total    n*_{i1.}      n*_{i2.}      ...   n*_{ik.}      n*_{i..}
If we treat the last row and column of the table as outcomes of a multinomial distribution with total $n^{*}_{i..}$ and cell probabilities $\{p_{ijq}\}$, then $\sum_{r' \in \Psi(D_q)} n_{i.r'} \times P^{*}_{ij}(t_{r'-1}, t_{r'}; \hat{\alpha})$
is a consistent estimate of $n^{*}_{i.q} p_{ijq}$. One can define a Pearson type statistic, similar
to that for panel observations, as
\[
Q_P = \sum_{q=1}^{m^*} \sum_{i,j} \frac{(n^{*}_{ijq} - e^{*}_{ijq})^2}{e^{*}_{ijq}}, \tag{4.2.2}
\]
where
\[
e^{*}_{ijq} = \sum_{r' \in \Psi(D_q)} n_{i.r'} \times P^{*}_{ij}(t_{r'-1}, t_{r'}; \hat{\alpha})
\]
and $n_{i.r'} = \sum_{j=1}^{k} n_{ijr'}$. Then, under $H_0$, $Q_P$ has an asymptotic chi-square distribution
with $k(k-1)(m^*-1)$ degrees of freedom.
Since greater departures of $\{n_{ijs}\}$ from $\{e_{ijs}\}$ produce a greater $Q_P$ value for fixed
sample size $n$ and number of observation time points $m$ or $m^*$, the Pearson
type statistic can be used as a criterion to choose models. We define a relative measure of the
Pearson type statistics for model 2 relative to model 1 as
\[
\Lambda_P = \frac{Q_P^{(2)}}{Q_P^{(1)}}, \tag{4.2.3}
\]
where
\[
Q_P^{(v)} = \sum_{q=1}^{m^*} \sum_{i,j}
\frac{\left\{ n^{*}_{ijq} - \sum_{r' \in \Psi(D_q)} n_{i.r'} P^{*}_{ij}(t_{r'-1}, t_{r'}; \hat{\alpha}^{(v)}) \right\}^2}
     {\sum_{r' \in \Psi(D_q)} n_{i.r'} P^{*}_{ij}(t_{r'-1}, t_{r'}; \hat{\alpha}^{(v)})}
\]
is the Pearson type statistic for model $v = 1, 2$, and $\hat{\alpha}^{(v)}$ are the estimates of $\alpha^{(v)}$ under
model $v$. Based on the relative measure, if $\Lambda_P < 1$, we may conclude that
model 2 fits the data set better than model 1 does.
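As a rough illustration of the relative measure in (4.2.3), the following sketch computes a Pearson type statistic from flattened cell counts and takes the ratio for two candidate models. The function names and the toy counts are hypothetical; the fitted expected counts are assumed to have been obtained elsewhere from each model's estimated transition probabilities.

```python
def pearson_statistic(observed, expected):
    """Sum of (n - e)^2 / e over cells; cells with e == 0 are skipped."""
    total = 0.0
    for n, e in zip(observed, expected):
        if e > 0:
            total += (n - e) ** 2 / e
    return total


def relative_measure(observed, expected_model1, expected_model2):
    """Ratio Q_P(model 2) / Q_P(model 1); values below 1 favour model 2."""
    q1 = pearson_statistic(observed, expected_model1)
    q2 = pearson_statistic(observed, expected_model2)
    return q2 / q1


# Hypothetical flattened cell counts and fitted counts under two models.
observed = [12, 8, 5, 15]
expected_1 = [10.0, 10.0, 8.0, 12.0]
expected_2 = [11.0, 9.0, 6.0, 14.0]

lambda_p = relative_measure(observed, expected_1, expected_2)
print(lambda_p)
```

The ratio is undefined when model 1 fits the observed counts exactly (a zero denominator), so a practical implementation would need to guard against that case.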
Note that in applications, if there are zero values for $n^{*}_{ijq}$, one way to adjust the
degrees of freedom is to use $k(k-1)(m^*-1)$ minus the number of cells with $n^{*}_{ijq} = 0$.
4.2.2
Cochran-Mantel-Haenszel Type Statistic for Panel Data
The Pearson type statistic approach has some disadvantages. First, the degrees of freedom are large if the number of observation time
points $m$ is large. For example, if $k = 5$ and $m = 6$ then the degrees of freedom would be
$k(k-1)(m-1) = 100$, and the chi-square test may not be powerful. Second, if there are
relatively small values of $n_{ijs}$ ($< 5$), then the asymptotic approximation may not perform
well. In this section we propose a Cochran-Mantel-Haenszel type statistic which pools
the information from all of the observed time periods to avoid the high
degrees of freedom and the small cell frequencies. We first consider
the panel data. For a given fixed time point $s$, we have the $(k \times k)$ contingency table
i\j      1           2           ...   k           Total
1        n_{11s}     n_{12s}     ...   n_{1ks}     n_{1.s}
2        n_{21s}     n_{22s}     ...   n_{2ks}     n_{2.s}
...
k        n_{k1s}     n_{k2s}     ...   n_{kks}     n_{k.s}
Total    n_{.1s}     n_{.2s}     ...   n_{.ks}     n_{..s}

where $n_{i.s} = \sum_{j=1}^{k} n_{ijs} = \sum_{l=1}^{k} n_{li(s-1)} = n_{.i(s-1)}$. To obtain an appropriate Cochran-Mantel-Haenszel type statistic, we follow an argument similar to that of Sen (1988).
Denote by $n_{is} = (n_{i1s}, n_{i2s}, \ldots, n_{iks})^t$ and $e_{is} = E[n_{is}] = (e_{i1s}, e_{i2s}, \ldots, e_{iks})^t$, both $k \times 1$,
the observed and estimated frequency vectors for subjects at time $s$, given that they were in state $i$ at time
$s - 1$; let $n_s = (n_{1s}\, n_{2s} \cdots n_{ks})^t$ and $e_s = (e_{1s}\, e_{2s} \cdots e_{ks})^t$. Given
$n_{i.s}$, $n_{is}$ has a multinomial distribution with sample size $n_{i.s}$ and parameters
$p_{is} = (p_{i1s}, p_{i2s}, \ldots, p_{iks})^t$, where $p_{ijs} = P_{ij}(t_{s-1}, t_s)$ and $e_{ijs} = n_{i.s} p_{ijs}$. Suppose that
the initial observations $n_0$ are fixed. Under the Markov assumption, Anderson and
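Goodman (1957) showed that the observations $n_{is}$ and $n_{i's'}$ are uncorrelated if
$s \neq s'$ or $i \neq i'$.
It is also well known that for large $n_{i.s}$
\[
n_{i.s}^{-1/2} D_{p_{is}}^{-1/2} (n_{is} - e_{is})
= \left( \frac{n_{i1s} - e_{i1s}}{e_{i1s}^{1/2}}, \frac{n_{i2s} - e_{i2s}}{e_{i2s}^{1/2}}, \ldots, \frac{n_{iks} - e_{iks}}{e_{iks}^{1/2}} \right)^t
\approx N_k(0, V_{is}),
\]
where $D_{p_{is}}^{1/2} = \mathrm{Diag}(p_{i1s}^{1/2}, p_{i2s}^{1/2}, \ldots, p_{iks}^{1/2})$ and $V_{is} = I - p_{is}^{1/2} (p_{is}^{1/2})^t$ can be consistently
estimated by $\mathrm{Diag}(\hat{p}_{i1s}^{1/2}, \hat{p}_{i2s}^{1/2}, \ldots, \hat{p}_{iks}^{1/2})$ and $I - \hat{p}_{is}^{1/2} (\hat{p}_{is}^{1/2})^t$, respectively. Note that
\begin{align*}
\mathrm{rank}\{V_{is}\}
&= \mathrm{trace}\{ I - p_{is}^{1/2} (p_{is}^{1/2})^t \} \\
&= k - \mathrm{trace}\{ p_{is}^{1/2} (p_{is}^{1/2})^t \} \\
&= k - \mathrm{trace}\{ (p_{is}^{1/2})^t p_{is}^{1/2} \} \\
&= k - 1.
\end{align*}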
Hence,
(4.2.4)
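where $V_s = \mathrm{Diag}(V_{1s}, V_{2s}, \ldots, V_{ks})$ with $\mathrm{rank}\{V_s\} = k(k-1)$. Now let $T_r = \sum_{s=1}^{r} Z_s$ and $T_0 = 0$; then
\begin{align*}
E[T_r \mid T_{r-1}, \ldots, T_0]
&= E[T_r \mid n_{r-1}, \ldots, n_0] \\
&= T_{r-1} + E[Z_r \mid n_{r-1}, \ldots, n_0] \\
&= T_{r-1}
\end{align*}
and $E[T_r] = 0$, i.e., $\{T_r, 1 \le r \le m\}$ is a zero mean martingale. Furthermore, since $Z_s$ and $Z_{s'}$ are uncorrelated
for all $s' > s$, it follows that
\[
\mathrm{Var}(T_r) = \sum_{s=1}^{r} \mathrm{Var}(Z_s). \tag{4.2.5}
\]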
The Cochran-Mantel- Haenszel type statistic is defined as
where $(V_m)^{-}$ is a generalized inverse of $V_m = \sum_{s=1}^{m} V_s$. Under $H_0$ and the model assumptions,
and by (4.2.4)-(4.2.5), $Q_{CMH}$ has approximately a central chi-square distribution with degrees
of freedom $\mathrm{rank}\{V_m\} = k(k-1)$ (compared to $k(k-1)(m-1)$ for $Q_P$; this
reduced number of degrees of freedom often makes $Q_{CMH}$ more powerful than $Q_P$) for assessing the
adequacy of the Markov model.
By the same argument as in the previous section, the statistic $Q_{CMH}$ can be applied
to measure the discrepancy between the observed and the expected frequencies. A
measure for model 2 relative to model 1, similar to that for the Pearson type statistic, is defined as
\[
\Lambda_{CMH} = \frac{Q_{CMH}^{(2)}}{Q_{CMH}^{(1)}}, \tag{4.2.6}
\]
where $Q_{CMH}^{(v)}$ is the Cochran-Mantel-Haenszel type statistic for model $v = 1, 2$. Based
on the relative measure, if $\Lambda_{CMH} < 1$, we may conclude that model 2 fits the data
set better than model 1 does.
4.3
Asymptotic Relative Efficiency Approach
4.3.1
Generalized Variances
Consider now the estimation of a $k$-dimensional parameter $\theta = (\theta_1, \ldots, \theta_k)$
by $\hat{\theta}_n = (\hat{\theta}_1, \ldots, \hat{\theta}_k)$, where $\hat{\theta}_n$ is asymptotically normally distributed with mean $\theta$ and
covariance matrix $n^{-1} \Sigma_\theta$. The concept of variance as a measure of dispersion for a 1-dimensional distribution may be extended to the case of a $k$-dimensional distribution
in terms of a numerical measure called the "generalized variance", the determinant
$|\Sigma_\theta|$.
In this section we first discuss the motivation for the asymptotic relative
efficiency using generalized variances in estimation, and then its use as a
criterion to choose models.
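Based on the asymptotic normal distribution of $\hat{\theta}_n$, an ellipsoidal confidence
region for $\theta$ is given by
\[
E_n = \left\{ \theta : n (\hat{\theta}_n - \theta)^t \Sigma_{\hat{\theta}_n}^{-1} (\hat{\theta}_n - \theta) \le c \right\},
\]
where $c > 0$ and it is assumed that $\Sigma_{\hat{\theta}_n}^{-1}$ is defined. Suppose that
\[
\Sigma_{\hat{\theta}_n} \to \Sigma_\theta \quad \text{in probability;}
\]
by Serfling (1980), it follows that
\[
n (\hat{\theta}_n - \theta)^t \Sigma_{\hat{\theta}_n}^{-1} (\hat{\theta}_n - \theta) \xrightarrow{d} \chi^2_k \quad \text{as } n \to \infty.
\]
Therefore, if $c = c_\alpha$ is chosen such that $P(\chi^2_k > c_\alpha) = \alpha$, we have
\[
P(\theta \in E_n) \to 1 - \alpha \quad \text{as } n \to \infty,
\]
so that $E_n$ represents a confidence ellipsoid for $\theta$ having limiting confidence
coefficient $1 - \alpha$ as $n \to \infty$.
Note that, by Cramér (1946), the volume of the ellipsoid
is
\[
\frac{\pi^{(1/2)k} (c_\alpha / n)^{(1/2)k} \, |\Sigma_{\hat{\theta}_n}|^{1/2}}{\Gamma(\tfrac{1}{2}k + 1)}. \tag{4.3.7}
\]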
One can compare two estimation procedures by comparing their confidence ellipsoids under a specific confidence coefficient. Such a comparison reduces to a
comparison of the generalized variances of the asymptotic multivariate normal distributions involved and is independent of the choice of confidence coefficient. Suppose
that one compares the sequences $\{\hat{\theta}_n^{(1)}\}$ and $\{\hat{\theta}_n^{(2)}\}$, where $\hat{\theta}_n^{(v)}$ is $AN(\theta, n^{-1} \Sigma_\theta^{(v)})$
for $v = 1, 2$. From (4.3.7), a numerical measure of the asymptotic relative efficiency
of $\{\hat{\theta}_n^{(2)}\}$ with respect to $\{\hat{\theta}_n^{(1)}\}$ can be obtained by

By this approach, $\{\hat{\theta}_n^{(2)}\}$ is better than $\{\hat{\theta}_n^{(1)}\}$, in the sense of an asymptotically
smaller confidence ellipsoid, if and only if $|\Sigma_\theta^{(2)}| \le |\Sigma_\theta^{(1)}|$.
We now apply this idea to compare the different proposed Markov models.
Suppose that the sequence $\{\hat{\theta}_n^{(1)}\}$ with dimension $k(k-1)$ is the estimated intensities
under the time homogeneous assumption. By Section 3.3, $\hat{\theta}_n^{(1)}$ is normally distributed with
mean $\theta^{(1)}$ and asymptotic variance matrix $n^{-1} \Sigma_\theta^{(1)}$. Also let $\{\hat{\theta}_n^{(2)}\}$ with dimension
$mk(k-1)$ be the estimated intensities under the time non-homogeneous assumption;
$\hat{\theta}_n^{(2)}$ is normally distributed with mean $\theta^{(2)}$ and asymptotic variance matrix $n^{-1} \Sigma_\theta^{(2)}$. By the
preceding approach, we define a numerical measure of the asymptotic relative efficiency of $\{\hat{\theta}_n^{(2)}\}$ with respect to $\{\hat{\theta}_n^{(1)}\}$, adjusted for the different dimensions of the parameters,
as
\[
E = c(k, m) \times \frac{|\Sigma_\theta^{(2)}|^{1/[mk(k-1)]}}{|\Sigma_\theta^{(1)}|^{1/[k(k-1)]}}, \tag{4.3.8}
\]
where
\[
c(k, m) = \frac{\Gamma(\tfrac{1}{2} k(k-1) + 1)^{1/[k(k-1)]}}{\Gamma(\tfrac{1}{2} mk(k-1) + 1)^{1/[mk(k-1)]}}
\]
is a constant.
Based on the measure $E$, we might prefer the time non-homogeneous
Markov model if $E < 1$. In practice, since $\Sigma_\theta^{(v)}$, $v = 1, 2$, can be consistently estimated
by the inverse of the observed Fisher information matrix, $[J_{\hat{\theta}_n}^{(v)}]^{-1}$, we can evaluate $E$ by
\[
E_n = c(k, m) \times \frac{|J_{\hat{\theta}_n}^{(1)}|^{1/[k(k-1)]}}{|J_{\hat{\theta}_n}^{(2)}|^{1/[mk(k-1)]}}.
\]
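The measure above involves determinants raised to small fractional powers, so it may be easier to evaluate on the log scale. The sketch below is one possible way to do that, assuming the log-determinants of the two observed information matrices have already been computed by some linear algebra routine; the function names are hypothetical and the formula follows the form of $c(k, m)$ and $E_n$ as reconstructed in this section.

```python
import math


def log_c(k, m):
    """Log of the constant c(k, m), with d1 = k(k-1) and d2 = m*k(k-1)."""
    d1 = k * (k - 1)
    d2 = m * d1
    return math.lgamma(0.5 * d1 + 1) / d1 - math.lgamma(0.5 * d2 + 1) / d2


def efficiency_measure(logdet_j1, logdet_j2, k, m):
    """E_n = c(k, m) * |J1|^(1/d1) / |J2|^(1/d2), computed via logarithms.

    logdet_j1, logdet_j2: log-determinants of the observed information
    matrices under the homogeneous (1) and non-homogeneous (2) models.
    Requires k >= 2 so that d1 > 0.
    """
    d1 = k * (k - 1)
    d2 = m * d1
    return math.exp(log_c(k, m) + logdet_j1 / d1 - logdet_j2 / d2)


# Hypothetical log-determinants for k = 3 states and m = 2 intervals.
print(efficiency_measure(1.0, 10.0, 3, 2))
```

With $m = 1$ the two models have the same dimension, $c(k, 1) = 1$, and the measure reduces to a plain ratio of the two determinants raised to the same power.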
4.3.2
Application to Hypotheses Testing
In some applications we might assume that all events are observed exactly. However,
in most cases this is not true. How does the assumption affect hypothesis testing for
the parameters? This subsection discusses hypothesis testing with discrete time
observations. Let $L_{1,n}(\theta)$ be the likelihood function under the conventional Markov model, which assumes that the exact transition times are equal to the observed
times if subjects enter the absorbing states, and let $L_{2,n}(\theta)$ be the likelihood function under the proposed model in Chapter 3, in which the absorbing times are interval
censored. Suppose that $\hat{\theta}_{1,n}$ and $\hat{\theta}_{2,n}$ are the MLEs under the conventional
model and the proposed model, respectively, and that we are interested in the hypotheses
$H_0: \theta = \theta_0$ vs. $H_A: \theta = \theta_0 + h$ for some $h \neq 0$. Then under $H_0$, the likelihood
ratio test statistics $Q_{1,n}$ and $Q_{2,n}$
are asymptotically chi-square distributed with degrees of freedom $p = \dim(\theta_0)$.
Under $H_A$, $Q_{1,n}$ and $Q_{2,n}$ are asymptotically non-central chi-square distributed
with $p$ degrees of freedom and non-centrality parameters $h^t I_{\theta_0}^{(1)} h$ and $h^t I_{\theta_0}^{(2)} h$, respectively,
where $I_{\theta_0}^{(1)}$ and $I_{\theta_0}^{(2)}$ are the Fisher information matrices at $\theta = \theta_0$. Since the test with the
larger non-centrality parameter is more powerful, a measure of the
asymptotic relative efficiency of $L_{2,n}$ relative to $L_{1,n}$ can be defined as
\[
\varepsilon(L_{1,n} : L_{2,n}) = \frac{h^t I_{\theta_0}^{(2;n)} h}{h^t I_{\theta_0}^{(1;n)} h},
\]
where $I_{\theta_0}^{(1;n)}$ and $I_{\theta_0}^{(2;n)}$ are the observed nonsingular Fisher information matrices.
If
$\varepsilon(L_{1,n} : L_{2,n}) > 1$, the proposed Markov model is more powerful than
the conventional Markov model for testing the hypotheses. Note that, by the
Courant theorem, the asymptotic relative efficiency is bounded by the smallest and
largest characteristic roots of $I_{\theta_0}^{(2;n)} [I_{\theta_0}^{(1;n)}]^{-1}$, i.e.,
\[
\lambda_1 \le \varepsilon(L_{1,n} : L_{2,n}) \le \lambda_p,
\]
where $\lambda_1$ and $\lambda_p$ are the smallest and largest characteristic roots of $I_{\theta_0}^{(2;n)} [I_{\theta_0}^{(1;n)}]^{-1}$.
Note that if $\lambda_p \le 1$, then $\varepsilon(L_{1,n} : L_{2,n}) \le 1$.
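The ratio of non-centrality parameters described above is a ratio of two quadratic forms in the same direction vector. A minimal sketch, assuming the two observed information matrices are available as nested lists (the names `quad_form`, `relative_efficiency`, `info_1` and `info_2` are hypothetical):

```python
def quad_form(h, matrix):
    """Quadratic form h' M h for a square matrix given as nested lists."""
    size = len(h)
    return sum(h[a] * matrix[a][b] * h[b]
               for a in range(size) for b in range(size))


def relative_efficiency(h, info_1, info_2):
    """Ratio h' I2 h / h' I1 h; values above 1 favour the model behind I2."""
    return quad_form(h, info_2) / quad_form(h, info_1)


# Toy example: the second information matrix is exactly twice the first,
# so the ratio equals 2 for every non-zero direction h.
info_1 = [[2.0, 0.5], [0.5, 1.0]]
info_2 = [[4.0, 1.0], [1.0, 2.0]]
print(relative_efficiency([1.0, -1.0], info_1, info_2))
```

In general the ratio depends on the direction, which is why the characteristic-root bounds are useful: they hold for every direction at once.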
4.4
A Graphic Method for Panel Data
In this section and the next, we apply the predictive sample reuse method to assess
the fit of the models. First, we discuss a graphic method based on a counting process
approach with a multiplicative intensity model for panel observations. Let $\{X(t); t \ge 0\}$
be a Markov process with state space $S = \{1, 2, \ldots, k\}$. We define a predicted state
for $X(t_s)$, the state at time $t_s$, given the previously observed state $i$, as
\[
k_i(s) = \arg\max_{1 \le j \le k} P_{ij}(t_{s-1}, t_s),
\]
i.e., the state that maximizes the transition probability $P_{ij}(t_{s-1}, t_s)$. For a data set
$\{X(t_{h,s})\}_{1 \le h \le n, 1 \le s \le m}$, we define an indicator function of
$X(t_{h,s})$ as
\[
\Delta L_{hi}(s) =
\begin{cases}
1 & \text{if } X(t_{h,s}) = k_i(s), \\
0 & \text{otherwise,}
\end{cases}
\]
i.e., if $X(t_{h,s})$ is correctly predicted then $\Delta L_{hi}(s)$ is equal to 1; otherwise 0. Following
the setup of Sengupta and Jammalamadaka (1993) for the multiplicative intensity
model in discrete time, let $(\Omega, \mathcal{H}, P)$ be a probability space and, for $h = 1, \ldots, n$
and $i = 1, \ldots, k$, let $\{\Delta L_{hi}(s)\}_{s \ge 1}$ be a family of stochastic processes defined on $(\Omega, \mathcal{H})$,
having discrete parameter $s$ and state space $\{0, 1\}$. Let $\mathcal{H}_{n,0} = \{\emptyset, \Omega\}$ and let $\mathcal{H}_{n,s}$ be
a sub-$\sigma$-algebra of $\mathcal{H}$ with respect to which $\{\Delta L_{hi}(s)\}_{1 \le s \le m}$ are measurable for each
$h$ and $i$. We further define $\Delta N_{ni}(s) = \sum_{h=1}^{n} \Delta L_{hi}(s)$ and $N_{ni}(m) = \sum_{s=1}^{m} \Delta N_{ni}(s)$.
The process $N_{ni}(\cdot)$ can be thought of as a "counting" process with a discrete time
parameter. We impose the following restriction on $\Delta L_{hi}(\cdot)$:
\[
\sum_{i=1}^{k} \Delta L_{hi}(s) \le 1 \quad \text{for each } s. \tag{4.4.9}
\]
This assumption says that the same individual cannot occupy two different
states at the same time. We can define the predictable process $A_{hi}(\cdot)$ in the same
manner as Sengupta and Jammalamadaka (1993):
(4.4.10)
This process is analogous to the stochastic intensity of a usual counting process. By
Aalen (1978), we postulate that
(4.4.11 )
where, for each $i$, $X_{1i}(\cdot), \ldots, X_{ni}(\cdot)$ are predictable binary processes which are independent and identically distributed. On the other hand, the deterministic sequence $\alpha_i(\cdot)$
is
\[
\alpha_i(s) =
\begin{cases}
P[\Delta L_{hi}(s) = 1 \mid X_{hi}(s)] & \text{if } P[X_{hi}(s) = 1] > 0, \\
0 & \text{otherwise,}
\end{cases}
\]
for any $h$. Furthermore, we assume that $1/k \le \alpha_i(s) \le b < 1$ for some constant $b$, if
$\alpha_i(s) > 0$. From the above we have
(4.4.12)
where $Y_{ni}(s) = \sum_{h=1}^{n} X_{hi}(s)$ is the number (out of $n$) of individuals in state $i$ at
time $s - 1$. Since $\alpha_i(s)$ is the probability of correct prediction for a subject at time $s$,
given that the subject was in state $i$ at time $s - 1$,
\[
\alpha(s) = \sum_{i=1}^{k} \alpha_i(s) \tag{4.4.13}
\]
can be thought of as a "measure" which indicates how well the predicted states
under a specific proposed model fit the data at time $s$. It is clear
that a larger $\alpha(s)$ is better than a smaller one at time $s$. We can plot $\alpha(s)$ versus
$s$ for different proposed models to compare the fit among models for a given data
set. The advantages of the graphic method are that it not only gives us an idea of which
proposed model fits the data set better but also provides information on
how the fit evolves over time. Therefore, we might use this information to construct a
new model for the given data set. Some hypothetical examples are given in Figure 4.1.
For (a) the proposed model 1 (solid line) is better than the proposed
model 2 (dotted line); for (b) there is no significant difference between the two proposed
models; for (c) the proposed model 2 is better than the proposed model 1 up to time
4, but after time 4 the proposed model 1 is better than the proposed model 2. This
suggests that a mixture of both models might be considered for further study. For (d)
a more complicated mixed model is suggested.
Figure 4.1. Some hypothetical examples for graphic method
[Four panels, (a) to (d); horizontal axis: time s.]
4.4.1
Non-parametric Maximum Likelihood Estimation
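If we assume that there is no censoring or that censoring is one of the states in the
study, then the likelihood function can be displayed as
\[
\prod_{s=1}^{m} \prod_{i=1}^{k} \alpha_i(s)^{\Delta N_{ni}(s)} \left( 1 - \alpha_i(s) \right)^{Y_{ni}(s) - \Delta N_{ni}(s)}. \tag{4.4.14}
\]
The likelihood is maximized by the estimate
\[
\hat{\alpha}_{ni}(s) = J_{ni}(s) \, Y_{ni}^{-1}(s) \, \Delta N_{ni}(s), \tag{4.4.15}
\]
where $J_{ni}(s) = I(Y_{ni}(s) > 0)$. For given $s$, we have
\[
\hat{\alpha}_n(s) = \sum_{i=1}^{k} \hat{\alpha}_{ni}(s). \tag{4.4.16}
\]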
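To make the estimate concrete, the sketch below computes, for one fixed observation time, the proportion of correctly predicted transitions out of each state and their sum. It is only an illustration under stated assumptions: states are indexed from 0 rather than 1, the transition matrix is treated as known, ties in the maximum are broken in favour of the lowest state index, and the names `predicted_state` and `alpha_hat` are hypothetical.

```python
def predicted_state(p_row):
    """Index of the largest transition probability in one matrix row."""
    return max(range(len(p_row)), key=lambda j: p_row[j])


def alpha_hat(pairs, p_matrix):
    """Per-state and total proportions of correct predictions at one time.

    pairs: list of (previous_state, current_state) for the n individuals.
    p_matrix: transition probabilities between the two observation times.
    """
    k = len(p_matrix)
    at_risk = [0] * k   # number of individuals in each previous state
    correct = [0] * k   # number of those whose current state was predicted
    for prev, cur in pairs:
        at_risk[prev] += 1
        if cur == predicted_state(p_matrix[prev]):
            correct[prev] += 1
    per_state = [correct[i] / at_risk[i] if at_risk[i] > 0 else 0.0
                 for i in range(k)]
    return per_state, sum(per_state)


# Toy example with two states and five individuals.
p_matrix = [[0.7, 0.3], [0.4, 0.6]]
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0)]
per_state, total = alpha_hat(pairs, p_matrix)
print(per_state, total)
```

Repeating the calculation for each observation time and each candidate model gives the curves that the graphic method compares.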
Large sample properties of MLE
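Theorem 4.1 Let $\alpha^{*}_{ni}(s) = J_{ni}(s) \alpha_i(s) > 0$, and let $\hat{\alpha}_{ni}(s)$ and $\hat{\alpha}_n(s)$ be the MLEs of $\alpha_i(s)$
and $\alpha(s)$ respectively. Suppose that for every $i$ and $s$, $Y_{ni}(s) = O_p(n)$,
and let $S^2_{ni}(s) = n J_{ni}(s) Y_{ni}^{-1}(s) \alpha_i(s) (1 - \alpha_i(s))$ and $S^2_n(s) = \sum_{i=1}^{k} n J_{ni}(s) Y_{ni}^{-1}(s) \alpha_i(s) (1 - \alpha_i(s))$. Then
for given $s$ and $\mathcal{H}_{n,s}$, we have
(a)
\[
\frac{\sqrt{n} \left( \hat{\alpha}_{ni}(s) - \alpha^{*}_{ni}(s) \right)}{S_{ni}(s)} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.
\]
(b)
\[
\frac{\sqrt{n} \left( \hat{\alpha}_n(s) - \sum_{i=1}^{k} \alpha^{*}_{ni}(s) \right)}{S_n(s)} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.
\]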
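Proof: (a) If we denote
\[
W_{nh;i}(s) = n J_{ni}(s) Y_{ni}^{-1}(s) \Delta L_{hi}(s) - J_{ni}(s) \alpha_i(s), \tag{4.4.17}
\]
then $n(\hat{\alpha}_{ni}(s) - \alpha^{*}_{ni}(s))$ can be rewritten as the sum of a row of the triangular array
\[
\begin{array}{l}
W_{11;i}(s) \\
W_{21;i}(s),\; W_{22;i}(s) \\
\quad \vdots \\
W_{n1;i}(s),\; \ldots,\; W_{nn;i}(s),
\end{array}
\]
i.e.,
\[
n \left( \hat{\alpha}_{ni}(s) - \alpha^{*}_{ni}(s) \right) = \sum_{h=1}^{n} W_{nh;i}(s). \tag{4.4.18}
\]
Since we assume that the individuals are independent of each other, the r.v.'s $W_{nh;i}(s)$ in
each row are conditionally independent. We also have
\[
\mu_{nh;i}(s) = E[W_{nh;i}(s) \mid \mathcal{H}_{n,s-1}] = n J_{ni}(s) Y_{ni}^{-1}(s) \alpha_i(s) X_{hi}(s) - J_{ni}(s) \alpha_i(s) \tag{4.4.19}
\]
and
\begin{align*}
\sum_{h=1}^{n} \mu_{nh;i}(s)
&= n J_{ni}(s) Y_{ni}^{-1}(s) \alpha_i(s) \sum_{h=1}^{n} X_{hi}(s) - n J_{ni}(s) \alpha_i(s) \\
&= n J_{ni}(s) \alpha_i(s) - n J_{ni}(s) \alpha_i(s) = 0
\end{align*}
for all $s = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, k$. The conditional variance of $W_{nh;i}(s)$ is
\[
\mathrm{Var}(W_{nh;i}(s) \mid \mathcal{H}_{n,s-1}) = J_{ni}(s) \left( \frac{n}{Y_{ni}(s)} \right)^2 \alpha_i(s) (1 - \alpha_i(s)) X_{hi}(s). \tag{4.4.20}
\]
Since $Y_{ni}(s) = O_p(n)$ and $\alpha_i(s) \in [1/k, b]$, $b < 1$, we have
\[
B_n^2 = \sum_{h=1}^{n} \mathrm{Var}(W_{nh;i}(s) \mid \mathcal{H}_{n,s-1}) = n J_{ni}(s) \left( \frac{n}{Y_{ni}(s)} \right) \alpha_i(s) (1 - \alpha_i(s)) = O_p(n).
\]
Furthermore, $W_{nh;i}(s)$ is bounded, i.e., there is an $n_0$ such that for every $\epsilon > 0$ and
$n > n_0$, we have
\[
P(|W_{nh;i}(s) - \mu_{nh;i}(s)| > \epsilon B_n \mid \mathcal{H}_{n,s-1}) = 0.
\]
Therefore, the Lindeberg condition is satisfied as $n \to \infty$. By Dvoretzky (1972), $n^{-1/2} \sum_{h=1}^{n} W_{nh;i}(s)$ has an asymptotic normal distribution with mean 0 and variance $n^{-1} B_n^2$, i.e.,
\[
\frac{\sqrt{n} \left( \hat{\alpha}_{ni}(s) - \alpha^{*}_{ni}(s) \right)}{S_{ni}(s)} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.
\]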
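(b) Since $\hat{\alpha}_n(s) = \sum_{i=1}^{k} \hat{\alpha}_{ni}(s)$, we first consider the conditional correlation between the
$\hat{\alpha}_{ni}(s)$, $i = 1, 2, \ldots, k$. Suppose that $i \neq j$; then
\begin{align*}
E[\hat{\alpha}_{ni}(s) \hat{\alpha}_{nj}(s) \mid \mathcal{H}_{n,s-1}]
&= J_{ni}(s) J_{nj}(s) Y_{ni}(s)^{-1} Y_{nj}(s)^{-1} E[\Delta N_{ni}(s) \Delta N_{nj}(s) \mid \mathcal{H}_{n,s-1}] \\
&= J_{ni}(s) J_{nj}(s) Y_{ni}(s)^{-1} Y_{nj}(s)^{-1} E\Big[ \sum_{h=1}^{n} \Delta L_{hi}(s) \sum_{h=1}^{n} \Delta L_{hj}(s) \,\Big|\, \mathcal{H}_{n,s-1} \Big] \\
&= J_{ni}(s) J_{nj}(s) Y_{ni}(s)^{-1} Y_{nj}(s)^{-1} \Big( \sum_{h=1}^{n} E[\Delta L_{hi}(s) \Delta L_{hj}(s) \mid \mathcal{H}_{n,s-1}] + \sum_{h \neq h'} E[\Delta L_{hi}(s) \Delta L_{h'j}(s) \mid \mathcal{H}_{n,s-1}] \Big) \\
&= J_{ni}(s) J_{nj}(s) Y_{ni}(s)^{-1} Y_{nj}(s)^{-1} \sum_{h \neq h'} E[\Delta L_{hi}(s) \Delta L_{h'j}(s) \mid \mathcal{H}_{n,s-1}].
\end{align*}
Therefore,
\begin{align*}
\mathrm{Cov}(\hat{\alpha}_{ni}(s), \hat{\alpha}_{nj}(s) \mid \mathcal{H}_{n,s-1})
&= E[\hat{\alpha}_{ni}(s) \hat{\alpha}_{nj}(s) \mid \mathcal{H}_{n,s-1}] - E[\hat{\alpha}_{ni}(s) \mid \mathcal{H}_{n,s-1}] E[\hat{\alpha}_{nj}(s) \mid \mathcal{H}_{n,s-1}] \\
&= -J_{ni}(s) J_{nj}(s) Y_{ni}(s)^{-1} Y_{nj}(s)^{-1} \sum_{h=1}^{n} E[\Delta L_{hi}(s) \mid \mathcal{H}_{n,s-1}] E[\Delta L_{hj}(s) \mid \mathcal{H}_{n,s-1}] \\
&= -J_{ni}(s) J_{nj}(s) Y_{ni}(s)^{-1} Y_{nj}(s)^{-1} \alpha_i(s) \alpha_j(s) \sum_{h=1}^{n} X_{hi}(s) X_{hj}(s) \\
&= 0.
\end{align*}
That is, $\hat{\alpha}_{ni}(s)$ and $\hat{\alpha}_{nj}(s)$ are conditionally uncorrelated for all $i \neq j$. Then by (a),
we have
\[
\frac{\sqrt{n} \left( \hat{\alpha}_n(s) - \sum_{i=1}^{k} \alpha^{*}_{ni}(s) \right)}{S_n(s)} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.
\]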
•
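Remark
1. Bias of the MLE:
\begin{align*}
\hat{\alpha}_{ni}(s) - \alpha_i(s)
&= J_{ni}(s) Y_{ni}^{-1}(s) \Delta N_{ni}(s) - \alpha_i(s) \\
&= J_{ni}(s) Y_{ni}^{-1}(s) \Delta N_{ni}(s) - J_{ni}(s) Y_{ni}^{-1}(s) A_{ni}(s) - (1 - J_{ni}(s)) \alpha_i(s) \\
&= J_{ni}(s) Y_{ni}^{-1}(s) \Delta M_{ni}(s) - (1 - J_{ni}(s)) \alpha_i(s),
\end{align*}
where $\Delta M_{ni}(s) = \Delta N_{ni}(s) - A_{ni}(s)$. We note that $\Delta M_{ni}(s) = \Delta N_{ni}(s) - A_{ni}(s)$ is a
martingale difference with respect to the filtration $\{\mathcal{H}_{n,s}\}_{s \ge 1}$. It follows that
\begin{align*}
E[\hat{\alpha}_{ni}(s) - \alpha_i(s)]
&= E\left( J_{ni}(s) Y_{ni}^{-1}(s) E[\Delta M_{ni}(s) \mid \mathcal{H}_{n,s-1}] \right) - \alpha_i(s) E[1 - J_{ni}(s)] \\
&= -\alpha_i(s) P(Y_{ni}(s) = 0) \\
&= -\alpha_i(s) P^{n}(X_{1i}(s) = 0),
\end{align*}
and hence the bias goes to 0 at an exponential rate as $n \to \infty$.
2. In real life, the transition probabilities $P_{ij}(t_{s-1}, t_s)$ are usually unknown. We
might substitute the MLE of $P_{ij}(t_{s-1}, t_s)$ for them.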
4.5
A Predictive Sample Reuse Approach for Nonpanel Observation
In this section, we use the same idea as in the preceding section but consider data
collected under non-panel studies. Let $\{X(t); t \ge 0\}$ be a Markov process with state
space $S = \{1, 2, \ldots, k\}$. We define a predicted state for $X(t_{h,s})$, the state of
subject $h$ at time $t_{h,s}$, given the previously observed state $i$, as
\[
k_{hi}(s) = \arg\max_{1 \le j \le k} P_{ij}(t_{h,s-1}, t_{h,s}),
\]
i.e., the state that maximizes the transition probability $P_{ij}(t_{h,s-1}, t_{h,s})$. To distinguish this from
the counting process approach of the preceding section, we use a different notation for
the indicator function. Let $\delta_{hi}(s)$ be an indicator function of $X(t_{h,s})$ which is defined
as
\[
\delta_{hi}(s) =
\begin{cases}
1 & \text{if } X(t_{h,s}) = k_{hi}(s), \\
0 & \text{otherwise,}
\end{cases}
\]
i.e., if $X(t_{h,s})$ is correctly predicted then $\delta_{hi}(s)$ is equal to 1; otherwise 0. Suppose
that
and
where
By the Markov property and the features of the random variables $\delta_{hi}(\cdot)$, $\delta_{hi}(s)$ and $\delta_{hi'}(s')$
are uncorrelated if $s \neq s'$ or $i \neq i'$. We also note that $\theta_{hi}(s)$, the probability that $\delta_{hi}(s) = 1$, is greater than or equal
to $1/k$. If $\theta_{hi}(s) = 1$ then $\delta_{hi}(s)$ is determined exactly. Therefore, we assume that
$1/k \le \theta_{hi}(s) \le b < 1$ for all $h$, $i$ and $s$, where $b$ is a constant. We also impose the same
restriction as in the preceding section on $\delta_{hi}(\cdot)$:
\[
\sum_{i=1}^{k} \delta_{hi}(s) \le 1 \quad \text{for each } s \text{ and } h.
\]
This assumption says that the same individual cannot occupy two different
states at the same time.
Suppose that
\[
W_h = \sum_{s=1}^{m_h} \sum_{i=1}^{k} \delta_{hi}(s)
\]
is the number of correct predictions for subject $h$ and
\[
C_n = \sum_{h=1}^{n} W_h
\]
is the total number of correct predictions for the sample. Then $C_n$ can be used
as a measure to select among different proposed models. We first discuss the asymptotic
distribution of $C_n$.
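Theorem 4.2 If $C_n$ is defined as above, then $C_n$ has an approximate normal
distribution with mean $\xi_n = \sum_{h=1}^{n} \mu_h$ and variance $v_n^2 = \sum_{h=1}^{n} \sigma_h^2$,
where
\[
\mu_h = E\Big( \sum_{s=1}^{m_h} \sum_{i=1}^{k} \delta_{hi}(s) \Big) = \sum_{s=1}^{m_h} \sum_{i=1}^{k} \theta_{hi}(s)
\]
and
\[
\sigma_h^2 = V\Big( \sum_{s=1}^{m_h} \sum_{i=1}^{k} \delta_{hi}(s) \Big) = \sum_{s=1}^{m_h} \sum_{i=1}^{k} \theta_{hi}(s) (1 - \theta_{hi}(s)).
\]
That is,
\[
\frac{C_n - \xi_n}{v_n} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty. \tag{4.5.21}
\]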
Proof: Since $m_h$ is finite for all $h$, we suppose that $a_1 \le m_h \le a_2$, so that $W_h$ is
bounded. Since
$1/k \le \theta_{hi}(s) \le b$ for some constant $b$ for all $h$, $i$ and $s$, it implies
Therefore, $v_n^2 = \sum_{h=1}^{n} \sigma_h^2 \to \infty$ as $n \to \infty$. Then note that
which implies
Therefore
t
E
1
(Wh ; !-lh)
3
1
::;
(az - ad
h=l
~
and by Liapounov Theorem,
---+
0
as n
--+ 00
~
as n
--+ 00.
•
Now we define a sample predictor for $X(t_{h,s})$ based on $\hat{P}_{ij}(t_{h,s-1}, t_{h,s})$, the MLE
of $P_{ij}(t_{h,s-1}, t_{h,s})$, as
\[
k^{(n)}_{hi}(s) = \arg\max_{1 \le j \le k} \hat{P}_{ij}(t_{h,s-1}, t_{h,s}),
\]
and the indicator function as
\[
\delta^{(n)}_{hi}(s) =
\begin{cases}
1 & \text{if } X(t_{h,s}) = k^{(n)}_{hi}(s), \\
0 & \text{otherwise.}
\end{cases}
\]
Also let
\[
W^{(n)}_h = \sum_{s=1}^{m_h} \sum_{i=1}^{k} \delta^{(n)}_{hi}(s)
\]
be the number of correct predictions for subject $h$ and
\[
C^{*}_n = \sum_{h=1}^{n} W^{(n)}_h
\]
be the total number of correct predictions for the sample based on the estimated
transition probabilities. We now investigate the relationship between $C_n$ and $C^{*}_n$. Let
us first introduce the following lemma.
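Lemma 4.1 Suppose that $\hat{P}_i(t_{s-1}, t_s)$ is the $k \times 1$ vector with elements $\hat{P}_{ij}(t_{s-1}, t_s)$,
$j = 1, 2, \ldots, k$, where $\hat{P}_{ij}(t_{s-1}, t_s)$ is the MLE of $P_{ij}(t_{s-1}, t_s)$. Let $\Psi_i(s)$ be the subset of subjects
who are in state $i$ at time $t_{s-1}$ and are observed at time $t_s$. If $\max_{1 \le j \le k} P_{ij}(t_{s-1}, t_s)$
is unique for all $s$, then
(a) $\max_{1 \le j \le k} \{\hat{P}_{ij}(t_{s-1}, t_s)\} - \max_{1 \le j \le k} \{P_{ij}(t_{s-1}, t_s)\} \to 0$ almost surely,
(b) $\max_{h \in \Psi_i(s)} \{k^{(n)}_{hi}(s) - k_{hi}(s)\} \to 0$ almost surely,
(c) $\max_{h \in \Psi_i(s)} \{\delta^{(n)}_{hi}(s) - \delta_{hi}(s)\} \to 0$ almost surely,
for all $i$ and $s$ as $n \to \infty$.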
Proof: (a). By Theorem 3.1,
as11,-+oo
for all j = 1,2, ... , k. Thus i\(tS-I, ts) converges to Pi(t s- I , ts) with probability 1. We
denote
J\,n
= maXI<j<k
with probability 1,
J\,n
I Pij(tS-I, ts) -
Pij(t s- I , ts)
I.
For
~ 0 with probability 1.
For 1 :::; j :::; k ,
so that
and
as
11,
-+
00.
Hence,
as
11,
-+
00.
(b). For maxI<j<dPij(tS-I,ts)} is unique and the result of (a) is independent for h,
(b) is straightforward from (a).
(c). For every
>0
E
P(
U
hEW;(s)
U
hEW;(S)
n>no (f )
< P(
n>no (f )
max
18t)(s). -
8hi (S)
I> E)
max
I kt)(s)
- khi(S)
I> E)
-+
as nO(E)
0
--+ 00,
Hence, by (b)
as n
--+ 00.
•
By Lemma 4.1, we have the following theorem:
Theorem 4.3 Let $C^{*}_n$, $\xi_n$ and $v_n^2$ be defined as above; then
\[
W_n = \frac{C^{*}_n - \xi_n}{v_n} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.
\]
Proof: It is sufficient to show that W n - Zn
P{I
W n - Zn
,\,n
--+
0 in probability. First we have
1< E}
((n)
P{I LJh=l Wh - Wh
)
1< E}
Vn
P{I 2: h=l 2::n:1 2:7=1 (8t)(s) - 8hi (S))
1< E}
Vn
< Var(2: h=l 2::n:i 2:7=1 817)(s) - 8hi (S))
V~E2
We note that
n
mh
k
)
Var ( '{;~~817)(s)-8hi(s)
n
mh
k
~ ~~Var (8t)(s) - 8hi (S))
h=ls=li=l
n
+2 ~ ~~ Cov ((8t\s) - 8hi (S)), (8h~!'(s') - 8hlil(S'))) ,
h>h' S,S' i,i'
and
n
~ ~~Cov
h>h' s,s' i,i '
((8tl(s) -
n
8hi (S)), (8h~),(s') - 8h'i l(S')))
~ ~ ~ {Cov(8t)(s), 8h~~(s')) - Cov(8h7 l (s), 8hlil(S'))
h>h' s,s' i,i'
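For given $s$, $\sum_{h,i} \delta^{(n)}_{hi}(s)$ is fixed. Hence, $\mathrm{Cov}(\delta^{(n)}_{hi}(s), \delta^{(n)}_{h'i'}(s')) < 0$. By Lemma 4.1,
$\delta^{(n)}_{hi}(s)$ converges almost surely to $\delta_{hi}(s)$, and $\delta^{(n)}_{hi}(s)$ and $\delta_{hi}(s)$ are bounded. Therefore,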
and
s' then h~7l (s) and hh'i' (s') are asymptotic uncorrelated; if i = i'
and S = s', i.e., h,h' E Wi(S), then the Cov(h~7\s),hhli(S)) are asymptotic positive.
Hence, If 1:
=f i' or s =f
Also hhi(S) and hh'i'(S') are uncorrelated for all h =f h'. Therefore, we have
P{ I W n - Zn
1< f}
--7
0
as n
--7 00.
i.e., $W_n$ is asymptotically normally distributed with mean $\xi_n$ and variance $v_n^2$.
•
In applications, we might compare the $W_n$ values among different models for
a given data set, or the probabilities $P(Z \le W_n)$ (where $Z$ has the standard normal distribution).
The model with the larger $W_n$ (i.e., the larger $P(Z \le W_n)$) fits the given data better
than the one with the smaller value; i.e., if $W_n$ for model 2 is larger than $W_n$ for model 1,
then we might conclude that model 2 fits the given data better than model
1 does.
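The comparison just described can be sketched numerically. The code below standardizes a count of correct predictions and converts it to a standard normal probability. It assumes each observed transition is summarized by a pair (probability of a correct prediction under the model, 0/1 indicator of whether the prediction was correct); the record layout and the function names are hypothetical.

```python
import math


def standardized_count(records):
    """(C - xi) / v for records of (theta, correct) pairs.

    C  = number of correct predictions,
    xi = sum of theta,
    v2 = sum of theta * (1 - theta); each theta must lie strictly
         between 0 and 1 so that v2 > 0.
    """
    c = sum(correct for _, correct in records)
    xi = sum(theta for theta, _ in records)
    v2 = sum(theta * (1.0 - theta) for theta, _ in records)
    return (c - xi) / math.sqrt(v2)


def normal_cdf(w):
    """Standard normal distribution function P(Z <= w)."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


# Toy example: four transitions, each with theta = 0.5, three correct.
records = [(0.5, 1), (0.5, 1), (0.5, 0), (0.5, 1)]
w = standardized_count(records)
print(w, normal_cdf(w))
```

Computing this quantity once per candidate model, on the same data, gives the values to be compared.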
4.6
Error Rate Approach
In this section we estimate the error rate of the sample predictor
$k^{(n)}_{hi}(s)$ in classifying
future observations, and then apply the error rate to compare the fit of different
proposed models. Suppose that $Q[X(t_{h,s}), k^{(n)}_{hi}(s)]$ indicates the correctness of the prediction for subject $h$ at
time $t_s$ given state $i$ at time $t_{s-1}$.
Note that
\[
Q[X(t_{h,s}), k^{(n)}_{hi}(s)] = 1 - \delta^{(n)}_{h,i}(s)
\]
and
$\sum_{i=1}^{k} Q[X(t_{h,s}), k^{(n)}_{hi}(s)] \le 1$. The true error rate (Err) of the predictor $k^{(n)}_{hi}(\cdot)$ is
the probability of incorrectly classifying a future observation $X(t_{h_0,s})$, in other words
the expectation $E\, Q[X(t_{h_0,s}), k^{(n)}_{h_0 i}(s)]$. If we could evaluate the true error rate for two
proposed models, then we would have the relative inefficiency scale of model 2 with respect
to model 1,
\[
\mathrm{REL}(1; 2) = \frac{\mathrm{Err}_2}{\mathrm{Err}_1}.
\]
If $\mathrm{REL}(1; 2) < 1$, we conclude that model 2 fits the data better than
model 1. A similar idea, the proportionate reduction in error (PRE) procedure
proposed by Costner (1965), is defined as
\[
\mathrm{PRE} = \frac{\mathrm{Err}_1 - \mathrm{Err}_2}{\mathrm{Err}_1} = 1 - \mathrm{REL}(1; 2).
\]
However, the value of the error rate depends on the unknown distribution of the future
observation. A simple intuitive way of estimating Err is to apply the predictor to the
observed values $\{x_{h,s}\}_{1 \le h \le n, s = 1, \ldots, m_h}$ retrospectively, and to measure the average value
of the observed error rate. The estimate is the apparent error rate
\[
\mathrm{err} = \frac{\sum_{h=1}^{n} \sum_{s=1}^{m_h} \sum_{i=1}^{k} Q[X(t_{h,s}), k^{(n)}_{hi}(s)]}{\sum_{h=1}^{n} m_h}. \tag{4.6.22}
\]
Since the data are used twice, the apparent error rate tends to underestimate the true
error rate (Err) and to be too optimistic about the predictive power of the predictor.
There are several methods to adjust the apparent error rate. We first discuss the
cross-validation approach, which removes each $X(t_{h,s})$ from the data set used in its
own prediction. Let $k^{(n;h,s)}_{hi}(s)$ be the predictor after $X(t_{h,s})$ has been removed. Then
the cross-validated error rate is
\[
\widehat{\mathrm{Err}}^{(CV)} = \frac{\sum_{h=1}^{n} \sum_{s=1}^{m_h} \sum_{i=1}^{k} Q[X(t_{h,s}), k^{(n;h,s)}_{hi}(s)]}{\sum_{h=1}^{n} m_h}.
\]
Another resampling approach is to estimate the bias of the apparent error rate.
Following Efron (1982, 1983) we define the optimism $O$ by
\[
O = \mathrm{Err} - \mathrm{err}
\]
with expectation $\omega$,
\[
\omega = E\,O = E[\mathrm{Err} - \mathrm{err}].
\]
If $\omega$ is known, then we could use it to adjust the apparent error rate to obtain an
estimate of the true error rate Err. In most cases $\omega$ is not known, and must itself be
estimated from the observed data. We can apply the jackknife or bootstrap approach to
estimate $\omega$. For example, by the jackknife approach, the estimated expected optimism
could be approximated by
'w(JACK)
=
"n
"mh " k
LJh=1 LJs=1 LJi=1
Q[ X (t h,s,
) k(n;h,s)(
)]
h,i
S
Lh=1 mh
n
I
L Lmh Lk
n
Lh=1 mh h=1 s=1
"n
"mh' " k
Q[ (t ) k(n;h',s)( )]
LJh'=1 LJs=1 LJ~=~
X h,s, h',i
3,
~=1
Lh'=1 mh'
and
\[
\widehat{\mathrm{Err}}^{(JACK)} = \hat{\omega}^{(JACK)} + \mathrm{err}.
\]
One can also evaluate the estimated expected optimism by the cross-validation approach,
\[
\hat{\omega}^{(CV)} = \widehat{\mathrm{Err}}^{(CV)} - \mathrm{err}.
\]
Therefore, the adjusted relative inefficiency scale of model 2 with respect to model 1 is
\[
\mathrm{AREL}(1; 2) = \frac{\widehat{\mathrm{Err}}_2^{(CV)}}{\widehat{\mathrm{Err}}_1^{(CV)}}
\quad \text{or} \quad
\mathrm{AREL}(1; 2) = \frac{\widehat{\mathrm{Err}}_2^{(JACK)}}{\widehat{\mathrm{Err}}_1^{(JACK)}}.
\]
Note that, by the Markov property, the MLE of the transition probabilities can be estimated
independently within each time interval $D_q$ defined in Section 3.3. This
feature helps us to reduce the calculation of $k^{(n;h,s)}_{h,i}(s)$.
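The difference between the apparent and the cross-validated error rates in this section can be illustrated with a deliberately simple stand-in for the fitted Markov model. In the sketch below the "model" is just the table of empirical one-step transition counts, the predictor is the most frequent destination from each state (ties go to the lowest state index), and states are indexed from 0. These choices, and all function names, are assumptions made for illustration only.

```python
def fit_predictor(transitions, k):
    """Most frequent destination state for each origin state."""
    counts = [[0] * k for _ in range(k)]
    for i, j in transitions:
        counts[i][j] += 1
    return [max(range(k), key=lambda j: row[j]) for row in counts]


def apparent_error(transitions, k):
    """Error rate when the predictor is fitted and scored on the same data."""
    pred = fit_predictor(transitions, k)
    wrong = sum(1 for i, j in transitions if pred[i] != j)
    return wrong / len(transitions)


def cross_validated_error(transitions, k):
    """Leave-one-out error: each transition is predicted without itself."""
    wrong = 0
    for idx, (i, j) in enumerate(transitions):
        reduced = transitions[:idx] + transitions[idx + 1:]
        pred = fit_predictor(reduced, k)
        if pred[i] != j:
            wrong += 1
    return wrong / len(transitions)


# Toy data: six observed one-step transitions between two states.
transitions = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
err = apparent_error(transitions, 2)
err_cv = cross_validated_error(transitions, 2)
optimism_cv = err_cv - err
print(err, err_cv, optimism_cv)
```

In this toy data set the apparent error rate is lower than the cross-validated one, which is the optimism the adjustment is meant to correct.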
4.7
Remarks
1. The conclusion from the measures based on the predictive sample reuse method may be that
the fit of both, one or neither model is adequate.
2. In Section 4.5, the non-maximum transition probabilities make no contribution to the measure
$C_n$ or $W_n$. We might define another loss
(gain) function which
involves the non-maximum transition probabilities. Suppose that $\eta_{h,i}(s) = (0\, 0 \cdots 1 \cdots 0)'$
is a $k \times 1$ vector with all components zero except for a 1 in the position corresponding to the state occupied at time $s$ by subject $h$, given state $i$ at time $s - 1$.
A slight extension of the square loss function defined by Stone (1974) can be applied
to this study. For given $h$, $i$ and $s$, the loss function is defined as
\[
\ell_{h,i}(s) = \| \eta_{h,i}(s) - P_i(t_{s-1}, t_s) \|^2 = \sum_{j=1}^{k} \left( \eta_{h,i,j}(s) - P_{ij}(t_{s-1}, t_s) \right)^2, \tag{4.7.23}
\]
where $P_i(t_{s-1}, t_s) = (P_{i1}(t_{s-1}, t_s) \cdots P_{ik}(t_{s-1}, t_s))'$ is a $k \times 1$ transition probability
vector and $\eta_{h,i,j}(s) = 1$ if subject $h$ is in state $j$ at time $s$; 0 otherwise. Then a loss
function based on the observed data is
(4.7.24)
The same idea as in Section 4.5 can be applied here, i.e., the fit of a model
is evaluated by
\[
W^{*}_n = \sum_{h=1}^{n} \sum_{s=1}^{m_h} \sum_{i=1}^{k} \ell^{(n)}_{hi}(s).
\]
Based on the loss function, the model with the smaller value of $W^{*}_n$ is better than the model with
the larger value of $W^{*}_n$.
3. Since $P_{ij}(t_{s-1}, t_s)$ is usually unknown in real life, we have to estimate
$P_{ij}(t_{s-1}, t_s)$ from the given data. Under this circumstance, the same data are
used both to construct and to evaluate $\Delta L_{hi}(s)$, $\delta^{(n)}_{hi}(s)$ and $\ell^{(n)}_{hi}(s)$, so we will overestimate
the fit of the models. However, if we can assume that the amount of optimism
is equal for each model, the predictive sample reuse approach is still valid.
4. The estimated expected optimism not only helps us to adjust the apparent error rate, but also lets us check the assumption that the amount of overfitting
is equal across models for a given data set.
5. The Pearson type statistic and the Cochran-Mantel-Haenszel type statistic are
based on predicted probabilities, and the predictive sample reuse approaches are
based on predicted events. It is possible that one would reach different conclusions
from different measures. Some examples can be found in the book of Hildebrand et
al. (1977).
86
Chapter 5
Applications
In this chapter we present two real data sets, taken from the literature and from the Demographic and Health Surveys project respectively, and see how the statistical methodology works for these data sets. Although the statistical methodology is most often
used in the health sciences, it is also applicable to multistate reliability theory. For
example, an electrical power generation system for two nearby oil rigs was described by
Natvig et al. (1986). There are five components in the system: three generators,
one controller and one under-sea cable. The availability and the unavailability of
the system to a specific state in a certain time interval are of interest. The bounds of
the availability and unavailability depend on the performance of the components.
Suppose that $S = \{0, \ldots, M\}$ is the set of states of the system; the $M + 1$ states
represent successive levels of performance ranging from the perfect functioning level
$M$ down to the complete failure level 0. Let the performance process of the $i$th component
be a stochastic process $\{X_i(t), t \in [0, \infty)\}$. They modeled each component
performance process as a three-state continuous time homogeneous Markov process,
then evaluated the bounds of the availability and the unavailability of the system.
The proposed methodology can be applied directly to model such a Markov process
with nonhomogeneity.
5.1
Breast Cancer Data
The data, given by De Stavola (1988), concern thirty-seven breast cancer patients treated for spinal metastases at the London Hospital. Their ambulatory status, defined by their
ability or inability to walk without aid, was recorded before treatment began and at
3 months, 6 months, 12 months, 24 months and 60 months. However, not all patients
were observed at the same time points. Some of the patients were censored or had
no status recorded at an observation time point. Let state 0 represent death, state
1 inability to walk, and state 2 ability to walk. The data are shown in Table 5.1 and
can be viewed as discrete time observations of a continuous time process with three
states.
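We consider fitting the data set with the non-homogeneous Markov model (NHM)
with piecewise transition intensity and with the time homogeneous Markov model (HM).
Suppose that $\alpha_{ijs}$, $i = 1, 2$ and $j = 0, 1, 2$, are the transition intensities from state $i$ to
state $j$ in the time interval $(t_{s-1}, t_s]$. Then the pseudo transition probabilities are
\[
P_{10}(t_{s-1}, t_s) = \frac{\alpha_{10s}}{\alpha_{12s} + \alpha_{10s}} \left[ 1 - e^{-(\alpha_{12s} + \alpha_{10s})(t_s - t_{s-1})} \right]
+ \frac{\alpha_{12s}\alpha_{20s}\, e^{-(\alpha_{12s} + \alpha_{10s})(t_s - t_{s-1})}}{A_1}
\times \left\{ \frac{1}{B_1} \left( 1 - e^{-B_1 (t_s - t_{s-1})} \right) + \frac{1}{A_1 - B_1} \left( e^{-A_1 (t_s - t_{s-1})} - e^{-B_1 (t_s - t_{s-1})} \right) \right\},
\]
and
\[
P_{20}(t_{s-1}, t_s) = \frac{\alpha_{20s}}{\alpha_{21s} + \alpha_{20s}} \left[ 1 - e^{-(\alpha_{21s} + \alpha_{20s})(t_s - t_{s-1})} \right]
+ \frac{\alpha_{21s}\alpha_{10s}\, e^{-(\alpha_{21s} + \alpha_{20s})(t_s - t_{s-1})}}{A_2}
\times \left\{ \frac{1}{B_2} \left( 1 - e^{-B_2 (t_s - t_{s-1})} \right) + \frac{1}{A_2 - B_2} \left( e^{-A_2 (t_s - t_{s-1})} - e^{-B_2 (t_s - t_{s-1})} \right) \right\},
\]
where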
Table 5.1 Breast Cancer Data - Status at the follow-up times 0, 3, 6, 12, 24 and 60 (in months)

Patient  Recorded statuses (in time order)    Patient  Recorded statuses (in time order)
1        2 2 2 2 2 0                          20       2 1 1 1 0
2        2 2 0                                21       1 0
3        1 0                                  22       2 2 2 2 2
4        2 1 2 2 2 0                          23       1 0
5        2 2 2 2 1 0                          24       2 1 NK 0
6        2 2 2 0                              25       2 2 2 2 0
7        2 1 2 2                              26       1 1 1 1 0
8        2 1 0                                27       2 1 1 1 0
9        2 2 2                                28       2 2 2 2 0
10       1 0                                  29       2 1 1 0
11       1 0                                  30       2 2 1 0
12       2 2                                  31       2 1 1 1
13       1 0                                  32       2 2 1 1
14       2 2                                  33       2 1 1 0
15       1                                    34       2 2 2 0
16       2 2 2                                35       2 2 0
17       1 1 0                                36       1 0
18       2 0                                  37       1 0
19       1 0

Seven further entries (2, 2, 0, 2, 0, 0, 0) cannot be attributed to a patient or follow-up time.
Table 5.2. MLE of $\alpha_{ijs}$

                          states
Model  Period (months)    1 -> 0            1 -> 2                  2 -> 0             2 -> 1
HM     0 - 60             0.1831 (0.0766)   0.0410 (0.0386)         0.1530 (0.0293)    0.1023 (0.0023)
NHM    0 - 3              0.2125 (0.0483)   0.0843 (0.0268)         0.1113 (0.0354)    1 x 10^-7 (0.0136)
NHM    3 - 6              0.0312 (0.0223)   0.0285 (0.0650)         0.0231 (0.0220)    0.0130 (0.0142)
NHM    6 - 12             0.1131 (0.0715)   2 x 10^-16 (0.2387)     0.1104 (0.0206)    0.0318 (0.0159)
NHM    12 - 60 †          0.2433 (0.0174)   3 x 10^-15 (0.0102)     0.1689 (0.02397)   0.0043 (0.01376)

† assuming the subjects homogeneous in this period
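$A_1 = (\alpha_{21s} + \alpha_{20s}) - (\alpha_{12s} + \alpha_{10s})$, $B_1 = -(\alpha_{12s} + \alpha_{10s})$, $B_2 = -(\alpha_{21s} + \alpha_{20s})$
and
$A_2 = -A_1$.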
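The pseudo transition probability from state 1 to death can be checked numerically. The sketch below evaluates the closed form for $P_{10}$, taking $A_1$ as the difference between the total intensities out of state 2 and out of state 1 (an assumption of this sketch, made so that the closed form agrees with the direct path 1 to 0 plus the two-step path 1 to 2 to 0), and compares it with a midpoint-rule integration over the time of the intermediate jump. The function names and intensity values are hypothetical.

```python
import math


def pseudo_p10(a10, a12, a20, a21, dt):
    """P(1 -> 0 within dt) allowing at most two transitions (1->0 or 1->2->0)."""
    lam1 = a10 + a12          # total intensity out of state 1
    lam2 = a20 + a21          # total intensity out of state 2
    A1 = lam2 - lam1          # assumed definition; requires lam1 != lam2
    B1 = -lam1
    direct = a10 / lam1 * (1.0 - math.exp(-lam1 * dt))
    via_2 = (a12 * a20 * math.exp(-lam1 * dt) / A1) * (
        (1.0 - math.exp(-B1 * dt)) / B1
        + (math.exp(-A1 * dt) - math.exp(-B1 * dt)) / (A1 - B1)
    )
    return direct + via_2


def p10_by_quadrature(a10, a12, a20, a21, dt, n=20000):
    """Same probability by midpoint integration over the time u of the 1->2 jump."""
    lam1, lam2 = a10 + a12, a20 + a21
    direct = a10 / lam1 * (1.0 - math.exp(-lam1 * dt))
    h = dt / n
    total = 0.0
    for m in range(n):
        u = (m + 0.5) * h
        # jump 1->2 at u, then 2->0 somewhere in the remaining time dt - u
        total += a12 * math.exp(-lam1 * u) * (a20 / lam2) * (1.0 - math.exp(-lam2 * (dt - u)))
    return direct + total * h


# Hypothetical intensities (per month) over a 3-month interval.
closed_form = pseudo_p10(0.18, 0.04, 0.15, 0.10, 3.0)
numeric = p10_by_quadrature(0.18, 0.04, 0.15, 0.10, 3.0)
print(closed_form, numeric)
```

The closed form is undefined when the two total intensities coincide, so an implementation would need a separate limiting expression for that case.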
Based on the quasi scoring method, the results are shown in Table 5.2. Under
the time homogeneous Markov model, the estimated transition intensity from non-ambulant
status to death is higher than the other transition intensities. This is still true for
the non-homogeneous model, although the intensities differ considerably from period
to period. However, the non-homogeneous model also suggests that transitions from
ambulant status to death are increasing over time. The test of homogeneity produces
the Wald statistic $Q_W = 21.43$ with 12 degrees of freedom. Although this result is
based on a small sample, there is fairly strong evidence against
the homogeneity assumption. The validation scores for these two models
are $C_{HM,n} = 58$ out of 97 ($Z_n = -0.004$) for the homogeneous model and $C_{NHM,n} = 65$
($Z_n = 0.015$) for the non-homogeneous model. Based on these prediction measures,
we might conclude that the non-homogeneous Markov model is better than the
homogeneous Markov model.
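The p-value corresponding to the Wald statistic can be obtained from the chi-square distribution with 12 degrees of freedom. The following minimal sketch uses the closed-form upper tail that is available when the number of degrees of freedom is even; the function name is hypothetical, and the calculation assumes that the asymptotic chi-square approximation is adequate for this small sample.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) of a chi-square variable with an even number of df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sum_{j=0}^{df/2 - 1} (x/2)^j / j!, built up term by term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(21.43, 12)
print(f"p-value = {p_value:.4f}")
```

The resulting p-value lies a little below 0.05, so the evidence against homogeneity is present but not overwhelming at conventional levels.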
In the survival analysis of the breast cancer data, the proposed estimators are compared with the Kaplan and Meier (1958) estimators and the Peto (1973) estimators,
which are shown in Figure 5.1. In that figure, the proposed survival probability
estimators lie between the Kaplan-Meier estimators and the Peto estimators, as
expected. The Kaplan-Meier method assumes that the failures occur at the end
of the observation periods, so the survival probabilities
are overestimated. The Peto method tends to assume that the failures occur at the
beginning of the observation periods, so the survival probabilities might be underestimated. The proposed estimates assume that the failure rates are constant
within the observation periods, which is more reasonable than the other assumptions. Note also
that the proposed estimators smooth the Peto estimators and the Kaplan-Meier
estimators if there is no right-censored observation in the data set.
The piecewise intensity assumption in the breast cancer study seems to be reasonable for the first year since the ambulatory status ordinarily would not change very
quickly. However, after one year, the follow-up periods were too long (12 months and
36 months) to assume constant transition intensity. A linear function of intensities
might work better for follow-up periods after one year.
5.2
Birth History Data
In this section we apply the proposed statistical methodology to a data set taken
from the Demographic and Health Surveys project. The investigator followed up
1200 women in Indonesia, aged between 15 and 49 years, to record the
timing of the women's experience with pregnancy, childbearing, contraceptive use,
the duration of breast-feeding, postpartum amenorrhea and other related experiences
for 69 months (from Jan. 1986 to Sep. 1991). Seven hundred and eighteen women
never used contraceptive methods during the survey period, and 418 women had one
or more live births within the 69 months. Among those 418 women, 271 women
had two births, 55 women had three births and 4 women had four births. In this
application, we use this data set to understand the birth process among
women who never used any contraceptive method within the survey period.
This application uses some definitions and assumptions given by Yashin et al.
(1994). A woman is fecund if she is capable of producing a live birth. Fecundability
is defined as the probability of a conception in a one unit time interval (the menstrual
cycle is approximated by a one unit time interval of one month) for a sexually active
woman who is not pregnant, postpartum infecund or using a contraceptive method.
The period from a birth until ovulation is called the postpartum infecund period.
During the postpartum infecund period a woman cannot conceive, and she is defined
as postpartum infecund. Usually, the transition times from the postpartum infecund
state to the fecund state are unobserved. Often, data on pregnancy
wastage are not available, and analyses are confined to live-birth conceptions, i.e.,
effective fecundability.
Sterility is defined as the inability of a non-contraceptive
using, sexually active woman to have a live birth. From a biological point of view,
after childbearing a woman may enter permanent sterility or
temporary sterility (the postpartum infecundity). Since the data did not provide
information about permanent sterility, we assume that all of the women enter
the postpartum infecund state after a live birth. Furthermore, we do not
consider miscarriages, and only live births can cause postpartum infecundity.
5.2.1
Model of Postpartum Infecundity and Effective Fecundability
The model of postpartum infecundity and effective fecundability can be described in
terms of a three-state stochastic process where "PI" is the postpartum infecund state,
"F" is the fecund state, and "P" is the pregnant state. Note that the empirical data
available usually contain information only about the dates of each birth of each individual woman. To simplify the calculations, some studies have set the duration of the
postpartum infecund period to a fixed but unknown parameter which can be estimated
from the data (e.g., Yashin et al. (1994)). Since this data set provides information
about postpartum amenorrhea, the transition from the postpartum infecund state to
the fecund state is observed. We can evaluate the transition from the postpartum infecund state to the fecund state and from the fecund state to conception directly.
Based on all this information, we also infer a transition probability from the fecund
state to conception in cases where the fecund state is unobserved.
We first consider the proposed piecewise Markov model based on the full information. Suppose that $\alpha_s^{PIF}$ and $\alpha_s^{FP}$ are the piecewise transition intensities from the
"PI" state to the "F" state and from the "F" state to the "P" state, respectively, in the
time interval $(t_{s-1}, t_s]$. Then the transition intensity matrix for a woman after a live
birth is
\[
\begin{array}{c|ccc}
   & PI & F & P \\ \hline
PI & -\alpha_s^{PIF} & \alpha_s^{PIF} & 0 \\
F  & 0 & -\alpha_s^{FP} & \alpha_s^{FP} \\
P  & 0 & 0 & 0
\end{array}
\]
and the probability of transition from the "PI" state to the "F" state is
(5.2.1 )
and the probability of transition from the "F" state to the "P" state is
(5.2.2)
Based on the full information, the maximum likelihood estimators are shown in Figures
5.2 - 5.7.
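To illustrate how an intensity matrix for the states PI, F and P translates into transition probabilities over one interval with constant intensities, the following sketch computes the matrix exponential by a truncated power series. It is only an illustration of the exact Markov transition matrix, not necessarily the expressions (5.2.1) and (5.2.2); the function names and the two intensity values are hypothetical.

```python
import math


def generator(a_pif, a_fp):
    """Intensity matrix for states (PI, F, P); P is treated as absorbing."""
    return [[-a_pif, a_pif, 0.0],
            [0.0, -a_fp, a_fp],
            [0.0, 0.0, 0.0]]


def mat_mul(x, y):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transition_matrix(q, dt, terms=60):
    """exp(q * dt) by a truncated power series (adequate for small q * dt)."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        # term becomes (q * dt)^n / n!
        term = mat_mul(term, [[q[i][j] * dt / n for j in range(3)] for i in range(3)])
        for i in range(3):
            for j in range(3):
                result[i][j] += term[i][j]
    return result


# Hypothetical monthly intensities: PI -> F is 0.35, F -> P is 0.04.
p = transition_matrix(generator(0.35, 0.04), 1.0)
for row in p:
    print([round(v, 4) for v in row])
```

Each row of the result sums to one, and the entry in row F, column P equals one minus the probability of remaining fecund over the interval.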
To compare the probability of transition from the fecund state to conception using the existing methodology and using the proposed piecewise model, we now
disregard the known observations of the transition from the postpartum infecund state to
the fecund state. First, we follow the methodology proposed by Yashin et al. (1994).
Suppose that a woman at the birth of her $i$th child is in the postpartum infecund
state for some fixed but unknown time $\tau_i$ specific to parity $i$. Let $t_i^j$, $i = 0, 1, \ldots, I$,
$j = 1, 2, \ldots, n_i$, be the observed intervals between the $i$th birth and the $(i + 1)$st
conception for the $n_i$ women. Assuming that the conditional time to conception given
$\tau_i$ is exponentially distributed with parity specific rate $h_{i+1}$ (transition from parity $i$ to
the $(i + 1)$st conception), the MLEs of $\tau_i$ and $h_i$ are given by
\[
\hat{\tau}_i = \min_j \{ t_i^j \}, \qquad i = 1, 2, \ldots, I - 1,
\]
and
\[
\hat{h}_1 = \frac{n_0}{\sum_{j=1}^{n_0} t_0^j}, \qquad
\hat{h}_{i+1} = \frac{n_i}{\sum_{j=1}^{n_i} t_i^j - n_i \hat{\tau}_i}.
\]
Since the data provide only a 69-month window, we assume that $\tau_i$ is the duration
of the postpartum infecund state from the $i$th birth to the $(i + 1)$st conception during
the survey period for each individual woman. Under this assumption, the maximum
likelihood estimators are $\hat{\tau}_1 = 2$, $\hat{h}_1 = 0.0345$, $\hat{h}_2 = 0.0390$ and $\hat{\tau}_2 = 2$, $\hat{h}_3 = 0.0475$.
The results are shown in Figures 5.6 - 5.7.
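The estimators of this kind are those of a shifted exponential distribution: the shift is estimated by the smallest observed interval and the rate by the number of intervals divided by the total time in excess of the shift. A minimal sketch follows; the function name and the interval data are hypothetical and are not taken from the survey.

```python
def shifted_exponential_mle(intervals):
    """MLE of (tau, h) when interval - tau is exponential with rate h."""
    n = len(intervals)
    tau_hat = min(intervals)
    excess = sum(intervals) - n * tau_hat   # total time beyond the estimated shift
    if excess <= 0:
        raise ValueError("all intervals are equal; the rate is not estimable")
    return tau_hat, n / excess


# Hypothetical birth-to-conception intervals (months) for one parity.
intervals = [2, 5, 9, 14, 30]
tau_hat, h_hat = shifted_exponential_mle(intervals)
print(tau_hat, h_hat)
```

Because the shift estimate is the sample minimum, a single very short interval pulls the estimated infecund period down for every woman of that parity, which is one reason to prefer direct observation of the transition when it is available.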
Second, we revise the Yashin et al. (1994) model. Suppose that we follow
the Yashin et al. (1994) methodology but assume that the conditional time to
conception given $\tau_i$ is exponentially distributed with the parity specific rate $h_{i+1}(t) =
h_{i+1,s}$ for all $t \in [t_{s-1}, t_s)$. We thus use a piecewise model similar to (5.2.2), but
with the transition time from the postpartum infecund state to the fecund state set equal
to $\tau_i = 2$ for all women, to estimate the transition intensities and the transition
probabilities. These results are also shown in Figures 5.5 - 5.7.
From Figures 5.6 - 5.7, we find that the estimated transition probabilities given
by Yashin et al. (1994) are always larger than the estimators given by the proposed
piecewise model. However, the estimators given by the revised piecewise model are
very close to those of the proposed piecewise model using full information. From Figures 5.2
- 5.4, around 70 percent of women transferred from the postpartum infecund state to
the fecund state in about 2 months, which is equal to the MLE of $\tau_i$. That is why
the estimators given by the revised model are close to those given by the proposed
piecewise model in this data set. Therefore, we conclude that the assumption of a
constant transition rate to conception is the main cause of the overestimation of the
probability of transition from the fecund state to conception. In fact, in clinical
studies the conception rate varies a lot among women. In particular, after a certain time
(around 15 months in this data set) the conception rate decreases. This also
explains why the difference between the estimators of Yashin et al. (1994) and the
proposed piecewise model increases over time.
Table 5.3. MLE of $\beta$

            all births          1st - 2nd birth     2nd - 3rd birth
$\beta_1$   -0.0151 (0.0043)    -0.0134 (0.0046)    -0.0249 (0.0110)
$\beta_2$   -0.0235 (0.0047)    -0.0238 (0.0491)    -0.0290 (0.0150)
Incorporation of Breast-feeding
It is well known that breast-feeding is a very strong determinant of the duration
of the postpartum infecund period, and of the birth process. Women who do
not breast-feed are amenorrheic for an average of 1.5 - 2.0 months after a birth, while
the mean length of amenorrhea may exceed 20 months in populations with extensive
breast-feeding customs (Potter and Kobrin, 1981). Thus, variations in breast-feeding
patterns, and hence in the duration of the postpartum infecund period, are crucial in
explaining variations in fertility. We let the transition intensities incorporate
the duration of breast-feeding $z$ as
and
The estimators of $\beta$ are shown in Table 5.3. The negative signs of the estimators
confirm the previous description.
Figure 5.1 Estimated Survival Probability of Breast Cancer [plot not reproduced; curves: Proposed, Peto, K-M; horizontal axis: Months, 0 to 60]
Figure 5.2 Estimated Transition from Postpartum Infecund state to Fecund state (all births) [plots not reproduced; two panels; horizontal axes: months, 0 to 60]
Figure 5.3 Estimated Transition from Postpartum Infecund state to Fecund state (after first birth) [plots not reproduced; two panels; horizontal axes: months, 0 to 60]
Figure 5.4 Estimated Transition from Postpartum Infecund state to Fecund state (after second birth) [plots not reproduced; two panels; horizontal axes: months, 0 to 30]
Figure 5.5 [plots not reproduced; the caption is illegible in the source; panels: Transition Intensity and Transition Probability]
Figure 5.6 Estimated Transition from Fecund state to conception (after first birth) [plots not reproduced; curves: Full method, Revised method, Yashin; two panels; horizontal axes: months, 0 to 50]
Figure 5.7 Estimated Transition from Fecund state to conception (after second birth) [plots not reproduced; curves: Full method, Revised method, Yashin; two panels; horizontal axes: months, 0 to 40]
Chapter 6
Discussion and Further Research
The main contribution of this paper is to propose a methodology for analyzing discrete time categorical observations, which often occur in medical studies, social science
research, and demography. Discrete time categorical observations can also be analyzed by the generalized linear model approach under some appropriate assumptions,
for example, the weighted least squares approach of Grizzle et al. (1969) and Woolson
and Clarke (1984). However, the time effects have not been well studied using linear
model approaches.
An identifiability problem occurs if one of the states is never observed
at a certain time point. For example, in the usual birth history data only dates of
births are available; the fecund state is unobservable. Since only the pregnant state
can be observed exactly, the probability of transition from the postpartum infecund
state to the pregnant state is
(6.0.1)
The intensities of transition from the postpartum infecund state to the fecund state
and from the fecund state to the pregnant state cannot be estimated simultaneously. Further assumptions are needed to resolve the identifiability problem. Another
example is in clinical research. Patients with severe health problems tend to visit doctors frequently, i.e., their observed periods between visits are shorter than for others.
The assumption about the transition occurrence might depend on the various states
and the length of the periods. Therefore, the assumption of at most two transition
occurrences within an observed interval would be violated in this situation.
The adequacy of the proposed models can be evaluated by the proposed relative
measures based on the predicted probabilities and the predicted events. There is
currently no standard answer as to which relative measures are more appropriate
for assessing goodness of fit. A numerical study to evaluate the different measures
in Chapter 4 might be an important next step in the investigation. However,
the choice might sometimes depend on a non-statistical point of view. For example, in the
subarachnoid intracerebral hemorrhage (SICH) study, the predicted survival rate
for patients with certain symptoms is 0.3. Hospital managers are quite concerned
about optimal medical resource allocation since medical care resources are limited and
treatments (ICU and surgery) are very expensive. Suppose that there are ten patients
with certain identical symptoms; 3 patients would be expected to survive. The question
is which 3 patients would survive. The cumulative predicted probability might not
be useful for a hospital manager. However, the predicted event approaches provide
a clear-cut prediction of individual outcomes, which medical resource managers need
in order to make decisions about patients in this particular study.
The proposed relative measures can also be applied for general purposes. For
example, when risk sets are misidentified in the counting processes approach, one can use
the relative measures to assess the inefficiency. For the different order assumptions
of the Markov process, as well as for the assumption about the number of transition
occurrences during the observed periods, the relative measures can help
researchers to understand the impact of the assumptions. For the evaluation of the
categorized continuous response effect and for sensitivity analysis, the relative
measures provide a way to compare different models simultaneously.
For non-panel studies, the theoretical properties of the
Cochran-Mantel-Haenszel type statistic and the graphic method in Chapter 4 are not
well investigated yet. The asymptotic pattern might be interesting in some particular
situations. For example, if we apply the available observations retrospectively, the
graphic method proposed in Section 4.4 can be extended to non-panel studies.
Let $Y_{ni}(s)$ be the size of the risk set at time $s$ for subjects with state $i$ at the previous
observed time point. Note that the previous observed time points may not be equal
for all subjects in the risk set. Then the estimated intensity function is
\[
\hat{\alpha}_{ni}(s) = J_{ni}(s) \frac{\Delta N_{ni}(s)}{Y_{ni}(s)} ,
\]
where $J_{ni}(s) = I(Y_{ni}(s) > 0)$. For a given $s$, we have
\[
\hat{\alpha}_n(s) = \sum_{i=1}^{k} \hat{\alpha}_{ni}(s).
\]
The same procedure as in Section 4.4 can be applied to assess the adequacy of the
models. However, further theoretical work is needed to study the
large sample properties of the statistics.
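The risk-set estimator above is an occurrence/exposure ratio that is set to zero whenever the risk set is empty. A minimal sketch with hypothetical counts is given below; the function names and the numbers are illustrative only.

```python
def estimated_intensity(delta_n, y):
    """J * dN / Y, where J = 1 if the risk set is non-empty and 0 otherwise."""
    return delta_n / y if y > 0 else 0.0


def pooled_intensity(delta_ns, ys):
    """Sum of the state-specific estimates over the k previous states."""
    return sum(estimated_intensity(d, y) for d, y in zip(delta_ns, ys))


# Hypothetical counts at one time s for k = 3 previous states:
# observed transitions and risk-set sizes.
delta_ns = [3, 0, 1]
ys = [12, 0, 8]
print(pooled_intensity(delta_ns, ys))
```

The indicator guards against division by zero, so a state with an empty risk set simply contributes nothing at that time point.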
Because of the periodic observation scheme, the covariates are missing at certain time points. If the observation times for each subject are completely
random, we can assume the missing covariates are missing at random. The different
imputation methods proposed in Section 2.4 can be evaluated by numerical study. The length of the observed interval and the proportion of missing
values might influence the imputation effect. Further numerical study is important
to investigate possible patterns based on the length of the observed interval and the
proportion of missing values.
Bibliography
Aalen, O. O. (1978). Nonparametric inference for a family of counting processes, The
Annals of Statistics 6: 701-726.
Aalen, O. O. (1988). Heterogeneity in survival analysis, Statistics in Medicine 7: 1121-1137.
Andersen, P. K. (1988). Multistate models in survival analysis: a study of nephropathy and mortality in diabetes, Statistics in Medicine 7: 939-944.
Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993). Statistical Models
Based on Counting Processes, Springer-Verlag, New York.
Andersen, P. K. and Gill, R. D. (1985). Counting process models for life history data:
a review (with discussion), Scandinavian Journal of Statistics 12: 97-158.
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd
edition, Wiley, New York.
Anderson, T. W. and Goodman, L. A. (1957). Statistical inference about Markov
chains, The Annals of Mathematical Statistics 27: 89-110.
Bhat, U. N. (1972). Elements of Applied Stochastic Processes, Wiley, New York.
Blossfeld, H. P., Hamerle, A. and Mayer, K. U. (1989). Event History Analysis,
Lawrence Erlbaum Associates, Inc., New Jersey.
Chiang, C. L. (1964). A stochastic model of competing risks of illness and competing
risks of death, Stochastic Models in Medicine and Biology pp. 323-354.
Chiang, C. L. (1979). Survival and stages of disease, Mathematical Biosciences
43: 159-171.
Chiang, C. L. (1980). An Introduction to Stochastic Processes and Their Applications,
Krieger, New York.
Clayton, D. (1988). The analysis of event history data: a review of progress and
outstanding problems, Statistics in Medicine 7: 819-841.
Clayton, D. G. and Cuzick, J. (1985). Multivariate generalizations of the proportional
hazards model (with discussion), Journal of the Royal Statistical Society, Ser. A
148: 82-117.
Costner, H. L. (1965). Criteria for measures of association, American Sociological
Review 30: 341-353.
Cox, D. R. (1972). Regression models and life-tables (with discussion), Journal of the
Royal Statistical Society, Ser. B 34: 187-220.
Cox, D. R. (1986). Some remarks on semi-Markov processes in medical statistics, in
J. Janssen (ed.), Semi-Markov models. Theory and Application, Plenum Press,
New York and London, pp. 411-421.
Cox, D. R. and Miller, H. D. (1965). The Theory of Stochastic Processes, Wiley, New
York.
Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data, Chapman and Hall,
London.
Cramer, H. (1946). Mathematical Methods of Statistics, Princeton University Press,
Princeton.
De Gruttola, V. and Lagakos, S. W. (1989). Analysis of doubly-censored survival
data, with application to AIDS, Biometrics 45: 1-11.
De Stavola, B. L. (1988). Testing departures from time homogeneity in multistate
Markov processes, Applied Statistics 37: 242-250.
Dvoretsky, A. (1972). Asymptotic normality for sums of dependent random variables,
Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2: 513-535.
Efron, B. (1982). Maximum likelihood and decision theory, The Annals of Statistics
10: 340-356.
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on
cross-validation, Journal of the American Statistical Association 78: 316-331.
Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data, 2nd edition,
The MIT Press, Cambridge, Massachusetts.
Fleming, T. R. (1978). Nonparametric estimation for nonhomogeneous Markov processes in the problem of competing risks, The Annals of Statistics 6: 1057-1070.
Frydman, H. (1992). A nonparametric estimation procedure for a periodically observed three-state Markov process, with application to AIDS, Journal of the Royal
Statistical Society, Ser. B 54: 853-866.
Gentleman, R. C., Lawless, J. F., Lindsey, J. C. and Yan, P. (1994). Multi-state
Markov models for analysing incomplete disease history data with illustrations
for HIV disease, Statistics in Medicine 13: 805-821.
Gill, R. D. (1992). Multistate life-tables and regression models, Mathematical Population Studies 3(4): 259-276.
Grizzle, J. E., Starmer, C. F. and Koch, G. G. (1969). Analysis of categorical data
by linear models, Biometrics 25: 489-504.
Grüger, J., Kay, R. and Schumacher, M. (1991). The validity of inferences based on
incomplete observations in disease state models, Biometrics 47: 595-605.
Hamerle, A. (1989). Multiple spell regression for duration data, Applied Statistics
38: 127-138.
Hildebrand, D. K., Laing, J. D. and Rosenthal, H. (1977). Prediction analysis of cross
classifications, Wiley, New York.
Hoem, J. M. (1985). Weight, misclassification, and other issues in the analysis of
survey samples of life histories, in J. Heckman and B. Singer (eds), Longitudinal
Analysis of Labor Market Data, Cambridge: Cambridge University Press, chapter 5.
Holford, T. R. (1976). Life tables with concomitant information, Biometrics 32: 587-597.
Jager, J. C. and Ruitenberg, E. J. (eds) (1988). Statistical Analysis and Mathematical
Modelling of AIDS, Oxford, London.
Kalbfleisch, J. D. and Lawless, J. F. (1985). The analysis of panel data under a Markov
assumption, Journal of the American Statistical Association 80: 863-871.
Kalbfleisch, J. D. and Lawless, J. F. (1988). Likelihood analysis of multistate models
for disease incidence and mortality, Statistics in Medicine 7: 149-160.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time
Data, Wiley, New York.
Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete
observations, Journal of the American Statistical Association 53: 457-481.
Kay, R. (1986). A Markov model for analyzing cancer markers and disease states in
survival studies, Biometrics 42: 855-865.
Klein, J. P., Klotz, J. H. and Grever, M. R. (1984). A biological marker model for
predicting disease transitions, Biometrics 40: 927-936.
Lagakos, S. W. (1988). The loss in efficiency from misspecifying covariates in proportional hazards regression models, Biometrika 75: 156-160.
Lagakos, S. W., Sommer, C. J. and Zelen, M. (1978). Semi-Markov models for partially censored data, Biometrika 65: 311-317.
Longini, I. M., Clark, W. S., Byers, R. H., Ward, J. W., Darrow, W. W., Lemp, G. F.
and Hethcote, H. W. (1989). Statistical analysis of the stages of HIV infection
using a Markov model, Statistics in Medicine 8: 831-843.
Manton, K. G. and Stallard, E. (1988). Chronic Disease Modelling, Charles Griffin
and Company LTD, London.
Namboodiri, K. and Suchindran, C. M. (1987). Life table techniques and their applications, Academic Press, New York.
Natvig, B., Sormo, S., Holen, A. T. and Hogasen, G. (1986). Multistate reliability
theory - a case study, Adv. Appl. Prob. 18: 921-932.
Peto, R. (1973). Experimental survival curves for interval-censored data, Applied
Statistics 22: 86-91.
Potter, R. G. and Kobrin, F. E. (1981). Distributions of amenorrhea and anovulation,
Population studies 35: 85-94.
Prentice, R. L. and Gloeckler, L. A. (1978). Regression analysis of grouped survival
data with application to breast cancer data, Biometrics 34: 57-67.
Rao, C. R. (1973). Linear Statistical Inference and its Applications (2nd Ed.), Wiley,
New York.
Rucker, G. and Messerer, D. (1988). Remission duration: an example of interval-censored observations, Statistics in Medicine 7: 1139-1145.
Sen, P. K. (1988). Combination of statistical tests for multivariate hypotheses against
restricted alternatives, in S. Dasgupta and J. K. Ghosh (eds), Proceedings of
the International Conference on Advances in Multivariate Statistical Analysis,
Indian Statistical Institute, Calcutta, pp. 377-402.
Sen, P. K. and Singer, J. M. (1993). Large Sample Methods in Statistics: An Introduction with Applications, Chapman and Hall, New York.
Sengupta, D. and Jammalamadaka, S. R. (1993). Inference from discrete life history
data: a counting process approach, Scandinavian Journal of Statistics 20: 51-62.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics, Wiley,
New York.
Shah, K. R. and Sinha, B. K. (1989). Lecture Notes in Statistics 54: Theory of Optimal
Designs, Springer-Verlag, New York.
Stone, M. (1974). Cross-validation and multinomial prediction, Biometrika 61: 509-515.
Thompson, W. A. (1977). On the treatment of grouped observations in life studies,
Biometrics 33: 463-470.
Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with
censored data, Journal of the American Statistical Association 69: 169-173.
Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped
censored and truncated data, Journal of the Royal Statistical Society, Ser. B
38: 290-296.
Woolson, R. F. and Clarke, W. R. (1984). Analysis of categorical incomplete longitudinal data, Journal of the Royal Statistical Society, Ser. A 147: 87-99.
Yan, P. (1992). Some models for multistate life history analysis with incomplete
data. PhD thesis, Department of Statistics and Actuarial Science,
University of Waterloo, Waterloo, Canada.
Yashin, A. I., Iachine, I. A., Andreev, K. F. and Larsen, U. (1994). Multistate
model of postpartum infecundity, fecundability and sterility by age and parity:
methodological issues. Unpublished paper.