Wu, Margaret C. and Carroll, Raymond J. (1987). Estimation and Comparison of Changes in the Presence of Informative Right Censoring by Modeling the Censoring Process.

Estimation and Comparison of Changes in the Presence of Informative Right Censoring by Modeling the Censoring Process

Margaret C. Wu
Biometrics Research Branch
National Heart, Lung, and Blood Institute
Bethesda, Maryland 20892

Raymond J. Carroll¹
Department of Statistics
University of North Carolina
Chapel Hill, NC 27514

¹Research supported by Air Force Office of Scientific Research contract AFOSR-F-49620-85-C-0144.
In estimating and comparing the rates of change of a continuous variable between two groups, the unweighted averages of individual simple least squares estimates from each group are often used. Under a linear random effects model, when all individuals have complete observations at identical time points, these statistics are maximum likelihood estimates for the expected rates of change. However, with censored or missing data, these estimates are no longer efficient when compared to generalized least squares estimates. When, in addition, the right censoring process is dependent upon the individual rates of change (i.e., informative right censoring), the generalized least squares estimates will be biased. Likelihood ratio tests for informativeness of the censoring process and maximum likelihood estimates for the expected rates of change and the parameters of the right censoring process are developed under a linear random effects model with a probit model for the right censoring process. In realistic situations, we illustrate that the bias in estimating group rate of change and the reduction of power in comparing group difference could be substantial when strong dependency of the right censoring process on individual rates of change is ignored.

Some Key Words: Informative right censoring; Linear random effect; Probit right censoring; Rate of change.
SECTION 1
Introduction

In clinical trials and longitudinal studies it is often of interest to estimate and compare the rates of change of one or more variables between groups, e.g., change in lung function or tumor growth. Furthermore, comparing the rates of change of a continuous response variable between two treatment groups is often the primary objective. Death or withdrawal may cause some observations of the primary variable to be right censored.

Growth curve methods for comparing rates of change have been studied extensively; see Rao (1965), Fearn (1975) and Schlesselman (1973). Most of these analyses assume that there are no right censored or missing observations. Maximum likelihood and generalized weighted least squares provide alternative approaches to simple least squares for the analysis of serial measurements when some observations are right censored or missing. Koziol et al. (1981) proposed a distribution-free test for the comparison of growth curves with incomplete data.

In order to be valid, these procedures require that the probabilities of right censoring or missing do not depend on the parameter values of the response under investigation, i.e., they are non-informative with respect to the response parameters.
In this paper we are primarily interested in right censoring caused by the participant's death or withdrawal, to be referred to as the primary right censoring process. The primary right censoring process could be informative with respect to the response parameters. In our development, staggered entry and other missing value processes, if incorporated, are assumed to be non-informative and independent of the primary right censoring process.

Under a linear random effects model, we propose a model for the primary right censoring process which can depend both on the individual's initial value and slope. A likelihood ratio test for informativeness and maximum likelihood estimates for the response parameters and the primary right censoring process coefficients are derived under a probit model for the probability of primary right censoring.
The right censoring is considered to be non-informative with respect to the response parameters if the likelihood function can be factored into two independent parts, one corresponding to the response parameters and the other corresponding to censoring parameters. We show that when the primary right censoring is non-informative, the maximum likelihood estimates for the average linear regression coefficients of the response are weighted linear combinations of the simple least squares estimates among all individuals. In the case of complete observations at identical time points, these estimates are just the unweighted averages of the individual simple least squares estimates.

The proposed method is applied to data gathered by the Workshop on Natural History of PiZ Emphysema (1983) on patients with PiZ phenotype. To illustrate the effect of informative right censoring, maximum likelihood and weighted and unweighted least squares procedures are applied to a set of simulated clinical trials with primary right censoring generated from a non-informative probit process and then to another set of simulated trials with primary right censoring generated from an informative probit process. Mean squared error and power comparisons are made among the different statistical procedures and between these two sets.

SECTION 2
Linear Random Effects and Informative Right Censoring
We assume that the participants of a longitudinal study are divided into two treatment groups of sample sizes $n_k$, for $k = 1,2$. The combined sample size is $n = n_1 + n_2$. Let there be $J$ identical mortality (and withdrawal status) follow-up time points, $t_j$, with $t_1 = 0$ and $t_J$ = the length of the study. Each participant can have at most $R$ measurements of the response, and the corresponding measurement times need not be identical among individuals. Let $Y_{iv}$ and $t_{iv}$ be the $v$th response and measurement time for the $i$th participant in the combined sample, for $v = 1, 2, \ldots, v_i$ and $i = 1, 2, \ldots, n$. Let $v_i$ = total number of measurements made for the $i$th individual. With $t_{i1} = 0$, let $t_{iv_i} = t_j$ if death, withdrawal, or right censoring due to staggered entry occurred for the $i$th participant between times $t_j$ and $t_{j+1}$; otherwise $t_{iv_i} = t_J$ if the $i$th participant was not right censored during the study, and $t_{iR} = t_J$ if the $i$th participant had complete observations.
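It is assumed that the serial measurements of the primary variable follow a linear function of time. Let $\beta_i^t = (\beta_{i1}, \beta_{i2})$ be the unobservable vector representing the true initial value and slope of the primary variable for the $i$th individual in the combined sample. For $i \in k$, $k = 1,2$:

$$Y_i = X_i \beta_i + e_i, \qquad \beta_i \sim N(B_k, \Sigma_\beta), \qquad e_i \sim N(0, \sigma_e^2 I), \tag{2.1}$$

where $Y_i = (Y_{i1}, \ldots, Y_{iv_i})^t$, $B_k^t = (B_{k1}, B_{k2})$ is the vector of expected initial value and slope for the $k$th group, and

$$X_i^t = \begin{bmatrix} 1 & \cdots & 1 \\ t_{i1} & \cdots & t_{iv_i} \end{bmatrix}.$$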
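As a minimal sketch of the individual-level fit implied by the linear model, the simple least squares intercept and slope for one participant with unequally spaced measurement times could be computed as follows. The function name `simple_least_squares` and the sample values are illustrative assumptions, not part of the original analysis.

```python
def simple_least_squares(times, responses):
    """Return (intercept, slope) of the ordinary least squares line.

    Illustrative helper (hypothetical name): fits y = b1 + b2 * t for a
    single participant, which requires at least two distinct time points.
    """
    n = len(times)
    if n != len(responses) or n < 2:
        raise ValueError("need at least two paired measurements")
    t_bar = sum(times) / n
    y_bar = sum(responses) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, responses))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


# Hypothetical participant measured at 0, 0.5, 1.5 and 3 years.
times = [0.0, 0.5, 1.5, 3.0]
responses = [1.50 - 0.09 * t for t in times]  # noise-free line for illustration
intercept, slope = simple_least_squares(times, responses)
```

Because the helper needs two distinct time points, a participant measured only at baseline has no slope estimate under this sketch.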
The notation, $i \in k$, is used to denote that the $i$th participant in the combined sample belonged to the $k$th treatment group.

We further suppose that the probability of being primarily right censored due to death or withdrawal during a specified time interval $(0, t_j)$, given $\beta_i$, is $M(\alpha^t \beta_i, t_j)$. Here $\alpha^t = (\alpha_1, \alpha_2)$ is the vector of "regression parameters" relating this probability to the primary variables $\beta_i$. Examples of logical choices for $M$ are proportional hazards regression (Cox 1972), logistic regression (Walker and Duncan (1967)) and probit regression (Halperin, Wu and Gordon (1979)). For instance, under probit regression $M(\alpha^t \beta_i, t_j) = \Phi(\alpha^t \beta_i + \alpha_{0j})$, where $\Phi$ is the cumulative probability of a standard normal variate.
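The probit form of the censoring probability can be illustrated with a short sketch. The function name and all numeric values below are hypothetical and chosen only to show how a negative slope coefficient makes faster decliners more likely to be censored.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_censoring_probability(alpha, alpha_0j, beta_i):
    """Phi(alpha' beta_i + alpha_0j): probability of primary right
    censoring by follow-up time t_j, given the individual's true
    initial value and slope beta_i = (beta_i1, beta_i2)."""
    linear_predictor = alpha_0j + alpha[0] * beta_i[0] + alpha[1] * beta_i[1]
    return _STD_NORMAL.cdf(linear_predictor)


# Hypothetical values chosen only for illustration.
alpha = (-1.0, -5.0)          # coefficients for initial value and slope
alpha_0j = 0.5                # intercept for follow-up time t_j
slow_decliner = (1.5, -0.05)  # (initial value, slope)
fast_decliner = (1.5, -0.15)

p_slow = probit_censoring_probability(alpha, alpha_0j, slow_decliner)
p_fast = probit_censoring_probability(alpha, alpha_0j, fast_decliner)
# With a negative slope coefficient, the faster decliner has the larger
# censoring probability, which is what makes the censoring informative.
```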
Since, for each $i$, the simple least squares estimates $\hat\beta_i = (X_i^t X_i)^{-1}(X_i^t Y_i)$ are sufficient statistics for $\beta_i$, it suffices to consider the joint distribution of (i) $\hat\beta_i$, (ii) censoring time and (iii) survival time for the primary right censoring process. The marginal likelihood for $\hat\beta_i$, $Z(i,j)$ and $C(i,j)$ for the $i$th individual can be expressed as

$$L_i = D \int \phi_2(\hat\beta_i, \beta_i, \Sigma_{1i})\, \phi_2(\beta_i, B_k, \Sigma_\beta) \prod_{j=2}^{J} \left\{ (1 - M_{j-1})^{C(i,j-1)} (M_j - M_{j-1})^{Z(i,j-1)} \right\} [1 - M_J]^{(1 - m(i))}\, d\beta_i, \tag{2.2}$$

where $M_j = M(\alpha^t \beta_i, t_j)$, $C(i,j)$ is the indicator function that the $i$th individual was censored in the $j$th interval because of staggered entry, $Z(i,j)$ is the indicator function that death or withdrawal occurred in the $j$th interval for the $i$th individual, $D$ is constant with respect to $\beta_i$, $B_k$ and $\alpha$, and

$$\Sigma_{1i} = \sigma_e^2 (X_i^t X_i)^{-1}, \qquad m(i) = \sum_j \{C(i,j) + Z(i,j)\}.$$

The notation $\phi_2(x, \mu, \Sigma)$ represents the bivariate normal density with mean vector $\mu$ and covariance matrix $\Sigma$.

On the right hand side of equation (2.2), under the integration sign, the first factor represents the conditional probability distribution of $\hat\beta_i$ given $\beta_i$, $C(i,j)$ and $Z(i,j)$ for $j = 1, \ldots, J-1$. The second factor is the probability distribution of $\beta_i$. The third factor of products corresponds to the conditional probabilities that the $i$th participant survived the $(j-1)$th time point and then was censored by staggered entry or death (or withdrawal) between the $(j-1)$th and the $j$th time points, respectively, for $j = 2, \ldots, J$, given $\beta_i$. The last factor represents the conditional probability that the $i$th participant survived the entire study, given $\beta_i$. Therefore the product of these four factors is proportional to the joint distribution of $\hat\beta_i$, $\beta_i$, $Z(i,j)$ and $C(i,j)$, because the staggered entry process and the missing value process are assumed to be non-informative and independent of the primary right censoring process. Hence, integration with respect to the vector $\beta_i$ provides the marginal likelihood of $\hat\beta_i$, $Z(i,j)$ and $C(i,j)$ with respect to $B_k$ and $\alpha$.
This notation can also be used for those measured only at baseline, by equating all elements except the (1,1)th of $\Sigma_{1i}$ and $\Sigma_\beta$ to zero, letting the (1,1)th element of $\Sigma_{1i}$ equal $\sigma_e^2$, and setting $\hat\beta_{i2} = \beta_{i2} = 0$.
The marginal likelihood for all $n$ individuals is the product of the individual likelihoods:

$$L = \prod_{i=1}^{n} L_i.$$

Joint estimation of the parameters depends on the ability to evaluate (2.2) and its derivatives. For this section we assume that $\Sigma_\beta$ and $\sigma_e^2$ are known. The more realistic case will be discussed in the next two sections. In principle (2.2) can be evaluated by numerical integration. When the primary right censoring process is a probit model, (2.2) can be evaluated explicitly: for $i \in k$ and $k = 1,2$,

$$\ln L_i = \ln\{D\} + \ln\{A_i\} - 0.5\,(\hat\beta_i - B_k)^t\, \Sigma_{2i}^{-1}\, (\hat\beta_i - B_k) + T_i, \tag{2.3}$$

where

$$A_i = (2\pi |\Sigma_{2i}|^{1/2})^{-1}, \qquad \Sigma_{2i} = \Sigma_{1i} + \Sigma_\beta,$$

$$T_i = \sum_{j=2}^{J} \left\{ C(i,j-1) \ln[1 - \Phi(U_{i,j-1})] + Z(i,j-1) \ln[\Phi(U_{ij}) - \Phi(U_{i,j-1})] \right\} + \Big(1 - \sum_j Z(i,j) - \sum_j C(i,j)\Big) \ln[1 - \Phi(U_{iJ})],$$

$$U_{ij} = \{\alpha_{0j} + \alpha^t d_{ik}\}\,(1 + \alpha^t C_{3i}\, \alpha)^{-1/2}, \qquad d_{ik} = C_{3i}(\Sigma_{1i}^{-1} \hat\beta_i + \Sigma_\beta^{-1} B_k), \qquad C_{3i} = (\Sigma_{1i}^{-1} + \Sigma_\beta^{-1})^{-1}.$$
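The quantities entering $U_{ij}$ can be computed directly from their definitions. The sketch below assumes $U_{ij} = (\alpha_{0j} + \alpha^t d_{ik})(1 + \alpha^t C_{3i}\alpha)^{-1/2}$ with $d_{ik} = C_{3i}(\Sigma_{1i}^{-1}\hat\beta_i + \Sigma_\beta^{-1}B_k)$ and $C_{3i} = (\Sigma_{1i}^{-1} + \Sigma_\beta^{-1})^{-1}$. The helper names and the input values are hypothetical, and 2x2 matrices are represented as nested tuples so that only the standard library is needed.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def probit_index(alpha, alpha_0j, beta_hat, b_k, sigma_1i, sigma_beta):
    """U_ij = (alpha_0j + alpha' d_ik) / sqrt(1 + alpha' C_3i alpha)."""
    s1_inv = inv2(sigma_1i)
    sb_inv = inv2(sigma_beta)
    # C_3i = (Sigma_1i^{-1} + Sigma_beta^{-1})^{-1}
    c3 = inv2(tuple(tuple(s1_inv[r][c] + sb_inv[r][c] for c in range(2))
                    for r in range(2)))
    # d_ik = C_3i (Sigma_1i^{-1} beta_hat + Sigma_beta^{-1} B_k)
    u = matvec(s1_inv, beta_hat)
    w = matvec(sb_inv, b_k)
    d = matvec(c3, (u[0] + w[0], u[1] + w[1]))
    c3_alpha = matvec(c3, alpha)
    quad = alpha[0] * c3_alpha[0] + alpha[1] * c3_alpha[1]
    numer = alpha_0j + alpha[0] * d[0] + alpha[1] * d[1]
    return numer / math.sqrt(1.0 + quad)


# Hypothetical inputs: identity covariance matrices keep the arithmetic easy.
identity = ((1.0, 0.0), (0.0, 1.0))
u_ij = probit_index(alpha=(1.0, 0.0), alpha_0j=0.0,
                    beta_hat=(2.0, 0.0), b_k=(0.0, 0.0),
                    sigma_1i=identity, sigma_beta=identity)
# Here C_3i = 0.5 * I and d_ik = (1, 0), so u_ij = 1 / sqrt(1.5).
```

When both coefficients in `alpha` are zero the index reduces to `alpha_0j`, so the censoring term no longer involves the response parameters.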
When there are right censored or missing observations, $C_{3i}$ will differ among individuals. Hence, the primary right censoring contribution to the likelihood, $T_i$, is in the form of a non-linear probit model. Maximum likelihood estimation of the parameters can be made in principle provided that the number of time intervals is small. Otherwise, some constraints could be imposed on the $\alpha_{0j}$'s to reduce the number of parameters.
Likelihood ratio tests for the hypothesis ($H_0$: $\alpha_1 = \alpha_2 = 0$) versus ($H_1$: $\alpha_2 = 0$ and $\alpha_1 \neq 0$) and the hypothesis $H_1$ versus ($H_2$: $\alpha_1 \neq 0$ and $\alpha_2 \neq 0$) can be conducted. When $H_0$ is true, the primary right censoring will be non-informative with respect to $B_k$ for $k = 1,2$. However, when $H_1$ is true, it can be shown that the coefficient of $B_{k2}$ in $U_{ij}$ of (2.3) is non-zero even when $\sigma_{\beta_1\beta_2} = 0$. Hence the primary right censoring will be informative with respect to $B_k$ for $k = 1,2$. Only when $H_0$ is true is the primary right censoring non-informative with respect to $B_k$.

SECTION 3
Estimation and Testing for Noninformative Censoring
When $H_0$ is true, the maximum likelihood estimate of $B_k$ is the generalized least squares estimate (GLSE),

$$\hat B_{GL,k} = \Big[\sum_{i \in k} \Sigma_{2i}^{-1}\Big]^{-1} \sum_{i \in k} (\Sigma_{2i}^{-1} \hat\beta_i). \tag{3.1}$$

When all individuals have complete observations measured at identical time points, $\Sigma_{2i}$ will be the same among individuals, in which case (3.1) reduces to

$$\hat B_{UW,k} = \sum_{i \in k} \hat\beta_i / n_k,$$

the unweighted least squares estimate (UWLE). The covariance matrices are,

$$C_{GL,k} = \Big[\sum_{i \in k} \Sigma_{2i}^{-1}\Big]^{-1} \quad \text{and} \quad C_{UW,k} = \Big[\sum_{i \in k} \Sigma_{2i}\Big] \Big/ n_k^2. \tag{3.2}$$

When $\Sigma_\beta$ and $\sigma_e^2$ are unknown, the following unbiased estimators can be substituted,

[Equation (3.3), giving the estimators of $\Sigma_\beta$ and $\sigma_e^2$, is illegible in the source; the surviving fragments involve $S_\beta/(n-1)$ and the residuals $Y_i - X_i\hat\beta_i$.]

However, $\hat\Sigma_\beta$ has the disadvantage that it is not necessarily positive definite. The procedure given by Bock and Petersen (1975) for constructing an estimate that is at least semi-definite will be used.
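The weighted and unweighted estimators of this section can be sketched in a few lines. The code assumes the GLSE has the form $[\sum_i \Sigma_{2i}^{-1}]^{-1}\sum_i \Sigma_{2i}^{-1}\hat\beta_i$ with covariance $[\sum_i \Sigma_{2i}^{-1}]^{-1}$, and that the UWLE is the plain average of the individual estimates. The function names and the two-person data set are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def add2(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return tuple(tuple(m1[r][c] + m2[r][c] for c in range(2)) for r in range(2))


def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def glse(beta_hats, sigma_2s):
    """Generalized least squares estimate and its covariance matrix."""
    weight_sum = ((0.0, 0.0), (0.0, 0.0))
    weighted = (0.0, 0.0)
    for b, s in zip(beta_hats, sigma_2s):
        w = inv2(s)                      # weight = inverse covariance
        weight_sum = add2(weight_sum, w)
        wb = matvec(w, b)
        weighted = (weighted[0] + wb[0], weighted[1] + wb[1])
    cov = inv2(weight_sum)
    return matvec(cov, weighted), cov


def uwle(beta_hats):
    """Unweighted average of the individual least squares estimates."""
    n = len(beta_hats)
    return (sum(b[0] for b in beta_hats) / n,
            sum(b[1] for b in beta_hats) / n)


# Hypothetical group of two individuals; the second is measured less
# precisely, so the GLSE gives it one third of the first one's weight.
beta_hats = [(1.0, -0.1), (2.0, -0.2)]
sigma_2s = [((1.0, 0.0), (0.0, 1.0)), ((3.0, 0.0), (0.0, 3.0))]
gl_estimate, gl_cov = glse(beta_hats, sigma_2s)   # (1.25, -0.125)
uw_estimate = uwle(beta_hats)                     # (1.5, -0.15)
```

With identical covariance matrices for every individual the two functions return the same point estimate, which mirrors the reduction of the GLSE to the UWLE for complete data at common time points.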
When the goal of a study is to compare differences in rate of change between two groups, we wish to test the null hypothesis, $H_N$: $B_{12} = B_{22}$, against the alternative hypothesis, $H_A$: $B_{12} < B_{22}$. The test statistic is of the form,

$$(\hat B_{12} - \hat B_{22}) \big/ (\hat\sigma^2_{\hat B_{12}} + \hat\sigma^2_{\hat B_{22}})^{1/2},$$

with $\hat B_{k2} = \hat B_{GL,k2}$ and $\hat B_{UW,k2}$ respectively. For shifted alternatives, sample size, power and significance level of the test can be related according to the approximate formula,

[Equation (3.4) is missing from the source.]

where $\Delta$ is the difference in expected rates of change we wish to detect, $\alpha$ and $\beta$ are the Type I and Type II errors of the test, with $Z_\alpha$ and $Z_\beta$ the unit normal deviates corresponding to $\alpha$ and $\beta$.

REMARKS: We have by assumption that an individual's coefficient estimate is unbiased, i.e., $E(\hat\beta_i \mid \beta_i) = \beta_i$. Thus the unweighted least squares estimate is unbiased for $B_k$. There are two cases. When the primary right censoring is non-informative, the distribution of $\Sigma_{2i}$ in (3.1) does not depend on $\beta_i$, so that the GLSE and UWLE are both consistent unbiased estimators of $B_k$, although of course the UWLE is less efficient. Furthermore, the relative differences between the variances, and hence the required sample sizes, of the UWLE and the GLSE for the slope or initial value are a function of $(\sigma_e^2/\sigma_{\beta_2}^2)$ or $(\sigma_e^2/\sigma_{\beta_1}^2)$ respectively. When the primary right censoring process is informative, the unweighted least squares estimate is still unbiased, although the GLSE is not because $\Sigma_{2i}$ and $\beta_i$ are dependent.

SECTION 4
Examples and Simulations
This paper was motivated by design and analysis problems encountered in many clinical trials concerning lung diseases, e.g., the Intermittent Positive Pressure Breathing trial (IPPB 1983) for chronic pulmonary diseases. One specific example was the feasibility study of an anti-proteolytic replacement therapy trial among individuals with PiZ phenotype, conducted by the Workshop on the Natural History of PiZ Emphysema. The association between severe alpha$_1$-antitrypsin deficiency and lung diseases, particularly pulmonary emphysema, has been observed since the early 1960's (Laurell and Eriksson (1963)). Individuals with PiZ phenotype tend to develop severe alpha$_1$-antitrypsin deficiency and hence pulmonary emphysema and more rapid decline in lung function. The planned trial was designed to detect differences in rates of decline of a one second forced expiratory volume (FEV$_1$) between a control and a therapeutic group. Retrospective data on PiZ individuals were gathered from the ten participating institutions (see Workshop on Natural History of PiZ Emphysema (1983)) to provide crude estimates of parameter values required for sample size calculations.
4.1 Estimation and Testing

A Fortran program was developed for estimation when there is no staggered entry. The method of pseudo maximum likelihood estimation (PMLE; see Gong and Samaniego 1981) was used. Estimates of $\sigma_e^2$ and $\Sigma_\beta$ were made according to (3.3) and the Bock and Petersen (1975) procedure and substituted into (2.3), which was then maximized by the Newton-Raphson method. The algorithm first calculates simple least squares estimates of the intercept and slope for each individual and estimates $\sigma_e^2$ and $\Sigma_\beta$ according to (3.3). The UWLE of $B_k$ is used as the initial value for $B_k$ in calculating $C_{3i}$ and $d_{ik}$ in (2.3). Partial derivatives of the log likelihood for the Newton-Raphson iterative procedure are then calculated using initial values for $\alpha_1, \alpha_2, \alpha_{02}, \ldots, \alpha_{0J}$. Formulae for these partial derivatives are presented in the Appendix. Note that the initial values for the $\alpha$'s can be chosen arbitrarily with the constraint $\alpha_{02} < \alpha_{03} < \cdots < \alpha_{0J}$.

This algorithm was applied to the PiZ emphysema data. Among the data gathered for 294 PiZ individuals by the ten U.S. institutions, initial and follow-up FEV$_1$ values (with the initial and the last measurements at least 6 months apart) were available on 117 individuals. The number of FEV$_1$ measurements ranged from 2 to 12 (mean number of measurements = 3.8). The duration between the initial and the last measurement ranged from 6 to 227 months (mean duration = 52 months). Since the proposed trial duration was between three and six years, an analysis, corresponding to a three year follow-up study, was first made using the initial and all follow-up FEV$_1$ measurements made within three years of the initial measurement. Since many did not have reported follow-up FEV$_1$'s within three years of the initial measurement, only 81 individuals with 8 deaths were included in this analysis. A second analysis, corresponding to a six year follow-up study, was also made among those with a minimum follow-up of six years or a reported death within the first six years. Follow-up FEV$_1$'s within six years of the initial measurement were used. This analysis included 65 individuals with 19 deaths. Because of the small number of deaths, mortality follow-ups were grouped into two equal length intervals for both analyses. The average numbers of measurements were 2.9 and 3.6, and the average durations between the initial and the last FEV$_1$'s were 28 and 48 months, for the 3 and 6 year follow-ups, respectively. Those individuals with only one FEV$_1$ measurement were not included in these analyses. This has the effect of causing a slight bias in the unweighted least squares analysis and a slight loss of efficiency in the informative censoring analysis.
The purpose of these analyses was to test for informativeness of right censoring caused by participant's death with respect to FEV$_1$ initial value and slope, obtained from 3 and 6 year follow-ups, respectively; and to derive crude estimates of the primary right censoring coefficients. The initial values used for the iterative procedure were $\alpha_{02} = -1.35$, $\alpha_{03} = -0.90$ and $\alpha_1 = \alpha_2 = 0$. The algorithm converged after 12 and 10 iterations for the first and second analyses, respectively. The results are presented in Table 1.
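To make the iteration concrete, here is a minimal sketch of Newton-Raphson on a deliberately simplified problem: a probit model with an intercept only, fitted to a count of events. It is not the authors' Fortran program and does not maximize the full likelihood (2.3). The function name is hypothetical, and the counts (8 events among 81 individuals) and the starting value of -1.35 are borrowed from the text purely as example inputs.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def newton_raphson_probit_intercept(events, n, start, tol=1e-10, max_iter=50):
    """Maximize events*ln(Phi(a)) + (n - events)*ln(1 - Phi(a)) over a.

    Returns (estimate, iterations_used). Requires 0 < events < n.
    """
    a = start
    for iteration in range(1, max_iter + 1):
        cdf = STD_NORMAL.cdf(a)
        pdf = STD_NORMAL.pdf(a)
        # First derivative (score) of the log likelihood.
        score = pdf * (events / cdf - (n - events) / (1.0 - cdf))
        # Second derivative, using d(pdf)/da = -a * pdf.
        hessian = (events * (-a * pdf * cdf - pdf ** 2) / cdf ** 2
                   + (n - events) * (a * pdf * (1.0 - cdf) - pdf ** 2)
                   / (1.0 - cdf) ** 2)
        step = score / hessian
        a -= step  # Newton-Raphson update
        if abs(step) < tol:
            return a, iteration
    raise RuntimeError("Newton-Raphson did not converge")


# Example inputs only: 8 events among 81 individuals, started at -1.35.
estimate, iterations = newton_raphson_probit_intercept(8, 81, start=-1.35)
closed_form = STD_NORMAL.inv_cdf(8 / 81)  # known maximizer for this toy model
```

For this toy model the maximizer is known in closed form, which gives a convenient check on the iteration. The full problem has no such shortcut and relies on the partial derivatives given in the Appendix.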
The estimated probit right censoring coefficients for FEV$_1$ initial value and slope ($\alpha_1$ and $\alpha_2$) were -3.8, -11.3 and -4.6, -13.8 for the two analyses, respectively. Likelihood ratio tests indicated that the coefficients for FEV$_1$ initial value ($\alpha_1$) were statistically significantly different from zero in both analyses. Although the chi-squared statistic (with one degree of freedom) of 2.8 for the slope coefficient ($\alpha_2$) of the first analysis was not statistically significant at a 5% level, the chi-squared statistic of 7.1 for the slope coefficient of the second analysis was statistically significant. The significance of the initial value coefficients and the large negative slope coefficients obtained from both analyses, and the significance of the slope coefficient from the second analysis, indicated that the right censoring by participants' deaths could be informative with respect to both FEV$_1$ initial value and slope.
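The significance statements for the slope coefficient can be checked with a short calculation. The sketch below uses the identity that a chi-squared variate with one degree of freedom exceeds $x$ with probability $\mathrm{erfc}(\sqrt{x/2})$. The function names are illustrative, and the two statistics are the rounded values quoted in the text.

```python
import math


def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-squared variate with 1 d.f.

    Uses P(X > x) = erfc(sqrt(x / 2)), valid for x >= 0.
    """
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


def likelihood_ratio_statistic(loglik_null, loglik_alternative):
    """Twice the gain in maximized log likelihood."""
    return 2.0 * (loglik_alternative - loglik_null)


# Slope-coefficient statistics quoted in the text for the two analyses.
p_three_year = chi2_1df_pvalue(2.8)  # above 0.05: not significant at 5%
p_six_year = chi2_1df_pvalue(7.1)    # below 0.05: significant at 5%
```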
Survival probability distributions estimated by the product limit method (Kaplan and Meier, 1958), for the entire data set of 294 individuals, for the 117 individuals with two or more FEV$_1$ measurements, and for those individuals included in the first and second analyses of Table 1, respectively, are displayed in Figure 1. Since these data were collected retrospectively, mortality follow-ups were not as complete and rigorous as one would like them to be for the proposed prospective study. Hence, survival probabilities in Figure 1 could be optimistic.
The estimates we have proposed are of course sensitive to model misspecification. When using the estimation and test procedures derived under the probit model, goodness-of-fit to the data should be checked. One approach is to note that the estimated probability for the $i$th individual being primarily right censored in the $j$th time interval, for given $\hat\beta_i$, is $\hat P_{ij} = \Phi(\hat U_{i,j+1}) - \Phi(\hat U_{ij})$, for $j = 1, \ldots, J-1$, where $\hat U_{ij}$ is $U_{ij}$ of (2.3) with $\alpha$ and $\alpha_{0j}$ and the expected intercept and slope for the primary variable being replaced by their MLE's. Therefore, group the $\Phi(\hat U_{iJ})$ into groups and compute for each group $E_j = \sum_i \hat P_{ij}$. Then compare $E_j$ with the observed deaths and dropouts between the $j$th and $(j+1)$th time points.
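The grouping step of this check can be sketched as follows. The function name, the use of fixed cutoffs on the grouping score, and the four-person data set are assumptions made for illustration; in practice the cutoffs would be chosen as percentiles of the estimated probabilities.

```python
def expected_and_observed_by_group(scores, probs, events, cutoffs):
    """Sum estimated probabilities and observed events within risk groups.

    scores  : grouping score per individual, e.g. the estimated overall
              probability of censoring during the study
    probs   : estimated probability of censoring in one interval j
    events  : 0/1 indicators of an observed death or dropout in interval j
    cutoffs : increasing boundaries on the score; an individual belongs to
              the group numbered by how many cutoffs its score exceeds
    """
    n_groups = len(cutoffs) + 1
    expected = [0.0] * n_groups
    observed = [0] * n_groups
    for score, p, z in zip(scores, probs, events):
        group = sum(1 for c in cutoffs if score > c)
        expected[group] += p
        observed[group] += z
    return expected, observed


# Hypothetical data for four individuals and a single cutoff at 0.5.
scores = [0.1, 0.2, 0.6, 0.9]
probs = [0.05, 0.10, 0.30, 0.50]
events = [0, 0, 1, 1]
expected, observed = expected_and_observed_by_group(scores, probs, events, [0.5])
# Low-risk group: expected 0.15, observed 0.
# High-risk group: expected 0.80, observed 2.
```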
For the PiZ six year follow-up data of Table 1, the observed numbers of deaths among those whose estimated probabilities of death in six years were above the 85th percentile, between the 70th and 85th percentiles, and below the 70th percentile (for the entire 65 individuals) were 4, 2, 2 and 3, 4, 4 for the first and second three year intervals, respectively. The corresponding expected numbers of deaths were 4.47, 1.82, 0.85 and 2.98, 3.70, 3.88 for the two time intervals, respectively. Hence, the probit model seems to fit the data reasonably well.
Graphical comparison of the actual versus expected cumulative numbers of deaths by the estimated probability of death in six years for the six year follow-up data is displayed in Figure 2A. Comparisons of the actual versus expected cumulative numbers of deaths in the first and second three year intervals by the estimated probability of death in the corresponding time intervals, for the same six year follow-up data, are shown in Figure 2B. The overall fits of the data from both figures seem reasonably good.
4.2 The Effect of Informative Censoring

The UWLE, GLSE, and the PMLE were compared in simulated experiments based on the model (2.2) with the following primary right censoring processes: (1) probit non-informative censoring with $\alpha_1 = \alpha_2 = 0$; (2) probit informative censoring with coefficients $\alpha_1 = -3.8$, $\alpha_2 = -11.3$; (3) probit informative censoring with coefficients $\alpha_1 = -4.6$, $\alpha_2 = -13.8$, where (2) and (3) correspond to the two analyses of Section 4.1; and (4) probit informative censoring with $\alpha_1 = -3.8$ and $\alpha_2 = 0$.
Similar to the IPPB trial (1983), the study duration was assumed to be three years with four FEV$_1$ measurements per year. The expected FEV$_1$ slope and initial value in the control group and the within and between individual variances used were all estimated from the PiZ data. A 50% reduction in FEV$_1$ rate of decline was assumed in the treatment group. Equal sample sizes of 100 each were generated for the two groups. In the IPPB trial, similar to the proposed trial, patients were required to have their FEV$_1$ values less than 65% predicted at entry, and the comparison of FEV$_1$ annual rates of decline between two randomized treatment groups was the primary objective of the trial. The primary right censoring rate for the IPPB trial was more than 12% per year. For these illustrations the probability of primary right censoring was assumed to be 16% each year for all individuals under the non-informative right censoring process. When the informative probit model was used, this probability was assumed to be 16% for an individual whose initial value and slope were equal to the expected values for the control group. It was further assumed that there is no correlation between the slope and the initial value ($\sigma_{\beta_1\beta_2} = 0$). The decision value used for rejecting the difference was $(\hat B_{12} - \ldots$ [the remainder of this expression is illegible in the source]. Normal random numbers were generated by the IMSL routine GNPM. The experiments were repeated 600 times.
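A data-generating step of this kind can be sketched as below. This is not the authors' simulation: every numeric default (group means, standard deviations, error standard deviation) is hypothetical, since the values estimated from the PiZ data are not reported in the text. The censoring rule is also simplified. It is decided once per year with a fixed individual probability instead of through the cumulative probit model, and a participant censored during a year keeps only the measurements up to the start of that year.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ols_slope(times, ys):
    """Simple least squares slope of ys on times."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys))
    return sxy / sxx


def simulate_individual_slopes(n, alpha, rng, years=3, visits_per_year=4,
                               mean=(1.5, -0.09), sd=(0.5, 0.05),
                               sd_error=0.1, yearly_rate=0.16):
    """Generate one group and return the individual least squares slopes.

    The yearly censoring probability is Phi(a0 + alpha1*(b1 - mean1)
    + alpha2*(b2 - mean2)), with a0 chosen so that an individual at the
    group mean has probability `yearly_rate`. Participants left with
    fewer than two measurements are dropped.
    """
    a0 = STD_NORMAL.inv_cdf(yearly_rate)
    slopes = []
    for _ in range(n):
        b1 = rng.gauss(mean[0], sd[0])  # true initial value
        b2 = rng.gauss(mean[1], sd[1])  # true slope, independent of b1
        p = STD_NORMAL.cdf(a0 + alpha[0] * (b1 - mean[0])
                           + alpha[1] * (b2 - mean[1]))
        last_year = years
        for year in range(1, years + 1):
            if rng.random() < p:
                last_year = year - 1    # censored during this year
                break
        n_visits = last_year * visits_per_year + 1
        if n_visits < 2:
            continue
        times = [v / visits_per_year for v in range(n_visits)]
        ys = [b1 + b2 * t + rng.gauss(0.0, sd_error) for t in times]
        slopes.append(ols_slope(times, ys))
    return slopes


rng = random.Random(1987)
non_informative = simulate_individual_slopes(2000, (0.0, 0.0), rng)
informative = simulate_individual_slopes(2000, (-3.8, -11.3), rng)
mean_non_informative = sum(non_informative) / len(non_informative)
mean_informative = sum(informative) / len(informative)
```

Comparing `mean_informative` with `mean_non_informative` gives a rough feel for how dropping early-censored participants can shift an unweighted average. A faithful replication would need the authors' parameter values and the weighted and pseudo maximum likelihood estimators as well.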
The results in Table 2 indicate that the UWLE procedure remained relatively unbiased in estimating the mean FEV$_1$ slope for each group and the between group difference in slopes. However, the PMLE clearly had much smaller mean squared errors in estimating the individual group mean slopes and the between group differences, and much higher statistical power in detecting the between group differences, in all four censoring processes considered. The GLSE, although most efficient under non-informative censoring, resulted in large under-estimations of individual group mean FEV$_1$ rates of decline (24-46%) under the two probit informative censoring processes with non-zero coefficients for FEV$_1$ slope. The under-estimations for the between group differences were much smaller (11-13%), because under the shifted alternative of equation (3.4), the biases in the two group estimates tend to cancel each other. The GLSE had smaller mean squared errors in estimating the between group differences and higher statistical power to detect these differences than the UWLE in all four censoring processes considered. Compared to the PMLE, under the two probit informative censoring processes with non-zero slope coefficients, the GLSE had much larger mean squared errors (39-69%) in estimating the individual group mean slopes and (14-22%) in estimating the between group differences, and lower statistical power (10-15%) in detecting the between group differences. The expected power for the proposed study, calculated according to (3.4) using the assumed parameter values, was 0.85 for the GLSE under the non-informative censoring process. The simulated power for the GLSE under non-informative censoring, using the estimated within and between individual variances according to (3.3) and the Bock and Petersen (1975) procedure for constructing covariance matrices that were at least semi-definite, was 0.81, and not very different from the expected power. Using the PMLE when the censoring process was non-informative or when the probit censoring slope coefficient was zero could result in larger mean squared errors than the GLSE in estimating the group slopes. The simulated significance levels were not much different from the expected 5% level for all procedures in Table 2.
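A power calculation of the kind referred to above is commonly based on the normal approximation $\Delta = (Z_\alpha + Z_\beta)(\sigma^2_{\hat B_{12}} + \sigma^2_{\hat B_{22}})^{1/2}$ for a one-sided test. The sketch below assumes that standard form, which is not necessarily the exact expression the authors label (3.4). The function names and the variance values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approximate_power(delta, var_1, var_2, alpha=0.05):
    """One-sided normal-approximation power for detecting a difference
    `delta` between two group slopes with sampling variances var_1, var_2."""
    se = (var_1 + var_2) ** 0.5
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(delta / se - z_alpha)


def detectable_difference(var_1, var_2, alpha=0.05, beta=0.15):
    """Difference detectable with Type I error alpha and Type II error beta:
    delta = (Z_alpha + Z_beta) * sqrt(var_1 + var_2)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    return (z_alpha + z_beta) * (var_1 + var_2) ** 0.5


# Hypothetical sampling variances of the two estimated group slopes.
var_1 = var_2 = 0.0002
delta = detectable_difference(var_1, var_2, alpha=0.05, beta=0.15)
power = approximate_power(delta, var_1, var_2, alpha=0.05)  # 0.85 by construction
```

Because the sampling variances shrink roughly in proportion to the group size, the same relation can be rearranged to give the sample size needed for a target power.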
Table 2 about here

SECTION 5
Discussion

The probit model used in Sections 2 and 4 is not necessarily meant to be biologically valid for describing the underlying right censoring process. Indeed, the choice of the probit was made primarily on computational grounds, and because logistic and probit regressions give similar estimates of event probabilities (Halperin et al. (1979)). When using the estimation and test procedures derived under the probit model, goodness-of-fit to the data should be checked as suggested in Section 4.1.
However, the distribution of the chi-squared goodness-of-fit test statistic for this situation cannot be obtained from a straightforward application of the usual theory because (i) parameter estimates are determined using likelihood functions for ungrouped data; and (ii) the cell boundaries are random. Moore (1971) and Moore and Spruill (1975) derived the large sample distribution of chi-squared goodness-of-fit statistics under these two problems. Their basic result is that under appropriate regularity conditions the large sample distribution of the usual goodness-of-fit statistic is that of a central chi-squared with the usual reduction in degrees of freedom due to estimated parameters, plus a weighted sum of independent chi-squared random variables, each with one degree of freedom. Application of their result to this problem is under investigation.
Although the estimation and test procedures of Sections 2 and 4 were developed for $k = 2$ groups, they could be extended easily to the case of $k > 2$ groups. To test for equality or linear trend among the expected slopes of the $k > 2$ groups, the likelihood ratio chi-squared statistic could be used.

The standard errors provided in Table 1 for estimates based on the probit right censoring model, and those used in computing the test statistics for the PMLE in Table 2, were estimated from the sample information matrix based on the pseudo maximum likelihood, by assuming that the estimated between and within individual error variances were the true values, rather than based on the maximum likelihood. The bootstrap (Efron, 1979) could be used to improve these estimates. Alternatively, the maximum likelihood procedure, treating $\sigma_e^2$, $\sigma_{\beta_1}^2$ and $\sigma_{\beta_2}^2$ as additional parameters, could also be used.
The authors wish to thank the referees for many helpful suggestions.
References

Bock, R.D. and Petersen, A.C. (1975). A multivariate correction for attenuation. Biometrika 62, 673-678.

Carroll, R.J., Spiegelman, C.H., Lan, G., Bailey, K. and Abbott, R.D. (1984). On errors-in-variables for binary regression models. Biometrika 71, 19-23.

Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187-220.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1-26.

Fearn, T. (1975). A Bayesian approach to growth curves. Biometrika 62, 89-100.

Gong, G. and Samaniego, F.J. (1981). Pseudo maximum likelihood estimation: theory and application. Annals of Statistics 9, 861-869.

Halperin, M., Wu, M. and Gordon, T. (1979). Genesis and interpretation of differences in distributions of baseline characteristics between cases and non-cases in cohort studies. Journal of Chronic Diseases 32, 483-491.

Intermittent Positive Pressure Breathing Trial Group (1983). Intermittent positive pressure breathing therapy of chronic obstructive pulmonary disease. Annals of Internal Medicine 99, 615-620.

Kaplan, E.L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-481.

Kannel, W.B. and Gordon, T. (1971). The Framingham Study Section 27. U.S. Department of Health, Education and Welfare, Public Health Service, National Institutes of Health.

Koziol, J.A., Maxwell, D.A., Matsuro Fukushima, M.E.M. and Yosef, H.P. (1981). A distribution-free test for tumor-growth curve analyses with application to an animal immunotherapy experiment. Biometrics 37, 383-390.

Laurell, B. and Eriksson, S. (1963). The electrophoretic alpha$_1$-globulin pattern of serum in alpha$_1$-antitrypsin deficiency. Scandinavian Journal of Clinical and Laboratory Investigation 5, 132.

Laird, N.M. and Ware, J.H. (1982). Random effect models for longitudinal data. Biometrics 38, 963-974.

Rao, C.R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika 52, 447-458.

Schlesselman, J.J. (1973). Planning a longitudinal study. II. Frequency of measurements and study duration. Journal of Chronic Diseases 26, 561-570.

Walker, S.H. and Duncan, D.B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika 54, 167-179.

Workshop on Natural History of PiZ Emphysema (1983). The natural history of PiZ emphysema. Proteases and Inhibitors in the Lungs. American Review of Respiratory Diseases 127 (Supplement), 543-545.
APPENDIX
Let
<p.1 j
=
<P<U IJ
.. ) , for
1
and ~il = <Pil = O. From (2.3), Uij = (aOj + a 1 d ik1 + a 2 d ik2 ) / D ,
. 2 2 2 2
for J=2, ... ,j: where D = (1 + a3i1 a 1 + a3i12a1a2 + a 3i2 a 2 ). The parameter to
2
j=2, ... ,j
be estimated is ~t = (9 1 , ... ,9 j +5 ) =. (B11,B12,B21,B22,a1,a2,a02, ... ,aOj)'
partial
The
derivatives of the log likelihood (2.3) with respect to these parameters
are as follows:
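\[
\frac{\partial \ln L_i}{\partial B_{k\ell}} =
\begin{cases}
\dfrac{(\hat\beta_{i\ell} - B_{k\ell})/\sigma^2_{\beta i\ell}
 - \rho_i(\hat\beta_{im} - B_{km})/(\sigma_{\beta i1}\sigma_{\beta i2})}{1 - \rho_i^2}
 + \dfrac{\partial T_i}{\partial B_{k\ell}}, & \text{for } i \in k, \\[2ex]
0, & \text{otherwise,}
\end{cases}
\]

for $k = 1, 2$, $\ell = 1, 2$, $m = 3 - \ell$, and

\[
\frac{\partial \ln L_i}{\partial \theta_\ell} = \frac{\partial T_i}{\partial \theta_\ell},
\qquad \text{for } \ell = 5, \ldots, J+5.
\]

When there is no staggered entry we have,

\[
\begin{aligned}
\frac{\partial^2 T_i}{\partial\theta_\ell\,\partial\theta_m}
={}& \sum_{j=2}^{J} z_{i,j-1}\Big\{(\Phi_{ij} - \Phi_{i,j-1})\Big[
 -U_{ij}\phi_{ij}\frac{\partial U_{ij}}{\partial\theta_\ell}\frac{\partial U_{ij}}{\partial\theta_m}
 + U_{i,j-1}\phi_{i,j-1}\frac{\partial U_{i,j-1}}{\partial\theta_\ell}\frac{\partial U_{i,j-1}}{\partial\theta_m} \\
&\qquad\qquad + \phi_{ij}\frac{\partial^2 U_{ij}}{\partial\theta_\ell\,\partial\theta_m}
 - \phi_{i,j-1}\frac{\partial^2 U_{i,j-1}}{\partial\theta_\ell\,\partial\theta_m}\Big] \\
&\qquad - \Big[\phi_{ij}\frac{\partial U_{ij}}{\partial\theta_\ell}
 - \phi_{i,j-1}\frac{\partial U_{i,j-1}}{\partial\theta_\ell}\Big]
 \Big[\phi_{ij}\frac{\partial U_{ij}}{\partial\theta_m}
 - \phi_{i,j-1}\frac{\partial U_{i,j-1}}{\partial\theta_m}\Big]\Big\}
 \Big/ (\Phi_{ij} - \Phi_{i,j-1})^2 \\
& - \Big\{1 - \sum_{j=2}^{J} z_{i,j-1}\Big\}
 \Big[(1 - \Phi_{iJ})\Big(-U_{iJ}\phi_{iJ}\frac{\partial U_{iJ}}{\partial\theta_\ell}\frac{\partial U_{iJ}}{\partial\theta_m}
 + \phi_{iJ}\frac{\partial^2 U_{iJ}}{\partial\theta_\ell\,\partial\theta_m}\Big)
 + \phi_{iJ}^2\frac{\partial U_{iJ}}{\partial\theta_\ell}\frac{\partial U_{iJ}}{\partial\theta_m}\Big]
 \Big/ (1 - \Phi_{iJ})^2,
\end{aligned}
\]

for $\ell = 1, \ldots, J+5$ and $m = 1, \ldots, J+5$; with

\[
\frac{\partial U_{ij}}{\partial\theta_\ell} =
\begin{cases}
\left(\alpha_1\,\partial d_{ik1}/\partial\theta_\ell
 + \alpha_2\,\partial d_{ik2}/\partial\theta_\ell\right)/D,
 & \text{for } i \in k \text{ and } \theta_\ell \text{ equal to } B_{k1} \text{ or } B_{k2}, \\
0, & \text{otherwise, for } \ell = 1, \ldots, 4;
\end{cases}
\]

\[
\frac{\partial U_{ij}}{\partial\alpha_\ell}
= \left(d_{ik\ell} - C_\ell U_{ij} D^{-1}\right) D^{-1},
\qquad \text{for } i \in k,\ \ell = 1, 2;
\]

\[
\frac{\partial U_{ij}}{\partial\alpha_{0\ell}} =
\begin{cases}
D^{-1}, & \text{for } \ell = j, \\
0, & \text{otherwise;}
\end{cases}
\]

\[
\frac{\partial^2 U_{ij}}{\partial\alpha_\ell\,\partial\alpha_m}
= -\frac{d_{ik\ell} C_m}{D^3}
 - \frac{C_\ell}{D^2}\frac{\partial U_{ij}}{\partial\alpha_m}
 - \frac{K_{\ell m} U_{ij}}{D^2}
 + \frac{2 C_\ell C_m U_{ij}}{D^4},
\qquad \text{for } i \in k,\ \ell = 1, 2,\ m = 1, 2;
\]

\[
\frac{\partial^2 U_{ij}}{\partial\alpha_\ell\,\partial\alpha_{0j}} = -C_\ell / D^3,
\qquad \text{for } \ell = 1, 2,\ j = 2, \ldots, J;
\]

and

\[
\frac{\partial^2 U_{ij}}{\partial\alpha_{0j_1}\,\partial\alpha_{0j_2}} = 0,
\qquad \text{for } i \in k,\ j_1 = 2, \ldots, J,\ j_2 = 2, \ldots, J,
\]

where

\[
C_\ell = \sigma^2_{\beta i\ell}\,\alpha_\ell + \sigma_{\beta i12}\,\alpha_{3-\ell},
\qquad \text{for } \ell = 1, 2,
\]

and

\[
K_{\ell m} =
\begin{cases}
\sigma^2_{\beta i\ell}, & \text{for } \ell = m, \\
\sigma_{\beta i12}, & \text{for } \ell \neq m.
\end{cases}
\]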
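The derivatives of U_ij with respect to the missing value coefficients can be checked numerically. The minimal Python sketch below compares the closed forms dU/d(alpha_0j) = 1/D, dU/d(alpha_l) = (d_ikl - C_l U/D)/D and d(1/D)/d(alpha_l) = -C_l/D^3 against central finite differences. It assumes D is the square root of 1 + sigma_1^2 alpha_1^2 + 2 sigma_12 alpha_1 alpha_2 + sigma_2^2 alpha_2^2 and that d_ik1 and d_ik2 do not depend on the alphas. The function names and every numeric input are hypothetical, chosen only for illustration.

```python
import math

def d_scale(a1, a2, s1, s2, s12):
    """D = sqrt(1 + s1^2 a1^2 + 2 s12 a1 a2 + s2^2 a2^2) (assumed form)."""
    return math.sqrt(1.0 + (s1 * a1) ** 2 + 2.0 * s12 * a1 * a2 + (s2 * a2) ** 2)

def u_value(a0, a1, a2, d1, d2, s1, s2, s12):
    """U = (a0 + a1 d1 + a2 d2) / D."""
    return (a0 + a1 * d1 + a2 * d2) / d_scale(a1, a2, s1, s2, s12)

def u_gradient(a0, a1, a2, d1, d2, s1, s2, s12):
    """Closed-form first derivatives of U with respect to a0, a1 and a2."""
    D = d_scale(a1, a2, s1, s2, s12)
    U = u_value(a0, a1, a2, d1, d2, s1, s2, s12)
    C1 = s1 ** 2 * a1 + s12 * a2
    C2 = s2 ** 2 * a2 + s12 * a1
    return {"a0": 1.0 / D,
            "a1": (d1 - C1 * U / D) / D,
            "a2": (d2 - C2 * U / D) / D}

def central_difference(f, x, h=1e-6):
    """Two-sided finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical inputs, for illustration only.
args = dict(a0=0.5, a1=-3.8, a2=-11.3, d1=0.10, d2=-0.02,
            s1=0.39, s2=0.091, s12=0.01)

analytic = u_gradient(**args)
numeric = {
    name: central_difference(lambda v, n=name: u_value(**{**args, n: v}), args[name])
    for name in ("a0", "a1", "a2")
}
max_gap = max(abs(analytic[n] - numeric[n]) for n in analytic)

# Mixed second derivative: d/d(a1) of dU/d(a0) = d/d(a1) of 1/D = -C1 / D^3.
D = d_scale(args["a1"], args["a2"], args["s1"], args["s2"], args["s12"])
C1 = args["s1"] ** 2 * args["a1"] + args["s12"] * args["a2"]
mixed_analytic = -C1 / D ** 3
mixed_numeric = central_difference(
    lambda v: 1.0 / d_scale(v, args["a2"], args["s1"], args["s2"], args["s12"]),
    args["a1"])

print("largest first-derivative discrepancy:", max_gap)
print("mixed-derivative discrepancy:", abs(mixed_analytic - mixed_numeric))
```

Both discrepancies should be at the level of finite-difference rounding error if the closed forms are right; a large gap would point to a sign or exponent error in the formulas as transcribed.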
Table 1
Estimation for the Expected FEV1 Slope, the Missing Value Coefficients,
and Likelihood Ratio Test Statistics for the Coefficients

                                      Three Year Mortality,   Six Year Mortality,
Estimates and                         Three Year FEV1         Six Year FEV1
Test Statistics                       Follow-Up               Follow-Up
----------------------------------------------------------------------------------
Estimated FEV1 change/year
  Unweighted                          -0.093 (0.0164)*        -0.078 (0.0135)*
  Weighted                            -0.090 (0.0151)         -0.076 (0.0136)
  Probit Informative Missing          -0.095 (0.0152)         -0.085 (0.0133)

Estimated Missing Value Coefficients
  FEV1 Initial Value                   -3.80 (2.01)            -4.61 (1.70)
  FEV1 Slope                          -11.30 (7.46)           -13.80 (6.76)
  [label illegible]                    -0.53 (1.10)             1.25 (0.93)
  [label illegible]                     0.42 (1.10)             2.42 (1.05)

L.R. Test Statistics
  Initial Value, H1 vs. H0 (chi-square, 1 d.f.)   11.02        26.97
  Slope, H2 vs. H1 (chi-square, 1 d.f.)            2.81         7.13

No. at risk at baseline                81                      65
No. of Deaths                           8                      19
----------------------------------------------------------------------------------
*Numbers in parentheses are estimated standard errors.
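The likelihood ratio statistics in Table 1 are referred to a chi-square distribution with 1 degree of freedom. As a rough aid to reading them, the sketch below converts the four statistics to upper-tail probabilities using the identity P(chi-square with 1 d.f. > x) = erfc(sqrt(x/2)). The pairing of each statistic with the three-year or six-year column is assumed from the order in which the entries appear, and the function name is illustrative, not from the paper.

```python
import math

def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# L.R. statistics as read from Table 1; column pairing is an assumption.
lr_statistics = {
    ("initial value", "three year"): 11.02,
    ("initial value", "six year"): 26.97,
    ("slope", "three year"): 2.81,
    ("slope", "six year"): 7.13,
}

p_values = {key: chi2_1df_pvalue(value) for key, value in lr_statistics.items()}

for (coefficient, follow_up), p in p_values.items():
    print(f"{coefficient:13s} {follow_up:10s} p = {p:.2g}")
```

Under this reading, the slope coefficient falls short of the conventional 5% level in the three-year data but exceeds the 1% level in the six-year data, while the initial value coefficient is strongly significant in both.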
Table 2
Comparison of Simulated Results Among Different Procedures Under a Linear Random
Effect Model with Non-Informative Versus Probit Informative Missing Values, with
Parameter Values Estimated from PiZ Emphysema Data*

Statistical                              Informative Censoring
Procedures &      Non-Informative       ------------------------------------------
Treatment         Censoring             alpha1 = -3.8         alpha1 = -4.6
Groups            alpha1 = alpha2 = 0   alpha2 = -11.3        alpha2 = -13.8        [heading illegible]
                  FEV1 Slope    MSE     FEV1 Slope    MSE     FEV1 Slope    MSE     FEV1 Slope    MSE
---------------------------------------------------------------------------------------------------------
Control Group
  UWLE              -89.3     465.4       -88.9     645.7       -88.3     719.4       -89.2     570.4
  GLE               -90.3     156.3       -68.3     634.9       -63.7     883.3       -90.7     164.6
  PMLE              -89.4     200.6       -83.6     455.0       -81.2     597.8       -90.2     210.6
Treatment Group
  UWLE              -46.5     442.3       -45.9     444.9       -46.2     489.8       -46.9     444.7
  GLE               -45.9     148.2       -28.2     423.1       -24.4     576.0       -46.3     150.7
  PMLE              -44.6     194.4       -39.7     288.4       -38.8     340.6       -44.3     162.9
---------------------------------------------------------------------------------------------------------
*The parameter values used were: measurement error standard deviation
$\sigma_e$ = 0.155; FEV1 initial value standard deviation $\sigma_{\beta 1}$ = 0.39 L;
FEV1 slope standard deviation $\sigma_{\beta 2}$ = 0.091 L/yr;
$\sigma_{\beta 1 \beta 2}$ = 0; expected FEV1 initial value $B_{11} = B_{21}$ = 0.96 L;
control and treatment group expected FEV1 slopes $B_{12}$ = -0.09 L/yr and
$B_{22}$ = -0.045 L/yr. For significance level, $B_{12} = B_{22}$ = -0.09 L/yr.
The probability of missing = 16%/yr and $n_1 = n_2$ = 100.
**Expected power under non-informative missing process, calculated according to (3.4)
using the actual parameter values.
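The simulated mean squared errors in Table 2 can be split into squared bias and a remainder (variance) using MSE = bias^2 + variance. The sketch below does this for the two informative censoring settings. It assumes the tabulated slopes are in ml/yr, so that the true slopes of -0.09 L/yr and -0.045 L/yr become -90 and -45, and it assumes the pairing of each entry with a procedure follows the order UWLE, GLE, PMLE. The dictionary layout and function name are illustrative only.

```python
# True expected slopes in ml/yr (assumed unit conversion from the footnote).
TRUE_SLOPE = {"control": -90.0, "treatment": -45.0}

# (mean estimated slope, MSE) as read from Table 2; pairing is an assumption.
ENTRIES = {
    ("control", "a1=-3.8, a2=-11.3"): {
        "UWLE": (-88.9, 645.7), "GLE": (-68.3, 634.9), "PMLE": (-83.6, 455.0)},
    ("control", "a1=-4.6, a2=-13.8"): {
        "UWLE": (-88.3, 719.4), "GLE": (-63.7, 883.3), "PMLE": (-81.2, 597.8)},
    ("treatment", "a1=-3.8, a2=-11.3"): {
        "UWLE": (-45.9, 444.9), "GLE": (-28.2, 423.1), "PMLE": (-39.7, 288.4)},
    ("treatment", "a1=-4.6, a2=-13.8"): {
        "UWLE": (-46.2, 489.8), "GLE": (-24.4, 576.0), "PMLE": (-38.8, 340.6)},
}

def decompose(mean_estimate, mse, truth):
    """Return (bias, squared bias, MSE minus squared bias)."""
    bias = mean_estimate - truth
    return bias, bias ** 2, mse - bias ** 2

results = {}
for (group, setting), procedures in ENTRIES.items():
    for name, (mean_estimate, mse) in procedures.items():
        results[(group, setting, name)] = decompose(
            mean_estimate, mse, TRUE_SLOPE[group])

for key, (bias, bias_sq, remainder) in results.items():
    print(key, f"bias={bias:.1f}  bias^2={bias_sq:.1f}  remainder={remainder:.1f}")
```

Read this way, most of the GLE mean squared error under informative censoring comes from squared bias, whereas the UWLE error is almost entirely variance, which is consistent with the paper's statement that generalized least squares estimates are biased when the censoring is informative.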
FIGURE 1
SURVIVAL CURVES: PIZ EMPHYSEMA DATA
[Figure not reproduced. Vertical axis: probability of survival (0.5 to 1.0).
Horizontal axis: months. Curves: entire group; 3 yr. follow-up; 6 yr.
follow-up; 2 or more measurements.]
NA: NUMBER AT RISK FOR THE ENTIRE GROUP
NB: NUMBER AT RISK AMONG THOSE WITH THREE YEARS FOLLOW-UP
NC: NUMBER AT RISK AMONG THOSE WITH SIX YEARS FOLLOW-UP
ND: NUMBER AT RISK AMONG THOSE WITH TWO OR MORE FEV1 MEASUREMENTS
FIG. 2A
ACTUAL VS. EXPECTED DEATHS BY EXPECTED RISK OF DEATH IN SIX YEARS
X: ACTUAL NUMBER OF DEATHS AMONG THOSE WHOSE RISK OF DYING
WAS AS INDICATED BY THE HORIZONTAL AXIS OR GREATER
O: EXPECTED NUMBER OF DEATHS AMONG THOSE WHOSE RISK OF DYING
WAS AS INDICATED BY THE HORIZONTAL AXIS OR GREATER
[Figure not reproduced. Vertical axis: cumulative deaths. Horizontal axis:
risk of death in six years, running from 1.0 to 0.0.]
N: NUMBER OF INDIVIDUALS WHOSE RISK OF DYING WAS AS INDICATED BY THE HORIZONTAL AXIS OR GREATER
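The quantities plotted in Figure 2A can be illustrated with a short calculation. For each risk threshold on the horizontal axis, the sketch below counts the individuals whose predicted risk is at or above the threshold, the actual deaths among them, and the expected deaths. It assumes the expected number of deaths is the sum of the individual predicted risks, which is a common convention but is not confirmed by the figure legend. The risks and outcomes are made-up values, not the PiZ emphysema data, and the function name is illustrative.

```python
def cumulative_actual_vs_expected(risks, died, thresholds):
    """For each threshold t, summarize individuals with predicted risk >= t.

    Returns a list of (t, number of individuals, actual deaths, expected deaths),
    where expected deaths is taken to be the sum of predicted risks (assumption).
    """
    rows = []
    for t in thresholds:
        selected = [(r, d) for r, d in zip(risks, died) if r >= t]
        n = len(selected)
        actual = sum(d for _, d in selected)
        expected = sum(r for r, _ in selected)
        rows.append((t, n, actual, expected))
    return rows

# Hypothetical predicted six-year risks and death indicators (1 = died).
risks = [0.95, 0.70, 0.40, 0.20, 0.05]
died = [1, 1, 0, 1, 0]

rows = cumulative_actual_vs_expected(risks, died, thresholds=[0.9, 0.5, 0.0])
for t, n, actual, expected in rows:
    print(f"risk >= {t:.1f}: N = {n}, actual = {actual}, expected = {expected:.2f}")
```

Close agreement between the actual and expected counts across thresholds is what the figure is meant to display as evidence that the fitted probit model describes the observed mortality.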
FIGURE 2B
ACTUAL VS. EXPECTED DEATHS
BY EXPECTED RISK OF DEATH IN EACH OF THE TWO THREE YEAR INTERVALS
X, B: ACTUAL NUMBER OF DEATHS AMONG THOSE WHOSE RISK OF DYING
WAS AS INDICATED BY THE HORIZONTAL AXIS OR GREATER
O, A: EXPECTED NUMBER OF DEATHS AMONG THOSE WHOSE RISK OF DYING
WAS AS INDICATED BY THE HORIZONTAL AXIS OR GREATER
[Figure not reproduced. Vertical axis: cumulative deaths. Horizontal axis:
risk of death in the interval, running from 1.0 to 0.0. Separate curves are
shown for the first interval and the second interval.]
NA: NUMBER OF INDIVIDUALS WHOSE RISK OF DYING IN THE FIRST INTERVAL WAS AS
INDICATED BY THE HORIZONTAL AXIS OR GREATER
NB: NUMBER OF INDIVIDUALS WHOSE RISK OF DYING IN THE SECOND INTERVAL WAS AS
INDICATED BY THE HORIZONTAL AXIS OR GREATER