Majumdar, H. and Sen, P.K.; (1977)Nonparametric Testing for Simple Regression Under Progressive Censoring with Staggering Entry and Random Withdrawal."

BIOMATHEMATICS ff{Alf{lti~ PROGRAM,
.e
* On leave of absence from the Government of India .
NONPARAMETRIC TESTING FOR SIMPLE REGRESSION
UNDER PROGRESSIVE CENSORING WITH STAGGERING
ENTRY AND RANDOM WITHDRAWAL
by
Hiranmay Majumdar* and Pranab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1128
July 1977
•
NONPARAMETRIC TESTING FOR SIMPLE REGRESSION UNDER PROGRESSIVE
CENSORING WITH STAGGERING ENTRY AND RANDOM WITHDRAWAL
Hiranmay Majumdar and Pranab Kumar Sen
University of North Carolina, Chapel Hill
ABSTRACT
In the context of (multi-center) clinical trials and life
testing problems, a general model incorporating both the staggering entry and random withdrawal and pertaining to a simple regression problem (including the two-sample location problem as a
special case) is conceived, and, within this framework, a scheme
allowing progressive censoring (continuous monitoring of experimentation from the beginning) is developed along with the proposal for and study of some nonparametric testing procedures.
The proposed tests rest on the construction of certain two-dimensional time-parameter stochastic processes from a triangular array
of progressively censored linear rank statistics and their weak
convergence to appropriate Gaussian functions.
erties of these procedures are studied.
Asymptotic prop-
A computer program
pertaining to the numerical computations and practical administrations of these testing procedures is also provided at the end.
1.
INTRODUCTION
In a variety of situations, especially, relating to multicenter clinical trials and life testing problems, usually a time-
e.
sequential study is encountered where the experimental units may
not enter into the scheme all at the same point of time.
Such a
staggering entry plan may be due to either some chance causes or
some recruitment protocols regulating the entry in a stipulated
manner.
Moreover, from considerations of time and cost of experi-
mentation (or other practical limitations), experimentation may
not continue for an indefinite period and some restriction is
therefore imposed on experimentation in terms of either maximum
number of responses to be observed or maximum duration of observation.
As a result, statistical conclusions need to be made
from data pertaining to such a censored or truncated scheme.
The
situation becomes somewhat more complicated when there are withdrawals (dropouts) of experimental units from the scheme before
the planned termination of experimentation - a case that is very
common in experimentation involving human populations admitting
migration and behavioral dropouts.
may not be stochastic in nature.
Again, these dropouts mayor
Finally, optimal period of ex-
perimentation may be a difficult task to accomplish:
a very short
duration may lead to totally inefficient conclusions, whereas too
much of prolongation may unnecessarily increase the time and cost
of experimentation without any significant increase in its sensitivity and precision.
For these reasons, progressive censoring
schemes (PCS) are often adopted with a view to monitoring experimentation from the beginning so that if, at any early stage, the
accumulated statistical evidence provocates a clear-cut decision,
experimentation is curtailed along with the adoption of that
decision resulting in savings of time and cost of experimen~
tation.
For the simple regression problem (containing the two-sample
problem as a special case), Chatterjee and Sen (1973) have developed a general class of PCS rank tests for the non-staggering
entry plan. Further works in this direction are due to Sen
(197ba, b) , Maj umdar and Sen (1977), and Davis (1977), among others. For
the two-sample problem, Halperin and Ware (1974) have 'considered
"
.e
the problem of early stopping in a censored Wilcoxon test.
For
the staggering entry plan, Gehan (1965) has considered a modified
censored two-sample Wilcoxon test.
Further works in this direc-
tion are due to Effron (1967), Peto and Peto (1972), and others.
To the best of knowledge of the present authors, not much work
has been done for the PCS with staggering entry based on rank
statistics.
The object of the present investigation is to propose
and study a general class of nonparametric tests under PCS with
staggering entry (and also random withdrawal). The proposed procedures are the generalizations of the ones by Chatterjee and Sen
(1973) to staggering entries and they rest on certain basic weak
convergence results studied by Sen (1976a).
Along with the preliminary notions, a general class of twodimensional time-parameter stochastic processes based on PCS
linear rank order statistics is considered in Section 2.
In
Section 3, these stochastic processes are incorporated in the proposal for and study of suitable PCS rank tests for the simple regression model (including the two-sample problem as a special
case).
Modifications of these procedures for some restricted
designs are considered in Section 4.
Section 5 is devoted to the
accommodation of withdrawals in the design.
In Section 6, we have
considered the PCS test as an alternative to Gehan (1965) for a
single-point truncation staggering entry scheme with a numerical
example illustrating comparison between the two tests.
The
Appendix is devoted to a computer program pertaining to the administration of the proposed testing procedures in life problems.
2.
SOME BASIC RANK ORDER STOCHASTIC PROCESSES
As a basis for subsequent statistical analysis, we formulate
first some rank order processes.
Let
{Xi' i
~
I}
be a sequence
of independent random variables with continuous distribution
--
functions (df) {F. , i > l}, all defined on the real line
1
(_00 , 00) • For every n(> 1),
let a (1), ... , a (n) be a set of
n
n
n
real scores and let R . (= E. lU(X. - X. ) where u(t) = 1 or 0
n1
J=
1
]
e.
t is > or < 0) be the rank of X.1 among
for i = 1, . '., n. By virtue of the assumed con-
according as
X., "., X,
n
1
ties among the X. may be neglected, in
1
= (R
, .'., R ) is some (random)
probability, so that R
~n
nl
nn
n). Then, a lineal' rank statistic is
permutation of (1,
tinuity of the
F. ,
1
defined by
Tn = r~=l (c i - cn) an (Rni ) = r~=l (c S . - cn) an (i), n ~ 1,
nl
{c.,
i -> I}
1
where
is a sequence of known constants,
-1 n
-
c n = n r.1= lC"1
and
anti-l'anks i.e.,
R
nS.
nl
(2.1)
=
(S n l'
and
X
S-n
= SnR. = i
nl
S
nn
is the vector of
)
snl. = Zn,1.,
i
= 1,
" ' , n,
(2.2)
<... < Z
stand for the order statistics correspondn,n
n,l
ing to Xl' ., . , X
The constants c. may be chosen in various
n
1
ways depending on the model we impose on the F.. For example,
where
Z
1
in the two-sample case, we have
F
c
l =
1
= Fn , Fn +1 =
l
l
= ... = cn =
1
=
n
1
+
n. -> 1, i = 1, 2,
n ,
2
1
and, we may take
= Fn'
c n +1 = . .. = c = 1,
n
1
and
0
n
so that
T
n
(2.1) reduces to a conventional two-sample rank statistic.
in
We
may also consider a simple regression model by setting
= F (x
F. (x)
1
- So - Sc.), i
1
~
1,
where 80 ,8 are unknown parameters and the
regression constants.
set of
c.
1
_00
ci
< x <
00
(2.3)
are known
In this case, we choose in (2.1) the same
as appearing in (2.3).
In the sequel, we shall refer
to the null hypothesis
F. (x)
1
~
F(x), V i
~
so that under (2.3), this reduces to
1,
_00
< x <
00
,
(2.4)
(3 = O.
In the context of a life testing problem, often, we use a
censored rank statistic.
kth order statistic
Z
n,k
A linear rank statistic censored at the
is defined by
e
.e
T
n,k
{
=
E~=l (cS
n1
T,
=n
- 1, n,
-I
n
k
n
1 < k < n - Z,
. - cn)[an(i) - a*(k)],
n
(2.5)
where
=
a*(k)
n
(n - k)
{ 0,
.
k < n - 1,
(2.6)
k = n,
and, conventionally, we let
= 0,
T0
o<
~j=k+lan(J),
n,n+j =
Z
and
T
0
n,O =
00
,
'rJ j ~ 1, n> 1,
T
= T
n,k
n
and
k > n > 1.
(Z.7)
For a non-staggering entry plan, in the context of PCS,
Chatterjee and Sen (1973) have constructed an one-dimensional
process from the partial sequence {Tn , k' 0..::. k.::: n} . In the case of
staggering entry, the situation becomes more complicated. As in
Sen (1976a), we consider a two-dimensional time-parameter stochastic process constructed from the triangular array
{Tn,k' 0 ~ k
as follows.
Let
2
C
n
a
n
~
n; 0
~
n < N} (where
= n -1 Ln1= lan (i),
= ~.n1= 1 (c.1
- Z
- c ) ,
n
n >_ 1, and for n >_ 2,
2
An
and, conventionally, we let
N is the target sample size)
=
Z
C
n
(n - 1)
-1 n
= An2 = 0,
~i=l
2
[an (i) -an] , (2.8)
for n
=
0, 1.
Let
then
0,
=
k = 0
(2.9)
(n
2
A
k > n - 1.
l n'
Note that as in Chatterjee and Sen (1973),
E(T~,kIHO) = C~A~,k' k:::. 0, n > O. (2.10)
Z
unit square r = {t = (t l , t z): 2 ~ ! ~!}
E(Tn,kIHo) = 0 and
Consider now the
and a stochastic process
Z
W = {WN(!), t £ r } by letting
N
e.
(2.11)
where
(2.12)
r (t ,
1
t
2
2
< t A
}, t ~~ 12 .
A2
n(t ),r - 2 n(t )
1
1
= max {r:
)
a (i)
At this stage, we assume that the scores
are generated
n
¢ = {¢(u), 0 < u < I} as fa 11 ows :
by a seore function
1
a n (i) = ¢(---1)
or E¢(U n,1.), 1
n+
¢(u)
(2.13)
¢1 (u) - ¢2(u),
~
~
i
n,
(2.14)
0 < u < 1,
< ... < U
U
are the ordered random variables of a
n,l
n,n
sample of size n from the rectangular (0,1) df and both ¢1' ¢2
where
are
in
~
u(inside I) with
1 2
fo ¢.J (u){log(l
+
I¢.(u) I)}rdu <
for some
(X)
J
r> 1, j=1,2.
(2.15)
Also, it is assumed that
max
1
. <
<1
n
{
- 2 2
nee.1 - c n ) Ie n } = 0(1 )
(2.16)
with
W = {Wet), t E 1 2 } be a Gaussian function on
d f or every ~, t~ ~~ 1 2
EW(t) = 0, V tEl 2~ an,
Then
EW(s)W(t) = s t = min(sl' t 1)min(s2' t 2).
(2.17)
2
W is known as a Brownian Sheet (on 1 ). The following
Finally, let
1
2
result is due to Sen (1976a):
Under H in (2.4) and the assumptions in (2.14) - (2.16) ,
V O.
W -+ W (-z..n the SkorokhoJ
N
where
law.
V
-+
2
J -topology on D
1
[0,1])
(2.18)
(or ,Q) stands for the convergence (or equal i ty) in
Note that (2.18) insures that as
(2.14)-(2.16), for any
I*c
N -+
under (2.4),
1 ,
V
sup WN(!), t
E
I*} -+ sup{W(!),
I,
t
E I*}
sup{!W N (!)
00,
2
!
E
£ sup{IW(t) I,
I*},
(2.19)
t
(2.20)
E I*}.
.e
Note that for any
{W(tlc~,
t2
C;),
t
0 < c
1
<
1,
0 < c2 < 1
2
1 } ·£{W(t). 0
£
~
t1
~ c~,
~
0
t 2 2. c;
(2.21)
For pEl, let us define
p
V(p)
Io
=
(so that
2
ep (u)du + (l-p)
P
v(p) is
=
in p
~
J ep 2 (u)du - (
r
II
o
0
1
v(l)
-1 (Jl ep(u)du) 2 - (Jl ep(u)du) 2 ,
£
(2.22)
0
I with
v(O)
ep(u)du) 2 < 00).
=
0 and
Then, from Chatterjee and
Sen (1973), we have
n
-1
r
n
2
p ==> An r
~
, n
v(p), V p
~
Thus, if we consider the partial process
o2
P £ I, where
t 2 2 v(p)/V(l)}
WN,p
I.
£
=
(2.23)
{WN(t), °2t121,
W is defined by (2.11)-(2.13),
N
by viture of (2.18) through (2.23), it follows that for every
(fixed)
p
E
I,
O~,p = suP{WN,p} = sup{W N(!):
V!
~[v(p)V(l)]2suP{WN(!):
0N,p
t
£
0 ~ t 1 ~ 1, 0 ~ t 2 ~ v(p)/V(l)}
2
1
I} = [v(p)/V(l)]2 0 +,
= suP{lwN,p!} = sup{lwN(!)!:
V!
~ [V(p)/V(l)]2Sup{jW(!)
I:
t
£
I
2
} =
(2.24)
O~tl ~l, 0~t2 ~v(p)/v(l)}
l
[v(p)/v(l))2D.
(2.25)
We conclude this section with a note that if I~ be a
2
2
stochastic subset of 1 , such that there exists an I*c 1 ,
which
1N
*2 1* as N + '
00
then both (2.19) and (2.20) hold with
1* replaced by
IN
for
(2 .-16)
on
the left-hand side quantities; this result is a direct consequence
e.
of the tightness and convergence of the finite-dimensional distribution (f.d.d.) of {W }.
N
3.
RANK ORDER TESTS FOR PCS WITH STAGGERING ENTRY
Let N(> 2) be the target sample size (set in advance).
If
all the N units enter into the experimental scheme simultaneously,
then for a life testing problem, the observable random variables
are
(ZN k' SNk) , k := 1, ... , N, and, as in Chatterjee and Sen
,
(1973), we may proceed as follows:
at the k-th failure
ZN,k'
we compute
k > O.
~
TN,k
as in (2.5), for
As long as k
r
N
and TN,k (or ITN,kjJ for a two-sided case) does not exceed a
critical value TN
(where 0 < a < 1 is the desired level of
,a
significance), experimentation is continued. If, for the first
time, for some
k(~
r ),
TN,k (or ITN,kl) exceeds TN,a' experimentation is stopped when ZN,k is observed, along with the
N
H . If no such k(~ r ) exists, experimentation
O
N
is observed and the null hypothesis is
is stopped when ZN ,r
N
accepted. In a staggering entry plan, since the entry time-points
rejection of
are not all the same, the cumulative sample size
Nt (of the
entries prior to any time-point t) is a non-decreasing function of
t
and the failures may correspond to different
dueing more complications in the analysis.
points are distinct
then
Nt
cohorts - intro-
If all the entry-
assumes all the integer values
between 1 to N, while if the units are admitted in
at the time points (OS)t
the jth batch,
Nt := Nt.
J
where
1
:=
11
~
0
j ~
+ ...
< ... < t
l
i,
+
i
then
n
j
,
there being
N:= n
for
1
+
•..
+
tE [t , t + ),
j
j 1
+
00
•
i~l)
batches
units in
and
O~j ~{.,
(3.1)
·e
to (2.5), we can compute a (censored) linear rank statistic, and
obtain
for 0
TN( t* ,u),r (
)
t*,u
~
Note that, by definition, both
negative integers.
be clear that
u
~
t*-t l and t*
N(t*,u) and
~
t .
(3.2)
l
r(t*,u) are non-
If we closely examine the process, it will
N(t*,u)
and
r(t*,u)
vary only either at the
entry points or at the failure points.
It will, therefore ,suffice
if we take t* as one of the failure or entry points, there being
N+l such points, and
taking on one of l
u
entry times.
For
every t* which is either "'. failure ,?oint or an entry point,
':Ie
. need to compute the set of statistics {TN(t*,u),k: O~ k ~ r(t*,u)}
for every entry point
u
which precedes
t*.
Thus, we need to
review the process only when either a new entry takes place or
a failure occurs and, at each such non-stationary point, we have
only to compute a finite number of such statistics.
We should
also note that computations commence only after the occurrence
of the first failure.
It is clear from above that the set of
, all statistics {TN( t * ,u ) , k: 0 -< k -< r(t* , u)}
"t* ~aiid u
points.
for all possible
depends on the entry pattern as weil as the failure
a denote a particular configuration while
Let
denote the class of all configurations.
sup
sup
aeA O<u<t*-t
-
-
=
sup
A
Then, we have
max
-1 -1
<co k<r(t*,u) eN ~ TN(t~,u),k
I
max
max
0< n < N 0 < k < n
sup
C-lA~lT
NooN n , k
(3.3)
max
aE.A 02.u2.t*-tl<co k<r(t*,u)
max
= 0< n <N
(3.4)
e.
A Typical Batch Arrival Entry Pattern
N/\
't
t
N --------------------------
* -tf~
~
+-
t*
N
t{_l-------------------------------+---4----~
------------------------.~----4_---~~--~~
t*
N
t4-------------------·~----+-----~--~----_1
t*-t
~----~---3+_----~--+------1
t* -t
2
Nt - n n n·--r---+---+----t----+----f-----I
2
t*Censoring
time
Nt3
t
3
> t
t*
f
Time of observation
For any censoring point
t*(>t ), we need to take into
l
account the set {t l , ""', t } of entry points prior to t*.
s
Then, for the censoring point t*, the n. individuals entering at
time point
t.
J
have an exposure period
J
t* - t., for j=l,·· ·,s.
J
The actual failuY'e time of an individual is equal to
where
te
is the entr>y point and
Then, for each failure point in
t
t
- t
f
e
is the failuY'e point
f
(t1,t*], by reference to the
respective cohort, we arc able to find the actual failure time.
Let
N(t*,u)
be the total number of units entering the scheme on
or before the time point
t* - u
and let
r(t*,u)
be the number
of failures among these units with actual failure times
remaining
o
< u
N(t*,u) - r(t*,u)
~
u; the
have failure times> u, for
:s.. t*-t l " Thus, for every u € [0, t*-t l ],
by reference
·e
and
where
p
= 1.
are defined by (2.24) and (2.25) for
Thus, if
and
6
N,a.
be the upper
+
the null distributions of DN,1 and DN,I'
we may formulate our pes test as follows:
IOOd~
point of
respectively, then,
As experimentation continues (from the starting point
t ),
l
we review the process at each entry or failure point (to follow). If for
the first time, at one of the these censoring point
t*,
TN(t*,u),k (or ITN(t*,u),k r for the two-sided test ),for some k <
r(t*,u), is > Aue,I\N~
(or -~
A..--e~,6""1
)
for any u ~t*-tl' then
.~ 1-1,0.
l~ i~,a
experime,1tation is stopped at that time along with the rejection
of H . If, no such t* exists, experimentation is curtailed
O
when N(t*,u) =N and r(t*,u) =N-I, along with the acceptance
H . The overall level of significance of this pes ,rest is
O
•
s a., uniformly in a. E: A.
of
~~
Determination of
blem.
For small
ail possible
N!
N,
~,a.
and
6
N,a.
poses a challenging pro-
these may be evaluated by enumeration of
permutations of the set of ranks (or anti-ranks).
The procedure becomes prohibitively 'laborious when
N increases.
However, because of (2.24)-(2.25), it is possible to approximate
6+
~,a.
6
a
D.
and
1\
~,a.
are the upper
by
6+
respectively, where
6+
and
100a% points of the distribution of
D+
and
a.
and
6,
a.
a
Though some theoretical results in this direction have
recently been obtained by Zin~enko (1975), for numerical evaluation of
6+
and 6, simulation techniques are found to be very
a
a.
useful. From 4000 repetitions of 50 x 50 blocks of 2500 standard normal deviates, partial sums of blocks of all possible
lower orders were computed and incorporated in the enumeration of
the simulated distributions of 6+ and 6. These are reported
a.
a.
below.
e.
TABLE I
Table for the Simulated Values of
a.
0.10
0.05
0.025
0.01
{..+
a.
1,81
2.13
2.39
2.68
{..+
a.
~
and
{..
a.
a.
2.11
2.40
2.61
2.88
Comparison with the one-parameter Brownian motion process reveals
that we are not paying too much extra price in the two-parameter
case.
Theoretical studies on the asymptotic power of the proposed
tests pose some problems.
Though Sen (1976a) has shown that under
contiguous alternatives (in the LeCam sense) the distributions of
+
~
, 1 and
~N
, 1 are asymptotically those of the corresponding
functionals of the drifted two-dimensional Wiener processes, no
workable analytical expressions for the distributions are available.
As such, the asymptotic behavior of the PCS tests in terms
of stopping time and power can only be investigated through simulation studies for various drifts.
4.
PCS TESTS IN RESTRICTED DESIGNS
The treatment of Section 3 applies only when one works with an
unrestricted design in the sense that in the event of the experimentation not curtailed (on statistical ground) at any intermediate stage, one is prepared to continue until all the subjects
have responded.
On the other hand, due to operational limita-
tions, the experimenter may be restricted to truncate experimentation after a certain period of time or to censor when a certain
number of failures occur.
designs.
These will be termed restricted
e.
.e
In the truncation case, let the maximum duration of the
experiment be
T and let the subjects be admitted into the
scheme either in batches or randomly.
t(~
there be
n , ... ,n
l
t (so that
n + •.. + n = N) and entry-points t < ... < t ($; T). Note that
i
i
l
l
as in Figure 1, the j-th batch will be observed for a maximum
period
1)
In the former case, let
batches of strengths
T - t ., 1 $; j :s L
Let us denote by
J
pj=F(T-tj),j=I, ... ,i
Also, defining
v(p)
(so that
Y(P I ,P 2)
for a fixed
PI
(4.1)
as in (2.22), we let
Y(PI,P2) =V(P I )!v(P 2) ,
so that
O$;Pi< ... <PI$;I).
~
is
with
in
PI
y(p,p) = 1,
0 $;P
l $;P 2 $; 1,
for a fixed
'if
pEr.
(4.2)
P2
and
~
in
P
2
Then, from (4.1) and
(4.2), we have
(4.3)
As in the case with many clinical trials, the null distribution
X.
of the
(i.e.
1
studies, so that
though
F)
may be known fairly well from indepedent
PI
is fairly accurately known.
F may not be known,
PI
rately from independent studies.
In many cases,
may be estimated fairly accuWe assume first that
PI
is
known.
Let
r.
J
be the number of failures among the
during [t. ,T],
for
J
A
1:
A
j = 1, ... ,i
and let
A
p.J =r.J In.J
= (PI'''' ,Pi)
1! = (u ' ' ' ' ,uR)
l
-
where
Nt
wi th
with
1
u. = N- N
is defined by (3.1).
J
Let
,
n.
J
subjects
j = 1, ...
,i ;
(4.4)
j = 1, ...
,i,
(4.5)
t.
J
I~ be the subset of
defined by the closed region formed by the axes
t
l
=0
and
12
t
=0
2
Then, by (2.19),
(u.,Y(r.,J)), j = 1, ... ,to
2
J 2 J
(2.20), (2.21) (for c =1, c =Y(P ,1)), (2.26) and the fact that
1
2
1
p. -r p., 1:-:: j :-::i, in probabilitY,we conclude that under H '
O
J
J
and the set of vert"ics
* V
k
** },
sup{W N(!): ! E IN}
-r [Y(P1,1)] 2 SUp {W(!): ! E I
(4.6)
e.
* v
C
**
sup{IWN(!JI: !E IN} -+ [y(Pl,l)rl~up{IW(t)I: !E I }
(4.7)
** 2
where 1 c 1
corresponds to the shaded part of the unit square,
as presented in the following (where
v. = y (p. , PI)' 1 $ j
J
J
$ {) :
FIG. 2
Asymptotic Effective Domain of
W
1\
v =1
1
v
2
v3
I
>
v
4
v{
o
u {-1
->
Since for every
1
**
(and similarly for the
C
U
2
**
2
I , sup {W(t) : ! E I } ~ sup{W(!): ! E I }
Iw(!)
I), the critical values for the
right-hand sides of (4.6) and (4.7) are bounded from above by
~
+
{Y(P1,1)}2~a
and
~
{Y(Pl,1)}2~a'
respectively.
Hence, we are
able to proceed as in Section 3 with the extra condition that
t*
in Figure 1 is
$
T,
and, for this reason, we need to adjust
the critical values by the factor
C
{Y(Pl,l)}~.
Though, this test
for the staggering entry with restricted design is a valid one,
**
it is usually conscvative in the sense that, because of I
being a subset of
< a.
2
1 ,
its asymptotic level of significance is
The greater is the unshaded area in Figure 2, the lesser
will be the actual level of significance (as compared to the
specified
a).
e.
.e
In the above development, we have assumed that
cified.
If
PI
PI
is unknown, but is known to be bounded by
(from above), then we may replace
carryout a similar analysis.
Y(Pl,l)
by
[The case of
pi
possibility - though for small values of
y(pi,l)
=I
we have considered the case of
through even when the
points of time.
f
pi
and
is thus a
this could result
PI
in higher degrees of conservativeness of the PCS test.]
gon with
is spe-
Further,
batches - the results go
N subjects enter the scheme at distinct
In that case, in Figure 2, we will have a polyvertices.
N+3
In the extreme case, where the entry
points are random (following some distribution
G),
the actual
N points represent the N order statistics of a sample of size
N from this distribuiton. So, by using the Glivenko-Cantelli
theorem and the weak convergence of the empirical process, we are
able to replace the horizontal steps of the shaded region
I **
in Figure 2 by a smooth curve - the conclusions remaining the
same.
Let us next consider the case of censored experiments in the
restricted case.
size
N,
Suppose that corresponding to the target sample
r ,
we have a positive integer
such that the experi-
N
ment will be continued atmost upto the time when
have occurred.
batch, for
N
Let us again suppose that there are
batches entering at time-points
n , ... ,n ,
f
1
(among the
r
respectively, and let
t
l
< ... < t f
rn .
failures
f(~ 1)
with strengths
be the number of failures
failures of the combi~ed sample) from the j-th
N
j=l, ... ,f, so that
r
(4.8)
Let us also assume that
N
-1
N
Since
F
~l
rN~p((O,l]
n.~o.,
J
J
l<;j-sl
as
(where
is continuous, there exists a
N~
(4.9)
00
f
L. 10. = 1) .
J=
T
J
such that
(4.10)
e.
T
and let then
p.
J
l
= inf{x:
~. lo.F(x-t.)
J= J
J
= F(T-t.),
1:0; j :0;.(.
J
in after (4.4)-(4.5) (with the
r.
= p}
(4.11)
I~
Then, we may define
being replaced by
J *
construct the process {WN C!J, .t E: IN}'
and (2.19)-(2.20), in such a case, our
as
and
rn.),
J
By reference to (2.26)
I*
corresponds to the
shaded region of the following:
FIG. 3
Asymptotic Effective Domain of
W
v
V l+-r...,..-" -
-
-
-
-
-
-
-
-
-
-
Y (p 1 ' 1)
v.
J
=Y(p . ,1),
J
1:0; j
:o;.e.
v
4
{.t: 0:0; t l ~ 1, 0:0; t <; Y(Pl' l)} by
2
I , so that I * c I . Hence, for the critical value of sup W(j:):
.t E: I*}, we may take the dominating statistics sup{W(t):.t E: 1°},
Let us denote the rectangle
°
°
which has the critical value
holds for the two-sided case).
{
!:: +
{Y(PI,l)}Z~a
(and a similar case
As in the case of truncation, in
many cases, we may have a fairly accurate estimator of
PI
(from
an independent source), so that a (conservative) pes test can be
carried out as in Section 3, but working with partial sequence of
censored rank statistics for which
every
n:O; Nand
k:o; r
n
where for
n, r n< r . For random entry plan, similar case holds.
N
The failure time is the sum of entry time and actual time to
e.
.e
failure which are assumed to be independent and hence its distribution (say,
F)
is the convolution of the distributions of its
two constituents.
of
F,
from
T which is the IOOp-th percentile point
we may obtain an upper bound to
PI
without any diffi-
culty.
For both the censored and truncated case, it appears that
the smaller is the variation among PI, ... ,PI (or the scatter
of the distribution of the entry points), the closer will be I *
to 1
(and PI to p) or I ** to I2, and hence, the lesser
0
will be the extent of conservativeness of the PCS tests. Ideally,
*
if the distributions of sup{WC!):! e: I}
and sup{ IW(~) I: ! e: I *}
2
were known for an arbitrary 1* C 1 , we could have made the asymptotic level of significance of the proposed PCS tests exactly
equal to
and thus avoided the conservativeness of the tests.
a
However, the current status of Probability theory fails to provide us with these results.
ACCOMMODATION OF RANDOM WITHDRAWALS
5.
Dropouts or withdrawals are quite common in longitudinal
studies. Let x? be the actual failure time of the i-th unit
1
and suppose that
is its withdrawal time, so that the obs-er-
Y.
1
vable random variable is
x.I
= min (X? , Y. ) ,
I I
i ~I ,
(S.l)
and, we may then proceed as in Sections 3 and 4.
Here, we assume
that thp. withdrawal times are random and independent of the failure times, i.e.,
and
X?
1
= p{x?1 ~ x}
F. (x)
1
Y.
1
and
are independent.
H(y)
= p{Y. ~
1
Also, let
y},
i
~ I
(S.2)
so that the distribution of the withdrawal times does not depend
on
i(? 1).
Then,
G. (x)
1
= p{X.1
=
F. (x)
1
~ x}
+
= p{X.o1 ~ x or
H(x) - F. (x)H(x)
1
Y. :::; x}
1
i
?
1
(5.3)
e.
and hence, under
V
i >1.
Ill): F j
Hence, under
V i c::1 1
~F,
the
1I ,
0
Xi
we have
G -=G::F+H-FII,
i
are all i.i.d.r,v. and hence,
the theory developed in Sections 2 and 3 apply as well.
Note that, in general
(5.4)
and hence, when the null hypothesis is not true, because of the
damping facting
G.,
[l-H(x)]
the distance between
(~l),
is less than that of
F.
and
and
1
This will result in a
F. "
1 1 1
loss of power of any test based on the
G.
X.
1
°
(instead of the
X.) .
1
Moreover, if, as in Section 4, we have to know or estimate
(or
PI)'
we need a knowledge of
take the factor
Section 4.
Y(Pl,l)
as
1
H as well.
P
Of course, we may
and get a conservative test as in
Accommodation of withdrawals without assuming inde-
pendence of
X?
and
1
6.
Y.
remains an open problem.
1
COMPARISON WITH THE GEHAN TEST
For purposes of comparison of Gehan's test (1965)
with the
proposed PCS tests, two independent samples, each of size 25,
have been generated - one representing the placebo group and the
other the treatment group.
measured in weeks.
The observations have been treated as
The distribution of actual failure time
(XO)
=9
weeks for the placebo group and the same with parameter 8 = 29
2
has been taken to be negative exponential with parameter
weeks for the treatment group.
wal time
(Y)
and entry time
(U)
are the same for both groups,
weeks and the latter being uniform with parameter
min(XO,Y) =X
bution with parameter
the same with parameter
8
3
=
1
The distributions of the withdra-
the former being negative exponential with parameter
Consequently,
8
11.=29
e = 12
weeks.
has a negative exponential distri6.3
weeks for the placebo group and
e4 = 12.18
weeks for the other group.
The data, thus generated, have some similarity to the data on
remi ssion times of Leukemia patients used by Gehan (1965) .
The
experiment has been supposed to continue upto 26 weeks (half a
e.
·e
year) such that the last subject enetering the experiment gets an
exposure of at least ]4 weeks.
to the distribution of
Our illustration is with respect
X for the special case of the two-sample
problem explained after (2.2).
Under the null hypothesis
H ' the two distributions are
O
assumed to be equal whereas, for the one-sided alternative, the
treatment group is assumed to be stochastically larger than the
placebo group and for the two-sided test they are assumed to be
different.
for which
We base our comparison with respect to Wilcoxon test
a (i)
n
=
i
n+ l
where n:S; N is the cumulative sample size.
As has been mentioned earlier, the PCS statistics are computed as and when either a failure or an entry occurs.
A com-
puter program for the computations of Wilcoxon and Savage statistics under the progressive censoring scheme with respect to an
unrestricted design, is given at the Appendix.
The statistics for
the restricted design can be obtained from those of the unrestricted design by multiplying by the appropriate scaling factor
Y(Pl,l)~.
In the present example,
=1-
and
Y(PI,l)
muc~
information by
cance of the
pes
t
(I-PI)
3
~
1.0.
r~ferring
t F
*
es. or t
=I
PI
-exp(-26/6.3)
=
.98
Consequently, we do not lose
directly to Table 1 for signifi=
18 516
•
,
sup
0<u<t*-t
max
T
k<r(t*,u) N(t*,u),k
1
when it is > than 2.13 for the first time, whereas for t* =
20 • 263 an d 22 • 090 , 0<u<t*-t
sup
max
ITN(t*,u),k I is equal to 2.26
k<r(t*,u)
1
and 2.40 respectively. So the PCS test conclusively rejects the
~2.14
null hypothesis at
5%
level of significance level for the one-
sided alternative.
For the two-sided alternative, the PCS test
barley fails to reject the null hypothesis within 26 weeks at
5%
level of significance.
For Gehan's test the data have yielded:
r =2, r =4,
2
1
Compared against the
W= 252, V(W/P ,H ) = 10582.14, Z = 2.45.
5%
O
critical values of the standard normal distribution, we find this
value of
z
to be significant at
5%
level of significance for
both one-sided and two-sided tests.
Therefore, for this parti-
cular set of data, Gehan' s test has performed better than the
pes
pes
test from power consideration, but the
test has resulted
in saving of more than 5 weeks for the one-sided test.
Finally, it may be mentioned that Gehan has considered ARE
of his test (in the Pitman sense) for the exponential alternative assuming uniform entry pattern.
The derivations may not be
so simple with respect to general entry pattern.
butions of Gehan's statistic
and the
pes
As the distri-
statistic considered
here are not congruent in large samples, Pitman relative efficiency is not relevent in the present case.
We may consider
Bahadur - efficiency, but current statistical tools do not lend
themselves to deriving Bahadur slopes for
D+
and
D.
APPENDIX
As explained in Section 3, the two-dimensional rank order
statistics are to be computed as and when either a new entry or
a response (failure) occurs.
The program first reads the target
sample size and the cumulative sample size.
The columns for
failure time and effective time to failure are to be left blank,
if the corresponding individuals have not responded so far.
The
program has been kept flexible and computes Chatterjee and Sen
(1973) PCS statistics for the single-point entry situation if
the target sample size and the cumulative sample size are
identical.
.e
C
C
C
90
C
C
C
C
10
100
20
C
C
22
23
30
C
C
C
42
PROGRFSSIVC CENSORING WITH STAGGERING ENTRY; SAVAGE AND
WILCOXON STATISTICS
DIMENSroN C(50) ,X(SO),W(SO),Wl(SO) ,WW(SO),Wll(SO),Cl(SO),
IB(SO),AA(SO),BB(SO),T(SO),S(SO),C2(SO),XX(50)
2A(50)
N=CURRENT SAMPLE SIZE; M=TARGET SAMPLE SIZE
READ(I,90) N,M
FORMAT (213)
DO 10 I=I,N
C(I)=-I, IF SAMPLE 1; C(I)=+I, IF SAMPLE 2
X(I)=ENTRY TIME,W(I)=FAILURE TIME,W1(I)=W(I)-X(I)=
EFFECTIVE FAILURE TIME. WeI) AND Wl(I) ARE BLANK IF WeI)
IS GREATER THAN CENSORING TIME
READ(1,100) C(I) ,XCI) ,weI) ,W1(I)
WW(l)=W(I)
CONTINUE
FORMAT(F3.0,3F9.4)
WMAX=WW(I)
DO 20 I=2,N
IF(WW(I) .GT.WMAX) WMAX=WW(I)
CONTINUE
U=O.
CC=M
VARIANCE OF THE WILCOXON STATISTIC
V=CC/(12.*(CC+1.))
DO 22 J=I,M
VARIANCE OF THE SAVAGE STATISTIC
U=U+1./FLOAT(J)
U=CC-U
U=U/(CC-1.)
PRINT 23,U,V
FORMAT(lHO,'****** VARIANCES '//7X,'U ',llX, 'V'//3X,2F9.4/)
WX=AMX1(WMAX,X(N))
WXX=WX+5.
DO 30 I=l,N
IF(W(I).EQ.O.) W(I)=WXX
IF(Wl(I) .EQ.O.) Wl(I)=WXX
XX (I) =WX-X (I)
CONTINUE
CONSTRUCTION OF TWO-DIMENSIONAL STATISTICS: SAMPLE SIZE=N
CENSORING POINT MAY BE THE N-TH ENTRY TIME OR ANY OF THE
FAILURE TIMES BEFORE THE N+I-ST ENTRY
DO 40 I=l,N
C2(1)=0.
DO 42 J=l, I
C2(I)=C2(I)+C(J)
C2(I)=C2(I)/FLOAT(I)
ICOUNT=O
C
C
C
45
60
50
C
75
70
85
80
82
83
TAKING STOCK OF NUMBER OF FAl LURES WITH FFFECTI VE FAlLURf:
TIMES LESS THAN OR EQUAL TO THE TOTAL CURRENT EXPOSURE
TIME RELATING TO EACH CUMULATIVE SAMPLE SIZE
DO 45 J=l,I
IF(Wl(J).LE.XX(1)) ICOUNT=1COUNT+l
IF(Wl(J) .LE.XX(I)) W11(ICOUNT)=Wl(J)
IF(W1(J) .LE.XX(I)) C1(ICOUNT)=C(J)
CONTINUE
IF(ICOUNT.EQ.O) GO TO 40
DO SO IJ=1,ICOUNT
DO SO 1K=IJ,ICOUNT
IF (1K.EQ.IJ) GO TO SO
W2=Wll (IJ)
CC2=C1 (IJ)
IF(W11(IK).LE.W11(1J)) GO TO 60
GO TO SO
W11(IJ)=W11(1K)
C1 (IJ) =Cl (IK)
Wll (IK)=W2
C1(IK)=CC2
CONTINUE
COMPUTATIONS FOR EACH SAMPLE SIZE I=1,N
DO 70 J=l,I
A(J)=O.
JJ=I-J+l
DO 75 K=JJ,I
A(J)=A(J)+1.jFLOAT(K)
B(J)=FLOAT(J)jFLOAT(I+1)
CONTINUE
CONTINUE
MAX=ICOUNT
IF((I.EQ.ICOUNT) .AND.(I.EQ.1)) GO TO 82
IF(ICOUNT.LT.I) MAX=ICOUNT
IF((ICOUNT.EQ.I) .AND.(I.GE.2))MAX=ICOUNT-1
DO 80 K=1,MAX
BB(K)=O.
M(K)=O.
KK=K+1
DO 85 KL=KK,I
BB(K)=BB(K)+B(KL)
M(K)=AA(K)+A(KL)
M(K)=AA(K)jFLOAT(1-K)
BR(K)=BB(K)/FLOAT(I-K)
CONTINUE
GO TO 83
AA(1)=A(1)
BB(1)=B(1)
PRINT 140,1
ICOUNT=MAX
95
C
C
91
40
140
150
DO 91 K=l, ICOUNT
T(K)=O.
S(K)=O.
DC 95 KK=l. K
T(K)=T(K)+(Cl(KK)-C2(I))*(A(KK)-AA(K))
S(K)=S(K)+(Cl(KK)-C2(I))*(B(KK)-BB(K))
CONTINUE
SAVAGE STATISTIC
T(K)=T(K)/((SQRT(CC))*SQRT(U))
WILCOXON STATISTC
S(K)=S(K)/((SQRT(CC))*SQRT(V))
PRINT lSO,T(K) ,S(K)
CONTINUE
CONTINUE
FORMAT (lHO, '****SAMPLE SIZE', I3/14X. 'T(I ,K)' ,9X, 'SCI ,K) '/)
FORMAT(10X,F8.4,SX,F8.4)
STOP
END
ACKNOWLEDGEMENTS
The authors are thankful to Mr. A.N. Sinha for his program-
ming assistance for the computation of two-dimensional PCS rankorder statistics.
The work has been supported by the National
Heart, Lung, and Blood Institute, Contract NIH-NHLBI-7l-2243.
BIBLIOGRAPHY
[1]
Chatterjee, S.K. and Sen, P.K. (1973). Nonparametric testing under progressive censoring. CaZcutta Statist.
Assoc. BuZZ. ~, 13-50.
[2]
Davis, E.C. (1977). A two sample Wilcoxon test for progressively censored data. (To appear).
[3]
Effron, B. (1967). The two sample problem with censored data.
Proc. Fifth BerokZey Symp. Math. Statist. Prob. i, 831-54.
[4]
Gehan, E.A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometr>ika
g, 203-23.
[5]
Halperin, M. and Ware, J. (1974). Early decision in a censored two-sample test for accumulating survival data.
J. Amer>. Statist. Assoc. 69, 414-22.
[6]
Majumdar, H. and Sen, P.K. (1977), Nonparametric tests for
multiple regression under progressive censoring. (To
appear).
[7]
Peto, R. and Peto, J. (1972). Asymptotically efficient
rank invariant test procedures. J. Royal Statist.
Soc., Series A 135, 185-198.
[8]
Sen, P.K. (1976a), A two-dimensional functional permutational central limit theorem for linear rank statistics.
Ann. Prob. 4, 13-26.
[9]
Sen, P.K. (1976b). Asymptotically optimal rank order tests
for progressive censoring. Calcutta Statist. Assoc.
Bull. ~,
[ 10]
. v
Zmcenko, N.M. (1975). (cf. Math. Review, 1977 #1912).
Teor. Verojatnost. i Mat. Statist. ~, 62-65.
Keywords and Phrases:
Brownian sheet, clincical trials,
?
n~(O,I)
space, finite-dimensional distribution, life testing problems,
progressive censoring schemes, restricted designs,
study.
time~sequential