Johnson, Norman L. "A General Purpose Test for Indirect Censoring."

A GENERAL PURPOSE TEST FOR INDIRECT CENSORING
by
Norman L. Johnson
University of North Carolina
ABSTRACT
A general purpose test for censoring of extreme values
of an unobserved variable, in unknown proportions, is suggested.
The test uses observed values of a variable correlated with the
censored variable, and the joint distribution of the two
variables has to be known for application of the test.
Key Words & Phrases:
Multivariate censoring; Farlie-Gumbel-Morgenstern distribution; order statistics;
U-statistics
Norman L. Johnson's work was supported by the National Science
Foundation under Grant MCS-8-21704.
1. INTRODUCTION
In [1], I discussed testing whether data consisting of $r$ sample values
$\mathbf{X}_1 = (X_{11}, \ldots, X_{1r})$
on a variable $X_1$ represent a complete random sample, or the
remainder of a random sample of size $n = r + s_0 + s_r$ from which the individuals
with the $s_0$ least and $s_r$ greatest values of another variable $X_2$ have been
removed, so that the $X_1$ values have been 'indirectly censored'. It was
assumed that the joint distribution of $X_1$ and $X_2$ is absolutely continuous,
and that this joint distribution is known.
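Using a likelihood ratio approach, the criterion appropriate for testing
the hypothesis $H_{0,0}$ (that there has been no censoring) against the alternative $H_{s_0,s_r}$
($s_0$ least and $s_r$ greatest $X_2$ values censored) was found to be
$$E\left[\{F_2(\min \mathbf{X}_2)\}^{s_0}\{1 - F_2(\max \mathbf{X}_2)\}^{s_r} \mid \mathbf{X}_1\right]$$
where $F_2(\cdot)$ is the cumulative distribution function of each $X_2$, and $\mathbf{X}_2$ denotes the (unobserved) $X_2$ values of the $r$ retained individuals, while the
corresponding test for direct censoring of an observed variable $X$ has
critical region of the form $\{F(\min \mathbf{X})\}^{s_0}\{1 - F(\max \mathbf{X})\}^{s_r} > K$.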
I now propose a procedure for use when $s_0$ and $s_r$ are not known.
2. A GENERAL PURPOSE TEST
If $s_0$ and $s_r$ are not known, analogy with [2], wherein the criterion
$F(\min \mathbf{X}) + 1 - F(\max \mathbf{X})$ is indicated as a 'general purpose' test for direct
censoring of an observed variable $X$, suggests consideration of a criterion
$$T = E\left[F_2(\min \mathbf{X}_2) + 1 - F_2(\max \mathbf{X}_2) \mid \mathbf{X}_1\right] \tag{1}$$
as a basis for a general purpose test of indirect censoring on $X_2$, using
observed values $\mathbf{X}_1$ of $X_1$.
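The conditional cumulative distribution functions, given $\mathbf{X}_1$, of
$L = \min \mathbf{X}_2$ and $G = \max \mathbf{X}_2$ are, respectively,
$$1 - \prod_{j=1}^{r}\{1 - F_{2|1}(\ell \mid X_{1j})\} \quad\text{and}\quad \prod_{j=1}^{r} F_{2|1}(g \mid X_{1j}),$$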
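and the corresponding density functions are:
$$\sum_{i=1}^{r} f_{2|1}(\ell \mid X_{1i}) \prod_{j \ne i}\{1 - F_{2|1}(\ell \mid X_{1j})\} \quad\text{and}\quad \sum_{i=1}^{r} f_{2|1}(g \mid X_{1i}) \prod_{j \ne i} F_{2|1}(g \mid X_{1j}),$$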
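where $F_{2|1}(\cdot \mid X_1)$, $f_{2|1}(\cdot \mid X_1)$ denote the conditional cumulative distribution
function and density function, respectively, of $X_2$ given $X_1$. So in (1)
$$T = 1 + \int_{-\infty}^{\infty} F_2(y) \sum_{i=1}^{r} f_{2|1}(y \mid X_{1i}) \Big[\prod_{j \ne i}\{1 - F_{2|1}(y \mid X_{1j})\} - \prod_{j \ne i} F_{2|1}(y \mid X_{1j})\Big]\,dy.$$
If
$$F_{2|1}(y \mid X_{1j}) = H_2(F_2(y), X_{1j})$$
with
$$f_{2|1}(y \mid X_{1j}) = f_2(y)\,h_2(F_2(y), X_{1j}), \qquad h_2(u,v) = \frac{\partial H_2(u,v)}{\partial u}, \tag{3}$$
then the substitution $z = F_2(y)$ gives
$$T = 1 + \int_0^1 z \sum_{i=1}^{r} h_2(z, X_{1i}) \Big[\prod_{j \ne i}\{1 - H_2(z, X_{1j})\} - \prod_{j \ne i} H_2(z, X_{1j})\Big]\,dz.$$
3. FARLIE-GUMBEL-MORGENSTERN POPULATION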
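In the special case where $(X_{1j}, X_{2j})$ have a joint Farlie-Gumbel-Morgenstern distribution with cumulative distribution function
$$F_1(x_1)F_2(x_2)\left[1 + \theta\{1 - F_1(x_1)\}\{1 - F_2(x_2)\}\right] \qquad (|\theta| \le 1) \tag{4}$$
we have
$$\begin{aligned}
H_2(y, x_1) &= y\,[1 + \theta\{1 - 2F_1(x_1)\}(1 - y)] \\
1 - H_2(y, x_1) &= (1 - y)\,[1 - \theta\{1 - 2F_1(x_1)\}y] \\
h_2(y, x_1) &= 1 + \theta\{1 - 2F_1(x_1)\}(1 - 2y)
\end{aligned} \tag{5}$$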
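and
$$T = 1 + \int_0^1 z \sum_{i=1}^{r} \{1 + \theta Z_i(1 - 2z)\}\Big[(1 - z)^{r-1}\prod_{j \ne i}(1 - \theta Z_j z) - z^{r-1}\prod_{j \ne i}\{1 + \theta Z_j(1 - z)\}\Big]\,dz$$
where
$$Z_i = 1 - 2F_1(X_{1i}). \tag{6}$$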
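The conditional quantities for the Farlie-Gumbel-Morgenstern case can be sketched numerically as follows. This is a minimal illustration working on the uniform scale, with `u` standing for $F_1(x_1)$ and `y` for $F_2(x_2)$; all function names are hypothetical, and `conditional_quantile` (an inverse of $H_2$ in $y$) is an added convenience that does not appear in the text.

```python
import math


def fgm_joint_cdf(u, v, theta):
    """FGM joint cdf on the uniform scale, with u = F1(x1) and v = F2(x2)."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def z_value(u):
    """Z = 1 - 2*F1(x1), written in terms of u = F1(x1)."""
    return 1.0 - 2.0 * u


def H2(y, u, theta):
    """Conditional cdf of F2(X2) given F1(X1) = u."""
    return y * (1.0 + theta * z_value(u) * (1.0 - y))


def h2(y, u, theta):
    """Conditional density of F2(X2) given F1(X1) = u (derivative of H2 in y)."""
    return 1.0 + theta * z_value(u) * (1.0 - 2.0 * y)


def conditional_quantile(p, u, theta):
    """Solve H2(y, u, theta) = p for y in [0, 1] (added helper, not in the text)."""
    if p <= 0.0:
        return 0.0
    a = theta * z_value(u)
    # Root of -a*y^2 + (1 + a)*y - p = 0, in a form that stays stable as a -> 0.
    disc = max((1.0 + a) ** 2 - 4.0 * a * p, 0.0)
    return 2.0 * p / ((1.0 + a) + math.sqrt(disc))
```

Feeding uniform random numbers `p` into `conditional_quantile` would give draws of $F_2(X_2)$ conditional on $F_1(X_1) = u$, which is one way to simulate indirectly censored samples under this model.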
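After some manipulation, we obtain
$$T = \frac{2}{r+1} + 2\sum_{j=1}^{[r/2]} \frac{(2j)!\,r!}{(r+2j+1)!}\,\theta^{2j}\,Y_{2j} \tag{7}$$
where $[\cdot]$ denotes integer part and
$$Y_{2j} = \sum_{i_1 < i_2 < \cdots < i_{2j}} Z_{i_1} Z_{i_2} \cdots Z_{i_{2j}}. \tag{8}$$
There are $\binom{r}{2j}$ terms in the summation for $Y_{2j}$. It is interesting to note
that, introducing the arithmetic mean
$$\bar{Y}_{2j} = \binom{r}{2j}^{-1} Y_{2j},$$
(7) can be written
$$\tfrac{1}{2}(r+1)\,T = 1 + \sum_{j=1}^{[r/2]} \frac{r^{(2j)}}{(r+2)^{[2j]}}\,\theta^{2j}\,\bar{Y}_{2j} \tag{7'}$$
where $a^{(b)} = a(a-1)\cdots(a-b+1)$ and $a^{[b]} = a(a+1)\cdots(a+b-1)$ are the descending
and ascending factorials, respectively.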
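As a numerical check on the series for $T$, the sketch below evaluates formula (7), the mean-based form (7'), and the integral form of $T$ by Simpson's rule, for given values $Z_i$ and $\theta$. Function names are hypothetical. The integral coded here is $T = 1 + \int_0^1 z \sum_i \{1 + \theta Z_i(1-2z)\}[(1-z)^{r-1}\prod_{j \ne i}(1 - \theta Z_j z) - z^{r-1}\prod_{j \ne i}\{1 + \theta Z_j(1-z)\}]\,dz$, and the leading term $2/(r+1)$ is the value of $T$ at $\theta = 0$.

```python
from itertools import combinations
from math import comb, factorial, prod


def elementary_symmetric(z, k):
    """Y_k: sum of the products of the z values taken k at a time."""
    return sum(prod(c) for c in combinations(z, k))


def falling(a, b):
    """a^(b) = a(a-1)...(a-b+1)."""
    return prod(a - k for k in range(b))


def rising(a, b):
    """a^[b] = a(a+1)...(a+b-1)."""
    return prod(a + k for k in range(b))


def t_formula_7(z, theta):
    """T = 2/(r+1) + 2 * sum_j (2j)! r! theta^(2j) Y_2j / (r+2j+1)!."""
    r = len(z)
    total = 2.0 / (r + 1)
    for j in range(1, r // 2 + 1):
        coeff = factorial(2 * j) * factorial(r) / factorial(r + 2 * j + 1)
        total += 2.0 * coeff * theta ** (2 * j) * elementary_symmetric(z, 2 * j)
    return total


def t_formula_7_prime(z, theta):
    """Same quantity via the means of the Y_2j and the factorial ratios."""
    r = len(z)
    total = 1.0
    for j in range(1, r // 2 + 1):
        mean = elementary_symmetric(z, 2 * j) / comb(r, 2 * j)
        total += falling(r, 2 * j) / rising(r + 2, 2 * j) * theta ** (2 * j) * mean
    return 2.0 * total / (r + 1)


def t_by_quadrature(z, theta, steps=2000):
    """T from the integral form, using composite Simpson's rule on [0, 1]."""
    r = len(z)

    def integrand(u):
        s = 0.0
        for i in range(r):
            others = [z[j] for j in range(r) if j != i]
            low = (1 - u) ** (r - 1) * prod(1 - theta * zj * u for zj in others)
            high = u ** (r - 1) * prod(1 + theta * zj * (1 - u) for zj in others)
            s += (1 + theta * z[i] * (1 - 2 * u)) * (low - high)
        return u * s

    h = 1.0 / steps  # steps must be even for Simpson's rule
    acc = integrand(0.0) + integrand(1.0)
    for k in range(1, steps):
        acc += (4 if k % 2 else 2) * integrand(k * h)
    return 1.0 + acc * h / 3.0
```

Because only even powers of $\theta$ enter the series, `t_formula_7(z, theta)` and `t_formula_7(z, -theta)` coincide, which matches the remark that the sign of $\theta$ need not be known.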
It is also interesting to note that the criterion obtained in testing
for indirect symmetrical ($s_0 = s_r = s$) censoring is
$$1 + \sum_{j=1}^{[r/2]} \frac{(j+1)^{[j]}\,s^{[j]}}{(r+2s+1)^{[2j]}}\,\theta^{2j}\,Y_{2j} \qquad \text{(eq. (25) of [1])}$$
which has evident points of similarity with (7).
Formula (7) (or (7')) requires knowledge of $\theta$ for its evaluation (though
it is not necessary to know its sign). Noting that $\bar{Y}_2 \ge \bar{Y}_4 \ge \cdots$, and that
the coefficients $\theta^{2j} r^{(2j)}/(r+2)^{[2j]}$ decrease as $j$ increases, it seems
reasonable to consider using a test with critical region $Y_2 > K$ when $\theta$
is not known, as was suggested in [2] for the case $s_0 = s_r$.
In [2] it was suggested that a normal distribution might be used to
approximate the null hypothesis ($H_{0,0}$) distribution of $Y_2$, but this does not
seem to be justifiable, even for large $r$. (For $Y_1$, on the other hand, it
is a good approximation.)
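The approximation (see Appendix)
$1 + 6Y_2 r^{-1}$ distributed as $\chi^2$ with one degree of freedom seems to be
appropriate. The corresponding critical region would be
$$Y_2 > \tfrac{1}{6}\,r\,(\lambda_{\frac{1}{2}\alpha}^2 - 1)$$
where $\lambda_{\frac{1}{2}\alpha}$ is the upper $\frac{1}{2}\alpha$ point of the unit normal distribution
($\Phi(\lambda_{\frac{1}{2}\alpha}) = 1 - \frac{1}{2}\alpha$).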
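A minimal sketch of how this critical region might be applied in practice is given below. It assumes the marginal distribution function of $X_1$ is available as a callable `cdf1` (a hypothetical argument name), forms $Z_i = 1 - 2F_1(X_{1i})$, computes $Y_2 = \sum_{i<j} Z_i Z_j$, and compares it with $\frac{1}{6} r (\lambda^2 - 1)$. The function names are illustrative only.

```python
from statistics import NormalDist


def y2_statistic(z):
    """Y_2 = sum over i < j of z_i * z_j, via ((sum z)^2 - sum z^2) / 2."""
    s1 = sum(z)
    s2 = sum(v * v for v in z)
    return 0.5 * (s1 * s1 - s2)


def y2_critical_value(r, alpha):
    """K = r * (lambda^2 - 1) / 6, with lambda the upper alpha/2 normal point."""
    lam = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return r * (lam * lam - 1.0) / 6.0


def indirect_censoring_test(x1, cdf1, alpha=0.05):
    """Return (Y_2, K, reject) for observed x1 values and known marginal cdf1."""
    z = [1.0 - 2.0 * cdf1(x) for x in x1]
    y2 = y2_statistic(z)
    k = y2_critical_value(len(z), alpha)
    return y2, k, y2 > k
```

For example, with $r = 60$ and $\alpha = 0.05$ the critical value is about 28.41, since $\lambda_{0.025}^2 \approx 3.8415$.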
REFERENCES
[1] Johnson, N.L. (1980). "Extreme sample censoring problems with multivariate data: Indirect censoring and the Farlie-Gumbel-Morgenstern distribution", J. Multiv. Anal. 10, 351-362.
[2] Johnson, N.L. (1970). "A general purpose test of censoring of extreme sample values", S. N. Roy Memorial Volume, University of North Carolina Press, pp. 377-384.
APPENDIX
We first show that if $Y_2 = \sum_{i=1}^{r-1}\sum_{j=i+1}^{r} Z_i Z_j$ and the $Z$'s are independent and
identically distributed, with zero expected value and all moments ($\mu'_q = \mu_q$)
finite, then
(I) the limit of the moment ratio $\mu_q(Y_2)/\{\mu_2(Y_2)\}^{\frac{1}{2}q}$ as $r \to \infty$ is
finite and does not depend on the common distribution, and
(II) if the $Z$'s are normal then the limiting distribution of $Y_2/\sqrt{\mu_2(Y_2)}$ is that of $\frac{1}{\sqrt{2}}(\chi_1^2 - 1)$.
Result (I) suggests that the limiting distribution in (II) applies generally.
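Proof of (I)
$E[Y_2] = 0$; $\mu_q(Y_2) = E\big[\big(\sum\sum_{i<j} Z_i Z_j\big)^q\big]$ and so, if $r > q$,
$$\mu_q(Y_2) = \binom{r}{q} g_q\,\mu_2^q + \text{terms in } r^{q-1}, r^{q-2}, \ldots, r, 1$$
where $g_q$ is the coefficient of $Z_1^2 Z_2^2 \cdots Z_q^2$ in $\big(\sum\sum_{i<j} Z_i Z_j\big)^q$, which is the same for
any $r > q$. Hence, since $\mu_2(Y_2) = \binom{r}{2}\mu_2^2$,
$$\lim_{r \to \infty} \frac{\mu_q(Y_2)}{\{\mu_2(Y_2)\}^{\frac{1}{2}q}} = \frac{2^{\frac{1}{2}q}\,g_q}{q!}. \tag{A1}$$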
Proof of (II)
If the $Z$'s are normal $N(0, \mu_2)$, then
$$Y_2 = \tfrac{1}{2}\Big\{\Big(\sum_{i=1}^{r} Z_i\Big)^2 - \sum_{i=1}^{r} Z_i^2\Big\} = \tfrac{1}{2}\Big\{(r-1)(\bar{Z}\sqrt{r})^2 - \sum_{i=1}^{r}(Z_i - \bar{Z})^2\Big\} = \tfrac{1}{2}\{(r-1)\chi_1^2 - \chi_{r-1}^2\}\,\mu_2$$
where $\bar{Z} = r^{-1}\sum_{i=1}^{r} Z_i$, $\mu_2 = E[Z_j^2]$, and $\chi_1^2$, $\chi_{r-1}^2$ are mutually independent chi-squared variables with 1 and $r-1$ degrees of freedom. Then
$$\frac{Y_2}{\sqrt{\mu_2(Y_2)}} = \frac{\tfrac{1}{2}\{(r-1)\chi_1^2 - \chi_{r-1}^2\}}{\sqrt{\tfrac{1}{2}r(r-1)}}$$
and the limiting distribution, as $r \to \infty$, is that of
$$\tfrac{1}{\sqrt{2}}(\chi_1^2 - 1). \tag{A2}$$
Dr. W. Hoeffding
has pointed out to me that $Y_2$ is proportional to a degenerate U-statistic,
and that the general result can be established by a similar analysis.
In our case, when $Z = 1 - 2F_1(X_1)$, the common distribution of the $Z$'s
(under $H_{0,0}$) is uniform $(-1, 1)$ and
$$\mu_q = \begin{cases} 0 & \text{if } q \text{ is odd} \\ (q+1)^{-1} & \text{if } q \text{ is even.} \end{cases} \tag{A3}$$
This suggests using the approximation
$$\sqrt{\frac{18}{r(r-1)}}\;Y_2 \ \text{ approximately distributed as } \ \tfrac{1}{\sqrt{2}}(\chi_1^2 - 1)$$
or equivalently
$$1 + \frac{6Y_2}{\sqrt{r(r-1)}} \ \text{ approximately distributed as } \ \chi_1^2. \tag{A4}$$
The $\sqrt{r(r-1)}$ may be conveniently replaced by $r$.
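The quality of this approximation can be explored by simulation. The sketch below is an illustration only, with hypothetical function names. It draws the $Z$'s from the uniform $(-1, 1)$ distribution that holds under $H_{0,0}$ and estimates how often $1 + 6Y_2/r$ exceeds the upper $\alpha$ point of $\chi_1^2$. If the approximation is adequate, the estimated rate should be close to $\alpha$. The seed and the number of replications are arbitrary choices.

```python
import random
from statistics import NormalDist


def y2_statistic(z):
    """Y_2 = sum over i < j of z_i * z_j, via ((sum z)^2 - sum z^2) / 2."""
    s1 = sum(z)
    s2 = sum(v * v for v in z)
    return 0.5 * (s1 * s1 - s2)


def exceedance_rate(r, alpha, replications, seed=0):
    """Estimate P(1 + 6*Y_2/r > upper alpha point of chi-squared(1)) under H_{0,0}."""
    rng = random.Random(seed)
    lam = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    chi2_upper = lam * lam  # upper alpha point of chi-squared with 1 d.f.
    hits = 0
    for _ in range(replications):
        z = [rng.uniform(-1.0, 1.0) for _ in range(r)]
        if 1.0 + 6.0 * y2_statistic(z) / r > chi2_upper:
            hits += 1
    return hits / replications
```

A call such as `exceedance_rate(50, 0.05, 20000)` gives a Monte Carlo estimate of the actual size of the test for $r = 50$.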
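The first four moments of $Y_2$ are, in general,
$$E[Y_2] = 0 \tag{A5.1}$$
$$\operatorname{var}(Y_2) = \mu_2(Y_2) = \binom{r}{2}\mu_2^2 = \tfrac{1}{2}r^{(2)}\mu_2^2 \tag{A5.2}$$
$$\mu_3(Y_2) = \binom{r}{3}\binom{3}{1,1,1}\mu_2^3 + \binom{r}{2}\mu_3^2 = r^{(3)}\mu_2^3 + \tfrac{1}{2}r^{(2)}\mu_3^2 \tag{A5.3}$$
$$\begin{aligned}
\mu_4(Y_2) &= \binom{r}{4}\Big\{3\binom{4}{1,1,1,1} + 3\binom{4}{2}\Big\}\mu_2^4 + \binom{r}{3}\,3\binom{4}{2,1,1}\mu_3^2\mu_2 + \binom{r}{3}\,3\binom{4}{2}\mu_4\mu_2^2 + \binom{r}{2}\mu_4^2 \\
&= \tfrac{15}{4}r^{(4)}\mu_2^4 + 6r^{(3)}\mu_3^2\mu_2 + 3r^{(3)}\mu_4\mu_2^2 + \tfrac{1}{2}r^{(2)}\mu_4^2
\end{aligned} \tag{A5.4}$$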
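Inserting the special values (A3), we obtain
$$E[Y_2 \mid H_{0,0}] = 0 \tag{A6.1}$$
$$\operatorname{var}(Y_2 \mid H_{0,0}) = \tfrac{1}{18}r^{(2)} \tag{A6.2}$$
$$\mu_3(Y_2 \mid H_{0,0}) = \tfrac{1}{27}r^{(3)} \tag{A6.3}$$
$$\mu_4(Y_2 \mid H_{0,0}) = \tfrac{1}{50}r^{(2)} + \tfrac{1}{15}r^{(3)} + \tfrac{5}{108}r^{(4)} = \tfrac{1}{2700}r^{(2)}(125r^2 - 445r + 444) \tag{A6.4}$$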
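These null-hypothesis moments can be checked exactly for small $r$ by brute force. The sketch below (hypothetical function names) expands $\big(\sum_{i<j} Z_i Z_j\big)^q$ term by term, replaces each power $Z_i^p$ by the uniform $(-1, 1)$ moment $\mu_p$ from (A3), and compares the result with the closed forms (A6.2) to (A6.4). Exact rational arithmetic is used so that the comparison is not affected by rounding. Since $E[Y_2] = 0$, raw and central moments coincide.

```python
from fractions import Fraction
from itertools import combinations, product


def uniform_moment(q):
    """mu_q for Z uniform on (-1, 1): 0 for odd q, 1/(q+1) for even q."""
    return Fraction(0) if q % 2 else Fraction(1, q + 1)


def y2_moment_by_enumeration(q, r, moment=uniform_moment):
    """E[Y_2^q] for r i.i.d. Z's, expanding (sum_{i<j} Z_i Z_j)^q term by term."""
    pairs = list(combinations(range(r), 2))
    total = Fraction(0)
    for choice in product(pairs, repeat=q):
        powers = [0] * r
        for i, j in choice:
            powers[i] += 1
            powers[j] += 1
        term = Fraction(1)
        for p in powers:
            term *= moment(p)  # independence: expectation factorises over the Z's
        total += term
    return total


def falling(r, b):
    """r^(b) = r(r-1)...(r-b+1)."""
    out = 1
    for k in range(b):
        out *= r - k
    return out


def y2_null_moments_closed_form(r):
    """Variance, third and fourth moments of Y_2 under H_{0,0}."""
    var = Fraction(1, 18) * falling(r, 2)
    mu3 = Fraction(1, 27) * falling(r, 3)
    mu4 = Fraction(1, 2700) * falling(r, 2) * (125 * r * r - 445 * r + 444)
    return var, mu3, mu4
```

The enumeration grows as $\binom{r}{2}^q$, so it is only practical as a check for small $r$.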