COMPLETENESS COMPARISONS AMONG SEQUENCES OF SAMPLES

by

N.L. Johnson*
Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 1056

February, 1976

* This research was supported by the U.S. Army Research Office under
Contract No. DAHC04-74-C-0030.

Completeness Comparisons Among Sequences of Samples

By N.L. Johnson
University of North Carolina at Chapel Hill
1. Introduction

In previous papers [1] - [3] we have considered methods of testing
whether a set (or sequence of sets) of observed values represents a complete
random sample (or a sequence of complete random samples) or is the consequence
of some form of censoring having been applied to a complete random sample
(or sequence of such samples). The methods required a knowledge of the
population distribution of the measured character(s); some assessment
of the effects of imperfect knowledge was discussed in [4] and [5].
We now turn our attention to problems arising when there are two
sets (or sequences of sets) of data and we wish to compare the incidence
of censoring in them. We will suppose the appropriate population distribution
to be the same for each set, and also (as in the earlier papers) that the
distribution is continuous. Certain special problems will be selected
for detailed discussion from among the very wide range of possible problems.
A particularly interesting feature of the enquiry is that distribution-free
procedures can be used (see Section 3), thus freeing us from the dependence
on knowledge of population distributions which characterized the earlier
work.

A considerable variety of problems can arise:

(i) We may have different types of censoring. We will restrict
ourselves to symmetrical censoring of extreme values - omission of the
s greatest and s least sample values.
(ii) We may have different hypotheses in mind. For example we
may know that one sample (or sequence of samples) is censored, and the
other not, and wish to find which one.

(iii) We may, or may not, know the population distribution.

(iv) The sample sizes in each sequence may not stay the same.

2. "Parametric" Test Procedures

2.1 Fixed Numbers of Samples
We suppose that we have available two samples, one (A_11) of size
r_1 and one (A_21) of size r_2. We denote the order statistics in A_i1
(i = 1,2) by X_i1 ≤ X_i2 ≤ ... ≤ X_{i,r_i}.

We further suppose that it is known that one of the two samples
(it is not specified which one) is a complete random sample, while the
other is a random sample which has been censored by removal of some extreme
observations. We will specialize the problem by supposing that the censoring
is symmetrical, i.e. the s greatest and s least values in the original
random sample are removed. The value of s may (or may not) be known.

Finally, we suppose that the population cumulative distribution
function, F(x), is the same for each of the two samples, and is absolutely
continuous with f(x) = dF/dx.

Let H_i (i = 1,2) be the hypothesis that the censored sample is A_i1.
The likelihood function for H_i is
(1)    $L_i = \frac{r_{3-i}!\,(r_i+2s)!}{(s!)^2}\,\bigl[F(X_{i,1})\{1-F(X_{i,r_i})\}\bigr]^s \prod_{k=1}^{2}\prod_{j=1}^{r_k} f(X_{kj})$,

and so the likelihood ratio (H_1/H_2) is

(2)    $\frac{L_1}{L_2} = \frac{(r_1+1)^{[2s]}}{(r_2+1)^{[2s]}}\left[\frac{F(X_{1,1})\{1-F(X_{1,r_1})\}}{F(X_{2,1})\{1-F(X_{2,r_2})\}}\right]^s$,

where $a^{[k]} = a(a+1)\cdots(a+k-1)$ denotes the ascending factorial.
Whatever the value of s, we see that discrimination between H_1
and H_2 will be based on the statistic

(3)    $T = \frac{F(X_{1,1})\{1-F(X_{1,r_1})\}}{F(X_{2,1})\{1-F(X_{2,r_2})\}}$

(3')     $= \frac{Y_{11}(1-Y_{1,r_1})}{Y_{21}(1-Y_{2,r_2})}$,

(4)  where $Y_{ij} = F(X_{ij})$.

(It is worth noting that if the X's have different distributions
F_1 and F_2 in A_11 and A_21 respectively, we reach the same criterion
(3') with Y_ij = F_i(X_ij), and the Y's have the same joint distribution.)
The hypothesis H_1 will be accepted if T > K, and the hypothesis H_2
will be accepted if T < K. (If T = K, an arbitrary decision can be
made. Since Pr[T = K] = 0, this will not affect the probability properties
of the procedure.) Choice of K can depend on s, r_1, r_2 and relative
costs of different incorrect decisions.

If r_1 = r_2, the likelihood ratio is just T^s and it is natural
to take K = 1, in the absence of information on relative costs of errors
(whatever the value of s).
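By way of illustration (an addition to the original text), the fixed-sample rule can be
sketched in a few lines of Python; the choice of the standard normal for F, the function
names, and K = 1 are assumptions made only for this example.

```python
# Minimal sketch of the fixed-sample "parametric" rule of Section 2.1.
# Assumptions for the example: F is the standard normal c.d.f., r1 = r2,
# and K = 1 as suggested in the text.
from scipy.stats import norm

def T_statistic(sample1, sample2, F=norm.cdf):
    """Compute T of (3): ratio of F(min)*{1 - F(max)} for the two samples."""
    z1 = F(min(sample1)) * (1.0 - F(max(sample1)))
    z2 = F(min(sample2)) * (1.0 - F(max(sample2)))
    return z1 / z2

def decide(sample1, sample2, K=1.0):
    """Accept H1 (sample 1 censored) if T > K, H2 if T < K; T = K has probability 0."""
    return "H1" if T_statistic(sample1, sample2) > K else "H2"
```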
To evaluate probabilities of error, we need the distribution of T.
A discussion of this distribution follows. We first evaluate the
moments for purposes of record.

If both A_11 and A_21 have been censored, with the s_i' least and
s_i'' greatest observations removed from A_i1 (i = 1,2), then the joint
density function of Y_11, Y_{1r_1}, Y_21, Y_{2r_2} is

(5)    $\prod_{i=1}^{2}\frac{(r_i+s_i'+s_i'')!}{s_i'!\,(r_i-2)!\,s_i''!}\;y_{i1}^{\,s_i'}\,(y_{i,r_i}-y_{i1})^{r_i-2}\,(1-y_{i,r_i})^{s_i''}$
       $(0 \le y_{i1} \le y_{i,r_i} \le 1;\ i = 1,2)$.

(Hypothesis H_i corresponds to s_i' = s_i'' = s; s_{3-i}' = s_{3-i}'' = 0.)
and the h-th moment of T (h an integer) is

(6)    $\mu_h'(T) = E[T^h] = \left\{\prod_{i=1}^{2}\frac{(r_i+s_i'+s_i'')!}{s_i'!\,s_i''!}\right\}\frac{(s_1'+h)!\,(s_1''+h)!\,(s_2'-h)!\,(s_2''-h)!}{(r_1+s_1'+s_1''+2h)!\,(r_2+s_2'+s_2''-2h)!}$

       $= \frac{(s_1'+1)^{[h]}\,(s_1''+1)^{[h]}\,(r_2+s_2'+s_2'')^{(2h)}}{(r_1+s_1'+s_1''+1)^{[2h]}\,s_2'^{\,(h)}\,s_2''^{\,(h)}}$,

where $a^{(k)} = a(a-1)\cdots(a-k+1)$ denotes the descending factorial.
Moments of order (min(s_2', s_2'') + 1) or higher are infinite. We do not
propose to use the moments in approximating the distribution of T.
The distribution of Z_i = Y_i1(1 - Y_{i,r_i}) has been studied in [1]
and [3]. It is quite a complicated distribution, and the distribution
of T = Z_1/Z_2 is even more complicated. Here we describe two methods
of obtaining an approximation.

As pointed out in [3] we can write

(7)    $Z_i = U_i W_i (U_i + V_i + W_i)^{-2}$,

where U_i, V_i and W_i are independent random variables distributed
as chi-squared with 2(s_i'+1), 2(r_i-1) and 2(s_i''+1) degrees of freedom,
respectively. Hence

(8)    $T = \frac{U_1 W_1 (U_2+V_2+W_2)^2}{U_2 W_2 (U_1+V_1+W_1)^2}$,

all six variables on the right hand side being mutually independent.
If r_1 and r_2 are large compared with s_1', s_1'', s_2' and s_2'', then

(9)    $\left(\frac{r_1-1}{r_2-1}\right)^2 T = \frac{U_1 W_1\,\{(r_2-1)^{-1}(U_2+W_2)+(r_2-1)^{-1}V_2\}^2}{U_2 W_2\,\{(r_1-1)^{-1}(U_1+W_1)+(r_1-1)^{-1}V_1\}^2}$.

Since $(r_i-1)^{-1}(U_i+W_i) \to 0$ and $(r_i-1)^{-1}V_i \to 2$ for r_i large, we would
expect the large sample distribution of T to be approximated by that of

(10)    $\left(\frac{r_2-1}{r_1-1}\right)^2\frac{U_1 W_1}{U_2 W_2}$,

i.e. by that of a constant multiple of the product of two independent F variables.
The distribution of the product of two independent F-variables has been
studied by Schumann and Bradley [6]. They give tables of upper percentage
points of products of independent variables distributed as F_{v1,v2} and
F_{v2,v1} respectively. These will only apply to (10) for the special case
s_1' = s_2'', s_2' = s_1'', while we are at present interested in
s_1' = s_1'' = s; s_2' = s_2'' = 0.
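The representation (8) also gives an easy way to approximate the error probabilities by
simulation. The following Monte Carlo sketch is an illustrative addition (the sample sizes,
seed and function names are assumptions of the example only); it draws T under H_1 and
estimates the probability of a correct decision with K = 1, which can be compared with the
normal approximation discussed below.

```python
# Monte Carlo sketch of the representation (8):
# T = U1*W1*(U2+V2+W2)^2 / (U2*W2*(U1+V1+W1)^2), with independent chi-squared
# variables U_i, V_i, W_i on 2(s_i'+1), 2(r_i-1), 2(s_i''+1) degrees of freedom.
# Under H1: s1' = s1'' = s, s2' = s2'' = 0.
import numpy as np

def simulate_T(r1, r2, s, n=100_000, seed=None):
    rng = np.random.default_rng(seed)
    U1 = rng.chisquare(2 * (s + 1), n)
    W1 = rng.chisquare(2 * (s + 1), n)
    V1 = rng.chisquare(2 * (r1 - 1), n)
    U2 = rng.chisquare(2, n)            # s2' = 0
    W2 = rng.chisquare(2, n)            # s2'' = 0
    V2 = rng.chisquare(2 * (r2 - 1), n)
    return U1 * W1 * (U2 + V2 + W2) ** 2 / (U2 * W2 * (U1 + V1 + W1) ** 2)

T = simulate_T(r1=6, r2=6, s=1, seed=0)
print("estimated P(correct decision | H1) =", np.mean(T > 1.0))
```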
Alternatively we can consider the distribution of log T. The
characteristic function of log T is (cf. (6))

(11)    $E[T^{ih}] = \left\{\prod_{i=1}^{2}\frac{(r_i+s_i'+s_i'')!}{s_i'!\,s_i''!}\right\}\frac{\Gamma(s_1'+1+ih)\,\Gamma(s_1''+1+ih)\,\Gamma(s_2'+1-ih)\,\Gamma(s_2''+1-ih)}{\Gamma(r_1+s_1'+s_1''+1+2ih)\,\Gamma(r_2+s_2'+s_2''+1-2ih)}$.
Taking derivatives of the cumulant generating function (log E[T^{ih}])
we find

(12)    $\kappa_t(\log T) = \psi^{(t-1)}(s_1'+1) + \psi^{(t-1)}(s_1''+1) - 2^t\psi^{(t-1)}(r_1+s_1'+s_1''+1)$
        $\qquad\qquad + (-1)^t\{\psi^{(t-1)}(s_2'+1) + \psi^{(t-1)}(s_2''+1) - 2^t\psi^{(t-1)}(r_2+s_2'+s_2''+1)\}$,

where $\psi^{(s)}(x) = d^{s+1}\log\Gamma(x)/dx^{s+1}$ is the (s+2)-gamma function. When H_1
is true, and r_1 = r_2 = r, we have s_1' = s_1'' = s and s_2' = s_2'' = 0, whence
(12) becomes

(13)    $\kappa_t(\log T\,|\,H_1) = 2\psi^{(t-1)}(s+1) - 2^t\psi^{(t-1)}(r+2s+1) + (-1)^t\{2\psi^{(t-1)}(1) - 2^t\psi^{(t-1)}(r+1)\}$.

Also

(14)    $\kappa_t(\log T\,|\,H_2) = 2\psi^{(t-1)}(1) - 2^t\psi^{(t-1)}(r+1) + (-1)^t\{2\psi^{(t-1)}(s+1) - 2^t\psi^{(t-1)}(r+2s+1)\}$.

In particular

(15)    $\kappa_1(\log T\,|\,H_1) = 2[\psi(s+1) - \psi(1) - \{\psi(r+2s+1) - \psi(r+1)\}] = -\kappa_1(\log T\,|\,H_2)$

and

(16)    $\kappa_2(\log T\,|\,H_1) = 2[\psi^{(1)}(s+1) + \psi^{(1)}(1) - 2\{\psi^{(1)}(r+1) + \psi^{(1)}(r+2s+1)\}] = \kappa_2(\log T\,|\,H_2)$.

Some values of $\{\kappa_1(\log T|H_1) - \kappa_1(\log T|H_2)\}/\sqrt{\kappa_2(\log T|H_i)} = \rho$ (i = 1,2)
are shown in Table 1.
TABLE 1: Values of ρ

  r_1 = r_2 = r \ s      1       2
          4            1.44    2.10
          5            1.52    2.23
          6            1.57    2.33
          7            1.60    2.40
          8            1.63    2.46
          9            1.66    2.51
         10            1.68    2.55
         20            1.77    2.74
          ∞            1.87    2.97

(The values for r = ∞ are the limiting values
$4\{\psi(s+1) - \psi(1)\}\big/[2\{\psi^{(1)}(s+1) + \psi^{(1)}(1)\}]^{1/2}$.)

These values of ρ are of such a size as to indicate that even
with a single pair of samples, fairly good discrimination is attainable,
at least when r_1 = r_2 = r. If we choose H_1 (H_2) when log T > (<) 0 then,
approximating the distribution of log T by a normal distribution, the
probability of a correct decision would be Φ(½ρ).
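The entries of Table 1, and the approximate probability Φ(½ρ), are easily reproduced
numerically. The short sketch below is an illustrative addition (it assumes SciPy's
digamma and polygamma functions), not part of the original calculations.

```python
# Sketch: rho = {kappa_1(log T|H1) - kappa_1(log T|H2)} / sqrt(kappa_2(log T|H_i))
# from (15)-(16), and the normal-approximation probability Phi(rho/2).
from scipy.special import psi, polygamma
from scipy.stats import norm

def rho(r, s):
    k1 = 2.0 * (psi(s + 1) - psi(1) - (psi(r + 2 * s + 1) - psi(r + 1)))
    k2 = 2.0 * float(polygamma(1, s + 1) + polygamma(1, 1)
                     - 2.0 * (polygamma(1, r + 1) + polygamma(1, r + 2 * s + 1)))
    return 2.0 * k1 / k2 ** 0.5      # kappa_1(H1) - kappa_1(H2) = 2*kappa_1(H1)

for r in (4, 10, 20):
    print(r, round(rho(r, 1), 2), round(norm.cdf(rho(r, 1) / 2.0), 3))
# prints 1.44, 1.68, 1.77 for s = 1, agreeing with Table 1
```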
If we have two sequences of samples S_1 (= A_11, A_12, ..., A_1m) and
S_2 (= A_21, A_22, ..., A_2m), we would use as criterion

(17)    $T^{(m)} = \prod_{j=1}^{m} T_j$,

where T_j is the value of T (as defined in (3)) for the j-th pair
of samples A_1j, A_2j (j = 1,2,...,m). Denoting the order statistics
of sample A_ij by X_ij1 ≤ ... ≤ X_{ijr_i}, we have

(18)    $T_j = \frac{Y_{1j1}(1-Y_{1jr_1})}{Y_{2j1}(1-Y_{2jr_2})}$,

where Y_ijh = F(X_ijh). The second method of approximation (described
above for T) is the more simply extended to T^(m), since

(19)    $\kappa_t(\log T^{(m)}) = m\,\kappa_t(\log T)$

(on the assumption that r_1, r_2, s_1', s_1'', s_2' and s_2'' remain constant
throughout). It is not essential that the X's have the same distribution
in each of the samples A_ij, provided the appropriate transformations to
the Y's are used (see the remarks following (4)). (Nor is it essential
that there be the same numbers of samples in the sequences S_1, S_2; the
modifications in such a case are obvious and will not be discussed here.)
From (19) we can see that there will be increasing power (i.e. probability
of reaching a correct decision) as the lengths of the sequences S_1, S_2
increase. In fact, results obtained in the next Section support
the view, stated above, that the fixed sample procedure is likely to be
quite effective, even with a single pair of samples.
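As a small illustrative addition (the helper names and the normal choice of F are
assumptions for the example only), the sequence criterion (17) can be computed as
follows, working with logarithms for numerical stability.

```python
# Sketch of the sequence criterion (17): T^(m) is the product over pairs of the
# per-pair statistic T_j of (18); choose H1 if the product exceeds 1 (K = 1).
import math
from scipy.stats import norm

def T_pair(sample1, sample2, F=norm.cdf):
    z1 = F(min(sample1)) * (1.0 - F(max(sample1)))
    z2 = F(min(sample2)) * (1.0 - F(max(sample2)))
    return z1 / z2

def T_sequence(pairs, F=norm.cdf):
    """pairs: iterable of (A_1j, A_2j) observed samples."""
    return math.exp(sum(math.log(T_pair(a, b, F)) for a, b in pairs))
```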
2.2 A Sequential Procedure

In the circumstances described above, if pairs of samples A_1j, A_2j
from the sequences S_1, S_2 become available at about the same time,
while there is an appreciable interval between successive pairs - (A_1j, A_2j)
and (A_{1,j+1}, A_{2,j+1}) - it is natural to consider using a sequential test
procedure. A sequential probability ratio test discriminating between
H_1 and H_2, using the pairs of samples as they become available, is
based on the continuation region

(20)    $\frac{\alpha_1}{1-\alpha_2} < \left\{\frac{(r_1+1)^{[2s]}}{(r_2+1)^{[2s]}}\right\}^m \prod_{j=1}^{m} T_j^{\,s} < \frac{1-\alpha_1}{\alpha_2}$,

with acceptance of H_1 (H_2) if the right (left) hand inequality is violated.
(α_1, α_2 are the nominal chances of error when the hypotheses H_1, H_2
respectively are valid.)
The remainder of this section will be devoted to the special case
r_1 = r_2 = r, for which the continuation region becomes

(21)    $\left\{\frac{\alpha_1}{1-\alpha_2}\right\}^{1/s} < \prod_{j=1}^{m} T_j < \left\{\frac{1-\alpha_1}{\alpha_2}\right\}^{1/s}$.

The limits depend on s, as well as on α_1 and α_2. The larger s,
the closer together are the limits and the earlier a decision is reached.
If an incorrect value of s - g, say - is used in (21), then the approximate
actual chances of error α_1(s) and α_2(s) can be obtained from the formulae

(22)    $\left\{\frac{1-\alpha_1(s)}{\alpha_2(s)}\right\}^{1/s} = \left\{\frac{1-\alpha_1}{\alpha_2}\right\}^{1/g}, \qquad \left\{\frac{\alpha_1(s)}{1-\alpha_2(s)}\right\}^{1/s} = \left\{\frac{\alpha_1}{1-\alpha_2}\right\}^{1/g}$,

whence α_1(s) and α_2(s) may be found.
From (15),

(23)    $E_1 = E\Bigl[s\,\log\frac{Y_{1j1}(1-Y_{1jr})}{Y_{2j1}(1-Y_{2jr})}\Bigm|H_1\Bigr] = s[2\psi(s+1) - 2\psi(r+2s+1) - 2\psi(1) + 2\psi(r+1)]$
        $= 2s\Bigl(\sum_{j=1}^{s} j^{-1} - \sum_{j=r+1}^{r+2s} j^{-1}\Bigr)$

and

(24)    $E_2 = -E_1$

in an obvious notation.

The standard approximate formula for the average sample number (ASN)
of the procedure when H_1 is valid is

(25)    $E_1^{-1}\Bigl\{\alpha_1\log\frac{\alpha_1}{1-\alpha_2} + (1-\alpha_1)\log\frac{1-\alpha_1}{\alpha_2}\Bigr\}$.

Table 2 presents a few values of this quantity for the symmetrical case
α_1 = α_2 = α. The values for α = 0.05 are not shown. They are so small
that they clearly are only very rough approximations. (It is, of course,
impossible for the ASN to be less than one, in reality.) They do indicate,
however, that a decision can be expected very soon in the process.
TABLE 2: Approximate Average Sample Numbers

                 α = 0.01             α = 0.001
   r \ s        1        2           1        2
    4          3.6      1.3         5.4      2.0
    5          3.3      1.2         5.0      1.8
    6          3.1      1.1         4.7      1.7
    7          2.9      1.0         4.5      1.6
    8          2.9      1.0         4.4      1.5
    9          2.8      1.0         4.3      1.5
   10          2.7      1.0         4.2      1.5
   20          2.5     (0.9)        3.8      1.3
    ∞          2.3     (0.8)        3.4      1.1

(The "r = ∞" values are calculated from the formula
$(1-2\alpha)\log\{(1-\alpha)/\alpha\}\big/\bigl(2s\sum_{j=1}^{s} j^{-1}\bigr)$.)
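The entries of Table 2 can be checked from (23) and (25). The following sketch is an
illustrative addition; the function names are assumptions of the example.

```python
# Sketch of the ASN approximation (25), with E1 from (23), for the symmetrical
# case alpha_1 = alpha_2 = alpha and r1 = r2 = r, as used for Table 2.
import math

def E1(r, s):
    return 2 * s * (sum(1.0 / j for j in range(1, s + 1))
                    - sum(1.0 / j for j in range(r + 1, r + 2 * s + 1)))

def asn(r, s, alpha):
    a1 = a2 = alpha
    return (a1 * math.log(a1 / (1 - a2))
            + (1 - a1) * math.log((1 - a1) / a2)) / E1(r, s)

print(round(asn(4, 1, 0.01), 1))   # about 3.6, cf. Table 2
```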
3. Distribution-Free Tests

3.1 Single Pair of Samples

We suppose we have a single pair of samples A_11, A_21 as described
at the beginning of Section 2.1. It is remarkable that it is possible
to construct a test of H_1 (or H_2) in this case with a known significance
level without knowing the population distribution function of the X's,
provided only that it is continuous, and the same for both A_11 and
A_21. Heuristically one might "expect" this, on the grounds that one
of the two samples (the one which is not censored) provides some information
on the population distribution.

The only assumption we need (apart from continuity) is that the
population distribution of X is the same for A_11 and A_21.
Suppose H_1 is valid. Then the original sample sizes were (r_1 + 2s)
for A_11 and r_2 for A_21. For the original set of (r_1 + r_2 + 2s)
sample values there would be $\binom{r_1+r_2+2s}{r_2}$ possible rankings of these
values, according to origin, in order of ascending magnitude. Since
the population distributions for A_11 and A_21 are identical, these
orderings would be equally likely, each having probability $\binom{r_1+r_2+2s}{r_2}^{-1}$.
(Under H_2, the number of equally likely orderings would be $\binom{r_1+r_2+2s}{r_1}$.)

To calculate the likelihood, based on ranks, corresponding to the
specific sets of observed ranks, we have to evaluate the numbers of rankings
of the original (r_1 + r_2 + 2s) observations which could have produced the
observed ranks (of the (r_1 + r_2) observed values) after censoring. Under
H_1, we can recover the original ranking by adding to A_11 s observations
less than the least observed value in A_11, and s greater than the
greatest observed value in A_11. If, among the (r_1 + r_2) observed values
in A_11 and A_21 combined, the L_2 least and the G_2 greatest are from A_21,
then there are $\binom{L_2+s}{s}\binom{G_2+s}{s}$ corresponding original rankings.
Hence the likelihood function for H_1 is
(26)    $L_1' = \binom{L_2+s}{s}\binom{G_2+s}{s}\Big/\binom{r_1+r_2+2s}{r_2}$.

Similarly, the likelihood function for H_2 is

(27)    $L_2' = \binom{L_1+s}{s}\binom{G_1+s}{s}\Big/\binom{r_1+r_2+2s}{r_1}$,

where L_1, G_1 are defined analogously to L_2, G_2. The likelihood ratio
(H_1/H_2) is

(28)    $\frac{L_1'}{L_2'} = \frac{(r_1+1)^{[2s]}}{(r_2+1)^{[2s]}}\cdot\frac{(L_2+1)^{[s]}(G_2+1)^{[s]}}{(L_1+1)^{[s]}(G_1+1)^{[s]}}$.
In the special case of equal observed sample sizes (r_1 = r_2 = r)
we have

(29)    $\frac{L_1'}{L_2'} = \frac{(L_2+1)^{[s]}(G_2+1)^{[s]}}{(L_1+1)^{[s]}(G_1+1)^{[s]}}$.

If we construct a rule: choose H_1 (H_2) if L_1'/L_2' > (<) 1, then the
decision appears to depend on s, as well as L_1, L_2, G_1 and G_2.
This was not the case with the "parametric" test (see (3)).
However, we note that just one of L_1 and L_2 must be zero, and
just one of G_1 and G_2 must be zero. If both L_1 and G_1 (or both
L_2 and G_2) are zero then the test will accept H_1 (or H_2) whatever
be the value of s. If L_1 and G_2 are zero then the test accepts
H_1 (H_2) if L_2 > (<) G_1; and if L_2 and G_1 are zero then the test
accepts H_1 (H_2) if G_2 > (<) L_1. In all these cases the decision is
in fact not dependent on s. So we can present the test in the form:
(30)    accept H_1 if L_1 = G_1 = 0, or if L_1 = G_2 = 0 and L_2 > G_1, or if
        L_2 = G_1 = 0 and G_2 > L_1; accept H_2 in the corresponding opposite cases;

or even more simply: accept H_1 (H_2) if L_2 > G_1 or G_2 > L_1 (if
G_1 > L_2 or L_1 > G_2), whatever be the value of s. (Note that L_2 > G_1
implies L_2 > 0 and so also L_1 = 0.) If L_2 = G_1 > 0 or G_2 = L_1 > 0,
we are unable to reach a decision.
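A minimal sketch of this rule (an illustrative addition; the function names are
assumptions of the example) computes the four counts directly from the two observed
samples and returns the decision which, as noted above, does not depend on s.

```python
# Sketch of the distribution-free rule of Section 3.1.
# L2 (G2) = number of A21 values below (above) every A11 value; L1 (G1)
# analogously with the roles of the two samples exchanged.
def extreme_run_counts(a11, a21):
    l1 = sum(x < min(a21) for x in a11)
    g1 = sum(x > max(a21) for x in a11)
    l2 = sum(x < min(a11) for x in a21)
    g2 = sum(x > max(a11) for x in a21)
    return l1, l2, g1, g2

def rank_decide(a11, a21):
    """Return 'H1', 'H2' or None (no decision), whatever the value of s."""
    l1, l2, g1, g2 = extreme_run_counts(a11, a21)
    if l2 > g1 or g2 > l1:
        return "H1"
    if g1 > l2 or l1 > g2:
        return "H2"
    return None
```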
In order to obtain the distribution of L_1'/L_2' we have to take into
account not only the probabilities (26) or (27) but also the number of
possible orderings of the values between the max(L_1+1, L_2+1)-th and
min(r_1+r_2-G_1, r_1+r_2-G_2)-th order statistics of the combined (A_11 ∪ A_21)
sample. When L_1 = r_1 we must have G_2 = r_2, and when G_1 = r_1 we
must have L_2 = r_2, and conversely. Apart from this case we have, in
general, when max(L_1, G_1) < r_1 and max(L_2, G_2) < r_2, to multiply the
probabilities in (26) (when H_1 is valid) or (27) (when H_2 is valid) by

(31)    $\binom{(r_1-L_1-G_1-\phi) + (r_2-L_2-G_2-2+\phi)}{r_1-L_1-G_1-\phi}$,

where φ (= 0, 1, 2) is the number of zeroes in (L_1, G_1). We further
note that L_i + G_i ≤ r_i (i = 1,2), and (31) is equal to 1 if any of
L_i, G_i equals r_i or (r_i - 1) for either i = 1 or 2.

Table 2 sets out the probabilities for the case H_1 valid. A similar
table can easily be constructed for H_2 valid. (In calculating (28) we
take $\binom{0}{0} = 1$.)
Table 2: Distribution of L_1'/L_2' under H_1

 L_1   L_2   G_1   G_2    PROBABILITY × $\binom{r_1+r_2+2s}{r_2}$            (L_1'/L_2')·(r_2+1)^[2s]/(r_1+1)^[2s]
  0    ℓ_2    0    g_2    (s!)^{-2} (ℓ_2+1)^[s] (g_2+1)^[s] × (31)        (ℓ_2+1)^[s] (g_2+1)^[s] / (s!)^2
  0    ℓ_2   g_1    0     (s!)^{-1} (ℓ_2+1)^[s] × (31)                    (ℓ_2+1)^[s] / (g_1+1)^[s]
 ℓ_1    0     0    g_2    (s!)^{-1} (g_2+1)^[s] × (31)                    (g_2+1)^[s] / (ℓ_1+1)^[s]
 ℓ_1    0    g_1    0     1 × (31)                                        (s!)^2 / {(ℓ_1+1)^[s] (g_1+1)^[s]}

(For max(ℓ_1, g_1) < r_1, max(ℓ_2, g_2) < r_2. If max(ℓ_1, g_1) = r_1 then
max(ℓ_2, g_2) = r_2 and the entry in the fifth column is (r_2+1)^[s]/s!.)
When r_1 = r_2 the last column gives the values of L_1'/L_2'.

If acceptance of H_1 follows when (L_2 > G_1) or (G_2 > L_1), then

(32)    Pr[correct decision | H_1] = Pr[(L_2 > G_1) ∪ (G_2 > L_1) | H_1]
         = Pr[L_2 > G_1 > 0 | H_1] + Pr[G_2 > L_1 > 0 | H_1] + Pr[L_1 = G_1 = 0 | H_1]

(remembering that L_2 > 0, G_2 > 0 imply L_1 = 0, G_1 = 0, and conversely).
From (32) and Table 2 it is possible to evaluate the probability of a
correct decision (and the probability of not reaching a decision) when
H_1 is valid. In the symmetrical case (r_1 = r_2 = r) these probabilities
have the same values when H_2 is valid.
The calculations for r_1 = r_2 = 4 are set out in detail below. Here

   $\binom{2r+2s}{4} = \binom{8+2s}{4} = 210\ (s=1);\quad 495\ (s=2)$.

Summing the appropriate entries of Table 2 (with the factors (31)) over the
possible configurations (L_1, L_2, G_1, G_2) gives, for s = 1,

   p   = Pr[L_1 = G_1 = 0 | H_1] = {(2 × 8) + (2 × 18) + 9 + 24}/210 = 85/210,
   p'  = Pr[L_2 > G_1 > 0 | H_1] = Pr[G_2 > L_1 > 0 | H_1] = 17/210,
   p'' = Pr[L_2 = G_1 > 0 | H_1] = Pr[G_2 = L_1 > 0 | H_1] = (5 + 4 + 6 + 12)/210 = 27/210,

and, for s = 2, p = 258/495, p' = 38/495, p'' = 55/495. Hence

                                                 s = 1              s = 2
   Probability of correct decision (p + 2p')     119/210 = 0.567    334/495 = 0.675
   Probability of no decision (2p'')              54/210 = 0.257    110/495 = 0.222
   Probability of incorrect decision              37/210 = 0.176     51/495 = 0.103
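These figures can be checked by brute-force enumeration of the $\binom{2r+2s}{r}$ equally
likely origin-orderings under H_1. The sketch below is an illustrative addition (function
names are assumptions of the example); for r = 4 it reproduces the fractions quoted above.

```python
# Exact enumeration of the distribution-free test for a single pair with
# r1 = r2 = r, under H1 (A11 symmetrically censored, s from each end).
from itertools import combinations
from fractions import Fraction

def exact_probs(r, s):
    n = 2 * r + 2 * s                      # original combined sample size
    tallies = {"H1": 0, "H2": 0, "none": 0}
    total = 0
    # choose the ranks occupied by the complete sample A21 (all equally likely)
    for a21_ranks in combinations(range(n), r):
        a21 = set(a21_ranks)
        a11_full = [k for k in range(n) if k not in a21]   # r + 2s ranks
        a11 = a11_full[s:r + s]            # censor the s smallest and s largest
        l1 = sum(x < min(a21) for x in a11)
        g1 = sum(x > max(a21) for x in a11)
        l2 = sum(x < min(a11) for x in a21)
        g2 = sum(x > max(a11) for x in a21)
        if l2 > g1 or g2 > l1:
            tallies["H1"] += 1
        elif g1 > l2 or l1 > g2:
            tallies["H2"] += 1
        else:
            tallies["none"] += 1
        total += 1
    return {k: Fraction(v, total) for k, v in tallies.items()}

print(exact_probs(4, 1))   # H1: 119/210, none: 54/210, H2: 37/210
```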
Table 3 presents the results of similar calculations for r = 4(1)9.

TABLE 3: Properties of Distribution-Free Test Based on a Single Pair of
Samples, Each Containing r Observed Values

                         s = 1                                s = 2
  r      PROB.       PROB.       PROB.           PROB.       PROB.       PROB.
         CORRECT     NO DEC.     INCORRECT       CORRECT     NO DEC.     INCORRECT
  4      0.567       0.257       0.176           0.675       0.222       0.103
  5      0.619       0.194       0.187           0.742       0.152       0.107
  6      0.647       0.165       0.188           0.778       0.117       0.104
  7      0.664       0.149       0.187           0.801       0.099       0.100
  8      0.676       0.139       0.184           0.815       0.088       0.097
  9      0.684       0.133       0.182           0.827       0.081       0.092
The figures in Table 3 are fairly encouraging, when it is realized
that we are trying to reach a conclusion on the basis of a single pair
of samples, without knowing anything of the population distribution,
except that it is continuous.
3.2 Sequence of Pairs of Samples

If we have two sequences, S_1 and S_2, as described in Section 2
(pages 7-8), we can expect considerable increases in power. The likelihood
ratio test criterion, based on two sequences of m samples each, would be

(33)    $\prod_{j=1}^{m} (L_{1j}'/L_{2j}')$,

in an obvious notation. When r_1 = r_2 = r the criterion would be

(34)    $\prod_{j=1}^{m}\frac{(L_{2j}+1)^{[s]}(G_{2j}+1)^{[s]}}{(L_{1j}+1)^{[s]}(G_{1j}+1)^{[s]}}$,

and it would be reasonable to assign to H_1 (H_2) if its value is > (<) 1,
reaching no conclusion if it equals 1.
Probabilities of correct and incorrect decision with this criterion,
for the case m = 2, are shown in Table 4 for r = 4(1)6.

TABLE 4: Properties of Distribution-Free Test Based on 2 Pairs of Samples,
Each Containing r Observed Values

                         s = 1                                s = 2
  r      PROB.       PROB.       PROB.           PROB.       PROB.       PROB.
         CORRECT     NO DEC.     INCORRECT       CORRECT     NO DEC.     INCORRECT
  4      0.791       0.093       0.117           0.897       0.076       0.027
  5      0.832       0.085       0.084           0.936       0.044       0.020
  6      0.851       0.083       0.066           0.954       0.029       0.017
These figures do, indeed, show considerable improvement over those
in Table 3. They also confirm the conclusions in Section 2 on the power
of discrimination when the population distribution(s) is (are) known,
since the latter will be more powerful than the present procedure.

If it be supposed that the population distribution common to the
two members of a pair of samples is the same for each pair, then a better
distribution-free procedure should be feasible, taking account of this
additional knowledge.

To construct such a procedure, it is necessary to enumerate all
arrangements of the original m(r_1 + r_2 + 2s) observations which could
result in the observed sequences S_1 (= A_11, ..., A_1m) and S_2 (= A_21, ..., A_2m)
after removal of the s greatest and s least observations from each of the
members of S_j (for H_j valid; j = 1,2). Under H_j, the m·r_{3-j}
observations in S_{3-j} constitute a random sample of that size, but the
m·r_j observations in S_j have been censored in a special way.
Evaluation of the required probabilities appears to be rather complex,
though not impossible. In view of the good power attainable with the
procedure already discussed in this section (which can also be applied
when variation in population distribution from pair to pair is suspected),
we will not investigate further possibilities here.
A distribution-free sequential probability ratio test has the continuation
region

(35)    $\frac{\alpha_1}{1-\alpha_2} < \left\{\frac{(r_1+1)^{[2s]}}{(r_2+1)^{[2s]}}\right\}^m \prod_{j=1}^{m}\frac{(L_{2j}+1)^{[s]}(G_{2j}+1)^{[s]}}{(L_{1j}+1)^{[s]}(G_{1j}+1)^{[s]}} < \frac{1-\alpha_1}{\alpha_2}$,

with H_1 (H_2) accepted if the right (left) inequality is the first to
be violated.
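A monitoring loop for this sequential test can be sketched as follows (an illustrative
addition; the generator of per-pair counts and the function names are assumptions of the
example). It accumulates the logarithm of the middle quantity in (35) and stops at the
first boundary crossing.

```python
# Sketch of the distribution-free sequential test (35).
# asc(x, k) is the ascending factorial x(x+1)...(x+k-1), written x^[k] in the text.
# pair_counts yields (L1j, L2j, G1j, G2j) for successive pairs of samples.
import math

def asc(x, k):
    out = 1
    for i in range(k):
        out *= x + i
    return out

def sprt(pair_counts, r1, r2, s, alpha1, alpha2):
    lower = math.log(alpha1 / (1 - alpha2))
    upper = math.log((1 - alpha1) / alpha2)
    log_ratio = 0.0
    for (l1, l2, g1, g2) in pair_counts:
        log_ratio += (math.log(asc(r1 + 1, 2 * s)) - math.log(asc(r2 + 1, 2 * s))
                      + math.log(asc(l2 + 1, s)) + math.log(asc(g2 + 1, s))
                      - math.log(asc(l1 + 1, s)) - math.log(asc(g1 + 1, s)))
        if log_ratio >= upper:
            return "H1"
        if log_ratio <= lower:
            return "H2"
    return None     # data exhausted while still in the continuation region
```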
4. Some Other Problems

In the situation described in Section 3.1, if we just want to test
the hypothesis "A_1 is uncensored" (assuming A_2 to be uncensored), the
likelihood criterion is simply

(36)    $\Lambda = (L_2+1)^{[s]}(G_2+1)^{[s]}$,

with high values of Λ leading to rejection of the hypothesis.
(Note that (L_2 + 1) = rank order of least member of A_1, and
(r_1 + r_2 - G_2) = rank order of greatest member of A_1 in the combined
set of (r_1 + r_2) observed sample values.)

Under H_{s,s} (using the notation of [3] for symmetrical censoring
of the s extreme values from each end of the sample), the joint distribution
of L_2 and G_2 is

(37)    $\Pr[L_2 = \ell_2,\ G_2 = g_2] = \binom{\ell_2+s}{s}\binom{g_2+s}{s}\binom{r_1+r_2-\ell_2-g_2-2}{r_1-2}\Big/\binom{r_1+r_2+2s}{r_2}$.

It is interesting to note that if r_2 is large, so that we have
a lot of information on the population distribution provided by a large
random sample, known to be uncensored, then

(38)    $(r_1+r_2)^{-2s}\,\Lambda \approx [F(X_{1,1})\{1-F(X_{1,r_1})\}]^s$,

which is the criterion used to test for symmetrical censoring (see (2)
of [3]) when the population distribution is known.
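A small illustrative sketch of the criterion (36) follows (an addition; the function
names, and the assumption that A_2 is a complete sample, are made only for the example).

```python
# Sketch of the rank-based criterion (36), Lambda = (L2+1)^[s] (G2+1)^[s],
# for testing "A1 uncensored" against symmetrical censoring H_{s,s},
# with A2 taken to be a complete (uncensored) sample.
def asc(x, k):                      # ascending factorial x^[k]
    out = 1
    for i in range(k):
        out *= x + i
    return out

def censoring_criterion(a1, a2, s):
    l2 = sum(x < min(a1) for x in a2)
    g2 = sum(x > max(a1) for x in a2)
    return asc(l2 + 1, s) * asc(g2 + 1, s)   # large values -> reject "uncensored"
```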
We now, for a moment, consider the problem of detecting which one,
out of k sequences S_1, S_2, ..., S_k of samples, has been symmetrically
censored. We identify the hypothesis H_i with the statement "S_i is the
censored sequence." For simplicity we will suppose that each available
sample contains r observed values.

The method, of course, is to consider each set in turn as the censored
one, the other (k - 1) sets being uncensored. The appropriate likelihoods
are those appropriate to the case of two samples (as in Sections 2 and 3)
of sizes r and (k - 1)r respectively, the smaller sample being the
residue after removing the s greatest and s least values from a complete
random sample of size (r + 2s).

If the population distribution functions F_j(·) are known, the
sequence S_i indicated as the censored one by this method is that for which
(39)    $T_i = \prod_{j=1}^{m}\bigl[F_j(X_{ij,1})\{1 - F_j(X_{ij,r})\}\bigr]$

is maximum among T_1, T_2, ..., T_k. The distribution-free approach leads
to selection of that S_i for which

(40)    $V_i = \prod_{j=1}^{m} (L_{ij} + 1)(G_{ij} + 1)$

is maximum among V_1, V_2, ..., V_k, where L_ij (G_ij) is the number of
observations less (greater) than the least (greatest) observation in
A_ij among A_1j, A_2j, ..., A_kj, for j = 1, 2, ..., m.
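The selection rule (40) can be sketched as follows (an illustrative addition; the data
layout samples[i][j] and the function name are assumptions of the example).

```python
# Sketch of the distribution-free selection rule (40).  samples[i][j] is the
# observed sample A_ij (i = 0..k-1 sequences, j = 0..m-1 samples per sequence).
def select_censored(samples):
    k, m = len(samples), len(samples[0])
    best_i, best_v = None, -1.0
    for i in range(k):
        v = 1.0
        for j in range(m):
            others = [x for ii in range(k) if ii != i for x in samples[ii][j]]
            l = sum(x < min(samples[i][j]) for x in others)   # L_ij
            g = sum(x > max(samples[i][j]) for x in others)   # G_ij
            v *= (l + 1) * (g + 1)
        if v > best_v:
            best_i, best_v = i, v
    return best_i          # index of the sequence indicated as censored
```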
Since we have a relatively large uncensored sample available (though
we are not certain where it is) we should be able to construct good
distribution-free procedures in this case. When H_i is valid and we are
considering S_i versus the remaining sequences, we are in effect in the
situation, described earlier in this Section, wherein we have a relatively
large uncensored random sample providing information on the population
distribution. When any other sequence - S_{i'}, say - is the one compared
with the remainder, such bias as is introduced by the presence of S_i among
the remainder tends to reduce apparent significance. It therefore seems
that a procedure in which each sequence in turn is tested for symmetrical
censoring against the remainder, in the way described at the beginning of
this section, will provide useful information. The situation becomes more
complicated if there can be more than one censored sequence, especially if
the number of censored sequences is not known precisely.

Sequential procedures, choosing among the hypotheses H_1, H_2, ..., H_k,
may be constructed in a straightforward way.
The discussion in this paper has been restricted to symmetrical
censoring of extreme values. Analysis for other cases (e.g. censoring
from above or below (i.e. on right or left)) follows parallel lines,
and will be reported on in the next paper in this series. Topics concerned
with censoring in more than two sequences of samples will be studied in the
third paper in this series.
REFERENCES

[1] Johnson, N.L. (1966) "Sample censoring", Proc. 12th Conf. Des. Exp. Army
    Res. Dev. Testing, 403-424.

[2] Johnson, N.L. (1970) "A general purpose test of censoring of sample extreme
    values", in Essays in Probability and Statistics (S.N. Roy Memorial Volume),
    Chapel Hill: University of North Carolina Press, pp. 379-384.

[3] Johnson, N.L. (1971) "Comparison of some tests of sample censoring of
    extreme values", Austral. J. Statist., 13, 1-6.

[4] Johnson, N.L. (1973-4) "Robustness of certain tests of censoring of extreme
    sample values, I, II", University of North Carolina Mimeo Series Nos. 866, 940.

[5] Johnson, N.L. (1974) "Study of possibly incomplete samples", Proc. 5th
    Intern. Conf. Prob., Brasov, Romania.

[6] Schumann, D.E.W. and Bradley, R.A. (1959) "The comparison of the sensitivities
    of similar experiments: Model II of the analysis of variance", Biometrics,
    15, 405-416.