This work was partially supported by the United States Air Force
Office of Scientific Research under Grant No. AFOSR-68-1415.
COMPARISON OF SOME TESTS OF SAMPLE CENSORING OF
EXTREME VALUES
by
N. L. JOHNSON
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 665
JANUARY 1970
N. L. JOHNSON*
University of North Carolina at Chapel Hill
and University of New South Wales
1.
There are currently available a number of methods designed to reduce the possible effects of "wild" ("maverick") observations on the analysis
of sample values.
Among these may be mentioned "trimming" and "Winsorisation".
These methods involve the possible, or sometimes automatic, exclusion of extreme values among those observed.
Apart from these methods, for which
appropriate statistical analyses, taking proper account of the omission of
sample values, are available, samples may be incomplete owing to inadequate
recording, or, unfortunately, biassed selection of values which accord best
with some preconceived ideas or desires.
While, under properly regulated conditions, information on any censoring
of sample values should accompany the records of the values themselves,
this is not always the case.
Indeed, in the last situation described in
the preceding paragraph, such information is not to be expected; but also,
even in more respectable cases, information may be omitted through negligence.
The problems to be considered in this paper are those arising when it
is suspected that there has been some form of censoring, i.e., omission of
certain order statistics of the original sample.
Complete and reasonably
tidy solutions are obtained only on the assumption that the population distribution of an observed character is known.
However, study of this situation does give some clue as to what can be done when knowledge of the
population distribution is incomplete.
Problems of a similar kind have been discussed in an earlier paper
(Johnson (1962)).
They were of a rather simple nature in that there was
usually a direct choice between two possible sample sizes.
The tests discussed in this paper have been developed for use in situations where observed values $X_1', X_2', \ldots, X_r'$ are available which may represent measurements on a complete random sample, or may be the remaining part of such a sample after censoring.
Denoting by $X_1 \le X_2 \le \cdots \le X_r$ the order statistics corresponding to $X_1', X_2', \ldots, X_r'$, the hypothesis $H_{s_0,s_1,\ldots,s_r}$ states that the original random sample was of size $(r + \sum_{j=0}^{r} s_j)$, with $s_j$ values censored between $X_j$ and $X_{j+1}$ ($j = 0, 1, \ldots, r$; $X_0 = -\infty$, $X_{r+1} = +\infty$).
The hypothesis $H_{0,0,\ldots,0}$ states that there has been no censoring.
We will be concerned here only with censoring of extreme values, so we assume $s_1 = s_2 = \cdots = s_{r-1} = 0$, and for brevity denote $H_{s_0,0,\ldots,0,s_r}$ by $H_{s_0,s_r}$.
Construction of tests of $H_{0,0}$ with respect to alternatives of type $H_{s_0,s_r}$ has been discussed by Johnson (1969) and will be briefly recapitulated here.
2.
It is supposed that each $X'$ is continuous and has the same density function $f(x)$.
For the application of the tests it is necessary that this function be known, so that the corresponding probability integrals,
$$Y' = \int_{-\infty}^{X'} f(x)\,dx,$$
can be calculated.
The most powerful test of $H_{0,0}$ with respect to $H_{s_0,s_r}$ has a critical region of form
$$Y_1^{\theta}(1 - Y_r) > K_\alpha(\theta), \qquad \theta = s_0/s_r, \tag{1}$$
where $Y_1, Y_2, \ldots, Y_r$ are the ordered probability integrals corresponding to $X_1, X_2, \ldots, X_r$ respectively, and $\alpha$ is the significance level of the test.
(If $s_r = 0$, the region is of form $Y_1 > \text{constant}$.)
This test is uniformly most powerful with respect to all $H_{s_0,s_r}$ for which $s_0/s_r = \theta$.
In particular, when $s_0 = s_r$ ($\theta = 1$), we have a critical region of form
$$T_1 = Y_1(1 - Y_r) > \lambda_\alpha, \tag{2}$$
which is uniformly most powerful with respect to symmetrical censoring.
When $s_0 = 0$ ($\theta = 0$), we have a critical region of form $1 - Y_r > \text{constant}$, i.e.,
$$T_2 = Y_r < \text{constant}, \tag{3}$$
which is uniformly most powerful with respect to censoring from above.
(The region $Y_1 > \text{constant}$ is uniformly most powerful with respect to censoring from below.)
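A short sketch of the Neyman-Pearson step behind (1), assuming the standard expression for the joint density of the $(s_0+1)$th to $(s_0+r)$th order statistics of a uniform sample of size $n = r+s_0+s_r$ (under $H_{s_0,s_r}$ the $Y$'s are exactly such order statistics):

$$
\begin{aligned}
p_{s_0,s_r}(y_1,\ldots,y_r) &= \frac{(r+s_0+s_r)!}{s_0!\,s_r!}\,y_1^{s_0}(1-y_r)^{s_r}, \qquad 0<y_1<\cdots<y_r<1,\\[4pt]
\frac{p_{s_0,s_r}(y_1,\ldots,y_r)}{p_{0,0}(y_1,\ldots,y_r)} &= \frac{(r+s_0+s_r)!}{r!\,s_0!\,s_r!}\,\big[y_1^{\theta}(1-y_r)\big]^{s_r}, \qquad \theta = s_0/s_r,\ s_r > 0.
\end{aligned}
$$

The ratio is an increasing function of $y_1^{\theta}(1-y_r)$ alone, so the Neyman-Pearson region is $Y_1^{\theta}(1-Y_r) > K_\alpha(\theta)$. Because $K_\alpha(\theta)$ is fixed by the null distribution and $\theta$ only, the same region is most powerful against every alternative with the same ratio $s_0/s_r$.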
In order to construct a 'general purpose' test which should be good
(though necessarily not most powerful) with respect to all $H_{s_0,s_r}$, whatever the ratio $\theta = s_0/s_r$, an attempt was made by Johnson (1969) to use the union, over all values of $\theta$, of critical regions of form (1),
$$Y_1^{\theta}(1 - Y_r) > K_{\alpha'}(\theta),$$
with $\alpha'$ fixed.
(The actual level of significance, $\alpha$, will of course not be equal to $\alpha'$.)
By using some approximations, a test with critical region of form
$$T_3 = Y_1 + (1 - Y_r) > C_\alpha \tag{4}$$
was suggested by this approach.
The remainder of this paper is devoted to comparisons among the three
tests just described.
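A minimal sketch of how the three criteria are obtained from data, assuming (as Section 2 requires) that the cumulative distribution function $F(x) = \int_{-\infty}^{x} f(t)\,dt$ is fully known. The function names are illustrative only, and the standard normal parent is an arbitrary example of a known $F$.

```python
import math
from typing import Callable, Sequence, Tuple


def normal_cdf(x: float) -> float:
    """Standard normal cdf, used here only as an example of a fully known F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def censoring_criteria(
    xs: Sequence[float], cdf: Callable[[float], float]
) -> Tuple[float, float, float, float, float]:
    """Return (Y_1, Y_r, T_1, T_2, T_3) for the observed values xs.

    Y' = F(X') are the probability integrals; only the smallest (Y_1)
    and the largest (Y_r) of them enter the three criteria.
    """
    if len(xs) < 2:
        raise ValueError("at least two observations are needed")
    ys = sorted(cdf(x) for x in xs)
    y1, yr = ys[0], ys[-1]
    t1 = y1 * (1.0 - yr)   # criterion (2): large values are significant
    t2 = yr                # criterion (3): small values are significant
    t3 = y1 + (1.0 - yr)   # criterion (4): large values are significant
    return y1, yr, t1, t2, t3


# Hypothetical observations from a supposedly standard normal population.
print(censoring_criteria([-0.31, 0.42, 0.08, -0.17, 0.25], normal_cdf))
```

Each statistic still has to be compared with its own critical value ($\lambda_\alpha$, the constant in (3), or $C_\alpha$), which depends on $r$ and $\alpha$.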
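3.
We first state some results on the joint distribution of the ordered probability integrals $Y_1, Y_2, \ldots, Y_r$ which we will use later.
The joint density function of $Y_1$ and $Y_r$, when $H_{s_0,s_r}$ is valid, is
$$p(y_1, y_r) = \frac{(r+s_0+s_r)!}{s_0!\,s_r!\,(r-2)!}\,y_1^{s_0}(1-y_r)^{s_r}(y_r-y_1)^{r-2} \qquad (0 < y_1 < y_r < 1). \tag{5}$$
Also, the quantities $(Y_{j+1} - Y_j)$ ($j = 0, 1, \ldots, r$; $Y_0 = 0$, $Y_{r+1} = 1$) are jointly distributed as
$$V_j \Big/ \sum_{i=0}^{r} V_i \qquad (j = 0, 1, \ldots, r),$$
where the $V$'s are mutually independent, with $V_0$ distributed as $\chi^2_{2(s_0+1)}$, $V_r$ as $\chi^2_{2(s_r+1)}$, and $V_1, V_2, \ldots, V_{r-1}$ each as $\chi^2_2$.
($\chi^2_\nu$ means a chi-squared variable with $\nu$ degrees of freedom.)
4.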
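The power of test $T_1$ with respect to $H_{s_0,s_r}$ is
$$\frac{(r+s_0+s_r)!}{s_0!\,s_r!\,(r-2)!}\iint_R y_1^{s_0}(1-y_r)^{s_r}(y_r-y_1)^{r-2}\,dy_r\,dy_1, \tag{6}$$
where the region of integration $R$ is $\{y_1(1-y_r) > \lambda_\alpha,\ 0 < y_1 < y_r < 1\}$, and $\lambda_\alpha$ satisfies the equation
$$r(r-1)\iint_R (y_r-y_1)^{r-2}\,dy_r\,dy_1 = \alpha. \tag{7}$$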
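The power (6) can be checked by simulation. The sketch below is a plain Monte Carlo estimate, not the method used in this paper: it draws $Y_1$ and $1 - Y_r$ through the chi-squared representation of Section 3 and takes $\lambda_\alpha = 0.109$ ($r = 4$, $\alpha = 0.05$) from Table 1. The helper names, replication count and seed are arbitrary choices made for the illustration.

```python
import random


def draw_t1(r: int, s0: int, sr: int, rng: random.Random) -> float:
    """One value of T_1 = Y_1 (1 - Y_r) under H_{s0,sr}, via Section 3.

    A chi-squared variable on 2k degrees of freedom is twice a Gamma(k, 1)
    variable; the factor 2 cancels in the ratios, so Gamma(k, 1) is drawn.
    """
    v0 = rng.gammavariate(s0 + 1, 1.0)                       # V_0
    vr = rng.gammavariate(sr + 1, 1.0)                       # V_r
    vmid = sum(rng.expovariate(1.0) for _ in range(r - 1))   # V_1 + ... + V_{r-1}
    total = v0 + vmid + vr
    return (v0 / total) * (vr / total)                       # Y_1 * (1 - Y_r)


def mc_power_t1(r: int, s0: int, sr: int, lam: float,
                reps: int = 40_000, seed: int = 1970) -> float:
    """Monte Carlo estimate of (6): the fraction of draws with T_1 > lam."""
    rng = random.Random(seed)
    hits = sum(draw_t1(r, s0, sr, rng) > lam for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    lam = 0.109  # lambda_alpha for r = 4, alpha = 0.05 (Table 1)
    for s0, sr in [(0, 0), (0, 2), (1, 1), (3, 3)]:
        print((s0, sr), round(mc_power_t1(4, s0, sr, lam), 3))
```

With $(s_0, s_r) = (0, 0)$ the estimate should be close to $\alpha$; the other cases can be compared with the $r = 4$ row for $T_1$ in Table 3, allowing for simulation error of roughly $\pm 0.005$.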
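By writing $(1 - y_r)$ as $(1 - y_1 - [y_r - y_1])$ in (6) and expanding the various binomial expressions, the formula can be expressed as a linear function of quantities
$$I_{a,b} = \int_{y_-}^{y_+} y^a(1 - \lambda_\alpha y^{-1} - y)^b\,dy, \tag{8}$$
where $y_\pm = \tfrac12\big[1 \pm \sqrt{1 - 4\lambda_\alpha}\,\big]$ are the roots of $y(1-y) = \lambda_\alpha$.
Calculation of $I_{a,b}$ for integer values of $a$ and $b$ is direct but rather tedious.
It is aided by the recurrence relations
$$I_{a,b+1} = I_{a,b} - \lambda_\alpha I_{a-1,b} - I_{a+1,b}, \tag{9.1}$$
$$I_{a,b+1} = \frac{b+1}{a+1}\big(I_{a+1,b} - \lambda_\alpha I_{a-1,b}\big) \qquad (a \ne -1), \tag{9.2}$$
$$I_{0,b} = \lambda_\alpha I_{-2,b}. \tag{9.3}$$
Equation (9.2) is obtained by integration by parts (noting that $1 - \lambda_\alpha y_\pm^{-1} - y_\pm = 0$), and (9.3) is the corresponding result for $a = -1$.
Combining (9.1) and (9.2), we have
$$I_{a+1,b} = \frac{(a+1)I_{a,b} - (a-b)\lambda_\alpha I_{a-1,b}}{a+b+2}. \tag{9.4}$$
As initial values we have
$$I_{a,0} = \frac{y_+^{a+1} - y_-^{a+1}}{a+1} \qquad (a \ne -1), \tag{9.5}$$
which for $a = 0, 1, 2, \ldots$ can be written $[2^a(a+1)]^{-1}\sqrt{1-4\lambda_\alpha}\,\sum_{j=0}^{[\frac12 a]}\binom{a+1}{2j+1}(1-4\lambda_\alpha)^j$, and
$$I_{-1,0} = \log\big(1 + \sqrt{1-4\lambda_\alpha}\big) - \log\big(1 - \sqrt{1-4\lambda_\alpha}\big). \tag{9.6}$$
Equation (7) can be written in the form
$$r\,I_{0,r-1} = \alpha. \tag{7'}$$
Some values of $\lambda_\alpha$ satisfying this equation are shown in Table 1.
This table also shows values given by empirical approximate formulae.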
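As an illustrative cross-check of (7') and of the recurrences, the sketch below evaluates $I_{a,b}$ by composite Simpson quadrature rather than by (9.1)-(9.6), and solves $r I_{0,r-1} = \alpha$ for $\lambda_\alpha$ by bisection. It relies on the left side falling from 1 at $\lambda_\alpha = 0$ to 0 at $\lambda_\alpha = \tfrac14$. The quadrature rule, grid size and function names are assumptions of the sketch.

```python
import math


def i_ab(a: int, b: int, lam: float, n: int = 2000) -> float:
    """I_{a,b} of (8) by composite Simpson's rule over [y_-, y_+] (n even)."""
    root = math.sqrt(1.0 - 4.0 * lam)
    lo, hi = 0.5 * (1.0 - root), 0.5 * (1.0 + root)   # y_-, y_+
    h = (hi - lo) / n

    def f(y: float) -> float:
        g = max(1.0 - lam / y - y, 0.0)   # clamp rounding noise at the end points
        return y ** a * g ** b

    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def lambda_alpha(r: int, alpha: float, iters: int = 40) -> float:
    """Solve (7'), r * I_{0,r-1} = alpha, for lambda_alpha by bisection."""
    lo, hi = 0.0, 0.25
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r * i_ab(0, r - 1, mid) > alpha:
            lo = mid      # size still above alpha: the critical value must rise
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for r in (2, 4, 10):
        print(r, [round(lambda_alpha(r, a), 4) for a in (0.10, 0.05, 0.01)])
```

The printed values can be compared with the $\lambda_\alpha$ columns of Table 1; the same `i_ab` can also be used to confirm a recurrence such as (9.4) numerically for small $a$ and $b$.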
TABLE 1
Values of $\lambda_\alpha$

            α = 0.10                  α = 0.05                  α = 0.01
  r       λ_α     2.65(r+3/2)^-2    λ_α     4.1(r+2)^-2       λ_α     9.15(r+7/2)^-2
  2       0.182                     0.207                     0.235
  3       0.122                     0.150                     0.195
  4       0.0841  0.0876            0.109   0.114             0.156   0.163
  5       0.0611  0.0627            0.0822  0.0837            0.125   0.127
  6       0.0463  0.0471            0.0633  0.0641            0.101   0.101
  7       0.0362  0.0367            0.0503  0.0506            0.0830  0.0830
  8       0.0289  0.0294            0.0408  0.0410            0.0692  0.0692
  9       0.0238  0.0240            0.0338  0.0339            0.0585  0.0586
 10       0.0199  0.0200            0.0285  0.0285            0.0500  0.0502
Large r   2.583 r^-2                3.997 r^-2                8.315 r^-2

For large values of $r$ an approximate formula for $\lambda_\alpha$ can be obtained by the following argument.
From Section 3, we have
$$Y_1 = \frac{V_0}{V_0+V_r+V'}, \qquad 1 - Y_r = \frac{V_r}{V_0+V_r+V'}, \qquad V' = \sum_{j=1}^{r-1} V_j, \tag{10}$$
where $V_0$, $V_r$ and $V'$ are independent variables distributed as $\chi^2_{2(s_0+1)}$, $\chi^2_{2(s_r+1)}$ and $\chi^2_{2(r-1)}$ respectively.
Putting $\lambda_\alpha = \lambda'_\alpha/r^2$, we have
$$\Pr[Y_1(1-Y_r) > \lambda_\alpha] = \Pr\big[V_0V_r\{r^{-1}(V_0+V_r+V')\}^{-2} > \lambda'_\alpha\big].$$
Since $r^{-1}(V_0+V_r+V')$ tends to 2 as $r \to \infty$, with probability 1, it follows that the limiting distribution of $r^2Y_1(1-Y_r)$ is that of $\tfrac14 V_0V_r$; under $H_{0,0}$ this is $\tfrac14$ times the product of two independent $\chi^2_2$ variables, i.e., the product of two independent unit exponential variables.
Hence for large $r$, $\lambda'_\alpha$ tends to the solution of the equation
$$\int_0^\infty \exp(-u - \lambda'_\alpha u^{-1})\,du = \alpha, \quad\text{i.e.,}\quad 2\sqrt{\lambda'_\alpha}\,K_1\big(2\sqrt{\lambda'_\alpha}\big) = \alpha, \tag{11}$$
where $K_1(\cdot)$ is a Bessel function (see, for example, British Association tables (1950)).
Some values of $\lambda'_\alpha$ from (11) are shown in Table 1.
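Equation (11) can also be solved numerically without Bessel-function tables. This sketch, assuming only the Python standard library, evaluates the integral in (11) by Simpson's rule after the substitution $u = e^t$ and finds $\lambda'_\alpha$ by bisection; the integration window, grid size and names are choices made for the illustration.

```python
import math


def product_exp_tail(lam: float, n: int = 3000) -> float:
    """Left side of (11): the integral of exp(-u - lam/u) over 0 < u < infinity.

    This equals Pr[E1 * E2 > lam] for independent unit exponentials E1, E2,
    the limiting Pr[r^2 Y_1 (1 - Y_r) > lam] under H_{0,0}.  With u = e^t the
    integrand exp(t - e^t - lam e^-t) is negligible outside -15 < t < 5,
    and Simpson's rule is applied on that window.
    """
    lo, hi = -15.0, 5.0
    h = (hi - lo) / n

    def f(t: float) -> float:
        return math.exp(t - math.exp(t) - lam * math.exp(-t))

    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def lambda_prime(alpha: float, iters: int = 40) -> float:
    """Solve (11) for lambda'_alpha by bisection (the tail decreases in lam)."""
    lo, hi = 0.0, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if product_exp_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print([round(lambda_prime(a), 3) for a in (0.10, 0.05, 0.01)])
```

Dividing the results by $r^2$ gives the approximations in the "Large r" row of Table 1.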
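A similar argument leads to the conclusion that the limiting power, as $r \to \infty$ ($s_0$ and $s_r$ remaining constant), is (for $s_0 \le s_r$)
$$\frac{2{\lambda'_\alpha}^{\frac12(s_r+1)}}{s_r!}\sum_{j=0}^{s_0}\frac{{\lambda'_\alpha}^{\frac12 j}}{j!}\,K_{s_r-j+1}\big(2\sqrt{\lambda'_\alpha}\big), \tag{12}$$
where $K_\nu(\cdot)$ denotes a Bessel function of order $\nu$.
Note that if $s_0 = 0$, the limiting power is $2{\lambda'_\alpha}^{\frac12(s_r+1)}K_{s_r+1}\big(2\sqrt{\lambda'_\alpha}\big)/s_r!$.
The power of test $T_1$ with respect to $H_{0,s_r}$ is
$$\frac{(r+s_r)!}{(r-1)!\,s_r!}\,I_{s_r,r-1}$$
(using the symmetry of $R$ under $y \to 1 - y$, which interchanges $Y_1$ and $1 - Y_r$).
Since $I_{s_r+1,r-1}/I_{s_r,r-1} < y_+ < 1$ (because $y < y_+$ throughout the range of integration in (8)), and $(r+s_r+1)/(s_r+1) \to 1$, it follows that for $s_r$ sufficiently large
$$\left[\frac{r+s_r+1}{s_r+1}\right]\left(\frac{I_{s_r+1,r-1}}{I_{s_r,r-1}}\right) < 1.$$
The left side is the ratio of the powers with respect to $H_{0,s_r+1}$ and $H_{0,s_r}$, and so the power tends to zero as $s_r \to \infty$, $r$ being kept constant.
5.
The critical region for test $T_2$ (using $\Pr[Y_r < c] = c^r$ under $H_{0,0}$) is
$$Y_r < \alpha^{1/r}, \tag{13}$$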
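and the power with respect to $H_{s_0,s_r}$ is
$$\frac{(r+s_0+s_r)!}{(r+s_0-1)!\,s_r!}\int_0^{\alpha^{1/r}} y^{r+s_0-1}(1-y)^{s_r}\,dy. \tag{14}$$
As $r \to \infty$ ($s_0$ and $s_r$ remaining constant) the power tends to
$$\Pr[\chi^2_{2(s_r+1)} > -2\log\alpha],$$
since $r(1 - Y_r)$ is asymptotically distributed as $\tfrac12 V_r$ (from (10)) and $r(1 - \alpha^{1/r}) \to -\log\alpha$.
Note that this does not depend on $s_0$.
As $s_0 \to \infty$ ($r$ and $s_r$ remaining constant) the power tends to zero, and as $s_r \to \infty$ ($r$ and $s_0$ remaining constant) the power tends to 1.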
If $s_0$ and $s_r$ both tend to infinity, with $s_0/s_r = \theta$ and $r$ both kept constant, a rather curious situation arises.
The power tends to zero or one, depending on the values of $\alpha$ (the level of significance), of $\theta$, and of $r$.
To see how this happens, we consider the case $s_0 = s_r = s$.
For $s$ large, the beta distribution with parameters $(r+s)$, $(s+1)$ is approximately normal, with expected value tending to $\tfrac12$ and variance of order $s^{-1}$.
The power therefore tends to zero or one according as $\alpha^{1/r}$ is less, or greater, than $\tfrac12$.
For a given value of $r$, this means that the power tends to zero or one according as $\alpha$ is less, or greater, than $2^{-r}$.
For a given $\alpha$, the power tends to zero or one according as $r$ is less or greater than $(-\log\alpha)/(\log 2)$.
For $\alpha = 0.05$, $(-\log\alpha)/(\log 2)$ is approximately 4.32, so the power tends to zero as $s \to \infty$ if $r \le 4$.
Table 2 shows some values of the power for $\alpha = 0.05$ and $r = 4$, with various values of $s$.
The power increases with $s$ to a flat maximum, and then decreases.
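The exact power (14) is an incomplete beta ratio with integer parameters, so it can be evaluated as a binomial tail, $\Pr[\mathrm{beta}(a,b) < c] = \Pr[\mathrm{Bin}(a+b-1, c) \ge a]$. The limit as $r \to \infty$ reduces, by the usual chi-squared/Poisson relation, to $\alpha\sum_{j=0}^{s_r}(-\log\alpha)^j/j!$. The sketch below uses these two standard identities (they are not taken from this paper) to compute the quantities behind Table 2 and the threshold $(-\log\alpha)/\log 2$; the function names are illustrative.

```python
import math


def power_t2(r: int, s0: int, sr: int, alpha: float) -> float:
    """Power (14) of T_2 against H_{s0,sr}.

    Y_r is then beta(r+s0, sr+1); for integer parameters
    Pr[beta(a, b) < c] = Pr[Bin(a+b-1, c) >= a], which is summed directly.
    """
    c = alpha ** (1.0 / r)            # critical value in (13)
    a, n = r + s0, r + s0 + sr
    return sum(math.comb(n, k) * c**k * (1.0 - c) ** (n - k)
               for k in range(a, n + 1))


def limiting_power_t2(sr: int, alpha: float) -> float:
    """Pr[chi2_{2(sr+1)} > -2 log alpha] = Pr[Poisson(-log alpha) <= sr]."""
    m = -math.log(alpha)
    # exp(-m) equals alpha, so the Poisson cdf is alpha times a partial sum.
    return alpha * sum(m**j / math.factorial(j) for j in range(sr + 1))


def critical_r(alpha: float) -> float:
    """r separating limiting power 0 from 1 when s0 = sr = s -> infinity."""
    return -math.log(alpha) / math.log(2.0)


if __name__ == "__main__":
    for s in range(7, 30, 2):          # Table 2: alpha = 0.05, r = 4
        print(s, round(power_t2(4, s, s, 0.05), 3))
    print(round(critical_r(0.05), 2))
```

The binomial form avoids numerical integration altogether; it is exact for the integer parameters that occur here.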
TABLE 2
Power of $T_2$ test with respect to $H_{s,s}$ ($\alpha = 0.05$; $r = 4$)

  s     Power       s     Power
  7     0.174      19     0.207
  9     0.185      21     0.208
 11     0.193      23     0.209
 13     0.199      25     0.209
 15     0.202      27     0.209
 17     0.205      29     0.208

(For lower values of $s$, see Table 3.)
6.
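The theory associated with the general purpose test $T_3 = Y_1 + (1 - Y_r)$ is simple.
From (10), $T_3 = (V_0+V_r)(V_0+V'+V_r)^{-1}$, where $V_0 + V_r$ is distributed as $\chi^2_{2(s_0+s_r+2)}$ independently of $V'$ (see equation (10) for the distributions of the $V$'s).
That is, the distribution of $T_3$ is beta with parameters $s_0+s_r+2$ and $r-1$.
The critical region is
$$Y_1 + (1 - Y_r) > C_\alpha,$$
where
$$I_{C_\alpha}(2, r-1) = 1 - \alpha$$
($I_x(p,q)$ denoting here the incomplete beta function ratio, not the integral (8)), and the power with respect to $H_{s_0,s_r}$ is
$$1 - I_{C_\alpha}(s_0+s_r+2,\ r-1).$$
The power depends only on $(s_0+s_r)$ and not on $s_0$ and $s_r$ separately.
As $r \to \infty$, with $(s_0+s_r)$ remaining constant, the power tends to
$$\Pr\big[\chi^2_{2(s_0+s_r+2)} > (\text{upper } 100\alpha\% \text{ point of } \chi^2_4)\big].$$
As $(s_0+s_r) \to \infty$, $r$ remaining constant, the power tends to one.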
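A sketch of the $T_3$ calculations, using the same binomial-tail identity as for $T_2$: with integer parameters, $1 - I_{C_\alpha}(2, r-1) = (1-C_\alpha)^r + rC_\alpha(1-C_\alpha)^{r-1}$, which is solved for $C_\alpha$ by bisection, and the power is $\Pr[\mathrm{Bin}(r+s_0+s_r, C_\alpha) \le s_0+s_r+1]$. The function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr[Bin(n, p) <= k]."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def critical_value_t3(r: int, alpha: float, iters: int = 60) -> float:
    """C_alpha with I_C(2, r-1) = 1 - alpha.

    Under H_{0,0}, T_3 is beta(2, r-1) and
    Pr[T_3 > C] = Pr[Bin(r, C) <= 1] = (1-C)^r + r C (1-C)^(r-1),
    which decreases in C, so bisection applies.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binom_cdf(1, r, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_t3(r: int, s: int, alpha: float) -> float:
    """1 - I_C(s+2, r-1) for any H_{s0,sr} with s0 + sr = s.

    For integer parameters, 1 - I_C(a, b) = Pr[Bin(a+b-1, C) <= a-1].
    """
    c = critical_value_t3(r, alpha)
    return binom_cdf(s + 1, r + s, c)


if __name__ == "__main__":
    for r in (4, 30):
        print(r, [round(power_t3(r, s, 0.05), 3) for s in (2, 6, 10)])
```

Because only $s = s_0 + s_r$ enters `power_t3`, the sketch reflects directly the statement that the power of $T_3$ does not depend on $s_0$ and $s_r$ separately.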
The results of calculations based on formulae developed in the last
three sections are shown in Table 3.
These figures appear to indicate the practical usefulness of $T_3$ as a 'general purpose' test.
This test insures against the possibility of having a very low power without sacrificing too much relative to the most powerful tests available.
Of course, the latter should be used when the type of censoring suspected (i.e., the value of $\theta$) is very definitely known.

TABLE 3
Comparison of Powers ($\alpha = 0.05$)

                                  s_0+s_r = 2       s_0+s_r = 6       s_0+s_r = 10
Test  Criterion          r      (0,2)   (1,1)     (0,6)   (3,3)     (0,10)  (5,5)
T1    Y_1(1-Y_r)         4      0.124   0.206     0.120   0.594     0.080   0.841
                         ∞      0.239   0.339     0.533   0.925     0.644   0.998
T2    Y_r                4      0.294   0.086     0.780   0.131     0.955   0.157
                        30      0.381   0.177     0.949   0.549     0.999   0.821
                         ∞      0.424   0.200     0.967   0.648     0.999   0.996
T3    Y_1+(1-Y_r)        4          0.167             0.470             0.716
                        30          0.281             0.845             0.989
                         ∞          0.303             0.892             0.996
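The limiting ($r = \infty$) rows of Table 3 follow from (12) and from the limits given in Sections 5 and 6. The sketch below evaluates them with the standard library only. It computes $K_\nu$ from the integral representation $K_\nu(z) = \int_0^\infty e^{-z\cosh t}\cosh(\nu t)\,dt$ by Simpson's rule and the chi-squared tails through the chi-squared/Poisson relation. These numerical choices, the function names, and taking $\lambda'_{0.05} = 3.997$ from Table 1 are assumptions of the illustration.

```python
import math


def simpson(f, a: float, b: float, n: int = 4000) -> float:
    """Composite Simpson's rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def bessel_k(nu: int, z: float) -> float:
    """K_nu(z) as the integral over t > 0 of exp(-z cosh t) cosh(nu t).

    The integrand is negligible beyond t = 12 for the z and nu used here.
    """
    def f(t: float) -> float:
        e = -z * math.cosh(t)
        return 0.5 * (math.exp(e + nu * t) + math.exp(e - nu * t))
    return simpson(f, 0.0, 12.0)


def limit_power_t1(s0: int, sr: int, lam: float) -> float:
    """Equation (12), with lam = lambda'_alpha."""
    z = 2.0 * math.sqrt(lam)
    terms = sum(lam ** (0.5 * j) / math.factorial(j) * bessel_k(sr - j + 1, z)
                for j in range(s0 + 1))
    return 2.0 * lam ** (0.5 * (sr + 1)) / math.factorial(sr) * terms


def poisson_cdf(k: int, m: float) -> float:
    """Pr[Poisson(m) <= k]."""
    return math.exp(-m) * sum(m**j / math.factorial(j) for j in range(k + 1))


def limit_power_t2(sr: int, alpha: float) -> float:
    """Pr[chi2_{2(sr+1)} > -2 log alpha] (Section 5)."""
    return poisson_cdf(sr, -math.log(alpha))


def limit_power_t3(s: int, alpha: float, iters: int = 60) -> float:
    """Pr[chi2_{2(s+2)} > q], q the upper 100*alpha % point of chi2_4.

    Pr[chi2_4 > 2x] = exp(-x)(1 + x) is solved for x = q/2 by bisection.
    """
    lo, hi = 0.0, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(1, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return poisson_cdf(s + 1, 0.5 * (lo + hi))


if __name__ == "__main__":
    lam = 3.997  # lambda'_alpha for alpha = 0.05 (Table 1)
    for s0, sr in [(0, 2), (1, 1), (0, 6), (3, 3), (0, 10), (5, 5)]:
        print((s0, sr), round(limit_power_t1(s0, sr, lam), 3),
              round(limit_power_t2(sr, 0.05), 3),
              round(limit_power_t3(s0 + sr, 0.05), 3))
```

Each printed line gives, for one column of Table 3, the limiting powers of $T_1$, $T_2$ and $T_3$, for comparison with the $r = \infty$ rows.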
In interpreting the figures shown in Table 3, it should be noted that
considerably higher powers will be obtained when series of two or more samples, each possibly subject to the same system of censoring, are available.
It may be felt that the condition stated at the beginning of Section 2, namely that the true probability density function $f(x)$ must be known, is unlikely to be satisfied in practice.
While this is so, in the strict sense that it is very rarely the case that a theoretically formulated model gives an exact representation of reality, it will sometimes be the case that there is sufficiently massive evidence to establish $f(x)$ from observed relative frequencies with adequate accuracy.
It may be noted that it is not essential that $f(x)$ have a simple, or indeed any explicit, mathematical form; a graphical representation can suffice.
Slight variations in the form of $f(x)$ can be tolerated without serious effect.
(Since the test criteria depend only on $Y_1$ and $Y_r$, inaccuracy in $\int_{-\infty}^{x} f(t)\,dt$ for values of $x$ in the central part of the distribution has little effect.)
It would, however, be interesting, but beyond the scope of the present
investigation, to inquire into the robustness of these tests with respect
to variation in
f(x).
REFERENCES
Johnson, N. L. (1962). "Estimation of sample size", Technometrics, 4, 59-67.
Johnson, N. L. (1969). "A general purpose test of censoring of extreme sample values", S. N. Roy Memorial Volume, Indian Statistical Institute.
British Association Mathematical Tables (1950). Volume 6, Bessel Functions (Part I).