Johnson, N.L. (1973). "Robustness of certain tests of censoring of extreme sample values."

1 This research was supported in part by the U.S. Army Research Office,
Durham, under Contract No. DAHC04-71-C-0042.
ROBUSTNESS OF CERTAIN TESTS OF CENSORING
OF EXTREME SAMPLE VALUES 1
N.L. Johnson
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 866
March, 1973
ABSTRACT
The effects of inaccuracy in parameters of a hypothesized normal
distribution on certain tests of symmetrical sample censoring are
investigated.
1. Introduction
Tests of censoring of sample values, and in particular of extreme sample
values, have been described in [1]-[3]. All these tests use the probability
integral transformation

$$Y = \int_{-\infty}^{X} f(x)\,dx$$

of sample values of a continuous random variable, $X$, with density function
$f(x)$. If this density function is not accurately known, the test procedure
might be based on an estimated $f(x)$. It is the purpose of this paper to give
an assessment of the effects of errors in $f(x)$ on the properties of some of
the tests described in [1]-[3].
In particular we will consider tests with critical regions

(i) $Y_1 \ge C_\alpha$
(ii) $Y_1(1-Y_r) \ge C_\alpha$
(iii) $Y_1 + (1-Y_r) \ge C_\alpha$

where $Y_1 \le Y_2 \le \dots \le Y_r$ are the ordered probability integral
transforms corresponding to an available sample set of $r$ observed values of
$X$, and the $C_\alpha$'s are chosen to give a significance level $\alpha$.
The $C_\alpha$'s in (i), (ii) and (iii) do not, of course, have the same values.
If it be supposed that the $r$ observed values are the surviving members
of a complete random sample of size $n = r + s_0 + s_r$, after the $s_0$ least
and $s_r$ greatest values have been censored, the tests (i) and (ii) are
uniformly most powerful tests of the hypothesis that $n = n_0\ (\ge r)$ with
respect to alternatives $n > n_0$ when censoring is restricted to be from below
(i.e. $s_r = 0$) in case (i), or symmetrical (i.e. $s_0 = s_r$) in case (ii).
Test (iii) was suggested, on heuristic grounds, in [1] as being suitable for use
when the ratio $s_0/s_r$ is unknown. Subsequent investigation [2] has tended
to support the validity of this suggestion.
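
The three criteria depend on the sample only through its least and greatest observed values. The following is a minimal sketch, not taken from the paper, of how the statistics in (i), (ii) and (iii) might be computed when the hypothesized density is normal; the function name `censoring_statistics` and its parameters `xi` and `sigma` are illustrative assumptions.

```python
from statistics import NormalDist

def censoring_statistics(sample, xi=0.0, sigma=1.0):
    """Return the statistics compared with C_alpha in regions (i), (ii), (iii).

    Assumes (for illustration only) a hypothesized normal density with
    mean xi and standard deviation sigma.
    """
    hypothesized = NormalDist(mu=xi, sigma=sigma)
    # Only the extreme probability integral transforms Y_1 and Y_r are needed.
    y_1 = hypothesized.cdf(min(sample))
    y_r = hypothesized.cdf(max(sample))
    return {
        "i": y_1,                   # compared with C_alpha of test (i)
        "ii": y_1 * (1.0 - y_r),    # compared with C_alpha of test (ii)
        "iii": y_1 + (1.0 - y_r),   # compared with C_alpha of test (iii)
    }

stats = censoring_statistics([-1.0, 0.0, 1.0])
print(stats)
```

Each test rejects the hypothesis of completeness when its statistic is at least the corresponding critical value.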
2. General Theory
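If the correct density function $f(x)$ is used in calculating

$$Y_j = \int_{-\infty}^{X_j} f(x)\,dx \qquad (j = 1, \dots, r) \tag{1}$$

then the joint density function of the corresponding order statistics
$Y_1 \le Y_2 \le \dots \le Y_r$ is

$$\frac{(r+s_0+s_r)!}{s_0!\,s_r!}\; y_1^{s_0}(1-y_r)^{s_r} \qquad (0 \le y_1 \le \dots \le y_r \le 1). \tag{2}$$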
If a density function $g(x)$ is used, in place of the correct $f(x)$, then
the order statistics corresponding to

$$Y_j[g] = \int_{-\infty}^{X_j} g(x)\,dx \tag{3}$$

will in general have a distribution different from (2). If $Y_j$ and $Y_j[g]$
are each strictly increasing functions of $X_j$, then $Y_j[g]$ is also a
strictly increasing function of $Y_j$. In fact $Y_j[g] = h(Y_j)$, where $h(y)$
is defined by

$$h(y) = \int_{-\infty}^{t(y)} g(x)\,dx \quad\text{with}\quad y = \int_{-\infty}^{t(y)} f(x)\,dx. \tag{4}$$

It follows that any set of inequalities involving the $Y_j[g]$'s can be
expressed in terms of inequalities involving the $Y_j$'s, which have known
joint distributions.
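In particular

$$\Pr[Y_1[g] \ge C_\alpha \mid n] = \Pr[h(Y_1) \ge C_\alpha \mid n] \tag{5.1}$$

$$\Pr[Y_1[g]\{1-Y_r[g]\} \ge C_\alpha \mid n] = \Pr[h(Y_1)\{1-h(Y_r)\} \ge C_\alpha \mid n] \tag{5.2}$$

and

$$\Pr[Y_1[g] + \{1-Y_r[g]\} \ge C_\alpha \mid n] = \Pr[h(Y_1) + \{1-h(Y_r)\} \ge C_\alpha \mid n]. \tag{5.3}$$

Calculation of these probabilities may sometimes be technically awkward,
but is straightforward in principle.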
3. Censoring from Below
This is the simplest case, since (5.1) can be written

$$\Pr[Y_1[g] \ge C_\alpha \mid n] = \Pr[Y_1 \ge h^{-1}(C_\alpha) \mid n] \tag{5.1'}$$

and $Y_1$ has a beta distribution with parameters $(s_0+1)$, $(r+s_r)$. (Of
course, if censoring is from below, then $s_r = 0$.) Hence

$$\Pr[Y_1[g] \ge C_\alpha \mid n] = 1 - I_{h^{-1}(C_\alpha)}(s_0+1,\ r+s_r) \tag{6}$$

where $I_{\cdot}(\cdot)$ is the incomplete beta function ratio [2].
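
When $s_r = 0$ and $s_0$ is an integer, the incomplete beta function ratio in (6) reduces to a binomial sum, because $Y_1$ is then the $(s_0+1)$-th smallest of $n = r + s_0$ uniform variables. The sketch below uses that standard identity; the function name `prob_reject_below` is hypothetical, and the argument `c` stands for $h^{-1}(C_\alpha)$ (equal to $C_\alpha$ itself when the correct density is used).

```python
from math import comb

def prob_reject_below(r, s0, c):
    """1 - I_c(s0 + 1, r), written as a binomial sum with n = r + s0.

    Y_1 is the (s0 + 1)-th smallest of n uniform variables, so Y_1 >= c
    exactly when at most s0 of the n values fall below c.
    """
    n = r + s0
    return sum(comb(n, k) * c**k * (1.0 - c) ** (n - k) for k in range(s0 + 1))

r = 5
c = 1.0 - 0.05 ** (1.0 / r)   # makes the s0 = 0 probability equal to 0.05
for s0 in range(4):
    print(s0, round(prob_reject_below(r, s0, c), 4))
```

With `s0 = 0` the sum has the single term $(1-c)^r$; larger `s0` gives the probability of rejection when that many of the least values have in fact been censored.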
In any particular case (i.e. for any particular $f(\cdot)$ and $g(\cdot)$) we
need study only the appropriate function $h^{-1}(y)$. Since $Y_1[g] \ge C_\alpha$
is equivalent to $Y_1 \ge h^{-1}(C_\alpha)$, it is clear that if $g(x)$ is used
in place of the correct density function $f(x)$, we are, in effect, using
$C_{\alpha^*}$, with an actual significance level $\alpha^*$, defined by

$$C_{\alpha^*} = h^{-1}(C_\alpha) \tag{7}$$

instead of the required value $\alpha$.
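As an example, suppose that $f(x)$ is the normal density function

$$f(x) = (\sqrt{2\pi}\,\sigma)^{-1} \exp\left\{-\tfrac12\left(\frac{x-\xi}{\sigma}\right)^2\right\}$$

and that $g(x)$ is also normal, but with an incorrect pair of values
$\xi^*, \sigma^*$ in place of $\xi, \sigma$. Then, in (4),

$$t(y) = \xi + \sigma\,\Phi^{-1}(y)$$

and

$$h(y) = \Phi\big(\sigma^{*-1}\{\xi + \sigma\Phi^{-1}(y) - \xi^*\}\big) = \Phi\left(\frac{\sigma}{\sigma^*}\Phi^{-1}(y) - \frac{\xi^*-\xi}{\sigma^*}\right) \tag{8}$$

where $\Phi(u) = (\sqrt{2\pi})^{-1}\int_{-\infty}^{u} \exp(-\tfrac12 t^2)\,dt$.
Conversely,

$$h^{-1}(y) = \Phi\left(\frac{\sigma^*}{\sigma}\Phi^{-1}(y) + \frac{\xi^*-\xi}{\sigma}\right). \tag{9}$$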
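
As a check on (8) and (9), the sketch below codes both functions and compares $h$ with the direct definitions (1) and (3) at an arbitrary point. The function names `h` and `h_inv` follow the text; the numerical parameter values are arbitrary illustrations, not values from the paper.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def h(y, xi, sigma, xi_star, sigma_star):
    """Equation (8): maps Y_j (correct f) to Y_j[g] (incorrect g)."""
    return STD.cdf((sigma / sigma_star) * STD.inv_cdf(y)
                   - (xi_star - xi) / sigma_star)

def h_inv(y, xi, sigma, xi_star, sigma_star):
    """Equation (9): inverse of h."""
    return STD.cdf((sigma_star / sigma) * STD.inv_cdf(y)
                   + (xi_star - xi) / sigma)

# Arbitrary illustrative parameters and observation.
xi, sigma, xi_star, sigma_star = 10.0, 2.0, 10.5, 3.0
x = 11.3
y_f = NormalDist(xi, sigma).cdf(x)            # definition (1)
y_g = NormalDist(xi_star, sigma_star).cdf(x)  # definition (3)

print(abs(h(y_f, xi, sigma, xi_star, sigma_star) - y_g))      # close to zero
print(abs(h_inv(y_g, xi, sigma, xi_star, sigma_star) - y_f))  # close to zero
```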
We will obtain some numerical results for the case $n_0 = r$, i.e. test
of completeness of the available data. In this case

$$C_\alpha = 1 - \alpha^{1/r}. \tag{10}$$

Taking $\alpha = 0.05$ we have the following values of $C_{0.05}$:

    r          5        10       15       20
    C_0.05     0.4507   0.2589   0.1810   0.1391

From (7) and (9)

$$C_{\alpha^*} = \Phi\left(\frac{\sigma^*}{\sigma}\Phi^{-1}(C_\alpha) + \frac{\xi^*-\xi}{\sigma}\right). \tag{11}$$

Given $C_{\alpha^*}$, we can calculate $\alpha^* = (1-C_{\alpha^*})^r$.
Values of $\Phi^{-1}(C_{0.05})$ are:

    r                  5         10        15        20
    Phi^{-1}(C_0.05)   -0.1239   -0.6469   -0.9116   -1.0844

Table 1 gives values of $\alpha^*$ for various combinations of values of
$\sigma^*/\sigma$ and $(\xi^*-\xi)/\sigma$. These should be compared with the
nominal significance level of 5%, on which the calculations are based.
Table 1
Values of $\alpha^*$ (actual significance level) (Nominal level: 5%)
(Test for one-sided censoring)

                                 $(\xi^*-\xi)/\sigma$
    r     $\sigma^*/\sigma$    -0.4     -0.2     0        0.2      0.4
    5     2                    0.224    0.138    0.075    0.038    0.017
          1                    0.168    0.097    0.050    0.023    0.009
          1/2                  0.143    0.080    0.040    0.018    0.007
    10    2                    0.630    0.496    0.357    0.229    0.130
          1                    0.202    0.109    0.050    0.019    0.006
          1/2                  0.069    0.028    0.009    0.002    0.001
    15    2                    0.803    0.720    0.594    0.446    0.299
          1                    0.224    0.117    0.050    0.017    0.004
          1/2                  0.038    0.012    0.003    -        -
    20    2                    0.904    0.828    0.738    0.603    0.456
          1                    0.239    0.136    0.050    0.015    0.003
          1/2                  0.022    0.006    0.001    -        -

(- denotes "less than 0.0005")
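
Entries of this kind follow directly from $C_\alpha = 1 - \alpha^{1/r}$, $C_{\alpha^*} = \Phi\big((\sigma^*/\sigma)\Phi^{-1}(C_\alpha) + (\xi^*-\xi)/\sigma\big)$ and $\alpha^* = (1-C_{\alpha^*})^r$. The sketch below, with hypothetical function names, recomputes the grid of the table; small differences in the third decimal place from the printed values are to be expected.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def critical_value_below(alpha, r):
    """C_alpha = 1 - alpha**(1/r) for the test of completeness (n_0 = r)."""
    return 1.0 - alpha ** (1.0 / r)

def actual_level_below(alpha, r, sigma_ratio, shift):
    """alpha* for test (i); sigma_ratio = sigma*/sigma, shift = (xi*-xi)/sigma."""
    c_alpha = critical_value_below(alpha, r)
    c_star = STD.cdf(sigma_ratio * STD.inv_cdf(c_alpha) + shift)
    return (1.0 - c_star) ** r

shifts = (-0.4, -0.2, 0.0, 0.2, 0.4)
for r in (5, 10, 15, 20):
    for ratio in (2.0, 1.0, 0.5):
        row = [actual_level_below(0.05, r, ratio, d) for d in shifts]
        print(r, ratio, " ".join(f"{a:.3f}" for a in row))
```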
The importance of having a reasonably accurate estimate of $\sigma$ is clear.
If $\sigma$ is not accurately estimated, deviation of actual from nominal
significance level increases rapidly with $\sigma^*$.

4. Symmetrical and General Censoring

In the other two cases ((ii) and (iii)) it is usually necessary to employ
numerical quadrature. Formula (5.2) can be evaluated as
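
$$\begin{aligned}
\Pr[h(Y_1)\{1-h(Y_r)\} \ge C_\alpha \mid n]
&= E\big[\Pr[h(Y_1) \ge C_\alpha\{1-h(Y_r)\}^{-1} \mid n, Y_r]\big] \\
&= E\big[\Pr[Y_1 \ge h^{-1}(C_\alpha\{1-h(Y_r)\}^{-1}) \mid n, Y_r]\big] \\
&= E\big[I_{1-\gamma_2(Y_r)}(r-1,\ s_0+1)\big]
\end{aligned} \tag{5.2'}$$

where

$$\gamma_2(Y_r) = \begin{cases} h^{-1}(C_\alpha\{1-h(Y_r)\}^{-1})/Y_r & \text{for } h(Y_r)\{1-h(Y_r)\} \ge C_\alpha \\ 1 & \text{otherwise} \end{cases}$$

and expectation is taken with respect to $Y_r$. $Y_r$ has a standard beta
distribution with parameters $(s_0+r)$, $(s_r+1)$; given $Y_r$, the conditional
distribution of $Y_1$ is beta with parameters $(s_0+1)$, $(r-1)$, and range
$(0, Y_r)$.

Similarly, from (5.3),

$$\Pr[h(Y_1) + \{1-h(Y_r)\} \ge C_\alpha \mid n] = E\big[I_{1-\gamma_3(Y_r)}(r-1,\ s_0+1)\big] \tag{5.3'}$$

where

$$\gamma_3(Y_r) = \begin{cases} h^{-1}(C_\alpha - 1 + h(Y_r))/Y_r & \text{for } h(Y_r) \ge 1 - C_\alpha \\ 0 & \text{for } h(Y_r) < 1 - C_\alpha. \end{cases}$$

When there is no censoring ($s_0 = s_r = 0$), (5.2)' and (5.3)' become

$$E[\{1-\gamma_j(Y_r)\}^{r-1}] = r\int_0^1 \{y[1-\gamma_j(y)]\}^{r-1}\,dy \qquad (j = 2, 3 \text{ respectively}). \tag{12}$$

Note that $\gamma_2(y) = 1$ for $y < h^{-1}(\tfrac12[1-\sqrt{1-4C_\alpha}])$ and
$y > h^{-1}(\tfrac12[1+\sqrt{1-4C_\alpha}])$; and $\gamma_3(y) = 0$ for
$y < h^{-1}(1-C_\alpha)$.

Hence the limits of the integral for $E[\{1-\gamma_2(Y_r)\}^{r-1}]$ can be
changed from $0$, $1$ to $h^{-1}(\tfrac12[1-\sqrt{1-4C_\alpha}])$,
$h^{-1}(\tfrac12[1+\sqrt{1-4C_\alpha}])$, and also

$$\begin{aligned}
E[\{1-\gamma_3(Y_r)\}^{r-1}]
&= \{h^{-1}(1-C_\alpha)\}^r + r\int_{h^{-1}(1-C_\alpha)}^{1} \{y[1-\gamma_3(y)]\}^{r-1}\,dy \\
&= \{h^{-1}(1-C_\alpha)\}^r + r\int_{h^{-1}(1-C_\alpha)}^{1} \{y - h^{-1}(C_\alpha - 1 + h(y))\}^{r-1}\,dy.
\end{aligned} \tag{13}$$

The integrals in (12) and (13) will usually have to be evaluated by
quadrature. As $r$ increases it may be difficult to retain adequate accuracy.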
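We now turn to the special case considered in Section 3, with $f(x)$ and
$g(x)$ both normal, so that

$$h(y) = \Phi\left(\frac{\sigma}{\sigma^*}\Phi^{-1}(y) - \frac{\xi^*-\xi}{\sigma^*}\right), \qquad h^{-1}(y) = \Phi\left(\frac{\sigma^*}{\sigma}\Phi^{-1}(y) + \frac{\xi^*-\xi}{\sigma}\right).$$

Using (8) and (9) we find

$$Y_r\gamma_2(Y_r) = h^{-1}(C_\alpha\{1-h(Y_r)\}^{-1}) = \Phi\left[\frac{\sigma^*}{\sigma}\,\Phi^{-1}\!\left(\frac{C_\alpha}{1-\Phi\left(\frac{\sigma}{\sigma^*}\Phi^{-1}(Y_r) - \frac{\xi^*-\xi}{\sigma^*}\right)}\right) + \frac{\xi^*-\xi}{\sigma}\right] \tag{14.1}$$

for $Y_r$ in the limits
$\Phi\left(\frac{\sigma^*}{\sigma}\Phi^{-1}(\tfrac12[1\pm\sqrt{1-4C_\alpha}]) + \frac{\xi^*-\xi}{\sigma}\right)$,
and

$$Y_r\gamma_3(Y_r) = h^{-1}(C_\alpha - 1 + h(Y_r)) = \Phi\left[\frac{\sigma^*}{\sigma}\,\Phi^{-1}\!\left(C_\alpha - 1 + \Phi\left(\frac{\sigma}{\sigma^*}\Phi^{-1}(Y_r) - \frac{\xi^*-\xi}{\sigma^*}\right)\right) + \frac{\xi^*-\xi}{\sigma}\right] \tag{14.2}$$

for $Y_r \ge \Phi\left(\frac{\sigma^*}{\sigma}\Phi^{-1}(1-C_\alpha) + \frac{\xi^*-\xi}{\sigma}\right)$.

Introducing the function (defined for $0 \le z \le 1$)

$$\theta(z; A, B) = \Phi(A\,\Phi^{-1}(z) + B)$$

we have

$$h(y) = \theta\left(y;\ \frac{\sigma}{\sigma^*},\ -\frac{\xi^*-\xi}{\sigma^*}\right)$$

and (14.1), (14.2) can be re-written

$$Y_r\gamma_2(Y_r) = \theta\left(\frac{C_\alpha}{1-\theta\left(Y_r;\ \frac{\sigma}{\sigma^*},\ -\frac{\xi^*-\xi}{\sigma^*}\right)};\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right) \tag{14.1'}$$

for $Y_r$ in the limits
$\theta\left(\tfrac12[1\pm\sqrt{1-4C_\alpha}];\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right)$,
and

$$Y_r\gamma_3(Y_r) = \theta\left(C_\alpha - 1 + \theta\left(Y_r;\ \frac{\sigma}{\sigma^*},\ -\frac{\xi^*-\xi}{\sigma^*}\right);\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right). \tag{14.2'}$$

Note that the inverse of $\theta(z; A, B)$ is $\theta(z; A^{-1}, -BA^{-1})$.

The actual significance levels of the tests are then, for the test for
symmetrical censoring

$$\alpha^* = r\int_{y_-}^{y_+} \left\{y - \theta\left(\frac{C_\alpha}{1-\theta\left(y;\ \frac{\sigma}{\sigma^*},\ -\frac{\xi^*-\xi}{\sigma^*}\right)};\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right)\right\}^{r-1} dy \tag{15.1}$$

where

$$y_\pm = \theta\left(\tfrac12[1\pm\sqrt{1-4C_\alpha}];\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right),$$

and, for the "general purpose" test

$$\alpha^* = \left[\theta\left(1-C_\alpha;\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right)\right]^r + r\int_{Y'}^{1} \left\{y - \theta\left(C_\alpha - 1 + \theta\left(y;\ \frac{\sigma}{\sigma^*},\ -\frac{\xi^*-\xi}{\sigma^*}\right);\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right)\right\}^{r-1} dy \tag{15.2}$$

where

$$Y' = \theta\left(1-C_\alpha;\ \frac{\sigma^*}{\sigma},\ \frac{\xi^*-\xi}{\sigma}\right).$$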
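
The two integrals (15.1) and (15.2) can be approximated by any standard quadrature rule. The sketch below is one possible arrangement, not the procedure used for the paper's tables: it writes `theta(z, a, b)` for $\Phi(A\Phi^{-1}(z) + B)$, uses a composite midpoint rule (an assumption, chosen because it never evaluates $\Phi^{-1}$ at 0 or 1), and takes `ratio` $= \sigma^*/\sigma$ and `shift` $= (\xi^*-\xi)/\sigma$. The critical values 0.08183 and 0.65741 are the $r = 5$ values quoted in the paper for the two tests.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def theta(z, a, b):
    """Phi(a * Phi^{-1}(z) + b), extended to z <= 0 and z >= 1 (valid for a > 0)."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return STD.cdf(a * STD.inv_cdf(z) + b)

def midpoint(func, lo, hi, steps=4000):
    """Composite midpoint rule; func is never evaluated at lo or hi."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

def alpha_symmetric(r, c, ratio, shift):
    """Right-hand side of (15.1) for the symmetrical-censoring test."""
    h = lambda y: theta(y, 1.0 / ratio, -shift / ratio)
    h_inv = lambda u: theta(u, ratio, shift)
    root = sqrt(1.0 - 4.0 * c)
    lo, hi = h_inv(0.5 * (1.0 - root)), h_inv(0.5 * (1.0 + root))
    # max(..., 0.0) guards against tiny negative values from rounding.
    integrand = lambda y: max(y - h_inv(c / (1.0 - h(y))), 0.0) ** (r - 1)
    return r * midpoint(integrand, lo, hi)

def alpha_general(r, c, ratio, shift):
    """Right-hand side of (15.2) for the general purpose test."""
    h = lambda y: theta(y, 1.0 / ratio, -shift / ratio)
    h_inv = lambda u: theta(u, ratio, shift)
    lower = h_inv(1.0 - c)
    integrand = lambda y: (y - h_inv(c - 1.0 + h(y))) ** (r - 1)
    return lower ** r + r * midpoint(integrand, lower, 1.0)

print(round(alpha_symmetric(5, 0.08183, 1.0, 0.0), 3))  # nominal case
print(round(alpha_general(5, 0.65741, 1.0, 0.0), 3))    # nominal case
```

When `ratio = 1` and `shift = 0` both functions should return a value close to the nominal 0.05, which gives a simple check on the quadrature before other parameter values are tried.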
(Note that these formulae will apply, with appropriate choice of function to
replace $\Phi(\cdot)$, to any distribution depending only on a location
parameter $\xi$ and a scale parameter $\sigma$, i.e. with a density function of
form $\sigma^{-1}\psi\!\left(\frac{x-\xi}{\sigma}\right)$.)

Tables 2 and 3 are analogous to Table 1, and are based on (15.1) and
(15.2) respectively. The values of $C_{0.05}$ used are as follows:

              Value of C_0.05 in
    r         Table 2      Table 3
    5         0.08183      0.65741
    10        0.02842      0.39416
    15        0.01413      0.27940
    20        0.00841      0.21611
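
For the general purpose test the critical value can be recovered in closed form up to a one-dimensional root search: when the correct density is used and there is no censoring, the probability that $Y_1 + (1-Y_r) \ge C$ reduces to $(1-C)^r + rC(1-C)^{r-1}$. The sketch below solves this for $C$ by bisection; the function names are hypothetical, and the bisection is simply one convenient choice of root finder.

```python
def null_level_general(c, r):
    """(1-C)^r + r*C*(1-C)^(r-1): level of test (iii) with correct f and n = r."""
    return (1.0 - c) ** r + r * c * (1.0 - c) ** (r - 1)

def critical_value_general(alpha, r, tol=1e-10):
    """Solve null_level_general(C, r) = alpha for C by bisection."""
    lo, hi = 0.0, 1.0  # the level is 1 at C = 0 and 0 at C = 1, decreasing between
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if null_level_general(mid, r) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for r in (5, 10, 15, 20):
    print(r, round(critical_value_general(0.05, r), 5))
```

The corresponding value for the symmetrical test has no such simple form and would need the same root search wrapped around a numerical integration.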
Table 2: Values of $\alpha^*$ (actual significance level) (Nominal level: 5%)
(Test for symmetrical censoring)

                                            $|\xi^*-\xi|/\sigma$
    r     $\sigma^*/\sigma$    0        0.2      0.4      0.6      0.8      1.0      1.2      1.4
    5     2                    0.466    0.461    0.445    0.418    0.381    0.336    0.285    0.230
          1                    0.050    0.048    0.041    0.043    0.023    0.018    0.014    0.008
          1/2                  0.002    0.002    0.002    0.001    0.001
    10    2                    0.815    0.811    0.801    0.783    0.756    0.719    0.672    0.612
          1                    0.050    0.047    0.041    0.031    0.021    0.012    0.006    0.003
          1/2
    15    2                    0.937    0.931    0.926
          1                    0.050    0.047    0.040
          1/2
    20    2                    0.977    0.971    0.969
          1                    0.050    0.047    0.040
          1/2
Table 3: Values of $\alpha^*$ (actual significance level) (Nominal level: 5%)
(General purpose test for censoring)

                                            $|\xi^*-\xi|/\sigma$
    r     $\sigma^*/\sigma$    0        0.2      0.4      0.6      0.8      1.0      1.2      1.4
    5     2                    0.305    0.309    0.323    0.346    0.378    0.420    0.472    0.531
          1                    0.050    0.056    0.075    0.111    0.165    0.241    0.336    0.445
          1/2                  0.030    0.039             0.123    0.202    0.304    0.421    0.542
    10    2                    0.716    0.722    0.740    0.767    0.801    0.838    0.874    0.907
          1                    0.050    0.060    0.094    0.155    0.247    0.364    0.495    0.623
          1/2                  0.006    0.011    0.030    0.072    0.146    0.253    0.385    0.525
    15    2                    0.893    0.897    0.906
          1                    0.050    0.063    0.106
          1/2                  0.001    0.004    0.015
    20    2                    0.957    0.958    0.962
          1                    0.050    0.066    0.116
          1/2                  0.001    0.002    0.008
In Table 2, it is noteworthy that when $\sigma = \sigma^*$ and
$|\xi^*-\xi|/\sigma$ is fixed, the actual significance level varies only slowly
with $r$, the apparent sample size.

In Table 3, we notice that for $r = 5$ and $|\xi^*-\xi| > 0.5$, there appears
to be a minimum value of $\alpha^*$ as $\sigma^*/\sigma$ varies. There is a
similar effect for larger values of $r$, but it does not show within the
confines of this table.

It is clear that for reasonable control of significance level, $\sigma^*/\sigma$
must be quite close to 1, though some variation (say up to 0.25) in
$|\xi^*-\xi|/\sigma$ can be tolerated.
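
Entries in Tables 2 and 3 can also be checked independently of the quadrature formulae by direct simulation. The sketch below is such a cross-check under stated assumptions, not part of the original calculations: complete samples of size $r$ are drawn from a standard normal distribution (so $\xi = 0$, $\sigma = 1$), the tests are carried out as if the distribution were normal with mean `shift` and standard deviation `ratio`, and the function name, seed and number of replications are arbitrary choices.

```python
import random
from statistics import NormalDist

def simulate_levels(r, c_ii, c_iii, ratio, shift, reps=40000, seed=1):
    """Estimate actual levels of tests (ii) and (iii) by simulation.

    ratio = sigma*/sigma and shift = (xi*-xi)/sigma; c_ii and c_iii are the
    nominal critical values of the symmetrical and general purpose tests.
    """
    rng = random.Random(seed)
    assumed = NormalDist(mu=shift, sigma=ratio)  # the incorrect density g
    hits_ii = 0
    hits_iii = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(r)]  # correct density f
        y_1 = assumed.cdf(min(sample))
        y_r = assumed.cdf(max(sample))
        if y_1 * (1.0 - y_r) >= c_ii:
            hits_ii += 1
        if y_1 + (1.0 - y_r) >= c_iii:
            hits_iii += 1
    return hits_ii / reps, hits_iii / reps

# r = 5 with the critical values quoted for Tables 2 and 3.
print(simulate_levels(5, 0.08183, 0.65741, ratio=1.0, shift=0.0))
print(simulate_levels(5, 0.08183, 0.65741, ratio=2.0, shift=0.0))
```

With 40000 replications the standard error of each estimate is roughly 0.001 to 0.003, so agreement with the tabulated values can be expected only to about two decimal places.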
REFERENCES

[1] Johnson, N.L. (1970). A general purpose test of censoring of sample
    extreme values. In Essays in Probability and Statistics (S.N. Roy
    Memorial Volume), Chapel Hill, University of North Carolina Press,
    pp. 379-384.

[2] Johnson, N.L. (1971). Comparison of some tests of sample censoring of
    extreme values, Austral. J. Statist., 11, 1-6.

[3] Johnson, N.L. (1972). Inferences on sample size: Sequences of samples,
    Communications in Statistics, 1, 17-26.