I
I
.e
I
I
I
I
I
I
SAMPLE CENSORING
by
N. L. Johnson
University of North Carolina
Institute of Statistics Mimeo Series No. 492
October 1966
A discussion of some problems ar1s1ng
when there is doubt 1Jhetl1er complete
sample records are available.
Ie
I
I
I
I
I
I
I
.e
I
This paper was presented at the 12th conference
on Experimental Design in Army Research,
(Gaithersburg, Maryland, October 19-21, 1966).
The work was supported in part by Army Research
Office Contract AROD-4, and in part cy Air Force
Contract AF-AFOSR-76o-65.
DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
1.
Introduction
There are currently available a number of methods designed to reduce
the possible effects of "wild" ("maverick") observations on the analysis of
sample values.
Among these maybe mentioned "trimming" and "Winsorisation".
These methods involve the possible or sometimes automatic exclusion of
extreme values among those observed.
Apart from these methods, for which
appropriate statistical analYses, taking proper account of the omission of
sample values, are available, samples may be incomplete owing to
inadequate recording, or, unfortunately, biassed selection of values which
accord best with some preconceived ideas or desires.
While, under properly regulated conditions, information on any censoring
of sample values should accompany the records of the values themselves, this
is not always the case.
Indeed, with the last situation described with the
preceding paragraph, such information is not to be expectedj but also,
even in more respectable cases, information may be omitted by negligence.
Th.e problems to be considered in this paper are those arising when it is
suspected that there has been some form of censoring of the original sample.
Complete, and reasonably tidy solutiona are obtained only on the assumption
that the population distribution of an observed character is known.
study of this situation does give some clue as to what can be done when
knowledge of the population distribution is incomplete.
Problems of a similar kind have been discussed in an earlier paper [lJ.
They were of a rather simple nature in that there was usually a direct
choice between two possible sample sizes.
.-
I
However,
1
I
I
2.
-.
Formal Statement of Problem
It will be supposed that there are available r observations of a
character (X) which may be regarded as observed values of random variables
xl' x ' ••• ,xr •
2
These are a sub-set of the n (~r) variables
xl', ~', ••• , x'n corresponding to a complete random sample of (unknown)
size n.
If r
= n,
then the 'sub-set' is identical with the complete sample.
We will be interested in testing whether this is, in fact, the case.
Various
kinds of alternatives, specifying different kinds of censoring which might
be applied to the complete sample, can be considered.
Certain special
kinds of censoring have been discussed in earlier pap~s [2J[3J, and the
results of these investigations will be summarized in Sections 3 and 4.
Then,
in Section 5, we will consider problems associated with general types of
censoring.
Certain practical problems arising in application of the tests
described in Sections 3, 4, and 5 will be discussed in Section 6.
Discussion will be restricted to situations in which Xl"
x ', ••• ,
2
can be regarded as n independent continous random variables, with. a
known common probability density function, represented by f(x).
x~
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
2
I
I
I.
I
I
I
I
I
I
I
3.
Symmetrical Censoring of Extremes
VIe
will suppose that if censoring occurs it takes the form of exclusion
of the s greatest and s least among the original n sample values.
Then
X{, ~~ ••• , x~ are the r central values among an original set of
n(=r+2s) values.
Denoting this hypothesis by H
the joint probability
s,s
S x2
density function of the r ordered variables Xl
being a
rearrangement of xl' x 2'
... ,
x~
~
... ~ xr (these
in increasing order of magnitude)
is:
(1) p(xl ' x2 ' ••. , x
r
IH
s,s
) =
(r+2s)'
s
sr
2' [F(Xl )] Cl-F(Xr )] n F(x.)
(s~)
j=l
J
(Xl -< x2 -< •.• -< x r )
.4
where
F(x) =
J_
co
f(x)
ax
Ie
The hypothesis that there has been no censoring and therefore that the
I
I
I
For brevity this will be denoted by H •
I
I
I
I
••
I
complete sample is available is, in the notation already introduced, H
0,0
o
The most powerful test of F
against the alternative H
has a
s,s
critical (rejection) region of the form.
o
p(X , ••• , X
r
l
where C is a constant.
I Hs,s )
>
-
Cp(X , ••• , x
r
l
I HO)
Whatever be the value of s, inequality (2) can be
written in the form.
where K is a constant.
Inequality (3) does not depend on s, so the test
defined by this inequality is uniformly most powerful with respect to
H
for all s > OJ ie. with respect to any symmetrical censoring of the
s,s
extremes of the sample values. The value of K must be chosen to give a
required level of significence, ex say, when H 0
3
is true.
This value
I
depends on Q and r, and may be denoted by K(Q,r).
Table 1 gives a few values of K(Q,r).
r
~
Then
For
10 the approximations
K(O.lO,r) ~ 2.65(r+l.5)-2
.
K(0.05,r) ~ 4.1(r+2)-2
K( O.Ol,r ) ~•
9.2 ( r+3.5 )-2
give useful results.
Mathematical analysis connected with the determination
of K(a,r) is contained in Appendix I.
A discussion of the evaluation of the power of this test is contained
in Appendix II.
Table 1 Upper 100 ex
7 Significance Limits of F(X1)[1 - F(Xr )]
0
0.05
0.01
2
0.207
0.235
3
4
0.150
0.109
0.195
0.156
r"Z
5
6
0.0822
0.125
0.0633
0.101
7
8
0.0503
0.0408
0.0830
9
0.0338
0.0585
.-I
I
I
I
I
I
I
_I
I
I
I
I
0.0692
I
I
I
4
••I
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
4. General Censoring of Extremes
If the requirement of symmetry is dropped we need to consider hypotheses
of form Hs s corresponding to exclusion of the s smallest and s largest
0' r
0
r
individuals in the original sample, vnth So and sr not necessarily equal.
In this case there is no longer a uniformly most powerful test of H •
O
There is a uniformly most powerful test of H with respect to the subclass
O
Hes ,s
r
••
I
in which s 0 Is r (=
e)
is constant.
It has a critical region of form
[If sr = 0, we t~~e 13 = ~
and replace (5) by F(X ) ~ constant]
l
To obtain a significance level equal to CX, the value:Jf
K(o:, r, e), given
H is valid (Le. there is no censoring), IlDlSt make the probability that
o
inequality (5) is satisfied equal to cx.
In [3] a heuristic method proposed
by S. N. Roy [4] is applied to suggest a possible test of H with respect
o
to all al ternative hypotheses of type H
s , s
o
(for any values of s
r
0
and s ).
This calls for construction of the union of regions lite (5) with
0: ~o:', over all values of
e.
Points (F(xl)' F(xr ) ) on the boundary of
the critical region IlDlSt satisfy the equations.
(6.1)
I
I
I
I
r
(6.2)
From
-A
{[F(Xl)]e[l - F(xr)])=OK ( 0: , r, 13
r
13
(6.1) and (6.2) it follows that
log F(X ) = d log K(O:', r,e)/o 13
l
r,e) is known, F(X ) can be found from (6.3) and then F(x l ) is
l
(6.3)
If
K(o:',
determined by (6.1).
However exPlicit evaluation of K( 0:', r,
5
e)
is
r
I
tro~~blesome,
and approximate methods were used in [.3] leading to the simple
(through approximate) formula:
F(X l ) + [1 - F(x ) ] ~ K (a, r)
r
l
for the union of critical regions. Here K ( a,r) represents a constant
l
(7)
which can be chosen to give a required value, a
say, for the significance
(Note that a' appears only in the construction of ('7); it is not
level.
the significance level of the resultant test.)
Although an approximate argument, applying a heuristic principle has
been used in reaching
(7),
the critical region so obtained has a natural
appeal; and seems worthy of further consideration.
The distribution theory associated with the critical region (7) is
If H
is valid then F(x ) + [1 - F(x )] is distributed
s o ,s r
1
r
2
2
~
2
as ~(s + s + 2)/(~(s + s + 2) + X2(r-l» where X2(s + s + 1) and
oro
r
0
r
~(r-l) are mutually independent. (Equivalently, the distribution is a beta
very simple.
distribution with parameters (s
s
o
+ s
= s r = O)that
o
(8)
Kl(a,r)
2, (r-l).
= upper
100 a
+ 2), (r-l).)
r
It follows (putting
7o point of beta distribution with parameters
These values can be obtained from Table 16 of [6].
The power of the test with respect to a specified alternative hypothesis
H
s , s
o
is also easily calculated.
r
In fact
= l-I K (s
=
where I
p
(M,N) = [B(M,N)r
1
Jo~
+ s
+2, r-l)
lor
I 1 _K (r-l, So + sr + 2)
1
tM-\l_t)N-l dt is the incomplete beta
function ratio.
.'I
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
••
6
I
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,.
I
For given s
o
and s , as r tends to infinity the
r
2
Pr[~(s
(10)
o
+ s
pmJer tends to
,2]
r
+ 2) ;::: X4,1-ex
(,;[here x2 1
v, -exdenotes the upper 100 ex
7 point of the distribution of
0
with v degrees of freedon ).
A feH values of the pOller are shown in Table 2.
It appears that the
asymDtotic (r ~ ~) va]ues give a good indication of true value for
r
> 30.
Table 2
~s
POHer
so +
r
=
r
= )+
r =
r
S
=
2
= 0.05)
of the general purpose test (ex
s
0'
r
6
18
10
14
0.862
0.167
0.4'70
0.716
30
0.281
0.845
0.989
~
0.303
0.892
0.996
0.938
A special case of sone interest arises when censoring at one extreme
':)nly is suspected (Le. So = 0 or sr = 0).
In this case the uniformly
most pOHerful test has the critical region
y
r
< ~/r
(if s
o
= 0)
or
Yl > l-a
l/r
(if sr
= 0)
The pOl"er of the test ,'lith critical region y
to the alternative H 's is
o r
7
r
,< ex
l/r
Hith respect
A(H
....
o's
)
r
=
(r+s)!
r
(r-1)!s!
r
f
l/ r
1
s
yr- (l-y) r dy
0
= I l/r(r, Sr +1)
0:
(where I denotes the incomplete beta function ratio).
I
I
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
8
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
5.
General Censoring
We first introduce the notation H
s o ,sl' ••• ,s r
hypothesis that Sj observations have been removed between x. 1 and x.
JJ
for j=l, 2, ••• ,(r+l) with xo
= _00
,xr+1
= +00
•
In this notation
considered in Sections 3 and 4 would be H
the H
s
s ,s
o r
o
,0,
0,
••• ,
0,
s
r
Also, for convenience we will write
y.
J
(11)
Yo
= F(x.)
(j
J
= 1,
•.• r)
= 0,' Yr+l-- 1
Then the best critical region for testing the hypothesis of no censoring
(H
0, 0,
••• ,
(12)
0, 0
) against the alternative H
s ,
o
s.
(Yj +l - Yj) J
is of form
...
> K(a, r, so' sl'
, sr )
j=o
It is clear that there is a uniformly most powerful test with respect to
fi
-
any set of alternatives H
so' sl' •.. , s r
for which the ratios
• sr are constant, but not with respect to any other sets
of alternatives.
principle, as in
While one could attempt to apply Roy's heuristic
B J, to construct a general purpose critical region for
the whole set of alternatives H
s o ,sl' ... , s r
the effect of approximations
might well be much more important in the more general case, and is certainly
more difficult to gauge.
We therefore consider more or less arbitrarily
chosen criteria which, however, do have some relation to criteria suggested
from theoretical considerations •
•e
I
to denote the
9
I
I
We first consider a test vnth critical region
r
(13)
g =.II
J=O
(Y'+ l - yJ > K2 (a, r) = K2
J
J
It is quite likely that this criterion may be felt to have some practical
These will be discussed in Section 6, but for the present we
drawbacks.
\'Till just consider how to evaluate ~ in (13), at any rate approximately.
It will be convenient to approximate to the distribution of log g,
rather than of g itself.
The moment generating function of log g, when
Hs ,
Sl ••• , sr is valid, is
r
o
... ,
=(r+&sj)!
s ]
x
r
r
n s.
j=o J
x
I
I
I
I
I
I
_I
:s y, :s Y2 :s ... :s Yr .::.: L
[The region of integration is 0
-.
Remember that Yo = 0 and Yl = ~]
Since the joint probability density function of Y ' •.. , Y is
l
r
r
I Hs, s 1.'.' s
r
p(Yl , ••• Y
o
r(r+l+I: s )
)
r
=
o
j
r
II r(S.+l)
j=o
J
r
s.
II (y.+l-Y.) J
j=o J
J
it follows that
s.+t
(Y j +l -Y j ) J
dyl ••. dYr
=
r
r(r+l+I:s )
o j
and hence from (14) and (15)
(16)
E[gt I H
so' sl'
... ,
s ]
r
r
r
II r(t+s.+l)
(r+I:s .)!
o J
j=o
J
=
r
-r
r «r+l)(t+l) + t s )
II Sj!
o j
j=o
10
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
Taldng logarithms and differentiating, the fol101·linc; expression for the
mth cUL1Ulant of log g is obtained:
(17)
/)n(log glH s ,S'I'
:J
_
"',
= ~ y(m-l)(s.+J_)
s )
r
_ (r+l)my(m-l) (r+l+E
J
j=o
In ::)articular when the null hypothesis H (== H
O
(18)
K (log glH)
m
°
=
s.)
j=o ,1
(r+1) y(m-i)(1)
0,0, ••• ,
.:)
) is valid
_ (r+l)m y(m-l)(r+1).
The polygamma functions have the values
Y (1) =-7 = - 0.57722
and Y (m-l)(l)
where S
m
=
= (_l)m(m_l)!
1 + 2
-m
+ 3
-m
S
m
(m
~ 2)
+...
Hence
Ie
I
I
I
I
I
I
I
I·
I
(m 2: 2)
For z not too small, we have, to a good approximation
.
(20.1)
Y (z)
(20.2)
y(m) (z) • (_1)11-1 (m-l)! (z - 1/2)-m (m 2: 1)
; log (z - 1/2)
i-Thence
(21.1)
Kl(-log glHo)
(r+l)(0.57722 + log (r+ 1/2) )
(21.2)
Km(-log glHo)
(r+l)(m-l)! [S
11
m
- (m-1)-1(r+l)/(r+ 1/2))m-l]
I
I
Noting that
(i)
the least possible value of (-log g) is (r+l) log (r+l),
corresponding to Yj
(ii)
=j
/
(r+l) for j=l, 2, "', r
[K3
(-log glHo) ]2
(2S -1)2
3
[K
(-log glH ) ]3
(r+1) (S2 _1)3
2
o
=
'7.35
r + 1
and
K4 (-log g IHo)
K
2
(-log g IH )
O
6S 4 -2
(r+l)(S2-1)2
(while for -l with (r+1) degrees of
=
10.80
r + 1
freedom,K~/~ = 8/(r+l) and
2
K4/K2 = 12/(r+l) )
o) • 0.645(r+l)
. vTl1ile var(0.57722 -lr+l) = 0.666(r+l)
(iii) var(-log gl H
2
-log g - (r+l) log (r+l) to be distributed as 0.57722 x (X witl1
(r~l)
(22)'
degrees of freedom) or, equivalently
1.732 [-log g - (r+l) log (r+1)] to be distributed as X2 with
(r+l) degrees of freedom.
exp
This implies
[-l r+l,a/l. 732]
(r+l)r+l
where
2
X r+l' a is the lower
2
100 a / o point of the distribution of x with (r+l)
degrees of freedom.
(If -log g - (r+l) log (r+l) is approximated by 0.5587 x21.0332(r+1)' then
means and variances agree while the values of
12
tS/ ~ and
I
I
I
I
I
I
_I
it appears that we might take, as an approximati::m,
(22)
-.
4/ { for the
K
I
I
I
I
I
I
I
-.
I
I
I
••
I
I
I
I
I
I
Ie
I
I
approximating distribution are 7.74(r+l)-1 and 11.61(r+l)-1.)
These approximations cannot be expected to be good unless r is fairly
large.
In the extreme case r = 1 with g = Yl (l-Yl) we have exactly
(23.1)
o ] = (1_4G)1/2
Pr[g > GIH
~
G ~ 1/4)
while (22) gives
The
a~proximation
(23.2) is substantially less than the true value (23.1)
though it does have the correct limits (1 and 0) as G tends to 0 or 1/4.
In order to assess the power of this test we return to equation (17).
This
gives the cumulants of log g .'Then a general alternative hypothesis H
s o,sl"'" s r
is valid. It would seem reasonable to fit the distribution of
[-log g - (r+l) log (r+l) ] by that of a multiple of
and second moments agree.
X2 , so that first
It may be that better approximations to upper
percentage points of -log g would be obtained by fitting the first
three moments (instead of the initial point and first two moments - see [4]).
This method might therefore be employed when the pmoJ'er is, say, above 0.75.
I
I
I
I
I
••
I
(0
13
6.
I
I
Modified Tests
The test criteria described above are all based on the probability
integral transformation
(24)
IX
y
= L~f(X)dX.
They explicitly assume that f(x) is known exactly (in practice to a close
approximation) and that there are no errors in observation of x.
This
last condition is never satisfied when x is a continuous variable.
There
is always some kind of grouping error occasioned by the finiteness of the
number of digits used in recording the observations.
This is particularly
important in relation to test functions like g of (13) in Section 5.
If
it so happens that any two of the y's are equal the value of g is zero
and the null hypothesis will be accepted.
Clearly, if this happens because
of the use of too coarse a grouping interval, the test is likely to be
very insensitive.
Furthermore, the larger r is, the more likely it is that
at least t,vo x' s (and so tuo y' s) wi 11 be equal, thus giving ri se to a zero
value for g.
We are thus led to consider modified tests, less sensitive
to this kind of effect.
A simple way of effecting this is to use only a
selected number of the transformed order statistics Yl' Y2' ••• , Yr
... ,
y
a 1~
(with the value s 1 :.:: a
l
< a
2
<: ••• < a
k
< r fixed
before analysing the data, of course) and to apply a test with critical region
k+l
ga = II (Ya
j
j=o
with Y
'11:+1
= 1,
The value of
I<,
Ya
=0
0
(A natural choice would be to take the a's at
equal intervals apart.)
depends on the required signifiCance level,CX , and also on
the selected a '.s, as well as on r.
J
In fact the distribution::Jf g, when
a
14
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
.e
I
I
I
I
I
I
H is valid, is the same as that of g, with r replaced by 1:, \Then
o
H i s valid and wi tIl
So' sl' "', s1:
S.
,J
= a J• +l-aJ.
- 1
(j=O, 1,2, .•• , k)
hence, the same calculations as tl1:lse needed to evaluate the ::JO'I'Ter of the
test using e; are required in calculating the value
K3
in (25).
Also,
of coursc, calculation of the pOITer of the test with critical region
(25) vdll follow the same lines.
A si1ular kind of modification can be applied to tests of nymmetrical
censoring of extremes (Section 3 ).
In this case it would be natural to
ignore the least and greatest m observations, and use only Yl.1+J., ••• Yr - m'
The unifornuy most powerful test of H against symmetrical alternatives
o
H
has a critical region form similar to (3), vi2:
s,s
Ie
I
I
I
I
I
I
I
•e
I
Determination of K is, hOvTever, more troublesome than for K.
4
The
equatior.
r!
(27)
JJ
m (
_
)r-2m-2(l_
)mn~T
d
Ym+l Yr - m Ym+j
Yr - m "'V m+l Yr - m
=
ex
(iThere the region of integration is Ym+l (l-Yr _m) .::: Kl j';
° .:s Ym+l .:s Yr - m .:s 1)
has to bc satisfied.
Evaluation of the integral of the left hand side, with K replaced by K,
4
gives the power of the test with critical region (3) with respect to the
alternative hypothesis H •
m,m
relevant to this problem•
~le
notes in Appendix II are therefore
15
7.
I
I
Conditions of Applicabilit;v
It may be felt that the condition stated at the beginning of Section
G,
namely that the true probability density function f(x) rmlst be known, is
unlikely to be satisfied in practice.
While this is so, in the strict sense
that it is very rarely the case that a theoretically fornmlated model gives
an exact representation of reality, it will sometimes be the case that there
is sufficiently massive evidence to establish f(x), from observed relative
frequencies, with adequate accuracy.
Slight variations in form of f(x) can be
tolerated without serious effect, particularly if a modified test of the type
described in Section 6 is used.
It may be noted that it is not essential
that f(x) have a simple, or indeed any explicit, mathematical form - a
graphical representation can suffice.
It would, however, be interesting, but beyond the scope of the present
investigatio~
to inquire into the robustness of these tests with respect
to variation in f(x).
(i.e. to use of an incorrect function, fl(X) say, in
(24) ).
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
16
I
I
I
.e
I
I
I·
I
I
I
Ie
I
I
I
I
I
I
I
REFERENCES
[lJ
Johnson, N. L. (1962)
"Estimation of sample size", Technometrics,
4, 59-67.
[2J
Johnson, N. L. (1966)
"Tests of sample censoring"
Tech. Conf. ASQC, 699-703.
[3J
JohIilson, N. L. (1966) "A general purpose test of censoring of
extreme sample values" , Submitted to S. N. Roy Memorial
Volume.
[4J
Pearson, E. S. (1959)
of non-central
[5J
Roy, S. N. (1957)
New York:
[6]
"Note on an approximation to the distribution
J?-",
Biometrika) 46,364
Some Aspects of Multivariate Analysis, Wiley.
Indian Statistical Institute, Calcutta.
Biometrika Tables for Statisticians (Ed. E.S. Pearson and H.O. Hartley)
Cambridge University Press 1958.
.e
I
Proc. 20 th
1'(
I
I
Appendix I
We have to consider the evaluation of K(a,r) from equation
Putting Yj
=
(4).
F(X j ) (as in (11) ), the joint probability density
of Y and Y , given HO' is
1
r
(A.l)
p(Y , Y I H ) = r(r-l)(y -y )r-2
o
r l
l
r
Hence K(a,r) (now written as K for convenience) satisfies the equation
(A.2)
r(r-l)
JJ
(Yr - y l )r-2
dyldYr
=
a
Yl (l-y) ~ K
The region yl(l-y ) > K can be defined by the inequalities
r
-
Yl ::S Y S l-K/Y l and these imply also
r
l-Yl - K/Yl ~ 0 or Y- ,:S Yl
where Y±
:s Y+
= [1 ±/1-4lC]j2
Hence from (A.2)
(A.3)
r
r Y+
JY-
(l-Ky-l_y/- l dy
=a
Expanding the integrand and integrating term by term leads to the equation
r-l
~ " r-j-l
(A.lj.)
r I: (rj-l)(_l)JKJ I:
(r-~-l) (_l)i h. "+1(/1-4K)
~
~-J
j=o
i=o
Where
h (z)
= log (~)
l-z
h (z)
= 2- m m-l[(l+Z)m_(l_Z)m]
o
m
Note that for m > 0,
(A.5)
h (h-4K)
m
(_l)j(m-l- j ) Kj ]Jl-4K
o,:Sj
j
= [m- l I:
:s
=
rb -m
(m-l)/2
(ll-hK)
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
i
I
I
I
•e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
•e
I
For r
= 2(1) 9,
the left hand side of (A.4) is shown in Table A.l below•
Table A.l
r
1 + ,.I"f.:4'K
- log =---..;;;;...~.;;,;;...
vl-4K
=
1-
2
1
2K
3
1 ... 8K
6K
1 +
4
1 ... 166
5
3
7
12K(l + K)
26K
K + 128
3
~
20K(1 + 3K)
1
2
~
1 + 97K + 2a:;·If
6
x
../l-hK
30K ( 1 + 6 K + 2K )
i!
+ 759 K + 3558
+ 1021~
555
8
1 + llQ.S K + 8654 ~ + 7492
555
9
1+ 10618 K + 125634
35
35
~
42K (1 + 10K + lO~)
56K (1 + 15( + 30~ + 50)
72K(1+21K+7o~ ..;- 3;K)
~
+ 218044 I2 + 32768 K4
35
35
(For exa.rqple for r
=
3, (1+8K) .;r::riK -
6K log 1 +
"'l-4K
1 -
)1
=a. )
-hK
The calculations rapidly become more complicated as r increases.
It
therefore is desirable to search for some approximation to K(O:,r) which will
give useful results for r large ( and preferably for r
empirical forIml1ae have been given in Section 3.
approach, starting from equation
CA.3).
~
10).
Some
Here we use an analytical
We first make a succession of
transformations, aimed at obtaining an integrand for which useful bounds can
be set •
Firstly, putting y
= z.JK
ii
l
Y
+ (l_Ky-l_y)r-l
dy
=
JK
l
Y_·
A(K) (l-IK (z -1 + z) )r-l da
l/A(K)
where A(K) =(1/2) (l-+Jr:IiK)/JK •
= et
Next making the transformation z
the integral becomes
/K flOg A(K) et (l-ff (et + e -t) )r-l dt
-log A(K)
=JK" flOg A(K)
-log A(K)
e -t (l-IK (et + e -t) )r-l dt
= 2JK
(A.6)
flOg A(K)(1-2/K"cosh t{-\OSh t dt.
o
Now making the transformation v
1
(A. 7)
(l_v)r-l
2JK
1
= 2IKcoSh
(l - 4K) -1/2 v
t, we obtain
dv.
Integrating by parts, this is equal to
(A.8)
(1_v)r-2(v2 - 4K)1/2 dv.
(r-l) f l
Thus equation (A.3) can be written
12JK
1
(v2 _ 4K)1/2 (1_v)r-2 dv= ex
~~ing the final transformation v
we obtain
(A.9)
r(r-l)(1-2JK)r- 1/2
= 2~
1
1
+ (1-2JK:)u
((l-2JK)u2 +
2JK
u ) 1/2
o
, (1_U)r-2 du
= ex
Since
12/K ru < (1-2Jif")U
2
+ 2lKur!
~jl-21K u ~21K,J;"'
it follows that
(A. 10)
::;
(1_2./K/-l/2k
tJ1f
r(r+l)
r(r+l/2)
<
-
ex
(1_21K)r-1/2[~1_ 2.fTs{ + 2JK: ~ r(r+l) ]
r(r+l/2)
iii
-.
I
I
I
I
I
I
_I
2/K
r(r-l)
I
I
I
I
I
I
I
I
I
-.
I
I
I
.I
I
I·
I
I
I
Ie
I
I
I
I
I
I
I
As can be deduced by direct analysis, K--1
as M
00 J but since
(1-21K")r:s a
it
f~llows
l r
that K ~ 1/4(1- a / )2
and hence Kr 2 cannot tend to zero.
If we put K
=
Cr -2 (where C is, of course a function of r and a )
the~ a~proximately
e -2JC.j;(f2 C1/ 4
(A.ll)
This i:r.rplies that C lies between fixed limits, and suggests that, for
-2.
large r, K is of the form C r
( The form of function - C ( r+D )-2 1
1
used as an approximation to K in Section 3 was suggested by this analysis.)
An alternative, heuristic approach is as follows:
If H be valid, Yl(l-y ) is distributed as uv/(u + v + w)2 where
r
O
u, v and '1;1 are independent
random variables with 2, 2, 2(r-l) degrees
l
of freedom respectively.
If r is large
pr~u+~:,,,)2 > ~] ~
Pr[uv > 4c]
(since w/[2(r-l)] - 1 as r ~ 00).
Hence we have K - Cr-
the equation
J
oo
1
'2
exp(- ~ - 2C/u) du
=a .
o
.I
0
iv
2
where C satisfies
I
I
Appendix II
The joint probability density function of Yl and Yr ' when Hs ,s is
o r
valid, is
(r-l-s -I- S )~
o
r
(A. 12)
= s o ~(r-2)~ s r :
Hence
(r+s
(A.13)
=
o
-I- S
r
)
~
s o nr-2)~s r !
s
s
Noting that (l-y ) r = ( (l-y ) - (y -y»)
r
1
r
1
r we see that the integral in
(A.~)
is equal to
s
.
Y+ s
r
sr
.
sr-J
1
r 2
(r+j-l)- (l-K/yl-y l ) - dyl
Yl 0 ~ (. )(-l)J(l-Yl)
Y
j=o J
- y
s
S
s -j
+ s
r
r
j
r
1 r-2 r 2
."
2 .
=
Yl 0 ~ (j )(-1) (l-y .)
(r+j-l)- ~ (~)(-l)~rYJ-~(l-Yl)r- -~ dY
l
1
y.
j=o
i=o
~
-
1
l
sr r-2
(A. 14 )
=
~
~
'l
s
. r
2
(-l)j+~(j )(r~ )(i-l-j-l)-~
~
j=o i=o
Y+ '
s +r-2-i-j
Y s-~(l_Y ) r
dY l
Y 1
1
Using (A.5) this can be expressed explicitly in terms of K.
The reSUlting
formula is rather cumbersome, and does not give much insight into the
dependence
of power on s 0 and s.
.
r
The following alternative approach,
although it depends on some quite rough approximations, should give a
reasonably accurate idea of the nature of this dependence, \4hen r is
large compared with s
o
-,
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-,
and s •
r
v
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
From (17) it follows that
(A.15)
K (-log(Yl(l-Y )))
m
r
= (_l)m[ y(m-l)(s +1) + y(m-l)(r) +y(m-l)(s +1)
a
r
- 3m y(m-l)(s
+ s
a
+ r + 2) ]
r
Using the approximate formula (20) we obtain
s
o -1
27 - I: j
j=l
s
r
- I: j
j=l
-1
-log(r-l/2)
+ 3 log(s +s +r+3/2)
o r
and, for m > 2
co
co
(m-l)! [I:
j-m + I:
j-m
j=s o +1
.23 1
J r+
+(m_l)(r_l/2)m-l)-1
- 3m (m-l)(s
o
+ s ~"+5/2)m-l)-1
r
If r is large; then for the smaller values of m( > 2)
co
co
(A.16.3)
. (m-l)! [I:
j-m+ I:
j-m]
~=s +1
j =s o +1
tJ
r
m(-log (Yl(l-yr )})
K
Note that r does nat appear in this approximation.
In particular, taking m = 2
co
•
(A.17)
I:
co
j -2 +
. j=s +1
o
I:
j -2
j=s +1
r
The variance decreases as sand/or s increases.
o
r
The expected value
(K ) also decreases.
l
A further approximation to (A.16.3) gives
(A.IB.l)
K (-lag (yl(l-y )));
m
r
•
and in particular
vi
(m-2)! [(s
a
+ 1/2)-(m-l)+(s +1/2)-(m-l)]
r
I
_.
(A.18.2)
If s
o
= s r = s,
formula (A.IS.l) becomes
2(m-2)~ (S+1/2)-(m-l)
K (-log(Yl(l-y »))
m
r
while (A.16.1) becomes
~ j-l_lOg(r_l/2) + 3 log(2s+r+3/2)
j=l
If r is large this last equation may be replaced by
s
(A.19.3) Kl(-log(Yl(l-y») ~ 2., -2 1: j-l + 2 log(r+3s+5/2)
r.
j=l
If s increases to s+l, K, decreases by approximately
(A. 19.2)
Kl(-log(Yl(l-y»))
•
2j-2
2(S+1)-1 -6(r+3 s +5/2)-1
It is not suggested that it vnll always be appropriate to use these approximations,particularly those appearing later, which depend heavily on r being
large compared with s
o
and s.
r
The approximations are exhibited because
they bring out rather clearly the way the distribution of -log(Yl(l-Yr»)
depends on s
o
and s •
r
I
I
I
I
I
I
I
el
I
I,
I
I
I
I
I
-.
vii
I
© Copyright 2026 Paperzz