Johnson, N.L.; (1971)Inferences on sample size: sequences of samples."

*
This research was partially supported by the Army Research Office,
Durham, under Grant No. DAHC04-?1-C-0042.
INFERENCES ON SAMPLE SIZE: SEQUENCES OF SAMPLES*
by
N. L. Johnson
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 784
No ve.mb vz., 1971
"-
Inferences on Sample Size: Sequences of Samples
By
N. L. JOHNSON
University of North Carolina at Chapel Hill
1.
Introduction
In earlier papers [1]-(3], various aspects of analysis of possibly incom-
plete random samples have been discussed.
from a single random sample only.
These analyses all apply to data
The present paper describes some extensions
of these methods when sets of samples are available.
When more than one sample is available, the field of
hypothes~s.
. nate to that of having complete oamples, becomes much richer.
alter-
Some of the
more interesting possible situations are discussed, though no exhaustive general
theory is developed.
A secondary aim of this paper is to lay foundations for later extension
. of the methods to cases when the analytical forms of the distribution(s) of
observed random variable(s) are not completely known.
In such cases it is
almoat essential to have a number of samples; useful results can hardly be
expected from a single sample (even if it is quite large).
Techniques for
such problems are not developed in the present paper, but knowledge of methods
appropriate when population distribution is known is an essential preliminary
to
development of such techniques.
2. Notation and PreliminarY Formulae
As in the earlier papers. it will be supposed that observed values of
independent continuous random
.e
function
r
1
f(t)
v~ri3.bl~8
are being used.
ordered values
X
SX
1l
i2
with a common.(pop1l1ation) density
Tne 1-th eample
S••• SX
iri
•
(1-1,2, ••• ,m)
eompr1ses
Such censoring as may have occurred i.
2
e
supposed limited to censoring of extreme values, in which the
least and
iO
Siri greatest values of an original. complete sample of size n • r + 8
i
10
i
have been omitted, leaving the r
+8
observed values.
i
i ri
.
8
The (ordered) probability integral transforms
X
ij
(1)
f__
Yij •
f(t)dt
have the joint density function
11
IT
(2)
-(r +8 +Sir1) I
i 1o
i-l _sio l sir !
i
The symbols
1f/(x)
will denote the diga:3& function of argtunent x,
vex) - ~
(log r(x»(-
Successive further derivatives,
(1)
:x
(x).,
log (z-l)I)
(2)
(x)....
are the triSamm&,
tetragamma . • •• functions.
3. Estimation of Sample Size
In [1]. problems of estiQ3t1cn of total size of a random sample. given
least (or greatest) values observed in the sample, were discussed.
the
r
Bar~
the.. r.sults are exteneled to the case
wh~
it is known that
11
eemples
3
all have the same original size.
~
n.
but only the least
are recorded in the first. second....
r i .r2 ••••• r
values
m
In the
m-th samples respectively.
notation of Section 2. this means that
iO - 0;
sir - n-ri •
i
From the joint likelihood function of the ordered X's
,-!
n
l(~ln) • l(Xll•••• 'X
(4)
8
(~
r-r.,
In).
mrm
i-l
n-r
n ri
(l-Y ir )
i
}I
ri
i
r-r
j-l
.
f(X ij )
we seek to obtain a maximum likelihood estimator of n.
Regarding n
as continuously variable. we obtain the equation
m lJI~ft+l) -.
(5)
m
r 1ft(ft-ri +l) i-l
m
loa[
r-r (l-Y ir »)
i-l
for the maximum likelihood estimator ft.
i
approximate value of
An
n
can be
ob tuned by making
which gives
(6)
equals 1 (whi~ has probability zero) equation (6) has a
. Provided no Y
iri
1
unique root greater than max(rp .'" rm) - 2" The appropriate integer value
for
1
n is that between (n - 2)
and
(ft
1
+ 2).
(If these are integers, either
can be used.)
If
(5) ,
r.1 • r 2 - ••• - r m'
1ft(&
i.e.
+ 1) -
then (5) becomes
~(n -
.
r-l
r + 1) - m-1 log [
r (ft j-O
1
j) - - m-
n
1
j-l
(1 - Yi )]
nj-l (l -
r
III
log [
Yir) ] •
4
In this case, (6) becomes
(6) ,
which, for
Ill·
gives
A.
(6) "
n "r
The
D
1,
y-1
1r -
1
2' •
Cramer-Rao lower bound for the variance of an unbiased estimator of
1s
(7)
For
••. • r III • r,
this is
(7) ,
Y
ir1 has a beta distribution with parameters
(8.1)
and
(8.2)
From
(9.1)
and
(9.2)
(~)"
we see that, for
E[a] ••
r(r-1
m· 1
)-1
1
n - 2'
5
From ( 9.1 ) we see that there is a bias of about
that the true value of E[n]
1
(r-l) -1 n - 2
.
(Note
cannot differ from (9.1) by more than 1).
Table 1 contains approximate values of the variance and mean square
error of
1\
as given by (6)". and also values of the Cram6r-Rao
lower bound
(from(7) with m-l).
Table 1: Aperoximate Variance and Mean Square Error
of n. and Cramer-Rao Lower Bounds
m· 1
(Approximate)
r
n
4
4
6
8
10'
12
15
Var(n)
3.56
16.00
35.56
62.22
96.00
160.00
M.S.E.(A)
4.25
18.25
40.25
70.25
108.25
180.25
(Cramer-Rao Lower
Bound) X m
Efficiency (X) of
N(m-1•••• ,m-1)
0.7024
4.1427
33
46.
9.6329
17.1295
26.6279
44.6267
48
49
50
50
45
2.16
8.64
18.00
30.24
54.00
2.65
9.85
20.25
33.85
60.25
0.6705
3.6046
7.9267
13.5892
24.5866
1.74
6.67
2.15
49
12
15
13.06
15.53
0.6547
3.3359
7.0739
26.12
28.82
14.5893
73
10
1.54
1.91
12
15
5.5
13.89
12
12
15
15
15
6
6
8
10
12
15
8
8
10
10
7.53
60
63
65
65
67
71
6.:!5
15.25
0.6453
3.1748
8.5595
52
11
76
1.43
7.14
1.78
7.89·
0.6390
4.5594
53
76
.1.32
1.65
0.6327
55
In view of the above results it seems worthwhile to seek some alternative
estimator for n.
6
is an unbaised estimator of n with variance
(10)
is an unbiased estimator of n.
ai
proportional to
(ri -2) (n...ri+l)
to calculate this value of
take a i
The variance of N(·)
proportional to
ai •
...1
•
is minimized by taking
As n is not known. it 1s not possible
For a first approximation it is reasonable to
r i ... 2.
or even just to take
r l • r 2 • ••• • r m).
Table 2 gives some numerical comparisons between
(which i8. of course. optimal if
and
(12)
». nm...2
var(N(.-1......-1
m
r (n-r +1) (r ...2) -1
i
1
L
1-1
m
r
• n(n-1)
(r ... 2)-1 ... ~
2
m
1-1 i
•
••• • aII •
-1
II
7
Table 2: Variances of (a) N(al""'~)
(b)
m :I 2 r
1
9
2
4
4
6
6
8
10
8
6
8
10
6
5
6
7
8
m- 3
-e
m-4
e
r
r3
-
10
5
4
7
6
10
8
4
6
6
8
8
6
6
6
6
8
10
8
8
10
10
5
4
7
6
10
8
10
8
8
6
8
(b)
-
8
10
(a)
2
• 2-0.7083n
•
0.2n -0.72n
0.2083n
• 2-O.72n
•
0.16n
0.1875n2-0.6875n
• 2
O.ln -0. 6173n
0.1125n2-0.6125n
• 2-0.60416n
•
O.ln2-0.62n
0.10416n
O. 0769n2-0. 57990
0.071402-0.5774n
2
0.0714n -0.58160
0.07290 2-0.57290
,
..
•
0.125n2
0.6250
• 2
•
0.0830
0.5830
0.0625n2 - 0.5625n
5
5
1
7
10
10
10
5
5
7
1
10
10
10
10
10
6
8
r4
N(m-1 .....m-1)
0.125n2-0.4687S0
0.1296n2-0.4630n
0.1428n 2-0.4898n
0.1482n2-0.4815n
0.0714n 2-0.4082n
0.0722n 2-0.4056n
0.01690 2-0.41420
0.0718n 2-0.41110
0.045502-0.3843n
0.0463n2-0.3796n
0.050en2-o.3900n
0.05090 2-0.3843n
0.055602-0.4136n
0.0602n2-0.3935n
..
•
0.4167n
0.06330 2
2
0.3839n
0.OS56n
2
0.C4110
0.3150n
4
8
4
4
1
6
10
8
8
8
6
6
8
10 10
-
4
4
6
6
8
8
6
6
6
6
8
10
0.0909n2-0.34710
0.011102-0.3703n
0.0526n2-0.30470
2
0.05880 -0.3114n
2
0.0333n -0.28670
2
0.0385n -0.2929n
0.0385n2-0.3047n
0.0417n2-0.3056n
A.045Sn2-0.3182n
•
0.0625n2
0.041702
5 2
0.0312 n -
-
-
2 5n
0.09375
n -0.3437
0.1146n2-0.3646n
0.0561n 2-0.3031n
0.0594n 2-0.3094n
0.0339n 2-0.2839n
0.0391n2-0.2891n
0.0375n2-0.28750
0.0443n2-0.29430
0.0495n2-0.2995n#
0.3125n
0.2917n
0.28125n
8
It can be seen that little is lost by using
e
for the mnount of variation in values of
r
column of Table 1 gives the efficiency of
Cramer-Rao lover bound. in cases when
r
l
N(m- 1 ,. ...m- I ).
shown in the table.
The last
Nem-1 ••••• m-1 ). relative to the
• r 2 • ••• • r • r.
m
We note that in the case of symmetrical censoring with
r m • r.
at any rate
r
l
• r
2
• .,. •
the maximum likelihood estimator of
n· r+2s (sO • sr • s).
n
satisfies the equation
The statistic
. is an unbiased estimator of
n.
It haa variance
-1
-1
nm (r-3) (n-rt2).
°e
3. Tests of Sample S1z,
If we wish to test the hypothesis that the available data represent the
whole of the original samples, and still to confine ourselves to situations
(n .u •••• -n DU).
m
l 2
where the original sample sizes are all the same
need consider only caees for which
r -r .....-r •
l 2
m
smaller than others then (under the condition
n
l
For if some
r's
then we
are
• n 2 • ••• • nm • n)
the
corresponding samples must be incomplete and there is no need for a test.
It is shown in [2] that, for a single sample. a test with critical region
. of form
is uniformly most powerful with respect to all alternatives to the hypothesis
So • sr·· O.
.~
for which
8 /8
0
i8 the same for all samples
(n • r+SO+Sr)
r
• 8.
If the number of available oboervations
(r ·r - ••••r m·r)
l 2
is also the same then
and the complste sample size
9
is uniformly most powerful with respect to all alternatives for which
As particular cases we have (i) cerworing
fron
beZOlJ)~
for which
.O/sr.
sr • 0
and
the critical region 1s of form
and (1i) syrmtetnoa't OBnBoJ"i,ng~ for which
'.
.0· sr t
and the critical resion
18 of form
m
ni 1 [Yil (1-Yi r )] > Ca •
N
Of course OfmSozeing f%v;m above
can be treated by similar methods
(sO-o)
to those appropriate to censoring from below.
The values of
C
a
have to be chosen to give the required sisn1ficanee
level t in each case.
In the subsequent discussion we will consider a rather more general
situation in which the hypothesis tested is that the complete sample size is
though this i. no
.
longer the only case of intere~t.
to taking
nO
equal to
The hypotheais of "completeness" corresponds
r.
3.1 Censor1ngfrom Below
From (2) t puttiq
ratio of
n· n'
siO· n-r
against
n·
Do
and
is
8
ir
• 0 'we see that the likelihood
i
"
o.
10
- constant. )(
So a test with critical region
(13)
is uniformly most powerful with respect to the set of alternatives hypotheses
11 > nO'
given thia kind of censoring.
all equal(provided of course nO
~
This ia ao even if the
max(rl•••••
ri'a
are not
r.».
Each Yi1 haa a beta distribution with parameters n-r+l. r.
III
•
The dis-
.
tributlo11 of
TT Yl1
i-1
ls complicated. but a useful approximation may be
III
cODs~ructed by considering the distribution of G· -2 10g.CI:[Yi ) -
-2
I
i-1
log Yr
The cumulant generating function of -10geY1
is
G
-'l' log• Yi
(14)
10&e E[e .
]
J • loge E[Y-'l'
1
- 10&e r(n-r+l-T) - loge r(n+l-'l')
Hence the s-th cumulant of -log. Yi
1s
~.<-108eYi) • <_1)s[,(s-1)Cn-r+l) - .(8-1) (n+1)]
(15)
r-l
- (a-1)1
I
j-O
(n-j)-~
r-l
So -2 logeYi
independent
'e
ia dUtr1buted..
I
j-O
x~
variables, and
(n-j)-lwij
where WiO '" ""i,r-1 are
11
m
(16)
G - -2
I
i-l
r-l
r
i-l j-o
III
I
Wj
where
-1
(n-j)
are independent
is distributed as
loge Yi
wij
2
X211l
So, to tost tho hypothcsis
-1
r-l
I
•
(n-j)
j-O
wj
variables.
n· DO (against alternatives n
> DO)
we
use the critical region
m
-2
I
i-l
log Y < C
e i
u
where
(17)
.e
Pr[
r-l
I
j-O
(no·3)
-1
wj
< C ] - u
a
(Note that it 18 the lower tail of the G-distribution which gives 8ignificance.)
It 18 pos8ible to give explicit formulae for the probability in (17) (sea
Appendix I). since each
W
j
is distributed
88
a
2
X
with an even number of
degrees of freedom, but except for unrealistically . . .11 value8 of
r
and
m,
these would not be useful for purposes of calculation. Useful approxiDiations
~
~
(at least for ~2) can be achieved by reaarding j!O (Do-j)
"j as
approximately equivalent to
c~,
with
c
and
v
choeen to give the correct
first and second .o.ents, i. e.
(18.1)
(18.2)
Approximate values of the power can be obtained by replacing
'e-
lor
Dl·
1,
exact values are euily calculated. as shown in [3].
mation would be expected to improve..
m increases (in that the
nO
by
n.
'!'he approxi"j I a,
and
12
also the approximation, both become more nearly normal).
~
would also be expected, for given
coefficients
(n_j)-l
r
and m,
are in ratios closer to
as
1.
n
Better approximation
increases, because the
Investigations summarized
in Appendix II confirm these expectations.
3.2 Symmetrical Censoring
The
first part of discussion follows exactly similar lines to that in
Section 3.1, and i8 therefore condensed.
The critical region
(19)
chosen 80 that
(20)
.~
gives a test of the hypothesis
n. nO which is uniformly most powerful witJ:1
respect to alternatives
n > nO'
also is true even if the
r1' s
From (3), with
r 1 • r,
given that cE:n8oring is symmetrical.
are not all equal. provided nO
1
810 • s1r • I'n-r),
generating function of -loSe[Yil(1-Y1r)]
we obtain the cumulant
as
n-r +l-t)-log r(----2
n-r +l)]-[log r(n+1-2t)-log r(n+l )] •
2[10g e r(----2
e
e
Q
Hence. 1f
G • -2
I
i-I
log [Yil(1-Y1 )]
e
r
(22)
1
Since I(n-r)
~
max(rl' ••• ' r.>.
(21)
til
.
~
This
must be an integer
and (22) can be written
13
(22) ,
Although we do not have a simple representation, as in section 3.1, it
seems reasonable to approximate the distribution of
c • 21 K2 (G)[K 1 (G») -1 ;
(23)
G by that of
CX
2
v with
1
v· 2(K 1 (G)] 2
[K 2(G»)
•
3.3 General Purpose Tests
If the value of
e(-sO/sr)
powerful test of sample size.
Y
1
with
Ica (2,r-l)
is not known, we do not have a uniformly most
In [2] a test of completeness with critical region
+ (l-Y r ) > Ca
- I-a,
has been propooed, for the single sample case.
This
test was derived on heuristic arguments, but has bean shown [2 ] to have pro-
e
perties rendering it a useful "general purpose" test when
-_
is not known.
:Put
(1 • 1,2.... ,m).
The density function of
Vi
is
and 80 t,a have the likelihood ratio
l(V1 , ••• ,vm!n'). i(2+no-r,r-~1
l(vl, ••• ,vmlno)
A(2+n'-r,r- 111
So a uniformly most powerful test of the hypothesis
are to be used) &Gainst the 8et of alternatives
--
the cri tical region
n·
n > nO'
110
(if only
V1'·.·.V.
is obtained by using
14
m
(24)
nVi> C
1-1
0
m
with
Pr[
IT Vi
colnol •
>
i-I
a.
r ,r2 , ••• ,rm of
l
observations available in the different samples, and we now give some formulae
Again this is so even when there are different numbers
appropriate to this more general case.
The value of
depends on DO'.' rl, •••• r m•
Co
In order to develop use-
m
ful approximations we use the criterion
G - -2
I
i-I
log Vi.
e
Bence the 8-th cumulant of G i8
'e
(25)
The distribution of
G i8 that of
r -2
r
m
i
I
(26)
(n-j)
-1
W
ij
i-I juO
where the W' 8
2
X2. varbhlea.
are 1ndependent
(26) can also be expressed as
&-2
I
. (26)'
(n-j)-
1
j-O
I
i
where . R • max(r .r ,. •• , r );
l
which
.e
r
i
number of
~
j+2.
r i ••
2
The
m
Wj's
(j)
W
ij
•
R-2
I
-1
(l1-j)
j-o
and
r(j) denotes summation over all
i
2
are independent X
variables, with
2mj
greater than or equal to
(j+2) •
If r l -r2-· ..-rm-r. then (26) I becomes
r-2
I (n-j)-lwj
(27)
j-O
wj
i
for
mj •
15
X22m variables. (Compare (16).)
As in section 3.1. the distribution of G may be approximated by that
4It
2
of
cXv with, in this case
r-2
r-2
c •
(n_j)-2][ I (n_j)-1 1-1
(28.1)
j-O
j-O
[l
(28.2)
v - 2.[
-1 2 r-2
-2 -1
r
(n-j) ] [r (n-j) 1 •
j-O
jeO
r-2
Variation in accuracy with •
and nri11 be exactly similar to that in
Section 3.1•
•4. Some Numerical Comparisons
Table 3 gives some values of -2 log CO.OS
(13), (19) and (24).
.e
for each of the three tests
Values in parentheses were calculated from approximations
by (1) 'using CX~ approximation and (ii) making an ad hoc correction based
on comparison between exact and approximate values in cases when the fonr.er
was calculatod.
The (exact) values for m. 1 (case (b» are taken from (3].
Table 3: Critical limits for {a) one-sided (b) symmetrical and
(c) seneral purpose tests (Values of -2 lOSe Co. 05)
~Os
r
m
4
1
2
3
10
4
1
2
3
4
.e.
'
(a)
1.281
3.821
6.734
(9.65)
2.703
(6.85)
(11.45)
(16.15)
(b)
4.435
(10.66)
(17.40)
(24.40)
7.11S
(16.50)
(25.55)
(36.80)
.,
(c)
0.572
1.839
3.318
(4.90)
1.862
(4.72)
(7.79)
(11.00)
Table 4 gives powers of these tests, with a· 0.05, with respect to
altemative hypotheses
n· r+2, 1+6. r+10,
Values in parentheses were
16
obtained by using
~
cx2v
approximation, with the
Ca
values corresponding to
Table 3.
(In Appendix II there is some evidence indicating that as n
2
increases. the cx'V approximation rapidly increases in accuracy.) For the
"one-sided" and "sYtll1lletrical" tests the "best" forms of alternatives are
assumed, i.e ••
sO· 0, sr • n-r
for "one-sided",
sO· sr • ';-(n-r)
'~symmetrical".
For the "general purpose" tests, power depends only on
for
(SO+sr) (-n-:-r).
Table 4:
Power of tests (a). (b) and ec)
(5% Significance Levell
Power of (a)
r
n
4
6
10
m-
14.
10
1
2
.294
.780
.955
.557
.989
.364
.907
12
16
20
.988
'e
*
3
4
(.749)
(.872)
(*)
(*)
(tit)
(*)
(.636)
(.829)
(.923)
(*)
(*)
(*)
(*)
(tit)
(*)
Power of (b)
r
n
4
6
m-
.206
.594
.841
10
14
10
1
(.269)
(.798)
(.982)
12
16
20
2
(.364)
3
(.522)
(.996)
(.933)
(*)
(tit)
(.442)
(.992)
(.664)
(n>
(*)
(*)
4
(.658)
(*)
(*)
(.795)
(*)
(*)
Power of (c)
r
n
4
10
.e
m-
1
2
3
4
6
10
.167
(.420)
.470
.296
.827
14
.716
.978
(.958)
(.999)
(.530)
(.991)
(*)
12
16
20
.238
.732
(.419)
(.547)
(.969)
(*)
(.996)
(* denotes "over
.9..9
.9995")
(*)
(.677)
(tit)
(*)
17
The figures in Table 4 exhibit the rapid increase in power with m.
~
the
number of samples in the sequence.
Such powers will
f(t)
is not known.
n0~
be attainable if the population density function
However. they do indicate the possibility that with a
sequence of moderate length. good power may be obtained even when
not completely known- for example when the form of f(t)
parameters have to be estimated.
f(t)
is
is known. but some
18
~
REFERENCES
[1]
Johnson. N. L. (1962)
Estimation of sample size.
Technomet~C8~
!. 59-
67.
[2] Johnson. N. L. (1970)
sample values.
A general purpose test of censoring of extreme
Essays
MemoPia~ Vo~ume).
-~
in P2'ObabiUty and
Chapel Hill:
Stati8tics~
(S.N. Roy
University of North Carolina Press.
19
Appendix I
TIle results obtained below are not original, but the derivations ure given
to assist comprehension.
The symbols
variables distributed as
X2h'
2
{W(h)}
j
The symbols
will denote independent random
{aj }
denote positive constants.'
TIle characteristic function of
is
where
0_
k
k
n
l b
j-l j
Putting
t -
(2a" i)
-1
(1-2a it) - 1
u
gives
IT
k
(A. 1)
bj •
provided no two
k
identity
of
u';j
k
j~l
a fa
j
1
(l-aU/aj )-
are equal.
b j - 1.
bj'S
must be negative
Hence, for
y >
0
(A.2)
0_
where
b
j
is given by (A.1).
t · 0, we obtain the
Y is distributed as a for.mat mixture
1
2
a X with weights b
(j-l, ••• ,k). (Some
j
j 2
It follows that
variables distributed as
of the
Note that, putting
(i£
k > 1).)
20
We next consider
which may be regarded as the sum of two independent random variables. each
distributed as 1 ,
From the mixture representation (A.2) we see that the
distribution of 1
1s also a formal mixture. as set out in the following
1
2
table:
Distribution
(2) _
2
a W
(::a x )
j
j 4
j
a
0_
(1)
(1)
W +aj,W ,
j
j
j
Weight
b
2b
j
Again using (A.2). the distribution of
mixture of
with weight
CA. 3)
wi th weigh t·
Hence. for
y > 0
(A. 4)
We now briefly consider
y
•
3
r
j-l
a w(3)
j
j
2
j
bjl
(j < j')
is a formal
21
which has the formal mixture distribution set out below:
(j r/ j')
(j < j ' < j ") •
To obtain a representation of the distribution of
note that
w(l). a w(l) + ( wei) +
w(l»
(2) +
a jW
j
aj'j'
jjl
a jj2
aj t j l
(1)
(a Wj
j
The distribution of
that
"-
(a w;2)
.1
+
~s
ajtwj;»
a
j
+
a
j
t
(1)
Wj t
distributed
2
X4 with weight
2
a X with weight
j 2
(A.5)
)
2
a j 'X 2 with weight
can be obtained from (A. 3) •
as
a mixture of
(l-a ,/a )
.1
(l-aj/aj ,)
Similar formulae can be obtained for any
11
L
j-l
a W(m)
j.1
-1
(l-a !a ,)
j
j
(A.6)
k
-1
.1
(l-aj, /a )
After some manipulation we find that for y > 0
Y •
•
-2
.
j
-1
We find
22
The
length
of. th~
formulA increases qUi te rapidly wi th
rn.
In the particular case (16). which can be written. in our present notation
CA. 7)
we obtain from (A.I). putting k·
b
(A. 8)
•
j
IT
(1 t;j
I'
and a • (n-j+l) -1 •
j
n-j+~) -1.
n-t+l
IT
(n-t+l)
t,rj
j-t
• (_l)r+j(n)(r)
,
I' j
n-j+l
For m· l(and y>O),
(j.l.2..... r)
from (A.2)
(A.9)
For m • 2. from (A.4)
2
n 2 I' -~(n-j+l)y r 2
(r) j!l e
[(j) (n-3+l) {l+~(n-j+l)y}
(A. 10)
r
+ 2(-l)j(;>j(n-j +l)-1
(-1)3' (;,)j'(j-r)-1 1
j¥j
Some particular cases
(used
in calculating Tables 3
(Note that G in (27) is obtained from (16) by Changing
r
to
(r-l).)
r
n
4
4
(Sy_ 128)e-Y/2+36(y_1)e-Y+8(3Y+8)e-3Y/2+(2~)e-2Y
6
200 (3y-2) e- 3y / 2+2025 (2y-3) e-2Y+648(5y+12) e- 4y / 2+100 (3y+23)e- 3y
10
3
Pr[Y2 > y]
3
3
210 2)([ (~ _ li:~)e-7y/2+(9y _ ~3~e-4Y+( 8y1 1;.0)e-9Y/2+(~ 14~~)e-5Yl
4
-lly/2+(6 -11) -6Y+l72 ,2016) -13y/2+(4 .628) -7y]
1')012 x [(8l1Y_ 1888)
363 e y e
'W 169 e
'j'Y 147 e
36 (y-5) e- Y+32 (3y+2)e- 3y / 2+9 (2y+13)e-2y
6
225 (2y-l1) e-2Y+288(5y+2)e-5y/2+100 (3y+l?)e- 3Y
14
"-
are set out below.
and 4)
10
y
1202x{ (~6Y- 2~~)e -4Y+(2y4)e-9Y/2+(~"W1i~~)e-5 ]
14
-6y+(18Y + 36.) -13y/2+,,9 Y + 387) -7y]
364 2x [(3
if_12.)
16 e
13
169 e
\28
196 e
23
For m· 3,
(A.ll)
In particular, for
And for
r · 3, n • 4
r . 4, n • 4
24
Appendix II
k
Using .the notation of Appendix
is
d~stributed
~
exactly as
2
X '
2m
c and
v
if
al=a2••••• ~.a then Y • a
m
For general values of
to obtain a useful approximation by
with
I~
l
w~m)
j-l J
{a } we might hope
j
Ym to be distributed as c ~~
suppos1n~
chosen to make first and second moments agree.
That is
k
r
cv • E[Y ] -2m
8 ;
m
j-l j
or. equivalently
k
c -
I
2 k
ajl
j-l
l
j-l
k
2 k 2
.aj;m-lv - 2( raj) /
j-1
r
aj •
j-l
Approximations of this kind have been used quite widely with satisfactory
results ([4 ][ 5] etc.).
In order to check how suitable the approximation is in our particular case
some numerical comparisons are presented here.
k
For sums of the form Y - jEl(n-j+l) -lw~m). with n
m
equal to k
As
n
-
J'
an integer at least
the least accurate approximation would be expected when n. k.
increases, so that the ratios
distribution should become closer to a
n:
(n-l): ••• : (n-k+l)
c
exact and approximate values of Pr[Ym>y]
xv2
distribution.
approach 1, the
Table A.l contains
for k-5 with .1,2
and n-5,8,lO
to exemplify this point.
The exact formulae are
n • 5:
"-
y 2
Pr[YI>y] - 1 - (1_e- / )5
Pr[Y2>Y] • (2;y_ 4~5)e-Y/2+(lOOy_ 7~O)e-Y+(150Y+100)e-3Y/2
+ (5Oy.5~5)e-2Y + (~ 1~I)e-5Y/2
25
n • 8:
e
Pr[Y >y]
1
D
70e- 2y -224e- 5y / 2 + 280e- 3y - 160e- 7y / 2 + 35e- 4y
Pr[Y >y] • 56 2 [(2i y 2
1~~5)e-2Y+(40y
_ 3;2)e-5y /2+(75y+25)e-3y
+ 2575) -4y]
+ (200 + 15200) -7y/2 + (25
~
147 e
16 y
192 e
n • 10:
Pr[Yl>y]. 2l0e-3y - 720e- 7y / 2 + 945e-4y - 560e-9y / 2 + 126e-5y
P [Y > ] • 252 2 [(25. _ 50) -3y+(200 _ .12800) -7y/2+(225 _ 225) -4y
r 2 y
12 Y
3 e
-:rt
147 e
I;'Y
16 e
+ (2~0y+ 6:~0)e-9Y/2+(~ + 3;)e-Sy]
For m· 3.
and n· 5.
Prey > ] • (.125 2 _ 2625 + 98500 ) -f1-(500 2 _ 40001 + 68000 ) -y
3 y
\B"7
8 y
48
e
y
3
e
+ (1125y2 + 15001 + 34750)e-3y/2_(250y2+2750y + ~i25 )e- 2y
+ §.ill ) -5y/2 •
+ (25SY 2 + 645
8-Y· 12
e
For calculations of approximate values (based on
the following values were used:
-e
n • 5
c
0.6410
7. 124m
8
0.lS80
9.4l0m
10
0.1332
9.692m
'V
c
x:
distributions)
26
e
Table A.l: Comparison of Exact and Approximate Values of
Pre
n •
Ill·
1
"e
Exact
Approx.
0.5
1
1.5
2
.9995
.991
.959
.899
J
.717
4
5
6
.511
.348
.225
.142
.998
.982
.943
.882
.712
.526
.363
.238
.149
1
2
3
4
6
8
10
12
14
Exact
n • 5
5
y
7
m-2
5
r
(n_j+l)-lw~m)
j=l
J
{ Approx.
.9997
.995
.973
.828
.9990
.991
.964
.821
.580
.344
.587
.356
.182
.088
.188
.089
4
.9998
.9994
6
.992
.988
8
.942
.934
The improvement in accuracy with
>
y]
n • 8
Exact
Approx.
.983
.982
.836
.575
.333
.080
.015
.002
.0004
.834
.576
.336
.080
.014
.002
.0003
.9992
.9990
.933
.648
.309
.031
.0016
.932
.649
10
.811
.809
n
12
.620
.626
n • 10
Exact
Approx.
.951
.649
.312
.118
.011
.0008
.993
.743
.279
.058
.0009
.311
.030
.0014
14
.421
.431
16
.259
.267
is marked. but with
18
.147
.151
Ill.
.950
.650
.312
.118
.011
.0007
.993
.743
.279
.051
.0008
20
.079
.078
less so.
This suggests that is might be worthwhile de""Oting special efforts to obtaining
exact values for significance limits. while relying on approximations for
evaluation of power. .
'e