Ighal Mohammad; (1956)On the classification statistic of Wald." (Navy Research)

ON THE CLASSIFICATION STATISTIC OF
l~TALD
by
Mohammad Iqbal
University of North Carolina
This research was supported in part by
the Office of Naval Research under Contract
No. NR-042031 for research in probability
and statistics at Chapel Hill. Reproduction
for any purpose of the United States Government is permitted.
Institute of Statistics
JvTimeograph Series No. 159
November 1956
ERRATA SHEET
Notation
4(12)
92(3) will mean page 92, line 3 •
Replace m = m by w = I m I
3
3
Replace Z by z .
6(11)
RePlace.//Nl+N2 by
iV(18)
V NI N2
2
- m3
13(2)
Replace
19(9)
Re ad N( e )
21(5)
Replace (1- ~
e
by (1 _ ~)2
21( 6)
Replace
a~
22(1)
~~
as
n
02
, g
'0 all
Read as
24(6)
0 by
~m2
2
- m3
~
0
•
•
n
by
28
2
23(1)
N
2
~
Q'
t'
From - ~n to the end of line 2, is to be enclosed in square
brackets.
h
Put) after ~.
n
27(!~)
Put t"./ between II and III .
29(18)
Replace f(x) by
f(x) and 0 :: c :: 00 by 0 ::: c <
48(7)
Insert a multiplier ~ on the right.
51(5)
Replace 16 by 64 • (and this corresponds to p = 2.)
62(9),113(7) Replace nm by Inm \
3
3
m31 ~
62(18), 92(3),115(7) Replace m by
3
69(1,5)
Replace 1¢(t)-¢(t)1
by
I¢(t) - ¢(t)l.
I
I
00
I
70(14), 88(16) Replace e- j bye- V
81(9)
Put dv: after Iem(v)
85(7)
Read 'yJ (v) = • 2
90(5)
Replace ~ by ~ •
103(5)
Read l: as 1: •
119(15)
T)
n
Replace
V>
nm3
by
_J.'X- 2
IV I
-1
•
>
A. 2
Inm31
•
2
2
135(11)
Replace e
by e
139(18)
Replace suffices by suffixes •
140(6),(10) Replace r by y.
141(6)
a ij
)
a ij
d atj
Replace -=r by
Read )(i as
Jtl
2a ij
and 145(7)
Read a;~
as aijz
i
•
•
\
AC KNa
~
LED GEM E N T S
I wish to put on recqrd my deep sense of gratitude to
Professor Harold Hotelling for his inspiring guidance throughout
the preparation of this work.
I feel myself greatly honored on
having had the privilege of working under his direction.
I am also greatly indebted to Professor R. C. nose for the
confidence derived through his encouragement at various stages.
My thanvs are also due to the Fulbright Foundation in Pa 1(istan,
the Institute of International Education and the Office of Naval Research for their financial assistance which made this study possible.
The help of Mrs. Kattsoff, Mrs. Spencer and Mrs. Kiley for the
careful typing of the manuscript is gratefully acknowledged.
Mohammad Iqbal
•
scientific investigation can be final;
it merely represents the most probable
conclusions which can be drawn from the
data at the disposal of the writer. A
wider range of facts or more refined
analysis, experiment and observation will
lead to new formulae and new theories.
This is the essence of scientific progress.1!
I®O
Karl Pearson 1898
•
iii
TABLE OF CONTENTS
PAGE
CHAPTER
ii
ACKNOWLEDGEMENT
...........
CONSIDERED BY WALD . . . .
INTRODUCTION
I.
A P~OBLEM OF CLASSIFICATION
...
1
1.
Introduction. • . .. • .
2.
Statement of the problem •.
3
3.
An example of its importance
5
4. The statistic proposed by ir/ald
5.
II.
vii
·.
5
Further work on the problem
ON AN ASYMPTOTIC
EVA,LU~TI01\T
1
9
OF /1 TRIPLE INTEGRAL.
12
1.
Introduction
12
2.
The integral and its domain
12
3.
Order of the variables
14
).+ •
An important limit
5.
A triple integral
6.
The integral over
7.
An upper bound to error
.··.
~,
8. The integral over D2
9.
ml , m2 and mj
.···
25
an asymptotic approximation
•
·
•
·
····
Comparison of
11.
The integral over the domain
12.
Summary of Chapter II
II
and
12
29
34
43
·
.. ··...
D*
....
• ·
···.
An upper bound to the value of 1 2
10.
18
•
48
51
53
5rr
iv
CHAPTER
III.
PAGE
ON THE ASYMPTOTIC DISTRIBUTION OF
TleN ST'TISTIC • • • •
•
•
•
•
0
T~ALDI
oJ
Q
S CL .'\SSIFIC:,-
60
•
1.
Introduction.
60
?
lATald I s approximate classification st.9tistic and
its moments
• • • • • •
• • • .
61
3.
The asymptotic distribution of
4.
5.
An
v for
p
= 2m
66
75
integral equation due to Wilks
A note on Bessel functions
76
6. Distribution of v for odd values of p
79
7. The use of a differential equation in the evaluation of an integral • • ••
8.
9.
10.
IV.
••••••.••••
81
The asymptotic distribution of v for even and
odd values of p
•••••••
• • ••
82
Note on the construction of tables •.
Summary of Chapter III
AN~SYl'1PTOTIC
. ,. .. . . . .
••.••••• . . • . •
88
91
SERIES EXP:\NSIOl\T FOR THE DISTRIBUTION OF
....
92
1.
Introduction ••
92
2.
An asymptotic series for the distribution
93
3. The constant of integration for the first approximation •
. • . • • • •
108
4.
The tail are8S for the first approximation
110
5.
Comparison with the results of Chapter III
113
6. Summary of Chapter IV
115
.
v
V.
THE APPLIC'TI0N OF TCHEBYCHEFF-M,iRKOFF INE0UALITIES
TO A
SPECT~L
CASE • •
1.
Introduction
2.
The integral over
3.
The integr al over
4.
The integral over
5.
Moments of
..
116
116
V •
D
l
D
2
lIS
·.
.·• ·
117
·... ···.
....···
..
D
119
·.
6. Some results due to Tschebycheff and Narkoff
119
120
7. tpplication of Tschebycheff-Markoff theorems to
this case
VI.
NON-NULL C:\SE
•
•
•
•
1.
Introduction
2.
The joint distribution
• •
. . . . . . . . . . ..
123
128
•
•
•
•
•
•
•
•
•
•
i
•
•
•
•
•
'"
It
•
•
5.
6.
~n
•
128
• • ;, • • • • • ••
129
···
130
...
f(~, m2 , m )
3
for large n and p ::: 1, an
asymptotic form of
Distribution of U
independent approach
•
·.
3. Note on confluent hyper geometric functions
4.
•
·..·
·····
The asymptotic mean and variance of the stC3tistic
..·..· ·····
U . . . . . .. . . .
132
133
137
7. Correction term for the variance of the linear
discriminant function
VII.
·
·.
·· ··
SOME REL:ITED UNSOLVED PRORLE11S
1.
On classification statistics of
2.
The quadratic discriminators
145
148
~rTald
Possibility of a differsnt approach
and Anderson
148
148
·.
150
vi
CH'PT~
P·:GE
4.
Efficiency. . . . • • •
150
5.
The gre ater me"ln vector
151
BT~LI00R~PFY
. . . . . . . . .
152
.
vii
INTRODUCTIONl
In his paper rtOn a statistical problem arising in the classification of an individual into one of two groups!!
~braham
Professor
L-50_72 ,
Wald Mude an attempt to put the theory of discrimi-
nant functicns on rigorous mathematical foundations.
by
usi~g
the late
He demonstrated
very ingenious geometrical arguments spread out over several
lemm,9s that a function V =
nm
! L-(1-~)(l-m2)-m~ _7
3
the classification statistic instead of
/NIN2
!
!~~
can be taken as
U, where
Nl +N 2
- ) is the usual discriminant function with the
U= ~
Z s lJ z. (-y,-x
i=l j=l
1
J j
p
p
..
population parameters replaced by their sample
and N and N2
l
are the sizes of the two samples from the two p-variate normal popula-
tions, and n
of
=
N
l
+
N - 2. Weld also obtained the joint distribution
2
~,m2
and m - f(~, m2 , m ).
3
3
It would be desirahle to obtain, in a usable form, the distri-
bution of V from
extremely difficult.
f(~,
m , m ). Such a simplification appears
2 3
It is related to the problem of the non-central
lNishart I s distribution for which T.
were
estim~es,
a~le
variables.
llJ.
:,nderson and
M.
:'.
Girshick
to obtain manageable expressions only for two or less
It seems that this general distribution of the discrimi-
nant function, or the classification statistic as Wald calls it,
ISponsored by the Office of Naval Research under the contract for
research in probability and statistics at" Chapel Hill. Reproduction in
whole or in part is permitted for any purpose of the United States Gov~
ernment.
2The numbers in square brackets refer to the bibliography listed
at the end.
viii
would involve the figurative distance, 6, between the centers of the
two populations.
One approach to this highly involved distribution
would be to obtain a series of powers of 6 with each coefficient involving
nand
The present work is chiefly concerned with the
p.
examination of the first term of this series with special attention
to its value when
n
is large.
In the first chcp ter of the present work, a brief historical
introduction to the theory of discriminant functions is
f~llowed
a mathematical formulat ion of the problem following lrJald.
by
The re-
sults obtained.by him and also by some subsequent workers on the oroblem are briefly described.
·e
The next two chapters deal with the problem of findiDg the distribution of
is a
lar~e
number.
Given
~
N +N - 2
l 2
Explicitly this problem can be stated as follows:
V in the null case, by supposing that
f(ml,m2,m3)dmldm2d~=
p-3
Canst.
n
n-p-l
IlVl j2l1-1'11 ~ d~dm2d~
to find the distribution of
It will be noticed that the sample sizes,
N
l
V suitable
and
N , do not
2
separately occur in the joint distribution and the assumption that
n
is large, which is obviously milder than the assumption that
N and
l
One mmplifi-
N2 are both large, introduces certain simplifications.
cation that is obtained is that the statistic itself approximates
...
.
ix
because of the order in probability of the variables entering into the
distribution.
The same
ass~ption
entails simplifications both in the
integrand and in the domain of integration.
In the second chapter, which can be regarded as dealing with
the mathematical aspects of the prcblem, methods have been developed
which will enable us to evalu ate triple integr als giving the moments
An upper bound to the error in using these simplifications is also worked out which enables us to put reliance in the
aporoximations in suttable cases.
The third chapter deals TAith theaaymptotic distribution problem.
4fter £inding an expression for the kth moment of
the asymptotic distribution of
of p.
v, we obtain
v both for even and for odd values
For even values of p, the uniqueness of the distribution, which
i~obtained
by the help of its moment generating function, is also es-
tablished.
For odd values of p, use had to be made of an integral
equation due to S. S. Wilks
-/-55-7;
and, because of the fact that we
are considering only the principal term in the kth moment.f
uniqueness of the result cannot be guaranteed.
v, the
This section is there-
fore presented on a heuristic basis and has to be left for further
discussion and rigorization.
In Chapter IV we have obtained an asymptotic series for the distribution of w
=
I m31
,which is proportional to
by observing that fora fixed
v.
This is done
w the range of integration for
~
and
m2 is a lenticular region enclosed by two hyperbolic arcs in the plane,
x
w
c
a constant.
Integration is carried out over this region by using
suitable transformations, and the first three terms of the asymptotic
series are obtained.
For the first approximation, we have also dis-
cussed the method of finding the tail areas.
N + N2 = 20, P = 3 •
l
V are found, and use is
Chapter V deals with the special case
. In this case, the first seven moments of
made of the inequalities due to Tchebycheff and Markoff in setting up
bounds on probabilities of the type p(V ~ ~)~
These limits are rather
crude due to the fact that a small numher of moments is being used.
The example, however, illustrates ene way of proceeding to
disc~ver
something about an unknown distribution when its first few moments are
known.
Chapter VI contains a few remarks on the
non-~ull
caso.
It
starts with expressing the joint distribution of m , m and m dis2
l
3
cussed by Sitgreaves /-45' 7 in another form suitable for large n.
- -
This chapter also contains a brief discussion of the asymptotic distribution of U for p
= 1. In the next section we exemplify the differ-
ential method by finding the mean and variance of U.
The concluding
section of this chapter deals with finding the variance of the linear
discriminant function when the sampling fluctuations of the means are
tck en into account.
In the last chapter are listed a few unsolved prob-
lems related to the problem of classification.
CHAPTER I
A PROBLEM OF CLASSIFICATION CONSIDERED BY WALD
1.
Introduction.
The problem of classification ie the problem of assigning an
individual (or an element), on which a set of measurements is available, to one of several groups or populations.
The problem admits
a simple solution when the distributions of measurements in the alternative populations are completely kncwn or what is the same thing
as saying that the sizes of the samples available from the various
populations, on the basis of which we have to make a decision, tend
to infinity, so that the sample estimates of the parameters tend
stochastically to their population values.
are not +arge, the problem becomes rather
If, however, the samples
complicated~
Research in this area of Multivariate Analysis was started wi. th
his introduction of the linear discriminant function by Sir Ronald A.
Fisher
L-IO_7
in 1936.
P
D = Z f.zi, in which z
i=l ~
the coefficients
Ii'
The linear discriminant function is
= (zl
••• zp)
is the new observation, and
following Fisher, can be obtained by maximizing
the square of the difference of the
expectations of
populations divided by the standard deviation of D.
D in the two
The linear dis-
criminant function provides the best solution of the problem of classification provided that,
(1)
The number of alternative populations is two,
2
(2)
The form of the distributions in both populations is
multivariate normal,
(3)
The parameters are all known,
(4)
The covariance matrj.ces of the two populations are equal.
It may be remarked that Welch
L-53_7
observed that even
without making any assumptions of normality or equality of covariance matrices, the problem of obtaining the best function to discrim:i.nate between t1>10 completely specified populations may be
solved.
He demonstrated that the desired function is simply the
ratio of the two probability distributions, and the criterian level
to which this function is referred is deducible either from Bayes l
Theorem with given a priori probabilities or by the use of a lemma
by Neyman and Fe arson L-)2
7 when
are minimized in any given ratio.
the errors for the two hypotheses
He proved that under the four
assumptions stated nbove the function obtained in this manner is
identical with the linear discriminant function,
Von Mises ~3l_7 considered the problem of classification
when the number of populations is m, and showed how to subdivide the
s3mple space into m parts so as to minimize the maximum error of
misclassification.
-7
Rao /-39
-
gave explicit Bayes solutions with given a priori
probabilities or ratios of errors for the alternative populations,
and discussed the construction and use of doubtful regions and related problems.
3
In all these cases it is assumed that the distributions are
completoly specified.
If, however, as will frequently be the case,
one cannot justify the supposition that the distributions are completoly known, and the only information at hand is what is contained
in the samples available from various populations, we run into
rather complicated distribution problems.
W':lld
L"1;o _7
in 1944 set out to sol VG the problem of classi-
fication for the case of two altern[ffiivG populations.
Instead of
using a distri.bution-free approach he si.mplified it further by introducing the following two restrictionsl
(1)
The form of the distributions is multivariate normal.
(2)
The two populations have the same covariance matrix.
Though it would be desirable to solve the problem without.
making either of these assumptions, still one can argue that in
many practical problems arising in numerous fj.elds of scientific
inquiry it is not unreasonable to make the two assumptions stated
above.
In this chapter we propose to give a mathematical formula-
tion of the problem, and to state the conclusions of Wald, and of
subsequent workers on the problem.
2.
Statement of the problem.
/#
xll
(2.1)
Let
X =
x12
.·•
x 2l
x 2p
.
xN 1
1
x lp
•
..• ·
•
·
x
NIP
/"
4
,./
Y2l
·• ·
·•·
YN 1
··•
···
Yll
(2.2)
and
y=
2
Y1 2
YIp
Y2p
YN2·P
be two random samples from two variate normal populations )(x
and
jf
both having the same, though unknown, covariance matrix
Y
E, and different Unknown mean vectors
and
respectively.
Let
be an observation on a new individual which is known to have come
either from
of both x
lTx
or from
Jry '
= (xl'" xp ) and
variates corresponding to
1T
x
but is distributed independently
= (YI ... yp)' the two
and 1T respectively.
.
Y
Y
sets of
On the basis of the information supplied by X, Y and Z the
such that if the probability of one type of misclassification is
held fixed, the chnnce of second typo of misclassification is minimum.
3.
As an example of the importance of this problem we can con-
sider a candidate applying for admission to an institution with
certain test scores.
He may have
t~
be accepted or rejected de-
pending on his chances of success or otherwise on the basis of the
scores of candidates admitted in previous years.
4.
the statistic proposed by 1liald.
(4.~
U
=
Wald considered as classification statistic
p
Z
P
Z s
i'
J
z. (y. - x j)
~
i=l j=l
obtained by considering this problem
J
Z&}T
as one in testing the hypothesis H:
x
"x
against the
alterna~
tive that
and by replacing the population values of the parameters by their
optirnum estimates obtained from the samples in the statistic obtained by using the fUl)damental lemma of Neyman and Pearson.
'where
(4.2)
and
-x.
~
e.
=
Nl
Ex
'I"~
a=l iat
N
1
Thus
6
The statistic
U can be rewritten as
where
(4.5)
and where
and
are distributed independently of each other according to p-variate
normal distributions with
E (z)
=
{:
if
z e
Jtx
if
z e
rry
and
,
and with the same covarinnce matrix
Since the
Sij
/-a..
7.
l.J-
-
are distributed independently of the set
~}
~~
( zl •.• zp' zl
••• zp ), the distribution of
U remains unchanged
if we define
s ..
l.J
n
=
Zt
0;..
2
1 i aI
/
/ n
,
7
where
n
and writing
Nl
=>
i'
(s J) for (s, ,)-
1
, W31d observed that the distri-
~J
bution of U is the same
p
(4.7)
V ==
as
N2 - 2
+
that of
P
'j
Z s~ t
, 1
i,n+l
J"'"
E
i=l
whore the probability element of
p
Z (t
i=l i,n+1
i
)2+
/-~
L
J ,n+
2
is given by
ia
/-
1
exp
(2n)P(2+ 2 )
(4.8)
t
t.
p
n
Z
E
t:
~a
i=l a=l
7
P
Z (t
i=l i,n+2
i)2
p
/ IT
_/ i=l
+
n+2
"IT
a=l
dt ia
whcO're
(4.9)
p=
Cf'1 p 2
t
~ = (t.
- 1'''' 2
p
)
ij , 1, j ~ 1, 2 •.• p.
Here Wald introduced two sets of numbers (u ••• u + ) and
n 2
l
••• ' v 2) satisfying the relations
n+
are certain functions of
(J.J. .10)
n+2 2
~i'
... p
. . . t p)
n+2 2
Zu=Zv=l
a=l a,
a=l a
vi and
and
G
n+2
Z u v
a,=1 a a
=0
~nd
using a very ingenious
distribution of V
geometrica~
argument, concluded that the
is the same as that of
(4,11)
where
m
==
eu 2
~
10:=1 a:
(4.12)
and the joint pl'obabi..1ity distd.bution of
,/
E
"'~
.r11 , , ,
,
r
r
r. pp
p1
..
..•
Ip
~
\
~,m2
)
ID
3
is given by
n+?-lJ
--.,.-
.
/
and
dm om?dm •
1
3
9
p
t'lhere
r ij = Z t.
a=1
tho domain
t
~a j a
and
g is the constant of integration, in
0 ~ ~ ~ 1, 0 .: m:, :: 1, -
v'~m?
~ m :: v'~~
3
,
and
zero othEJrwise;
1
F (t) ~
where
,
.k
and
5.
~rther ·W~rk
Anderson ['
1;..7
considered the statistic
- ,) W =~ Z s ij zi (-Yj-X
(5.1)
J
i j
which is much like
of
on the problem.
Z
= (Zl
dividual.
1
~ Z ~ S
~ i j
ij (- )(- )
Yj+X.
Yj-x.
JJ
U, since it differs from U by terms independent
••• zp)' the measurements observed on the new inHe c;valuated the expected value of the matrix of non-
central Wishart variates occuring in the joint distribution of m ,
l
m? and m in the special case when
3
(5.2)
Sitgreaves
L-45 _7 gave an analytio derivation of the
dis~
tribution of W in the case considered by Anderson ond also obtained exactlythe constant of integration in the joint distribution
10
of
~m2(l13.
We shall refer to the following result from her paper
in the next chapter.
,
where
-> 0 !I-MI>
0
where
and
~
1
= 6' Z- 5, and where
Earlier
bution of
~,
=0
~i
Pi
==
bution of
m
3
Harter L-18 _7
m and m
2
3
k k
1 2
are defined in (5.2) •
in 1951 corisidared the joint distri-
of (4.13) in the
' i == 1, ,.. p
degem~rate
case
and obt ained the approximate distri-
in the special cases
11
(I-a)
n even,
p odd
(I-b)
n even,
p even.
The technique he used in deriving this distribution was
ossontially
exp,~ding
the two binomials constituting the integrand
m and m of (4.13)in tile degen2
3
erate case, and integrating with respect to m and then with rel
spect to m , The number of terms in the distribution of m thus
2
3
obtained deponds on n, which is not a small number in any practical
in the joint distribution of
~,
situ3tion. Moreover the solution thus obtained is not an asymptotic
series in which the leading terms could be considered as approximating the true distribution for large n.
The latest paper in historical order of development of the
theory of discrimination is that of Rao
-/-40-7,
in which he devel-
oped some general methods by using the ic1e.as of sufficient statistics and fiducial probability distributions, by using Which, the
discrimination problem can be solved utilizing only the sample information. 'The distribution problems connected with the test
criteria suggested in the p3per have, however, yet to be tackled,
CHAPrER II
ON
1.
p~
ASYMPTOTIC EVALUATION OF A TRIPLE INTEGRAL
Introduction.
The integral with which we shall be concerned in this chapter
is the one obtained from the joint distribution of m , m and m ,
l
2
3
given by Wald ~50_7, by putting Pi = 0 = ~i' 1 ~ l,2, ••• ,p.
For the sake of convenience, we shall refer to this case as the Central
Case or the Null Case.
In this chapter, we shall find the value of
the integral for large values of n, which is equal to N + N2 - 2, by
l
introducing certain simplifications both in the integ~and and in the
domain of integration.
Justifications shall be given for the simpli-
fications introduced, and the final result shown to be a valid
asymptotic apnroximation in the sense of Poincare.
Moreover, an upper
bound for the error involved in the asymptotic approximation will be
found.
2•
~
integral and
~
domain.
The triple integral to which we refer corresponds to
over D
o
elsewhere.
where the domain D is defined by the following inequalities which
13
insure a real, positive integrand in its interior.
(2.2)
D:
The
ine~ualities
in (2.2) show that the domain is bounded
by two right elliptical cones in three-dimensional space having
vertices at (0,0,0) and (1,1,0) respectively and having a common
base in the plane m + m2
l
We
def~ne
= 1.
two other domains Dl and D2 as follows:
° :s ml
O~ml~l
(2.3)
° ~ ~ ~21
ml m2 - m3
?::
~
1
m1 + m2
(2.4)
°
::; 1
0~m2=S1
2
(1-ml )(1-m2 ) - m >
3-
°
Then it is easy to see that D = D + D2 , except for the set
l
of points lying on the plane m + m2
l
= 1, which are counted twice.
The truth of this statement can be seen easily by noticing
that the regions defined by the two domains are the interiors of
two cones, one obtainable from the other by a simple transformation
14
6 ides
of three dimensions.
Moreover, except for the points lying on the
plane m + m
2
l
= 1,
of the plane m + m
2
l
=1
and lying on oppos i te
in the space
the two regions are mutually exclusive because
the point set corresponding to D lies on the origin side, whereas
l
the other corresponding to D2 lies on the non-origin side. The fact
that D and D2 between themselves include all the points of D can be
l
seen by observing that
m~mr,
.L
(2.5)
and
(2.6)
c:
2
- m
3
ml + m2
~
2:
0,
1
>
- m23 2:
(1-m l )(1-m2 )
OJ
2
(1:'m )( I-m ) - m 2: 0,
2
1
3
ml + m2 2: 1
:> m m _ mC:" > 0
1 2
3-
,
~lhere ====~> is read as "imply".
As a consequence of this result, we can find the value of an
integral over D by adding up its values over the two domains D and
l
D2 • The fact that points lying on the plane m + m
2
l
=1
have been
taken twice would not make any difference because they form a set
of Lebsegue measure zero.
3.
Order
~
the variables m ,m and m ,
l 2
3
To examine the order of the variables mI' m and my we have
2
15
first to define them following the original paper of Wald
L-50_7.
For the sake of clarity, therefore, we add the following paragraph.
Denote by S the 2n + 1 dimensional surface in the 2n + 4
dimensional space of the variables u l , •••un+2 , vl , ••• vn+2 defined
by the following equations:
~2
2 n+2 2
Zu = L v = 1
~=l ~ ~=l ~
(3.1)
n+2
~ u~v~ = 0.
~=l
Let u .•. un+2 , vl .••v n+2 be random variables whose
1
joint
probability distribution function is defined as follows: the point
density function is defined by
dS
ps·
s
Then for any subset A of S, the probability of A is equal to
the 2n + 1 dimensional value of A divided by
J
dS.
It should be
noted that the probability density function (3.2) is identical with
the probability density function we would obtain if we were to assume
16
that u , ••• ,u + , v , ••• ,v + are independently, normally distributed
n 2
n 2
l
l
with zero means and unit variances and calculate the conditional density function under the restriction that the point (u 1 , ••• ,un+ 2 ,
v l ' ... ,vn+2 belongs to S.
Variables m , m and m) which are equal respectively to
2
l
P
i:
(3=1
2
'
u t3
r
2
L;
v(3
13=1
can be redefined by using (2.1) as follows:
n+2
2
2
u / L uf3
f3= 1
13=1 13
P
m =
l
r
P
2
m2 = L vf3
13=1
(3.3)
n+2
/
2
v
13=1 13
p
/
m = r u v
3 (3=1 13 13/
I:.
/
V
(n+2 ·2 n+2 2
l. vf3
L u
13
13=1
f3=1
where p is the number of variables and n
=N+
,
lIt
r.::
-
2 is the number
of degrees of freedom.
With this explanation about the variables entering into the
discussion we shall prove the following theorem:
Theorem (1).
The variables m , m2 and mj defined in
l
(3.3) in
terms of u. and v., i=1,2, ••• ,n+2, which are N(O,l), variates; are of
1
order n-
l
1
in the probability sense.
17
Definition.
We write X = 0 L:f(N} 7, and say that ~. is of
N
P
I IIJ
probability order 0 L:f(N)_7 if for each
A€
N
>0
such that L:P
€
,
> 0 there exists an
I xN I ~ Ae f(N)_7 ~ l-e for
all values of
> NO(e),
Proof of the theorem:
Since u. and v., i=l, ••• ,n+2, are all independently, normally
~
1
distributed with zero means and unit variance,
p-l (I-m. )n-p+1 dm., 1=1,2
(3.4)
m
i
1
1
since each of m and m is of the form
1
2
Thus
E(m )
i
= -Rn+2
and
p(n-p+2 )
=
(n+2}2(n+3)
o (12)
n
and by Tchebycheff's inequality, namely
(3.6)
P
it is immediately seen that for given
that
€
there exist k
l
and k 2 such
18
Hence m and m2 are of order ~ in probability.
l
To see that m is also of order ~ in the probability sense,
3
we note that
n+2
L::
2 n+2
13=1
u'
(3
P
Y:
u
L
2
v·
13=1 13
But
(3.10)
therefore
P
2
L v
,8=1 13
n+2 2
2
13=1 13
n+2-2
L u
13=1 (3
E v
13=1 13
(3.11)
From this, by noting that m
l
that
= 0p (!)
n
and m
2
= 0p (l),
n
we conclude
1
m = 0 (-) •
p n
3
4.
An important limit.
In this section we shall prove a result which will be he1p-
ful in finding asymptotic values of triple integrals of the type
19
iff
.
Im,ml
m VIm V2m v3
1
2,
m31
m2
a
Il-ml
m3
I-m2
m,
I
b
D
where b is a large number.
Theorem 2.
"
The result can be stated as
If m , m ,
l
2
m, are random variables as defined
each depending on n, and each being of order n
-1
in
in probability,
then
pUm
(4.2)
1
n
Proof:
->
We shall replace ml , m2 and
The variables
ity.
00
0, ~
and
r are
This means for a given
m, by ~, ~ and *respectively.
therefore of order one in probab11€;
there exist numbers N(e), Ale' A2€
and A , such that
3e
for n
> Ne
-
If in (4.3) each of Ale' A2 € and A € is replaced by Ae = max (Ale'
3
A2e , A,e)' the inequalities will still hold. In terms of these
variables we have to show
plim
2
n
(4.4)
n --i> 00
n
7
_ ~+13 + OB-r
n
2-
e -a-f3
=
1
20
".
./
To show this we consider the;funccion
(4.6)
g(a,~,y)
= log
f(a,~,y)
and expand it by Taylor's Theorem, with a remainder after two
terms, namely
(4.7)
g(o:,t3,y) = g(O,O,O) +
+ (ex
o
(a.~ + f3'£0
cti + f3 ~ +
... 2
1;Y)
+ y
~
)g(o,O,O)
g(G,4>,r),
where
°
< f3
O<W <y
<4;4>
We have
g(a,~,'Y) = a + ~ + n log £1 _ a~t3 + a~-~
n
so that
2
_7 ,
21
o
1.I3/n
~=l- ... .
~) (1
(1 -
og
di3 = 1
2
~)
-
-
,
r2
n
a:
1
n
- ------~2
7
( 1 _ ~)(l _~)
n
n
--2
n
2
~)
n
- 1:.(1 -
Also
n
,
~2
o g
dt)2
ln
-
2
(1 -
dr2
,
=
2
Ii(l
2
d g
~n
g
n
- l.r;<::-7
- £) (1 - f?)
n
n
n
If _ £ _ ~ + 213+ 7
k~
n
n
2
n
2
7
2-
=--------2-2
I( 1
-
~) (1
-
~)
-
12 _7
n
,
22
2
rl
de;
- Y
:t.
/n-'
-~ = - - - - - - - - - - - OCOfJ
d2 g
(1 _~)( _ 2~)
n
_ ....;...
c
_
~n~
~-
,
.
(1 _ £)( _ _
21')
n
2
....;;;n;.....-
d2 g
~7 -
dg
dg
Now g, di ' diB
"1>
2
2
ex
f3
"I
£(1 - -)(1 - ~) - {? 7
n
n
n·-
'
dg
or
are all zero for (a,f3,I')
= (0,0,0).
Thus (4.7 ) gives
(4.8)
where the value of each of the derivatives involved is calculated
23
(4.9)
ci
1
R =
2
g
Ll - -n
r'
'!rt::41
- -n + Q<I>- 2_
$2
)
. 2n( l - ....
n
2
7
- ~{l
2n
41 2
- -)
n
+
n
2
_t )
1(1
n
n
Using the inellualities of
(4.3) in (4.9) we get
and
(4.10)
2
(1 - Ii" - -2 )
2A
2
A
n
in the probability sense;
where the two expressions inside the
brackets in (4.10) are calculated from (4.9) by
fact that R may be positive or negative.
2
c~nsidering
the
Since A is finite and
independent of n , therefore both the expressions tend to zero as
n tends to infinity or
PUm
n
R
2
>
= 0
00
24
Hence
n
L
That is,
n
_ Ci+f3 +
Cf,f3-y21
n
2
n
2
n
n
is asymptotically equivalent to
7 = Plim
2-
PUm eo.:+t.3 Tl _ a+f3 + et(3-1
R
e 2 = 1
the pro ba bOlot
Note: An alternative proof of the
e ~-f3 ~n
~
1 ~ Y sense.
statement (4.2) shall be provided if we are able to prove that
(4.12)
fn
plim
c
h
log (1 - -n + - 2 +
n
n:->co
where c stands for Cf, + (3
J
and h
= Cf,f3
2
- y and Ct,(3 and yare res-
tricted by the conditions (4.3).
It is easy to verify that
(4.13)
-x ~ log
n:x
x
(1- ii) $
x
x3
2
n
2n
in which the lower bound is written by observing that
- log (1 -
Ii'x ) :; nx +
x2
x3
x
~ + ~ + ••• ~ n +
2n
3n;)
x
(n)
3
2
+
x
(n)
+ ••• ,
~'5
and both the -limits coverge to zero because of the restrictions
(4.3).
5.
This proves the statement (4.12) and hence the Theorem.
~
triple integral.
In this and the remaining sections of this chapter, we shall
confine ourselves to the study of the integral
(5.1)
I =
ffJ
D
where D is defined in (2.2) • To find an asymptotic approximation
to the value of I we first write
I
= 11
+ 12 '
26
where 1 and 12 denote the values of the integral over Dl and D2 1
To find 1 , we shall first evaluate the integral
1
IfJ
and then find an upper bound to
that is, an upper bound for the error committed in replacing the
n-p-l
2
the factor ~(l-ml)(l-m) - m
3
_7
2
in the integrand by
n
- -(m + m2 )
e 2
1
Using (5.3) and (5_4), we can state that
It will then be demonstrated that both the error committed in
approximating
11 by III and the value of 12 are negligible as com-
pared to the least possible to value of I - Mathematically
1
lim I
E
._ E
11
n->oo
=0
27
and
lim
n
-~
I 11 - E
co
=0
•
As a consequence of (5.5) and (5.6) we can write
(5.8)
and as a result of (5.1), (5.7) and (5.8) we can write
(5.9)
This will be the general line of argument to be followed in
obtaining an asymptotic approximation for the value of I.
It would appear from Theorem 2, proved in section 4, that
the approximation e
n
- 2'(m l +m2 )
2
for Ltl-m l )(1-m2 )-m}-7
n-p-l
2
is
valid only in the domain D*C:D which is such that throughout D*
l
the variables m., 1=1,2 and 3 are 0
~
(!).
p n
We shall, however, work in
terms of the division D and D of the total domain and use the ex2
l
ponential approximation over the whole of D because of certain siml
plifications which result.
Justification of the results thus ob-
tained is prOVided by two factors:
(1)
The integrand shows that almost the whole of the density
is concentrated in that part of the domain D which is close to the
l
origin.
In fact, if we define a domain D*/
LD by the inequalities
m m2 - m,
l
2
>
0
ml + m2
<
A
(5.10)
-n ,
then it 1s shown in section 11 that D* contains almost the whole of
the density.
This is probably the main reason why the exponential
.x-
approximation, which is true over the domain D , gives close results.
(2)
The discussion on the upper bound to error given in
section 7 actually proves that the loss of accuracy in using
n
e
- '2(m l +m2 )
n-p-l
2
-2instead of ~(1-ml)(1-m2)-m)-7
is negligible when
n is large.
It may also be remarked at this point that the exact value of
I 1s known from Sitgreaves ~45_7.
fore, that the
as~~ptotic
It would appear obvious, there-
value of I could be obtained from the
one given by Sitgreaves by using Stirling's approximation to r(x).
This would no doubt hold true provided we were interested merely in
the asymptotic value of I.
The
reason, however, for our following
an independent approach is that we are interested in finding the
solution to a distribution problem.
'l'he techniques
~nd
simplifica-
tions used in' the asymptotic evaluation of I, which emerge mainly as
a result of the supposition that n tends to infinity, will be used
in evaluating the limiting moments of a certain statistic, to be
called Wald's approximate classification statistic.
This distribu-
tion problem will be our subject of discussion 1n chapter III.
6.
The integral over Dl1 an asymptotic approximation.
(6 .1) Let Ii •
Iff (
"'1m2
-m~)
p-3·
2 lfl-"'J.)( 1-m,J -m;"7
E.:f!
d"'J.dm2dm3,
Dl
where Dl is defined by the following inequalities:
2
m m2 - m
1
3
m1 + m2
(6.2)
and where m
1
= 0p (1:.)
n
>
0
<
1
1
0
~
m1
.<
0
~
m2
< 1
j
for 1=1,2, and 3. We shall replace the second
n
- 2(ml +m2 )
factor in (6.1) b y e , but the operation of integration
after this replacement needs some justification. There is no loss
of generality if we consider a similar univariate case and prove
that it is possible to replace a binomial raised to a large power by
an exponential factor to which it increases. We shall state this
result formally as
Lemma.
Let rex) be a function of the real variable x, such that
rex)
Then
< A
if 0
=:; x
~
c,
where 0 < c <
00.
30
jC
. (1
(6.3)
o
lim
n
~>
00
c
~
x n
- -) f(x) dx
n
e -x
=
1.
f(x) dx
Proof
(6.4)
Let I
=J0
d
Then (6.3) states that lim
c
- e -x_7 dx.
F(1 _ ~)n
n
I I d I = o.
->
n
f(x)
00
It is known that
o <-
e
-x
2 -x
x)n
- (1 - -n <
x e
n
(see, for instance, Whittacker and watson ~54_7, page 242).
There-
fore, using this and condition (1), we get
x2 e -x
(6.6)
n
<
x2e -x
n
This shows that for all c
(6.7)
The quantity on the right hand side of (6.7) is positive and tends
to zero as n increases.
Hence (6.3) is established.
We shall rewrite (6.1) as
31
(6.8)
and evaluate Ill' an asymptotic approximation to I l • This will be
followed by a Section on an upper bound to the error in using III
in place of I •
l
To integrate with respect to m we first put
3
Thus
(6.l0)
Integration with respect to t~gives
if
(6.11)
Making the transformation
(6.12)
ml
m
2
=Z
=z
cos
2
9,
rj
sin'- G,
we have
d{m m )
1 2
------=
d(z,9)
2z sin G cos G ;
and so (6.11) becomes
2/
(6.13)
z=o
rr/2
J
n
z
p-l
e
- -z
2
p-l
p-1
cos 9 sin 9 dzd9.
9=0
Integration with respect to 9 yields
(6.14)
r(~)r(P;l)
I
11
=----
and putting ~z
c
r(~)
=t
r(~)r(~)
rep)
we obtain the form
Now for large n it is well known that
(6.16)
(6.17)
This further simplifies to
(6.18)
1
f
z=o
n
- 2Z z p-l dz;
e
33
To be more exact we can write
n
co
"2
(6.19)
i
e -t t p-l at
=j
00
e-t t p - l dt
-J
e- t t p - 1 dt ,
n
~
c.
and successive integration by parts shows that the right hand side
of (6.16) reduces to
(6.20)
2
n) P - 1 1
nP r(p)-(2'
n/2-2
e
p-l
n/2-"·'
e
p-l
which can be written as r(p) - O(~)
nh
e
n
.
S~nce -
p-1
e
n/2
tends to zero
as n tends to infinity,
4n(p-2)t (l+~) •
(6.21)
nP
We will show in Section 7 that t he error in taking
an approximation to the ~~irie 'of II
is negligible
'in
n
III
as
comparison to
the v.alue>;·of. Ii"'"
f
It may be remarked in
passiu~
that the integral occuring in
(6.11) could also be evaluated by usin g Dirichlet's formula ~54,
p. 258_7 namely
J
0' -1
J ... f
t n
n
f(t 1t2 •••+tn)dt 1 •••dt n
1
(6.22)
n
r.
r(O'l) r(0'2)
-= r(Q'l + 0'2 +
7•
An Upper Bound
.!.£
0'.-1
i=l
J.
f(z)dz •
~rror.
In this section we shall consider the following problem:
How much error is committed by replacing
n-p-l
2
by
in the tripe Dntegral (6.1) over the domain D ? We shall consider
1
two separate cases,
(A)
and
(B)
>
and find an upper bound to error in both cases. 'The larger of these
35
shall ultimately be taken as the upper bound.
~
! ..
Let I
d
. dEmote the difference
1
n-p-l
I d will not decrease if in the factor ~l - ml - m2 + ml m2 - m~7
1
m Tm2
2
l
we omit m and replace m m2 by ( 2
l
3
).
2
Making these changes, we get
The variable m can be integrated out by using the transforma3
tion
,,
36
Using (6.22) in the double integral involved in (7.3), we get
j
1
r(p-l)r(E)
2 .
r( p2 )
t=O
(7.4)
p-1J.-(1 - 2"z)
z
... !!. z
- e 2 _ 7'
dz •
n-p-l
Replacing z/2 by t, (7.4) becomes
t=o
The expression
(1 _ t)n-p-l
- e
-nt
can be written as
(7.7)
(1 - t)n-p-l _
e (n-p-l)t
•
e- (p+l)t
,
can
be written as
(7.7)
(1 - t)n-p-l _ e(n-p-1}t
• e(p+l)t
1
which can further be written as
(P+1)t'+
31
••.
-7 ;
and using the fact that 1 - Y ~ eYwe find as a first approximation
that
(7.9)
(1 - t)n
l
l
_ e nt < (1 _ t)n _ enlt + e nlt (p+1)t •
This reduces to
(7.10)
(l-t) n' _en t <en 't (p+l)t
by using the well known inequality
Using (7.10) in (7.5), we get
38
1
iP~
(7.12)
t P (P+l)e(n-p-l)t dt .
t=o
The integral in (7.12) can be simplified by replacing
(n-p-1)t by wand extending the upper limit of ,integration for w
to infinity instead of
n-~-l
This will on1y increase the upper bound for I d
,and we get
1
a simpler result, namely
00
p
w e
w=o
On simplifying (7.13) we obtain
(7.14)
<
I
d
1
-
4n(p+l) t
(p-l)
1
(n-p-l) p+1
'
which 'gives an upper bound to error in case A.
-w
dw.
39
Case B. Now suppose
and let
Omitting the factor m m2 -m;, which is non-negative in the
1
domain of integration, one can write that
(7.16)
Integrating out m by the same transformation as was used
3
in Case A, we get
40
jf
(7.17) I *d <
1
Using (6.22) on the dnuble integral involved in (7.17), we
obtain
1
(7.18)
* <
Id
1 -
r(~) re
P;l)
r(¥>
rep)
n
)
zP-1L-e
- -2 z _
n-p-l
(1-z) - r
dll •
z=o
vJrite
n
(7.19)
- 2'z
e
n-p-1
(1 - z)
2
-
= e
~
2
n'z
-2
e
n'
2'
(1 - z)
,
Where n' = n-p-l • Then
(7.~0)
_ ~z
n-p-1
e 2 _ (1 - z) 2
n1z
= e-
~L-1
-
pt1
~Z
C.
+
nf
'2
(1 - z)
41
Since e- Y < 1, as a first approximation, we can write
-
(7.21)
nz
n-p-1
; 2
_ (1 _ z)
2
and the use of (7.11) gives
n-p-1
n
(7.22)
- '2z
e
2
- (1 - z)
,
Replacing n I by n-p-1, and using this result in (7.18), we get
1
<
r(~)r(~)r(~)
r(p)
I
n-p-l e
pl-l
z
•
-r
-n-p-l
-z
2
dz
z=o
We put n-p-1
2 z
= w to
get
n-p-1
j
2
wP<-l e-w dw
w=o
Extending the range of the integral to infinity and integrating, we
get
42
*
/
4-
I dl
(
2 ) p+l :It ( p-cr),• r{p+2)
n-p-l
';")p-2
_ r (p)
The larger of the bounds I
d1
8 (p+l)'
= _..;.;,:rr~,--~'_~
(
) {n,·p-l
,
) pi- 1
p-l
and I d*' namely
1
(7. 2 6)
can therefore be taken as an upper bound to the error involved in
replacing the factor raised to a high power in the integrand by an
exponential factor.
It may be remarked here that (7.26) gives only
a first approximation for the upper bound to error, and that a
closer bound would be obtained if we considered four terms in the
expansion of e-(p+l)t in (7.8), and three terms in the expansion
p+l
of e
- Tz
in (7.20).
Needless to say, we can get closer and closer
bounds by considering a larger number of terms in (7.8) and (7.20).
It should be noted that the result (7.26) enables us to put greater
confidence in our approximation of the value of I, which is of order
1
- ; because (7.26) asserts that the maximum error committed by
nP
supposing that 1 1 is approximated by III is of order
1
p+l;
and
n
therefore negligible for large values of n.'
The bound 8:rr(p+l)l
p-l
R
D
1
(n_p_l)p+l
can be rewritten as
--!np+l
.
by using the inequality
1
(n_p_l)pi-l
<~
for large n.
npi-l
Thus
(7. 2 7)
"Error
<
l&c(pi-l)~
p-l
n
1
p+l
R
D
=
n
1
pi-l
II
say.
As a result of the discussion in Sections 6 and 7 we can write
a formal proof of
Theorem 3.
Proof:
From the results of Sections 6 and 7 we can write
(7.28)
where
I D I < RD
J
1 .
1
=
l&c~rl)t
P
and
lim:. ( en ) = 0
n
->00
Multiplying both sides of (7.28) by nP , and taking the limit as n
tends to infinity, of the right hand side in
4n(p-2) t +
the truth of Theorem 3 is established.
8.
The integral over D •
2
4n(p-2) t en
44
We now consider the integral
(8.1)
where D2 is defined by the inequalities
(8.2)
m + m2
1
>
1
0
~
1
0
-< m,_,
-<
<
-
m1
c:
1
To integrate (8.1) with respect to m , we make the transforma-
3
tion
1
m
3
= L(1-m1 )(1-m2 )t_7 2'
t
-
1
'2
dt.
Then
If we notice that
f(c)
f(b)f(c-b)
F(a,b,c,x) ='
j
o
1
Gb - 1{1 _ G)C-b-l(l • Gx ) -a dG,
provided that
I x I < 1, we can rewrite (8.4) as
This step is justified by the fact that
(1-m )( I-m )
2
l
m1m2
in the domain under consideration; except that on the surface of
the plane m + m2 = 1 we have an equality sign in (8.?). But the
1
omission of the point set determined by the plane m1+ m2 = 1 does
not alter the value of I rc , since it forms a set of measure zero.
(8.8)
Since F(a,b,c,x)
= 1 + ~b
x + a~!;+l~~(b+l)
2
x +
where, for the hypergeometric function involved in (8.6), we have
which the rth.term is
1
of order -r1 • Hence for asymptotic purposes
n
46
the first term in the expansion of F(a,b,c,x), namely unity, will
provide a reasonably good approxtmation.
With these considerations in view, we can rewrite (8.6) as
jf
. J·
p-3
- 2(n-p+2)
Each of the double integrals involved in (8.9) can be changed
to a repeated integral and evaluated. As an example, we consider
the first one, namely
(8.10)
This can be written as
(8.U)
m =0
1
m2 = I-m1
Integration by parts shows that
±'
(8.12)
p-7
p-3
p-5
2
n-p+2 • n-pt4 • n-p+6 • (1-m1 )
2
n-p+6
~.
2
+ •••
Using (8.12) in (8.11) we get the series:
(8.13)
+ ~
p-5
n-p+2 • n-pt4 •
Writing the values of the integrals involved above, (8.13)
gives rise to the series:
(8.14)
r(~)r(~)
2
n-p+2
r(n}
p-3
p-5
2
r(n+ 5 )r(n - 5 } ;
2
2
+ n-p+2 • n-pt4 • n-p+6 • - - - - - + •••
ren)
48
Similarly we can find the series expansions for the values
of the remaining double integrals involved in (8.9).
Using (8.14)
as the value of (8.10), and similar values for the other integrals
(8.9) gives
_
O( 1 )
3n
n 2
(8.15) 1 2 = _ .....;.;.:J'(.l.(n;...-...=.p...r..)..;..t_n_p
(n-2) t(n-p+2)2 -
Use can be made of Stirling's approximation to the value of r(x),
namely
1
(8.l6)
r(x) = e -x
XX
Z L-I +
I~X
+ ...
Equation (8.15) shows that the principal term in the value of 1
is of order ~, where the actual value of,I
nn
n <:::2--.
principal term by terms which are of order
2
1
2
differs from the
~
and higher.
n 2
9.
An upper bound to the value of 1 ,
2
We can start from (8.6) and write
p-3
n-p r{!)r(n-p+I)
2
2
2
(m1m)
L(l-m l ) (l-m,,;)_7
--~~2
2
~
~
r(n-r
)
where it is known that
(l-m ) (1-m2 )
l
m ffi2
l
(9.2)
<
L
The maximum value of the hypergeometric series involved
will correspond to the case in which
(l-m )(1-m )
2
l
ml m2
= 1,
that case, using the formula
,
r (c ) r (c -a -b)
F(a,b,c,l) = r(c-a)r(c-b)
Since
(9.4) can be rewritten as
j}
Transformation
(9.7)
2
ml
=z
cos
m2
=Z
. 2 9
SUI
9
,
we get
and in
reduces (9.6) to
I
<2
2 -
2
r(~)r(n;2)
,J
~
Co
r(n-l)
T
z=l
j;
z
p-2
(l-~)
n-p
p-2
p-2
sin 9 cos 9 d9 dz •
9=0
By using the formula
1f
)~
a-l
sin 9
b-l
cos G d9
o
1 r(~)r(~)
:0
-2
for
b
r(a~ )
integration with respect to 9 , (9.8) reduces to
2
j
p
zp-2(1 2'z)n- d z.
z=l
This inequality is the same as
1
j
(9.11)
wp-2n-p
(l-w)
dw.
1
2
w=-
Observing that
/
1
'2
we have
p-2 (l_w)n- p dw
w
=
1
_ p-2
2n-l(n_ptl) n-ptl
j
1
'2
1
wp - 3(1_w)n-ptl dw,
51
f(P;l)r(P;l)
(9.12)
r(p-l)
1
n p
2 -
(n-p+l)
Inequality (9.12) shows that we can find a number RD such that
2
Slight simplification would indicate that if n is so large
that Stirling's approximation for r(n) is valid then R = 16
D2
would give a liberal upper bound.
10.
Comparison of I
l
and I .
2
In Section 8 we proved that
(10.1)
where c
is some constant, and in Section 9 we established that
(10.2)
A comparison. of these results with the value of I
(10.3)
l
namely
where I
nI is
a certain constant less in absolute value than another
known constant which is independent
(10.4)
o~
n? ShOWB that
lim
n->
00
n
This statement follows from the obvious fact that 2
infinity more rapidly
than nP where p is finite.
tends to
This means
that the relative contribution of the domain D to the value of
2
the Integral I carried over D is negligible in the limit.
Theorem ()+).
Proof.
From (5.2) I = II + 12 ,
Usin.g (10.4) we have 1...-1
from theorem 3, Il~ ~(p-2)~ • Hence 4n(p-2)t
n
p.
n
p'
1
and
which can also
be written as r(~)r(P;l)r(~) (~)p, is an asymptotic approximation
to the value of I.
As a further check of the correctness of our approximation
Ill' we can compare it with the exact value of the integral
referred to in Section 5.
(10.6)
That value can be written as
4n(p-2)t
Is
= n(n-l) ••• (n-ptl)
53
where the subscript's' is for the author of the formula.
Compar-
ing it with III written in (10.5), we have
n
->
n(n-l) ••• (n-p+l)
nP
lim
lim
n
00
->
00
=
1.
Hence our approximation is asymptotically equivalent to Is' the
-7.
exact value, in the sense of Poincare ;-13
-.
11.
The integral over the domain D* .
Domain D* was defined in Section 5 as that subset of D in
1
which mi
= Op(~) for 1=1,2 and 3. Since -
Jm1m2 ~ m3 ~
vmlm2
one way of characterizing this domain would be to say that D*
corresponds to the inequalities
(11.1)
2
>
0
o ~ m1+m2 <
A
n
mm
l 2 - m3
,
where A is a finite number, independent of n.
We can evaluate the integral over D* as follows:
-
Let
(11.2)
Integration with respect to m by the usual transformation gives
3
'
*
(11.3)
I
r(~)r(P;1)
=---r(~)
, and
Putting
and integrating out 9, we get
A
jn
(11.4)
n
z
p-l
e
- -z
2
dz
z=o
Substituting w for ~ z, (12.4) can be written as
A
*
(11.5 )
I
=
r(~)r(~)r(~)
j
r(p)
2'
p-l-w
e
dw.
w
w=o
Thus
(11.6)
r(~)r(P;l)r(~) 2 P
*
I
=
r(p)
(ii)
00
j
£r(p)
7
wp-l e -w dw_,
A
2'
which on further simplification gives
00
(11. 7)
I*
= 4rc(p-2) t
nP
)
A
2'
wp-l e -w dw
55
this vulue with the exact value Is we have
Co~paring
00
*
I
r
n -> 00 S
lim
(n.8)
= 1
- ...,..-1-::'"'('-:"
(p.,l)l
j
wp-l e -w dw,
A
:2
which is also =
I
lim
n
-->
*
III
00
Since A might be a large number though not of the order of
.'1,
tile
term
00
1
(11.9)
(p-l}l
j
wp-l e -w dw
A
2'
shall be small compared to 1; e.g., for p=3 and A=200, we get
J
00
1
(p-l)l
p
w
w - 1 e- dw = e-LOOflo02+ 200 + 2_7 ,
A
2
which will give a small fraction, and the fact is established that
almost the whole of the density is concentrated in the domain D*
near the origin.
~s
Equation (11.8) would indicate that even for A
small as 10, and p
cent of the density.
=3
say D* accounts for more than 99 per
In practice, however, A can be taken larger,
consistent with (11.1).
Another point needing clari£ication is the use of the
exponential approximation over the domain .. D - D* • At this
l
stage the justification is provided by the upper bound to error
n-p-l
n
involved in using e
2
2
instead of Ltl~ml)(I-m2)-m3-7
- '2{m1+m2 )
inside Dl , which was worked out 1n Section 7.
.
The upper bound to
~
1
error for Dl was found to be -p+T.
A closer bound can be worked
n
out for the domain D-I - D*, and it can be shown that it is a
constant times the same upper bound multiplied by an integral of
This upper bound can be obtained by following the same
lines as those followed in Section 7.
57
12.
Summary of Chapter II.
In this chapter we have considered the asymptotic evalua-
tion of the integral
(12.1)
1=
ffi
D
where D is determined by the fact that both factors involved in
(12.1) are non-negative, and 0
< m.1 .<-
-
1, i=1,2.
Two simplifica-
tiona used in the evaluation of I are:
(1)
D can be split up into two domains, Dl and D2 , by the
plane m + m = 1. The contribution due to the domain D , for
2
2
l
which ml + m ~ 1 is negligible in the limit, in comparison with
2
that of D •
l
(2)
The integral over D is evaluated by replacing the
l
n-p-l
2
2
fa~tor ~(l-ml)(1-m2)-m3-7
n(
by e
- "2 m +m2
1
)
• The justification
for the approximation thus obtained is prOVided partly by the
probability order of the variables, and partly by the bounds to
error found SUbsequently.
With these simplifications it is proved that
(12.2)
,
and that the exact value of I can be written as
58
(12.3)
where the second term is the remaining contribution due to Dl ,
and the remaining terms give the integral over D2 • Bounds have
been found, (7.27), for J
D1
and, (9.13), for the integral over
D2 • These have been shown to be negligible as compared to the
principal term in the value of I, giving 4n(p-2)t as an asymptotic
nP
apprOXimation to the value of I.
CHAPTER III
ON THE ASYMPTOTIC DISTRIBUTION OF 1rJALD'S CLPSSIFICATION
STAT1ST1C IN THE NULL CI1SE
1.
Introduction.
We are dealing with the problem of classifying an individual
into one of two groups or
popula~ions
such that the information re-
garding the two populations is based on two samples of sizes
N2 respectively.
N and
l
One may be called upon to consider the following
three situations:
(A)
N
l
and
N2 large,
(B)
Nl + N2 or n( = Nl + N2 - 2) large,
. N and N small.
(C)
2
l
The study of case A is equ~valent to the study of a linear
function of normal variates, that is, treating the statistic U, defined in Chapter I, or the linear discriminant function, as normally
distributed with means and covariance matrix replaced by their sample
estimates to get the mean and variance of the approxtmating normal
distribution.
This case has been completely exploited by several
workers in this field.
The results available in case
Sections
4 and 5 of Chapter
I.
C have been summarized in
The difficulties involved in obtaining
the exact sampling distribution of
joint distribution of
~'~2
and m being substantial, it makes sense
3
61
to ask wheth€.I' it would re possible to get the distribution of
case R.
V in
Obviously the results obtained would not be as exact as one
would like to have, but they should be better than the large sample
normal approximation of case
A.
It is thus in the sense of large
n
that we shall use the words "asymptotic ll and "limiting", and it should
be noted that the assumption n large is
less restrictive t han the
assumption
N
and N2 both large.
l
In this ch8pter we shall find the asymptotic moments of a
statistic
v which will be called l.Jald' s approximate classification
statistic, and then,use those moments to find the limiting distribution of
v, in the null case, separately for even and for odd valli es
of p.
2.
Waldls approximate olassification statistic and its moments.
From Chapter I we recall that
~ald
expressed the statistic ulti-
mately as a function of three variables, and stated that
Vo:
can be considered as the classification statistic.
by section 2 of Chapter II.
Kolmogoroff
L- 25_7,
Thus, by a convergence theorem due to
the distribution of
by the distribution of
V can be well approximated
nm as stated by Waldo
3
There is no loss of
62
generaity in considering
as the statistic instead of nm . 1nTe shall refer to this as the approxi3
mate classification statistic ofWald, as against the exact statistic
V suitable for small samples.
2A.
Limiting moments of the statistic.
, the kth moment about the origin,
k
we shall discuss briefly the value of the integral
riS
a first step in finding
'V
p-3
(2.3) I(k)=
n-p-l
]/'f nm3 k(~m2-m;) 2L(l-~)(1-m2)-m; _72-d~d~dm3 •
D
If we recall the discussion about the domain
D from section 2, Chap-
ter II, it can be easily seen that the integral can be written as the
sum of two integrals over the interiors of the two cones defined by
D
l
and
D2 • Thus (2.3)
can be written as
(2.4)
where
Iik ) and
I~k) denote the values of the integrals over the two
cones
D
l
D •
2
and
Define
By the procedure followed in Section 6, Chcpter II, we get
63
(2.6)
which, for k :: 0, gives
III of Chapter II.-
By following the methods of Section 7 and 9 of Chapter II, we
can show that the upper bound to the error in estimating
Ii~)
is of order
n
~+1'
k
Ii )
and an upper bound to the value of
by
l~k)
is
Thus we can write
where
(2.8)
and
•
It should be noted that it is the upper bounds to, and not the
exact values of,
In'
1
and
I
D2
that are known; and to avoid dup1i-
cation in their derivation, since they are obtained in exactly the
same way as similar bounds were found in Chapter II, we write the resuIts.
e.
(2.10)
They are
64
and
r(~)r(¥)r(I9)r(£~)nk
(~ .11)
I D2
~ r(n+~-l)r(P_l)
2n+k - p(n_p+k+l)
It is easy to see from (2.7), (2.10) and (?ll) that
In
lim
n ->
00
1
min I(k)
=0
,
and
ID
lim
n ->
showing thereby' that
In
1
2
00
and
= 0 ,
min r(k)
In
are negligible in comparison with
2
the principal term in the value of I(k) •
Dividing (2.7) by III we get the expression for the asymptotic
moments, namely
(2.1?)
where
. nP+ k 2k +1
• (n"!J"iol)k+p+i
and
65
We shall rewrite (2.13) and (2.14)
as
and
(? .15)
,
where
(2.17)
Rl
(k,n )
p+k+1 k+l
2
k+p+l 1 P ,
(n-p.l)
r(2)r(~)
)
) n
k+l k+p
= r(T)r(T)(k+p
(k+p+l
and
p+k+l
n
• (n-p+k+l)
(2.18)
•
We will also write (2.12) as
(2.19)
•
We will refer to
the principal term in the value of
as
V
k
, be-
cause, as can be easily verified,
R (k,n)
(2.20)
n1
lim
n ->
(~ .21)
lim
n ->
,.../
00
v
k
R (k,n)
n2
"....
00
v
=0
=0
0
k
To conclude this section, therefore, we can state that (2.19)
gives the
kth moment
v
k
; and, because of (2.20) and (2.21), we
6(
can write
(2.2?)
3.
The asymptotic distribution of
v
p = 2m •
~
In this section we shall find the asymptotic distribution .f
v for even values of p.
By applying the general result we shall
also explicitly obtain the distribution for
Lemma
p = 2,
4 and 6.
5.1.
and
r(~)r(£f:)
f or large n ,
r(p-l)r(~)r(%)
Proo£':
The maximum value of
r(%)r(~)~ ~ -1:, thr:rl)!or;
th')
n
•• :l" 2 fQr n.> 2p
(n-p'-l) . '.
~th -of (3~1.) is
To prove (3.2) we consider
-
+ 2, and
a.,;,tablished.
67
r(n-2)
2"
r n+k-l
~
-. no t e th at n-p+k+l
1
We fh-au
for all large
n
and
n
2
n+k
p+k+l
(n-p+k+l)
~+k
<3
n for a11 1 arga n; and s 1noe -2nn+k- l < 1
<
1 for all n ~ 5, (3.2) follows.
Lemma 5.2
The series
and
are both convergent.
Proof:
Let uk
denote the kth.term of the series.
For the series
(3.5), we have
The ratio
(3.8)
Using Stirling's approximation to factorials, we have
..
e
68
lim
k ->
00
uuk
k+l
:=
lim k~l
k ->00 4
(~).
This simplifies to
lim
. k _>
"1
uk
u k+1 =
00
2t
00
The ratio test stat es that the behavior of a series
Z uk is deter-
k=O
mined by the following formula:
(3.11)
~ ::::::8:0:::::6:r1fo : :
Uk
If lim
---- = c
k ->00 ~+l
I
Series diverges if
,
1
,
c < 1
"-
Application of this formula shows that series (3.5) converges if
t
< 1/2.
Consider now the series (3.6).
Since
r(~)r(¥)
(3.12 )
r(p-l)r(~)r(~)
k+l
r(~)
2
T
o
lim
k ->
00
so that the series (3.6) converRes for all values of
therefore, we can say that for
t
1
< ~
t.
In particular,
both the series (3.5) and
,
69
e
(3.6) are convergent.
In
there are three error terms if we approximate
by
.-v
• Since the
k
other two are negligible in comparison to the upper bound to RD (k;n),
~k
v
1
it will be enough to consider the contri~ution of this to
moment generating function of
v
¢(t) the
k•
1ATe define
(3.16)
then by (2.19)
where
e- n
is the contribution due te other error terms and is easily
seen to be an infinitesimal of an order higher than that of ~.
n
By virtue of Lemmas 1 and 2, we write (2.17) as
1 00 t k
!0(t)
¢(t)i < - Z --k Rl(k,n)~~
I
I
nk=O.'
n
uniforml V for all \ t
I
<
ITO i
<
1
~
Therefore by Paul Levy's theorem~9, p. 96_7
(3.20)
,
..
70
tt
where
F (v)
denotes the sequence of cumulativa dist:' iliution functions
1
n
corresponding to
¢n (t)
,,-.../
, and F(v) corresponds to ¢(t).
Thuswe
have proved the following theorem:
Theorem ,.
If
F
n
(v)
corresponding to
is the sequence of cumulative distribution functions
for large values of n, and F(v) is the distrik
bution function corresponding to 'V , then given e, there exists
an Ne ,
~uc~ that
v
I nF(v) - F(v) I
k
< e
for
n > N
e
Theorem 6.
When
p, the numher of variables, is even, the asymptotic dis-
tribution of
v is given by
f(v)dy =
where
m
Z b. f.(v) dv
j=l J J
,
2m = p,
1
j-l -j
'F[J) v e
f. (v)
(J .24)
J
v>O
,
=
0
otherwise
and where the b .'s are suitable constants depending on m.
J
Proof:
(J.26)
Let p = 2m.
71
·
e
On exp!mding the right hand side of (3.26), we get
-::"
=:
vk
(k+2m-2)(k+2m-4) ••• (k+4)(k+2)
2m-l
•
k~
rrmJ
The moment generating function for the corresponding distribution is
given by
00
¢(t)
)
~
k=O
tk.v
i(T))k
'
0
or
z
(3.28)
L
(k+?m-2)(k+2m-4) ••. (k+4)(k+2)
f(m)
2m- l
This can be rewritten as
.
dm- 1
(3.29) ¢~)
(t =c 1 -----1
m-
where
dtm-
c c ••• c _
O1
m1
Zt
k+m-1
+c
m-
2Zt
k+m-2
d k+l
k
+••• +c 1 --dt L:t
+cozt
,
are constants depending on m and these are ob-
tained by comparing the coefficients of like powers of k in (3.28)
and (3.29).
The uniqueness of the solution for
cOc
••• c _ follows
m1
l
from the fact that each of the expressions (3.28) and (3.29) consists
of a factor ~k multiplied by a polynomial of degree m-1
We can write (3.29) as
¢(t) =
m
Zc
i=l m-i
00
Z
k=O
m-i
.
(tk+m-~)
dt m- i
d
CD
For fUrther simplification, we write
Z
k=O
t k+m- i
as
in k.
72
t m- i
-r:t' and the operations of summation
which can be expressed as
and differentiation can be interchanged in the region of convergence
of the series, namely
It I
< 1 •
.,
.
Also, since
and further
m-i
d
dt m- i -
m-i-l A
1
_
r- A=l
Z t +
/_
1- t - -
dm- i
( 1 )
dtm-~• I-t
(,3.3 0 ) becomes
dm- i
m
Z c.
i=l
m-~
1
m 1. (l-t)
dt -
This can be rewritten as
r....J
¢(t)
=
~ c
(m-i)t
i=l m-i (l_t)m-i+l
It is well known that $\dfrac{1}{(1-t)^a}$ is the moment generating function of
$$f_a(v) \;=\; \frac{1}{\Gamma(a)}\,v^{\,a-1}e^{-v}\,,\quad v \geq 0\,;\qquad f_a(v) = 0\,,\quad v < 0\,.$$
Hence we can write the distribution whose moment generating function is $\displaystyle\sum_{i=1}^{m}\frac{c_{m-i}\,(m-i)!}{(1-t)^{\,m-i+1}}$ as
$$f(v)\,dv \;=\; \sum_{i=1}^{m} c^{*}_{m-i}\,f_{m-i+1}(v)\,dv\,.$$
This can be expressed in a slightly better notation by writing $j = m-i+1$, which gives the form stated in the theorem. This completes the proof of Theorem 6.
Special Cases.
(i) $p = 2$. The $k$th moment, on simplification, gives
(3.36) $\;\widetilde{v^k} = k!\,.$
The corresponding moment generating function is
$$\widetilde\phi(t) \;=\; \sum_{k=0}^{\infty}t^k\,,$$
which can be written as
(3.38) $\;\widetilde\phi(t) = \dfrac{1}{1-t}\,.$
From (3.38) we conclude that
$$f(v) = e^{-v}\quad\text{if } v \geq 0\,;\qquad f(v) = 0\quad\text{otherwise}\,.$$
(ii) $p = 4$. For $p = 4$ we have
$$\widetilde{v^k} \;=\; \frac{(k+2)\,k!}{2}\,.$$
The moment generating function for this, namely
$$\widetilde\phi(t) \;=\; \sum_{k=0}^{\infty}\frac{k+2}{2}\,t^k\,,$$
can be rewritten as
$$\widetilde\phi(t) \;=\; \frac12\sum_{k=0}^{\infty}\frac{d}{dt}\bigl(t^{\,k+1}\bigr) \;+\; \frac12\sum_{k=0}^{\infty}t^{\,k}\,.$$
This on simplification becomes
$$\widetilde\phi(t) \;=\; \frac{1}{2(1-t)^2} + \frac{1}{2(1-t)}\,.$$
The distribution of $v$ is therefore given by
$$f(v)\,dv \;=\; \tfrac12\bigl(v\,e^{-v} + e^{-v}\bigr)\,dv\,.$$
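As a quick check of this special case, the density integrates to one, $\int_0^{\infty}\tfrac12(v+1)e^{-v}\,dv = \tfrac12(1+1) = 1$, and its first moment is $\int_0^{\infty}\tfrac12\,v(v+1)e^{-v}\,dv = \tfrac12(2+1) = \tfrac32$, in agreement with $\widetilde{v^1} = (1+2)\cdot 1!/2 = \tfrac32$ from the moment formula above.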
(iii) $p = 6$. The moments in this case are given by the general formula with $m = 3$, which simplifies to
$$\widetilde{v^k} \;=\; \frac{(k+4)(k+2)}{8}\,k!\,.$$
The corresponding moment generating function is
$$\widetilde\phi(t) \;=\; \sum_{k=0}^{\infty}\frac{(k+4)(k+2)}{8}\,t^k\,,$$
which can also be written, as in the proof of Theorem 6, in terms of derivatives of $\sum_k t^k$. Following the argument used in Theorem 6, this simplifies to
(3.49) $\displaystyle\; \widetilde\phi(t) \;=\; \frac18\,\frac{d^2}{dt^2}\!\left(\frac{1}{1-t}\right) + \frac38\,\frac{d}{dt}\!\left(\frac{1}{1-t}\right) + \frac38\left(\frac{1}{1-t}\right)\,,$
which gives
(3.50) $\displaystyle\; \widetilde\phi(t) \;=\; \frac{1}{4(1-t)^3} + \frac{3}{8(1-t)^2} + \frac{3}{8(1-t)}\,.$
The distribution to which this refers is obviously
(3.51) $\displaystyle\; f(v)\,dv \;=\; \left(\tfrac18\,v^2 e^{-v} + \tfrac38\,v\,e^{-v} + \tfrac38\,e^{-v}\right)dv\,.$
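Again as a check, the three coefficients in (3.51) integrate to $\tfrac18\cdot 2! + \tfrac38 + \tfrac38 = \tfrac14 + \tfrac38 + \tfrac38 = 1$, and the moment generating function of (3.51) is $\tfrac14(1-t)^{-3} + \tfrac38(1-t)^{-2} + \tfrac38(1-t)^{-1}$, in agreement with (3.50).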
4. An integral equation due to Wilks.
S. S. Wilks [ ] considers
the moments and distributions of
some statistical coefficients related to samples from a multivariate
normal population, and exhibits a new method of attack.
He considers
two integral equations which he calls Type A and Type B, and uses
their solutions in deriving some now well known distributions.
The
first result adapted for the present use can be written as follows:
If the $k$th moment of $f(v)$ is of the form (4.1), where the $k$'s and $a$'s are real and positive and $B$ and the $a$'s are independent of $k$, then $f(v)$ is given by the integral representation (4.2), in which the constant factors involve $B$, powers of $v$, and $\Gamma(a_1)\Gamma(a_2)$. The integral in (4.2) can be expressed in elementary functions when $a_1 - a_2$ is half of an odd integer; and this case, as we shall see later, corresponds to the distribution of $v$ defined in (2.3) for even values of $p$. If, however, $a_1 - a_2$ is an integer, the integral is a Bessel function, and this situation arises if $p$ is odd. Before using (4.2) in finding the distribution of $v$, we shall, for the sake of completeness, add a note on Bessel functions.
5. A note on Bessel functions.
The equation
(5.1) $\displaystyle\; z^2\,\frac{d^2 w}{dz^2} + z\,\frac{dw}{dz} + (z^2 - n^2)\,w \;=\; 0$
is called Bessel's differential equation of order $n$, and Bessel functions are defined with reference to this equation. Its only singularities are at $z = 0$ and $z = \infty$. A solution in series of (5.1) near the origin can be obtained
by supposing that $w = \sum_i a_i z^{\,t+i}$ is a solution. It is found that the discussion can be divided into four cases.
(a) $n \neq i$ and $n \neq \frac{2i+1}{2}$, where $i$ stands for an integer. In this case there are two independent solutions, $J_n(z)$ and $J_{-n}(z)$, where
(5.2) $\displaystyle\; J_n(z) \;=\; \sum_{r=0}^{\infty}\frac{(-1)^r}{\Gamma(r+1)\,\Gamma(n+r+1)}\left(\frac{z}{2}\right)^{n+2r}\,,$
and $J_n(z)$ is analytic for all values of $z$ except possibly $z = 0$. It is called Bessel's function of the first kind.
(b) If $n = i$, an integer, $J_n(z)$ and $J_{-n}(z)$ are two linearly dependent integrals satisfying the relation
$$J_{-n}(z) \;=\; (-1)^n\,J_n(z)\,.$$
In this case the solutions are $J_n(z)$ and $Y_n(z)$, where
$$Y_n(z) \;=\; J_n(z)\log z \;-\; \frac12\sum_{r=0}^{n-1}\frac{(n-r-1)!}{r!}\left(\frac{z}{2}\right)^{-n+2r} \;-\; \frac12\sum_{r=0}^{\infty}\frac{(-1)^r}{\Gamma(r+1)\,\Gamma(n+r+1)}\bigl\{\psi(r) + \psi(n+r)\bigr\}\left(\frac{z}{2}\right)^{n+2r}\,,$$
where
(5.6) $\;\psi(r) = 1 + \tfrac12 + \cdots + \tfrac1r\,,\quad r = 1, 2, 3, \ldots\,,$ and $\psi(0) = 0$.
$Y_n(z)$ is called Bessel's function of the second kind.
(c) If $n = \frac{2i+1}{2}$, half of an odd integer, $J_n(z)$ and $J_{-n}(z)$ are two linearly independent integrals.
(d) If $n = 0$,
$$J_0(z) \;=\; \sum_{r=0}^{\infty}\frac{(-1)^r}{(r!)^2}\left(\frac{z}{2}\right)^{2r}$$
and $Y_0(z)$ are the two solutions; $Y_0(z)$ is Bessel's function of the second kind of order zero.
Sometimes a function $G_n(z)$ is used instead of $Y_n(z)$ or $J_{-n}(z)$ as the second solution of Bessel's differential equation. It is defined by
$$G_n(z) \;=\; \frac{\pi}{2}\;\frac{J_{-n}(z) - e^{-in\pi}\,J_n(z)}{\sin n\pi}$$
when $n$ is not an integer; and by (5.11), the limiting value of this quotient (the denominator becoming $2\cos n\pi$ after differentiation with respect to $n$), when $n$ is an integer.
If we put $z = iv$ in (5.1), the result is
(5.12) $\displaystyle\; v^2\,\frac{d^2 w}{dv^2} + v\,\frac{dw}{dv} - (v^2 + n^2)\,w \;=\; 0\,,$
which is known as Bessel's transformed equation. Two solutions of (5.12), namely
$$I_n(v) \;=\; \sum_{r=0}^{\infty}\frac{1}{\Gamma(r+1)\,\Gamma(n+r+1)}\left(\frac{v}{2}\right)^{n+2r}$$
and
$$K_n(v) \;=\; i^{\,n}\,G_n(iv) \;=\; \frac{\pi}{2\sin n\pi}\bigl[\,I_{-n}(v) - I_n(v)\,\bigr]\,,$$
are called respectively the modified Bessel functions of the first and second kinds of order $n$. If $n$ is a positive integer, $I_{-n}(v) = I_n(v)$, and
$$K_n(v) \;=\; \lim_{\epsilon\to 0} K_{n+\epsilon}(v)\,.$$
6. Distribution of $v$ for odd values of $p$.
In (2.24) we proved the result (6.1), which gives only the principal term in the value of $\widetilde{v^k}$. Since we are not using the exact value but only an asymptotic approximation for the value of $\int_0^{\infty} v^k f(v)\,dv$, the results obtained by the use of (4.1) and (4.2) cannot be presented as being final. Moreover, since the paper of Wilks referred to in Section 4 depends heavily on Stekloff's paper on the theory of moments [47], as applied to the problem of closure, which is not easily available, the distribution for odd values of $p$ is here presented on a heuristic basis. It may turn out to be the correct distribution, but it has to be left for further discussion and rigorization.
Consider again the equation (6.1). If $u = v^2$, then the $k$th moment of $u$ takes, by (6.3), the form of a ratio of gamma functions $\Gamma(k+a_1)\Gamma(k+a_2)$ to $\Gamma(a_1)\Gamma(a_2)$, apart from a factor $B^k$. Comparing (6.3) with (4.1), we have
(6.4) $\;B = 4\,,\qquad a_1 = \tfrac{p}{2}\,,\qquad a_2 = \tfrac{p+1}{2}\,.$
In this case (4.2) gives the expression (6.5) for $f(u)\,du$. Putting
(6.6) $\;u = v^2 \qquad\text{and}\qquad p = 2m+1\,,$
we get an integral of the form
$$\int_0^{\infty} x^{-m-1}\,e^{-\left(x+\frac{v^2}{4x}\right)}\,dx\,.$$
According to Watson [52, p. 183] the integral $\displaystyle\int_0^{\infty} x^{-m-1}\,e^{-x-\frac{v^2}{4x}}\,dx$ has been studied by Poisson, Glaisher, Kapteyn and others. The result stated in Watson is
(6.8) $\displaystyle\; K_m(v) \;=\; \frac12\left(\frac{v}{2}\right)^{m}\int_0^{\infty} x^{-m-1}\,e^{-\left(x+\frac{v^2}{4x}\right)}\,dx\,.$
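It may be remarked that (6.8) is the familiar integral representation of the modified Bessel function of the second kind; for instance, for $m = 0$ it reads
$$K_0(v) \;=\; \tfrac12\int_0^{\infty} x^{-1}\,e^{-x-\frac{v^2}{4x}}\,dx\,,$$
and the same function $K_0(v)$ reappears in Section 8 in connection with the case $p = 3$.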
Putting $m = 0, 1, 2, \ldots$ in this, we get the distribution of $v$; this reduces the distribution of $v$ for $p = 1, 3, 5, \ldots$ to the form (6.9).
7. The use of a differential equation in the evaluation of an integral.
In Section 6 we found that the distribution of $v$ involves the integral
$$\int_0^{\infty} x^{-\frac{p+1}{2}}\,e^{-x-\frac{v^2}{4x}}\,dx\,,$$
where $p$ is the number of variates in the underlying normal distributions.
A known technique for evaluating
$$\phi(v) \;=\; \int_0^{\infty} x^{-\frac{p+1}{2}}\,e^{-x-\frac{v^2}{4x}}\,dx$$
is as follows. Putting $x = \tfrac12 z^2$ gives
$$\phi(v) \;=\; 2^{\frac{p+1}{2}}\int_0^{\infty} z^{-p}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,.$$
Now we define
(7.3) $\displaystyle\; \Psi(v) \;=\; \int_0^{\infty} z^{-(p-2)}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,,$
where $v \geq 0$. Since the conditions for differentiation under the integral sign are satisfied, we differentiate $\Psi(v)$ with respect to $v$ and get
(7.4) $\displaystyle\; \Psi'(v) \;=\; -\,v\int_0^{\infty} z^{-p}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,.$
Similarly
(7.5) $\displaystyle\; \Psi''(v) \;=\; -\int_0^{\infty} z^{-p}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz \;+\; v^2\int_0^{\infty} z^{-(p+2)}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,.$
Now consider the identity
(7.6) $\displaystyle\; \int_0^{\infty}\frac{d}{dz}\left[z^{-(p-1)}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\right]dz \;=\; 0\,,$
which holds identically, the function in brackets vanishing at both ends of the range of integration. Carrying out the differentiation and using this identity together with (7.4) and (7.5), we obtain the following differential equation:
(7.8) $\displaystyle\; \Psi''(v) \;+\; \frac{p-2}{v}\,\Psi'(v) \;-\; \Psi(v) \;=\; 0\,.$
The value of $\phi(v)$ can be found by using the solution of this and the fact that
(7.9) $\displaystyle\; \phi(v) \;=\; -\,\frac{2^{\frac{p+1}{2}}}{v}\,\Psi'(v)\,.$
8.
The asymptotic distribution of
v for even and odd values of p.
We shall, in this section, derive the distributions of
v
again
by starting with the result,
1
(8.1)
2
p+l
OJ
.. 2'
x
-x-
e
v
we
dx
,
and
by evaluating the integral involved by the help of (7.8) and (7.9).
We divide the discussion into two cases.
Case A.
p := 2,
4, •.•
84
Let $p = 2$. The differential equation (7.8) in this case reduces to
(8.2) $\;(D^2 - 1)\,\Psi \;=\; 0\,,$
where the symbol $D$ stands for the operation of differentiation, and
(8.3) $\;\Psi(v) \;=\; A e^{\,v} + B e^{-v}$
is the solution of (8.2). Also, for $p = 2$,
(8.4) $\displaystyle\; \Psi(v) \;=\; \int_0^{\infty} e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz$
by definition. This gives
(8.5) $\;\Psi(0) = \sqrt{\tfrac{\pi}{2}} \qquad\text{and}\qquad \Psi(\infty) = 0\,.$
Thus
(8.6) $\;\Psi(v) \;=\; \sqrt{\tfrac{\pi}{2}}\;e^{-v}\,,$
and hence the value (8.7) of $\phi(v)$ follows, where $\phi(v)$ stands for the integral occurring in (8.1). Hence we have the result
(8.8) $\;f(v)\,dv \;=\; e^{-v}\,dv\,.$
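In more detail, the general solution of (8.2) is $\Psi(v) = Ae^{v} + Be^{-v}$; the condition $\Psi(\infty) = 0$ forces $A = 0$, and $\Psi(0) = \sqrt{\pi/2}$ then gives $B = \sqrt{\pi/2}$, which is the value (8.6) used above.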
Let $p = 4$. Here we need
$$\Psi(v) \;=\; \int_0^{\infty} z^{-2}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,;$$
but from (8.4) and (8.6),
(8.10) $\displaystyle\; \int_0^{\infty} e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz \;=\; \sqrt{\tfrac{\pi}{2}}\;e^{-v}\,.$
Differentiating both sides of (8.10) with respect to $v$ and dividing by $-v$, we get
(8.11) $\displaystyle\; \int_0^{\infty} z^{-2}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz \;=\; \sqrt{\tfrac{\pi}{2}}\;\frac{e^{-v}}{v}\,.$
Thus
(8.12) $\;\Psi(v) \;=\; \sqrt{\tfrac{\pi}{2}}\;\dfrac{e^{-v}}{v}\,.$
Hence from (8.1)
$$f(v)\,dv \;=\; \tfrac12\left(e^{-v} + v\,e^{-v}\right)dv\,.$$
Let $p = 6$. Here
(8.14) $\displaystyle\; \Psi(v) \;=\; \int_0^{\infty} z^{-4}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,.$
Using the reasoning of example 2, we get
(8.15) $\displaystyle\; \Psi(v) \;=\; \sqrt{\tfrac{\pi}{2}}\;\frac{v\,e^{-v} + e^{-v}}{v^{3}}\,.$
Therefore (8.16) follows, and this value substituted in (8.9) gives the following distribution for $p = 6$:
$$f(v)\,dv \;=\; \left(\tfrac38\,e^{-v} + \tfrac38\,v\,e^{-v} + \tfrac18\,v^{2}e^{-v}\right)dv\,.$$
The process can obviously be carried on to get the distribution of $v$ for all even values of $p$.
Case B. $p = 3, 5, \ldots$
(B$_1$) Let $p = 3$.
The differential equation satisfied by $\Psi(v)$ in this case reduces to (8.17), which is the modified Bessel equation of order zero, and is satisfied by $K_0(v)$. Therefore
(8.18) $\;\Psi(v) \;=\; c\,K_0(v)\,.$
But (8.19) expresses $\phi(v)$ in terms of $\Psi'(v)$, and $K_0'(v) = -K_1(v)$ (see, for instance, Watson [52, p. 79]). Therefore (8.20) follows, and substitution of this in (8.1) gives
(8.21)
(B$_2$) Let $p = 5$. Here
(8.22) $\displaystyle\; \Psi(v) \;=\; \int_0^{\infty} z^{-3}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz\,,$
while from the case $p = 3$,
(8.23) $\displaystyle\; \int_0^{\infty} z^{-1}\,e^{-\frac12\left(z^2+\frac{v^2}{z^2}\right)}\,dz \;=\; K_0(v)\,.$
Hence, on differentiating with respect to $v$ and transposing suitable factors, we get
(8.24) $\displaystyle\; \Psi(v) \;=\; -\,\frac{K_0'(v)}{v} \;=\; \frac{K_1(v)}{v}\,.$
Thus $\phi(v)$ is proportional to $\bigl(K_1(v) - v\,K_1'(v)\bigr)/v^{3}$, which, by using the formula
(8.26) $\;v\,K_n'(v) - n\,K_n(v) \;=\; -\,v\,K_{n+1}(v)\,,$
gives
( 8.27)
and consequently
as the distribution for
p
=
5.
This process can obviously be continued to obtain the asymptotic distribution of
v
for all odd values of p.
This section also shows that we get the same distribution of $v$ for $p = 2m$ by the two methods, namely
(1) the use of the moment generating function;
(2) the application of the integral equation given in Section 4.
9. Note on the construction of tables.
Case A ($p = 2m$).
The distribution of $v$ in this case is
(9.1) $\displaystyle\; f(v)\,dv \;=\; \sum_{j=1}^{m} b_j\,f_j(v)\,dv\,,$
where
$$f_j(v) \;=\; \frac{1}{\Gamma(j)}\,v^{\,j-1}e^{-v}\,,\quad v > 0\,;\qquad f_j(v) = 0 \quad\text{otherwise}\,,$$
and where the $b_j$'s are constants which can be found for any given integral value of $m$. The evaluation of integrals of the type $\displaystyle\int_x^{\infty} f(v)\,dv$ can
obviously be made to depend on tables of the $\chi^2$ distribution with even degrees of freedom. For illustration it will be enough to consider the cases $p = 2$ and $p = 4$.
For $p = 2$,
$$f(v)\,dv \;=\; e^{-v}\,dv\,,$$
and the substitution $v = \tfrac12\chi^2$ shows that
$$P\!\left(v > \tfrac{a}{2}\right) \;=\; P\!\left(\chi^2 > a\right)\,,$$
which gives the method of tabulating areas for the distribution of $v$. In this particular case it may be more convenient to use the tables of the exponential function.
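For example, $P(v > 2) = e^{-2} \approx .13534$, which is precisely the entry $P(\chi^2_2 > 4) = .13534$ appearing in the table below.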
For $p = 4$,
$$f(v)\,dv \;=\; \tfrac12\left(e^{-v} + v\,e^{-v}\right)dv\,,$$
and putting $v = \tfrac12\chi^2$ this becomes
(9.5) $\displaystyle\; \tfrac12\left[\,\tfrac12\,e^{-\chi^2/2} \;+\; \tfrac{\chi^2}{4}\,e^{-\chi^2/2}\,\right]d\chi^2\,.$
The two frequency functions inside the square brackets are $\chi^2$ frequency functions for two and four degrees of freedom. Consider the following table giving tail areas for these distributions, taken from Table 7 of Pearson and Hartley [35]:

$\chi^2$        4.0      4.2      4.4      4.6      4.8      5.0      5.2
2 d.f.   .13534   .12246   .11080   .10026   .09072   .08209   .07427
4 d.f.   .40601   .37962   .35457   .33085   .30844   .28730   .26739

Averaging these as suggested by (9.5), we have the following table for $P\!\left(v > \tfrac{x}{2}\right)$:

$x$             4.0      4.2      4.4      4.6      4.8      5.0      5.2
$P$      .27068   .25104   .23269   .21556   .19958   .18470   .17083
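The averaging of the two tail areas may be checked directly:
$$\int_{x/2}^{\infty}\tfrac12\left(e^{-v} + v\,e^{-v}\right)dv \;=\; \tfrac12\Bigl[\,e^{-x/2} + \bigl(1+\tfrac{x}{2}\bigr)e^{-x/2}\Bigr] \;=\; \tfrac12\Bigl[\,P(\chi^2_2 > x) + P(\chi^2_4 > x)\Bigr]\,,$$
which is the rule by which the entries of the table above were formed.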
From this table it is possible, by linear interpolation or by using the formulae for interpolation when the arguments are not equally spaced, to find the values of $x$ corresponding to $P = .25$, $.20$, etc.
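As an illustration of such an interpolation, $P = .25$ falls between the tabled values $.25104$ at $x = 4.2$ and $.23269$ at $x = 4.4$; linear interpolation gives $x \approx 4.2 + 0.2\,(.25104 - .25)/(.25104 - .23269) \approx 4.21$, that is, $v = x/2 \approx 2.11$.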
Similar remarks apply to the construction of tables for $p = 6, 8, \ldots$.
Case B ($p = 2m + 1$).
For this case we proved in Section 6 that the distribution of $v$ involves the modified Bessel function $K_m(v)$. Tables for these distributions can be constructed by using the series for $K_m(v)$ and integrating term by term.
10. Summary of Chapter III.
In this chapter we have discussed the distribution of $v = |n m_3|$ for large values of $n$. The $k$th moment $E(v^k)$ is found in Section 3 by following the methods of integration developed in Chapter II. These moments have been used in finding the asymptotic distribution of $v$ for even values of $p$ by the help of the corresponding moment generating function. For obtaining the large sample distribution of $v$ for odd values of $p$, use has been made of an integral equation due to S. S. Wilks.
CHAPTER IV
AN ASYMPTOTIC SERIES EXPANSION FOR THE DISTRIBUTION OF $w = |m_3|$
IN THE NULL CASE
1. Introduction.
Harter [18] has obtained the distribution of $m_3$ as a double series by starting with the joint distribution of $m_1$, $m_2$ and $m_3$ of Wald in the special case, called the null case, which has been the subject of our discussion in the preceding chapters. The series obtained by Harter would present difficulties in practical applications, since in any practical situation the number $n$, which is determined by the sizes of the two samples, will not be very small. For large $n$ the investigator wishes to use that distribution of $m_3$ in which the ratio of each term after the first to the preceding term is of order $n^{-1}$. It is also obvious that the main point in getting such a series is to obtain terms beyond the first. Of these, however, the second and third approximations are of chief interest and are doubtless easier to calculate than any of those of higher order. Because of these considerations, in this chapter we shall obtain the first three terms in the distribution of $w = |m_3|$ as an asymptotic series. For the first approximation the constant of integration will be found, and the method of finding the tail areas for the construction of tables will also be discussed. It might be noted that the statistic $w$ is $\tfrac1n$ times the statistic $v$ defined in Chapter II. Towards the end we shall also compare the results of this chapter with those of Chapter III.
2. An asymptotic series for the distribution.
Consider the joint probability distribution of $m_1$, $m_2$ and $w$, which is the same as the probability distribution of $m_1$, $m_2$ and $m_3$ except for the constant of integration, because $w = |m_3|$. Let $C$ denote the constant of integration. Then
(2.1) $\displaystyle\; f(m_1, m_2, w)\,dm_1\,dm_2\,dw \;=\; C\,(m_1 m_2 - w^2)^{\frac{p-3}{2}}\,\bigl[(1-m_1)(1-m_2) - w^2\bigr]^{\frac{n-p-1}{2}}\,dm_1\,dm_2\,dw\,.$
The region of integration is determined by
$$m_1 m_2 - w^2 > 0\,,\qquad (1-m_1)(1-m_2) - w^2 > 0\,,\qquad 1 \geq m_1 > 0\,,\qquad 1 \geq m_2 > 0\,,$$
which also determines the range $0 \leq w \leq \tfrac12$ for the variate $w = |m_3|$. To integrate with respect to $m_1$ and $m_2$, we shall keep $w$ fixed and put
(2.2) $\displaystyle\; x \;=\; \frac{m_1 + m_2}{2}\,,\qquad y \;=\; \frac{m_1 - m_2}{2}\,.$
This gives
(2.3) $\displaystyle\; f(x, y, w)\,dx\,dy\,dw \;=\; 2C\,(x^2 - y^2 - w^2)^{\frac{p-3}{2}}\,\bigl[(x-1)^2 - y^2 - w^2\bigr]^{\frac{n-p-1}{2}}\,dx\,dy\,dw\,.$
For fixed $w$, the two expressions in the brackets in (2.3) are zero on hyperbolas in the $(x,y)$-plane. Moreover, $x + y = 0$ and $x - y = 0$ are the asymptotes of $x^2 - y^2 = w^2$, and $x + y - 1 = 0$ and $x - y - 1 = 0$ are the asymptotes of the hyperbola $(x-1)^2 - y^2 = w^2$. The region of integration for $x$ and $y$ is thus the area enclosed by the two hyperbolas and is shown in the figure on the adjoining page. The coordinates of the points of intersection $A$ and $B$ of the two hyperbolas are
$$A \;=\; \Bigl[\tfrac12\,,\;\bigl(\tfrac14 - w^2\bigr)^{\frac12}\Bigr]\,,\qquad B \;=\; \Bigl[\tfrac12\,,\;-\bigl(\tfrac14 - w^2\bigr)^{\frac12}\Bigr]\,.$$
The probability distribution of $w$ will be given by the following double integral:
(2.4) $\displaystyle\; f(w)\,dw \;=\; 2C\int_{y=-\sqrt{\frac14-w^2}}^{\sqrt{\frac14-w^2}}\int_{x}(x^2-y^2-w^2)^{\frac{p-3}{2}}\bigl[(x-1)^2-y^2-w^2\bigr]^{\frac{n-p-1}{2}}\,dx\,dy\;dw\,,$
the $x$-integration extending between the two hyperbolas. Put
(2.5) $\displaystyle\; \frac{p-3}{2} \;=\; r\,,\qquad \frac{n-p-1}{2} \;=\; q\,;$
also replace the positive root $\sqrt{y^2+w^2}$ by $a$.
To perform the integration with respect to $x$, we shall suppose $y$ to be constant. Using (2.5) and noticing the symmetry of the integrand in $y$, we can write (2.4) as $f(w)\,dw$, where
(2.6) $\displaystyle\; f(w) \;=\; 4C\int_{y=0}^{\sqrt{\frac14-w^2}}\;\int_{x=a}^{1-a}(x^2-a^2)^{\,r}\,\bigl[(x-1)^2-a^2\bigr]^{\,q}\,dx\,dy\,.$
Let (2.7) define the new variable $v$ by
$$(1-x)^2 \;=\; a^2 + (1-2a)(1-v)^2\,.$$
This transformation sets up a one-to-one correspondence between the values of $x$ and the values of $v$; furthermore, as $x$ increases from $a$ to $1-a$, $v$ increases monotonically from zero to one. From (2.7) we have the following:
(2.8) $\displaystyle\; x \;=\; 1 - \sqrt{a^2 + (1-2a)(1-v)^2}\,,$
·
e
97
1
1-2a
2 7 - "2
v +-.....
~ v _
dv •
(l-a)
(2.11)
To examine the convergence of the series in $v$ which will be obtained as a result of this transformation, we regard $v$ as a complex variable and equate to zero the quantity under the radical sign in (2.9). Thus, if $v_0$ denotes a singularity, then
$$(1-v_0)^2 \;=\; \frac{-\,a^2}{1-2a}\,,\qquad\text{which gives}\qquad v_0 \;=\; 1 \pm \frac{i\,a}{\sqrt{1-2a}}\,.$$
This shows that the two singularities are situated on the line parallel to the imaginary axis at unit distance from it, and are equidistant from the point $1$. Also
$$|v_0|^2 \;=\; \frac{(1-a)^2}{1-2a}\qquad\text{or}\qquad |v_0| \;=\; \frac{1-a}{\sqrt{1-2a}}\,.$$
Both the singularities lie outside the unit circle around the origin, since $\dfrac{(1-a)^2}{1-2a} > 1$ because $a^2 > 0$.
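Written out, the modulus of the singularities is
$$|v_0|^2 \;=\; 1 + \frac{a^2}{1-2a} \;=\; \frac{1-2a+a^2}{1-2a} \;=\; \frac{(1-a)^2}{1-2a}\,,$$
which is the quantity appearing in the inequality just used.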
Using the transformation from x
to
v, we get
.
e
98
1
(1-2a)
(l_a?+l a
r
(
via
v +
We write (2.14) after expanding the last two binomials, but
omitting terms involving cubes and hipher powers of
v
since they
will not affect the first three t8rms of the desired asymptotic series.
This gives
/1
w2
v'4-
/
I
.J
(1_2a)q+r+l
( l_a)r+l
a
r
y=O
1
J'
vr (1_v)2q+l
v +
v-=O
22
r 2 L- (r-l ) (l-~a+a )
2(1-a)
4a
~.
.
2
a( 1-2 a L7v + •••
2
(a +4a-2)(1-2a) v 2 • .• _ 7dv dy
_/ -1+ 1-2a ) 2 v - ->----~.-;..r.4----(l-a
?(l-a)
•
jl
l
,
e
99
This can further be reduced to
2
(a +4a-2)(1-28)
4
2(I-a)
+
2
7
r(1-3a+a )(1-2a)
)3
- +
2a(1-a
(2.17)
and
dy
and r
q
by
p, gives
n-2 p-3
E+ 1
(
2 T
f(w)=2 {' C ) 11-2a) ~
-"Y
dv
v after replacing
Integration with respect to
their values in terms of n
"}
co.
r(n-p+l)r(9)
r(
(I-a) ?
2
2n-p +l)
2
(
1+ /- 1-2a + (p-3)(1-3a+a ) 7 p-l +/- p-3)(p-5)(1-3a+a
- (l_a)2
4a(l-a)
- 2n-p+l 32a~(I-a)2
(p-3)a(I-2a)
).j.(I_a)2
2
<l+4a-2)(1-2a)
(P-3)(1-3a+a )(1-2a) 7.
2(I-a)4
+
2a(l-a)3
-
2)2
2
(n -1)
in which a
}'rl
.... J
+
(2n-p+l)(2n-p+3)
Y ,
,)
is an abbreviation for
;,.....2--~2
VY
+w •
We shall write
(2.18)
To integr8t€ with respect to y we make the transformation
(2.19)
Z
=
122
2w _ 2 / yc.+wc.
2w-l
The limits C)f integret ion for z
yare zero and
tion of
y.
;r-:;
Vu
-w.
Also
will be zero and one, since those of
z
is a monotonically increasing func-
This transformation will change the integrand essentially
into the product of two factors, one of which is a high power of
and the other a series of ascending powers of
z.
l-z
Thus (2.19) will
change the integral into a sum of beta functions suitable for giving
an asymptotic series for the distribution of
W.
To effect this transformation we have to find the values of
various factors involved, and we hAve
(??O)
,
101
1-2w
2
2
2 /y+w=w_l+
z+
W
(2.21)
1
r
1- ( y2+w 2)2 = ( 1 - w)_ 1 - 1-2w
2(1-w) z _ 7
,
1
2 = ( 1 - 2w)(1 - z )
1-2(y2+2
w)
(2.23)
,
and
1
dy
(1-2w)2/-2w
dZ
-+-
7
(1-2w)z
1
1
..,
2
2
2z L-4w+(1-2W)z_7
The singularities
zl" z2 and z3 of the resulting series in
are determined by the equations
1-2w
l+-W
(2.26)
z=O
l-2w
1 - 2C1-w) z
=0
and
1 +
respectively.
""'l-2w
1'iW z = 0
z
..
102
From these we have
2w
(2.28)
zl =
~w-l
(2.?9)
z2 =
2(1-w)
1-2w
and
(2.30)
Z
3
4w
::::
~
Since the range of w is from 0
to ~, we find from the above three
equations
(2.31)
-00
(2.32)
<
zl < 0
2 < z2 <
00
and
(2.33)
-00
< z3 < 0
In otherwords, two of the singularities lie on the negative half ef
the real axis and one on the positive side in the
z
?lane.
To be able to get a convergent series in $z$ we have to make sure that these singularities do not lie in the unit circle around the origin. To examine this, we proceed to find the range of values of $w$ for which $|z_i| > 1$:
(i) $|z_1| > 1$ if $\dfrac{2w}{1-2w} > 1$, that is, if $w > \tfrac14$;
(ii) $|z_2| > 1$ if $\dfrac{2(1-w)}{1-2w} > 1$, that is, if $2 > 1$, which is always true;
(iii) $|z_3| > 1$ if $\dfrac{4w}{1-2w} > 1$, that is, if $w > \tfrac16$.
These investigations indicate that the resulting series in $z$ will converge for $w > \tfrac14$, which does not cover the whole range of values of $w$, which is zero to $\tfrac12$. We shall, however, proceed to make this transformation and subsequently find the probability distribution of $w$ as a series of powers of $\tfrac1n$. Even the first approximation of the resulting series will be shown to give close results, especially for finding the right-hand tail areas.
Makinr the transformation (2.19) in (2.17), we get:
n-l p-2
--r --r
(2.34)f(w)=C
(1-2w) -P~l
1
2(1-w) 2""
1
)
1
- ~(l
"
z
n-2
-z
p-l
p~l
)7'1
~
+
2
-zw- Z_ 7
1-2w
'1
L
-
1-2w
2(1-w)
z
7- 2
z=o
1
- '2
L-l+
14w2w z_7
l~ 1
+
p-l \iiI
rI. (
2n-po4ol
z,w )
,
+
where
~l(z,w)
and
¢2(z'w)
can be written down after making the
transformation in the r21evant factors in ('">017).
To get three terms
in the probability distributi.on of w we need only retain the term
independent of
z
and the term containinR
z
from
¢l(z,w) and the
.
.
e
104
term independent of z from ¢2(z,w).
If we retain only those terms in the various expansions involved
in (2.34) which contribute to the first three terms of the desired
series, we get
n-l p-2
(2.35)
few) =
22
(1-2w)
CL
l
w
p-l
2
2(1-w)
n-2
z -2(1_z)2£1+ (P-l)~;-2W)
1.
j
1
z+
z=O
2
2
Ll+ ( p-l)(1-2w ) z+ ( P -1)(1-2w)
4 (l-w)
l-2w
/ 1- ~z
-
-
oW
p-l
2n-p+l
3:?(1-w)2
2
z +
3(1-2w)2 2
z +
l28w2
(L-
2
l-2w + (P-3)(1-3w+w ) 7
(1_w)2
4w(1-w)
-
2
-(1-2w) ---..".-+
l-2w
+/
- (1_w)3
(1_w)2
(p-3)w(1-2w)
4tl~w)~
_.
)
J}
2
(w +4w-?)(1-2w) + (P-3)(1-3w+w 2 )(1-2w)
+ ...
2(1-w)4
2w(1-w)3
Further simplification gives
dz •
.
105
n-l
p-2
22
= c (1-2w)
1
.
w
p-l
2(1-w)
J
z=o
1
1
z
1-2w
oW '
2(1
7
- ~- Z+
3(1-2w) 2
-
2
n-2
)2[1 /- (p-l)(1-2w)
-z
+_.
4w
/-(P-l)(P-3)(1-2W)2
232w
128w2"~
+
(2.35).
(p-l) 2 (1-2w) 2
(p-l)(1-2w )2
16w(1-w)
- 32w(1-w)
A, Band E are functions of
(
p-l
ll+2n-p+l
L- A+Bz+. '•• ,_7+
L- E+ ..._7+
] dz .'
w, and are known explicitly from
For the sake ~f brevity we write this
n-l p-~
27
few) :: c(l-2w)
w
1
p-l
2"
2(1-w)
I
. (p-l)(1-2w)
4(1-w)
2
(p _1)(1_2w)2
2
32(1-w)
)
p-l /7
p2_1
~l + 2n-p+l _ A + Bz+ ...... + (2n-p+l)(2n-p+3)
where
~
j
1
1
as
n-2
z - 2(1_Z) 2L-l+Alz+l\Z?+ ..
,_7,
z=o
.:r>~~l:;
(2n-p+-l)(2n-p+3)
L-E+ .' "_7+
,.,
}
dz.
.
e
106
The integral involved in this can be writt.en
1
J
(2.38)
n-2
z- 2(1_z)-21
(l+L- ~n~;~l
A+A
l
z_7
~
+
z=o
,
where the terms in curly brackets are arranged in three blocks according as they contribute to first, second and third approximations.
z, we have from these
Integrating with respect to
p-l A + -!- A 7
2n-p+l
n+l l -
+ _/
-
2
(p -l)E
p-l
(
(2n-p+l)(2n-p+3) + (2n-p+l)(n+l) .B+AA1 )
where
A=
2
(p-3) (1_3w+w )
1-2w
"2 +
4w(1-w)
(l-w)
,
.
107
e
B=
(1-2w)~ _ 1-2w
(1_w)3
A
(1_w)2
~ (P-3)(4w4-11w2+7W- 1 )
8w 2(1_w)2
= (p-1)(1-2w) + (p-1)(1-2w)
4w
1
4(I-w)
B = (p-1)(P-3)(1-2w)2
1~2w
-aw'
Ip2~1)(1-1W)2
+
32w2
-1
-
j
32(1-w)2
(p_1)(1_2w)2
32w(1-w)
)o-1)(1-2w)2
32w 2
.
and
2 2
2
E = (P-3)~p-5)(~-3W+W) _ .(P-3)w(1- 2wl _ (w +4w-2)(1-2w)
32w (l-w)
4(1-w)2
2(1-W)4
2
(P-3)(1-3w+w )(1-2w)
2w(1-w)3
+
Furthermore,
$$\frac{1}{2n-p+1} \;=\; \frac{1}{2n}\left(1-\frac{p-1}{2n}\right)^{-1} \;=\; \frac{1}{2n} + \frac{p-1}{4n^2} + O\!\left(\frac{1}{n^3}\right),$$
$$\frac{1}{n+1} \;=\; \frac{1}{n} - \frac{1}{n^2} + O\!\left(\frac{1}{n^3}\right),$$
$$\frac{1}{(2n-p+1)(2n-p+3)} \;=\; \frac{1}{4n^2} + O\!\left(\frac{1}{n^3}\right),$$
$$\frac{1}{(2n-p+1)(n+1)} \;=\; \frac{1}{2n^2} + O\!\left(\frac{1}{n^3}\right),$$
$$\frac{1}{(n+3)(n+1)} \;=\; \frac{1}{n^2} + O\!\left(\frac{1}{n^3}\right).$$
Using these in (2.39), the first three termg of the series
can be written as follows:
n-l
p-2
p-l
(2.40) f(w)dw "" It (1_2w)-r w "'"T(l_w)
2[1+ fj- E? A + A1 _7 +
Explicit expressions for the first, second and third approximations
can be written by substituting the values of
(2.39).
A, B, A ,
l
~2
and E from
This series can be written as
(2.41) few)
= Kfl(w)L-l+ ~A(W)+ ~~(w)+ O(~) _7 ,
n
n
1
and is such that the ratio of each term to the preceeding term is ,
n
and is the desired asymptotic series.
3,
The constant of integration for the first approximation.
The first approximation to the distribution of w is
p-2
p-l
~()- -,-
(3.1)
f ( w) = KlW
where
1
O<w<-.
- 2
l-w
n-l
(1-2w)
~
+ O(n
-1
)
The cons tant of integr ation can be found by start-
ing with the constant of integration of the triple integral found in
.
109
Chapter II, or directly by integration from (3.1), as follows:
(3.2) $\displaystyle\; \frac{1}{K_1} \;=\; \int_0^{1/2} w^{\frac{p-2}{2}}\,(1-w)^{-\frac{p-1}{2}}\,(1-2w)^{\frac{n-1}{2}}\,dw\,.$
Let $1-2w = y$; then
(3.3) $\displaystyle\; \frac{1}{K_1} \;=\; \frac{1}{\sqrt2}\int_0^{1} y^{\frac{n-1}{2}}\,(1-y)^{\frac{p-2}{2}}\,(1+y)^{-\frac{p-1}{2}}\,dy\,.$
By comparing this with the hypergeometric integral, we can write
(3.4) $\displaystyle\; \frac{1}{K_1} \;=\; \frac{1}{\sqrt2}\;\frac{\Gamma\!\left(\frac{n+1}{2}\right)\Gamma\!\left(\frac{p}{2}\right)}{\Gamma\!\left(\frac{n+p+1}{2}\right)}\;F\!\left(\frac{p-1}{2},\,\frac{n+1}{2},\,\frac{n+p+1}{2},\,-1\right).$
Using the formula
(3.5) $\displaystyle\; F(a, b, c, x) \;=\; (1-x)^{-a}\,F\!\left(a,\,c-b,\,c,\,\frac{x}{x-1}\right),$
we get
(3.6) $\displaystyle\; \frac{1}{K_1} \;=\; \frac{1}{\sqrt2}\;2^{-\frac{p-1}{2}}\;\frac{\Gamma\!\left(\frac{n+1}{2}\right)\Gamma\!\left(\frac{p}{2}\right)}{\Gamma\!\left(\frac{n+p+1}{2}\right)}\;F\!\left(\frac{p-1}{2},\,\frac{p}{2},\,\frac{n+p+1}{2},\,\frac12\right).$
Since the third parameter in the hypergeometric series involved is large, the value of the series can be approximated by $1$ for large $n$; that is,
$$\lim_{n\to\infty} F\!\left(\frac{p-1}{2},\,\frac{p}{2},\,\frac{n+p+1}{2},\,\frac12\right) \;=\; 1\,,$$
and we can write
(3.8) $\displaystyle\; \frac{1}{K_1} \;\approx\; \frac{1}{2^{\frac{p}{2}}}\;\frac{\Gamma\!\left(\frac{n+1}{2}\right)\Gamma\!\left(\frac{p}{2}\right)}{\Gamma\!\left(\frac{n+p+1}{2}\right)}\,.$
Using Stirling's approximation, this can be further simplified, giving approximately the constant of integration in the first approximation.
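The limiting value $1$ of the hypergeometric factor holds because its third parameter grows with $n$ while the other arguments remain fixed; the second term of the series, for instance, is
$$\frac{\frac{p-1}{2}\cdot\frac{p}{2}}{\frac{n+p+1}{2}}\cdot\frac12 \;=\; O\!\left(\frac1n\right),$$
and each later term is of still smaller order.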
4.
The tail areas for the first approximation.
Let
The transformation
y = l-2w reduces it to
I-x n-l
T
y
)
~-l
p-2
(l-y)
T
(l+y)
--'2
dy
a
This is the kind of integral we will have to evaluate for finding probabilities of the tyoe
pew > ~).
W2 write the integral involved in (4.2) as:
t
(L~. 3)
IJ,(t)
=
J
0
n-l
y""2 'A( y ) dy
,
III
where
A(Y)
=
(l-y)
~
(l+y)
- E;l
2
• Integrati(m by parts gives
t
n+l
T
~(t) = n~l Y
A(Y)
2
t n-l
2)
- n+l
o
1
y2 A (y)dy
0
This I!i ves
n+l
jo
2
""2A(t )- ~
2
~(t) = ~
n+J. t
n+J.
t
n+l
2
Y
1
A (y) dy
•
Performing another integration, we get
n+3
4-r
(4.6)
(n+l)(n;3) t
__ 4
t
jo
(n+l)(n+3)
From
(4.5)
~
I
A (t)+
rl
A (y) dy
y
•
we can write
n+l
2
T A(t)
1f.l.(t)-n+It
I
=
2
---1
n+
j
t
n+l
yT
A1 (y) dy
,
o
p-2
1~here
),,(y) =
p-l
""T
- T
(l-y)
(l+y)
o
-< y -< 1
(4.8)
I
and this shows that the maximum value of A (y) corresponds to the
112
minimum value of y, which is zero, and
< B
:=
2p-3
By
2
•
taking account of thi.s bound we can write (4.7) as
j
t
n+l
y2 dy
,
o
which is the same as
n+3
n+1
2
~(t) - n+21 t
A(t)
(4.11)
t
The range of
that given
2
(2p-3)2t
<
t n+1)(n+3)
is zero to one, and thus equation (4.11) asserts
&, we can find
such that
N
n+1
~(t) ~ ~
t~
n+l
I
A(t)
< e for n > N
•
n+l
2
---1
n+ t
T
A(t)
can therefore be taken as an asymptotic approxima-
tion of ~(t). Using this in (4.2) we get
1 P
2"2
n+1
?
p-?
T
¢(x)= 2 n (l-x)
x
<..~-x)
r(~) (n+1)
p-1
-7
[1
+
o(n- 1 )]
lATe can write th1.s as
'21 2p-2
. (
4.14)
pew >
x)
'2 =
2 n
r(~)
n+1 p-2
p-1
T
T(
T
1
(l-x)
x
2-x)
..C 1+0(n-)
7
..
113
which provides a formula suitable for finding the tail areas of the
5.
Comparison with the results of Chapter III.
We have shown in this chapter that, if we omit terms of order $\tfrac1n$,
$$f(w)\,dw \;=\; C\,w^{\frac{p-2}{2}}\,(1-w)^{-\frac{p-1}{2}}\,(1-2w)^{\frac{n-1}{2}}\,dw\,.$$
To get the corresponding first approximation for the distribution of the statistic $v$, we put $w = \dfrac{v}{n}$ in this, and get
(5.2) $\displaystyle\; f(v)\,dv \;=\; \frac{C}{n^{\frac{p}{2}}}\;v^{\frac{p-2}{2}}\left(1-\frac{v}{n}\right)^{-\frac{p-1}{2}}\left(1-\frac{2v}{n}\right)^{\frac{n-1}{2}}\,dv\,.$
Since
$$\lim_{n\to\infty}\left(1-\frac{2v}{n}\right)^{\frac{n-1}{2}} \;=\; e^{-v}
\qquad\text{and}\qquad
\lim_{n\to\infty}\left(1-\frac{v}{n}\right)^{-\frac{p-1}{2}} \;=\; 1\,,$$
we can write (5.2) as
$$f(v)\,dv \;=\; \text{Const.}\;v^{\frac{p-2}{2}}\,e^{-v}\,dv\,\Bigl[\,1 + O\!\left(\tfrac1n\right)\Bigr]\,,$$
which states that for large $n$, $v$ is approximately distributed as $\tfrac12\chi^2$, where the $\chi^2$ has $p$ degrees of freedom.
This is not in agreement with the probability distribution of $v$ obtained in Chapter III, and this discrepancy can be easily explained by the fact that the convergence of the series in $z$ was not guaranteed for the whole region of $w$. It was seen on pages 101-103 that convergence of the series in $z$, from which we obtained $f(w)$ by integration, would be obtained only if $w > \tfrac14$. However, it appears that the two distributions would give close results if we are interested in the tail areas. As an illustration we shall find approximately the 5 per cent point for $p = 4$ by the results of the two chapters.
Example. To find $x$ such that $P(v > x) = .05$ in the two cases.
(1) From Table 7 of [35] we have the following values for the probability integral of the $\chi^2$-distribution.
$P(\chi^2 \geq \chi^2_0)$:

$\chi^2_0$        7.8      8.0      8.2      8.4      8.6      8.8
2 d.f.     .02024   .01832   .01657   .01500   .01357   .01228
4 d.f.     .09919   .09158   .08452   .07798   .07191   .06630
Sum        .11943   .10990   .10109   .09298   .08548   .07858
½ Sum      .05972   .05495   .05055   .04649   .04274   .03929
115
The last row gives probabilities of the type $P(2v > \chi^2_0)$, that is, $P(v > v_0)$ say, and shows that approximately $P(v > 4.1) = .05$ if we use (1). Also from the same table we find that $P(v > 3.9) = .05$ if we use (2).
6.
Summary of Chapter IV.
In this chapter we have considered the problem of obtaining an asymptotic series for the distribution of $w = |m_3|$ by starting with the joint distribution of $m_1$, $m_2$ and $m_3$ in the null case. The desired distribution is obtained by integrating out $m_1$ and $m_2$ over a lens-shaped region enclosed by the two hyperbolas $m_1 m_2 = w^2$ (a constant) $= (1-m_1)(1-m_2)$. To perform the two integrations, one with respect to $x = \tfrac12(m_1 + m_2)$ and the other with respect to $y = \tfrac12(m_1 - m_2)$, we have, at each stage, regarded the other variables as constants and found a transformation which changes the integrand into a function of the form $z^{\,c_i}(1-z)^{\,b_i}$, where $b_i$ is a large number and where $z$ varies from zero to one. This leads us to a result of the type
$$f(w) \;=\; K\Bigl[\,f_1(w) + \tfrac1n f_2(w) + \tfrac1{n^2} f_3(w) + \cdots\Bigr]\,.$$
The first three terms of the asymptotic series have been obtained in this manner. For the first approximation we have also found the constant of integration, and discussed the method of finding tail areas. In Section 5 we have compared the results of this chapter with those of Chapter III.
•
CHAPTER V
THE APPLICATION OF TCHEBYCHEFF-MARKOFF INEQUALITIES
TO A SPECIAL CASE
1.
Introduction.
This chapter will be confined to the discussion of the
special case p
=J
and N + N
2
l
= 20.
In this case, starting
from the joint distribution of rol' m2 and m , we shall find the
J
moments of the exact statistic V.
These moments will then be
used in setting up bounds to probabilities of the type P(V
~ ~)
by the use of some investigations due to Tchebycheff and Markoff
~48_7, ~49_7.
This will provide, on the one hand, some exact
results of some importance for this case, and on the other hand
illustrate what can be done when the first few moments of an
unknown distribution are known.
2•
The integral over D •
l
As before, we shall denote by 1
1
and 1
2
the integrals
over Dl and D • Thus
2
(2.1)
Expanding the integral by the use of the binomial theorem, we
get
•
117
(2.2)
Iff
Each of the integrals in this sum can be calculated by
first putting
m,= (m1m2t) 1/2 to integrate with respect to t,
then following the procedure of section 6, Ctnapter II.
The result
is
(2.')
II = rr.
1
90 • 16 + 32
I
1
16 • 98 • 12 •
11
.
. ,01
l} +
1
+
+ ~1~28n--·~3~2-·~1~2~0---.~8
• 11 + 48
• 88 •
12 +
1
1
;2 • 26 • 88 • 8 + 32 • 1,0 • 64 • 6
256. 128 1• 16 • 17 - 7,
which is therefore the integral over D •
l
3.
The integral over D •
2
This has been found exactly in Chapter II} but anindepen-
dent derivation based on geometrical considerations could be
given here.
Iff
The value of 1 2
~e
want will be obtained by putting n
= 18.
•
11:
(3.2)
In (3.1)
Then let v
J2m 3 =
and
put
u = (m1 + m2 ) //2
and
v
=r
= (m1
/.j2 .
• m)
2
cos Q
z = r sin G •
Then
./2
(3.3) I = 2
Ju2+ 2(1-
1
12 u)
j
/
!2
/
.2n
1
u=r=O·
2
(1-
2 n-4
V2u+ u;r )
2·
9=0
J2
Integration with respect to G and r is immediate, and yields'
{2
(3.4)
12
= 2n
1'
J
2 n-2
u
2
(1- V2 u + 2)
.Tr5
du.
1
12
Puttingu =
02x
and integrating we obtain
=
I
2
'J(
(n_1)(n_2)2n-3
and for n = 18, it reduces to
,
2d9drdu .
•
IIJ
(3.6)
4.
1
2
;=:
1t
--..-;.;.---=-.
...
15
17 • 16 • 2
The integral over D.
The value of I, the integral over D, is
(4.1)
5•
= 18
• 17 .
4
Moments of V.
The kth moment
~
f(m~m2m3)
Due to the symmetry of
k
ml ,m2 and m in m ,E(V )
3
3
Putting k
= 2,
is given by :
=0
f~r
the joint density function of
odd values of k.
4 and 6 io(5.1) and integrating as in
Section 2, we get the following moments:
~2
(5.2)
~4
~6
= 6.5571637176,
= 459.6304942728,
= 25661.8~65464
>
We knm'1 that
V
one inside D.
Thus, since the range of 18m is from -9 to 9,
3
since V equals n m divided by a
3
quantity which cannot exceed one, and in fact remains less than
•
12 f1
the range of V in the case under
-a to a where a
cons~eration
is larger, say from
> 9.
We shall now use the moments obtained above iI, setting up
bounds for P(V
<
~)
to get an idea of the exact simpling distribution
of V.
6.
Some results due to Tchebycheff and Markoff.
We shall, in this section state without proof the results
which lead to the historic inequalities which were announced by
Tchebycheff and proved by Markoff, and which we shall use with the
moments obtained in the preceeding sectioD.
Theorem I.
Any three consecutive polynomials in an arbitrary
sequence fp.(x) 1 of orthogonal polynomials satisfy the relation
1
(6.1)
p (x) = (ax + b ) p
m
m
m
m- lex) - cmpm- 2(x) ,
where p (x) stands for the mth orthogonal polynomial.
m
a ,b and c are constants, a > 0 and c > O.
m m
m
m
m
Here
If the highest
coefficient of P (x) is denoted by k
m ,we have
m
a
km
= -km-l
m
an d c
<:l
.m
= ...m a m_1
The recurrence formula (6.1) is also true form = 1 if we
define p_1(x)
= O.
•
121
Theorem II.
The roots of the equation pm(x)
=0
, where pm(x) is
the orthogonal polynomial of degree m associated with the weight
function cx(x) on the interval (a,b) Jare all real and distinct;
and all of them lie in the range of definition of the polynomials.
Theorem III.
If
oo(x) ,
z-x
where o(x) is the weight function of the system of polynomials, then
wls satisfy the same relation (6.1) as the pIS, though with different
initial conditions.
Definition.
Let
m
= r.
(6.4)
where
p.
_1_
i=l z-C i
~
m(z) is
or
,
degree m-l in z, whereas p (z) is of degree m.
m
The c i are the roots of Pm(z)
= O.
Suppose a
< cl < c2
where (a,b) is the range of the basic function Q(x).
called the Christoffel numbers.
•••
< cm < b,
Then p. are
1
•
122
Theorem IV.
The Christoffel numbers are positive, and
b
~ Pi
m
i=l
J
=
~(x)
= a(~)
- a(a).
a
Note:
Because of theorem IV there exist numbers d
l
< d 2 < ... < dm_l
lying between a and b such that
Theorem V.
••• d
m- 1
that is
more precisely
i
L P.
j:::l J
<a(c i + l - 0) - a(a), i::: 1,2, ... m - 1 •
That is if F(x) is the class of cummulative distributions
the given moments, then
having
•
123
j
a
7.
c
. d
i
dF(x)
<
J
J
Ci
i
dF(x) =
~,
j=l
p. <
J
+1
dF(x)
a
a
Application of Tchebycheff-Markoff Theorems to this Example.
We have the following matrix of the moments of the distribution
studied in this chapter:
\-lO III 1-12 11
3 114
III 1l'2 11 3 114 115
=
112 11 3 114 11 116
5
11 3 114 11
0
6.557163716
0
459.630494
5 116 11 7
'"
6.557163716
0
0
459.6}0494
459.·630494
1
0
459.630494
0
256661.8465
0
256661.8465
0
,
/"
in which all four principal diagonel matrices are positive definite.
Let
(7.1)
•
124
be one of the orthogonal polynomials in the sequence corresponding
to the frequency function of V which gives rise to the moments found
in Section
5. Then, by the definition of orthogonal polynomials,
(f.e:)
Taking Gk(x)
= xk
for k
= 0,
zero, we get the following
and 2 and noting that odd moments are
e~uations
for finding the coefficients lD
(7.1):
(7.3)
and
6.5571637176ao + 459 .63049427~8a;;; + 256661.8465464a4 + 0
Taking a4
(7.4)
=1
in (7.3), we obtain
o = 353"2.38864
ar-c: = -608.80~7~4
a
and
as the solution of'7.3).
•
l~
Thus
Solving P4(x)
= 0 we
get the following four values of x ,
arranged in increasing order of magnitude
Now the function
given in
(6.2~,
(7.7)
~4(z)
can be found using its definition
which gives
~4,(Z)
=E
x
(z
4
4
- x· ) +
2
B (Z.
2
-
2
x )
;
z - x
and, using (5.2) and (7.4), this becomes
(7.8)
The Christoffel Numbers.
The Christoffel numbers as defined in Section 6 are the
numbers p. given by
1
•
126
m
Pi
= L:
-
i=l z - 01
So we have to split
(7.10)
into partial fractions.
(7.11)
Write (7.10) as
+
p~)
z + 3.423
z + 34. 26
P3
----+~---
z- 3.423
+
z - 34.726
Comparing the coefficients of like powers of $z$ in (7.10) and (7.11), we get
$$p_1 = p_4 = .251\,,\qquad p_2 = p_3 = .248$$
approximately, where $p_i$ corresponds to $c_i$, the $i$th root of the equation (7.5).
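As a partial check on these values, Theorem IV requires the Christoffel numbers to add up to $\mu_0 = 1$; here $2(.251) + 2(.248) = .998$, which is $1$ to the accuracy retained.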
Thus we get the following table giVing bounds to probabilities
by using the formula, (6.5).
•
127
Table 7.12
Limits for $\xi$                          Bounds for $P(V \leq \xi)$
$\xi \leq -34.726$                        $0 \leq P \leq .251$
$-34.726 < \xi \leq -3.423$               $0 < P < .499$
$-3.423 < \xi \leq 3.423$                 $.251 < P \leq .747$
$3.423 < \xi \leq 34.726$                 $.499 < P \leq 1$
$34.726 < \xi$                            $.747 < P \leq 1$
It can be seen that the bounds given above are far from
being close.
For obtaining bounds which are sufficiently close
and therefore useful we would have to calculate a large number of
moments.
The labor involved in finding enough moments, and pro-
ceeding with subsequent investigation based on those, however, is
prohibitive of any such invGstigation in these pages.
CHAPI'ER VI
NON-NULL CASE
1.
Introduction.
This chapter will be devoted to the study of the non-null case.
In these first few sections
W3
will consider the joint probability
distribution of ~, m and m given by Sitgreaves
2
J
tribution corresponds to the statistic
L-45_7.
This dis-
and has been obtained under the restriction that the mean vectors of
the two populations are proportional to each other.
shall convert this into a different form.
For large
n
we
Some of the difficulties in
I
proceeding beyond that point will also be discussed.
The next section will deal with the distribution of
,
ij C
U "" 2: 2: s zi Yj
i j
h
- xj ) for
is large so that
duces
s
2
the case
P "" 1, and on the assumption that
can be replaced by er
2
.
This assumDtion re-
U to the product of two normal variates whose distribution is
known; see for instance ~2
-
7,
/27 7.
-/-8-7 and --
possible to extend this to the case
It has not been
p >,1.
In Section 7 of this ch9Pter we have exemplified the differential
method which was quite popular with statisticians a few years ago.
The
illustration deals with the finding of approximat ions to the mean and
e·
129
variance of the statistic
U for laIrge sam!lles by tcking into
account the sampling fluctuations of the
sampl~
means and covariances.
Higher moments can also be found but the algebra involved is very
heavy.
The concluding section of the chapter deals with a practical
suggestion for modifying the variance of the discriminant function of
R. A. Fisher by taking into account the sampling fluctuations of the
means.
The sample covariances can be taken as the population co-
variances when
n
is large.
Thus the statistic
U in this case be-
comes
2.
The joint distribution.
The joint distribution of
00
~
j=O
m ,
~
A.
5 ' Z-1 0.9
l
and m
n+2
fC -2+
J.) A. 2'J 2
C-) Ck In..
fCE + j)jl 2
1 ~
2
where
.....
~
m
3
,
M=
l.. . m3
m
2
=>
3
given by Sitgreaves is
130
and
~5
and
k 0
2
are the mean vectors.
Using the not8tion of the confluent hypergeometric series,
we can write
(2.1) - as
p-3
n-p-l
! 3" II-~/f I
1M
F( n+2
2
p x)
2 ' -"
,
where
The function
F(a,c,x) is also written as
¢(a,c,x) or as IFI(a,c,x),
and is known as the confluent hypergeometric function.
3.
Notes for reference on confluent hypergeometric functions.
Consider the hypergeometric series
F( a,b,c,z )
=I
+ a.b z + a(a+l)(b.)(b+l) z2
c
c(c+~ ~
in which we suppose that both
gives a power series with
b
a
and
c
are positive.
whose singularity
at
00
••••
b -->
00
,
F(a,b,c~~)
as the radius of convergence.
defines an analytic function with singularities at
limiting case of this series as
-I-
0, b, and
It
00.
defines an entire function
is the confluence of two singularities of
F(a,b,c,~) and which can be written as
The
131
(3.2)
(
F a,c,X
a+l~ ~
x2 +
ax + caf 0+1
) = 1 + cIT
•••
It satisfies the confluent hypergeometric equation'
d2 '
d
x J. + (c - x) .Ql - ay
dx2
dn
(3.3)
Accordi.ng to Bateman
as a
L-4_7,
=
0
the asymptotic behavior of ¢fa~x)
has been discussed by Perron, Tricomi
--> 00
and Taylor.
An asymptotic form of F uniformly valid in the neighborhood of x=O
given by Taylor is
1
(3~4)
F(a,c,x)
where c and
r(c)(Kx)2 -
=
c
2
x
e2 J _
c l
Kx are bounded, and
1
L-2(Kx)2 _7+ O(
c/2 -
K
8,
and
-1
K
)
J is the notation
for Bessel functions.
If : x
is lJounddd and bounded away from zero and
arg x - arg K
<
n , then
1
x 3
,_,,2r(c)e2
K4
F( a,C,x ) =
c
3
- 2 xIi -
Q
2
I I I
2
cle 2i(Kx)
where with
..
c =(2rt)
1
s
"2-2
-2i(Kx)
(K)
+ c e
+ x
2
an integer, we have
~ in(e- t)(2C-1) x
e
1
e.
and
I I
22
ei.- exD Im(2T( x )
<
arg(Kx)~ ~
(?s+l) 'it' - e.
,
132
(2s-1) ri + e
and'where
~
1
arg(Kx)2
~
(2§+?) n - e ,
Im(y) denotes the imaginary part of y.
The first of these
results will be used in simplifying the distribution given in (2.3).
For large
(4 .1)
F(~, ~,x)
n, we have, by using (3.6),
=
2-p x
r(~)( - 2n41-J.-px)4e2 Jp_~-i J('2n+4·-p)x_7+ o( 2n:4-p).
2'"
Let
p
TftThere
is an integer.
q
=2
+
4q
Then, for lar ge n
x
n+2 p
( ).+.3 ) F(2'~'x
)
,
c1"i v'('2n+2-4q)x
2
r(2 Q+l)(2Q-n-l)-qe J
2
_7 •
Using the relation
I (z) = i- n J (iz)
n
n
where
I (z)
n
,
stands for Bessel functions with purely imaginary argu-
ment, (4.3) becomes
x
(4.5)F(n;2,~,x) (-1)Qr(2Q+l)(2 Q-n-l)-Qe212qL-V(2n-4Q+2)X
_7 .
133
Using this, we can write the joint density function of
2
_ ~ (ki+k~)
(4.6)f(mlm2m3)~d~dm3=ce
IMI
p~?
I I-MI
X
-r
4A.
p
and m as
3
n-~-l
e I p _ 2 l'"J( 2n~4-p )x
for all
~;m2
'-r. &']. dm~dm.3
•
satisfying (4.2); where
The difficulties in proceeding further.
Various methods have been tried to proceed beyond this point,
but none seems to work well.
The main difficulty, even at this stage,
is, that the coefficients in the expansion of the Bessel function involved are increasing.
As a consequence of this one would not be
justified in omitting terms in the expansion of I
p2
2 ~/2n+4~px
beyond the first few, and discuss'the distribution of nm • The
3
difficulty would probably be removed if we consider small values
-
7
~f
n, and try to integrate over the 16ns-shaped region of Chapter IV to
find the distribution of m , but the objection to that would be that
3
m or TIm is not a suitable statis0ic for small values of n.
3
3
discussion, therefore, had to be left at this point.
5.
The distribution of U for large
.§.pproach.
n
~ p~l,
This
an independent
134
The statistic U reduces to
s
2
by
z(y_x)/a 2 for large n, since
is then found from a large sample, and can therefore be replaced
c/
to which it approximates.
This does not imply that the
sample means can also be replaced by their population values since
for n
to be large it is enough that one of the sample sizes is large.
Moreover, none of the means has as many degrees of freedom as the variance.
z(y-i)/a 2 can be found under both the
The distribution of
hypotheses
(1) z e)[l which is
z e)(2
which is
as follows.
Let
Z 6
lTl •
The statistic
U can be taken as the product of
y-x which is
z*=2
two normal variates z which is
a
N(
a
2
We can, instead of z and z*, consider the variables
12
2
where u is N(m,a ) and v is N(O,a ), where
v,
a
~
12
and
=
m =>
lJ. ..
\I-lJ.
T
or
a
z
eJT 2
•
\I-~
\I .. ~ ,
a
u
and
according as z e \T
)'1 or
135
The distribution of the product of two independent normal var-
L-2-7 cmd
iates is known from the work of C. C. Craig
-/-8-7,
others, but for the sake of completeness
shall include a derivation.
Definition.
x is said to be a Bessel variate if
p-l
2
(5.1 ) f(x)dx = C x
1
e-
"2
I l(b x )dx
p-
ax
1
where
I
p-l
",re
Aroian
(b~2)
is the modified
Bess~1
,
function of the first kind
We shall now state without proof two lemmas.
Lemma 1.
If x
IV. r 2
~'
2
is
N(m,a), then
222
In fact, if A
m /a
=
,
x
2
= ~ is a Bessel variate.
a
then
,
'V 12
which shows that "
Lemma 2.
If xl
is a
and x 2
Bessel variate with
1
2
a= -, b '" A
are two independent Bessel variates with
respective distributions
p-l
f(x Jdx. '" ex.
J
J
J
~
e
-x
1
j I
then the distribution of
2
P-
l(bx. )dx.,
~
J
J
= xl
- x
2
(j -= 1, 2)
is given by
,
as
136
_-b,2
2p-l
2
~ ~
f(~)d~== ~---(-)
;n
where
Km(~)
~o (E)2r
£J
2
r==O 2
~2
----"'-
rtr(p+r)
The distribution of
where
1")2
1") is
p+r--
2
is the modified Bessel function of the second kind as
i-,4,
defined by Watson, or in vJhittaker and Watson
Let
( )
1 ~ d~,
K
U
u
V
0-
0-'
'!'"l::t-';"'I
12)
N(~,
0-
== uv where
v
0-
<'.J
and t is
_7.
u is
u
== -
y
and
p. 373
N(~,
/2).
0-
and ~ 2 are Bessel varia:. ss with
Thus by lemma 1, both
2
a = 1, b = m2 and
1
p = 2 as
0-
parameters; and by lemma 2,
theref~re)
th8 distribution of the product
is given by
00
Z
rtr (1)
r+ 2 2 2r
r=O
and by noticing that
lATe can rel-rrite this as,
1 m2
f(n )dU ==
:'2 .~
0_8_ _
n
00
z (E1)
2r
r=O ()
Replacing m by IJ. -
If
...!.-
K (U)dU
( 2r ) 1 r
v-IJ.
~
(j'
I
and by
v-~"""
v - ---,
we get the distri-
0-
2
137
butions under the two hypotheses.
6.
The asymptotic mean and variance of the statistic
zZ
s ij z. (-y. - x.) by the differential method.
i j
1r.f e
U=
~
J
J
~------....:..;.....;;.---
shall, in this section, find the mean and variance of U,
approximately for large samples, by a method which was formerly quite
pooular and still is sometimes used.
The object of the section is
mainly to exemplify this method, which can somctliues be aoplied in
getting moments of an unknown distribution.
Some of the sets of condi-
tions under which the method is apnlicable are discussed by Cramer
1.-9 _7 in Chapter 27, but we shal], like statisticians in the oast,
apoly it
~Jithout
stopping to verify the validity of the application.
Because of the heavy algebra involved it will be enough if we confine ourselves to the discussi.on of the first two moments.
I'
Let
(6.1)
U = l: l:
i j
s ij z.(y.x.)
~
J
J
be written as
(6.2)
b.z.
l l
,
where
(6.3)
•
IrITe define
ds ..
lJ
Then
138
Follmvinp this (:kiini t ~ on, We' let
,
(6.6)
where
T,Te
note that
(6.8)
10 find E( ds
Let
ij
):
i'
s J
be, expanded in Taylor's series.
irve have
... .
Therefore
Since
E(cts
kl
) = 0, this reduces to
,
. 2 ij
~ 6 d
dsk1dS t + •••
.-klrt'o0kl0(jrt
r
(6.10) E(ds iJ )= 1
2E /- E l.
°
7
To evalu3te this, we havE to find
()2(jij
o °kl'() art
and
E(dsklds rt )
The let ter of these is known from Hot-dling
L-23_7
as
139
E(ds
(6.12)
kl
ds
)
rt.
To find the second
=
0'
0'
kr It
~rder
+0'
n
0'
kt ~r
derivatives involved we proceed ,as
follows:
Consider the identities
. .(I
if k
o
if k
1:O'J.JO'
i
ik
= 5J
k
=j
•
f
j
Differentiating (6.13) partially with respect to
0' A
we have
aI'"
(6.14)
where
if
(6.15)
i
=k
=
a
=~
otherwise
Using
(6.16)
in
km
Z. O"k
k
J.
= orr:J.
(6.14) and simplifying, we get
,
(6.17)
which provides a formula for the first derivatives.
If the covariance matrix
supposed for the statistic
Z is the identity matrix, as can be
U, which is known to be invariant under non-
singular linear transformations, then the crls
Kronecker deltas with the same suffices.
can be replaced by
Thus (6.17) simplifies to
.
e
,
which states that
(6.19)
(J ••
= -
1
:= -
1
,
~~
(6.20 )
and that the derivative wUh respect to any other element is zero.
To
obtain
spect to
cr
ye the
we
differentiate partially with re-
equation~
(6.21)
This
giv8S~
Using (6.21) in (6.22)
obtain
and replacing tho
cr's
by
6's
as before, we
.
141
This gives
,2
.. (] i1.
".
o
2
('J i1
-= 6
2(] ij
--"2:- == 0,
(6.25)
(i =f j)
(1ij
and all other derivatives of the second order are
G180
zero.
Using thes~ results in (6.10), we have
- 6p ... 0 (-2)
n
n
(6.26)
(6.?7) E(dsijds gh ); ~ k Z !
•
k 1 r t
(J
I'T
gh
rt
Using (6.12), we can reduce this to
(6.2~) E(dsijds gh )= ~ ~
L L 1
n k 1 r t
~3ubstituting
-
(J
ij
(Jkl
the values of the first order derivative
.from (6.18) on the supposition of
Z
in t8rms of
ots
bei.n€ the identity matrix, we get
.
e
142
say
In terms of
a's we can use the notation
(6.30)
These results will now be used in finding
E(U) and Var (U) •
(6,31)· To find E(U)
Since
are allindependently distributed, the
expectation of the product is equal to the product of the expectations.
if
E(Y. J
x.)
:::
J
z
v. - IJ..
J
J
=.{~+~
=:
if
eTC.1.
d.
J
i
say
= j
.2.E
n
Thus
(6.35)
6 7
E(U) - 2; L. IJ.. ( v. - IJ..) / -6~' -} -E
i j 1. J
J ..l.
n-
.
..
e
This reduces to
(6.36)
To find
--
E( u)
= (1
6
+ -E)
n
6
p
~ I-L. (d.) + .J? ~ ~ I-L' (d.)
i=l ~ ~
n i j ~ J
ir'j
var( U)
We can write
(6.37)
U=
p
~
i=l
b.z. =
~ ~
p
~ b.z.
j=l J J
,
. where
Define
(6.39)
~.
J.
=>
where
(6.h O)
Then in )(1'
(6.1.j.1)
To find
and
O"u :: ~ ~ /-~.~.O" •. + I-L..:I-L 'O"b b + •••
. . J. J J.J
J. J
. .
]. J
]. J
_7
•
144
From these
t111]O
equations:
Since
from (6.3°) and
(6.46)
can be written as
But
Zo J'm0
m
~
kIn
{
I
if
°
otherwtse
,.
j=k
therefore we obtain
.
~
O'b •b ,'-'/ i.J
1 J
k
(6.48)
~ ~jkIn d1; d
i.J
m
+
~ m
n
rT
v
ij (_1
N
1
1)
N2
+-
Using (6.48) in (6041), we get
( 6 .49)
°u2 ,-./
Replacing
(6.50)
2
'" '"
i j
i.J
~i
O'u-v
Z Z
{..J
and
.
l,
::!. j k
If we supoose that
fAI'" 'I-'AjOi'
1
~j
Z /-0
m-
J
+ !-L.!-L.
~ J
( '"
i.J
'I'
i.J
k m
0 ij kIn
n
dkd
m
+ O'ij (2:
N +
1
1:
N2
)J].
by their values from (6.39), we reduce this to
1'k .
°i 'kIn
i' 1
oJmO",. + ~ !-L,!-L, 7dkd + Z LoO J(._ +~.dh.t:'!-L'
1J
n
1: J m i j
N1 N2 .' 1 J
Z = I, then
(6.50 ) reduces to
l'
0
•
(6.51)
2
aU
1\1
A
u
2
1
+ - Z Z Z Z IJ.-IIJ..
n i j k m ~ J
dkd 5" kIn
m lJ
(1
1)
+ P -N + -N
'1
2
2
f.Li·
where
!:J.
= Z Z a km dkd
k m
7.
•
m
Correction term for the variance of the linear discriminant
function.
In this section we shall find the variance of the statistic
~f-
(7.1)
U
which is the same as
supposition that
-Yj - -x j
U
=Z:Za
i j
with
n is large.
s
ij
zi
(-y.-x.
-)
J
J
,
ij
replaced by a ij
If
N
because of the
and N are both large then
l
2
can be replaced by the corresponding difference in the popu~~
lations, namely v j - IJ.j' giving for U a linear function of normal
variates. As an improvement we shall find the variance of U* by
taking into account the sampling flucutations due to the difference
sample means.
We have
E(y.)
J.
E(x·-!J.i)(X'-IJ..)
1
J
J
:= V.
1
= a i J. = E(y.-vi)(y.-v.)
1
J J
==
E(z.-IJ..)(Z.-IJ..)
1
Let
d.1 ==
\J.
J. -
"i
J-
1
J
J
•
~f
146
and
/
The correction term for the variance of
U* is the variance of
z = (zl' z2' •.. , zp) is fixed.
on the assumption that
U:1f-
(Jij(-y.
i j
J
"" £,
":' u')'
-
x,) z.
J ~
can be written as
where
wr =
Y
. r -x r
,
so that
(7.8)
Hence
(7.10)
5U
* = /~-
and thus
(J
f2
5U*
1
V N1+ -N2
~ ~ ~ij ow.
i j
J
(.J
'--
v
Zi
'
U*
•
147
which will give the correction term.
Since
but
i'
.
,
Z cr. (J J = 5 J
i J.r
r
and
z Z crij 1'1. d.
i j
:=
tJ
2
say.
J
J.
(7 .12) gives
E Z l cr
i j
i'
J z.z.
~ J
=p
+ 6
2
Adding this to the variance of the linear discriminant function, we have, for the corrected variance,
(7.16) $\displaystyle\; \sigma^2_{U^*} \;=\; \left(1 + \frac{1}{N_1} + \frac{1}{N_2}\right)\Delta^2 \;+\; p\left(\frac{1}{N_1} + \frac{1}{N_2}\right).$
This formula shows that the variance $\Delta^2$, based on the assumption that $N_1$ and $N_2$ are large, is an underestimate of the correct variance of the discriminant function, but that the difference approaches zero as rapidly as $1/N_1$ and $1/N_2$ as $N_1$ and $N_2$ approach infinity.
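As a rough numerical illustration (the figures are hypothetical and serve only for orientation): with $p = 3$, $N_1 = N_2 = 50$ and $\Delta^2 = 4$, the added terms in (7.16) amount to
$$\left(\tfrac{1}{50}+\tfrac{1}{50}\right)\Delta^2 \;+\; p\left(\tfrac{1}{50}+\tfrac{1}{50}\right) \;=\; 0.16 + 0.12 \;=\; 0.28\,,$$
an increase of about $7$ per cent over $\Delta^2$ itself.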
•
CHAFTER VII
SOME RELI\TED UNSOLVED PROBIEMS
In this chapter we shall describe very briefly some unsolved
problems related to the problem of classification.
1.
On classification statistics of Wald and Anderson.
(a)
The preceeding discussion deals mainly with the distri-
bution of the approximate statistic
v
=
nm)
,that is the statis-
tic whose distribution approximates the distribution of
V
where
nm..,
V
=:
----~.:>-2
for large
n. 1rJe have discussed mainly the
(l-~)( I-m )-m
2
3
null case, and much work needs to be done in getting its distribution in the non-null Gase for the two statistics,
(1) Discussed by 1tfDld
L-50 _7
(2) Discussed by Sitgreaves
(b)
L-45 _7 •
The exact treatment of the sampling distribution of
V,
both in the central and the non-central cases is still wantinge
2.
The quadratic discriminators.
Let
~
and
populations, and
v
Zl
denote the mean vectors of two p-variate normal
and
Z2
the two covariance matrices.
There are
three s:ttuations that may arise in discussing the problem of classification, namely
•
149
,
( a)
(c)
and
If we suppose all the population
parameters to be known, then
in these three situations we get the following three statistics,
U
==
a
Z Z a ij z. ( v. -
i j
1.
~.
J
)
J
7
where in
U, r cr
ij
a I..
-
7 ==
j- cr ..
-
7-1
l.J-
, L-cr..
l.J -
variance matrix of the two populations, an d
7
being the conunon co~
ij , cr ij*
in Uband
Uc refer to the two covariance matrices in the two populations.
Thus the distribution problem underlying (2.1) (b) and (c) are those
of a general indefinite quadratic form with zero expectations of the
normal variates in (b) but not in (c).
The importance of this prob-
- 7.
lem has been stressed by Hotelling ;-22
-
ThiS, of course, is under
the assumption that the population parameters are known which amounts
to saying that
N -> 00
1
and
N -> 00,
2
and would be a first step
in discussing the distributions of the statistics
W :: Z Z (z. -
b
and
••
1. J
1.
x.1. )( z.J
- i.) ;- sij - sij*
J-
-
7
•
W
c
which
~re
values in
3.
= EELr zi
ij
- -x. )( z. - -x. )s ij - (z. - -y. )( z. - -y. ) s ij*
~
obtained from
and
U
c
J
~
J
~
J
J
-
7,
U and U by replacing the pnpulation
b
c
by their estimates from the samples.
PossibHity of a different approach.
(a)
It may be desirable to discuss the distribution of U =
~Tery
often ,_'t
~
. d
dent ms th o.
d
s ij zi (-Yj - x j) b Y some ~n
epen
4S
~
a
good start to examine in what form the non-centrality parameters would
enter into the distribution.
The answer to this sometimes
key to the solution of the distribution problem.
~rovides
a
Furthermore some
questions related to the behavior of the test can be answered even
without finding the actual distribution in the non-null case.
(b)
It might be worth while to try some altogether different
approach.
It is pass iblG that we run into some
problems.
Papers of Rao L-32
_7
and Roy
L-;5_7
si. mpler
distribution
should be useful in
this connection.
4.
Efficiency.
(a)
The idea of efficiency in problems on classification needs
to be developed systsmatically.
of efficiency where
misclassification
Kossack L-26__ 7 took I-P as the index
P is the common prohability of the; two types of
ove~
v3ri3tions of the parameters involved.
however, considered only the univariate case.
it as the ratio of two sample sizes.
Pitman
L-37_7
He j
defined
·...,
•
151
These and other idEAS CAn be 8xJmined in this connection.
(b)
If therc,
tion, th,,"1 some
(c)
tic
Are
m(~3SUrC
marc: st"ltisttcs thAn one for the SAm8 situ?of rc19tiv·, efficiency is needed.
The discriminant function of
~ Z a ij z.(v. - ~.)
~
i j
J
J
R.n.
Fisher
or the statis-
~re b3sed on the ass~~ption that ~l = L2 •
One important problem that calls for investigJtion is to eXJmine how
good is the linear discriminant function when actually $\Sigma_1 \neq \Sigma_2$.
5. The greater mean vector.
Bahadur and Robbins [3] have shown that even in the univariate case of sorting numerous objects known to belong to one or the other of two normal populations with the same known variance, the obvious rule of classifying an object to the population whose mean is closer to the measure of the object may not be the best rule. Their objections apply to the corresponding multivariate situations and should be considered in problems of classification in multivariate analysis.
152
BIBLIOGRAPHY
F
-
1 7 Anderson, T.W.,"Classification b;r Multivariate Analysis lf ,
Psychometrika, Vol.XVI(195l), pp. 31-50.
-/- 2-7
14rion, L. A., liThe Probability Function of the Product of
two Normally Distributed Variables II, "~nnals of Mathematical Statistics, Vol. XVIII(1947), pp. 26S-271.
Bahadur R.R. and Robbins H. E., '~e Problem of the Greater
Mean ll , l\nnals of Mathematical St~ltistics, Vol. XXI (1950),
pp. 469-487.
Batemen, Harry, Higher Transcendental Functions. Vol. I and
II, McGr C1iI Hill Book Compm y, Inc.,. 1953.
Bose, R. C., liOn the Exact Distribution and Moment Coefficients of the D2-statistics ll , Sankhya, Vol. II (19351936), pp. 143-154.
-;- 6- 7
Chernoff, Herman, "Large Sample Theory!!, Annals of MCjthematical Statistics, Vol. XXVII (1956), pD. 1-23.
Cochran, t.v. G. and Bliss, C. I., "Discriminant Function with
Covariance", .';mals of Mathematical Statistics, Vol.
XIX (1948), pp. 151-176.
Craig, c. C. "On the Frequency Function of xy", Annals of
MJthematical Statistics, Vol. Vrr(1936), pp.-l-IS.
Cramer, Harold, Mathematical Methods of Statistics. Princeton University Press, 1951.
Fisher, R. "" ''The Us e of Multi pIe He asur ements in Taxonomic
Problems", Annals of Eu~enics, Vol. VII(1936), pD. 179188.
L-ll_7
F~she:c, F.~ ,~.,.. HTne St[lti6tic~1
Utilization of ~/Qltiple Measurements fJ , Annals of Eup.enics, Vol. VIII(1938), pp. 376-386.
Fix, Evelyn, and Hodges, J. L., "Discriminatory Analysis: NonParametric Discrimination: Consistency Problemsll, School
of ~viation Medicine. Project number 21-49-004 (1951).
Ford, W. B., Studies in Divergent Series and Summability. The
Macmillan Co., New York, 1916.
Goursat, Edourad, (Hedrick,E.R.,translator), Ii Course in
M·JthemBtical Analysis. Vol. I, Ginn and Company, New York,
1904.
.J
..
153
Grad, Arthur, and Solomon, Herbert, "Distribution of Quadratic
Forms and Some "~pplic ations II, Annals of Malthematic a1
Statistics, Vol. XXVI(1955), pp. ~64-477.
Gurland. John, "Distribution of Quadratic Forms and Ratios of
Quadratic Forms II, :mnals of Mathematical Statistic.§, Vol.
XXII (1953), pp. 416-427.
Gurland; John, "Distribution of Definite and Indefinite Quadratic Forms", .Annals of Mathemct.ical Statistics, VoL
XXVI(1955), pp. 122-128.
Harter, H. L., "On the Distribution of Wald's Classification Statistic", Annals of Mathematical Statistics, Vol. XXII (1951), pp. 58-67.
Hotelling, Harold, "New Light on the Correlation Coefficient and its Transforms", Journal of the Royal Statistical Society, Series B, Vol. XV, No. 2 (1953), pp. 193-232.
[20] Hotelling, Harold, Notes on Approximation Techniques (unpublished), 1955.
Hotelling, Harold, "Some New Methods for the Distribution of Quadratic Forms" (abstract), Annals of Mathematical Statistics, Vol. XIX (1948), p. 119.
Hotelling, Harold, "Multivariate Analysis", Statistics and Mathematics in Biology, Iowa State College Press (1954), pp. 67-80.
Hotelling, Harold, "Relations Between Two Sets of Variates", Biometrika, Vol. XXVIII (1936), pp. 321-377.
Hotelling, Harold, "The Generalization of Student's Ratio", Annals of Mathematical Statistics, Vol. II (1931), pp. 360-378.
Hotelling, Harold, "A Generalized T-test and Measure of Multivariate Dispersion", Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 23-41.
Isaacson, S. L., "Problems in Classifying Populations", Statistics and Mathematics in Biology, Iowa State College Press (1954), pp. 107-119.
Kendall, M. G., Notes on Multivariate Analyses, Institute of Statistics, Mimeograph Series No. 95 (1954).
Kolmogoroff, A. N., Foundations of the Theory of Probability, Chelsea Publishing Company, New York (1950).
Kossack, C. F., "Some Techniques for Simple Classification", Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability (1945-46), pp. 345-352.
[27] Laha, R. G., "On Some Properties of the Bessel Function Distribution", Bulletin of the Calcutta Mathematical Society, Vol. XLVI, No. 1 (1954), pp. 59-72.
MacRobert, T. M., Functions of a Complex Variable, Second Edition, Macmillan and Company, Limited, London, 1933.
Mann, H. B. and Wald, Abraham, "On Stochastic Limit and Order Relationship", Annals of Mathematical Statistics, Vol. XIV (1943), pp. 265-275.
McCarthy, M. D., "On the Application of the z-test to Randomized Blocks", Annals of Mathematical Statistics, Vol. X (1939), pp. 337-359.
Mises, R. v., "On the Classification of Observed Data into Distinct Groups", Annals of Mathematical Statistics, Vol. XVI (1945), pp. 68-73.
[32] Neyman, Jerzy, and Pearson, E. S., "Contributions to the Theory of Testing Statistical Hypotheses", Statistical Research Memoirs, Vol. I (1936), pp. 1-161.
Ogawa, Junjiro, "Remark on Wald's Paper 'On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups'", Institute of Statistics, Mimeograph Series No.
Pachares, James, "Note on the Distribution of a Definite Quadratic Form", Annals of Mathematical Statistics, Vol. XXVI (1955), pp. 128-131.
Pearson, E. S. and Hartley, H. O., Biometrika Tables for Statisticians, Vol. I, Cambridge, at the University Press, 1954.
Pearson, Karl, Tables of Incomplete Beta Functions, Cambridge University Press, 1934.
Pitman, E. J. G., Lecture Notes on Non-Parametric Statistical Inference (unpublished).
[38] Rao, C. R., "The Utilization of Multiple Measurements in Problems of Biological Classification", Journal of the Royal Statistical Society, Series B, Vol. X (1948), pp. 159-193.
Rao, C. R., Advanced Statistical Methods in Biometric Research, John Wiley and Sons, New York (1952).
Rao, C. R., "A General Theory of Discrimination When the Information about Alternative Populations is Based on Samples", Annals of Mathematical Statistics, Vol. XXV (1954), pp. 651-670.
Robbins, H. E. and Pitman, E. J. G., "Application of the Method of Mixtures to Quadratic Forms in Normal Variates", Annals of Mathematical Statistics, Vol. XX (1949), pp. 552-560.
Robbins, H. E., "Asymptotically Subminimax Solutions of Compound Statistical Decision Problems", Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 131-148.
[43] Roy, S. N., "On a Heuristic Method of Test Construction and its Use in Multivariate Analysis", Annals of Mathematical Statistics, Vol. XXIV (1953), pp. 220-238.
Roy, S. N., A Report on Some Aspects of Multivariate Analysis, North Carolina Institute of Statistics, Mimeograph Series No. 121, 1954.
Sitgreaves, Rosedith, "On the Distribution of Two Random Matrices Used in Classification Procedures", Annals of Mathematical Statistics, Vol. XXIII (1952), pp. 263-270.
Smith, C. A. B., "Some Examples of Discrimination", Annals of Eugenics, Vol. XIII (1947), pp. 272-282.
Stekloff, W., "Quelques Applications Nouvelles de la Théorie de Fermeture au Problème de Représentation Approchée des Moments", Mémoires de l'Académie Impériale des Sciences de St.-Pétersbourg, Vol. XXXII, No. 4 (1914), pp.
[48] Szegő, Gabor, Orthogonal Polynomials, American Mathematical Society Colloquium Publications, Vol. XXIII, 1939.
[49] Uspensky, J. V., Introduction to Mathematical Probability, McGraw-Hill Book Company, Inc., 1937.
[50] Wald, Abraham, "On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups", Annals of Mathematical Statistics, Vol. XV (1944), pp. 145-162.
[51] Wald, Abraham, Selected Papers in Statistics and Probability, McGraw-Hill Book Company, Inc., New York, 1955.
[52] Watson, G. N., Theory of Bessel Functions, Second Edition, Cambridge University Press, 1945.
[53] Welch, B. L., "Note on Discriminant Functions", Biometrika, Vol. XXXI (1939), pp. 218-220.
[54] Whittaker, E. T. and Watson, G. N., A Course of Modern Analysis, Fourth Edition, Cambridge University Press, 1952.
[55] Wilks, S. S., "On Some Generalizations of the Analysis of Variance", Biometrika, Vol. XXIV (1932), pp. 471-494.