Ogawa, Junjiro (1956). A remark on Wald's paper: "On a statistical problem arising in the classification of an individual into one of two groups." (Navy Research)

A REMARK ON WALD'S PAPER:
"ON A STATISTICAL PROBLEM ARISING IN THE CLASSIFICATION OF AN INDIVIDUAL INTO ONE OF TWO GROUPS"
by
Junjiro Ogawa
This investigation was done at the
Institute of Statistics, University
of North Carolina, Chapel Hill, under
the sponsorship of the Office of
Naval Research through Contract
NR 042031.
Institute of Statistics
Mimeograph Series No. 143
January, 1956
The Institute of Statistics, University of North Carolina and
The Department of Mathematics, Faculty of Science, Osaka University
Summary.
In 1944, the late Professor A. Wald proposed a statistic for use in classification procedures in the above-mentioned paper [1]. That statistic $U$ is as follows:

(1)

where $z' = (z_1, \ldots, z_p)$ and $z^{*\prime} = (z_1^*, \ldots, z_p^*)$ are distributed independently of each other according to non-singular $p$-dimensional normal distributions with the same covariance matrix $\|\sigma_{ij}\|$ and with their respective means, and, in the usual notation, the $s_{ij}$, $i \le j$, $i, j = 1, \ldots, p$, are distributed according to the Wishart distribution with the same $\|\sigma_{ij}\|$ as the population dispersion matrix and with degrees of freedom $n = N_1 + N_2 - 2$. Further, $z$, $z^*$ and $\|s_{ij}\|$ are mutually independent in the stochastic sense.
1. This investigation was done at the Institute of Statistics, University of North Carolina, Chapel Hill, under the sponsorship of the Office of Naval Research through Contract NR 042031. The author expresses here his hearty thanks to Professor Harold Hotelling, by whom this chance was given to him.
Wald considered the exact sampling distribution of $U$, and taking account of the fact that the distribution of $U$ is essentially the same as that of
$$V = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij}\, t_{i,n+1}\, t_{j,n+2}, \tag{2}$$
where, of course,
$$s_{ij} = \frac{1}{n} \sum_{\alpha=1}^{n} t_{i\alpha} t_{j\alpha}, \qquad i, j = 1, 2, \ldots, p,$$
and the probability element of the joint distribution of $t_{i\alpha}$, $i = 1, \ldots, p$; $\alpha = 1, \ldots, n+2$, is given by
$$\frac{1}{(2\pi)^{p(n+2)/2}} \exp\left[ -\frac{1}{2} \left\{ \sum_{i=1}^{p} \sum_{\alpha=1}^{n} t_{i\alpha}^2 + \sum_{i=1}^{p} (t_{i,n+1} - \rho_i)^2 + \sum_{i=1}^{p} (t_{i,n+2} - \zeta_i)^2 \right\} \right] \prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha},$$
where $\rho' = (\rho_1, \ldots, \rho_p)$ and $\zeta' = (\zeta_1, \ldots, \zeta_p)$ are certain known functions of the respective means, he concluded that the distribution of $V$ is the same as that of

(4)

calculated from the joint probability distribution of $m_1$, $m_2$, $m_3$, which has the following probability element in the domain $0 \le m_1 \le 1$, $0 \le m_2 \le 1$, $-\sqrt{m_1 m_2} < m_3 \le \sqrt{m_1 m_2}$, and 0 otherwise,
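where
$$F_k(t) = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)}\, t^{\frac{k}{2}-1} e^{-\frac{t}{2}},$$
the second function of $t$ is proportional to $(1 - t^2)^{\frac{k-3}{2}}$, with a constant factor formed from $\Gamma\left(\frac{k}{2}\right)$ and $\sqrt{\pi}\,\Gamma\left(\frac{k-1}{2}\right)$, and $K(m_1, m_2, m_3)$ is given by

(6)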
The relation (7) follows from Wald's Lemma 8.
Wald's proof of the result just described was divided into nine lemmas, and each lemma was proved by an ingenious method. The author of this note believes that the key point of Wald's proof lies in his Lemma 6, which states that, for the calculation of the sampling distribution of $\bar V$, which is the same as that of $V$, it is sufficient to start from the conditional distribution of the random 2-frame $(u, v)$ under the condition that the random $p$-plane $L_p$ is fixed. But the proof of Lemma 6 seems to be incomprehensible, at least for the author himself, because Wald had not used explicitly the concept of the invariant measure defined on the Grassmann manifold consisting of $p$-planes in $(n+2)$-dimensional euclidean space [2]. Consequently the proof of Lemma 7 was also insufficient.
The author shall attempt new proofs of these two lemmas as an example of applications of the theory of invariant measures defined on analytic manifolds such as the orthogonal group, Stiefel, and Grassmann manifolds [3].
Explanation of the Problem and Proofs of Lemmas 6 and 7.
Our purpose is to give new proofs of Wald's Lemmas 6 and 7, but for the readers' convenience we shall describe the problem, follow Wald's proofs, and explain the necessary notation.
Mathematically speaking, our problem, which requires us to find the sampling distribution of the classification statistic $U$, is as follows. Let the probability element of the joint distribution of the $p(n+2)$ variates $t_{i\alpha}$, $i = 1, \ldots, p$; $\alpha = 1, \ldots, n+2$, be

(8)

where of course $\|\sigma_{ij}\|$ is a non-singular and symmetric variance-covariance matrix of order $p \times p$; then we are to find the probability distribution of the statistic
$$V = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij}\, t_{i,n+1}\, t_{j,n+2}, \tag{9}$$
where
$$\|s^{ij}\| = \|s_{ij}\|^{-1} \quad \text{and} \quad s_{ij} = \frac{1}{n} \sum_{\alpha=1}^{n} t_{i\alpha} t_{j\alpha}, \qquad i, j = 1, \ldots, p.$$
We shall briefly sketch Wald's reasoning and the results obtained up to his Lemma 6.
The statistic $V$ given by (9) is invariant under non-singular $p$-dimensional linear transformations of $(t_{1\alpha}, \ldots, t_{p\alpha})$, $\alpha = 1, \ldots, n+2$, so we can assume, without any loss of generality, that $\|\sigma_{ij}\| = I$ (the unit matrix of $p$ dimensions) in (8). In other words, in considering the probability distribution of $V$, the probability element of $t_{i\alpha}$, $i = 1, \ldots, p$; $\alpha = 1, \ldots, n+2$, may be taken as
$$\frac{1}{(2\pi)^{p(n+2)/2}} \exp\left[ -\frac{1}{2} \left\{ \sum_{i=1}^{p} \sum_{\alpha=1}^{n} t_{i\alpha}^2 + \sum_{i=1}^{p} (t_{i,n+1} - \rho_i)^2 + \sum_{i=1}^{p} (t_{i,n+2} - \zeta_i)^2 \right\} \right] \prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha}. \tag{10}$$
This was the result stated in Lemmas 1 and 2.
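The invariance of $V$ under non-singular linear transformations can be checked numerically. What follows is only a minimal sketch in Python, assuming $p = 2$ and an arbitrarily chosen non-singular matrix `m`; the helper names `inv2`, `statistic_v` and `transform` are illustrative and do not come from Wald's paper.

```python
import random


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def statistic_v(t, n):
    """V = sum_ij s^{ij} t_{i,n+1} t_{j,n+2} for p = 2.

    t[i][a] holds t_{i,a+1}: columns 0..n-1 build s_ij,
    columns n and n+1 are the two extra observations.
    """
    s = [[sum(t[i][a] * t[j][a] for a in range(n)) / n for j in range(2)]
         for i in range(2)]
    s_inv = inv2(s)
    return sum(s_inv[i][j] * t[i][n] * t[j][n + 1]
               for i in range(2) for j in range(2))


def transform(m, t):
    """Apply the p x p matrix m to every column (t_{1a}, t_{2a}) of t."""
    cols = len(t[0])
    return [[sum(m[i][k] * t[k][a] for k in range(2)) for a in range(cols)]
            for i in range(2)]


random.seed(0)
n = 5
t = [[random.gauss(0.0, 1.0) for _ in range(n + 2)] for _ in range(2)]
m = [[2.0, 1.0], [-0.5, 3.0]]  # assumed example of a non-singular matrix

v_before = statistic_v(t, n)
v_after = statistic_v(transform(m, t), n)
print(v_before, v_after)  # the two values agree up to rounding
```

The agreement reflects that $\|s_{ij}\|$ transforms as $M \|s_{ij}\| M'$, so the factors $M$ cancel in the bilinear form.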
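We take two sets of real numbers $(u_1, \ldots, u_{n+2})$ and $(v_1, \ldots, v_{n+2})$ satisfying the restrictions
$$\sum_{\alpha=1}^{n+2} u_\alpha^2 = \sum_{\alpha=1}^{n+2} v_\alpha^2 = 1, \qquad \sum_{\alpha=1}^{n+2} u_\alpha v_\alpha = 0, \tag{11}$$
and put
$$\bar t_{i,n+1} = \sum_{\alpha=1}^{n+2} u_\alpha t_{i\alpha}, \qquad \bar t_{i,n+2} = \sum_{\alpha=1}^{n+2} v_\alpha t_{i\alpha}, \qquad i = 1, \ldots, p. \tag{12}$$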
Hereafter we shall take the geometrical point of view. In $(n+2)$-dimensional euclidean space with the origin $O = (0, \ldots, 0)$, we put
$$R_i = (t_{i1}, \ldots, t_{i,n+2}), \ i = 1, \ldots, p, \qquad P = (u_1, \ldots, u_{n+2}) \quad \text{and} \quad Q = (v_1, \ldots, v_{n+2})$$
with reference to a certain fixed orthogonal coordinate system; then the vectors $\overrightarrow{OP}$ and $\overrightarrow{OQ}$ are two unit vectors orthogonal to each other, and the system of such vectors $u = \overrightarrow{OP}$ and $v = \overrightarrow{OQ}$ is called a "2-frame" in algebraic geometry. The quantities $\bar t_{i,n+1}$ and $\bar t_{i,n+2}$ defined in (12) are the orthogonal projections of $t_i = \overrightarrow{OR_i}$ on $u$ and $v$ respectively.
Let the orthogonal projection of the point $R_i$ into the orthogonal complement of the linear subspace spanned by $u$ and $v$ be $\bar R_i$ with the coordinates $(r_{i1}, \ldots, r_{i,n+2})$, and put
$$\bar s_{ij} = \frac{1}{n} \sum_{\alpha=1}^{n+2} r_{i\alpha} r_{j\alpha} \tag{13}$$
and
$$\bar V = \sum_{i=1}^{p} \sum_{j=1}^{p} \bar s^{ij}\, \bar t_{i,n+1}\, \bar t_{j,n+2}, \tag{14}$$
where as usual $\|\bar s^{ij}\| = \|\bar s_{ij}\|^{-1}$; then the $\bar s_{ij}$, $i, j = 1, \ldots, p$, are invariant against rotations of the coordinate axes. Therefore, if we rotate the coordinate axes so that the $(n+1)$-th and the last, $(n+2)$-th, axes coincide with the vectors $\overrightarrow{OP}$ and $\overrightarrow{OQ}$ respectively, then the coordinates of $\bar R_i$ with reference to this new coordinate system will be of the form $(\bar t_{i1}, \ldots, \bar t_{in}, 0, 0)$ and consequently
$$\bar s_{ij} = \frac{1}{n} \sum_{\alpha=1}^{n} \bar t_{i\alpha} \bar t_{j\alpha}.$$
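The geometric objects introduced so far (a 2-frame, the projections on $u$ and $v$, and the residual in the orthogonal complement) can be illustrated numerically. The sketch below is an assumption-laden illustration only: it builds a 2-frame by Gram-Schmidt orthogonalization, which is one convenient way to satisfy (11) and is not a construction taken from the paper, and the names `two_frame`, `proj_u`, `proj_v` and `r` are hypothetical.

```python
import math
import random


def dot(x, y):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))


def two_frame(x, y):
    """Gram-Schmidt: turn two independent vectors into a 2-frame (u, v)."""
    norm_x = math.sqrt(dot(x, x))
    u = [a / norm_x for a in x]
    c = dot(y, u)
    w = [b - c * a for a, b in zip(u, y)]   # remove the component along u
    norm_w = math.sqrt(dot(w, w))
    v = [a / norm_w for a in w]
    return u, v


random.seed(1)
n = 4
x = [random.gauss(0.0, 1.0) for _ in range(n + 2)]
y = [random.gauss(0.0, 1.0) for _ in range(n + 2)]
u, v = two_frame(x, y)

# One of the vectors t_i = OR_i and its projections on u and v.
t_i = [random.gauss(0.0, 1.0) for _ in range(n + 2)]
proj_u = dot(t_i, u)
proj_v = dot(t_i, v)

# Residual: projection of R_i into the orthogonal complement of span(u, v).
r = [t - proj_u * a - proj_v * b for t, a, b in zip(t_i, u, v)]
```

Since the residual is orthogonal to both $u$ and $v$, the squared length of $t_i$ splits as `proj_u**2 + proj_v**2 + dot(r, r)`, which is the fact behind the simplified form of $\bar s_{ij}$ after rotating the axes.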
Lemma 3 states that if we calculate the distribution of $\bar V$ given by (14) from the joint probability distribution of $t_{i\alpha}$, $i = 1, \ldots, p$; $\alpha = 1, \ldots, n+2$, which has the probability element
$$\frac{1}{(2\pi)^{p(n+2)/2}} \exp\left[ -\frac{1}{2} \sum_{i=1}^{p} \sum_{\alpha=1}^{n+2} (t_{i\alpha} - \rho_i u_\alpha - \zeta_i v_\alpha)^2 \right] \prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha}, \tag{16}$$
where the 2-frame $(u, v)$ is fixed, then the distribution of $\bar V$ is independent of $(u, v)$ and is the same as the required distribution of $V$.
The 2-frame $(u, v)$ may be considered as a point of the algebraic surface defined by (11) in $(2n+4)$-dimensional euclidean space, and this $(2n+1)$-dimensional algebraic surface is called the Stiefel manifold, which we shall denote by $V_{2,n}$. By Lemma 3 we know that the distribution of $\bar V$ is uniform, so to speak, on the Stiefel manifold $V_{2,n}$.
Taking account of the fact that the distribution given by (16) is invariant under $(n+2)$-dimensional orthogonal transformations, we introduce an invariant probability measure on $V_{2,n}$ (which is essentially the Haar measure on $V_{2,n}$, so it exists and is unique), and denote it by $dS$; it is written, with a normalizing factor $D^{-1}$, as a product of the differentials $d_j'\,du$ and $d_j'\,dv$, $j = 1, \ldots, n$, where $D$ is a constant, $d_1, \ldots, d_n$ is an $n$-frame in the orthogonal complement of the linear sub-space spanned by the 2-frame $(u, v)$, and the product means the exterior product of differentials (see the proof of Lemma 6 later). Then the distribution of $\bar V$ is the same as before if it is calculated from
$$\frac{1}{(2\pi)^{p(n+2)/2}\, D} \exp\left[ -\frac{1}{2} \sum_{i=1}^{p} \sum_{\alpha=1}^{n+2} (t_{i\alpha} - \rho_i u_\alpha - \zeta_i v_\alpha)^2 \right] \prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha}\; dS, \tag{18}$$
and the distribution (18) is invariant under $(n+2)$-dimensional orthogonal transformations. This was the result of Lemma 5.
Lemmas 4 and 4' give the geometrical interpretation of the statistic $\bar V$ in (14). Let the linear sub-space which is spanned by the $p$ vectors $\overrightarrow{OR_1}, \ldots, \overrightarrow{OR_p}$ be $L_p$, which is called a "$p$-plane" in $(n+2)$-dimensional euclidean space, and let the orthogonal projections of $P$ and $Q$ into $L_p$ be $\bar P$ and $\bar Q$ respectively, and, with the angles $\theta_1$, $\theta_2$, $\theta_3$ fixing the position of $P$ and $Q$ relative to $L_p$, further put

(20)

then $\bar V$ can be written in terms of the quantities $a_{11}$, $a_{12}$, $a_{22}$, $b_1$ and $b_2$ (Lemma 4). In other words, the statistic $\bar V$ depends only on the relative position of $L_p$ and the 2-frame $(u, v)$. Further, $\bar V$ may be expressed as a function of $\theta_1$, $\theta_2$, $\theta_3$ only (Lemma 4'), i.e.,

(21)

where $m_1$, $m_2$, $m_3$ are functions of $\theta_1$, $\theta_2$, $\theta_3$.
Now we come to Lemma 6. Lemma 6 states that for the calculation of the sampling distribution of $\bar V$ we can start from the conditional marginal distribution of $u$, $v$ which is derived from (18) under the condition that the $p$-plane $L_p$ is fixed.
Proof of Lemma 6.
We shall arrange the $p(n+2)$ variates $t_{i\alpha}$ in the following matrix form, i.e.,
$$T = \begin{pmatrix} t_{11} & t_{21} & \cdots & t_{p1} \\ t_{12} & t_{22} & \cdots & t_{p2} \\ \vdots & \vdots & & \vdots \\ t_{1,n+2} & t_{2,n+2} & \cdots & t_{p,n+2} \end{pmatrix} = [t_1, t_2, \ldots, t_p] \quad \text{(say)}. \tag{22}$$
Now since we have assumed that the joint probability distribution of $t_{i\alpha}$, $i = 1, \ldots, p$; $\alpha = 1, \ldots, n+2$, is continuous, the probability of the set of $t_{i\alpha}$ in $p(n+2)$-dimensional space for which the $p$ column vectors $t_1, t_2, \ldots, t_p$ are linearly dependent is zero, so the $p \times p$ matrix $T'T$ is symmetric and positive definite with probability one. If we choose an orthogonal matrix $G$ of the type $p \times p$ appropriately, it can be put
$$G'T'TG = L^2, \tag{23}$$
where $L$ is a diagonal matrix
$$L = \begin{pmatrix} \ell_1 & & & \\ & \ell_2 & & \\ & & \ddots & \\ & & & \ell_p \end{pmatrix}, \tag{24}$$
and we may consider that $\ell_p > \ell_{p-1} > \cdots > \ell_1 > 0$ with probability one. Further, if we restrict ourselves so that all the elements of the first row of the orthogonal matrix $G$ are positive, i.e.,
$$g_{1j} > 0, \qquad j = 1, \ldots, p,$$
then the orthogonal matrix $G$ satisfying the relation (23) is determined uniquely with probability one. Put
$$A = TGL^{-1}; \tag{25}$$
then it is clear that
$$A'A = I_p, \tag{26}$$
i.e., $A$ is a $p$-frame of $(n+2)$ dimensions. From (25) we have the following factorization of $T$ in a unique manner, i.e.,
$$T = ALG'. \tag{27}$$
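The factorization (27) can be illustrated numerically; it is essentially the singular value decomposition of $T$. The following is a minimal sketch in Python, assuming $p = 2$ so that the eigenvalues and eigenvectors of $T'T$ have a closed form; the function `factor_alg` and the helpers `matmul` and `transpose` are illustrative names, not part of the paper.

```python
import math
import random


def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def factor_alg(t):
    """Return (A, l, G) with T = A diag(l) G' for an (n+2) x 2 matrix T."""
    (a, b), (_, c) = matmul(transpose(t), t)        # T'T, symmetric 2 x 2
    mean = (a + c) / 2.0
    half = math.hypot((a - c) / 2.0, b)
    lams = [mean - half, mean + half]               # l_1^2 < l_2^2
    cols = []
    for lam in lams:
        x, y = b, lam - a                           # eigenvector of T'T
        norm = math.hypot(x, y)
        sign = 1.0 if x > 0 else -1.0               # enforce g_1j > 0
        cols.append((sign * x / norm, sign * y / norm))
    g = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    l = [math.sqrt(lam) for lam in lams]
    tg = matmul(t, g)
    a_frame = [[row[j] / l[j] for j in range(2)] for row in tg]  # A = T G L^{-1}
    return a_frame, l, g


random.seed(2)
n = 3
T = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(n + 2)]
A, l, G = factor_alg(T)

AtA = matmul(transpose(A), A)                        # should be the unit matrix
T_back = matmul(matmul(A, [[l[0], 0.0], [0.0, l[1]]]), transpose(G))
```

The sign convention on the first row of $G$ and the ordering $\ell_1 < \ell_2$ are what make the triple $(A, L, G)$ unique, as stated in the text.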
In other words, $p(n+2)$-dimensional euclidean space may be considered as the direct product space of the Stiefel manifold $V_{p,n+2-p}$, the $p$-dimensional simplex $L$, and the $p$-dimensional orthogonal group manifold. Elementarily speaking, since the $p$-frame $A$ has $p(n+2) - \frac{p(p+1)}{2}$ independent variables, the orthogonal matrix $G$ has $p^2 - \frac{p(p+1)}{2}$ independent variables and the simplex $L$ has $p$ independent variables, the matrix equation (27) may be considered as the transformation of the original independent variables $t_{i\alpha}$, $i = 1, \ldots, p$; $\alpha = 1, \ldots, n+2$, into new independent variables.
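The count of independent variables can be verified mechanically. The short Python sketch below only restates the three counts given in the text and checks that they add up to $p(n+2)$; the function names are illustrative.

```python
def frame_dim(p, n):
    """Free parameters of a p-frame A in (n+2)-space: p(n+2) - p(p+1)/2."""
    return p * (n + 2) - p * (p + 1) // 2


def orthogonal_dim(p):
    """Free parameters of a p x p orthogonal matrix G: p^2 - p(p+1)/2."""
    return p * p - p * (p + 1) // 2


def total_dim(p, n):
    """Parameters of (A, L, G); the diagonal matrix L contributes p."""
    return frame_dim(p, n) + p + orthogonal_dim(p)


# The counts match the p(n+2) original variables t_{i alpha}.
checked = [(p, n) for p in range(1, 6) for n in range(p, p + 5)
           if total_dim(p, n) == p * (n + 2)]
```

For $p = 2$ the first count gives $2(n+2) - 3 = 2n + 1$, in agreement with the dimension of the manifold of 2-frames defined by (11).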
If we do not want to specify the independent variables explicitly in the change of variables, it will be of great convenience to make use of the "exterior products" of differentials. The differential of a function $f(u_1, \ldots, u_n)$ of independent variables $u_1, \ldots, u_n$ is defined as
$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial u_i}\, du_i, \tag{28}$$
and the exterior product of differentials is defined as the product which satisfies the associative law and is anti-commutative, i.e., $df\, dg = -dg\, df$, and in particular $df\, df = 0$. It is easily seen that if the components of a column vector $dx$ are linear combinations of the differentials $du_1, \ldots, du_n$, i.e., in matrix notation
$$dx = J\, du,$$
then we have
$$\prod_{i=1}^{n} dx_i = \det J \prod_{i=1}^{n} du_i, \tag{30}$$
where the products on both sides mean the exterior products of differentials [4].
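The rule (30) can be checked on a small case with a toy implementation of the exterior product. The sketch below is an illustration under stated assumptions: forms are stored as dictionaries from sorted index tuples to coefficients, the matrix `J` is an arbitrary integer example, and the functions `wedge` and `det` are hypothetical helpers written for this check.

```python
from itertools import permutations


def wedge(f, g):
    """Exterior product of two forms stored as {sorted index tuple: coeff}."""
    out = {}
    for ia, ca in f.items():
        for ib, cb in g.items():
            idx = ia + ib
            if len(set(idx)) < len(idx):        # du_i du_i = 0
                continue
            # Sign of the permutation sorting idx, via its inversion count.
            inv = sum(1 for x in range(len(idx))
                      for y in range(x + 1, len(idx)) if idx[x] > idx[y])
            key = tuple(sorted(idx))
            out[key] = out.get(key, 0) + (-1) ** inv * ca * cb
    return out


def det(m):
    """Determinant by the Leibniz formula (adequate for small matrices)."""
    size = len(m)
    total = 0
    for perm in permutations(range(size)):
        inv = sum(1 for x in range(size)
                  for y in range(x + 1, size) if perm[x] > perm[y])
        term = (-1) ** inv
        for i in range(size):
            term *= m[i][perm[i]]
        total += term
    return total


J = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]         # assumed example matrix
# dx_i = sum_k J[i][k] du_k, each stored as a 1-form.
dx = [{(k,): J[i][k] for k in range(3)} for i in range(3)]

product = wedge(wedge(dx[0], dx[1]), dx[2])      # dx_1 dx_2 dx_3
swapped = wedge(dx[1], dx[0])                    # anti-commutativity check
square = wedge(dx[0], dx[0])                     # df df = 0 check
```

Here `product` has the single coefficient `det(J)` on $du_1\,du_2\,du_3$, while `swapped` is the negative of `wedge(dx[0], dx[1])` and every coefficient of `square` vanishes.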
Taking differentials of both sides of (27), we have
$$dT = dA\, L\, G' + A\, dL\, G' + A\, L\, dG'. \tag{31}$$
Since $A$ was a $p$-frame, we can choose a matrix $B$ of the type $(n+2) \times (n+2-p)$ such that $[A : B]$ is an $(n+2)$-dimensional orthogonal matrix; in other words, $B$ is an $(n+2-p)$-frame which lies entirely in the orthogonal complement of the linear sub-space which is spanned by the $p$-frame $A$. Premultiplying both sides of (31) by $\begin{pmatrix} A' \\ B' \end{pmatrix}$ and post-multiplying by $G$, we have
$$\begin{pmatrix} A' \\ B' \end{pmatrix} dT\, G = \begin{pmatrix} A'dA\, L + dL - L\, G'dG \\ B'dA\, L \end{pmatrix}. \tag{32}$$
Here we have made use of the fact that from $G'G = I_p$ follows
$$dG'\, G = -G'dG.$$
The exterior product of the elements of the left hand side of (32) is seen, by use of (30), to be
$$\left( \det \begin{pmatrix} A' \\ B' \end{pmatrix} \right)^{p} (\det G)^{n+2} \prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha}.$$
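On the other hand the elements of the matrix $A'dA\, L + dL - L\, G'dG$ are as follows. Let the $p$ column vectors of $A$ and $G$ be $a_i$ and $g_i$, $i = 1, \ldots, p$, respectively; then, because of the facts that $a_i'\, da_i = 0$ and $g_i'\, dg_i = 0$, it follows that the $(i,i)$ element is
$$d\ell_i,$$
and for $i \ne j$ we have as the $(i,j)$ element
$$a_i'\, da_j\, \ell_j - \ell_i\, g_i'\, dg_j$$
and as the $(j,i)$ element
$$a_j'\, da_i\, \ell_i - \ell_j\, g_j'\, dg_i.$$
Therefore the exterior product of the $(i,j)$ element and the $(j,i)$ element is
$$(\ell_j^2 - \ell_i^2)\; a_i'\, da_j \; g_i'\, dg_j. \tag{34}$$
Here we have made use of the facts that for $i \ne j$
$$a_i'\, da_j = -a_j'\, da_i, \qquad g_i'\, dg_j = -g_j'\, dg_i,$$
and assume that $j > i$. It follows that the exterior product of the elements of the matrix $A'dA\, L + dL - L\, G'dG$ is
$$\prod_{i<j} (\ell_j^2 - \ell_i^2) \prod_{i<j} a_i'\, da_j \prod_{i<j} g_i'\, dg_j \prod_{i=1}^{p} d\ell_i.$$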
In a similar manner we have the exterior product of the elements of the matrix $B'dA\, L$; i.e., letting the column vectors of $B$ be $b_j$, $j = 1, 2, \ldots, n+2-p$, this is
$$(\det L)^{n+2-p} \prod_{j=1}^{n+2-p} \prod_{i=1}^{p} b_j'\, da_i.$$
Consequently we have the following relation in absolute values, or considered as a relation of volume elements:
$$\prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha} = (\det L)^{n+2-p} \prod_{i<j} (\ell_j^2 - \ell_i^2) \prod_{i<j} a_i'\, da_j \prod_{i<j} g_i'\, dg_j \prod_{i=1}^{p} d\ell_i \prod_{j=1}^{n+2-p} \prod_{i=1}^{p} b_j'\, da_i.$$
Now from the general theory of manifolds we know that $\prod_{i<j} g_i'\, dg_j$ represents the Haar measure on the $p$-dimensional orthogonal group manifold and
$$\prod_{j=1}^{n+2-p} \prod_{i=1}^{p} b_j'\, da_i \prod_{i<j} a_i'\, da_j$$
represents the Haar measure on the Stiefel manifold $V_{p,n+2-p}$.
Let the $p$-frame which specifies the $p$-plane $L_p$ be $H$ (i.e., if we consider the $p$-plane $L_p$ as an element of the Grassmann manifold and denote it by $g$, then $H$ is an analytic function of $g$, at least locally) and let $C$ be a $p$-dimensional orthogonal matrix; then
$$A = HC \tag{38}$$
is a $p$-frame lying in $L_p$. Taking differentials of (38), we have
$$dA = H\, dC + dH\, C. \tag{39}$$
Premultiplying both sides of (39) by $B'$ and $A'$, we have the relations
$$B'dA = B'dH\, C, \qquad A'dA = C'H'dA = C'dC + C'H'dH\, C. \tag{40}$$
From (40) it follows that
$$\prod_{j=1}^{n+2-p} \prod_{i=1}^{p} b_j'\, da_i \prod_{i<j} a_i'\, da_j = \prod_{j=1}^{n+2-p} \prod_{i=1}^{p} b_j'\, dh_i \prod_{i<j} c_i'\, dc_j. \tag{41}$$
This means that the Haar measure on the Stiefel manifold $V_{p,n+2-p}$ decomposes into the product of the Haar measure on the Grassmann manifold, $\prod_{j} \prod_{i} b_j'\, dh_i$, and that on the orthogonal group manifold, $\prod_{i<j} c_i'\, dc_j$. Hence we have the following decomposition of the volume element of $p(n+2)$-dimensional euclidean space:
$$\prod_{i=1}^{p} \prod_{\alpha=1}^{n+2} dt_{i\alpha} = \left( \prod_{i=1}^{p} \ell_i \right)^{n+2-p} \prod_{i<j} (\ell_j^2 - \ell_i^2) \prod_{j=1}^{n+2-p} \prod_{i=1}^{p} b_j'\, dh_i \prod_{i<j} c_i'\, dc_j \prod_{i<j} g_i'\, dg_j \prod_{i=1}^{p} d\ell_i. \tag{42}$$
Now put the linkage factor between $T$ and $u$, $v$; then we have

(44)

From (44) it is easily seen that if we put
$$\tilde H = OH, \qquad \tilde u = Ou, \qquad \tilde v = Ov,$$
where $O$ is an orthogonal matrix of order $n+2$, then

The marginal distribution of the random $p$-plane derived from (18) is independent of $H$, so the conditional distribution of $u$, $v$ derived from (18) under the condition that $L_p$ is fixed depends only on the relative positions of $u$, $v$ and $H$, i.e., on $H'u$ and $H'v$.
If we complete $H$ to an orthogonal matrix of $(n+2)$ dimensions by adjoining $F$ of the type $(n+2) \times (n+2-p)$ and put
$$\begin{pmatrix} H' \\ F' \end{pmatrix} u = \begin{pmatrix} u_1^* \\ u_2^* \end{pmatrix} = u^*, \qquad \begin{pmatrix} H' \\ F' \end{pmatrix} v = \begin{pmatrix} v_1^* \\ v_2^* \end{pmatrix} = v^*,$$
then $(u^*, v^*)$ is a point in $V_{2,n}$ and because of the invariance of $dS$ we have
$$dS = dS^*,$$
and consequently we have the conditional distribution of $u^*$, $v^*$ under the condition that $L_p$ is fixed as follows, i.e.,

(46)

Because of the facts that $\cos\theta_1 = |u_1^*|$, $\cos\theta_2 = |v_1^*|$ and $\cos\theta_3 = u_1^{*\prime} v_1^* / (|u_1^*|\,|v_1^*|)$, the statistic $\bar V$ depends on $u_1^*$, $v_1^*$ only; hence Lemma 6 follows.
Lemma 7 states that the probability element of the conditional distribution of $u$, $v$ under the condition that $H = M_p$, where $M_p$ is the $p$-frame consisting of $p$ unit vectors along the first $p$ axes of the coordinate system, is proportional to
$$\exp\left[ -\frac{1}{2} \sum_{r=p+1}^{n+2} \sum_{i=1}^{p} (\rho_i u_r + \zeta_i v_r)^2 \right] f(u, v)\, dS, \tag{47}$$
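where
$$f(u, v) = E\left\{ \begin{vmatrix} r_{11} & \cdots & r_{1p} \\ \vdots & & \vdots \\ r_{1p} & \cdots & r_{pp} \end{vmatrix}^{\frac{n+2-p}{2}} \;\middle|\; u, v \right\} \tag{48}$$
and
$$r_{ij} = \sum_{\alpha=1}^{p} t_{i\alpha} t_{j\alpha}, \qquad i, j = 1, \ldots, p.$$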
Proof of Lemma 7.
We shall calculate from (18) the probability element of the probability distribution of $u$, $v$ when the $p$-plane $L_p$ is fixed to $M_p$. In this case
$$H = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix},$$
therefore, because of
$$A = HC, \qquad T = ALG' = HCLG', \qquad H'T = H'HCLG' = CLG',$$
we have
$$CLG' = \begin{pmatrix} t_{11} & \cdots & t_{p1} \\ \vdots & & \vdots \\ t_{1p} & \cdots & t_{pp} \end{pmatrix},$$
and consequently we have
$$GLC'CLG' = \|r_{ij}\|, \qquad i, j = 1, \ldots, p. \tag{51}$$
Taking the determinants of both sides of (49), it follows that

(52)

So the conditional distribution is proportional to
$$\cdots \prod_{i<j} c_i'\, dc_j\; dS. \tag{53}$$
But it is clear from (50) that

so the conditional distribution of $t_{i\alpha}$, $i = 1, \ldots, p$, $\alpha = 1, \ldots, p$, and $u$, $v$ under the condition that $H = M_p$ is proportional to
$$\cdots \prod_{i=1}^{p} \prod_{\alpha=1}^{p} dt_{i\alpha}\; dS. \tag{55}$$
Consequently we have the conditional marginal distribution of $u$, $v$ as the one given in (47). This completes the proof of Lemma 7. From this point on, Wald's original proofs may be followed.
REFERENCES
[1] A. Wald, "On a statistical problem arising in the classification of an individual into one of two groups," Ann. Math. Stat., Vol. 15 (1944).
[2] Shiing-Shen Chern, "On Grassmann and differential rings and their relations to the theory of multiple integrals," Sankhya, Vol. 7 (1945-46).
[3] A. T. James, "Normal multivariate analysis and the orthogonal group," Ann. Math. Stat., Vol. 25 (1954).
[4] A. T. James, loc. cit., Lemma 4.1.