ON THE MATHEMATICAL PRINCIPLES UNDERLYING THE THEORY OF THE χ² TEST

by

Junjiro Ogawa
University of North Carolina

This research was supported by the Office of Naval Research under Contract No. Nonr-855(06) for research in probability and statistics at Chapel Hill. Reproduction in whole or in part for any purpose of the United States Government is permitted.

Institute of Statistics
Mimeograph Series No. 162
January, 1957
ON THE MATHEMATICAL PRINCIPLES UNDERLYING THE THEORY OF THE χ² TEST*

by Junjiro Ogawa
Institute of Statistics, University of North Carolina, Chapel Hill, N. C.
Preface

Many contributions have been made so far to the theory of the χ² test; among others the author would like to mention R. A. Fisher [5]**, H. Hotelling [6], H. Cramér [4], G. A. Barnard [1], S. N. Roy and S. K. Mitra [11], and S. K. Mitra [7]. The author should refer to W. G. Cochran [3] also.

The rigorous proof of the theorem which will be stated in exact form later on, namely that the χ² statistic has the limiting χ² distribution with degrees of freedom reduced by the number of the independent parameters which were estimated from the sample, was first given by H. Cramér in his famous textbook [4], but some steps of the proof were skipped. Later S. N. Roy and S. K. Mitra [11] and S. K. Mitra [7] reasoned along the same lines and obtained theorems adjusted to various physical (or statistical) situations.

The purposes of this note are to present a complete and self-contained proof of Cramér's theorem on the one hand, and on the other hand to explain how the proofs of the related theorems obtained by S. N. Roy and S. K. Mitra can be thrown back on that of Cramér's theorem from the mathematical point of view.
1. Cramér's theorem and its proof. We shall first start with the following.

Theorem 1 (Cramér). Suppose that we are given r functions p_1(α), ..., p_r(α) of s < r variables α = (α_1, ..., α_s) such that, for all points of a non-degenerate interval A in the s-dimensional space of the α_j, the functions p_i(α) satisfy the following conditions:

a)  Σ_{i=1}^r p_i(α) = 1,

b)  p_i(α) ≥ c² > 0 for i = 1, ..., r,

c)  every p_i(α) has continuous derivatives ∂p_i/∂α_j and ∂²p_i/∂α_j∂α_k,

d)  the matrix D = (∂p_i/∂α_j), i = 1, ..., r; j = 1, ..., s, is of rank s.

* Sponsored by the Office of Naval Research under the contract for research in probability and statistics at Chapel Hill. Reproduction in whole or in part is permitted for any purpose of the United States government.
** The numbers in square brackets refer to the bibliography listed at the end.
Let the possible results of a certain random experiment ℜ be divided into r mutually exclusive groups, and suppose that the probability of obtaining a result belonging to the i-th group is p_i^0 = p_i(α_0), where α_0 = (α_1^0, ..., α_s^0) is an inner point of the interval A. Let ν_i denote the number of results belonging to the i-th group which occur in a sequence of n repetitions of ℜ, so that Σ_{i=1}^r ν_i = n. The equations

(1.1)    Σ_{i=1}^r (ν_i / p_i(α)) ∂p_i/∂α_j = 0,    j = 1, ..., s,

of the modified χ²-minimum method then have exactly one solution α̂ = (α̂_1, ..., α̂_s) such that α̂ converges in probability to α_0 as n → ∞. The value of χ² obtained by inserting these values α = α̂ into

(1.2)    χ² = Σ_{i=1}^r (ν_i − n p_i(α̂))² / (n p_i(α̂))

is, in the limit as n → ∞, distributed in a χ² distribution with r − s − 1 degrees of freedom.
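As a concrete illustration of Theorem 1 (our addition, not part of the original text), consider the Hardy-Weinberg model p(a) = (a², 2a(1−a), (1−a)²), so that r = 3 and s = 1 and the statistic (1.2) carries r − s − 1 = 1 degree of freedom. The counts and helper names below are hypothetical; the sketch solves the modified χ²-minimum equation (1.1) by bisection.

```python
# Illustrative sketch (not from the paper): solving the modified
# chi^2-minimum equation (1.1) for the Hardy-Weinberg model
# p(a) = (a^2, 2a(1-a), (1-a)^2); here r = 3, s = 1, so the statistic
# (1.2) has r - s - 1 = 1 degree of freedom in the limit.

def p(a):
    return [a * a, 2 * a * (1 - a), (1 - a) ** 2]

def dp(a):
    # derivatives dp_i/da
    return [2 * a, 2 - 4 * a, -2 * (1 - a)]

def score(a, nu):
    # left-hand side of (1.1): sum_i (nu_i / p_i) dp_i/da
    return sum(n_i / p_i * d_i for n_i, p_i, d_i in zip(nu, p(a), dp(a)))

def solve(nu, lo=1e-6, hi=1 - 1e-6, tol=1e-12):
    # the score is positive near 0 and negative near 1; bisection
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, nu) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi2(nu, a_hat):
    n = sum(nu)
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(nu, p(a_hat)))

nu = [30, 50, 20]                    # hypothetical class frequencies
a_hat = solve(nu)
# for this model the root of (1.1) is known in closed form:
assert abs(a_hat - (2 * nu[0] + nu[1]) / (2 * sum(nu))) < 1e-8
```

For this model the equation (1.1) coincides with the grouped likelihood equation, whose root is (2ν_1 + ν_2)/(2n); the final assertion checks the numerical solution against that closed form.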
Remark: The proof of this theorem is somewhat intricate, and will be divided into two parts. In the first part it will be shown that the equations (1.1) have exactly one solution α̂ which converges in probability to the true value α_0 as n → ∞. In the second part we shall consider the variables

(1.3)    y_i = (ν_i − n p_i(α̂)) / √(n p_i(α̂)),    i = 1, 2, ..., r,

where α̂ is the solution of (1.1), the existence of which was already established in the first part. It will be shown here that y' = (y_1, ..., y_r) tends to a certain singular r-dimensional normal distribution under the assumption that ν' = (ν_1, ..., ν_r) is the random vector whose probability distribution is given by the multinomial distribution

P(ν_1, ..., ν_r) = (n! / (ν_1! ··· ν_r!)) (p_1^0)^{ν_1} ··· (p_r^0)^{ν_r}.
Proof of Theorem 1.

Part I. Let ω_j(α), j = 1, ..., s, denote the remainder of second order which arises when the left-hand member of (1.1), divided by n, is expanded in a Taylor series around α = α_0 (equation (1.4)); then the equations (1.1) can be rewritten as a system (1.5) which is linear in α − α_0 apart from the remainder ω(α). Let

(1.6)    x_i = (ν_i − n p_i^0) / √(n p_i^0),    i = 1, 2, ..., r,

and

(1.7)    B = ( (1/√(p_i^0)) (∂p_i/∂α_j)_0 ),    i = 1, ..., r;  j = 1, ..., s.

Using the matrix notations, the equations (1.5) turn out to be

(1.8)    B'B(α − α_0) = n^{-1/2} B'x + ω(α).

The rank of the matrix B is s, because it is easily seen that

B = diag(1/√(p_1^0), 1/√(p_2^0), ..., 1/√(p_r^0)) · ( (∂p_i/∂α_j)_0 ),

and the rank of the second factor matrix, which is the matrix D calculated at α = α_0, is s by the assumption d) of the theorem, and the first factor matrix is a non-singular diagonal matrix of degree r by the assumption b). Hence the s × s square matrix B'B is positive definite and so has its inverse. Therefore we have from (1.8) that

(1.9)    α − α_0 = n^{-1/2} (B'B)^{-1} B'x + (B'B)^{-1} ω(α).
Now let the probability space underlying the present consideration be (Ω, 𝔄, P), i.e., let Ω be the set of the elementary events, 𝔄 some additive class of subsets of Ω containing Ω itself, and P a non-negative completely additive set function defined on the subsets belonging to 𝔄 such that P(Ω) = 1. We shall denote the elementary event by ξ.

By the inequality due to Tchebychev-Bienaymé, we have, for any λ > 0,

(1.11)    P( |x_i(ξ)| ≥ λ ) ≤ (1 − p_i^0)/λ² < 1/λ²,    i = 1, 2, ..., r;

then, putting Q_n = { ξ : |x_i(ξ)| < λ, i = 1, 2, ..., r }, we can see that

(1.12)    P(Q_n) > 1 − r/λ².

Here we shall choose λ such that simultaneously λ/√n → 0 and 1/λ² → 0 as n → ∞; for example, λ = n^q, 0 < q < 1/4, satisfies the above requirements. By the assumption b), it follows for any ξ ∈ Q_n that

(1.13)    |ν_i/n − p_i^0| < λ/√n,    i = 1, 2, ..., r.

For two inner points α_1 and α_2 of A, we have the difference ω_j(α_1) − ω_j(α_2), and this can be reduced to a form which is estimated by means of the following lemma.
We shall mention here as a lemma some results from elementary calculus: If a function φ(x_1, ..., x_s) is defined in some finite closed and convex domain of s-dimensional Euclidean space such that the partial derivatives ∂φ/∂x_i, i = 1, ..., s, exist and are continuous in that domain, then for any two points x_1 and x_2 belonging to that domain, we have

(1.16)    |φ(x_1) − φ(x_2)| ≤ K |x_1 − x_2|,

where K is a constant depending only on the function φ.

Proof of the lemma. This can be shown as follows: For any two points x_1 and x_2 belonging to the domain, put

x_t = x_2 + t(x_1 − x_2),    0 ≤ t ≤ 1.

Then, since the domain was convex, it is clear that x_t belongs to the domain for 0 ≤ t ≤ 1. We shall apply the mean-value theorem of elementary differential calculus to the function Φ(t) = φ(x_t), getting

Φ(1) = Φ(0) + Φ'(θ),    0 < θ < 1.

This means that we have

φ(x_1) − φ(x_2) = Σ_{j=1}^s (∂φ/∂x_j)_{x_θ} (x_{1j} − x_{2j}).

Hence by applying Cauchy's inequality we obtain

|φ(x_1) − φ(x_2)|² ≤ Σ_{j=1}^s (∂φ/∂x_j)²_{x_θ} · |x_1 − x_2|².

The function Σ_{i=1}^s (∂φ/∂x_i)² is continuous in the domain and, since the domain was finite and closed, it attains its maximum in that domain, which we denote by K². Consequently we have proved the inequality (1.16).
By virtue of the inequality (1.16), we get bounds of the following form: there exist constants k_{1ij} such that

(1.18)    | (1/p_i(α_1)) ∂p_i(α_1)/∂α_j − (1/p_i(α_2)) ∂p_i(α_2)/∂α_j | ≤ k_{1ij} |α_1 − α_2|.

If we put

k_{1j} = Σ_{i=1}^r k_{1ij}    and    k_1 = max_{1 ≤ j ≤ s} k_{1j},

then we obtain the corresponding bound (1.19) for the sums over i. In a similar manner we can get the inequalities (1.20), (1.21), (1.22) and (1.23), with suitable constants, for all j. Hence, applying these bounds together with (1.13) to the difference ω_j(α_1) − ω_j(α_2), we can obtain

(1.24)    |ω_j(α_1) − ω_j(α_2)| ≤ K_1 (λ/√n)|α_1 − α_2| + K_2 |α_1 − α_2| |α_1 − α_0| + K_3 |α_1 − α_2| |α_2 − α_0| + K_4 |α_1 − α_2|²,

and consequently we obtain

(1.25)    |ω_j(α_1) − ω_j(α_2)| ≤ K |α_1 − α_2| { λ/√n + |α_1 − α_0| + |α_2 − α_0| },

where the constants denoted by the same symbol K here and in what follows may differ from place to place.
In order to solve the equation (1.9) for ξ ∈ Q_n, we shall apply the method of successive approximation, i.e., we define the vectors α_ν by the recurrence relation

(1.26)    α_ν = α_0 + n^{-1/2} (B'B)^{-1} B'x + (B'B)^{-1} ω(α_{ν-1}),    ν = 1, 2, ...,

starting from α_0. It will easily be seen from the definition (1.4) of ω(α) that ω(α_0) = 0. Hence we have

(1.27)    α_1 − α_0 = n^{-1/2} (B'B)^{-1} B'x,

and

(1.28)    α_{ν+1} − α_ν = (B'B)^{-1} { ω(α_ν) − ω(α_{ν-1}) }.

Let us denote the greatest absolute value of the elements of (B'B)^{-1}B' and of (B'B)^{-1} by g; then we have from (1.27) that

(1.29)    |α_1^j − α_0^j| ≤ r g λ/√n,    j = 1, ..., s.

Hence it follows that

|α_1 − α_0| ≤ K_2 λ/√n,

where K_2 is a suitable constant. From (1.28) we can obtain the following inequality by (1.25):

(1.30)    |α_{ν+1} − α_ν| ≤ K_3 |α_ν − α_{ν-1}| { λ/√n + |α_ν − α_0| + |α_{ν-1} − α_0| }.

From this we can show, for sufficiently large values of n, that

(1.31)    |α_{ν+1} − α_ν| ≤ K_2 (λ/√n) { (4K_2 + 1) K_3 λ/√n }^ν.

We shall prove (1.31) by mathematical induction: we assume that (1.31) is true for all values from 0 up to ν − 1; for ν = 0 the inequality (1.31) reduces to (1.29). On that assumption it can be shown that

|α_ν − α_0| ≤ |α_ν − α_{ν-1}| + |α_{ν-1} − α_{ν-2}| + ··· + |α_1 − α_0| ≤ 2 K_2 λ/√n,

provided n is so large that (4K_2 + 1) K_3 λ/√n ≤ 1/2. Hence, for sufficiently large values of n, (1.30) gives

(1.33)    |α_{ν+1} − α_ν| ≤ K_3 |α_ν − α_{ν-1}| { λ/√n + 4 K_2 λ/√n } = (4K_2 + 1) K_3 (λ/√n) |α_ν − α_{ν-1}|,

and the inductive hypothesis for ν − 1 then yields (1.31).

The infinite series Σ_ν (α_{ν+1} − α_ν) therefore converges absolutely for any ξ ∈ Q_n and for sufficiently large values of n; if we define α̂ by

(1.35)    α̂ = lim_{ν → ∞} α_ν,

then

|α̂ − α_0| ≤ K_2 (λ/√n) / ( 1 − (4K_2 + 1) K_3 λ/√n ) → 0    as n → ∞,

for any ξ ∈ Q_n. Since P(Q_n) → 1, this means that α̂ converges in probability to α_0 as n → ∞; and for any ξ ∈ Q_n and for sufficiently large values of n we get from (1.26)

(1.36)    α̂ = α_0 + n^{-1/2} (B'B)^{-1} B'x + (B'B)^{-1} ω(α̂).

Suppose that there exists another solution α̃ which converges in probability to α_0 as n → ∞; then, for sufficiently large values of n,

(1.37)    α̃ = α_0 + n^{-1/2} (B'B)^{-1} B'x + (B'B)^{-1} ω(α̃)

for any ξ ∈ Q_n*, where Q_n* is a set such that P(Q_n*) → 1 as n → ∞. It is clear that lim_{n → ∞} P(Q_n ∩ Q_n*) = 1. For sufficiently large values of n and for any ξ ∈ Q_n ∩ Q_n* we have

α̂ − α̃ = (B'B)^{-1} { ω(α̂) − ω(α̃) },

and hence it follows by (1.25) that

(1.38)    |α̂ − α̃| ≤ K |α̂ − α̃| { λ/√n + |α̂ − α_0| + |α̃ − α_0| }.

If |α̂ − α̃| > 0 for all n, then (1.38) leads to a contradiction, since the factor in braces tends to 0 as n → ∞. This means that for sufficiently large n there exists exactly one solution α̂ which converges in probability to α_0 as n → ∞.
Part II. For the solution α̂ we have

(1.39)    α̂ − α_0 = n^{-1/2} (B'B)^{-1} B'x + (B'B)^{-1} ω(α̂);

hence, by using the inequalities (1.31), we can see that every component of (B'B)^{-1} ω(α̂) is dominated by K λ²/n in its absolute value, where K is a certain constant independent of n. Therefore, we can write

(1.40)    α̂ − α_0 = n^{-1/2} (B'B)^{-1} B'x + (K λ²/n) θ',    |θ_j'| ≤ 1,  j = 1, ..., s.

Let y' = (y_1, ..., y_r), where the y_i are defined by (1.3); then, expanding p_i(α̂) around α_0,

(1.41)    p_i(α̂) − p_i^0 = Σ_j (∂p_i/∂α_j)_0 (α̂_j − α_j^0) + (1/2) Σ_{j,k} (∂²p_i/∂α_j∂α_k)* (α̂_j − α_j^0)(α̂_k − α_k^0),

where (∂²p_i/∂α_j∂α_k)* means that the value of ∂²p_i/∂α_j∂α_k is calculated at a certain point on the segment joining α_0 and α̂. On the other hand, since

(1.42)    α̂_k − α_k^0 = O(λ/√n + λ²/n) = O(λ/√n),

consequently we have

(1.43)    √n (p_i(α̂) − p_i^0) = √n Σ_j (∂p_i/∂α_j)_0 (α̂_j − α_j^0) + O(λ²/√n).

In a similar manner we can see that

(1.44)    √( p_i^0 / p_i(α̂) ) = 1 + O(λ/√n),

and

(1.45)    y_i = x_i √( p_i^0 / p_i(α̂) ) − √n ( p_i(α̂) − p_i^0 ) / √( p_i(α̂) ).

Hence we obtain the relation

(1.46)    y = x − √n B(α̂ − α_0) + (λ²/√n) θ'',    |θ_i''| ≤ K,  i = 1, ..., r.

Substituting (1.40) into (1.46), it follows that

(1.47)    y = (I − B(B'B)^{-1}B') x + (λ²/√n) θ,

where θ is a random vector such that |θ_i| ≤ K, i = 1, ..., r, and K is a certain constant independent of n. Since the random vector (λ²/√n) θ converges in probability to 0 as n → ∞, by using the theorem which was stated in the last paragraph of § 22.6 of Cramér's book [4], we can see that the distribution of y is, in the limit as n → ∞, the same as that of

(I − B(B'B)^{-1}B') x.

The author would like to refer also to Ogawa [8] in this connection.
Now, if we put e' = (√(p_1^0), ..., √(p_r^0)), the limiting distribution of x as n → ∞ is the r-dimensional singular normal distribution with mean 0 and the covariance matrix

(1.48)    Λ = I − ee'.

The statistic χ², whose distribution is under consideration, has the same limiting distribution as that of the statistic

(1.49)    χ*² = x' (I − B(B'B)^{-1}B')' (I − B(B'B)^{-1}B') x = x' (I − B(B'B)^{-1}B') x,

since the matrix I − B(B'B)^{-1}B' is symmetric and idempotent.
Since the s × s matrix B'B is positive definite (of course symmetric), we can transform it by a suitable orthogonal matrix C into a diagonal form whose diagonal elements are all positive, i.e.,

C'(B'B)C = M² = diag(μ_1², μ_2², ..., μ_s²).

In other words, it means that

(1.50)    (BCM^{-1})'(BCM^{-1}) = I_s,

i.e., the s column vectors of the matrix BCM^{-1} form a system of s orthogonal vectors of r dimensions. If we complete BCM^{-1} to an r-dimensional orthogonal matrix K by adjoining (r − s) column vectors b_{s+1}, ..., b_r, i.e., if we set

K = (BCM^{-1} : b_{s+1} ··· b_r),    K'K = I,

then it will be seen that

(1.51)    K'B(B'B)^{-1}B'K = (K'BCM^{-1})(K'BCM^{-1})' = [ I_s  0 ; 0  0 ].

Now make the orthogonal transformation

(1.53)    ζ = K'x;

then E(ζ) = 0, the covariance matrix of ζ is K'(I − ee')K, and

(1.55)    χ*² = x'KK'(I − B(B'B)^{-1}B')KK'x = ζ' K'(I − B(B'B)^{-1}B')K ζ = Σ_{i=s+1}^r ζ_i².

Since, by differentiating the relation a),

Σ_{i=1}^r (∂p_i/∂α_j)_0 = 0,    j = 1, ..., s,

we have e'B = 0, and therefore

e'K = e'(BCM^{-1} : b_{s+1} ··· b_r) = (0 ··· 0 : e*_{s+1} ··· e*_r),

where e*_g = e'b_g, g = s+1, ..., r. Therefore,

K' ee' K = [ 0  0 ; 0  εε' ],    ε' = (e*_{s+1}, ..., e*_r).

Consequently the covariance matrix of (ζ_{s+1}, ..., ζ_r) is I_{r−s} − εε', where, because

Σ_{g=s+1}^r e*_g² = tr(K'ee'K) = tr(ee') = Σ_{i=1}^r p_i^0 = 1,

ε is a unit vector.
From the relation Σ_{i=1}^r ν_i = n we have, identically, e'x = (1/√n) Σ_{i=1}^r (ν_i − n p_i^0) = 0, and since e'x = (e'K)(K'x) = ε'ζ*, where ζ*' = (ζ_{s+1}, ..., ζ_r), it follows that

(1.57)    Σ_{g=s+1}^r e*_g ζ_g = 0.

Consider the following generalized Helmert's orthogonal transformation:

(1.58)    η = P ζ*,

where, writing S_g = e*_g² + e*_{g+1}² + ··· + e*_r² (so that S_{s+1} = 1), the first row of P is (e*_{s+1}, e*_{s+2}, ..., e*_r) and the row corresponding to g = s+1, ..., r−1 is

(1.59)    ( 0, ..., 0, √(S_{g+1}/S_g), −e*_g e*_{g+1}/√(S_g S_{g+1}), ..., −e*_g e*_r/√(S_g S_{g+1}) ),

the entry √(S_{g+1}/S_g) standing in the position of ζ_g; the matrix P so defined is orthogonal. Then it is clear from (1.57) that

(1.60)    η_{s+1} = Σ_{g=s+1}^r e*_g ζ_g = 0,

and, therefore,

(1.61)    χ*² = Σ_{g=s+1}^r ζ_g² = Σ_{g=s+1}^r η_g² = η_{s+2}² + ··· + η_r².

The covariance matrix of (η_{s+1}, η_{s+2}, ..., η_r) is

P(I_{r−s} − εε')P' = I_{r−s} − Pεε'P',

where

(1.62)    Pε = (1, 0, ..., 0)',

because the first row of P is ε' with ε'ε = Σ_{g=s+1}^r e*_g² = 1 and the remaining rows of P are orthogonal to ε. Whence we have

(1.63)    I_{r−s} − Pεε'P' = diag(0, 1, 1, ..., 1).

This means that η_{s+1} = 0 with probability one, and η_{s+2}, ..., η_r are independent normal variates with means 0 and variances 1. Thus the limiting distribution of χ*², and consequently the limiting distribution of χ² as n → ∞, is the χ² distribution with degrees of freedom r − s − 1. This completes the proof of Theorem 1.
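The projection algebra behind (1.48)-(1.55) can be checked numerically. The following sketch (our addition; the Hardy-Weinberg model at α_0 = 0.5 is a hypothetical example) verifies that I − B(B'B)^{-1}B' is symmetric and idempotent with trace r − s, and that e'B = 0, so the singularity of Λ = I − ee' removes one further degree of freedom.

```python
# Numerical check (not in the paper) of the projection identities used in
# (1.49)-(1.55), for a hypothetical model p(a) = (a^2, 2a(1-a), (1-a)^2)
# at a0 = 0.5, so r = 3 and s = 1.
import numpy as np

a0 = 0.5
p0 = np.array([a0**2, 2 * a0 * (1 - a0), (1 - a0)**2])
dp0 = np.array([2 * a0, 2 - 4 * a0, -2 * (1 - a0)])

B = (dp0 / np.sqrt(p0)).reshape(3, 1)             # the matrix (1.7)
P = np.eye(3) - B @ np.linalg.inv(B.T @ B) @ B.T  # I - B(B'B)^{-1}B'
e = np.sqrt(p0)                                   # e_i = sqrt(p_i^0)

assert np.allclose(P, P.T)        # symmetric
assert np.allclose(P @ P, P)      # idempotent
trace = float(P.trace())          # r - s = 2
orth = float(e @ B)               # e'B = 0: one more df lost via I - ee'
```

The trace of the projection is r − s = 2, and the orthogonality e'B = 0 reduces the rank of the limiting covariance of y by one more, recovering r − s − 1 = 1.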
2. Some remarks to Theorem 1 and its proof.

The parameter space of a multinomial probability distribution is, in general, the intersection of the (r−1)-dimensional flat Σ_{i=1}^r p_i = 1 and the r-dimensional unit interval 0 ≤ p_i ≤ 1, i = 1, ..., r. We shall call the parameter space π. Mathematically speaking, the assumptions of Theorem 1 specify the mapping of a non-degenerate s-dimensional interval A onto the subset

(2.1)    π_0 = π ∩ { p : p_i ≥ c² > 0, i = 1, 2, ..., r }

of π. In other words, assuming that the true parameter value p^0 is an inner point of π_0: if the subset π_0 of π is contained in the image of some non-degenerate interval of Euclidean space mapped by the function p(α) = (p_1(α), ..., p_r(α)), such that p^0 corresponds to an inner point of that interval and the mapping function p(α) satisfies the condition (c), then the theorem holds true. Condition (d) assures that the dimensionality is preserved by this mapping.

The modified χ²-minimum equation (1.8) is the same as the usual normal equation of the least-squares method up to the term ω(α). The fundamental idea of Cramér's proof is that, in the limit as n → ∞, the whole situation can be reduced to that of the usual least-squares method.

In addition to the conditions (a)-(d), if we assume further that α_1 = α_1^0, ..., α_t = α_t^0 (t < s) and put

(2.2)    B* = the r × (s − t) matrix consisting of the last s − t columns of B,

(2.3)    α*' = (α_{t+1}, ..., α_s),

and if ω_j*(α*), j = t+1, ..., s, are defined analogously with ω_j(α), then the modified χ²-minimum equation comes out as follows:

(2.4)    B*'B*(α* − α_0*) = n^{-1/2} B*'x + ω*(α*).

Accordingly we have in this case

(2.5)    y* = (I − B*(B*'B*)^{-1}B*') x + (λ²/√n) θ*,

and the statistic has, in the limit as n → ∞, the same distribution as

(2.6)    y*'y* = x'(I − B*(B*'B*)^{-1}B*') x,

which turns out to be the χ² distribution with degrees of freedom r − 1 − (s − t) = r − s + t − 1.

From the usual theory of the least squares (see, for example, Ogawa [10]), we can conclude that y*'y* − y'y is independent of y'y and is distributed as the χ² distribution with degrees of freedom t in the limit as n → ∞.
The author wants to add one more remark: Cramér's theorem assures that there exists one and only one solution of the modified χ²-minimum equation which is consistent. In our multinomial probability distribution case, it will be easily seen that the modified χ²-minimum equation coincides with the maximum likelihood equation in Cramér's sense.
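The coincidence can be exhibited in one line. Since the likelihood of the grouped frequencies is L = const · Π_i p_i(α)^{ν_i}, the likelihood equations read

```latex
\frac{\partial \log L}{\partial \alpha_j}
  = \frac{\partial}{\partial \alpha_j}\sum_{i=1}^{r} \nu_i \log p_i(\alpha)
  = \sum_{i=1}^{r} \frac{\nu_i}{p_i(\alpha)}\,
    \frac{\partial p_i}{\partial \alpha_j} = 0,
  \qquad j = 1, \dots, s,
```

which are exactly the equations (1.1) of the modified χ²-minimum method.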
A. Wald [12] proved that the statistic which gives the greatest maximum of the likelihood function (if any at all) is consistent under the following eight assumptions:
Assumption 1: The cumulative distribution function F(x;θ) is either discrete for all θ or is absolutely continuous for all θ.

Assumption 2: We shall define auxiliary functions as follows:

f(x, θ, ρ) = sup_{|θ'−θ| ≤ ρ} f(x;θ'),  and  f*(x, θ, ρ) = f(x, θ, ρ) if f(x, θ, ρ) > 1, and = 1 otherwise;

φ(x, r) = sup_{|θ| > r} f(x;θ),  and  φ*(x, r) = φ(x, r) if φ(x, r) > 1, and = 1 otherwise.

For sufficiently small ρ and for sufficiently large r the expected values

∫_{−∞}^{∞} log f*(x, θ, ρ) dF(x;θ_0)  and  ∫_{−∞}^{∞} log φ*(x, r) dF(x;θ_0)

are finite; θ_0 denotes the true parameter point.

Assumption 3: If lim_{i → ∞} θ_i = θ, then lim_{i → ∞} f(x;θ_i) = f(x;θ) for all x except perhaps on a set which may depend on the limit point θ (but not on the sequence {θ_i}) and whose probability measure is zero under the probability distribution corresponding to the true parameter point θ_0.

Assumption 4: If θ_1 is a parameter point different from the true parameter point θ_0, then F(x;θ_1) ≠ F(x;θ_0) for at least one value of x.

Assumption 5: If lim_{i → ∞} |θ_i| = ∞, then lim_{i → ∞} f(x;θ_i) = 0 for any x except perhaps on a fixed set (independent of the sequence {θ_i}) whose probability is zero under the probability distribution corresponding to the true parameter point θ_0.

Assumption 6: For the true parameter point θ_0 we have

∫_{−∞}^{∞} |log f(x;θ_0)| dF(x;θ_0) < ∞.

Assumption 7: The parameter space Θ is a closed subset of the k-dimensional Cartesian space.

Assumption 8: f(x, θ, ρ) is a measurable function of x for any θ and ρ.
In our special case we may put θ = p and x_a' = (x_{1a}, x_{2a}, ..., x_{ra}), where

x_{ia} = 1 if in the a-th repetition of the random experiment the event happens which belongs to the i-th class, and x_{ia} = 0 otherwise,

so that

ν_i = Σ_{a=1}^n x_{ia},    i = 1, 2, ..., r,

and

(2.8)    f(x_a; p) = p_1^{x_{1a}} ··· p_r^{x_{ra}}.

Assumption 1 is trivial. Assumption 2 is satisfied with f*(x, θ, ρ) = 1 and φ*(x, r) = 1. Assumptions 3 and 4 are also trivially satisfied. Assumption 5 is also trivial. Hence Assumption 6 is also satisfied. We shall confine ourselves to the subset π_0 = π ∩ { p : p_i ≥ c² > 0, i = 1, ..., r }; then Assumption 7 is satisfied. Assumption 8 is trivial because f(x;p) is a step function in this case. Thus the solution p̂ should be the one which gives the greatest maximum of the likelihood function.
3. On some related theorems. S. N. Roy and S. K. Mitra [11] proposed the following modification of the theorem of Cramér.

Theorem 2 (Roy and Mitra). In addition to the statements of Theorem 1, let now f_k(α), k = 1, ..., t < s, be t real-valued functions of α such that

(e)  at every point α in A, the functions f_k(α) are continuous and possess continuous partial derivatives with respect to the α_j up to the second order,

(f)  the matrix (∂f_k/∂α_j) is of rank t for every α in A.

For k = 1, 2, ..., t, let f_k^0 be a member in the range of f_k in A. Denote by H_0 the hypothesis

H_0:  f_k(α) = f_k^0,    k = 1, ..., t;

then if H_0 is true, the equations

(3.1)    Σ_{i=1}^r (ν_i/p_i(α)) ∂p_i/∂α_j + Σ_{k=1}^t λ_k ∂f_k/∂α_j = 0,  j = 1, ..., s;    f_k(α) = f_k^0,  k = 1, ..., t,

for minimizing χ² in the modified sense subject to H_0, have exactly one system of solutions α̂' = (α̂_1 ··· α̂_s), λ̂' = (λ̂_1 ··· λ̂_t) such that α̂ → α_0 in probability as n → ∞, and the statistic

χ̂² = Σ_{i=1}^r (ν_i − n p_i(α̂))² / (n p_i(α̂))

is, in the limit as n → ∞, distributed as the χ² distribution with degrees of freedom r − s + t − 1; also χ²_{H_0} = χ̂² − χ², where χ² is the statistic (1.2) computed with the unrestricted solution of Theorem 1, is under H_0, in the limit as n → ∞, distributed independently of χ² as a χ² distribution with degrees of freedom t.
This can be proved easily from the remarks in § 2. We shall consider the transformation

(3.3)    ξ_1 = f_1(α) − f_1^0, ..., ξ_t = f_t(α) − f_t^0;    ξ_{t+1} = α_{t+1} − α_{t+1}^0, ..., ξ_s = α_s − α_s^0;

here we have assumed, without any loss of generality, that the principal minor does not vanish identically, i.e., that

det ( ∂f_k/∂α_j )_{k, j = 1, ..., t} ≠ 0.

This means that α_1, ..., α_t can be expressed as functions of ξ_1, ..., ξ_s. More precisely, in some neighborhood of the point ξ = 0 there exist implicit functions ψ_1, ..., ψ_t such that

α_k = ψ_k(ξ_1, ..., ξ_s),    k = 1, ..., t,

and they are continuous and possess continuous partial derivatives up to the second order.

If we denote by Ξ the image in ξ-space of the interval A mapped under the transformation (3.3), the assumptions of the theorem assure that Ξ contains some non-degenerate interval of s-dimensional Cartesian space. Hence under the hypothesis H_0 the functions

(3.6)    p̄_i(ξ_{t+1}, ..., ξ_s) = p_i( ψ_1(0, ..., 0, ξ_{t+1}, ..., ξ_s), ..., ψ_t(0, ..., 0, ξ_{t+1}, ..., ξ_s), α_{t+1}^0 + ξ_{t+1}, ..., α_s^0 + ξ_s )

are defined in a non-degenerate (s − t)-dimensional interval Ξ_0, and the r functions p̄_i, as functions of ξ_{t+1}, ..., ξ_s, possess continuous partial derivatives up to the second order. In fact, for j = t+1, ..., s,

(3.7)    ∂p̄_i/∂ξ_j = Σ_{k=1}^t (∂p_i/∂α_k)(∂ψ_k/∂ξ_j) + ∂p_i/∂α_j,    i = 1, ..., r,

and similarly for the second derivatives (3.8); these are all continuous in Ξ_0. Furthermore, if for some point ξ* in Ξ_0 there existed linear relations

Σ_{j=t+1}^s a_j(ξ*) ∂p̄_i/∂ξ_j = 0,    i = 1, ..., r,

then this would mean, from (3.7), that a non-trivial linear combination of the columns of the matrix D vanishes; thus we get

a_{t+1}(ξ*) = ··· = a_s(ξ*) = 0,

i.e., the rank of the matrix (∂p̄_i/∂ξ_j), i = 1, ..., r; j = t+1, ..., s, is s − t. Thus Theorem 2 is reduced to Theorem 1 with ξ_{t+1}, ..., ξ_s playing the role of α_1, ..., α_s. The second part of Theorem 2 is clear from the remark in § 2.
Next we shall consider the following

Theorem 3 (Roy and Mitra). Let the possible results of a certain random experiment ℜ be divided into r mutually exclusive groups, and suppose that the probability of obtaining a result belonging to the i-th group is p_i^0 ≥ c² > 0. Let ν_i denote the number of results belonging to the i-th group which occur in a sequence of n repetitions of ℜ, so that Σ_{i=1}^r ν_i = n. Let us consider the hypothesis

H_0:  f_k(p) = 0,    k = 1, ..., r − s − 1,

where each f_j possesses continuous partial derivatives up to the second order and the matrix

( ∂f_j/∂p_t ),    j = 1, ..., r−s−1;  t = 1, ..., r−1,

is of rank r − s − 1. Then the modified χ²-minimum equations subject to H_0:

f_k(p) = 0, k = 1, ..., r−s−1;    Σ_i p_i = 1,

have one and only one solution p̂ such that p̂ → p^0 in probability as n → ∞, and

χ̂² = Σ_{i=1}^r (ν_i − n p̂_i)² / (n p̂_i)

is, in the limit as n → ∞, distributed as a χ² with degrees of freedom r − s − 1.

This theorem can also be reduced to Theorem 1. Among the r − 1 independent parameters p_1, ..., p_{r−1} there are r − s − 1 relations (functionally independent), and the assumptions of the theorem assure us that r − s − 1 of them can be expressed as functions of the other s parameters, and these functions satisfy the conditions of Theorem 1.
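As a small illustration of Theorem 3 (our addition; the counts and the constraint are hypothetical), take r = 3 and the single relation f(p) = p_1 − p_2 = 0, so that s = 1 and the limiting law has r − s − 1 = 1 degree of freedom. Under H_0 we may write p = (q, q, 1 − 2q); the modified χ²-minimum equation then coincides with the grouped likelihood equation in q, which the sketch solves by ternary search on the concave log-likelihood.

```python
# Sketch (not from the paper): Theorem 3 with the hypothetical constraint
# p_1 = p_2, i.e. p = (q, q, 1 - 2q); r = 3, s = 1, so chi^2 has 1 df.
import math

nu = [28, 35, 37]                      # hypothetical class frequencies
n = sum(nu)

def loglik(q):
    # grouped log-likelihood under H0 (concave in q on (0, 1/2))
    return (nu[0] + nu[1]) * math.log(q) + nu[2] * math.log(1 - 2 * q)

def argmax(lo=1e-6, hi=0.5 - 1e-6, tol=1e-12):
    # ternary search for the maximum of a concave function
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

q_hat = argmax()                       # closed form: (nu_1 + nu_2)/(2n)
p_hat = [q_hat, q_hat, 1 - 2 * q_hat]
chi2 = sum((nu_i - n * p_i) ** 2 / (n * p_i)
           for nu_i, p_i in zip(nu, p_hat))
```

Here the restricted estimate has the closed form q̂ = (ν_1 + ν_2)/(2n), which the search recovers numerically; χ̂² is then compared against a χ² with one degree of freedom.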
We now state and sketch proofs of the analogous theorems in the analysis of variance situation.

Theorem 1*. Suppose that rs real-valued and one-valued functions p_ij(θ), i = 1, ..., r; j = 1, ..., s, of θ are defined on a certain non-degenerate interval A of t-dimensional Euclidean space such that

(a)  Σ_{i=1}^r p_ij(θ) = p_{0j}(θ) = 1 for j = 1, ..., s and all θ in A,

(b)  p_ij(θ) ≥ c² > 0 for all i, j and θ in A,

(c)  every p_ij(θ) has continuous partial derivatives with respect to the components of θ up to the second order, and

(d)  the matrix P(θ) = (∂p_ij/∂θ_k) is of rank t for all θ in A.

We shall consider s mutually independent random experiments ℜ_1, ..., ℜ_s, and in each experiment the possible outcomes are divided into r groups. The probability of getting a result which belongs to the i-th group in the j-th experiment ℜ_j is p_ij^0 = p_ij(θ_0), where θ_0 is an inner point of A. Let the frequency of the result belonging to the i-th group in n_{0j} repetitions of ℜ_j be n_ij, so that

Σ_{i=1}^r n_ij = n_{0j},  j = 1, ..., s;    Σ_{j=1}^s n_{0j} = n.

Then the modified χ²-minimum equations

(3.12)    Σ_{j=1}^s Σ_{i=1}^r ( n_ij / p_ij(θ) ) ∂p_ij/∂θ_k = 0,    k = 1, ..., t,

have one and only one solution θ̂ = (θ̂_1 ··· θ̂_t) which is consistent, and the statistic

χ² = Σ_{j=1}^s Σ_{i=1}^r (n_ij − n_{0j} p_ij(θ̂))² / (n_{0j} p_ij(θ̂))

is, in the limit as n → ∞ subject to the ratios r_j = n_{0j}/n being held fixed, distributed as a χ² with degrees of freedom rs − s − t.
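The degrees-of-freedom count rs − s − t can be verified numerically. In the sketch below (our addition), s = 2 independent experiments share one parameter through a hypothetical common Hardy-Weinberg model, so r = 3 and t = 1: the projection I − B(B'B)^{-1}B' has trace rs − t = 5, and the s relations q*_j'B = 0 remove s further degrees of freedom, leaving rs − s − t = 3.

```python
# Numerical check (not in the paper) of the rank bookkeeping in the proof
# of Theorem 1*: two experiments (s = 2), r = 3 classes each, sharing one
# parameter (t = 1) through the hypothetical model p_ij(a) = Hardy-Weinberg(a).
import numpy as np

a0, r1, r2 = 0.4, 0.5, 0.5             # ratios r_j = n_0j / n held fixed
p0 = np.array([a0**2, 2 * a0 * (1 - a0), (1 - a0)**2])
dp0 = np.array([2 * a0, 2 - 4 * a0, -2 * (1 - a0)])

# B from (3.17): entries (dq_ij/da) / sqrt(q_ij) with q_ij = r_j p_ij
blocks = [rj * dp0 / np.sqrt(rj * p0) for rj in (r1, r2)]
B = np.concatenate(blocks).reshape(6, 1)
P = np.eye(6) - B @ np.linalg.inv(B.T @ B) @ B.T

# the vectors q*_j: sqrt(p_i^0) in block j, zero elsewhere
q1 = np.concatenate([np.sqrt(p0), np.zeros(3)])
q2 = np.concatenate([np.zeros(3), np.sqrt(p0)])

trace = float(P.trace())                 # rs - t = 5
df = trace - 2                           # minus s = 2 relations: rs - s - t = 3
orth = float(abs(q1 @ B) + abs(q2 @ B))  # q*_j'B = 0, j = 1, 2
```

The orthogonality q*_j'B = 0 is the block analogue of e'B = 0 in Theorem 1; each experiment contributes one singular direction to the limiting covariance.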
Proof of Theorem 1*. Put

q_ij(θ) = r_j p_ij(θ),

so that Σ_{i=1}^r q_ij(θ) = r_j for all j; then the equations (3.12) can be written as

(3.12*)    Σ_{j=1}^s Σ_{i=1}^r (n_ij/n) (1/q_ij(θ)) ∂q_ij/∂θ_k = 0,    k = 1, 2, ..., t.

Let ω_k(θ), k = 1, ..., t, denote the second-order remainder defined analogously to (1.4); then (3.12*) can be rewritten as a system (3.15) linear in θ − θ_0 apart from ω(θ). Let x' = (x_11 ··· x_r1 ··· x_1s ··· x_rs), where

(3.16)    x_ij = (n_ij − n_{0j} p_ij^0) / √(n_{0j} p_ij^0),

and, furthermore, let

(3.17)    B = ( (1/√(q_ij^0)) (∂q_ij/∂θ_k)_0 ),

an rs × t matrix. Using matrix notation, (3.15) can be expressed as

(3.18)    B'B(θ − θ_0) = n^{-1/2} B'x + ω(θ);

hence we get

(3.19)    θ − θ_0 = n^{-1/2} (B'B)^{-1} B'x + (B'B)^{-1} ω(θ).

Let r_0 = min_j r_j, and put

(3.20)    Q_n = { ξ : |x_ij(ξ)| < λ, i = 1, ..., r; j = 1, ..., s };

then we have

P(Q_n) > 1 − rs/λ²,

and, for any ξ ∈ Q_n, the analogue of (1.13) holds. As before, we can show that for any two points θ_1 and θ_2 in A

|ω_k(θ_1) − ω_k(θ_2)| ≤ K |θ_1 − θ_2| { λ/√n + |θ_1 − θ_0| + |θ_2 − θ_0| },

where K is a constant depending only on the functional form of the p_ij and the closure of the interval A. Thus we can show that the equation (3.19) has one and only one consistent solution θ̂.

Let y' = (y_11 ··· y_r1 ··· y_1s ··· y_rs), where

y_ij = (n_ij − n_{0j} p_ij(θ̂)) / √(n_{0j} p_ij(θ̂)).

Then we obtain in exactly the same manner as before

y = x − √n B(θ̂ − θ_0) + (λ²/√n) θ* = (I − B(B'B)^{-1}B') x + (λ²/√n) θ**,

where θ* and θ** are random vectors whose components are bounded by a constant independent of n. The limiting distribution of χ², as n → ∞, is the same as that of

χ*² = x' (I − B(B'B)^{-1}B') x.

The limiting distribution of x as n → ∞ is the rs-dimensional normal distribution with mean 0 and the covariance matrix

(3.27)    Λ = diag(Λ_1, ..., Λ_s),    Λ_j = I_r − e_j e_j',    e_j' = (√(p_1j^0), ..., √(p_rj^0)).

We shall apply the orthogonal transformation

ζ = K'x,

where K, defined analogously to K in the proof of Theorem 1, is given by

K = (BCM^{-1} : b_{t+1} ··· b_{rs}).

Then the covariance matrix of ζ is

K' Λ K = I − Σ_{j=1}^s K' q*_j q*_j' K,

where

q*_j' = (0 ··· 0 : √(p_1j^0) ··· √(p_rj^0) : 0 ··· 0),    j = 1, ..., s,

the non-zero entries standing in the j-th block. Since

q*_j' B = 0,    j = 1, ..., s,

we have

q*_j' K = (0 ··· 0 : e_{t+1}^{(j)} ··· e_{rs}^{(j)}),

where e_g^{(j)} = q*_j' b_g, g = t+1, ..., rs. Hence the covariance matrix of ζ takes the form

[ I_t  0 ; 0  I_{rs−t} − Σ_{j=1}^s ε_j ε_j' ],    ε_j' = (e_{t+1}^{(j)}, ..., e_{rs}^{(j)}).

On the other hand, we have the s relations, holding identically,

q*_j' x = 0,    j = 1, ..., s,

or

e_{t+1}^{(j)} ζ_{t+1} + ··· + e_{rs}^{(j)} ζ_{rs} = 0,    j = 1, ..., s.

The vectors ε_1, ..., ε_s form an orthonormal system, since

ε_j' ε_{j'} = q*_j' K K' q*_{j'} = q*_j' q*_{j'} = δ_{jj'} Σ_{i=1}^r p_ij^0 = δ_{jj'}.

We shall apply an orthogonal linear transformation

η* = P ζ*,    ζ*' = (ζ_{t+1}, ..., ζ_{rs}),

where the first s rows of P are ε_1', ..., ε_s'; then we have

η_{t+1} = ··· = η_{t+s} = 0

with probability one, and the covariance matrix of (η_{t+1}, ..., η_{rs}) is

P ( I_{rs−t} − Σ_{j=1}^s ε_j ε_j' ) P' = [ 0  0 ; 0  I_{rs−s−t} ].

Thus we have shown that

(3.38)    χ*² = η_{t+s+1}² + ··· + η_{rs}²,

and the covariance matrix of (η_{t+s+1}, ..., η_{rs}) is I_{rs−s−t}; hence χ*² is, in the limit, distributed as a χ² with rs − s − t degrees of freedom.
Theorem 2* (S. N. Roy). Under the same conditions as Theorem 1*, suppose we have u (u ≤ t) relations:

(3.40)    H_0:  f_k(θ) = 0,    k = 1, ..., u,

such that

(e)  the u functions f_k(θ), k = 1, ..., u, possess continuous partial derivatives up to the second order, and

(f)  the matrix (∂f_k/∂θ_j) is of rank u for all θ in A.

Then the modified χ²-minimum equations subject to H_0 have one and only one consistent solution θ̂*, and the statistic

χ̂² = Σ_{j=1}^s Σ_{i=1}^r (n_ij − n_{0j} p_ij(θ̂*))² / (n_{0j} p_ij(θ̂*))

is, in the limit as n → ∞ subject to the ratios r_j = n_{0j}/n being held fixed, distributed as a χ² with degrees of freedom rs − s − t + u. And if the hypothesis H_0 is true, then χ̂² − χ² is, in the limit as n → ∞ subject to r_j = n_{0j}/n being held fixed, independent of χ² and is distributed as a χ² with degrees of freedom u.
Proof of Theorem 2*. Without any loss of generality, we can assume that

(3.42)    det ( ∂f_k/∂θ_l )_{k, l = 1, ..., u} ≠ 0    at θ = θ_0,

and the determinant of the left-hand side of (3.42) is a continuous function of θ; therefore (3.42) holds true in some neighborhood of θ_0. Let a non-degenerate interval A_0, of which θ_0 is an inner point, be contained in that neighborhood of θ_0. We can proceed by taking A_0 as a domain of θ instead of A. Thus we may assume that (3.42) holds for all θ in A.

Consider the transformation

(3.43)    τ_k = f_k(θ), k = 1, ..., u;    τ_h = θ_h − θ_h^0, h = u+1, ..., t;

then by a well-known theorem on implicit functions, we have

(3.44)    θ_k = ψ_k(τ_1, ..., τ_t),    k = 1, ..., u,

where the functions ψ_k are continuous and possess continuous partial derivatives up to the second order such that

(3.45)    θ_k^0 = ψ_k(0, ..., 0),    k = 1, ..., u.

Substituting (3.44) into p_ij(θ), we get

(3.46)    p_ij(θ) = q_ij(τ),

such that the functions q_ij(τ) are continuous and possess continuous partial derivatives up to the second order. Hence under the null hypothesis H_0 the rs functions p_ij(θ) become functions q_ij(τ*), where τ*' = (τ_{u+1}, ..., τ_t), and these functions q_ij(τ*) satisfy the conditions (a), (b), (c) and (d). We shall show that (d) is satisfied; indeed, since

∂q_ij/∂τ_h = (∂p_ij/∂θ_1)(∂ψ_1/∂τ_h) + ··· + (∂p_ij/∂θ_u)(∂ψ_u/∂τ_h) + ∂p_ij/∂θ_h,    h = u+1, ..., t,

if the rank of the matrix (∂q_ij/∂τ_h) were less than t − u at some point τ* in A* (which is the projection of A into the subspace spanned by the last t − u coordinate axes), then we could find a_h(τ*), not all zero, such that

Σ_{h=u+1}^t a_h(τ*) ∂q_ij/∂τ_h = 0    for all i, j.

On the other hand, we have assumed that the matrix (∂p_ij/∂θ_k) is of rank t for all θ in A. Thus we get a_h(τ*) = 0, h = u+1, ..., t, which is a contradiction. Consequently the problems are reduced to the case of Theorem 1* with t − u independent parameters.

Theorem 3* (S. N. Roy). Suppose that we have s mutually independent random experiments ℜ_1, ..., ℜ_s, and the possible outcomes in each experiment are divided into r mutually exclusive groups. Let the probability of getting a result belonging to the i-th class in ℜ_j be p_ij^0 ≥ c² > 0. Let n_ij denote the number of results belonging to the i-th class in n_{0j} repetitions of ℜ_j, so that Σ_{i=1}^r n_ij = n_{0j}, and let Σ_{j=1}^s n_{0j} = n. Let us consider the hypothesis

H_0:  f_k(p) = 0,    k = 1, ..., rs − s − t,

where the f_k are continuous and possess continuous partial derivatives up to the second order in the parameter space

π = { p : 0 < c² ≤ p_ij ≤ 1;  Σ_{i=1}^r p_ij = p_{0j} = 1, j = 1, ..., s },

of which p^0 is an inner point, and the matrix (∂f_k/∂p_ij) is of rank rs − s − t. Then the modified χ²-minimum equations subject to H_0 have one and only one consistent solution p̂, and the statistic

χ̂² = Σ_{j=1}^s Σ_{i=1}^r (n_ij − n_{0j} p̂_ij)² / (n_{0j} p̂_ij)

is, in the limit as n → ∞ subject to all ratios r_j being held fixed, distributed as a χ² with degrees of freedom rs − s − t.

This theorem can also be reduced to Theorem 1*.
4. On the distribution of a modified χ² statistic.

In testing goodness of fit by the χ² method, there are many cases in which we already have the sample values (not the frequencies). For simplicity of discussion, we shall mention the simplest situation, because in this way the statistical feature of the problem will become clearer. For the multiparameter case the author would like to refer to Herman Chernoff and E. L. Lehmann [2].

Suppose that we have random samples x_1, x_2, ..., x_n from a certain population having an absolutely continuous distribution function F(x;a) and a probability density function f(x;a) which satisfies the following regularity conditions (see Cramér [4]):

1)  For almost all x, the derivatives ∂log f/∂a, ∂²log f/∂a² and ∂³log f/∂a³ exist for every a belonging to a non-degenerate interval A.

2)  For every a in A, we have

|∂f/∂a| < F_1(x),  |∂²f/∂a²| < F_2(x)  and  |∂³log f/∂a³| < H(x),

the functions F_1(x) and F_2(x) being integrable over (−∞, +∞), while

∫_{−∞}^{∞} H(x) f(x;a) dx < M,

where M is independent of a.

3)  For every a in A, the integral

k² = ∫_{−∞}^{∞} (∂log f/∂a)² f(x;a) dx

is finite and positive.

Furthermore, we assume that the true value of the unknown parameter is a_0.

Let the whole interval of x be divided into r classes, and let the class frequencies be ν_1, ..., ν_r, so that Σ_{i=1}^r ν_i = n. In other words, if we define auxiliary random variables as follows:

(4.1)    x_{ij} = 1 if x_j belongs to the i-th class, and x_{ij} = 0 otherwise,

then we get

(4.2)    ν_i = Σ_{j=1}^n x_{ij},    i = 1, ..., r.

Further let p_i(a) denote the probability of the i-th class under the parameter value a, and put p_i^0 = p_i(a_0). Then it is a well-known fact that the statistic

Σ_{i=1}^r (ν_i − n p_i^0)² / (n p_i^0)

is, in the limit as n → ∞, distributed as χ² with degrees of freedom r − 1. If â is the modified χ²-minimum estimate of a, in Cramér's sense, from the frequency data, which satisfies the condition of consistency, then Theorem 1 in § 1 tells us that the statistic

Σ_{i=1}^r (ν_i − n p_i(â))² / (n p_i(â))

is, in the limit as n → ∞, distributed as χ² with degrees of freedom r − 2.
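The classical limiting result above can be checked numerically. The sketch below is not part of the original paper; the cell probabilities, sample size, and replication count are illustrative choices. It simulates the statistic $\sum_i (\nu_i - n p_i^0)^2 / (n p_i^0)$ for a known multinomial law and compares its empirical mean with $r-1$, the mean of a $\chi^2$-variate with $r-1$ degrees of freedom.

```python
import random

# Empirical check (illustrative, not from the paper): the statistic
# sum_i (nu_i - n p_i)^2 / (n p_i) for a known multinomial law has
# mean r - 1, the mean of a chi-square with r - 1 degrees of freedom.

def chi_square_stat(counts, probs, n):
    return sum((v - n * p) ** 2 / (n * p) for v, p in zip(counts, probs))

def mean_statistic(probs, n, reps, seed=0):
    rng = random.Random(seed)
    cum = [sum(probs[:i + 1]) for i in range(len(probs))]
    total = 0.0
    for _ in range(reps):
        counts = [0] * len(probs)
        for _ in range(n):
            u = rng.random()
            counts[next(k for k, c in enumerate(cum) if u <= c)] += 1
        total += chi_square_stat(counts, probs, n)
    return total / reps

probs = [0.1, 0.2, 0.3, 0.4]                  # r = 4 classes
mean_stat = mean_statistic(probs, n=200, reps=400)
print(round(mean_stat, 2))                    # close to r - 1 = 3
```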
Let $\alpha^*$ be the maximum likelihood estimate from the original sample in Cramér's sense, i.e., a solution of the maximum likelihood equation
$$\frac{\partial \log L}{\partial \alpha} = 0$$
which is a consistent estimator of $\alpha$; then it was shown by Cramér [4] that
$$\sqrt{n}\,(\alpha^* - \alpha^0) = \frac{1}{k^2 \sqrt{n}} \sum_{i=1}^{n} \left(\frac{\partial \log f(x_i;\alpha)}{\partial \alpha}\right)^{\!0} + o(1) \quad \text{in probability},$$
where
$$(4.8)\qquad k^2 = \int_{-\infty}^{\infty} \left(\frac{\partial \log f(x;\alpha)}{\partial \alpha}\right)^{\!0\,2} dF(x;\alpha^0).$$
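Cramér's expansion can be illustrated numerically. The sketch below is an assumed example, not one worked in the paper: it uses the exponential family $f(x;\alpha) = \alpha e^{-\alpha x}$ with true $\alpha^0 = 1$, for which the score is $1/\alpha - x$, the information is $k^2 = 1/\alpha^2$, and the maximum likelihood estimate is $\alpha^* = 1/\bar{x}$. The one-step value $\alpha^0 + \frac{1}{k^2 n}\sum_i \left(\partial \log f(x_i;\alpha)/\partial\alpha\right)^0$ should differ from $\alpha^*$ by less than $\alpha^*$ differs from $\alpha^0$.

```python
import random

# Numeric sketch of Cramer's expansion (assumed example: the
# exponential family f(x; a) = a * exp(-a * x), true a0 = 1).
# Score: d log f / da = 1/a - x; information: k^2 = 1/a^2; MLE: 1/xbar.

rng = random.Random(7)
a0, n = 1.0, 20000
xs = [rng.expovariate(a0) for _ in range(n)]

a_star = n / sum(xs)                        # maximum likelihood estimate
k2 = 1.0 / a0 ** 2                          # Fisher information at a0
score_sum = sum(1.0 / a0 - x for x in xs)   # sum of scores at a0
one_step = a0 + score_sum / (k2 * n)        # linearized (one-step) value

gap = abs(a_star - one_step)
print(round(gap, 6))                        # smaller than |a* - a0|
```

Here the remainder `gap` is of order $1/n$, while $\alpha^* - \alpha^0$ is of order $1/\sqrt{n}$, which is the content of the $o(1)$ term after multiplication by $\sqrt{n}$.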
Now let us consider the limiting distribution of the statistic
$$\sum_{i=1}^{r} \frac{\bigl(\nu_i - n p_i(\alpha^*)\bigr)^2}{n p_i(\alpha^*)}$$
in the limit as $n \to \infty$. Since
$$p_i(\alpha^*) = p_i(\alpha^0) + (\alpha^* - \alpha^0)\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} + o\!\left(\frac{1}{\sqrt{n}}\right) \quad \text{in probability},$$
it follows that
$$\sqrt{n}\left(\frac{\nu_i}{n} - p_i(\alpha^*)\right) = \sqrt{n}\left(\frac{\nu_i}{n} - p_i^0\right) - \sqrt{n}\,(\alpha^* - \alpha^0)\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} + o(1)$$
$$= \sqrt{n}\left(\frac{\nu_i}{n} - p_i^0\right) - \left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \frac{1}{k^2 \sqrt{n}} \sum_{j=1}^{n} \frac{\partial \log f(x_j;\alpha^0)}{\partial \alpha} + o(1) \quad \text{in probability}.$$

By using (4.2) this can be rewritten as follows:
$$\sqrt{n}\left(\frac{\nu_i}{n} - p_i(\alpha^*)\right) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \left[\, x_{ij} - p_i^0 - \frac{1}{k^2}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \frac{\partial \log f(x_j;\alpha^0)}{\partial \alpha} \,\right] + o(1)$$
in probability.
Now it can easily be shown under the regularity conditions 1) - 3) that
$$E(x_{ij}) = \int_{i\text{-th class}} dF(x;\alpha^0) = p_i^0, \qquad E\left(\frac{\partial \log f(x_j;\alpha^0)}{\partial \alpha}\right) = \int_{-\infty}^{\infty} \frac{\partial \log f(x;\alpha^0)}{\partial \alpha}\, dF(x;\alpha^0) = 0,$$
and after some calculations the covariance $\sigma_{ij}$ between the two variables
$$x_{il} - p_i^0 - \frac{1}{k^2}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \frac{\partial \log f(x_l;\alpha^0)}{\partial \alpha} \quad \text{and} \quad x_{jl'} - p_j^0 - \frac{1}{k^2}\left(\frac{\partial p_j}{\partial \alpha}\right)^{\!0} \frac{\partial \log f(x_{l'};\alpha^0)}{\partial \alpha}$$
comes out to be
$$\sigma_{ij} = \delta_{ll'}\left[\,\delta_{ij}\, p_i^0 - p_i^0 p_j^0 - \frac{1}{k^2}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \left(\frac{\partial p_j}{\partial \alpha}\right)^{\!0}\right],$$
where $\delta_{ll'} = 1$ if $l = l'$, and $= 0$ otherwise.
By using the Central Limit Theorem, we can see that the limiting distribution of the random vector
$$\left( \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \left[\, x_{ij} - p_i^0 - \frac{1}{k^2}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \frac{\partial \log f(x_j;\alpha^0)}{\partial \alpha} \,\right] \right)_{i=1,\dots,r}$$
is the $r$-dimensional normal distribution with mean $\mathbf{0}$ and the covariance matrix
$$\Lambda = \left( \delta_{ij}\, p_i^0 - p_i^0 p_j^0 - \frac{1}{k^2}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \left(\frac{\partial p_j}{\partial \alpha}\right)^{\!0} \right)_{i,j=1,\dots,r}.$$
Therefore, the limiting distribution of the random vector $\left(\sqrt{n}\bigl(\nu_1/n - p_1(\alpha^*)\bigr), \dots, \sqrt{n}\bigl(\nu_r/n - p_r(\alpha^*)\bigr)\right)$, as $n \to \infty$, is also the same $r$-dimensional normal distribution.
Now, since
$$\frac{\nu_i - n p_i(\alpha^*)}{\sqrt{n p_i^0}} = \frac{1}{\sqrt{p_i^0}}\,\sqrt{n}\left(\frac{\nu_i}{n} - p_i(\alpha^*)\right), \qquad i = 1, 2, \dots, r,$$
we can conclude that they are distributed, in the limit as $n \to \infty$, as the $r$-dimensional normal distribution with mean $\mathbf{0}$ and the covariance matrix
$$(4.18)\qquad \left( \delta_{ij} - \sqrt{p_i^0 p_j^0} - \frac{1}{k^2 \sqrt{p_i^0 p_j^0}}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} \left(\frac{\partial p_j}{\partial \alpha}\right)^{\!0} \right)_{i,j=1,\dots,r}.$$
On the other hand, we get
$$\frac{\nu_i - n p_i(\alpha^*)}{\sqrt{n p_i(\alpha^*)}} - \frac{\nu_i - n p_i(\alpha^*)}{\sqrt{n p_i^0}} \;\to\; 0 \quad \text{in prob., as } n \to \infty;$$
whence we see that the vector
$$\left( \frac{\nu_1 - n p_1(\alpha^*)}{\sqrt{n p_1(\alpha^*)}}, \; \dots, \; \frac{\nu_r - n p_r(\alpha^*)}{\sqrt{n p_r(\alpha^*)}} \right)$$
is, in the limit as $n \to \infty$, distributed as the $r$-dimensional normal distribution with the mean $\mathbf{0}$ and the covariance matrix
$$(4.19)\qquad \Lambda = I - \bar{p}\,\bar{p}' - \bar{h}\,\bar{h}',$$
where
$$(4.20)\qquad \bar{p} = \left( \sqrt{p_1^0}, \dots, \sqrt{p_r^0} \right)', \qquad \bar{h} = \frac{1}{k}\left( \frac{1}{\sqrt{p_1^0}}\left(\frac{\partial p_1}{\partial \alpha}\right)^{\!0}, \dots, \frac{1}{\sqrt{p_r^0}}\left(\frac{\partial p_r}{\partial \alpha}\right)^{\!0} \right)'.$$
Now
$$(4.21)\qquad \bar{p}'\bar{p} = \sum_{i=1}^{r} p_i^0 = 1, \qquad \bar{h}'\bar{p} = \frac{1}{k} \sum_{i=1}^{r} \left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0} = 0, \qquad \bar{h}'\bar{h} = \frac{1}{k^2} \sum_{i=1}^{r} \frac{1}{p_i^0}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0\,2} = c^2, \text{ say}.$$
It will easily be seen by Schwarz's inequality that
$$(4.22)\qquad \left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0\,2} = \left( \int_{i\text{-th class}} \frac{\partial \log f(x;\alpha^0)}{\partial \alpha}\, dF(x;\alpha^0) \right)^{\!2} \le p_i^0 \int_{i\text{-th class}} \left(\frac{\partial \log f(x;\alpha^0)}{\partial \alpha}\right)^{\!2} dF(x;\alpha^0),$$
i.e.
$$\frac{1}{p_i^0}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0\,2} \le \int_{i\text{-th class}} \left(\frac{\partial \log f(x;\alpha^0)}{\partial \alpha}\right)^{\!2} dF(x;\alpha^0).$$
Summing up (4.22) with respect to $i$, we get
$$\sum_{i=1}^{r} \frac{1}{p_i^0}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0\,2} \le \int_{-\infty}^{\infty} \left(\frac{\partial \log f(x;\alpha^0)}{\partial \alpha}\right)^{\!2} dF(x;\alpha^0) = k^2,$$
and consequently we obtain
$$c^2 \le 1.$$
This quantity $c^2$, which had already appeared in the author's 1952 paper [9], should be called the relative efficiency of the grouped data. In fact, the amount of information of the original data at $\alpha = \alpha^0$ is given by
$$(4.24)\qquad -E\left(\frac{\partial^2 \log L}{\partial \alpha^2}\right)_{\!\alpha^0} = n k^2,$$
where $L = \prod_{i=1}^{n} f(x_i;\alpha)$, while the amount of information of the grouped data at $\alpha = \alpha^0$ is given by
$$(4.25)\qquad -E\left(\frac{\partial^2 \log L^{(f)}}{\partial \alpha^2}\right)_{\!\alpha^0} = -E\left(\sum_{i=1}^{r} \nu_i \frac{\partial^2 \log p_i}{\partial \alpha^2}\right)_{\!\alpha^0} = n \sum_{i=1}^{r} \frac{1}{p_i^0}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0\,2},$$
where
$$L^{(f)} = \frac{n!}{\nu_1! \cdots \nu_r!} \prod_{i=1}^{r} p_i^{\nu_i}.$$
Therefore the relative efficiency of the grouped data compared to the original data is given by
$$(4.26)\qquad \frac{\displaystyle n \sum_{i=1}^{r} \frac{1}{p_i^0}\left(\frac{\partial p_i}{\partial \alpha}\right)^{\!0\,2}}{n k^2} = c^2.$$
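As a concrete illustration (an assumption of this note, not an example worked out in the paper), $c^2$ can be computed for the standard normal location family $f(x;\alpha) = \varphi(x-\alpha)$ with the four cells $(-\infty,-1]$, $(-1,0]$, $(0,1]$, $(1,\infty)$ at $\alpha^0 = 0$; here $k^2 = 1$, and for a cell $(a,b]$ one has $(\partial p_i/\partial\alpha)^0 = \varphi(a) - \varphi(b)$.

```python
import math

# Relative efficiency c^2 of grouped data (illustrative setting:
# standard normal location family, cells (-inf,-1], (-1,0], (0,1],
# (1,inf), true parameter a0 = 0, so k^2 = 1).
# For a cell (a, b], dp/da at a0 = 0 equals phi(a) - phi(b).

def phi(x):   # standard normal density (0 at +-infinity)
    return 0.0 if math.isinf(x) else math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

cuts = [-1.0, 0.0, 1.0]
edges = list(zip([-math.inf] + cuts, cuts + [math.inf]))

k2 = 1.0      # Fisher information of the N(a, 1) location family
c2 = sum((phi(a) - phi(b)) ** 2 / (Phi(b) - Phi(a)) for a, b in edges) / k2
print(round(c2, 3))   # about 0.882, strictly below 1
```

Grouping into these four cells thus retains about 88 percent of the information of the raw sample, consistent with the bound $c^2 \le 1$.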
If we choose $r-2$ columns $\bar{k}_1, \dots, \bar{k}_{r-2}$ appropriately such that the matrix
$$K = \left( \bar{k}_1, \dots, \bar{k}_{r-2}, \frac{1}{c}\bar{h}, \bar{p} \right)$$
is an orthogonal matrix, then it is seen that
$$(4.27)\qquad K'(I - \bar{p}\bar{p}' - \bar{h}\bar{h}')K = I - K'\bar{p}\bar{p}'K - K'\bar{h}\bar{h}'K = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & & 1-c^2 & \\ & & & & 0 \end{pmatrix}.$$
This means that the statistic
$$\chi^{*2} = \sum_{i=1}^{r} \frac{\bigl(\nu_i - n p_i(\alpha^*)\bigr)^2}{n p_i(\alpha^*)}$$
is, in the limit as $n \to \infty$, distributed as
$$\chi_{r-2}^2 + (1 - c^2)\,\chi_1^2,$$
where $\chi_{r-2}^2$ and $\chi_1^2$ are mutually independent $\chi^2$-variates with degrees of freedom $r-2$ and $1$ respectively. This result shows, so to speak, that if we use the estimator $\alpha^*$ instead of $\hat\alpha$, then only the $c^2$ fraction of one degree of freedom is reduced.
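The mixture law above can be checked by simulation. The sketch below uses an assumed setting, not one from the paper: the normal location family $N(\alpha,1)$ with true $\alpha^0 = 0$, cells $(-\infty,-1]$, $(-1,0]$, $(0,1]$, $(1,\infty)$ (so $r = 4$ and $c^2 \approx 0.88$), and $\alpha^* = \bar{x}$, the maximum likelihood estimate from the raw observations. The empirical mean of the statistic should then lie near $(r-2) + (1-c^2) \approx 2.12$, between $r-2 = 2$ and $r-1 = 3$.

```python
import math
import random

# Monte Carlo sketch (assumed setting, not from the paper):
# normal location family N(a, 1), true a0 = 0, cells
# (-inf,-1], (-1,0], (0,1], (1,inf)  (r = 4, c^2 ~ 0.88),
# and a* = sample mean, the MLE from the raw observations.
# The limit law chi^2_{r-2} + (1 - c^2) chi^2_1 has mean
# (r - 2) + (1 - c^2), about 2.12: between r - 2 and r - 1.

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2)))
cuts = [-1.0, 0.0, 1.0]
edges = [-math.inf] + cuts + [math.inf]

def cell_probs(a):
    return [Phi(edges[i + 1] - a) - Phi(edges[i] - a) for i in range(4)]

def statistic(xs):
    a_star = sum(xs) / len(xs)          # MLE of the normal mean
    p = cell_probs(a_star)
    counts = [0, 0, 0, 0]
    for x in xs:
        counts[sum(x > c for c in cuts)] += 1
    n = len(xs)
    return sum((v - n * pi) ** 2 / (n * pi) for v, pi in zip(counts, p))

rng = random.Random(1)
reps, n = 500, 400
mean_stat = sum(statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
                for _ in range(reps)) / reps
print(round(mean_stat, 2))              # empirically between 2 and 3
```

That the empirical mean falls strictly between $r-2$ and $r-1$ is exactly the point of the result: using $\alpha^*$ removes only the $c^2$ fraction of one degree of freedom.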
REFERENCES

[1] Barnard, G. A., "Significance tests for 2 x 2 tables", Biometrika, Vol. 34 (1947), pp. 123-138.

[2] Chernoff, H. and Lehmann, E. L., "The use of the maximum likelihood estimates in chi-square tests for goodness of fit", Ann. Math. Stat., Vol. 25 (1954), pp. 579-586.

[3] Cochran, W. G., "The chi-square tests of goodness of fit", Ann. Math. Stat., Vol. 23 (1952), pp. 315-345.

[4] Cramér, H., Mathematical Methods of Statistics, Princeton Univ. Press, 1946.

[5] Fisher, R. A., "On the interpretation of chi-square from contingency tables and the calculation of P", Journ. Roy. Stat. Soc., Vol. 85 (1922), pp. 87-94.

[6] Hotelling, H., "The consistency and ultimate distribution of optimum statistics", Trans. Amer. Math. Soc., Vol. 32 (1930), pp. 847-859.

[7] Mitra, S. K., "Contribution to the statistical analysis of categorical data", North Carolina Institute of Statistics Mimeo. Series No. 142, 1955.

[8] Ogawa, J., "A limit theorem of Cramér, its generalizations and an application", unpublished.

[9] Ogawa, J., "Contributions to the theory of systematic statistics, I", Osaka Math. Journ., Vol. 3 (1952).

[10] Ogawa, J., "On the sampling distributions of classical statistics in multivariate analysis", Osaka Math. Journ., Vol. 5 (1954).

[11] Roy, S. N. and Mitra, S. K., "An introduction to some nonparametric generalizations of analysis of variance and multivariate analysis", Biometrika, Vol. 43 (1956), pp. 361-376.

[12] Wald, A., "Note on the consistency of the maximum likelihood estimate", Ann. Math. Stat., Vol. 20 (1949), pp. 595-601.

[13] Pearson, E. S., "The choice of statistical tests illustrated on the interpretation of data classed in a 2 x 2 table", Biometrika, Vol. 34 (1947), pp. 139-167.